US9619301B2 - Multi-core memory model and speculative mode processor management - Google Patents
Multi-core memory model and speculative mode processor management
- Publication number
- US9619301B2 (application US 14/110,140)
- Authority
- US
- United States
- Prior art keywords
- processor
- core
- processor cores
- processor core
- processing thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/524—Deadlock detection or avoidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
- G06F9/528—Mutual exclusion algorithms by using speculative mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/3834—Maintaining memory consistency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
Definitions
- the present invention relates to multi-core processors and their method of operation.
- the invention relates to efficient memory access mechanisms for multi-core processors.
- a “multi-core processor” is a single computing component comprising a number of independent processors each of which is able to read and execute program instructions.
- the cores may be integrated onto a single chip, or may be discrete components interconnected together.
- a multi-core processor allows different or the same sets of instructions to be executed in parallel, significantly increasing processing power as compared to single core processors.
- FIG. 1A illustrates schematically a single-core processor memory architecture comprising a main memory (off chip) and a single-core on-chip processor with layer 1 (L1) and layer 2 (L2) caches.
- FIG. 1B illustrates schematically a multi-core processor architecture, again with a (common) off-chip main memory.
- a particular problem encountered with multi-core processors concerns memory access. This is known as the “shared state problem” and arises when individual cores of the system try to access the same data (shared data) from the same location (of a memory) at the same time. If two different cores of the system are allowed to access the same data at the same time, the consistency of that data may be compromised and the system becomes unreliable.
- Locks are resources that may be owned by only one processing instance (processor or thread). If a core acquires “ownership” of a lock, that core is guaranteed exclusive access to the underlying resources (such as data).
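As a point of reference, the following is a minimal C++ sketch of the lock concept just described; the names are illustrative and not taken from the patent.

```cpp
#include <mutex>

// Shared data guarded by a lock: only the thread (or core) that has
// acquired the mutex may touch the data, giving it exclusive access.
struct SharedCounter {
    std::mutex lock;   // the "ownership" token
    long value = 0;

    void increment() {
        std::lock_guard<std::mutex> guard(lock);  // acquire ownership
        ++value;                                  // exclusive access to data
    }                                             // ownership released here
};
```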
- in the software transactional memory (TM) approach, concurrent access to data by cores is allowed.
- if a conflict is detected, the first accessing core is stopped and all changes performed by that core are rolled back to a safe state. Thereafter, only the second accessing core is allowed to act on the shared data. After the second accessing core has finished acting on the shared data, the first accessing core is allowed to act on the shared data.
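A hedged sketch of this optimistic, rollback-based scheme follows; it is not the patent's mechanism, merely a compact illustration of "work privately, commit only if no other core intervened, otherwise roll back and retry".

```cpp
#include <atomic>

// Optimistic access: read a snapshot, compute a result privately
// (invisible to other cores), and publish it only if the shared state
// is unchanged; on conflict, discard the private result and retry.
std::atomic<long> shared_value{0};

void optimistic_add(long delta) {
    for (;;) {
        long snapshot = shared_value.load();      // read shared state
        long result = snapshot + delta;           // private, invisible work
        if (shared_value.compare_exchange_strong(snapshot, result))
            return;                               // commit succeeded
        // Another core intervened: "roll back" and re-execute.
    }
}
```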
- the lock-based approach may be considered non-composable, i.e., two pieces of otherwise correct program code, when combined, may not perform correctly, resulting in hard-to-detect deadlock or live-lock situations.
- the transactional memory approach, while composable, results in a large processing overhead (usually requiring hardware support).
- the transactional memory approach is not scalable, i.e., addition of further cores to an existing system results in lower performance.
- the multi-core system may become increasingly inefficient as the number of cores trying to access the same data is increased.
- WO2010/020828 describes a method and architecture for sharing data in a multi-core processor architecture.
- Foong, A. et al., “An Architecture for Software-Based iSCSI on Multiprocessor Servers”, describes the use of a software implementation of iSCSI in the context of chip multiprocessing (CMP).
- each processor core is provided with its own private cache and the device comprises or has access to a common memory.
- the method comprises executing a processing thread on a selected first processor core.
- the method further comprises implementing a normal access mode for executing an operation within said processing thread and comprising allocating sole responsibility for writing data to given blocks of said common memory, to respective processor cores.
- the method further comprises implementing a speculative execution mode switchable to override said normal access mode.
- This speculative execution mode comprises, upon identification of an operation within said processing thread, transferring responsibility for performing said operation to a plurality of second processor cores, and optionally performing said operation on the first processor core as well. This includes copying data from a given block of said common memory to the private cache of each of said second processors and optionally said first processor. Each of said second processors and optionally said first processor is allowed to modify the data in its own private cache without making the changes visible to other processors.
- sole responsibility for writing data to said given block of said common memory is temporarily allocated to one of said second processor cores, or optionally to said first processor core if said operation has been performed on said first processor core, whichever one is deemed to have successfully performed said operation, and execution of said processing thread at said first processor core is resumed.
- this may comprise the step of identifying within said processing thread an operation that will or may result in the writing of data to a block of said common memory for which a second processor core is responsible. Execution of the processing thread on the first processor core is suspended and responsibility for performing said operation is transferred to said second processor core. This includes copying data between the memory block allocated to the second processor core and the private cache of that second processor core. Upon completion of said operation at said second processor core, execution of said processing thread is resumed at said first processor core.
- Embodiments of the invention enable the normal mode to be employed when speculative execution is not required.
- the advantages of the normal mode, discussed above, can be obtained in this case.
- the normal mode can be suspended to allow any one of the operating multi-cores to access the appropriate block(s) in the common memory.
- the second cores may perform said operation under respective, different sets of assumptions with the successful core being chosen based upon a determination of a correct set of assumptions.
- the steps of transferring responsibility for performing said operation may comprise, for the or each second processor core, adding said operation to a task queue for the second processor core, the second processor core executing the queued operations in order.
- the second processor may return an operation completion message to said first processor.
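A minimal C++ sketch of this normal-mode hand-off is given below, assuming one worker thread per guardian core: operations are enqueued on the core responsible for the memory block, and the requesting core waits on a completion handle (standing in for the completion message) before resuming. All names are illustrative, not from the patent.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// One "resource guardian" core: it drains a queue of operations on
// the memory blocks it owns, executing them in order.
class GuardianCore {
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_{[this] { run(); }};

    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (stop_ && tasks_.empty()) return;
            auto task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();
            task();  // the operation runs on the core that owns the block
        }
    }

public:
    // "Move the computation to the data": enqueue the operation and
    // hand back a handle the requesting core can wait on.
    std::future<void> submit(std::function<void()> op) {
        auto done = std::make_shared<std::promise<void>>();
        std::future<void> handle = done->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([op = std::move(op), done] {
                op();
                done->set_value();  // completion message to the caller
            });
        }
        cv_.notify_one();
        return handle;
    }

    ~GuardianCore() {
        { std::lock_guard<std::mutex> lock(mutex_); stop_ = true; }
        cv_.notify_one();
        worker_.join();
    }
};
```

A requesting core would call submit() with the operation touching the guardian's blocks and wait() on the returned future before resuming its thread, mirroring the suspend/resume sequence described above.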
- the step of identifying within the processing thread an operation that will or may result in the writing of data to a block of said common memory for which a second processor core is responsible may comprise identifying within compiled code an explicit instruction identifying a block or set of blocks of said common memory.
- a switch from said normal mode to said speculative mode may be triggered by identification of an explicit instruction within compiled code to switch to said speculative mode.
- the method may comprise identifying within compiled code a number of processor cores on which said operation is to be performed, and performing the operation on that identified number of cores.
- the successful core may be determined on the basis of a criterion or criteria identified in the compiled code.
- a multi-core processor where each processor core is provided with its own private cache and the device comprises or has access to a common memory.
- the processor is configured to execute a processing thread on a selected first processor core, and to implement a normal common memory access mode for executing an operation within a processing thread and comprising allocating sole responsibility for writing data to given blocks of said common memory, to respective processor cores.
- the processor is further configured to implement a speculative execution mode switchable to override said normal access mode.
- the speculative execution mode comprises, upon identification of said operation within said processing thread, transferring responsibility for performing said operation to a plurality of second processor cores, and optionally performing said operation on the first processor core as well, including copying data from a given block of said common memory to the private cache of each of said second processors and optionally said first processor.
- Each of said second processors and optionally said first processor is allowed to modify the data in its own private cache without making the changes visible to other processors.
- sole responsibility for writing data to said given block of said common memory is temporarily allocated to one of said second processor cores, or optionally to said first processor core if said operation has been performed on said first processor core, whichever one is deemed to have successfully performed said operation, and execution of said processing thread at said first processor core is resumed.
- FIG. 1A illustrates schematically a conventional single-core processor architecture
- FIG. 1B illustrates schematically a conventional multi-core processor architecture
- FIG. 2 illustrates a state transition diagram for memory blocks according to an improved multi-core processor architecture
- FIG. 3 is a flow diagram showing a multi-core processor operation process including both a normal and a speculative operating mode
- FIG. 4 illustrates schematically a processor architecture for implementing the process of FIG. 3 .
- a speculative execution typically involves executing the same code in parallel on two or more cores of a multi-core processor, each execution relying upon different data, e.g. conditions.
- a speculative execution may be initiated, for example, by a primary core (executing the main processing thread) prior to a data result being computed or received by the primary core.
- Each secondary core is provided with the operation code and possible data result.
- once the actual data result becomes available, the primary core can select the appropriate secondary core operating on that result, i.e. the “winning” core.
- the secondary core may by that time have completed its task or will at least have begun its execution. At this point, ongoing execution of the task by any other secondary cores may be aborted.
- This architecture can simultaneously support a shared memory model as well as software driven speculative execution, without the overhead generally associated with traditional cache coherence protocols. It is expected that the architecture will provide enhanced cache re-use efficiency and hence improved memory bandwidth.
- the architecture presented here builds upon the architecture of WO2010/020828 by introducing a new memory and cache hierarchy and consistency model that relies heavily on input from software to simplify the cache architecture, improve cache usage efficiency (and, implicitly, memory bandwidth utilization) and provide support for additional mechanisms including software-driven coarse grain speculative execution.
- the new mechanisms that are described also provide simple architectural support for hybrid software-hardware implementation of transactional memory.
- the proposed architecture makes use of the following features:
- each tile acts independently as a single uni-processor system.
- Each tile consists of a processor core that has its own private cache hierarchy, consisting of private data and code L1 caches and a private L2 cache that is not shared with any other tile and does not participate in any cache coherence mechanism.
- the cache hierarchy of each tile is in fact designed as in a single-core chip, bridging the gap between the speed of the core and the speed of the memory; there is no coherency mechanism between the caches of different tiles.
- the interconnect architecture between the tiles is orthogonal to the design of the memory system: there is a need to have a communication mechanism between tiles, but the actual design of it is not relevant as long as it provides a reliable medium for transferring messages between the tiles and allows each tile's cache controller to access the main memory.
- a first principle of the proposed architecture is that caches are distributed and each core's private cache is organized as a single-core machine's cache, acting as a bridge between the memory's access speed and the speed of the processor.
- two software-level mechanisms underpin the architecture: the first is the explicit marking at the source code level of the code chunks that access shared memory areas; the second is the implementation of the principle of moving the computation to the data, rather than replicating the data.
- Marking at the source code level is the basic mechanism that a programmer shall use to convey—to the compiler and the hardware—information about accesses to shared memory in terms of location in the code and accessed memory blocks. These marked blocks are referred to here as “transactions” (as the semantics and the marking itself are very similar to the definition of transactions: the complete code block will either be executed fully or will be rolled back and re-executed at a later time).
- the beginning of the code segment that accesses one or several shared memory blocks is marked with “BEGIN TRANSACTION”, while the end of it is marked with “END TRANSACTION”.
- the marking includes the list of shared memory blocks that will be accessed within the transaction. To distinguish over transactions used to model speculative execution, these transactions are termed “sharing transactions”.
- This marking of the code allows the compiler to map out dependencies between transactions as well as proper placement—home location—of shared memory blocks across available hardware. The computation can then be moved to the data.
- Transactions are grouped by the compiler into “transaction groups”.
- Group membership is defined by a simple rule: a transaction belongs to a group if and only if it accesses at least one shared memory block accessed by at least one other transaction in the group.
- transaction groups represent dependencies between transactions in terms of the shared memory blocks that are accessed.
- Each transaction group is assigned a processor core—the “resource guardian” or home location—on which all the transactions in the transaction group will be executed. Implicitly, this core is also the home location of all of the shared memory blocks accessed by transactions in the transaction group, in the sense that all accesses to that memory block will happen on this core (physically the memory block may still be allocated anywhere in the memory).
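The grouping rule just described is a transitive closure over shared blocks, so a compiler could compute it with a simple union-find pass; the following is a hedged sketch under that assumption, with illustrative names.

```cpp
#include <map>
#include <numeric>
#include <vector>

// Two transactions belong to the same group iff they (transitively)
// share at least one memory block. Union-find over
// "block -> first transaction seen" yields the transitive closure.
struct TransactionGrouper {
    std::vector<int> parent;
    std::map<int, int> first_txn_for_block;   // block id -> transaction id

    explicit TransactionGrouper(int num_txns) : parent(num_txns) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }

    // Register the shared memory blocks accessed by one transaction.
    void add(int txn, const std::vector<int>& blocks) {
        for (int b : blocks) {
            auto [it, fresh] = first_txn_for_block.try_emplace(b, txn);
            if (!fresh) unite(txn, it->second);  // shared block => same group
        }
    }
    // find(t) is t's group id; each group is then pinned to one
    // resource-guardian core that executes all of its transactions.
};
```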
- One core can be home to multiple transaction groups, but the distribution of a transaction group across multiple cores has a number of issues that are not trivial to address.
- This mechanism turns the resource guardian cores into a special implementation of the lock concept: the execution of the transactions in the associated transaction group must be “serialized” in order to enforce mutual exclusion of the execution and this is precisely what is being achieved by assigning transaction groups to dedicated processor cores.
- the resource guardian cores will implement a local queue of transactions that will be used to store transactions that have to wait for previously issued transactions to complete. Such partitioning and moving of the computations to where the data is located also results in a memory consistency model that can guarantee global ordering of reads and writes.
- Nested transactions may lead to dead-lock situations, i.e. in the case that a nested transaction has to execute on another core. Suspending the current resource guardian core and off-loading execution to another core can lead to a circular dependency between resource guardians, i.e., a dead-lock situation.
- Vajda, A., “Handling of Shared Memory in Many-core Systems without Locks and Transactional Memory”, 3rd Workshop on Programmability Issues for Multi-core Computers (MULTIPROG), describes a method for detecting such a deadlock and for removing it through a rollback mechanism.
- some form of transactional memory was proposed as the solution for rollback; here, a new method based on the usage of the L2 cache will be elaborated upon.
- Any shared read/write memory block can at any given time be present in only one tile's cache. The same memory block is permitted to be present in multiple caches if and only if it is accessed for reading only by all cores, all of the time.
- Thread-level coarse-grained speculative execution, augmented with semantic information provided by the programmer, has recently been proposed as a solution to improve the performance of sequential applications on multi-core and many-core architectures [Vajda, A., Stenström, P., Semantic Information based Speculative Parallel Execution, Proc. 3rd Workshop on Parallel Execution of Sequential Programs on Multi-Core Architecture].
- a mechanism that can complement the approach described in the previous section (normal mode) to provide support for spawning, squashing and committing speculative threads of execution will now be considered.
- a speculative fiber is essentially a thread created at the request of the programmer—or based on programmer provided hints—that will execute an arbitrary part of the program speculatively, assuming that certain conditions (such as values of memory locations) will be met. The result of the complete execution is kept invisible until it can be decided—either by the hardware or explicitly by the programmer—whether the assumed conditions were met.
- a concrete application of the concept of speculative fibers is described in Vajda, A., Stenström, P., Semantic Information based Speculative Parallel Execution, Proc. 3rd Workshop on Parallel Execution of Sequential Programs on Multi-Core Architecture, where it is successfully applied to speeding up Huffman decompression.
- a transaction is characterized by the following features:
- the transaction is used in two contexts: for accessing shared memory and for performing speculative execution of parts of a program.
- a “PRELUDE” code segment can be defined by the programmer to set the context for the fiber's execution; in this segment a special variable, “_fiber”, can be accessed that gives the index of the fiber, which can be used to decide on the fiber-specific adaptations.
- a special code segment marked with “ELECTION”—shall be provided by the programmer to choose which fiber's result—if any—will be retained.
- This code segment shall set the value of the “_fiber” special variable to the winning fiber's identity (or an undefined value, if there is no winner).
- the definition of a speculative fiber might be as follows:
- memory blocks can be in one of the following states:
- FIG. 2 illustrates the state transition diagram for memory blocks.
- a Private memory block will always be cached in the private cache of the tile on which the thread to which it belongs is executed; a Read-shared block can be cached on any tile that accesses it.
- for Write-shared blocks, the new mechanism will be applied: such a block is cached only on its resource guardian and the execution of the threads accessing it will always be moved to the resource guardian.
- the Speculative state is a transient state applicable during speculative execution of a transaction or operation.
- Both Private and Write-shared memory blocks can transition into this state, in case the processing thread that is the owner of the memory block (for Private memory blocks), or one of the processing threads having access to the Write-shared memory block, enters a speculative transaction.
- the execution of the thread is moved to the resource guardian where it will only be executed once all the other transactions preceding it have been executed.
- the cache controller acts exactly as in a single processor system.
- the core can steer the pre-fetching process of the cache controller based on the content of its queue of transactions to be executed: the cache controller, if possible, can pre-fetch the code and data needed for the next transaction while the current one is still executing. Also, in order to guarantee that rollbacks can be executed safely, after each transaction that is successfully executed, the content of the cache has to be flushed back to the main memory.
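The guardian's loop might then look as sketched below. The two hardware hooks are stand-ins for facilities the architecture assumes (a prefetch hint and a cache flush); they are not standard C++, so they appear here as empty stubs.

```cpp
#include <deque>
#include <functional>

void prefetch_hint(const void*) { /* stand-in for a hardware prefetch hint */ }
void flush_dirty_lines_to_memory() { /* stand-in for a cache flush */ }

struct Transaction { std::function<void()> body; };

// Hedged sketch of the guardian's execution loop: steer prefetch from
// the queue head while the current transaction runs conceptually, and
// flush after each successful transaction so a clean rollback baseline
// always exists in main memory.
void guardian_loop(std::deque<Transaction>& queue) {
    while (!queue.empty()) {
        if (queue.size() > 1)
            prefetch_hint(&queue[1]);        // pre-fetch the next transaction
        queue.front().body();                // execute in mutual exclusion
        flush_dirty_lines_to_memory();       // commit: rollback-safe baseline
        queue.pop_front();
    }
}
```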
- FIG. 4 illustrates schematically the multi-core processor comprising a plurality of cores 2 each having private caches L1, L2.
- the processor has access to a common memory 3 .
- the solid lines coupling the private caches and the common memory blocks indicate the home cache relationships when the processor is operated in the normal mode.
- the dashed lines indicate that, in the speculative mode, any of the caches (at least any of those involved in the speculative mode execution) may access any of the blocks in the common memory, subject of course to commit access being restricted to the “winning” cache.
- speculative mode of execution can also be used to implement transactional memory semantics.
- transactional memory can be modeled as a special case of speculative execution, with some small modifications:
- This method will seek to ensure that at least one transaction—the last one to complete—will make progress, as all the previous ones have probably rolled back due to the detection of a conflict.
- Huffman coding [Huffman, D., A Method for the Construction of Minimum Redundancy Codes, In Proc. IRE, vol. 40] is a lossless compression algorithm that relies on building a binary tree whose leaves represent symbols from the data being compressed; each symbol is assigned a code based on the path from the root to the corresponding leaf, with shorter codes assigned to more frequent symbols. Decoding of Huffman-encoded streams is considered hard to parallelize: the compressed stream cannot simply be split up into chunks, as there is no reliable way to detect where a new code section starts.
- each of the fibers can execute Huffman decoding within a speculative transaction and safely write into the main output buffer; our proposed mechanism will make sure that these writes are kept local to the core which executes the speculative fiber.
- the local copies of all the other cores will simply be discarded and the “winner” core will commit the changes back to the main memory. Relying on this underlying mechanism enables the impact on the source code to be minimal: the call to the decompression function is simply marked as a speculative transaction; all other “housekeeping” can be taken care of by the underlying run-time system.
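A hedged sketch of this use case follows: each fiber guesses a different start offset into the compressed stream and decodes into a private buffer (mirroring the writes-stay-local rule), and an election keeps the fiber whose guess aligned with a real code boundary. decode_from() is a placeholder for a real Huffman decoder, and the list of candidate offsets is assumed to be supplied by the run-time.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct FiberResult { bool valid = false; std::string output; };

// Placeholder: a real implementation would walk the Huffman tree from
// 'offset' and report whether decoding stayed consistent to the end.
FiberResult decode_from(const std::vector<std::byte>& stream, size_t offset) {
    return {offset < stream.size(), std::string{}};
}

std::string speculative_decode(const std::vector<std::byte>& stream,
                               const std::vector<size_t>& guessed_offsets) {
    std::vector<FiberResult> results(guessed_offsets.size());
    std::vector<std::thread> fibers;
    for (size_t f = 0; f < guessed_offsets.size(); ++f)   // spawn fibers
        fibers.emplace_back([&, f] {
            results[f] = decode_from(stream, guessed_offsets[f]);
        });
    for (auto& t : fibers) t.join();

    for (auto& r : results)                 // ELECTION: pick the winner
        if (r.valid) return r.output;       // commit the winner's buffer
    return {};                              // no winner: all fibers squashed
}
```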
- the approach described here can provide safe shared memory support, transactional memory semantics and support for software driven speculative execution in one integrated solution.
- the approach involves restricting hardware-level sharing for shared memory applications and relying on software-driven migration of computation. For sequential, single-threaded code executed speculatively, on the other hand, multiple cached versions of the same memory areas are allowed, augmented with software-controlled selection of a winning version as the basis for maintaining consistency.
- a key insight that is promoted is that hardware-only solutions—even augmented with compiler support—are not sufficient.
- This approach may be developed by exploring how multi-threaded cores can be exploited to improve the parallelism in, for example, execution on resource guardian cores.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
-
- Each core has a certain amount of private cache. These private caches may have different sizes for different cores.
- No coherence protocol is used between these private caches. Rather, each and every memory block within a main memory is mapped to one and only one core. The private cache to which a memory block is mapped is referred to as that core's “home cache”. As a result, in a “normal mode” of operation, a memory block is only accessible by the core which owns it.
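Because every block has exactly one home, "which core may touch this block?" is a pure function of the block address. The modulo policy below is an assumption for illustration; the description only requires the mapping to be unique.

```cpp
#include <cstdint>

constexpr std::uint32_t kNumCores  = 16;   // illustrative values
constexpr std::uint64_t kBlockSize = 64;   // bytes per memory block

// Map an address to the single core whose private cache is the
// block's "home cache".
constexpr std::uint32_t home_core(std::uint64_t address) {
    std::uint64_t block = address / kBlockSize;
    return static_cast<std::uint32_t>(block % kNumCores);
}

static_assert(home_core(0) == home_core(63), "same block, same home");
static_assert(home_core(0) != home_core(64), "next block, different home");
```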
- Access to the main memory can happen in two modes, namely:
- Normal access mode: relevant content of the main memory is cached in a particular home cache and the execution thread needing access to it will have to be executed on the core owning the cache. This is for example as described in WO2010/020828.
- Speculative access mode: the relevant content of the main memory is cached and written to multiple private caches. However, once the updates to the cached copies are completed, only one (or none) of the modified versions is written back to the main memory.
-
- The modified versions of the memory blocks (held in the private caches) are not committed back to the main memory, i.e., are not made visible to other cores, until the execution of all speculative fibers spawned on behalf of the speculative transaction has concluded and the selection of the correct variant has been performed.
- At the end of the execution of the speculative fibers, one speculative fiber is selected as the winner; its modified version of the shared memory block is committed (made visible to other cores), while all the other speculative fibers will be “squashed”. It is possible that no fiber is selected as the winner, in which case the complete speculative execution is discarded.
- The software is in charge of deciding which speculative fiber to select as winner and the decision is communicated to the hardware, for example using a special instruction.
-
- The code it shall execute;
- The memory blocks it will access that may be accessed by other transactions concurrently;
- A type: a sharing transaction has to be executed in mutual exclusion with regards to other transactions acting on at least one of its shared memory blocks, while a speculative transaction will be executed over multiple cores simultaneously, but only one of the executions (or none) will be retained, all others being squashed.
-
- There shall be a mechanism to define the total number of desired fibers for the transaction.
- Each speculative fiber has to execute within a different context from the other fibers, such as different value assumptions. To set the context up, a mechanism needs to be provided for the programmer to define the variations specific to each fiber.
- At the end of the execution of all speculative fibers, a winner fiber needs to be selected.
-
- BEGIN TRANSACTION FIBERS=16 <list of memory blocks>
- PRELUDE
- <modify some memory based on the value of _fiber>
- END
- <actual code that is executed speculatively>
- ELECTION
- <decide which speculative fiber shall be kept by setting the value of _fiber accordingly>
- END
- END TRANSACTION
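The following hedged C++ skeleton renders the semantics of the syntax above (FIBERS=16, PRELUDE, ELECTION): each fiber runs a prelude that sees its own _fiber index, executes the body against private state, and an election selects at most one winner whose private state is then committed. It is illustrative only; all names are assumptions.

```cpp
#include <array>
#include <functional>
#include <optional>
#include <thread>

constexpr int kFibers = 16;

template <typename State>
std::optional<State> run_speculative_transaction(
    std::function<void(int, State&)> prelude,   // PRELUDE(_fiber, state)
    std::function<void(State&)> body,           // speculatively executed code
    std::function<int(const std::array<State, kFibers>&)> election) {

    std::array<State, kFibers> privates{};      // one private copy per fiber
    std::array<std::thread, kFibers> threads;
    for (int f = 0; f < kFibers; ++f)
        threads[f] = std::thread([&, f] {
            prelude(f, privates[f]);            // fiber-specific context
            body(privates[f]);                  // invisible to other fibers
        });
    for (auto& t : threads) t.join();

    int winner = election(privates);            // ELECTION sets "_fiber"
    if (winner < 0 || winner >= kFibers)
        return std::nullopt;                    // no winner: all squashed
    return privates[winner];                    // commit winner's version
}
```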
-
- Idle: the memory block is not in use.
- Private: the memory block is allocated and accessed by one single thread.
- Read-shared: the memory block is allocated, but it is read-only.
- Write-shared: the memory block is allocated and it is accessed both for reading and writing.
- Speculative: the memory block is accessed as part of an ongoing speculative execution.
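A hedged encoding of this state machine is sketched below. The Speculative edges (entered from Private and Write-shared, left on commit or squash) follow the description; the remaining edges are natural assumptions, since FIG. 2 is not reproduced here.

```cpp
enum class BlockState { Idle, Private, ReadShared, WriteShared, Speculative };

constexpr bool can_transition(BlockState from, BlockState to) {
    switch (from) {
        case BlockState::Idle:                  // must be allocated first
            return to != BlockState::Speculative;
        case BlockState::Private:               // owner thread may enter a
        case BlockState::WriteShared:           // speculative transaction
            return true;
        case BlockState::ReadShared:            // read-only blocks are cached
            return to != BlockState::Speculative;  // anywhere, never speculate
        case BlockState::Speculative:           // transient: commit or squash
            return to == BlockState::Private || to == BlockState::WriteShared
                || to == BlockState::Idle;
    }
    return false;
}
```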
-
- Different fibers may execute different transactions; it is not required that the same transaction is executed by all fibers.
- The system needs to keep track of all changes to blocks marked as Speculative
With these changes, transactional memory can be implemented as follows:
- When a transaction is entered, the memory blocks it accesses are marked as Speculative and the transaction is executed as a speculative fiber, on one core; if the memory blocks are already marked Speculative, there may be other ongoing transactions.
- At the end of the transaction, the ELECTION section will check if any of the blocks were modified elsewhere; if not, the transaction is committed, otherwise it is rolled back.
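A minimal sketch of that ELECTION check, assuming a per-block version counter that the system bumps on every commit (the counter and all names are assumptions, not the patent's mechanism):

```cpp
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

using BlockId = unsigned;
std::unordered_map<BlockId, unsigned> g_block_version;  // block -> version
std::mutex g_version_mutex;                             // guards the map

struct TxnLog { std::vector<std::pair<BlockId, unsigned>> read_set; };

// Record the version of each block as the transaction first touches it.
void on_access(TxnLog& log, BlockId block) {
    std::lock_guard<std::mutex> lk(g_version_mutex);
    log.read_set.emplace_back(block, g_block_version[block]);
}

// ELECTION: commit only if no block was modified elsewhere meanwhile;
// otherwise report a conflict so the transaction rolls back.
bool election_commit(const TxnLog& log) {
    std::lock_guard<std::mutex> lk(g_version_mutex);
    for (const auto& [block, seen] : log.read_set)
        if (g_block_version[block] != seen)
            return false;                    // modified elsewhere: roll back
    for (const auto& [block, seen] : log.read_set)
        ++g_block_version[block];            // commit: publish new versions
    return true;
}
```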
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/110,140 US9619301B2 (en) | 2011-04-06 | 2012-04-05 | Multi-core memory model and speculative mode processor management |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161472268P | 2011-04-06 | 2011-04-06 | |
US201161472874P | 2011-04-07 | 2011-04-07 | |
PCT/EP2012/056282 WO2012136766A1 (en) | 2011-04-06 | 2012-04-05 | Multi-core processors |
US14/110,140 US9619301B2 (en) | 2011-04-06 | 2012-04-05 | Multi-core memory model and speculative mode processor management |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140033217A1 US20140033217A1 (en) | 2014-01-30 |
US9619301B2 true US9619301B2 (en) | 2017-04-11 |
Family
ID=45952538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/110,140 Expired - Fee Related US9619301B2 (en) | 2011-04-06 | 2012-04-05 | Multi-core memory model and speculative mode processor management |
Country Status (2)
Country | Link |
---|---|
US (1) | US9619301B2 (en) |
WO (1) | WO2012136766A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160117193A1 (en) * | 2014-10-22 | 2016-04-28 | International Business Machines Corporation | Resource mapping in multi-threaded central processor units |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014031540A1 (en) * | 2012-08-20 | 2014-02-27 | Cameron Donald Kevin | Processing resource allocation |
US9424228B2 (en) | 2012-11-01 | 2016-08-23 | Ezchip Technologies Ltd. | High performance, scalable multi chip interconnect |
US9183144B2 (en) | 2012-12-14 | 2015-11-10 | Intel Corporation | Power gating a portion of a cache memory |
GB2514956B (en) * | 2013-01-21 | 2015-04-01 | Imagination Tech Ltd | Allocating resources to threads based on speculation metric |
US10275593B2 (en) * | 2013-04-01 | 2019-04-30 | Uniquesoft, Llc | Secure computing device using different central processing resources |
CN104252391B (en) * | 2013-06-28 | 2017-09-12 | 国际商业机器公司 | Method and apparatus for managing multiple operations in distributed computing system |
CN104572506B (en) * | 2013-10-18 | 2019-03-26 | 阿里巴巴集团控股有限公司 | A kind of method and device concurrently accessing memory |
US10339023B2 (en) | 2014-09-25 | 2019-07-02 | Intel Corporation | Cache-aware adaptive thread scheduling and migration |
CN105740164B (en) * | 2014-12-10 | 2020-03-17 | 阿里巴巴集团控股有限公司 | Multi-core processor supporting cache consistency, reading and writing method, device and equipment |
GB2533415B (en) | 2014-12-19 | 2022-01-19 | Advanced Risc Mach Ltd | Apparatus with at least one resource having thread mode and transaction mode, and method |
CN105868016B (en) * | 2015-01-20 | 2019-04-02 | 复旦大学 | A kind of thread transfer distribution method avoiding multi-core processor hot-spot |
US9772824B2 (en) * | 2015-03-25 | 2017-09-26 | International Business Machines Corporation | Program structure-based blocking |
US9940136B2 (en) * | 2015-06-26 | 2018-04-10 | Microsoft Technology Licensing, Llc | Reuse of decoded instructions |
US9946548B2 (en) | 2015-06-26 | 2018-04-17 | Microsoft Technology Licensing, Llc | Age-based management of instruction blocks in a processor instruction window |
US10409606B2 (en) | 2015-06-26 | 2019-09-10 | Microsoft Technology Licensing, Llc | Verifying branch targets |
US10175988B2 (en) | 2015-06-26 | 2019-01-08 | Microsoft Technology Licensing, Llc | Explicit instruction scheduler state information for a processor |
US11755484B2 (en) | 2015-06-26 | 2023-09-12 | Microsoft Technology Licensing, Llc | Instruction block allocation |
US10169044B2 (en) | 2015-06-26 | 2019-01-01 | Microsoft Technology Licensing, Llc | Processing an encoding format field to interpret header information regarding a group of instructions |
US10409599B2 (en) | 2015-06-26 | 2019-09-10 | Microsoft Technology Licensing, Llc | Decoding information about a group of instructions including a size of the group of instructions |
US10191747B2 (en) | 2015-06-26 | 2019-01-29 | Microsoft Technology Licensing, Llc | Locking operand values for groups of instructions executed atomically |
US9952867B2 (en) | 2015-06-26 | 2018-04-24 | Microsoft Technology Licensing, Llc | Mapping instruction blocks based on block size |
US10346168B2 (en) | 2015-06-26 | 2019-07-09 | Microsoft Technology Licensing, Llc | Decoupled processor instruction window and operand buffer |
US10095519B2 (en) | 2015-09-19 | 2018-10-09 | Microsoft Technology Licensing, Llc | Instruction block address register |
US20180088977A1 (en) * | 2016-09-28 | 2018-03-29 | Mark Gray | Techniques to determine and mitigate latency in virtual environments |
US11119923B2 (en) * | 2017-02-23 | 2021-09-14 | Advanced Micro Devices, Inc. | Locality-aware and sharing-aware cache coherence for collections of processors |
US11727997B2 (en) * | 2017-07-07 | 2023-08-15 | Micron Technology, Inc. | RPMB improvements to managed NAND |
US11277455B2 (en) | 2018-06-07 | 2022-03-15 | Mellanox Technologies, Ltd. | Streaming system |
US11068612B2 (en) | 2018-08-01 | 2021-07-20 | International Business Machines Corporation | Microarchitectural techniques to mitigate cache-based data security vulnerabilities |
US10884799B2 (en) * | 2019-01-18 | 2021-01-05 | EMC IP Holding Company LLC | Multi-core processor in storage system executing dynamic thread for increased core availability |
US11625393B2 (en) * | 2019-02-19 | 2023-04-11 | Mellanox Technologies, Ltd. | High performance computing system |
EP3699770A1 (en) | 2019-02-25 | 2020-08-26 | Mellanox Technologies TLV Ltd. | Collective communication system and methods |
US11750699B2 (en) | 2020-01-15 | 2023-09-05 | Mellanox Technologies, Ltd. | Small message aggregation |
US11252027B2 (en) | 2020-01-23 | 2022-02-15 | Mellanox Technologies, Ltd. | Network element supporting flexible data reduction operations |
US11876885B2 (en) | 2020-07-02 | 2024-01-16 | Mellanox Technologies, Ltd. | Clock queue with arming and/or self-arming features |
CN112307067B (en) * | 2020-11-06 | 2024-04-19 | 支付宝(杭州)信息技术有限公司 | Data processing method and device |
CN112486703B (en) * | 2020-11-27 | 2024-02-06 | 中船重工(武汉)凌久电子有限责任公司 | Global data memory management method based on multi-core multi-processor parallel system |
US11749333B2 (en) * | 2020-12-10 | 2023-09-05 | SK Hynix Inc. | Memory system |
US11556378B2 (en) | 2020-12-14 | 2023-01-17 | Mellanox Technologies, Ltd. | Offloading execution of a multi-task parameter-dependent operation to a network device |
CN114035847B (en) * | 2021-11-08 | 2023-08-29 | 海飞科(南京)信息技术有限公司 | Method and apparatus for parallel execution of kernel programs |
CN114741351B (en) * | 2022-06-10 | 2022-10-21 | 深圳市航顺芯片技术研发有限公司 | Multi-core chip and computer equipment |
US11922237B1 (en) | 2022-09-12 | 2024-03-05 | Mellanox Technologies, Ltd. | Single-step collective operations |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050223200A1 (en) * | 2004-03-30 | 2005-10-06 | Marc Tremblay | Storing results of resolvable branches during speculative execution to predict branches during non-speculative execution |
WO2006071969A1 (en) | 2004-12-29 | 2006-07-06 | Intel Corporation | Transaction based shared data operations in a multiprocessor environment |
US20070192540A1 (en) * | 2006-02-10 | 2007-08-16 | International Business Machines Corporation | Architectural support for thread level speculative execution |
US20070271445A1 (en) * | 2003-02-13 | 2007-11-22 | Sun Microsystems, Inc. | Selectively monitoring stores to support transactional program execution |
US20080282064A1 (en) * | 2007-05-07 | 2008-11-13 | Michael Norman Day | System and Method for Speculative Thread Assist in a Heterogeneous Processing Environment |
WO2010020828A1 (en) | 2008-08-18 | 2010-02-25 | Telefonaktiebolaget L M Ericsson (Publ) | Data sharing in chip multi-processor systems |
-
2012
- 2012-04-05 US US14/110,140 patent/US9619301B2/en not_active Expired - Fee Related
- 2012-04-05 WO PCT/EP2012/056282 patent/WO2012136766A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070271445A1 (en) * | 2003-02-13 | 2007-11-22 | Sun Microsystems, Inc. | Selectively monitoring stores to support transactional program execution |
US20050223200A1 (en) * | 2004-03-30 | 2005-10-06 | Marc Tremblay | Storing results of resolvable branches during speculative execution to predict branches during non-speculative execution |
WO2006071969A1 (en) | 2004-12-29 | 2006-07-06 | Intel Corporation | Transaction based shared data operations in a multiprocessor environment |
US20070192540A1 (en) * | 2006-02-10 | 2007-08-16 | International Business Machines Corporation | Architectural support for thread level speculative execution |
US20080282064A1 (en) * | 2007-05-07 | 2008-11-13 | Michael Norman Day | System and Method for Speculative Thread Assist in a Heterogeneous Processing Environment |
WO2010020828A1 (en) | 2008-08-18 | 2010-02-25 | Telefonaktiebolaget L M Ericsson (Publ) | Data sharing in chip multi-processor systems |
Non-Patent Citations (20)
Title |
---|
A. Vajda, "Handling of Shared Memory in Many-core systems without Locks and Transactional Memory." In 3rd Workshop on Programmability Issues for Multi-core Computers (MULTIPROG), with HiPEAC 2010. pp. 1-12, 2010. |
András Vajda, "The case for coherence-less distributed cache architecture." Proceedings of the 4th Workshop on Chip Multi-processor Memory Systems and Interconnects. pp. 1-3, 2010. |
Andras Vajda, Per Stenstrom, Semantic information based speculative parallel execution, Jun. 22, 2010, HAL archives-ouvertes. *
Andras Vajda, Per Stenstrom. "Semantic information based speculative parallel execution." Wei Liu and Scott Mahlke and Tin-fook Ngai. Pespma 2010-Workshop on Parallel Execution of Sequential Programs on Multi-core Architecture, pp. 1-13, Jun. 2010, Saint Malo, France. |
Andras Vajda, Per Stenstrom. "Semantic information based speculative parallel execution." Wei Liu and Scott Mahlke and Tin-fook Ngai. Pespma 2010—Workshop on Parallel Execution of Sequential Programs on Multi-core Architecture, pp. 1-13, Jun. 2010, Saint Malo, France. |
Annie Foong, Gary McAlpine, Dave Minturn, Greg Regnier, Vikram Saletore, "An Architecture for Software-Based iSCSI on Multiprocessor Servers," Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), 2005, pp. 1-7, 213b, doi:10.1109/IPDPS.2005.89. |
Anoop Gupta, Wolf-Dietrich Weber, and Todd Mowry. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes." In ICPP (1), pp. 312-321. 1990. |
David A. Huffman, "A method for the construction of minimum redundancy codes." Proceedings of the IRE 40.9 (1952): pp. 1098-1101. |
David Chaiken, Craig Fields, Kiyoshi Kurihara, and Anant Agarwal. "Directory-based cache coherence in large-scale multiprocessors." Computer 23, No. 6 (1990): pp. 49-58. |
Hakan Nilsson and Per Stenström. "The scalable tree protocol - a cache coherence approach for large-scale multiprocessors." In Parallel and Distributed Processing, 1992. Proceedings of the Fourth IEEE Symposium on, pp. 498-506. IEEE, 1992.
International Preliminary Report on Patentability, Application No. PCT/EP2012/056282, dated Oct. 17, 2013, 6 pages. |
International Search Report and Written Opinion, Application No. PCT/EP2012/056282, dated Jul. 5, 2012, 8 pages. |
James R. Goodman, "Using cache memory to reduce processor-memory traffic." In ACM SIGARCH Computer Architecture News, vol. 11, No. 3, pp. 124-131. ACM, 1983. |
M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. "Accelerating critical section execution with asymmetric multi-core architectures." In ACM SIGARCH Computer Architecture News, vol. 37, No. 1, pp. 253-264. ACM, 2009. |
Mark S. Papamarcos and Janak H. Patel. "A low-overhead coherence solution for multiprocessors with private cache memories." In ACM SIGARCH Computer Architecture News, vol. 12, No. 3, pp. 348-354. ACM, 1984. |
Martinez, Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications, Oct. 5, 2002, Association for Computing Machinery. * |
Paul Sweazey and Alan Jay Smith. "A class of compatible cache consistency protocols and their support by the IEEE futurebus." In ACM SIGARCH Computer Architecture News, vol. 14, No. 2, pp. 414-423. IEEE Computer Society Press, 1986. |
Randy H. Katz, Susan J. Eggers, David A. Wood, C. L. Perkins, and Robert G. Sheldon. "Implementing a cache consistency protocol." vol. 13, No. 3. pp. 1-31, ACM, 1985. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160117193A1 (en) * | 2014-10-22 | 2016-04-28 | International Business Machines Corporation | Resource mapping in multi-threaded central processor units |
US9898348B2 (en) * | 2014-10-22 | 2018-02-20 | International Business Machines Corporation | Resource mapping in multi-threaded central processor units |
Also Published As
Publication number | Publication date |
---|---|
US20140033217A1 (en) | 2014-01-30 |
WO2012136766A1 (en) | 2012-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9619301B2 (en) | Multi-core memory model and speculative mode processor management | |
US8438341B2 (en) | Common memory programming | |
RU2501071C2 (en) | Late lock acquire mechanism for hardware lock elision (hle) | |
US8661449B2 (en) | Transactional computation on clusters | |
Scott et al. | Shared-memory synchronization | |
JP5592015B2 (en) | Apparatus, method and system for dynamically optimizing code utilizing adjustable transaction size based on hardware limitations | |
US7584332B2 (en) | Computer systems with lightweight multi-threaded architectures | |
KR101355496B1 (en) | Scheduling mechanism of a hierarchical processor including multiple parallel clusters | |
Harris et al. | Transactional memory: An overview | |
KR101496063B1 (en) | Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region | |
KR102008733B1 (en) | A load store buffer agnostic to threads implementing forwarding from different threads based on store seniority | |
KR101774993B1 (en) | A virtual load store queue having a dynamic dispatch window with a distributed structure | |
KR101804027B1 (en) | A semaphore method and system with out of order loads in a memory consistency model that constitutes loads reading from memory in order | |
KR101993562B1 (en) | An instruction definition to implement load store reordering and optimization | |
Blundell et al. | Unrestricted transactional memory: Supporting I/O and system calls within transactions | |
US8707016B2 (en) | Thread partitioning in a multi-core environment | |
Malhotra et al. | ParTejas: A parallel simulator for multicore processors | |
Liu et al. | No barrier in the road: a comprehensive study and optimization of ARM barriers | |
Ohmacht et al. | IBM Blue Gene/Q memory subsystem with speculative execution and transactional memory | |
Yiapanis et al. | Compiler-driven software speculation for thread-level parallelism | |
Qian et al. | BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment | |
Villegas et al. | Hardware support for scratchpad memory transactions on GPU architectures | |
Vajda et al. | Coherence-less Memory Model for Shared Memory, Speculative Multi-core Processors | |
Shahid et al. | Hardware transactional memories: A survey | |
Xiang et al. | MSpec: A design pattern for concurrent data structures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL), SWEDEN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:STENSTROEM, PER;REEL/FRAME:036467/0599
Effective date: 20120223
Owner name: OY L M ERICSSON AB, FINLAND
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VAJDA, ANDRAS;REEL/FRAME:036468/0320
Effective date: 20120315
Owner name: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL), SWEDEN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OY L M ERICSSON AB;REEL/FRAME:036469/0660
Effective date: 20120319 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210411 |