CN111788562A - Atomic operation execution method and device - Google Patents

Publication number
CN111788562A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201880090540.XA
Other languages
Chinese (zh)
Other versions
CN111788562B (en
Inventor
夏晶
信恒超
涂珍喜
曾红义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806: Multiuser, multiprocessor or multiprocessing cache systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides an atomic operation execution method and device, and relates to the field of computers. In the method, when the state of the data corresponding to an atomic operation in a target storage unit is the modified state or the exclusive state, the atomic operation is executed in the target storage unit. When the state of the data in the target storage unit is the invalid state or the shared state, a second storage unit detects whether the data is shared by a plurality of first storage units: if it is, the atomic operation is executed in the second storage unit; if it is not, the atomic operation can be executed in the target storage unit or in an upper-level storage unit of the target storage unit. The method determines the node that executes the atomic operation according to how the corresponding data is shared, which avoids conflicts during execution and preserves the reliability and performance of the computer device.

Description

Atomic operation execution method and device
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for executing an atomic operation.
Background
An atomic operation is an operation or series of operations that cannot be interrupted by a thread scheduling mechanism. Currently, many Instruction Set Architectures (ISAs) provide instructions for atomic operations, and many current applications rely on such fine-grained atomic operations.
In the related art, atomic operations are generally performed in the level-one cache (L1 Cache) of a computer device. A cache (cache memory) is temporary storage located between a Central Processing Unit (CPU) and the memory; it has a smaller capacity than the memory but a faster access speed. Generally, the cache is divided into a first-level cache and a second-level cache, and some CPUs also have a third-level cache. When a CPU needs to read a piece of data, it first searches the first-level cache; if the data is not found there, it searches the second-level cache; and if the data is still not found, it searches the third-level cache or the memory.
For a computer device adopting a multi-core CPU, when multiple CPU cores attempt to execute an atomic operation at the same address at the same time, the L1 Cache of each CPU core contends for the exclusive right of data corresponding to the atomic operation, and invalidates data in the L1 caches of other CPU cores, resulting in an atomic operation execution conflict and performance degradation of the computer device.
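The cost of this contention can be illustrated with a toy count of ownership transfers. The sketch below is only an illustration under simplifying assumptions: it models a single address, treats every change of owning core as one invalidation, and uses the hypothetical name `count_invalidations`, which does not appear in the application.

```python
def count_invalidations(requests):
    """Count how often exclusive ownership of one line moves between L1 caches.

    `requests` is the sequence of core ids issuing an atomic operation on the
    same address. Whenever a core other than the current owner issues an
    operation, the previous owner's copy has to be invalidated.
    """
    owner = None
    invalidations = 0
    for core in requests:
        if owner is not None and owner != core:
            invalidations += 1  # the previous owner's copy is invalidated
        owner = core
    return invalidations


# Four cores taking turns on one address: nearly every operation moves the line.
interleaved = [0, 1, 2, 3] * 4
# One core issuing every operation: the line never moves.
single = [0] * 16
print(count_invalidations(interleaved), count_invalidations(single))
```

In this model the interleaved pattern triggers an invalidation on every request after the first, while the single-core pattern triggers none.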
Disclosure of Invention
The application provides an atomic operation execution method and an atomic operation execution device, which can solve the problem that performance of computer equipment is possibly reduced when multi-core computer equipment in the related art executes atomic operation.
In one aspect, a method for executing an atomic operation is provided. The method may be applied to a memory in a multi-core computer device. The memory includes multiple levels of storage units, among which there are two adjacent levels: the first level includes a plurality of first storage units, and the second level includes at least one second storage unit. Each first storage unit is a cache unit, each second storage unit is a cache unit or a main memory, and each second storage unit is shared by a plurality of processor cores. The method may include the following steps:
when a target storage unit among the first storage units receives an atomic operation, if the state of the data corresponding to the atomic operation in the target storage unit is the modified state or the exclusive state, executing the atomic operation in the target storage unit; if the state of the data in the target storage unit is the invalid state or the shared state, detecting, through the second storage unit, whether the data is shared by a plurality of first storage units; if the data is shared by a plurality of first storage units, executing the atomic operation in the second storage unit; if the data is not shared by a plurality of first storage units, executing the atomic operation in the target storage unit or in an upper-level storage unit of the target storage unit.
According to the method, the first storage unit and the second storage unit in the memory can determine the execution node of the atomic operation according to the state of the data corresponding to the atomic operation in each level of storage units, so that execution conflict can be avoided, and the performance of computer equipment is guaranteed.
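The selection rule described above can be written as a small decision function. This is a minimal sketch rather than the claimed implementation: the function name, the string labels for the nodes, and the single-letter state codes are assumptions made for illustration.

```python
def choose_execution_node(state_in_target, shared_by_multiple_first_units):
    """Return a label for the node that executes the atomic operation.

    state_in_target: "M", "E", "S" or "I", the state of the data in the
        target storage unit.
    shared_by_multiple_first_units: result of the check made by the second
        storage unit; only consulted for the "S" and "I" states.
    """
    if state_in_target in ("M", "E"):
        # The target storage unit already holds the data exclusively.
        return "target"
    if state_in_target in ("S", "I"):
        if shared_by_multiple_first_units:
            return "second"
        return "target or upper-level"
    raise ValueError("unknown state: %r" % (state_in_target,))
```

For example, `choose_execution_node("S", True)` selects the second storage unit, while `choose_execution_node("E", True)` stays in the target storage unit because the sharing check is never consulted for the M and E states.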
Optionally, the memory may further include an upper-level storage unit located at the level above the first-level storage unit, and the method may further include: when any storage unit in the upper-level storage unit receives the atomic operation, if the state of the data in that storage unit is the modified state or the exclusive state, executing the atomic operation in that storage unit; and if the state of the data in that storage unit is the invalid state or the shared state, sending the atomic operation from that storage unit to the next-level storage unit.
When the state of the data in a storage unit of the upper-level storage unit is the modified state or the exclusive state, it may be determined that the data is held exclusively by that storage unit, and thus the atomic operation may be executed directly in it.
Optionally, if the state of the data in the target storage unit is the invalid state or the shared state, the process of detecting, through the second storage unit, whether the data is shared by the plurality of first storage units may include:
if the state of the data in the target storage unit is the invalid state or the shared state, detecting, through the target storage unit, whether the address of the data is recorded in a conflict list, where the data indicated by an address recorded in the conflict list may be data shared by a plurality of first storage units;
if the address is not recorded in the conflict list, it may be determined that the atomic operation may not generate a conflict when executed, and in order to ensure the exclusivity of data, a probe request carrying the address of the data may be sent to the second storage unit through the target storage unit, and whether the data indicated by the address is data shared by a plurality of first storage units is detected by the second storage unit, so as to further verify the sharing condition of the data.
If the address is recorded in the conflict list, it may be determined that a conflict may occur when the atomic operation is executed, and therefore, the target storage unit may directly forward the atomic operation to the second storage unit, and detect whether the data corresponding to the atomic operation is data shared by the plurality of first storage units through the second storage unit, that is, the second storage unit further determines an execution node of the atomic operation.
Optionally, if the data indicated by the address is data shared by a plurality of first storage units, the method may further include:
sending a failure response message to the target storage unit through the second storage unit, wherein the failure response message is used for indicating that the state of the data in the target storage unit is an invalid state; and recording the address of the data in the conflict list through the target storage unit according to the failure response message.
After the target storage unit records the address of the data in the conflict list, when the atomic operation is received again, the atomic operation may be directly forwarded to the second storage unit, so that the second storage unit determines the execution node of the atomic operation.
Optionally, if the data indicated by the address is not data shared by the plurality of first memory cells, the process of performing the atomic operation in the target memory cell or the upper level memory cell of the target memory cell may include:
if the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is a shared state, sending a success response message to the target storage unit through the second storage unit and executing the atomic operation in the target storage unit, wherein the success response message is used for indicating that the state of the data is a modified state or an exclusive state;
if the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is an invalid state, sending the success response message and the data to the target storage unit through the second storage unit and executing the atomic operation in the target storage unit;
if the address is the address received for the first time and the state of the data in the target storage unit is an invalid state, the data corresponding to the atomic operation is sent to the target storage unit or a higher-level storage unit of the target storage unit through the second storage unit, and the atomic operation is executed in the storage unit receiving the data.
If the data indicated by the address is not data shared by the plurality of first storage units, the second storage unit may determine that the data is used only within the target storage unit, and thus the atomic operation may be performed by the target storage unit.
Optionally, if the data corresponding to the atomic operation is not data shared by the plurality of first storage units, the process of executing the atomic operation in the target storage unit or the upper level storage unit of the target storage unit may include:
if the data corresponding to the atomic operation is shared in the target storage unit, sending the data corresponding to the atomic operation to the target storage unit, and executing the atomic operation in the target storage unit;
and if the data corresponding to the atomic operation is held exclusively by the upper-level storage unit of the target storage unit, sending the data corresponding to the atomic operation to the upper-level storage unit, and executing the atomic operation in the upper-level storage unit.
When the data corresponding to the atomic operation is not the data shared by the plurality of first storage units, the second storage unit may accurately determine the node for executing the atomic operation according to the shared node of the data.
Optionally, after the target storage unit executes the atomic operation, the method may further include: the address of the data is deleted from the conflict list through the target storage unit, and when the atomic operation is received again subsequently, the target storage unit can determine the execution node of the atomic operation by sending a probe request.
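The life cycle of a conflict-list entry (recorded on a failure response, consulted on each later atomic operation, deleted after local execution) might be modelled as follows. This is a sketch under assumptions: the class and method names are hypothetical, and a Python set stands in for whatever structure the hardware would use.

```python
class ConflictList:
    """Toy model of the conflict list kept by one first storage unit."""

    def __init__(self):
        self._addresses = set()

    def on_failure_response(self, address):
        # The second storage unit reported that the data is shared by
        # several first storage units: remember the address.
        self._addresses.add(address)

    def on_local_execution(self, address):
        # The atomic operation was executed in this unit: forget the address.
        self._addresses.discard(address)

    def action_for(self, address):
        # Hit: forward the atomic operation. Miss: send a probe request.
        if address in self._addresses:
            return "forward to second storage unit"
        return "send probe request"
```

Using `discard` rather than `remove` keeps the deletion harmless when the address was never recorded.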
Optionally, the process of detecting, by the second storage unit, whether the data is data shared by the plurality of first storage units may include:
detecting whether a processor core initiating the atomic operation in a preset time period shares a first storage unit or not through the second storage unit; when the number of the first storage units shared by the processor cores initiating the atomic operation within the preset time period is multiple, determining that the data is shared by the multiple first storage units; when the processor cores initiating the atomic operation in the preset time period share the same first storage unit, it may be determined that the data is not data shared by the plurality of first storage units.
According to the method, the first storage unit and the second storage unit in the memory can determine the execution node of the atomic operation according to the state of the data corresponding to the atomic operation in each level of storage units, so that execution conflict can be avoided, and the performance of computer equipment is guaranteed.
In another aspect, a memory is provided, which is applied in a multi-core computer device, and includes: the multi-level memory unit comprises two adjacent levels of memory units, wherein in the two adjacent levels of memory units, a first level of memory unit comprises a plurality of first memory units, a second level of memory unit comprises at least one second memory unit, and each second memory unit is shared by a plurality of processor cores; various memory locations in the memory may be used to implement the execution of atomic operations provided by the above aspects.
In yet another aspect, a computer-readable storage medium is provided. The storage medium has instructions stored therein which, when run on a computer, cause the computer to perform the method of executing an atomic operation provided by the above aspect.
In a further aspect, a chip is provided, the chip comprising programmable logic circuits and/or program instructions, when the chip is run, for implementing the method of performing atomic operations provided by the above aspects.
In yet another aspect, a computer device is provided that may include a multi-core processor, and a memory provided by the above aspects.
In yet another aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method of performing atomic operations as provided by the above aspects.
In summary, the present application provides an atomic operation execution method and apparatus. If the state of the data corresponding to an atomic operation in a target storage unit is the modified state or the exclusive state, the atomic operation may be executed in the target storage unit. If the state of the data in the target storage unit is the invalid state or the shared state, a second storage unit may detect whether the data is shared by a plurality of first storage units: if it is, the atomic operation may be executed in the second storage unit; if it is not, the atomic operation may be executed in the target storage unit or in an upper-level storage unit of the target storage unit. The scheme provided by the application determines the node that executes the atomic operation according to how the corresponding data is shared, avoids the performance degradation caused by conflicts during execution, and ensures reliability when the atomic operation is executed.
Drawings
FIG. 1 is a schematic structural diagram of a memory in a multi-core computer device according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of another memory according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for performing an atomic operation according to an embodiment of the present invention;
FIG. 4 is a partial structural diagram of a memory according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another memory according to an embodiment of the present invention.
Detailed Description
Fig. 1 is a schematic structural diagram of a memory in a multi-core computer device according to an embodiment of the present invention. The CPU in the multi-core computer device includes multiple processor cores, that is, the CPU is a multi-core CPU. The memory may include multiple levels of storage units, each level may include one or more storage units, and each storage unit may be a cache unit or a main memory.
Among the multiple levels of storage units there are two adjacent levels: the first-level storage unit 10 may include a plurality of first storage units 100, and the second-level storage unit 20 includes at least one second storage unit 200. Each first storage unit 100 may be a cache unit, which may be an exclusive cache used by a single processor core or a shared cache shared by a plurality of processor cores. Each second storage unit 200 may be a cache unit or a main memory, and each second storage unit 200 is shared by a plurality of processor cores. Moreover, each second storage unit 200 corresponds to at least two first storage units 100, that is, at least two first storage units 100 in the first-level storage unit 10 converge on the same second storage unit 200.
Fig. 2 is a schematic structural diagram of another memory according to an embodiment of the present invention. For example, as shown in fig. 2, the memory may include four levels of memory cells, where the memory cells included in the first three levels of memory cells of the four levels of memory cells are all cache cells, the last level of memory cell includes only one memory cell, and the memory cell is a memory. The first three-level cache unit may include a two-level exclusive cache and a one-level shared cache. The first level exclusive Cache comprises a plurality of first level caches (L1 caches), the second level exclusive Cache comprises a plurality of second level caches (L2 caches), and the third level shared Cache can comprise a plurality of third level caches (L3 caches). The Memory included in the last stage of Memory unit may be a Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM).
Each exclusive cache (L1 or L2) is used by a single processor core, and each shared cache L3 is shared by a plurality of processor cores; that is, the exclusive caches of a plurality of processor cores may converge on the same shared cache L3. The plurality of shared caches L3 converge on the memory, so the memory may be shared by all processor cores of the computer device.
For the memory shown in fig. 2, the first-level memory unit 10 in two adjacent levels of memory units may be the third-level shared cache, and the second-level memory unit may be the memory 20. Or, the first-level storage unit in the two adjacent levels of storage units may be the second-level exclusive cache, and the second-level storage unit may be the third-level shared cache.
Optionally, the storage may also only include multiple levels of exclusive caches, and the multiple exclusive caches may be directly aggregated to the memory, that is, the storage may also not include a shared cache. Accordingly, the first-level memory unit 10 in two adjacent levels of memory units may be the second-level exclusive cache, and the second-level memory unit 20 may be the memory.
Referring to fig. 2, it can further be seen that, in the memory provided by the embodiment of the present invention, each storage unit (i.e. each cache unit or the memory) is provided with an Arithmetic Logic Unit (ALU). Each storage unit may perform an atomic operation through its ALU.
Fig. 3 is a flowchart of an execution method of an atomic operation according to an embodiment of the present invention, which may be applied to the memory shown in fig. 1 or fig. 2. Referring to fig. 3, the method may include:
step 201, when any cache unit receives an atomic operation, detecting the state of data corresponding to the atomic operation in the cache unit of the current level through the cache unit.
In the embodiment of the present invention, when any cache unit receives an atomic operation initiated by a processor core, it may first detect the state of the data corresponding to the atomic operation in the current-level cache unit. The state may be one of the four states defined in the cache coherency protocol: the Modified (M) state, the Exclusive (E) state, the Shared (S) state, and the Invalid (I) state.
The M state indicates that the data in the cache unit has been modified and is inconsistent with the data in the memory, and that the copy in the current cache unit is the authoritative one. The E state indicates that the data in the cache unit is consistent with the data in the memory and exists only in the current-level cache unit; no other cache unit holds the data, that is, the data is used exclusively by the processor core corresponding to the current-level cache unit. The S state indicates that the data in the cache unit is consistent with the data in the memory and exists in a plurality of cache units, that is, the data is shared by a plurality of processor cores. The I state indicates that the data in the cache unit is invalid and cannot be used.
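The four states and the two properties the method relies on (exclusive ownership and consistency with memory) can be captured in a short enumeration. The names `LineState`, `held_exclusively` and `consistent_with_memory` are illustrative assumptions, not terms from the application.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"   # differs from memory; this copy is the authoritative one
    EXCLUSIVE = "E"  # matches memory; no other cache unit holds the data
    SHARED = "S"     # matches memory; other cache units may hold the data
    INVALID = "I"    # the copy cannot be used


def held_exclusively(state):
    """True when the atomic operation can run in the current cache unit."""
    return state in (LineState.MODIFIED, LineState.EXCLUSIVE)


def consistent_with_memory(state):
    """True when the cached copy equals the copy in memory."""
    return state in (LineState.EXCLUSIVE, LineState.SHARED)
```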
Optionally, the minimum unit of each cache unit when exchanging data with the memory is a cache line (cache line), and the size of each cache line is generally 64 bytes or 128 bytes. Each cache line includes, in addition to data, the state of the data and the address of the data in memory. In the embodiment of the present invention, after receiving an atomic operation, any cache unit may obtain an address of data corresponding to the atomic operation, and further may determine, according to the address, a cache line in which the data is located, and obtain a state of the data.
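The lookup from an address to the state of its cache line could look like the sketch below. It assumes 64-byte lines and a dictionary keyed by the line's base address; real caches use sets and tags, and the names `line_base` and `CacheUnit` are hypothetical.

```python
LINE_SIZE = 64  # bytes; the text mentions 64 or 128 bytes as typical sizes


def line_base(address, line_size=LINE_SIZE):
    """Base address of the cache line containing `address` (power-of-two size)."""
    return address & ~(line_size - 1)


class CacheUnit:
    """Toy cache unit: maps a line base address to (state, data)."""

    def __init__(self):
        self.lines = {}

    def fill(self, address, state, data=None):
        if data is None:
            data = bytearray(LINE_SIZE)
        self.lines[line_base(address)] = (state, data)

    def state_of(self, address):
        # A line that is not present is treated as Invalid ("I").
        entry = self.lines.get(line_base(address))
        return entry[0] if entry is not None else "I"
```

Any address inside the same 64-byte line resolves to the same entry, so an atomic operation on one word of the line observes the state of the whole line.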
When the cache unit detects that the state of the data in the current-level cache unit is the M state or the E state, step 202 may be performed. When the cache unit detects that the state is the I state or the S state, step 203 may be executed if the cache unit is an upper-level storage unit of the first storage unit, and step 204 may be executed if the cache unit is a first storage unit.
In the embodiment of the present invention, among the multiple levels of storage units in the memory, if the next level below a certain level consists of multi-core shared storage units (for example, shared caches or the memory), a conflict list may be configured in each storage unit of that level. The conflict list holds previously recorded addresses of data shared by multiple storage units, that is, addresses of data whose atomic operations may generate a conflict. The level of storage units configured with the conflict list is the first-level storage unit 10. As can be seen from fig. 2, the memory may further include an upper-level storage unit 30 located at the level above the first-level storage unit 10. The storage units included in the upper-level storage unit 30 are generally exclusive caches, and an exclusive cache does not store a conflict list. For example, as shown in fig. 2, the upper-level storage unit 30 of the first-level storage unit 10 may include two levels of exclusive caches.
Therefore, when any cache unit detects that the state of the data in the cache unit of the current level is the I state or the S state, it may continue to detect whether a conflict list is stored in the cache unit of the current level, and if the conflict list is not stored, step 203 may be executed; if the conflict list is stored, step 204 may be performed.
Step 202, the atomic operation is executed in any of the cache units.
When the cache unit detects that the state of the data in the current-level cache unit is the M state or the E state, it may be determined that the data is not shared with other cache units, and thus the atomic operation may be performed directly in the current-level cache unit.
For example, assume that the CPU of the computer device is a 4-core CPU, i.e., the CPU includes 4 processor cores. If the level one cache L1 of the processor core 1 receives the atomic operation and detects that the state of the data corresponding to the atomic operation in the level one cache L1 is the E state, the level one cache L1 of the processor core 1 can directly execute the atomic operation.
And step 203, sending the atomic operation to a next-level storage unit through any cache unit which receives the atomic operation.
When any cache unit detects that the state of the data is I state or S state, the data may be shared by multiple processor cores according to the definition in the cache coherency protocol. In order to ensure the exclusivity of data, if any cache unit receiving the atomic operation is a higher-level memory unit of the first memory unit (i.e., no conflict list is stored in any cache unit), any cache unit receiving the atomic operation may send the atomic operation to a next-level memory unit. The next level memory unit may continue the method shown in step 201.
For example, take the memory shown in fig. 2, where the first-level storage unit 10 is the third-level shared cache including a plurality of shared caches L3 and the second-level storage unit 20 is the memory. If the first-level cache L1 of processor core 1 detects that the state of the data corresponding to the atomic operation in the first-level cache L1 is the I state, the first-level cache L1 may send the atomic operation to the second-level cache L2 of processor core 1. If the second-level cache L2 of processor core 1 receives the atomic operation and detects that the state of the data in the second-level cache L2 is the I state, the second-level cache L2 may send the atomic operation to the next-level storage unit, i.e., the shared cache L3 corresponding to processor core 1.
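Steps 201 to 203 amount to walking the hierarchy outward from the core until a unit either owns the data or holds a conflict list. The sketch below models that walk under assumptions: each level is described by a hypothetical `(name, state, has_conflict_list)` tuple, and the function name `route` is illustrative.

```python
def route(levels):
    """Walk the storage units from the one nearest the core outward.

    levels: list of (name, state, has_conflict_list) tuples, where state is
        "M", "E", "S" or "I".
    Returns (path, outcome): the units visited and what happens at the last one.
    """
    path = []
    for name, state, has_conflict_list in levels:
        path.append(name)
        if state in ("M", "E"):
            return path, "execute in " + name                # step 202
        if has_conflict_list:
            return path, "check conflict list in " + name    # step 204
        # S or I without a conflict list: forward to the next level (step 203).
    return path, "no further level"


path, outcome = route([("L1", "I", False), ("L2", "I", False), ("L3", "I", True)])
print(path, outcome)
```

With the states used in the example above (I in both L1 and L2), the operation reaches the shared cache L3, which then consults its conflict list.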
And step 204, detecting whether the address of the data is recorded in the conflict list through the target storage unit receiving the atomic operation.
If any cache unit receiving the atomic operation is the first storage unit in the memory, when the target storage unit receiving the atomic operation detects that the state of the data in the target storage unit is the I state or the S state, it may continue to detect whether the address of the data is recorded in the conflict list. In the embodiment of the present invention, as described above, each first storage unit in the memory stores a conflict list, where the conflict list records in advance addresses of data shared by multiple storage units, that is, addresses of data corresponding to atomic operations that may generate a conflict.
If the target memory unit detects that the address is not recorded in the conflict list, i.e. address Miss (Miss), step 205 may be executed; if the target memory unit detects that the address is recorded in the conflict list, i.e. an address Hit (Hit), step 211 can be executed.
For example, fig. 4 is a schematic diagram of a partial structure of a memory according to an embodiment of the present invention. Referring to fig. 4, each of the first storage units (e.g., the shared cache L3 shown in fig. 4) may be provided with a conflict monitor (Conflict Monitor) for maintaining the conflict list. The conflict list maintained by the conflict monitor may be as shown in Table 1. Referring to Table 1, four addresses are recorded in the conflict list: 0x00000001, 0x00000080, 0x00000175 and 0x00000260. After the shared cache L3 of processor core 1 receives the atomic operation sent by the exclusive cache L2, if it detects that the state of the data corresponding to the atomic operation in the current-level storage unit is the S state and the address of the data is 0x00000001, step 211 may be executed because the address is recorded in the conflict list shown in Table 1. If the address of the data corresponding to the atomic operation is 0x00000350, step 205 may be executed because the address is not recorded in the conflict list shown in Table 1.
TABLE 1
No.  Address
1    0x00000001
2    0x00000080
3    0x00000175
4    0x00000260
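The hit and miss cases worked through above can be reproduced with the four addresses of Table 1. The sketch is illustrative only; the constant and function names are assumptions.

```python
# Addresses recorded in Table 1.
CONFLICT_LIST = (0x00000001, 0x00000080, 0x00000175, 0x00000260)


def step_after_lookup(address, conflict_list=CONFLICT_LIST):
    """Return the next step number: 211 on a hit, 205 on a miss."""
    return 211 if address in conflict_list else 205


print(step_after_lookup(0x00000001), step_after_lookup(0x00000350))
```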
Optionally, in the embodiment of the present invention, when the target storage unit detects an address miss, it may further detect whether the data is stored in the target storage unit. If the data is stored, the atomic operation can be executed directly in the target storage unit; if the data is not stored, step 205 may be performed.
Step 205, sending a probe request carrying the address of the data to the second storage unit through the target storage unit.
If the target storage unit detects that the address is not recorded in the conflict list, a probe request carrying the address of the data may be sent to the second storage unit through the target storage unit in order to further determine the node location where the atomic operation is performed.
For example, a dedicated probe request CleanUnique_Try may be predefined in the memory, and when the shared cache L3 of processor core 1 detects that the address of the data corresponding to the atomic operation is not recorded in the conflict list maintained by its conflict monitor, it may send the probe request CleanUnique_Try to the memory.
Step 206, detecting whether the data indicated by the address carried in the probe request is data shared by the plurality of first memory units through the second memory unit.
In this embodiment of the present invention, a directory (Directory) for maintaining cache coherence among the first storage units may be provided in the second storage unit (for example, the memory shown in fig. 4), and the state of the data indicated by each address in each first storage unit may be recorded in the directory. Referring to fig. 4, a sharing monitor for recording how the data indicated by each address is shared may further be provided in the second storage unit. The sharing monitor may record the addresses of the data corresponding to the atomic operations processed by the second storage unit in the most recent preset time period, the identification (ID) of the processor core that last initiated each atomic operation, and the pattern of the identifications of the processor cores initiating each atomic operation (for example, the number of times the processor core indicated by each identification initiated the same atomic operation, and the range within which the identifications of the initiating processor cores fall).
After receiving the probe request, the second storage unit may check whether the data indicated by the address carried in the probe request is data shared by the plurality of first storage units by querying the directory and/or the shared monitor. If the data is shared by a plurality of first storage units, step 207 and step 208 may be executed; if the data is not shared by the plurality of first storage units, step 210 can be performed.
As an alternative implementation manner, the second storage unit may query the shared monitor to detect whether the processor cores that initiated the atomic operation within the preset time period share one first storage unit. When it is detected that the processor cores that initiated the atomic operation within the preset time period belong to a plurality of first storage units, it may be predicted that the data corresponding to the atomic operation has been shared by the plurality of first storage units within the most recent time period, and step 207 and step 208 may be performed. When it is detected that the processor cores that initiated the atomic operation within the preset time period share the same first storage unit, it may be predicted that the data corresponding to the atomic operation has not been shared by the plurality of first storage units within the most recent time period and is shared only inside one first storage unit, so step 210 may be performed.
The preset time period may be a preset fixed time period, or may be a time period determined according to the initiation times of the most recent several atomic operations. That is, the second storage unit can detect whether the processor cores that initiated the most recent several atomic operations share one first storage unit.
As another alternative implementation, the second storage unit may detect the state of the data indicated by the address in each first storage unit by querying the directory. If the states of the data in the first storage units are all the S state, the data can be determined to be data shared by the plurality of first storage units. If the state of the data in any one first storage unit is the M state or the E state, or the states of the data in the first storage units are all the I state, the second storage unit may determine that the data is not shared by the plurality of first storage units.
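The directory-based check described above can be sketched as follows. This is only an illustrative sketch: the function name `is_shared_by_directory` and the dictionary layout are hypothetical, and the text does not say how a mixture of S and I states is treated, so the sketch assumes such a mixture counts as not shared.

```python
from typing import Dict


def is_shared_by_directory(states: Dict[str, str]) -> bool:
    """Decide sharing from directory states (hypothetical helper).

    `states` maps an identifier of each first storage unit to the
    state ('M', 'E', 'S' or 'I') of the addressed data in that unit.
    """
    if not states:
        # Assumption: an address with no directory entry is not shared.
        return False
    # Shared only when every first storage unit holds the data in S.
    # Any M or E state, or all I states, therefore yields False.
    # Assumption: a mixture of S and I is also treated as not shared.
    return all(state == "S" for state in states.values())
```

A real directory would typically be indexed by address and held in hardware; the dictionary here only stands in for one directory entry.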
For example, assume that the IDs of the processor cores that initiated the atomic operation within the most recent time period, as recorded in the shared monitor, include CPU1, CPU2, and CPU4, where CPU1 and CPU2 share the same shared cache L3 and CPU4 uses another shared cache L3. The memory may determine that the data corresponding to the atomic operation is shared among multiple shared caches, and thus step 207 and step 208 may be performed. If the IDs of the processor cores that initiated the atomic operation within the most recent time period, as recorded in the shared monitor, are only CPU1 and CPU2, the memory may determine that the data corresponding to the atomic operation is shared only inside one shared cache, that is, is not shared by multiple shared caches, and thus step 210 may be executed.
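The monitor-based prediction in the example above can be sketched as follows. The function name `is_shared_by_monitor` and the core-to-cache mapping are hypothetical; the sketch assumes the monitor simply exposes the IDs of the cores that initiated the atomic operation within the preset time period.

```python
from typing import Dict, Iterable


def is_shared_by_monitor(recent_core_ids: Iterable[str],
                         core_to_l3: Dict[str, str]) -> bool:
    """Predict whether the data is shared by several first storage units.

    `recent_core_ids` are the initiating cores recorded for the address;
    `core_to_l3` maps each core ID to the shared cache L3 it belongs to
    (both are illustrative assumptions, not structures from the text).
    """
    # Collect the distinct shared caches used by the initiating cores.
    l3_units = {core_to_l3[core] for core in recent_core_ids}
    # More than one distinct L3 means the data is shared across L3s.
    return len(l3_units) > 1
```

With CPU1 and CPU2 mapped to one L3 and CPU4 mapped to another, the initiator set CPU1, CPU2, CPU4 is predicted as shared, while CPU1, CPU2 alone is not.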
Step 207, executing the atomic operation in the second storage unit.
When it is detected that the data corresponding to the atomic operation is data shared by the plurality of first storage units, the atomic operation may be executed in the second storage unit, which is shared by the plurality of processor cores, in order to avoid a conflict.
Step 208, sending a failure response message to the target storage unit through the second storage unit. Then step 209 is performed.
When the second storage unit detects that the data corresponding to the atomic operation is data shared by the plurality of first storage units, since the atomic operation needs to be executed in the second storage unit, a failure response message may be further sent to the target storage unit through the second storage unit, where the failure response message may be used to indicate that the state of the data corresponding to the atomic operation is an I state.
For example, when the memory detects that the data is shared by the plurality of shared caches L3, the atomic operation may be performed in the memory and a failure response message may be sent to the shared cache L3 of the processor core 1.
Step 209, recording the address of the data in the conflict list through the target storage unit.
After receiving the failure response message, the target storage unit may record the address of the data in the conflict list, so that the atomic operation of the address may be directly forwarded to the second storage unit when the atomic operation is received next time.
Optionally, in this embodiment of the present invention, the depth of the conflict list may be N (N is an integer greater than 1), that is, at most N addresses may be recorded in the conflict list. When N addresses are already recorded in the conflict list and a new address arrives, the target storage unit may delete the earliest-recorded address (that is, the earliest-recorded address overflows). When the target storage unit again receives an atomic operation corresponding to the overflowed address, it can send the probe request to the second storage unit again.
For example, assuming that the address of the data corresponding to the atomic operation is 0x00000350, the shared cache L3 of processor core 1 may add the address 0x00000350 to the conflict list shown in table 1.
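A conflict list of depth N with first-in, first-out overflow, as described above, can be sketched as follows. The class name `ConflictList` and its methods are hypothetical; the sketch uses Python's `collections.deque` with `maxlen` only as a stand-in for whatever structure the cache actually uses.

```python
from collections import deque


class ConflictList:
    """Bounded list of conflicting addresses (illustrative sketch)."""

    def __init__(self, depth: int) -> None:
        # A deque with maxlen drops its oldest entry when a new one
        # is appended to a full deque, which models the overflow.
        self._addresses = deque(maxlen=depth)

    def record(self, address: int) -> None:
        # Assumption: an address already present is not recorded twice.
        if address not in self._addresses:
            self._addresses.append(address)

    def remove(self, address: int) -> None:
        # Used when the sharing condition changes and the data
        # returns to the target storage unit.
        if address in self._addresses:
            self._addresses.remove(address)

    def __contains__(self, address: int) -> bool:
        return address in self._addresses


conflicts = ConflictList(depth=2)
conflicts.record(0x00000001)
conflicts.record(0x00000002)
conflicts.record(0x00000350)  # the list is full, so 0x00000001 overflows
```

After the third `record` call, an atomic operation on 0x00000001 would no longer hit the list, so the target storage unit would send a probe request again.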
Step 210, instructing, through the second storage unit, the target storage unit or the upper storage unit of the target storage unit to execute the atomic operation.
When the second storage unit detects that the data is not data shared by the plurality of first storage units, the atomic operation may be performed in the target storage unit or an upper storage unit of the target storage unit.
In an optional implementation manner, if the data indicated by the address is not data shared by the plurality of first storage units, and the state of the data in the target storage unit is an S state, a success response message may be sent to the target storage unit through the second storage unit, and the atomic operation is executed in the target storage unit.
In another optional implementation manner, if the data indicated by the address is not data shared by the plurality of first storage units, and the state of the data in the target storage unit is an I state, a success response message and the data may be sent to the target storage unit through the second storage unit, and the atomic operation is executed in the target storage unit.
Wherein the success response message may be used to indicate that the state of the data in the target storage unit is converted into an M state or an E state. According to the above cache coherency protocol, when the state of the data in the target storage unit is the S state, the data in the target storage unit is consistent with the data in the second storage unit, and therefore the second storage unit may send a success response message only to the target storage unit. When the state of the data in the target storage unit is the I state, the data in the target storage unit is already unavailable, and thus the second storage unit can transmit the data stored in the second storage unit at the same time as transmitting the success response message.
For example, assuming that the memory detects that the data corresponding to the atomic operation is not data shared by multiple shared caches, and the state of the data in the shared cache L3 of the processor core 1 is S state, the memory may directly send a success response message to the shared cache L3 of the processor core 1. After receiving the success response message, the shared cache L3 of the processor core 1 may execute the atomic operation in the current-level cache unit.
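The choice of response to a probe request described above can be sketched as follows. The function name `build_probe_response` and the dictionary-shaped message are hypothetical; the sketch only mirrors the three cases in the text (shared, not shared with S state, not shared with I state).

```python
from typing import Optional


def build_probe_response(shared_by_multiple: bool,
                         target_state: str,
                         line_data: Optional[bytes]) -> dict:
    """Build the second storage unit's reply to a probe (sketch)."""
    if shared_by_multiple:
        # Steps 207 and 208: the second storage unit executes the
        # atomic operation itself and reports failure to the target.
        return {"result": "fail", "data": None}
    if target_state == "S":
        # The target's copy is consistent, so no data is needed.
        return {"result": "success", "data": None}
    if target_state == "I":
        # The target's copy is unusable, so the data is sent as well.
        return {"result": "success", "data": line_data}
    # M or E would have been handled in the target without a probe.
    raise ValueError("probe is only expected for S or I states")
```

On a success response the target storage unit would move the data to the M or E state and then execute the atomic operation locally.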
In yet another alternative implementation, if the second storage unit does not find the address in the directory or the shared monitor, the address may be determined to be an address received for the first time. In this case, if the state of the data in the target storage unit is the I state, the data may be determined to be data exclusive to one processor core. If the target storage unit is a shared cache, the data corresponding to the atomic operation may be sent through the second storage unit to an upper-level storage unit of the target storage unit (for example, an exclusive cache located at a higher level than the target storage unit), and the atomic operation may be executed in the upper-level storage unit. If the target storage unit is an exclusive cache, the data corresponding to the atomic operation may be sent to the target storage unit, and the atomic operation may be executed in the target storage unit.
For example, if the memory does not find the address of the data corresponding to the atomic operation in the cache coherence directory or the shared monitor, the data may be determined to be data accessed for the first time, and the memory may initiate a Snoop store (Snoop stack) request to send the data corresponding to the atomic operation to the exclusive cache L1 or L2 of the processor core 1 for execution.
Step 211, forwarding the atomic operation to the second storage unit through the target storage unit.
In the embodiment of the present invention, if in step 204, the target storage unit detects that the address of the data is already recorded in the conflict list, it may be determined that the data may be data shared by multiple first storage units, and therefore the atomic operation may be directly forwarded to the second storage unit.
For example, when the address of the data corresponding to the atomic operation acquired by the shared cache L3 of the processor core 1 is 0x00000001, the address is recorded in the conflict list shown in table 1, and therefore the atomic operation may be directly forwarded to the memory.
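The target storage unit's choice between executing locally, sending a probe request (step 205), and forwarding the atomic operation (step 211) can be sketched as follows. The function name `target_unit_dispatch` and the returned labels are hypothetical.

```python
from typing import Collection


def target_unit_dispatch(state: str,
                         address: int,
                         conflict_addresses: Collection[int]) -> str:
    """Pick the next action of the target storage unit (sketch)."""
    if state in ("M", "E"):
        # The data is held in the modified or exclusive state.
        return "execute_in_target"
    if address in conflict_addresses:
        # Step 211: the address is known to conflict, so skip the probe.
        return "forward_atomic_to_second_unit"
    # Step 205: ask the second storage unit how the data is shared.
    return "send_probe_request"
```

For an address such as 0x00000001 that is already in the conflict list, the sketch returns the forwarding action without any probe.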
Step 212, detecting whether the data corresponding to the atomic operation is the data shared by the plurality of first storage units through the second storage unit.
Further, the second storage unit may continue to detect whether the data corresponding to the atomic operation is data shared by the plurality of first storage units. If the data is shared by the plurality of first storage units, step 207 may be performed, that is, the atomic operation is executed by the second storage unit. If the data is not shared by the plurality of first storage units, that is, the sharing condition of the data has changed, step 213 can be executed.
For the process of detecting, through the second storage unit, whether the data is shared by the plurality of first storage units, reference may be made to step 206; details are not repeated here.
Step 213, sending the data corresponding to the atomic operation to the target storage unit or the upper storage unit of the target storage unit through the second storage unit, and executing the atomic operation in the storage unit receiving the data.
If the second storage unit detects that the data corresponding to the atomic operation is not data shared by the plurality of first storage units, it may be determined that the sharing condition of the data corresponding to the atomic operation has changed, and therefore, the data corresponding to the atomic operation may be sent to the target storage unit or an upper storage unit of the target storage unit, and the storage unit that receives the data may execute the atomic operation.
Optionally, if the data corresponding to the atomic operation is shared inside the target storage unit, the data corresponding to the atomic operation may be sent to the target storage unit, and the atomic operation is executed in the target storage unit. If the data corresponding to the atomic operation is held exclusively in the upper-level storage unit of the target storage unit, the data corresponding to the atomic operation may be sent to the upper-level storage unit, and the atomic operation may be executed in the upper-level storage unit.
When the atomic operation is executed by the target storage unit, the target storage unit may also delete the address of the data from the conflict list. When the atomic operation is subsequently received again, the target storage unit can directly check whether the data corresponding to the atomic operation is stored locally. If the data is stored, the atomic operation can be executed directly; if not, the node that is to execute the atomic operation may be determined by sending a probe request.
In the embodiment of the present invention, if the target storage unit is a shared cache, when the second storage unit detects that the data corresponding to the atomic operation is not data shared by the plurality of first storage units, the second storage unit may further determine the sharing condition of the data by querying the shared monitor. For example, when the second storage unit detects that the atomic operation was initiated by the same processor core throughout the preset time period, it may be determined that the data is held exclusively in an upper-level storage unit (e.g., the exclusive cache L2 or L1) of the target storage unit, and thus the atomic operation may be performed in the upper-level storage unit. When the second storage unit detects that the processor cores that initiated the atomic operation within the preset time period include a plurality of processor cores sharing the target storage unit, it may be determined that the data is shared inside the target storage unit, and thus the atomic operation may be performed in the target storage unit.
If the target storage unit is an exclusive cache, when the second storage unit detects that the data corresponding to the atomic operation is not data shared by the plurality of first storage units, it may be determined that the data is only exclusive in the target storage unit, and the atomic operation may be directly executed in the target storage unit.
For example, if the memory detects that the sharing condition of the data corresponding to the atomic operation has changed from being shared by multiple shared caches to being shared only inside the shared cache of the processor core 1 (that is, that shared cache holds the data exclusively), it may determine that the data is held by the shared cache of the processor core 1 or by an exclusive cache of the processor core 1. The memory may therefore initiate a Snoop stack request to send the data corresponding to the atomic operation to the shared cache L3 of the processor core 1, or to the exclusive cache L1 or L2 of the processor core 1, for execution.
If the memory detects that, within the latest preset time period recorded in the shared monitor, the atomic operation was initiated only by the processor core 1, the memory may send the data corresponding to the atomic operation to the exclusive cache L1 or L2 of the processor core 1 for execution. If the memory detects that the processor cores that initiated the atomic operation within the latest preset time period recorded in the shared monitor include the processor core 1 and the processor core 2, the data corresponding to the atomic operation may be sent to the shared cache L3 shared by the processor core 1 and the processor core 2 for execution.
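The selection between the target storage unit and its upper-level storage unit can be sketched as follows. The function name `choose_executor` and its return labels are hypothetical, and the sketch assumes the second storage unit has already established that the data is not shared by multiple first storage units.

```python
from typing import Sequence


def choose_executor(recent_core_ids: Sequence[str],
                    target_is_shared_cache: bool) -> str:
    """Choose where the data is sent for execution (sketch)."""
    if not target_is_shared_cache:
        # An exclusive target cache holds the data on its own.
        return "target"
    if len(set(recent_core_ids)) == 1:
        # A single initiating core: use its exclusive cache (L1 or L2).
        return "upper"
    # Several cores under the same shared cache: execute in that L3.
    return "target"
```

Initiators consisting only of processor core 1 select the upper-level exclusive cache, while processor core 1 together with processor core 2 selects the shared cache L3.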
If the memory sends data to the shared cache L3 of the processor core 1, the shared cache L3 also needs to delete the address of the data recorded in the conflict list shown in table 1 to refresh the conflict list.
Optionally, in an embodiment of the present invention, a module for maintaining cache consistency may be further disposed in the second storage unit, and the methods implemented by the second storage unit in the foregoing method embodiments may all be implemented by the module. For example, when the second storage unit is a memory, as shown in fig. 4, a Home Agent (HA) module for maintaining cache consistency may be disposed in the memory 20, and the methods implemented by the second storage unit in the above method embodiments may all be implemented by the HA module.
For the memories shown in fig. 2 and fig. 4, with the method provided by the embodiment of the present invention, if the data corresponding to a certain atomic operation is exclusive data, the state of the data in each level of cache unit is initially the I state. After the processor core initiates the atomic operation for the first time, according to the above method flow, the atomic operation is executed in the HA or the shared cache L3, and the HA then stacks the data to L1 or L2. Subsequent atomic operations are executed in the exclusive cache L1 or L2.
If the data corresponding to the atomic operation is shared in a shared cache L3 (Intra L3), then according to the above steps 206 and 210, the shared cache L3 can receive a success response message after sending a probe request to the HA. The state of the data in the shared cache L3 may be changed from the I or S state to the E or M state. Each processor core that subsequently shares the shared cache L3 performs the atomic operation in the shared cache L3.
If the data corresponding to the atomic operation is shared among multiple shared caches L3 (Inter L3), then according to the above steps 206 to 209, a shared cache L3 receives a failure response message after sending a probe request to the HA. The state of the data in the shared cache L3 cannot be changed from the I or S state to the E or M state. Thereafter, the shared cache L3 may record the address of the data in the conflict list, and when the atomic operation is subsequently received, the atomic operation is directly forwarded to the HA for execution.
If the sharing condition of the data corresponding to a certain atomic operation is converted from the internal sharing of a certain shared cache L3 (or a certain exclusive cache) to the sharing among a plurality of shared caches L3 (or a plurality of exclusive caches), the E or M state of the data in the shared cache L3 (or the exclusive cache) is updated to the I or S state.
If the sharing condition of the data corresponding to a certain atomic operation is converted from sharing among a plurality of shared caches L3 (or a plurality of exclusive caches) to sharing inside a certain shared cache L3 (or a certain exclusive cache), after the HA detects the conversion of the sharing state, the HA or the shared cache L3 may perform the atomic operation, and then the data stack may be transferred to the shared cache L3 (or the exclusive cache). If shared cache L3 receives the data, the address of the data may be deleted from the conflict list.
It should be noted that, the order of the steps of the execution method of the atomic operation provided in the embodiment of the present invention may be appropriately adjusted, and the steps may also be increased or decreased according to the situation. For example, step 208 may be performed before step 207, or may also be performed synchronously with step 207. Alternatively, steps 212 and 213 may be deleted as appropriate, i.e., step 207 may be performed directly after step 211. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.
In summary, an embodiment of the present invention provides an atomic operation execution method. If the state of the data corresponding to an atomic operation in a target storage unit is a modified state or an exclusive state, the atomic operation may be executed in the target storage unit. If the state of the data in the target storage unit is an invalid state or a shared state, whether the data is data shared by a plurality of first storage units may be detected through a second storage unit. If the data is data shared by the plurality of first storage units, the atomic operation may be executed in the second storage unit; if not, the atomic operation may be executed in the target storage unit or an upper storage unit of the target storage unit. The method provided by the application can thus determine the node that executes the atomic operation according to the sharing condition of the data corresponding to the atomic operation, avoid the performance degradation of computer equipment caused by conflicts during execution of the atomic operation, and ensure reliability during execution of the atomic operation.
The embodiment of the invention also provides a memory, which can be applied to multi-core computer equipment. Referring to fig. 1, the memory may include multiple levels of storage units, among which there are two adjacent levels. Of these two adjacent levels, the first-level storage unit 10 includes a plurality of first storage units 100, and the second-level storage unit 20 includes at least one second storage unit 200; for example, the structure shown in fig. 1 includes only one second storage unit 200. Each first storage unit 100 is a cache unit, each second storage unit 200 may be a cache unit or a memory, and each second storage unit 200 is shared by a plurality of processor cores.
The target storage unit 100 of the first storage units, which receives the atomic operation, is configured to execute the atomic operation in the target storage unit when a state of data corresponding to the atomic operation in the target storage unit is a modified state or an exclusive state.
The second storage unit 200 may be configured to: detect whether the data is shared by the plurality of first storage units when the state of the data in the target storage unit is an invalid state or a shared state; execute the atomic operation in the second storage unit when the data is shared by the plurality of first storage units; and instruct the target storage unit or the upper storage unit of the target storage unit to execute the atomic operation when the data is not data shared by the plurality of first storage units.
Optionally, fig. 5 is a schematic structural diagram of another memory provided in an embodiment of the present invention, and referring to fig. 5, the memory may further include: the upper level storage unit 30 located at the upper level of the first level storage unit 10, the upper level storage unit 30 may include a plurality of storage units 300, and each storage unit 300 may be an exclusive cache or a shared cache.
For example, as shown in fig. 5, each storage unit 300 in the upper level storage unit 30 may be an exclusive cache. Each first storage unit 100 in the first-level storage unit 10 may be a shared cache, the second-level storage unit 20 includes only one second storage unit 200, and the second storage unit 200 may be a memory.
Any storage unit 300 in the upper level storage unit 30 that receives the atomic operation may be configured to:
when the state of the data in that storage unit is a modified state or an exclusive state, execute the atomic operation in that storage unit;
and when the state of the data in that storage unit is an invalid state or a shared state, send the atomic operation to a next-level storage unit.
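The behavior of the upper-level storage units listed above can be sketched as a walk down the levels. The function name `route_through_upper_levels` and the level labels are hypothetical; the sketch assumes the levels are visited from the one closest to the processor core downward.

```python
from typing import List, Optional, Tuple


def route_through_upper_levels(
        states_by_level: List[Tuple[str, str]]) -> Optional[str]:
    """Return the upper level that executes the atomic operation.

    `states_by_level` lists (level name, state) pairs, nearest the
    core first. None means every upper level forwarded the operation
    on to the next-level storage unit.
    """
    for level, state in states_by_level:
        if state in ("M", "E"):
            # Modified or exclusive: execute in this storage unit.
            return level
        # Invalid or shared: pass the operation to the next level.
    return None
```

When the function returns None, the atomic operation has reached the first-level storage unit, which then acts as the target storage unit.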
Optionally, the target storage unit 100 may also be used to implement the methods shown in step 204, step 205, and step 211 in the foregoing method embodiments.
The second storage unit 200 may be configured to: implement the method shown in step 206 of the above method embodiment upon receiving the probe request, and implement the method shown in step 212 of the above method embodiment upon receiving the atomic operation.
Optionally, the second storage unit 200 may be further configured to: the method shown in step 208 and step 209 in the above method embodiment is implemented when it is detected that the data indicated by the address is data shared by a plurality of first memory units.
Optionally, the second storage unit 200 may further be configured to: when the data indicated by the address is not data shared by the plurality of first memory cells, the method shown in step 210 in the above-described method embodiment is implemented.
Accordingly, the target storage unit 100 may be configured to perform the atomic operation after receiving the success response message. The upper level memory unit 300 of the target memory unit may be configured to execute the atomic operation after receiving the data corresponding to the atomic operation sent by the second memory unit.
Optionally, the second storage unit 200 may be configured to: when the data corresponding to the atomic operation is shared inside the target storage unit, send the data corresponding to the atomic operation to the target storage unit for execution;
and when the data corresponding to the atomic operation is held exclusively in the upper storage unit of the target storage unit, send the data corresponding to the atomic operation to the upper storage unit for execution.
Optionally, the target storage unit 100 may be further configured to delete the address of the data from the conflict list after the atomic operation is performed.
Optionally, the second storage unit 200 may be configured to:
detect whether the processor cores that initiated the atomic operation within a preset time period share one first storage unit;
when the processor cores that initiated the atomic operation within the preset time period belong to a plurality of first storage units, determine that the data is data shared by the plurality of first storage units;
and when the processor cores that initiated the atomic operation within the preset time period share the same first storage unit, determine that the data is not data shared by the plurality of first storage units.
In summary, embodiments of the present invention provide a memory that includes a plurality of first storage units and a plurality of exclusive caches. If the state of the data corresponding to an atomic operation in a target storage unit is a modified state or an exclusive state, the atomic operation may be executed in the target storage unit. If the state of the data in the target storage unit is an invalid state or a shared state, whether the data is data shared by the plurality of first storage units may be detected through a second storage unit. If the data is data shared by the plurality of first storage units, the atomic operation is executed in the second storage unit; if not, the atomic operation may be executed in the target storage unit or an upper storage unit of the target storage unit. The method provided by the application can thus determine the node that executes the atomic operation according to the sharing condition of the data corresponding to the atomic operation, avoid the performance degradation of computer equipment caused by conflicts during execution of the atomic operation, and ensure reliability during execution of the atomic operation.
The embodiment of the invention also provides computer equipment which can comprise a multi-core processor and the memory provided by the embodiment.
The embodiment of the present invention also provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, the computer is enabled to execute the execution method of the atomic operation provided by the above method embodiment.
Embodiments of the present invention further provide a chip, where the chip includes a programmable logic circuit and/or a program instruction, and when the chip is operated, the chip is configured to implement the execution method of the atomic operation provided in the above method embodiments.
Embodiments of the present invention further provide a computer program product containing instructions, which, when running on a computer, causes the computer to execute the method for executing atomic operations provided by the above method embodiments.
The above description is only an alternative embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

  1. An atomic operation execution method, applied to a memory in a multi-core computer device, wherein the memory includes multiple levels of storage units, two adjacent levels of storage units exist among the multiple levels of storage units, in the two adjacent levels of storage units a first-level storage unit includes multiple first storage units and a second-level storage unit includes at least one second storage unit, each first storage unit is a cache unit, each second storage unit is a cache unit or a memory, and each second storage unit is shared by multiple processor cores, the method comprising:
    when a target storage unit in the first storage units receives an atomic operation, if the state of data corresponding to the atomic operation in the target storage unit is a modified state or an exclusive state, executing the atomic operation in the target storage unit;
    if the state of the data in the target storage unit is an invalid state or a shared state, detecting whether the data is shared by a plurality of first storage units through the second storage unit;
    if the data is shared by a plurality of first storage units, executing the atomic operation in the second storage unit;
    and if the data is not the data shared by the plurality of first storage units, executing the atomic operation in the target storage unit or the upper storage unit of the target storage unit.
  2. The method of claim 1, wherein the memory further comprises: the upper-level storage unit is positioned at the upper level of the first-level storage unit; the method further comprises the following steps:
    when any storage unit in the upper-level storage units receives the atomic operation, if the state of the data in that storage unit is a modified state or an exclusive state, executing the atomic operation in that storage unit;
    and if the state of the data in that storage unit is an invalid state or a shared state, sending the atomic operation, through that storage unit, to a next-level storage unit of that storage unit.
  3. The method of claim 1, wherein if the state of the data in the target storage unit is an invalid state or a shared state, detecting whether the data is shared by the plurality of first storage units via the second storage unit comprises:
    if the state of the data in the target storage unit is an invalid state or a shared state, detecting whether the address of the data is recorded in a conflict list or not through the target storage unit;
    if the address is not recorded in the conflict list, sending a probing request carrying the address of the data to the second storage unit through the target storage unit, and detecting whether the data indicated by the address is shared by a plurality of first storage units or not through the second storage unit;
    if the address is recorded in the conflict list, the atomic operation is forwarded to the second storage unit through the target storage unit, and whether the data corresponding to the atomic operation is data shared by a plurality of first storage units or not is detected through the second storage unit.
  4. The method of claim 3, wherein if the data indicated by the address is data shared by the plurality of first storage units, the method further comprises:
    sending, by the second storage unit, a failure response message to the target storage unit, wherein the failure response message indicates that the state of the data in the target storage unit is an invalid state;
    and recording, by the target storage unit, the address of the data in the conflict list according to the failure response message.
  5. The method of claim 3, wherein if the data indicated by the address is not data shared by the plurality of first storage units, executing the atomic operation in the target storage unit or an upper-level storage unit of the target storage unit comprises:
    if the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is a shared state, sending, by the second storage unit, a success response message to the target storage unit, and executing the atomic operation in the target storage unit, wherein the success response message indicates that the state of the data is a modified state or an exclusive state;
    if the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is an invalid state, sending, by the second storage unit, the success response message and the data to the target storage unit, and executing the atomic operation in the target storage unit;
    and if the address is received for the first time and the state of the data in the target storage unit is an invalid state, sending, by the second storage unit, the data corresponding to the atomic operation to the target storage unit or an upper-level storage unit of the target storage unit, and executing the atomic operation in the storage unit that receives the data.
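The probe responses described in claims 4 and 5 can be summarised in a short Python sketch. The `ProbeResponse` dataclass and the `answer_probe` function are hypothetical and exist only for illustration; the sketch covers the failure response and the first two success cases, and deliberately leaves out the "address received for the first time" case, whose destination (target unit or an upper-level unit) the claim leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResponse:
    """Reply from the second storage unit to a probe request (hypothetical)."""
    success: bool
    data: Optional[bytes] = None  # line contents, sent only when needed


def answer_probe(shared_by_multiple: bool, target_state: str,
                 line_data: bytes) -> ProbeResponse:
    """Build the second storage unit's reply for a probed address.

    `target_state` is "S" (shared) or "I" (invalid), the only states in
    which the target storage unit sends a probe.
    """
    if shared_by_multiple:
        # Claim 4: failure response; the target treats its copy as invalid.
        return ProbeResponse(success=False)
    if target_state == "S":
        # Target already holds the data: grant ownership without data.
        return ProbeResponse(success=True)
    if target_state == "I":
        # Target has no valid copy: grant ownership and ship the data.
        return ProbeResponse(success=True, data=line_data)
    raise ValueError("probe is only expected for states 'S' or 'I'")
```

On a success response the target storage unit would execute the atomic operation itself; on a failure response it would record the address in its conflict list.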
  6. The method of claim 3, wherein if the data corresponding to the atomic operation is not data shared by the plurality of first storage units, executing the atomic operation in the target storage unit or an upper-level storage unit of the target storage unit comprises:
    if the data corresponding to the atomic operation is shared in the target storage unit, sending the data corresponding to the atomic operation to the target storage unit, and executing the atomic operation in the target storage unit;
    and if the data corresponding to the atomic operation is held exclusively in an upper-level storage unit of the target storage unit, sending the data corresponding to the atomic operation to the upper-level storage unit, and executing the atomic operation in the upper-level storage unit.
  7. The method of claim 6, wherein after the target storage unit performs the atomic operation, the method further comprises:
    deleting the address of the data from the conflict list by the target storage unit.
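The life cycle of a conflict-list entry, recorded on a failure response (claim 4) and deleted once the target storage unit has executed the atomic operation (claim 7), can be sketched as below. The `ConflictList` class and its method names are hypothetical; the patent does not prescribe a data structure, and a hardware conflict list would be a small fixed-size table rather than an unbounded set.

```python
class ConflictList:
    """Addresses for which a probe has previously failed (hypothetical model)."""

    def __init__(self) -> None:
        self._addresses = set()

    def on_failure_response(self, address: int) -> None:
        # Claim 4: remember that this address is contended.
        self._addresses.add(address)

    def on_atomic_executed_locally(self, address: int) -> None:
        # Claim 7: the target unit executed the operation itself, so the
        # address is no longer treated as contended. discard() is a no-op
        # when the address was never recorded.
        self._addresses.discard(address)

    def __contains__(self, address: int) -> bool:
        return address in self._addresses
```

Removing the entry after a local execution lets later atomic operations on the same address go back to the cheaper probe path once contention has subsided.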
  8. The method according to any one of claims 3 to 7, wherein detecting, by the second storage unit, whether the data is shared by the plurality of first storage units comprises:
    detecting, by the second storage unit, whether the processor cores that initiate the atomic operation within a preset time period share a single first storage unit;
    when the processor cores that initiate the atomic operation within the preset time period correspond to more than one first storage unit, determining that the data is data shared by the plurality of first storage units;
    and when the processor cores that initiate the atomic operation within the preset time period share the same first storage unit, determining that the data is not data shared by the plurality of first storage units.
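The time-window test of claim 8 can be sketched in Python as follows. Everything here is an assumption made for illustration: the `SharingDetector` class, the idea of tagging each request with the identifier of the first storage unit it arrived through, and the use of a per-address queue of timestamps. The claim only requires that the second storage unit determine whether the requesting cores within the preset period map to one first storage unit or to several.

```python
from collections import defaultdict, deque


class SharingDetector:
    """Tracks which first storage units issued atomics to each address."""

    def __init__(self, window: float) -> None:
        self.window = window  # the preset time period, in arbitrary units
        # address -> queue of (timestamp, first_storage_unit_id)
        self._history = defaultdict(deque)

    def record(self, address: int, unit_id: str, now: float) -> None:
        """Note that an atomic operation for `address` arrived via `unit_id`."""
        self._history[address].append((now, unit_id))

    def is_shared_by_multiple(self, address: int, now: float) -> bool:
        """True if requests inside the window came via more than one unit."""
        history = self._history[address]
        # Drop requests that fall outside the preset time period.
        while history and now - history[0][0] > self.window:
            history.popleft()
        return len({unit for _, unit in history}) > 1
```

A usage example: after two requests through a unit labelled `"L2-0"` the address is not considered shared; a third request through `"L2-1"` inside the window makes it shared, and once all three requests age out of the window it is no longer shared.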
  9. A memory, applied to a multi-core computer device and comprising multi-level storage units, wherein the multi-level storage units include two adjacent levels of storage units, in which the first-level storage unit comprises a plurality of first storage units and the second-level storage unit comprises at least one second storage unit, each first storage unit is a cache unit, each second storage unit is a cache unit or a memory, and each second storage unit is shared by a plurality of processor cores;
    a target storage unit, being the first storage unit among the plurality of first storage units that receives an atomic operation, is configured to execute the atomic operation in the target storage unit when the state of the data corresponding to the atomic operation in the target storage unit is a modified state or an exclusive state;
    the second storage unit is configured to: detect whether the data is shared by the plurality of first storage units when the state of the data in the target storage unit is an invalid state or a shared state; execute the atomic operation in the second storage unit when the data is data shared by the plurality of first storage units; and instruct the target storage unit or an upper-level storage unit of the target storage unit to execute the atomic operation when the data is not data shared by the plurality of first storage units.
  10. The memory of claim 9, further comprising an upper-level storage unit located above the first-level storage unit;
    any storage unit among the upper-level storage units that receives the atomic operation is configured to:
    when the state of the data in that storage unit is a modified state or an exclusive state, execute the atomic operation in that storage unit;
    and when the state of the data in that storage unit is an invalid state or a shared state, send the atomic operation to its next-level storage unit.
  11. The memory of claim 9,
    the target storage unit is further configured to: when the state of the data in the target storage unit is an invalid state or a shared state, detect whether the address of the data is recorded in a conflict list; if the address is not recorded in the conflict list, send a probe request carrying the address of the data to the second storage unit; if the address is recorded in the conflict list, forward the atomic operation to the second storage unit;
    the second storage unit is configured to: when the probe request is received, detect whether the data indicated by the address is data shared by the plurality of first storage units; and when the atomic operation is received, detect whether the data corresponding to the atomic operation is data shared by the plurality of first storage units.
  12. The memory of claim 11,
    the second storage unit is further configured to: when the data indicated by the address is detected to be data shared by the plurality of first storage units, send a failure response message to the target storage unit, wherein the failure response message indicates that the state of the data in the target storage unit is an invalid state;
    the target storage unit is further configured to: record the address of the data in the conflict list according to the failure response message.
  13. The memory of claim 11,
    the second storage unit is further configured to: when the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is a shared state, send a success response message to the target storage unit, wherein the success response message indicates that the state of the data is a modified state or an exclusive state;
    when the data indicated by the address is not data shared by the plurality of first storage units and the state of the data in the target storage unit is an invalid state, send the success response message and the data to the target storage unit;
    when the address is received for the first time and the state of the data in the target storage unit is an invalid state, send the data corresponding to the atomic operation to the target storage unit or an upper-level storage unit of the target storage unit;
    the target storage unit is configured to execute the atomic operation after receiving the success response message or the data corresponding to the atomic operation sent by the second storage unit;
    and the upper-level storage unit of the target storage unit is configured to execute the atomic operation after receiving the data corresponding to the atomic operation sent by the second storage unit.
  14. The memory of claim 11,
    the second storage unit is configured to: when the data corresponding to the atomic operation is shared in the target storage unit, send the data corresponding to the atomic operation to the target storage unit for execution;
    and when the data corresponding to the atomic operation is held exclusively in an upper-level storage unit of the target storage unit, send the data corresponding to the atomic operation to the upper-level storage unit for execution.
  15. The memory of claim 14,
    the target storage unit is further configured to delete the address of the data from the conflict list after the atomic operation is performed.
  16. The memory according to any one of claims 11 to 15, wherein the second storage unit is configured to:
    detect whether the processor cores that initiate the atomic operation within a preset time period share a single first storage unit;
    when the processor cores that initiate the atomic operation within the preset time period correspond to more than one first storage unit, determine that the data is data shared by the plurality of first storage units;
    and when the processor cores that initiate the atomic operation within the preset time period share the same first storage unit, determine that the data is not data shared by the plurality of first storage units.
  17. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform a method of performing an atomic operation according to any one of claims 1 to 8.
  18. A chip comprising programmable logic circuits and/or program instructions, the chip being configured to implement, when running, the method of performing an atomic operation as claimed in any one of claims 1 to 8.
  19. A computer device, comprising: a multi-core processor, and a memory as claimed in any one of claims 9 to 16.
CN201880090540.XA 2018-07-11 2018-07-11 Atomic operation execution method and device Active CN111788562B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/095250 WO2020010540A1 (en) 2018-07-11 2018-07-11 Atomic operation execution method and apparatus

Publications (2)

Publication Number Publication Date
CN111788562A (en) 2020-10-16
CN111788562B (en) 2024-10-18

Family

ID=69141890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880090540.XA Active CN111788562B (en) 2018-07-11 2018-07-11 Atomic operation execution method and device

Country Status (2)

Country Link
CN (1) CN111788562B (en)
WO (1) WO2020010540A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11467962B2 (en) * 2020-09-02 2022-10-11 SiFive, Inc. Method for executing atomic memory operations when contested

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591800A (en) * 2011-12-31 2012-07-18 龙芯中科技术有限公司 Data access and storage system and method for weak consistency storage model
CN103279428A (en) * 2013-05-08 2013-09-04 中国人民解放军国防科学技术大学 Explicit multi-core Cache consistency active management method facing flow application
US20150331798A1 (en) * 2014-05-15 2015-11-19 International Business Machines Corporation Managing memory transactions in a distributed shared memory system supporting caching above a point of coherency
CN105094840A (en) * 2015-08-14 2015-11-25 浪潮(北京)电子信息产业有限公司 Atomic operation implementation method and device based on cache consistency principle
WO2016049808A1 (en) * 2014-09-29 2016-04-07 华为技术有限公司 Cache directory processing method and directory controller of multi-core processor system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228483B (en) * 2016-12-15 2021-09-14 北京忆恒创源科技股份有限公司 Method and apparatus for processing atomic write commands
CN108268384A (en) * 2016-12-30 2018-07-10 华为技术有限公司 Read the method and device of data
CN108182281B (en) * 2018-01-26 2022-02-01 创新先进技术有限公司 Data processing control method, device, server and medium based on stream computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
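Chen Liwei; Zhang Guangfei; Wang Wenxiang; Wang Huandong; Li Ling: "Design of a cache coherence protocol for multi-core synchronization optimization", High Technology Letters (高技术通讯), no. 11, 15 November 2013 (2013-11-15) *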

Also Published As

Publication number Publication date
CN111788562B (en) 2024-10-18
WO2020010540A1 (en) 2020-01-16

Similar Documents

Publication Publication Date Title
CN104679669B (en) Cache memory system and method for accessing a cache line
CN108234641B (en) Data reading and writing method and device based on distributed consistency protocol
US10860323B2 (en) Method and apparatus for processing instructions using processing-in-memory
US10402327B2 (en) Network-aware cache coherence protocol enhancement
US20180039424A1 (en) Method for accessing extended memory, device, and system
CN109684237B (en) Data access method and device based on multi-core processor
WO2013051154A1 (en) Memory allocation control method, program and information processing device
EP3404537B1 (en) Processing node, computer system and transaction conflict detection method
CN107341114B (en) Directory management method, node controller and system
JP2005519391A (en) Method and system for cache coherence in a DSM multiprocessor system without increasing shared vectors
CN113342709A (en) Method for accessing data in a multiprocessor system and multiprocessor system
US11782848B2 (en) Home agent based cache transfer acceleration scheme
US9063667B2 (en) Dynamic memory relocation
KR102680596B1 (en) System and method for storing cache location information for cache entry transfer
JP6343722B2 (en) Method and device for accessing a data visitor directory in a multi-core system
CN115794366A (en) Memory prefetching method and device
CN104239270A (en) High-speed cache synchronization method and high-speed cache synchronization device
WO2019140885A1 (en) Directory processing method and device, and storage system
CN111788562B (en) Atomic operation execution method and device
CN106406745B (en) Method and device for maintaining Cache data consistency according to directory information
CN113419973A (en) Message forwarding method and device
US8938588B2 (en) Ensuring forward progress of token-required cache operations in a shared cache
CN107533512B (en) Method and equipment for merging table entries in directory
US11755485B2 (en) Snoop filter device
CN112463650A (en) Method, device and medium for managing L2P table under multi-core CPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant