CN117472803B - Atomic instruction execution method and device and electronic equipment - Google Patents

Info

Publication number
CN117472803B
Authority
CN
China
Prior art keywords
instruction
target
atomic
data cache
atomic instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311828916.1A
Other languages
Chinese (zh)
Other versions
CN117472803A (en)
Inventor
李祖松
郇丹丹
邱剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202311828916.1A
Publication of CN117472803A
Application granted
Publication of CN117472803B
Current legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 - Caches characterised by their organisation or structure
    • G06F12/0895 - Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0808 - Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an atomic instruction execution method, an atomic instruction execution device, and electronic equipment, relating to the technical field of computer processors. A data cache stores resource data corresponding to each of a plurality of atomic instructions. The method comprises: in response to a memory access request input by a user, generating a target atomic instruction, and determining a matching result according to the target atomic instruction and the plurality of resource data; if the matching result indicates no match, determining target resource data based on a miss queue and the data cache; and executing the target atomic instruction according to the target resource data. In this method the execution pipeline of the atomic instruction is tightly coupled with the main pipeline of the data cache. When the resource data stored in the data cache does not match the target atomic instruction, the target resource data can be determined quickly from the miss queue and the data cache so that the target atomic instruction can be executed, which improves the execution efficiency of atomic instructions and, in turn, of parallel programs.

Description

Atomic instruction execution method and device and electronic equipment
Technical Field
The present invention relates to the field of computer processors, and in particular, to a method and an apparatus for executing atomic instructions, and an electronic device.
Background
Modern high performance processors often employ parallel execution in which multiple processors, multiple processor cores, and multiple threads communicate via shared resources. To support operations on shared resources, synchronization relationships among multiple processors, multiple processor cores, and multiple threads may be maintained by atomic instructions, which typically have data loading, data storage, and computation functions. In addition, all atomic instructions are handled in the main pipeline of a Data Cache (DCache).
In the existing atomic instruction execution method, an atomic instruction waits until all instructions ahead of it have finished executing, and is then executed once the resource data it requests is ready and no memory access conflict exists; instructions after the atomic instruction must in turn wait until the atomic instruction has finished executing. Atomic instructions therefore execute inefficiently, and when a parallel program operates on shared resources frequently, its performance suffers considerably and its execution efficiency is low.
Disclosure of Invention
The invention provides an atomic instruction execution method, an atomic instruction execution device, and electronic equipment, which address the defects of the existing atomic instruction execution method: atomic instructions execute inefficiently, and when a parallel program operates on shared resources frequently, its performance suffers considerably and its execution efficiency is low.
The invention provides an atomic instruction execution method, in which a data cache stores resource data corresponding to each of a plurality of atomic instructions, and the method comprises the following steps:
in response to a memory access request input by a user, generating a target atomic instruction, and determining a matching result according to the target atomic instruction and the plurality of resource data;
if the matching result indicates no match, determining target resource data based on a miss queue and the data cache;
and executing the target atomic instruction according to the target resource data.
According to the atomic instruction execution method provided by the invention, determining the target resource data based on the miss queue and the data cache comprises: determining a target resource data tag corresponding to the target atomic instruction; reading the read data corresponding to the target resource data tag based on the miss queue and the data cache; and determining the target resource data according to the read data.
According to the atomic instruction execution method provided by the invention, reading the read data corresponding to the target resource data tag based on the miss queue and the data cache comprises: determining a memory access refill request corresponding to the target atomic instruction, and determining a storage state of the miss queue; when the storage state is not full, updating, according to the refill request, the read data corresponding to the target resource data tag from the miss queue into the data cache, and reading the read data from the updated data cache; and when the storage state is full, repeatedly resending the refill request to the miss queue until the storage state is determined to be not full, then updating, according to the refill request, the read data corresponding to the target resource data tag from the miss queue into the data cache, and reading the read data from the updated data cache.
According to the atomic instruction execution method provided by the invention, the target atomic instruction comprises an atomic memory operation (AMO) instruction or a load-reserved (LR)/store-conditional (SC) instruction pair; executing the target atomic instruction according to the target resource data comprises: when the target atomic instruction is the AMO instruction, executing the AMO instruction according to the target resource data and a preset number of beats; when the target atomic instruction is the LR/SC instruction pair, determining an execution period according to a preset execution time and a preset back-off time, and executing the LR/SC instruction pair within the execution period.
According to the atomic instruction execution method provided by the invention, the LR/SC instruction pair comprises an LR instruction and an SC instruction; the method further comprises: determining a target address corresponding to the LR instruction; within the execution period, while the LR/SC instruction pair is being executed, blocking external coherence probe requests to the target address and blocking execution of other LR/SC instruction pairs of the next execution period; and within the preset back-off time, allowing the external coherence probe requests while continuing to block execution of the other LR/SC instruction pairs of the next execution period.
According to the atomic instruction execution method provided by the invention, generating the target atomic instruction in response to the memory access request input by the user comprises: generating a plurality of atomic instructions in response to the memory access request input by the user; for each atomic instruction, obtaining a translation lookaside buffer (TLB) query request corresponding to the atomic instruction; when the address of the atomic instruction is determined to be legal according to the TLB query request, obtaining a data cache access request corresponding to the atomic instruction; and determining the target atomic instruction according to the plurality of data cache access requests.
According to the atomic instruction execution method provided by the invention, determining the target atomic instruction according to the plurality of data cache access requests comprises: determining, according to the priorities respectively corresponding to the plurality of data cache access requests, the target data cache access request that has the highest priority; and determining the atomic instruction corresponding to the target data cache access request as the target atomic instruction.
The invention also provides an atomic instruction execution device, in which a data cache stores resource data corresponding to each of a plurality of atomic instructions, and the device comprises:
a resource data determining module, configured to generate a target atomic instruction in response to a memory access request input by a user, and determine a matching result according to the target atomic instruction and the plurality of resource data; and, if the matching result indicates no match, determine target resource data based on a miss queue and the data cache;
and an atomic instruction execution module, configured to execute the target atomic instruction according to the target resource data.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the atomic instruction execution method as described above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an atomic instruction execution method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements the atomic instruction execution method as described in any one of the above.
The invention provides an atomic instruction execution method, an atomic instruction execution device, and electronic equipment, in which a data cache stores resource data corresponding to each of a plurality of atomic instructions. The method comprises: in response to a memory access request input by a user, generating a target atomic instruction, and determining a matching result according to the target atomic instruction and the plurality of resource data; if the matching result indicates no match, determining target resource data based on a miss queue and the data cache; and executing the target atomic instruction according to the target resource data. In this method the execution pipeline of the atomic instruction is tightly coupled with the main pipeline of the data cache. When the resource data stored in the data cache does not match the target atomic instruction, the target resource data can be determined quickly from the miss queue and the data cache so that the target atomic instruction can be executed, which improves the execution efficiency of atomic instructions and, in turn, of parallel programs.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an atomic instruction execution method provided by the invention;
FIG. 2 is a state transition diagram of an atomic instruction execution state machine according to the present invention;
FIG. 3 is a schematic diagram of an execution pipeline of atomic instructions provided by the present invention;
FIG. 4 is a schematic diagram of an atomic instruction execution device according to the present invention;
FIG. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the execution body according to the embodiment of the present invention may be an atomic instruction execution device or an electronic device.
The following further describes embodiments of the present invention by taking an electronic device as an example.
FIG. 1 is a flow chart of the atomic instruction execution method provided by the present invention. A data cache stores resource data corresponding to each of a plurality of atomic instructions, and the method may include:
101. In response to a memory access request input by a user, generate a target atomic instruction, and determine a matching result according to the target atomic instruction and the plurality of resource data.
An atomic instruction here is a special instruction in the fifth-generation reduced instruction set computer (RISC-V) architecture. An atomic instruction is not interrupted by other instructions or external events while it executes, which effectively avoids data races and inconsistency in parallel computation. Optionally, the atomic instruction may be an atomic memory operation (Atomic Memory Operation, AMO) instruction or a load-reserved (Load Reserved, LR)/store-conditional (Store Conditional, SC) instruction pair.
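To make the two instruction classes concrete, the following is a minimal behavioural sketch in Python, not the hardware described by the invention. It assumes 32-bit words and a single reservation; the class `ToyMemory` and its method names are hypothetical. It only illustrates the architectural semantics: an AMO returns the old value and writes the computed value in one indivisible step, while an SC succeeds (returns 0) only if the reservation made by the preceding LR is still valid.

```python
from typing import Dict, Optional


class ToyMemory:
    """Hypothetical single-hart model of AMO and LR/SC semantics."""

    def __init__(self) -> None:
        self.words: Dict[int, int] = {}
        self.reservation: Optional[int] = None  # address reserved by the last LR

    def amo_add(self, addr: int, value: int) -> int:
        """Load the old value, add, store the sum; return the old value."""
        old = self.words.get(addr, 0)
        self.words[addr] = (old + value) & 0xFFFFFFFF
        return old

    def load_reserved(self, addr: int) -> int:
        """Load the word and register a reservation on its address."""
        self.reservation = addr
        return self.words.get(addr, 0)

    def store_conditional(self, addr: int, value: int) -> int:
        """Store only if the reservation is still valid; 0 = success, 1 = failure."""
        ok = self.reservation == addr
        self.reservation = None  # an SC always consumes the reservation
        if ok:
            self.words[addr] = value & 0xFFFFFFFF
        return 0 if ok else 1

    def external_write(self, addr: int, value: int) -> None:
        """A write by another agent invalidates a reservation on that address."""
        self.words[addr] = value & 0xFFFFFFFF
        if self.reservation == addr:
            self.reservation = None
```

In this sketch an LR/SC pair that is interrupted by `external_write` fails and would have to be retried, which is the situation the blocking and back-off mechanism described later is meant to limit.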
The matching result indicates whether the target resource data corresponding to the target atomic instruction belongs to the plurality of resource data stored in the data cache.
After responding to the memory access request input by the user, the electronic device may generate the target atomic instruction according to the request, and then match the target resource data corresponding to the target atomic instruction against the plurality of resource data stored in the data cache. If the target resource data belongs to the plurality of resource data stored in the data cache, it matches one of them, and the matching result is determined to be a match. If it does not belong to them, it matches none of them, and the matching result is determined to be no match. The matching result is used subsequently to determine the target resource data.
In some embodiments, generating the target atomic instruction in response to the memory access request input by the user may include: the electronic device generates a plurality of atomic instructions in response to the memory access request input by the user; for each atomic instruction, the electronic device obtains a translation lookaside buffer (TLB) query request corresponding to the atomic instruction; when the address of the atomic instruction is determined to be legal according to the TLB query request, the electronic device obtains a data cache access request corresponding to the atomic instruction; the electronic device determines the target atomic instruction according to the plurality of data cache access requests.
The translation lookaside buffer (Translation Lookaside Buffer, TLB) refers to a cache of the central processing unit (Central Processing Unit, CPU) for improving the translation speed of virtual addresses to physical addresses.
While generating the target atomic instruction, the electronic device, after responding to the memory access request input by the user, may generate a plurality of atomic instructions according to the request. For each atomic instruction, the electronic device obtains the corresponding TLB query request and may complete the TLB query and the address exception check (that is, the address legality check) according to it. If the address of the atomic instruction is determined to be illegal, execution of the atomic instruction stops; if the address is legal, the electronic device obtains the data cache access request corresponding to the atomic instruction. Accordingly, the number of data cache access requests obtained equals the number of atomic instructions whose addresses are legal. The electronic device may then determine the target atomic instruction from the plurality of data cache access requests; the address of the target atomic instruction is therefore also legal.
In some embodiments, the determining, by the electronic device, the target atomic instruction from the plurality of data cache access requests may include: the electronic equipment determines a target data cache access request corresponding to the highest priority from the plurality of data cache access requests according to the priorities corresponding to the plurality of data cache access requests; the electronic device determines an atomic instruction corresponding to the target data cache access request as a target atomic instruction.
The priority is used for representing the importance degree of the data cache access, different data cache access requests correspond to different priorities, and the priorities corresponding to the data cache access requests are stored in advance.
In the process of determining the target atomic instruction, after acquiring a plurality of data cache access requests, the electronic equipment determines the priorities corresponding to the data cache access requests, further determines the highest priority from all the priorities, and determines the data cache access request corresponding to the highest priority as the target data cache access request; then, the electronic device determines an atomic instruction corresponding to the target data cache access request as a target atomic instruction.
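The selection procedure above (address check first, then highest-priority data cache access request) can be sketched as follows. This is an illustrative Python model under stated assumptions: the type `AtomicInstr`, the function `select_target`, the caller-supplied `address_is_legal` predicate standing in for the TLB query and address exception check, and the convention that a larger number means a higher priority are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AtomicInstr:
    name: str      # hypothetical label for the instruction
    vaddr: int     # virtual address the instruction accesses
    priority: int  # assumed convention: larger value = higher priority


def select_target(instrs: List[AtomicInstr],
                  address_is_legal: Callable[[int], bool]) -> Optional[AtomicInstr]:
    """Return the legal-address instruction whose cache request has top priority."""
    # Step 1: TLB query and address check; instructions with illegal
    # addresses stop here and produce no data cache access request.
    requests = [i for i in instrs if address_is_legal(i.vaddr)]
    if not requests:
        return None
    # Step 2: one data cache access request per surviving instruction;
    # the request with the highest priority identifies the target.
    return max(requests, key=lambda i: i.priority)
```

For example, with an alignment-based legality predicate, an instruction with the numerically highest priority is still skipped if its address fails the check.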
102. If the matching result indicates no match, determine target resource data based on the miss queue and the data cache.
The miss queue (Miss Queue) retrieves and stores the resource data that needs to be refilled when the target atomic instruction misses in the data cache.
The target resource data refers to resource data required when the target atomic instruction is executed.
If the matching result indicates no match, the target resource data corresponding to the target atomic instruction does not belong to the plurality of resource data stored in the data cache, that is, the target atomic instruction misses in the data cache. In this case the electronic device may determine the target resource data based on the miss queue and the data cache, so that the target atomic instruction can subsequently be executed.
Optionally, when the matching result indicates a match, the target resource data corresponding to the target atomic instruction belongs to the plurality of resource data stored in the data cache, that is, the target atomic instruction hits in the data cache. In this case the electronic device may read the target resource data directly from the data cache.
Optionally, when resource data in the data cache needs to be replaced, the electronic device first obtains a replacement selection result for the data cache and can determine the plurality of resource data according to it; the plurality of resource data may include the target resource data.
The replacement selection result identifies the resource data to be replaced in the data cache.
In some embodiments, determining the target resource data based on the miss queue and the data cache may include: the electronic device determines a target resource data tag corresponding to the target atomic instruction; the electronic device reads the read data corresponding to the target resource data tag based on the miss queue and the data cache; the electronic device determines the target resource data according to the read data.
It should be noted that, the data cache stores resource data and a resource data TAG (TAG) corresponding to each of the plurality of atomic instructions, where the resource data corresponds to the resource data TAG one by one.
While determining the target resource data, the electronic device, having determined the target atomic instruction, can determine the corresponding target resource data tag. The electronic device may then send a tag read request to the data cache according to the target resource data tag and perform a tag match check on the data cache's response, that is, determine whether the target resource data tag matches any resource data tag in the data cache. If the matching result indicates no match, no resource data tag in the data cache matches the target resource data tag, that is, the target atomic instruction misses in the data cache, and the electronic device may read the read data corresponding to the target resource data tag based on the miss queue and the data cache. The electronic device can then determine the write data corresponding to the target atomic instruction and splice the read data with the write data to obtain the target resource data.
The order in which the electronic device determines the read data corresponding to the target resource data tag and the write data corresponding to the target atomic instruction is not limited.
Optionally, if the matching result indicates a match, a resource data tag matching the target resource data tag exists in the data cache, that is, the target atomic instruction hits in the data cache, and the electronic device may read the read data corresponding to the target resource data tag directly from the data cache. The electronic device can then determine the write data corresponding to the target atomic instruction and splice the read data with the write data to obtain the target resource data.
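The tag match check and the splicing of read data with write data can be illustrated with a small Python sketch. The source does not specify how the splice is performed; the byte-mask merge below is an assumption chosen for illustration, and the names `lookup`, `splice`, and `byte_mask`, as well as modelling the data cache as a dictionary from tag to bytes, are hypothetical.

```python
from typing import Dict, Optional


def lookup(dcache: Dict[int, bytes], target_tag: int) -> Optional[bytes]:
    """Tag match check: return the cached data on a hit, None on a miss."""
    return dcache.get(target_tag)


def splice(read_data: bytes, write_data: bytes, byte_mask: int) -> bytes:
    """Merge write_data into read_data (assumed byte-masked merge).

    Bit i of byte_mask selects byte i from write_data; all other bytes
    are kept from read_data.
    """
    if len(read_data) != len(write_data):
        raise ValueError("read and write data must have the same width")
    return bytes(w if (byte_mask >> i) & 1 else r
                 for i, (r, w) in enumerate(zip(read_data, write_data)))
```

On a hit, `lookup` supplies the read data directly; on a miss, the read data would first have to be refilled through the miss queue before `splice` is applied.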
In some embodiments, reading the read data corresponding to the target resource data tag based on the miss queue and the data cache may include: the electronic device determines a memory access refill request corresponding to the target atomic instruction and determines the storage state of the miss queue; when the storage state is not full, the electronic device updates, according to the refill request, the read data corresponding to the target resource data tag from the miss queue into the data cache, and reads the read data from the updated data cache; when the storage state is full, the electronic device repeatedly resends the refill request to the miss queue until the storage state is determined to be not full, then updates, according to the refill request, the read data corresponding to the target resource data tag from the miss queue into the data cache, and reads the read data from the updated data cache.
When the target atomic instruction misses in the data cache, the electronic device first determines the memory access refill request corresponding to the target atomic instruction and checks the storage state of the miss queue. If the miss queue is not full, the electronic device controls the miss queue, according to the refill request, to retrieve the resource data to be refilled, that is, the read data corresponding to the target resource data tag, from a lower-level cache (such as the second-level cache, L2 Cache); the electronic device then updates the read data into the data cache through the miss queue, so that the read data can be read from the updated data cache. If the miss queue is full, the refill request is resent to the miss queue repeatedly until the miss queue is determined to be not full; the electronic device then controls the miss queue, according to the refill request, to retrieve the resource data to be refilled from a lower-level cache and updates it into the data cache through the miss queue, so that the read data can be read from the updated data cache.
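The retry behaviour on a miss can be sketched as a simple software loop. This is only an illustration under assumptions, not the hardware design: the class `MissQueue`, its `capacity`, the `drain_one` method that models an older miss completing, the `max_retries` guard, and the dictionaries standing in for the data cache and the L2 cache are all hypothetical.

```python
from typing import Dict, Tuple


class MissQueue:
    """Toy miss queue with a fixed number of entries (capacity is hypothetical)."""

    def __init__(self, capacity: int, next_level: Dict[int, bytes]) -> None:
        self.capacity = capacity
        self.in_flight = 0            # entries currently occupied
        self.next_level = next_level  # stands in for the L2 cache

    def is_full(self) -> bool:
        return self.in_flight >= self.capacity

    def refill(self, tag: int, dcache: Dict[int, bytes]) -> None:
        """Fetch the data from the next level and update the data cache."""
        dcache[tag] = self.next_level[tag]

    def drain_one(self) -> None:
        """Model an older miss completing, which frees one entry."""
        self.in_flight = max(0, self.in_flight - 1)


def read_on_miss(tag: int, dcache: Dict[int, bytes], mq: MissQueue,
                 max_retries: int = 1000) -> Tuple[bytes, int]:
    """Resend the refill request while the queue is full, then refill and read."""
    retries = 0
    while mq.is_full():
        if retries == max_retries:
            raise TimeoutError("miss queue stayed full")
        mq.drain_one()  # in this toy model, time passing frees an entry
        retries += 1
    mq.refill(tag, dcache)
    return dcache[tag], retries
```

The function returns the refilled read data together with the number of times the request had to be resent, which makes the full and not-full cases easy to compare.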
Optionally, the electronic device may schedule memory access refill requests through an atomic instruction buffer unit (atom Unit) in order to control the miss queue.
103. Execute the target atomic instruction according to the target resource data.
After the target resource data is acquired, the electronic device may execute the target atomic instruction according to the target resource data. The whole process improves the execution efficiency of atomic instructions, and further improves the execution efficiency of parallel programs.
In some embodiments, the target atomic instruction may include an atomic memory operation (AMO) instruction or a load-reserved (LR)/store-conditional (SC) instruction pair; executing the target atomic instruction according to the target resource data may include: when the target atomic instruction is an AMO instruction, the electronic device executes the AMO instruction according to the target resource data and a preset number of beats; when the target atomic instruction is an LR/SC instruction pair, the electronic device determines an execution period according to a preset execution time and a preset back-off time, and executes the LR/SC instruction pair within the execution period.
The preset number of beats is the number of beats (clock cycles) for which the AMO instruction stalls. Optionally, the preset number of beats may be set to 1, 2, 3, and so on. The embodiments of the invention mainly concern the case in which it is set to 2.
The preset execution time is the total execution time of the LR/SC instruction pair and may be expressed as LRSC cycles.
The preset back-off time may be expressed as LRSCBackOff cycles.
The execution period may be calculated as: execution period = LRSC cycles - LRSCBackOff cycles. Within the execution period, the electronic device must be able to complete the LR/SC instruction pair; the instruction set usually limits the length of such a sequence, for example to at most 16 instructions in RISC-V.
The execution period and the preset back-off time are the same.
In the process of executing the target atomic instruction according to the target resource data, the electronic equipment judges the category of the target atomic instruction: if the target atomic instruction is an AMO instruction, executing the AMO instruction according to the target resource data and the preset beat number; if the target atomic instruction is an LR/SC instruction pair, determining an execution period according to a preset execution time and a preset back-off time, and executing the LR/SC instruction pair in the execution period.
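The classification step and the execution period formula can be sketched in Python as follows. The function and type names (`execution_period`, `plan_execution`, `Plan`) and the default values of 64 total cycles and 32 back-off cycles are hypothetical placeholders; only the default of 2 beats for an AMO and the subtraction itself come from the description.

```python
from dataclasses import dataclass


def execution_period(lrsc_cycles: int, lrsc_backoff_cycles: int) -> int:
    """Execution period = LRSC cycles - LRSCBackOff cycles."""
    if not 0 <= lrsc_backoff_cycles < lrsc_cycles:
        raise ValueError("back-off must be non-negative and shorter than the total time")
    return lrsc_cycles - lrsc_backoff_cycles


@dataclass(frozen=True)
class Plan:
    kind: str    # "AMO" or "LR/SC"
    cycles: int  # beats for an AMO, execution period for an LR/SC pair


def plan_execution(kind: str, amo_beats: int = 2,
                   lrsc_cycles: int = 64, lrsc_backoff_cycles: int = 32) -> Plan:
    """Choose the timing budget according to the class of atomic instruction."""
    if kind == "AMO":
        return Plan("AMO", amo_beats)
    if kind == "LR/SC":
        return Plan("LR/SC", execution_period(lrsc_cycles, lrsc_backoff_cycles))
    raise ValueError(f"unknown atomic instruction kind: {kind!r}")
```

With these placeholder defaults, an AMO is budgeted 2 beats and an LR/SC pair is budgeted 32 cycles of execution followed by 32 cycles of back-off.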
Alternatively, after the AMO instruction is executed, the electronic device may determine a first execution result corresponding to the AMO instruction from the data cache, or after the LR/SC instruction pair is executed, the electronic device may determine a second execution result corresponding to the LR/SC instruction pair from the data cache.
In some embodiments, the LR/SC instruction pair includes an LR instruction and an SC instruction, and the method may further comprise: the electronic device determines a target address corresponding to the LR instruction; within the execution period, while executing the LR/SC instruction pair, the electronic device blocks external coherency probe requests for the target address and blocks the execution of other LR/SC instruction pairs in the next execution period; within the preset back-off time, the electronic device allows external coherency probe requests and continues to block the execution of other LR/SC instruction pairs in the next execution period.
Here, an external coherency probe (Probe) request refers to a probe request issued by an external device for the target address corresponding to the LR instruction.
It should be noted that the electronic device may record the target address corresponding to the LR instruction by registering a reservation set in the data cache.
The electronic device obtains the target address corresponding to the LR instruction from the reservation set. Then, within the execution period, while the electronic device executes the LR/SC instruction pair, external coherency probe requests from external devices for the target address are blocked, and the execution of other LR/SC instruction pairs in the next execution period is blocked, so that the current LR/SC instruction pair can complete successfully without interference. Next, within the preset back-off time, the electronic device allows external coherency probe requests from external devices for the target address, so that an external device can obtain the right to operate on the data cache; meanwhile, the execution of other LR/SC instruction pairs in the next execution period remains blocked, which avoids the deadlock that could arise if the external device and the electronic device executed LR instructions at the same time.
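The two phases described above can be modeled at cycle level as follows. This is a minimal sketch under stated assumptions: the class name LrScWindow, its method names, and the values 8 and 3 are hypothetical, and the rule "a new LR/SC pair is refused while a window is still open" is one possible reading of the blocking behavior, not the circuit of this embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LrScWindow:
    """Cycle-level model of the execution period and the back-off time."""

    lrsc_cycles: int                      # preset total execution time (assumed)
    backoff_cycles: int                   # preset back-off time (assumed)
    remaining: int = 0                    # cycles left in the current window
    target_address: Optional[int] = None  # address recorded by the LR

    def start(self, address: int) -> bool:
        """Try to start an LR at `address`; refused while a window is open."""
        if self.remaining > 0:
            return False  # other LR/SC pairs stay blocked
        self.target_address = address
        self.remaining = self.lrsc_cycles
        return True

    def in_execution_period(self) -> bool:
        """True for the first lrsc_cycles - backoff_cycles cycles."""
        return self.remaining > self.backoff_cycles

    def probe_allowed(self, address: int) -> bool:
        """Probes to the target address are blocked only in the execution period."""
        return not (self.in_execution_period() and address == self.target_address)

    def tick(self) -> None:
        """Advance one cycle; the reservation lapses when the window closes."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.target_address = None


window = LrScWindow(lrsc_cycles=8, backoff_cycles=3)
window.start(0x1000)
print(window.probe_allowed(0x1000))  # False: execution period
for _ in range(5):
    window.tick()
print(window.probe_allowed(0x1000))  # True: back-off time
print(window.start(0x2000))          # False: other pairs still blocked
```

With these example values the execution period lasts 5 cycles and the back-off time lasts 3 cycles, after which a new LR may start.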
In the embodiments of the invention, a target atomic instruction is generated in response to an access request input by a user, and a matching result is determined according to the target atomic instruction and a plurality of resource data; if the matching result indicates no match, target resource data is determined based on the memory access miss queue and the data cache; and the target atomic instruction is executed according to the target resource data. In this method, the execution pipeline of the atomic instruction is tightly coupled with the main pipeline of the data cache. When the resource data stored in the data cache does not match the target atomic instruction, the target resource data can be determined quickly from the memory access miss queue and the data cache so that the target atomic instruction can be executed, which improves the execution efficiency of atomic instructions and, in turn, of parallel programs.
Embodiments of the invention are further described in connection with the following examples:
Exemplarily, Fig. 2 shows a state transition diagram of the atomic instruction execution state machine provided by the present invention. As can be seen from Fig. 2, the execution of an atomic instruction may pass through the following states:
1. EMPTY: idle;
2. TLB_REQ: issue a translation lookaside buffer (TLB) query request;
3. TLB_RESP: complete the TLB query and the address exception check (i.e., the address validity check);
4. FLUSH_ST_REQ: issue a flush (Flush) request to the Store Buffer and the Store Queue, emptying both so that any pending write data is written to the data cache (DCache) before the atomic instruction;
5. FLUSH_ST_RESP: wait for the flush of the Store Buffer and the Store Queue to finish;
6. DCACHE_REQ: initiate the atomic instruction operation to the data cache;
7. DCACHE_RESP: wait for the data cache to return the execution result of the atomic instruction;
8. FINISH: write back the execution result of the atomic instruction, completing the execution of the atomic instruction.
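The states listed above can be written down as a simple state machine. The sketch below assumes a strictly linear order, with FINISH returning to EMPTY; Fig. 2 is not reproduced here and may contain additional edges (for example, on an address exception), so the transition rule is an assumption. State names are written in upper case.

```python
from enum import Enum


class AtomicState(Enum):
    EMPTY = "idle"
    TLB_REQ = "issue TLB query"
    TLB_RESP = "TLB query and address check done"
    FLUSH_ST_REQ = "flush Store Buffer and Store Queue"
    FLUSH_ST_RESP = "wait for flush to finish"
    DCACHE_REQ = "send atomic operation to DCache"
    DCACHE_RESP = "wait for DCache result"
    FINISH = "write back result"


_ORDER = list(AtomicState)  # Enum iteration follows definition order


def next_state(state: AtomicState) -> AtomicState:
    """Assumed linear transition: each state moves to the next, FINISH to EMPTY."""
    index = _ORDER.index(state)
    return _ORDER[(index + 1) % len(_ORDER)]


def trace_one_instruction() -> list:
    """Return the states visited while one atomic instruction executes."""
    state = AtomicState.EMPTY
    visited = [state]
    while True:
        state = next_state(state)
        visited.append(state)
        if state is AtomicState.EMPTY:
            return visited


print([s.name for s in trace_one_instruction()])
```

The flush states come before DCACHE_REQ so that, in this model as in the description, earlier stores reach the data cache before the atomic operation is issued.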
Exemplarily, Fig. 3 is a schematic diagram of the execution pipeline of an atomic instruction provided by the present invention. The electronic device first arbitrates among a plurality of data cache access requests by priority, selects the target data cache access request with the highest priority, and determines the atomic instruction corresponding to that request as the target atomic instruction. As can be seen from Fig. 3, the execution pipeline of the target atomic instruction is tightly coupled with the main pipeline of the data cache (DCache). The electronic device sends a read data cache tag (Tag) request to the data cache according to the target resource data tag corresponding to the target atomic instruction, and then performs a tag matching check on the result that the data cache returns for that request; that is, it determines the matching result between the target resource data tag and each resource data tag in the data cache. If the matching result indicates no match, the data cache contains no resource data tag matching the target resource data tag, that is, the target atomic instruction misses in the data cache; the electronic device can then issue a miss access through the atomic instruction buffer unit (atom Unit), so that the memory access miss queue (Miss Queue) retrieves from a lower-level cache the resource data to be backfilled, namely the read data corresponding to the target resource data tag. If the matching result indicates a match, the data cache contains a resource data tag matching the target resource data tag, that is, the target atomic instruction hits in the data cache, and the electronic device can read the read data corresponding to the target resource data tag directly from the data cache.
Then, the electronic device updates the read data into the data cache, reads the data cache data (Data) from the updated data cache, and splices the data cache data with the write data corresponding to the target atomic instruction to obtain the target resource data. The electronic device may then execute the target atomic instruction according to the target resource data; for example, when the target atomic instruction is an atomic memory operation (AMO) instruction, the AMO instruction may be executed according to the preset beat number (e.g., 2), and its execution result may be written back to the data cache.
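The hit and miss paths described above can be sketched in software as follows. This is a simplified model under stated assumptions: the data cache and the lower-level cache are plain dictionaries keyed by tag, the names MissQueue, read_for_atomic and execute_amo_add are hypothetical, and the splicing of read data with write data is reduced to an addition for an AMO add.

```python
class MissQueue:
    """Fetches a missing line from a lower-level cache (modeled as a dict)."""

    def __init__(self, lower_level):
        self.lower_level = lower_level

    def refill(self, tag):
        return self.lower_level[tag]


def read_for_atomic(dcache, miss_queue, tag):
    """Return (hit, data) for `tag`, backfilling the data cache on a miss."""
    if tag in dcache:              # tag matching check succeeds: hit
        return True, dcache[tag]
    data = miss_queue.refill(tag)  # no match: miss, fetch from the lower level
    dcache[tag] = data             # update (backfill) the data cache
    return False, dcache[tag]      # read from the updated data cache


def execute_amo_add(dcache, miss_queue, tag, write_data):
    """Run an AMO add after the read; return the pre-operation value."""
    _, old = read_for_atomic(dcache, miss_queue, tag)
    dcache[tag] = old + write_data  # write the result back to the data cache
    return old


dcache = {1: 10}
miss_queue = MissQueue(lower_level={2: 5})
print(execute_amo_add(dcache, miss_queue, 2, 3))  # 5 (miss, then backfill)
print(dcache[2])                                  # 8
```

After the backfill, a second access to the same tag hits in the data cache, which is the benefit of routing the atomic instruction through the data cache's own pipeline.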
Illustratively, the AMO instruction performs an atomic operation on an operand in memory and sets the destination register to the pre-operation memory value. The process between memory read and write of atomic operation is not interrupted, and the memory value is not modified by other processors. The specific functions of the AMO instruction are as follows:
Atomic swap doubleword (Atomic Memory Operation: Swap Doubleword) amoswap.d: amoswap.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] SWAP x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the value of x[rs2], and x[rd] is set to t.
Atomic swap word (Atomic Memory Operation: Swap Word) amoswap.w: amoswap.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] SWAP x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the value of x[rs2], and x[rd] is set to the sign-extended t.
Atomic add doubleword (Atomic Memory Operation: Add Doubleword) amoadd.d: amoadd.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] + x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to t + x[rs2], and x[rd] is set to t.
Atomic add word (Atomic Memory Operation: Add Word) amoadd.w: amoadd.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] + x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to t + x[rs2], and x[rd] is set to the sign-extended t.
Atomic AND doubleword (Atomic Memory Operation: AND Doubleword) amoand.d: amoand.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] & x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the bitwise AND of t and x[rs2], and x[rd] is set to t.
Atomic AND word (Atomic Memory Operation: AND Word) amoand.w: amoand.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] & x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the bitwise AND of t and x[rs2], and x[rd] is set to the sign-extended t.
Atomic OR doubleword (Atomic Memory Operation: OR Doubleword) amoor.d: amoor.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] | x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the bitwise OR of t and x[rs2], and x[rd] is set to t.
Atomic OR word (Atomic Memory Operation: OR Word) amoor.w: amoor.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] | x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the bitwise OR of t and x[rs2], and x[rd] is set to the sign-extended t.
Atomic XOR doubleword (Atomic Memory Operation: XOR Doubleword) amoxor.d: amoxor.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] ^ x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the bitwise XOR of t and x[rs2], and x[rd] is set to t.
Atomic XOR word (Atomic Memory Operation: XOR Word) amoxor.w: amoxor.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] ^ x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the bitwise XOR of t and x[rs2], and x[rd] is set to the sign-extended t.
Atomic maximum doubleword (Atomic Memory Operation: Maximum Doubleword) amomax.d: amomax.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] MAX x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the larger of t and x[rs2] (two's complement comparison), and x[rd] is set to t.
Atomic maximum word (Atomic Memory Operation: Maximum Word) amomax.w: amomax.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] MAX x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the larger of t and x[rs2] (two's complement comparison), and x[rd] is set to the sign-extended t.
Atomic unsigned maximum doubleword (Atomic Memory Operation: Maximum Doubleword, Unsigned) amomaxu.d: amomaxu.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] MAXU x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the larger of t and x[rs2] (unsigned comparison), and x[rd] is set to t.
Atomic unsigned maximum word (Atomic Memory Operation: Maximum Word, Unsigned) amomaxu.w: amomaxu.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] MAXU x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the larger of t and x[rs2] (unsigned comparison), and x[rd] is set to the sign-extended t.
Atomic minimum doubleword (Atomic Memory Operation: Minimum Doubleword) amomin.d: amomin.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] MIN x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the smaller of t and x[rs2] (two's complement comparison), and x[rd] is set to t.
Atomic minimum word (Atomic Memory Operation: Minimum Word) amomin.w: amomin.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] MIN x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the smaller of t and x[rs2] (two's complement comparison), and x[rd] is set to the sign-extended t.
Atomic unsigned minimum doubleword (Atomic Memory Operation: Minimum Doubleword, Unsigned) amominu.d: amominu.d rd, rs2, (rs1), denoted x[rd] = AMO64(M[x[rs1]] MINU x[rs2]). The following atomic operation is performed: the doubleword in memory at address x[rs1] is denoted t, the doubleword is changed to the smaller of t and x[rs2] (unsigned comparison), and x[rd] is set to t.
Atomic unsigned minimum word (Atomic Memory Operation: Minimum Word, Unsigned) amominu.w: amominu.w rd, rs2, (rs1), denoted x[rd] = AMO32(M[x[rs1]] MINU x[rs2]). The following atomic operation is performed: the word in memory at address x[rs1] is denoted t, the word is changed to the smaller of t and x[rs2] (unsigned comparison), and x[rd] is set to the sign-extended t.
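The AMO functions listed above share one pattern: read t, write f(t, x[rs2]) back, and return t in x[rd]. The following Python sketch models that pattern for reference. It assumes a 64-bit register width, models memory as a dictionary keyed by address, and runs single-threaded, so it illustrates the data semantics only and not the atomicity guarantee; the names amo, OPS and to_signed are hypothetical.

```python
MASK = {32: 0xFFFF_FFFF, 64: 0xFFFF_FFFF_FFFF_FFFF}


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= MASK[bits]
    return value - (1 << bits) if value >> (bits - 1) else value


# Each entry maps (t, x[rs2], width) to the new memory value.
OPS = {
    "swap": lambda t, s, bits: s,
    "add": lambda t, s, bits: t + s,
    "and": lambda t, s, bits: t & s,
    "or": lambda t, s, bits: t | s,
    "xor": lambda t, s, bits: t ^ s,
    "max": lambda t, s, bits: max(t, s, key=lambda v: to_signed(v, bits)),
    "min": lambda t, s, bits: min(t, s, key=lambda v: to_signed(v, bits)),
    "maxu": lambda t, s, bits: max(t, s),
    "minu": lambda t, s, bits: min(t, s),
}


def amo(memory, address, op, rs2_value, bits=64):
    """Apply `op` to memory[address]; return the value written to x[rd]."""
    t = memory[address] & MASK[bits]
    s = rs2_value & MASK[bits]
    memory[address] = OPS[op](t, s, bits) & MASK[bits]
    # x[rd] receives the old value t; a word result is sign-extended to 64 bits.
    return to_signed(t, bits) & MASK[64]


memory = {0x100: 0xFFFF_FFFF}
rd = amo(memory, 0x100, "add", 1, bits=32)  # models amoadd.w
print(hex(rd), memory[0x100])               # 0xffffffffffffffff 0
```

The example shows both effects named in the text: the 32-bit sum wraps to 0 in memory, while x[rd] holds the old word sign-extended to 64 bits.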
Illustratively, LR instructions and SC instructions are used in pairs, guaranteeing atomicity of operations between the two instructions. The LR instruction reads a memory word/double word, stores it in the destination register, and keeps a reservation record of the word/double word. If there is a reservation record at the destination address of the SC instruction, the SC instruction stores the word/double word at that address. If the store is successful, the SC instruction writes a 0 into the destination register, otherwise a non-0 error code is written. The specific functions of LR and SC instructions are as follows:
Load-reserved doubleword (Load-Reserved Doubleword) lr.d: lr.d rd, (rs1), denoted x[rd] = LoadReserved64(M[x[rs1]]). Eight bytes are loaded from memory at address x[rs1] and written to x[rd], and a reservation is registered on that memory doubleword.
Load-reserved word (Load-Reserved Word) lr.w: lr.w rd, (rs1), denoted x[rd] = LoadReserved32(M[x[rs1]]). Four bytes are loaded from memory at address x[rs1], sign-extended, and written to x[rd], and a reservation is registered on that memory word.
Store-conditional doubleword (Store-Conditional Doubleword) sc.d: sc.d rd, rs2, (rs1), denoted x[rd] = StoreConditional64(M[x[rs1]], x[rs2]). If a load reservation exists on memory address x[rs1], the 8-byte value in register x[rs2] is stored to that address. If the store succeeds, 0 is written to register x[rd]; otherwise, a non-zero error code is written.
Store-conditional word (Store-Conditional Word) sc.w: sc.w rd, rs2, (rs1), denoted x[rd] = StoreConditional32(M[x[rs1]], x[rs2]). If a load reservation exists on memory address x[rs1], the 4-byte value in register x[rs2] is stored to that address. If the store succeeds, 0 is written to register x[rd]; otherwise, a non-zero error code is written.
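The way an LR/SC pair is typically used, as a retry loop around a read-modify-write, can be illustrated with the following Python sketch. It is a software model under stated assumptions: a single reservation, memory modeled as a dictionary, and hypothetical names (ReservationMemory, atomic_increment, and the interfere callback that simulates a write by another agent).

```python
class ReservationMemory:
    """Memory with a single reservation, as registered by an LR instruction."""

    def __init__(self):
        self.cells = {}
        self.reserved = None  # address registered by the most recent LR

    def lr(self, address):
        """Load the value and register a reservation on `address`."""
        self.reserved = address
        return self.cells.get(address, 0)

    def sc(self, address, value):
        """Store only if the reservation is intact; 0 on success, 1 on failure."""
        ok = self.reserved == address
        self.reserved = None  # an SC clears the reservation either way
        if ok:
            self.cells[address] = value
            return 0
        return 1

    def store(self, address, value):
        """An ordinary store by another agent breaks a matching reservation."""
        if self.reserved == address:
            self.reserved = None
        self.cells[address] = value


def atomic_increment(mem, address, interfere=None):
    """Retry LR/SC until the SC succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        value = mem.lr(address)
        if interfere and attempts == 1:
            interfere()  # simulate another agent writing between LR and SC
        if mem.sc(address, value + 1) == 0:
            return attempts


mem = ReservationMemory()
mem.cells[0x10] = 5
print(atomic_increment(mem, 0x10), mem.cells[0x10])  # 1 6
```

When a write intervenes between the LR and the SC, the first SC fails and the loop repeats, so the increment is applied to the newer value rather than overwriting it.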
In addition, other complex instruction set computer (CISC) and reduced instruction set computer (RISC) architectures provide similar atomic instructions, such as the swap (SWAP, abbreviated SWP) instruction, the compare-and-exchange (CMPXCHG) or compare-and-swap (CAS) instruction, and the atomic add (LOAD-ADD) instruction.
The atomic instruction execution device provided by the invention is described below, and the atomic instruction execution device described below and the atomic instruction execution method described above can be referred to correspondingly.
As shown in fig. 4, which is a schematic structural diagram of an atomic instruction execution device provided by the present invention, a data cache stores resource data corresponding to each of a plurality of atomic instructions, where the device may include:
a resource data determining module 401, configured to generate a target atomic instruction in response to an access request input by a user, and determine a matching result according to the target atomic instruction and a plurality of resource data; and, if the matching result indicates no match, determine target resource data based on the memory access miss queue and the data cache;
an atomic instruction execution module 402, configured to execute the target atomic instruction according to the target resource data.
Optionally, the resource data determining module 401 is specifically configured to determine a target resource data tag corresponding to the target atomic instruction; read the read data corresponding to the target resource data tag based on the memory access miss queue and the data cache; and determine the target resource data according to the read data.
Optionally, the resource data determining module 401 is specifically configured to determine an access backfill request corresponding to the target atomic instruction, and determine a storage state of the memory access miss queue; when the storage state is not full, update, according to the access backfill request, the read data corresponding to the target resource data tag from the memory access miss queue into the data cache, and read the read data from the updated data cache; and when the storage state is full, keep resending the access backfill request to the memory access miss queue until the storage state is determined to be not full, then update, according to the access backfill request, the read data corresponding to the target resource data tag from the memory access miss queue into the data cache, and read the read data from the updated data cache.
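The resend-until-accepted behavior described above can be sketched as follows. This is an illustrative model only: the class MissQueue, its capacity, and the assumption that one older miss completes per cycle while the requester waits are all hypothetical and are introduced solely to make the example self-contained.

```python
from collections import deque


class MissQueue:
    """Bounded queue of outstanding misses (capacity is an assumed parameter)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.capacity

    def accept(self, tag):
        """Accept a backfill request unless the queue is full."""
        if self.is_full():
            return False
        self.entries.append(tag)
        return True

    def retire_oldest(self):
        """Model an older miss completing and freeing its entry."""
        if self.entries:
            self.entries.popleft()


def send_backfill_request(miss_queue, tag, max_cycles=100):
    """Resend the request each cycle until accepted; return cycles waited."""
    for cycle in range(max_cycles):
        if miss_queue.accept(tag):
            return cycle
        miss_queue.retire_oldest()  # assumption: one older miss completes per cycle
    raise RuntimeError("miss queue never drained")


queue = MissQueue(capacity=2)
queue.accept("a")
queue.accept("b")
print(send_backfill_request(queue, "c"))  # 1: refused once, then accepted
```

The max_cycles bound exists only so that the sketch terminates; the description itself does not specify a retry limit.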
Optionally, the target atomic instruction includes an atomic memory operation (AMO) instruction, or a load-reserved (LR)/store-conditional (SC) instruction pair; the atomic instruction execution module 402 is specifically configured to: when the target atomic instruction is the AMO instruction, execute the AMO instruction according to the target resource data and a preset beat number; and when the target atomic instruction is the LR/SC instruction pair, determine an execution period according to a preset execution time and a preset back-off time, and execute the LR/SC instruction pair within the execution period.
Optionally, the LR/SC instruction pair includes an LR instruction and an SC instruction; the atomic instruction execution module 402 is specifically configured to determine a target address corresponding to the LR instruction; within the execution period, while executing the LR/SC instruction pair, block external coherency probe requests for the target address, and block the execution of other LR/SC instruction pairs in the next execution period; and, within the preset back-off time, allow the external coherency probe requests and continue to block the execution of the other LR/SC instruction pairs in the next execution period.
Optionally, the resource data determining module 401 is specifically configured to generate a plurality of atomic instructions in response to an access request input by the user; for each atomic instruction, acquiring a translation lookaside buffer TLB query request corresponding to the atomic instruction; under the condition that the address of the atomic instruction is legal according to the TLB inquiry request, acquiring a data cache access request corresponding to the atomic instruction; the target atomic instruction is determined based on a plurality of data cache access requests.
Optionally, the resource data determining module 401 is specifically configured to determine, according to priorities corresponding to a plurality of data cache access requests, a target data cache access request corresponding to a highest priority from the plurality of data cache access requests; and determining an atomic instruction corresponding to the target data cache access request as the target atomic instruction.
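The priority-based selection described above can be sketched as follows. The request sources, the numeric priority encoding (a larger number meaning a higher priority), and the tie-breaking rule are assumptions made for illustration; the description does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DCacheRequest:
    source: str       # which unit issued the request (hypothetical labels)
    priority: int     # assumption: a larger number means a higher priority
    instruction: str  # the atomic instruction carried by the request


def arbitrate(requests):
    """Return the request with the highest priority; ties go to the earliest."""
    if not requests:
        return None
    return max(requests, key=lambda request: request.priority)


pending = [
    DCacheRequest("unit0", 1, "lr.w"),
    DCacheRequest("unit1", 3, "amoadd.w"),
    DCacheRequest("unit2", 2, "amoswap.d"),
]
target = arbitrate(pending)
print(target.instruction)  # amoadd.w
```

The instruction carried by the winning request plays the role of the target atomic instruction in the description.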
Fig. 5 is a schematic structural diagram of an electronic device provided by the present invention. The electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform an atomic instruction execution method, in which resource data corresponding to each of a plurality of atomic instructions is stored in a data cache, the method comprising: generating a target atomic instruction in response to an access request input by a user, and determining a matching result according to the target atomic instruction and a plurality of resource data; if the matching result indicates no match, determining target resource data based on the memory access miss queue and the data cache; and executing the target atomic instruction according to the target resource data.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer readable storage medium. When executed by a processor, the computer program can perform the atomic instruction execution method provided by the methods above, in which resource data corresponding to each of a plurality of atomic instructions is stored in a data cache, the method comprising: generating a target atomic instruction in response to an access request input by a user, and determining a matching result according to the target atomic instruction and a plurality of resource data; if the matching result indicates no match, determining target resource data based on the memory access miss queue and the data cache; and executing the target atomic instruction according to the target resource data.
In still another aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the atomic instruction execution method provided by the methods above, in which resource data corresponding to each of a plurality of atomic instructions is stored in a data cache, the method comprising: generating a target atomic instruction in response to an access request input by a user, and determining a matching result according to the target atomic instruction and a plurality of resource data; if the matching result indicates no match, determining target resource data based on the memory access miss queue and the data cache; and executing the target atomic instruction according to the target resource data.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An atomic instruction execution method, wherein resource data corresponding to each of a plurality of atomic instructions is stored in a data cache, the method comprising:
generating a target atomic instruction in response to an access request input by a user, and determining a matching result according to the target atomic instruction and a plurality of resource data; the target atomic instruction comprises an atomic memory operation (AMO) instruction or a load-reserved (LR)/store-conditional (SC) instruction pair;
if the matching result indicates no match, determining target resource data based on a memory access miss queue and the data cache;
executing the AMO instruction according to the target resource data and a preset beat number under the condition that the target atomic instruction is the AMO instruction;
determining an execution period according to a preset execution time and a preset back-off time under the condition that the target atomic instruction is the LR/SC instruction pair; and executing the LR/SC instruction pair within the execution period.
2. The method of claim 1, wherein the determining the target resource data based on the memory access miss queue and the data cache comprises:
determining a target resource data tag corresponding to the target atomic instruction;
reading the read data corresponding to the target resource data tag based on the memory access miss queue and the data cache;
and determining the target resource data according to the read data.
3. The method of claim 2, wherein the reading the read data corresponding to the target resource data tag based on the memory access miss queue and the data cache comprises:
determining an access backfill request corresponding to the target atomic instruction, and determining a storage state of the memory access miss queue;
under the condition that the storage state is not full, updating, according to the access backfill request, the read data corresponding to the target resource data tag from the memory access miss queue into the data cache, and reading the read data from the updated data cache;
and under the condition that the storage state is full, continuously resending the access backfill request to the memory access miss queue until the storage state is determined to be not full, then updating, according to the access backfill request, the read data corresponding to the target resource data tag from the memory access miss queue into the data cache, and reading the read data from the updated data cache.
4. The method of claim 1, wherein the LR/SC instruction pair comprises an LR instruction and an SC instruction; the method further comprises the steps of:
determining a target address corresponding to the LR instruction;
within the execution period, in the process of executing the LR/SC instruction pair, blocking external coherency probe requests for the target address, and blocking the execution of other LR/SC instruction pairs in the next execution period;
and allowing the external coherency probe requests within the preset back-off time, and continuing to block the execution of the other LR/SC instruction pairs in the next execution period.
5. A method according to any one of claims 1-3, wherein generating a target atomic instruction in response to a user-entered access request comprises:
responding to the access request input by the user, and generating a plurality of atomic instructions;
for each atomic instruction, acquiring a translation lookaside buffer TLB query request corresponding to the atomic instruction; acquiring a data cache access request corresponding to the atomic instruction under the condition that the address of the atomic instruction is legal according to the TLB query request;
and determining the target atomic instruction according to a plurality of data cache access requests.
6. The method of claim 5, wherein determining the target atomic instruction from a plurality of data cache access requests comprises:
determining a target data cache access request corresponding to the highest priority from a plurality of data cache access requests according to the priorities corresponding to the data cache access requests;
and determining an atomic instruction corresponding to the target data cache access request as the target atomic instruction.
7. An atomic instruction execution apparatus in which resource data corresponding to each of a plurality of atomic instructions is stored in a data cache, the apparatus comprising:
the resource data determining module is used for generating a target atomic instruction in response to an access request input by a user, and determining a matching result according to the target atomic instruction and a plurality of resource data; the target atomic instruction comprises an atomic memory operation (AMO) instruction or a load-reserved (LR)/store-conditional (SC) instruction pair; if the matching result indicates no match, determining target resource data based on a memory access miss queue and the data cache;
the atomic instruction execution module is used for executing the AMO instruction according to the target resource data and a preset beat number under the condition that the target atomic instruction is the AMO instruction; determining an execution period according to a preset execution time and a preset back-off time under the condition that the target atomic instruction is the LR/SC instruction pair; and executing the LR/SC instruction pair within the execution period.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the atomic instruction execution method of any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the atomic instruction execution method according to any of claims 1 to 6.
CN202311828916.1A 2023-12-28 2023-12-28 Atomic instruction execution method and device and electronic equipment Active CN117472803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311828916.1A CN117472803B (en) 2023-12-28 2023-12-28 Atomic instruction execution method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311828916.1A CN117472803B (en) 2023-12-28 2023-12-28 Atomic instruction execution method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN117472803A (en) 2024-01-30
CN117472803B (en) 2024-03-29

Family

ID=89627855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311828916.1A Active CN117472803B (en) 2023-12-28 2023-12-28 Atomic instruction execution method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117472803B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102483704A (en) * 2009-08-31 2012-05-30 国际商业机器公司 Transactional memory system with efficient cache support
WO2023236355A1 (en) * 2022-06-10 2023-12-14 成都登临科技有限公司 Method for acquiring instruction in parallel by multiple thread groups, processor, and electronic device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068407B2 (en) * 2018-10-26 2021-07-20 International Business Machines Corporation Synchronized access to data in shared memory by protecting the load target address of a load-reserve instruction

Also Published As

Publication number Publication date
CN117472803A (en) 2024-01-30

Similar Documents

Publication Publication Date Title
US8180977B2 (en) Transactional memory in out-of-order processors
US7925839B1 (en) System and method for performing memory operations in a computing system
US8706973B2 (en) Unbounded transactional memory system and method
TWI525539B (en) Method, processor, and system for synchronizing simd vectors
US8868837B2 (en) Cache directory lookup reader set encoding for partial cache line speculation support
US8041926B2 (en) Transparent concurrent atomic execution
US8190859B2 (en) Critical section detection and prediction mechanism for hardware lock elision
US20110040906A1 (en) Multi-level Buffering of Transactional Data
US6481251B1 (en) Store queue number assignment and tracking
JP2009521767A (en) Finite transaction memory system
US20130339629A1 (en) Tracking transactional execution footprint
US9098327B2 (en) Method and apparatus for implementing a transactional store system using a helper thread
EP2641171B1 (en) Preventing unintended loss of transactional data in hardware transactional memory systems
WO2009009583A1 (en) Bufferless transactional memory with runahead execution
US20120137077A1 (en) Miss buffer for a multi-threaded processor
US20170206035A1 (en) Random-Access Disjoint Concurrent Sparse Writes to Heterogeneous Buffers
US20050283783A1 (en) Method for optimizing pipeline use in a multiprocessing system
CN117472803B (en) Atomic instruction execution method and device and electronic equipment
CN113900968B (en) Method and device for realizing synchronous operation of multi-copy non-atomic write storage sequence
JP6222100B2 (en) Data storage device, data storage method and program
CN117478089B (en) Method and device for executing store instruction and electronic equipment
CN117472804B (en) Access failure queue processing method and device and electronic equipment
US20080104335A1 (en) Facilitating load reordering through cacheline marking
WO2012098812A1 (en) Multiprocessor system, multiprocessor control method, and processor
CN115729628A (en) Advanced submission method for unequal data of superscalar microprocessor storage instruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant