CN111414318A - Data consistency implementation method based on early updating


Info

Publication number
CN111414318A
Authority
CN
China
Prior art keywords
cacheline
data
counter
cpu
cache
Prior art date
Legal status
Granted
Application number
CN202010210475.9A
Other languages
Chinese (zh)
Other versions
CN111414318B (en)
Inventor
顾晓峰
李青青
虞致国
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010210475.9A priority Critical patent/CN111414318B/en
Publication of CN111414318A publication Critical patent/CN111414318A/en
Application granted granted Critical
Publication of CN111414318B publication Critical patent/CN111414318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The method adds a counter to each cacheline of the L1 DCache and of the caches at every other level, and records the access history of Cachelines holding dirty data copies, so that the dirty data copies in the caches are updated to the next level of memory in advance, while the memory is idle, instead of the caches being flushed only when DMA is about to transfer data. This alleviates the latency caused by the Cache flush operation that precedes a DMA transfer, keeps the memory fully utilized, and improves the efficiency of the DMA transfer system.

Description

Data consistency implementation method based on early updating
Technical Field
The invention relates to a method for realizing data consistency based on early updating, and belongs to the technical field of integrated circuits.
Background
At present, mainstream processors mostly adopt a hierarchical storage system: a multi-level Cache (cache memory) is inserted between the processor and the main memory (hereinafter "main memory") to bridge the performance gap between the CPU and the main memory.
The Cache stores partial copies of main-memory data, and two write strategies are generally used to maintain data consistency across the multi-level Cache: the write-back method and the write-through method. The former writes a dirty data copy back to the main memory only when the Cacheline containing the dirty copy is replaced or invalidated. This strategy reduces the number of main-memory accesses and improves system efficiency, but makes Cache consistency harder to maintain. The write-through method updates the data in the main memory whenever the CPU writes to the Cache. Although this strategy effectively guarantees Cache consistency, it increases the amount of data transferred over the bus, and the long latency of main-memory write operations degrades overall system performance. Modern processors therefore mostly adopt the write-back method.
DMA (Direct Memory Access) is an efficient data transfer method: a DMA controller moves data directly between an I/O device and the main memory, or between peripherals, without CPU intervention. However, DMA transfers also introduce a data consistency problem, which researchers currently address in both software and hardware, between the DMA transfer on one side and the Caches at every level and the main memory on the other. Whether the solution is in software or in hardware, the Cache must be flushed before DMA transfers data. Because a single DMA transfer moves a large amount of data and main-memory reads and writes have high latency, the Cache flush before a DMA transfer takes a long time, so the efficiency of DMA cannot be fully exploited.
Disclosure of Invention
To solve the problems that the Cache flush before a DMA transfer takes a long time and the efficiency of DMA cannot be fully exploited, the invention provides a data consistency implementation method based on early updating. The technical scheme is as follows:
a method for realizing data consistency, applied to a multi-core processor system, comprises adding a counter to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, recording the access history of Cachelines containing dirty data copies, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the other caches are idle.
Optionally, the multi-core processor system includes at least two CPUs, and updating the dirty data copies in the Cache to the next-level memory in advance when the L1 DCache and the caches at other levels are idle includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other caches at the same level also hold a data copy of that Cacheline, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the first CPU receives the DMA access request and begins flushing the Cache; after the first CPU has flushed the corresponding dirty data copies to the main memory, a response is returned;
and step 6, the DMA receives the response information sent by the first CPU and begins transferring data.
Optionally, in step 5, before the DMA initiates the access request, the first CPU writes back the partially dirty data copy to the main memory in advance.
Optionally, in step 1:
when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1.
Optionally, in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline can be in one of two states: consistent with the next-level storage, or inconsistent with it, where inconsistency means the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
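The write-side counter bookkeeping above (write miss, write hit on a clean line, write hit on a dirty line) can be sketched as follows. This is an illustrative Python model, not the patent's hardware; the function name `on_write` and the dictionary representation (mapping a dirty line's id to its counter, with absence meaning "no dirty copy") are assumptions made for the example:

```python
def on_write(counters, line, was_dirty):
    """Update per-Cacheline counters after a CPU write.

    `counters` maps the id of each Cacheline holding dirty data to its
    counter value; a line absent from the map holds no dirty copy.
    """
    old = counters.get(line, 0)
    if not was_dirty:
        # Write miss, or write hit on a line consistent with the next
        # level: every other dirty line's counter is incremented by 1.
        for other in counters:
            if other != line:
                counters[other] += 1
    else:
        # Write hit on an already-dirty line: only counters smaller than
        # the hit line's previous counter value are incremented by 1.
        for other in counters:
            if other != line and counters[other] < old:
                counters[other] += 1
    counters[line] = 1  # the written line restarts at 1 in every case
    return counters
```

For example, a write miss on line C while A and B are dirty ages A and B by one; a subsequent hit on dirty B (old counter 3) bumps only the counters below 3.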
Optionally, in step 1:
when a read hit occurs for the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
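The read-hit rule for a dirty line amounts to swapping the line one position down in the write-back ranking: its counter drops by 1 and a counter sitting exactly 1 below the old value moves up, while counters at 2 or below are left alone. A minimal sketch, under the same illustrative dictionary model as before (function name assumed):

```python
def on_read_hit_dirty(counters, line):
    """Counter update for a read hit on a Cacheline holding dirty data."""
    old = counters[line]
    if old <= 2:
        # Counters at 2 or below are not changed by a read hit.
        return counters
    for other in counters:
        # A counter exactly 1 below the old value trades places upward.
        if other != line and counters[other] == old - 1:
            counters[other] += 1
    counters[line] = old - 1
    return counters
```

The net effect is that frequently re-read dirty lines drift toward low counter values and are therefore written back later, while cold dirty lines rise toward the top of the write-back queue.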
Optionally, in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows sharing of data copies containing dirty data, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow sharing of dirty copies, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines stay unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines containing dirty data stay unchanged.
Optionally, when a Cacheline containing dirty data in any L1 DCache or other cache at any level in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; any counter of a dirty Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged.
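The clear-on-write-back rule keeps the remaining counters densely ranked: the cleared line leaves the ranking and every counter above its old value closes the gap by 1. An illustrative sketch in the same dictionary model (names assumed, not from the patent):

```python
def on_writeback_or_invalidate(counters, line):
    """A dirty Cacheline is written back or invalidated: its counter is
    cleared, and counters above its old value each drop by 1."""
    old = counters.pop(line, 0)  # remove the line from the dirty set
    for other in counters:
        if counters[other] > old:
            counters[other] -= 1
    return counters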
Optionally, the maximum value recordable by the counter added for each cacheline of the L1 DCache and of the other caches in the multi-core processor system is the number N of Cachelines of the current Cache, and the bit width of the counter is [log2(N)−1 : 0].
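Since the counter's maximum value is the number of Cachelines N, a counter of log2(N) bits (bit indices [log2(N)−1 : 0]) suffices. A small sketch of the arithmetic, assuming N is a power of two (the helper name is illustrative):

```python
import math

def counter_bit_width(num_cachelines: int) -> int:
    """Bit width of a counter whose maximum value is the Cacheline
    count N, i.e. bits [log2(N)-1 : 0]."""
    assert num_cachelines > 0 and num_cachelines & (num_cachelines - 1) == 0, \
        "Cacheline count is assumed to be a power of two"
    return int(math.log2(num_cachelines))

# e.g. a Cache with 256 lines needs 8-bit counters (bits [7:0])
assert counter_bit_width(256) == 8
```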
The invention also provides a multi-core processor system that uses the above method to realize data consistency. The multi-core processor system includes at least two CPUs; in the implementation, a counter is added to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, the access history of Cachelines containing dirty data copies is recorded, and the dirty data copies in the caches are updated to the next-level memory in advance when the L1 DCache and the other caches are idle.
The invention also provides applications of the data consistency implementation method and/or the multi-core processor system in the technical field of integrated circuits.
The invention has the beneficial effects that:
according to the invention, a counter is additionally arranged for each L1 DCache and the cachelines of other caches at all levels, and the access condition of the Cacheline containing the dirty data copy is recorded, so that the data copy containing the dirty data in the caches is updated to the main memory in advance when the memory is idle, the delay problem caused by Cache refreshing operation before data transmission by DMA is relieved, the memory is fully called, and the efficiency of a DMA transmission system is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of the steps described in the present invention.
Fig. 2 is a diagram of a processor system architecture for an embodiment.
FIG. 3 is a state transition diagram of the Cache coherence protocol used in the embodiment.
FIG. 4 is a flow chart of a CPU request write operation.
FIG. 5 is a flow chart of a CPU request read operation.
FIG. 6 is a flow chart of the CPU0 proactive write back conflicting with a CPU1 write operation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Introduction of basic terms:
ICache: instruction cache.
DCache: data cache.
Cacheline: a cache line, the basic unit of storage and replacement in a Cache.
Cache: cache memory. A Cache is generally divided into several sets, each set consisting of several Cachelines. In a multi-level storage system the Cache is divided into levels, denoted L1, L2, …, which represent different levels of Cache, such as L1 DCache and L2 Cache.
The first embodiment is as follows:
This embodiment provides a data consistency implementation method based on early updating, applied to a multi-core processor system. In the implementation, a counter is added to each cacheline of the L1 DCache and of the caches at every other level, the access history of Cachelines containing dirty data copies is recorded, and the copies containing dirty data are written back to the main memory in advance, which effectively alleviates the latency caused by the Cache flush before DMA transfers data and improves the efficiency of the DMA transfer system.
The maximum value recordable by the added counter is the number N of Cachelines of the current Cache, i.e. the bit width of the counter is [log2(N)−1 : 0]. The multi-core processor system includes at least two CPUs.
referring to fig. 1, the method includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system.
Specifically, when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
When the first CPU makes a write hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
When the first CPU makes a read hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
When a read miss occurs to the first CPU, this copy of the data may be present in the other L1 DCache or only in main memory.
When the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows sharing of data copies containing dirty data, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow sharing of dirty copies, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines stay unchanged.
If the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines containing dirty data stay unchanged.
And step 2, when a certain Cache is idle, i.e. has no pending access request, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system.
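The selection in step 2 is simply "dirty line with the largest counter wins". A minimal sketch in the same illustrative dictionary model used above (function name assumed; tie-breaking among equal counters is left unspecified here, as the patent does not fix it):

```python
def pick_line_to_write_back(counters):
    """When the Cache is idle, choose the dirty Cacheline with the
    largest counter for proactive write-back to the next-level memory.
    Returns None when no dirty line remains."""
    if not counters:
        return None
    return max(counters, key=counters.get)
```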
And step 3, the local Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory. If other caches at the same level also hold a data copy of the Cacheline with the largest counter value, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol.
And 4, the DMA initiates an access request.
And step 5, the first CPU receives the DMA access request and begins flushing the Cache; after the first CPU has flushed the corresponding dirty data copies to the main memory, a response is returned.
Because the first CPU has already written part of the dirty data copies back to the main memory before the DMA initiated the access request, the latency caused by the Cache flush operation is effectively reduced.
And step 6, the DMA receives the response information sent by the first CPU and begins transferring data.
Example two:
the embodiment provides an application description of the data consistency implementation method based on the early update in the first embodiment in practice, referring to fig. 2, which is specifically as follows:
In this embodiment, the hardware comprises a multi-core processor system including a CPU0, a CPU1, a second-level shared Cache (L2 Cache), a Bus, a main memory (Mem), and an interconnect fabric, where each CPU has a Harvard architecture including a 32 kB instruction Cache (ICache) and a data Cache (DCache).
The L1 DCache adopts a 4-way set-associative organization with 64 sets and 128-byte Cachelines, so the maximum value N recordable by the counters corresponding to the dirty Cachelines of the L1 DCache is 256.
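The figures above are easy to check: 4 ways × 64 sets gives N = 256 Cachelines, each 128 bytes, i.e. a 32 KiB DCache matching the capacity stated above, and by the bit-width rule the counters need log2(256) = 8 bits:

```python
ways, sets = 4, 64            # 4-way set-associative, 64 sets
line_size_bytes = 128
n_lines = ways * sets         # maximum counter value N
capacity_kib = n_lines * line_size_bytes // 1024

assert n_lines == 256         # N = 256, so 8-bit counters suffice
assert capacity_kib == 32     # matches the 32 kB DCache of the embodiment
```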
This embodiment is based on the write-back method and a write-invalidate policy, and adopts the MSI protocol (Modified/Shared/Invalid) to maintain consistency. Its states are described as follows:
M (Modified): the current data copy has been modified and is the most recent data in the processor system; it is inconsistent with the data copy in the memory, exists only as a unique copy in the current Cache, and must be written back to the memory when it is replaced;
S (Shared): the current data copy is in the shared state and consistent with the data copy in the memory; it may exist in several Caches at the same time and does not need to be written back to the memory when it is replaced or overwritten;
I (Invalid): the current data copy is invalid.
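The three states, and the write-invalidate edge that the examples below rely on (a remote write invalidates any local valid copy), can be sketched as follows. This models only one edge of the protocol in FIG. 3, not the full state machine, and the helper name is illustrative:

```python
from enum import Enum

class MSI(Enum):
    M = "Modified"  # unique dirty copy; written back when replaced
    S = "Shared"    # clean copy, possibly present in several Caches
    I = "Invalid"   # no usable copy

def on_remote_write(state: MSI) -> MSI:
    """Local line's transition when another CPU writes the same block:
    under write-invalidate, any valid copy (M or S) becomes Invalid."""
    return MSI.I if state in (MSI.M, MSI.S) else state
```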
The state transitions of the MSI coherency protocol employed in the present application are shown in FIG. 3.
Referring to fig. 4, the CPU0 requests a write operation to a Cacheline:
When a write miss occurs in CPU0 and the Cacheline exists in CPU1's L1 DCache in the M state, CPU1 writes the Cacheline back to the next-level memory, changes the Cacheline state in CPU1 to I, and clears the counter previously associated with that Cacheline; any counter of another M-state Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged. Meanwhile, CPU0 allocates a Cacheline in its local L1 DCache and, after the write completes, updates the Cacheline state to M, sets the corresponding counter to 1, and increments the counters corresponding to the other M-state Cachelines in the local L1 DCache by 1.
When a write miss occurs in CPU0 and the Cacheline exists in the L2 Cache or the main memory, CPU0 allocates a Cacheline in its local L1 DCache and, after the write completes, updates the Cacheline state to M, sets the corresponding counter to 1, and increments the counters corresponding to the other M-state Cachelines in the local L1 DCache by 1.
When a write hit occurs in CPU0, the Cacheline may be in the M or S state. If the Cacheline is in the M state, CPU0 writes the Cacheline directly and sets the corresponding counter to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
Referring to fig. 5, CPU0 requests a read operation for a Cacheline:
When a read hit occurs in CPU0, the Cacheline may be in the M or S state. If the Cacheline state is M, after CPU0 reads the Cacheline the state stays unchanged, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged. When the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if CPU0 requests to read that Cacheline. If the Cacheline state is S, CPU0 only performs the read operation, without changing any Cacheline state or counter value.
When a read miss occurs in CPU0 and the Cacheline exists in CPU1's L1 DCache in the M state, CPU1 writes the Cacheline back to the next-level memory, updates its state to S, and clears the counter previously associated with that Cacheline; any counter of another M-state Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged. CPU0 allocates a Cacheline in its local L1 DCache and, after the read completes, updates the Cacheline state to S without changing the state of any other Cacheline or the value of any counter.
When a read miss occurs in CPU0 and the Cacheline exists in the L2 Cache or the main memory, CPU0 allocates a Cacheline in its local L1 DCache and, after loading the data, updates the Cacheline state to S without changing the state of any other Cacheline or the value of any counter.
Referring to fig. 6, when the L1 DCache of CPU0 is idle, i.e. has no pending access request, the counters corresponding to its Cachelines are compared and it requests to actively write back the Cacheline with the largest counter value; meanwhile, if the L1 DCache of CPU1 initiates an access request (not an active write-back) to the L2 Cache, the L2 Cache handles CPU1's access request first.
In the data consistency implementation method based on early updating, when multiple caches at the same level in a multi-core system are idle at the same time, a polling (round-robin) arbitration mechanism is adopted to avoid deadlock caused by several caches applying for active write-back simultaneously, and the dirty data copies are written back to the next-level memory in turn.
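The polling arbitration described above can be sketched as a small round-robin arbiter that grants one pending write-back applicant at a time and rotates priority so no cache starves. The class and method names are illustrative, not the patent's design:

```python
class RoundRobinArbiter:
    """Grant active write-back requests from peer caches one at a
    time, rotating priority after each grant."""

    def __init__(self, n_requesters: int):
        self.n = n_requesters
        self.next = 0  # requester with current highest priority

    def grant(self, requests):
        """`requests` is the set of requester ids currently applying;
        returns the granted id, or None if nobody is requesting."""
        for i in range(self.n):
            cand = (self.next + i) % self.n
            if cand in requests:
                self.next = (cand + 1) % self.n  # rotate priority
                return cand
        return None
```

With two L1 DCaches both applying, the arbiter alternates grants between CPU0 and CPU1, matching the embodiment's description of responses being returned in turn.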
In the second embodiment of the present application, when two L1 DCaches apply for active write-back at the same time, responses are returned in turn, according to the polling arbitration mechanism, to the applying L1 DCaches of CPU0 and CPU1, and CPU0 and CPU1 in turn write back the M-state Cacheline with the largest counter value in their local L1 DCache.
In the data consistency implementation method based on early updating, when the L1 DCache of CPU0 in a multi-core system is idle, it requests to actively write the Cacheline with the largest counter value back to the L2 Cache; if CPU1 initiates an access request (not an active write-back) to the L2 Cache, the L2 Cache handles CPU1's access request first.
By adding a counter to each cacheline of the L1 DCache and of the caches at every other level and recording the access history of Cachelines containing dirty data copies, the invention updates part of the dirty data copies in the caches to the main memory in advance while the L1 DCache and the other caches are idle, instead of only starting to flush the caches right before DMA transfers data. This alleviates the latency caused by the Cache flush before a DMA transfer, keeps the memory fully utilized, and improves the efficiency of the DMA transfer system.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A data consistency implementation method, characterized by being applied to a multi-core processor system and comprising: adding a counter to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, recording the access history of Cachelines containing dirty data copies, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the other caches are idle.
2. The method according to claim 1, wherein the multi-core processor system comprises at least two CPUs, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the caches at other levels are idle comprises:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other caches at the same level also hold the data copy, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the CPU receives the DMA access request and begins flushing the Cache; after the corresponding dirty data copies have all been flushed to the main memory, a response is returned;
and step 6, the DMA receives the response information sent by the CPU and begins transferring data.
3. The method according to claim 2, wherein in step 5, the first CPU writes back the partially dirty copy of the data to the main memory in advance before the DMA initiates the access request.
4. The method according to claim 2, wherein in step 1:
when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1.
5. The method according to claim 2, wherein in step 1:
when a write hit occurs for the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency means that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other cachelines containing dirty data are each incremented by 1;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; every other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counters keep their values unchanged.
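The write-side counter updates of claims 4-5 can be sketched for a single cache. This is a hedged sketch, not the patent's implementation: `counters` and `dirty` are assumed per-Cacheline arrays of my own naming, and a write miss is modeled as a write to a line whose dirty bit is still 0.

```python
def on_write(counters, dirty, idx):
    """Counter update after the CPU writes cacheline `idx`.

    If the line was clean (write miss, or write hit on a clean copy),
    the written line's counter becomes 1 and every other dirty line's
    counter is incremented.  If the line was already dirty, only the
    counters below its old value are incremented, so counters above it
    keep their relative age order."""
    old = counters[idx]
    was_dirty = dirty[idx]
    for i in range(len(counters)):
        if i == idx or not dirty[i]:
            continue
        if not was_dirty or counters[i] < old:
            counters[i] += 1
    counters[idx] = 1
    dirty[idx] = True
```

The effect is that counter value 1 always marks the most recently written dirty line, and the largest value marks the stalest one, which is the early write-back victim.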
6. The method according to claim 2, wherein in step 1:
when a read hit occurs for the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, a CPU read of that Cacheline leaves the counter value unchanged;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
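The read-hit rule of claim 6 amounts to swapping the hit line's counter with the one directly below it in the age order. A minimal sketch, with the same assumed `counters`/`dirty` arrays as above (illustrative names, not from the patent):

```python
def on_read_hit(counters, dirty, idx):
    """Counter update after a read hit on cacheline `idx`.

    Clean hits change nothing.  A dirty hit with counter > 2 decrements
    the hit line's counter and increments the counter that was exactly
    one below it, i.e. the two lines swap positions in the age order.
    Counters at 2 or below are left unchanged."""
    if not dirty[idx] or counters[idx] <= 2:
        return
    old = counters[idx]
    for i, c in enumerate(counters):
        if i != idx and c == old - 1:
            counters[i] += 1      # the neighbor moves up one step
            break
    counters[idx] = old - 1       # the hit line moves down one step
```

Keeping counters ≤ 2 fixed plausibly prevents a frequently read line from ever reaching value 1, which claims 4-5 reserve for the most recently written line; the patent does not state this rationale, so it is only a reading.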
7. The method according to claim 2, wherein in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another Cache at the same level and the Cacheline contains dirty data: if the multi-core processor system allows a data copy containing dirty data to be shared, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are each incremented by 1; if the multi-core processor system does not allow a copy containing dirty data to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the values of the counters of the other cachelines remain unchanged;
if the data copy exists in another Cache at the same level and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other cachelines containing dirty data remain unchanged.
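The read-miss fill of claim 7 then only distinguishes whether the incoming copy arrives dirty from a peer cache or clean from lower levels. A hedged sketch of the local cache's side (`shared_dirty` is my flag for "the system shares the dirty copy", not a term from the patent):

```python
def on_read_miss_fill(counters, dirty, idx, shared_dirty):
    """Counter update when a read miss fills local cacheline `idx`.

    If a dirty copy is forwarded from a peer cache, the filled line is
    treated like a fresh write: its counter becomes 1 and every other
    dirty line's counter is incremented.  If the copy is clean (peer
    copy consistent with next level, or fetched from lower memory),
    the counter starts at the initial value 0 and nothing else moves."""
    if shared_dirty:
        for i in range(len(counters)):
            if i != idx and dirty[i]:
                counters[i] += 1
        counters[idx] = 1
        dirty[idx] = True
    else:
        counters[idx] = 0
        dirty[idx] = False
```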
8. The method of claim 2, wherein when a Cacheline containing dirty data in any L1 DCache or any other Cache at any level in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; every counter of another Cacheline containing dirty data whose value is greater than the original value of the cleared counter is decremented by 1, and the values of the remaining counters remain unchanged.
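Claim 8's clean-up rule keeps the counter values dense after a line leaves the dirty set. A minimal sketch under the same assumed arrays (illustrative names only):

```python
def on_evict(counters, dirty, idx):
    """Counter update when dirty cacheline `idx` is written back or
    invalidated: clear its counter, and decrement every dirty line's
    counter that was above it, closing the gap in the age order."""
    old = counters[idx]
    for i in range(len(counters)):
        if i != idx and dirty[i] and counters[i] > old:
            counters[i] -= 1
    counters[idx] = 0
    dirty[idx] = False
```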
9. The method of claim 1, wherein the maximum recordable value of the counter added for each Cacheline of each L1 DCache and of the other caches in the multi-core processor system is the number N of cachelines of the current Cache, and the bit width of the counter is [log2N-1 : 0].
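For concreteness, the bit-width range [log2N-1 : 0] in claim 9 corresponds to log2(N) bits. A small sketch (the function name is mine, and N is assumed to be a power of two, as is usual for a set count but not stated in the claim):

```python
import math

def counter_bits(n_cachelines):
    """Bit width of the per-Cacheline counter for a Cache with N
    cachelines, per the claimed range [log2(N)-1 : 0], i.e. log2(N)
    bits.  N is assumed to be a power of two."""
    assert n_cachelines > 0 and n_cachelines & (n_cachelines - 1) == 0
    return int(math.log2(n_cachelines))
```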
10. A multi-core processor system, characterized in that the multi-core processor system achieves data consistency by the method of any one of claims 1 to 8; the multi-core processor system comprises at least two CPUs; in the implementation, a counter is added for each Cacheline of each L1 DCache and of the caches at the other levels in the system to record the access history of the cachelines containing dirty data copies, and the data copies containing dirty data in the caches are updated to the main memory in advance while the L1 DCache and the other caches are idle.
CN202010210475.9A 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating Active CN111414318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Publications (2)

Publication Number Publication Date
CN111414318A true CN111414318A (en) 2020-07-14
CN111414318B CN111414318B (en) 2022-04-29

Family

ID=71494283

Country Status (1)

Country Link
CN (1) CN111414318B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059707A1 (en) * 2006-08-31 2008-03-06 Srihari Makineni Selective storage of data in levels of a cache memory
CN101866318A (en) * 2010-06-13 2010-10-20 北京北大众志微系统科技有限责任公司 Management system and method for cache replacement strategy
CN102779017A (en) * 2012-06-29 2012-11-14 华中科技大学 Control method of data caching area in solid state disc
CN103019655A (en) * 2012-11-28 2013-04-03 中国人民解放军国防科学技术大学 Internal memory copying accelerating method and device facing multi-core microprocessor
CN104615576A (en) * 2015-03-02 2015-05-13 中国人民解放军国防科学技术大学 CPU+GPU processor-oriented hybrid granularity consistency maintenance method
CN105095116A (en) * 2014-05-19 2015-11-25 华为技术有限公司 Cache replacing method, cache controller and processor
CN105740168A (en) * 2016-01-23 2016-07-06 中国人民解放军国防科学技术大学 Fault-tolerant directory cache controller
CN105740164A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device
CN106909515A (en) * 2017-02-11 2017-06-30 郑州云海信息技术有限公司 Towards multinuclear shared last level cache management method and device that mixing is hosted
CN109669881A (en) * 2018-12-11 2019-04-23 中国航空工业集团公司西安航空计算技术研究所 A kind of calculation method based on the space Cache reservation algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HONGZHOU ZHAO: "SPACE: Sharing pattern-based directory coherence for multicore scalability", 《IEEE》 *
ZHIGANG HU: "Timekeeping techniques for predicting and optimizing memory behavior", 《IEEE》 *
LOU, YUNHE: "Cache coherence protocol of multi-core processors for big data processing", 《China Master's Theses Full-text Database (Information Science and Technology)》 *
CAO, YANRONG: "Analysis of DMA transfer and Cache coherence", 《Silicon Valley》 *

Similar Documents

Publication Publication Date Title
US10078592B2 (en) Resolving multi-core shared cache access conflicts
JP5431525B2 (en) A low-cost cache coherency system for accelerators
EP0731944B1 (en) Coherency and synchronization mechanism for i/o channel controllers in a data processing system
JP3737834B2 (en) Dual cache snoop mechanism
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
US6321296B1 (en) SDRAM L3 cache using speculative loads with command aborts to lower latency
US11500797B2 (en) Computer memory expansion device and method of operation
JPH09218822A (en) Method for keeping consistency of cache by using successive encoding snoop response and its system
CA2289402C (en) Method and system for efficiently handling operations in a data processing system
JPH09223118A (en) Snoop cache memory control system
JPH10154100A (en) Information processing system, device and its controlling method
JP4295814B2 (en) Multiprocessor system and method of operating multiprocessor system
US6807608B2 (en) Multiprocessor environment supporting variable-sized coherency transactions
CN111414318B (en) Data consistency implementation method based on advanced updating
US6021466A (en) Transferring data between caches in a multiple processor environment
JPH0816885B2 (en) Cache memory control method
WO2022246769A1 (en) Data access method and apparatus
US11847062B2 (en) Re-fetching data for L3 cache data evictions into a last-level cache
JP3507314B2 (en) Memory controller and computer system
JP4856373B2 (en) Memory system, control method thereof, and method of maintaining data coherency
JPH04347750A (en) Control system for parallel cache memory
JPH08106417A (en) Memory access method and memory sharing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant