CN111414318B - Data consistency implementation method based on advanced updating - Google Patents


Info

Publication number
CN111414318B
Authority
CN
China
Prior art keywords
cacheline
counter
cache
data
cpu
Prior art date
Legal status
Active
Application number
CN202010210475.9A
Other languages
Chinese (zh)
Other versions
CN111414318A (en)
Inventor
顾晓峰
李青青
虞致国
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010210475.9A
Publication of CN111414318A
Application granted
Publication of CN111414318B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)

Abstract

The invention discloses a method for realizing data consistency based on advance updating, belonging to the technical field of integrated circuits. In the method, a counter is added for each Cacheline of every L1DCache and of the other levels of Cache, and the access history of cachelines containing dirty data copies is recorded, so that data copies containing dirty data in the caches are updated to the next-level memory in advance while the memories are idle. The Cache therefore no longer needs to be refreshed immediately before a DMA transfer, which alleviates the delay caused by the Cache refresh operation before DMA data transfer, makes full use of idle memory cycles, and improves the efficiency of the DMA transfer system.

Description

Data consistency implementation method based on advanced updating
Technical Field
The invention relates to a method for realizing data consistency based on advanced updating, belonging to the technical field of integrated circuits.
Background
At present, most mainstream processors adopt a hierarchical storage system, that is, a multi-level Cache (cache memory) is inserted between the processor and the main memory (hereinafter referred to as "main memory") to make up for the performance gap between the CPU and the main memory.
Partial copies of main-memory data are stored in the Cache, and two write strategies are generally adopted to maintain data consistency across the multi-level Cache: the write-back method and the write-through method. The former writes a dirty data copy back to main memory only when the Cacheline containing the dirty data is replaced or invalidated. This strategy reduces the number of main-memory accesses and improves system efficiency, but increases the difficulty of maintaining Cache consistency. The write-through method updates the data in main memory whenever the CPU writes to the Cache. Although this strategy effectively guarantees Cache consistency, it increases the volume of data transferred on the bus, and the long latency of main-memory write operations affects overall system performance. Therefore, modern processors mostly adopt the write-back method.
DMA (Direct Memory Access) is an efficient data transfer method in which a DMA controller moves data directly between I/O devices and main memory, or between peripherals, without CPU intervention. However, DMA transfers also introduce a data consistency problem, which researchers currently address at both the software and the hardware level to keep the DMA transfer, each level of Cache, and main memory consistent. Either way, the Cache must be refreshed before the DMA transfers data. Because a single DMA transfer moves a large amount of data and main-memory reads and writes have long latency, this Cache refresh operation takes a long time, so the efficiency of the DMA cannot be fully exploited.
Disclosure of Invention
In order to solve the problems that the Cache refreshing operation before DMA transmission needs to take a long time and the efficiency of the DMA cannot be fully exerted, the invention provides a data consistency implementation method based on advanced updating, and the technical scheme is as follows:
a data consistency implementation method is applied to a multi-core processor system and comprises the following steps: adding a counter for each L1DCache and cachelines of other levels of Cache in the multi-core processor system, and recording the access condition of the Cacheline containing the dirty data copy; when the L1DCache and other levels of Cache are idle, the data copy containing the dirty data in the Cache is updated to the next level of memory in advance.
Optionally, the multi-core processor system includes at least two CPUs, and updating the data copies containing dirty data in the L1DCache and the other levels of Cache to the next-layer memory in advance when they are idle includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, comparing counters corresponding to the cachelines, and requesting the next-level memory to actively write back the Cacheline with the maximum counter value; meanwhile, if another Cache at the same level initiates an access request to a next-level memory and the Cache is not actively written back, the next-level memory preferentially processes the access request of another Cache at the same level, wherein a certain Cache refers to any L1DCache or other caches at all levels in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory; if other caches at the same level also contain the data copy of the Cacheline, the dirty bit of the corresponding Cacheline in those caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the first CPU receives an access request of the DMA, starts to refresh the Cache, waits for the first CPU to refresh the corresponding dirty data copy into the main memory, and returns a response;
and 6, receiving the response information sent by the first CPU by the DMA, and starting to transmit data.
Optionally, in step 5, before the DMA initiates the access request, the first CPU writes back the partially dirty data copy to the main memory in advance.
Optionally, in step 1:
when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
Optionally, in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency with the next-level storage indicates that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged.
Optionally, in step 1:
when a read hit occurs for the first CPU:
if dirty data is contained in the Cacheline, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value remains unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
Optionally, in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another peer Cache and that Cacheline contains dirty data: if the multi-core processor system can share data copies containing dirty data, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system cannot share copies containing dirty data, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines remain unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines containing dirty data remain unchanged.
Optionally, when a Cacheline containing dirty data in any L1DCache or other level of Cache in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; counters of other cachelines containing dirty data whose values are greater than the original value of the cleared counter are decremented by 1, and the values of the other counters remain unchanged.
Optionally, the maximum value recordable by the counter added for each Cacheline of every L1DCache and of the other levels of Cache in the multi-core processor system is the number N of cachelines of the current Cache, and the bit width of the counter is [log₂N-1, 0].
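As an illustrative check of this sizing rule (a sketch only; the function name is ours, not the patent's, and N is assumed to be a power of two), the counter needs log₂N bits, indexed [log₂N-1 : 0]:

```python
import math

def counter_bit_width(n_cachelines: int) -> int:
    """Bits needed for a counter whose maximum value is N: log2(N) bits,
    indexed [log2(N) - 1 : 0]."""
    if n_cachelines <= 0 or n_cachelines & (n_cachelines - 1):
        raise ValueError("N is assumed to be a power of two")
    return int(math.log2(n_cachelines))

print(counter_bit_width(256))  # a 256-line cache needs 8-bit counters
```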
The invention also provides a multi-core processor system which adopts the method to realize data consistency, the multi-core processor system comprises at least two CPUs, in the realization process, a counter is added for each L1DCache and Cacheline of other caches at all levels in the multi-core processor system, and the access condition of the Cacheline containing the dirty data copy is recorded; when the L1DCache and other levels of Cache are idle, updating the data copy containing the dirty data in the Cache to the next layer of memory in advance.
The invention also provides the data consistency implementation method and/or the application of the multi-core processor system in the technical field of integrated circuits.
The invention has the beneficial effects that:
according to the invention, a counter is additionally arranged for each L1DCache and the cachelines of other caches at all levels, and the access condition of the Cacheline containing the dirty data copy is recorded, so that the data copy containing the dirty data in the caches is updated to the main memory in advance when the memory is idle, the delay problem caused by Cache refreshing operation before data transmission by DMA is relieved, the memory is fully called, and the efficiency of a DMA transmission system is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of the steps described in the present invention.
Fig. 2 is a diagram of a processor system architecture for an embodiment.
FIG. 3 is a state transition diagram of the Cache coherence protocol used in the embodiment.
FIG. 4 is a flow chart of a CPU request write operation.
FIG. 5 is a flow chart of a CPU request read operation.
FIG. 6 is a flow chart of the CPU0 proactive write back conflicting with a CPU1 write operation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Introduction of basic terms:
ICache, instruction cache.
DCache, data cache.
Cacheline, cache line, the basic unit of storage and transfer within a Cache.
Cache, cache memory. A Cache is generally divided into a number of sets, each set composed of several cachelines; in a multi-level storage system the caches are distinguished by level, denoted L1, L2, …, such as L1DCache and L2 Cache.
The first embodiment is as follows:
the embodiment provides a data consistency implementation method based on advanced updating, which is applied to a multi-core processor system, and in the implementation process, a counter is additionally arranged for each L1DCache and the cachelines of other caches at all levels, the access condition of the Cacheline containing dirty data copies is recorded, the copies containing dirty data are written back to a main memory in advance, the problem of delay caused by Cache refreshing operation before DMA data transmission is effectively solved, and the efficiency of a DMA transmission system is improved.
The maximum value recordable by the added counter is the number N of cachelines of the current Cache, i.e., the bit width of the counter is [log₂N-1, 0]. The multi-core processor system includes at least two CPUs.
referring to fig. 1, the method includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system.
Specifically, when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
When the first CPU makes a write hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged.
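The write-side counter rules above (write miss, write hit on a clean line, write hit on a dirty line) can be sketched as follows, modelling the per-Cacheline counters as a dictionary from cacheline id to counter value for dirty lines only; the names and the dictionary representation are illustrative assumptions, not from the patent:

```python
def update_counters_on_write(counters, line, was_dirty):
    """Counter update after a CPU write.

    counters:  dict mapping cacheline id -> counter value (dirty lines only).
    line:      the cacheline written.
    was_dirty: True if the line already held dirty data before this write.
    """
    old = counters.get(line, 0)   # original counter of the hit line (0 if it was clean)
    for other in counters:
        if other == line:
            continue
        if not was_dirty:
            counters[other] += 1  # write miss / clean-line hit: all other dirty counters +1
        elif counters[other] < old:
            counters[other] += 1  # dirty-line hit: only counters below the old value +1
    counters[line] = 1            # the freshly written line always restarts at 1
    return counters
```

Note that the dirty-line rule keeps the counters a permutation of 1..k, so the write-back ordering stays dense.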
When the first CPU makes a read hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if dirty data is contained in the Cacheline, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value remains unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
When a read miss occurs to the first CPU, this copy of the data may be present in the other L1DCache or only in main memory.
When the data copy exists in another peer Cache and that Cacheline contains dirty data: if the multi-core processor system can share data copies containing dirty data, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system cannot share copies containing dirty data, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines remain unchanged.
If the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines containing dirty data remain unchanged.
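The read-hit rule above, which moves a read dirty line one rank down in the write-back order while leaving counters of 2 or less untouched, can be sketched as (names are illustrative, not from the patent):

```python
def update_counters_on_read_hit(counters, line):
    """Counter update after a read hit on a dirty Cacheline.

    counters: dict mapping cacheline id -> counter value (dirty lines only).
    A recently read line's counter drops by 1, and the line ranked just
    below it moves up, so actively read dirty lines are written back later.
    """
    old = counters[line]
    if old <= 2:
        return counters            # counters <= 2 are left unchanged by reads
    for other, value in counters.items():
        if other != line and value == old - 1:
            counters[other] += 1   # the line ranked just below moves up one rank
            break
    counters[line] = old - 1       # the read line moves one rank down
    return counters
```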
Step 2, when a certain Cache is idle, that is, it has no access request, the counters corresponding to its cachelines are compared, and active write-back of the Cacheline with the maximum counter value to the next-level memory is requested; meanwhile, if another Cache at the same level initiates an access request to the next-level memory while this Cache has not yet been actively written back, the next-level memory processes the access request of the other same-level Cache first. Here "a certain Cache" refers to any L1DCache or any other level of Cache in the multi-core processor system.
And step 3, the local Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory. If other caches at the same level also contain a data copy of the Cacheline with the largest counter value, the dirty bit of the corresponding Cacheline in those caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol.
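Steps 2 and 3, together with the counter-clearing rule for written-back cachelines, can be sketched as follows (a hypothetical model, with the counters again held in a dictionary for dirty lines only):

```python
def pick_line_for_proactive_writeback(counters):
    """Step 2: when a Cache is idle, the dirty Cacheline with the largest
    counter value is the candidate for active write-back."""
    return max(counters, key=counters.get) if counters else None

def after_writeback(counters, line):
    """Step 3 aftermath: once the line has been written back it is clean, so
    its counter is cleared, and counters larger than its old value are
    decremented so the remaining dirty lines keep a dense ranking."""
    old = counters.pop(line)
    for other in counters:
        if counters[other] > old:
            counters[other] -= 1
    return counters
```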
And 4, the DMA initiates an access request.
And 5, the first CPU receives the access request of the DMA, starts to refresh the Cache, waits for the first CPU to refresh the corresponding dirty data copy into the main memory, and returns a response.
Before the DMA initiates an access request, the first CPU writes back part of the dirty data copy to the main memory in advance, so that the delay caused by Cache refreshing operation is effectively reduced.
And 6, receiving the response information sent by the first CPU by the DMA, and starting to transmit data.
Example two:
the embodiment provides an application description of the data consistency implementation method based on the early update in the first embodiment in practice, referring to fig. 2, which is specifically as follows:
in this embodiment, the hardware device includes a multi-core processor system, which includes a CPU0, a CPU1, a second level shared Cache (L2 Cache), a Bus (Bus), a main memory (Mem), and an interconnect structure. Each CPU employs a Harvard architecture, including a 32kB instruction cache (ICache) and data cache (DCache).
The L1DCache adopts a 4-way set-associative organization with 64 sets and a Cacheline size of 128 bytes, so the maximum value N recordable by the counters corresponding to the dirty cachelines of the L1DCache is 256.
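The stated figures are mutually consistent, as a quick arithmetic check shows: 4 ways × 64 sets gives the 256 cachelines that bound the counter value N, and 256 lines × 128 bytes gives the 32 kB data capacity mentioned above:

```python
# Parameters of the L1DCache in this embodiment.
WAYS, SETS, LINE_BYTES = 4, 64, 128

n_cachelines = WAYS * SETS                        # cachelines per L1DCache
capacity_kb = n_cachelines * LINE_BYTES // 1024   # data capacity in kB

print(n_cachelines, capacity_kb)  # 256 cachelines, 32 kB
```

With N = 256, each counter is 8 bits wide ([log₂256-1, 0] = [7, 0]).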
The embodiment is based on a write-back method and a write invalidation policy, and adopts an MSI Protocol (Modified Shared Invalid Protocol) to maintain consistency, and the states of the MSI Protocol are described as follows:
m (modified): the current data copy is modified, is the current latest data in the processor system, is inconsistent with the data copy in the memory, only has a unique copy in the current Cache, and needs to be written back to the memory when replacement occurs;
s (shared): the current data copy is in a shared state, is consistent with the data copy in the memory, possibly exists in a plurality of caches at the same time, and does not need to be written back to the memory when replacement or rewriting occurs;
i (Invalid): indicating that the current copy of data is invalid.
The state transitions of the MSI coherency protocol employed in the present application are shown in FIG. 3.
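A minimal transition table for the MSI protocol of FIG. 3, as seen from one cache and assuming the write-invalidate variant described above, might look as follows; the event names are our own shorthand, not the patent's:

```python
# Sketch of MSI transitions for a single cache's copy of a line.
MSI_NEXT = {
    ("I", "local_read"):   "S",   # read miss: fetch a shared copy
    ("I", "local_write"):  "M",   # write miss: fetch and modify
    ("S", "local_write"):  "M",   # upgrade; other shared copies are invalidated
    ("S", "remote_write"): "I",   # another CPU writes: invalidate our copy
    ("M", "remote_read"):  "S",   # another CPU reads: write back, downgrade to S
    ("M", "remote_write"): "I",   # another CPU writes: write back, invalidate
}

def next_state(state, event):
    # Unlisted (state, event) pairs leave the state unchanged,
    # e.g. a local read or write hit in M stays in M.
    return MSI_NEXT.get((state, event), state)
```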
Referring to fig. 4, the CPU0 requests a write operation to a Cacheline:
when a write miss occurs to the CPU0 and the Cacheline is stored in L1DCache in the CPU1, then its state is M. If the Cacheline state is M, the CPU1 writes the Cacheline back to the next-level memory, and modifies the Cacheline state in the CPU1 to I, the counter corresponding to the Cacheline before is cleared, if the counter value corresponding to the Cacheline in the other M state is greater than the counter value corresponding to the Cacheline originally, the counter value is decremented by 1, and the counter values in the other M states remain unchanged; meanwhile, the CPU0 applies for a Cacheline from the local L1DCache, and after the write operation is completed, updates the Cacheline state to M, sets the value of the counter corresponding to the Cacheline state to 1, and adds 1 to the value of the counter corresponding to the Cacheline in the other M state in the local L1 DCache.
When the CPU0 has a write miss and the Cacheline exists in the L2Cache or the main memory, the CPU0 applies for a Cacheline in the local L1DCache; after the write operation completes, it updates the Cacheline state to M, sets the corresponding counter to 1, and increments by 1 the counters corresponding to the other M-state cachelines in the local L1DCache.
When a write hit occurs in the CPU0, the Cacheline may be in the M or S state. If the Cacheline state is M, the CPU0 writes to the Cacheline directly and sets the corresponding counter to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged. If the Cacheline state is S, the CPU0 writes to the Cacheline directly, updates its state to M, sets the corresponding counter to 1, and increments the values of the other counters by 1; at the same time the CPU1 invalidates the Cacheline, updating the state of the corresponding Cacheline in its L1DCache to I.
Referring to fig. 5, CPU0 requests a read operation for a Cacheline:
when a read hit occurs to the CPU0, the Cacheline may be in M or S state. If the Cacheline state is M, after the CPU0 reads the Cacheline, the Cacheline state remains unchanged, the counter corresponding to the Cacheline is decremented by 1, the counter value that is 1 less than the original value of the counter is incremented by 1, and the values of the other counters remain unchanged. When the value of the counter corresponding to Cacheline is less than or equal to 2, if the CPU0 requests to read the Cacheline, the value of the counter remains unchanged. If the Cacheline state is S, the CPU0 only performs a read operation without changing any Cacheline state and corresponding counter value.
When a read miss occurs in the CPU0 and the Cacheline is stored in the L1DCache of the CPU1, its state there is M. The CPU1 writes the Cacheline back to the next-level memory and updates its state to S; the counter previously corresponding to that Cacheline is cleared, counters of other M-state cachelines whose values are greater than the cleared counter's original value are decremented by 1, and the remaining counters stay unchanged. The CPU0 applies for a Cacheline in the local L1DCache and, after the read operation completes, updates its state to S without changing the state of any other Cacheline or the value of any counter.
When the CPU0 has a read miss and the Cacheline exists in the L2Cache or the main memory, the CPU0 applies for a Cacheline from the local L1DCache, and updates the state of the Cacheline to S after loading data, without changing the states of any other cachelines and the values of the corresponding counters.
Referring to fig. 6, when the L1DCache of the CPU0 is idle, that is, there is no access request, the counters corresponding to the cachelines are compared, and active write-back of the Cacheline with the largest counter value is requested. Meanwhile, if the L1DCache of the CPU1 initiates an access request to the L2Cache and does not actively write back, the L2Cache preferentially processes the access request of the CPU 1.
In the method for realizing data consistency based on advance updating, when several caches at the same level in a multi-core system are idle, a polling (round-robin) arbitration mechanism is adopted to avoid the deadlock that could be caused by several caches applying for active write-back simultaneously, and the dirty data copies are written back to the next-level memory in turn.
In the second embodiment of the present application, when two L1DCaches apply for active write-back simultaneously, the write-back responses are returned to the L1DCaches of the CPU0 and the CPU1 in turn according to the polling arbitration mechanism, and the CPU0 and the CPU1 in turn write back the M-state Cacheline with the largest counter value in the local L1DCache.
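The polling (round-robin) arbitration among write-back requesters described above can be sketched as follows; this is an illustrative model only, and the class and method names are our assumptions:

```python
from collections import deque

class RoundRobinArbiter:
    """Polling arbitration among caches applying for active write-back:
    grants rotate through the requesters so no cache can starve the others,
    avoiding the deadlock scenario described above."""

    def __init__(self, requesters):
        self._order = deque(requesters)

    def grant(self, requesting):
        """Return the next requester in rotating order that is asking now,
        or None if no one is requesting."""
        for _ in range(len(self._order)):
            head = self._order[0]
            self._order.rotate(-1)   # advance the rotation for fairness
            if head in requesting:
                return head
        return None
```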
In the method for realizing data consistency based on advanced updating, in a multi-core system, when an L1DCache of a CPU0 is idle, a Cache line with the maximum counter value is requested to be actively written back to an L2Cache, and if an access request is initiated to the L2Cache by the CPU1 and the Cache is not actively written back, the access request of the CPU1 is preferentially processed by the L2 Cache.
According to the invention, a counter is added for each Cacheline of every L1DCache and of the other levels of Cache, and the access history of cachelines containing dirty data copies is recorded, so that when the L1DCache and the other levels of Cache are idle, part of the data copies containing dirty data in the caches are updated to the main memory in advance, instead of the Cache only beginning to be refreshed just before DMA data transfer. This alleviates the delay caused by the Cache refresh operation before DMA data transfer, makes full use of idle memory cycles, and improves the efficiency of the DMA transfer system.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A data consistency implementation method is applied to a multi-core processor system and comprises the following steps: adding a counter for each L1DCache and cachelines of other levels of Cache in the multi-core processor system, and recording the access condition of the Cacheline containing the dirty data copy; when the L1DCache and other levels of Cache are idle, updating the data copy containing the dirty data in the Cache to the next level of memory in advance;
the multi-core processor system comprises at least two CPUs, and when the L1DCache and other levels of Cache are idle, the data copy containing dirty data in the L1DCache and other levels of Cache are updated to the next level of memory in advance, wherein the steps comprise:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, comparing counters corresponding to the cachelines, and requesting the next-level memory to actively write back the Cacheline with the maximum counter value; meanwhile, if another Cache at the same level initiates an access request to a next-level memory and the Cache is not actively written back, the next-level memory preferentially processes the access request of another Cache at the same level, wherein a certain Cache refers to any L1DCache or other caches at all levels in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory; if other caches at the same level also contain the data copy, the dirty bit of the corresponding Cacheline in those same-level caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the CPU receives the access request of the DMA, starts to refresh the Cache, waits until the corresponding data copies containing dirty data are all refreshed into the main memory, and returns a response;
step 6, DMA receives the response information sent by the CPU, and starts to transmit data;
in step 5, before the DMA initiates the access request, the first CPU writes back the partial dirty data copy to the main memory in advance.
2. The method according to claim 1, wherein in step 1:
when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
3. The method according to claim 1, wherein in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency with the next-level storage indicates that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to be 1, if the numerical values of other counters are smaller than the original values of the counters corresponding to the Cacheline which are written and hit, the numerical values of the other counters are added to be 1, and the numerical values of the other counters are kept unchanged.
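Both write-hit cases of claim 3 can be sketched in one function (the name `on_write_hit` and the dict model are assumptions; a counter of 0 denotes a clean copy):

```python
def on_write_hit(counters, line):
    """Claim 3 (write hit). If the hit line was clean, behave as in the
    write-miss case; if it already held dirty data, only the counters
    smaller than the hit line's original value are incremented, which
    preserves the relative order of the other dirty lines."""
    old = counters[line]
    if old == 0:                          # copy was consistent with next level
        for other, c in counters.items():
            if other != line and c > 0:
                counters[other] = c + 1
    else:                                 # line already contained dirty data
        for other, c in counters.items():
            if other != line and 0 < c < old:
                counters[other] = c + 1
    counters[line] = 1
```

Note that in the dirty-hit branch the dirty counters remain a dense ranking 1..k: the hit line moves to rank 1 and only the lines it overtakes shift up.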
4. The method according to claim 1, wherein in step 1:
when a read hit occurs in the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than the original value of that counter is incremented by 1, and the values of the other counters are kept unchanged; when the value of the counter corresponding to the Cacheline is less than or equal to 2, a CPU read of the Cacheline leaves the counter values unchanged;
if the data copy is consistent with the next-level storage, the counter values corresponding to all cachelines are unchanged after the first CPU completes the read operation.
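The read-hit rule of claim 4 amounts to swapping the hit line one step toward the recently-used end of the ranking. A sketch under the same assumed dict model (`on_read_hit` is an illustrative name; counter 0 denotes a clean copy, so the `c <= 2` guard covers both the clean case and the near-hot dirty cases):

```python
def on_read_hit(counters, line):
    """Claim 4 (read hit): a dirty line's counter drops by 1 and the line
    whose counter was exactly one below swaps up; counters <= 2 (clean,
    or already near the hot end) are left untouched."""
    c = counters[line]
    if c <= 2:
        return
    for other, oc in counters.items():
        if other != line and oc == c - 1:
            counters[other] = oc + 1   # the neighbour swaps up one rank
            break
    counters[line] = c - 1
```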
5. The method according to claim 1, wherein in step 1:
when a read miss occurs in the first CPU:
if the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows a data copy containing dirty data to be shared, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow a copy containing dirty data to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counters of the other cachelines are kept unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counters of the other cachelines containing dirty data are kept unchanged.
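The read-miss cases of claim 5 reduce to one question: does the filled line arrive dirty (shared from a peer's dirty copy) or clean? A sketch, with the hypothetical flags `source_dirty` (the peer's copy was dirty) and `sharable` (the protocol permits sharing a dirty copy):

```python
def on_read_miss(counters, line, source_dirty, sharable):
    """Claim 5 (read miss): only when a dirty peer copy is shared does the
    local line itself become dirty and rank as most recently written;
    every other fill is clean and leaves the other counters alone."""
    if source_dirty and sharable:
        for other, c in counters.items():
            if other != line and c > 0:
                counters[other] = c + 1
        counters[line] = 1    # local copy now also holds dirty data
    else:
        counters[line] = 0    # clean fill; initial counter value
```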
6. The method of claim 1, wherein: when a Cacheline containing dirty data in any L1 DCache or any other level of Cache in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; the counters of the other cachelines containing dirty data whose values are greater than the original value of the cleared counter are decremented by 1, and the values of the remaining counters are kept unchanged.
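Claim 6 keeps the dirty-line ranking dense when a line leaves it. A sketch under the same assumed model (`on_writeback_or_invalidate` is an illustrative name):

```python
def on_writeback_or_invalidate(counters, line):
    """Claim 6: when a dirty cacheline is written back or invalidated, its
    counter is cleared, and every counter larger than its original value
    shifts down by 1 so the remaining dirty lines stay ranked 1..k."""
    old = counters[line]
    counters[line] = 0
    for other, c in counters.items():
        if other != line and c > old:
            counters[other] = c - 1
```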
7. The method of claim 1, wherein the maximum value recordable by the counter added for each Cacheline of the L1 DCaches and other levels of Cache in the multi-core processor system is the number N of cachelines in the current Cache, and the bit width of the counter is [log₂N-1, 0].
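The bit width in claim 7 follows from the counter's maximum value: distinguishing N ranks takes ⌈log₂N⌉ bits, i.e. bits [log₂N-1, 0] when N is a power of two (the usual case for cache line counts). A small check, with `counter_bit_width` as an illustrative name:

```python
import math

def counter_bit_width(num_cachelines):
    """Claim 7: bits needed to rank up to N cachelines, i.e. ceil(log2(N)).
    For a power-of-two N this matches the claimed range [log2(N)-1, 0]."""
    return math.ceil(math.log2(num_cachelines))
```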
8. A multi-core processor system, characterized in that the multi-core processor system implements data consistency by the method of any one of claims 1 to 6; the multi-core processor system comprises at least two CPUs; during implementation, a counter is added for each Cacheline of the L1 DCaches and other levels of Cache in the multi-core processor system to record the access history of cachelines containing dirty data copies; when the L1 DCache and the other levels of Cache are idle, the data copies containing dirty data in the Cache are updated to the main memory in advance.
CN202010210475.9A 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating Active CN111414318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Publications (2)

Publication Number Publication Date
CN111414318A CN111414318A (en) 2020-07-14
CN111414318B true CN111414318B (en) 2022-04-29

Family

ID=71494283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010210475.9A Active CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Country Status (1)

Country Link
CN (1) CN111414318B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866318A (en) * 2010-06-13 2010-10-20 北京北大众志微系统科技有限责任公司 Management system and method for cache replacement strategy
CN102779017A (en) * 2012-06-29 2012-11-14 华中科技大学 Control method of data caching area in solid state disc
CN103019655A (en) * 2012-11-28 2013-04-03 中国人民解放军国防科学技术大学 Internal memory copying accelerating method and device facing multi-core microprocessor
CN104615576A (en) * 2015-03-02 2015-05-13 中国人民解放军国防科学技术大学 CPU+GPU processor-oriented hybrid granularity consistency maintenance method
CN105095116A (en) * 2014-05-19 2015-11-25 华为技术有限公司 Cache replacing method, cache controller and processor
CN105740168A (en) * 2016-01-23 2016-07-06 中国人民解放军国防科学技术大学 Fault-tolerant directory cache controller
CN105740164A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device
CN106909515A (en) * 2017-02-11 2017-06-30 郑州云海信息技术有限公司 Towards multinuclear shared last level cache management method and device that mixing is hosted
CN109669881A (en) * 2018-12-11 2019-04-23 中国航空工业集团公司西安航空计算技术研究所 A kind of calculation method based on the space Cache reservation algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7596662B2 (en) * 2006-08-31 2009-09-29 Intel Corporation Selective storage of data in levels of a cache memory


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Analysis of DMA Transfer and Cache Coherence" (DMA传输与Cache一致性分析); Cao Yanrong (曹彦荣); Silicon Valley (《硅谷》); 2014-04-23 (No. 8); pp. 39-40 *
"SPACE: Sharing pattern-based directory coherence for multicore scalability"; Hongzhou Zhao; IEEE; 2017-02-13; Section 2, Section 3.3, Fig. 1 *
"Timekeeping techniques for predicting and optimizing memory behavior"; Zhigang Hu; IEEE; 2004-02-26; pp. 1-9 *
"A Cache Coherence Protocol for Multi-core Processors Oriented to Big Data Processing" (面向大数据处理的多核处理器Cache一致性协议); Lou Yunhe (娄耘赫); China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15 (No. 03); pp. I137-155 *

Also Published As

Publication number Publication date
CN111414318A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
US10078592B2 (en) Resolving multi-core shared cache access conflicts
JP5431525B2 (en) A low-cost cache coherency system for accelerators
EP0731944B1 (en) Coherency and synchronization mechanism for i/o channel controllers in a data processing system
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
JP3737834B2 (en) Dual cache snoop mechanism
US6662277B2 (en) Cache system with groups of lines and with coherency for both single lines and groups of lines
US6751705B1 (en) Cache line converter
US11500797B2 (en) Computer memory expansion device and method of operation
JPH09223118A (en) Snoop cache memory control system
JPH10154100A (en) Information processing system, device and its controlling method
JP4295814B2 (en) Multiprocessor system and method of operating multiprocessor system
US6807608B2 (en) Multiprocessor environment supporting variable-sized coherency transactions
CN115203071A (en) Application of default shared state cache coherency protocol
CN111414318B (en) Data consistency implementation method based on advanced updating
US6021466A (en) Transferring data between caches in a multiple processor environment
US20040030843A1 (en) Asynchronous non-blocking snoop invalidation
JPH0816885B2 (en) Cache memory control method
WO2022246769A1 (en) Data access method and apparatus
US11847062B2 (en) Re-fetching data for L3 cache data evictions into a last-level cache
JP3507314B2 (en) Memory controller and computer system
JP4856373B2 (en) Memory system, control method thereof, and method of maintaining data coherency
JPH08106417A (en) Memory access method and memory sharing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant