CN111414318B - Data consistency implementation method based on advanced updating - Google Patents


Info

Publication number
CN111414318B
Authority
CN
China
Prior art keywords
cacheline
counter
cache
data
cpu
Prior art date
Legal status
Active
Application number
CN202010210475.9A
Other languages
Chinese (zh)
Other versions
CN111414318A (en)
Inventor
顾晓峰
李青青
虞致国
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010210475.9A
Publication of CN111414318A
Application granted
Publication of CN111414318B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)

Abstract

The invention discloses a method for realizing data consistency based on advance updating, belonging to the technical field of integrated circuits. In the method, a counter is added for each Cacheline of every L1DCache and of the other levels of Cache, and the access history of cachelines containing dirty data copies is recorded, so that data copies containing dirty data in the caches are updated to the next-level memory in advance while the memories are idle. The Cache therefore no longer needs to be refreshed immediately before a DMA transfer, which alleviates the delay caused by the Cache refresh operation before DMA data transfer, makes full use of idle memory cycles, and improves the efficiency of the DMA transfer system.

Description

Data consistency implementation method based on advanced updating
Technical Field
The invention relates to a method for realizing data consistency based on advanced updating, belonging to the technical field of integrated circuits.
Background
At present, most mainstream processors adopt a hierarchical storage system, that is, a multi-level Cache (cache memory) is inserted between the processor and the main memory (hereinafter referred to as "main memory") to make up for the performance gap between the CPU and the main memory.
Partial copies of main-memory data are stored in the Cache, and two write strategies are generally adopted to maintain data consistency across the multi-level Cache: the write-back method and the write-through method. The former writes a dirty data copy back to main memory only when the Cacheline containing the dirty data is replaced or invalidated. This strategy reduces the number of main-memory accesses and improves system efficiency, but increases the difficulty of maintaining Cache consistency. The write-through method updates the data in main memory whenever the CPU writes to the Cache. Although this strategy effectively guarantees Cache consistency, it increases the volume of data transferred on the bus, and the long latency of main-memory write operations affects overall system performance. Therefore, modern processors mostly adopt the write-back method.
DMA (Direct Memory Access) is an efficient data transfer method in which a DMA controller moves data directly between I/O devices and main memory, or between peripherals, without CPU intervention. However, DMA transfers also introduce a data consistency problem, which researchers currently address at both the software and the hardware level to keep the DMA transfer, each level of Cache, and main memory consistent. Either way, the Cache must be refreshed before the DMA transfers data. Because a single DMA transfer moves a large amount of data and main-memory reads and writes have long latency, this Cache refresh operation takes a long time, so the efficiency of the DMA cannot be fully exploited.
Disclosure of Invention
In order to solve the problems that the Cache refreshing operation before DMA transmission needs to take a long time and the efficiency of the DMA cannot be fully exerted, the invention provides a data consistency implementation method based on advanced updating, and the technical scheme is as follows:
a data consistency implementation method is applied to a multi-core processor system and comprises the following steps: adding a counter for each L1DCache and cachelines of other levels of Cache in the multi-core processor system, and recording the access condition of the Cacheline containing the dirty data copy; when the L1DCache and other levels of Cache are idle, the data copy containing the dirty data in the Cache is updated to the next level of memory in advance.
Optionally, the multi-core processor system includes at least two CPUs, and updating the data copies containing dirty data in the L1DCache and the other levels of Cache to the next-layer memory in advance when they are idle includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, comparing counters corresponding to the cachelines, and requesting the next-level memory to actively write back the Cacheline with the maximum counter value; meanwhile, if another Cache at the same level initiates an access request to a next-level memory and the Cache is not actively written back, the next-level memory preferentially processes the access request of another Cache at the same level, wherein a certain Cache refers to any L1DCache or other caches at all levels in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory; if other caches at the same level also contain the data copy of the Cacheline, the dirty bit of the corresponding Cacheline in those caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the first CPU receives an access request of the DMA, starts to refresh the Cache, waits for the first CPU to refresh the corresponding dirty data copy into the main memory, and returns a response;
and 6, receiving the response information sent by the first CPU by the DMA, and starting to transmit data.
Optionally, in step 5, before the DMA initiates the access request, the first CPU writes back the partially dirty data copy to the main memory in advance.
Optionally, in step 1:
when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
Optionally, in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency with the next-level storage indicates that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged.
Optionally, in step 1:
when a read hit occurs for the first CPU:
if dirty data is contained in the Cacheline, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value remains unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
Optionally, in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another peer Cache and that Cacheline contains dirty data: if the multi-core processor system can share data copies containing dirty data, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system cannot share copies containing dirty data, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines remain unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines containing dirty data remain unchanged.
Optionally, when a Cacheline containing dirty data in any L1DCache or other level of Cache in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; counters of other cachelines containing dirty data whose values are greater than the original value of the cleared counter are decremented by 1, and the values of the other counters remain unchanged.
Optionally, the maximum value recordable by the counter added for each Cacheline of every L1DCache and of the other levels of Cache in the multi-core processor system is the number N of cachelines of the current Cache, and the bit width of the counter is [log₂N-1, 0].
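As an illustrative check of this sizing rule (a sketch only; the function name is ours, not the patent's, and N is assumed to be a power of two), the counter needs log₂N bits, indexed [log₂N-1 : 0]:

```python
import math

def counter_bit_width(n_cachelines: int) -> int:
    """Bits needed for a counter whose maximum value is N: log2(N) bits,
    indexed [log2(N) - 1 : 0]."""
    if n_cachelines <= 0 or n_cachelines & (n_cachelines - 1):
        raise ValueError("N is assumed to be a power of two")
    return int(math.log2(n_cachelines))

print(counter_bit_width(256))  # a 256-line cache needs 8-bit counters
```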
The invention also provides a multi-core processor system which adopts the method to realize data consistency, the multi-core processor system comprises at least two CPUs, in the realization process, a counter is added for each L1DCache and Cacheline of other caches at all levels in the multi-core processor system, and the access condition of the Cacheline containing the dirty data copy is recorded; when the L1DCache and other levels of Cache are idle, updating the data copy containing the dirty data in the Cache to the next layer of memory in advance.
The invention also provides the data consistency implementation method and/or the application of the multi-core processor system in the technical field of integrated circuits.
The invention has the beneficial effects that:
according to the invention, a counter is additionally arranged for each L1DCache and the cachelines of other caches at all levels, and the access condition of the Cacheline containing the dirty data copy is recorded, so that the data copy containing the dirty data in the caches is updated to the main memory in advance when the memory is idle, the delay problem caused by Cache refreshing operation before data transmission by DMA is relieved, the memory is fully called, and the efficiency of a DMA transmission system is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of the steps described in the present invention.
Fig. 2 is a diagram of a processor system architecture for an embodiment.
FIG. 3 is a state transition diagram of the Cache coherence protocol used in the embodiment.
FIG. 4 is a flow chart of a CPU request write operation.
FIG. 5 is a flow chart of a CPU request read operation.
FIG. 6 is a flow chart of the CPU0 proactive write back conflicting with a CPU1 write operation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Introduction of basic terms:
ICache, instruction cache.
DCache, data cache.
Cacheline, cache line, the basic unit of storage and transfer within a Cache.
Cache, cache memory. A Cache is generally divided into a number of sets, each set composed of several cachelines; in a multi-level storage system the caches are distinguished by level, denoted L1, L2, …, such as L1DCache and L2 Cache.
The first embodiment is as follows:
the embodiment provides a data consistency implementation method based on advanced updating, which is applied to a multi-core processor system, and in the implementation process, a counter is additionally arranged for each L1DCache and the cachelines of other caches at all levels, the access condition of the Cacheline containing dirty data copies is recorded, the copies containing dirty data are written back to a main memory in advance, the problem of delay caused by Cache refreshing operation before DMA data transmission is effectively solved, and the efficiency of a DMA transmission system is improved.
The maximum value recordable by the added counter is the number N of cachelines of the current Cache, i.e., the bit width of the counter is [log₂N-1, 0]. The multi-core processor system includes at least two CPUs.
referring to fig. 1, the method includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system.
Specifically, when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
When the first CPU makes a write hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged.
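The write-side counter rules above (write miss, write hit on a clean line, write hit on a dirty line) can be sketched as follows, modelling the per-Cacheline counters as a dictionary from cacheline id to counter value for dirty lines only; the names and the dictionary representation are illustrative assumptions, not from the patent:

```python
def update_counters_on_write(counters, line, was_dirty):
    """Counter update after a CPU write.

    counters:  dict mapping cacheline id -> counter value (dirty lines only).
    line:      the cacheline written.
    was_dirty: True if the line already held dirty data before this write.
    """
    old = counters.get(line, 0)   # original counter of the hit line (0 if it was clean)
    for other in counters:
        if other == line:
            continue
        if not was_dirty:
            counters[other] += 1  # write miss / clean-line hit: all other dirty counters +1
        elif counters[other] < old:
            counters[other] += 1  # dirty-line hit: only counters below the old value +1
    counters[line] = 1            # the freshly written line always restarts at 1
    return counters
```

Note that the dirty-line rule keeps the counters a permutation of 1..k, so the write-back ordering stays dense.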
When the first CPU makes a read hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if dirty data is contained in the Cacheline, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value remains unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
When a read miss occurs to the first CPU, this copy of the data may be present in the other L1DCache or only in main memory.
When the data copy exists in another peer Cache and that Cacheline contains dirty data: if the multi-core processor system can share data copies containing dirty data, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system cannot share copies containing dirty data, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines remain unchanged.
If the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU finishes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the values of the counters of the other cachelines containing dirty data remain unchanged.
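The read-hit rule above, which moves a read dirty line one rank down in the write-back order while leaving counters of 2 or less untouched, can be sketched as (names are illustrative, not from the patent):

```python
def update_counters_on_read_hit(counters, line):
    """Counter update after a read hit on a dirty Cacheline.

    counters: dict mapping cacheline id -> counter value (dirty lines only).
    A recently read line's counter drops by 1, and the line ranked just
    below it moves up, so actively read dirty lines are written back later.
    """
    old = counters[line]
    if old <= 2:
        return counters            # counters <= 2 are left unchanged by reads
    for other, value in counters.items():
        if other != line and value == old - 1:
            counters[other] += 1   # the line ranked just below moves up one rank
            break
    counters[line] = old - 1       # the read line moves one rank down
    return counters
```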
Step 2, when a certain Cache is idle, that is, it has no access request, the counters corresponding to its cachelines are compared, and active write-back of the Cacheline with the maximum counter value to the next-level memory is requested; meanwhile, if another Cache at the same level initiates an access request to the next-level memory while this Cache has not yet been actively written back, the next-level memory processes the access request of the other same-level Cache first. Here "a certain Cache" refers to any L1DCache or any other level of Cache in the multi-core processor system.
And step 3, the local Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory. If other caches at the same level also contain a data copy of the Cacheline with the largest counter value, the dirty bit of the corresponding Cacheline in those caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol.
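Steps 2 and 3, together with the counter-clearing rule for written-back cachelines, can be sketched as follows (a hypothetical model, with the counters again held in a dictionary for dirty lines only):

```python
def pick_line_for_proactive_writeback(counters):
    """Step 2: when a Cache is idle, the dirty Cacheline with the largest
    counter value is the candidate for active write-back."""
    return max(counters, key=counters.get) if counters else None

def after_writeback(counters, line):
    """Step 3 aftermath: once the line has been written back it is clean, so
    its counter is cleared, and counters larger than its old value are
    decremented so the remaining dirty lines keep a dense ranking."""
    old = counters.pop(line)
    for other in counters:
        if counters[other] > old:
            counters[other] -= 1
    return counters
```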
And 4, the DMA initiates an access request.
And 5, the first CPU receives the access request of the DMA, starts to refresh the Cache, waits for the first CPU to refresh the corresponding dirty data copy into the main memory, and returns a response.
Before the DMA initiates an access request, the first CPU writes back part of the dirty data copy to the main memory in advance, so that the delay caused by Cache refreshing operation is effectively reduced.
And 6, receiving the response information sent by the first CPU by the DMA, and starting to transmit data.
Example two:
the embodiment provides an application description of the data consistency implementation method based on the early update in the first embodiment in practice, referring to fig. 2, which is specifically as follows:
in this embodiment, the hardware device includes a multi-core processor system, which includes a CPU0, a CPU1, a second level shared Cache (L2 Cache), a Bus (Bus), a main memory (Mem), and an interconnect structure. Each CPU employs a Harvard architecture, including a 32kB instruction cache (ICache) and data cache (DCache).
The L1DCache adopts a 4-way set-associative organization with 64 sets and a Cacheline size of 128 bytes, so the maximum value N recordable by the counters corresponding to the dirty cachelines of the L1DCache is 256.
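The stated figures are mutually consistent, as a quick arithmetic check shows: 4 ways × 64 sets gives the 256 cachelines that bound the counter value N, and 256 lines × 128 bytes gives the 32 kB data capacity mentioned above:

```python
# Parameters of the L1DCache in this embodiment.
WAYS, SETS, LINE_BYTES = 4, 64, 128

n_cachelines = WAYS * SETS                        # cachelines per L1DCache
capacity_kb = n_cachelines * LINE_BYTES // 1024   # data capacity in kB

print(n_cachelines, capacity_kb)  # 256 cachelines, 32 kB
```

With N = 256, each counter is 8 bits wide ([log₂256-1, 0] = [7, 0]).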
The embodiment is based on a write-back method and a write invalidation policy, and adopts an MSI Protocol (Modified Shared Invalid Protocol) to maintain consistency, and the states of the MSI Protocol are described as follows:
m (modified): the current data copy is modified, is the current latest data in the processor system, is inconsistent with the data copy in the memory, only has a unique copy in the current Cache, and needs to be written back to the memory when replacement occurs;
s (shared): the current data copy is in a shared state, is consistent with the data copy in the memory, possibly exists in a plurality of caches at the same time, and does not need to be written back to the memory when replacement or rewriting occurs;
i (Invalid): indicating that the current copy of data is invalid.
The state transitions of the MSI coherency protocol employed in the present application are shown in FIG. 3.
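A minimal transition table for the MSI protocol of FIG. 3, as seen from one cache and assuming the write-invalidate variant described above, might look as follows; the event names are our own shorthand, not the patent's:

```python
# Sketch of MSI transitions for a single cache's copy of a line.
MSI_NEXT = {
    ("I", "local_read"):   "S",   # read miss: fetch a shared copy
    ("I", "local_write"):  "M",   # write miss: fetch and modify
    ("S", "local_write"):  "M",   # upgrade; other shared copies are invalidated
    ("S", "remote_write"): "I",   # another CPU writes: invalidate our copy
    ("M", "remote_read"):  "S",   # another CPU reads: write back, downgrade to S
    ("M", "remote_write"): "I",   # another CPU writes: write back, invalidate
}

def next_state(state, event):
    # Unlisted (state, event) pairs leave the state unchanged,
    # e.g. a local read or write hit in M stays in M.
    return MSI_NEXT.get((state, event), state)
```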
Referring to fig. 4, the CPU0 requests a write operation to a Cacheline:
when a write miss occurs to the CPU0 and the Cacheline is stored in L1DCache in the CPU1, then its state is M. If the Cacheline state is M, the CPU1 writes the Cacheline back to the next-level memory, and modifies the Cacheline state in the CPU1 to I, the counter corresponding to the Cacheline before is cleared, if the counter value corresponding to the Cacheline in the other M state is greater than the counter value corresponding to the Cacheline originally, the counter value is decremented by 1, and the counter values in the other M states remain unchanged; meanwhile, the CPU0 applies for a Cacheline from the local L1DCache, and after the write operation is completed, updates the Cacheline state to M, sets the value of the counter corresponding to the Cacheline state to 1, and adds 1 to the value of the counter corresponding to the Cacheline in the other M state in the local L1 DCache.
When the CPU0 has a write miss and the Cacheline exists in the L2Cache or the main memory, the CPU0 applies for a Cacheline in the local L1DCache; after the write operation completes, it updates the Cacheline state to M, sets the corresponding counter to 1, and increments by 1 the counters corresponding to the other M-state cachelines in the local L1DCache.
When a write hit occurs in the CPU0, the Cacheline may be in the M or S state. If the Cacheline state is M, the CPU0 writes to the Cacheline directly and sets the corresponding counter to 1; counters whose values are smaller than the original value of the counter corresponding to the write-hit Cacheline are incremented by 1, and the values of the other counters remain unchanged. If the Cacheline state is S, the CPU0 writes to the Cacheline directly, updates its state to M, sets the corresponding counter to 1, and increments the values of the other counters by 1; at the same time the CPU1 invalidates the Cacheline, updating the state of the corresponding Cacheline in its L1DCache to I.
Referring to fig. 5, CPU0 requests a read operation for a Cacheline:
when a read hit occurs to the CPU0, the Cacheline may be in M or S state. If the Cacheline state is M, after the CPU0 reads the Cacheline, the Cacheline state remains unchanged, the counter corresponding to the Cacheline is decremented by 1, the counter value that is 1 less than the original value of the counter is incremented by 1, and the values of the other counters remain unchanged. When the value of the counter corresponding to Cacheline is less than or equal to 2, if the CPU0 requests to read the Cacheline, the value of the counter remains unchanged. If the Cacheline state is S, the CPU0 only performs a read operation without changing any Cacheline state and corresponding counter value.
When a read miss occurs in the CPU0 and the Cacheline is stored in the L1DCache of the CPU1, its state there is M. The CPU1 writes the Cacheline back to the next-level memory and updates its state to S; the counter previously corresponding to that Cacheline is cleared, counters of other M-state cachelines whose values are greater than the cleared counter's original value are decremented by 1, and the remaining counters stay unchanged. The CPU0 applies for a Cacheline in the local L1DCache and, after the read operation completes, updates its state to S without changing the state of any other Cacheline or the value of any counter.
When the CPU0 has a read miss and the Cacheline exists in the L2Cache or the main memory, the CPU0 applies for a Cacheline from the local L1DCache, and updates the state of the Cacheline to S after loading data, without changing the states of any other cachelines and the values of the corresponding counters.
Referring to fig. 6, when the L1DCache of the CPU0 is idle, that is, there is no access request, the counters corresponding to the cachelines are compared, and active write-back of the Cacheline with the largest counter value is requested. Meanwhile, if the L1DCache of the CPU1 initiates an access request to the L2Cache and does not actively write back, the L2Cache preferentially processes the access request of the CPU 1.
In the method for realizing data consistency based on advance updating, when several caches at the same level in a multi-core system are idle, a polling (round-robin) arbitration mechanism is adopted to avoid the deadlock that could be caused by several caches applying for active write-back simultaneously, and the dirty data copies are written back to the next-level memory in turn.
In the second embodiment of the present application, when two L1DCaches apply for active write-back simultaneously, the write-back responses are returned to the L1DCaches of the CPU0 and the CPU1 in turn according to the polling arbitration mechanism, and the CPU0 and the CPU1 in turn write back the M-state Cacheline with the largest counter value in the local L1DCache.
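The polling (round-robin) arbitration among write-back requesters described above can be sketched as follows; this is an illustrative model only, and the class and method names are our assumptions:

```python
from collections import deque

class RoundRobinArbiter:
    """Polling arbitration among caches applying for active write-back:
    grants rotate through the requesters so no cache can starve the others,
    avoiding the deadlock scenario described above."""

    def __init__(self, requesters):
        self._order = deque(requesters)

    def grant(self, requesting):
        """Return the next requester in rotating order that is asking now,
        or None if no one is requesting."""
        for _ in range(len(self._order)):
            head = self._order[0]
            self._order.rotate(-1)   # advance the rotation for fairness
            if head in requesting:
                return head
        return None
```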
In the method for realizing data consistency based on advanced updating, in a multi-core system, when an L1DCache of a CPU0 is idle, a Cache line with the maximum counter value is requested to be actively written back to an L2Cache, and if an access request is initiated to the L2Cache by the CPU1 and the Cache is not actively written back, the access request of the CPU1 is preferentially processed by the L2 Cache.
According to the invention, a counter is added for each Cacheline of every L1DCache and of the other levels of Cache, and the access history of cachelines containing dirty data copies is recorded, so that when the L1DCache and the other levels of Cache are idle, part of the data copies containing dirty data in the caches are updated to the main memory in advance, instead of the Cache only beginning to be refreshed just before DMA data transfer. This alleviates the delay caused by the Cache refresh operation before DMA data transfer, makes full use of idle memory cycles, and improves the efficiency of the DMA transfer system.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A data consistency implementation method is applied to a multi-core processor system and comprises the following steps: adding a counter for each L1DCache and cachelines of other levels of Cache in the multi-core processor system, and recording the access condition of the Cacheline containing the dirty data copy; when the L1DCache and other levels of Cache are idle, updating the data copy containing the dirty data in the Cache to the next level of memory in advance;
the multi-core processor system comprises at least two CPUs, and when the L1DCache and other levels of Cache are idle, the data copy containing dirty data in the L1DCache and other levels of Cache are updated to the next level of memory in advance, wherein the steps comprise:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, comparing counters corresponding to the cachelines, and requesting the next-level memory to actively write back the Cacheline with the maximum counter value; meanwhile, if another Cache at the same level initiates an access request to a next-level memory and the Cache is not actively written back, the next-level memory preferentially processes the access request of another Cache at the same level, wherein a certain Cache refers to any L1DCache or other caches at all levels in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the maximum counter value back to the next-level memory; if other caches at the same level also contain the data copy, the dirty bit of the corresponding Cacheline in those same-level caches is cleared to 0, and its state makes the corresponding transition according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the CPU receives the access request of the DMA, starts to refresh the Cache, waits until the corresponding data copies containing dirty data are all refreshed into the main memory, and returns a response;
step 6, DMA receives the response information sent by the CPU, and starts to transmit data;
in step 5, before the DMA initiates the access request, the first CPU writes back the partial dirty data copy to the main memory in advance.
2. The method according to claim 1, wherein in step 1:
when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
3. The method according to claim 1, wherein in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency with the next-level storage indicates that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to be 1, if the numerical values of other counters are smaller than the original values of the counters corresponding to the Cacheline which are written and hit, the numerical values of the other counters are added to be 1, and the numerical values of the other counters are kept unchanged.
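Both write-hit cases of claim 3 can be sketched in one function (the name `on_write_hit` and the dict model are assumptions; a counter of 0 denotes a clean copy):

```python
def on_write_hit(counters, line):
    """Claim 3 (write hit). If the hit line was clean, behave as in the
    write-miss case; if it already held dirty data, only the counters
    smaller than the hit line's original value are incremented, which
    preserves the relative order of the other dirty lines."""
    old = counters[line]
    if old == 0:                          # copy was consistent with next level
        for other, c in counters.items():
            if other != line and c > 0:
                counters[other] = c + 1
    else:                                 # line already contained dirty data
        for other, c in counters.items():
            if other != line and 0 < c < old:
                counters[other] = c + 1
    counters[line] = 1
```

Note that in the dirty-hit branch the dirty counters remain a dense ranking 1..k: the hit line moves to rank 1 and only the lines it overtakes shift up.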
4. The method according to claim 1, wherein in step 1:
when a read hit occurs in the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than the original value of that counter is incremented by 1, and the values of the other counters are kept unchanged; when the value of the counter corresponding to the Cacheline is less than or equal to 2, a CPU read of the Cacheline leaves the counter values unchanged;
if the data copy is consistent with the next-level storage, the counter values corresponding to all cachelines are unchanged after the first CPU completes the read operation.
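The read-hit rule of claim 4 amounts to swapping the hit line one step toward the recently-used end of the ranking. A sketch under the same assumed dict model (`on_read_hit` is an illustrative name; counter 0 denotes a clean copy, so the `c <= 2` guard covers both the clean case and the near-hot dirty cases):

```python
def on_read_hit(counters, line):
    """Claim 4 (read hit): a dirty line's counter drops by 1 and the line
    whose counter was exactly one below swaps up; counters <= 2 (clean,
    or already near the hot end) are left untouched."""
    c = counters[line]
    if c <= 2:
        return
    for other, oc in counters.items():
        if other != line and oc == c - 1:
            counters[other] = oc + 1   # the neighbour swaps up one rank
            break
    counters[line] = c - 1
```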
5. The method according to claim 1, wherein in step 1:
when a read miss occurs in the first CPU:
if the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows a data copy containing dirty data to be shared, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow a copy containing dirty data to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counters of the other cachelines are kept unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counters of the other cachelines containing dirty data are kept unchanged.
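The read-miss cases of claim 5 reduce to one question: does the filled line arrive dirty (shared from a peer's dirty copy) or clean? A sketch, with the hypothetical flags `source_dirty` (the peer's copy was dirty) and `sharable` (the protocol permits sharing a dirty copy):

```python
def on_read_miss(counters, line, source_dirty, sharable):
    """Claim 5 (read miss): only when a dirty peer copy is shared does the
    local line itself become dirty and rank as most recently written;
    every other fill is clean and leaves the other counters alone."""
    if source_dirty and sharable:
        for other, c in counters.items():
            if other != line and c > 0:
                counters[other] = c + 1
        counters[line] = 1    # local copy now also holds dirty data
    else:
        counters[line] = 0    # clean fill; initial counter value
```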
6. The method of claim 1, wherein: when a Cacheline containing dirty data in any L1 DCache or any other level of Cache in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; the counters of the other cachelines containing dirty data whose values are greater than the original value of the cleared counter are decremented by 1, and the values of the remaining counters are kept unchanged.
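Claim 6 keeps the dirty-line ranking dense when a line leaves it. A sketch under the same assumed model (`on_writeback_or_invalidate` is an illustrative name):

```python
def on_writeback_or_invalidate(counters, line):
    """Claim 6: when a dirty cacheline is written back or invalidated, its
    counter is cleared, and every counter larger than its original value
    shifts down by 1 so the remaining dirty lines stay ranked 1..k."""
    old = counters[line]
    counters[line] = 0
    for other, c in counters.items():
        if other != line and c > old:
            counters[other] = c - 1
```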
7. The method of claim 1, wherein the maximum value recordable by the counter added for each Cacheline of the L1 DCaches and other levels of Cache in the multi-core processor system is the number N of cachelines in the current Cache, and the bit width of the counter is [log₂N-1, 0].
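The bit width in claim 7 follows from the counter's maximum value: distinguishing N ranks takes ⌈log₂N⌉ bits, i.e. bits [log₂N-1, 0] when N is a power of two (the usual case for cache line counts). A small check, with `counter_bit_width` as an illustrative name:

```python
import math

def counter_bit_width(num_cachelines):
    """Claim 7: bits needed to rank up to N cachelines, i.e. ceil(log2(N)).
    For a power-of-two N this matches the claimed range [log2(N)-1, 0]."""
    return math.ceil(math.log2(num_cachelines))
```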
8. A multi-core processor system, characterized in that the multi-core processor system implements data consistency by the method of any one of claims 1 to 6; the multi-core processor system comprises at least two CPUs; during implementation, a counter is added for each Cacheline of the L1 DCaches and other levels of Cache in the multi-core processor system to record the access history of cachelines containing dirty data copies; when the L1 DCache and the other levels of Cache are idle, the data copies containing dirty data in the Cache are updated to the main memory in advance.
CN202010210475.9A 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating Active CN111414318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Publications (2)

Publication Number Publication Date
CN111414318A CN111414318A (en) 2020-07-14
CN111414318B true CN111414318B (en) 2022-04-29

Family

ID=71494283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010210475.9A Active CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Country Status (1)

Country Link
CN (1) CN111414318B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866318A (en) * 2010-06-13 2010-10-20 北京北大众志微系统科技有限责任公司 Management system and method for cache replacement strategy
CN102779017A (en) * 2012-06-29 2012-11-14 华中科技大学 Control method of data caching area in solid state disc
CN103019655A (en) * 2012-11-28 2013-04-03 中国人民解放军国防科学技术大学 Internal memory copying accelerating method and device facing multi-core microprocessor
CN104615576A (en) * 2015-03-02 2015-05-13 中国人民解放军国防科学技术大学 CPU+GPU processor-oriented hybrid granularity consistency maintenance method
CN105095116A (en) * 2014-05-19 2015-11-25 华为技术有限公司 Cache replacing method, cache controller and processor
CN105740168A (en) * 2016-01-23 2016-07-06 中国人民解放军国防科学技术大学 Fault-tolerant directory cache controller
CN105740164A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device
CN106909515A (en) * 2017-02-11 2017-06-30 郑州云海信息技术有限公司 Towards multinuclear shared last level cache management method and device that mixing is hosted
CN109669881A (en) * 2018-12-11 2019-04-23 中国航空工业集团公司西安航空计算技术研究所 A kind of calculation method based on the space Cache reservation algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7596662B2 (en) * 2006-08-31 2009-09-29 Intel Corporation Selective storage of data in levels of a cache memory


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Analysis of DMA Transfer and Cache Coherence" (DMA传输与Cache一致性分析); Cao Yanrong (曹彦荣); Silicon Valley (《硅谷》); 2014-04-23 (No. 8); pp. 39-40 *
"SPACE: Sharing pattern-based directory coherence for multicore scalability"; Hongzhou Zhao; IEEE; 2017-02-13; Section 2, Section 3.3, Fig. 1 *
"Timekeeping techniques for predicting and optimizing memory behavior"; Zhigang Hu; IEEE; 2004-02-26; pp. 1-9 *
"A Cache Coherence Protocol for Multi-core Processors Oriented to Big Data Processing" (面向大数据处理的多核处理器Cache一致性协议); Lou Yunhe (娄耘赫); China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15 (No. 03); pp. I137-155 *

Also Published As

Publication number Publication date
CN111414318A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
US10078592B2 (en) Resolving multi-core shared cache access conflicts
JP5431525B2 (en) A low-cost cache coherency system for accelerators
EP0731944B1 (en) Coherency and synchronization mechanism for i/o channel controllers in a data processing system
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
JP3737834B2 (en) Dual cache snoop mechanism
US6662277B2 (en) Cache system with groups of lines and with coherency for both single lines and groups of lines
US6751705B1 (en) Cache line converter
US11500797B2 (en) Computer memory expansion device and method of operation
JPH09223118A (en) Snoop cache memory control system
JPH10154100A (en) Information processing system, device and its controlling method
JP4295814B2 (en) Multiprocessor system and method of operating multiprocessor system
US6807608B2 (en) Multiprocessor environment supporting variable-sized coherency transactions
CN115203071A (en) Application of default shared state cache coherency protocol
CN111414318B (en) Data consistency implementation method based on advanced updating
US6021466A (en) Transferring data between caches in a multiple processor environment
US20040030843A1 (en) Asynchronous non-blocking snoop invalidation
JPH0816885B2 (en) Cache memory control method
WO2022246769A1 (en) Data access method and apparatus
US11847062B2 (en) Re-fetching data for L3 cache data evictions into a last-level cache
JP3507314B2 (en) Memory controller and computer system
JP4856373B2 (en) Memory system, control method thereof, and method of maintaining data coherency
JPH08106417A (en) Memory access method and memory sharing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant