CN111414318A - Data consistency implementation method based on early updating


Info

Publication number
CN111414318A
Authority
CN
China
Prior art keywords
cacheline
data
counter
cpu
cache
Prior art date
Legal status
Granted
Application number
CN202010210475.9A
Other languages
Chinese (zh)
Other versions
CN111414318B (en)
Inventor
顾晓峰
李青青
虞致国
魏敬和
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010210475.9A priority Critical patent/CN111414318B/en
Publication of CN111414318A publication Critical patent/CN111414318A/en
Application granted granted Critical
Publication of CN111414318B publication Critical patent/CN111414318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The method adds a counter to each cacheline of the L1 DCache and of the caches at every other level, and records the access history of Cachelines holding dirty data copies, so that the dirty data copies in the caches are updated to the next level of memory in advance, while the memory is idle, instead of the caches being flushed only when DMA is about to transfer data. This alleviates the latency caused by the Cache flush operation that precedes a DMA transfer, keeps the memory fully utilized, and improves the efficiency of the DMA transfer system.

Description

Data consistency implementation method based on early updating
Technical Field
The invention relates to a method for realizing data consistency based on early updating, and belongs to the technical field of integrated circuits.
Background
At present, mainstream processors mostly adopt a hierarchical storage system: a multi-level Cache (cache memory) is inserted between the processor and the main memory (hereinafter "main memory") to bridge the performance gap between the CPU and the main memory.
The Cache stores partial copies of main-memory data, and two write strategies are generally used to maintain data consistency across the multi-level Cache: the write-back method and the write-through method. The former writes a dirty data copy back to the main memory only when the Cacheline containing the dirty copy is replaced or invalidated. This strategy reduces the number of main-memory accesses and improves system efficiency, but makes Cache consistency harder to maintain. The write-through method updates the data in the main memory whenever the CPU writes to the Cache. Although this strategy effectively guarantees Cache consistency, it increases the amount of data transferred over the bus, and the long latency of main-memory write operations degrades overall system performance. Modern processors therefore mostly adopt the write-back method.
DMA (Direct Memory Access) is an efficient data transfer method: a DMA controller moves data directly between an I/O device and the main memory, or between peripherals, without CPU intervention. However, DMA transfers also introduce a data consistency problem, which researchers currently address in both software and hardware, between the DMA transfer on one side and the Caches at every level and the main memory on the other. Whether the solution is in software or in hardware, the Cache must be flushed before DMA transfers data. Because a single DMA transfer moves a large amount of data and main-memory reads and writes have high latency, the Cache flush before a DMA transfer takes a long time, so the efficiency of DMA cannot be fully exploited.
Disclosure of Invention
To solve the problems that the Cache flush before a DMA transfer takes a long time and the efficiency of DMA cannot be fully exploited, the invention provides a data consistency implementation method based on early updating. The technical scheme is as follows:
a method for realizing data consistency, applied to a multi-core processor system, comprises adding a counter to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, recording the access history of Cachelines containing dirty data copies, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the other caches are idle.
Optionally, the multi-core processor system includes at least two CPUs, and updating the dirty data copies in the Cache to the next-level memory in advance when the L1 DCache and the caches at other levels are idle includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other caches at the same level also hold a data copy of that Cacheline, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the first CPU receives the DMA access request and begins flushing the Cache; after the first CPU has flushed the corresponding dirty data copies to the main memory, a response is returned;
and step 6, the DMA receives the response information sent by the first CPU and begins transferring data.
Optionally, in step 5, before the DMA initiates the access request, the first CPU writes back the partially dirty data copy to the main memory in advance.
Optionally, in step 1:
when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1.
Optionally, in step 1:
when a write hit occurs in the first CPU, the data copy in the Cacheline can be in one of two states: consistent with the next-level storage, or inconsistent with it, where inconsistency means the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
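The write-side counter bookkeeping above (write miss, write hit on a clean line, write hit on a dirty line) can be sketched as follows. This is an illustrative Python model, not the patent's hardware; the function name `on_write` and the dictionary representation (mapping a dirty line's id to its counter, with absence meaning "no dirty copy") are assumptions made for the example:

```python
def on_write(counters, line, was_dirty):
    """Update per-Cacheline counters after a CPU write.

    `counters` maps the id of each Cacheline holding dirty data to its
    counter value; a line absent from the map holds no dirty copy.
    """
    old = counters.get(line, 0)
    if not was_dirty:
        # Write miss, or write hit on a line consistent with the next
        # level: every other dirty line's counter is incremented by 1.
        for other in counters:
            if other != line:
                counters[other] += 1
    else:
        # Write hit on an already-dirty line: only counters smaller than
        # the hit line's previous counter value are incremented by 1.
        for other in counters:
            if other != line and counters[other] < old:
                counters[other] += 1
    counters[line] = 1  # the written line restarts at 1 in every case
    return counters
```

For example, a write miss on line C while A and B are dirty ages A and B by one; a subsequent hit on dirty B (old counter 3) bumps only the counters below 3.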
Optionally, in step 1:
when a read hit occurs for the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
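The read-hit rule for a dirty line amounts to swapping the line one position down in the write-back ranking: its counter drops by 1 and a counter sitting exactly 1 below the old value moves up, while counters at 2 or below are left alone. A minimal sketch, under the same illustrative dictionary model as before (function name assumed):

```python
def on_read_hit_dirty(counters, line):
    """Counter update for a read hit on a Cacheline holding dirty data."""
    old = counters[line]
    if old <= 2:
        # Counters at 2 or below are not changed by a read hit.
        return counters
    for other in counters:
        # A counter exactly 1 below the old value trades places upward.
        if other != line and counters[other] == old - 1:
            counters[other] += 1
    counters[line] = old - 1
    return counters
```

The net effect is that frequently re-read dirty lines drift toward low counter values and are therefore written back later, while cold dirty lines rise toward the top of the write-back queue.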
Optionally, in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows sharing of data copies containing dirty data, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow sharing of dirty copies, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines stay unchanged;
if the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines containing dirty data stay unchanged.
Optionally, when a Cacheline containing dirty data in any L1 DCache or other cache at any level in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; any counter of a dirty Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged.
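The clear-on-write-back rule keeps the remaining counters densely ranked: the cleared line leaves the ranking and every counter above its old value closes the gap by 1. An illustrative sketch in the same dictionary model (names assumed, not from the patent):

```python
def on_writeback_or_invalidate(counters, line):
    """A dirty Cacheline is written back or invalidated: its counter is
    cleared, and counters above its old value each drop by 1."""
    old = counters.pop(line, 0)  # remove the line from the dirty set
    for other in counters:
        if counters[other] > old:
            counters[other] -= 1
    return counters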
Optionally, the maximum value recordable by the counter added for each cacheline of the L1 DCache and of the other caches in the multi-core processor system is the number N of Cachelines of the current Cache, and the bit width of the counter is [log2(N)−1 : 0].
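Since the counter's maximum value is the number of Cachelines N, a counter of log2(N) bits (bit indices [log2(N)−1 : 0]) suffices. A small sketch of the arithmetic, assuming N is a power of two (the helper name is illustrative):

```python
import math

def counter_bit_width(num_cachelines: int) -> int:
    """Bit width of a counter whose maximum value is the Cacheline
    count N, i.e. bits [log2(N)-1 : 0]."""
    assert num_cachelines > 0 and num_cachelines & (num_cachelines - 1) == 0, \
        "Cacheline count is assumed to be a power of two"
    return int(math.log2(num_cachelines))

# e.g. a Cache with 256 lines needs 8-bit counters (bits [7:0])
assert counter_bit_width(256) == 8
```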
The invention also provides a multi-core processor system that uses the above method to realize data consistency. The multi-core processor system includes at least two CPUs; in the implementation, a counter is added to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, the access history of Cachelines containing dirty data copies is recorded, and the dirty data copies in the caches are updated to the next-level memory in advance when the L1 DCache and the other caches are idle.
The invention also provides applications of the data consistency implementation method and/or the multi-core processor system in the technical field of integrated circuits.
The invention has the beneficial effects that:
according to the invention, a counter is additionally arranged for each L1 DCache and the cachelines of other caches at all levels, and the access condition of the Cacheline containing the dirty data copy is recorded, so that the data copy containing the dirty data in the caches is updated to the main memory in advance when the memory is idle, the delay problem caused by Cache refreshing operation before data transmission by DMA is relieved, the memory is fully called, and the efficiency of a DMA transmission system is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of the steps described in the present invention.
Fig. 2 is a diagram of a processor system architecture for an embodiment.
FIG. 3 is a state transition diagram of the Cache coherence protocol used in the embodiment.
FIG. 4 is a flow chart of a CPU request write operation.
FIG. 5 is a flow chart of a CPU request read operation.
FIG. 6 is a flow chart of the CPU0 proactive write back conflicting with a CPU1 write operation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Introduction of basic terms:
ICache: instruction cache.
DCache: data cache.
Cacheline: a cache line, the basic unit of storage and replacement in a Cache.
Cache: cache memory. A Cache is generally divided into several sets, each set consisting of several Cachelines. In a multi-level storage system the Cache is divided into levels, denoted L1, L2, …, which represent different levels of Cache, such as L1 DCache and L2 Cache.
The first embodiment is as follows:
This embodiment provides a data consistency implementation method based on early updating, applied to a multi-core processor system. In the implementation, a counter is added to each cacheline of the L1 DCache and of the caches at every other level, the access history of Cachelines containing dirty data copies is recorded, and the copies containing dirty data are written back to the main memory in advance, which effectively alleviates the latency caused by the Cache flush before DMA transfers data and improves the efficiency of the DMA transfer system.
The maximum value recordable by the added counter is the number N of Cachelines of the current Cache, i.e. the bit width of the counter is [log2(N)−1 : 0]. The multi-core processor system includes at least two CPUs.
referring to fig. 1, the method includes:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system.
Specifically, when the first CPU has a write miss, after the first CPU finishes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to other cachelines containing dirty data are incremented by 1.
When the first CPU makes a write hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU finishes the write operation, setting a counter corresponding to the Cacheline to be 1, and adding 1 to counters corresponding to other cachelines containing dirty data;
if the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
When the first CPU makes a read hit, the data copy in Cacheline has two states: consistent with the next level of storage, and inconsistent with the next level of storage (i.e., containing dirty data):
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if the CPU requests to read that Cacheline;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
When a read miss occurs to the first CPU, this copy of the data may be present in the other L1 DCache or only in main memory.
When the data copy exists in another same-level Cache and that Cacheline contains dirty data: if the multi-core processor system allows sharing of data copies containing dirty data, the first CPU reads the data copy from the other same-level Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are incremented by 1; if the multi-core processor system does not allow sharing of dirty copies, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines stay unchanged.
If the data copy exists in another same-level Cache and is consistent with the next-level storage, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0, and the counter values of the other Cachelines containing dirty data stay unchanged.
And step 2, when a certain Cache is idle, i.e. has no pending access request, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system.
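The selection in step 2 is simply "dirty line with the largest counter wins". A minimal sketch in the same illustrative dictionary model used above (function name assumed; tie-breaking among equal counters is left unspecified here, as the patent does not fix it):

```python
def pick_line_to_write_back(counters):
    """When the Cache is idle, choose the dirty Cacheline with the
    largest counter for proactive write-back to the next-level memory.
    Returns None when no dirty line remains."""
    if not counters:
        return None
    return max(counters, key=counters.get)
```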
And step 3, the local Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory. If other caches at the same level also hold a data copy of the Cacheline with the largest counter value, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol.
And 4, the DMA initiates an access request.
And step 5, the first CPU receives the DMA access request and begins flushing the Cache; after the first CPU has flushed the corresponding dirty data copies to the main memory, a response is returned.
Because the first CPU has already written part of the dirty data copies back to the main memory before the DMA initiated the access request, the latency caused by the Cache flush operation is effectively reduced.
And step 6, the DMA receives the response information sent by the first CPU and begins transferring data.
Example two:
the embodiment provides an application description of the data consistency implementation method based on the early update in the first embodiment in practice, referring to fig. 2, which is specifically as follows:
In this embodiment, the hardware comprises a multi-core processor system including a CPU0, a CPU1, a second-level shared Cache (L2 Cache), a Bus, a main memory (Mem), and an interconnect fabric, where each CPU has a Harvard architecture including a 32 kB instruction Cache (ICache) and a data Cache (DCache).
The L1 DCache adopts a 4-way set-associative organization with 64 sets and 128-byte Cachelines, so the maximum value N recordable by the counters corresponding to the dirty Cachelines of the L1 DCache is 256.
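The figures above are easy to check: 4 ways × 64 sets gives N = 256 Cachelines, each 128 bytes, i.e. a 32 KiB DCache matching the capacity stated above, and by the bit-width rule the counters need log2(256) = 8 bits:

```python
ways, sets = 4, 64            # 4-way set-associative, 64 sets
line_size_bytes = 128
n_lines = ways * sets         # maximum counter value N
capacity_kib = n_lines * line_size_bytes // 1024

assert n_lines == 256         # N = 256, so 8-bit counters suffice
assert capacity_kib == 32     # matches the 32 kB DCache of the embodiment
```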
This embodiment is based on the write-back method and a write-invalidate policy, and adopts the MSI protocol (Modified/Shared/Invalid) to maintain consistency. Its states are described as follows:
M (Modified): the current data copy has been modified and is the most recent data in the processor system; it is inconsistent with the data copy in the memory, exists only as a unique copy in the current Cache, and must be written back to the memory when it is replaced;
S (Shared): the current data copy is in the shared state and consistent with the data copy in the memory; it may exist in several Caches at the same time and does not need to be written back to the memory when it is replaced or overwritten;
I (Invalid): the current data copy is invalid.
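The three states, and the write-invalidate edge that the examples below rely on (a remote write invalidates any local valid copy), can be sketched as follows. This models only one edge of the protocol in FIG. 3, not the full state machine, and the helper name is illustrative:

```python
from enum import Enum

class MSI(Enum):
    M = "Modified"  # unique dirty copy; written back when replaced
    S = "Shared"    # clean copy, possibly present in several Caches
    I = "Invalid"   # no usable copy

def on_remote_write(state: MSI) -> MSI:
    """Local line's transition when another CPU writes the same block:
    under write-invalidate, any valid copy (M or S) becomes Invalid."""
    return MSI.I if state in (MSI.M, MSI.S) else state
```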
The state transitions of the MSI coherency protocol employed in the present application are shown in FIG. 3.
Referring to fig. 4, the CPU0 requests a write operation to a Cacheline:
When a write miss occurs in CPU0 and the Cacheline exists in CPU1's L1 DCache in the M state, CPU1 writes the Cacheline back to the next-level memory, changes the Cacheline state in CPU1 to I, and clears the counter previously associated with that Cacheline; any counter of another M-state Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged. Meanwhile, CPU0 allocates a Cacheline in its local L1 DCache and, after the write completes, updates the Cacheline state to M, sets the corresponding counter to 1, and increments the counters corresponding to the other M-state Cachelines in the local L1 DCache by 1.
When a write miss occurs in CPU0 and the Cacheline exists in the L2 Cache or the main memory, CPU0 allocates a Cacheline in its local L1 DCache and, after the write completes, updates the Cacheline state to M, sets the corresponding counter to 1, and increments the counters corresponding to the other M-state Cachelines in the local L1 DCache by 1.
When a write hit occurs in CPU0, the Cacheline may be in the M or S state. If the Cacheline is in the M state, CPU0 writes the Cacheline directly and sets the corresponding counter to 1; any other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counter values stay unchanged.
Referring to fig. 5, CPU0 requests a read operation for a Cacheline:
When a read hit occurs in CPU0, the Cacheline may be in the M or S state. If the Cacheline state is M, after CPU0 reads the Cacheline the state stays unchanged, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than that counter's original value is incremented by 1, and the values of the other counters stay unchanged. When the value of the counter corresponding to a Cacheline is less than or equal to 2, the counter value stays unchanged if CPU0 requests to read that Cacheline. If the Cacheline state is S, CPU0 only performs the read operation, without changing any Cacheline state or counter value.
When a read miss occurs in CPU0 and the Cacheline exists in CPU1's L1 DCache in the M state, CPU1 writes the Cacheline back to the next-level memory, updates its state to S, and clears the counter previously associated with that Cacheline; any counter of another M-state Cacheline whose value is greater than the cleared counter's original value is decremented by 1, and the remaining counter values stay unchanged. CPU0 allocates a Cacheline in its local L1 DCache and, after the read completes, updates the Cacheline state to S without changing the state of any other Cacheline or the value of any counter.
When a read miss occurs in CPU0 and the Cacheline exists in the L2 Cache or the main memory, CPU0 allocates a Cacheline in its local L1 DCache and, after loading the data, updates the Cacheline state to S without changing the state of any other Cacheline or the value of any counter.
Referring to fig. 6, when the L1 DCache of CPU0 is idle, i.e. has no pending access request, the counters corresponding to its Cachelines are compared and it requests to actively write back the Cacheline with the largest counter value; meanwhile, if the L1 DCache of CPU1 initiates an access request (not an active write-back) to the L2 Cache, the L2 Cache handles CPU1's access request first.
In the data consistency implementation method based on early updating, when multiple caches at the same level in a multi-core system are idle at the same time, a polling (round-robin) arbitration mechanism is adopted to avoid deadlock caused by several caches applying for active write-back simultaneously, and the dirty data copies are written back to the next-level memory in turn.
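The polling arbitration described above can be sketched as a small round-robin arbiter that grants one pending write-back applicant at a time and rotates priority so no cache starves. The class and method names are illustrative, not the patent's design:

```python
class RoundRobinArbiter:
    """Grant active write-back requests from peer caches one at a
    time, rotating priority after each grant."""

    def __init__(self, n_requesters: int):
        self.n = n_requesters
        self.next = 0  # requester with current highest priority

    def grant(self, requests):
        """`requests` is the set of requester ids currently applying;
        returns the granted id, or None if nobody is requesting."""
        for i in range(self.n):
            cand = (self.next + i) % self.n
            if cand in requests:
                self.next = (cand + 1) % self.n  # rotate priority
                return cand
        return None
```

With two L1 DCaches both applying, the arbiter alternates grants between CPU0 and CPU1, matching the embodiment's description of responses being returned in turn.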
In the second embodiment of the present application, when two L1 DCaches apply for active write-back at the same time, responses are returned in turn, according to the polling arbitration mechanism, to the applying L1 DCaches of CPU0 and CPU1, and CPU0 and CPU1 in turn write back the M-state Cacheline with the largest counter value in their local L1 DCache.
In the data consistency implementation method based on early updating, when the L1 DCache of CPU0 in a multi-core system is idle, it requests to actively write the Cacheline with the largest counter value back to the L2 Cache; if CPU1 initiates an access request (not an active write-back) to the L2 Cache, the L2 Cache handles CPU1's access request first.
By adding a counter to each cacheline of the L1 DCache and of the caches at every other level and recording the access history of Cachelines containing dirty data copies, the invention updates part of the dirty data copies in the caches to the main memory in advance while the L1 DCache and the other caches are idle, instead of only starting to flush the caches right before DMA transfers data. This alleviates the latency caused by the Cache flush before a DMA transfer, keeps the memory fully utilized, and improves the efficiency of the DMA transfer system.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A data consistency implementation method, characterized by being applied to a multi-core processor system and comprising: adding a counter to each cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system, recording the access history of Cachelines containing dirty data copies, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the other caches are idle.
2. The method according to claim 1, wherein the multi-core processor system comprises at least two CPUs, and updating the data copies containing dirty data to the next-level memory in advance when the L1 DCache and the caches at other levels are idle comprises:
step 1, a first CPU requests to access a certain data copy, wherein the first CPU is any CPU in a multi-core processor system;
step 2, when a certain Cache is idle, the counters corresponding to its Cachelines are compared and the Cache requests to actively write the Cacheline with the largest counter value back to the next-level memory; meanwhile, if another Cache at the same level initiates an access request (not an active write-back) to the next-level memory, the next-level memory handles that access request first, wherein "a certain Cache" refers to any L1 DCache or any other cache at any level in the multi-core processor system;
step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other caches at the same level also hold the data copy, the dirty bit of the corresponding Cacheline in those caches is cleared to 0 and its state transitions according to the consistency protocol;
step 4, DMA initiates an access request;
step 5, the CPU receives the DMA access request and begins flushing the Cache; after the corresponding dirty data copies have all been flushed to the main memory, a response is returned;
and step 6, the DMA receives the response information sent by the CPU and begins transferring data.
3. The method according to claim 2, wherein in step 5, the first CPU writes back the partially dirty copy of the data to the main memory in advance before the DMA initiates the access request.
4. The method according to claim 2, wherein in step 1:
when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other Cachelines containing dirty data are incremented by 1.
5. The method according to claim 2, wherein in step 1:
when a write hit occurs for the first CPU, the data copy in the Cacheline may be in one of two states: consistent with the next-level storage, or inconsistent with the next-level storage, where inconsistency means that the data copy in the Cacheline contains dirty data:
if the data copy in the Cacheline is consistent with the next-level storage, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1, and the counters corresponding to the other cachelines containing dirty data are each incremented by 1;
if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter corresponding to the Cacheline is set to 1; every other counter whose value is smaller than the original value of the counter of the hit Cacheline is incremented by 1, and the remaining counters keep their values unchanged.
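The write-side counter updates of claims 4-5 can be sketched for a single cache. This is a hedged sketch, not the patent's implementation: `counters` and `dirty` are assumed per-Cacheline arrays of my own naming, and a write miss is modeled as a write to a line whose dirty bit is still 0.

```python
def on_write(counters, dirty, idx):
    """Counter update after the CPU writes cacheline `idx`.

    If the line was clean (write miss, or write hit on a clean copy),
    the written line's counter becomes 1 and every other dirty line's
    counter is incremented.  If the line was already dirty, only the
    counters below its old value are incremented, so counters above it
    keep their relative age order."""
    old = counters[idx]
    was_dirty = dirty[idx]
    for i in range(len(counters)):
        if i == idx or not dirty[i]:
            continue
        if not was_dirty or counters[i] < old:
            counters[i] += 1
    counters[idx] = 1
    dirty[idx] = True
```

The effect is that counter value 1 always marks the most recently written dirty line, and the largest value marks the stalest one, which is the early write-back victim.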
6. The method according to claim 2, wherein in step 1:
when a read hit occurs for the first CPU:
if the Cacheline contains dirty data, the counter corresponding to the Cacheline is decremented by 1, the counter whose value is exactly 1 less than the original value of that counter is incremented by 1, and the values of the other counters remain unchanged; when the value of the counter corresponding to a Cacheline is less than or equal to 2, a CPU read of that Cacheline leaves the counter value unchanged;
if the data copy is consistent with the next-level storage, the counter values corresponding to all the cachelines are not changed after the first CPU finishes the reading operation.
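The read-hit rule of claim 6 amounts to swapping the hit line's counter with the one directly below it in the age order. A minimal sketch, with the same assumed `counters`/`dirty` arrays as above (illustrative names, not from the patent):

```python
def on_read_hit(counters, dirty, idx):
    """Counter update after a read hit on cacheline `idx`.

    Clean hits change nothing.  A dirty hit with counter > 2 decrements
    the hit line's counter and increments the counter that was exactly
    one below it, i.e. the two lines swap positions in the age order.
    Counters at 2 or below are left unchanged."""
    if not dirty[idx] or counters[idx] <= 2:
        return
    old = counters[idx]
    for i, c in enumerate(counters):
        if i != idx and c == old - 1:
            counters[i] += 1      # the neighbor moves up one step
            break
    counters[idx] = old - 1       # the hit line moves down one step
```

Keeping counters ≤ 2 fixed plausibly prevents a frequently read line from ever reaching value 1, which claims 4-5 reserve for the most recently written line; the patent does not state this rationale, so it is only a reading.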
7. The method according to claim 2, wherein in step 1:
when a read miss occurs for the first CPU:
when the data copy exists in another Cache at the same level and the Cacheline contains dirty data: if the multi-core processor system allows a data copy containing dirty data to be shared, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other cachelines containing dirty data are each incremented by 1; if the multi-core processor system does not allow a copy containing dirty data to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the values of the counters of the other cachelines remain unchanged;
if the data copy exists in another Cache at the same level and is consistent with the next-level storage, or exists only in a lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other cachelines containing dirty data remain unchanged.
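The read-miss fill of claim 7 then only distinguishes whether the incoming copy arrives dirty from a peer cache or clean from lower levels. A hedged sketch of the local cache's side (`shared_dirty` is my flag for "the system shares the dirty copy", not a term from the patent):

```python
def on_read_miss_fill(counters, dirty, idx, shared_dirty):
    """Counter update when a read miss fills local cacheline `idx`.

    If a dirty copy is forwarded from a peer cache, the filled line is
    treated like a fresh write: its counter becomes 1 and every other
    dirty line's counter is incremented.  If the copy is clean (peer
    copy consistent with next level, or fetched from lower memory),
    the counter starts at the initial value 0 and nothing else moves."""
    if shared_dirty:
        for i in range(len(counters)):
            if i != idx and dirty[i]:
                counters[i] += 1
        counters[idx] = 1
        dirty[idx] = True
    else:
        counters[idx] = 0
        dirty[idx] = False
```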
8. The method of claim 2, wherein when a Cacheline containing dirty data in any L1 DCache or any other Cache at any level in the multi-core processor system is written back or invalidated, the counter corresponding to that Cacheline is cleared; every counter of another Cacheline containing dirty data whose value is greater than the original value of the cleared counter is decremented by 1, and the values of the remaining counters remain unchanged.
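Claim 8's clean-up rule keeps the counter values dense after a line leaves the dirty set. A minimal sketch under the same assumed arrays (illustrative names only):

```python
def on_evict(counters, dirty, idx):
    """Counter update when dirty cacheline `idx` is written back or
    invalidated: clear its counter, and decrement every dirty line's
    counter that was above it, closing the gap in the age order."""
    old = counters[idx]
    for i in range(len(counters)):
        if i != idx and dirty[i] and counters[i] > old:
            counters[i] -= 1
    counters[idx] = 0
    dirty[idx] = False
```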
9. The method of claim 1, wherein the maximum recordable value of the counter added for each Cacheline of each L1 DCache and of the other caches in the multi-core processor system is the number N of cachelines of the current Cache, and the bit width of the counter is [log2N-1 : 0].
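For concreteness, the bit-width range [log2N-1 : 0] in claim 9 corresponds to log2(N) bits. A small sketch (the function name is mine, and N is assumed to be a power of two, as is usual for a set count but not stated in the claim):

```python
import math

def counter_bits(n_cachelines):
    """Bit width of the per-Cacheline counter for a Cache with N
    cachelines, per the claimed range [log2(N)-1 : 0], i.e. log2(N)
    bits.  N is assumed to be a power of two."""
    assert n_cachelines > 0 and n_cachelines & (n_cachelines - 1) == 0
    return int(math.log2(n_cachelines))
```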
10. A multi-core processor system, characterized in that the multi-core processor system achieves data consistency by the method of any one of claims 1 to 8; the multi-core processor system comprises at least two CPUs; in the implementation, a counter is added for each Cacheline of each L1 DCache and of the caches at the other levels in the system to record the access history of the cachelines containing dirty data copies, and the data copies containing dirty data in the caches are updated to the main memory in advance while the L1 DCache and the other caches are idle.
CN202010210475.9A 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating Active CN111414318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210475.9A CN111414318B (en) 2020-03-24 2020-03-24 Data consistency implementation method based on advanced updating

Publications (2)

Publication Number Publication Date
CN111414318A true CN111414318A (en) 2020-07-14
CN111414318B CN111414318B (en) 2022-04-29

Family

ID=71494283

Country Status (1)

Country Link
CN (1) CN111414318B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059707A1 (en) * 2006-08-31 2008-03-06 Srihari Makineni Selective storage of data in levels of a cache memory
CN101866318A (en) * 2010-06-13 2010-10-20 北京北大众志微系统科技有限责任公司 Management system and method for cache replacement strategy
CN102779017A (en) * 2012-06-29 2012-11-14 华中科技大学 Control method of data caching area in solid state disc
CN103019655A (en) * 2012-11-28 2013-04-03 中国人民解放军国防科学技术大学 Internal memory copying accelerating method and device facing multi-core microprocessor
CN104615576A (en) * 2015-03-02 2015-05-13 中国人民解放军国防科学技术大学 CPU+GPU processor-oriented hybrid granularity consistency maintenance method
CN105095116A (en) * 2014-05-19 2015-11-25 华为技术有限公司 Cache replacing method, cache controller and processor
CN105740168A (en) * 2016-01-23 2016-07-06 中国人民解放军国防科学技术大学 Fault-tolerant directory cache controller
CN105740164A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device
CN106909515A (en) * 2017-02-11 2017-06-30 郑州云海信息技术有限公司 Towards multinuclear shared last level cache management method and device that mixing is hosted
CN109669881A (en) * 2018-12-11 2019-04-23 中国航空工业集团公司西安航空计算技术研究所 A kind of calculation method based on the space Cache reservation algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HONGZHOU ZHAO: "SPACE: Sharing pattern-based directory coherence for multicore scalability", 《IEEE》 *
ZHIGANG HU: "Timekeeping techniques for predicting and optimizing memory behavior", 《IEEE》 *
LOU, YUNHE: "Cache coherence protocol of multi-core processors for big data processing", 《China Master's Theses Full-text Database (Information Science and Technology)》 *
CAO, YANRONG: "Analysis of DMA transfer and Cache coherence", 《Silicon Valley》 *

Similar Documents

Publication Publication Date Title
US10078592B2 (en) Resolving multi-core shared cache access conflicts
JP5431525B2 (en) A low-cost cache coherency system for accelerators
EP0731944B1 (en) Coherency and synchronization mechanism for i/o channel controllers in a data processing system
JP3737834B2 (en) Dual cache snoop mechanism
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
US6321296B1 (en) SDRAM L3 cache using speculative loads with command aborts to lower latency
US11500797B2 (en) Computer memory expansion device and method of operation
JPH09218822A (en) Method for keeping consistency of cache by using successive encoding snoop response and its system
CA2289402C (en) Method and system for efficiently handling operations in a data processing system
JPH09223118A (en) Snoop cache memory control system
JPH10154100A (en) Information processing system, device and its controlling method
JP4295814B2 (en) Multiprocessor system and method of operating multiprocessor system
US6807608B2 (en) Multiprocessor environment supporting variable-sized coherency transactions
CN111414318B (en) Data consistency implementation method based on advanced updating
US6021466A (en) Transferring data between caches in a multiple processor environment
JPH0816885B2 (en) Cache memory control method
WO2022246769A1 (en) Data access method and apparatus
US11847062B2 (en) Re-fetching data for L3 cache data evictions into a last-level cache
JP3507314B2 (en) Memory controller and computer system
JP4856373B2 (en) Memory system, control method thereof, and method of maintaining data coherency
JPH04347750A (en) Control system for parallel cache memory
JPH08106417A (en) Memory access method and memory sharing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant