CN114217809B - Implementation method of many-core simplified Cache protocol without transverse consistency - Google Patents


Info

Publication number
CN114217809B
CN114217809B (application CN202110398338.7A)
Authority
CN
China
Prior art keywords
data
cache line
cache
updated
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110398338.7A
Other languages
Chinese (zh)
Other versions
CN114217809A (en)
Inventor
何王全
郑方
王飞
过锋
吴伟
陈芳园
朱琪
钱宏
管茂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN202110398338.7A priority Critical patent/CN114217809B/en
Publication of CN114217809A publication Critical patent/CN114217809A/en
Application granted granted Critical
Publication of CN114217809B publication Critical patent/CN114217809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/441Register allocation; Assignment of physical memory space to logical memory space
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an implementation method for a many-core simplified Cache protocol without transverse consistency, comprising the following steps: S1, analyze the update status of the data in a Cache line and mark the updated data; S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data has been updated, jump to S3; S3, when only part of the content of a Cache line needs to be written back, set the mask bits corresponding to that partial data to 1 and all other mask bits to 0; S4, according to the granularity and setting of the mask, update in main memory only the data whose corresponding mask bit is 1; S5, perform the write-back operation directly on the whole Cache line. The invention effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.

Description

Implementation method of many-core simplified Cache protocol without transverse consistency
Technical Field
The invention relates to an implementation method for a many-core simplified Cache protocol without transverse consistency, and belongs to the technical field of high-performance computing.
Background
In a computer system, one or more levels of Cache memories (Caches) are added between the processor and main memory to alleviate the gap between main-memory access speed and the processor's data-processing speed. A Cache line is the basic unit of data transfer between the Cache and main memory and contains several data units. When a line of data is copied from main memory into the Cache, the storage control unit creates an entry for it; the entry holds both the memory data and the location of that line's data in memory.
Consider a shared-main-memory architecture in which each processor core has an independent Cache. If data that is tightly packed in memory falls within the same Cache-line mapping space, the computing tasks on different cores may update different data within the same segment of memory. Because Cache write-back is performed in whole-line units, the consistency between the data in the Caches and the data in main memory is then destroyed, and the data in main memory becomes erroneous; this is the false-sharing phenomenon. Under shared memory, a complete Cache coherence protocol can guarantee the consistency of the Cache data in every processor core, but on a many-core processor its implementation difficulty and hardware cost are very high. A partial (single-sided) Cache coherence protocol can greatly reduce the hardware overhead of the related processor components and free physical space for other high-performance components, but it lacks an effective method and mechanism to avoid or resolve the false-sharing problem.
"Heterogeneous many-core+shared memory" is an important trend in the development of current processor architectures. Under the structure, the realization difficulty and hardware cost of the complete Cache consistency protocol are large, and a large number of hardware components and circuits are required to be introduced to ensure the consistency of data in each kernel Cache and main memory data. In engineering practice, from the overall design and practical application of the processor, a partial (single-side) Cache consistency protocol is often adopted, so that the hardware cost of related parts of the processor is greatly reduced, the hardware space is saved for other high-performance parts, and the overall performance of the processor is improved. However, the existing technology lacks fine granularity management of Cache write-back data, thereby causing a false sharing problem and affecting the correctness of program logic.
Disclosure of Invention
The invention aims to provide an implementation method for a many-core simplified Cache protocol without transverse consistency, so as to solve the false-sharing problem of the shared-main-memory Cache structure in a many-core processor.
To achieve the above purpose, the invention adopts the following technical scheme. The implementation method of the many-core simplified Cache protocol without transverse consistency comprises the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
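As a concrete illustration, the masked write-back of S3-S4 can be sketched as a software model. This is a minimal sketch, not the hardware implementation; the 64-byte line size, the 8-byte mask granularity X, and the function names are assumptions made for the example.

```python
# Hypothetical software model of the masked write-back in S3-S4.
# Assumed parameters (not fixed by the patent): 64-byte Cache line,
# mask granularity X = 8 bytes, giving an 8-bit mask per line.

LINE_SIZE = 64   # bytes per Cache line (assumed)
X = 8            # mask granularity in bytes (assumed)

def build_mask(updated_offsets):
    """S3: set mask bit i to 1 iff any updated byte falls in chunk i."""
    mask = 0
    for off in updated_offsets:
        mask |= 1 << (off // X)
    return mask

def masked_write_back(main_mem, line_addr, line_data, mask):
    """S4: fetch M from main memory, merge only the chunks whose mask
    bit is 1, and write the merged block back."""
    m = bytearray(main_mem[line_addr:line_addr + LINE_SIZE])   # S4.1
    for i in range(LINE_SIZE // X):
        if mask >> i & 1:                                      # S4.2
            m[i * X:(i + 1) * X] = line_data[i * X:(i + 1) * X]
    main_mem[line_addr:line_addr + LINE_SIZE] = m              # S4.3
```

For example, if only bytes 0 and 9 of a line were updated, `build_mask([0, 9])` sets mask bits 0 and 1, and `masked_write_back` leaves the other 48 bytes of main memory untouched.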
A further improvement of the above technical scheme is as follows:
1. In the above scheme, in S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
By adjusting, through a mask, the granularity at which a Cache line writes data back to main memory, the method ensures that only the data actually updated in the Cache line is written back. This avoids old data overwriting new data during whole-line write-back, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.
Drawings
FIG. 1 is a schematic representation of the process of the present invention.
Detailed Description
Examples: the invention provides an implementation method for a many-core simplified Cache protocol without transverse consistency, which specifically comprises the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
In S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
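The all-0s / all-1s test above amounts to a simple dispatch between the direct path (S5) and the masked path (S3-S4). A hedged sketch, assuming one state bit per 8-byte chunk of a 64-byte line (the widths and the function name are illustrative, not from the patent):

```python
# Hypothetical dispatch step from S1-S2. Assumes one "updated" state
# bit per mask chunk; 8 bits per line is an illustrative width.

N_BITS = 8  # one state bit per 8-byte chunk of a 64-byte line (assumed)

def write_back_path(state_bits):
    """Return 'direct' (S5) when the state-bit vector is all 0s or all
    1s, and 'masked' (S3-S4) when only part of the line is updated."""
    if state_bits == 0 or state_bits == (1 << N_BITS) - 1:
        return 'direct'   # whole line clean, or whole line updated
    return 'masked'       # partially updated: use the mask mechanism
```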
The above embodiment is further explained as follows:
By adjusting, through a mask, the granularity at which a Cache line writes data back to main memory, the method ensures that only the data actually updated in the Cache line is written back, avoids old data overwriting new data during whole-line write-back, and effectively solves the false-sharing problem of the shared-main-memory Cache structure.
Before a Cache line's data is written back to the main-memory space, the specific flow of the method is as follows:
1. Analyze the update status of the data in the Cache line and mark the updated data.
2. If no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to step 5; if only part of the data in the Cache line has been updated, jump to step 3.
3. When only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0.
4. Using an atomic read-write main-memory access mode, update in main memory the data whose corresponding mask bit is 1 according to the granularity and setting of the mask; the method then ends.
5. Ignore the mask mechanism and perform the write-back operation directly on the whole Cache line, improving the efficiency of writing the Cache back to main memory; the method then ends.
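Step 4 requires the read-merge-write of main memory to be atomic, so that concurrent masked write-backs from different cores do not interleave. A sketch of that requirement in software, using a lock to stand in for the hardware's atomic memory access; the toy 16-byte line, 4-byte granularity, and all names are assumptions of this model:

```python
# Software stand-in for the atomic read-modify-write in step 4.
# The lock models the atomicity the hardware provides; line and chunk
# sizes are illustrative, not taken from the patent.

import threading

LINE, X = 16, 4                  # toy line size and mask granularity
_line_lock = threading.Lock()    # models the hardware's atomic access

def atomic_masked_write_back(mem, addr, line, mask):
    with _line_lock:                           # atomic read-modify-write
        m = bytearray(mem[addr:addr + LINE])   # read M
        for i in range(LINE // X):
            if mask >> i & 1:                  # merge masked chunks only
                m[i * X:(i + 1) * X] = line[i * X:(i + 1) * X]
        mem[addr:addr + LINE] = m              # write M back
```

Because the whole read-merge-write runs under the lock, two cores writing back disjoint chunks of the same line both see their updates survive.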
With this implementation method of a many-core simplified Cache protocol without transverse consistency, the granularity at which a Cache line writes data back to main memory is adjusted through a mask, ensuring that only the data actually updated in the Cache line is written back. This avoids old data overwriting new data during whole-line write-back, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.
On the one hand, adjusting the granularity of Cache-line write-back to main memory through a mask effectively solves the false-sharing problem of the shared-main-memory Cache structure.
On the other hand, by analyzing the update status of the Cache-line data, a line whose data is entirely updated or entirely unchanged (read-only) is written back to main memory directly, improving write-back efficiency.
Finally, as an important supplement to a partial Cache coherence protocol, the method can effectively reduce the processor's hardware overhead for Cache data management.
To facilitate a better understanding of the invention, the terms used herein are briefly explained below:
Cache: the Cache memory stores the contents of frequently accessed RAM locations together with the storage addresses of those data items. When the processor references a memory address, the Cache checks whether it holds that address; if so, the data is returned to the processor, and if not, a normal memory access is performed.
Cache transverse (lateral) consistency: each core of a multi-core/many-core processor may have a private Cache. Under shared main memory, guaranteeing the consistency of the shared data in the respective Caches when the same memory area is modified is called Cache transverse consistency.
False sharing: when independent variables modified by different threads happen to share the same Cache line, whole-line write-backs to main memory can overwrite each other's updates; the resulting consistency problem must then be solved by other means, and performance is affected.
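The hazard just defined can be shown concretely. In the toy model below (a 16-byte line with 4-byte mask granularity, both sizes assumptions of the example), two "cores" update disjoint chunks of the same line: whole-line write-back loses one update, while masked write-back preserves both.

```python
# Illustration of false sharing on a toy 16-byte line with 4-byte
# mask granularity (sizes are assumptions, not from the patent).

LINE, X = 16, 4

def merge(mem, line, mask):
    """Masked write-back: copy only chunks whose mask bit is 1."""
    for i in range(LINE // X):
        if mask >> i & 1:
            mem[i * X:(i + 1) * X] = line[i * X:(i + 1) * X]

mem = bytearray(LINE)            # shared main-memory line, all zeros
core_a = bytearray(mem)          # core A's cached copy of the line
core_b = bytearray(mem)          # core B's cached copy of the line
core_a[0:4] = b'AAAA'            # A updates chunk 0 only
core_b[8:12] = b'BBBB'           # B updates chunk 2 only

# Whole-line write-back: B's stale chunk 0 overwrites A's new data.
clobbered = bytearray(mem)
clobbered[:] = core_a            # A writes back its whole line
clobbered[:] = core_b            # B writes back; A's update is lost

# Masked write-back: each core writes back only its updated chunks.
merge(mem, core_a, 0b0001)       # A's mask covers chunk 0
merge(mem, core_b, 0b0100)       # B's mask covers chunk 2
```

After the whole-line write-backs, `clobbered` no longer contains A's update; after the masked write-backs, `mem` contains both updates.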
The above embodiments are provided to illustrate the technical concept and features of the invention, and are intended to enable those skilled in the art to understand and implement it; they are not intended to limit the scope of the invention. All equivalent changes or modifications made according to the spirit of the invention should be regarded as falling within the scope of the invention.

Claims (2)

1. An implementation method of a many-core simplified Cache protocol without transverse consistency, characterized by comprising the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
2. The implementation method of the many-core simplified Cache protocol without transverse consistency according to claim 1, characterized in that: in S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
CN202110398338.7A 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency Active CN114217809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398338.7A CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110398338.7A CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Publications (2)

Publication Number Publication Date
CN114217809A CN114217809A (en) 2022-03-22
CN114217809B (en) 2024-04-30

Family

ID=80695813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398338.7A Active CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Country Status (1)

Country Link
CN (1) CN114217809B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880467A (en) * 2012-09-05 2013-01-16 无锡江南计算技术研究所 Method for verifying Cache coherence protocol and multi-core processor system
CN105718242A (en) * 2016-01-15 2016-06-29 中国人民解放军国防科学技术大学 Processing method and system for supporting software and hardware data consistency in multi-core DSP (Digital Signal Processing)
CN111930527A (en) * 2020-06-28 2020-11-13 绵阳慧视光电技术有限责任公司 Method for maintaining cache consistency of multi-core heterogeneous platform
CN112416615A (en) * 2020-11-05 2021-02-26 珠海格力电器股份有限公司 Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043194B2 (en) * 2002-09-17 2015-05-26 International Business Machines Corporation Method and system for efficient emulation of multiprocessor memory consistency


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multiprocessor Cache coherence protocol based on an external shared Cache; Liu Guangzhong; Xiao Yu; Yuan Shufang; Journal of Hebei Engineering and Technical College; 2006-06-30 (02); full text *

Also Published As

Publication number Publication date
CN114217809A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
US11893653B2 (en) Unified memory systems and methods
US11994974B2 (en) Recording a trace of code execution using reference bits in a processor cache
US7941631B2 (en) Providing metadata in a translation lookaside buffer (TLB)
US11126536B2 (en) Facilitating recording a trace file of code execution using index bits in a processor cache
JP4764360B2 (en) Techniques for using memory attributes
US7925865B2 (en) Accuracy of correlation prefetching via block correlation and adaptive prefetch degree selection
US9916247B2 (en) Cache management directory where hardware manages cache write requests and software manages cache read requests
US20130091331A1 (en) Methods, apparatus, and articles of manufacture to manage memory
JPH05210585A (en) Cash management system
JP2010507160A (en) Processing of write access request to shared memory of data processor
CN111742301A (en) Logging cache inflow to higher level caches by request
WO2008005687A2 (en) Global overflow method for virtualized transactional memory
US6711651B1 (en) Method and apparatus for history-based movement of shared-data in coherent cache memories of a multiprocessor system using push prefetching
US20220269615A1 (en) Cache-based trace logging using tags in system memory
CN103268297A (en) Accelerating core virtual scratch pad memory method based on heterogeneous multi-core platform
JPH04102948A (en) Data processing system and method
US10853247B2 (en) Device for maintaining data consistency between hardware accelerator and host system and method thereof
US8266379B2 (en) Multithreaded processor with multiple caches
CN114217809B (en) Implementation method of many-core simplified Cache protocol without transverse consistency
JP6249120B1 (en) Processor
US11687453B2 (en) Cache-based trace logging using tags in an upper-level cache
US11989137B2 (en) Logging cache line lifetime hints when recording bit-accurate trace
JP2000047942A (en) Device and method for controlling cache memory
CN114217937A (en) Compiler support method for alleviating false sharing problem
KR910004263B1 (en) Computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant