CN114217809B - Implementation method of many-core simplified Cache protocol without transverse consistency - Google Patents
- Publication number: CN114217809B
- Application number: CN202110398338.7A
- Authority
- CN
- China
- Prior art keywords
- data
- cache line
- cache
- updated
- mask
- Prior art date
- Legal status: Active (assumed; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/441—Register allocation; Assignment of physical memory space to logical memory space
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for implementing a many-core simplified Cache protocol without transverse consistency, which comprises the following steps: S1, analyze the update status of the data in a Cache line and mark the updated data; S2, if none of the data in the Cache line has been updated, or all of it has been updated, jump to S5; if only part of the data has been updated, jump to S3; S3, when only part of the data in a Cache line needs to be written back, set the mask bits corresponding to that data to 1 and all other mask bits to 0; S4, according to the granularity and setting of the mask, update in main memory the data whose mask bit is 1; S5, write the Cache line back directly. The invention effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the hardware overhead of the processor for Cache data management.
Description
Technical Field
The invention relates to a method for realizing a many-core simplified Cache protocol without transverse consistency, belonging to the technical field of high-performance computing.
Background
To narrow the gap between main-memory access speed and processor data-processing speed in a computer system, one or more levels of Cache memory (Caches) are placed between the processor and main memory. A Cache line is the basic unit of data transfer between the Cache and main memory and contains multiple data units. When a Cache line of data is copied from main memory into the Cache, the storage control unit creates an entry for it; the entry contains both the memory data and the location information of the line in memory.
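As a rough illustration of such an entry, a direct-mapped lookup that pairs cached data with its location information might look as follows; the line count, line size, and all identifiers are hypothetical assumptions, not taken from the patent:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped Cache: 16 lines of 64 bytes (hypothetical sizes). */
#define N_LINES   16
#define LINE_SIZE 64

/* One entry per line: the location information is kept as a valid bit
 * plus the tag of the memory block the line currently holds. */
typedef struct {
    bool     valid;
    uint64_t tag;
} cache_entry_t;

/* Hit if the block containing addr is present; on a miss the storage
 * control unit would create an entry and fetch the line from memory. */
static bool cache_lookup(const cache_entry_t cache[N_LINES], uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;   /* which memory block        */
    uint64_t index = block % N_LINES;    /* which Cache line to check */
    uint64_t tag   = block / N_LINES;    /* remaining location bits   */
    return cache[index].valid && cache[index].tag == tag;
}
```

Two addresses in the same 64-byte block hit the same entry, which is what makes whole-line transfer (and, later, false sharing) possible.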
Under a shared-main-memory architecture in which each processor core has its own private Cache, the computing tasks on different cores may update different data within the same segment of memory space when that data is packed contiguously in memory and maps into the same Cache line. Because Cache write-back is performed in whole-line units, the consistency between the data in the Caches and in main memory is then destroyed and the data in main memory becomes erroneous; this is the false-sharing phenomenon. With shared memory, a complete Cache coherence protocol can guarantee the consistency of the Cache data in every processor core, but on a many-core processor its implementation difficulty and hardware cost are very large. A partial (single-sided) Cache coherence protocol greatly reduces the hardware overhead of the related processor components and frees physical space for other high-performance components, but it lacks an effective method and mechanism to avoid or resolve the false-sharing problem.
"Heterogeneous many-core+shared memory" is an important trend in the development of current processor architectures. Under the structure, the realization difficulty and hardware cost of the complete Cache consistency protocol are large, and a large number of hardware components and circuits are required to be introduced to ensure the consistency of data in each kernel Cache and main memory data. In engineering practice, from the overall design and practical application of the processor, a partial (single-side) Cache consistency protocol is often adopted, so that the hardware cost of related parts of the processor is greatly reduced, the hardware space is saved for other high-performance parts, and the overall performance of the processor is improved. However, the existing technology lacks fine granularity management of Cache write-back data, thereby causing a false sharing problem and affecting the correctness of program logic.
Disclosure of Invention
The invention aims to provide an implementation method of a many-core simplified Cache protocol without transverse consistency, in order to solve the false-sharing problem of the shared-main-memory Cache structure in many-core processors.
In order to achieve the above purpose, the invention adopts the following technical scheme. The implementation method of the many-core simplified Cache protocol without transverse consistency comprises the following steps:
S1, acquire the Cache line state bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if none of the data in the Cache line has been updated, or all of it has been updated, jump to S5; if only part of the data has been updated, jump to S3;
S3, when only part of the data in a Cache line needs to be written back, determine the unit size and number of those data items, set the mask bits corresponding to them to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, with the mask granularity set to X bytes, for each X-byte unit of M, query the mask bit of the corresponding data position from S3; if the mask bit is 0, leave those X bytes of M unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and write the Cache line back directly.
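The mask-based merge of steps S3 and S4 can be sketched as follows; the 64-byte line, the 8-byte mask granularity X, and all identifiers are illustrative assumptions rather than the patent's actual hardware interface:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: a 64-byte Cache line with mask granularity
 * X = 8 bytes, giving a write-back mask of 64/8 = 8 bits. */
#define LINE_SIZE 64
#define GRANULE   8
#define MASK_BITS (LINE_SIZE / GRANULE)

/* S4: merge into M (the copy of the main-memory block fetched in S4.1)
 * only the X-byte units whose mask bit is 1 (S4.2); the caller then
 * writes the modified M back to main memory (S4.3). */
static void masked_write_back(uint8_t *m, const uint8_t *line, uint8_t mask)
{
    for (int i = 0; i < MASK_BITS; i++) {
        if (mask & (1u << i))  /* mask bit 1: overwrite these X bytes */
            memcpy(m + i * GRANULE, line + i * GRANULE, GRANULE);
        /* mask bit 0: the X bytes of M stay unchanged */
    }
}
```

When the state bits show the line fully updated or untouched, the caller skips this routine entirely and takes the S5 direct write-back path.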
A further improvement of the above technical scheme is as follows:
1. In S1, if the state bit information of the Cache line is a binary number whose bits are all 0, none of the data in the Cache line has been updated; if it is a binary number whose bits are all 1, all of the data in the Cache line has been updated.
Owing to the application of the above technical scheme, the invention has the following advantages over the prior art:
The invention adjusts, by means of a mask, the granularity of the data written back from the Cache line to main memory, ensuring that only the data actually updated in the Cache line is written back. This avoids old data overwriting new data when Cache line data is written back as a whole line, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the hardware overhead of the processor for Cache data management.
Drawings
FIG. 1 is a schematic representation of the process of the present invention.
Detailed Description
Examples: the invention provides an implementation method of a many-core simplified Cache protocol without transverse consistency, which specifically comprises the following steps:
S1, acquire the Cache line state bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if none of the data in the Cache line has been updated, or all of it has been updated, jump to S5; if only part of the data has been updated, jump to S3;
S3, when only part of the data in a Cache line needs to be written back, determine the unit size and number of those data items, set the mask bits corresponding to them to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, with the mask granularity set to X bytes, for each X-byte unit of M, query the mask bit of the corresponding data position from S3; if the mask bit is 0, leave those X bytes of M unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and write the Cache line back directly.
In S1, if the state bit information of the Cache line is a binary number whose bits are all 0, none of the data in the Cache line has been updated; if it is a binary number whose bits are all 1, all of the data in the Cache line has been updated.
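Under this state-bit convention, the S2 decision reduces to two comparisons. The 8-bit state field below (one bit per mask granule) is a hypothetical width chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit Cache-line state field: one update bit per granule. */
typedef enum { WB_DIRECT, WB_MASKED } wb_action_t;

/* S2: all bits 0 (nothing updated) or all bits 1 (fully updated) means
 * the mask mechanism can be ignored and the line written back directly
 * (S5); any other pattern is a partial update and takes the masked
 * path of S3 and S4. */
static wb_action_t classify_line(uint8_t state_bits)
{
    if (state_bits == 0x00 || state_bits == 0xFF)
        return WB_DIRECT;
    return WB_MASKED;
}
```

In the masked case the state bits themselves can double as the write-back mask, since both mark exactly the updated granules.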
The above embodiment is further explained as follows:
The invention adjusts, by means of a mask, the granularity of the data written back from the Cache line to main memory, ensuring that only the data actually updated in the Cache line is written back to main memory; this avoids old data overwriting new data when Cache line data is written back as a whole line, and effectively solves the false-sharing problem of the shared-main-memory Cache structure.
Before the Cache line data is written back to the main memory space, the specific flow of the method is as follows:
1. Analyze the update status of the data in the Cache line and mark the updated data.
2. If none of the data in the Cache line has been updated, or all of it has been updated, go to step 5; if only part of the data has been updated, go to step 3.
3. When only part of the data in a Cache line needs to be written back, determine the unit size and number of those data items, set the mask bits corresponding to them to 1, and set all other mask bits to 0.
4. Using an atomic read-modify-write main-memory access, update in main memory the data whose mask bit is 1 according to the granularity and setting of the mask; the method then ends.
5. Ignore the mask mechanism and write the Cache line back directly, improving Cache write-back efficiency; the method then ends.
When this implementation method of the many-core simplified Cache protocol without transverse consistency is adopted, the granularity of the data written back from the Cache line to main memory is adjusted by means of a mask, ensuring that only the data actually updated in the Cache line is written back to main memory; this avoids old data overwriting new data during whole-line write-back, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the hardware overhead of the processor for Cache data management.
On the one hand, adjusting the granularity of Cache line write-back via the mask effectively solves the false-sharing problem of the shared-main-memory Cache structure;
on the other hand, by analyzing the update status of the Cache line data, lines whose data is fully updated or not updated at all (read-only) are written back to main memory directly, improving write-back efficiency;
finally, as an important supplement to the partial Cache coherence protocol, the method effectively reduces the hardware overhead of the processor for Cache data management.
To facilitate a better understanding of the present invention, the terms used herein are briefly explained below:
Cache: a Cache memory stores the contents of frequently accessed RAM locations together with the addresses of those data items. When the processor references a memory address, the Cache checks whether it holds that address; if it does, the data is returned to the processor, and if not, a normal memory access is performed.
Cache transverse consistency (also called lateral consistency): each core of a multi-core/many-core processor may have a private Cache. Under shared main memory, keeping the shared data in the respective Caches consistent when the same memory region is modified is called Cache transverse consistency.
False sharing: when independent variables are modified by multiple threads and those variables share the same Cache line, the resulting consistency problem must be resolved by other means when the line is written back to main memory, which hurts performance.
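A tiny simulation makes the false-sharing hazard concrete: two cores update different bytes of the same line, whole-line write-back loses one update, and a per-byte masked write-back keeps both. All sizes and names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 16  /* toy line size; real lines are typically 32-128 bytes */

/* Both cores flush their whole private copy of the line; the second
 * flush overwrites byte 0 with core 1's stale value. Returns the value
 * of byte 0 that survives in memory. */
static uint8_t byte0_after_whole_line_flush(void)
{
    uint8_t mem[LINE] = {0}, c0[LINE] = {0}, c1[LINE] = {0};
    c0[0] = 1;                 /* core 0 updates byte 0 in its Cache   */
    c1[8] = 2;                 /* core 1 updates byte 8 in its Cache   */
    memcpy(mem, c0, LINE);     /* core 0 writes back the whole line    */
    memcpy(mem, c1, LINE);     /* core 1 clobbers byte 0 with stale 0  */
    return mem[0];             /* core 0's update has been lost        */
}

/* Each core merges back only the bytes it actually updated (a one-bit-
 * per-byte mask here), so neither update is lost. */
static uint8_t byte0_after_masked_flush(void)
{
    uint8_t mem[LINE] = {0}, c0[LINE] = {0}, c1[LINE] = {0};
    c0[0] = 1;
    c1[8] = 2;
    mem[0] = c0[0];            /* core 0: only its masked byte */
    mem[8] = c1[8];            /* core 1: only its masked byte */
    return mem[0];             /* both updates survive         */
}
```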
The above embodiments are provided to illustrate the technical concept and features of the present invention and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, and are not intended to limit the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.
Claims (2)
1. An implementation method of a many-core simplified Cache protocol without transverse consistency, characterized by comprising the following steps:
S1, acquire the Cache line state bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if none of the data in the Cache line has been updated, or all of it has been updated, jump to S5; if only part of the data has been updated, jump to S3;
S3, when only part of the data in a Cache line needs to be written back, determine the unit size and number of those data items, set the mask bits corresponding to them to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, with the mask granularity set to X bytes, for each X-byte unit of M, query the mask bit of the corresponding data position from S3; if the mask bit is 0, leave those X bytes of M unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and write the Cache line back directly.
2. The implementation method of the many-core simplified Cache protocol without transverse consistency according to claim 1, characterized in that: in S1, if the state bit information of the Cache line is a binary number whose bits are all 0, none of the data in the Cache line has been updated; if it is a binary number whose bits are all 1, all of the data in the Cache line has been updated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110398338.7A | 2021-04-14 | 2021-04-14 | Implementation method of many-core simplified Cache protocol without transverse consistency
Publications (2)
Publication Number | Publication Date
---|---
CN114217809A | 2022-03-22
CN114217809B | 2024-04-30
Family
ID=80695813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110398338.7A Active CN114217809B (en) | 2021-04-14 | 2021-04-14 | Implementation method of many-core simplified Cache protocol without transverse consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114217809B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102880467A (en) * | 2012-09-05 | 2013-01-16 | 无锡江南计算技术研究所 | Method for verifying Cache coherence protocol and multi-core processor system |
CN105718242A (en) * | 2016-01-15 | 2016-06-29 | 中国人民解放军国防科学技术大学 | Processing method and system for supporting software and hardware data consistency in multi-core DSP (Digital Signal Processing) |
CN111930527A (en) * | 2020-06-28 | 2020-11-13 | 绵阳慧视光电技术有限责任公司 | Method for maintaining cache consistency of multi-core heterogeneous platform |
CN112416615A (en) * | 2020-11-05 | 2021-02-26 | 珠海格力电器股份有限公司 | Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9043194B2 (en) * | 2002-09-17 | 2015-05-26 | International Business Machines Corporation | Method and system for efficient emulation of multiprocessor memory consistency |
- 2021-04-14: CN application CN202110398338.7A filed; patent CN114217809B, active
Non-Patent Citations (1)
Title |
---|
Multiprocessor Cache coherence protocol based on an external shared Cache; Liu Guangzhong, Xiao Yu, Yuan Shufang; Journal of Hebei Engineering and Technical College; 2006-06-30 (02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114217809A | 2022-03-22
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11893653B2 (en) | Unified memory systems and methods | |
US11994974B2 (en) | Recording a trace of code execution using reference bits in a processor cache | |
US7941631B2 (en) | Providing metadata in a translation lookaside buffer (TLB) | |
US11126536B2 (en) | Facilitating recording a trace file of code execution using index bits in a processor cache | |
JP4764360B2 (en) | Techniques for using memory attributes | |
US7925865B2 (en) | Accuracy of correlation prefetching via block correlation and adaptive prefetch degree selection | |
US9916247B2 (en) | Cache management directory where hardware manages cache write requests and software manages cache read requests | |
US20130091331A1 (en) | Methods, apparatus, and articles of manufacture to manage memory | |
JPH05210585A (en) | Cash management system | |
JP2010507160A (en) | Processing of write access request to shared memory of data processor | |
CN111742301A (en) | Logging cache inflow to higher level caches by request | |
WO2008005687A2 (en) | Global overflow method for virtualized transactional memory | |
US6711651B1 (en) | Method and apparatus for history-based movement of shared-data in coherent cache memories of a multiprocessor system using push prefetching | |
US20220269615A1 (en) | Cache-based trace logging using tags in system memory | |
CN103268297A (en) | Accelerating core virtual scratch pad memory method based on heterogeneous multi-core platform | |
JPH04102948A (en) | Data processing system and method | |
US10853247B2 (en) | Device for maintaining data consistency between hardware accelerator and host system and method thereof | |
US8266379B2 (en) | Multithreaded processor with multiple caches | |
CN114217809B (en) | Implementation method of many-core simplified Cache protocol without transverse consistency | |
JP6249120B1 (en) | Processor | |
US11687453B2 (en) | Cache-based trace logging using tags in an upper-level cache | |
US11989137B2 (en) | Logging cache line lifetime hints when recording bit-accurate trace | |
JP2000047942A (en) | Device and method for controlling cache memory | |
CN114217937A (en) | Compiler support method for alleviating false sharing problem | |
KR910004263B1 (en) | Computer system |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant