CN114217809B - Implementation method of many-core simplified Cache protocol without transverse consistency - Google Patents


Info

Publication number
CN114217809B
CN114217809B (application CN202110398338.7A)
Authority
CN
China
Prior art keywords
data
cache line
cache
updated
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110398338.7A
Other languages
Chinese (zh)
Other versions
CN114217809A (en)
Inventor
何王全
郑方
王飞
过锋
吴伟
陈芳园
朱琪
钱宏
管茂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN202110398338.7A priority Critical patent/CN114217809B/en
Publication of CN114217809A publication Critical patent/CN114217809A/en
Application granted granted Critical
Publication of CN114217809B publication Critical patent/CN114217809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/441Register allocation; Assignment of physical memory space to logical memory space
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an implementation method for a many-core simplified Cache protocol without transverse consistency, comprising the following steps: S1, analyze the update status of the data in a Cache line and mark the updated data; S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data has been updated, jump to S3; S3, when only part of the content of a Cache line needs to be written back, set the mask bits corresponding to that partial data to 1 and all other mask bits to 0; S4, according to the granularity and setting of the mask, update in main memory only the data whose corresponding mask bit is 1; S5, perform the write-back operation directly on the whole Cache line. The invention effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.

Description

Implementation method of many-core simplified Cache protocol without transverse consistency
Technical Field
The invention relates to an implementation method for a many-core simplified Cache protocol without transverse consistency, and belongs to the technical field of high-performance computing.
Background
In a computer system, one or more levels of Cache memories (Caches) are added between the processor and main memory to alleviate the gap between main-memory access speed and the processor's data-processing speed. A Cache line is the basic unit of data transfer between the Cache and main memory and contains several data units. When a line of data is copied from main memory into the Cache, the storage control unit creates an entry for it; the entry holds both the memory data and the location of that line's data in memory.
Consider a shared-main-memory architecture in which each processor core has an independent Cache. If data that is tightly packed in memory falls within the same Cache-line mapping space, the computing tasks on different cores may update different data within the same segment of memory. Because Cache write-back is performed in whole-line units, the consistency between the data in the Caches and the data in main memory is then destroyed, and the data in main memory becomes erroneous; this is the false-sharing phenomenon. Under shared memory, a complete Cache coherence protocol can guarantee the consistency of the Cache data in every processor core, but on a many-core processor its implementation difficulty and hardware cost are very high. A partial (single-sided) Cache coherence protocol can greatly reduce the hardware overhead of the related processor components and free physical space for other high-performance components, but it lacks an effective method and mechanism to avoid or resolve the false-sharing problem.
"Heterogeneous many-core+shared memory" is an important trend in the development of current processor architectures. Under the structure, the realization difficulty and hardware cost of the complete Cache consistency protocol are large, and a large number of hardware components and circuits are required to be introduced to ensure the consistency of data in each kernel Cache and main memory data. In engineering practice, from the overall design and practical application of the processor, a partial (single-side) Cache consistency protocol is often adopted, so that the hardware cost of related parts of the processor is greatly reduced, the hardware space is saved for other high-performance parts, and the overall performance of the processor is improved. However, the existing technology lacks fine granularity management of Cache write-back data, thereby causing a false sharing problem and affecting the correctness of program logic.
Disclosure of Invention
The invention aims to provide an implementation method for a many-core simplified Cache protocol without transverse consistency, so as to solve the false-sharing problem of the shared-main-memory Cache structure in a many-core processor.
To achieve the above purpose, the invention adopts the following technical scheme. The implementation method of the many-core simplified Cache protocol without transverse consistency comprises the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
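As a concrete illustration, the masked write-back of S3-S4 can be sketched as a software model. This is a minimal sketch, not the hardware implementation; the 64-byte line size, the 8-byte mask granularity X, and the function names are assumptions made for the example.

```python
# Hypothetical software model of the masked write-back in S3-S4.
# Assumed parameters (not fixed by the patent): 64-byte Cache line,
# mask granularity X = 8 bytes, giving an 8-bit mask per line.

LINE_SIZE = 64   # bytes per Cache line (assumed)
X = 8            # mask granularity in bytes (assumed)

def build_mask(updated_offsets):
    """S3: set mask bit i to 1 iff any updated byte falls in chunk i."""
    mask = 0
    for off in updated_offsets:
        mask |= 1 << (off // X)
    return mask

def masked_write_back(main_mem, line_addr, line_data, mask):
    """S4: fetch M from main memory, merge only the chunks whose mask
    bit is 1, and write the merged block back."""
    m = bytearray(main_mem[line_addr:line_addr + LINE_SIZE])   # S4.1
    for i in range(LINE_SIZE // X):
        if mask >> i & 1:                                      # S4.2
            m[i * X:(i + 1) * X] = line_data[i * X:(i + 1) * X]
    main_mem[line_addr:line_addr + LINE_SIZE] = m              # S4.3
```

For example, if only bytes 0 and 9 of a line were updated, `build_mask([0, 9])` sets mask bits 0 and 1, and `masked_write_back` leaves the other 48 bytes of main memory untouched.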
A further improvement of the above technical scheme is as follows:
1. In the above scheme, in S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
By adjusting, through a mask, the granularity at which a Cache line writes data back to main memory, the method ensures that only the data actually updated in the Cache line is written back. This avoids old data overwriting new data during whole-line write-back, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.
Drawings
FIG. 1 is a schematic representation of the process of the present invention.
Detailed Description
Examples: the invention provides an implementation method for a many-core simplified Cache protocol without transverse consistency, which specifically comprises the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
In S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
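The all-0s / all-1s test above amounts to a simple dispatch between the direct path (S5) and the masked path (S3-S4). A hedged sketch, assuming one state bit per 8-byte chunk of a 64-byte line (the widths and the function name are illustrative, not from the patent):

```python
# Hypothetical dispatch step from S1-S2. Assumes one "updated" state
# bit per mask chunk; 8 bits per line is an illustrative width.

N_BITS = 8  # one state bit per 8-byte chunk of a 64-byte line (assumed)

def write_back_path(state_bits):
    """Return 'direct' (S5) when the state-bit vector is all 0s or all
    1s, and 'masked' (S3-S4) when only part of the line is updated."""
    if state_bits == 0 or state_bits == (1 << N_BITS) - 1:
        return 'direct'   # whole line clean, or whole line updated
    return 'masked'       # partially updated: use the mask mechanism
```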
The above embodiment is further explained as follows:
By adjusting, through a mask, the granularity at which a Cache line writes data back to main memory, the method ensures that only the data actually updated in the Cache line is written back, avoids old data overwriting new data during whole-line write-back, and effectively solves the false-sharing problem of the shared-main-memory Cache structure.
Before a Cache line's data is written back to the main-memory space, the specific flow of the method is as follows:
1. Analyze the update status of the data in the Cache line and mark the updated data.
2. If no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to step 5; if only part of the data in the Cache line has been updated, jump to step 3.
3. When only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0.
4. Using an atomic read-write main-memory access mode, update in main memory the data whose corresponding mask bit is 1 according to the granularity and setting of the mask; the method then ends.
5. Ignore the mask mechanism and perform the write-back operation directly on the whole Cache line, improving the efficiency of writing the Cache back to main memory; the method then ends.
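Step 4 requires the read-merge-write of main memory to be atomic, so that concurrent masked write-backs from different cores do not interleave. A sketch of that requirement in software, using a lock to stand in for the hardware's atomic memory access; the toy 16-byte line, 4-byte granularity, and all names are assumptions of this model:

```python
# Software stand-in for the atomic read-modify-write in step 4.
# The lock models the atomicity the hardware provides; line and chunk
# sizes are illustrative, not taken from the patent.

import threading

LINE, X = 16, 4                  # toy line size and mask granularity
_line_lock = threading.Lock()    # models the hardware's atomic access

def atomic_masked_write_back(mem, addr, line, mask):
    with _line_lock:                           # atomic read-modify-write
        m = bytearray(mem[addr:addr + LINE])   # read M
        for i in range(LINE // X):
            if mask >> i & 1:                  # merge masked chunks only
                m[i * X:(i + 1) * X] = line[i * X:(i + 1) * X]
        mem[addr:addr + LINE] = m              # write M back
```

Because the whole read-merge-write runs under the lock, two cores writing back disjoint chunks of the same line both see their updates survive.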
With this implementation method of a many-core simplified Cache protocol without transverse consistency, the granularity at which a Cache line writes data back to main memory is adjusted through a mask, ensuring that only the data actually updated in the Cache line is written back. This avoids old data overwriting new data during whole-line write-back, effectively solves the false-sharing problem of the shared-main-memory Cache structure, improves write-back efficiency, and effectively reduces the processor's hardware overhead for Cache data management.
On the one hand, adjusting the granularity of Cache-line write-back to main memory through a mask effectively solves the false-sharing problem of the shared-main-memory Cache structure.
On the other hand, by analyzing the update status of the Cache-line data, a line whose data is entirely updated or entirely unchanged (read-only) is written back to main memory directly, improving write-back efficiency.
Finally, as an important supplement to a partial Cache coherence protocol, the method can effectively reduce the processor's hardware overhead for Cache data management.
To facilitate a better understanding of the invention, the terms used herein are briefly explained below:
Cache: the Cache memory stores the contents of frequently accessed RAM locations together with the storage addresses of those data items. When the processor references a memory address, the Cache checks whether it holds that address; if so, the data is returned to the processor, and if not, a normal memory access is performed.
Cache transverse (lateral) consistency: each core of a multi-core/many-core processor may have a private Cache. Under shared main memory, guaranteeing the consistency of the shared data in the respective Caches when the same memory area is modified is called Cache transverse consistency.
False sharing: when independent variables modified by different threads happen to share the same Cache line, whole-line write-backs to main memory can overwrite each other's updates; the resulting consistency problem must then be solved by other means, and performance is affected.
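The hazard just defined can be shown concretely. In the toy model below (a 16-byte line with 4-byte mask granularity, both sizes assumptions of the example), two "cores" update disjoint chunks of the same line: whole-line write-back loses one update, while masked write-back preserves both.

```python
# Illustration of false sharing on a toy 16-byte line with 4-byte
# mask granularity (sizes are assumptions, not from the patent).

LINE, X = 16, 4

def merge(mem, line, mask):
    """Masked write-back: copy only chunks whose mask bit is 1."""
    for i in range(LINE // X):
        if mask >> i & 1:
            mem[i * X:(i + 1) * X] = line[i * X:(i + 1) * X]

mem = bytearray(LINE)            # shared main-memory line, all zeros
core_a = bytearray(mem)          # core A's cached copy of the line
core_b = bytearray(mem)          # core B's cached copy of the line
core_a[0:4] = b'AAAA'            # A updates chunk 0 only
core_b[8:12] = b'BBBB'           # B updates chunk 2 only

# Whole-line write-back: B's stale chunk 0 overwrites A's new data.
clobbered = bytearray(mem)
clobbered[:] = core_a            # A writes back its whole line
clobbered[:] = core_b            # B writes back; A's update is lost

# Masked write-back: each core writes back only its updated chunks.
merge(mem, core_a, 0b0001)       # A's mask covers chunk 0
merge(mem, core_b, 0b0100)       # B's mask covers chunk 2
```

After the whole-line write-backs, `clobbered` no longer contains A's update; after the masked write-backs, `mem` contains both updates.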
The above embodiments are provided to illustrate the technical concept and features of the invention, and are intended to enable those skilled in the art to understand and implement it; they are not intended to limit the scope of the invention. All equivalent changes or modifications made according to the spirit of the invention should be regarded as falling within the scope of the invention.

Claims (2)

1. An implementation method of a many-core simplified Cache protocol without transverse consistency, characterized by comprising the following steps:
S1, acquire the Cache-line state-bit information of the hardware Cache, analyze the update status of the data in the Cache line, and mark the updated data;
S2, if no data in the Cache line has been updated, or all data in the Cache line has been updated, jump to S5; if only part of the data in the Cache line has been updated, jump to S3;
S3, when only part of the content of a Cache line needs to be written back, determine the unit size and number of the partial data items, set the mask bits corresponding to the partial data to 1, and set all other mask bits to 0;
S4, according to the granularity and setting of the mask, update in main memory the data whose corresponding mask bit is 1, specifically:
S4.1, according to the physical address tag in the Cache line described in S3, fetch the data at the corresponding address in main memory and denote it M;
S4.2, let the granularity of the mask be X bytes; for each X-byte chunk of M, query the mask bit of the corresponding data in S3: if the mask bit is 0, leave the X bytes unchanged; if the mask bit is 1, write the X bytes at the corresponding position of the Cache line into M, overwriting the original X bytes of M;
S4.3, write the modified M back to main memory and exit;
S5, ignore the mask mechanism and perform the write-back operation directly on the whole Cache line.
2. The implementation method of the many-core simplified Cache protocol without transverse consistency according to claim 1, characterized in that: in S1, if every bit of the Cache line's state-bit information is 0, no data in the Cache line has been updated; if every bit is 1, all data in the Cache line has been updated.
CN202110398338.7A 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency Active CN114217809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398338.7A CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110398338.7A CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Publications (2)

Publication Number Publication Date
CN114217809A CN114217809A (en) 2022-03-22
CN114217809B (en) 2024-04-30

Family

ID=80695813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398338.7A Active CN114217809B (en) 2021-04-14 2021-04-14 Implementation method of many-core simplified Cache protocol without transverse consistency

Country Status (1)

Country Link
CN (1) CN114217809B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880467A (en) * 2012-09-05 2013-01-16 无锡江南计算技术研究所 Method for verifying Cache coherence protocol and multi-core processor system
CN105718242A (en) * 2016-01-15 2016-06-29 中国人民解放军国防科学技术大学 Processing method and system for supporting software and hardware data consistency in multi-core DSP (Digital Signal Processing)
CN111930527A (en) * 2020-06-28 2020-11-13 绵阳慧视光电技术有限责任公司 Method for maintaining cache consistency of multi-core heterogeneous platform
CN112416615A (en) * 2020-11-05 2021-02-26 珠海格力电器股份有限公司 Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043194B2 (en) * 2002-09-17 2015-05-26 International Business Machines Corporation Method and system for efficient emulation of multiprocessor memory consistency


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multiprocessor Cache coherence protocol based on an external shared Cache; Liu Guangzhong; Xiao Yu; Yuan Shufang; Journal of Hebei Engineering and Technical College; 2006-06-30 (02); full text *

Also Published As

Publication number Publication date
CN114217809A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
US11893653B2 (en) Unified memory systems and methods
US11994974B2 (en) Recording a trace of code execution using reference bits in a processor cache
US7941631B2 (en) Providing metadata in a translation lookaside buffer (TLB)
US11126536B2 (en) Facilitating recording a trace file of code execution using index bits in a processor cache
JP4764360B2 (en) Techniques for using memory attributes
US7925865B2 (en) Accuracy of correlation prefetching via block correlation and adaptive prefetch degree selection
US9916247B2 (en) Cache management directory where hardware manages cache write requests and software manages cache read requests
US20130091331A1 (en) Methods, apparatus, and articles of manufacture to manage memory
JPH05210585A (en) Cash management system
JP2010507160A (en) Processing of write access request to shared memory of data processor
CN111742301A (en) Logging cache inflow to higher level caches by request
WO2008005687A2 (en) Global overflow method for virtualized transactional memory
US6711651B1 (en) Method and apparatus for history-based movement of shared-data in coherent cache memories of a multiprocessor system using push prefetching
US20220269615A1 (en) Cache-based trace logging using tags in system memory
CN103268297A (en) Accelerating core virtual scratch pad memory method based on heterogeneous multi-core platform
JPH04102948A (en) Data processing system and method
US10853247B2 (en) Device for maintaining data consistency between hardware accelerator and host system and method thereof
US8266379B2 (en) Multithreaded processor with multiple caches
CN114217809B (en) Implementation method of many-core simplified Cache protocol without transverse consistency
JP6249120B1 (en) Processor
US11687453B2 (en) Cache-based trace logging using tags in an upper-level cache
US11989137B2 (en) Logging cache line lifetime hints when recording bit-accurate trace
JP2000047942A (en) Device and method for controlling cache memory
CN114217937A (en) Compiler support method for alleviating false sharing problem
KR910004263B1 (en) Computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant