CN101008921A - Embedded heterogeneous polynuclear cache coherence method based on bus snooping - Google Patents

Info

Publication number: CN101008921A
Authority: CN (China)
Application number: CNA2007100669294A
Other languages: Chinese (zh)
Inventors: 陈天洲 (Chen Tianzhou), 严力科 (Yan Like)
Current assignee: Zhejiang University (ZJU)
Original assignee: Zhejiang University (ZJU)
Priority date: 2007-01-26
Filing date: 2007-01-26
Publication date: 2007-08-01
Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)

Abstract

This invention discloses an embedded heterogeneous multi-core cache coherence method based on bus snooping. It is an L1 cache coherence method built on a write-once strategy, whose main function is to keep the multiple copies of the same data in the cores' local L1 caches and the shared L2 cache consistent. The invention is suitable for bus-based heterogeneous multi-core processors and combines the advantages of two write strategies (write-through and write-back).

Description

Embedded heterogeneous polynuclear cache coherence method based on bus snooping
Technical field
The present invention relates to the field of heterogeneous multi-core processor systems, and in particular to an embedded heterogeneous multi-core cache coherence method based on bus snooping.
Background art
A multi-core processor, i.e., a chip multiprocessor, integrates multiple microprocessor cores on one chip that execute program code in parallel; without raising the processor's operating frequency, it reduces power consumption while achieving high aggregate performance. A heterogeneous multi-core processor is one in which the microprocessor cores integrated on the chip are heterogeneous (of different types).
2006 saw the fastest pace of processor upgrades in the history of computing. Processor manufacturers led by Intel and AMD released dual-core processors at the beginning of the year, followed by many more dual-core models, and then released quad-core processors at the end of 2006. On January 10, 2007, Intel demonstrated an eight-core computer built with two quad-core processors. Multi-core processors began to enter the market in earnest, and the multi-core era of computing truly arrived. By the end of 2006, multi-core processors accounted for more than 90% of total processor shipments. In the coming years the number of processing cores will keep growing: according to Intel's processor roadmap, Intel's mainstream processors will have 128 to 144 cores in 2010. Other chip manufacturers are also developing processors with even more cores; among them, Rapport, a start-up in Silicon Valley, has announced plans to develop a chip integrating 1,000 simple processors. The arrival of the multi-core era has undoubtedly opened a new page in the history of computing.
In current multi-core processor systems, each core has its own private L1 cache, and all cores share an L2 cache. This cache structure introduces a new problem: if processes on different processor cores need to share some data, the same data will have multiple copies stored in the individual L1 caches. If the data in one L1 cache is updated while the copies in the other L1 caches are not modified accordingly, the other processor cores will read stale data from their private L1 caches, and multiple versions of the same data will coexist. This is the so-called L1 cache coherence problem.
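To make the problem concrete, here is a minimal Python sketch of a system with no coherence mechanism at all. The class `IncoherentSystem` and its methods are hypothetical illustrations, not part of the patent: each core's private L1 cache is modeled as a dictionary, the shared L2 as another dictionary, and a write updates only the writer's own L1.

```python
class IncoherentSystem:
    """Private L1 caches plus a shared L2, with NO coherence mechanism."""

    def __init__(self, num_cores: int) -> None:
        self.l2: dict[int, int] = {}                                    # shared L2: address -> value
        self.l1: list[dict[int, int]] = [{} for _ in range(num_cores)]  # one private L1 per core

    def read(self, core: int, addr: int) -> int:
        cache = self.l1[core]
        if addr not in cache:                 # L1 miss: fill from the shared L2
            cache[addr] = self.l2.get(addr, 0)
        return cache[addr]                    # L1 hit: returns whatever copy is cached

    def write(self, core: int, addr: int, value: int) -> None:
        self.l1[core][addr] = value           # only the writer's own L1 is updated


system = IncoherentSystem(num_cores=2)
system.l2[0x40] = 1
system.read(0, 0x40)          # core 0 caches the value 1
system.read(1, 0x40)          # core 1 caches the value 1
system.write(0, 0x40, 2)      # core 0 updates only its private copy
print(system.read(1, 0x40))   # prints 1: core 1 reads a stale copy
```

The L2 copy also still holds 1 after the write, so the L1 of core 0 and the L2 disagree as well.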
There are roughly three causes of data inconsistency:
1. Inconsistency caused by sharing writable data. As described above, copies of the same data reside in several L1 caches. When one processor core modifies the data in its own L1 cache and the copies in the other L1 caches are not modified accordingly, the L1 caches become inconsistent. In addition, after data in an L1 cache is updated but before it is written back to the L2 cache, the L1 cache and the L2 cache are also inconsistent. If a process on another processor core (assuming that core's private L1 cache holds no copy of the modified data) needs this data at that moment, it reads the L2 cache and obtains incorrect data.
2. Inconsistency caused by process migration. In a multi-core system, processes can migrate between processor cores. If a process on one core has modified data in its private L1 cache and, before that data is written back to the L2 cache, has to migrate to another core for some reason and continue running there, it will read stale data.
3. Inconsistency caused by input/output operations. Suppose several L1 caches hold copies of the same L2 data block. When the system starts an I/O operation, the I/O processor (channel or DMA) may update the data in the L2 cache, making the L1 caches inconsistent with the L2 cache.
The coherence requirement is that if some data is modified in an L1 cache, the copies of that data in the L2 cache (and higher levels) must be updated either immediately or eventually, so that any other reference to the data through the L2 cache obtains the correct content, without excessively increasing communication load. This is the problem a cache coherence protocol must solve. There are two broad classes of approaches to cache coherence in multiprocessor systems. One is the software approach: at compile time, software analysis divides data into cacheable and non-cacheable data. Writable data shared among processors is classified as non-cacheable and cannot be placed in a cache. The other is the hardware approach: at run time, hardware dynamically detects the conditions that cause inconsistency and handles them promptly, so that caches can be used very efficiently. This approach is transparent to programmers and system software developers and reduces the software development burden, so it is widely used.
The write strategies commonly adopted for cache coherence today are write-back (Write-Back) and write-through (Write-Through). In a cache using the Write-Through strategy, a data block has two states: valid and invalid. Valid means the block's contents are correct; invalid means the block's contents are stale or the block is not in the cache. In a cache using the Write-Back strategy, the valid state is further subdivided into two states: read-write and read-only. The read-only state means that more than one correct copy of the block exists in the system, for example one in the cache and another in memory. The read-write state means the block has been modified at least once while the corresponding block in memory has not been updated, so only one correct copy of the block exists in the whole system.
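The split of the valid state under write-back can be sketched for a single cache line and its backing memory. This is an illustrative Python model with hypothetical names (`WBState`, `WriteBackLine`); it only tracks how many correct copies of the block exist, as the paragraph above describes. A write-through line would never reach the read-write state, because every write also updates memory.

```python
from enum import Enum


class WBState(Enum):
    INVALID = "invalid"        # block absent from the cache or stale
    READ_ONLY = "read-only"    # cache copy and memory copy are both correct
    READ_WRITE = "read-write"  # modified in the cache; memory copy is out of date


class WriteBackLine:
    """One cache line under a write-back policy (illustrative model only)."""

    def __init__(self, memory: dict[int, int], addr: int) -> None:
        self.memory = memory
        self.addr = addr
        self.value = memory[addr]          # allocate: copy the block from memory
        self.state = WBState.READ_ONLY

    def write(self, value: int) -> None:
        self.value = value                 # memory is not updated (write-back)
        self.state = WBState.READ_WRITE

    def evict(self) -> None:
        if self.state is WBState.READ_WRITE:
            self.memory[self.addr] = self.value  # write back the only correct copy
        self.state = WBState.INVALID

    def correct_copies(self) -> int:
        """Number of correct copies of the block in this one-line model."""
        # READ_ONLY: cache + memory. READ_WRITE: cache only. INVALID: memory only.
        return 2 if self.state is WBState.READ_ONLY else 1


memory = {0x80: 7}
line = WriteBackLine(memory, 0x80)
print(line.correct_copies())                 # 2
line.write(9)
print(line.correct_copies(), memory[0x80])   # 1 7
line.evict()
print(memory[0x80])                          # 9
```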
Summary of the invention
The object of the present invention is to provide an embedded heterogeneous multi-core cache coherence method based on bus snooping.
The technical solution adopted by the present invention to solve its technical problem is as follows:
1) Classification of data block states
According to whether a write to a data block is the first write, data blocks in the L1 cache are divided into four states: "Valid", "Invalid", "Reserved", and "Dirty";
2) Transitions among the four states of L1 cache data blocks
Accesses by a processor core to data blocks in the L1 cache cause the block state to change among the four states. The events that trigger L1 cache block transitions are divided into read operations and write operations:
I. For a read operation, there are two possibilities: if the L1 cache holds a valid copy of the block, the processor reads the data directly and the L1 cache state is unchanged; otherwise, the L1 cache holds no valid copy, a read-miss event is triggered, the system brings a valid copy of the block into the L1 cache, and the corresponding block state is set to "Valid";
II. For a write operation, there are two possibilities, a write hit or a write miss: on a write hit when the L1 cache block is in the "Valid" state, the block transitions to "Reserved", and the corresponding blocks in the other cores' L1 caches are set to "Invalid"; on a write miss, the local L1 cache block is set to "Reserved", and the corresponding blocks in the other L1 caches are set to "Invalid";
3) Performing read and write operations according to the block state
Processor core accesses to the L1 cache are divided into read operations and write operations:
I. For a read operation, there are two possibilities: if the L1 cache holds a valid copy of the block, the processor core reads the data directly; otherwise, the system brings a valid copy of the block into the L1 cache, and if the corresponding block is in the "Dirty" state, the L2 cache must also be inhibited at the same time; if no corresponding block in the "Valid", "Reserved", or "Dirty" state exists in the system, the block in the L2 cache is the correct copy and is read directly from the L2 cache;
II. For a write operation, there are two possibilities, hit or miss: when the L1 cache block is in the "Valid" state, the write-through strategy is used and the written content is also written to the L2 cache; when the L1 cache block is in the "Reserved" or "Dirty" state, the write-back strategy is used; on a write miss, a write-miss event is triggered, the system first brings the correct block into the L1 cache, and the block is then written using the write-through strategy.
Compared with the background art, the present invention has the following beneficial effects:
The method is applicable to bus-based heterogeneous multi-core processors and combines the advantages of the write-through and write-back strategies. Because the first write to a block uses write-through and each subsequent write uses write-back, it reduces invalidation operations and bus traffic, improving bus efficiency.
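The traffic argument can be illustrated with a back-of-the-envelope Python model. It rests on simplifying assumptions that are not taken from the patent: one core writes the same block `k` times in a row, the block starts as a clean shared ("Valid") copy, no other core touches it meanwhile, every write-through is one bus transaction that the other caches snoop to invalidate their copies, and a "Dirty" block is written back exactly once at the end (for example on eviction). The function name `bus_transactions` is hypothetical.

```python
def bus_transactions(policy: str, k: int) -> int:
    """Bus transactions caused by k consecutive writes from one core to one block.

    Hypothetical model: the block starts "Valid" in the writer's L1, nobody
    else accesses it, and a modified block is written back once at the end.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return 0
    if policy == "write-through":
        return k                       # every write goes to the L2 over the bus
    if policy == "write-once":
        first = 1                      # first write: write-through, snooped as invalidation
        later = 1 if k >= 2 else 0     # later writes stay local ("Dirty"); one final write-back
        return first + later
    raise ValueError(f"unknown policy: {policy!r}")


for k in (1, 2, 10):
    print(k, bus_transactions("write-through", k), bus_transactions("write-once", k))
```

Under these assumptions the two schemes cost the same for one or two writes, and the saving grows with every further write to the same block.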
Brief description of the drawings
Fig. 1 is an overall flowchart;
Fig. 2 is the data block state transition diagram;
In Fig. 2, Rl denotes a read by the local processor core; Rr denotes a read by a non-local processor core; Wl denotes a write by the local processor core; Wr denotes a write by a non-local processor core.
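Which of the four Fig. 2 events an access represents depends on which L1 cache observes it: an access is local to the issuing core's cache and remote, seen by snooping the bus, to every other cache. A hypothetical Python helper, `classify`, sketches that mapping; the enum names are illustrative.

```python
from enum import Enum


class Event(Enum):
    RL = "Rl"  # read by the local processor core
    RR = "Rr"  # read by a non-local processor core, observed by snooping
    WL = "Wl"  # write by the local processor core
    WR = "Wr"  # write by a non-local processor core, observed by snooping


def classify(observer: int, issuer: int, is_write: bool) -> Event:
    """Label an access by core `issuer` as seen from core `observer`'s L1 cache."""
    local = observer == issuer
    if is_write:
        return Event.WL if local else Event.WR
    return Event.RL if local else Event.RR


# One write by core 0, seen from each of three L1 caches.
print([classify(obs, 0, is_write=True).value for obs in range(3)])  # ['Wl', 'Wr', 'Wr']
```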
Detailed description of the embodiments
The specific implementation flow of the present invention is shown in Fig. 1.
Step 1: Classification of data block states
According to whether a write to a data block is the first write, data blocks in the L1 cache are divided into four states:
"Valid": the block has been read in from the L2 cache and is consistent with the L2 copy;
"Invalid": the block cannot be found in the L1 cache, or its contents in the L1 cache are stale;
"Reserved": after being read into the L1 cache from the L2 cache, the block has been written exactly once; the L1 copy is consistent with the L2 copy and is a correct copy;
"Dirty" (rewritten): the L1 block has been written more than once; it is the only correct copy, and the block in the L2 cache is no longer correct.
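The four state definitions can be summarized by three properties per state. This Python sketch uses hypothetical names; the properties are read directly off the definitions (for example, a "Dirty" block is the only correct copy, so the L2 copy is out of date).

```python
from enum import Enum


class L1State(Enum):
    """The four L1 block states (names follow the text; the model is illustrative)."""

    VALID = "Valid"
    INVALID = "Invalid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"

    def readable(self) -> bool:
        """The local core may read this copy directly (a read hit)."""
        return self is not L1State.INVALID

    def l2_up_to_date(self) -> bool:
        """The L2 copy matches this L1 copy (False for Invalid: no L1 copy to match)."""
        return self in (L1State.VALID, L1State.RESERVED)

    def sole_l1_copy(self) -> bool:
        """No other L1 cache can hold a valid copy of the same block."""
        return self in (L1State.RESERVED, L1State.DIRTY)


for state in L1State:
    print(f"{state.value:8} readable={state.readable()} "
          f"l2_up_to_date={state.l2_up_to_date()} sole_l1_copy={state.sole_l1_copy()}")
```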
Step 2: Transitions among the four states of L1 cache data blocks
This step describes how the state of an L1 cache data block changes in response to the different operations a processor performs on the L1 cache, as shown in Fig. 2. The events that trigger L1 cache block transitions are divided into read operations and write operations. In Fig. 2, Rl denotes a read by the local processor core, Rr a read by a non-local processor core, Wl a write by the local processor core, and Wr a write by a non-local processor core:
For a read operation, there are two possibilities. In the first, the L1 cache holds a valid copy of the block (in the "Valid", "Reserved", or "Dirty" state), and the block state remains unchanged. In the second, the L1 cache holds no valid copy, i.e., the block is in the "Invalid" state. A read-miss event is then triggered and the system brings a valid copy of the block into the L1 cache; wherever that copy comes from, the block read into the L1 cache enters the "Valid" state.
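The local-read rule (event Rl in Fig. 2) is small enough to write as a pure function. The names are hypothetical; the function returns only the next state and whether a read miss occurred, leaving the data fetch itself to a separate step.

```python
from enum import Enum


class L1State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"


def on_local_read(state: L1State) -> tuple[L1State, bool]:
    """Handle a read by the local core (Rl); return (next_state, read_miss)."""
    if state is L1State.INVALID:
        # Read miss: a valid copy is fetched and, wherever it came from,
        # the local block enters "Valid".
        return L1State.VALID, True
    # Read hit on "Valid", "Reserved" or "Dirty": read directly, state unchanged.
    return state, False


for state in L1State:
    nxt, miss = on_local_read(state)
    print(f"{state.value:8} -> {nxt.value:8} read_miss={miss}")
```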
For a write operation, there are also two possibilities: a write hit and a write miss. On a write hit to a block in the "Valid" state, the L1 cache block transitions to "Reserved", and the corresponding blocks in the other cores' L1 caches are set to "Invalid". When the L1 cache block is in the "Reserved" or "Dirty" state, it transitions to "Dirty"; at this point any other L1 cache block holding the same content must already be in the "Invalid" state.
On a write miss, a write-miss event is triggered. The system first brings the correct block into the L1 cache, using the same method as for a read miss, and then writes the L1 cache. Because this is the first write, the write-through strategy is used and the L2 cache is written at the same time. The states transition as follows: the local L1 cache block is set to "Reserved", and the corresponding blocks in the other L1 caches are set to "Invalid".
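The write rules can likewise be sketched as pure functions, one for a local write (Wl) and one for a write by another core snooped on the bus (Wr). The names are hypothetical; the returned policy string records whether the L2 cache is written at the same time.

```python
from enum import Enum


class L1State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"


def on_local_write(state: L1State) -> tuple[L1State, str, bool]:
    """Handle a write by the local core (Wl).

    Returns (next_state, policy, invalidate_others); policy is "write-through"
    when the L2 cache is written at the same time, otherwise "write-back".
    """
    if state in (L1State.INVALID, L1State.VALID):
        # Write miss (block fetched first, as on a read miss) or write hit on
        # "Valid": this is the first write, so write through and invalidate
        # the copies held by other L1 caches.
        return L1State.RESERVED, "write-through", True
    # "Reserved" or "Dirty": a repeated write stays in the L1 cache; other
    # copies are already "Invalid", so nothing needs to be invalidated.
    return L1State.DIRTY, "write-back", False


def on_remote_write(state: L1State) -> L1State:
    """Handle a write by another core (Wr) snooped on the bus."""
    # Whatever the current state, the snooped write leaves no valid copy here.
    return L1State.INVALID


for state in L1State:
    print(state.value, "->", on_local_write(state))
```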
Step 3: Performing read and write operations according to the block state
Processor core accesses to the L1 cache are divided into read operations and write operations:
When a processor core reads from the L1 cache, there are two possibilities. In the first, the L1 cache holds a valid copy of the block (in the "Valid", "Reserved", or "Dirty" state); the processor core reads the data directly and the L1 cache state is unchanged. In the second, the L1 cache holds no valid copy, i.e., the block is in the "Invalid" state. A read-miss event is then triggered, and the system brings a valid copy of the block into the L1 cache as follows. First, it determines whether a corresponding block in the "Valid", "Reserved", or "Dirty" state exists anywhere in the system; if so, that block is brought into the local L1 cache, and if it is in the "Dirty" state, the L2 cache must also be inhibited from supplying its stale copy. If no corresponding block in the "Valid", "Reserved", or "Dirty" state exists in the system, the block in the L2 cache is the correct copy (and also the only copy), and it is read directly from the L2 cache.
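The read-miss procedure of this step amounts to choosing a data source and deciding whether to inhibit the L2 cache. A minimal sketch, assuming the requesting core can learn (via snooping) the state of the block in every other L1 cache; `service_read_miss` and its return format are hypothetical.

```python
from enum import Enum


class L1State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"


def service_read_miss(peers: dict[int, L1State]) -> tuple[str, bool]:
    """Choose where a read miss is served from.

    peers maps each other core's id to the state of the same block in that
    core's L1 cache. Returns (source, inhibit_l2), where source is "L1[<core>]"
    for a peer cache or "L2" for the shared second-level cache.
    """
    for core, state in sorted(peers.items()):
        if state is not L1State.INVALID:       # "Valid", "Reserved" or "Dirty"
            # A "Dirty" peer holds the only correct copy, so the L2 cache must
            # be inhibited from answering with its out-of-date data.
            return f"L1[{core}]", state is L1State.DIRTY
    # No valid copy anywhere else: the L2 copy is the correct (and only) one.
    return "L2", False


print(service_read_miss({1: L1State.INVALID, 2: L1State.INVALID}))  # ('L2', False)
print(service_read_miss({1: L1State.INVALID, 2: L1State.DIRTY}))    # ('L1[2]', True)
```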
When a processor core writes to the L1 cache, as with reads, there are two possibilities: a hit or a miss. A write hit causes an L1 cache state transition. Specifically, when the L1 cache block is in the "Valid" state, the write-through strategy is used: the content written to the L1 cache is also written to the L2 cache, the block transitions to "Reserved", and the corresponding blocks in the other L1 caches are set to "Invalid". When the L1 cache block is in the "Reserved" or "Dirty" state, the write-back strategy is used and the block transitions to "Dirty"; any other L1 cache holding the same block must already be in the "Invalid" state, so those caches need no further state transition. On a write miss, a write-miss event is triggered: the system first brings the correct block into the L1 cache, using the same method as for a read miss, and then writes the L1 cache; because this is the first write, the write-through strategy is used and the L2 cache is written at the same time. The states transition as follows: the local L1 cache block is set to "Reserved", and the corresponding blocks in the other L1 caches are set to "Invalid".
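Putting the read and write rules together, the following Python sketch simulates a single block shared by several cores on one snooping bus and counts how often the L2 cache is written. It is a hypothetical model, not the patent's implementation. Two behaviors the text leaves open are filled in as labeled assumptions, following Goodman's write-once protocol, which uses the same four states: when a "Reserved" or "Dirty" peer supplies the block on a read miss, the peer drops to "Valid"; and a "Dirty" supplier's data is also written to the L2 cache, so the new copy really is consistent with the L2 as the "Valid" definition requires.

```python
from enum import Enum


class L1State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"


class WriteOnceBus:
    """Hypothetical model: one data block, N private L1 caches, one shared L2."""

    def __init__(self, num_cores: int, initial: int = 0) -> None:
        self.l2 = initial                     # shared L2 copy of the block
        self.l2_writes = 0                    # bus writes that reach the L2
        self.state = [L1State.INVALID] * num_cores
        self.data = [0] * num_cores           # each core's L1 copy

    def _fill(self, core: int) -> None:
        """Read miss: take the block from a valid peer L1, else from the L2."""
        for peer, st in enumerate(self.state):
            if peer == core or st is L1State.INVALID:
                continue
            if st is L1State.DIRTY:
                # L2 is inhibited (its copy is stale). Assumption: the supplied
                # data is also written back so the new copy matches the L2.
                self.l2 = self.data[peer]
                self.l2_writes += 1
            # Assumption: a Reserved/Dirty supplier drops to Valid (now shared).
            self.state[peer] = L1State.VALID
            self.data[core] = self.data[peer]
            break
        else:
            self.data[core] = self.l2         # L2 holds the only correct copy
        self.state[core] = L1State.VALID

    def read(self, core: int) -> int:
        if self.state[core] is L1State.INVALID:
            self._fill(core)                  # read miss
        return self.data[core]                # read hit: no state change

    def write(self, core: int, value: int) -> None:
        if self.state[core] is L1State.INVALID:
            self._fill(core)                  # write miss: fetch as on a read miss
        if self.state[core] is L1State.VALID:
            # First write: write through and invalidate every other copy.
            self.l2 = value
            self.l2_writes += 1
            for peer in range(len(self.state)):
                if peer != core:
                    self.state[peer] = L1State.INVALID
            self.state[core] = L1State.RESERVED
        else:
            # Reserved or Dirty: write back later; the L2 is not touched now.
            self.state[core] = L1State.DIRTY
        self.data[core] = value


bus = WriteOnceBus(num_cores=2, initial=1)
bus.read(0)
bus.read(1)                                   # both cores hold Valid copies
for v in (2, 3, 4):
    bus.write(0, v)                           # Valid -> Reserved -> Dirty -> Dirty
print(bus.read(1), bus.l2, bus.l2_writes)     # 4 4 2
```

With a pure write-through L1, core 0's three writes alone would have written the L2 three times; in this model the L2 is written once by the first write-through and once more when core 1's read forces the "Dirty" copy back.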

Claims (1)

1. An embedded heterogeneous multi-core cache coherence method based on bus snooping, characterized in that:
1) Classification of data block states
According to whether a write to a data block is the first write, data blocks in the L1 cache are divided into four states: "Valid", "Invalid", "Reserved", and "Dirty";
2) Transitions among the four states of L1 cache data blocks
Accesses by a processor core to data blocks in the L1 cache cause the block state to change among the four states; the events that trigger L1 cache block transitions are divided into read operations and write operations:
I. For a read operation, there are two possibilities: if the L1 cache holds a valid copy of the block, the processor reads the data directly and the L1 cache state is unchanged; if the L1 cache holds no valid copy, the system brings a valid copy of the block into the L1 cache and sets the corresponding block state to "Valid";
II. For a write operation, there are two possibilities, a write hit or a write miss: on a write hit when the L1 cache block is in the "Valid" state, the block transitions to "Reserved", and the corresponding blocks in the other cores' L1 caches are set to "Invalid"; on a write miss, the local L1 cache block is set to "Reserved", and the corresponding blocks in the other L1 caches are set to "Invalid";
3) Performing read and write operations according to the block state
Processor core accesses to the L1 cache are divided into read operations and write operations:
I. For a read operation, there are two possibilities: if the L1 cache holds a valid copy of the block, the processor core reads the data directly; otherwise, the system brings a valid copy of the block into the L1 cache, and if the corresponding block is in the "Dirty" state, the L2 cache must also be inhibited at the same time; if no corresponding block in the "Valid", "Reserved", or "Dirty" state exists in the system, the block in the L2 cache is the correct copy and is read directly from the L2 cache;
II. For a write operation, there are two possibilities, hit or miss: when the L1 cache block is in the "Valid" state, the write-through strategy is used and the written content is also written to the L2 cache; when the L1 cache block is in the "Reserved" or "Dirty" state, the write-back strategy is used; on a write miss, a write-miss event is triggered, the system first brings the correct block into the L1 cache, and the block is then written using the write-through strategy.

Priority Applications (1)

CNA2007100669294A (published as CN101008921A): priority date 2007-01-26; filing date 2007-01-26; title: Embedded heterogeneous polynuclear cache coherence method based on bus snooping

Publications (1)

CN101008921A: published 2007-08-01

Family

ID=38697361

Country Status (1)

CN (1): CN101008921A

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number (priority date, publication date) Assignee: Title
CN103370696A * (2010-12-09, 2013-10-23) International Business Machines Corporation: Multicore system, and core data reading method
CN103370696B * (2010-12-09, 2016-01-20) International Business Machines Corporation: Multiple nucleus system and Nuclear Data read method
CN102262608A * (2011-07-28, 2011-11-30) National University of Defense Technology: Method and device for controlling read-write operation of processor core-based coprocessor
CN103902470B * (2012-12-25, 2017-10-24) Huawei Technologies Co., Ltd.: Read processing method, equipment and the system during missing
CN104268102A * (2014-10-10, 2015-01-07) Inspur Group Co., Ltd.: Method for writing caches of storage servers in hybrid modes
CN107003962B * (2014-12-27, 2021-07-13) Intel Corporation: Method and device for maintaining cache consistency in computing system and computing system
CN107003962A * (2014-12-27, 2017-08-01) Intel Corporation: Cache unanimously low overhead layering connection of the agency to uniform structure
CN104572528A * (2015-01-27, 2015-04-29) Southeast University: Method and system for processing access requests by second-level Cache
US10366006B2 (2015-10-15, 2019-07-30) Huawei Technologies Co., Ltd.: Computing apparatus, node device, and server
CN106603355B * (2015-10-15, 2019-10-18) Huawei Technologies Co., Ltd.: A kind of computing device, node device and server
CN106603355A * (2015-10-15, 2017-04-26) Huawei Technologies Co., Ltd.: Computing device, node device and server
CN109062613A * (2018-06-01, 2018-12-21) Hangzhou C-SKY Microsystems Co., Ltd.: Multicore interconnects L2 cache and accesses verification method
CN109062613B * (2018-06-01, 2020-08-28) Hangzhou C-SKY Microsystems Co., Ltd.: Multi-core interconnection secondary cache access verification method
US11550646B2 (2018-06-01, 2023-01-10) C-Sky Microsystems Co., Ltd.: Method of verifying access of multi-core interconnect to level-2 cache

Legal Events

C06: Publication
PB01: Publication
C10: Entry into substantive examination
SE01: Entry into force of request for substantive examination
C02: Deemed withdrawal of patent application after publication (patent law 2001)
WD01: Invention patent application deemed withdrawn after publication