
CN101008921A - Embedded heterogeneous multi-core cache coherence method based on bus snooping

Info

Publication number: CN101008921A
Application number: CNA2007100669294A
Authority: CN
Inventors: 陈天洲; 严力科
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2007-01-26
Filing date: 2007-01-26
Publication date: 2007-08-01

Abstract

This invention discloses an embedded heterogeneous multi-core cache coherence method based on bus snooping. It is a level-1 cache coherence method for processors built on a write-once strategy, and its main function is to keep the multiple copies of the same data in the cores' private level-1 caches and in the shared level-2 cache consistent. The method is suitable for bus-based heterogeneous multi-core processor systems and combines the advantages of the two strategies, write-through and write-back.

Description

Embedded heterogeneous multi-core cache coherence method based on bus snooping

Technical field

The present invention relates to the field of heterogeneous multi-core processor systems, and in particular to an embedded heterogeneous multi-core cache coherence method based on bus snooping.

Background technology

A multi-core processor, i.e. a chip multiprocessor, integrates multiple microprocessor cores on a single chip and executes program code in parallel. Without raising the processor's operating frequency, it reduces power consumption and achieves high aggregate performance. A heterogeneous multi-core processor is one in which the microprocessor cores integrated on the chip are heterogeneous.

2006 was the year of the fastest processor turnover in the history of computing. Processor manufacturers, represented by Intel and AMD, released dual-core processors at the beginning of the year, followed by many more dual-core models, and released quad-core processors at the end of 2006. On January 10, 2007, Intel demonstrated an eight-core computer configured with two quad-core processors. Multi-core processors began to enter the market across the board, and the multi-core era truly arrived. By the end of 2006, multi-core processor shipments had reached more than 90% of total processor shipments. In the coming years the number of processor cores will keep growing; according to Intel's processor roadmap, Intel's mainstream processors in 2010 will have 128 to 144 cores. In addition, other chip manufacturers are developing processors with even more cores; among them, the Silicon Valley startup Rapport has announced plans to develop a chip integrating 1,000 simple processors. The arrival of the multi-core era has undoubtedly opened a new chapter in the history of computing.

In current multi-core processor systems, each core has its own independent level-1 cache, and the cores share a level-2 cache. This cache structure introduces a new problem: if processes on different processor cores need to share some data, the same data has multiple copies, one in each level-1 cache. If the data in one level-1 cache is updated and the identical copies in the other level-1 caches are not modified accordingly, the other processor cores will read stale ("dirty") data from their private level-1 caches, so that multiple versions of the same data coexist. This is the so-called level-1 cache coherence problem.

Data inconsistency has roughly three causes:

1. Inconsistency caused by shared writable data. As described above, copies of the same data exist in several level-1 caches. When a processor core modifies the data in its own level-1 cache and the identical copies in the other level-1 caches are not modified accordingly, the data in the level-1 caches becomes inconsistent. In addition, after the data in a level-1 cache has been updated but before it has been written back to the level-2 cache, the level-1 cache and the level-2 cache are also inconsistent. If a process on another processor core needs this data at that moment (assuming that core's private level-1 cache holds no copy of the modified data), reading it from the level-2 cache yields wrong data.

2. Inconsistency caused by process migration. In a multi-core system, processes can migrate between processor cores. If a process on one core has modified data in its private level-1 cache and, before the data is written back to the level-2 cache, has to migrate to another core for some reason and continue running there, the data it reads will be out of date.

3. Inconsistency caused by input/output operations. Suppose several level-1 caches hold copies of the same data block in the level-2 cache. When the system starts an I/O operation, the I/O processor (DMA channel) may update the data in the level-2 cache, making the level-1 caches and the level-2 cache inconsistent.
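
The first cause can be reproduced with a toy model. The sketch below is illustrative only: `PrivateCache`, its methods, and the address `0x10` are hypothetical names, and the model deliberately has no coherence mechanism, so a write in one private level-1 cache leaves a stale copy in another level-1 cache and in the shared level-2 cache.

```python
class PrivateCache:
    """Toy private level-1 cache with no coherence protocol (assumption)."""

    def __init__(self, l2):
        self.l2 = l2          # shared level-2 cache, modelled as a dict
        self.lines = {}       # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch the block from the shared level-2 cache.
        if addr not in self.lines:
            self.lines[addr] = self.l2[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Only the local copy is updated; nothing is broadcast or written through.
        self.lines[addr] = value


l2 = {0x10: 1}
core_a, core_b = PrivateCache(l2), PrivateCache(l2)
core_a.read(0x10)
core_b.read(0x10)               # both level-1 caches now hold a copy
core_a.write(0x10, 2)           # core A modifies its private copy

seen_by_a = core_a.read(0x10)   # the new value
seen_by_b = core_b.read(0x10)   # stale copy in core B's level-1 cache
seen_in_l2 = l2[0x10]           # stale copy in the level-2 cache
print(seen_by_a, seen_by_b, seen_in_l2)   # prints "2 1 1"
```

Core B and the level-2 cache both still see the old value, which is the situation a coherence protocol has to prevent.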

The coherence requirement is that, if data in a level-1 cache is modified, the copy of that data in the level-2 cache (and at higher levels) must be made correct either immediately or eventually, so that anyone else referencing that data in the level-2 cache obtains correct content, without excessively increasing the communication load. This is the problem a cache coherence protocol must solve. There are two broad approaches to cache coherence in multiprocessor systems. One is the software approach: at compile time, software analysis divides data into cacheable and non-cacheable data. Writable data shared between processors is all non-cacheable and cannot be placed in a cache. The other is the hardware approach: at run time, hardware dynamically identifies the conditions under which inconsistency arises and handles them promptly, so that the cache is used very efficiently. This approach is transparent to programmers and system software developers and lightens the software development burden, and it is therefore widely used.

The strategies generally adopted for cache coherence at present are write-back (Write-Back) and write-through (Write-Through). In a cache that uses the Write-Through strategy, a data block has two states: valid and invalid. Valid means the content of the block is correct; invalid means the content is out of date or the block is not in the cache. Here the valid state in a cache that adopts the Write-Through strategy is further subdivided into two states: the read-write state and the read-only state. The read-only state means that more than one copy of the block in the whole system is correct, for example one in the cache and another in memory. The read-write state means that the block has been modified at least once and the corresponding block in memory has not yet been modified, i.e. only one copy of the block in the whole system is correct.

Summary of the invention

The object of the present invention is to provide an embedded heterogeneous multi-core cache coherence method based on bus snooping.

The technical solution adopted by the present invention to solve its technical problem is as follows:

1) Distinguishing data block states

According to whether a write to a data block is the first write, data blocks in the level-1 cache are divided into four states: "valid", "invalid", "reserved", and "dirty";

2) Transitions among the four states of level-1 cache data blocks

Accesses by a processor core to data blocks in the level-1 cache cause the block state to change among the four states. The events that trigger a state transition of a level-1 cache data block are read operations and write operations:

I. On a read operation there are two possibilities. If a valid copy of the data block is present in the level-1 cache, the processor reads the data directly and the level-1 cache state is unchanged. If no valid copy is present in the level-1 cache, a read-miss event is triggered, the system loads a valid data block into the level-1 cache, and the state of that block is set to "valid";

II. On a write operation there are two possibilities, hit or miss. On a write hit, when the level-1 cache data block is in the "valid" state, its state changes to "reserved", and the corresponding data blocks in the level-1 caches of the other processor cores are set to "invalid". On a write miss, the state of the local level-1 cache data block is set to "reserved", and the corresponding data blocks in the other level-1 caches are set to "invalid";

3) Performing read and write operations according to the data block state

Accesses by a processor core to the level-1 cache are divided into read operations and write operations:

I. On a read operation there are two possibilities. If a valid copy of the data block is present in the level-1 cache, the processor core reads the data directly. If no valid copy is present in the level-1 cache, the system loads a valid data block into the level-1 cache; when the corresponding data block is in the "dirty" state, the level-2 cache operation must also be inhibited. If no corresponding data block in the "valid", "reserved", or "dirty" state exists in the system, the data block in the level-2 cache is the correct copy and can be read in directly from the level-2 cache;

II. On a write operation there are two possibilities, hit or miss. On a hit, when the level-1 cache data block is in the "valid" state, the write-through strategy is used and the written content is also written to the level-2 cache; when the level-1 cache data block is in the "reserved" or "dirty" state, the write-back strategy is used. On a write miss, a write-miss event is triggered, the system first loads the correct data block into the level-1 cache, and the write-through strategy is used to write the data block.

Compared with the background art, the present invention has the following beneficial effects:

The method is suitable for bus-based heterogeneous multi-core processor systems and combines the advantages of the write-through and write-back strategies. Because the first write uses write-through and every later write uses write-back, invalidation operations are reduced, bus traffic is reduced, and bus efficiency is improved.
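
The traffic argument can be made concrete with a simple count. The sketch below is an assumption-laden illustration, not a measurement: it counts only level-2 write transactions for `n` consecutive writes by one core to one block, and it assumes (the text does not say so) that a block written more than once costs one additional write-back when it is eventually evicted. Both function names are hypothetical.

```python
def l2_writes_write_through(n):
    """Pure write-through: every write also goes to the level-2 cache."""
    return n


def l2_writes_first_through_then_back(n, count_eviction=True):
    """First write is written through; later writes stay in the level-1 cache.

    count_eviction adds one write-back for a block written more than once
    (an assumption about eviction, which the text does not describe).
    """
    if n == 0:
        return 0
    eventual_write_back = 1 if (n > 1 and count_eviction) else 0
    return 1 + eventual_write_back


for n in (1, 2, 10):
    print(n, l2_writes_write_through(n), l2_writes_first_through_then_back(n))
```

Under these assumptions, ten writes to one block cost ten level-2 writes with pure write-through but at most two with the combined strategy.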

Description of drawings

Fig. 1 is an overview flow chart;

Fig. 2 is the data block state transition graph;

In Fig. 2, Rl denotes a read by the local processor core; Rr denotes a read by a non-local processor core; Wl denotes a write by the local processor core; Wr denotes a write by a non-local processor core.

Specific implementation method

The specific implementation flow of the present invention is shown in Fig. 1.

Step 1: Distinguishing data block states

According to whether a write to a data block is the first write, data blocks in the level-1 cache are divided into four states:

"Valid": the data block was read in from the level-2 cache, and the level-1 cache copy is consistent with the level-2 cache copy;

"Invalid": the data block cannot be found in the level-1 cache, or its content in the level-1 cache is out of date;

"Reserved": the data has been written exactly once since being read into the level-1 cache from the level-2 cache; the copy in the level-1 cache is consistent with the copy in the level-2 cache and is a correct copy;

"Dirty": the data block in the level-1 cache has been written more than once; it is the only correct copy, and the data block in the level-2 cache is no longer correct.

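The four states defined in the first step (valid, invalid, reserved, and dirty, the last also rendered "rewriting") can be written down as a small enumeration. This is a minimal sketch; `BlockState` and the two helper functions are hypothetical names introduced for illustration, and the helpers only restate the definitions above.

```python
from enum import Enum


class BlockState(Enum):
    INVALID = "invalid"    # absent from the level-1 cache, or out of date
    VALID = "valid"        # read in from level 2 and consistent with it
    RESERVED = "reserved"  # written exactly once, still consistent with level 2
    DIRTY = "dirty"        # written more than once, the only correct copy


def can_serve_local_read(state):
    """A local read hits in every state except INVALID."""
    return state is not BlockState.INVALID


def consistent_with_l2(state):
    """True when the level-1 copy matches the level-2 copy.

    An INVALID block holds no usable copy, so it is reported as False.
    """
    return state in (BlockState.VALID, BlockState.RESERVED)


for s in BlockState:
    print(s.value, can_serve_local_read(s), consistent_with_l2(s))
```
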
Step 2: Transitions among the four states of level-1 cache data blocks

This step describes how the state of a level-1 cache data block changes in response to the processor's different operations on the level-1 cache, as shown in Fig. 2. The events that trigger a state transition are read operations and write operations. In Fig. 2, Rl denotes a read by the local processor core; Rr a read by a non-local processor core; Wl a write by the local processor core; and Wr a write by a non-local processor core:

On a read operation there are two possibilities. One is that a usable data block is present in the level-1 cache, in the "valid", "reserved", or "dirty" state; in this case the state of the block is unchanged. The other is that no usable data block is present in the level-1 cache, i.e. the block is in the "invalid" state. A read-miss event is then triggered and the system loads a valid data block into the level-1 cache; in every case, the block enters the "valid" state in the level-1 cache after it has been read in.

On a write operation there are also two possibilities: write hit and write miss. On a write hit, when the level-1 cache data block is in the "valid" state, its state changes to "reserved", and the corresponding data blocks in the other cores' level-1 caches are set to "invalid". When the level-1 cache data block is in the "reserved" or "dirty" state, its state changes to "dirty"; at this point any other level-1 cache block with the same content must already be in the "invalid" state.

On a write miss, a write-miss event is triggered. The system first loads the correct data block into the level-1 cache, in the same way as for a read miss, and then writes the level-1 cache. Because this is the first write, the write-through strategy is used and the level-2 cache is written at the same time. The state transition is as follows: the state of the local level-1 cache block is set to "reserved", and the corresponding data blocks in the other level-1 caches are set to "invalid".
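
The transitions described in this step can be collected into a lookup table. The sketch below is illustrative and covers only what the text states: local reads (Rl), local writes (Wl), and the invalidation caused by another core's write (Wr). Transitions for a non-local read (Rr) appear only in Fig. 2 and are not described in the text, so they are left out. The names `TRANSITIONS`, `next_state`, and `run` are hypothetical.

```python
# (current state, event) -> next state, for the events described in the text.
# "Rl" = local read, "Wl" = local write, "Wr" = write by another core.
TRANSITIONS = {
    ("invalid", "Rl"): "valid",      # read miss: the block is loaded
    ("valid", "Rl"): "valid",        # read hit: state unchanged
    ("reserved", "Rl"): "reserved",
    ("dirty", "Rl"): "dirty",
    ("invalid", "Wl"): "reserved",   # write miss: load, then first write
    ("valid", "Wl"): "reserved",     # first write to a valid block
    ("reserved", "Wl"): "dirty",     # second write
    ("dirty", "Wl"): "dirty",        # later writes
    ("invalid", "Wr"): "invalid",    # another core's write invalidates copies
    ("valid", "Wr"): "invalid",
    ("reserved", "Wr"): "invalid",
    ("dirty", "Wr"): "invalid",
}


def next_state(state, event):
    """Return the state of one level-1 block after one event."""
    return TRANSITIONS[(state, event)]


def run(events, state="invalid"):
    """Apply a sequence of events to a block that starts in `state`."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["Rl", "Wl", "Wl", "Rl"]))   # prints "dirty"
print(run(["Wl", "Wr"]))               # prints "invalid"
```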

Step 3: Performing read and write operations according to the data block state

When a processor core reads the level-1 cache, there are two possibilities. One is that a usable data block (in the "valid", "reserved", or "dirty" state) is present in the level-1 cache; the processor core reads the data directly and the level-1 cache state is unchanged. The other is that no usable data block is present in the level-1 cache, i.e. the block is in the "invalid" state. A read-miss event is then triggered and the system loads a valid data block into the level-1 cache, as follows. It first determines whether a corresponding data block in the "valid", "reserved", or "dirty" state exists in the system; if so, that block is loaded into the local level-1 cache. When the corresponding block is in the "dirty" state, the level-2 cache operation must also be inhibited. If no corresponding data block in the "valid", "reserved", or "dirty" state exists in the system, the data block in the level-2 cache is the correct copy (and the only copy) and can be read in directly from the level-2 cache.
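
The read-miss procedure can be sketched as a single lookup function. This is a simplified illustration under stated assumptions: each peer level-1 cache is modelled as a dict mapping an address to a `(state, value)` pair, the function name `service_read_miss` and its return tuple are hypothetical, and the state of the supplying cache after the transfer is left untouched because the text does not specify it.

```python
def service_read_miss(addr, peer_caches, l2):
    """Return (value, source, l2_inhibited) for a read miss on addr.

    peer_caches: list of dicts mapping address -> (state, value)
    l2: dict mapping address -> value (the shared level-2 cache)
    """
    for index, cache in enumerate(peer_caches):
        state, value = cache.get(addr, ("invalid", None))
        if state in ("valid", "reserved", "dirty"):
            # A peer holds a usable copy. If that copy is dirty, the level-2
            # copy is stale, so the level-2 cache operation is inhibited.
            return value, f"peer{index}", state == "dirty"
    # No peer holds a usable copy: the level-2 copy is the correct one.
    return l2[addr], "l2", False


l2 = {0x20: 5, 0x30: 7}
peers = [{0x20: ("invalid", None)}, {0x20: ("dirty", 9)}]

print(service_read_miss(0x20, peers, l2))   # prints (9, 'peer1', True)
print(service_read_miss(0x30, peers, l2))   # prints (7, 'l2', False)
```

In both cases the requesting cache would then record the block in the "valid" state, as the text describes.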

When a processor core writes the level-1 cache there are, as with a read, two possibilities: hit or miss. A write hit causes a state transition in the level-1 cache. Specifically, when the level-1 cache block is in the "valid" state, the write-through strategy is used: the content written to the level-1 cache is also written to the level-2 cache, the state of the level-1 cache block changes to "reserved", and the corresponding data blocks in the other level-1 caches are set to "invalid". When the level-1 cache block is in the "reserved" or "dirty" state, the write-back strategy is used and the block's state changes to "dirty"; any other level-1 cache block with the same content must already be in the "invalid" state, so those level-1 caches need no further state transition. On a write miss, a write-miss event is triggered. The system first loads the correct data block into the level-1 cache, in the same way as for a read miss, and then writes the level-1 cache. Because this is the first write, the write-through strategy is used and the level-2 cache is written at the same time. The state transition is as follows: the state of the local level-1 cache block is set to "reserved", and the corresponding data blocks in the other level-1 caches are set to "invalid".
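
The read and write handling in this step can be tried out in a small software model. The sketch below is a simplification under several assumptions: blocks are single values, the shared bus is a plain Python object, snooping is modelled as a loop over the other caches, and eviction and write-back of dirty blocks are omitted because the text does not describe them. `Bus`, `L1Cache`, and the two counters are hypothetical names.

```python
class Bus:
    """Toy shared bus: a shared level-2 cache plus snooping level-1 caches."""

    def __init__(self, l2):
        self.l2 = l2             # address -> value
        self.caches = []
        self.l2_writes = 0       # write-through transactions seen by level 2
        self.invalidations = 0   # invalidation broadcasts on the bus


class L1Cache:
    def __init__(self, bus):
        self.bus = bus
        self.blocks = {}         # address -> [state, value]
        bus.caches.append(self)

    def state(self, addr):
        return self.blocks.get(addr, ["invalid", None])[0]

    def _load(self, addr):
        # Read-miss fetch: prefer a peer's usable copy, otherwise level 2.
        for peer in self.bus.caches:
            if peer is not self and peer.state(addr) != "invalid":
                return peer.blocks[addr][1]
        return self.bus.l2[addr]

    def _invalidate_others(self, addr):
        self.bus.invalidations += 1
        for peer in self.bus.caches:
            if peer is not self and addr in peer.blocks:
                peer.blocks[addr][0] = "invalid"

    def read(self, addr):
        if self.state(addr) == "invalid":
            self.blocks[addr] = ["valid", self._load(addr)]
        return self.blocks[addr][1]

    def write(self, addr, value):
        state = self.state(addr)
        if state in ("reserved", "dirty"):
            # Later writes use write-back: only the level-1 copy changes.
            self.blocks[addr] = ["dirty", value]
            return
        if state == "invalid":
            self._load(addr)     # write miss: fetch first (single-value block)
        # First write: write through to level 2 and invalidate other copies.
        self.blocks[addr] = ["reserved", value]
        self.bus.l2[addr] = value
        self.bus.l2_writes += 1
        self._invalidate_others(addr)


bus = Bus({0x40: 0})
a, b = L1Cache(bus), L1Cache(bus)
a.read(0x40)
b.read(0x40)                     # both copies are "valid"
for v in (1, 2, 3):
    a.write(0x40, v)             # valid -> reserved -> dirty -> dirty
print(a.state(0x40), b.state(0x40), bus.l2[0x40], bus.l2_writes, bus.invalidations)
print(b.read(0x40))              # b misses and is supplied by a's dirty copy
```

Three writes by one core produce a single level-2 write and a single invalidation broadcast in this model; the level-2 cache keeps the value of the first write while the dirty level-1 copy holds the latest one.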

Claims

1. An embedded heterogeneous multi-core cache coherence method based on bus snooping, characterized in that:

1) Distinguishing data block states

2) Transitions among the four states of level-1 cache data blocks

Accesses by a processor core to data blocks in the level-1 cache cause the block state to change among the four states; the events that trigger a state transition of a level-1 cache data block are read operations and write operations:

I. On a read operation there are two possibilities. If a valid copy of the data block is present in the level-1 cache, the processor reads the data directly and the level-1 cache state is unchanged. If no valid copy is present in the level-1 cache, the system loads a valid data block into the level-1 cache, and the state of that block is set to "valid";

3) Performing read and write operations according to the data block state