CN101615133A - Apparatus and method for lazy fine-grained copy-on-write - Google Patents

Apparatus and method for lazy fine-grained copy-on-write

Info

Publication number
CN101615133A
Authority
CN
China
Prior art keywords
cache line
cache
marked
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810131937A
Other languages
Chinese (zh)
Inventor
王华勇
侯锐
王鲲
沈晓卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to CN200810131937A priority Critical patent/CN101615133A/en
Publication of CN101615133A publication Critical patent/CN101615133A/en
Pending legal-status Critical Current


Abstract

A method and apparatus for lazy fine-grained copy-on-write are provided. The apparatus comprises: a cache controller configured to, in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, mark the accessed cache line; and a version switch module configured to, in response to determining that a marked cache line in the cache is about to be evicted, read the original value at the corresponding address from a lower-level memory, store the original value and the address in a log, and clear the mark of the cache line about to be evicted. The copy-on-write of the present invention is fine-grained and performed lazily, thereby saving resources and time, and has very low hardware complexity and cost.

Description

Apparatus and method for lazy fine-grained copy-on-write
Technical field
The present invention relates to the field of computers, in particular to copy-on-write techniques, and more particularly to an apparatus and method for lazy fine-grained copy-on-write.
Background art
The basic meaning of copy-on-write is that, when a write operation is attempted on a certain address, the old data at that address is first copied elsewhere and the new data is then written to the address. The computer system thereby keeps both the new and the old data for that address, so the old data can be recovered when needed. Copy-on-write performs the copy only when data is written; no copy is performed when data is read.
Copy-on-write can be applied to virtual memory management, the implementation of memory allocation functions and certain data types in compilers, virtual disk management, the maintenance of database snapshots, version management for runtime program bug checking, transactional memory (Transactional Memory), and similar areas.
Copy-on-write in the prior art is typically realized as follows: process 1 marks a certain memory page A as read-only. If process 2 writes to this page, the write operation does not execute successfully; instead it triggers an interrupt. The interrupt service routine copies the contents of page A to a page B and then maps page B into the address space of process 2 (by modifying the page table). After the interrupt returns, the write operation of process 2 is re-executed, and this time the data is actually written to page B. From then on, process 2 no longer sees page A; every access to that address actually accesses page B.
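As an aside for illustration only, the page-based scheme just described can be sketched in C as follows; cow_fault_handler, alloc_page, map_page and PAGE_SIZE are hypothetical names chosen for this sketch, not the API of any particular operating system.

#include <string.h>

#define PAGE_SIZE 4096

/* Assumed helpers, declared only; a real OS provides its own. */
void *alloc_page(void);
void  map_page(void *pagetable, void *page);

/* Hypothetical page-fault handler for page-based copy-on-write:
   the whole 4KB page is copied even if only a few bytes will change,
   and the faulting write must wait until the copy completes. */
void cow_fault_handler(void *process2_pagetable, void *page_a)
{
    void *page_b = alloc_page();            /* allocate the copy target        */
    memcpy(page_b, page_a, PAGE_SIZE);      /* copy the entire page            */
    map_page(process2_pagetable, page_b);   /* remap process 2's address to B  */
    /* on return, the interrupted store is re-executed and hits page B */
}

The sketch makes the two drawbacks discussed below visible: a full page is copied regardless of how few bytes change, and the faulting store waits for the whole copy.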
The prior-art copy-on-write has the following two drawbacks: 1) Copying is page-based, i.e. each copy is at least a page (usually 4KB) in size. Sometimes only a few bytes are modified, yet the whole page has to be copied, which unnecessarily wastes resources. 2) The copy is performed immediately when the first write occurs; because servicing the interrupt and copying the page are relatively time-consuming operations, this write operation may have to wait for a long time. Because of these drawbacks, the prior-art copy-on-write cannot satisfy well the requirements of applications such as transactional memory.
Clearly, there is a need in the art for a copy-on-write solution that is finer-grained and that defers the copy.
Summary of the invention
According to one aspect of the present invention, a method for lazy fine-grained copy-on-write is provided. The method comprises: in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, marking the accessed cache line; and in response to determining that a marked cache line in the cache is about to be evicted, reading the original value at the corresponding address from a lower-level memory, storing the original value and the address in a log, and clearing the mark of the cache line about to be evicted.
According to another aspect of the present invention, an apparatus for lazy fine-grained copy-on-write is provided. The apparatus comprises: a cache controller configured to, in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, mark the accessed cache line; and a version switch module configured to, in response to determining that a marked cache line in the cache is about to be evicted, read the original value at the corresponding address from a lower-level memory, store the original value and the address in a log, and clear the mark of the cache line about to be evicted.
Advantages of the present invention include: the copy-on-write is fine-grained and can be completed within the cache without operating on virtual memory pages, thereby saving resources; the copy is deferred and need not be completed immediately, thereby saving time and overhead; moreover, the copy-on-write scheme of the present invention makes full use of existing hardware components and therefore has very low hardware complexity and cost.
Description of drawings
The novel features believed characteristic of the invention are set forth in the claims. The invention itself, however, as well as a preferred mode of use and further objects, features and advantages thereof, will best be understood by reading the following detailed description of illustrative embodiments with reference to the accompanying drawings, in which:
Fig. 1 shows an exemplary implementation of an apparatus according to an embodiment of the present invention in a Power architecture;
Fig. 2 is a schematic diagram of a store instruction within a copy-on-write region performing the copy-on-write function;
Fig. 3 shows a cache line extended according to an embodiment of the present invention;
Fig. 4 shows a cache line extended according to another embodiment of the present invention; and
Fig. 5 shows a method for lazy fine-grained copy-on-write according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the invention are described below with reference to the accompanying drawings. In the following description, numerous details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these details. Furthermore, it should be understood that the present invention is not limited to the specific embodiments described; rather, it is contemplated to implement the present invention with any combination of the following features and elements, regardless of whether they relate to different embodiments. Therefore, the following aspects, features, embodiments and advantages are for illustrative purposes only and should not be regarded as elements or limitations of the claims unless explicitly recited in a claim.
The apparatus and method of the present invention can be implemented in Power-class architectures, for example in the IBM Power series of products, and can also be implemented in other architectures such as Intel processors. In addition, the apparatus of the present invention can be implemented both in single-processor systems and in multiprocessor systems. In the following, the apparatus and method of the present invention are described mainly with reference to an IBM Power processor.
Fig. 1 shows an exemplary implementation of an apparatus according to an embodiment of the present invention in a Power architecture.
As shown in the figure, the Power architecture comprises an L2 cache and its controller 101. Most of the control of L2 cache management is performed by the RC module (resource manager) in the cache controller 101. The RC module handles requests to the L2 cache: read requests come from the processor, for L1 data cache reloads or instruction fetches, and write requests come from the store queue. The RC module performs the following functions:
Returning data: when the processor sends a read request to the L2 cache, if the request hits, the RC module returns the data from the L2 cache to the processor; if it misses, the RC module obtains the required data over the bus and then returns it to the processor.
Cache coherence: if the processor sends a read or write request to the L2 cache that misses, or sends a write request that hits a cache block that is not in the Modified state (the M state of the MSI cache coherence protocol), the RC module issues a message on the bus to maintain cache coherence.
Writing data: when a read request to the L2 cache misses, the required data is obtained over the bus and then written into the L2 cache; alternatively, the data of a write request sent by the processor is written into the L2 cache by the RC module.
Synchronization: when a write operation from one processor hits an L2 cache line, and that cache line is marked as being present in the L1 data cache of another processor, a back-invalidate operation is initiated toward that other processor.
The first three functions are represented in Fig. 1 by the three arrows connected to the RC module. The fourth function is shown as the control path from the RC module to the relevant L1 data cache.
In the Power architecture, the snoop module in the cache controller 101 is responsible for handling the coherence-protocol messages snooped from the bus. When a coherence-protocol message hits a valid L2 cache line, the snoop module takes an appropriate action according to the type of the message. Its basic tasks may include reading data from the cache and sending it to other processors over the bus; or reading data from the cache and sending it to lower-level memory via the castout machine; or simply updating the cache state. The castout machine is responsible for transferring data from the L2 cache to lower-level memory; for example, when a cache line is replaced, the castout machine sends it to main memory or the L3 cache.
To improve the cache hit rate, the Power architecture prefetches data into the cache via a hardware data prefetch unit. For example, an L2 miss can initiate a prefetch operation that brings several cache blocks from lower-level memory into the L2 cache. The number of cache lines fetched can be adjusted by changing the setting of a field in a special register.
The above is a brief introduction to the relevant components of the L2 cache controller 101 in an existing Power architecture. An apparatus for lazy fine-grained copy-on-write according to an embodiment of the present invention, implemented in this Power architecture, is described below with reference to Fig. 1. It should be noted that the apparatus for lazy fine-grained copy-on-write of the present invention can also be implemented in other architectures.
As shown in Fig. 1, the apparatus for lazy fine-grained copy-on-write comprises a cache controller 101 and a version switch module 102. The cache controller 101, in response to a store instruction that requires copy-on-write accessing a cache line in the cache, marks the accessed cache line. The version switch module 102, in response to determining that a marked cache line in the cache is about to be evicted, reads the original value at the corresponding address from lower-level memory, stores the original value and the address in a log, and clears the mark of the cache line about to be evicted.
The store instruction requiring copy-on-write may be located in a copy-on-write region. A copy-on-write region is a section of program code in which every store instruction performs the copy-on-write function. In the source code, the beginning and end of a copy-on-write region are marked by a pair of special instructions, for example BEGIN_COW and END_COW. In this document, a store operation within a copy-on-write region is called a C-store. The copy-on-write function means that, when a write is performed to a certain memory address, the initial value at that address is first saved to another location. For example, when the processor executes the instruction
Store variable_A to [address_A],
the processor actually performs the following operations:
Load temp from [address_A]
Store temp to [address_B]
Store variable_A to [address_A]
That is to say, before the new value is written, the old value must first be saved to a particular location (address_B). This particular location is specified in advance; various methods can be used, for example defining the location via a register. In embodiments of the present invention, this location resides in the data log described below.
In one embodiment of the invention, each processor thread has two dedicated logs, both residing in main memory. The data log is used to save the initial value at a memory address, and the address log is used to save that memory address. The base addresses of the two logs can be specified during program initialization by setting dedicated registers.
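A minimal sketch of how such per-thread logs might be laid out is given below; the struct and field names (cow_logs, data_log_base, addr_log_base, log_append) are assumptions for illustration, not the actual register or memory layout.

#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 128   /* copy granularity used in the text */

/* Assumed layout of the two per-thread logs. */
struct cow_logs {
    uint8_t   *data_log_base;   /* base address, set via a dedicated register */
    uintptr_t *addr_log_base;   /* base address, set via a dedicated register */
    size_t     next;            /* index of the next free entry               */
};

/* Append the original value of one cache line and its address. */
static void log_append(struct cow_logs *logs,
                       const uint8_t original[CACHE_LINE_SIZE],
                       uintptr_t address)
{
    memcpy(logs->data_log_base + logs->next * CACHE_LINE_SIZE,
           original, CACHE_LINE_SIZE);
    logs->addr_log_base[logs->next] = address;
    logs->next++;
}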
Fig. 2 is a schematic diagram of a store instruction within a copy-on-write region performing the copy-on-write function; the figure is only a conceptual illustration. As shown in the figure, when the store instruction "Store Var to [A]" within the copy-on-write region marked by the BEGIN_COW and END_COW instructions performs the copy-on-write function, the initial value at memory address [A] is stored into the data log space in main memory, memory address [A] itself is stored into the address log space in main memory, and the new value Var is stored at memory address [A].
The copy-on-write realized by the apparatus for lazy fine-grained copy-on-write of the present invention is "fine-grained", meaning that the data block copied each time is very small. In embodiments of the present invention, the granularity is the size of a cache line, for example 128B. That is, each copy-on-write preserves the whole cache line containing, for example, the variable variable_A, rather than preserving the whole page containing the variable as in existing copy-on-write techniques, and rather than preserving only the variable itself.
The copy-on-write realized by the apparatus for lazy fine-grained copy-on-write of the present invention is "lazy", meaning that the copy operation is not performed immediately when a C-store instruction accesses the cache but with a delay, namely when the version switch module 102 determines that a marked cache block in the cache is about to be evicted.
In the Power architecture, the processor has a 32KB-64KB 4-way write-through L1 data cache and a 2MB-4MB 8-way inclusive L2 cache. The cache line size of both L1 and L2 is 128B.
According to embodiments of the present invention, marking a cache line accessed by a store instruction that requires copy-on-write can be realized by adding a flag to the cache line. Fig. 3 shows a cache line extended according to an embodiment of the present invention. As shown in the figure, a flag (which may be called the A flag) is added to each L1 and L2 cache line; the A flag indicates whether the cache line has been accessed by a C-store instruction. In this document, a cache line whose A flag is set is called an A-block. The hardware cost of the A flag is only one bit per line. This embodiment is suitable for the case where only one processor thread uses the L1 and L2 caches.
According to another embodiment of the present invention, for each store instruction within a copy-on-write region of the program, in addition to marking the accessed cache line as described above (for example, setting the A flag), a flag identifying the processor thread that executes the store instruction (which may be called the color flag) is also set on the cache line.
Fig. 4 shows a cache line extended according to this other embodiment of the present invention. As shown in the figure, in this embodiment multiple processor threads share the L2 cache. Therefore, besides the A flag described above, a color flag (the aforementioned second flag) is added to each cache line of the L2 cache; the color flag indicates which processor thread has accessed the cache line and can be the identifier of that processor thread. The color flag can thus be used to distinguish the multiple processor threads that may access the cache simultaneously, as sketched below.
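The following C sketch models the extended cache-line metadata of Figs. 3 and 4; the field widths and names are assumptions made for illustration only.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the extended L2 cache-line metadata. */
struct l2_line_meta {
    uint64_t tag;      /* address tag                                   */
    uint8_t  state;    /* coherence state, e.g. M/S/I                   */
    bool     a_flag;   /* A flag: set when a C-store accesses the line  */
    uint8_t  color;    /* color flag: id of the accessing thread        */
};

/* Marking performed when a C-store hits the line. */
static void mark_on_c_store(struct l2_line_meta *line, uint8_t thread_id)
{
    line->a_flag = true;
    line->color  = thread_id;
}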
Returning to Fig. 1, according to embodiments of the present invention, the cache controller 101 is further configured to: in response to determining that the copy-on-write is committed, clear the marks of all correspondingly marked cache lines in the cache.
According to embodiments of the present invention, the cache controller 101 is further configured to: in response to determining that the copy-on-write is rolled back, invalidate all correspondingly marked cache lines in the cache, and write the values of lower-level memory addresses stored in the log back to the corresponding lower-level memory addresses.
Specifically, hardware for performing the following two operations is added to the processor (for example in the controllers of the L1 and L2 caches), and both operations should be executed atomically:
Atomic invalidate: invalidate all A-blocks in the L1 and L2 caches (or all A-blocks that belong to a certain processor thread, i.e. that carry a certain color flag).
Atomic clear: clear the A flags of all A-blocks in the L1 and L2 caches (or all A-blocks that belong to a certain processor thread, i.e. that carry a certain color flag).
Thus, when a C-store instruction accesses a cache block, its A flag is set. When the copy-on-write region ends and the copy-on-write is committed, the A flags of all A-blocks accessed by that copy-on-write region are cleared; in addition, the corresponding records in the data log and the address log can be discarded, which can be done simply by, for example, pointing the log-record pointer back to the beginning of the log. When the copy-on-write region is rolled back, all A-blocks accessed by that copy-on-write region are invalidated, and the initial values in the log are written back to their original addresses. In the Power architecture, because the L1 data cache is a write-through cache, store operations always access the L2 cache; an A-block in the L1 data cache results from a read operation that hits an A-block in the L2 cache.
In embodiments of the present invention, because copy-on-write regions are not allowed to nest, each processor thread can be inside at most one copy-on-write region at any time; therefore the color flag of a cache line whose A flag is set identifies the copy-on-write region of the C-store instruction that accessed the line. Thus, in embodiments that implement the color flag, when a copy-on-write region is committed or rolled back, the A flags of the corresponding cache lines can be cleared, or the corresponding cache lines invalidated, according to the color flags in the cache lines.
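For illustration, the effect of the atomic clear and atomic invalidate operations described above can be sketched as follows; in hardware each is a single atomic operation over the L1 and L2 caches, and the loops below only model the outcome.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal cache-line model for illustration only. */
struct line { bool valid; bool a_flag; uint8_t color; };

/* Atomic clear (commit): clear the A flags of all A-blocks of one thread. */
void atomic_clear(struct line *lines, size_t n, uint8_t color)
{
    for (size_t i = 0; i < n; i++)
        if (lines[i].a_flag && lines[i].color == color)
            lines[i].a_flag = false;
}

/* Atomic invalidate (rollback): invalidate all A-blocks of one thread. */
void atomic_invalidate(struct line *lines, size_t n, uint8_t color)
{
    for (size_t i = 0; i < n; i++)
        if (lines[i].a_flag && lines[i].color == color) {
            lines[i].valid  = false;
            lines[i].a_flag = false;
        }
}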
A typical copy-on-write example is as follows:
......
BEGIN_COW
load x from address A
store y to address B
if (condition) {
    END_COW;
} else {
    rollback;
    char *p = data_log_tail;
    char *q = addr_log_tail;
    while (p >= data_log_head) {
        transfer the data pointed to by p to address *q;
        p -= 128;   // assume the cache line size is 128B
        q -= 4;     // assume a 32-bit system, so each address is 4B
    }
}
......
As can be seen from this example, at the end of the copy-on-write region the logic of the program itself determines whether to commit or to roll back. If a commit is needed, the processor executes the END_COW instruction, which clears the A flags of all A-blocks accessed by this copy-on-write region (i.e., by this processor thread). If the program determines that a rollback is needed, the rollback instruction is called, which invalidates all A-blocks in the cache; in addition, a loop copies the data in the data log back to the original memory addresses.
As shown in Fig. 1, according to embodiments of the present invention, the version switch module 102 comprises a prediction module 103. The prediction module does not interfere with the internal operation of the L2 cache and therefore does not increase the verification cost of the L2 cache. There may be one or more prediction modules. The prediction module predicts which A-blocks will be evicted from the L2 cache; when it predicts that an A-block will be evicted from the L2 cache, it issues a command to the hardware data prefetch unit to load the initial value of the predicted A-block from lower-level memory, for example main memory, into the version buffer of the version switch module 102.
In other embodiments of the present invention, the prediction module 103 may also be located outside the version switch module 102, for example within the castout machine of the L2 cache controller. It should also be emphasized that the version switch module can equally be implemented inside the data prefetch module that current processors already have; this is consistent with the spirit of the present invention.
According to one embodiment of the present invention, determining that a marked cache line in the cache is about to be evicted comprises determining that the number of marked cache lines in a cache set of the cache reaches a threshold.
For example, the prediction module 103 may use any of the following modes to determine that a marked cache line in the cache is about to be evicted (a minimal sketch of the threshold check follows the modes below):
1. Aggressive mode. In this mode there may be several prediction modules 103, each responsible for a fixed region of the L2 cache. A prediction module 103 periodically traverses the cache sets in its own region and counts how many A-blocks each cache set contains. Because of the speculative nature of the prediction module 103, this count need not be exact every time; tolerating some inaccuracy reduces the complexity of the hardware implementation. When the number of A-blocks in a cache set reaches a threshold, for example 6 for an 8-way L2 cache, the prediction module 103 can determine that cache lines in this cache set are about to be evicted. At that point, if the first cache line (or any other particular cache line) in the cache set is an A-block, the prediction module 103 can select that first cache line (or that other particular cache line) for version switching, that is, for saving the initial value of that cache line held in lower-level memory; otherwise, the prediction module 103 can randomly select an A-block from the cache set for version switching.
2. Passive mode. In this mode, the prediction module 103 does not actively monitor the L2 cache but waits for notification from the RC module. When an A-block is added to a cache set and the number of A-blocks in that cache set reaches a threshold, for example 6, the RC module notifies the prediction module 103. At that point, if the first cache line (or any other particular cache line) in the cache set is an A-block, the prediction module 103 can select that line for version switching; otherwise the prediction module 103 can randomly select an A-block from the cache set for version switching.
Another possible realization of the passive mode is that, when the cache chooses the cache blocks to be replaced from a set, it selects not just one as in the classic method but two or more; as long as at least one of the selected cache blocks is an A-block, it is predicted that it will be replaced.
According to another embodiment of the present invention, predicting that a marked cache line in the cache is about to be evicted comprises determining that a cache line at one or more particular positions in a cache set of the cache is marked.
For example, the prediction module 103 may use the following mode to determine that a marked cache line in the cache is about to be evicted.
3. Simple mode. In this mode, when a particular cache line in a cache set, for example the first cache line, becomes an A-block, the prediction module 103 or the version switch module 102 determines that this cache line is about to be evicted and immediately performs a version switch on this A-block.
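A minimal sketch of the threshold check behind the aggressive and passive modes is given below; the constants and the choice of the first A-block instead of a random one are simplifications made for illustration.

#include <stdbool.h>

#define WAYS 8           /* 8-way L2 cache, as in the example above */
#define A_THRESHOLD 6    /* example threshold from the text         */

struct way { bool a_flag; };

/* Count (possibly approximately) the A-blocks in one cache set and, if the
   threshold is reached, pick a line for an early version switch.
   Returns the way index to switch, or -1 if none. */
int predict_victim(const struct way set[WAYS])
{
    int count = 0;
    for (int w = 0; w < WAYS; w++)
        if (set[w].a_flag)
            count++;
    if (count < A_THRESHOLD)
        return -1;
    if (set[0].a_flag)                  /* prefer the particular (first) line */
        return 0;
    for (int w = 1; w < WAYS; w++)      /* otherwise take an A-block          */
        if (set[w].a_flag)
            return w;
    return -1;
}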
According to other embodiments of the present invention, predicting that a marked cache line in the cache is about to be evicted comprises: in response to the cache controller evicting a cache line from the cache, determining that this cache line is marked.
In such embodiments there may be no prediction module 103; instead, when the cache controller 101 evicts a cache line, it checks whether the line is an A-block. If it is an A-block, the version switch module 102 is notified to perform the version switch; if not, the eviction proceeds normally. According to embodiments of the present invention, and as described below, this check and notification can be performed by the castout machine in the cache controller 101.
Of course, in some embodiments of the present invention, both mechanisms may be present: a prediction module 103 that predicts A-blocks about to be evicted and version-switches them in advance, and logic that, when the cache controller 101 evicts a cache line, checks whether the line is an A-block and, if so, notifies the version switch module 102 to perform the version switch.
As shown in Fig. 1, according to embodiments of the present invention, the version switch module 102 may comprise a version buffer for storing the initial values of predicted A-blocks received from the prefetch unit.
According to embodiments of the present invention, when the initial value of a predicted A-block from the prefetch unit arrives at the version buffer of the version switch module 102, the version switch module 102 immediately begins the version switch process. The version switch process may comprise the following steps (a sketch follows below):
The version switch module 102 clears the A flag of the corresponding A-block.
The version switch module 102 writes the initial value of the predicted A-block stored in the version buffer, together with its memory address, back to new addresses in the L2 cache. The initial value (128B) can be written to a sequentially allocated address in the data log space, and the memory address (8B) is treated as normal data and can be written to a sequentially allocated address in the address log space. This change of address is the key feature of the version switch module 102: it helps realize the switch from lazy version management to eager version management.
That is, according to embodiments of the present invention, the value at the lower-level memory address corresponding to the cache line determined to be about to be evicted, together with that lower-level memory address, is stored in the cache at addresses different from that lower-level memory address.
Then the corresponding entry in the version buffer is invalidated. This version switch process can be very efficient, because it can be completed before the initial value is written into the L2 cache. The initial value and its address are eventually written to the L2 cache by the RC module; because these are ordinary store operations, their A flags are not set.
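The version switch process described above might be sketched as follows; clear_a_flag, data_log_alloc and addr_log_alloc are assumed helpers standing in for hardware actions, not defined interfaces.

#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 128

/* Assumed helpers, declared only. */
void       clear_a_flag(uintptr_t address);
uint8_t   *data_log_alloc(void);
uintptr_t *addr_log_alloc(void);

/* Hypothetical version-buffer entry: the original value fetched by the
   prefetch unit, plus the memory address it came from. */
struct vbuf_entry {
    uint8_t   value[CACHE_LINE_SIZE];
    uintptr_t address;
    int       valid;
};

/* Version switch, following the steps above: clear the A flag, append the
   value and address to the logs (ordinary stores, so no A flag is set on
   them), then invalidate the buffer entry. */
void version_switch(struct vbuf_entry *e)
{
    clear_a_flag(e->address);
    memcpy(data_log_alloc(), e->value, CACHE_LINE_SIZE);
    *addr_log_alloc() = e->address;
    e->valid = 0;
}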
In addition, when an A-block is evicted by the castout machine in the cache controller 101, the castout machine notifies the version switch module 102 with a message containing the address of the evicted cache line and the identifier of the owning processor thread. The version switch module 102 then uses this address to look up the entries in the version buffer. If a matching entry is found, the A-block is cast out normally; otherwise, the castout machine waits until the initial value at the address of this A-block has been fetched into the version buffer. It should be noted that L2 cache lines whose A flag is not set can be cast out normally, without notifying the version switch module 102.
Under normal conditions, overflowing cache sets (i.e. sets that keep receiving A-blocks when all of their cache lines are already A-blocks) are a small fraction of all cache sets, and the A-blocks in a cache set accumulate gradually rather than suddenly. Moreover, evictions of A-blocks should not be frequent, because the L2 cache hit rate can be very high. Therefore, the prediction module can work well and handle most A-blocks in advance of their replacement.
According to some embodiments of the present invention, the cache controller 101 is further configured to: evict a marked cache line from a cache set of the cache only when all cache lines in that cache set are marked. Specifically, in this embodiment of the present invention the eviction logic of the L2 cache is changed accordingly: unless all cache lines in a given L2 cache set are A-blocks, no A-block in that set is evicted. When an A-block in a cache set must be evicted (at which point all lines in the set are A-blocks), a particular cache line in the set, for example the first cache line, can be selected. It should be noted that the L1 data cache is not subject to this restriction; A-blocks in the L1 data cache can be evicted freely.
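A sketch of victim selection under this restriction is shown below; lru_pick_non_a is an assumed helper standing in for the normal replacement policy applied to the non-A lines.

#include <stdbool.h>

#define WAYS 8

struct l2_way { bool valid; bool a_flag; };

/* Assumed helper, declared only. */
int lru_pick_non_a(const struct l2_way set[WAYS]);

/* An A-block may be chosen only when every line in the set is an A-block,
   in which case a particular line (here the first) is selected. */
int pick_victim(const struct l2_way set[WAYS])
{
    bool all_a = true;
    for (int w = 0; w < WAYS; w++)
        if (!set[w].a_flag)
            all_a = false;
    if (all_a)
        return 0;                  /* e.g. the first cache line */
    return lru_pick_non_a(set);    /* normal policy over non-A lines */
}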
In other embodiments of the present invention, this restriction may be absent. That is, an A-block in a cache set can be selected for eviction just like any ordinary cache line; only when it is actually evicted is it checked whether it is an A-block, and if it is, the version switch module is notified to perform the version switch.
The above describes an exemplary implementation of the apparatus for lazy fine-grained copy-on-write of the present invention for the L2 cache in a Power architecture. It should be noted that this description is only an example and not a limitation of the present invention. The apparatus for lazy fine-grained copy-on-write of the present invention may have more, fewer or different modules than described above, and the connections and containment relationships between the modules may differ from those described. For example, in some embodiments of the present invention the version switch module 102 may have no version buffer and may instead put the initial value of a predicted A-block and its address directly into the logs. In some embodiments of the present invention there may be no hardware data prefetch unit; instead, the prefetching of the initial value and address of a predicted A-block may be performed by the version switch module 102, and so on. All these variations are within the scope of the present invention. In addition, the names of the modules above are chosen only for convenience and do not limit the present invention.
A method for lazy fine-grained copy-on-write according to an embodiment of the present invention is described below with reference to the accompanying drawings. This method can be realized by the apparatus for lazy fine-grained copy-on-write according to an embodiment of the present invention described above, or by other apparatus. For simplicity, details that repeat the above description are omitted below; the above description can therefore be consulted for a more detailed understanding of the steps of the method for lazy fine-grained copy-on-write of the present invention.
Fig. 5 shows the method for lazy fine-grained copy-on-write according to an embodiment of the present invention. As shown in the figure, the method comprises the following steps:
In step 501, in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, the cache line is marked.
According to embodiments of the present invention, in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, a flag identifying the processor thread that executes the store instruction is also set on the cache line.
In step 502, in response to determining that a marked cache line in the cache is about to be evicted, the value at the lower-level memory address corresponding to that cache line, together with that lower-level memory address, is stored in a log, and the mark of the cache line is cleared.
According to embodiments of the present invention, determining that a marked cache line in the cache is about to be evicted comprises: determining that the number of marked cache lines in a cache set of the cache reaches a threshold.
According to embodiments of the present invention, determining that a marked cache line in the cache is about to be evicted comprises: determining that a particular cache line in a cache set of the cache is marked.
According to embodiments of the present invention, determining that a marked cache line in the cache is about to be evicted comprises: in response to a cache line being evicted from the cache, determining that the cache line is marked, and signalling that the cache line is being evicted.
According to embodiments of the present invention, a marked cache line in a cache set of the cache can be evicted only when all cache lines in that cache set are marked.
According to embodiments of the present invention, the log resides in the cache and in the lower-level memory, and storing the original value and the address in the log comprises storing the original value and the address in the cache at addresses different from that address.
In step 503, in response to determining that the copy-on-write region is committed, the marks of all marked cache lines in the cache are cleared.
In step 504, in response to determining that the copy-on-write region is rolled back, all correspondingly marked cache lines in the cache are invalidated, and the values of lower-level memory addresses stored in the log are written back to the corresponding lower-level memory addresses.
The above describes the method for lazy fine-grained copy-on-write according to an embodiment of the present invention. It should be noted that this description is only an example and not a limitation of the present invention. In other embodiments of the present invention the method may have more, fewer or different steps; the order of some steps may be changed or they may be executed in parallel, and some steps may be split into several smaller steps or merged into one larger step. For example, steps 502 and 503, and steps 502 and 504, may be executed in any order or in parallel, and step 502 may be split into multiple steps, and so on. All these variations are within the scope of the present invention.
Although the present invention has been particularly shown and described with reference to preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

Claims (22)

1. A method for lazy fine-grained copy-on-write, comprising:
in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, marking the accessed cache line; and
in response to determining that a marked cache line in the cache is about to be evicted, reading the original value at the corresponding address from a lower-level memory, storing the original value and the address in a log, and clearing the mark of the cache line about to be evicted.
2. The method according to claim 1, further comprising:
in response to determining that the copy-on-write is committed, clearing the marks of all correspondingly marked cache lines in the cache.
3. The method according to claim 1, further comprising:
in response to determining that the copy-on-write is rolled back, invalidating all correspondingly marked cache lines in the cache, and writing the values of lower-level memory addresses stored in the log back to the corresponding lower-level memory addresses.
4. The method according to claim 1, wherein determining that a marked cache line in the cache is about to be evicted comprises:
determining that the number of marked cache lines in a cache set of the cache reaches a threshold.
5. The method according to claim 1, wherein determining that a marked cache line in the cache is about to be evicted comprises:
determining that a cache line at one or more particular positions in a cache set of the cache is marked.
6. The method according to claim 1, wherein determining that a marked cache line in the cache is about to be evicted comprises:
when selecting the cache line to be evicted from a cache set, selecting more than one cache line to be evicted; and
determining that the selected cache lines include a marked cache line.
7. The method according to claim 1, wherein determining that a marked cache line in the cache is about to be evicted comprises:
in response to a cache line in the cache being evicted, determining that the cache line is marked.
8. The method according to claim 1, wherein a marked cache line in a cache set of the cache can be evicted only when all cache lines in that cache set are marked.
9. The method according to claim 1, wherein reading the original value at the corresponding address from the lower-level memory is realized by means of a data prefetch device in the processor.
10. The method according to claim 1, wherein the log resides in the cache and in the lower-level memory, and storing the original value and the address in the log comprises storing the original value and the address in the cache at addresses different from that address.
11. The method according to claim 1, further comprising:
in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, marking the cache line so as to identify the processor thread executing the store instruction.
12. An apparatus for lazy fine-grained copy-on-write, comprising:
a cache controller configured to, in response to a store instruction that requires copy-on-write accessing a cache line in a cache of a processor, mark the accessed cache line; and
a version switch module configured to, in response to determining that a marked cache line in the cache is about to be evicted, read the original value at the corresponding address from a lower-level memory, store the original value and the address in a log, and clear the mark of the cache line about to be evicted.
13. The apparatus according to claim 12, wherein the cache controller is further configured to:
in response to determining that the copy-on-write is committed, clear the marks of all correspondingly marked cache lines in the cache.
14. The apparatus according to claim 12, wherein the cache controller is further configured to:
in response to determining that the copy-on-write is rolled back, invalidate all correspondingly marked cache lines in the cache, and write the values of lower-level memory addresses stored in the log back to the corresponding lower-level memory addresses.
15. The apparatus according to claim 12, wherein determining that a marked cache line in the cache is about to be evicted comprises:
determining that the number of marked cache lines in a cache set of the cache reaches a threshold.
16. The apparatus according to claim 12, wherein determining that a marked cache line in the cache is about to be evicted comprises:
determining that a cache line at one or more particular positions in a cache set of the cache is marked.
17. The apparatus according to claim 12, wherein determining that a marked cache line in the cache is about to be evicted comprises:
when selecting the cache line to be evicted from a cache set, selecting more than one cache line to be evicted; and
determining that the selected cache lines include a marked cache line.
18. The apparatus according to claim 12, wherein determining that a marked cache line in the cache is about to be evicted comprises:
in response to the cache controller evicting a cache line from the cache, determining that the cache line is marked.
19. The apparatus according to claim 12, wherein the cache controller is further configured to:
evict a marked cache line from a cache set of the cache only when all cache lines in that cache set are marked.
20. The apparatus according to claim 12, wherein reading the original value at the corresponding address from the lower-level memory is realized by means of a data prefetch device in the processor.
21. The apparatus according to claim 12, wherein the log resides in the cache and in the lower-level memory, and storing the original value and the address in the log comprises storing the original value and the address in the cache at addresses different from that lower-level memory address.
22. The apparatus according to claim 12, wherein the cache controller is further configured to:
in response to a store instruction that requires copy-on-write accessing a cache line in the cache, mark the cache line so as to identify the processor thread executing the store instruction.
CN200810131937A 2008-06-27 2008-06-27 Apparatus and method for lazy fine-grained copy-on-write Pending CN101615133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810131937A CN101615133A (en) 2008-06-27 2008-06-27 Apparatus and method for lazy fine-grained copy-on-write

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810131937A CN101615133A (en) 2008-06-27 2008-06-27 Apparatus and method for lazy fine-grained copy-on-write

Publications (1)

Publication Number Publication Date
CN101615133A true CN101615133A (en) 2009-12-30

Family

ID=41494791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810131937A Pending CN101615133A (en) Apparatus and method for lazy fine-grained copy-on-write

Country Status (1)

Country Link
CN (1) CN101615133A (en)


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150265B (en) * 2013-02-04 2015-10-21 山东大学 The fine-grained data distribution method of isomery storer on Embedded sheet
CN103150265A (en) * 2013-02-04 2013-06-12 山东大学 Fine grit data distributing method orienting to embedded on-chip heterogeneous memory
US9405628B2 (en) 2013-09-23 2016-08-02 International Business Machines Corporation Data migration using multi-storage volume swap
US10146453B2 (en) 2013-09-23 2018-12-04 International Business Machines Corporation Data migration using multi-storage volume swap
US10915406B2 (en) 2014-01-18 2021-02-09 International Business Machines Corporation Storage unit replacement using point-in-time snap copy
US9619331B2 (en) 2014-01-18 2017-04-11 International Business Machines Corporation Storage unit replacement using point-in-time snap copy
US10324801B2 (en) 2014-01-18 2019-06-18 International Business Machines Corporation Storage unit replacement using point-in-time snap copy
CN107278298A (en) * 2015-03-03 2017-10-20 Arm 有限公司 Buffer maintenance instruction
US11144458B2 (en) 2015-03-03 2021-10-12 Arm Limited Apparatus and method for performing cache maintenance over a virtual page
CN110209346A (en) * 2015-03-06 2019-09-06 华为技术有限公司 Data write-in control device and method
CN110209346B (en) * 2015-03-06 2021-02-12 华为技术有限公司 Data write control device and method
US10929292B2 (en) 2015-03-06 2021-02-23 Huawei Technologies Co., Ltd. Data write control apparatus and method
CN107735775A (en) * 2015-07-10 2018-02-23 Arm 有限公司 Apparatus and method for carrying out execute instruction using the range information associated with pointer
CN107735775B (en) * 2015-07-10 2022-08-02 Arm 有限公司 Apparatus and method for executing instructions using range information associated with pointers
CN111241010A (en) * 2020-01-17 2020-06-05 中国科学院计算技术研究所 Processor transient attack defense method based on cache division and rollback
CN111241010B (en) * 2020-01-17 2022-08-02 中国科学院计算技术研究所 Processor transient attack defense method based on cache division and rollback
CN112380013A (en) * 2020-11-16 2021-02-19 海光信息技术股份有限公司 Cache preloading method and device, processor chip and server
CN112380013B (en) * 2020-11-16 2022-07-29 海光信息技术股份有限公司 Cache preloading method and device, processor chip and server

Similar Documents

Publication Publication Date Title
CN101615133A (en) The apparatus and method that are used for delaying fine-grained copy-on-write
CN111177030B (en) Hybrid memory management
EP2430551B1 (en) Cache coherent support for flash in a memory hierarchy
US7711902B2 (en) Area effective cache with pseudo associative memory
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
US7305522B2 (en) Victim cache using direct intervention
CN101097544B (en) Global overflow method for virtualized transactional memory
US7305523B2 (en) Cache memory direct intervention
US20100064107A1 (en) Microprocessor cache line evict array
US20030135669A1 (en) DMA controller and coherency-tracking unit for efficient data transfers between coherent and non-coherent memory spaces
KR19980079433A (en) Method and system for implementing cache coherency mechanism for use in non-cache cache hierarchy
KR20130114606A (en) Apparatus and method for handling access operations issued to local cache structures within a data processing apparatus
US11354242B2 (en) Efficient early ordering mechanism
CN103076992A (en) Memory data buffering method and device
US10705977B2 (en) Method of dirty cache line eviction
JPH0721085A (en) Streaming cache for caching of data transferred between memory and i/o device and its method
JP2007533014A (en) System and method for canceling write back processing when snoop push processing and snoop kill processing occur simultaneously in write back cache
US10740233B2 (en) Managing cache operations using epochs
JP2007058349A (en) Cache system
JP3732397B2 (en) Cash system
CN101587457A (en) The adaptive cache organization that is used for chip multiprocessors
Kim et al. Exploiting write-only-once characteristics of file data in smartphone buffer cache management
EP0396940B1 (en) Cache memory and related consistency protocol
JP6272011B2 (en) Cache device, computer including cache device, and cache control method
US11940914B2 (en) Performance aware partial cache collapse

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20091230