CN104813293A - Memory management using dynamically allocated dirty mask space - Google Patents

Memory management using dynamically allocated dirty mask space

Info

Publication number
CN104813293A
Authority
CN
China
Prior art keywords
dirty
cache line
cache
mask
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201380061576.2A
Other languages
Chinese (zh)
Other versions
CN104813293B (en)
Inventor
梁坚
于春
徐飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN104813293A publication Critical patent/CN104813293A/en
Application granted granted Critical
Publication of CN104813293B publication Critical patent/CN104813293B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0891: ... using clearing, invalidating or resetting means
    • G06F 12/0804: ... with main memory updating
    • G06F 12/0877: Cache access modes
    • G06F 12/0886: Variable-length word access
    • G06F 12/0893: Caches characterised by their organisation or structure
    • G06F 12/0895: ... of parts of caches, e.g. directory or tag array
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/60: Details of cache memory
    • G06F 2212/604: Details relating to cache allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Systems and methods related to a memory system including a cache memory are disclosed. The cache memory system includes a cache memory including a plurality of cache memory lines and a dirty buffer including a plurality of dirty masks. A cache controller is configured to allocate one of the dirty masks to each of the cache memory lines when a write to the respective cache memory line is not a full write to that cache memory line. Each of the dirty masks indicates dirty states of data units in one of the cache memory lines. The cache controller stores identification (ID) information that associates the dirty masks with the cache memory lines to which the dirty masks are allocated.

Description

Memory management using dynamically allocated dirty mask space
Technical field
The present invention relates to memory management and, more particularly, to the management of cache memory.
Background
A cache memory, also referred to as a cache, is used in many data processing systems to accelerate access to data. A byte-writeable cache allows a client to write some bytes of a cache line while leaving other bytes unaffected. When writing to a byte-writeable cache memory, it is important to maintain data coherency. A number of write schemes for byte-writeable cache memories may be used to maintain data coherency. Some of these write schemes, however, may impair system performance or consume considerable memory space.
Summary of the invention
In one example, a dirty buffer may include multiple dirty masks that are allocated to corresponding cache lines when a write to a cache line is not a full write to that cache line. In one example, the dirty buffer may be part of the cache memory. In other examples, it may be separate from the cache memory, e.g., a separate memory device. The dirty masks indicate the dirty states of the data units in the cache lines. Each cache line may include bits that store the identification (ID) of the dirty mask allocated to that cache line. For example, the ID may be stored as a dirty buffer index in the same cache line as flags such as a dirty flag and/or a fully dirty flag, where the dirty flag may indicate that at least one byte in the cache line is dirty and the fully dirty flag may indicate that every byte in the cache line is dirty. This may allow for convenient access. In other examples, however, the ID may be stored in another memory location separate from the flags in the cache line.
In one example, this disclosure describes a cache memory system that includes: a cache memory including multiple cache lines; a dirty buffer including multiple dirty masks; and a controller configured to allocate one of the dirty masks to a corresponding cache line when a write to that cache line is not a full write to the cache line. Each of the dirty masks indicates the dirty states of the data units in one of the cache lines. The controller stores identification (ID) information that associates the dirty masks with the cache lines to which they are allocated.
In another example, this disclosure describes a method of operating a memory system that includes: writing data to a cache memory including multiple cache lines; allocating one of multiple dirty masks to a cache line when a write to that cache line is not a full write to the cache line, wherein the dirty mask indicates the dirty states of the data units in the cache line; and storing identification (ID) information that associates the dirty mask with the cache line to which it is allocated.
In another example, this disclosure describes a memory system that includes: means for writing data to a cache memory including multiple cache lines; means for allocating one of multiple dirty masks to a cache line when a write to that cache line is not a full write to the cache line, wherein the dirty mask indicates the dirty states of the data units in the cache line; and means for storing identification (ID) information that associates the dirty mask with the cache line to which it is allocated.
In another example, this disclosure describes a system that includes: a processor; a main memory coupled to the processor; and a cache memory coupled to the processor. The cache memory includes a controller, multiple cache lines, and a dirty buffer. The dirty buffer includes dirty masks. The controller allocates one of the dirty masks to a cache line when a write to that cache line is not a full write to the cache line. The dirty mask indicates the dirty states of the data units in the cache line. The controller stores identification (ID) information that associates the dirty mask with the cache line to which it is allocated.
In another example, this disclosure describes a method of operating a memory system that includes: using a dirty flag that indicates that a write to a cache line in a cache memory is not a full write to track the state of the cache line; allocating a dirty mask to the cache line when a write to the cache line is not a full write to that particular cache line; and attaching identification (ID) information that tracks the allocated dirty mask to the particular cache line so that the dirty mask can be accessed.
In another example, this disclosure describes a cache memory system that includes: means for using a dirty flag that indicates that a write to a cache line in a cache memory is not a full write to track the state of the cache line; means for allocating a dirty mask to the cache line when a write to the cache line is not a full write to that particular cache line; and means for attaching identification (ID) information that tracks the allocated dirty mask to the particular cache line so that the dirty mask can be accessed.
In another example, this disclosure describes a non-transitory computer-readable medium including instructions that, when executed, cause a programmable processor to: use a dirty flag that indicates that a write to a cache line in a cache memory is not a full write to track the state of the cache line; allocate a dirty mask to the cache line when a write to the cache line is not a full write to that particular cache line; and attach identification (ID) information that tracks the allocated dirty mask to the particular cache line so that the dirty mask can be accessed.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, the drawings, and the claims.
Brief description of the drawings
Fig. 1 is a block diagram illustrating an example processing system that may implement the techniques of this disclosure, the processing system including a processor, a cache memory, and a main memory.
Fig. 2 is a block diagram illustrating additional details of the example cache memory of Fig. 1 that may implement the techniques of this disclosure.
Fig. 3 is a block diagram illustrating an example of a cache line that may implement the techniques of this disclosure.
Figs. 4A and 4B are conceptual diagrams illustrating an example of data processing in a cache using the techniques of this disclosure.
Fig. 5 is a flowchart illustrating an example method according to the techniques of this disclosure.
Fig. 6 is another flowchart illustrating an example method according to the techniques of this disclosure.
Detailed description
One scheme used to maintain data coherency is sometimes referred to as "read-allocate-write." When a write request is received, the processor may first read the target cache line from system memory, and the processor may then write selected data units (such as bytes) to the cache memory. Data units that were not written hold the same values as system memory. When a cache line is evicted from the cache, the whole cache line is sent to system memory. Any unaffected data units may be written back with identical values. With this scheme, any write to a cache line causes a read of system memory. This causes additional system memory traffic and undesirable latency for write requests. In modern digital systems, memory bandwidth is often the bottleneck for system performance. This may be especially true for a graphics processing unit (GPU). Accordingly, this scheme may not be preferred due to the increased system memory traffic.
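For contrast, here is a minimal C sketch of the read-allocate-write behavior just described; `mem_read`, the function names, and the fixed 1 KB line size are illustrative assumptions, not part of the original text:

```c
#include <stdint.h>

/* Assumed helper: reads `len` bytes from system memory into `dst`. */
extern void mem_read(uint32_t addr, uint8_t *dst, uint32_t len);

#define LINE_BYTES 1024  /* illustrative 1 KB cache line */

/* Read-allocate-write: a write miss first fetches the whole target line from
 * system memory, then applies the partial write. Untouched bytes then hold
 * valid values, so the whole line can later be written back without any
 * per-byte mask, at the cost of the extra read traffic criticized above. */
void write_miss_read_allocate(uint8_t line[LINE_BYTES], uint32_t line_addr,
                              uint32_t offset, const uint8_t *src, uint32_t len) {
    mem_read(line_addr, line, LINE_BYTES);  /* the extra system-memory read */
    for (uint32_t i = 0; i < len; i++)
        line[offset + i] = src[i];          /* apply only the requested bytes */
}
```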
Another scheme used to maintain data coherency involves storing a so-called "byte dirty mask." (Data units are typically accessible at byte granularity.) The byte dirty mask indicates which bytes in a cache line are dirty. A byte is dirty when it contains data different from the higher-level memory (e.g., system memory) that has not yet been written to the higher-level memory, such that inconsistent data exists in the cache and the system memory. For example, the byte dirty mask may indicate whether modifications to a cache line have occurred, such that the changes need to be written to system memory. Accordingly, in a write-back cache, bits in the byte dirty mask may indicate that it is necessary to write the cache line back to the next-higher memory level in the memory system hierarchy, such as system memory.
Instead of using one dirty bit for the whole cache line, a dirty mask with one bit per byte may be stored for each cache line. Upon eviction, the dirty mask may be sent together with the cache line data as byte write enables, so that the dirty bytes (the bytes containing data different from the corresponding data in the higher-level memory) can be determined. "Non-dirty" bytes (i.e., bytes that have not been written and are still identical to the corresponding data in the higher-level memory) need not be written to system memory. With a scheme of this type, the system stores a mask together with the data in each cache line. With a 1-bit mask for each byte, the mask random access memory (RAM) is 1/8 of the data RAM, which has a large memory area cost. Due to RAM architecture, the mask RAM area is often greater than 1/8 of the overall RAM area. As an illustration, in one example, the cache is 8 kilobytes (KB) (using a 512x128-bit RAM), and its area in a 28nm process is about 0.061mm². In this example, however, a 256x32-bit mask RAM has an area of 0.013mm², which is 21% of the area of the cache memory.
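As a quick check of the figures above (a worked calculation added for clarity, under the stated 8 KB, 1-bit-per-byte assumptions):

$$512 \times 128 = 65536 \text{ bits} = 8\,\text{KB (data RAM)}, \qquad 65536 / 8 = 8192 \text{ bits} = 256 \times 32 \text{ (mask RAM)},$$

so the mask RAM holds exactly one bit per data byte, while its reported area ratio, $0.013/0.061 \approx 21\%$, exceeds the 1/8 = 12.5% bit ratio because of RAM overhead.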
An example architecture for storing the byte dirty masks described above is illustrated in Table 1. In one example, assume the cache has 128 cache lines (0...127) with 1 KB of data in each line. When the byte dirty masks are stored using the architecture of Table 1 below, each cache line needs a 1024-bit byte dirty mask, which consumes 128K bits of storage, as shown in Table 1 below.
As illustrated by Table 1, a cache line may include a valid flag to indicate whether the data in the cache line is valid, and a tag that indicates the address in main memory corresponding to the data in the cache. The data (not shown in Table 1) is also part of the cache line. A dirty flag indicates whether any byte in the cache line's data has been written such that it no longer matches the data in main system memory. The cache line may also include a byte dirty mask, which indicates which bytes of the cache line are dirty.
Valid (line is valid)   Tag (address)   Dirty (line has been written)   Byte dirty mask (which bytes are dirty)
Valid 0                 Tag 0           Dirty                           Byte dirty mask 0 [1023:0]
Valid 1                 Tag 1           Dirty                           Byte dirty mask 1 [1023:0]
Valid 2                 Tag 2           Dirty                           Byte dirty mask 2 [1023:0]
...                     ...             ...                             ...
Valid 127               Tag 127         Dirty                           Byte dirty mask 127 [1023:0]
Table 1: Tag architecture for byte dirty masks
The read-allocate-write scheme described above and the stored byte dirty mask scheme both have their own drawbacks. The read-allocate-write scheme uses extra bandwidth, and storing byte dirty masks incurs an area penalty. In the example illustrated in Table 1, the byte dirty mask stores one bit for each byte in the cache line. If a given bit is set, e.g., logic "1", the corresponding byte is dirty. Other configurations are also possible.
In the second architecture discussed above, the cache memory stores byte dirty masks. Each cache line in that system has a dedicated location for a byte dirty mask. Only write requests that do not cover the whole cache line will use a dirty mask. Read requests will not use any dirty mask, and write requests that cover the whole cache line can use a single dirty flag. Accordingly, giving each cache line in the system a dedicated location for a byte dirty mask may use more memory than necessary.
According to the techniques of this disclosure, instead of using a 1-bit mask for each byte in each cache line, a 1-bit flag may be used for each cache line in conjunction with a pool of byte dirty masks. A write request that covers a whole cache line can thus use a single dirty flag to describe the state of the cache line. Hence, instead of a 1-bit mask for each byte, a single flag is used for the whole cache line. For each cache line that is dirty but not fully dirty, a pointer to one of the byte dirty mask storage locations can be provided. In one example, each mask in the pool of byte dirty masks may store one bit for each byte in a cache line. If a given bit is set, e.g., logic "1", the corresponding byte is dirty. Other configurations are also possible.
For example, a pointer may be used to point to a separate byte dirty mask storage location. The separate byte dirty mask storage locations may be used to store the byte dirty masks. Each byte dirty mask indicates which bytes of a particular cache line are dirty. In other words, the dirty bytes are the bytes in the cache line that have been written such that they no longer contain the same data as the corresponding memory locations in main memory. The number of byte dirty mask storage locations can be smaller than the number of cache lines because, in general, not all of the cache lines in the cache will have dirty bytes at the same time. Accordingly, a smaller memory can be used in such an example system.
Typically, in applications utilizing a graphics processing unit (GPU), as one example, the majority of requests are read requests. Read requests do not cause bytes in a cache line to become dirty, because such reads do not change the values in the cache line. In other words, if the cache line contains the same data as the corresponding higher-level memory (such as main memory) before a read, then the cache line will also contain the same data as that higher-level memory after the read. Additionally, most write requests have contiguous addresses. Therefore, in general, in applications utilizing a GPU, adjacent write requests will eventually cover a whole cache line. Assuming data is not overwritten with identical data, if the whole cache line is written, then the contents of every byte are dirty. In other words, when every byte of the cache line no longer contains the same data as the corresponding higher-level memory location, there is no longer any need to track which bytes are dirty, because all of the bytes are dirty. According to the techniques of this disclosure, the per-line dirty masks can be replaced by a 1-bit dirty flag and a pointer to a separate byte dirty mask storage location, which may be part of a pool of dirty masks. The 1-bit dirty flag indicates whether the cache line is dirty, and if so, the separate byte dirty mask storage location identified by the pointer indicates which bytes of the cache line are dirty. A smaller number of dirty masks can be used because not every cache line needs an individual dirty mask. That is, some cache lines are not dirty, i.e., do not contain any dirty bytes.
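To make the pooled-mask organization concrete, the following C sketch shows one possible set of data structures, using the 128-line, 8-mask configuration of the Table 1 and Table 2 examples; all names and sizes here are illustrative assumptions, not the patented implementation:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  128                /* cache lines, per the Table 1 example   */
#define LINE_BYTES 1024               /* 1 KB of data per line                  */
#define NUM_MASKS  8                  /* pooled masks; addressable by 3 bits    */
#define MASK_WORDS (LINE_BYTES / 32)  /* one dirty bit per byte, packed in u32s */
#define NO_MASK    0xFF               /* sentinel: no pooled mask attached      */

/* Per-line state: one dirty flag and one fully-dirty flag for the whole line,
 * plus a small index into the shared mask pool instead of a 1024-bit mask. */
typedef struct {
    bool     valid;
    uint32_t tag;         /* address of the corresponding main-memory data */
    bool     dirty;       /* at least one byte differs from main memory    */
    bool     fully_dirty; /* every byte is dirty; no per-byte mask needed  */
    uint8_t  mask_idx;    /* dirty buffer index, or NO_MASK if none        */
    uint8_t  data[LINE_BYTES];
} cache_line_t;

/* The pooled byte dirty masks, shared by all cache lines. */
typedef struct {
    uint32_t bits[MASK_WORDS]; /* bit i set => byte i of the line is dirty */
    bool     in_use;
} dirty_mask_t;

static cache_line_t lines[NUM_LINES];
static dirty_mask_t mask_pool[NUM_MASKS];
```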
This disclosure proposes an architecture for a byte-writeable cache. The proposed architecture may have particular characteristics. As one example, instead of using a dedicated dirty mask for each cache line, the whole cache can share a pool of dirty masks. The shared pool of dirty masks can use less memory compared to a system that includes a byte dirty mask for each cache line. One aspect that can allow a system to operate with fewer dirty masks than cache lines is the frequency of read operations. Read operations do not change the cache memory and therefore do not cause a dirty mask from the pool to be used. Accordingly, the number of dirty masks can be significantly smaller than the number of cache lines because, as discussed above, some systems perform a large number of reads (which do not cause data to become dirty). Because much of the data may not be dirty at the same time as other cache lines, a pool of byte dirty masks can be used instead of a byte dirty mask for each cache line. Furthermore, in some systems, writes often go to the same cache lines, which means those lines often become "fully dirty." A fully dirty cache line does not use a byte dirty mask, because every byte is dirty.
As an illustration of the number of dirty masks being significantly smaller than the number of cache lines, a cache with 1000 cache lines might use only 20 dirty masks. If a write request does not write the whole cache line, so that only a subset of the bytes is dirty, then dirty mask space can be allocated to the particular cache line. The identification (ID) of the allocated dirty mask space can be attached to that cache line so that the mask can be accessed when the cache line is evicted. In general, the ID may be stored together with cache line flags such as the dirty flag and/or the fully dirty flag, where the dirty flag may indicate that at least one byte in the cache line is dirty and the fully dirty flag may indicate that every byte in the cache line is dirty. This may allow for convenient access. In other examples, however, the ID may be stored in another memory location.
A cache line may be reclaimed in the sense that information about the particular cache line is no longer stored in a byte dirty mask. This can occur, for example, when new data from main memory is written to the cache line so that it is no longer dirty, or when the cache line becomes fully dirty and there is no need to track which bytes are dirty because every byte is dirty. In some cases, if every byte dirty mask in the system is in use, it may be necessary to evict a cache line even though it is dirty but not fully dirty. The system can then write the cache line back to main memory 16 so that the cache line is no longer dirty.
Further, any write request that hits a cache line with an attached dirty mask (i.e., a cache line that is in use with respect to a specific dirty mask) should cause the corresponding dirty mask to be updated. When the dirty mask is updated, detection logic can detect whether the dirty mask is all 1s. Once the dirty mask is all 1s, indicating that the cache line is fully dirty, the mask is detached from the cache line by setting the ID of the dirty mask to invalid. This indicates that the cache line is "fully dirty." A cache line is fully dirty when the whole cache line needs to be written to the next-higher memory in the memory hierarchy. In other words, a cache line is fully dirty when the whole cache line has to be written for the next-higher memory in the memory hierarchy (such as main system memory or some other intermediate memory) to be up to date.
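Continuing the sketch above, the partial-write path just described might look like the following; `write_hit` and its helpers are hypothetical names, and pool exhaustion (which, as noted earlier, would force the eviction of a dirty-but-not-fully-dirty line) is reduced to an assertion for brevity:

```c
#include <assert.h>

/* Claim a free pooled mask, cleared to all zeros. */
static uint8_t alloc_mask(void) {
    for (uint8_t i = 0; i < NUM_MASKS; i++) {
        if (!mask_pool[i].in_use) {
            mask_pool[i].in_use = true;
            for (int w = 0; w < MASK_WORDS; w++)
                mask_pool[i].bits[w] = 0;
            return i;
        }
    }
    return NO_MASK; /* pool exhausted: a line must be evicted first */
}

/* Detection logic: Boolean AND across every dirty bit in the mask. */
static bool mask_all_ones(const dirty_mask_t *m) {
    uint32_t acc = 0xFFFFFFFFu;
    for (int w = 0; w < MASK_WORDS; w++)
        acc &= m->bits[w];
    return acc == 0xFFFFFFFFu;
}

/* Handle a write of `len` bytes at byte offset `off` within line `l`. */
static void write_hit(cache_line_t *l, const uint8_t *src,
                      uint32_t off, uint32_t len) {
    for (uint32_t i = 0; i < len; i++)
        l->data[off + i] = src[i];
    l->dirty = true;

    if (l->fully_dirty)
        return;                          /* every byte already tracked as dirty */

    if (off == 0 && len == LINE_BYTES) { /* full write: the 1-bit flag suffices */
        l->fully_dirty = true;
        if (l->mask_idx != NO_MASK) {    /* any attached mask is now redundant  */
            mask_pool[l->mask_idx].in_use = false;
            l->mask_idx = NO_MASK;
        }
        return;
    }

    if (l->mask_idx == NO_MASK) {        /* partial write: attach a pooled mask */
        l->mask_idx = alloc_mask();
        assert(l->mask_idx != NO_MASK);  /* see pool-exhaustion note above */
    }

    dirty_mask_t *m = &mask_pool[l->mask_idx];
    for (uint32_t i = off; i < off + len; i++)
        m->bits[i / 32] |= 1u << (i % 32);  /* mark each written byte dirty */

    if (mask_all_ones(m)) {              /* line just became fully dirty:  */
        l->fully_dirty = true;           /* detach the mask and return it  */
        m->in_use = false;               /* to the pool for reallocation   */
        l->mask_idx = NO_MASK;
    }
}
```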
Each cache line can use a 1-bit "fully dirty" flag to indicate whether the particular cache line is fully dirty. Accordingly, for a fully dirty cache line, its dirty mask becomes available for other requests. For example, the dirty mask of a fully dirty cache line can be reallocated to indicate the dirty bytes of another cache line that has not been fully written. The fully dirty state of the cache line can be indicated by the fully dirty flag mentioned above. Thus, a dirty mask is no longer needed for that cache line, because for a fully dirty cache line, every byte of that cache line is dirty. In this example, therefore, a mask for tracking which bytes are dirty is unnecessary.
For a comparison between other uses of byte dirty masks and an example of some of the techniques proposed in this disclosure, assume the cache memory has 128 cache lines, each with 1 KB of data. When the other technique of storing byte dirty masks is used, each cache line needs a 1024-bit byte dirty mask, which consumes 128K bits of storage, as shown in Table 1 above.
With the techniques proposed in this disclosure, in one example, each cache line can use a 1-bit "fully dirty" flag to indicate whether the particular cache line is fully dirty. If the particular cache line is not fully dirty, a 3-bit index (referred to as the 3-bit dirty buffer index in Table 2 below) indicates which buffer is storing the byte dirty mask (DirtyByteMask) for the particular cache line. Assuming most cache lines are either non-dirty or fully dirty, 8 buffers (the dirty buffer) can be large enough for a cache with 128 cache lines.
The dirty buffer can include 8 entries, each of which can store a 1K-bit mask, for an 8K-bit register in total. In one example, four extra bits in the tag of each cache line are used, as illustrated in Table 2. The total for 128 cache lines comes to 4*128 = 512 bits. Combined with the 8K-bit dirty buffer, the total number of bits for tracking data coherency (i.e., tracking which bytes are dirty and which bytes are not) is on the order of 8.5K bits, which is much smaller than the 128K bits needed to store the byte dirty masks in the other scheme mentioned above. The cache lines illustrated in Table 2 also include a tag, which indicates the address in main memory (or some higher-level memory) containing the data that corresponds to the data in the cache memory. For example, the tag can indicate the address in main memory at which the data stored in a particular cache line can also be read by the processor.
Valid (line is valid)   Tag (address)   Dirty   Fully dirty (1 bit)   Dirty buffer index (3 bits)
Valid 0                 Tag 0           Dirty   Fully dirty           Dirty buffer index 0 [2:0]
Valid 1                 Tag 1           Dirty   Fully dirty           Dirty buffer index 1 [2:0]
...                     ...             ...     ...                   ...
Valid 127               Tag 127         Dirty   Fully dirty           Dirty buffer index 127 [2:0]
Table 2: Tag architecture of the proposed method
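Tallying the storage cost of the proposed scheme against Table 1 (a worked check of the figures above):

$$\underbrace{128 \times 4}_{\text{per-line flags and index}} + \underbrace{8 \times 1024}_{\text{dirty buffer}} = 512 + 8192 = 8704 \text{ bits} \approx 8.5\,\text{Kbit},$$

compared with $128 \times 1024 = 131072$ bits $= 128\,\text{Kbit}$ for the per-line byte dirty masks of Table 1, roughly a 15x reduction.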
Fig. 1 is a block diagram illustrating an example processing system 10 that includes a processor 12, a cache memory 14, and a main memory 16. Processor 12 may be a microprocessor, a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), or another processor. Additionally, processor 12 may include digital logic, such as a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), or other digital logic implementing processing functions. Processor 12 may include one or more combinations of these. For example, processor 12 may include a GPU and other logic providing processing functions. In some examples, processor 12 may be one or more processors, such as multiple CPUs, multiple DSPs, a combination of one or more CPUs and DSPs, or another combination of CPUs, DSPs, GPUs, FPGAs, CPLDs, or other processing logic.
Processor 12 may use cache memory 14 for temporary data reads and writes to reduce the average memory access time otherwise needed to access main memory 16 (which may be main system memory). Cache memory 14 may be smaller than main memory 16, e.g., storing a relatively small amount of data. Cache memory 14 may also be faster memory than main memory 16. For example, a read from cache memory 14 may take fewer clock cycles than a read from main memory 16. In some examples, the cache memory may be on the same chip as processor 12, as indicated by dashed line 24. In other examples, the cache memory may be on a separate chip that may be adjacent to processor 12. In some examples, cache memory 14 may serve multiple processors. Cache memory 14 may also include multiple hierarchy levels, such as both level 1 and level 2 caches. In cache design, there is generally a tradeoff between latency and hit rate. Larger caches may have better hit rates but longer latency. To address this tradeoff, many systems use multiple cache levels, with smaller, faster caches backed by larger, slower caches. In general, a multi-level cache operates by first checking the smallest level 1 (L1) cache. If the level 1 cache hits, the processor proceeds. If the smaller cache misses, the next larger cache (L2) is generally checked. This process can continue with higher and higher cache levels until main memory 16 is checked.
Additionally, cache memory 14 may store copies of the data of frequently used memory locations of main memory 16. As long as most memory accesses hit cache memory 14, the average latency of memory accesses will be closer to the latency of cache memory 14 than to the latency of main memory 16. It will be understood that the higher the percentage of reads and writes served by cache memory 14 relative to the percentage served by main memory 16, the higher the memory access performance of a system using cache memory 14 and main memory 16 generally will be.
In general, main memory 16 may be random access memory (RAM) or another type of volatile computer memory. In some examples, main memory 16 may be a mix of both RAM and read-only memory (ROM). In some cases, main memory 16 may be, e.g., nonvolatile memory (e.g., ROM) if executable code needs to be stored but little or no data needs to be stored. Where data that cannot be further subdivided needs to be stored, a single memory, a register, cache memory 14, or another type of storage device may be used. In various examples, main memory 16 may be coupled to processor 12, e.g., via a system bus 26. In general, main memory 16 is considered to be at a higher level of the memory hierarchy than cache memory 14. In general, main memory 16 may be larger than cache memory 14, e.g., having relatively large data storage. Additionally, in general, main memory 16 will be slower than cache memory 14. For example, reading or writing data to main memory 16 may take longer than reading or writing data to cache memory 14.
In the illustrated example, processor 12 is coupled to cache memory 14 to allow processor 12 to read from and write to cache memory 14. Additionally, processor 12 is coupled to main memory 16 to allow processor 12 to read from and write to main memory 16. The techniques described herein can apply to memory configurations in which cache memory 14 and main memory 16 are coupled only via processor 12. Additionally, in the illustrated example, main memory 16 may be coupled to cache memory 14 to allow data to be transferred between main memory 16 and cache memory 14 without intervention by processor 12. For example, a data transfer controller 22 may control such data transfers. As illustrated, data transfer controller 22 may be external to main memory 16 and cache memory 14. In other examples, data transfer controller 22 may be part of main memory 16 or cache memory 14. Data transfer controller 22 may also include components within main memory 16 and cache memory 14, or components in main memory 16, in cache memory 14, and external to these devices. In some examples, cache controller 20 and data transfer controller 22 may be a single controller. It will be understood that other memory configurations are also possible. For example, a cache may be connected to main memory and not connected to the processor, or a cache may be connected to the processor and not connected to main memory. In another example, systems with multiple higher-level memories may be used in conjunction with the techniques described herein. For example, some systems may have a first-level cache on the same chip as processor 12 and a second-level cache (not shown) on a different chip from processor 12 and main memory 16. In general, any cache memory (e.g., the first-level cache, the optional second-level cache, etc.) will be a memory device separate from main memory 16 and generally will not be on the same chip as main memory 16.
In one example, cache memory 14 is a byte-writeable cache, and it includes various aspects for tracking which bytes, if any, processor 12 has written data to and which bytes have not been written. Cache memory 14 does not use a dedicated mask for each cache line. Rather, in one example, the whole cache shares a pool of dirty masks in dirty mask space 18 within cache memory 14. The pooled byte dirty masks or flags can be allocated to different lines of cache memory 14 as needed. In general, such a cache memory architecture can use fewer memory locations, less power, and a smaller area compared to cache systems in which each cache line includes a dirty mask. By sharing a pool of dirty masks, fewer total memory locations can be used.
As illustrated in Fig. 1, the pool of dirty masks in dirty mask space 18 may be stored in a portion of cache memory 14. Accordingly, the pool of dirty masks in dirty mask space 18 may be part of the same physical chip as cache memory 14. In other examples, however, the pool of dirty masks in dirty mask space 18 may be stored in a separate memory or on a separate physical chip from the cache memory. Cache memory 14 can be coupled to this separate memory location to allow the techniques of this application to be performed. As discussed above, the number of dirty masks can be smaller than the number of cache lines in cache memory 14. For example, a cache memory 14 with 1000 cache lines might use only 20 dirty masks.
The dirty masks in dirty mask space 18 indicate which bytes are "dirty." A byte is dirty when a value has been written to it after the last transfer to system memory. Generally, the written value will differ from the corresponding value in main memory 16. In other words, a byte is dirty when the value stored in the byte may no longer be the same as the value stored in the corresponding memory location in main memory 16. Conversely, a byte is not dirty when the value stored in the byte matches the value stored in the corresponding memory location in main memory 16, that is, when it is known that the byte has not been written since the last transfer to system memory. It will be understood that various example systems do not track whether a given byte in the cache memory is actually storing a value different from the corresponding value stored in main memory 16. Rather, if a write to that byte in cache memory occurs after the last transfer to system memory, it can be assumed that the value no longer matches the value stored in the corresponding memory location of main memory 16. In general, the dirty masks in dirty mask space 18 allow the system to distinguish between dirty and non-dirty bytes when some bytes of a cache line are dirty but not every byte of that cache line is dirty. If no byte in a particular cache line is dirty, that cache line does not need a dirty mask, at least not at that particular time. At some other time, one or more bytes of that cache line may be dirty, and a dirty mask in dirty mask space 18 can then be used at that time. On the other hand, when all of the bytes of a particular cache line are dirty, that cache line also does not need a dirty mask. Again, this may be the case at least at that particular time. Additionally, there is no need to track which particular bytes are dirty, because the fully dirty flag can indicate that every byte in the cache line is dirty. This aspect is described in detail with respect to the example of Figs. 4A and 4B, discussed below.
In some examples, cache memory 14 and dirty mask space 18 may be controlled by cache controller 20. Cache controller 20 may be configured to allocate one of the dirty masks to a corresponding cache line when a write to that cache line is not a full write to the cache line. In one example, each of the dirty masks indicates the dirty states of the data units in one of the cache lines. Additionally, cache controller 20 may store identification (ID) information that associates the dirty masks with the cache lines to which they are allocated.
Cache controller 20 may be a digital logic circuit, a processor, or other circuitry capable of implementing various aspects of this disclosure. Cache controller 20 may include some combination of hardware, software, or hardware and software. Additionally, although cache controller 20 is illustrated as internal to cache 14, in other examples all or part of cache controller 20 may be separate from cache memory 14. In other examples, processor 12 may be used to control the functionality of cache memory 14 to implement the control functions.
Devices, systems, and methods implementing aspects of this disclosure may also track whether a series of bytes is "fully dirty." Bytes in cache memory 14 are fully dirty when every individual byte in that location is dirty. If every individual byte in a cache location is dirty (the location is fully dirty), then there is no need to use a dirty mask in dirty mask space 18 to track which individual bytes are dirty for that location. This case is also described in more detail with respect to the example of Figs. 4A and 4B, discussed below.
The system illustrated in Fig. 1 may be configured to implement a memory system including cache memory 14, which includes a series of cache lines. Each cache line may include a fully dirty flag and a dirty buffer index. Processing system 10, which includes the cache memory system, may also include a dirty buffer that includes dirty mask space. Dirty mask space can be allocated to a particular cache line when a write to that cache line is not a full write to that particular cache line, with the identification (ID) of the allocated dirty mask space attached to the particular cache line so that the dirty mask space can be accessed.
Fig. 2 is a block diagram illustrating the example cache memory 14 of Fig. 1 that may implement the techniques of this disclosure. Cache memory 14 may include a number of cache lines 110, 112, 114, 116, 118, 120, 122 for storing data and various other information related to that data. For example, cache lines 110, 112, 114, 116, 118, 120, 122 may include a flag to indicate whether the data in the cache line is valid. Cache lines 110, 112, 114, 116, 118, 120, 122 may also include a tag, which indicates the address in main memory that corresponds to the data in the cache. It will be understood that a small number of cache lines 110, 112, 114, 116, 118, 120, 122 is shown for purposes of illustration, and typically there will be a very large number of cache lines. Each of cache lines 110, 112, 114, 116, 118, 120, 122 may include a dirty flag to indicate whether any byte of the data has been written such that it no longer matches the data in main memory 16.
Additionally, cache lines 110, 112, 114, 116, 118, 120, 122 may include a fully dirty flag to indicate whether every byte in one of cache lines 110, 112, 114, 116, 118, 120, 122 is dirty. This can occur when, for example, processor 12 has written to all of the bytes in a cache line and all of the bytes in the cache line contain values different from the data values in the corresponding memory locations in main memory 16.
When a cache line 110, 112, 114, 116, 118, 120, 122 is dirty but not fully dirty, the cache line 110, 112, 114, 116, 118, 120, 122 may also include a dirty buffer index, which acts as a pointer to the location of the dirty mask in dirty mask space 108. When a cache line is fully dirty, there is no need to use a series of flags or a mask to track which bytes are dirty. This is because all of the bytes are dirty. Accordingly, when a cache line 110, 112, 114, 116, 118, 120, 122 is fully dirty, any location in dirty mask space 108 corresponding to that cache line 110, 112, 114, 116, 118, 120, 122 can be allocated to another cache line 110, 112, 114, 116, 118, 120, 122. Thus, dirty mask space 108 can be dynamically allocated to cache lines 110, 112, 114, 116, 118, 120, 122 that are dirty but not fully dirty. Additionally, dirty mask space 108 can be dynamically deallocated from any cache line 110, 112, 114, 116, 118, 120, 122 that is fully dirty or not dirty. In other words, when the data in a particular cache line is fully dirty or not dirty, information about that cache line can be "reclaimed" from dirty mask space 108. After the cache line is reclaimed, the space that its information used can be reallocated for another cache line.
In still other cases, one or more of cache lines 110, 112, 114, 116, 118, 120, 122 may never be dynamically allocated space in mask space 108, e.g., if one or more of cache lines 110, 112, 114, 116, 118, 120, 122 is not dirty and never becomes dirty. Alternatively, one or more of cache lines 110, 112, 114, 116, 118, 120, 122 may be fully dirty without ever having been partially dirty. This may occur, for example, when a single write to a cache line writes the whole cache line at once (e.g., every byte in the cache line), so that the cache line goes from never dirty to fully dirty.
Cache memory 14 illustrated in Fig. 2 may be configured to implement a memory system. Cache memory 14 may include a series of cache lines. Each cache line may include a fully dirty flag and a dirty buffer index. Additionally, cache memory 14 may also include a dirty buffer that includes dirty mask space. In other examples, the dirty buffer may be external to cache memory 14. Dirty mask space can be allocated to a particular cache line when a write to that cache line is not a full write to that particular cache line, with the identification (ID) of the allocated dirty mask space attached to the particular cache line so that the dirty mask space can be accessed.
Fig. 3 is a block diagram illustrating an example cache line 110, which is one of cache lines 110, 112, 114, 116, 118, 120, 122 of Fig. 2 that may implement the techniques of this disclosure. Cache line 110 includes a valid flag 300 to indicate whether the data in the cache line is valid. Cache line 110 also includes a tag 302, which indicates the address in main memory 16 containing the data that corresponds to the data in cache memory 14 (that is, the data in main memory 16 that is also available to processor 12 in cache memory 14). Data 304 is also part of cache line 110 and may include, for example, three data bytes 306, 308, 310 of data 304. In other examples, more or fewer bytes are also possible.
In the illustrated example, dirty flag 312 may indicate whether any byte in the data has been written such that it may no longer match the data in main memory 16, e.g., whether processor 12 has written a new value to cache line 110. It will be understood that processor 12 could write a data value identical to the existing data value to a cache line, but in general the values will not be identical. Additionally, in general, systems implementing these techniques may not actually check whether a value has changed. Rather, states such as "dirty" or "fully dirty" can be assumed when various writes occur. When the values stored in one or more bytes of a cache line change and those values are no longer the same as the values stored in the corresponding locations in main memory 16, the changed bytes are "dirty." When some but not all of the bytes in a cache line are dirty, the system can use a mask to track which bytes are dirty, indicating the particular bytes that are dirty. The mask may be stored in one of the pooled memory locations in dirty mask space 18.
A fully dirty flag 314 may be included to indicate whether every byte in cache line 110 is dirty. In that case, if cache line 110 is fully dirty, there is no need to have a dirty buffer mask in dirty mask space 18. When cache line 110 is dirty but not fully dirty, cache line 110 may also include a dirty buffer index 316, which acts as a pointer to a location in dirty mask space 18.
As illustrated in Fig. 3, in some examples, dirty flag 312 and fully dirty flag 314 may be used to indicate the state of each cache line 110. In other examples, a "clean" flag indicating that the cache line is not dirty, together with a fully dirty flag, may be used. In general, a clean flag may be the inverse of dirty flag 312. Other examples may include a clean flag, a dirty flag, and a fully dirty flag.
It will be understood that when cache line 110 is fully dirty, there is no need to use a series of flags or a mask to track which bytes are dirty. This is because all of the bytes are dirty. When all of the bytes are dirty, a single "fully dirty" flag provides enough information to determine which bytes are dirty. Any mask stored in dirty mask space 18 indicating which bytes are dirty can then be used by another cache line. Accordingly, when cache line 110 is fully dirty, any location corresponding to cache line 110 stored in dirty mask space 18 can be allocated to another cache line to indicate its dirty bytes. Thus, dirty mask space 18 is dynamically allocated to cache lines 110, 112, 114, 116, 118, 120, 122 that are dirty but not fully dirty. Dirty mask space 18 can be dynamically deallocated from any cache line 110, 112, 114, 116, 118, 120, 122 that is fully dirty or not dirty.
For example, any write request that hits a cache line 110, 112, 114, 116, 118, 120, 122 with an attached dirty mask may cause the corresponding dirty mask to be updated. When the dirty mask is updated, detection logic can detect whether the dirty mask is all 1s, that is, whether there is a "1" for every byte in the cache line. Once the dirty mask is all 1s, the cache line is fully dirty. Accordingly, the attached dirty mask is detached from cache line 110 by setting fully dirty flag 314 to indicate that the cache line is "fully dirty." A cache line is fully dirty when every byte in the cache line has been written such that no byte contains the same data as the corresponding byte in main memory. At that point, the whole cache line needs to be written to the next-higher memory in the memory hierarchy, such as main memory 16. In other words, a cache line is fully dirty when the whole cache line has to be written for the next-higher memory in the memory hierarchy to be up to date. Each cache line can use a 1-bit "fully dirty" flag to indicate whether the particular cache line is fully dirty. Accordingly, for a fully dirty cache line, its dirty mask can be reallocated to another cache line that has not been fully written. The fully dirty state of the cache line can be indicated by the fully dirty flag. Thus, a dirty mask is no longer needed for that cache line, because for a fully dirty cache line, every byte in that cache line is dirty. A mask for tracking dirty bytes is unnecessary.
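The eviction path implied by the three states (clean, dirty with a mask, fully dirty) might look like the following, continuing the earlier C sketch; `mem_write` is an assumed interface to the next-higher memory level, and the address arithmetic is simplified:

```c
/* Assumed helper: writes `len` bytes to the next-higher memory level. */
extern void mem_write(uint32_t addr, const uint8_t *src, uint32_t len);

/* Evict a line: write back the whole line (fully dirty), only the masked
 * bytes (dirty with an attached mask), or nothing (clean). */
static void evict_line(cache_line_t *l) {
    uint32_t base = l->tag * LINE_BYTES;       /* simplified addressing */

    if (l->fully_dirty) {
        mem_write(base, l->data, LINE_BYTES);  /* every byte is stale below */
    } else if (l->dirty) {
        dirty_mask_t *m = &mask_pool[l->mask_idx];
        for (uint32_t i = 0; i < LINE_BYTES; i++)
            if (m->bits[i / 32] & (1u << (i % 32)))
                mem_write(base + i, &l->data[i], 1);  /* dirty bytes only */
        m->in_use = false;                     /* return mask to the pool */
        l->mask_idx = NO_MASK;
    }
    /* A clean line needs no write-back at all. */

    l->valid = false;
    l->dirty = false;
    l->fully_dirty = false;
}
```

In hardware, the mask would typically accompany the line data as byte write enables rather than driving per-byte transactions; the per-byte loop above is only for clarity.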
As illustrated in Fig. 3, a cache line can be configured to include fully dirty flag 314 and dirty buffer index 316. When a write to a particular cache line 110 is not a full write to that cache line, dirty mask space 18, which may be part of cache memory 14 or of a separate memory device, can be allocated to that cache line 110. The identification (ID) of the allocated dirty mask space is attached to the particular cache line 110 so that dirty mask space 18 can be accessed.
Figs. 4A and 4B are conceptual diagrams illustrating an example of data processing in cache memory 14 using the techniques of this disclosure. Fig. 4A illustrates the processing of values in a single cache line 110 at five different points in time. Time 0 indicates the initial state. Times 1 to 3 illustrate various data writes to cache line 110. Compared with, e.g., a read-allocate-write scheme, the proposed scheme can save system memory read bandwidth. Time 4 illustrates a read from address 723 and the eviction of cache line 777, where information about cache line 777 is no longer stored in a byte dirty mask because the cache line had become fully dirty and there was no need to track which bytes were dirty (because every byte was dirty). Fig. 4B illustrates the values at an individual address in dirty mask space 18 and at an individual address in main memory 16, and how these values change over the five points in time.
As discussed, five different times are illustrated. These five points in time are time 0 through time 4. Time 0 is the initial state. At time 0, data has not yet been written to cache line 110. In general, the values of the various flags and data registers at time 0 may not be particularly important.
Data can then be written to cache line 110 from main memory 16. In the illustrated example, when cache memory 14 receives a write request from processor 12, the data in main memory 16 is not known at that time and is not fetched. Rather, data can be written to cache memory 14 byte by byte as processor 12 needs. If the data is in fact never needed, it is not written to cache memory 14. This can save read bandwidth from main memory 16 to cache memory 14. This data can be stored byte by byte in cache line 110 as it is written at times 1, 2, and 3, and can be read or written by processor 12. At times 1, 2, and 3, data 304 is modified in different ways, as discussed in more detail below. At time 4, data 304 is written back from cache line 110 to main memory 16, so that main memory 16 and cache line 110 again store the same data. At that point, cache line 110 can be reclaimed, and data can be written to cache line 110 from address 723. The data can be written to cache line 110 because processor 12 needs that data.
As illustrated by the example of Fig. 4A, at time 0, any data in cache line 110 is generally a "don't care." The values may be some initial state or junk data from a previous write to cache line 110. A "don't care" is indicated by an "X." A capital "X" indicates a hexadecimal value (4 bits each), and a lowercase "x" indicates an individual bit. Thus, in this example, valid flag 300, dirty flag 312, and fully dirty flag 314 are individual bits. In another example, the values stored at time 0 may be known valid values that simply no longer need processing.
As illustrated by the example of Fig. 4A, tag 302 includes three hexadecimal values (12 bits in total). Tag 302 indicates the address in main memory 16 from which the data now stored in the cache line originated. Dirty buffer index 316 includes three binary bits. Dirty buffer index 316 acts as a pointer to the mask, which indicates which bytes are dirty.
Data 304 includes three bytes of data. Each data byte 306, 308, 310 is two hexadecimal digits. A byte is 8 bits, and each hexadecimal digit represents 4 bits. In the illustrated example, each address in dirty mask space 18 includes three bits. In the illustrated example, main memory 16 includes six hexadecimal digits (24 bits in total).
The example illustrated by Figs. 4A and 4B is not a read-allocate-write scheme. Accordingly, because processor 12 does not need the initial data values in the example of Figs. 4A and 4B, the figures do not illustrate data being written to cache memory 14 from main memory 16. In a read-allocate-write example, valid data would be written from main memory 16 to cache memory 14, and more particularly, to cache line 110 of cache memory 14. Accordingly, valid flag 300 is "1." Tag 302 indicates that the address in main memory 16 that the data is from is "777." In this example, address 777 of main memory 16 contains the data "AA AA AA."
At time 1 place, data 304 are modified.More particularly, data byte 308 is changed into " 00 " from its preceding value.The part of the process that can such as be performed as processor 12 by processor 12 or carry out this amendment by sending from primary memory 16 to the direct memory of cache memory 14.Therefore, data 304 are " dirty ", as indicated by the dirty flag 312 containing value " 1 ".Only one in three data bytes 306,308,310 is dirty; Specifically, data byte 308 is dirty.Cache line 110 is " entirely dirty " not, and complete dirty flag 314 is containing value " 0 ".Because cache line 110 " dirty " but not " entirely dirty " (second discussed state), dirty impact damper mask is therefore needed to be dirty to which byte determined byte and can write cache memory 14 above.For example, the position of obscene word joint is indicated can be stored in dirty mask space 18.Dirty buffer index 316 is containing value " 101 ".This value is the pointer to the address in dirty mask space 18.The address " 101 " in dirty mask space 18 is illustrated in Fig. 4 B, and at time 1 place, address " 101 " (5) in dirty mask space 18 are containing binary value " 010 ".This designation data byte 308 is dirty, is indicated by " 1 " in " 010 ".First " 0 " designation data byte 306 in " 010 " is not dirty, and second " 0 " designation data byte 310 in " 010 " is not dirty.Therefore, in the example illustrated by Fig. 4 B, each in dirty mask space 18, address " 101 " can in order to the dirty situation of the particular data byte 306,308,310 of tracking data 304.
As illustrated in fig. 4 a, at time 2 place, data 304 are modified.More particularly, data byte 306 is changed into " FF " from its preceding value.Data 304 are still " dirty ", as indicated by the dirty flag 312 containing value " 1 ".In three data bytes 306,308,310 only both be dirty, specifically, data byte 306 and 308 is dirty.Cache line 110 not " entirely dirty " and complete dirty flag 314 containing value " 0 ".Because cache line 110 " dirty " but not " entirely dirty " (being the second discussed state again), dirty impact damper mask is therefore needed to be dirty with which byte determined byte and can write cache memory 14 above.For example, the position of obscene word joint is indicated can be stored in dirty mask space 18.Dirty buffer index 316 is containing value " 101 ".This value is the pointer to the address in dirty mask space 18.The address " 101 " in dirty mask space 18 is illustrated in Fig. 4 B, and at time 2 place, the address " 101 " in dirty mask space 18 is containing binary value " 110 ".This designation data byte 306 and 308 is dirty, is indicated by " 1 " in " 110 "." 0 " designation data byte 310 in " 110 " is not dirty.In example illustrated by Fig. 4 B, each in dirty mask space 18, address " 101 " can in order to the dirty situation of the particular data byte 306,308,310 of tracking data 304.
As illustrated in FIG. 4A, at time 3, data 304 is modified once more. More particularly, data byte 310 is changed from its previous value to "88". Data 304 is "dirty," as indicated by dirty flag 312 containing the value "1". Now, however, all three of the data bytes 306, 308, 310 are dirty. Cache line 110 is "fully dirty" at time 3, and full dirty flag 314 therefore contains the value "1". Because cache line 110 is "fully dirty" (the third state discussed above), a dirty buffer mask is no longer needed to determine which bytes are dirty and can be written out of cache memory 14: every byte of cache line 110 is dirty. The value in dirty buffer index 316 at time 3 is a "don't care," i.e., "xxx". Likewise, the value at address 101 of dirty mask space 18 is a "don't care," i.e., "xxx". That mask space can be reallocated to another one of the cache lines 110, 112, 114, 116, 118, 120, 122 to track dirty byte status.
At time 4, as illustrated in FIG. 4A, data 304 is written back to address 777 of main memory 16. Accordingly, at time 4, the data at address 777 of main memory 16 is "FF 00 88". At time 4, cache line 110 is in the first state described above, "not dirty." In the not-dirty case, a dirty buffer mask stored in dirty mask space 18 is unnecessary because no bits are dirty. Dirty flag 312 contains the value "0" at time 4, and full dirty flag 314 also contains the value "0" at time 4. In other words, information about cache line 110 is no longer stored in a dirty byte mask, because the cache line had become fully dirty and there was no need to track which bytes were dirty (every byte was dirty). At time 4, cache line 110 can store information relating to another main memory 16 address (e.g., address 723).
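For purposes of illustration only, the write sequence of times 1 through 3 can be modeled in software. The following C sketch is not the disclosed hardware; every identifier in it (cache_line_t, mask_pool, write_byte, the eight-mask pool, the three-byte line) is a hypothetical choice made for this example. It reproduces the mask values "010" and "110" and the full-dirty transition traced above.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES 3    /* three data bytes per line, as in FIG. 4A */
    #define POOL_MASKS 8    /* hypothetical size of dirty mask space 18 */
    #define FULL_MASK  ((1u << LINE_BYTES) - 1u)
    #define NO_MASK    0xFF /* "don't care" index: no mask attached     */

    typedef struct {
        uint8_t  data[LINE_BYTES]; /* data bytes 306, 308, 310              */
        uint32_t tag;              /* tag 302: main-memory address, e.g. 777 */
        bool     valid;            /* valid flag 300                        */
        bool     dirty;            /* dirty flag 312                        */
        bool     full_dirty;       /* full dirty flag 314                   */
        uint8_t  mask_idx;         /* dirty buffer index 316 into mask_pool */
    } cache_line_t;

    static uint8_t mask_pool[POOL_MASKS]; /* models dirty mask space 18 */

    /* Write one byte into a line; the bit order is chosen so the mask
     * reads like FIG. 4B (byte 306 is the leftmost bit). */
    static void write_byte(cache_line_t *line, unsigned byte, uint8_t value)
    {
        line->data[byte] = value;
        line->dirty = true;
        if (line->full_dirty)
            return;                               /* no mask to maintain */
        mask_pool[line->mask_idx] |= 1u << (LINE_BYTES - 1u - byte);
        if (mask_pool[line->mask_idx] == FULL_MASK) {
            line->full_dirty = true;              /* third state reached */
            line->mask_idx   = NO_MASK;           /* detach: ID invalid  */
        }
    }

    int main(void)
    {
        cache_line_t line = { {0xAA, 0xAA, 0xAA}, 777, true, false, false, 5 };
        write_byte(&line, 1, 0x00); /* time 1: mask_pool[5] == 010        */
        write_byte(&line, 0, 0xFF); /* time 2: mask_pool[5] == 110        */
        write_byte(&line, 2, 0x88); /* time 3: fully dirty, mask detached */
        printf("full_dirty=%d data=%02X %02X %02X\n",
               line.full_dirty, line.data[0], line.data[1], line.data[2]);
        return 0;
    }

The write-back at time 4 would clear both flags so that the line can hold data for another address; that step is omitted from the sketch for brevity.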
In one example, as described above, if a write request does not write an entire cache line, dirty mask space may be allocated to that cache line. The allocated dirty mask space identification (ID) may be attached to the cache line so that the mask can be accessed when the cache line is evicted.
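A minimal allocation sketch in C, again purely illustrative and using hypothetical names (allocate_mask, mask_in_use), shows how the returned pool index can play the role of the ID attached to the cache line:

    #include <stdbool.h>
    #include <stdint.h>

    #define POOL_MASKS 8     /* hypothetical number of masks in the space */
    #define NO_MASK    0xFF  /* invalid ID: no mask attached              */

    static uint8_t mask_pool[POOL_MASKS];
    static bool    mask_in_use[POOL_MASKS];

    /* On a partial write to a line that has no mask yet: claim a free
     * mask and return its index. That index is the "ID" kept in the
     * line's dirty buffer index, so the mask can still be located when
     * the line is eventually evicted. */
    static uint8_t allocate_mask(void)
    {
        for (uint8_t i = 0; i < POOL_MASKS; i++) {
            if (!mask_in_use[i]) {
                mask_in_use[i] = true;
                mask_pool[i]   = 0;  /* no bytes marked dirty yet */
                return i;
            }
        }
        return NO_MASK;              /* pool exhausted; a mask must be
                                        reclaimed first (discussed below) */
    }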
In one example, a dirty mask may be used to track particular data bytes among multiple data bytes. A particular bit in the mask may be used to flag whether a data byte has been changed by a write, such that the byte no longer matches the value of the corresponding byte in, for example, main memory.
In one example, using bits in the mask to individually mark each of multiple data bytes allows tracking of which bytes have been updated by at least one write. In one example, a particular dirty bit may be used to indicate that a particular byte is dirty. Whether each of the multiple data bytes has been updated may be determined by checking each bit in the mask.
Using a dirty mask allows a determination of whether data elements have been updated. For example, particular logic circuitry may be provided to verify that every dirty bit in a dirty mask is marked dirty, for example by performing a Boolean "AND" operation over all the dirty bits. At that point, the group of data elements can be marked as a write-back candidate.
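As a sketch of that check (illustrative C with hypothetical names; real detection logic would be a gate tree performing the AND reduction rather than a software comparison):

    #include <stdbool.h>
    #include <stdint.h>

    /* A group of data elements becomes a write-back candidate once
     * every dirty bit in its mask is set; comparing against the
     * all-ones value is equivalent to ANDing all the mask bits. */
    static bool mask_all_dirty(uint8_t mask, unsigned bytes_per_line)
    {
        return mask == (uint8_t)((1u << bytes_per_line) - 1u);
    }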
Any write request that hits a cache line with an attached dirty mask should cause an update of the corresponding dirty mask. When the dirty mask is updated, detection logic can detect whether the dirty mask is all 1s. Once the dirty mask is all 1s, indicating that every byte in the particular cache line is dirty, the mask is detached from the cache line by setting the ID to an invalid value. This indicates that the cache line is "fully dirty." A cache line is fully dirty when the entire cache line needs to be written to the next level of the memory hierarchy; in other words, a cache line is fully dirty when the next level of the memory hierarchy cannot be brought up to date without writing the entire cache line. Each cache line can use a 1-bit "full" flag to indicate whether that particular cache line is fully dirty. Accordingly, the dirty mask of a fully dirty cache line can be reallocated to another cache line that has not been completely written. The fully dirty condition of a cache line can be indicated by the full dirty flag; the dirty mask is then no longer needed for that cache line.
FIG. 5 is a flowchart illustrating an example method according to the techniques of this disclosure. Cache memory 14 may track the state of the cache lines in the cache memory (400). For example, cache controller 20 may track the state of the cache lines in cache memory 14. This may be done, for example, using a dirty flag that indicates a write to a cache line was not a complete write to every byte in that particular cache line. Accordingly, when a write to a cache line occurs and the write is not a complete write to every byte in that particular cache line, some bytes in the cache line may differ from the corresponding memory locations in main memory 16, while other bytes may still be identical to the corresponding memory locations in main memory 16. It will be understood, however, that as subsequent writes to the same cache line occur, eventually the entire cache line may differ from the corresponding memory locations in main memory 16. As discussed above, a write occurring after the last memory transfer will generally cause the data stored in the cache to differ from the data stored in the corresponding memory location. It remains possible, however, for processor 12 to write an identical value to the cache memory, such that the data does not actually change after the write occurs. Systems generally do not examine the data to verify that it changed; rather, they simply assume a change based on the write and mark the line as dirty or fully dirty with the appropriate dirty flag. For example, a write to the entire cache line can occur, or multiple writes to a cache line can eventually change every byte in that cache line.
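The state tracking of block 400 can be summarized by the following illustrative C sketch; the enum and function names are hypothetical and merely restate the three states discussed above:

    #include <stdbool.h>

    /* The three line states of interest (hypothetical names). */
    typedef enum { LINE_CLEAN, LINE_PARTLY_DIRTY, LINE_FULL_DIRTY } line_state_t;

    /* Two 1-bit flags are enough to recover the state; only the middle
     * state requires a dirty mask to be attached. */
    static line_state_t classify(bool dirty_flag, bool full_dirty_flag)
    {
        if (!dirty_flag)
            return LINE_CLEAN;        /* line matches main memory        */
        if (full_dirty_flag)
            return LINE_FULL_DIRTY;   /* every byte must be written back */
        return LINE_PARTLY_DIRTY;     /* consult the attached dirty mask */
    }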
When a write to a particular cache line is not a complete write to that cache line, the cache memory may allocate dirty mask space 18 to the cache line (402). For example, cache controller 20 may perform the allocation. It will be understood that cache controller 20 may be part of cache memory 14 or separate from cache memory 14. Dirty masks in dirty mask space 18 may thus be allocated to cache lines on demand and deallocated from cache lines. Using a shared dirty mask space 18, rather than dedicated dirty mask space for every cache line, can require less memory.
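To make the savings concrete with purely hypothetical numbers: a 16 KB cache with 64-byte lines has 256 lines, so dedicating a byte mask to every line would cost 256 × 64 bits = 2 KB of mask storage. A shared pool of 16 masks costs 16 × 64 bits = 128 bytes, plus a 4-bit dirty buffer index and two 1-bit flags per line (256 × 6 bits = 192 bytes), about 320 bytes in total, roughly a sixth of the per-line cost. The trade-off is that at most 16 lines can be partially dirty at once before a mask must be reclaimed.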
The cache memory may track the identification (ID) of the allocated dirty mask space attached to the particular cache line, enabling access to that dirty mask space (404). For example, cache controller 20 may track the ID. The ID thus provides a link between a cache line and the allocated dirty mask space (e.g., a dirty mask), so that the dirty mask can be used to determine which bytes in the cache line are dirty and potentially do not contain the same data as the corresponding bytes in main memory 16.
In some examples, if all the dirty masks in dirty mask space 18 have been allocated to cache lines and an additional dirty mask is needed, a dirty mask may be deallocated, for example, before its corresponding cache line has become fully dirty (every byte differing from the corresponding byte in main memory 16). If this occurs, a dirty mask in dirty mask space 18 may be selected and deallocated from its particular cache line, and the data in that cache line may be written back to the corresponding memory in main memory 16 so that the cache line is no longer dirty.
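An illustrative C sketch of this early deallocation, under the same hypothetical naming as the earlier sketches (reclaim_mask is invented here, and write_back is a stub standing in for the real write-back path):

    #include <stdbool.h>
    #include <stdint.h>

    #define NO_MASK 0xFF

    typedef struct {
        bool    dirty;
        bool    full_dirty;
        uint8_t mask_idx;   /* dirty buffer index; NO_MASK if none */
    } cache_line_t;

    /* Stub: the real mechanism would copy the line's dirty bytes out
     * to main memory. */
    static void write_back(cache_line_t *line) { (void)line; }

    /* When every mask is in use and another partial write arrives,
     * free a mask early: write its owning line back so the line is
     * clean, then detach the mask for reuse. The victim choice here is
     * naive (first owner found); a real policy might prefer, e.g.,
     * least-recently-used. */
    static uint8_t reclaim_mask(cache_line_t *lines, unsigned nlines)
    {
        for (unsigned i = 0; i < nlines; i++) {
            if (lines[i].mask_idx != NO_MASK) {
                uint8_t idx = lines[i].mask_idx;
                write_back(&lines[i]);        /* line now matches memory */
                lines[i].dirty    = false;
                lines[i].mask_idx = NO_MASK;  /* detach: set ID invalid  */
                return idx;                   /* free for the new line   */
            }
        }
        return NO_MASK;                       /* no mask currently owned */
    }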
When a write request occurs to a cache line, in the series of cache lines, that has a corresponding dirty mask, some examples of the systems and methods described herein cause an update to that dirty mask in the dirty mask space. In addition, the cache may attach the ID of the allocated dirty mask space to the particular cache line so that the dirty mask space can be accessed when that cache line is evicted. In some examples, the cache may also indicate the particular cache line to which a dirty buffer mask is allocated. The cache may also indicate that at least one byte of the cache line differs from the corresponding byte in main memory.
The various examples described herein may indicate that cache memory 14 performs various actions. It will be understood that, in some examples, a processor, controller, or other logic circuitry internal to cache memory 14 may perform these actions. Other examples may include a processor, controller, or other logic circuitry that is not internal to cache memory 14 but that controls one or more of the functions described herein. Accordingly, the dynamically allocated dirty mask space functions may be performed internal to the cache memory, external to the cache memory, or in a combination thereof, and may be performed by hardware circuitry, software, or some combination of the two.
FIG. 6 is another flowchart illustrating an example method according to the techniques of this disclosure. An example system, device, or apparatus may write data to a cache memory comprising a series of cache lines (450). Each cache line may include a full dirty flag and a dirty buffer index. The full dirty flag indicates that a cache line is "fully dirty"; in other words, every byte in the cache line differs from the corresponding byte in main memory 16.
The dirty buffer index may comprise an address of, or index into, the dirty buffer. For example, the dirty buffer index may comprise at least one pointer to at least one location to which at least one of the dirty masks is allocated. In another example, the dirty buffer index comprises a pointer to the location to which one of multiple dirty masks is allocated. The dirty buffer indicates the bytes that differ from the corresponding bytes in main memory 16.
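Treating the dirty buffer index as a pointer can be sketched as follows (illustrative C, hypothetical names; NO_MASK stands for the invalid or "don't care" index):

    #include <stddef.h>
    #include <stdint.h>

    #define POOL_MASKS 8
    #define NO_MASK    0xFF

    static uint8_t mask_pool[POOL_MASKS]; /* stands in for the dirty buffer */

    /* The dirty buffer index behaves as a pointer: dereferencing it
     * yields the byte mask whose set bits name the bytes differing
     * from main memory. NULL is returned for the invalid index. */
    static uint8_t *mask_for(uint8_t dirty_buffer_index)
    {
        if (dirty_buffer_index == NO_MASK || dirty_buffer_index >= POOL_MASKS)
            return NULL;
        return &mask_pool[dirty_buffer_index];
    }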
An example system, device, or apparatus may allocate dirty mask space in the dirty buffer to a particular cache line when a write to that cache line is not a complete write to that cache line (452). For example, cache controller 20 in the example system, device, or apparatus may perform such allocation. In some examples, cache controller 20 may be part of cache memory 14; in other examples, it may be a separate device. In addition, the allocated dirty mask space identification (ID) is attached to the particular cache line, enabling access to the dirty mask space.
It will be appreciated that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, and may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently (e.g., through multi-threaded processing, interrupt processing, or multiple processors) rather than sequentially.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (52)

1. A cache memory system, comprising:
a cache memory comprising a plurality of cache lines;
a dirty buffer comprising a plurality of dirty masks; and
a cache controller configured to allocate one of the dirty masks to a corresponding one of the cache lines when a write to that cache line is not a complete write to that cache line, wherein each of the dirty masks indicates a dirty status of data units in one of the cache lines, and wherein the cache controller stores identification (ID) information associating a dirty mask with the cache line to which that dirty mask is allocated.
2. The cache memory system of claim 1, wherein the cache memory further comprises a plurality of cache lines, each of the cache lines comprising:
a bit storing a full dirty flag, the full dirty flag indicating when every byte in the cache line is dirty; and
bits storing a dirty buffer index for storing the ID information.
3. The cache memory system of claim 2, wherein a dirty flag for a particular cache line indicates that at least one data unit of the cache line has been written since the last data write from main memory.
4. The cache memory system of claim 2, wherein the dirty flag comprises a 1-bit flag.
5. The cache memory system of claim 2, wherein the full dirty flag for a particular cache line indicates that the cache line is fully dirty.
6. The cache memory system of claim 2, wherein the full dirty flag comprises a 1-bit flag.
7. The cache memory system of claim 2, wherein the ID information is stored in a dirty buffer index, the dirty buffer index providing an address of a dirty buffer mask, the address indicating the cache line to which the dirty buffer mask is allocated.
8. The cache memory system of claim 1, wherein the dirty buffer index comprises a pointer to a location to which one of the plurality of dirty masks is allocated.
9. The cache memory system of claim 1, wherein the cache controller is further configured to cause an update to the one of the dirty masks allocated to the corresponding cache line when a write request to the one of the cache lines occurs.
10. The cache memory system of claim 1, wherein the cache controller is configured to store the ID information for one of the cache lines in bits within the corresponding cache line.
11. The cache memory system of claim 1, wherein the dirty buffer comprises a portion of the cache memory.
12. The cache memory system of claim 1, wherein the dirty buffer comprises memory separate from the cache memory.
13. The cache memory system of claim 1, further comprising a processor coupled to the cache memory and a main memory coupled to the processor, the processor configured to read and write data to the cache memory and the main memory.
14. A method of operating a memory system, the method comprising:
writing data to a cache memory comprising a plurality of cache lines;
allocating one of a plurality of dirty masks to a cache line when a write to that cache line is not a complete write to that cache line, wherein the dirty mask indicates a dirty status of data units in the cache line; and
storing identification (ID) information associating the dirty mask with the cache line to which the dirty mask is allocated.
15. The method of claim 14, wherein each of the plurality of cache lines further comprises:
a bit storing a full dirty flag, the full dirty flag indicating when every byte in the cache line is dirty; and
bits storing a dirty buffer index for storing the ID information.
16. The method of claim 14, further comprising causing an update to the dirty mask when a write request occurs to a cache line, of the plurality of cache lines, that has a corresponding dirty mask.
17. The method of claim 14, further comprising storing the ID information for one of the cache lines in bits within the corresponding cache line.
18. The method of claim 14, further comprising indicating that at least one data unit of the cache line has been written since the last data write from main memory.
19. The method of claim 14, further comprising indicating that the cache line is fully dirty.
20. The method of claim 14, further comprising indicating to the cache line that the dirty mask has been allocated.
21. A cache memory system, comprising:
means for writing data to a cache memory comprising a plurality of cache lines;
means for allocating one of a plurality of dirty masks to a cache line when a write to that cache line is not a complete write to that cache line, wherein the dirty mask indicates a dirty status of data units in the cache line; and
means for storing identification (ID) information associating the dirty mask with the cache line to which the dirty mask is allocated.
22. The cache memory system of claim 21, wherein each of the cache lines further comprises:
means for storing a full dirty flag, the full dirty flag indicating when every byte in the cache line is dirty; and
means for storing a dirty buffer index for storing the ID information.
23. The cache memory system of claim 21, further comprising means for causing an update to the dirty mask when a write request occurs to a cache line, of the plurality of cache lines, that has a corresponding dirty mask.
24. The cache memory system of claim 21, further comprising means for storing the ID information for one of the cache lines in bits within the corresponding cache line.
25. The cache memory system of claim 21, further comprising means for indicating that at least one data unit of the cache line has been written since the last data write from main memory.
26. The cache memory system of claim 21, further comprising means for indicating that the cache line is fully dirty.
27. The cache memory system of claim 21, further comprising means for indicating to the particular cache line that the dirty buffer has been allocated.
28. A system, comprising:
a processor;
a main memory coupled to the processor; and
a cache memory coupled to the processor, the cache memory comprising:
a cache controller;
a plurality of cache lines; and
a dirty buffer comprising dirty masks, the cache controller allocating one of the dirty masks to a cache line when a write to that cache line is not a complete write to that cache line, wherein the dirty mask indicates a dirty status of data units in the cache line, and wherein the cache controller stores identification (ID) information associating the dirty mask with the cache line to which the dirty mask is allocated.
29. The system of claim 28, wherein each of the plurality of cache lines comprises:
a bit storing a full dirty flag, the full dirty flag indicating when every byte in the cache line is dirty; and
bits storing a dirty buffer index for storing the ID information.
30. The system of claim 29, wherein a dirty flag for a particular cache line indicates that at least one data unit of the cache line differs from the corresponding data unit in the main memory.
31. The system of claim 29, wherein the full dirty flag for a particular cache line indicates that the cache line is fully dirty.
32. The system of claim 29, wherein the ID information is stored in a dirty buffer index, the dirty buffer index providing an address of a dirty buffer mask, the address indicating the cache line to which the dirty buffer mask is allocated.
33. The system of claim 28, wherein the cache controller is further configured to cause an update to the dirty mask when a write request occurs to a cache line, of the plurality of cache lines, that has a corresponding dirty mask.
34. The system of claim 28, wherein the cache controller is configured to store the ID information for one of the cache lines in bits within the corresponding cache line.
35. The system of claim 28, wherein the dirty buffer comprises a portion of the cache memory.
36. The system of claim 28, wherein the dirty buffer comprises memory separate from the cache memory.
37. A method of operating a memory system, the method comprising:
tracking the state of a cache line in a cache memory using a dirty flag indicating that a write to the cache line was not a complete write;
allocating a dirty mask to the cache line when the write to the cache line is not a complete write to that particular cache line; and
attaching identification (ID) information tracking the allocated dirty mask to the particular cache line, enabling access to the dirty mask.
38. The method of claim 37, further comprising causing an update to the dirty mask when a write request occurs to a cache line, of a plurality of cache lines, that has a corresponding dirty mask.
39. The method of claim 37, further comprising storing the ID information for one of the cache lines in bits within the corresponding cache line.
40. The method of claim 37, further comprising indicating to the particular cache line that a dirty buffer mask has been allocated.
41. The method of claim 37, further comprising indicating that at least one data unit of the cache line differs from the corresponding data unit in main memory.
42. A cache memory system, comprising:
means for tracking the state of a cache line in a cache memory using a dirty flag indicating that a write to the cache line was not a complete write;
means for allocating a dirty mask to the cache line when the write to the cache line is not a complete write to that particular cache line; and
means for attaching identification (ID) information tracking the allocated dirty mask to the particular cache line to enable access to the dirty mask.
43. The cache memory system of claim 42, further comprising means for causing an update to the dirty mask when a write request occurs to a cache line, of a plurality of cache lines, that has a corresponding dirty mask.
44. The cache memory system of claim 42, further comprising means for storing the ID information for one of the cache lines in bits within the corresponding cache line.
45. The cache memory system of claim 42, further comprising means for indicating to the particular cache line that a dirty buffer mask has been allocated.
46. The cache memory system of claim 42, further comprising means for indicating that at least one data unit of the cache line differs from the corresponding data unit in main memory.
47. A non-transitory computer-readable medium comprising instructions that, when executed, cause a programmable processor to:
track the state of a cache line in a cache memory using a dirty flag indicating that a write to the cache line was not a complete write;
allocate a dirty mask to the cache line when the write to the cache line is not a complete write to that particular cache line; and
attach identification (ID) information tracking the allocated dirty mask to the particular cache line, enabling access to the dirty mask.
48. The non-transitory computer-readable medium of claim 47, wherein the instructions further cause the processor to update the dirty mask when a write request occurs to a cache line, of a plurality of cache lines, that has a corresponding one of the dirty masks.
49. The non-transitory computer-readable medium of claim 47, wherein the instructions further cause the processor to store the ID information for one of the cache lines in bits within the corresponding cache line.
50. The non-transitory computer-readable medium of claim 47, wherein the instructions further cause the processor to indicate to the particular cache line that a dirty buffer mask has been allocated.
51. The non-transitory computer-readable medium of claim 47, wherein the processor comprises a cache controller coupled to a cache memory and configured to control the cache memory.
52. The non-transitory computer-readable medium of claim 51, wherein the cache controller comprises a programmable logic device, and the computer-readable medium comprises memory storing a bitstream configured to program the programmable logic device.
CN201380061576.2A 2012-11-28 2013-10-28 Memory management using dynamically allocated dirty mask space Active CN104813293B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/687,761 US9342461B2 (en) 2012-11-28 2012-11-28 Cache memory system and method using dynamically allocated dirty mask space
US13/687,761 2012-11-28
PCT/US2013/067111 WO2014085002A1 (en) 2012-11-28 2013-10-28 Memory management using dynamically allocated dirty mask space

Publications (2)

Publication Number Publication Date
CN104813293A true CN104813293A (en) 2015-07-29
CN104813293B CN104813293B (en) 2017-10-31

Family

ID=49551817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380061576.2A Active CN104813293B (en) Memory management using dynamically allocated dirty mask space

Country Status (6)

Country Link
US (1) US9342461B2 (en)
EP (1) EP2926257B1 (en)
JP (1) JP6009688B2 (en)
KR (1) KR101662969B1 (en)
CN (1) CN104813293B (en)
WO (1) WO2014085002A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835532A (en) * 2021-02-25 2021-05-25 上海壁仞智能科技有限公司 Method for cache control and computing device
CN112860182A (en) * 2019-11-27 2021-05-28 美光科技公司 Bit mask valid sectors for write-back merge

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160026579A1 (en) * 2014-07-22 2016-01-28 Lsi Corporation Storage Controller and Method for Managing Metadata Operations in a Cache
KR102362239B1 (en) 2015-12-30 2022-02-14 삼성전자주식회사 Memory system including dram cache and cache management method thereof
US10585798B2 (en) * 2017-11-27 2020-03-10 Intel Corporation Tracking cache line consumption
US10705590B2 (en) 2017-11-28 2020-07-07 Google Llc Power-conserving cache memory usage
KR20220030440A (en) * 2020-08-31 2022-03-11 삼성전자주식회사 Electronic device, system-on-chip, and operating method
JP7350699B2 (en) * 2020-09-11 2023-09-26 株式会社東芝 write-back cache device
US11681631B2 (en) * 2021-06-25 2023-06-20 Microsoft Technology Licensing, Llc Write-behind optimization of covering cache

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061452A1 (en) * 2001-09-27 2003-03-27 Kabushiki Kaisha Toshiba Processor and method of arithmetic processing thereof
CN1612113A (en) * 2003-10-14 2005-05-04 国际商业机器公司 Energy saving cache and its operation method
CN1797326A (en) * 2004-12-21 2006-07-05 三菱电机株式会社 Control circuit and its control method
CN102160040A (en) * 2008-09-17 2011-08-17 松下电器产业株式会社 Cache memory, memory system, data copying method and data rewriting method
US20120246410A1 (en) * 2011-03-24 2012-09-27 Kabushiki Kaisha Toshiba Cache memory and cache system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155824A (en) * 1989-05-15 1992-10-13 Motorola, Inc. System for transferring selected data words between main memory and cache with multiple data words and multiple dirty bits for each address
US5802572A (en) * 1996-03-15 1998-09-01 International Business Machines Corporation Write-back cache having sub-line size coherency granularity and method for maintaining coherency within a write-back cache
JP3204295B2 (en) * 1997-03-31 2001-09-04 日本電気株式会社 Cache memory system
US6205521B1 (en) 1997-11-03 2001-03-20 Compaq Computer Corporation Inclusion map for accelerated cache flush
US7203798B2 (en) 2003-03-20 2007-04-10 Matsushita Electric Industrial Co., Ltd. Data memory cache unit and data memory cache system
JP4009306B2 (en) 2003-11-18 2007-11-14 松下電器産業株式会社 Cache memory and control method thereof
TW200534096A (en) 2003-12-22 2005-10-16 Matsushita Electric Ind Co Ltd Cache memory and its controlling method
US20060143397A1 (en) 2004-12-29 2006-06-29 O'bleness R F Dirty line hint array for cache flushing
US7380070B2 (en) 2005-02-17 2008-05-27 Texas Instruments Incorporated Organization of dirty bits for a write-back cache
US20060274070A1 (en) 2005-04-19 2006-12-07 Herman Daniel L Techniques and workflows for computer graphics animation system
US8180968B2 (en) * 2007-03-28 2012-05-15 Oracle America, Inc. Reduction of cache flush time using a dirty line limiter
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
JP2011248389A (en) * 2008-09-09 2011-12-08 Panasonic Corp Cache memory and cache memory system
US8732409B2 (en) 2008-11-17 2014-05-20 Entropic Communications, Inc. Cache management policy and corresponding device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061452A1 (en) * 2001-09-27 2003-03-27 Kabushiki Kaisha Toshiba Processor and method of arithmetic processing thereof
CN1612113A (en) * 2003-10-14 2005-05-04 国际商业机器公司 Energy saving cache and its operation method
CN1797326A (en) * 2004-12-21 2006-07-05 三菱电机株式会社 Control circuit and its control method
CN102160040A (en) * 2008-09-17 2011-08-17 松下电器产业株式会社 Cache memory, memory system, data copying method and data rewriting method
US20120246410A1 (en) * 2011-03-24 2012-09-27 Kabushiki Kaisha Toshiba Cache memory and cache system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860182A (en) * 2019-11-27 2021-05-28 美光科技公司 Bit mask valid sectors for write-back merge
CN112860182B (en) * 2019-11-27 2024-05-10 美光科技公司 Bit-masked valid sector for write-back combining
CN112835532A (en) * 2021-02-25 2021-05-25 上海壁仞智能科技有限公司 Method for cache control and computing device

Also Published As

Publication number Publication date
WO2014085002A1 (en) 2014-06-05
JP2015535631A (en) 2015-12-14
US20140149685A1 (en) 2014-05-29
US9342461B2 (en) 2016-05-17
JP6009688B2 (en) 2016-10-19
EP2926257A1 (en) 2015-10-07
KR101662969B1 (en) 2016-10-05
CN104813293B (en) 2017-10-31
KR20150091101A (en) 2015-08-07
EP2926257B1 (en) 2019-06-26

Similar Documents

Publication Publication Date Title
CN104813293A (en) Memory management using dynamically allocated dirty mask space
CN103383672B (en) High-speed cache control is to reduce transaction rollback
CN102460400B (en) Hypervisor-based management of local and remote virtual memory pages
US8745334B2 (en) Sectored cache replacement algorithm for reducing memory writebacks
US8417882B2 (en) Storage device and deduplication method
US8402205B2 (en) Multi-tiered metadata scheme for a data storage array
US20150347310A1 (en) Storage Controller and Method for Managing Metadata in a Cache Store
US9990289B2 (en) System and method for repurposing dead cache blocks
US8966170B2 (en) Elastic cache of redundant cache data
US5787478A (en) Method and system for implementing a cache coherency mechanism for utilization within a non-inclusive cache memory hierarchy
CN109815163A (en) The system and method for efficient cache row processing based on prediction
US20080168236A1 (en) Performance of a cache by detecting cache lines that have been reused
US8145870B2 (en) System, method and computer program product for application-level cache-mapping awareness and reallocation
CN104166634A (en) Management method of mapping table caches in solid-state disk system
CN1940892A (en) Circuit arrangement, data processing system and method of cache eviction
US11442867B2 (en) Using a second content-addressable memory to manage memory burst accesses in memory sub-systems
CN103729306A (en) Multi CPU invalidate operation bypass through address range check
US7721047B2 (en) System, method and computer program product for application-level cache-mapping awareness and reallocation requests
CN104461932A (en) Directory cache management method for big data application
US20140143498A1 (en) Methods and apparatus for filtering stack data within a cache memory hierarchy
US20050055513A1 (en) Implementation of a pseudo-LRU algorithm in a partitioned cache
CN109478163B (en) System and method for identifying a pending memory access request at a cache entry
CN116627890A (en) Directory and cache fusion device with asymmetric tag and data and application method
CN111309645A (en) Novel hybrid memory garbage collection method and system based on nonvolatile memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant