GB2533768A - Cleaning a write-back cache - Google Patents

Cleaning a write-back cache

Info

Publication number
GB2533768A
GB2533768A
Authority
GB
United Kingdom
Prior art keywords
target
cache line
clean
load
instruction
Prior art date
Legal status
Granted
Application number
GB1422789.6A
Other versions
GB2533768B (en)
Inventor
Andreas Due Engh-Halstvedt
Jørn Nystad
Current Assignee
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date
Application filed by ARM Ltd, Advanced Risc Machines Ltd filed Critical ARM Ltd
Priority to GB1422789.6A (granted as GB2533768B)
Priority to US14/957,117 (published as US20160179676A1)
Publication of GB2533768A
Application granted
Publication of GB2533768B
Status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/0804 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with main memory updating
    • G06F 12/126 — Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • G06F 12/0895 — Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G06F 12/12 — Replacement control
    • G06F 9/30 — Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30043 — LOAD or STORE instructions; Clear instruction

Abstract

An apparatus for processing data incorporates a write-back cache 6 with a plurality of cache lines 26, processing circuitry 4 to process program instructions, and an instruction decoder 16 to decode a load-and-clean instruction to generate control signals which control the processing circuitry to load data from a target portion of a target cache line and control the write-back cache 6 to mark as clean at least that portion of the cache line. A cache line may have many portions, each portion having a dirty flag indicating whether it contains data that has not yet been written back to a memory. Individual cache line portions, or portions within a memory address range, may have their indicators set from dirty to clean by the load-and-clean instruction. Eviction control circuitry may be responsive to the execution of load-and-clean instructions. A method for compiling a source program is also included, in which loading a target data value that is a last use of the data generates a load-and-clean instruction, whereas if the data value is not the last use a load instruction is generated.

Description

CLEANING A WRITE-BACK CACHE
This invention relates to the field of data processing systems.
It is known to provide data processing systems with cache memories in order to provide lower latency access to frequently used or critical data or instructions. One known type of cache memory is a write-back cache memory. Data may be written to a write-back cache memory without other versions of that data, such as held in the main memory, being updated until the data which has been written (the dirty data) is evicted from the write-back cache.
In accordance with at least some example embodiments of the disclosure, there is provided apparatus for processing data comprising: a write-back cache having a plurality of cache lines; processing circuitry to perform processing operations specified by program instructions; and an instruction decoder to decode a load-and-clean instruction to generate control signals: to control said processing circuitry to load data from a target portion of a target cache line; and to control said write-back cache to mark as clean at least said target portion of said target cache line.
In accordance with at least some embodiments of the disclosure there is provided apparatus for processing data comprising: write-back cache means for storing data, said write-back cache means having a plurality of cache lines; processing means for performing processing operations specified by program instructions; and instruction decoding means for decoding a load-and-clean instruction to generate control signals: to control said processing means to load data from a target portion of a target cache line; and to control said write-back cache means to mark as clean at least said target portion of said target cache line.
In accordance with at least some embodiments of the disclosure there is provided a method of processing data comprising: storing data within a write-back cache having a plurality of cache lines; performing processing operations specified by program instructions; and decoding a load-and-clean instruction to generate control signals: to control loading data from a target portion of a target cache line; and to control marking as clean at least said target portion of said target cache line.
In accordance with at least some embodiments of the disclosure there is provided a method of compiling a source program to generate an object program comprising: identifying a last use within said source program of a data value stored at a memory address; if said source program specifies loading a target data value that is a last use of said target data value, then generating a corresponding load-and-clean instruction within said object program; and if said source program specifies loading a target data value that is not a last use of said target data value, then generating a corresponding load instruction within said object program.
Example embodiments will now be described, by way of example only, with reference to the accompanying drawings in which: Figure 1 schematically illustrates a data processing system including a write-back cache; Figure 2 is a flow diagram schematically illustrating the processing of a load instruction using a write-back cache; Figure 3 schematically illustrates the operation of a load-and-clean instruction upon a cache line in accordance with a first example embodiment; Figure 4 schematically illustrates the operation of a load-and-clean instruction upon a cache line in accordance with a second example embodiment; Figures 5 and 6 schematically illustrate the operation of a load-and-clean instruction from different portion within a cache line in accordance with a third example embodiment; Figure 7 is a flow diagram schematically illustrating a first example of eviction control; Figure 8 is a flow diagram schematically illustrating a second example of eviction control; and Figures 9 and 10 are flow diagrams schematically illustrating compiling a source program to utilize load-and-clean instructions.
It is possible that a programmer or compiler may identify that, when a load of a data value from a memory address is performed, there will be no subsequent use of that data value in the program concerned. Examples of such situations include stack memories, where data is spilled to the stack memory upon a context change and is then popped from the stack memory when the original context is resumed. In this case, the stack memory serves as temporary storage and, once the data has been recovered, the data values stored within the memory address space which provided the temporary storage are no longer required. Another example would be use of a FIFO, circular buffer or other temporary buffer.
When a programmer or compiler has identified that a load will be the last one to be performed upon a data value at a given memory address location, then a load-and-clean instruction may be used for that final load operation in place of, for example, a standard load instruction. The load-and-clean instruction controls a write-back cache, which may be storing the data value to be loaded, to mark that data value as clean after the load-and-clean instruction has been executed, such that it no longer needs to be written back to the backing memory system. This saves memory bandwidth in writing back dirty data values which are no longer required and will not be used again. Marking the data as clean does not require the dirty data to be written out to the main memory as would be conventional with a full clean operation (write back and mark as clean).
It is possible that the write-back cache for which the load-and-clean instruction suppresses unnecessary write-back of dirty values may use cache lines which comprise a plurality of portions each having a dirty flag indicative of whether a respective portion has been written with data that has not yet been written back to a memory. As an example, per-byte dirty flags may be provided within each cache line.
In one example embodiment the write-back cache may respond to a load-and-clean instruction to change a dirty flag for at least the target portion of the cache line from which a load is being performed, such that if the dirty flag for that target portion is set to "dirty", then it is changed to "clean". It will be appreciated that the load-and-clean instruction may change a dirty flag for a target portion to clean or, if the flag for the target portion already indicates that it is clean, leave it unchanged. It is possible that the target portion may have been written while it was stored within the write-back cache, but has already been subject to a clean operation, such as by virtue of eviction from and then reloading into the write-back cache.
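The per-portion dirty-flag behaviour described above can be sketched in a few lines of Python. This is not part of the patent; class and method names (`CacheLine`, `load_and_clean`) and the 16-byte line size are illustrative assumptions:

```python
# A minimal Python sketch of the per-portion dirty-flag behaviour described
# above; class and method names are illustrative and not from the patent.
LINE_SIZE = 16

class CacheLine:
    def __init__(self, data):
        self.data = bytearray(data)
        self.dirty = [False] * LINE_SIZE  # one dirty flag per byte
        self.valid = True                 # per-line valid flag

    def store(self, offset, value_bytes):
        # A store marks the written bytes dirty.
        for i, b in enumerate(value_bytes):
            self.data[offset + i] = b
            self.dirty[offset + i] = True

    def load_and_clean(self, offset, size):
        # Load the target portion and mark just those bytes clean, without
        # performing any write-back to memory.
        value = bytes(self.data[offset:offset + size])
        for i in range(offset, offset + size):
            self.dirty[i] = False
        return value

line = CacheLine(bytes(range(LINE_SIZE)))
line.store(4, b"\xEE\xFF")         # bytes 4-5 become dirty
value = line.load_and_clean(4, 2)  # last use: load and mark clean
assert value == b"\xEE\xFF"
assert not any(line.dirty)         # no write-back was ever needed
```

The key point of the sketch is that clearing the flags is the whole clean operation: no data movement to the backing memory occurs.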
In some embodiments the load-and-clean instruction may change a dirty flag for the target portion which indicates that the target portion is dirty to indicate that the target portion is clean whilst leaving unchanged any dirty flags for other portions of the target cache line. This type of behavior is suited to embodiments in which the data being cached may correspond to a general purpose buffer in which there is no particular pattern to the accesses to different portions of a cache line.
In other example embodiments, the data structure stored within the write-back cache may be one with a particular access pattern, such as a stack memory. In this case, it may be known that if a load-and-clean instruction is executed for a target portion within a cache line, then any other portions of that cache line within the region extending from the target portion to a predetermined end of the cache line (i.e. extending in a predetermined memory-address-order from the target portion) will also not be needed again and so can be marked as clean (any dirty flags set to clean, but without a write-back needing to be performed). The portions of the cache line extending in the opposite direction to the predetermined memory-address-order can have their dirty flags left unchanged.
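The stack-oriented variant can be sketched as follows (again not part of the patent; the function name, the boolean direction parameter and the 8-portion line are assumptions):

```python
# A sketch, under assumed names, of the stack-oriented variant: cleaning the
# target portion plus every portion from the target to a predetermined end of
# the line, while flags in the opposite direction are left unchanged.
def load_and_clean_range(dirty, start, size, toward_low_end=True):
    # Clean the target portion itself.
    for i in range(start, start + size):
        dirty[i] = False
    # Clean every further portion from the target to the predetermined end.
    extra = range(0, start) if toward_low_end else range(start + size, len(dirty))
    for i in extra:
        dirty[i] = False
    return dirty

dirty = [True] * 8
load_and_clean_range(dirty, 4, 2, toward_low_end=False)
# Portions below the target keep their dirty flags; the rest are now clean.
assert dirty == [True, True, True, True, False, False, False, False]
```

The direction parameter here stands in for the predetermined memory-address-order, which in a stack context would match the stack-growth direction.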
In other example embodiments, it may be that extending the marking of portions of a cache line as clean when these do not encompass the entire cache line is of reduced benefit and accordingly operation may be simplified when, if the target portion is at a predetermined end of the target cache line, then any dirty flags for all portions of that target cache line are changed as necessary to clean, whereas, if the target portion is not at the predetermined end of the target cache line, then the dirty flags for portions other than the target portion are left unchanged. Such an embodiment still uses individual dirty flags for the different portions of a cache line.
In other embodiments, a plurality of portions of the target cache line may share a dirty flag and in some example embodiments a single dirty flag may be provided for a whole cache line. With such embodiments, the write-back cache may respond to a load-and-clean instruction if the target portion is at a predetermined end of the target cache line to change the dirty flag for the target cache line to indicate that the target cache line is clean and to suppress such action if the target portion is not at the predetermined end of the target cache line.
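The whole-line-flag behaviour just described can be sketched like this (not patent text; the dictionary representation and `predetermined_end` parameter are illustrative):

```python
# A sketch of the single-flag-per-line variant (assumed names): the clean
# action only takes effect when the target portion is at the predetermined
# end of the line; otherwise it is suppressed.
def load_and_clean_whole_line(line, offset, size, predetermined_end="high"):
    value = line["data"][offset:offset + size]
    if predetermined_end == "high":
        at_end = (offset + size == len(line["data"]))
    else:
        at_end = (offset == 0)
    if at_end:
        line["dirty"] = False  # whole line marked clean, no write-back
    return value

line = {"data": bytearray(range(8)), "dirty": True}
load_and_clean_whole_line(line, 2, 2, "high")
assert line["dirty"] is True    # not at the predetermined end: unchanged
load_and_clean_whole_line(line, 6, 2, "high")
assert line["dirty"] is False   # at the predetermined end: line now clean
```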
A feature of at least some example embodiments of the disclosure is that if a target portion for a load-and-clean instruction is marked as clean to avoid any subsequent unnecessary write-back, then the cache line containing that target portion remains in the write-back cache and so is available for further access operations, e.g. access operations to different portions of that cache line which are still required and still valid.
The write-back cache comprises cache line eviction circuitry which controls eviction of cache lines from the write-back cache, typically in accordance with one of many known eviction policies. This cache line eviction circuitry may also be responsive to execution of load-and-clean program instructions.
The manner in which the cache line eviction circuitry is responsive to execution of a load-and-clean program instruction can vary. In some example embodiments, when the execution of a load-and-clean program instruction results in all portions of the target cache line concerned being marked as clean, this will serve to control the cache line eviction circuitry to promote that target cache line within an order for eviction, e.g. to make it the next eviction candidate. In other embodiments, the cache line eviction circuitry may respond to execution of a load-and-clean program instruction, as distinct from other forms of memory access instruction, by suppressing updating of least-recently-used data associated with that target cache line, such that the load-and-clean program instruction will not have an influence upon how the cache line is treated for eviction based upon its least-recently-used data.
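Both eviction responses can be sketched together using an ordered map as a stand-in for the LRU structure (editorial illustration only; all names are assumptions):

```python
# A sketch of the two eviction responses just described, using an ordered map
# as the LRU structure (least-recently-used entry first); all names assumed.
from collections import OrderedDict

class EvictionControl:
    def __init__(self):
        self.lru = OrderedDict()  # line_id -> line state

    def on_access(self, line_id, is_load_and_clean, fully_clean=False):
        if not is_load_and_clean:
            self.lru.move_to_end(line_id)              # normal LRU refresh
        elif fully_clean:
            self.lru.move_to_end(line_id, last=False)  # promote for eviction
        # otherwise: suppress the LRU update entirely

ctl = EvictionControl()
for lid in "abc":
    ctl.lru[lid] = None
ctl.on_access("b", is_load_and_clean=True, fully_clean=True)
assert next(iter(ctl.lru)) == "b"   # 'b' is now the next eviction candidate
ctl.on_access("c", is_load_and_clean=False)
assert list(ctl.lru)[-1] == "c"     # a normal access refreshes recency
```

A real implementation would of course be hardware state machines rather than an ordered dictionary; the sketch only captures the ordering policy.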
In some embodiments, in addition to having one or more dirty flags per cache line, a cache line may also include a valid flag indicative of whether that cache line contains any valid data.
While it will be appreciated that the techniques of the disclosure could be used in a wide variety of different forms of processing systems, they may find good utility in the context of processing circuitry which uses a FIFO and/or circular buffer for memory accesses, and/or within a system using a graphics processing unit (which may typically have a predictable pattern of use of data held within a write-back cache, such that the last use of that data can be identified and a load-and-clean instruction employed to suppress wasteful unnecessary write-back operations).
Another form of the present technique is the provision of a compiler to identify places within a program in which load-and-clean program instructions can usefully be employed. Compilers typically already track data value usage within program code for reasons other than those associated with write-back from cache memories. Given that a compiler can relatively readily identify the last use of a data value within a program, the compiler, when generating a load instruction, can determine whether that load instruction is the last use of the data value concerned and accordingly generate a load-and-clean instruction, while otherwise generating a "normal" load instruction if the load is not the last use of the data value concerned.
Figure 1 schematically illustrates a data processing system 2 including a processor 4, in the form of a graphics processing unit, a write-back data cache 6, an instruction cache 8 and a main memory 10. The main memory 10 in this example stores a stack memory region 12 having a stack pointer SP indicating the memory address at which data values are to be added to the stack memory region or removed from the stack memory region. There is a stack pointer growth direction associated with the stack memory region 12 and this may be either ascending or descending depending upon the system concerned and/or the configuration of that system.
The processor 4 fetches instructions from the instruction cache 8 to an instruction pipeline 14. When the instructions reach a decode stage, then they are decoded by an instruction decoder 16 which generates control signals which control processing performed by a variety of processing pipelines 18, 20, 22 that include a load/store pipeline 24. The load/store pipeline 24 is responsible for handling memory access instructions including both load-and-clean instructions and standard load instructions.
The write-back data cache 6 includes a plurality of cache lines 26 which each store a plurality of portions of data, e.g. the write-back data cache 6 may support access granularity down to byte accesses and include per-byte dirty bits as well as a per-line valid bit. Eviction circuitry 28 within the write-back data cache serves to control cache line eviction using one of a variety of different eviction algorithms, such as least-recently-used, round-robin, random etc.

Figure 2 schematically illustrates load processing at the write-back data cache 6. At step 30 processing waits until a load instruction is received. Step 32 then determines whether or not the load instruction hits within the write-back data cache 6. If there is no hit, then step 34 serves to fetch the data from the memory 10 into the write-back data cache 6. Either following the fetch at step 34, or if there is a hit, processing proceeds to step 36 where the write-back data cache 6 determines whether or not the load instruction it has received is a load-and-clean instruction. If the instruction is not a load-and-clean instruction, then step 38 serves to load the target data to the processor 4. If the load instruction is a load-and-clean instruction, then step 40 serves to load the target data to the processor 4 and change to clean any dirty flags for at least the target data within the target cache line where the hit occurred. As will be described below, there are various alternatives in the way in which either the particular target portion of a cache line, or a more extended region of the target cache line, may be marked as clean.
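The Figure 2 flow can be sketched end-to-end as follows (editorial addition; the dictionary cache model and function name are assumptions, and per-portion flags are collapsed into a single dirty bit for brevity):

```python
# A sketch of the Figure 2 flow (steps 30-40); the cache and memory are
# modelled as dictionaries and all names are illustrative.
def handle_load(cache, memory, addr, load_and_clean=False):
    line = cache.get(addr)
    if line is None:                    # miss: fetch from memory (step 34)
        line = {"data": memory[addr], "dirty": False}
        cache[addr] = line
    value = line["data"]                # load the target data (steps 38/40)
    if load_and_clean:
        line["dirty"] = False           # mark clean, no write-back (step 40)
    return value

cache = {0x100: {"data": 42, "dirty": True}}
memory = {0x200: 7}
assert handle_load(cache, memory, 0x100, load_and_clean=True) == 42
assert cache[0x100]["dirty"] is False          # dirty data discarded as clean
assert handle_load(cache, memory, 0x200) == 7  # miss path fetches the line
```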
Figure 3 schematically illustrates a cache line 26 storing data values, with different portions of the cache line each having an associated dirty flag. If the dirty flag has a value of "1", then in this example embodiment this indicates that the corresponding portion of data within the target cache line is dirty (e.g. has been changed since it was fetched into the cache line and not yet written out to main memory). Conversely, if the dirty flag value for a portion is "0", then this indicates that the data value of that portion is consistent with the corresponding data value in the memory 10. The cache line 26 illustrated in Figure 3 is then subject to a load-and-clean instruction reading a target portion of the cache line 26 corresponding to the data values EEFFGGHH. The dirty flags for the four bytes which form this target portion were all previously set and the load-and-clean instruction serves to reset these, as indicated at the bottom of Figure 3. It will be appreciated that if any of the dirty flags for the target portion was not set, then it would be unchanged by the action of the load-and-clean instruction. Furthermore, it will be appreciated that the particular values that the flag has to indicate dirty status or clean status could be switched, or the dirty status could be represented by a flag having a different form.
Figure 4 illustrates a second example embodiment. In this example embodiment, the write-back data cache 6 responds to a load-and-clean instruction to the same target portion as for Figure 3 by marking as clean that target portion and additionally marking as clean all of the portions within that cache line extending in a predetermined memory-address-order from the target portion to the end of the cache line in that direction. The predetermined memory-address-order may correspond to the stack growth direction. The final line in Figure 4 illustrates that, as a result of the modified behavior of the load-and-clean instruction in controlling the write-back data cache 6, the whole of the cache line 26 is now marked as clean. The predetermined memory-address-order (and the predetermined end referred to below) may be, for example, fixed for the system, set using one or more configuration parameters for the system and/or selected via a field within the load-and-clean instruction itself, i.e. the instruction encoding specifies the direction.
Figures 5 and 6 illustrate a third example way in which the write-back data cache 6 may respond to a load-and-clean instruction. In this example a single dirty flag is provided for the whole of the cache line 26. When the target portion subject to the load-and-clean instruction is not at a predetermined end of the cache line 26, then the action of marking the target cache line as clean is suppressed. Thus, as can be seen in Figure 5, the target cache line 26 remains marked as dirty. In contrast, Figure 6 illustrates the same example embodiment, but in this case with the target portion being at the predetermined end of the target cache line. In this case, the whole target cache line is marked as clean by changing the value of the dirty bit for that target cache line as shown.
It will be appreciated that the examples of Figures 3, 4, 5 and 6 are only some examples of the way in which a load-and-clean instruction may operate. The manner of operation of the load-and-clean instruction may be fixed for a particular implementation of a write-back cache (and optionally not visible to the programmer) or could possibly be set by parameters associated with the load-and-clean instruction, e.g. the compiler could identify when the data values are stack data values and use the load-and-clean instruction variant of Figure 4, while using the load-and-clean instruction variant of Figure 3 for situations in which the data values correspond to a more general purpose buffer.
Figure 7 illustrates a first example of how eviction control may be modified to operate with load-and-clean instructions. At step 42, processing waits until a cache access is performed. Step 44 then determines whether the cache access performed was a load-and-clean instruction. If the cache access was not a load-and-clean instruction, then step 46 serves to update the least-recently-used (LRU) status of the target cache line which has been accessed to indicate that it has been accessed. If the cache access was a load-and-clean instruction, as identified at step 44, then step 46 is bypassed. The effect of bypassing the LRU status update is that the cache line concerned will not be noted as having recently been used and accordingly will be more likely to be evicted. This is consistent with the cache line having been marked as clean (at least partially).
Figure 8 is a flow diagram schematically illustrating a second example of the way in which load-and-clean instructions may interact with eviction control from the write-back data cache 6. At step 48 processing waits until a load-and-clean instruction is executed. Step 50 then determines whether the whole of the target cache line which has been subject to the load-and-clean instruction is now marked as clean. If the whole of the target cache line is marked as clean, then step 52 serves to promote that target cache line in the eviction queue (e.g. to the top of the queue, or at least to a higher likelihood of eviction). If the determination at step 50 is that the whole of the target cache line is not now clean, then step 52 is bypassed.
It will be appreciated that the processing illustrated in Figures 7 and 8 will be performed by the cache line eviction circuitry 28 of Figure 1 as part of the eviction policy it is operating. A wide variety of different forms of eviction policy will be familiar to those in this technical field.
Figures 9 and 10 are flow diagrams schematically illustrating how a compiler of a source program to generate an object program may incorporate load-and-clean instructions into the object program. Figure 9 illustrates how the compiler can search for the last use of a data value. At step 54 the compiler parses the source program to identify data values loaded from memory and used in the program. Step 56 searches from the end of the source code program towards the beginning of the source code program to identify the last use of each data value loaded from memory. Step 58 marks the last use occurrences identified at step 56 within the program.
Figure 10 illustrates how a compiler may generate load object instructions subsequent to the processing of Figure 9. At step 60 the compiler waits until it reaches a point within the program being compiled at which it is required to compile a load operation. Step 62 determines whether or not that load operation is marked as the last use of the data value which is to be loaded; these are the uses which were marked at step 58 of Figure 9. If the load is marked, then step 64 generates a load-and-clean instruction. If the load is not marked, then step 66 generates a standard load instruction.
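The two compiler passes of Figures 9 and 10 can be sketched over a toy instruction list (an editorial illustration; the opcode names and program representation are assumptions, not the patent's):

```python
# A sketch of the Figures 9/10 compiler passes over a toy instruction list;
# the opcode names and program representation are assumed, not the patent's.
def compile_loads(ops):
    last_use = {}
    # Search from the end towards the beginning for the last load of each
    # address (steps 56/58).
    for idx in range(len(ops) - 1, -1, -1):
        op, addr = ops[idx]
        if op == "load" and addr not in last_use:
            last_use[addr] = idx
    out = []
    # Emit LOAD_AND_CLEAN for marked last uses, plain LOAD otherwise
    # (steps 62-66).
    for idx, (op, addr) in enumerate(ops):
        if op == "load":
            mnemonic = "LOAD_AND_CLEAN" if last_use[addr] == idx else "LOAD"
            out.append((mnemonic, addr))
        else:
            out.append((op.upper(), addr))
    return out

prog = [("load", 0x10), ("store", 0x10), ("load", 0x10), ("load", 0x20)]
assert compile_loads(prog) == [
    ("LOAD", 0x10), ("STORE", 0x10),
    ("LOAD_AND_CLEAN", 0x10), ("LOAD_AND_CLEAN", 0x20),
]
```

Only the final load of each address becomes a load-and-clean instruction; earlier loads of the same address compile to standard loads.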
Although illustrative embodiments have been described in detail herein with reference to the accompanying drawings, it is to be understood that the claims are not limited to those precise embodiments, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims.

Claims (23)

CLAIMS

  1. Apparatus for processing data comprising: a write-back cache having a plurality of cache lines; processing circuitry to perform processing operations specified by program instructions; and an instruction decoder to decode a load-and-clean instruction to generate control signals: to control said processing circuitry to load data from a target portion of a target cache line; and to control said write-back cache to mark as clean at least said target portion of said target cache line.
  2. Apparatus as claimed in claim 1, wherein said target cache line comprises a plurality of portions, each of said plurality of portions having a dirty flag indicative of whether a respective portion has been written with data not yet written back to a memory.
  3. Apparatus as claimed in claim 2, wherein said write-back cache responds to said load-and-clean instruction to change a dirty flag for at least said target portion indicating that said target portion is dirty to indicate that said target portion is clean.
  4. Apparatus as claimed in claim 3, wherein said write-back cache responds to said load-and-clean instruction: to change a dirty flag for said target portion indicating that said target portion is dirty to indicate that said target portion is clean; and to leave unchanged dirty flags for other portions of said target cache line.
  5. Apparatus as claimed in claim 3, wherein said write-back cache responds to said load-and-clean instruction: to change a dirty flag for said target portion indicating that said target portion is dirty to indicate that said target portion is clean; to change dirty flags for any further portions of said target cache line extending in a predetermined memory-address-order from said target portion to an end of said target cache line indicating that said further portions are dirty to indicate that said further portions are clean; and to leave unchanged dirty flags for any other portions of said target cache line extending in an opposite predetermined memory-address-order from said target portion to an opposite end of said target cache line.
  6. Apparatus as claimed in claim 3, wherein said write-back cache responds to said load-and-clean instruction: to change a dirty flag for said target portion indicating that said target portion is dirty to indicate that said target portion is clean; if said target portion is at a predetermined end of said target cache line, then to change dirty flags for all portions of said target cache line indicating that said portions are dirty to indicate that said portions of said target cache line are clean; and if said target portion is not at said predetermined end of said target cache line, then to leave unchanged said dirty flags for other portions of said target cache line.
  7. Apparatus as claimed in any one of claims 5 and 6, comprising a stack memory region within said memory, said stack memory region having a direction of growth corresponding to said predetermined memory-address-order.
  8. Apparatus as claimed in claim 1, wherein said target cache line comprises a plurality of portions and said target cache line has a dirty flag indicative of whether any of said plurality of portions has been written with data not yet written back to a memory.
  9. Apparatus as claimed in claim 8, wherein said write-back cache responds to said load-and-clean instruction: if said target portion is at a predetermined end of said target cache line, then to change a dirty flag for said target cache line indicating that said target cache line is dirty to indicate that said target cache line is clean; and if said target portion is not at said predetermined end of said target cache line, then to leave unchanged said dirty flag for said target cache line.
  10. Apparatus as claimed in claim 9, comprising a stack memory region within said memory, said stack memory region having a direction of growth corresponding to a predetermined memory-address-order and said predetermined end of said target cache line corresponds to a latest address in said direction of growth within said target cache line.
  11. Apparatus as claimed in any one of the preceding claims, wherein said target cache line remains valid and available for further access operations following execution of said load-and-clean program instruction.
  12. Apparatus as claimed in any one of the preceding claims, wherein said write-back cache comprises cache line eviction circuitry to control eviction of cache lines from said write-back cache and said cache line eviction circuitry is responsive to execution of load-and-clean program instructions.
  13. Apparatus as claimed in claim 12, wherein said cache line eviction circuitry responds to execution of a load-and-clean program instruction that results in all portions of said target cache line marked as clean to promote said target cache line in an order for eviction.
  14. Apparatus as claimed in claim 12, wherein said cache line eviction circuitry responds to execution of a load-and-clean program instruction to suppress updating of least-recently-used data associated with an access to said target cache line resulting from said load-and-clean program instruction.
  15. Apparatus as claimed in any one of the preceding claims, wherein said plurality of cache lines each have a valid flag indicative of whether a respective cache line contains any valid data.
  16. Apparatus as claimed in any one of the preceding claims, wherein said processing circuitry is programmed to access data within a temporary buffer.
  17. Apparatus as claimed in any one of the preceding claims, wherein said apparatus is a graphics processing unit.
  18. Apparatus for processing data comprising: write-back cache means for storing data, said write-back cache means having a plurality of cache lines; processing means for performing processing operations specified by program instructions; and instruction decoding means for decoding a load-and-clean instruction to generate control signals: to control said processing means to load data from a target portion of a target cache line; and to control said write-back cache means to mark as clean at least said target portion of said target cache line.
  19. A method of processing data comprising: storing data within a write-back cache having a plurality of cache lines; performing processing operations specified by program instructions; and decoding a load-and-clean instruction to generate control signals: to control loading data from a target portion of a target cache line; and to control marking as clean at least said target portion of said target cache line.
  20. A method of compiling a source program to generate an object program comprising: identifying a last use within said source program of a data value stored at a memory address; if said source program specifies loading a target data value that is a last use of said target data value, then generating a corresponding load-and-clean instruction within said object program; and if said source program specifies loading a target data value that is not a last use of said target data value, then generating a corresponding load instruction within said object program.
  21. Apparatus for processing data substantially as hereinbefore described with reference to the accompanying drawing.
  22. A method of processing data substantially as hereinbefore described with reference to the accompanying drawing.
  23. A method of compiling a source program substantially as hereinbefore described with reference to the accompanying drawing.
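The per-portion dirty-flag behaviour recited in claims 4 and 5 can be illustrated with a small software model. This is a sketch only: the claims describe hardware behaviour, and the `CacheLine` class, its method names, and the `sweep_to_end` flag are all invented for illustration.

```python
class CacheLine:
    """Software model of one write-back cache line divided into portions."""

    def __init__(self, data, portions):
        self.data = list(data)            # one stored value per portion
        self.dirty = [False] * portions   # per-portion dirty flags

    def load_and_clean(self, portion, sweep_to_end=False):
        """Load the target portion and mark it clean.

        With sweep_to_end=False only the target portion's dirty flag is
        cleared and the others are left unchanged (as in claim 4). With
        sweep_to_end=True, every portion from the target to the end of
        the line in ascending address order is also marked clean, while
        portions before the target keep their flags (as in claim 5).
        """
        value = self.data[portion]
        self.dirty[portion] = False
        if sweep_to_end:
            for i in range(portion + 1, len(self.dirty)):
                self.dirty[i] = False
        return value
```

The sweep variant matches a stack that grows toward higher addresses (claim 7): once the value at the target address has had its last use, everything beyond it on the stack is also dead and need never be written back.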
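The compilation method of claim 20 amounts to a last-use analysis followed by instruction selection. The sketch below is a hypothetical minimal pass over a flat instruction list, not ARM's compiler; the tuple encoding and the `load_and_clean` opcode name are assumptions for illustration.

```python
def insert_load_and_clean(instructions):
    """Rewrite the final 'load' of each address as 'load_and_clean'.

    instructions: list of (op, addr) tuples, op in {'load', 'store'}.
    Loads that are not the last use of their address are left as plain
    loads, as claim 20 requires.
    """
    # First pass: record the index of the last load of each address.
    last_load = {}
    for i, (op, addr) in enumerate(instructions):
        if op == 'load':
            last_load[addr] = i

    # Second pass: select load-and-clean only at those last-use points.
    out = []
    for i, (op, addr) in enumerate(instructions):
        if op == 'load' and last_load.get(addr) == i:
            out.append(('load_and_clean', addr))
        else:
            out.append((op, addr))
    return out
```

A real compiler would run this per basic block or over a dataflow graph rather than a straight-line list, but the selection rule is the same: a load proven to be the last use of its address becomes a load-and-clean, so the cache line need never be written back.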
GB1422789.6A 2014-12-19 2014-12-19 Cleaning a write-back cache Active GB2533768B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1422789.6A GB2533768B (en) 2014-12-19 2014-12-19 Cleaning a write-back cache
US14/957,117 US20160179676A1 (en) 2014-12-19 2015-12-02 Cleaning a write-back cache

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1422789.6A GB2533768B (en) 2014-12-19 2014-12-19 Cleaning a write-back cache

Publications (2)

Publication Number Publication Date
GB2533768A true GB2533768A (en) 2016-07-06
GB2533768B GB2533768B (en) 2021-07-21

Family

ID=56106274

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1422789.6A Active GB2533768B (en) 2014-12-19 2014-12-19 Cleaning a write-back cache

Country Status (2)

Country Link
US (1) US20160179676A1 (en)
GB (1) GB2533768B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220197813A1 (en) * 2020-12-23 2022-06-23 Intel Corporation Application programming interface for fine grained low latency decompression within processor core

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004017209A2 (en) * 2002-08-14 2004-02-26 Koninklijke Philips Electronics N.V. Optimized write back for context switching
EP1693760A1 (en) * 2005-02-17 2006-08-23 Texas Instruments Incorporated Organization of dirty bits for a write-back cache
US20120159077A1 (en) * 2010-12-21 2012-06-21 Steely Jr Simon C Method and apparatus for optimizing the usage of cache memories

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6021472A (en) * 1995-08-21 2000-02-01 Canon Kabushiki Kaisha Information processing device and control method thereof
US6643741B1 (en) * 2000-04-19 2003-11-04 International Business Machines Corporation Method and apparatus for efficient cache management and avoiding unnecessary cache traffic
JP4434534B2 (en) * 2001-09-27 2010-03-17 株式会社東芝 Processor system
US7454575B2 (en) * 2003-12-22 2008-11-18 Matsushita Electric Industrial Co., Ltd. Cache memory and its controlling method
US7702855B2 (en) * 2005-08-11 2010-04-20 Cisco Technology, Inc. Optimizing cached access to stack storage
JP5440067B2 (en) * 2009-09-18 2014-03-12 富士通株式会社 Cache memory control device and cache memory control method
US9952977B2 (en) * 2009-09-25 2018-04-24 Nvidia Corporation Cache operations and policies for a multi-threaded client
US9645866B2 (en) * 2010-09-20 2017-05-09 Qualcomm Incorporated Inter-processor communication techniques in a multiple-processor computing platform
US20140164708A1 (en) * 2012-12-07 2014-06-12 Advanced Micro Devices, Inc. Spill data management
US9760498B2 (en) * 2014-09-26 2017-09-12 Qualcomm Incorporated Hybrid cache comprising coherent and non-coherent lines


Also Published As

Publication number Publication date
US20160179676A1 (en) 2016-06-23
GB2533768B (en) 2021-07-21

Similar Documents

Publication Publication Date Title
JP6325243B2 (en) Cache replacement policy based on retention priority
US7493451B2 (en) Prefetch unit
CN107479860B (en) Processor chip and instruction cache prefetching method
US20180300258A1 (en) Access rank aware cache replacement policy
US10719448B2 (en) Cache devices with configurable access policies and control methods thereof
US20090172291A1 (en) Mechanism for effectively caching streaming and non-streaming data patterns
JP2021530782A (en) Branch target buffer for multiple tables
US20100011165A1 (en) Cache management systems and methods
KR102478766B1 (en) Descriptor ring management
US8856453B2 (en) Persistent prefetch data stream settings
CN111201518A (en) Apparatus and method for managing capability metadata
JP6457836B2 (en) Processor and instruction code generation device
US20120124291A1 (en) Secondary Cache Memory With A Counter For Determining Whether to Replace Cached Data
US9846647B2 (en) 2017-12-19 Cache device and control method thereof
US20150032970A1 (en) Performance of accesses from multiple processors to a same memory location
US20110179227A1 (en) Cache memory and method for cache entry replacement based on modified access order
US20160179676A1 (en) Cleaning a write-back cache
JP2007272681A (en) Cache memory device, and method for replacing cache line in same
JP7366122B2 (en) Processor filtered branch prediction structure
JP5971036B2 (en) Arithmetic processing device and control method of arithmetic processing device
US11734011B1 (en) Context partitioning of branch prediction structures