GB2456924A - Allocating memory to free space in a cache - Google Patents

Allocating memory to free space in a cache

Info

Publication number
GB2456924A
GB2456924A GB0904430A
Authority
GB
United Kingdom
Prior art keywords
cache
usage information
memory
list
storage element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0904430A
Other versions
GB0904430D0 (en)
GB2456924B (en)
Inventor
Elmar Zipp
Eberhard Pasch
Markus Nosse
Achim Haessler
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of GB0904430D0
Publication of GB2456924A
Application granted
Publication of GB2456924B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0886Variable-length word access
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The cache 20 in a computer system is divided into storage elements 24, which may be smaller than a cache line 22. Each storage element has a flag 33 which indicates whether the element is available for allocation. When a program 38 running on the processor requests the allocation of memory in the heap, the system firmware 36 identifies whether there is a contiguous block of storage elements that is large enough to satisfy the memory request and is available for allocation. If so, the flags are set to indicate that the storage elements are no longer available, and the memory block which corresponds to the storage elements is returned. When the program frees the data from the heap, the flags are cleared to indicate that the storage elements are available again.

Description

INTELLECTUAL PROPERTY OFFICE
Application No. GB0904430.6
RTM Date: 16 June 2009
The following terms are registered trademarks and should be read as such wherever they occur in this document: Java
DESCRIPTION
METHOD, APPARATUS, COMPUTER PROGRAM PRODUCT AND DATA PROCESSING
PROGRAM OF CONTROLLING CACHE USAGE IN A COMPUTER SYSTEM
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates in general to processing within a computing environment, and in particular to a method of controlling cache usage in a computer system and an apparatus for controlling cache usage in a computer system. Still more particularly, the present invention relates to a data processing program and a computer program product for controlling cache usage in a computer system.
Description of the Related Art
Software and hardware of existing computing environments generally aim to keep as much of the in-use data as possible in the data caches. The software tries to improve data locality. However, most modern software relies on heap-oriented memory management methodologies that move memory accesses linearly forward. Since cache size is usually limited, caches use so-called optimizing instructions or optimizing algorithms, so that a computer program or a hardware-maintained structure can keep hot data in the cache as effectively as possible, and offer preloading features. Such optimizing algorithms are, e.g., the Least Recently Used (LRU) algorithm, which discards the least recently used items first, or the Least Frequently Used (LFU) algorithm, which counts how often an item is needed and discards the least often used items first. Efforts are made in software and cache hardware to react to recognized or statistically measured cache usage in order to improve cache miss rates, but there is no optimal cache exploitation. Without explicit knowledge about the cache allocation, the software is not able to optimize the cache usage to its maximum. It is a base paradigm of today's architectures to hide the cache infrastructure from any software implementation. There is no function that allows the precise placement of a new memory object in a cached memory area by replacing an old and unused data element, thereby avoiding cache misses during access and minimizing memory consumption. This especially affects object-oriented languages, where most objects "die young".
Summary of the Invention
The technical problem underlying the invention is to provide a method and an apparatus for controlling cache usage in a computer system which are able to optimize the use of the cache hardware, and to provide a data processing program and a computer program product to perform the method of controlling cache usage in a computer system.
The invention solves this problem by providing a method of controlling cache usage in a computer system having the features of claim 1, an apparatus for controlling cache usage in a computer system having the features of claim 16, a data processing program for performing the method of controlling cache usage in a computer system having the features of claim 19, and a computer program product causing a computer to perform the method of controlling cache usage in a computer system having the features of claim 20. Advantageous embodiments of the invention are mentioned in the subclaims.
Accordingly, in an embodiment of the invention a method for controlling cache usage in a computer system comprises the following steps: generating usage information, wherein the usage information comprising a first status indicates that a corresponding storage element of the cache is currently available to be used for allocation, and the usage information comprising a second status indicates that the corresponding storage element of the cache is currently not available to be used for allocation. A memory allocation request analyzes the usage information to identify an available storage element in the cache of sufficient size to satisfy the memory allocation request. In case of a successfully identified storage element of sufficient size, a corresponding memory block is returned based on the usage information comprising the first status.
In a further embodiment of the present invention, the usage information is maintained by cache hardware and/or internal software.
In exemplary embodiments of the present invention, the usage information comprises at least one indicator bit each representing a given number of bytes of a cache line, wherein the at least one indicator bit represents the first status by a first logical level and the second status by a second logical level.
In further embodiments of the present invention, an operation to free or release memory blocks verifies if a cache line still holds data for a given address, wherein the operation to free or release memory blocks sets a corresponding indicator bit to the first status in response to an identified cache line still holding said data.
In further embodiments of the present invention, the at least one corresponding indicator bit is reset to the second logical level in response to a memory allocation request which could be satisfied from a storage element within the cache.
In further embodiments of the present invention, the usage information is analyzed to identify a cache line comprising a minimum of related indicator bits with the first logical level and/or a maximum of related indicator bits with the second logical level before performing a line-drop operation in the cache, wherein the identified cache line is dropped during the line-drop operation.
In further embodiments of the present invention, all indicator bits of a related cache line are reset to the second logical level in response to the line-drop operation.
In further embodiments of the present invention, a table with storage pointers to storage element strings of various lengths is maintained, wherein each storage element of the storage element strings is represented by an indicator bit with the first logical level.
In other exemplary embodiments of the present invention, the usage information comprises at least one free list holding list elements each representing a given number of bytes of a cache line, wherein a free list holding at least one list element represents the first status of the usage information, and an empty free list represents the second status of the usage information.
In further embodiments of the present invention, the usage information comprises free lists for different storage element sizes of the cache.
In further embodiments of the present invention, an operation to free or release memory blocks verifies if a cache line holds data for a given address or address block, wherein the operation to free or release memory blocks writes the corresponding address or address block as list elements into corresponding free lists in response to an identified cache line still holding data.
In further embodiments of the present invention, the memory allocation request verifies if an address represented by a first list element of a chosen free list still holds data in the cache to identify an available storage element in the cache comprising a sufficient size satisfying the memory allocation request.
In further embodiments of the present invention, the corresponding first list element is removed from the corresponding free list in response to a memory allocation request which could be satisfied from a storage element within the cache.
In further embodiments of the present invention, all list elements of all free lists are removed in response to a memory allocation request which could not be satisfied within the cache.
In further embodiments of the present invention, new memory is allocated on a heap, wherein data is loaded from a memory into at least one corresponding cache line, wherein for all remaining memory elements new list elements are generated and written in at least one corresponding free list.
In another embodiment of the present invention, an apparatus for controlling cache usage in a computer system comprises a usage information facility generating usage information. The usage information comprising a first status indicates that a corresponding storage element of the cache is currently available to be used for allocation, wherein the usage information comprising a second status indicates that the corresponding storage element of the cache is currently not available to be used for allocation. The information facility analyzes the usage information in response to a memory allocation request to identify an available storage element in the cache of sufficient size to satisfy the memory allocation request, wherein in case of a successfully identified storage element of sufficient size the information facility returns a corresponding memory block based on the usage information comprising the first status.
In further embodiments of the present invention, the information facility comprises at least one indicator bit each representing a given number of bytes of a cache line, or at least one free list holding list elements each representing a given number of bytes of a cache line to generate the usage information. The at least one indicator bit represents the first status by a first logical level and the second status by a second logical level.
Similarly, a free list holding at least one list element represents the first status of the usage information, and an empty free list represents the second status of the usage information.
In further embodiments of the present invention, the information facility is implemented as an entire hardware embodiment or an entire software embodiment or an embodiment containing both hardware and software elements.
In another embodiment of the present invention, a data processing program for execution in a data processing system comprising software code portions for performing a method for controlling a cache usage in a computer system when the program is run on the data processing system.
In yet another embodiment of the present invention, a computer program product stored on a computer-usable medium, comprising computer-readable program means for causing a computer to perform a method for controlling a cache usage in a computer system when the program is run on said computer.
The disclosed embodiments of the invention are able to gain knowledge about the lifetime of the cached data and are able to systematically reuse objects that are not used anymore but still stored in the cache, so that an optimal usage of existing cache hardware structures can be achieved.
The above, as well as additional purposes, features, and advantages of the present invention will become apparent in the following detailed written description.
Brief Description of the Drawings
Advantageous embodiments of the invention, as described in detail below, as well as prior art embodiments discussed to facilitate the understanding of the invention, are shown in the drawings, in which:
FIG. 1 is a schematic block diagram of a computer system comprising an apparatus for controlling cache usage in the computer system, in accordance with preferred embodiments of the present invention;
FIG. 2 is a block diagram of the apparatus for controlling cache usage in a computer system and a corresponding cache, in accordance with a first embodiment of the present invention;
FIG. 3 is a state-change diagram of a single indicator bit, in accordance with the first embodiment of the present invention;
FIGS. 4-11 are example diagrams showing graphically a degree of cache usage for a series of different long-lived and short-lived memory elements with and without using the apparatus for controlling the cache usage, in accordance with the first embodiment of the present invention;
FIG. 12 is a flow diagram of a memory read operation, in accordance with a prior art embodiment;
FIG. 13 is a flow diagram of a memory read operation comprising the method of controlling cache usage in a computer system, in accordance with an embodiment of the present invention;
FIG. 14 is a block diagram of the apparatus for controlling cache usage in a computer system and a corresponding cache, in accordance with a second embodiment of the present invention; and
FIG. 15 is a flow diagram of an allocation operation comprising the method of controlling cache usage in a computer system, in accordance with an embodiment of the present invention.
Detailed Description of the Preferred Embodiments
FIG. 1 shows a schematic block diagram of a computer system 1.
Referring to FIG. 1, the shown embodiment of the computer system 1 comprises a central processor complex (CPC) 10, a cache 20 and an apparatus for controlling the usage of the cache 20 comprising an optimization facility 30, in accordance with preferred embodiments of the present invention. Furthermore, the optimization facility 30 comprises a usage information facility 31, 32, which is a hardware control that cooperates with software memory management interfaces. The main goal is to provide software with memory pieces that already reside in the data cache 20 (e.g. a cache of Level 1, Level 2, ...) when a memory allocation operation is performed. The cache 20 provides information to the optimization facility 30 to find the best memory address for the allocation operation. To allow that, the cache hardware offers usage information for a cache line 22 or parts of it, indicating what part of the cache 20 is currently available to be used to satisfy a memory allocation request. This usage information is the basis for the decision which memory block should be returned to the memory-allocating software component. The usage information comprising a first status indicates that a corresponding storage element of the cache 20 is currently available to be used for allocation, wherein the usage information comprising a second status indicates that the corresponding storage element of the cache 20 is currently not available to be used for allocation. The information facility 31, 32 analyzes the usage information in response to the memory allocation request to identify an available storage element in the cache 20 of sufficient size to satisfy the memory allocation request. In case of a successfully identified storage element 24 of sufficient size, the information facility 31, 32 returns a corresponding memory block based on the usage information comprising the first status. Therefore the allocation request can be satisfied from storage elements 24 within the cache 20. Otherwise the memory allocation request is satisfied by traditional methods.
The optimization facility 30 gives the cache 20 a hint as to which cache lines 22 can be dropped before any standard aging algorithm like LRU, LFU etc. is applied. Before dropping cache lines 22 that were least recently used but still contain valid information, the cache 20 can drop those cache lines 22 that are explicitly marked as no longer containing valid information. The usage information is maintained by the optimization facility 30, implemented as an entire hardware embodiment, an entire software embodiment, or an embodiment containing both hardware and software elements.
The controlling events occur during "allocation" and "free" operations that are issued by software, using a new instruction interface.
FIG. 2 shows a block diagram of a first embodiment of the optimization facility 30 and a corresponding cache 20, and FIG. 3 shows a state-change diagram of a single indicator bit 33.
Referring to FIG. 2, the shown embodiment of the optimization facility 30 comprises the information facility 31 with a multiple of indicator bits 33, controlled by internal firmware 36 and/or application software 38 to generate the usage information, wherein each indicator bit 33 represents a given number of bytes of a cache line 22. There could be, for example, 16 indicator bits 33 per cache line 22, each indicator bit 33 representing 16 bytes of object data. Referring to FIG. 3, each indicator bit 33 represents the first status of the usage information by a first logical level "1" and the second status of the usage information by a second logical level "0". Each indicator bit 33 is set to the first logical level "1" by the information facility 31 during a "free" operation, that means an operation to free or release memory blocks, wherein the set indicator bit indicates a valid data element in the cache 20 not used by software. The free operation verifies if a cache line 22 still holds data for a given address, wherein the free operation sets a corresponding indicator bit 33 to the first status in response to an identified cache line 22 still holding data. Each indicator bit 33 is reset by the information facility 31 to the second logical level "0" in response to a memory allocation request which could be satisfied from a storage element 24 within the cache 20. That means the information facility 31 analyzes the status of the indicator bits 33 to find cached empty storage elements 24 during the allocation operation, and clears the corresponding indicator bits 33 if suitable storage elements 24 are found. To find a suitable string of empty storage elements 24, the information facility 31 can maintain a table with storage pointers to storage element strings of various lengths, wherein each storage element 24 of the storage element strings is represented by one indicator bit 33 with the first logical level "1". Additionally, all indicator bits 33 representing storage elements 24 of a related cache line 22 are reset to the second logical level "0" in response to a line-drop operation. A line-drop operation is performed if suitable empty storage elements 24 are not found in the cache 20 during the allocation operation. The optimization facility 30 identifies a cache line 22 comprising a minimum of related indicator bits 33 with the first logical level and/or a maximum of related indicator bits 33 with the second logical level before performing the line-drop operation in the cache 20, wherein the identified cache line 22 is dropped during the corresponding line-drop operation. Additionally, the optimization facility 30 can run an optimizing algorithm, e.g. the Least Recently Used (LRU) algorithm, which discards the least recently used items first, or the Least Frequently Used (LFU) algorithm, which counts how often an item is needed and discards the least often used items first, to define the cache line 22 to be dropped if more than one cache line 22 with a minimum of related indicator bits 33 with the first logical level and/or a maximum of related indicator bits 33 with the second logical level is identified.
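The indicator-bit bookkeeping described above can be sketched in C. This is a minimal illustration under assumptions, not the patent's implementation: it assumes a 256-byte cache line with 16 indicator bits of 16 bytes each as in the example, lengths that are multiples of 16 bytes, and single-threaded operation (the patent requires the bits to be changed atomically); all identifiers such as line_info, mark_freed and alloc_from_line are invented here, and the real bookkeeping would live in the cache hardware or the firmware 36.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES     256  /* cache line size assumed for this sketch       */
#define ELEM_BYTES     16   /* one indicator bit per 16 bytes of object data */
#define ELEMS_PER_LINE (LINE_BYTES / ELEM_BYTES)  /* 16 indicator bits/line  */

/* Hypothetical per-line bookkeeping of the information facility 31. */
struct line_info {
    uintptr_t base;      /* memory address the cache line currently maps    */
    bool      valid;     /* line still holds data for that address          */
    uint16_t  free_bits; /* bit i set: element i was freed but stays cached */
};

/* "free" operation: mark elements reusable only if the line is present;
 * len is assumed to be a multiple of ELEM_BYTES. */
static void mark_freed(struct line_info *li, uintptr_t addr, size_t len)
{
    if (!li->valid || addr < li->base || addr + len > li->base + LINE_BYTES)
        return;                     /* line no longer holds the data: no-op */
    unsigned first = (unsigned)((addr - li->base) / ELEM_BYTES);
    for (unsigned i = 0; i < len / ELEM_BYTES; i++)
        li->free_bits |= (uint16_t)(1u << (first + i));
}

/* Allocation: search one line for a contiguous string of set indicator
 * bits long enough for the request, clear those bits and return the
 * cached address; 0 means "fall back to the traditional path". */
static uintptr_t alloc_from_line(struct line_info *li, size_t len)
{
    unsigned need = (unsigned)((len + ELEM_BYTES - 1) / ELEM_BYTES), run = 0;
    if (!li->valid)
        return 0;
    for (unsigned i = 0; i < ELEMS_PER_LINE; i++) {
        run = (li->free_bits & (1u << i)) ? run + 1 : 0;
        if (run == need) {
            unsigned first = i + 1 - need;
            for (unsigned j = first; j <= i; j++)
                li->free_bits &= (uint16_t)~(1u << j);  /* now in use */
            return li->base + (uintptr_t)first * ELEM_BYTES;
        }
    }
    return 0;
}
```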
The interaction of the application software 38 and the information facility 31 is done by means of a suitable instruction or pair of instructions. In reaction to an allocation command, the information facility 31 searches the specified memory area for set indicator bits 33 in the cache 20, using the firmware 36 for example. A contiguous string of set indicator bits 33 of sufficient length identifies an available set of storage elements 24 in the cache 20. The firmware 36 performs the allocation operation, returns the address of this set of data elements 24 and clears the related indicator bits 33. If no suitable set of available storage elements 24 is identified, the allocation operation returns corresponding answering data to the information facility 31, which then runs normal software memory allocation methods by extending the memory area, or optimizing algorithms to drop identified cache lines 22, for example. A storage element 24 can only match one allocation request, since the indicator bits 33 are changed atomically.
In the storage area maintained by the information facility 31, only storage elements 24 that were previously freed by the information facility 31 can have a set indicator bit 33.
Therefore, it is safe to use the returned storage element 24 and to insert it into the memory management structures of the application software 38 without further checking.
The information facility 31 identifies the related cache lines 22 by the firmware 36, for example, and sets all indicator bits 33 of the related storage elements 24 to the first logical level "1" if the corresponding cache line 22 still holds data for that address. If the corresponding cache line 22 does not hold data anymore, the instruction does nothing.
FIGS. 4-11 show example diagrams showing graphically a degree of cache usage for a series of different long-lived and short-lived memory elements with and without using the apparatus for controlling the cache usage, in accordance with the first embodiment of the present invention. These examples show the benefits when repeated allocation/free operations can directly replace released memory elements 24 in the cache 20 with newly allocated ones. Especially object-oriented programming produces a high percentage of short-lived memory objects. Some benchmarks show up to a 9:1 ratio.
The examples show graphically the degree of cache usage for a series of different short-lived memory elements 24. Each example diagram shows ten allocation sequences and the resulting layout of valid data in the cache 20 as blocks with numbers, wherein the number indicates the corresponding allocation sequence. The fat-lined frame indicates the extent of the total used storage blocks in the cache 20, which means the size of a single allocation request. The shaded blocks indicate garbage data blocks, which are not used anymore, and empty blocks indicate empty, unused data blocks of the cache 20.
FIGS. 4 and 5 show a first case in which every second allocation is short-lived and is released before the next allocation operation takes place, wherein for description purposes all data blocks of the allocation operations have the same size. In the first case, FIG. 4 shows a simple standard memory management on a linearly growing heap (e.g. in Java) and FIG. 5 shows the same heap if managed with the optimization facility 30 according to embodiments of the present invention. Comparing FIG. 4 with FIG. 5, it can be seen that the cached memory optimized with the optimization facility 30 shown in FIG. 5 is much denser than the cached memory shown in FIG. 4. Additionally, there is no garbage contained in the optimized cache and the extent of used cached memory is significantly lower. According to the first case, the optimized cache needs only 60% of the cached memory compared to the standard heap approach shown in FIG. 4.
FIGS. 6 and 7 show a second case in which every third allocation is short-lived and is released before the next allocation operation takes place, wherein for description purposes all data blocks of the allocation operations again have the same size. In the second case, FIG. 6 shows the simple standard memory management on a linearly growing heap and FIG. 7 shows the same heap if managed with the optimization facility 30 according to embodiments of the present invention. Comparing FIG. 6 with FIG. 7, it can be seen that the cached memory optimized with the optimization facility 30 shown in FIG. 7 is much denser than the cached memory shown in FIG. 6. Additionally, there is no garbage contained in the optimized cache and the extent of used cached memory is significantly lower. According to the second case, the optimized cache needs only 70% of the cached memory compared to the standard heap approach shown in FIG. 6. The second case corresponds to the lower boundary of the typical short-lived to long-lived object lifetime ratio (30%) which is given in the literature for typical benchmarks.
FIGS. 8 and 9 show a third case in which two of three allocations are short-lived and are released before the fourth allocation operation takes place, wherein for description purposes all data blocks of the allocation operations have the same size. In the third case, FIG. 8 shows a simple standard memory management on a linearly growing heap and FIG. 9 shows the same heap if managed with the optimization facility 30 according to embodiments of the present invention. Comparing FIG. 8 with FIG. 9, it can be seen that the cached memory optimized with the optimization facility 30 shown in FIG. 9 is much denser than the cached memory shown in FIG. 8. Additionally, the extent of used cached memory in the optimized cache is significantly lower. According to the third case, the optimized cache needs only 50% of the cached memory compared to the standard heap approach shown in FIG. 8. The third case is a more complex case that does not just grow linearly through the cache but leads to a mix of older and younger memory object neighbourhoods.
FIGS. 10 and 11 show a fourth case in which two of three allocations are short-lived and are released before the third allocation operation takes place, wherein for description purposes all data blocks of the allocation operations have the same size. In the fourth case, FIG. 10 shows a simple standard memory management on a linearly growing heap and FIG. 11 shows the same heap if managed with the optimization facility 30 according to embodiments of the present invention. Comparing FIG. 10 with FIG. 11, it can be seen that the cached memory optimized with the optimization facility 30 shown in FIG. 11 is much denser than the cached memory shown in FIG. 10. Additionally, the extent of used cached memory in the optimized cache is significantly lower. According to the fourth case, the optimized cache needs only 20% of the cached memory compared to the standard heap approach shown in FIG. 10. The fourth case is close to the typical short-lived to long-lived ratio in object-oriented programs of about 60%. It shows the huge advantage compared to a standard heap environment as e.g. used for Java.
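The first case lends itself to a tiny simulation. The C program below is an illustration added here, not part of the patent: it plays through ten equally sized allocations where every second one is freed before the next, and compares the heap extent of a purely linear allocator with one that reuses the most recently freed slot first, which reproduces the 6-of-10 (60%) figure quoted above.

```c
#include <stdio.h>

/* First case: ten equally sized allocations, every second one short-lived
 * and freed before the next allocation takes place. */
int main(void)
{
    int linear_extent = 0;  /* slots used by a linearly growing heap     */
    int opt_extent = 0;     /* slots used when freed slots are reused    */
    int free_slot = -1;     /* at most one freed slot is pending here    */

    for (int i = 1; i <= 10; i++) {
        linear_extent++;    /* the standard heap always takes a new slot */

        int slot;
        if (free_slot >= 0) { slot = free_slot; free_slot = -1; }
        else                { slot = opt_extent++; }

        if (i % 2 == 0)     /* every second allocation is released       */
            free_slot = slot;
    }
    printf("linear heap extent: %d slots\n", linear_extent);  /* 10 */
    printf("optimized extent:   %d slots\n", opt_extent);     /* 6  */
    return 0;
}
```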
FIG. 12 shows a flow diagram of a memory read operation, in accordance with a prior art embodiment, and FIG. 13 shows a flow diagram of a memory read operation comprising the method of controlling a cache usage in a computer system, in accordance with an embodiment of the present invention.
Referring to FIG. 12, a program executed on a processor requires data, e.g. to be loaded into a register, in step S100. So in step S110 the register load triggers a lookup of the required data in the cache; for simplicity only a one-level cache structure is shown here. If the data is in the cache, the data from the cache is loaded into the processor in step S140. If the data is not in the cache, a given routine is run in step S120 to find a cache line where the data could be stored. The cache line could be determined e.g. by the LRU algorithm or the LFU algorithm etc. The data in the determined cache line is discarded from the cache line, and the requested data from the memory is loaded into the cache line in step S130. After the data is loaded in the cache line, the data is also loaded from the cache line into the processor in step S140.
Referring to FIG. 13, a memory read operation comprising the method of controlling cache usage in a computer system, in accordance with an embodiment of the present invention, is described. Again a program executed on the processor, which could be part of the central processor complex 10, requires data, e.g. to be loaded into a register, in step S100. So in step S110 the register load again triggers a lookup of the required data in the cache 20; for simplicity only a one-level cache structure is shown here. If the data is in the cache 20, the data from the cache 20 is again loaded into the processor in step S140. If the data is not in the cache, it is checked in step S115 whether there is a cache line 22, or whether there are cache lines 22, with all indicator bits 33 set, indicating unused cache lines. If such a cache line 22 or cache lines 22 are identified in step S115, the data is loaded into the identified cache line 22 or cache lines 22 in step S130. If no cache line is identified in step S115, a given routine is run in step S121 to find a cache line 22 where the data could be stored. The cache line could be determined by existing methods like the LRU algorithm or the LFU algorithm etc. plus an indicator bit evaluation. Data in the determined cache line 22 is discarded from the cache line 22. Then the requested data from the memory is loaded into the cache line 22 in step S130. After the data is loaded in the cache line 22, the corresponding indicator bits 33 of the cache line 22 are reset, if set, in step S131. Again the data is also loaded from the cache line 22 into the processor in step S140.
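The victim selection implied by steps S115 and S121 can be sketched as follows. This is illustrative only: the types and names are invented here, a simple age counter stands in for the real LRU/LFU machinery plus indicator bit evaluation, and real hardware would perform this per associativity set in parallel.

```c
#include <stdint.h>

struct cached_line {
    uint16_t free_bits; /* indicator bits 33, one per 16-byte element  */
    uint32_t age;       /* higher value means older (stand-in for LRU) */
};

#define ALL_FREE 0xFFFFu  /* all indicator bits set: line is unused */

/* Choose a victim line within one associativity set on a cache miss. */
static int choose_victim(const struct cached_line set[], int ways)
{
    /* Step S115: prefer a line whose indicator bits are all set, since
     * dropping it cannot discard data that is still in use. */
    for (int w = 0; w < ways; w++)
        if (set[w].free_bits == ALL_FREE)
            return w;

    /* Step S121: otherwise fall back to an aging policy such as LRU. */
    int victim = 0;
    for (int w = 1; w < ways; w++)
        if (set[w].age > set[victim].age)
            victim = w;
    return victim;
}
/* After the refill (step S130) the caller resets the victim's indicator
 * bits, if set (step S131). */
```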
If a write operation of a processor goes through the cache, the read scenarios apply as well. If a processor bypasses the cache 20 for writes, then there is no change and no benefit.
Using the free operation, the optimization facility 30 can give the cache 20 a hint for a future use of specific memory areas.
Some software products of the prior art do memory management in such a fashion that they allocate a block of memory at the beginning and access the memory in a round-robin fashion. With the current cache algorithms this leads to the dropping of in-use cache lines, causing very expensive reloads, in terms of cycles, at the next access. The major benefit of the method and apparatus of controlling cache usage in a computer system according to embodiments of the invention is that the cache subsystem can make absolutely right choices in some cases when choosing cache lines for read/write operations.
FIG. 14 shows a block diagram of a second embodiment of the optimization facility 30 and a corresponding cache 20, and FIG. 15 shows a flow diagram of a corresponding allocation operation comprising the method of controlling cache usage in a computer system, in accordance with an embodiment of the present invention.
Referring to FIG. 14, the shown embodiment of the optimization facility 30 comprises the information facility 32 with a multiple of free lists 34 holding list elements 35, each representing a given number of bytes of a cache line 22, controlled by internal firmware 36 and/or application software 38 to generate the usage information. There could be, for example, a free list 34 holding list elements 35 with 16 bytes, a free list 34 holding list elements 35 with 32 bytes, a free list 34 holding list elements 35 with 64 bytes, up to a free list 34 holding list elements 35 with N×16 bytes per cache line 22. A free list 34 holding at least one list element 35 represents the first status of the usage information, and an empty free list 34 represents the second status of the usage information.
According to the second embodiment of the optimization facility 30, a free operation verifies if a cache line 22 holds data for a given address or address block. The free operation writes the corresponding address or address block as list elements 35 into the corresponding free lists 34 in response to an identified cache line 22 still holding data. So the information facility 32 according to the second embodiment of the optimization facility 30 maintains free lists 34 for each object size instead of using indicator bits 33 like the information facility 31 according to the first embodiment of the optimization facility 30. Therefore the shown second embodiment of the optimization facility 30 allows an easier hardware implementation compared with the first embodiment shown in FIG. 2, but loses the ability to merge free data blocks and to improve the LRU- or LFU-based line-dropping mechanisms.
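The size-segregated free lists of this second embodiment could be modelled in C as below. This is a sketch under assumptions: the number of size classes, the cache_line_holds() residency query and all other identifiers are invented for illustration, sizes are assumed to be multiples of 16 bytes, and a real information facility 32 would keep these lists in hardware or in the firmware 36 rather than on the C heap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_ELEM    16  /* smallest list element size, as in the text     */
#define NUM_CLASSES 8   /* 16, 32, 48, ... N*16 bytes; N = 8 assumed here */

/* One list element 35: an address that was freed while its cache line 22
 * still held the data. */
struct list_elem {
    uintptr_t         addr;
    struct list_elem *next;
};

/* Information facility 32: one free list 34 per storage element size;
 * head[k] holds elements of (k + 1) * 16 bytes. */
struct free_lists {
    struct list_elem *head[NUM_CLASSES];
};

/* Assumed hardware query: does the cache still hold the line for addr? */
extern int cache_line_holds(uintptr_t addr);

/* "free" operation of the second embodiment: if the cache line for addr
 * is still resident, push the address onto the free list of its size
 * class; if the line was already dropped, do nothing. */
static void free_op(struct free_lists *fl, uintptr_t addr, size_t size)
{
    size_t k = size / MIN_ELEM - 1;      /* size class index             */
    if (k >= NUM_CLASSES || !cache_line_holds(addr))
        return;
    struct list_elem *e = malloc(sizeof *e);
    if (e == NULL)
        return;
    e->addr = addr;
    e->next = fl->head[k];               /* newest element goes on top   */
    fl->head[k] = e;
}
```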
Referring to FIG. 15, the shown allocation operation comprising the method of controlling cache usage in a computer system, in accordance with an embodiment of the present invention, is described. An application program executed on a processor, which could be part of the central processor complex 10, runs an allocation operation according to step S200. So in step S210 it is checked if a free list 34 holding list elements 35 of sufficient size to satisfy a corresponding memory allocation request is empty. If the free list 34 with the corresponding size holds at least one list element 35, the newest list element is popped and removed from the corresponding free list 34 in step S230. Step S240 verifies if an address represented by the popped list element 35 of the chosen free list 34 still holds valid data in the cache 20, to identify an available storage element 24 or a set of available storage elements 24 in the cache 20. If the addressed storage area of the cache 20 is still valid, the allocation operation is performed with the existing address represented by the popped list element 35 in step S250.
In step S260 the object reference is returned to the application program.
If the free list 34 with the corresponding size is empty according to step S210, a new heap in multiples of cache lines 22 is allocated in step S220. In step S222 a new top heap pointer is calculated and stored. In step S224 all remaining memory elements 24 of the new heap are written as new list elements 35 into at least one corresponding free list 34. Again, in step S260 the object reference is returned to the application program.
If the addressed storage area of the cache 20 is not valid anymore according to step S240, a new heap in multiples of cache lines 22 is allocated in step S220 and the allocation operation continues with step S222 as described above.
An optional step S212, shown as a block with dashed lines, can check if a free list 34 holding list elements 35 of a larger size that can satisfy the corresponding memory allocation request is empty, if the free list 34 of the corresponding size is empty according to step S210. If the free list 34 with the larger size holds at least one list element 35, the newest list element 35 is popped and removed from the corresponding free list 34 in step S230, and the allocation operation continues with step S240 as described above. If the free list 34 with the larger size is also empty according to step S212, a new heap in multiples of cache lines 22 is allocated in step S220 and the allocation operation continues with step S222 as described above.
If the addressed storage area of the cache 20 is not valid anymore according to step S240, an optional step S252, shown as a block with dashed lines, can remove all list elements 35 of all free lists 34. After removing all list elements 35 of all free lists in step S252, the allocation operation continues with step S220 as described above.
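Putting the steps of FIG. 15 together, a hedged C sketch of the allocation flow might look as follows. It reuses the free_lists structure and the hypothetical cache_line_holds() query from the previous sketch, stubs out the heap growth of steps S220 to S224 as heap_alloc_lines(), and folds the optional steps S212 and S252 into the main path; none of these names come from the patent.

```c
/* Continuation of the previous sketch: free_lists, list_elem, MIN_ELEM,
 * NUM_CLASSES and cache_line_holds() are assumed to be in scope. */

/* Stub for steps S220 to S224: allocate a new heap in multiples of cache
 * lines 22, store the new top heap pointer and refill the free lists;
 * only a placeholder is given here. */
static uintptr_t heap_alloc_lines(struct free_lists *fl, size_t size)
{
    (void)fl;
    return (uintptr_t)malloc(size);
}

/* Allocation operation, step S200. */
static uintptr_t alloc_op(struct free_lists *fl, size_t size)
{
    size_t k = size / MIN_ELEM - 1;

    /* Step S210, and optional step S212: try the free list of the exact
     * size class first, then the larger ones. */
    while (k < NUM_CLASSES && fl->head[k] == NULL)
        k++;
    if (k >= NUM_CLASSES)
        return heap_alloc_lines(fl, size);        /* S220..S224, S260 */

    /* Step S230: pop the newest list element. */
    struct list_elem *e = fl->head[k];
    fl->head[k] = e->next;
    uintptr_t addr = e->addr;
    free(e);

    /* Step S240: is the addressed storage area still valid in the cache? */
    if (cache_line_holds(addr))
        return addr;                              /* S250, then S260  */

    /* Optional step S252: the cached content has aged out, so all other
     * list elements are stale hints too; drop them before growing. */
    for (size_t c = 0; c < NUM_CLASSES; c++)
        while (fl->head[c] != NULL) {
            struct list_elem *d = fl->head[c];
            fl->head[c] = d->next;
            free(d);
        }
    return heap_alloc_lines(fl, size);            /* S220..S224, S260 */
}
```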
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
Moreover, the data processing system may include an emulator (e.g. software or other emulation mechanisms) in which a particular architecture or a subset thereof is emulated. In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, for example the allocation operation or the free operation, even though a computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement the individual instruction or operation.

Claims (20)

What is claimed is:

  1. A method of controlling cache usage in a computer system (1), said method comprising: generating usage information (33, 34), wherein said usage information (33, 34) comprising a first status indicates that a corresponding storage element (24) of said cache (20) is currently available to be used for allocation, wherein said usage information (33, 34) comprising a second status indicates that said corresponding storage element (24) of said cache (20) is currently not available to be used for allocation, wherein a memory allocation request analyzes said usage information (33, 34) to identify an available storage element (24) in said cache (20) comprising a sufficient size satisfying said memory allocation request, wherein in case of a successfully identified storage element (24) comprising said sufficient size a corresponding memory block is returned based on said usage information (33, 34) comprising said first status.
  2. The method of claim 1, wherein said usage information is maintained by cache hardware and/or internal software.
  3. The method of claim 1 or 2, wherein said usage information comprises at least one indicator bit (33) each representing a given number of bytes of a cache line (22), wherein said at least one indicator bit (33) represents said first status by a first logical level and said second status by a second logical level.
  4. The method of claim 3, wherein an operation to free or release memory blocks verifies if a cache line (22) still holds data for a given address, wherein said operation to free or release memory blocks sets a corresponding indicator bit (33) to said first status in response to an identified cache line (22) still holding said data.
  5. The method of claim 3 or 4, wherein said at least one corresponding indicator bit (33) is reset to said second logical level in response to a memory allocation request which could be satisfied from a storage element (24) within said cache (20).
  6. The method according to one of the claims 3 to 5, wherein said usage information (33) is analyzed to identify a cache line (22) comprising a minimum of related indicator bits (33) with said first logical level and/or a maximum of related indicator bits (33) with said second logical level before performing a line-drop operation in said cache (20), wherein said identified cache line (22) is dropped during said line-drop operation.
  7. The method of claim 6, wherein all indicator bits (33) of a related cache line (22) are reset to said second logical level in response to said line-drop operation.
  8. The method according to one of the claims 3 to 7, further comprising: maintaining a table with storage pointers to storage element strings of various lengths (32), wherein each storage element (24) of said storage element strings is represented by an indicator bit (33) with said first logical level.
  9. The method according to claim 1 or 2, wherein said usage information comprises at least one free list (34) holding list elements (35) each representing a given number of bytes of a cache line (22), wherein a free list (34) holding at least one list element (35) represents said first status of said usage information, and an empty free list (34) represents said second status of said usage information.
  10. The method of claim 9, wherein said usage information comprises free lists (34) for different storage element sizes of said cache (20).
  11. The method of claim 8 or 9, wherein an operation to free or release memory blocks verifies if a cache line (22) holds data for a given address or address block, wherein said operation to free or release memory blocks writes said corresponding address or address block as list elements (35) into corresponding free lists (34) in response to an identified cache line (22) still holding data.
  12. The method according to one of the claims 9 to 11, wherein said memory allocation request verifies if an address represented by a first list element (35) of a chosen free list (34) still holds data in said cache (20) to identify an available storage element (24) in said cache comprising a sufficient size satisfying said memory allocation request.
  13. The method according to one of the claims 9 to 12, wherein said corresponding first list element (35) is removed from said corresponding free list (34) in response to a memory allocation request which could be satisfied from a storage element (24) within said cache (20).
  14. The method of claim 12 or 13, wherein all list elements (35) of all free lists (34) are removed in response to a memory allocation request which could not be satisfied within said cache (20).
  15. The method of claim 14, wherein new memory is allocated on a heap, wherein data is loaded from a memory into at least one corresponding cache line (22), wherein for all remaining memory elements (24) new list elements (35) are generated and written in at least one corresponding free list (34).
  16. An apparatus of controlling cache usage in a computer system (1), especially for performing said method according to one of the claims 1 to 15, said apparatus comprising: a usage information facility (31, 32) generating usage information (33, 34), wherein said usage information (33, 34) comprising a first status indicates that a corresponding storage element (24) of said cache (20) is currently available to be used for allocation, wherein said usage information (33, 34) comprising a second status indicates that said corresponding storage element (24) of said cache (20) is currently not available to be used for allocation, wherein said information facility (31, 32) analyzes said usage information (33, 34) in response to a memory allocation request to identify an available storage element (24) in said cache (20) comprising a sufficient size satisfying said memory allocation request, wherein in case of a successfully identified storage element (24) comprising said sufficient size said information facility (31, 32) returns a corresponding memory block based on said usage information (33, 34) comprising said first status.
  17. The apparatus of claim 16, wherein said information facility (31, 32) comprises at least one indicator bit (33) each representing a given number of bytes of a cache line (22), or at least one free list (34) holding list elements (35) each representing a given number of bytes of a cache line (22), to generate said usage information, wherein said at least one indicator bit (33) represents said first status by a first logical level and said second status by a second logical level, and wherein a free list (34) holding at least one list element (35) represents said first status of said usage information, and an empty free list (34) represents said second status of said usage information.
  18. The apparatus of claim 16 or 17, wherein said information facility (31, 32) is implemented as an entire hardware embodiment or an entire software embodiment or an embodiment containing both hardware and software elements.
  19. A data processing program for execution in a data processing system comprising software code portions for performing a method of controlling cache usage in a computer system (1) according to one of the preceding claims 1 to 15 when said program is run on said data processing system.
  20. A computer program product stored on a computer-usable medium, comprising computer-readable program means for causing a computer to perform a method of controlling cache usage in a computer system (1) according to one of the preceding claims 1 to 15 when said program is run on said computer.
GB0904430.6A 2008-04-28 2009-03-16 Method, apparatus, computer program product and data processing program of controlling cache usage in a computer system Active GB2456924B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08155292 2008-04-28

Publications (3)

Publication Number Publication Date
GB0904430D0 (en) 2009-04-29
GB2456924A (en) 2009-08-05
GB2456924B (en) 2012-03-07

Family

ID=40637362

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0904430.6A Active GB2456924B (en) 2008-04-28 2009-03-16 Method, apparatus, computer program product and data processing program of controlling cache usage in a computer system

Country Status (1)

Country Link
GB (1) GB2456924B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6000014A (en) * 1997-04-14 1999-12-07 International Business Machines Corporation Software-managed programmable congruence class caching mechanism
US6016529A (en) * 1997-11-26 2000-01-18 Digital Equipment Corporation Memory allocation technique for maintaining an even distribution of cache page addresses within a data structure
US20040064651A1 (en) * 2002-09-30 2004-04-01 Patrick Conway Method and apparatus for reducing overhead in a data processing system with a cache
US20050091466A1 (en) * 2003-10-27 2005-04-28 Larson Douglas V. Method and program product for avoiding cache congestion by offsetting addresses while allocating memory

Also Published As

Publication number Publication date
GB0904430D0 (en) 2009-04-29
GB2456924B (en) 2012-03-07

Similar Documents

Publication Publication Date Title
US7502890B2 (en) Method and apparatus for dynamic priority-based cache replacement
JP5142995B2 (en) Memory page management
US7653799B2 (en) Method and apparatus for managing memory for dynamic promotion of virtual memory page sizes
US7689775B2 (en) System using stream prefetching history to improve data prefetching performance
US8214577B2 (en) Method of memory management for server-side scripting language runtime system
US7587566B2 (en) Realtime memory management via locking realtime threads and related data structures
US20130086564A1 (en) Methods and systems for optimizing execution of a program in an environment having simultaneously parallel and serial processing capability
JPH1196074A (en) Computer system for dynamically selecting exchange algorithm
CN107544926B (en) Processing system and memory access method thereof
US20160246724A1 (en) Cache controller for non-volatile memory
US20090210615A1 (en) Overlay management in a flash memory storage device
KR20210019584A (en) Multi-table branch target buffer
US20210182214A1 (en) Prefetch level demotion
US7320061B2 (en) Storage optimization for VARRAY columns
JP2006513493A (en) Managing memory by using a free buffer pool
US8266605B2 (en) Method and system for optimizing performance based on cache analysis
US8954969B2 (en) File system object node management
CN116225686A (en) CPU scheduling method and system for hybrid memory architecture
CN105930136A (en) Processor and instruction code generation device
CN103902369A (en) Cooperative thread array granularity context switch during trap handling
US10713164B1 (en) Cache hit ratio simulation using a partial data set
US8356141B2 (en) Identifying replacement memory pages from three page record lists
US20090320036A1 (en) File System Object Node Management
US10180839B2 (en) Apparatus for information processing with loop cache and associated methods
Banerjee et al. A New Proposed Hybrid Page Replacement Algorithm (HPRA) in Real Time Systems.

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20130402