WO2023088535A1 - Cache eviction based on current tiering status - Google Patents

Cache eviction based on current tiering status

Info

Publication number
WO2023088535A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
data storage
eviction
storage device
tier data
Prior art date
Application number
PCT/EP2021/081802
Other languages
French (fr)
Inventor
Assaf Natanzon
Zvi Schneider
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2021/081802 priority Critical patent/WO2023088535A1/en
Priority to CN202180103524.1A priority patent/CN118119932A/en
Publication of WO2023088535A1 publication Critical patent/WO2023088535A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • G06F12/126Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • G06F2212/1024Latency reduction

Definitions

  • the present disclosure in some embodiments thereof, relates to cache management and, more specifically, but not exclusively, to systems, devices, and methods for cache eviction.
  • a typical storage system may have dynamic random access memory (DRAM) for cache, which serves inputs and outputs (IOs) very fast, and may include other tiers of cache based on storage class memory (SCM) and fast solid state drives (SSD).
  • the SSD tier allows relatively fast random access for reads and writes.
  • the storage system may include a hard disk drive (HDD) tier, which allows relatively fast sequential reading and writing.
  • the HDD has very poor performance for random IOs as the seek times in an HDD are very high and can be up to 10 milliseconds (ms).
  • a computing device for management of a cache of a hierarchical storage system configured for: computing an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and selecting a cache entry for eviction from the cache according to the eviction score.
  • a computer implemented method of management of a cache of a hierarchical storage system comprises: computing an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and selecting a cache entry for eviction from the cache according to the eviction score.
  • a non-transitory medium storing program instructions for management of a cache of a hierarchical storage system, which, when executed by a processor, cause the processor to: compute an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and select a cache entry for eviction from the cache according to the eviction score.
  • Considering on which tier data storage device a cache entry selected for eviction is stored increases the overall performance of the cache and/or of the hierarchical storage system.
  • a first tier data storage device that is higher than a second tier data storage device has at least one of a lower latency delay and a faster access time for reading data stored thereon than the second tier data storage device, wherein a first cache entry stored on the first tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over a second cache entry stored on the second tier data storage device, and the second cache entry stored on the second tier data storage device is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry.
  • Selecting the first cache entry stored on the first tier data storage device increases overall performance of the cache and/or of the hierarchical storage system, by lowering the latency penalty for re-reading the first cache entry back into the cache upon a cache miss, in comparison to a higher latency penalty that would be incurred for re-reading the second cache entry back into the cache from the slower second tier data storage device.
  • the first tier data storage device comprises a solid state disk, SSD
  • the second tier data storage device comprises a hard disk drive, HDD
  • the third tier data storage device comprises a storage class memory, SCM.
  • Cache entries stored on the faster SSD are more likely to be evicted over cache entries stored on the HDD, since re-reading cache entries back from the SSD incurs a lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
  • In a further implementation form of the first, second, and third aspects, the system further comprises a third tier data storage device that is higher than the first tier data storage device. The third tier data storage device has at least one of a lower latency delay and a faster access time for reading data stored thereon than the first tier data storage device, wherein a third cache entry stored on the third tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over the first cache entry stored on the first tier data storage device and the second cache entry stored on the second tier data storage device, and the first cache entry and the second cache entry are assigned eviction scores indicating lower likelihood to be selected for eviction over the third cache entry.
  • Cache entries stored on the very fast SCM are more likely to be evicted over cache entries stored on the SSD and the HDD, since re-reading cache entries back from the SCM incurs a much lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
  • the eviction score denotes an optimization function for a total storage space of the plurality of tier data storage devices that is minimized for minimizing an average read latency of the plurality of tier data storage devices.
  • the optimization function is for the total storage space. When the optimization function is minimized, the average read latency for the total storage space is minimized, which increases overall performance of the total storage space.
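The quantity such an optimization function targets can be illustrated with a minimal sketch. The function name, the uniform-access simplification, and the assumption that a cache hit is approximately free are all mine, not the disclosure's:

```python
def expected_read_latency(entries):
    """entries: list of (read_probability, tier_latency_ms, cached).
    A cached read is approximated as free; an uncached read pays the
    backing tier's re-read latency.  Eviction choices that minimize
    this sum minimize the average read latency over the total storage
    space."""
    return sum(p * lat for p, lat, cached in entries if not cached)

# With two equally hot entries, evicting the SCM-backed one (re-read
# latency 0.1 ms, an assumed value) adds far less expected latency
# than evicting the HDD-backed one (8.0 ms, also assumed):
evict_scm = expected_read_latency([(0.5, 0.1, False), (0.5, 8.0, True)])
evict_hdd = expected_read_latency([(0.5, 0.1, True), (0.5, 8.0, False)])
```

Minimizing this expectation is what pushes the eviction policy toward dropping entries backed by the faster tiers first.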
  • the eviction score for the cache entry is further computed based on combination of a prediction of likelihood of a read of the cache entry during a future time interval, and the latency, wherein a first cache entry with relatively high predicted likelihood during the future time interval is assigned the eviction score indicating lower likelihood of eviction in comparison to a second cache entry with relatively low predicted likelihood during the future time interval.
  • the eviction score is computed by multiplying a probability of a predicted likelihood of a future read of the cache entry by a latency parameter indicating latency incurred when re-reading the cache entry from a tier data storage device.
  • the latency parameter indicates an end to end latency incurred when re-reading the cache entry.
  • the eviction score is further based on network latency delay when re-reading the respective cache entry from the tier data storage device.
  • Considering network latency delay in addition to the delay from the read, provides a better indication of total latency, which is used to improve overall performance of the cache and/or hierarchical storage system.
  • the cache is implemented as a priority queue, wherein cache entries are queued according to ranking of eviction scores.
  • the priority queue based on eviction scores enables fast selection of the next cache entry to be evicted.
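The score-times-latency formula and the priority queue can be sketched together. The tier latency values are illustrative assumptions, and the convention that a lower score means "cheaper to lose, evict first" follows the eviction of lowest-score data described below; the disclosure does not fix concrete numbers:

```python
import heapq

# Assumed re-read latency penalties (ms) per backing tier; actual
# values are deployment-specific.
TIER_LATENCY_MS = {"SCM": 0.01, "SSD": 0.1, "HDD": 8.0}

def eviction_score(read_probability, tier):
    """Likelihood of a future read times the latency penalty of
    re-reading from the backing tier.  A lower score means the entry
    is cheaper to lose, so it is evicted first."""
    return read_probability * TIER_LATENCY_MS[tier]

# A min-heap serves as the priority queue: the entry with the lowest
# score pops first.
entries = [
    ("block_a", eviction_score(0.90, "HDD")),  # hot, expensive to re-read
    ("block_b", eviction_score(0.90, "SCM")),  # hot, but cheap to re-read
    ("block_c", eviction_score(0.05, "SSD")),  # cold
]
heap = [(score, name) for name, score in entries]
heapq.heapify(heap)

score, victim = heapq.heappop(heap)  # next entry selected for eviction
```

Note that the hot SCM-backed entry would be evicted before the hot HDD-backed one, which is exactly the tier-aware behavior the disclosure describes.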
  • each of the plurality of tier data storage devices has a constant multiplier.
  • the cache comprises a plurality of cache tiers, each cache tier corresponding to a tier data storage device, wherein a size of each cache tier is proportional to a delay incurred when re-reading a cache entry from the corresponding tier data storage device.
  • Dividing the cache into different cache tiers may simplify the process for computing eviction scores and/or selecting cache entries for eviction.
  • FIG. 1 is a flowchart of a method of evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments;
  • FIG. 2 is a block diagram of components of a system for evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments.
  • the present disclosure in some embodiments thereof, relates to cache management and, more specifically, but not exclusively, to systems, devices, and methods for cache eviction.
  • An aspect of some embodiments relates to systems, methods, a computing device and/or apparatus, and/or computer program product (storing code instructions executable by one or more processors) for management of a cache stored on a memory of a hierarchical storage system that also includes multiple tier data storage devices, at least a lower tier storage device which has slow random access times (but fast access for sequentially stored data) and a higher tier storage device which has fast random access times.
  • An eviction score is computed for each one (or individually, for some) of the cache entries stored on the cache.
  • the eviction score is based on a latency of re-reading the respective cache entry from the tier data storage device of the multiple tier data storage devices, for example, according to whether the respective cache entry is re-read from the lower tier data storage device or from the higher tier data storage device.
  • the cache entry may be re-read from the tier data storage device, for example, upon a cache-miss.
  • One or more cache entries are selected for eviction from the cache according to the eviction score. For example, cache entries are ranked by eviction scores, and cache entries with highest scores are evicted.
  • Considering on which tier data storage device a cache entry selected for eviction is stored increases the overall performance of the cache and/or of the hierarchical storage system.
  • Cache systems store volatile data, i.e., data that may be lost when the system crashes. It is noted that this is the main difference between a cache and a data storage tier. Data stored on a data storage tier is persistent and is provided with protection against failures and thus may utilize erasure codes. Since the cache is limited, caching mechanisms apply an eviction process which deletes data from the cache to allow insertion of new data. Typical caching systems use different approaches to decide which data to evict from the cache. For example, least recently used (LRU) based approaches evict the least recently used data from the cache.
  • cache systems apply machine learning based approaches which evict data based on prediction of when the data will be accessed next, which may be dependent on how recently the data was used, but also on other factors, such as multiple last accesses to the data and/or relation between the data and other data pieces.
  • At least some implementations described herein improve over standard approaches of selecting cache entries for eviction. Such standard approaches are dependent only on data visible to the cache, for example, LRU and machine learning based approaches.
  • at least some implementations described herein consider the tiering status of the hierarchical storage system when selecting cache entries for eviction.
  • the cache entry is selected for eviction based on the tier data storage device where the evicted entry is stored, such as whether the evicted entry is stored on HDD or SSD.
  • Although the cache hit ratio may be decreased in comparison to standard cache eviction approaches, the overall system performance of the cache and/or of the hierarchical storage system is increased in comparison to the overall performance using standard cache eviction approaches.
  • the data with the lowest eviction score may be removed from the cache, for example, every time a new piece of data arrives or data is fetched (i.e., read) while the cache is full above a threshold (e.g., 95% full). Some amount (e.g., 5%) of the cache entries is removed from the cache, so that cache occupancy falls below another threshold (e.g., 90% in the described example).
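The high/low watermark behavior can be sketched as follows. The watermark values follow the 95%/90% example above; the function and parameter names are mine:

```python
# Start evicting when the cache is over 95% full; stop once occupancy
# drops below 90% (values from the example in the text).
HIGH_WATERMARK = 0.95
LOW_WATERMARK = 0.90

def evict_to_low_watermark(cache, capacity, score_of):
    """cache: dict mapping entry id -> size in bytes.
    score_of: callable returning the eviction score of an entry.
    Entries with the lowest scores (cheapest to re-read and/or least
    likely to be read) are removed first, until the cache occupies
    less than LOW_WATERMARK of capacity."""
    used = sum(cache.values())
    if used <= HIGH_WATERMARK * capacity:
        return []                     # no pressure, nothing to evict
    evicted = []
    for entry in sorted(cache, key=score_of):  # lowest score first
        if used < LOW_WATERMARK * capacity:
            break
        used -= cache.pop(entry)
        evicted.append(entry)
    return evicted
```

Evicting a batch down to the low watermark, rather than one entry per insertion, amortizes the cost of the eviction pass.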
  • the present disclosure may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • FIG. 1 is a flowchart of a method of evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments.
  • FIG. 2 is a block diagram of components of a system 200 for evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments.
  • System 200 may implement the acts of the method described with reference to FIG. 1, by processor(s) 202 of a computing device 204 executing code instructions (e.g., code 206A) stored in a memory 206.
  • Computing device 204 manages a cache 206B of memory 206.
  • Memory 206 storing cache portion 206B may be implemented, for example, as dynamic random access memory (DRAM) and/or storage class memory (SCM). Memory 206 storing cache portion 206B may be selected to have low access times. Cost of memory 206 storing cache 206B may be high, limiting the amount of storage available for cache 206B.
  • Computing device 204 further manages a hierarchical storage 208 that includes multiple (i.e., two or more) tier data storage devices.
  • the multiple tier data storage devices are implemented as two or more different types of storage devices.
  • hierarchical storage 208 includes at least a lower tier data storage device 210 and a higher tier data storage device 212. Data stored in a specific tier data storage device is read and stored as a cache entry in cache 206B. Lower tier data storage device 210 may store a lower data storage tier. Higher tier data storage device 212 may store a higher data storage tier. It is noted that there may be three or more tier data storage devices, with increased performance (i.e., at least decreasing latency from reading data stored thereon) and increasing costs (which limits the amount of available storage space).
  • the highest tier may provide the best performance, while the lowest tier provides poor performance but has much lower price, enabling a large storage space.
  • the most active data may be stored on the higher tier.
  • the highest tier may be significantly larger than the cache for effectiveness, as the highest tier performs more poorly than the cache, but is cheaper than the cache, enabling the larger storage capacity.
  • the performance of the storage system may be measured by the number of IOs capable of being served per second and/or the average latency of each IO.
  • the higher storage tier may be implemented as SSD with latency of about 0.1 milliseconds (ms), while the lower tier is based on HDD with random access latency of 5-10 ms.
  • Lower tier data storage device 210 has relatively slower random access input/output (IO) (e.g., read) times in comparison to higher tier data storage device 212.
  • Higher tier data storage device 212 has relatively faster random I/O (e.g., read and/or write) times in comparison to lower tier data storage device 210.
  • Lower tier data storage device 210 may cost less (e.g., per megabyte) in comparison to higher tier data storage device 212.
  • Lower tier data storage device 210 may be implemented, for example, as a hard disk drive (HDD). Lower tier data storage device 210 may provide fast sequential reading and/or writing, but has poor performance for random I/O as the seek times may be very high (e.g., up to 10 milliseconds).
  • Higher tier data storage device 212 may be implemented, for example, as a solid state drive (SSD), and/or phase-change memory (PCM).
  • Cache portion 206B may serve as the cache for hierarchical storage 208, such as for cache entries with highest hit rates.
  • Hierarchical storage 208 may be in communication with a computing system 214, which stores data on hierarchical storage 208 and/or reads data stored on hierarchical storage 208.
  • Hierarchical storage 208 may be integrated within computing system 214, and/or may be implemented as an external storage device.
  • Computing system 214 may be indirectly connected to hierarchical storage 208 via computing device 204, i.e., computing system 214 may communicate with computing device 204, where computing device 204 communicates with hierarchical storage 208, rather than computing system 214 directly communicating with hierarchical storage 208.
  • Computing system 214 and/or computing device 204 may be implemented as, for example, one of more of a single device, a cluster of multiple devices, a computing cloud, a cloud network, a computer network, a virtual machine(s) (e.g., hypervisor, virtual server), a network node (e.g., switch, a virtual network, a router, a virtual router), a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a web server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
  • hierarchical storage 208 is used exclusively by a single user, such as computing system 214.
  • hierarchical storage 208 is used by multiple users such as multiple client terminals 216 accessing hierarchical storage 208 over a network 218, for example, computing system 214 provides cloud storage services and/or virtual storage services to client terminals 216.
  • Computing device 204 may be implemented as, for example, integrated within hierarchical storage 208 (e.g., as hardware and/or software installed within hierarchical storage 208), integrated within computing system 214 (e.g., as hardware and/or software installed within computing system 214, such as an accelerator chip and/or code stored on a memory of computing system 214 and executed by processor of computing system 214), and/or as an external component (e.g., implemented as hardware and/or software) in communication with hierarchical storage 208, such as a plug-in component.
  • hierarchical storage 208 and computing device 204 are implemented as one storage system that exposes storage (e.g., functions, features, capabilities) to computing system(s) 214.
  • Computing device 204 includes one or more processor(s) 202, implemented as for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators.
  • processor(s) 202 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogenous and/or heterogeneous processor architectures). It is noted that processor(s) 202 may be designed to implement in hardware one or more features stored as code instructions 206A.
  • Memory 206 stores code instructions implementable by processor(s) 202, for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
  • Memory 206 may store code 206A that, when executed by processor(s) 202, implements one or more acts of the method described with reference to FIG. 1.
  • Computing device 204 may include a data storage device 220 for storing data.
  • Data storage device 220 may be implemented as, for example, a memory, a local hard-drive, a removable storage unit, an optical disk, a storage device, and/or as a remote server and/or computing cloud (e.g., accessed using a network connection). It is noted that code instructions executable by processor(s) 202 may be stored in data storage device 220, for example, with executing portions loaded into memory 206 for execution by processor(s) 202.
  • Computing device 204 may be in communication with a user interface 222 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone.
  • a user interface 222 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone.
  • Network 218 may be implemented as, for example, the internet, a local area network, a virtual private network, a virtual public network, a wireless network, a cellular network, a local bus, a point to point link (e.g., wired), and/or combinations of the aforementioned.
  • an eviction score is computed for each one (or at least some) of the cache entries stored in the cache.
  • the eviction score is based on a latency of re-reading the respective cache entry from a tier data storage device of the multiple tier data storage devices, i.e., re-reading the data corresponding to the cache entry from the tier data storage device that persistently stores the data.
  • the eviction score may depend on the likelihood the entry will be read again, for example, based on a classical cache which uses LRU or other cache eviction mechanisms.
  • the eviction score denotes an optimization function for a total storage space of the multiple tier data storage devices.
  • the optimization function is minimized for minimizing an average read latency of the multiple tier data storage devices.
  • the optimization function may be based on the total storage space of the hierarchical storage system, included in the multiple tier data storage devices. When the optimization function is minimized, the average read latency for the total storage space is minimized, which increases overall performance of the total storage space.
  • the eviction score for the cache entry is further computed based on number of IOs that the respective tier data storage device is capable of processing per certain time interval.
  • the eviction score for the cache entry is computed based on combination of a prediction of likelihood of a read of the cache entry during a future time interval, and the latency.
  • a first cache entry with relatively high predicted likelihood during the future time interval is assigned the eviction score indicating lower likelihood of eviction in comparison to a second cache entry with relatively low predicted likelihood during the future time interval.
  • Cache systems may keep the data which is most likely to be read in the next time interval of data reads (e.g., the time interval may be an amount of data which is in the order of the size of the cache, i.e., the time interval may depend on the size of the cache).
  • the prediction may be computed by a prediction process that computes for each cache entry (e.g., for each logical block addressing (LBA) unit) a probability that a read will occur for the cache entry in the next n-number of reads (and/or other future time interval).
  • Considering the likelihood of future reads of the cache entry in the eviction score increases overall performance of the cache and/or hierarchical storage system, by optimizing between likelihood of future reads and penalty for reading from the tier data storage device.
  • the eviction score is computed by multiplying a probability of a predicted likelihood of a future read of the cache entry by a latency parameter indicating latency incurred when re-reading the cache entry from a tier data storage device.
  • the latency parameter may indicate an end to end latency incurred when re-reading the cache entry.
  • Other implementations of the latency parameters may be used. When the latency is very high, data may be unlikely to be evicted even when the probability of a future read is fairly low, which increases overall performance of the cache and/or hierarchical storage system.
  • the eviction score is further based on network latency delay when re-reading the respective cache entry from the tier data storage device. Since the network also adds a significant amount of latency when reading from storage, a typical read, even when data is in DRAM, may take about 10-30 microseconds. This means that the actual impact of reading data from SCM compared to DRAM is relatively low.
  • Reading data from NAND may triple or quadruple the total read time. Reading from HDD will cause a large delay of up to 10 milliseconds (ms). Considering network latency delay, in addition to the delay from the read, provides a better indication of total latency, which is used to improve overall performance of the cache and/or hierarchical storage system. For example, cache entries having corresponding data which is stored on SCM may not necessarily be cached, due to the low latency of reading from SCM. Data which is stored on SSD may be stored as cache entries, especially when the data is extremely hot, due to the relatively longer latency of reading from SSD.
  • the eviction score is computed based on a constant multiplier assigned to each one of the tier data storage devices.
  • increasing values of the multiplier may be associated with relatively longer access times. For example, the multiplier for HDD is 100, the multiplier for SSD is 10, and the multiplier for SCM is 1.
  • the constant multiplier enables computationally efficient determination of the eviction scores.
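The constant-multiplier variant can be sketched directly. The 100/10/1 weights come from the example above; the function name and probabilities are illustrative assumptions:

```python
# Fixed per-tier weights from the example in the text: slower tiers
# get larger multipliers, making their entries costlier to evict.
TIER_MULTIPLIER = {"HDD": 100, "SSD": 10, "SCM": 1}

def multiplier_score(read_probability, tier):
    """Eviction score as probability-of-read times the constant
    multiplier of the backing tier; a lower score means the entry is
    a better eviction candidate."""
    return read_probability * TIER_MULTIPLIER[tier]

# An SCM-backed entry with a 40% read probability scores lower (so is
# evicted earlier) than an HDD-backed entry read only 2% of the time:
scm_entry = multiplier_score(0.40, "SCM")
hdd_entry = multiplier_score(0.02, "HDD")
```

Because the multipliers are constants, the score is a single multiplication per entry, which is what makes this variant computationally cheap.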
  • the eviction score is computed based on one or more other parameters computed for each respective cache entry and/or for data stored on the tier data storage device(s) corresponding to the cache entry, for example, one or more of reads, sequential reads, size of reads, writes, sequential writes, and size of writes, and statistical data parameters for data chunks (e.g., for each data chunk).
  • the other parameters and/or the eviction score may be dynamically decayed.
  • the decay may be performed by multiplying a current parameter and/or a current eviction score by a decay value less than 1, every time interval to obtain an adapted parameter and/or updated eviction score.
  • Other decay approaches may be used, for example, linear, logarithmic, dynamic changing values, and the like.
  • the eviction score may be computed using the decayed parameter.
  • the decaying approach may prevent increasing the value of the parameter indefinitely, and/or maintains the value of the parameter at a reasonable state that enables processing in a reasonable time. For example, every 5 minutes the number of reads (an example of the parameter of the access pattern) is multiplied by 0.99, such that if there are currently 100 reads, after 5 minutes the number of reads is reduced to 99.
  • Cache granularity is usually finer than the granularity of the tiering.
  • a cache may operate and/or store data at the block level.
  • Blocks may be the smallest granularity that are operated by the storage system.
  • a user may read and/or write a single block and/or multiple blocks. Blocks may be of a size between about 0.5 and 32 kilobytes (KB), or other ranges.
  • the tiering operates and/or stores data at the chunk level.
  • a chunk may be a continuous address space of, for example, 4 megabytes (MB) or other values.
  • the other parameter(s) may be computed per individual data chunk (e.g., each data chunk), where an individual data chunk includes multiple sequentially stored data blocks.
  • the cache includes (e.g., is divided into) multiple cache tiers.
  • Each cache tier corresponds to a respective tier data storage device, for example, in a 1:1 correspondence.
  • for a hierarchical storage system with three data storage tiers, such as HDD, SSD, and SCM, three cache tiers are set.
  • a size of each cache tier may be proportional to a delay incurred when re-reading a cache entry from the corresponding tier data storage device. Dividing the cache into different cache tiers may simplify the process for computing eviction scores and/or selecting cache entries for eviction.
  • a large cache queue may be set for data stored in HDD, a smaller cache for data stored on SSD, and a much smaller cache or no cache for data stored on SCM.
  • 90% of the cache may be allocated to data stored on HDD, 9% for data stored on SSD, and 1% for data stored on SCM, or other allocations.
  • a single main cache may be implemented as described herein, with the eviction policy based on the eviction score computed based on added latency, as described herein.
  • the cache tiers described herein may be implemented as different cache layers.
  • the cache tiers (e.g., cache layers) are stored in the memory (e.g., DRAM, SCM).
  • the cache tiers are not stored on another data storage device such as hard disk and/or SSD.
  • the hard disk and/or SSD which are part of the hierarchical storage system, serve as tiered data storage devices that store data storage tiers, and do not store the cache tiers (e.g., cache layers).
  • the data storage tiers (e.g., stored by the hierarchical storage system, such as hard disk and/or SSD) are different than the cache tiers stored in the memory (e.g., DRAM, SCM). It is noted that the data storage tiers are persistent. Data is stored in only one data storage tier (e.g., the higher tier, or the lower tier). In contrast, the cache is volatile, and data stored in the cache is also stored on a data storage tier.
  • one or more cache entries are selected for eviction from the cache according to the eviction score. For example, the eviction score may be on a scale of 0-1, or 0-10, or 0-100 (or other scales).
  • Cache entries with relatively higher values are selected for eviction over cache entries with relatively lower values. It is noted that depending on how the eviction score is computed, the reverse case may be implemented, where cache entries with relatively lower values are selected for eviction over cache entries with relatively higher values.
  • the eviction score may indicate likelihood of being selected for eviction.
  • the number of cache entries selected for eviction, and/or the total size of data included in the cache entries selected for eviction, may be set and/or selected and/or controlled by another process.
  • a first (e.g., higher) tier data storage device that is higher than a second (e.g., lower) tier data storage device has a lower latency delay for reading data stored thereon.
  • a first cache entry stored on the first tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over a second cache entry stored on the second tier data storage device.
  • the second cache entry stored on the second tier data storage device is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry.
  • Selecting the first cache entry stored on the first tier data storage device increases overall performance of the cache and/or of the hierarchical storage system, by lowering the latency penalty for re-reading the first cache entry back into the cache upon a cache miss, in comparison to a higher latency penalty that would be incurred for re-reading the second cache entry back into the cache from the slower second tier data storage device. Since the second tier data storage device is the slowest tier in an example implementation of the hierarchical storage system, a cache read miss that requires fetching data from the second tier data storage device incurs a higher delay penalty. Cache entries which have corresponding data stored on the second tier data storage device are cached with higher priority in the cache, indicating lower likelihood of eviction.
  • the first tier data storage device is implemented as a SSD.
  • the second tier data storage device is implemented as a HDD.
  • the first cache entry stored on the SSD is assigned the eviction score indicating higher likelihood to be selected for eviction over the second cache entry stored on the HDD.
  • the second cache entry stored on the HDD is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry stored on the SSD.
  • Cache entries for which the data cached is stored on the faster SSD are more likely to be evicted over cache entries stored on the HDD, since re-reading cache entries back from the SSD incurs a lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
  • the hierarchical storage system further includes a SCM tier data storage device.
  • a third cache entry stored on the SCM is assigned the eviction score indicating higher likelihood to be selected for eviction over the first cache entry stored on the SSD and the second cache entry stored on the HDD.
  • the first cache entry and the second cache entry are assigned eviction scores indicating lower likelihood to be selected for eviction over the third cache entry.
  • Cache entries stored on the very fast SCM are more likely to be evicted over cache entries stored on the SSD and the HDD, since re-reading cache entries back from the SCM incurs a much lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system. For example, since DRAM is about 10 times faster than SCM, about 1000 times faster than NAND and about 100000 times faster than HDD, the penalty for reading data from HDD is very high.
  • the cache is implemented as a priority queue, where cache entries are queued according to ranking of eviction scores.
  • the priority queue based on eviction scores enables fast selection of the next cache entry to be evicted.
  • the cache entries selected for eviction are evicted from the cache.
  • features 102-106 are iterated, for example, each iteration is performed during a predefined time interval, and/or iterations are triggered by events, such as cache misses.
  • Eviction scores may be maintained over iterations when conditions affecting the value of the eviction score have not changed, and/or re-computed over iterations when conditions affecting the value of the eviction score have likely changed. Since the tiering system may also move data between tiers, the cache scores may also change after a tier up/down operation(s).
  • iterations are triggered in response to detected cache events for cache entries.
  • the cache may be monitored to detect cache access patterns. Examples of cache events and/or cache access patterns that are monitored for include a read miss for the cache entry and/or an eviction of the cache entry.
  • Other cache events and/or cache access patterns, and/or collected data parameters used to detect the cache event and/or cache access patterns include for example, reads, sequential reads, size of reads, writes, sequential writes, and size of writes, and statistical data parameters for data chunks (e.g., for each data chunk).
  • It is expected that during the life of a patent maturing from this application many relevant tier data storage devices will be developed; the term tier data storage device is intended to include all such new technologies a priori.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • the word “exemplary” is used herein to mean “serving as an example, instance or illustration”. Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • the phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
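The exponential decay of access-pattern parameters described in the list above (e.g., multiplying the read count by 0.99 every 5 minutes) can be sketched as follows. This is an illustrative sketch, not part of the claimed method; the function name and constants are assumptions based on the example given.

```python
# Illustrative sketch of exponentially decaying an access-pattern parameter
# (e.g., a per-chunk read counter): every time interval (e.g., 5 minutes),
# the parameter is multiplied by a decay value less than 1.
DECAY_FACTOR = 0.99  # assumed decay value, per the example in the text

def decay_parameter(value: float, intervals_elapsed: int) -> float:
    """Return the parameter after the given number of decay intervals."""
    return value * (DECAY_FACTOR ** intervals_elapsed)

# Example from the text: 100 reads decay to 99 after one 5-minute interval.
```

Because the decayed value shrinks geometrically, the parameter never grows without bound, which matches the stated goal of keeping it at a state that enables processing in a reasonable time.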

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

There is provided a computing device for management of a cache stored on a memory of a hierarchical storage system that also includes multiple tier data storage devices, at least a lower tier storage device which has slow random access times and a higher tier storage device which has fast random access times. An eviction score is computed for each of the cache entries stored on the cache. The eviction score is based on a latency of re-reading the respective cache entry from the tier data storage device of the multiple tier data storage devices. The cache entry may be re-read from the tier data storage device upon a cache-miss. One or more cache entries are selected for eviction from the cache according to the eviction score. For example, cache entries are ranked by eviction scores, and cache entries with highest scores are evicted.

Description

CACHE EVICTION BASED ON CURRENT TIERING STATUS
BACKGROUND
The present disclosure, in some embodiments thereof, relates to cache management and, more specifically, but not exclusively, to systems, devices, and methods for cache eviction.
Storage systems typically use media of multiple types, so that performance will be optimized while price remains low. A typical storage system may have dynamic random access memory (DRAM) for cache, which serves inputs and outputs (IOs) very fast, and may include other tiers of cache based on SCM (storage class memory) and fast solid state drive (SSD). The SSD tier allows relatively fast random access for reads and writes. The storage system may include a hard disk drive (HDD) tier, which allows relatively fast sequential reading and writing. The HDD has very poor performance for random IOs, as the seek times of an HDD are very high and can be up to 10 milliseconds (ms).
SUMMARY
It is an object of the present invention to provide a computing device, a system, a computer program product, and a method for management of a cache of a hierarchical storage system.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, a computing device for management of a cache of a hierarchical storage system, configured for: computing an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and selecting a cache entry for eviction from the cache according to the eviction score.

According to a second aspect, a computer implemented method of management of a cache of a hierarchical storage system, comprises: computing an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and selecting a cache entry for eviction from the cache according to the eviction score.
According to a third aspect, a non-transitory medium storing program instructions for management of a cache of a hierarchical storage system, which, when executed by a processor, cause the processor to: compute an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices, and select a cache entry for eviction from the cache according to the eviction score.
Considering on which tier data storage device a cache entry selected for eviction is stored increases overall performance of the cache and/or of the hierarchical storage system.
In a further implementation form of the first, second, and third aspects, for a first tier data storage device that is higher than a second tier data storage device, the first tier data storage device has at least one of a lower latency delay and a faster access time for reading data stored thereon than the second tier data storage device, wherein a first cache entry stored on the first tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over a second cache entry stored on the second tier data storage device, and the second cache entry stored on the second tier data storage device is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry.
Selecting the first cache entry stored on the first tier data storage device increases overall performance of the cache and/or of the hierarchical storage system, by lowering the latency penalty for re-reading the first cache entry back into the cache upon a cache miss, in comparison to a higher latency penalty that would be incurred for re-reading the second cache entry back into the cache from the slower second tier data storage device.
In a further implementation form of the first, second, and third aspects, the first tier data storage device comprises a solid state disk, SSD, the second tier data storage device comprises a hard disk drive, HDD, and the third tier data storage device comprises a storage class memory, SCM. Cache entries stored on the faster SSD are more likely to be evicted over cache entries stored on the HDD, since re-reading cache entries back from the SSD incurs a lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
In a further implementation form of the first, second, and third aspects, further comprising a third tier data storage device that is higher than the first tier data storage device, the third tier data storage device has at least one of a lower latency delay and a faster access time, for reading data stored thereon than the first tier data storage device, wherein a third cache entry stored on the third tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over the first cache entry stored on the first tier data storage device and the second cache entry stored on the second tier data storage device, and the first cache entry and the second cache entry are assigned eviction scores indicating lower likelihood to be selected for eviction over the third cache entry.
Cache entries stored on the very fast SCM are more likely to be evicted over cache entries stored on the SSD and the HDD, since re-reading cache entries back from the SCM incurs a much lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
In a further implementation form of the first, second, and third aspects, the eviction score denotes an optimization function for a total storage space of the plurality of tier data storage devices that is minimized for minimizing an average read latency of the plurality of tier data storage devices.
The optimization function is for the total storage space. When the optimization function is minimized, the average read latency for the total storage space is minimized, which increases overall performance of the total storage space.
In a further implementation form of the first, second, and third aspects, the eviction score for the cache entry is further computed based on combination of a prediction of likelihood of a read of the cache entry during a future time interval, and the latency, wherein a first cache entry with relatively high predicted likelihood during the future time interval is assigned the eviction score indicating lower likelihood of eviction in comparison to a second cache entry with relatively low predicted likelihood during the future time interval. Considering the likelihood of future reads of the cache entry in the eviction score increases overall performance of the cache and/or hierarchical storage system, by optimizing between likelihood of future reads and penalty for reading from the tier data storage device.
In a further implementation form of the first, second, and third aspects, the eviction score is computed by multiplying a probability of a predicted likelihood of a future read of the cache entry by a latency parameter indicating latency incurred when re-reading the cache entry from a tier data storage device.
When the latency is very high, data may be unlikely to be evicted even when the probability of a future read is fairly low, which increases overall performance of the cache and/or hierarchical storage system.
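A minimal sketch of this multiplication follows, reusing the example constant multipliers given elsewhere in the description (HDD=100, SSD=10, SCM=1). All names are illustrative; in this formulation a lower product marks a better eviction candidate, which is the reverse-scale variant the description also contemplates.

```python
# Sketch: combine the predicted probability of a future read with a
# per-tier latency multiplier. Entries with a lower product are better
# eviction candidates: they are both unlikely to be re-read and cheap
# to re-read from their tier data storage device.
TIER_MULTIPLIER = {"HDD": 100, "SSD": 10, "SCM": 1}  # example constants

def retention_score(read_probability: float, tier: str) -> float:
    """Higher score -> keep in cache; lower score -> evict first."""
    return read_probability * TIER_MULTIPLIER[tier]
```

For example, a cold HDD-backed entry (probability 0.05) scores 5.0, outranking a warm SCM-backed entry (probability 0.9) that scores 0.9, illustrating how high latency keeps data cached even when a future read is fairly unlikely.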
In a further implementation form of the first, second, and third aspects, the latency parameter indicates an end to end latency incurred when re-reading the cache entry.
In a further implementation form of the first, second, and third aspects, the eviction score is further based on network latency delay when re-reading the respective cache entry from the tier data storage device.
Considering network latency delay, in addition to the delay from the read, provides a better indication of total latency, which is used to improve overall performance of the cache and/or hierarchical storage system.
In a further implementation form of the first, second, and third aspects, the cache is implemented as a priority queue, wherein cache entries are queued according to ranking of eviction scores.
The priority queue based on eviction scores enables fast selection of the next cache entry to be evicted.
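One way to realize such a priority queue is with a binary heap keyed by eviction score. The following Python sketch (class and entry names are illustrative assumptions) negates scores so that the entry with the highest eviction score is popped first:

```python
import heapq

class EvictionQueue:
    """Priority queue of cache entries, ordered by eviction score."""

    def __init__(self):
        self._heap = []  # min-heap of (-score, key) pairs

    def push(self, key, eviction_score: float) -> None:
        # Negate the score so the highest eviction score surfaces first.
        heapq.heappush(self._heap, (-eviction_score, key))

    def pop_most_evictable(self):
        neg_score, key = heapq.heappop(self._heap)
        return key, -neg_score

q = EvictionQueue()
q.push("scm_entry", 0.9)  # cheap to re-read -> high eviction score
q.push("hdd_entry", 0.1)  # expensive to re-read -> low eviction score
```

Popping from the heap returns the SCM-backed entry before the HDD-backed one, so selecting the next entry to evict is O(log n) rather than a full scan.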
In a further implementation form of the first, second, and third aspects, each of the plurality of tier data storage devices has a constant multiplier.
The constant multiplier enables computationally efficient determination of the eviction scores.

In a further implementation form of the first, second, and third aspects, the cache comprises a plurality of cache tiers, each cache tier corresponding to a tier data storage device, wherein a size of each cache tier is proportional to a delay incurred when re-reading a cache entry from the corresponding tier data storage device.
Dividing the cache into different cache tiers may simplify the process for computing eviction scores and/or selecting cache entries for eviction.
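The proportional sizing can be sketched as follows, using hypothetical per-tier re-read delays (the microsecond values are illustrative assumptions chosen to reproduce the 90%/9%/1% allocation example given elsewhere in the description):

```python
def allocate_cache_tiers(total_bytes: int, delays_us: dict) -> dict:
    """Split the cache so each tier's share is proportional to the delay
    incurred when re-reading an entry from its tier data storage device."""
    total_delay = sum(delays_us.values())
    return {tier: int(total_bytes * d / total_delay)
            for tier, d in delays_us.items()}

# Hypothetical re-read delays in microseconds (illustrative values only).
delays = {"HDD": 9000, "SSD": 900, "SCM": 100}
allocation = allocate_cache_tiers(10_000_000, delays)
# 90% of the cache goes to HDD-backed entries, 9% to SSD, 1% to SCM
```

The slow HDD tier thus receives the large cache queue, while the fast SCM tier receives little or no cache, matching the sizing described above.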
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the disclosure, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.
In the drawings:
FIG. 1 is a flowchart of a method of evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments; and
FIG. 2 is a block diagram of components of a system 200 for evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments.
DETAILED DESCRIPTION
The present disclosure, in some embodiments thereof, relates to cache management and, more specifically, but not exclusively, to systems, devices, and methods for cache eviction.
An aspect of some embodiments relates to systems, methods, a computing device and/or apparatus, and/or computer program product (storing code instructions executable by one or more processors) for management of a cache stored on a memory of a hierarchical storage system that also includes multiple tier data storage devices, at least a lower tier storage device which has slow random access times (but fast access for sequentially stored data) and a higher tier storage device which has fast random access times. An eviction score is computed for each one (or individually, for some) of the cache entries stored on the cache. The eviction score is based on a latency of re-reading the respective cache entry from the tier data storage device of the multiple tier data storage devices, for example, according to whether the respective cache entry is re-read from the lower tier data storage device or from the higher tier data storage device. The cache entry may be re-read from the tier data storage device, for example, upon a cache-miss. One or more cache entries are selected for eviction from the cache according to the eviction score. For example, cache entries are ranked by eviction scores, and cache entries with highest scores are evicted.
Considering on which tier data storage device a cache entry selected for eviction is stored increases overall performance of the cache and/or of the hierarchical storage system.
Cache systems (e.g., read cache) store volatile data, i.e., data that may be lost when the system crashes. It is noted that this is the main difference between a cache and a data storage tier. Data stored on a data storage tier is persistent and is provided with protection against failures and thus may utilize erasure codes. Since the cache is limited, caching mechanisms apply an eviction process which deletes data from the cache to allow insertion of new data. Typical caching systems use different approaches to decide which data to evict from the cache. For example, least recently used (LRU) based approaches evict the least recently used data from the cache. In another example, cache systems apply machine learning based approaches which evict data based on prediction of when the data will be accessed next, which may be dependent on how recently the data was used, but also on other factors, such as multiple last accesses to the data and/or relation between the data and other data pieces. At least some implementations described herein improve over standard approaches of selecting cache entries for eviction. Such standard approaches are dependent only on data visible to the cache, for example, LRU and machine learning based approaches. In contrast, at least some implementations described herein consider the tiering status of the hierarchical storage system when selecting cache entries for eviction. For example, the cache entry is selected for eviction based on the tier data storage device where the evicted entry is stored, such as whether the evicted entry is stored on HDD or SSD. Even in situations when the cache hit ratio is decreased in comparison to standard cache eviction approaches, the overall system performance of the cache and/or of the hierarchical storage system is increased in comparison to the overall performance using standard cache eviction approaches.
When there is a need to evict data, the data with the lowest eviction score (or highest eviction score, depending on how the eviction score is determined) may be removed from the cache. For example, every time a new piece of data arrives or data is fetched (i.e., read), when the cache is full above a threshold (e.g., 95% full), some amount (e.g., 5%) of the cache entries are removed from the cache, so that cache occupancy drops below another threshold (e.g., 90% in the described example).
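This watermark-style eviction can be sketched as follows; the 95%/90% thresholds follow the example above, entries with the highest eviction score are removed first (per the "highest score is evicted" formulation), and the function and variable names are illustrative assumptions.

```python
def evict_if_needed(cache: dict, eviction_scores: dict, capacity: int,
                    high_wm: float = 0.95, low_wm: float = 0.90) -> None:
    """When occupancy exceeds high_wm * capacity, evict the entries with
    the highest eviction scores until occupancy drops to low_wm * capacity."""
    if len(cache) < capacity * high_wm:
        return
    # Walk entries from most evictable to least evictable.
    for key in sorted(cache, key=lambda k: eviction_scores[k], reverse=True):
        if len(cache) <= capacity * low_wm:
            break
        del cache[key]

# A full cache of 100 entries, where entry i has eviction score i.
cache = {f"entry{i}": b"data" for i in range(100)}
scores = {f"entry{i}": float(i) for i in range(100)}
evict_if_needed(cache, scores, capacity=100)
# the 10 highest-scoring entries are evicted, leaving 90 entries
```

A production cache would track occupancy in bytes rather than entry count and use the priority-queue structure described below rather than a full sort, but the watermark logic is the same.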
Before explaining at least one embodiment of the disclosure in detail, it is to be understood that the disclosure is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The disclosure is capable of other embodiments or of being practiced or carried out in various ways.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, which is a flowchart of a method of evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments. Reference is also made to FIG. 2, which is a block diagram of components of a system 200 for evicting a cache entry according to an eviction score computed based on latency of re-reading the cache entry from a tier data storage device of multiple tier data storage devices, in accordance with some embodiments. System 200 may implement the acts of the method described with reference to FIG. 1, by processor(s) 202 of a computing device 204 executing code instructions (e.g., code 206A) stored in a memory 206.
Computing device 204 manages a cache 206B of memory 206.
Memory 206 storing cache portion 206B may be implemented, for example, as dynamic random access memory (DRAM) and/or storage class memory (SCM). Memory 206 storing cache portion 206B may be selected to have low access times. Cost of memory 206 storing cache 206B may be high, limiting the amount of storage available for cache 206B.
Computing device 204 further manages a hierarchical storage 208 that includes multiple (i.e., two or more) tier data storage devices. The multiple tier data storage devices are implemented as two or more different types of storage devices.
In an exemplary implementation, hierarchical storage 208 includes at least a lower tier data storage device 210 and a higher tier data storage device 212. Data stored in a specific tier data storage device is read and stored as a cache entry in cache 206B. Lower tier data storage device 210 may store a lower data storage tier. Higher tier data storage device 212 may store a higher data storage tier. It is noted that there may be three or more tier data storage devices, with increasing performance (i.e., at least decreasing latency for reading data stored thereon) and increasing cost (which limits the amount of available storage space).
The highest tier may provide the best performance, while the lowest tier provides poorer performance but at a much lower price, enabling a large storage space. The most active data may be stored on the higher tier. The highest tier may be significantly larger than the cache for effectiveness, as the highest tier performs more poorly than the cache but is cheaper, enabling the larger storage capacity. The performance of the storage system may be measured by the number of IOs capable of being served per second and/or the average latency of each IO. In some implementations, the higher storage tier is implemented as SSD with latency of about 0.1 milliseconds (ms), while the lower tier is based on HDD with random access latency of 5-10 ms.
Lower tier data storage device 210 has relatively slower random access input/output (IO) (e.g., read) times in comparison to higher tier data storage device 212. Higher tier data storage device 212 has relatively faster random I/O (e.g., read and/or write) times in comparison to lower tier data storage device 210.
Lower tier data storage device 210 may cost less (e.g., per megabyte) in comparison to higher tier data storage device 212.
Lower tier data storage device 210 may be implemented, for example, as a hard disk drive (HDD). Lower tier data storage device 210 may provide fast sequential reading and/or writing, but has poor performance for random I/O as the seek times may be very high (e.g., up to 10 milliseconds).
Higher tier data storage device 212 may be implemented, for example, as a solid state drive (SSD), and/or phase-change memory (PCM).
Cache portion 206B may serve as the cache for hierarchical storage 208, such as for cache entries with highest hit rates.
Hierarchical storage 208 may be in communication with a computing system 214, which stores data on hierarchical storage 208 and/or reads data stored on hierarchical storage 208. Hierarchical storage 208 may be integrated within computing system 214, and/or may be implemented as an external storage device. Computing system 214 may be indirectly connected to hierarchical storage 208 via computing device 204, i.e., computing system 214 may communicate with computing device 204, where computing device 204 communicates with hierarchical storage 208, rather than computing system 214 directly communicating with hierarchical storage 208.
Computing system 214 and/or computing device 204 may be implemented as, for example, one or more of a single device, a cluster of multiple devices, a computing cloud, a cloud network, a computer network, a virtual machine(s) (e.g., hypervisor, virtual server), a network node (e.g., switch, a virtual network, a router, a virtual router), a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a web server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
Optionally, hierarchical storage 208 is used exclusively by a single user such as computing system 214. Alternatively, hierarchical storage 208 is used by multiple users such as multiple client terminals 216 accessing hierarchical storage 208 over a network 218, for example, computing system 214 provides cloud storage services and/or virtual storage services to client terminals 216.
Computing device 204 may be implemented as, for example, integrated within hierarchical storage 208 (e.g., as hardware and/or software installed within hierarchical storage 208), integrated within computing system 214 (e.g., as hardware and/or software installed within computing system 214, such as an accelerator chip and/or code stored on a memory of computing system 214 and executed by processor of computing system 214), and/or as an external component (e.g., implemented as hardware and/or software) in communication with hierarchical storage 208, such as a plug-in component. Optionally, hierarchical storage 208 and computing device 204 are implemented as one storage system that exposes storage (e.g., functions, features, capabilities) to computing system(s) 214.
Computing device 204 includes one or more processor(s) 202, implemented as, for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators. Processor(s) 202 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogenous and/or heterogeneous processor architectures). It is noted that processor(s) 202 may be designed to implement in hardware one or more features stored as code instructions 206A.
Memory 206 stores code instructions implementable by processor(s) 202, for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM). Memory 206 may store code 206A that when executed by processor(s) 202, implement one or more acts of the method described with reference to FIG. 1.
Computing device 204 may include a data storage device 220 for storing data. Data storage device 220 may be implemented as, for example, a memory, a local hard-drive, a removable storage unit, an optical disk, a storage device, and/or as a remote server and/or computing cloud (e.g., accessed using a network connection). It is noted that code instructions executable by processor(s) 202 may be stored in data storage device 220, for example, with executing portions loaded into memory 206 for execution by processor(s) 202.
Computing device 204 (and/or computing system 214) may be in communication with a user interface 222 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone.
Network 218 may be implemented as, for example, the internet, a local area network, a virtual private network, a virtual public network, a wireless network, a cellular network, a local bus, a point to point link (e.g., wired), and/or combinations of the aforementioned.
Referring now back to FIG. 1, at 102, an eviction score is computed for each one (or at least some) of the cache entries stored in the cache. The eviction score is based on a latency of re-reading the respective cache entry from a tier data storage device of the multiple tier data storage devices, i.e., re-reading the data corresponding to the cache entry from the tier data storage device that persistently stores the data.
Alternatively or additionally, the eviction score may depend on the likelihood the entry will be read again, for example, based on a classical cache which uses LRU or other cache eviction mechanisms.
Optionally, the eviction score denotes an optimization function for a total storage space of the multiple tier data storage devices. The optimization function is minimized for minimizing an average read latency of the multiple tier data storage devices. The optimization function may be based on the total storage space of the hierarchical storage system, included in the multiple tier data storage devices. When the optimization function is minimized, the average read latency for the total storage space is minimized, which increases overall performance of the total storage space.
Alternatively or additionally, the eviction score for the cache entry is further computed based on a number of IOs that the respective tier data storage device is capable of processing within a certain time interval.
Alternatively or additionally, the eviction score for the cache entry is computed based on a combination of a prediction of likelihood of a read of the cache entry during a future time interval, and the latency. In such an implementation, a first cache entry with relatively high predicted likelihood during the future time interval is assigned the eviction score indicating lower likelihood of eviction in comparison to a second cache entry with relatively low predicted likelihood during the future time interval. Cache systems may keep the data which is most likely to be read in the next time interval of data reads (e.g., the time interval may be an amount of data which is in the order of the size of the cache, i.e., the time interval may depend on the size of the cache). When there is high probability that a certain data will be read before new data is read which is the size of the cache, the certain data is maintained in the cache. The prediction may be computed by a prediction process that computes for each cache entry (e.g., for each logical block addressing (LBA)) a probability that a read will occur for the cache entry in the next n-number of reads (and/or other future time interval). Considering the likelihood of future reads of the cache entry in the eviction score increases overall performance of the cache and/or hierarchical storage system, by optimizing between likelihood of future reads and penalty for reading from the tier data storage device.
Alternatively or additionally, the eviction score is computed by multiplying a probability of a predicted likelihood of a future read of the cache entry by a latency parameter indicating latency incurred when re-reading the cache entry from a tier data storage device. The latency parameter may indicate an end to end latency incurred when re-reading the cache entry. Other implementations of the latency parameter may be used. When the latency is very high, data may be unlikely to be evicted even when the probability of a future read is fairly low, which increases overall performance of the cache and/or hierarchical storage system. For example, since latency for reading data from the HDD is very high, cache entries which do not have corresponding data stored on the HDD are much more likely to be evicted from the cache than the cache entry having corresponding data stored on the HDD, even when the probability of future reads of the cache entry having corresponding data stored on the HDD is fairly low. Alternatively or additionally, the eviction score is further based on network latency delay when re-reading the respective cache entry from the tier data storage device. Since the network also adds a significant amount of latency when reading from storage, a typical read even when data is in DRAM may take about 10-30 microseconds. This means that the actual impact of reading data from SCM compared to DRAM is relatively low. Reading data from NAND may triple or quadruple the total read time. Reading from HDD will cause a large delay of up to 10 milliseconds (ms). Considering network latency delay, in addition to the delay from the read, provides a better indication of total latency, which is used to improve overall performance of the cache and/or hierarchical storage system. For example, cache entries having corresponding data which is stored on SCM may not necessarily be cached, due to the low latency of reading from SCM.
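For illustration only, the multiplication of a predicted read probability by an end to end latency parameter may be sketched as follows. The tier names, device latency figures, network delay, and the convention that a lower score indicates a better eviction candidate are assumed values for this sketch, not taken from the specification:

```python
# Illustrative sketch (assumed figures): eviction score = predicted read
# probability x end-to-end re-read latency for the backing tier.
# Lower score => cheaper to lose => better eviction candidate.

NETWORK_LATENCY_US = 20.0  # assumed network round-trip delay, microseconds

# Assumed per-tier device read latencies, microseconds.
TIER_READ_LATENCY_US = {
    "SCM": 10.0,
    "SSD": 100.0,
    "HDD": 7000.0,
}

def eviction_score(read_probability: float, tier: str) -> float:
    """Combine read likelihood with the end-to-end penalty of a re-read."""
    end_to_end = NETWORK_LATENCY_US + TIER_READ_LATENCY_US[tier]
    return read_probability * end_to_end

# A hot SCM-backed entry can still score lower (more evictable) than a
# fairly cold HDD-backed entry, because the HDD re-read penalty dominates.
hot_scm = eviction_score(0.9, "SCM")    # approximately 27
cold_hdd = eviction_score(0.05, "HDD")  # approximately 351
assert hot_scm < cold_hdd
```

Under these assumed figures, the HDD-backed entry is retained even though its predicted read probability is much lower, mirroring the behavior described above.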
Data which is stored on SSD may be stored as cache entries, especially when the data is extremely hot, due to the relatively longer latency of reading from SSD.
Alternatively or additionally, the eviction score is computed based on a constant multiplier assigned to each one of the tier data storage devices. In some embodiments, increasing values of the multiplier may be associated with relatively longer access times. For example, the multiplier for HDD is 100, the multiplier for SSD is 10, and the multiplier for SCM is 1. The constant multiplier enables computationally efficient determination of the eviction scores.
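The constant-multiplier variant may be sketched as follows; the multiplier values are the example figures from the text (HDD 100, SSD 10, SCM 1), while the function name and the convention that a higher score indicates an entry to keep longer are assumptions for illustration:

```python
# Sketch of the constant-multiplier variant (assumed scoring convention:
# higher score => keep longer, lower score => evict first).

TIER_MULTIPLIER = {"HDD": 100, "SSD": 10, "SCM": 1}  # example values from the text

def multiplier_score(read_probability: float, tier: str) -> float:
    """Scale the predicted read probability by the backing tier's multiplier."""
    return read_probability * TIER_MULTIPLIER[tier]

# A cold HDD-backed entry still outranks a hot SCM-backed entry, so the
# SCM-backed entry is selected for eviction first.
assert multiplier_score(0.9, "SCM") < multiplier_score(0.05, "HDD")
```

The table lookup and single multiplication keep the per-entry cost constant, which is the computational efficiency noted above.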
Alternatively or additionally, the eviction score is computed based on one or more other parameters computed for each respective cache entry and/or for data stored on the tier data storage device(s) corresponding to the cache entry, for example, one or more of reads, sequential reads, size of reads, writes, sequential writes, and size of writes, and statistical data parameters for data chunks (e.g., for each data chunk).
The other parameters and/or the eviction score may be dynamically decayed. The decay may be performed by multiplying a current parameter and/or a current eviction score by a decay value less than 1, every time interval, to obtain an adapted parameter and/or updated eviction score. Other decay approaches may be used, for example, linear, logarithmic, dynamically changing values, and the like. The eviction score may be computed using the decayed parameter. The decaying approach may prevent increasing the value of the parameter indefinitely, and/or may maintain the value of the parameter at a reasonable state that enables processing in a reasonable time. For example, every 5 minutes the number of reads (an example of the parameter of the access pattern) is multiplied by 0.99, such that if there are currently 100 reads, after 5 minutes the number of reads is reduced to 99.
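The periodic decay step may be sketched as follows; the 0.99 multiplier matches the example in the text, while the counter names and data structure are assumed for illustration:

```python
# Minimal sketch of the periodic multiplicative decay described above.
# The 0.99 factor is the example value from the text; names are assumed.

DECAY_FACTOR = 0.99

def decay_counters(counters: dict) -> dict:
    """Apply one decay step to all access-pattern counters
    (e.g., read counts per chunk); called once per time interval."""
    return {key: value * DECAY_FACTOR for key, value in counters.items()}

reads = {"chunk_0": 100.0, "chunk_1": 40.0}
reads = decay_counters(reads)
assert abs(reads["chunk_0"] - 99.0) < 1e-6  # 100 reads decay to about 99
```

Because every counter shrinks geometrically, old activity fades while the relative ordering of recently active chunks is preserved.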
Cache granularity is usually finer than the granularity of the tiering. For example, a cache may operate and/or store data at the block level. Blocks may be the smallest unit of granularity operated on by the storage system. A user may read and/or write a single block and/or multiple blocks. Blocks may be of a size between about 0.5-32 kilobytes (KB), or other ranges. In contrast, the tiering operates and/or stores data at the chunk level. A chunk may be a continuous address space of, for example, 4 megabytes (MB) or other values. The other parameter(s) may be computed per individual data chunk (e.g., each data chunk), where an individual data chunk includes multiple sequentially stored data blocks.
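The granularity mismatch may be sketched as a simple address mapping; the 4 KB block size is an assumed figure within the 0.5-32 KB range stated above, and the 4 MB chunk size is the example value from the text:

```python
# Hypothetical sketch: the cache operates on blocks, while tiering
# statistics are kept per chunk. Block size is an assumed example figure.

BLOCK_SIZE = 4 * 1024          # 4 KB blocks (cache granularity, assumed)
CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB chunks (tiering granularity, per text)
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE  # 1024 blocks per chunk

def chunk_of_block(block_address: int) -> int:
    """Map a block address to the chunk whose tiering statistics it updates."""
    return block_address // BLOCKS_PER_CHUNK

# Sequential block accesses within the same 4 MB region all charge one chunk.
assert chunk_of_block(0) == 0
assert chunk_of_block(1023) == 0
assert chunk_of_block(1024) == 1
```

Per-chunk parameters (reads, sequential reads, and so on) can then be accumulated against the chunk index returned by this mapping.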
Optionally, the cache includes (e.g., is divided into) multiple cache tiers. Each cache tier corresponds to a respective tier data storage device, for example, in a 1:1 correspondence. For example, in a hierarchical storage system with 3 data storage tiers, such as HDD, SSD, and SCM, three cache tiers are set. A size of each cache tier may be proportional to a delay incurred when re-reading a cache entry from the corresponding tier data storage device. Dividing the cache into different cache tiers may simplify the process for computing eviction scores and/or selecting cache entries for eviction. For example, a large cache queue may be set for data stored in HDD, a smaller cache for data stored on SSD, and a much smaller cache or no cache for data stored on SCM. For example, 90% of the cache may be allocated to data stored on HDD, 9% for data stored on SSD, and 1% for data stored on SCM, or other allocations. Alternatively, a single main cache may be implemented as described herein, with the eviction policy based on the eviction score computed based on added latency, as described herein.
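The per-tier split may be sketched as follows; the 90/9/1 allocation mirrors the example above, while the total cache size is an assumed figure:

```python
# Sketch of dividing the cache among per-tier cache tiers. The 90/9/1
# fractions follow the example in the text; the total size is assumed.

TOTAL_CACHE_BYTES = 64 * 1024 * 1024 * 1024  # assumed 64 GiB of cache memory

# Fraction of the cache reserved for entries backed by each storage tier,
# roughly proportional to the re-read delay of that tier.
TIER_FRACTIONS = {"HDD": 0.90, "SSD": 0.09, "SCM": 0.01}

def cache_tier_sizes(total_bytes: int) -> dict:
    """Compute the byte budget of each cache tier from its fraction."""
    return {tier: int(total_bytes * frac) for tier, frac in TIER_FRACTIONS.items()}

sizes = cache_tier_sizes(TOTAL_CACHE_BYTES)
assert sizes["HDD"] > sizes["SSD"] > sizes["SCM"]
```

Each cache tier can then run its own eviction queue independently, which is the simplification noted above.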
For clarity, it is noted that the cache tiers described herein may be implemented as different cache layers. The cache tiers (e.g., cache layers) are stored in the memory (e.g., DRAM, SCM). The cache tiers (e.g., cache layers) are not stored on another data storage device such as hard disk and/or SSD. The hard disk and/or SSD, which are part of the hierarchical storage system, serve as tiered data storage devices that store data storage tiers, and do not store the cache tiers (e.g., cache layers). As such, it is clarified that the data storage tiers (e.g., stored by the hierarchical storage system, such as hard disk and/or SSD) are different than the cache tiers stored in the memory (e.g., DRAM, SCM). It is noted that the data storage tiers are persistent. Data is stored in only one data storage tier (e.g., the higher tier, or the lower tier). In contrast, the cache is volatile, and data stored in the cache is also stored on a data storage tier.

At 104, one or more cache entries are selected for eviction from the cache according to the eviction score. For example, the eviction score may be on a scale of 0-1, or 0-10, or 0-100 (or other scales). Cache entries with relatively higher values are selected for eviction over cache entries with relatively lower values. It is noted that depending on how the eviction score is computed, the reverse case may be implemented, where cache entries with relatively lower values are selected for eviction over cache entries with relatively higher values.
The eviction score may indicate likelihood of being selected for eviction.
The number of cache entries selected for eviction, and/or the total size of data included in the cache entries selected for eviction, may be set and/or selected and/or controlled by another process.
In an exemplary, not necessarily limiting, implementation, a first (e.g., higher) tier data storage device that is higher than a second (e.g., lower) tier data storage device has a lower latency delay for reading data stored thereon. A first cache entry stored on the first tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over a second cache entry stored on the second tier data storage device. The second cache entry stored on the second tier data storage device is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry. Selecting the first cache entry stored on the first tier data storage device increases overall performance of the cache and/or of the hierarchical storage system, by lowering the latency penalty for re-reading the first cache entry back into the cache upon a cache miss, in comparison to a higher latency penalty that would be incurred for re-reading the second cache entry back into the cache from the slower second tier data storage device. Since the second tier data storage device is the slowest tier in an example implementation of the hierarchical storage system, a cache read miss that requires fetching data from the second tier data storage device incurs a higher delay penalty. Cache entries which have corresponding data stored on the second tier data storage device are cached with higher priority in the cache, indicating lower likelihood of eviction.
In an exemplary, not necessarily limiting, implementation, the first tier data storage device is implemented as a SSD. The second tier data storage device is implemented as a HDD. The first cache entry stored on the SSD is assigned the eviction score indicating higher likelihood to be selected for eviction over the second cache entry stored on the HDD. The second cache entry stored on the HDD is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry stored on the SSD. Cache entries for which the data cached is stored on the faster SSD are more likely to be evicted over cache entries stored on the HDD, since re-reading cache entries back from the SSD incurs a lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system.
Optionally, in the exemplary, not necessarily limiting, implementation, the hierarchical storage system further includes a SCM tier data storage device. A third cache entry stored on the SCM is assigned the eviction score indicating higher likelihood to be selected for eviction over the first cache entry stored on the SSD and the second cache entry stored on the HDD. The first cache entry and the second cache entry are assigned eviction scores indicating lower likelihood to be selected for eviction over the third cache entry. Cache entries stored on the very fast SCM are more likely to be evicted over cache entries stored on the SSD and the HDD, since re-reading cache entries back from the SCM incurs a much lower delay penalty, thereby increasing overall performance of the cache and/or of the hierarchical storage system. For example, since DRAM is about 10 times faster than SCM, about 1000 times faster than NAND and about 100000 times faster than HDD, the penalty for reading data from HDD is very high.
Optionally, the cache is implemented as a priority queue, where cache entries are queued according to ranking of eviction scores. The priority queue based on eviction scores enables fast selection of the next cache entry to be evicted.
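The priority queue may be sketched with a binary heap; the class name and the convention that the lowest-scored entry is popped for eviction first are assumptions for this sketch (as noted above, the sign convention may be reversed depending on how the score is computed):

```python
import heapq

# Illustrative min-heap priority queue for eviction. Assumed convention:
# a lower score means the entry is cheaper to lose, so it is evicted first.

class EvictionQueue:
    def __init__(self):
        self._heap = []  # (eviction_score, entry_id) pairs

    def add(self, entry_id: str, score: float) -> None:
        heapq.heappush(self._heap, (score, entry_id))

    def pop_victim(self) -> str:
        """Remove and return the next cache entry to evict."""
        return heapq.heappop(self._heap)[1]

q = EvictionQueue()
q.add("entry_on_hdd", 351.0)  # expensive to re-read: keep longer
q.add("entry_on_scm", 27.0)   # cheap to re-read: evict first
assert q.pop_victim() == "entry_on_scm"
```

A heap keeps selection of the next victim at logarithmic cost per operation, which is the fast selection noted above.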
At 106, the cache entries selected for eviction are evicted from the cache.
At 108, features 102-106 are iterated, for example, each iteration is performed during a predefined time interval, and/or iterations are triggered by events, such as cache misses. Eviction scores may be maintained over iterations when conditions affecting the value of the eviction score have not changed, and/or re-computed over iterations when conditions affecting the value of the eviction score have likely changed. Since the tiering system may also move data between tiers, the cache scores may also change after tier up/down operation(s).
Optionally, iterations are triggered in response to detected cache events for cache entries. The cache may be monitored to detect cache access patterns. Examples of cache events and/or cache access patterns that are monitored for include a read miss for the cache entry and/or an eviction of the cache entry. Other cache events and/or cache access patterns, and/or collected data parameters used to detect the cache event and/or cache access patterns, include for example, reads, sequential reads, size of reads, writes, sequential writes, and size of writes, and statistical data parameters for data chunks (e.g., for each data chunk).
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant tier data storage devices will be developed and the scope of the term tier data storage device is intended to include all such new technologies a priori.
As used herein the term “about” refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof. The word “exemplary” is used herein to mean “serving as an example, instance or illustration”. Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word “optionally” is used herein to mean “is provided in some embodiments and not provided in other embodiments”. Any particular embodiment of the disclosure may include a plurality of “optional” features unless such features conflict.
Throughout this application, various embodiments of this disclosure may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
It is the intent of the applicant(s) that all publications, patents and patent applications referred to in this specification are to be incorporated in their entirety by reference into the specification, as if each individual publication, patent or patent application was specifically and individually noted when referenced that it is to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.

Claims

WHAT IS CLAIMED IS:
1. A computing device (204) for management of a cache (206B) of a hierarchical storage system (208), configured for: computing an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices; and selecting a cache entry for eviction from the cache according to the eviction score.
2. The computing device of claim 1, wherein for a first tier data storage device (212) that is higher than a second tier data storage device (210), the first tier data storage device has at least one of a lower latency delay and a faster access time, for reading data stored thereon than the second tier data storage device, wherein a first cache entry stored on the first tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over a second cache entry stored on the second tier data storage device, and the second cache entry stored on the second tier data storage device is assigned the eviction score indicating lower likelihood to be selected for eviction over the first cache entry.
3. The computing device of claim 2, further comprising a third tier data storage device that is higher than the first tier data storage device, the third tier data storage device has at least one of a lower latency delay and a faster access time, for reading data stored thereon than the first tier data storage device, wherein a third cache entry stored on the third tier data storage device is assigned the eviction score indicating higher likelihood to be selected for eviction over the first cache entry stored on the first tier data storage device and the second cache entry stored on the second tier data storage device, and the first cache entry and the second cache entry are assigned eviction scores indicating lower likelihood to be selected for eviction over the third cache entry.
4. The computing device of claims 2-3, wherein the first tier data storage device comprises a solid state disk, SSD, the second tier data storage device comprises a hard disk drive, HDD, and the third tier data storage device comprises a storage class memory, SCM.
5. The computing device of any of the previous claims, wherein the eviction score denotes an optimization function for a total storage space of the plurality of tier data storage devices that is minimized for minimizing an average read latency of the plurality of tier data storage devices.
6. The computing device of any of the previous claims, wherein the eviction score for the cache entry is further computed based on combination of a prediction of likelihood of a read of the cache entry during a future time interval, and the latency, wherein a first cache entry with relatively high predicted likelihood during the future time interval is assigned the eviction score indicating lower likelihood of eviction in comparison to a second cache entry with relatively low predicted likelihood during the future time interval.
7. The computing device of any of the previous claims, wherein the eviction score is computed by multiplying a probability of a predicted likelihood of a future read of the cache entry by a latency parameter indicating latency incurred when re-reading the cache entry from a tier data storage device.
8. The computing device of claim 7, wherein the latency parameter indicates an end to end latency incurred when re-reading the cache entry.
9. The computing device of any of the previous claims, wherein the eviction score is further based on network latency delay when re-reading the respective cache entry from the tier data storage device.
10. The computing device of any of the previous claims, wherein the cache is implemented as a priority queue, wherein cache entries are queued according to ranking of eviction scores.
11. The computing device of any of the previous claims, wherein each of the plurality of tier data storage devices has a constant multiplier.
12. The computing device of any of the previous claims, wherein the cache comprises a plurality of cache tiers, each cache tier corresponding to a tier data storage device, wherein a size of each cache tier is proportional to a delay incurred when re-reading a cache entry from the corresponding tier data storage device.
13. A computer implemented method of management of a cache of a hierarchical storage system, comprising: computing an eviction score for each of a plurality of cache entries stored in the cache (102), the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices; and selecting a cache entry for eviction from the cache according to the eviction score (106).
14. A non-transitory medium (206) storing program instructions (206 A) for management of a cache (206B) of a hierarchical storage system (208), which, when executed by a processor (202), cause the processor to: compute an eviction score for each of a plurality of cache entries stored in the cache, the eviction score based on a latency of re-reading the respective cache entry from a tier data storage device of a plurality of tier data storage devices; and select a cache entry for eviction from the cache according to the eviction score.
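The eviction policy recited in claims 7 and 10 can be illustrated with a short sketch. This is not the claimed implementation: the class and method names, the latency figures, and the read-probability inputs are all assumptions made for illustration. The idea is that each cache entry's eviction score is its predicted read probability multiplied by the latency of re-reading it from its current tier, and the cache is kept as a priority queue so the entry with the lowest score (cheapest to lose) is evicted first.

```python
import heapq

# Assumed per-tier re-read latencies in microseconds (illustrative only).
TIER_LATENCY_US = {"scm": 10, "ssd": 100, "hdd": 5000}

class TieredCache:
    """Sketch of eviction-score-based cache management (names assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # key -> (read_probability, tier)
        self.heap = []      # (eviction_score, key); lowest score evicted first

    def _score(self, prob, tier):
        # Claim 7: score = P(future read) * latency of re-reading from the tier.
        return prob * TIER_LATENCY_US[tier]

    def put(self, key, read_probability, tier):
        self.entries[key] = (read_probability, tier)
        heapq.heappush(self.heap, (self._score(read_probability, tier), key))
        while len(self.entries) > self.capacity:
            self.evict_one()

    def evict_one(self):
        # Claim 10: priority queue ordered by eviction score.
        while self.heap:
            score, key = heapq.heappop(self.heap)
            if key in self.entries:  # skip stale heap entries
                del self.entries[key]
                return key
        return None

cache = TieredCache(capacity=2)
cache.put("a", 0.9, "hdd")  # costly to re-read and likely reused -> high score
cache.put("b", 0.1, "scm")  # cheap to re-read and unlikely reused -> low score
cache.put("c", 0.5, "ssd")
# "b" has the lowest score (0.1 * 10 = 1) and is evicted first
print(sorted(cache.entries))  # ['a', 'c']
```

An entry that is cheap to re-fetch (low-latency tier) and unlikely to be read again gets a low score and is evicted first, matching the claimed behavior that eviction cost, not just recency, drives replacement.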
PCT/EP2021/081802 2021-11-16 2021-11-16 Cache eviction based on current tiering status WO2023088535A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2021/081802 WO2023088535A1 (en) 2021-11-16 2021-11-16 Cache eviction based on current tiering status
CN202180103524.1A CN118119932A (en) 2021-11-16 2021-11-16 Cache elimination based on current hierarchical state


Publications (1)

Publication Number Publication Date
WO2023088535A1 true WO2023088535A1 (en) 2023-05-25

Family ID: 78790015


Country Status (2)

Country Link
CN (1) CN118119932A (en)
WO (1) WO2023088535A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180067857A1 (en) * 2016-09-06 2018-03-08 Samsung Electronics Co., Ltd. Efficient data caching management in scalable multi-stage data processing systems
US20180113815A1 (en) * 2016-10-21 2018-04-26 Advanced Micro Devices, Inc. Cache entry replacement based on penalty of memory access
CN106569960B (en) * 2016-11-08 2019-05-28 郑州云海信息技术有限公司 A kind of last level cache management method mixing main memory


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116915703A (en) * 2023-09-13 2023-10-20 中移(苏州)软件技术有限公司 Table item expelling method and device and electronic equipment
CN116915703B (en) * 2023-09-13 2023-12-08 中移(苏州)软件技术有限公司 Table item expelling method and device and electronic equipment

Also Published As

Publication number Publication date
CN118119932A (en) 2024-05-31

Similar Documents

Publication Publication Date Title
US10482032B2 (en) Selective space reclamation of data storage memory employing heat and relocation metrics
EP3414665B1 (en) Profiling cache replacement
TWI524348B (en) Data migration for composite non-volatile storage device
US9971508B2 (en) Invoking input/output (I/O) threads on processors to demote tracks from a cache
US8285930B2 (en) Methods for adapting performance sensitive operations to various levels of machine loads
KR20170098187A (en) Associative and atomic write-back caching system and method for storage subsystem
US10970209B2 (en) Destaging metadata tracks from cache
US20180181497A1 (en) Invoking demote threads on processors to demote tracks from a cache
US8301836B2 (en) Methods for determining alias offset of a cache memory
US8285931B2 (en) Methods for reducing cache memory pollution during parity calculations of RAID data
US10289558B2 (en) Apparatus and method for reducing storage class memory write-backs
WO2023088535A1 (en) Cache eviction based on current tiering status
US10733114B2 (en) Data cache performance
US20110213923A1 (en) Methods for optimizing performance of transient data calculations
TW201202929A (en) Apparatus and methods to reduce duplicate line fills in a victim cache
Roy et al. Enhancing endurance of ssd based high-performance storage systems using emerging nvm technologies
WO2023083454A1 (en) Data compression and deduplication aware tiering in a storage system
US10101934B1 (en) Memory allocation balancing for storage systems
US8065485B2 (en) Method and apparatus for determining cache storage locations based on latency requirements
WO2023061567A1 (en) Compressed cache as a cache tier
WO2022248051A1 (en) Smart caching of prefetchable data
WO2022233391A1 (en) Smart data placement on hierarchical storage
JP2023532647A (en) Configuring Cache Policy for Cache Based on Composite Cache Policy Test
CN117916726A (en) Intelligent defragmentation of data storage systems
CN114741272A (en) NUMA memory access statistical method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21815155

Country of ref document: EP

Kind code of ref document: A1