CN117916726A - Intelligent defragmentation of data storage systems - Google Patents

Intelligent defragmentation of data storage systems

Info

Publication number
CN117916726A
CN117916726A (application CN202180102141.2A)
Authority
CN
China
Prior art keywords
data
sequential
storage device
level data
data blocks
Prior art date
Legal status
Pending
Application number
CN202180102141.2A
Other languages
Chinese (zh)
Inventor
Assaf Natanzon (阿萨夫·纳塔逊)
Zvi Schneider (兹维·施耐德)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN117916726A


Classifications

    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/064 Management of blocks
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computing device is provided for managing a tiered storage system that includes a low-tier storage device with a long random access time and a high-tier storage device with a short random access time. Based on an analysis of monitored access patterns, groups of data blocks are moved to portions of the lower tier and/or the higher tier to improve access efficiency. The higher tier is divided into a write cache portion and a hot portion. The write cache portion stores sequentially written groups of data blocks that are likely to be accessed in the future. The hot portion stores groups of data blocks, possibly written non-sequentially, that are likely to be accessed randomly in the future. The lower tier is divided into a cold portion storing non-sequential groups of data blocks and a sequential portion storing sequential groups of data blocks. Defragmentation is performed on the groups of data blocks stored in the sequential portion. The groups of data blocks stored in the other portions are not defragmented.

Description

Intelligent defragmentation of data storage systems
Background
The present invention, in some embodiments thereof, relates to defragmentation and, more particularly, but not exclusively, to defragmentation of a hierarchical storage system.
In the maintenance of file systems, defragmentation is a process that reduces the degree of fragmentation. It does so by physically organizing the contents of the mass storage device used to store the files into the smallest number of contiguous regions (fragments, extents). It also attempts to create larger regions of free space using compaction, to impede the return of fragmentation. Some defragmentation utilities try to keep smaller files within a single directory together, as they are often accessed in sequence.
Hierarchical storage systems manage storage by automatically moving data between high-cost, fast storage media and low-cost, slower storage media. While it would be ideal to store all data on fast storage media that provide short read access times, in practice this is expensive. Most of the data is therefore stored on lower-cost but slower storage media, and some data is moved to higher-cost but faster storage media, with the goal of achieving a good trade-off between access time and the cost of the storage media. The more costly but faster storage medium may act as a cache for the less costly but slower storage medium.
Disclosure of Invention
It is an object of the present invention to provide a computing device, system, computer program product and method for hierarchical storage management.
The above and other objects are achieved by the features as claimed in the independent claims. Other embodiments are apparent from the dependent claims, the description and the drawings.
According to a first aspect, a computing device for hierarchical storage management is configured to: monitor access patterns to a plurality of groups of data blocks located on a low-level data storage device and a high-level data storage device; move a group of data blocks to a sequential portion of the low-level data storage device, to a non-sequential portion of the low-level data storage device, or to the high-level data storage device, based on an analysis of the access pattern of the group of data blocks; and perform defragmentation of the groups of data blocks stored in the sequential portion.
According to a second aspect, a computer-implemented method of hierarchical storage management includes: monitoring access patterns to a plurality of groups of data blocks located on a low-level data storage device and a high-level data storage device; moving a group of data blocks to a sequential portion of the low-level data storage device, to a non-sequential portion of the low-level data storage device, or to the high-level data storage device, based on an analysis of the access pattern of the group of data blocks; and performing defragmentation of the groups of data blocks stored in the sequential portion.
According to a third aspect, a non-transitory medium stores program instructions for hierarchical storage management which, when executed by a processor, cause the processor to: monitor access patterns to a plurality of groups of data blocks located on a low-level data storage device and a high-level data storage device; move a group of data blocks to a sequential portion of the low-level data storage device, to a non-sequential portion of the low-level data storage device, or to the high-level data storage device, based on an analysis of the access pattern of the group of data blocks; and perform defragmentation of the groups of data blocks stored in the sequential portion.
The performance efficiency of defragmentation processes performed on data storage systems comprising low-level data storage devices and high-level data storage devices is improved.
In another implementation of the first, second and third aspects, the group of data blocks is moved to the sequential portion in response to the access pattern indicating active data with large and/or sequential IO operations and/or data likely to be accessed within a future time interval.
The sequential portion of the low-level data storage device provides high access efficiency for sequential access operations, particularly for large amounts of sequential storage data.
In another implementation of the first, second and third aspects, the high-level data storage device includes a write cache portion, and the group of data blocks is moved to the write cache portion in response to the access pattern indicating sequential writing of the group of data blocks.
Dividing the low-level data storage device and the high-level data storage device into different portions improves access efficiency to groups of data blocks stored in the respective portions and/or improves overall access efficiency to groups of data blocks stored in the data storage system.
The data block groups sequentially written to the write cache portion are aggregated therein so that the sequentially arranged data block groups can be efficiently transferred to the low-level data storage device.
In another implementation of the first, second and third aspects, the low-level data storage device includes a cold portion and, in response to the access pattern indicating a written group of data blocks that is unlikely to be read within a future time interval, the group of data blocks is moved to the cold portion.
Separating data that is unlikely to be read from other data stored in the sequential portion that is likely to be read improves the performance efficiency of the low-level data storage device.
In another implementation of the first, second and third aspects, the high-level data storage device includes a hot portion and, in response to the access pattern indicating random access, the group of data blocks is moved to the hot portion.
The hot portion provides efficient access to the set of random access data blocks.
In another implementation of the first, second and third aspects, defragmentation is performed neither on the non-sequential portion of the low-level data storage device nor on the high-level data storage device.
Performing defragmentation on non-sequential portions of low-level data storage devices and/or high-level data storage devices is computationally inefficient and may actually increase future access times.
In another implementation of the first, second and third aspects, the low-level data storage device is implemented as a log-based file system.
Defragmentation of sequential portions of a low-level data storage device implemented as a log-based file system improves access efficiency to data blocks stored in the low-level data storage device.
In another implementation of the first, second and third aspects, the defragmentation is performed on sequential blocks stored in the sequential portion during a garbage collection process, wherein the garbage collection process is triggered by a new write of a set of data blocks to a new location in the sequential portion.
Performing defragmentation during garbage collection may avoid or reduce interference with active access operations.
In another implementation of the first, second and third aspects, during the garbage collection process, space is left for a fragmented portion of the set of data blocks, the fragmented portion being moved during defragmentation.
Making room for the defragmented portion during the garbage collection process increases the efficiency of the defragmentation process in creating sequentially arranged data.
In another implementation of the first, second and third aspects, the defragmentation is performed when data is read sequentially from the sequential portion, and the group of data blocks is rewritten sequentially on the sequential portion in response to the group of data blocks being located non-sequentially on the sequential portion.
The defragmentation process can be selectively applied in an efficient manner, for example in terms of reducing excessive load on the processor and/or reducing interference with ongoing access operations and/or improving the efficiency of accessing the defragmented data.
In another implementation of the first, second and third aspects, the defragmentation is performed when the set of data blocks is moved from the high level data storage device to the low level data storage device and the access pattern of the set of data blocks indicates a possibility of sequential access.
The defragmentation process can be performed efficiently when performed on groups of data blocks that are moving and possibly sequentially accessed.
In another implementation of the first, second and third aspects, the method further comprises: assigning a score to sequentially arranged groups of data blocks indicating the relative access activity of the respective block group, ordering the groups of data blocks according to the score, and performing defragmentation on the groups of data blocks sequentially according to the ordering.
The groups of data blocks are defragmented in descending order of access activity, that is, the most active groups of data blocks are defragmented first, which improves defragmentation efficiency.
In another embodiment of the first, second and third aspects, the access pattern comprises one or more access parameters selected from the group consisting of: read, sequential read, read size, write, sequential write, and write size.
In another implementation of the first, second and third aspects, the analysis of the access pattern of the set of data blocks comprises a prediction of a future access pattern of the set of data blocks.
The predicted future access pattern can better assign the data block group to the sequential portion or other portion.
In another implementation of the first, second and third aspects, the prediction of future access patterns is obtained as the outcome of a machine learning model trained on a training dataset of a plurality of records, each record comprising a respective group of blocks labeled with a ground-truth label of its historical access pattern.
Based on the historical access patterns of the learned data block set, the ML model can improve the accuracy of the prediction.
In another implementation of the first, second and third aspects, the prediction of the future access pattern comprises a prefetch pattern by a prefetch process.
The prefetch method may be analyzed to more accurately predict future access patterns.
In another implementation of the first, second and third aspects, the pre-fetching process calculates a probability that each of a plurality of candidate subsequent data block sets is accessed given that a current data block set is accessed, and pre-fetches the subsequent data block set having the highest probability when the current data block set is accessed.
In another implementation of the first, second and third aspects, the method further comprises dynamically decaying the access pattern by multiplying a current parameter of the access pattern by a decay value of less than 1 at each time interval to obtain an adapted parameter of the access pattern, wherein analyzing the access pattern comprises analyzing the adapted parameter of the access pattern.
The decay value prevents the parameter value of the access pattern from increasing indefinitely and/or maintains the parameter value of the access pattern in a reasonable state supporting processing in a reasonable time.
In another implementation of the first, second and third aspects, the access pattern is calculated for each data block group comprising a plurality of sequentially stored data blocks, and the moving is performed for each data block group.
Analyzing the access pattern of each group of data blocks instead of each block reduces storage requirements and/or increases computational performance (e.g., processor utilization, processor time) compared to using very large data structures to store the access pattern of each block.
In another implementation of the first, second and third aspects, the access pattern is calculated by a migration up (tier up) and migration down (tier down) process that dynamically moves groups of data blocks between the high-level data storage device and the low-level data storage device to achieve dynamic optimization.
Existing migration-up and/or migration-down processes that evaluate the heat (hotness) of an area of one or more storage devices and/or the probability that the area will be read may be used to determine access patterns and/or analysis, and thus determine movement of groups of data blocks.
In another implementation of the first, second and third aspects, the method further comprises dynamically defining an amount of the sequential portion to be defragmented.
Limiting the amount of sequential portions to be defragmented may prevent too much defragmentation and/or too much data to be moved due to defragmentation. Excessive defragmentation and/or excessive data movement may reduce the efficiency of access to the data and/or increase the load on the processor.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, exemplary methods and/or materials are described below. In case of conflict, the present specification, including definitions, will control. In addition, these materials, methods, and examples are illustrative only and not necessarily limiting.
Drawings
Some embodiments of the invention are described herein, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings, it is emphasized that the details shown are merely illustrative and for purposes of illustrative discussion of embodiments of the invention. In this regard, it will be apparent to those skilled in the art how to practice embodiments of the invention from the description of the drawings.
In the drawings:
FIG. 1 is a flow diagram of a method of moving a group of data blocks to a sequential portion of a low-level data store or a non-sequential portion of a low-level data store or a high-level data store according to an analysis of access patterns for the group of data blocks, according to some embodiments;
FIG. 2 is a block diagram of components of a system that moves a group of data blocks to a sequential portion of a low-level data store or a non-sequential portion of a low-level data store or a high-level data store according to an analysis of access patterns for the group of data blocks, according to some embodiments; and
FIG. 3 is a schematic diagram depicting a garbage collection process that leaves room for one or more defragmented portions of a group of data blocks that are moved during defragmentation, according to some embodiments.
Detailed Description
The present invention, in some embodiments thereof, relates to defragmentation and, more particularly, but not exclusively, to defragmentation of a hierarchical storage system.
An aspect of some embodiments relates to systems, methods, computing devices and/or apparatus and/or computer program products (storing code instructions executable by one or more processors) for managing a hierarchical storage system that includes low-level storage devices with long random access times (but fast access for sequentially stored data) and high-level storage devices with short random access times. Access patterns (e.g., read patterns) to groups of data blocks located on the low-level data storage device and the high-level data storage device are monitored. Based on an analysis of the access patterns, groups of data blocks are moved to portions of the low-level data storage device and/or portions of the high-level data storage device to increase the efficiency of access to data stored on the data storage system. The high-level data storage device is divided into a write cache portion and a hot portion. The write cache portion stores sequentially written groups of data blocks that are likely to be accessed in the future. The hot portion stores groups of data blocks, possibly written non-sequentially, that are likely to be accessed randomly in the future. The low-level data storage device is divided into a cold portion (also referred to herein as a non-sequential portion) that stores non-sequential groups of data blocks (i.e., groups of data blocks that are not accessed sequentially) and a sequential portion that stores sequential groups of data blocks. The groups of data blocks stored in the sequential portion of the low-level data storage device are defragmented (i.e., when new data is written non-sequentially, the data is later read and rewritten in a sequential layout). The groups of data blocks stored in the other portions (i.e., the write cache portion and/or the hot portion of the high-level data storage device, and/or the cold portion of the low-level data storage device) are not defragmented.
At least some embodiments described herein improve the performance efficiency of defragmentation processes performed on data storage systems that include low-level data storage devices and high-level data storage devices. The defragmentation process is performed to improve the efficiency of access to the data storage system.
Defragmentation is most beneficial for, and historically associated with, file systems on electromechanical disk drives (hard disk drives, floppy disk drives and optical disc media). When accessing a fragmented file, the read/write head of a hard disk must move between different areas of the disk, which is much slower than sequentially reading the entire contents of a non-fragmented file without repositioning the head to find other fragments. Data also becomes fragmented in modern block storage systems: an HDD holds multiple logical units (LUs), and the logical units are laid out in a log-based file system, so consecutive logical blocks on an LU may not be consecutive on the HDD. Standard defragmentation methods look at the allocation of fragments on the HDD. A file whose fragments are spread across the block device, such that consecutive blocks are not aligned, is moved to allow realignment. Some file systems perform defragmentation during writing by allocating large contiguous extents for the file, while others defragment offline, scanning for fragmented files and moving their data. The standard methods apply the defragmentation process to a particular storage device as a whole, for all files on the storage device, regardless of the access pattern to the data as described herein. For example, if a file is only ever written, then in a log-based file system all write IOs are sequential, and although the file is fragmented, the fragmentation does not affect the overall performance of the system. In fact, in such a scenario, performing defragmentation may actually harm overall system performance.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. The invention is capable of other embodiments or of being practiced or of being carried out in various ways.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions that cause a processor to perform aspects of the invention.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or over a network such as the internet, a local area network, a wide area network, and/or a wireless network to an external computer or external storage device.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), and the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including, for example, programmable logic circuitry, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to customize the electronic circuitry, in order to perform aspects of the present invention.
Various aspects of the present invention are described herein in connection with flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may be implemented out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 1, FIG. 1 is a flow chart of a method of moving a group of data blocks to a sequential portion of a low-level data store or a non-sequential portion of a low-level data store or a high-level data store according to an analysis of access patterns for the group of data blocks, according to some embodiments. Referring now also to FIG. 2, FIG. 2 is a block diagram of components of a system 200 for moving a group of data blocks to sequential portions of a low-level data store or non-sequential portions of a low-level data store or a high-level data store according to an analysis of access patterns for the group of data blocks, according to some embodiments. The system 200 may implement the actions of the method described with reference to fig. 1 by one or more processors 202 of a computing device 204 executing code instructions (e.g., code 206A) stored in a memory 206.
The computing device 204 manages a hierarchical memory 208 that includes at least a low-level data storage device 210 and a high-level data storage device 212.
The low-level data storage device 210 includes a sequential portion 210-1 that stores sequential data and a non-sequential portion (also referred to herein as a cold portion) 210-2 that stores non-sequential data.
The high-level data storage device 212 includes a write cache portion 212-1 that stores sequentially written groups of data blocks that may be accessed in the future, and a hot portion 212-2 that stores non-sequentially written groups of data blocks that may be randomly accessed in the future.
The computing device 204 may prefetch data from the low-level data storage device 210 using a prefetch process 206B (e.g., stored on the memory 206, executed by the one or more processors 202), as described herein. The prefetch process 206B may predict the location of the next data component before that component is requested, and fetch it in advance. Prefetching may be performed from the low-level data storage device 210, saving space on the high-level data storage device 212. Some prefetch processes 206B are designed to predict the location of data components that are located non-sequentially, for example in a stride (striding) pattern (e.g., adding a fixed offset relative to a previous address location) and/or in a repeatable address pattern that initially appears to be random.
The low-level data storage device 210 has a relatively long random access input/output (IO) time compared to the high-level data storage device 212. The high-level data storage device 212 has a relatively short random I/O (e.g., read and/or write) time as compared to the low-level data storage device 210.
The low-level data storage device 210 is less costly (e.g., per megabyte) than the high-level data storage device 212.
For example, the low-level data storage device 210 may be implemented as a hard disk drive (HDD). The low-level data storage device 210 may provide fast sequential reads and/or writes, but poor performance for random IO, because seek times may be very high (e.g., up to 10 milliseconds).
For example, the high-level data storage device 212 may be implemented as a solid-state drive (SSD) and/or a phase-change memory (PCM).
The high-level data storage 212 may serve as a cache and/or a tier relative to the low-level data storage 210 (e.g., a cache when the data is volatile and has a copy in the low level, and/or a tier when the data is non-volatile and/or may be held (e.g., only) in the high level).
The hierarchical memory 208 is in communication with a computing system 214, and the computing system 214 stores data on the hierarchical memory 208 and/or reads data stored on the hierarchical memory 208. Hierarchical memory 208 may be integrated within computing system 214 and/or may be implemented as an external storage device. The computing system 214 may be indirectly connected to the hierarchical memory 208 through the computing device 204, i.e., the computing system 214 may be in communication with the computing device 204, wherein the computing device 204 is in communication with the hierarchical memory 208, rather than the computing system 214 being in direct communication with the hierarchical memory 208.
For example, computing system 214 and/or computing device 204 may be implemented as one or more of the following: a computing cloud, a cloud network, a computer network, one or more virtual machines (e.g., virtual machine hypervisors, virtual servers), network nodes (e.g., switches, virtual networks, routers, virtual routers), a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a Web server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a public information machine, a smart phone, a notebook computer, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
Alternatively, the hierarchical memory 208 is used exclusively by a single user (e.g., computing system 214). Alternatively, the hierarchical memory 208 may be used by multiple users, such as multiple client terminals 216 accessing the hierarchical memory 208 over a network 218, e.g., with the computing system 214 providing cloud storage services and/or virtual storage services to the client terminals 216.
For example, computing device 204 may be implemented as integrated within hierarchical memory 208 (e.g., as hardware and/or software installed within hierarchical memory 208), integrated within computing system 214 (e.g., as hardware and/or software installed within computing system 214, such as an accelerator chip and/or code stored on a memory of computing system 214 and executed by a processor of computing system 214), and/or as an external component in communication with hierarchical memory 208 (e.g., implemented as hardware and/or software) (e.g., as a plug-in component). Alternatively, the hierarchical memory 208 and computing device 204 are implemented as one storage system that exposes storage (e.g., functions, features, capabilities) to one or more computing systems 214.
Computing device 204 includes one or more processors 202, e.g., implemented as one or more central processing units (CPU), one or more graphics processing units (GPU), one or more field-programmable gate arrays (FPGA), one or more digital signal processors (DSP), one or more application-specific integrated circuits (ASIC), one or more custom circuits, processors for interfacing with other units, and/or special-purpose hardware accelerators. The one or more processors 202 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogeneous and/or heterogeneous processor architectures). It is noted that the one or more processors 202 may be designed to implement in hardware one or more features stored as code instructions 206A and/or 206B.
The memory 206 stores code instructions executable by the one or more processors 202, and may be implemented, for example, as random access memory (RAM), read-only memory (ROM), and/or a storage device, such as non-volatile memory, magnetic media, semiconductor storage devices, a hard drive, removable memory, and optical media (e.g., DVD, CD-ROM). The memory 206 may store code 206A that, when executed by the one or more processors 202, implements one or more actions of the method described with reference to fig. 1, and/or may store the prefetch process 206B code as described herein.
The computing device 204 may include a data storage device 220 for storing data, e.g., monitoring access patterns as described herein. For example, data storage device 220 may be implemented as memory, a local hard drive, a removable storage unit, an optical disk, a storage device, and/or a remote server and/or computing cloud (e.g., accessed using a network connection). It is noted that code instructions executable by the one or more processors 202 may be stored in the data storage device 220, e.g., with the execution portion loaded into the memory 206 for execution by the one or more processors 202.
The computing device 204 (and/or computing system 214) may be in communication with a user interface 222, the user interface 222 presenting data to a user and/or including mechanisms for inputting data, such as one or more of a touch screen, display, keyboard, mouse, voice-activated software, and microphone.
For example, the network 218 may be implemented as the internet, a local area network, a virtual private network, a virtual public network, a wireless network, a cellular network, a local bus, a point-to-point link (e.g., wired), and/or a combination thereof.
At 102, access patterns to groups of data blocks located on a low-level data storage device and/or on a high-level data storage device are monitored and/or analyzed.
The access pattern may be calculated based on the collected data parameters, e.g., calculating statistical data parameters for groups of data blocks (e.g., for each group of data blocks). Examples of data parameters for calculating access patterns include: read, sequential read, read size, write, sequential write, and write size.
Optionally, the access pattern is dynamically decayed. The decay may be performed by multiplying the current parameter of the access pattern by a decay value smaller than 1 at each time interval, to obtain an adapted parameter of the access pattern. Other decay methods may be used, such as linear or logarithmic decay, or dynamically varying values. The predicted normalized access parameters may be calculated using the adapted parameters of the access pattern. The decay value prevents the parameter values of the access pattern from increasing indefinitely and/or keeps them at magnitudes that can be processed in a reasonable time. For example, the number of reads (an example of an access-pattern parameter) is multiplied by 0.99 every 5 minutes, so that if there are currently 100 reads, after 5 minutes the count is reduced to 99.
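As an illustration only, the following sketch (in Python, with hypothetical names and structure) shows how per-group access counters could be decayed at each time interval; the 0.99 factor mirrors the example above and is an assumption, not a required value.

    # Hypothetical sketch of access-pattern decay, applied once per time interval
    # (e.g., every 5 minutes); names and structure are illustrative only.
    DECAY = 0.99

    def decay_access_counters(counters: dict[str, float]) -> dict[str, float]:
        """Multiply every access-pattern parameter (reads, writes, ...) by DECAY."""
        return {name: value * DECAY for name, value in counters.items()}

    counters = {"reads": 100.0, "sequential_reads": 60.0, "writes": 20.0}
    counters = decay_access_counters(counters)   # after one interval: reads == 99.0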
The access pattern may be calculated per group of data blocks (e.g., for each group of data blocks), where a group of data blocks includes a plurality of sequentially stored data blocks. A block may be the minimum granularity at which the storage system operates. A user may read and/or write a single block and/or multiple blocks. The block size may be between about 0.5 and 32 kilobytes (KB), or in other ranges. The one or more predicted non-normalized access parameters may be calculated for each group of sequentially stored data blocks, rather than for individual blocks. A block group may be a contiguous address space, for example of 4 megabytes (MB) or another size. Analyzing the access pattern of each group of data blocks instead of each block reduces storage requirements and/or increases computational performance (e.g., processor utilization, processor time) compared to using very large data structures to store the access pattern of each block. The movement (e.g., as described with reference to 104) may be performed per group of data blocks, as illustrated in the sketch below.
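A minimal sketch of how access statistics might be kept per block group rather than per block; the 4 KB block size, the 4 MB group size, and the counter fields are assumptions taken from the ranges and examples above.

    from collections import defaultdict

    BLOCK_SIZE = 4 * 1024          # assumed block size, within the 0.5-32 KB range above
    GROUP_SIZE = 4 * 1024 * 1024   # 4 MB contiguous address space per block group

    # One small record per block group instead of one per block.
    group_stats = defaultdict(lambda: {"reads": 0, "sequential_reads": 0,
                                       "writes": 0, "read_bytes": 0})

    def record_read(block_address: int, length_blocks: int, sequential: bool) -> None:
        """Accumulate one read IO into the statistics of the block group it falls in."""
        group_index = (block_address * BLOCK_SIZE) // GROUP_SIZE
        stats = group_stats[group_index]
        stats["reads"] += 1
        stats["read_bytes"] += length_blocks * BLOCK_SIZE
        if sequential:
            stats["sequential_reads"] += 1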
Optionally, the access pattern is calculated by an up-migration and down-migration process that dynamically moves groups of data blocks between the high-level data storage device and the low-level data storage device to achieve dynamic optimization. Existing migration-up and/or migration-down processes that evaluate the heat (hotness) of an area of one or more storage devices and/or the probability that the area will be read may be used to determine access patterns and/or analysis, and thus determine movement of groups of data blocks.
Optionally, the analysis of the access pattern of a group of data blocks is performed by calculating a prediction of the future access pattern of the group. The predicted future access pattern enables better assignment of the group of data blocks to the sequential portion or to another portion. The prediction of the future access pattern may be obtained as the outcome of a machine learning (ML) model (e.g., a regressor, a neural network, a classifier, etc.). The ML model may be trained on a training dataset of records, where each record includes a respective group of blocks labeled with a ground-truth label of its historical access pattern. By learning the historical access patterns of groups of data blocks, the ML model can improve the accuracy of the prediction. Other methods may be used to obtain the predicted future access pattern, such as a set of rules and/or a mathematical prediction model.
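Purely as an illustration of one possible realization (not the required method), a generic classifier could be trained on historical per-group access records; the feature names, example values, and labels below are hypothetical placeholders.

    # Hypothetical sketch: train a classifier to predict whether a block group will be
    # accessed sequentially in the next interval, from its recent access-pattern features.
    from sklearn.ensemble import RandomForestClassifier

    # Each record: [reads, sequential_reads, writes, sequential_writes, avg_io_size_kb]
    # Ground-truth label: 1 if the group was accessed sequentially in the following interval.
    X_train = [
        [120, 110, 10, 8, 256],   # illustrative values only
        [40, 2, 300, 5, 8],
        [5, 0, 1, 0, 4],
    ]
    y_train = [1, 0, 0]

    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
    prediction = model.predict([[90, 85, 4, 3, 192]])   # -> 1 means likely sequential access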
Optionally, the access pattern comprises a prefetch pattern by a prefetch process. The prefetch method may be analyzed to more accurately predict future access patterns. For example, the prefetch pattern may be one or a combination of sequential, stride (i.e., each time a fixed step is added), and/or random. When the data components have not been stored on the high-level data storage device, the prefetching process places the prefetched data components (non-sequentially located on the low-level data storage device) on the high-level data storage device. The prefetch process calculates a probability that each of a plurality of candidate subsequent data block sets is accessed given that the current data block set is accessed, and prefetches the subsequent data block set having the highest probability when the current data block set is accessed. The pre-fetching process of calculating probabilities enables the selection of the data block group that achieves the highest accuracy to store on the high-level data storage device, which improves the performance of the high-level data storage device because the stored data block group is most likely to be accessed in the future in fact than other components with lower probabilities stored on the low-level data storage device.
Optionally, the accuracy of the prefetch pattern (e.g., of each prefetch pattern) is calculated. The prefetch pattern (e.g., the data component to be prefetched) may be predicted as described herein with reference to the trusted cache process discussed below. The term trusted cache as used herein refers to a prefetch cache that predicts one or more subsequent locations that are not necessarily sequential. The accuracy may be calculated as the percentage of prefetches in which the prefetch pattern correctly fetched the correct component, relative to all prefetch attempts, including attempts in which the prefetch pattern failed to fetch the correct component. Two or more prefetch patterns with an accuracy higher than a threshold may be selected. For example, the threshold may be 20%, 25%, 30%, 40%, 45%, 50%, or another value. The two or more highest-accuracy prefetch patterns are selected because such prefetch patterns are most likely to be reused in the future.
Alternatively, the prefetch process operates by calculating a conditional probability of the next access (e.g., read) location given the current access (e.g., read) location, sometimes referred to as trusted cache prefetching (believe cache prefetching). The prefetch process (e.g., trusted cache prefetch) calculates, given that the current data component is accessed, a probability that each of a plurality of candidate subsequent data components will be accessed, and prefetches the subsequent data component with the highest probability when the current data component is accessed. The prefetch process thereby calculates the probability that each of the plurality of candidate components will be fetched by the prefetch pattern.
When the conditional probability is above a threshold, data may be prefetched from the next access location. For example, trusted cache prefetching may be used when accesses to data memory are non-sequential but in a repeatable pattern (e.g., in a stride access (i.e., each time an address is incremented by a fixed amount relative to a current access), and/or in another repeatable pattern that initially appears to be random). The next location to be accessed is calculated based on the current and/or previous locations being accessed, based on the absolute address location and/or the relative address location. One exemplary calculation is now described:
After the first location (denoted as a) is accessed, the following memory locations are accessed multiple times: the second location (denoted X) is accessed 10 times, the third location (denoted Y) is accessed 3 times, and the fourth location (denoted Z) is accessed 5 times.
After the fifth location (denoted B) is accessed, the following memory locations are accessed multiple times: the second location (denoted X) is accessed 6 times, the third location (denoted Y) is accessed 2 times, the fourth location (denoted Z) is accessed 4 times, and the sixth location (denoted K) is accessed 7 times.
The conditional probability is calculated as follows:
● p(X|A) = 10/18, p(Y|A) = 3/18, p(Z|A) = 5/18
● p(X|B) = 6/19, p(Y|B) = 2/19, p(Z|B) = 4/19, p(K|B) = 7/19
If there are two accesses (e.g., IOs) A and B in order, a recommendation of a data location to prefetch may be calculated by calculating a candidate probability for each of the following locations: x, Y, Z, K:
CX = p(X|A) + p(X|B) = 10/18 + 6/19 ≈ 0.87
CY = p(Y|A) + p(Y|B) = 3/18 + 2/19 ≈ 0.27
CZ = p(Z|A) + p(Z|B) = 5/18 + 4/19 ≈ 0.49
CK = p(K|A) + p(K|B) = 0/18 + 7/19 ≈ 0.37
The probabilities are ordered to rank the most likely subsequent locations from which to prefetch data. One or more prefetches may be issued, e.g., a single prefetch, two prefetches, or more, and/or according to a threshold. The first prefetch comes from location X. The second prefetch comes from location Z. The third prefetch comes from location K. If a 50% threshold is used, data is prefetched only from location X; with a 40% threshold, data is prefetched from locations X and Z.
The prefetch locations (i.e., X, Y, Z, K) may be referred to as candidates. The current access locations (i.e., A, B) may be referred to as voters.
The relationship between the current locations and the subsequent locations may be represented by a matrix, which may be referred to as a relationship matrix, such as the one shown below, in which the rows correspond to the current access history (CurHis: A, B) and the columns to the candidate locations:

       X   Y   Z   K
   A  10   3   5   0
   B   6   2   4   7
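The candidate scoring in the example above can be reproduced directly from the relationship matrix. The sketch below is illustrative only; the data structure and function names are assumptions, not part of the claimed method.

    # Sketch of the trusted ("believe") cache scoring from the worked example above.
    relation = {
        "A": {"X": 10, "Y": 3, "Z": 5, "K": 0},
        "B": {"X": 6, "Y": 2, "Z": 4, "K": 7},
    }

    def candidate_scores(voters: list[str]) -> dict[str, float]:
        """Sum p(candidate | voter) over the recent access locations (voters)."""
        scores: dict[str, float] = {}
        for voter in voters:
            total = sum(relation[voter].values())
            for candidate, count in relation[voter].items():
                scores[candidate] = scores.get(candidate, 0.0) + count / total
        return scores

    scores = candidate_scores(["A", "B"])
    # approximately {'X': 0.87, 'Y': 0.27, 'Z': 0.49, 'K': 0.37}
    ranked = sorted(scores, key=scores.get, reverse=True)   # ['X', 'Z', 'K', 'Y']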
At 104, one or more groups of data blocks are moved to the sequential portion of the low-level data storage device, to the non-sequential portion of the low-level data storage device, or to the high-level data storage device, based on the analysis of the access patterns of the groups of data blocks.
Optionally, the group of data blocks is moved to the sequential portion in response to the access pattern indicating active data with large and/or sequential IO operations and/or data likely to be accessed within a future time interval. The sequential portion of the low-level data storage device provides high access efficiency for sequential access operations, particularly for large amounts of sequentially stored data.
Alternatively or additionally, newly written data is first placed into the write cache portion of the high-level data storage device. Dividing the low-level data storage device and the high-level data storage device into different portions improves access efficiency to the groups of data blocks stored in the respective portions and/or improves the overall access efficiency to the groups of data blocks stored in the data storage system. The groups of data blocks written to the write cache portion are aggregated there, so that sequentially arranged groups of data blocks can be transferred efficiently to the low-level data storage device. The low-level data storage device provides efficient access to sequentially ordered groups of data blocks that are moved from the write cache portion of the high-level data storage device.
Alternatively or additionally, the set of data blocks is moved to a cold portion of the low-level data storage device in response to the access pattern indicating a set of write data blocks that are unlikely to be read within a future time interval. Separating data that is unlikely to be read from other data stored in the sequential portion that is likely to be read improves the performance efficiency of the low-level data storage device. Since the data in the cold section is less likely to be read, the stored data does not have to be stored sequentially and/or does not have to be defragmented. Avoiding defragmentation may avoid unnecessary use of processing resources.
Alternatively or additionally, the group of data blocks is moved to the hot portion of the high-level data storage device in response to the access pattern indicating random access. The hot portion provides efficient access to randomly accessed groups of data blocks. Since the groups of data blocks stored in the hot portion are accessed randomly, defragmentation would not help and is not performed. Avoiding defragmentation avoids unnecessary use of processing resources. In addition, since high-level media such as SSDs support high-performance random access, the impact of fragmentation on performance is small. A sketch summarizing these movement rules follows.
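To summarize the movement rules described above, a placement decision along the following lines could be implemented; the thresholds, field names, and rule ordering are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass
    class AccessPattern:
        sequential_writes: int
        random_reads: int
        large_sequential_ios: int
        read_probability: float    # estimated probability of being read in a future interval

    def choose_portion(p: AccessPattern) -> str:
        """Return the portion a block group should be moved to (illustrative heuristic)."""
        if p.read_probability < 0.05:                 # written but unlikely to be read
            return "low_tier_cold_portion"
        if p.random_reads > p.large_sequential_ios:   # randomly accessed
            return "high_tier_hot_portion"
        if p.sequential_writes > 0:                   # sequentially written, read later
            return "high_tier_write_cache_portion"
        return "low_tier_sequential_portion"          # active, large/sequential IO

    print(choose_portion(AccessPattern(0, 0, 12, 0.8)))   # -> low_tier_sequential_portion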
At 106, groups of data blocks stored in sequential portions of the low-level data store are defragmented.
Defragmentation is not necessarily performed on non-sequential portions of the low level data storage device nor on the high level data storage device. Performing defragmentation on non-sequential portions of low-level data storage devices and/or high-level data storage devices is computationally inefficient and may actually increase future access times.
Optionally, the low-level data storage device is implemented as a log-based file system. Defragmentation of sequential portions of a low-level data storage device implemented as a log-based file system improves access efficiency to data blocks stored in the low-level data storage device.
Defragmentation may be performed on sequential block groups stored in the sequential portion during a garbage collection process. The garbage collection process may be triggered by a new write of a group of data blocks to a new location in the sequential portion. Performing defragmentation during garbage collection may avoid or reduce interference with active access operations. In log-based file systems, fragmentation is in many cases caused by new writes to files/disks being written to new locations, which triggers garbage collection. While garbage collection is performed, defragmentation is performed on the sequential block groups stored in the sequential portion (e.g., only on the sequential block groups).
During the garbage collection process, space may be reserved for one or more fragmented portions of a group of data blocks that are moved during defragmentation. Reserving room for the defragmented portion during the garbage collection process increases the efficiency of the defragmentation process in creating sequentially arranged data. For example, where a portion of a group of data blocks has been overwritten, causing fragmentation, space may be left in the new block group during garbage collection for the other fragments of the previous block group. Defragmentation does not necessarily have to be performed immediately, but may be delayed. Such a method may be applied to sequential block groups (e.g., only to sequential block groups) in order to lay out newly defragmented groups of data blocks sequentially.
Referring now to FIG. 3, FIG. 3 is a schematic diagram depicting a garbage collection process 302 that leaves space 304 for one or more fragmented portions of a group of data blocks 306 that are moved during defragmentation, according to some embodiments. Prior to garbage collection 302, the group of data blocks 306 is depicted as 306A (assuming a log-based storage system is implemented and that the data originally written to group 306 was sequential data of a file or block device); the portion of the data indicated by arrow 308 has been overwritten. Since the data is written to a log, each overwritten block location is rewritten at another location in the log. Free space is therefore reserved so that, after the data is read from location 310, it can be moved back into sequential order, shown as 306B after garbage collection. The overwritten data 308 of the group 306A is moved by the garbage collection 302 to create free space 304 for later defragmentation of the block group 310. The reserved space 304, whose size may be based on the current compression ratio of the new data, makes future defragmentation easier; e.g., when garbage collection is performed on block group 310, its data is written into the space 304. Defragmentation may be performed at the same time as the garbage collection process.
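A simplified sketch of a garbage-collection step that, when relocating the live data of a partially overwritten block group, leaves a hole sized for the fragment that will be moved in later; the layout, sizes, and names are hypothetical and only illustrate the idea behind FIG. 3.

    def relocate_with_hole(group_len: int,
                           overwritten: tuple[int, int],
                           new_base: int) -> dict[str, tuple[int, int]]:
        """Relocate a partially overwritten block group during garbage collection.
        The live data before and after the overwritten range is copied to new_base at
        its original in-group offsets, and the overwritten range is left as an empty
        hole (space 304 in FIG. 3) so the fragment can later be rewritten there in order.
        Each value is a (start, length) extent in the new location."""
        ow_start, ow_len = overwritten
        return {
            "live_head": (new_base, ow_start),                        # blocks [0, ow_start)
            "reserved_hole": (new_base + ow_start, ow_len),           # left empty for now
            "live_tail": (new_base + ow_start + ow_len,
                          group_len - ow_start - ow_len),             # rest of the group
        }

    layout = relocate_with_hole(group_len=4096, overwritten=(1024, 512), new_base=0)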
Defragmentation may be performed when data is read sequentially from the sequential portion. In response to the group of data blocks being located non-sequentially on the sequential portion, the group of data blocks may be rewritten sequentially onto the sequential portion. The defragmentation process can thus be applied selectively in an efficient manner, for example in terms of reducing excess load on the processor and/or reducing interference with ongoing access operations and/or improving the efficiency of accessing the defragmented data. When data is read sequentially from the low-level data storage device, but the data is not located sequentially on the low-level device, the data may be rewritten sequentially to the low-level device (i.e., rewritten without an actual user write taking place). The decision whether to rewrite the data in order may be based on the probability that the read will occur again and on the degree of fragmentation of the group of data blocks on disk; e.g., defragmentation may be performed on the read when the group of data blocks is stored in more than two locations.
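One way the read-time decision above could be realized, sketched with assumed thresholds (more than two fragments, re-read probability of at least 0.5); the helper function is hypothetical.

    def should_rewrite_sequentially(fragment_count: int, read_again_probability: float) -> bool:
        """Illustrative policy: defragment on read only for sufficiently fragmented
        block groups that are likely to be read again (thresholds are assumptions)."""
        return fragment_count > 2 and read_again_probability >= 0.5

    def rewrite_sequentially(group, data: bytes) -> None:
        """Hypothetical helper: write data as one contiguous extent in the sequential portion."""
        ...

    def on_sequential_read(group, fragments: list[bytes], read_again_probability: float) -> bytes:
        data = b"".join(fragments)                 # the data just read, in logical order
        if should_rewrite_sequentially(len(fragments), read_again_probability):
            rewrite_sequentially(group, data)      # piggyback the defragmentation on the read
        return data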
Defragmentation is performed when a group of data blocks is moved from a high level data storage device to a low level data storage device and the access pattern of the group of data blocks indicates the possibility of sequential access. The defragmentation process can be performed efficiently when performed on groups of data blocks that are moving and possibly sequentially accessed.
Alternatively, scores are calculated and assigned to sequentially arranged groups of data blocks. The score indicates the relative access activity of the corresponding block group. The groups of data blocks are ordered according to the score, and defragmentation is performed on the groups of data blocks sequentially according to the ordering. The groups of data blocks are defragmented in descending order of access activity, that is, the most active groups of data blocks are defragmented first, which improves defragmentation efficiency.
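A small sketch of the scoring and ordering step; the score formula is an assumption (any measure of relative access activity would serve), and the defragment callback is a placeholder.

    def activity_score(stats: dict[str, float]) -> float:
        """Illustrative score: weight sequential reads more heavily than total reads."""
        return stats.get("reads", 0.0) + 2.0 * stats.get("sequential_reads", 0.0)

    def defragment_in_order(groups: dict[int, dict[str, float]], defragment) -> None:
        """Defragment block groups in descending order of activity (most active first)."""
        for group_id in sorted(groups, key=lambda g: activity_score(groups[g]), reverse=True):
            defragment(group_id)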
At 108, one or more features described with reference to 102-106 may be iterated, for example over a plurality of time intervals, to dynamically move groups of data blocks between the high level and the low level.
For example, the amount of the sequential portion on which defragmentation is performed may be dynamically defined during an iteration. The hotness threshold used to select data for defragmentation may depend on the amount of data movement budgeted for defragmentation. For example, 10% of the storage read/write capacity may be dedicated to defragmentation. The threshold may be defined dynamically so that the defragmentation process does not move too much data. Limiting the amount of the sequential portion to be defragmented may prevent excessive defragmentation and/or excessive data movement due to defragmentation. Excessive defragmentation and/or excessive data movement may reduce the efficiency of access to the data and/or increase the load on the processor.
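The bandwidth budget mentioned above (e.g., 10% of read/write capacity) could be enforced with a simple per-interval byte budget; the class below is an illustrative sketch, and the numbers are assumptions.

    class DefragBudget:
        """Illustrative throttle: allow defragmentation to move at most `fraction` of the
        device's read/write capacity per time interval (e.g., 10%)."""

        def __init__(self, capacity_bytes_per_interval: int, fraction: float = 0.10):
            self.limit = int(capacity_bytes_per_interval * fraction)
            self.used = 0

        def try_consume(self, nbytes: int) -> bool:
            """Return True if this group may be defragmented now, False to defer it."""
            if self.used + nbytes > self.limit:
                return False
            self.used += nbytes
            return True

        def reset(self) -> None:
            """Called at the start of each interval."""
            self.used = 0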
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The description of the various embodiments of the present invention is provided for purposes of illustration and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application, many relevant defragmentation processes will be developed, and the scope of the term defragmentation is intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "including," "having," and variations thereof mean "including but not limited to". These terms encompass the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may contain additional ingredients and/or steps, provided that the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "a complex" or "at least one complex" may include a plurality of complexes, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or does not preclude the incorporation of features of other embodiments.
The word "optionally" as used herein means "provided in some embodiments and not provided in other embodiments." Any particular embodiment of the invention may include a plurality of "optional" features unless the features contradict each other.
In the present application, various embodiments of the application may be presented in a range format. It should be understood that the description of the range format is merely for convenience and brevity and should not be construed as a fixed limitation on the scope of the present application. Accordingly, the description of a range should be considered to have specifically disclosed all possible sub-ranges as well as individual values within the range. For example, descriptions of ranges such as from 1 to 6 should be considered as having specifically disclosed sub-ranges from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., and individual numbers within that range such as 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
When a range of numbers is indicated herein, the representation includes any recited number (fractional or integer) within the indicated range. The phrases "a range between a first indicated number and a second indicated number" and "a range from a first indicated number to a second indicated number" are used interchangeably herein to mean that all fractions and integers between the first indicated number and the second indicated number are included.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as any suitable other embodiment of the invention. Some features described in the context of various embodiments are not to be considered essential features of such embodiments unless the embodiments are not operable without such elements.
It is the intention of the inventors that all publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. Furthermore, citation or identification of any reference to the application shall not be construed as an admission that such reference is available as prior art to the present application. With respect to the use of section titles, the section titles should not be construed as necessarily limiting. In addition, the entire contents of any one or more priority files of the present application are incorporated herein by reference.

Claims (23)

1. A computing device (204) for hierarchical storage management (208), characterized by:
Monitoring access patterns to a plurality of data block groups located on a low-level data storage device (208) and a high-level data storage device (212);
Moving the set of data blocks to a sequential portion (210-1) of the low-level data store or a non-sequential portion (210-2) of the low-level data store or the high-level data store based on an analysis of the access pattern of the set of data blocks; and
Defragmentation of the groups of data blocks stored in the sequential portion is performed.
2. The computing device of any of the preceding claims, wherein the group of data blocks is moved to the sequential portion in response to the access pattern indicating active data having large and/or sequential IO operations and/or data likely to be accessed within a future time interval.
3. The computing device of any of the preceding claims, wherein the high-level data storage device comprises a write cache portion (212-1), and wherein the group of data blocks is moved to the write cache portion in response to the access pattern indicating sequential writing of the group of data blocks.
4. The computing device of any of the preceding claims, wherein the low-level data storage device comprises a cold portion (210-2), and wherein the set of data blocks is moved to the cold portion in response to the access pattern indicating written data blocks that are unlikely to be read within a future time interval.
5. The computing device of any of the preceding claims, wherein the high-level data storage device comprises a hot portion (212-2), and wherein the group of data blocks is moved to the hot portion in response to the access pattern indicating random access.
6. The computing device of any of the preceding claims, wherein defragmentation is performed neither on the non-sequential portion of the low-level data storage device nor on the high-level data storage device.
7. The computing device of any of the preceding claims, wherein the low-level data storage device is implemented as a log-based file system.
8. The computing device of claim 7, wherein the defragmentation is performed on sequential blocks stored in the sequential portion during a garbage collection process, wherein the garbage collection process is triggered by a new write of a set of data blocks to a new location in the sequential portion.
9. The computing device of claim 8, wherein during the garbage collection process, space is made for a fragmented portion of the set of data blocks, the fragmented portion being moved during defragmentation.
10. The computing device of any of the preceding claims, wherein the defragmentation is performed when data is read sequentially from the sequential portion, and wherein the groups of data blocks are sequentially rewritten again on the sequential portion in response to the groups of data blocks being non-sequentially located on the sequential portion.
11. The computing device of any of the preceding claims, wherein the defragmentation is performed when the set of data blocks is moved from the high level data storage device to the low level data storage device and the access pattern of the set of data blocks indicates a likelihood of sequential access.
12. The computing device of any of the preceding claims, further comprising: assigning a score to sequentially arranged data block groups indicating relative access to the respective block groups, ordering the data block groups according to the score, and sequentially performing defragmentation on the data block groups according to the ordering.
13. The computing device of any of the preceding claims, wherein the access pattern comprises one or more access parameters selected from the group consisting of: read, sequential read, read size, write, sequential write, and write size.
14. The computing device of any of the preceding claims, wherein the analysis of the access pattern of the set of data blocks comprises a prediction of a future access pattern of the set of data blocks.
15. The computing device of claim 14, wherein the prediction of the future access pattern is obtained as an outcome of a machine learning model trained on a training dataset of a plurality of records, each record comprising a respective group of blocks labeled with a ground truth label of a historical access pattern.
16. The computing device of claim 14, wherein the prediction of the future access pattern comprises a prefetch pattern by a prefetch process (206B).
17. The computing device of claim 16, wherein the prefetch process calculates a probability that each of a plurality of candidate subsequent data block sets is accessed given that a current data block set is accessed, and prefetches the subsequent data block set having a highest probability when the current data block set is accessed.
18. The computing device of any of the preceding claims, further comprising dynamically attenuating the access pattern by multiplying a current parameter of the access pattern by an attenuation value less than 1 for each time interval to obtain an adaptation parameter of the access pattern, wherein analyzing the access pattern comprises analyzing the adaptation parameter of the access pattern.
19. The computing device of any of the preceding claims, wherein the access pattern is calculated for each data block group comprising a plurality of sequentially stored data blocks, and the moving is performed for each data block group.
20. The computing device of any of the preceding claims, wherein the access pattern is calculated by an up-migration and down-migration process that dynamically moves groups of data blocks between the high-level data storage device and the low-level data storage device to achieve dynamic optimization.
21. The computing device of any of the preceding claims, further comprising: dynamically defining the amount of the sequential portion to be defragmented.
22. A computer-implemented hierarchical storage management method, comprising:
Monitoring access patterns (102) to a plurality of groups of data blocks located on a low-level data storage device and a high-level data storage device;
Moving the set of data blocks to a sequential portion of the low-level data store or a non-sequential portion of the low-level data store or the high-level data store (104) based on an analysis of the access pattern of the set of data blocks; and
Defragmentation (106) of the groups of data blocks stored in the sequential portions is performed.
23. A non-transitory medium (206) storing program instructions for hierarchical storage management (206A), which when executed by a processor (202), cause the processor to:
monitoring access patterns to a plurality of data block groups located on a low-level data storage device and a high-level data storage device;
moving the set of data blocks to a sequential portion of the low-level data store or a non-sequential portion of the low-level data store or the high-level data store based on an analysis of the access pattern of the set of data blocks; and
Defragmentation of the groups of data blocks stored in the sequential portion is performed.
CN202180102141.2A 2021-10-13 2021-10-13 Intelligent defragmentation of data storage systems Pending CN117916726A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/078282 WO2023061569A1 (en) 2021-10-13 2021-10-13 Smart defragmentation of a data storage system

Publications (1)

Publication Number Publication Date
CN117916726A true CN117916726A (en) 2024-04-19

Family

ID=78134980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180102141.2A Pending CN117916726A (en) 2021-10-13 2021-10-13 Intelligent defragmentation of data storage systems

Country Status (2)

Country Link
CN (1) CN117916726A (en)
WO (1) WO2023061569A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576293B * 2009-09-08 2015-08-26 International Business Machines Corp. Data management in solid-state storage devices and tiered storage systems
US9734051B2 (en) * 2015-02-16 2017-08-15 Quantum Corporation Garbage collection and defragmentation for solid state drives (SSD) and shingled magnetic recording (SMR) drives
US9965390B2 (en) * 2015-11-17 2018-05-08 International Business Machines Corporation Reducing defragmentation in a multi-grained writeback cache

Also Published As

Publication number Publication date
WO2023061569A1 (en) 2023-04-20

Similar Documents

Publication Publication Date Title
US11797185B2 (en) Solid-state drive control device and learning-based solid-state drive data access method
Jo et al. FAB: Flash-aware buffer management policy for portable media players
Desnoyers Analytic modeling of SSD write performance
CN109558084B (en) Data processing method and related equipment
Kim et al. What is a good buffer cache replacement scheme for mobile flash storage?
Niu et al. Hybrid storage systems: A survey of architectures and algorithms
Lee et al. ActiveSort: Efficient external sorting using active SSDs in the MapReduce framework
Ramasamy et al. RFFE: A buffer cache management algorithm for flash-memory-based SSD to improve write performance
Xie et al. ASA-FTL: An adaptive separation aware flash translation layer for solid state drives
Bhimani et al. FIOS: Feature based I/O stream identification for improving endurance of multi-stream SSDs
Chang et al. An adaptive, low-cost wear-leveling algorithm for multichannel solid-state disks
Yang et al. CacheSack: Admission Optimization for Google Datacenter Flash Caches
CN117235088B (en) Cache updating method, device, equipment, medium and platform of storage system
Fevgas et al. LB-Grid: An SSD efficient grid file
Chang et al. Profit data caching and hybrid disk‐aware Completely Fair Queuing scheduling algorithms for hybrid disks
Lange et al. Offline and Online Algorithms for SSD Management
CN117677941A (en) Data compression and deduplication aware layering in a storage system
Liang et al. Mitigating write amplification issue of SMR drives via the design of sequential-write-constrained cache
CN112445425A (en) Multi-tier storage
Lee et al. Characterization of Memory Access in Deep Learning and Its Implications in Memory Management.
Yang et al. CacheSack: Theory and experience of Google's admission optimization for datacenter flash caches
Wong et al. Baleen: ML Admission & Prefetching for Flash Caches
CN118119932A (en) Cache elimination based on current hierarchical state
US11132128B2 (en) Systems and methods for data placement in container-based storage systems
CN117916726A (en) Intelligent defragmentation of data storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination