WO2022233391A1 - Intelligent data placement on a hierarchical storage arrangement - Google Patents
Intelligent data placement on a hierarchical storage arrangement
- Publication number
- WO2022233391A1 (application PCT/EP2021/061606)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- storage device
- data
- data storage
- lower tier
- prefetching
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/60—Details of cache memory
- G06F2212/6026—Prefetching based on access pattern detection, e.g. stride based prefetch
Definitions
- the present disclosure, in some embodiments thereof, relates to hierarchical storage management and, more specifically, but not exclusively, to improving read access for data objects stored on a hierarchical storage system.
- Hierarchical storage systems are managed by automatically moving data between low-cost, slow storage media and high-cost, fast storage media. While it would be ideal to store all data on fast storage media that provide fast read access, in practice such a solution is expensive. Most of the data is stored on the lower-cost but slower storage media. Some data is moved to the higher-cost but faster storage media, with the goal of optimizing the tradeoff between access time and storage cost. The higher-cost but faster storage media may serve as a cache for the lower-cost but slower storage media.
- a computing device for hierarchical storage management is configured for: monitoring a plurality of prefetching patterns, by a prefetching process, of a certain data object stored as a plurality of data components that are located non-sequentially on a lower tier data storage device, wherein the data components that are located non-sequentially are placed on a higher tier data storage device, and writing the certain data object in at least two duplicated sets of sequential locations on the lower tier data storage device, by sequentially writing duplicates of the plurality of data components according to the plurality of prefetching patterns.
- a method of hierarchical storage management comprises: monitoring a plurality of prefetching patterns, by a prefetching process, of a certain data object stored as a plurality of data components that are located non-sequentially on a lower tier data storage device, wherein the data components that are located non-sequentially are placed on a higher tier data storage device, and writing the certain data object in at least two duplicated sets of sequential locations on the lower tier data storage device, by sequentially writing duplicates of the plurality of data components according to the plurality of prefetching patterns.
- a non-transitory medium storing program instructions which, when executed by a processor, cause the processor to: monitor a plurality of prefetching patterns, by a prefetching process, of a certain data object stored as a plurality of data components that are located non-sequentially on a lower tier data storage device, wherein the data components that are located non-sequentially are placed on a higher tier data storage device, and write the certain data object in at least two duplicated sets of sequential locations on the lower tier data storage device, by sequentially writing duplicates of the plurality of data components according to the plurality of prefetching patterns.
- the same data object may be stored on the lower tier data storage device in multiple data components that are non-sequential. Reading the data object incurs a long delay, due to the long random access time of the lower tier data storage device in reading each data component. Sequentially writing the non-sequential data components on the lower tier data storage device reduces the read accesses and read delay, due to the fast ability of the lower tier data storage device to read sequentially located data.
- the same data object may have multiple prefetching patterns associated with it; for example, a table of data may be read by rows or by columns. Sequentially storing the multiple non-sequentially located data components of the same data object according to the multiple prefetching patterns reduces read times and/or improves performance of the lower tier data storage device when any of the prefetching patterns is accessed.
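The row/column example above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: one table object is duplicated into two sequential on-disk layouts, one per observed prefetching pattern, so either access pattern becomes a single sequential scan. All names are hypothetical.

```python
def sequential_layouts(table):
    """Return duplicated sequential layouts for row-order and column-order access."""
    rows = len(table)
    cols = len(table[0])
    # Duplicate 1: cells in row-major order, for row-wise prefetching patterns.
    row_order = [table[r][c] for r in range(rows) for c in range(cols)]
    # Duplicate 2: cells in column-major order, for column-wise prefetching patterns.
    col_order = [table[r][c] for c in range(cols) for r in range(rows)]
    return {"by_row": row_order, "by_column": col_order}
```

Both duplicates would be written contiguously on the lower tier, at the cost of storing the table twice.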
- the lower tier data storage device stores at least two copies of the same data components of the same data object.
- the multiple copies of data enable fast sequential read access, at the cost of additional storage space, which may be cheaper on the lower tier data storage device than on the higher tier data storage device.
- a number of copies of sets of sequential locations of the certain object written to the lower tier data storage device is correlated with a cost factor computed as cost of the higher tier storage device per unit of memory divided by cost of the lower tier storage device per unit of memory.
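The cost-factor rule above can be sketched as a small heuristic. The clamping bounds and price figures are assumptions for illustration, not taken from the disclosure; prices are plain per-terabyte numbers.

```python
def copy_count_from_cost(cost_high_per_tb, cost_low_per_tb, max_copies=8):
    """Correlate the number of sequential duplicates with the tier cost ratio."""
    cost_factor = cost_high_per_tb / cost_low_per_tb
    # A cheap lower tier (large cost factor) justifies more duplicates;
    # clamp to at least one copy and an assumed upper bound.
    return max(1, min(max_copies, round(cost_factor)))
```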
- a number of copies of sets of sequential locations of the certain object written to the lower tier data storage device is determined according to available storage on the lower tier storage device relative to available storage on the higher tier storage device.
- Data storage utilization may be optimized according to the amount of available space on the lower tier and/or higher tier storage devices.
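A minimal sketch of the availability rule above; the thresholds and the cap are illustrative assumptions.

```python
def copy_count_from_free_space(free_lower, free_higher, object_size, cap=4):
    """Scale duplicates by free lower-tier space relative to the higher tier."""
    if free_lower < 2 * object_size:
        return 1  # not enough room for even two sequential copies
    ratio = free_lower / max(free_higher, 1)
    return max(1, min(cap, int(ratio)))
```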
- a number of copies of sets of sequential locations of the certain object written to the lower tier data storage device is determined according to a number of read accesses versus a number of write accesses, wherein the number of copies is increased when the number of read accesses is increased relative to the number of write accesses.
- Access performance is increased when data that is read heavy is stored multiple times, indicating that the number of copies may be increased. Access performance is not necessarily improved when data that is write heavy is stored multiple times, indicating that the number of copies may be reduced.
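The read/write heuristic above can be sketched like this; the ratio thresholds are illustrative assumptions.

```python
def adjust_copy_count(current, reads, writes, cap=8):
    """More duplicates for read-heavy data, fewer for write-heavy data."""
    if reads > 4 * writes:   # read heavy: extra duplicates speed up reads
        return min(cap, current + 1)
    if writes > reads:       # write heavy: every duplicate must be rewritten
        return max(1, current - 1)
    return current
```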
- a certain data component mapped to at least two prefetching patterns is written to the lower tier data storage device at least two times.
- Access performance is increased by storing multiple copies of the same data component when mapped to different prefetching patterns.
- the computing device is further configured to receive a request to read the certain data object according to one of the plurality of prefetching patterns, and sequentially read the plurality of data components corresponding to that prefetching pattern from the lower tier data storage device.
- Performance in reading the certain data object which has been stored sequentially on the lower tier data storage device is high, due to the fast sequential reading ability of the lower tier data storage device.
- the prefetching process computes probability of each of a plurality of candidate subsequent data components being accessed given a current data component being accessed, and prefetches the subsequent data component having highest probability when the current data component is being accessed.
- the prefetching process computes the probability of the prefetching pattern fetching each of multiple candidate components.
- the multiple candidate components with highest accuracy of prefetching are selected for being sequentially written on the lower tier data storage device.
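One plausible realization of the probability computation described above is a first-order transition model over observed accesses; the disclosure does not prescribe this particular model, so treat it as a sketch.

```python
from collections import defaultdict

class NextComponentModel:
    """Count observed transitions and predict the most probable successor."""

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def observe(self, current, nxt):
        """Record that `nxt` was accessed right after `current`."""
        self._counts[current][nxt] += 1

    def probability(self, current, candidate):
        """P(candidate is next | current is being accessed)."""
        total = sum(self._counts[current].values())
        return self._counts[current][candidate] / total if total else 0.0

    def predict(self, current):
        """Candidate with the highest probability, to be prefetched."""
        successors = self._counts[current]
        return max(successors, key=successors.get) if successors else None
```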
- implementations of the first, second, and third aspects are further configured for: computing the accuracy of each of the plurality of prefetching patterns, and selecting at least two of the plurality of prefetching patterns having accuracy above a threshold, wherein the certain data object is written in at least two sets of sequential locations on the lower tier data storage device, by sequentially writing the plurality of data components according to the selected at least two prefetching patterns.
- Two or more prefetching patterns with highest accuracy are selected, since such prefetching patterns are most likely to be re-selected in the future.
- the data components of the data object are written at multiple sequential locations according to the two or more selected prefetching patterns, providing fast read times for the two or more most common prefetching patterns.
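The selection step above reduces to keeping the most accurate patterns that clear a threshold; the threshold value and the count of patterns kept are assumed for illustration.

```python
def select_patterns(accuracy_by_pattern, threshold=0.8, k=2):
    """Pick up to k prefetching patterns whose accuracy clears the threshold."""
    qualifying = [(p, a) for p, a in accuracy_by_pattern.items() if a >= threshold]
    qualifying.sort(key=lambda item: item[1], reverse=True)  # most accurate first
    return [pattern for pattern, _ in qualifying[:k]]
```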
- the lower tier data storage device comprises a hard disk drive (HDD) and the higher tier data storage device comprises a solid state drive (SSD).
- the access performance of the HDD is improved using the tradeoff of requiring additional storage space (i.e., for writing the certain data object in multiple sequential copies) to obtain faster sequential access times, in comparison to SSD which provides fast random access times but is more expensive for the same amount of storage space as the HDD.
- the computing device is further configured for writing an initially accessed data component of the certain data object that triggers the prefetching pattern on the higher tier data storage device, wherein the prefetching pattern for fetching subsequent data components of the certain data object sequentially stored on the lower tier data storage device is triggered by access to the initially accessed data component stored on the higher tier data storage device.
- Access performance is further improved by storing the initial data component on the higher tier data storage device, which provides fast random access.
- Access to the initial data component triggers sequential prefetch of the subsequent components, which are stored sequentially on the lower tier data storage device, taking advantage of the fast sequential access ability of the lower tier data storage device.
- the prefetch is done in parallel to the access to the initial data component, which improves access performance.
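The trigger arrangement described above can be sketched as follows; the tiers are modeled here as plain dicts, which is an illustrative simplification.

```python
def read_object(obj_id, higher_tier, lower_tier_sequential):
    """Serve the first component from the fast tier; its access triggers one
    sequential scan of the remaining components on the slow tier."""
    first = higher_tier[obj_id]               # fast random access
    rest = lower_tier_sequential[obj_id][1:]  # triggered sequential prefetch
    return [first] + rest
```

In a real system the sequential scan would run in parallel with serving the first component, hiding the lower tier's latency.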
- monitoring prefetching patterns comprises monitoring prefetching patterns triggered in response to a plurality of client terminals accessing objects stored on at least one storage server over a network.
- Access patterns of data objects stored on a storage server(s) accessed by client terminals over a network are monitored to identify the data objects most commonly accessed by the client terminals.
- Sequentially storing data objects on the lower tier data storage device according to different access patterns of different users improves performance for the client terminals, for example, interactive experience of users using the client terminals to access the objects is improved, by reducing wait times for the users waiting for the object to load.
- different access patterns of the different users are used to store different sequential copies of the database, enabling substantially immediate retrieval of the data, rather than having the users wait for data to be loaded from the lower tier data storage device, where standard practice would store it non-sequentially.
- the computing device is further configured for computing an aging parameter for each respective prefetching pattern, the aging parameter indicating a maximum elapsed time between accesses of the respective prefetching pattern, and removing, and/or designating for overwriting, a sequential copy of the respective prefetching pattern stored on the lower tier data storage device when the aging parameter is exceeded.
- Performance of the lower tier data storage device is improved by removing sequential copies of prefetching patterns that are infrequently accessed.
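The aging rule above can be sketched as a simple reclamation check; timestamps are plain numbers of seconds and the window is shared across patterns, both of which are assumptions.

```python
def reclaimable_copies(last_access, now, aging_window):
    """Sequential copies whose pattern has not been accessed within the window
    and may therefore be removed or marked for overwriting."""
    return sorted(p for p, t in last_access.items() if now - t > aging_window)
```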
- FIG. 1 is a flowchart of a method of management of a hierarchical storage by writing a certain data object stored non-sequentially on a lower tier data storage device as two or more duplicated sets of sequential locations on the lower tier data storage device, in accordance with some embodiments;
- FIG. 2 is a block diagram of components of a system for management of a hierarchical storage by writing a certain data object stored non-sequentially on a lower tier data storage device as two or more duplicated sets of sequential locations on the lower tier data storage device, in accordance with some embodiments.
- An aspect of some embodiments relates to systems, methods, a computing device and/or apparatus, and/or computer program product (storing code instructions executable by one or more processors) for management of a hierarchical storage system that includes a lower tier storage device which has slow random access times (but fast access for sequentially stored data) and a higher tier storage device which has fast random access times.
- Prefetching patterns by a prefetching process, of a data object stored as multiple non-sequential components on the lower tier data storage device, are monitored.
- the non-sequential data components are placed on the higher tier data storage device by the prefetching process to improve access times.
- the data object is written sequentially in at least two duplicated sets on the lower tier data storage device.
- Components of the data object are written sequentially to the lower tier data storage device according to the monitored prefetching patterns. Subsequently, the prefetching process accesses one or more of the sequentially written duplicates of the data object from the lower tier data storage device, which is faster than accessing the non-sequentially written components from the lower tier data storage device, and/or saves space on the higher tier data storage device by reducing or preventing copying of the data object (or a component thereof) from the lower to the higher tier data storage device.
- the lower tier data storage device may be implemented, for example, as a hard disk drive (HDD), and the higher tier data storage device may be implemented as, for example, a solid state drive (SSD).
- At least some implementations provided herein improve access times to data objects stored on a hierarchical storage system in comparison to other standard approaches.
- the storage system includes a lower tier storage device which has slow random access times (but fast access for sequentially stored data) and a higher tier storage device which has fast random access times.
- Due to the higher cost of the higher tier storage device in comparison to the lower tier storage device, large amounts of data are stored on the lower tier storage device rather than on the higher tier storage device.
- Different approaches have been proposed for improving the access times of the lower tier storage device to enable more data to be written to the lower tier storage device.
- in a log-based approach, writes are directed to a log device. In the log device, new data is written sequentially, to a new location.
- Data that was overwritten is marked.
- the log storage is divided into large blocks. Data is written sequentially to each large block. Periodically, a garbage collection process reads blocks where most of the data has been overwritten, rewrites the data that remains into a free block, and frees the blocks that were read.
- the main problem with the log-based device is that data is written in the order in which it arrives. Data arriving sequentially is written sequentially to the lower tier device, similar to a non-log system. However, data that arrives in a random order is written in its arrival order, which results in logically related data being scattered randomly on the lower tier data storage device. This means that read performance depends not on the read pattern, but on the write pattern.
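The problem described above can be shown with a tiny model: a log device places each block at its arrival position, so a logically sequential read of randomly arrived blocks becomes a random scan. This is an illustration, not any particular log device's layout.

```python
def log_placement(arrival_order):
    """Map each logical block to the on-disk position where the log wrote it."""
    return {block: pos for pos, block in enumerate(arrival_order)}
```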
- access times to data objects stored on the hierarchical storage system are improved by writing a certain data object (which is stored as multiple data components located non-sequentially on a lower tier data storage device) in two or more duplicated sets of sequential locations on the lower tier data storage device.
- the data components are sequentially written as duplicates according to multiple monitored prefetching patterns.
- the relatively large amount of data storage space available on the lower tier data storage device is utilized to store multiple sequential copies of the data object which is originally stored non-sequentially, according to monitored prefetching patterns.
- Subsequent prefetching (following one or more of the monitored prefetching patterns) of the data object achieves fast access times due to the sequentially stored duplicates of the data components of the data object.
- the present disclosure may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- FIG. 1 is a flowchart of a method of management of a hierarchical storage by writing a certain data object stored non-sequentially on a lower tier data storage device as two or more duplicated sets of sequential locations on the lower tier data storage device, in accordance with some embodiments.
- FIG. 2 is a block diagram of components of a system 200 for management of a hierarchical storage 208 by writing a certain data object stored non-sequentially on a lower tier data storage device 210 as two or more duplicated sets of sequential locations on the lower tier data storage device 210, in accordance with some embodiments.
- System 200 may implement the acts of the method described with reference to FIG. 1.
- Computing device 204 manages a hierarchical storage 208 that includes at least a lower tier data storage device 210 and a higher tier data storage device 212.
- Computing device 204 may use a prefetching process 206B (e.g., stored on memory 206, executed by processor(s) 202) for prefetching data from lower tier data storage device 210, as described herein.
- the prefetching process 206B may predict the location of the next data component before it is requested, and fetch it ahead of the request.
- the prefetching may be performed from lower tier data storage device 210, saving room on higher tier data storage device 212.
- Some prefetching processes 206B are designed to predict locations of data components that are non-sequentially located, for example, located in a striding pattern (e.g., increase by a fixed address location relative to the previous address location) and/or in a constant address pattern that may at first appear to be random.
- Lower tier data storage device 210 has relatively slower random access input/output (IO) (e.g., read) times in comparison to higher tier data storage device 212.
- Higher tier data storage device 212 has relatively faster random I/O (e.g., read and/or write) times in comparison to lower tier data storage device 210.
- Lower tier data storage device 210 may cost less (e.g., per megabyte) in comparison to higher tier data storage device 212.
- Lower tier data storage device 210 may be implemented, for example, as a hard disk drive (HDD). Lower tier data storage device 210 may provide fast sequential reading and/or writing, but has poor performance for random I/O as the seek times may be very high (e.g., up to 10 milliseconds).
- Higher tier data storage device 212 may be implemented, for example, as a solid state drive (SSD), and/or phase-change memory (PCM).
- Higher tier data storage device 212 may serve as a cache and/or a tier (e.g., cache when data is volatile and has a copy in the lower tier, and/or tier when the data is nonvolatile and/or may be kept (e.g., only) in the higher tier) for lower tier data storage device 210.
- Hierarchical storage 208 is in communication with a computing system 214, which stores data on hierarchical storage 208 and/or reads data stored on hierarchical storage 208.
- Hierarchical storage 208 may be integrated within computing system 214, and/or may be implemented as an external storage device.
- Computing system 214 may be indirectly connected to hierarchical storage 208 via computing device 204, i.e., computing system 214 may communicate with computing device 204, where computing device 204 communicates with hierarchical storage 208, rather than computing system 214 directly communicating with hierarchical storage 208.
- Computing system 214 and/or computing device 204 may be implemented as, for example, one of more of: a computing cloud, a cloud network, a computer network, a virtual machine(s) (e.g., hypervisor, virtual server), a network node (e.g., switch, a virtual network, a router, a virtual router), a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a web server, a storage server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
- hierarchical storage 208 is used exclusively by a single user, such as computing system 214.
- hierarchical storage 208 is used by multiple users such as multiple client terminals 216 accessing hierarchical storage 208 over a network 218, for example, computing system 214 provides cloud storage services and/or virtual storage services to client terminals 216.
- Computing device 204 may be implemented as, for example, integrated within hierarchical storage 208 (e.g., as hardware and/or software installed within hierarchical storage 208), integrated within computing system 214 (e.g., as hardware and/or software installed within computing system 214, such as an accelerator chip and/or code stored on a memory of computing system 214 and executed by processor of computing system 214), and/or as an external component (e.g., implemented as hardware and/or software) in communication with hierarchical storage 208, such as a plug-in component.
- hierarchical storage 208 and computing device 204 are implemented as one storage system that exposes storage (e.g., functions, features, capabilities) to computing system(s) 214.
- Computing device 204 includes one or more processor(s) 202, implemented as for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators.
- processor(s) 202 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogeneous and/or heterogeneous processor architectures). It is noted that processor(s) 202 may be designed to implement in hardware one or more features stored as code instructions 206A and/or 206B.
- Memory 206 stores code instructions implementable by processor(s) 202, for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
- Memory 206 may store code 206A that, when executed by processor(s) 202, implements one or more acts of the method described with reference to FIG. 1, and/or store prefetching process 206B code as described herein.
- Computing device 204 may include a data storage device 220 for storing data, for example, monitored prefetching patterns as described herein.
- Data storage device 220 may be implemented as, for example, a memory, a local hard-drive, a removable storage unit, an optical disk, a storage device, and/or as a remote server and/or computing cloud (e.g., accessed using a network connection).
- code instructions executable by processor(s) 202 may be stored in data storage device 220, for example, with executing portions loaded into memory 206 for execution by processor(s) 202.
- Computing device 204 may be in communication with a user interface 222 that presents data to a user and/or includes a mechanism for entry of data, for example, one or more of: a touch-screen, a display, a keyboard, a mouse, voice activated software, and a microphone.
- Network 218 may be implemented as, for example, the Internet, a local area network, a virtual private network, a virtual public network, a wireless network, a cellular network, a local bus, a point-to-point link (e.g., wired), and/or combinations of the aforementioned.
- one or more prefetching patterns by one or more prefetching processes are monitored.
- the prefetching process prefetches a data object stored as multiple data components that are located non-sequentially on a lower tier data storage device. The prefetching process may also prefetch sequentially stored data, since accessing data randomly is slower than reading it sequentially.
- the prefetching pattern indicates the order in which the components of the data object, which may be stored non-sequentially on the lower tier data storage device, are read by the prefetching process.
- the prefetching pattern may be, for example, one or a combination of: sequential, stride (i.e., increasing by a fixed step each time), and/or random.
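These pattern classes can be illustrated with a small classifier over observed access offsets; a minimal sketch with illustrative function and variable names, not part of the claimed method:

```python
def classify_pattern(offsets):
    """Classify a sequence of accessed offsets as 'sequential',
    'stride', or 'random' (a simplified illustration)."""
    if len(offsets) < 2:
        return "sequential"
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    if all(d == 1 for d in deltas):
        return "sequential"   # advance by one unit each access
    if len(set(deltas)) == 1:
        return "stride"       # advance by a fixed step each access
    return "random"

print(classify_pattern([4, 5, 6, 7]))     # sequential
print(classify_pattern([0, 10, 20, 30]))  # stride
print(classify_pattern([3, 17, 2, 40]))   # random
```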
- the prefetching process places the prefetched data components (that are located non-sequentially on the lower tier data storage device) on a higher tier data storage device when the data component is not already stored on the higher tier data storage device.
- the same data object may be stored on the lower tier data storage device as multiple data components that are non-sequential. Reading the data object incurs a long delay, due to the long random access time of the lower tier data storage device in reading each data component. Sequentially writing the non-sequential data components on the lower tier data storage device reduces the read accesses and the read delay, due to the fast ability of the lower tier data storage device to read sequentially located data.
- the same data object may have multiple prefetching patterns to it, for example, a table of data may be read in rows or columns. Sequentially storing the multiple non-sequentially located data components of the same data object according to the multiple prefetching patterns reduces read times and/or improves performance of the lower tier data storage device when any of the fetching patterns are accessed.
- the prefetching pattern(s) are triggered in response to a single user and/or single client terminal (e.g., processor thereof) accessing data objects stored in data storage devices associated with the single client terminal (e.g., HDD and/or SSD of the client terminal).
- multiple prefetching patterns are triggered in response to multiple client terminals accessing data objects stored on one or more storage servers over a network.
- the multiple prefetching patterns may be monitored. Access patterns of data objects stored on a storage server(s) accessed by client terminals over a network may be monitored to identify the data objects most commonly accessed by the client terminals.
- Sequentially storing data objects on the lower tier data storage device according to the different access patterns of different users improves performance for the client terminals; for example, the interactive experience of users using the client terminals to access the objects is improved by reducing the time users wait for an object to load.
- different access patterns of the different users are used to store different sequential copies of the database, enabling the data to be obtained substantially immediately, rather than having users wait for data to be loaded from the lower tier data storage device where it would be stored non-sequentially using standard practice.
- the accuracy of the prefetching patterns (e.g., of each prefetching pattern) in predicting the data component to be prefetched may be computed.
- the prefetching pattern may be predicted as described herein with reference to a believe cache process discussed below.
- the term believe cache relates to a prefetch cache that predicts next location(s) which are not necessarily sequential.
- the accuracy may be computed as the percentage of prefetching attempts in which the prefetching pattern prefetched the correct component, relative to all prefetching attempts, including attempts where the prefetching pattern was unable to prefetch the correct component.
- Two or more fetching patterns having accuracy above a threshold may be selected.
- the threshold may be, for example, 20%, 25%, 30%, 40%, 45%, 50%, or other values.
- Two or more prefetching patterns with highest accuracy are selected, since such prefetching patterns are most likely to be re-selected in the future.
- the data components of the data object are written at multiple sequential locations according to the two or more selected prefetching patterns, providing fast read times for the two or more most common prefetching patterns.
- the data object is then written in two or more sets of sequential locations on the lower tier data storage device, as described below with reference to 104, by sequentially writing the data components according to the selected fetching patterns.
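The accuracy computation and threshold-based selection described above can be sketched as follows; the function name, the statistics layout, and the example numbers are illustrative assumptions:

```python
def select_patterns(pattern_stats, threshold=0.4, k=2):
    """pattern_stats: {pattern_name: (correct_prefetches, total_attempts)}.
    Keep patterns whose accuracy exceeds the threshold, then take the
    k most accurate ones for sequential duplication."""
    accuracy = {
        name: (correct / total if total else 0.0)
        for name, (correct, total) in pattern_stats.items()
    }
    eligible = [name for name, acc in accuracy.items() if acc > threshold]
    eligible.sort(key=lambda name: accuracy[name], reverse=True)
    return eligible[:k]

stats = {"row-wise": (80, 100), "column-wise": (55, 100), "diagonal": (10, 100)}
print(select_patterns(stats))  # ['row-wise', 'column-wise']
```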
- the prefetching process is based on computing conditional probabilities of a next access (e.g., read) location based on a current access (e.g., read) location, sometimes referred to as believe cache prefetching.
- the prefetching process (e.g., believe cache prefetching) computes probability of each of multiple candidate subsequent data components being accessed given a current data component being accessed, and prefetches the subsequent data component having highest probability when the current data component is being accessed.
- the prefetching process computes the probability of the prefetching pattern fetching each of multiple candidate components.
- the multiple candidate components with highest accuracy of prefetching are selected for being sequentially written on the lower tier data storage device.
- the data may be prefetched from the next access location when the conditional probability is above a threshold.
- believe cache prefetching may be used, for example, when access to data storage is non-sequential but follows a repeatable pattern, for example, striding access (i.e., each access increases the address by a fixed amount relative to the current access), and/or another repeatable pattern which may at first appear to be random.
- the next location to be accessed is computed based on the current and/or previous locations that were accessed, based on absolute address locations and/or relative address locations. An exemplary computation is now described:
- when a first location (denoted A) is accessed, the following memory locations are subsequently accessed multiple times: a second location (denoted X) is accessed 10 times, a third location (denoted Y) is accessed 3 times, and a fourth location (denoted Z) is accessed 5 times.
- when a fifth location (denoted B) is accessed, the following memory locations are subsequently accessed multiple times: the second location (denoted X) is accessed 6 times, the third location (denoted Y) is accessed 2 times, the fourth location (denoted Z) is accessed 4 times, and a sixth location (denoted K) is accessed 7 times.
- Conditional probabilities are calculated as follows (e.g., P(X|A) = 10/(10+3+5) = 10/18, P(Y|A) = 3/18, P(Z|A) = 5/18, P(X|B) = 6/(6+2+4+7) = 6/19, P(Y|B) = 2/19, P(Z|B) = 4/19, P(K|B) = 7/19):
- the recommendation for which data location to prefetch from may be computed by calculating the candidate probability of each of the following locations: X, Y, Z, K:
- the probabilities are sorted to rank the most likely next locations from where prefetching of data is obtained.
- One or more prefetch locations may be accessed, for example, a single prefetch, two prefetches, or more, and/or a number selected according to a threshold.
- the first prefetch is from location X.
- the second prefetch is from location Z.
- the third prefetch is from location K. If a threshold of 50% is used, data is prefetched from locations X and Z.
- Prefetch locations (i.e., X, Y, Z, K) may be referred to as candidates. Current access locations (i.e., A, B) may be referred to as voters.
- the relationship between the current and next locations may be presented in a matrix, which may be referred to as a relation matrix, for example, as below (e.g., curHis: A B)
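One plausible reading of the voter/candidate computation is sketched below: a relation matrix counts how often each candidate follows each voter, and candidates are ranked by the sum of per-voter conditional probabilities. The class and method names are illustrative, and the exact vote-combination rule used by the patent may differ; with the counts from the example above, this ranking reproduces the order X, Z, K:

```python
from collections import defaultdict

class BelieveCache:
    """Sketch of believe cache prediction: a relation matrix counts
    how often each candidate location follows each current (voter)
    location; candidates are ranked by summed conditional
    probabilities over the recent access history."""

    def __init__(self):
        self.relation = defaultdict(lambda: defaultdict(int))

    def observe(self, current, following, times=1):
        self.relation[current][following] += times

    def rank_candidates(self, history):
        votes = defaultdict(float)
        for voter in history:
            row = self.relation[voter]
            total = sum(row.values())
            for candidate, count in row.items():
                votes[candidate] += count / total  # conditional probability
        return sorted(votes, key=votes.get, reverse=True)

bc = BelieveCache()
# Counts from the example: after A, X/Y/Z were accessed 10/3/5 times;
# after B, X/Y/Z/K were accessed 6/2/4/7 times.
bc.observe("A", "X", 10); bc.observe("A", "Y", 3); bc.observe("A", "Z", 5)
bc.observe("B", "X", 6); bc.observe("B", "Y", 2)
bc.observe("B", "Z", 4); bc.observe("B", "K", 7)
print(bc.rank_candidates(["A", "B"]))  # ['X', 'Z', 'K', 'Y']
```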
- the data object is written in two or more duplicated sets of sequential locations on the lower tier data storage device.
- the duplicates of the data components of the data object are sequentially written on the lower tier data storage device according to the fetching patterns.
- the multiple copies of data enable providing fast sequential reading access, at a tradeoff of additional storage space, which may be cheaper for the lower tier data storage device in comparison to the higher tier data storage device.
- the data object is represented as data components x1, x2, x3, x4, ..., x_n.
- a first prefetching pattern reads the data sequentially.
- a second prefetching pattern reads the data in steps of 10.
- the data object is written twice, according to the first and second prefetching patterns.
- the first duplicate of the data object according to the first prefetching pattern is written as x1, x2, x3, x4, ..., x_n.
- the second duplicate of the data object according to the second prefetching pattern is written as x1, x11, x21, ..., x_(10*[n/10]+1), x2, x12, x22, ..., x_(10*[n/10]+2), ..., x10, x20, x30, ..., x_(10*[n/10]+10).
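The two duplicate layouts can be sketched as follows, assuming a stride step of 10 for the second prefetching pattern; function names are illustrative:

```python
def sequential_layout(n):
    """First duplicate: components written in natural order."""
    return [f"x{i}" for i in range(1, n + 1)]

def stride_layout(n, step=10):
    """Second duplicate: components written so that a stride-`step`
    read pattern becomes a sequential read on disk."""
    layout = []
    for start in range(1, step + 1):
        layout.extend(f"x{i}" for i in range(start, n + 1, step))
    return layout

print(stride_layout(25, step=10)[:6])
# ['x1', 'x11', 'x21', 'x2', 'x12', 'x22']
```

Both layouts contain exactly the same components; only their on-disk order differs, which is what trades extra lower tier space for fast sequential reads.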
- the lower tier data storage device stores two or more copies of the same data components of the same data object.
- a data component mapped to two or more prefetching patterns may be written to the lower tier data storage device two or more times. Access performance is increased by storing multiple copies of the same data component when mapped to different prefetching patterns.
- the number of copies of the data object, i.e., written in sets of sequential locations to the lower tier data storage device, may be selected according to one or more of:
- Fetching patterns having accuracy (e.g., correct fetching probability) over the threshold. Fetching patterns below the threshold may occur too rarely to justify additional space on the lower tier data storage device; fetching patterns above the threshold may occur repeatedly enough to justify the additional space.
- a cost factor computed as cost of the higher tier storage device per unit of memory divided by cost of the lower tier storage device per unit of memory.
- data storage utilization and access times may be optimized. For example, when the price ratio of SSD (i.e., higher tier data storage device) to HDD (i.e., lower tier) is 2.5 (i.e., SSD costs 2.5 times as much as HDD), storing the data object twice on the HDD provides an optimal tradeoff between cost and performance.
- Number of read accesses versus number of write accesses. The number of copies may be increased when the number of read accesses is large relative to the number of write accesses. Access performance is increased when read-heavy data is stored multiple times, indicating that the number of copies may be increased; access performance is not necessarily improved when write-heavy data is stored multiple times, indicating that the number of copies may be reduced.
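A heuristic combining the cost factor with the read/write ratio might be sketched as follows; this is an illustrative rule under the stated assumptions, not the patent's exact selection logic:

```python
def choose_copy_count(ssd_cost_per_gb, hdd_cost_per_gb,
                      reads, writes, max_copies=4):
    """Heuristic sketch: allow roughly as many HDD copies as the
    SSD/HDD cost factor justifies, and scale back when the workload
    is write-heavy (duplicates then cost more than they save)."""
    cost_factor = ssd_cost_per_gb / hdd_cost_per_gb
    copies = int(cost_factor)      # e.g., factor 2.5 -> 2 copies
    if writes > reads:             # write-heavy: reduce duplication
        copies = 1
    return max(1, min(copies, max_copies))

print(choose_copy_count(0.10, 0.04, reads=900, writes=100))  # 2
```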
- a request to read the data object according to one of the fetching patterns is accessed (e.g., received).
- the data components corresponding to the fetching pattern of the request are sequentially read from the lower tier data storage device, from the sequentially created duplicate. Performance in reading the certain data object which has been stored sequentially on the lower tier data storage device is high, due to the fast sequential reading ability of the lower tier data storage device.
- An initially accessed data component of the data object that triggers the prefetching pattern on the lower tier data storage device may be written on the higher tier data storage device,
- the prefetching pattern for fetching subsequent data components of the certain data object sequentially stored on the lower tier data storage device is triggered by access to the initially accessed data component stored on the higher tier data storage device. Access performance is further improved by storing the initial data component on the higher tier data storage device, which provides fast random access. Access to the initial data component triggers sequential prefetch of the subsequent components, which are stored sequentially on the lower tier data storage device, taking advantage of the fast sequential access ability of the lower tier data storage device.
- the prefetch is done in parallel to the access to the initial data component, which improves access performance.
- an aging parameter for the fetching patterns may be computed.
- the aging parameter may indicate a maximum elapsed time between accesses of each respective fetching pattern.
- the aging parameter may indicate a rate of use of the prefetching pattern. Older prefetching patterns may become less used.
- the aging parameter may be set, for example, as a preconfigured setting, manually selected, automatically computed, based on amount of available storage on the lower tier data storage device, and/or relative costs between higher and lower tier data storage devices.
- the sequential copy of component(s) of the data object(s) corresponding to the respective fetching pattern, stored on the lower tier data storage device is/are removed and/or designated for being written over. Performance of the lower tier data storage device is improved by removing sequential copies of prefetching patterns that are infrequently accessed.
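The aging-based removal might be sketched as follows, assuming a last-access timestamp is tracked per fetching pattern; names and the timestamp representation are illustrative:

```python
import time

def evict_stale_copies(pattern_last_access, aging_seconds, now=None):
    """Return the fetching patterns whose sequential copies should be
    removed (or designated overwritable) because they have not been
    accessed within the aging window."""
    now = time.time() if now is None else now
    return [p for p, last in pattern_last_access.items()
            if now - last > aging_seconds]

last_access = {"row-wise": 1_000, "column-wise": 9_500}
print(evict_stale_copies(last_access, aging_seconds=5_000, now=10_000))
# ['row-wise']
```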
- one or more features described with respect to 102-108 may be iterated.
- composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
- a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
- a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
- the phrases “ranging/ranges between” a first indicate number and a second indicate number and “ranging/ranges from” a first indicate number “to” a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A computing device is disclosed for management of a tiered storage system that includes a lower tier storage device having slow random access times (but correspondingly fast access to sequentially stored data) and a higher tier storage device having fast random access times. Prefetching patterns, by a prefetching process, of a data object stored as multiple non-sequential components on the lower tier data storage device are monitored (102). The non-sequential data components are placed on the higher tier data storage device by the prefetching process to improve access times. The data object is written sequentially in at least two duplicated sets on the lower tier data storage device (104). The components of the data object are written sequentially to the lower tier data storage device according to the monitored prefetching patterns.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180097244.4A CN117242439A (zh) | 2021-05-04 | 2021-05-04 | 分层存储上的智能数据放置 |
PCT/EP2021/061606 WO2022233391A1 (fr) | 2021-05-04 | 2021-05-04 | Placement intelligent de données sur un ensemble de stockage hiérarchique |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2021/061606 WO2022233391A1 (fr) | 2021-05-04 | 2021-05-04 | Placement intelligent de données sur un ensemble de stockage hiérarchique |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022233391A1 true WO2022233391A1 (fr) | 2022-11-10 |
Family
ID=75825808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2021/061606 WO2022233391A1 (fr) | 2021-05-04 | 2021-05-04 | Placement intelligent de données sur un ensemble de stockage hiérarchique |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117242439A (fr) |
WO (1) | WO2022233391A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8607016B2 (en) * | 2004-07-21 | 2013-12-10 | Sandisk Technologies Inc. | FAT analysis for optimized sequential cluster management |
CN104615548A (zh) * | 2010-03-29 | 2015-05-13 | 威盛电子股份有限公司 | 数据预取方法以及微处理器 |
US9152567B2 (en) * | 2013-02-12 | 2015-10-06 | International Business Machines Corporation | Cache prefetching based on non-sequential lagging cache affinity |
2021
- 2021-05-04 CN CN202180097244.4A patent/CN117242439A/zh active Pending
- 2021-05-04 WO PCT/EP2021/061606 patent/WO2022233391A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN117242439A (zh) | 2023-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11797185B2 (en) | Solid-state drive control device and learning-based solid-state drive data access method | |
US9652374B2 (en) | Sparsity-driven matrix representation to optimize operational and storage efficiency | |
CN101937331B (zh) | 用于自适应处理远程原子执行的方法、设备和系统 | |
US8380680B2 (en) | Piecemeal list prefetch | |
CN104756090B (zh) | 提供扩展的缓存替换状态信息 | |
US11513801B2 (en) | Controlling accesses to a branch prediction unit for sequences of fetch groups | |
US10685002B2 (en) | Radix sort acceleration using custom asic | |
US10387340B1 (en) | Managing a nonvolatile medium based on read latencies | |
CN109308191B (zh) | 分支预测方法及装置 | |
CN113625973B (zh) | 数据写入方法、装置、电子设备及计算机可读存储介质 | |
CN111324556B (zh) | 用于将预定数目的数据项预取到高速缓存的方法和系统 | |
TW201941197A (zh) | 混合式記憶體系統 | |
US8688946B2 (en) | Selecting an auxiliary storage medium for writing data of real storage pages | |
CN115756312A (zh) | 数据访问系统、数据访问方法和存储介质 | |
US20180349058A1 (en) | Buffer-based update of state data | |
JP2014220021A (ja) | 情報処理装置、制御回路、制御プログラム、および制御方法 | |
US10719441B1 (en) | Using predictions of outcomes of cache memory access requests for controlling whether a request generator sends memory access requests to a memory in parallel with cache memory access requests | |
US9223714B2 (en) | Instruction boundary prediction for variable length instruction set | |
WO2023083454A1 (fr) | Compression de données et hiérarchisation sensible à la déduplication dans un système de stockage | |
WO2022233391A1 (fr) | Placement intelligent de données sur un ensemble de stockage hiérarchique | |
WO2023061567A1 (fr) | Mémoire cache compressée en tant que niveau de mémoire cache | |
WO2023088535A1 (fr) | Éviction de mémoire cache sur la base d'un état de hiérarchisation en cours | |
WO2022248051A1 (fr) | Mise en cache intelligente de données se prêtant à une prélecture | |
WO2023061569A1 (fr) | Défragmentation intelligente d'un système de stockage de données | |
US11733893B2 (en) | Management of flash storage media |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21723867 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180097244.4 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21723867 Country of ref document: EP Kind code of ref document: A1 |