CN113741819A - Method and device for hierarchical storage of data - Google Patents

Method and device for hierarchical storage of data Download PDF

Info

Publication number
CN113741819A
Authority
CN
China
Prior art keywords
channel
storage
layer
channel list
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111081201.5A
Other languages
Chinese (zh)
Inventor
张浩
杨俊
卢冕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202111081201.5A priority Critical patent/CN113741819A/en
Publication of CN113741819A publication Critical patent/CN113741819A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and apparatus for hierarchical storage of data. The method is applied to a hierarchical storage structure provided with at least two storage layers whose read-write performance decreases from top to bottom and which include a current layer and a next layer directly adjacent to the current layer. The method comprises executing a scheduling thread, wherein if the usage of the current layer is equal to or greater than a first threshold, a first migration operation is performed, which migrates the foremost at least one channel of a first channel list to the next layer and appends it to a second channel list; and if the usage of the current layer is less than the first threshold, a second migration operation is performed, which migrates the rearmost at least one channel of the second channel list to the current layer and inserts it at the front of the first channel list. The first channel list represents the channels included in the current layer, and the second channel list represents the channels included in the next layer.

Description

Method and device for hierarchical storage of data
Technical Field
The present disclosure relates to the field of data storage, and more particularly, to a method and an apparatus for hierarchical storage of data.
Background
Hierarchical data storage is a strategy by which a system stores data across tiers of differing storage performance within a computer architecture. A typical hierarchical storage has two storage layers: an upper layer with higher speed and lower latency than the lower layer, but with a smaller capacity because of cost and other factors. The central processor is comparatively fast, and most workloads require access (e.g., reading or writing) to storage. In the prior art, the storage devices used for hierarchical storage are traditional external storage devices, such as HDDs or SSDs arranged in a two-level architecture, so storage performance is strongly limited by the storage media used. In addition, in the prior art data is usually written to the upper layer and the lower layer simultaneously, which increases storage cost.
Disclosure of Invention
The present disclosure aims to provide a method and an apparatus for hierarchical storage of data.
According to one or more aspects of the present disclosure, a method for hierarchical storage of data is provided. The method is applied to a hierarchical storage structure provided with at least two storage layers whose read-write performance decreases from top to bottom, the at least two storage layers including a current layer and a next layer directly adjacent to the current layer. The method includes executing a scheduling thread. If the usage of the current layer is equal to or greater than a first threshold, a first migration operation is performed; if the usage of the current layer is less than the first threshold, a second migration operation is performed. The first migration operation migrates the foremost at least one channel of a first channel list to the next layer and appends it to a second channel list; the second migration operation migrates the rearmost at least one channel of the second channel list to the current layer and inserts it at the front of the first channel list. The first channel list represents channels included in the current layer, and the second channel list represents channels included in the next layer.
Optionally, the method may further include executing a channel creation operation. If the usage of the current layer is less than a second threshold, a channel is created in the current layer, the new channel is appended to the first channel list, and the storage state corresponding to the new channel is set to be located in the current layer; if the usage of the current layer is equal to or greater than the second threshold, a channel is created in the next layer, the new channel is appended to the second channel list, and the storage state corresponding to the new channel is set to be located in the next layer, where the second threshold is greater than or equal to the first threshold.
Optionally, the first migration operation may include: when the usage of the next layer is less than a third threshold, creating a target channel in the next layer; copying the foremost at least one channel of the first channel list to the target channel; updating the storage state corresponding to the target channel to be located in the next layer; and deleting the foremost at least one channel of the first channel list.
Optionally, the second migration operation may include: when the usage of the current layer is smaller than a third threshold, a target channel is newly established on the current layer; copying at least one channel at the rearmost of the second channel list to a target channel; updating the storage state corresponding to the target channel to be positioned in the current layer; and deleting the rearmost at least one channel of the second channel list.
Optionally, the method may further include: when it is determined that the channel to be migrated is not being written, performing at least one of the corresponding first migration operation and second migration operation, where the channel to be migrated is at least one of the foremost at least one channel of the first channel list and the rearmost at least one channel of the second channel list.
Optionally, the first migration operation may further include: when it is determined that the foremost at least one channel of the first channel list is not being read, performing the step of deleting the foremost at least one channel of the first channel list.
Optionally, the second migration operation may further include: when it is determined that the rearmost at least one channel of the second channel list is not being read, performing the step of deleting the rearmost at least one channel of the second channel list.
Optionally, the at least two storage layers include a first layer having the highest read-write performance and a second layer directly adjacent to the first layer, and the method may further include executing a channel creation operation. If the usage of the first layer is less than the second threshold, a channel is created in the first layer, the new channel is appended to the channel list of the first layer, and the storage state corresponding to the new channel is set to be located in the first layer; if the usage of the first layer is equal to or greater than the second threshold, a channel is created in the second layer, the new channel is appended to the channel list of the second layer, and the storage state corresponding to the new channel is set to be located in the second layer, where the second threshold is greater than or equal to the first threshold.
Optionally, the at least two storage tiers may include at least one of non-volatile memory, NVMe SSD, SATA SSD, HDD RAID, and HDD.
According to one or more aspects of the present disclosure, there is provided an apparatus for hierarchical storage of data, the apparatus comprising: a hierarchical storage structure configured to include at least two storage layers whose read-write performance decreases from top to bottom, including a current layer and a next layer directly adjacent to the current layer; and a scheduling unit configured to execute a scheduling thread. If the usage of the current layer is equal to or greater than a first threshold, the scheduling unit performs a first migration operation; if the usage of the current layer is less than the first threshold, the scheduling unit performs a second migration operation. The first migration operation migrates the foremost at least one channel of a first channel list to the next layer and appends it to a second channel list; the second migration operation migrates the rearmost at least one channel of the second channel list to the current layer and inserts it at the front of the first channel list. The first channel list represents channels included in the current layer, and the second channel list represents channels included in the next layer.
Optionally, the apparatus may further include a channel creation unit configured to perform a channel creation operation. If the usage of the current layer is less than a second threshold, the channel creation unit creates a channel in the current layer, appends the new channel to the first channel list, and sets the storage state corresponding to the new channel to be located in the current layer; if the usage of the current layer is equal to or greater than the second threshold, the channel creation unit creates a channel in the next layer, appends the new channel to the second channel list, and sets the storage state corresponding to the new channel to be located in the next layer, where the second threshold is greater than or equal to the first threshold.
Optionally, the apparatus further comprises a migration unit configured to receive an instruction from the scheduling unit to perform the first migration operation or the second migration operation. For example, when performing the first migration operation, the migration unit performs the following steps: when the usage of the next layer is less than a third threshold, creating a target channel in the next layer; copying the foremost at least one channel of the first channel list to the target channel; updating the storage state corresponding to the target channel to be located in the next layer; and deleting the foremost at least one channel of the first channel list. The migration unit is further configured to perform the step of deleting the foremost at least one channel of the first channel list only when, while performing the first migration operation, it determines that the foremost at least one channel of the first channel list is not being read. As another example, when performing the second migration operation, the migration unit performs the following steps: when the usage of the current layer is less than the third threshold, creating a target channel in the current layer; copying the rearmost at least one channel of the second channel list to the target channel; updating the storage state corresponding to the target channel to be located in the current layer; and deleting the rearmost at least one channel of the second channel list. The migration unit is further configured to perform the step of deleting the rearmost at least one channel of the second channel list only when, while performing the second migration operation, it determines that the rearmost at least one channel of the second channel list is not being read. The third threshold is less than or equal to the second threshold.
Optionally, the migration unit may be further configured to, when it is determined that none of the channels to be migrated is written to, execute at least one of the corresponding first migration operation and second migration operation, where the channel to be migrated is at least one of the foremost at least one channel of the first channel list and the rearmost at least one channel of the second channel list.
Optionally, the at least two storage layers may include a first layer having the highest read-write performance and a second layer directly adjacent to the first layer, and the apparatus may further include a channel creation unit configured to perform a channel creation operation. If the usage of the first layer is less than the second threshold, the channel creation unit creates a channel in the first layer, appends the new channel to the channel list of the first layer, and sets the storage state corresponding to the new channel to be located in the first layer; if the usage of the first layer is equal to or greater than the second threshold, the channel creation unit creates a channel in the second layer, appends the new channel to the channel list of the second layer, and sets the storage state corresponding to the new channel to be located in the second layer, where the second threshold is greater than or equal to the first threshold.
Optionally, the at least two storage tiers may include at least one of non-volatile memory, NVMe SSD, SATA SSD, HDD RAID, and HDD.
Another aspect of the present disclosure provides a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a method of data tiered storage as described above.
Another aspect of the present disclosure provides an electronic device comprising at least one processor and at least one memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform the method of data tiered storage as described above.
According to one or more aspects of the present disclosure, the method of hierarchical storage of data achieves better performance than existing schemes by using non-volatile memory as the first layer of the hierarchical storage. In addition, the scheduling thread migrates data between storage layers according to a data migration strategy, so that the benefit of hierarchical storage is realized to the greatest extent without incurring significant additional storage cost.
Drawings
These and/or other aspects and advantages of the present disclosure will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present disclosure, taken in conjunction with the accompanying drawings of which:
fig. 1 is an application scenario diagram illustrating a method of hierarchical storage of data according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram illustrating a method of hierarchical storage of data according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a newly created channel according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram illustrating migrating threads in accordance with an embodiment of the present disclosure;
FIG. 5 is a block diagram illustrating an apparatus for hierarchical storage of data according to an exemplary embodiment of the present disclosure; and
fig. 6 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present disclosure by referring to the figures.
Fig. 1 is an application scenario diagram illustrating a method of data hierarchical storage according to an embodiment of the present disclosure.
Hierarchical data storage is a strategy by which a system stores data across tiers of differing storage performance within a computer architecture. Typically, the central processor is comparatively fast, and most workloads require access (e.g., reading or writing) to storage. Because storage performance differs from tier to tier, overall processing speed is limited in practice, and the central processing unit spends a large amount of time waiting for storage writes or reads to complete.
As shown in fig. 1, in a hierarchical storage architecture provided with multiple storage tiers, the storage media may be ordered according to their read-write performance. For example, the hierarchical storage architecture may include first-level storage, second-level storage, third-level storage, and so on through Nth-level storage, where the first-level storage has the highest read-write performance. Each level may employ at least one of non-volatile memory, NVMe SSD, SATA SSD, HDD RAID, and HDD. For example, the first-level storage may be non-volatile memory, i.e., persistent memory; the second-level storage may be an NVMe SSD; the third-level storage may be a SATA SSD; and the fourth-level storage may be an HDD RAID. Typically, the first-level storage is more expensive because of its higher read-write performance, and therefore has a relatively small capacity. Non-volatile memory technology allows data in memory to be persisted: compared with traditional memory it offers large capacity at lower cost, and compared with traditional external storage (HDD/SSD) it offers high-speed persistence. Its capacity, price, and performance all lie between those of traditional DRAM and external NVMe SSDs, which makes it well suited to serve as the first-level storage in a hierarchical storage architecture.
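The tier ordering described above can be sketched as a simple enumeration. This is an illustrative model only; the type and tier names are assumptions made for the example, not identifiers from the patent.

```java
// Hypothetical ordering of storage tiers from highest to lowest read-write performance.
enum StorageTier {
    PERSISTENT_MEMORY,  // non-volatile memory: first-level storage
    NVME_SSD,           // second-level storage
    SATA_SSD,           // third-level storage
    HDD_RAID,           // fourth-level storage
    HDD;                // lowest tier

    /** The tier directly below this one, or null if this is already the lowest tier. */
    StorageTier next() {
        StorageTier[] tiers = values();
        return ordinal() + 1 < tiers.length ? tiers[ordinal() + 1] : null;
    }
}
```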
In an embodiment, a Kafka distributed event-streaming/message-queue system (e.g., the Kafka log application scenario shown in fig. 1) may be employed when efficient and reliable processing of a data stream is required. Under high load, persisting log files to disk and reading them back can become a performance bottleneck, so improving persistence performance (both reads and writes) is key to improving Kafka's latency and throughput. On the other hand, Kafka's data has natural hot-cold properties: in general, newly written data has a higher probability of being read. Therefore, the data hierarchical storage method of the present disclosure separates hot data from cold data and thereby improves the overall system performance of Kafka.
Fig. 2 is a flow diagram illustrating a method of hierarchical storage of data according to an embodiment of the present disclosure.
In an embodiment of the present disclosure, the method of hierarchical storage of data may be applied to a hierarchical storage architecture as shown in fig. 1. The hierarchical storage architecture may be a hierarchical storage structure provided with at least two storage layers whose read-write performance decreases from top to bottom. For convenience of description, two directly adjacent storage layers in the hierarchical storage structure will be referred to as the current layer and the next layer directly adjacent to the current layer; the read-write performance of the current layer is higher than that of the next layer. For example, the current layer may be the first-level storage shown in FIG. 1, in which case the next layer is the second-level storage shown in FIG. 1. As another example, the current layer may be the (N-1)th-level storage, in which case the next layer is the Nth-level storage shown in FIG. 1. In an embodiment, the data may be stored in the hierarchical storage structure in the form of channels; for example, a channel may be a file channel (FileChannel) or a persistent memory channel (PMemChannel). In an embodiment, a channel may have two attributes, a channel ID and a storage state (MixChannel). The channel ID identifies the channel, and the storage state indicates where the channel is stored; for example, the storage state may be "at the current layer" or "at the next layer". In embodiments, the channel ID and storage state of a channel may be stored in a storage medium separate from the hierarchical storage structure, e.g., in volatile memory. In another embodiment, they may be stored in the first-level storage of the hierarchical storage structure or in a partition of the first-level storage. The first channel list represents channels included in the current layer, and the second channel list represents channels included in the next layer.
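As a rough illustration of the channel attributes just described, a channel record could hold an ID and a storage state whose tier is swapped atomically during migration. The class layout below is an assumption (the patent only names the two attributes); it reuses the StorageTier enum sketched earlier.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative channel record: a channel ID plus a storage state indicating
// which tier currently holds the channel's data. Field and method names are
// assumptions for this sketch only.
final class MixChannel {
    private final long channelId;
    // Held in an AtomicReference so the tier can be flipped atomically on migration
    // and readers always observe a consistent location.
    private final AtomicReference<StorageTier> storageState;

    MixChannel(long channelId, StorageTier initialTier) {
        this.channelId = channelId;
        this.storageState = new AtomicReference<>(initialTier);
    }

    long channelId()              { return channelId; }
    StorageTier storageState()    { return storageState.get(); }
    void moveTo(StorageTier tier) { storageState.set(tier); }
}
```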
As shown in FIG. 2, the data hierarchical storage method may include executing a scheduling thread. For example, the scheduling thread may be executed on a predetermined cycle (e.g., run every 10 s), or it may monitor in real time. A scheduling thread as described herein may refer to a daemon running in a single background thread of the system to schedule data stored in the hierarchical storage structure, i.e., to schedule data between the different storage layers. In another embodiment, the scheduling thread may be an on-demand thread. The scheduling thread may maintain migration task queues, for example a high-to-low migration task queue and a low-to-high migration task queue, and may add the migration tasks it creates to these queues.
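A minimal sketch of how such a periodic background scheduling thread could be set up in Java is shown below, assuming the 10-second period mentioned above as an example value; the DispatchThread runnable it drives is sketched after the step descriptions that follow.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative setup of the scheduling thread as a single background daemon
// running on a fixed period (10 s here is an assumed example value).
final class DispatchScheduler {
    static ScheduledExecutorService start(Runnable dispatchThread) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "tier-dispatch");
                    t.setDaemon(true); // runs in the background of the system
                    return t;
                });
        scheduler.scheduleAtFixedRate(dispatchThread, 0, 10, TimeUnit.SECONDS);
        return scheduler;
    }
}
```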
In step S10, it is determined whether the usage amount of the current layer is equal to or greater than a first threshold.
If the usage amount of the current layer is equal to or greater than the first threshold, a first migration operation is performed in step S20. The first migration operation migrates the foremost at least one channel of the first channel list to the next layer and appends it to the second channel list. When the usage of the next layer reaches a predetermined threshold of the next layer (e.g., the remaining capacity is insufficient to accept the channel to be migrated, or the capacity is full), the first migration operation is terminated.
If the usage amount of the current layer is less than the first threshold, a second migration operation is performed in step S30. The second migration operation migrates the rearmost at least one channel of the second channel list to the current layer and inserts it at the front of the first channel list. The second migration operation terminates when the usage of the current layer reaches a predetermined threshold for the current layer (e.g., the remaining capacity is insufficient to accept the channel to be migrated, or the capacity is full).
In an embodiment, the first threshold may represent a preset usage amount or a storage percentage. The first threshold value can be set according to actual needs. For example, the first threshold may be 60% to 95%.
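Putting steps S10 to S30 together, the scheduling decision could look roughly like the sketch below. The channel lists and task queues are modeled as deques, the 80% threshold and the TierUsage helper are assumptions for illustration, and the actual copy and delete work is deferred to the migration thread described later.

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Illustrative dispatch logic for steps S10-S30; names and values are assumptions.
final class DispatchThread implements Runnable {
    /** One pending migration: which channel to move and to which tier. */
    record MigrationTask(MixChannel toMigrate, StorageTier targetTier) {}

    /** Assumed helper reporting per-tier usage as a fraction of capacity. */
    interface TierUsage { double of(StorageTier tier); }

    private final Deque<MixChannel> firstChannelList;   // channels on the current layer
    private final Deque<MixChannel> secondChannelList;  // channels on the next layer
    final Deque<MigrationTask> highToLowQueue = new ConcurrentLinkedDeque<>(); // H2L tasks
    final Deque<MigrationTask> lowToHighQueue = new ConcurrentLinkedDeque<>(); // L2H tasks
    private final TierUsage usage;
    private final double firstThreshold = 0.80; // e.g. somewhere in the 60%-95% range

    DispatchThread(Deque<MixChannel> first, Deque<MixChannel> second, TierUsage usage) {
        this.firstChannelList = first;
        this.secondChannelList = second;
        this.usage = usage;
    }

    @Override
    public void run() {
        if (usage.of(StorageTier.PERSISTENT_MEMORY) >= firstThreshold) {
            // S20: current layer too full -> queue the foremost channel for demotion,
            // appended at the back of the high-to-low task queue.
            MixChannel c = firstChannelList.peekFirst();
            if (c != null) {
                highToLowQueue.addLast(new MigrationTask(c, StorageTier.NVME_SSD));
            }
        } else {
            // S30: current layer has room -> queue the rearmost channel of the next
            // layer for promotion, inserted at the front of the low-to-high task queue.
            MixChannel c = secondChannelList.peekLast();
            if (c != null) {
                lowToHighQueue.addFirst(new MigrationTask(c, StorageTier.PERSISTENT_MEMORY));
            }
        }
    }
}
```

Queuing a task rather than copying in place keeps the scheduling decision cheap and leaves the heavier copy and delete work to the migration thread.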
Fig. 3 is a flow chart illustrating a newly created channel according to an embodiment of the present disclosure.
As shown in fig. 3, the data hierarchical storage method may further include performing a channel creation operation.
For example, in step S301, it is determined whether the usage amount of the current layer is less than a second threshold.
If the usage of the current layer is less than the second threshold, in step S302, a channel is newly created in the current layer, the newly created channel is added to the first channel list, and the storage status corresponding to the newly created channel is set to be located in the current layer.
If the usage amount of the current layer is equal to or greater than the second threshold, in step S303, in the next layer, a new channel is created, the newly created channel is added to the second channel list, and the storage status corresponding to the newly created channel is set to be located in the next layer.
In an embodiment, the second threshold is equal to or greater than the first threshold. The second threshold may be a value that is set in advance to indicate an upper limit of the usage amount of the current layer, or may be a value that is set in advance to ensure that the current layer has a capacity sufficient to store the newly created channel.
In a preferred embodiment, before step S301, the method may further include: performing the step of determining whether the usage of the current layer is less than the second threshold in step S301 only when the usage of each storage layer above the current layer has reached the channel-creation threshold corresponding to that storage layer.
In a more specific embodiment, the current layer may be a first layer having the highest read-write performance, and the next layer may be a second layer directly adjacent to the first layer. In this case, the channel creation operation is preferentially performed at the first layer.
For example, if the usage amount of the first tier is less than the second threshold, in step S302, a channel is newly created at the first tier, the newly created channel is added to the channel list of the first tier, and the storage status corresponding to the newly created channel is set to be located at the first tier.
If the usage amount of the first tier is equal to or greater than the second threshold, in step S303, a channel is newly created in the second tier, the newly created channel is added to the channel list of the second tier, and the storage status corresponding to the newly created channel is set to be located in the second tier.
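A minimal sketch of the channel-creation decision in steps S301 to S303 is given below, reusing the types sketched earlier; the 90% second threshold and the way the channel lists are updated are assumptions, not values taken from the patent.

```java
import java.util.Deque;

// Illustrative channel-creation decision (steps S301-S303); names are assumptions.
final class ChannelCreator {
    private final double secondThreshold = 0.90; // greater than or equal to the first threshold
    private final DispatchThread.TierUsage usage;
    private final Deque<MixChannel> firstChannelList;
    private final Deque<MixChannel> secondChannelList;

    ChannelCreator(DispatchThread.TierUsage usage,
                   Deque<MixChannel> firstChannelList,
                   Deque<MixChannel> secondChannelList) {
        this.usage = usage;
        this.firstChannelList = firstChannelList;
        this.secondChannelList = secondChannelList;
    }

    MixChannel createChannel(long channelId) {
        if (usage.of(StorageTier.PERSISTENT_MEMORY) < secondThreshold) {
            // S302: the current (top) layer still has room -> create the channel there.
            MixChannel channel = new MixChannel(channelId, StorageTier.PERSISTENT_MEMORY);
            firstChannelList.addLast(channel);
            return channel;
        }
        // S303: the current layer is near its limit -> create the channel on the next layer.
        MixChannel channel = new MixChannel(channelId, StorageTier.NVME_SSD);
        secondChannelList.addLast(channel);
        return channel;
    }
}
```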
Taking the Kafka application scenario as an example, the newly written data has a greater probability of being read. Therefore, according to the embodiment of the disclosure, newly written data is preferentially stored in the storage layer with higher storage performance, so that cold and hot data are separated, and the overall system performance is improved.
FIG. 4 is a flow diagram illustrating migrating threads according to an embodiment of the present disclosure.
The migration thread is a multi-threaded background thread that carries out the data migration. Referring to fig. 4, in step S401, the migration thread may determine whether the channel to be migrated is being written. A channel that is not being written may be referred to as a migratable channel, and only migratable channels are considered for migration tasks; a migration task and a write task never run on the same channel at the same time. Because a migration task is performed only on channels that are not being written, write errors caused by migration are avoided.
In step S402, the migration thread may determine whether the usage of the target layer is less than a third threshold. The third threshold value may be equal to or less than the second threshold value. For example, the third threshold may be a value set in advance to represent an upper limit of the usage percentage or the usage amount of the target layer. The third threshold may be used to ensure that the target layer has sufficient capacity to store the channels to be migrated.
In step S403, a target channel is newly created at the target layer. For example, the size of the target channel is the same or similar to the size of the channel to be migrated.
In step S404, the channel to be migrated is copied to the target channel.
In step S405, the storage status corresponding to the target channel is updated to "at the target tier". For example, after copying the channel to be migrated to the target channel, the channel ID corresponding to the channel to be migrated is updated to the channel ID of the target channel, and the storage status corresponding to the channel to be migrated is updated to "at the target layer". In an embodiment, the update is an atomic update. In this case, the next read operation will be performed directly on the target channel, bypassing the original to-be-migrated channel.
In step S406, it is determined whether the channel to be migrated is being read. If it is being read, the method waits and performs the subsequent steps only after the channel is no longer being read.
In an embodiment, to determine the read status of the channel to be migrated, a read counter may be set for each channel (e.g., readCounter may be set to an initial value of 0). When the channel is accessed by a read task, the read counter is incremented by 1 (readCounter++), and when the read task ends, the read counter is decremented by 1 (readCounter--). Therefore, when the read counter of the channel to be migrated is 0, it can be determined that the channel to be migrated is not being read.
In step S407, when it is determined that none of the channels to be migrated has been read, the channel to be migrated is deleted. Because the deleting task is executed only when the condition that the channels to be migrated are not read is met, reading errors caused by the deleting task are avoided.
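Steps S401 to S407 could be combined into a single migration routine such as the sketch below, including the read counter just described. The MigratableChannel operations and the 95% third threshold are assumptions; a real implementation would copy the channel's bytes between storage devices and update the channel mapping atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative execution of one migration task (steps S401-S407).
final class MigrationWorker {

    /** Per-channel read counter: incremented when a read task starts, decremented when it ends. */
    static final class ReadCounter {
        private final AtomicInteger count = new AtomicInteger(0);
        void beginRead() { count.incrementAndGet(); }   // readCounter++
        void endRead()   { count.decrementAndGet(); }   // readCounter--
        boolean idle()   { return count.get() == 0; }
    }

    /** Assumed channel operations used by the migration; not defined by the patent text. */
    interface MigratableChannel {
        boolean isBeingWritten();
        ReadCounter readCounter();
        void copyTo(StorageTier targetTier);   // S403 + S404: create target channel and copy
        void moveTo(StorageTier targetTier);   // S405: atomic storage-state update
        void deleteOldCopy();                  // S407: remove the source copy
    }

    private final double thirdThreshold = 0.95; // less than or equal to the second threshold
    private final DispatchThread.TierUsage usage;

    MigrationWorker(DispatchThread.TierUsage usage) { this.usage = usage; }

    boolean migrate(MigratableChannel source, StorageTier targetTier) {
        if (source.isBeingWritten()) return false;                 // S401: migratable only
        if (usage.of(targetTier) >= thirdThreshold) return false;  // S402: target has room?
        source.copyTo(targetTier);                                 // S403 + S404
        source.moveTo(targetTier);                                 // S405: new reads hit the target copy
        while (!source.readCounter().idle()) {                     // S406: wait for in-flight reads
            Thread.onSpinWait();
        }
        source.deleteOldCopy();                                    // S407
        return true;
    }
}
```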
Referring to fig. 2 and 4, a dispatch thread according to embodiments of the present disclosure may invoke a migration thread. For example, when the scheduling thread determines in step S20 that the foremost at least one channel of the first channel list is to be migrated to the next layer, the foremost at least one channel of the first channel list may be determined as a channel to be migrated, and the next layer may be determined as the target layer, and the channel to be migrated and the target layer may be transmitted as inputs to the migration thread. Alternatively, when the scheduling thread determines in step S30 that the rearmost at least one channel of the second channel list is to be migrated to the current layer, the rearmost at least one channel of the second channel list may be determined to be a channel to be migrated, and the current layer may be determined to be the target layer, and the channel to be migrated and the target layer may be transmitted to the migration thread as inputs. For example, the scheduling thread may create a migration task based on the channel to be migrated and the target layer and place it in a migration task queue.
In a specific embodiment, in step S20 (see FIG. 2), the scheduling thread may create a migration task from the current layer to the next layer and place it at the back of the high-to-low migration task queue (H2L task queue). Thereafter, the migration thread pops the H2L task queue, and if a task exists, performs the following steps: determining whether the foremost at least one channel of the first channel list is being written (S401); when the usage of the next layer is less than the third threshold (S402), creating a target channel in the next layer (S403); copying the foremost at least one channel of the first channel list to the target channel (S404); updating the storage state corresponding to the target channel to be located in the next layer (S405); and, after determining that the foremost at least one channel of the first channel list is not being read (S406), deleting the foremost at least one channel of the first channel list (S407).
In a specific embodiment, in step S30 (see fig. 2), the scheduling thread may create a migration task from the next layer to the current layer and place it at the front of the low-to-high migration task queue (L2H task queue). Thereafter, the migration thread pops the L2H task queue, and if a task exists, performs the following steps: determining whether the rearmost at least one channel of the second channel list is being written (S401); when the usage of the current layer is less than the third threshold (S402), creating a target channel in the current layer (S403); copying the rearmost at least one channel of the second channel list to the target channel (S404); updating the storage state corresponding to the target channel to be located in the current layer (S405); and, after determining that the rearmost at least one channel of the second channel list is not being read (S406), deleting the rearmost at least one channel of the second channel list (S407).
According to one or more aspects of the present disclosure, the method of hierarchical storage of data achieves better performance than existing schemes by using non-volatile memory as the first layer of the hierarchical storage. In addition, according to one or more aspects of the present disclosure, the data migration strategy is implemented by a scheduling thread and a migration thread, which effectively avoids read or write errors caused by migration and maximizes the benefit of hierarchical storage without incurring significant additional storage cost.
Fig. 5 is a block diagram of an apparatus 10 for hierarchical storage of data according to the present disclosure.
According to one or more aspects of the present disclosure, the present disclosure provides an apparatus 10 for hierarchical storage of data, the apparatus 10 comprising: a hierarchical storage structure 110, a scheduling unit 120, a channel creation unit 130, and a migration unit 140.
The hierarchical storage structure 110 is configured to include at least two storage layers, where the at least two storage layers include a current layer and a next layer directly adjacent to the current layer, and the read-write performance of the current layer is higher than that of the next layer. It is the same as or similar to the hierarchical storage structure described with reference to fig. 2, so redundant description is omitted here.
The scheduling unit 120 is configured to execute the scheduling thread. For example, if the usage of the current layer is equal to or greater than a first threshold, the scheduling unit 120 performs a first migration operation that migrates the foremost at least one channel of the first channel list to the next layer and appends it to the second channel list. If the usage of the current layer is less than the first threshold, the scheduling unit 120 performs a second migration operation that migrates the rearmost at least one channel of the second channel list to the current layer and inserts it at the front of the first channel list. The scheduling unit 120 may be configured to perform the method described with reference to steps S10 to S30 in fig. 2, so redundant description is omitted here.
The channel creation unit 130 is configured to perform the channel creation operation. For example, if the usage of the current layer is less than the second threshold, the channel creation unit creates a channel in the current layer, appends the new channel to the first channel list, and sets the storage state corresponding to the new channel to be located in the current layer; if the usage of the current layer is equal to or greater than the second threshold, the channel creation unit creates a channel in the next layer, appends the new channel to the second channel list, and sets the storage state corresponding to the new channel to be located in the next layer, where the second threshold is greater than or equal to the first threshold. The channel creation unit 130 may be configured to perform the method described with reference to steps S301 to S303 in fig. 3, so redundant description is omitted here.
The migration unit 140 is configured to receive an instruction from the scheduling unit 120 to perform the first migration operation or the second migration operation. For example, when performing the first migration operation, the migration unit 140 performs the following steps: when the usage of the next layer is less than a third threshold, creating a target channel in the next layer; copying the foremost at least one channel of the first channel list to the target channel; updating the storage state corresponding to the target channel to be located in the next layer; and deleting the foremost at least one channel of the first channel list. The migration unit 140 is further configured to perform the step of deleting the foremost at least one channel of the first channel list only when, while performing the first migration operation, it determines that the foremost at least one channel of the first channel list is not being read. As another example, when performing the second migration operation, the migration unit 140 performs the following steps: when the usage of the current layer is less than the third threshold, creating a target channel in the current layer; copying the rearmost at least one channel of the second channel list to the target channel; updating the storage state corresponding to the target channel to be located in the current layer; and deleting the rearmost at least one channel of the second channel list. The migration unit 140 is further configured to perform the step of deleting the rearmost at least one channel of the second channel list only when, while performing the second migration operation, it determines that the rearmost at least one channel of the second channel list is not being read. The third threshold is less than or equal to the second threshold. The migration unit 140 may be configured to perform the method described with reference to steps S401 to S407 in fig. 4, so redundant description is omitted here.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module/unit performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
Fig. 6 is a block diagram illustrating an electronic device 600 according to an example embodiment of the present disclosure.
Referring to fig. 6, an electronic device 600 includes at least one memory 601 and at least one processor 602, the at least one memory 601 storing computer-executable instructions that, when executed by the at least one processor 602, cause the at least one processor 602 to perform a method of hierarchical storage of data according to an embodiment of the present disclosure.
By way of example, the electronic device 600 may be a PC, a tablet device, a personal digital assistant, a smart phone, or any other device capable of executing the above instructions. The electronic device 600 need not be a single electronic device; it can be any arrangement or collection of circuits capable of executing the above instructions (or instruction sets), individually or in combination. The electronic device 600 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote devices (e.g., via wireless transmission).
In the electronic device 600, the processor 602 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 602 may execute instructions or code stored in the memory 601, wherein the memory 601 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 601 may be integrated with the processor 602, for example, with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 601 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 601 and the processor 602 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 602 can read files stored in the memory.
Further, the electronic device 600 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 600 may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein instructions stored in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of data tiered storage according to an embodiment of the present disclosure. Examples of the computer-readable storage medium include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or other optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a magnetic data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can be run in an environment deployed on computer apparatus such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product comprising computer instructions which, when executed by at least one processor, implement a method of hierarchical storage of data according to an embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for hierarchical storage of data is applied to a hierarchical storage structure provided with at least two storage layers, wherein the read-write performance of the at least two storage layers is sequentially reduced from top to bottom, and the at least two storage layers comprise a current layer and a next layer directly adjacent to the current layer, and the method comprises the following steps:
the execution of the dispatch thread is performed,
wherein if the usage amount of the current layer is equal to or greater than a first threshold, performing a first migration operation that migrates the foremost at least one channel of a first channel list to the next layer and appends it to a second channel list; and
Wherein if the usage amount of the current layer is less than a first threshold, a second migration operation is performed, the second migration operation migrates at least one channel at the rearmost of the second channel list to the current layer and inserts the at least one channel at the foremost of the first channel list,
wherein the first channel list represents channels included in the current layer, and the second channel list represents channels included in the next layer.
2. The method of claim 1, further comprising:
the channel new-building operation is executed,
if the usage amount of the current layer is smaller than a second threshold, a channel is newly built in the current layer, the newly built channel is added to the first channel list, and the storage state corresponding to the newly built channel is set to be located in the current layer; and
Wherein if the usage amount of the current layer is equal to or greater than the second threshold, a channel is newly created in the next layer, the newly created channel is added to the second channel list, and the storage status corresponding to the newly created channel is set to be located in the next layer,
wherein the second threshold is equal to or greater than the first threshold.
3. The method of claim 2, wherein the first migration operation comprises:
when the usage of the next layer is smaller than a third threshold, a target channel is newly built in the next layer;
copying the foremost at least one channel of the first channel list to the target channel;
updating the storage state corresponding to the target channel to be located in the next layer; and
deleting the foremost at least one channel of the first channel list,
wherein the third threshold is less than or equal to the second threshold.
4. The method of claim 2, wherein the second migration operation comprises:
when the usage of the current layer is smaller than a third threshold, a target channel is newly established on the current layer;
copying the rearmost at least one channel of the second channel list to the target channel;
updating the storage state corresponding to the target channel to be located in the current layer; and
deleting the last at least one channel of the second channel list,
wherein the third threshold is less than or equal to the second threshold.
5. The method according to claim 3 or 4, characterized in that the method further comprises: when determining that no channel to be migrated is written, executing at least one of the corresponding first migration operation and the second migration operation, wherein the channel to be migrated is at least one of the foremost at least one channel of the first channel list and the rearmost at least one channel of the second channel list.
6. The method of claim 3, wherein the first migration operation further comprises: when it is determined that none of the foremost at least one channel of the first channel list is being read, performing the step of deleting the foremost at least one channel of the first channel list.
7. The method of claim 4, wherein the second migration operation further comprises: when it is determined that none of the rearmost at least one channel of the second channel list is being read, performing the step of deleting the rearmost at least one channel of the second channel list.
8. An apparatus for hierarchical storage of data, the apparatus comprising:
the storage system comprises a hierarchical storage structure, a storage layer and a storage layer, wherein the hierarchical storage structure is configured to comprise at least two storage layers, the read-write performance of the at least two storage layers is reduced from top to bottom in sequence, and the hierarchical storage structure comprises a current layer and a next layer directly adjacent to the current layer; and
a scheduling unit configured to execute a scheduled thread,
wherein the scheduling unit performs a first migration operation of migrating at least one first channel of a first channel list to the next layer and appending to a second channel list if the usage amount of the current layer is equal to or greater than a first threshold value,
wherein the scheduling unit performs a second migration operation of migrating at least one channel at the rearmost of the second channel list to the current layer and inserting the at least one channel at the rearmost of the second channel list into the foremost of the first channel list if the usage amount of the current layer is less than a first threshold, and
wherein the first channel list represents channels included in the current layer, and the second channel list represents channels included in the next layer.
9. An electronic device comprising at least one processor and at least one memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform a method of data tiered storage according to any one of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a method of data tiered storage according to any of claims 1 to 7.
CN202111081201.5A 2021-09-15 2021-09-15 Method and device for hierarchical storage of data Pending CN113741819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111081201.5A CN113741819A (en) 2021-09-15 2021-09-15 Method and device for hierarchical storage of data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111081201.5A CN113741819A (en) 2021-09-15 2021-09-15 Method and device for hierarchical storage of data

Publications (1)

Publication Number Publication Date
CN113741819A true CN113741819A (en) 2021-12-03

Family

ID=78739072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111081201.5A Pending CN113741819A (en) 2021-09-15 2021-09-15 Method and device for hierarchical storage of data

Country Status (1)

Country Link
CN (1) CN113741819A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103078933A (en) * 2012-12-29 2013-05-01 深圳先进技术研究院 Method and device for determining data migration time
CN104573020A (en) * 2015-01-12 2015-04-29 浪潮电子信息产业股份有限公司 Automatic data migrating and optimizing method in hierarchical storage system
CN105657066A (en) * 2016-03-23 2016-06-08 天津书生云科技有限公司 Load rebalance method and device used for storage system
CN105653591A (en) * 2015-12-22 2016-06-08 浙江中控研究院有限公司 Hierarchical storage and migration method of industrial real-time data
CN106502576A (en) * 2015-09-06 2017-03-15 中兴通讯股份有限公司 Migration strategy method of adjustment, capacity change suggesting method and device
CN108089814A (en) * 2016-11-23 2018-05-29 中移(苏州)软件技术有限公司 A kind of date storage method and device
US20190087342A1 (en) * 2017-09-21 2019-03-21 International Business Machines Corporation Dynamic premigration throttling for tiered storage
US20200089425A1 (en) * 2018-09-19 2020-03-19 Fujitsu Limited Information processing apparatus and non-transitory computer-readable recording medium having stored therein information processing program
CN111367469A (en) * 2020-02-16 2020-07-03 苏州浪潮智能科技有限公司 Layered storage data migration method and system
CN111427969A (en) * 2020-03-18 2020-07-17 清华大学 Data replacement method of hierarchical storage system


Similar Documents

Publication Publication Date Title
JP7089830B2 (en) Devices, systems, and methods for write management of non-volatile memory data
US8103847B2 (en) Storage virtual containers
US8464003B2 (en) Method and apparatus to manage object based tier
US20090132621A1 (en) Selecting storage location for file storage based on storage longevity and speed
US9026730B2 (en) Management of data using inheritable attributes
US10082984B2 (en) Storage device and method of operating the same
US20150058568A1 (en) HIERARCHICAL STORAGE FOR LSM-BASED NoSQL STORES
US20150347311A1 (en) Storage hierarchical management system
KR102585883B1 (en) Operating method of memory system and memory system
US10831374B2 (en) Minimizing seek times in a hierarchical storage management (HSM) system
WO2018171296A1 (en) File merging method and controller
US11366788B2 (en) Parallel pipelined processing for snapshot data deletion
CN112988627A (en) Storage device, storage system, and method of operating storage device
CN110989924B (en) Metadata storage performance optimization method and storage server
CN109313593A (en) Storage system
US20090030868A1 (en) Method And System For Optimal File System Performance
US20150082014A1 (en) Virtual Storage Devices Formed by Selected Partitions of a Physical Storage Device
CN114138200A (en) Pre-writing log method and system based on rocksDB
US11210024B2 (en) Optimizing read-modify-write operations to a storage device by writing a copy of the write data to a shadow block
CN116209986A (en) Partition hints for partitioned namespace storage devices
US11010091B2 (en) Multi-tier storage
US8504764B2 (en) Method and apparatus to manage object-based tiers
US10521156B2 (en) Apparatus and method of managing multi solid state disk system
CN109508140B (en) Storage resource management method and device, electronic equipment and system
US20110153674A1 (en) Data storage including storing of page identity and logical relationships between pages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination