CN116382573A - Data storage method, computing device and data storage system - Google Patents

Data storage method, computing device and data storage system

Info

Publication number
CN116382573A
Authority
CN
China
Prior art keywords
data
storage area
storage
target data
data blocks
Legal status
Pending
Application number
CN202310233483.9A
Other languages
Chinese (zh)
Inventor
郭畅
Current Assignee
Shenzhen Huawei Cloud Computing Technology Co ltd
Original Assignee
Shenzhen Huawei Cloud Computing Technology Co ltd
Application filed by Shenzhen Huawei Cloud Computing Technology Co ltd
Priority to CN202310233483.9A
Publication of CN116382573A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0622 Securing storage systems in relation to access
    • G06F3/0626 Reducing size or complexity of storage systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The application relates to the technical field of data storage and provides a data storage method, a computing device, and a data storage system. The method may include: migrating at least one check block in a first storage area to a second storage area, where the read performance of the second storage area is lower than that of the first storage area; and, after the at least one check block has been migrated to the second storage area, migrating a plurality of data blocks in a third storage area to the second storage area, where the read performance of the second storage area is lower than that of the third storage area. The at least one check block and the plurality of data blocks are obtained by erasure code encoding of target data, and the at least one check block is used, together with the plurality of data blocks, to recover the target data when the plurality of data blocks are incomplete. Because the check blocks of the data are migrated before its data blocks, the method reduces the impact of archiving on the read performance of the data while also reducing storage cost.

Description

Data storage method, computing device and data storage system
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a data storage method, a computing device, and a data storage system.
Background
As the amount of data to be stored grows, data may be archived to save storage cost. Specifically, after data is generated, it is kept in standard storage on a storage medium with higher storage cost (such as a hard disk drive (HDD) or a solid state drive (SSD)), which is expensive but provides efficient read performance; after the data becomes cold, it is archived on a storage medium with lower storage cost (such as magnetic tape or an optical disk), which reduces storage cost but also lowers the read performance of the data.
Data does not turn cold instantaneously. Archiving it too early may degrade its read performance and affect the user experience, while keeping it in standard storage is costly. How to reduce the impact of archiving on the read performance of data is therefore a technical problem to be solved.
Disclosure of Invention
The embodiments of the present application provide a data storage method, a computing device, and a data storage system, which are used to reduce the impact of data archiving on the read performance of data and to reduce storage cost.
In a first aspect, embodiments of the present application provide a data storage method, which may be performed by a computing device, by a computing device cluster, or by a component in a computing device; this is not limited. In the method, the computing device migrates at least one check block in a first storage area to a second storage area, where the read performance of the second storage area is lower than that of the first storage area; after migrating the at least one check block to the second storage area, the computing device migrates a plurality of data blocks in a third storage area to the second storage area, where the read performance of the second storage area is lower than that of the third storage area. The at least one check block and the plurality of data blocks are obtained by erasure code encoding of target data, and the at least one check block is used, together with the plurality of data blocks, to recover the target data when the plurality of data blocks are incomplete.
Optionally, the read performance of the first storage area may be the same as or different from that of the third storage area; this is not limited.
In the above embodiment, the computing device first migrates the at least one check block of the target data from the first storage area, which has higher read performance, to the second storage area, which has lower read performance; only afterwards does it migrate the plurality of data blocks of the target data from the third storage area, which also has higher read performance, to the second storage area. Between the start of the check-block migration and the start of the data-block migration, the data blocks therefore remain in the third storage area, so the target data can still be read from a high-performance storage area during that period. This preserves the read performance of the target data and reduces the impact of archiving the data too early. In addition, a storage area with lower read performance is generally backed by a cheaper storage medium, so moving the check blocks to the second storage area first already reduces the cost of storing them. The scheme thus both preserves the read performance of the target data and reduces storage cost.
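The migration order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `StorageArea`, `move`, and `archive_in_two_phases` are hypothetical names, storage areas are modeled as in-memory dictionaries, and read performance is reduced to a single integer. The generator pauses between the two phases, which models the window during which the data blocks are still readable from the third storage area.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArea:
    """Hypothetical storage area (tier); a higher read_perf means faster reads."""
    name: str
    read_perf: int
    blocks: dict = field(default_factory=dict)  # block id -> block bytes


def move(block_ids, src: StorageArea, dst: StorageArea) -> None:
    """Copy each block to dst, then remove it from src."""
    for block_id in block_ids:
        dst.blocks[block_id] = src.blocks.pop(block_id)


def archive_in_two_phases(check_ids, data_ids, first, second, third):
    """Phase 1 moves the check blocks; phase 2 moves the data blocks later."""
    # The archive tier must be slower than both source tiers.
    assert second.read_perf < first.read_perf
    assert second.read_perf < third.read_perf
    move(check_ids, first, second)   # phase 1: data blocks stay on the fast tier
    yield "check blocks archived"    # reads can still be served from `third`
    move(data_ids, third, second)    # phase 2: data blocks follow
    yield "data blocks archived"
```

Calling `next()` once performs only the check-block phase; a scheduler (not shown) would resume the generator later, for example once the target data has become colder.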
In one possible implementation, before migrating at least one check block in the first storage area to the second storage area, the method may further include:
the computing device determines that a first condition is satisfied, where the first condition includes one or both of the following: the storage duration of the target data is greater than or equal to a first threshold, or the access frequency of the target data is less than or equal to a second threshold;
alternatively, the computing device inputs the storage duration and/or the access frequency of the target data into a data migration model and obtains a first output result, where the first output result indicates that the at least one check block is to be migrated.
In the above implementation, the computing device migrates the at least one check block of the target data conditionally, based on the first condition or the data migration model, and can therefore choose a more appropriate time to migrate the check blocks, reducing the impact on the read performance of the target data.
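A minimal sketch of the check-block trigger, assuming storage duration is measured in days and access frequency in accesses per day (the text fixes neither unit). The optional `model` argument is a hypothetical stand-in for the data migration model: any callable that takes the two features and returns whether to migrate. Per the first condition, either sub-condition alone is sufficient.

```python
from typing import Callable, Optional

# Hypothetical signature for the data migration model: features in, decision out.
MigrationModel = Callable[[float, float], bool]


def should_migrate_check_blocks(storage_days: float,
                                accesses_per_day: float,
                                first_threshold: float,
                                second_threshold: float,
                                model: Optional[MigrationModel] = None) -> bool:
    """Decide whether the check blocks of the target data may be migrated."""
    if model is not None:
        # Model path: defer entirely to the (hypothetical) migration model.
        return model(storage_days, accesses_per_day)
    # Rule path: the first condition holds if either sub-condition holds.
    return storage_days >= first_threshold or accesses_per_day <= second_threshold
```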
In one possible implementation, before migrating the plurality of data blocks in the third storage area to the second storage area, the method may further include:
the computing device determines that a second condition is satisfied, where the second condition includes one or both of the following: the storage duration of the target data is greater than or equal to a third threshold, or the access frequency of the target data is less than or equal to a fourth threshold;
alternatively, the computing device inputs the storage duration and/or the access frequency of the target data into the data migration model and obtains a second output result, where the second output result indicates that the plurality of data blocks are to be migrated.
In the above implementation, the computing device migrates the plurality of data blocks of the target data conditionally, based on the second condition or the data migration model, and can therefore choose a more appropriate time to migrate the data blocks, reducing the impact on the read performance of the target data.
Optionally, the third threshold is greater than the first threshold, and the second threshold is greater than the fourth threshold; that is, the data blocks are migrated only when the target data is colder than is required for migrating the check blocks.
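The two conditions and the optional threshold ordering can be combined into one staged policy. In this sketch, `ArchivePolicy`, the fields `t1` to `t4`, and the stage labels are hypothetical names, and the sketch chooses to enforce the optional ordering (t3 > t1, t2 > t4) in the constructor. Under that ordering, any data that satisfies the second condition also satisfies the first, so the policy can never schedule data blocks ahead of check blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivePolicy:
    t1: float  # first threshold: storage duration for migrating check blocks
    t2: float  # second threshold: access frequency for migrating check blocks
    t3: float  # third threshold: storage duration for migrating data blocks
    t4: float  # fourth threshold: access frequency for migrating data blocks

    def __post_init__(self):
        # Optional ordering from the text, enforced here by choice.
        if not (self.t3 > self.t1 and self.t2 > self.t4):
            raise ValueError("expected t3 > t1 and t2 > t4")

    def stage(self, duration: float, frequency: float) -> str:
        """Return which blocks should now reside in the second storage area."""
        if duration >= self.t3 or frequency <= self.t4:  # second condition
            return "check+data"
        if duration >= self.t1 or frequency <= self.t2:  # first condition
            return "check"
        return "none"
```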
In one possible implementation, before the at least one check block is migrated to the second storage area, the method may further include: the computing device receives a read request for the target data; in response to the read request, determines, based on metadata of the target data, that the at least one check block is stored in the first storage area and the plurality of data blocks are stored in the third storage area; pulls the plurality of data blocks from the third storage area; and, if the plurality of data blocks are complete, obtains the target data from them, or, if they are incomplete, pulls the at least one check block from the first storage area and recovers the target data from the at least one check block and the plurality of data blocks.
Alternatively, after the at least one check block is migrated to the second storage area and before the plurality of data blocks in the third storage area are migrated to the second storage area, the method may further include: the computing device receives a read request for the target data; in response to the read request, determines, based on metadata of the target data, that the at least one check block is stored in the second storage area and the plurality of data blocks are stored in the third storage area; pulls the plurality of data blocks from the third storage area; and, if the plurality of data blocks are complete, obtains the target data from them, or, if they are incomplete, pulls the at least one check block from the second storage area and recovers the target data from the at least one check block and the plurality of data blocks.
Alternatively, after the plurality of data blocks in the third storage area are migrated to the second storage area, the method may further include: the computing device receives a read request for the target data; in response to the read request, determines, based on metadata of the target data, that the at least one check block and the plurality of data blocks are all stored in the second storage area; pulls the plurality of data blocks from the second storage area; and, if the plurality of data blocks are complete, obtains the target data from them, or, if they are incomplete, pulls the at least one check block from the second storage area and recovers the target data from the at least one check block and the plurality of data blocks.
In the above implementation, the computing device can read the target data in whichever of these storage states it currently is.
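The three read paths differ only in which storage areas the metadata points to, so one function can serve all of them. The sketch below is illustrative: the metadata layout, `read_target_data`, and the area names are assumptions, and a single XOR parity block stands in for the unspecified erasure code (it can repair only one missing data block; real deployments typically use Reed-Solomon codes). The check block is pulled only when a data block is missing, so a slow second storage area is touched only on the repair path.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_target_data(metadata: dict, areas: dict) -> bytes:
    """Read the target data from wherever its metadata says the blocks live.

    metadata: {"data_area": str, "check_area": str,
               "data_ids": [str, ...], "check_ids": [str]}
    areas:    {area_name: {block_id: bytes}}
    """
    data_area = areas[metadata["data_area"]]
    # Pull the data blocks first; a missing block shows up as None.
    pulled = [data_area.get(block_id) for block_id in metadata["data_ids"]]
    missing = [i for i, block in enumerate(pulled) if block is None]
    if not missing:
        return b"".join(pulled)  # complete: no check block needed
    if len(missing) > 1:
        raise ValueError("one XOR check block can repair only one data block")
    # Incomplete: only now pull the check block from its storage area.
    check_block = areas[metadata["check_area"]][metadata["check_ids"][0]]
    present = [block for block in pulled if block is not None]
    pulled[missing[0]] = reduce(xor_bytes, present, check_block)
    return b"".join(pulled)
```

The three states in the text correspond to (`check_area`, `data_area`) values of ("first", "third"), ("second", "third"), and ("second", "second").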
In one possible implementation, the storage medium corresponding to the second storage area has a lower storage cost than the storage medium corresponding to the first storage area, so migrating the at least one check block to the second storage area reduces storage cost. For example, the second storage area may use a low-cost medium such as magnetic tape or an optical disk, and the first storage area may use a higher-cost medium such as an HDD or SSD.
In one possible implementation, the storage medium corresponding to the second storage area has a lower storage cost than the storage medium corresponding to the third storage area, so migrating the plurality of data blocks to the second storage area reduces storage cost. For example, the second storage area may use a low-cost medium such as magnetic tape or an optical disk, and the third storage area may use a higher-cost medium such as an HDD or SSD.
In a second aspect, embodiments of the present application provide a computing device comprising a memory and a processor, the memory having a computer program stored thereon; the processor is configured to read the computer program stored in the memory and execute it, so that the method described in the first aspect and any one of its possible implementations is performed.
In a third aspect, embodiments of the present application provide a cluster of computing devices, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of the first aspect and any one of its possible implementations.
In a fourth aspect, embodiments of the present application provide a data storage system including the computing device provided in the second aspect, the first storage area, the second storage area, and the third storage area in the first aspect and possible implementations thereof.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer executable instructions for causing a computer to perform the method as described in the first aspect and any possible implementation thereof.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions for causing a computer to perform the method as described in the first aspect and any one of the possible implementations thereof.
For the technical effects achievable by the second to sixth aspects, refer to the description of the beneficial effects of the first aspect and its possible implementations; details are not repeated here.
Drawings
FIG. 1 is a schematic diagram of a distributed storage system;
FIG. 2 is a schematic flowchart of a data storage method according to an embodiment of the present application;
FIG. 2a is a schematic flowchart of another data storage method according to an embodiment of the present application;
FIG. 2b is a schematic flowchart of another data storage method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a data storage method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a data storage manner according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a data reading method according to an embodiment of the present application;
FIG. 5a is a schematic flowchart of another data reading method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a data reading method according to another embodiment of the present application;
FIG. 7 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of yet another computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. The terminology used in the description section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments provided herein are described, some terms used in this application are explained to help those skilled in the art understand them; these explanations do not limit the terms.
(1) Magnetic tape: a flexible, tape-shaped magnetic recording medium. A tape may include a base film, which may be made of a polyester film material, and a magnetic surface layer, which may be made of a magnetic material such as ferric oxide; data is stored in the magnetic surface layer. Tape offers large storage capacity, low storage cost, and long retention time. It is a linear storage medium: data is accessed sequentially.
(2) Distributed storage: data is divided into slices that are stored in a scattered manner across multiple independent devices. A traditional network storage system uses a centralized storage server to store all data; that server becomes the performance bottleneck of the system as well as the focal point for reliability and security, and cannot meet the requirements of large-scale storage applications. A distributed network storage system uses a scalable architecture in which multiple storage servers (also called storage nodes, which may be physical nodes (nodes for short) or virtual nodes (VNodes)) share the storage load and location servers locate stored information. This improves the reliability, availability, and access efficiency of the system and makes it easy to scale out.
(3) Erasure coding (EC): a redundancy protection mechanism that protects data by computing check (parity) fragments. The data is split into N data blocks (N is a positive integer), and an EC encoding algorithm computes M check blocks from them. Writing the N+M blocks to distributed servers yields a storage utilization of N/(N+M) while tolerating the failure of at most M fragments, so EC improves reliability while keeping storage utilization high.
(4) Cold data can be understood as data with a low access frequency (or low access volume), for example data accessed infrequently within a set period of time. Hot data can be understood as data with a high access frequency (or high access volume), for example data accessed frequently within a set period of time.
Compared with hot data, cold data has lower read-performance requirements and can be stored in a storage medium with lower read performance; such media generally have lower storage cost, so storage cost can be reduced. Conversely, hot data has higher read-performance requirements and can be stored in a storage medium with higher read performance to support high-frequency access; such media generally have higher storage cost, so storing hot data is more expensive. Storage of cold data may be referred to as archive storage, cold storage, or the like, and storage of hot data may be referred to as standard storage, hot storage, or the like; the names are not limited.
It should be noted that the specific partitioning rules or criteria for the cold data and the hot data are not limited in the embodiments of the present application.
(5) In the embodiments of the present application, "plurality" means two or more, and may therefore also be understood as "at least two". "At least one" may be understood as one or more, for example one, two, or more, without limiting what is included; for example, "at least one of A, B, and C" may mean A, B, C, A and B, A and C, B and C, or A, B, and C. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that both A and B exist, or that B exists alone. Unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
Unless stated to the contrary, ordinal terms such as "first" and "second" in the embodiments of the present application are used to distinguish between multiple objects and do not define their sequence, timing, priority, or importance. For example, "first storage area", "second storage area", and "third storage area" merely distinguish three storage areas and do not limit their priority, importance, or the like.
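The storage-utilization arithmetic in the erasure coding definition in (3) can be checked with a short calculation. The sketch assumes a maximum distance separable (MDS) code such as Reed-Solomon, for which any N of the N+M fragments suffice to rebuild the data; `ec_profile` and `replication_profile` are illustrative helpers. For example, a 4+2 code stores data at 2/3 utilization and survives the loss of any two fragments, whereas three-way replication survives two lost copies at only 1/3 utilization.

```python
from fractions import Fraction


def ec_profile(n: int, m: int):
    """Return (storage utilization, tolerated fragment losses) for an N+M code.

    Assumes an MDS code (e.g. Reed-Solomon): any n of the n + m fragments
    are enough to rebuild the data, so up to m fragments may be lost.
    """
    if n < 1 or m < 1:
        raise ValueError("N and M must be positive integers")
    return Fraction(n, n + m), m


def replication_profile(copies: int):
    """Same two figures for plain replication, for comparison."""
    return Fraction(1, copies), copies - 1
```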
The following describes a storage system applicable to the embodiment of the present application.
FIG. 1 shows a schematic architecture diagram of a storage system applicable to the embodiments of the present application. As shown in FIG. 1, the system is a distributed storage system and may include a management platform and a distributed storage node cluster, which includes multiple storage servers (for example, storage node 1, storage node 2, storage node 3, ..., and storage node k, where k is an integer greater than or equal to 2).
The management platform is connected over the Internet to terminal devices operated by users, through which users can manage the distributed storage service provided by the distributed storage system. Inside the system, the management platform may be connected over an internal network to the distributed storage node cluster, which provides storage services for service data from requesting clients. For example, the management platform may be used to configure the functions of the storage nodes in the cluster, including but not limited to one or more of the following: configuring the number of storage nodes, and configuring at least one data access policy for each storage node to implement data encoding and decoding, data migration, data reading, and the like.
The storage nodes in the distributed storage node cluster may be deployed in the same area or in different areas; this is not limited. Their storage media may be the same or different. For example, some storage nodes in the cluster may use a storage medium with high storage cost and high read performance, such as an HDD or SSD, while the remaining storage nodes use a storage medium with low storage cost and low read performance, such as magnetic tape or an optical disk.
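Because the cluster may mix media, a placement step has to pick nodes of the right type. The following sketch is one possible policy, not one stated in the text: `place_blocks` and the node-to-medium map are hypothetical, and each block goes to a distinct node so that a single node failure costs at most one block, which an erasure code with M of at least 1 can repair.

```python
from typing import Dict, List


def place_blocks(block_ids: List[str], nodes: Dict[str, str], medium: str) -> Dict[str, str]:
    """Put each block on a distinct node whose storage medium is `medium`.

    nodes maps a node name to its medium, e.g. {"node1": "hdd", "node4": "tape"}.
    """
    candidates = sorted(name for name, m in nodes.items() if m == medium)
    if len(candidates) < len(block_ids):
        raise ValueError(f"need {len(block_ids)} {medium} nodes, found {len(candidates)}")
    # One block per node, so losing a single node loses at most one block.
    return dict(zip(block_ids, candidates))
```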
In an optional embodiment, the distributed storage system shown in FIG. 1 may further include a metadata management cluster (MDC). The MDC may manage the entire system space of the distributed storage node cluster and may store (or maintain) the storage mode of data, the correspondence between the at least one check block and the plurality of data blocks of the data, and so on. Optionally, the MDC may partition the storage nodes of the distributed storage system according to preconfigured information; when a requesting client applies for storage space, the MDC may look up available space for it, and when a requesting client accesses data, the MDC may look up the storage mode and storage address of that data. It should be understood that, in the embodiments of the present application, the MDC is only an example of a module that performs the metadata management function; in other embodiments, this function may be integrated into the requesting client or implemented by a functional module with another name, which is not described here.
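As an illustration of the kind of record the MDC might maintain, the sketch below tracks which storage area holds each block of one piece of target data and summarizes its storage mode. The `ObjectMetadata` class and its fields are hypothetical; the text does not define a metadata schema. After each migration step, the migrating component would call `relocate` so that subsequent reads are routed to the new area.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ObjectMetadata:
    """Hypothetical MDC record for one piece of target data."""
    object_id: str
    data_ids: List[str]
    check_ids: List[str]
    location: Dict[str, str] = field(default_factory=dict)  # block id -> storage area

    def relocate(self, block_ids: List[str], area: str) -> None:
        """Record that the given blocks now reside in `area`."""
        for block_id in block_ids:
            self.location[block_id] = area

    def storage_mode(self) -> Tuple[Set[str], Set[str]]:
        """Return (areas holding check blocks, areas holding data blocks)."""
        check_areas = {self.location[b] for b in self.check_ids}
        data_areas = {self.location[b] for b in self.data_ids}
        return check_areas, data_areas
```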
It should be noted that, in the embodiments of the present application, the management platform and each storage node may be implemented in software or in hardware. The implementation of the management platform is described below as an example; a storage node may be implemented in a similar way.
For example, as a software functional unit, the management platform may include code running on a computing instance. The computing instance may be at least one of a physical host (computing device), a virtual machine, a container, and the like, and there may be one or more such instances. For example, the management platform may include code running on multiple hosts/virtual machines/containers. These hosts/virtual machines/containers may be distributed in the same region or in different regions, and in the same availability zone (AZ) or in different AZs, where each AZ comprises one data center or multiple geographically close data centers. A region typically comprises multiple AZs.
Likewise, the hosts/virtual machines/containers running the code may be distributed in the same virtual private cloud (VPC) or across multiple VPCs. Typically, one VPC is located within one region. Communication between two VPCs in the same region, or between VPCs in different regions, requires a communication gateway in each VPC, through which the VPCs are interconnected.
For example, as a hardware functional unit, the management platform may include at least one computing device, such as a server or a cloud server. Alternatively, the management platform may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The computing devices included in the management platform may be distributed in the same region or in different regions, and in the same VPC or across multiple VPCs. They may be any combination of servers, cloud servers, ASICs, PLDs, CPLDs, FPGAs, GALs, and the like; this is not limited.
It should be understood that FIG. 1 shows only functional modules that the distributed storage system may include and does not limit their functions or number. The dashed boxes in the figure indicate optional, independent modules and do not limit the product form of those modules; in some embodiments, for example, the MDC may be integrated into the requesting client or into the management platform. This is not limited in the embodiments of the present application.
FIG. 2 shows a flowchart of a data storage method according to an embodiment of the present application. The method may be performed by a computing device, a cluster of computing devices, or a component of a computing device; the computing device is, for example, the management platform in FIG. 1, without limitation. In FIG. 2, the method is described with the management platform as the execution body, where the management platform includes a data interface module, a data encoding module, and a data migration module. The data interface module serves as the data access interface: it provides data upload and access capabilities and handles sending and receiving data. The data encoding module provides various types of data encoding. The data migration module performs data migration.
In particular, the method may include the following procedure.
S201: the data interface module receives target data.
For example, a tenant may write (or upload) the target data through a requesting client; accordingly, the data interface module receives the target data. The target data may be voice data, image data, video data, or the like; the type and content of the target data are not limited in the embodiments of the present application.
S202: the data interface module sends the target data to the data encoding module. Accordingly, the data encoding module receives the target data from the data interface module.
S203: the data encoding module encodes the target data to obtain at least one check block and a plurality of data blocks.
Specifically, after receiving the target data, the data encoding module encodes it to obtain at least one check block and a plurality of data blocks, for example by performing erasure code encoding on the target data.
The at least one check block and the plurality of data blocks are obtained by performing erasure code encoding on the target data. The plurality of data blocks can be used to obtain the target data. The at least one check block can be used to recover the target data: when the plurality of data blocks are incomplete (for example, a data block is corrupted or lost), the at least one check block can be used together with the remaining data blocks to recover the target data.
It should be noted that the specific process by which the data encoding module encodes the target data is not limited in the embodiments of the present application. In addition, erasure code encoding is used here as an example, but the present application is not limited thereto; for example, the data encoding module may use another encoding method to obtain at least one check block and a plurality of data blocks.
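The recovery property described above can be illustrated with the simplest possible erasure code. The sketch below is an assumption for illustration only: it uses a single XOR check block (RAID-5 style parity), which tolerates the loss of one data block; the source does not say which erasure code is used, and the function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def split_into_blocks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size data blocks, zero-padding the last one."""
    block_size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_size * k, b"\x00")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(k)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> tuple[list[bytes], bytes]:
    """Simplified erasure code (assumption): k data blocks plus one XOR check block."""
    data_blocks = split_into_blocks(data, k)
    return data_blocks, xor_blocks(data_blocks)


def recover(data_blocks: list[bytes | None], check_block: bytes) -> list[bytes]:
    """Rebuild at most one missing data block (None) from the survivors and the check block."""
    missing = [i for i, b in enumerate(data_blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single XOR check block can repair only one lost data block")
    restored = list(data_blocks)
    if missing:
        survivors = [b for b in data_blocks if b is not None]
        restored[missing[0]] = xor_blocks(survivors + [check_block])
    return restored
```

For example, `encode(b"hello erasure coding", 4)` yields four 5-byte data blocks and one check block; dropping any one data block and calling `recover` reproduces the original list.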
S204: the data encoding module stores at least one check block and a plurality of data blocks.
For example, the data encoding module may store the at least one check block and the plurality of data blocks using standard storage (also called hot storage), and may store the check blocks and the data blocks in separate partitions.
In one possible implementation, the data encoding module may store at least one check block and a plurality of data blocks in different areas in one storage node. In another possible implementation, the data encoding module may store the at least one check block and the plurality of data blocks in different storage nodes. For example, the data encoding module may store at least one check block into a first storage area and a plurality of data blocks into a third storage area, as shown in FIG. 2.
The first storage area and the third storage area may be different storage areas in the same storage node or storage areas in different storage nodes; this is not limited. Both the first storage area and the third storage area may be storage areas with relatively high read performance, and the read performance of the first storage area may be the same as or different from that of the third storage area. A storage area with high read performance can be understood as one that supports a high data read speed or, equivalently, one from which a large amount of data can be read per unit time. The embodiments of the present application do not limit how read performance is calculated or measured.
It should be noted that the read performance of the storage area may be understood as the read performance provided (or supported) by the storage area, the read performance of the storage medium corresponding to the storage area, the read performance provided (or supported) by the storage medium corresponding to the storage area, and the like, without limitation.
Read performance is associated with storage cost: the higher the read performance of a storage area, the higher the storage cost of its corresponding storage medium, and the lower the read performance, the lower the storage cost. Accordingly, the storage media corresponding to the first storage area and the third storage area are both storage media with high storage cost, and their storage costs may be the same or different; this is not limited. For example, the storage medium corresponding to the first storage area or the third storage area may be a storage medium (or memory) with relatively high read performance and high storage cost, such as a hard disk drive (HDD) or a solid-state drive (SSD), but is not limited thereto. The storage medium corresponding to a storage area can be understood as the storage medium of the storage node where the storage area is located, without limitation. The embodiments of the present application do not limit how storage cost is calculated or measured.
It should be noted that a storage area may be referred to as a storage pool, a storage module, or the like, and may be understood as a storage area divided in one storage node, or may be understood as a storage area formed by a plurality of storage nodes, without limitation.
Optionally, the first storage area may be a storage area in the local domain, so that the at least one check block can be accessed nearby. That the first storage area is in the local domain can be understood as the storage node where the first storage area is located being a storage node of the local domain. The local domain can be understood as the region where the target data is uploaded or generated: for example, if the target data is uploaded to a storage node cluster at location A, the local domain may be that storage node cluster at location A; alternatively, if the target data is generated by a server cluster at location B, the local domain may be the region of location B. Optionally, the third storage area may also be a storage area in the local domain, so that the plurality of data blocks can be accessed nearby; this is not limited.
Further, the MDC may generate metadata of the target data (for the MDC, refer to the description of Fig. 1). Alternatively, the management platform may include a metadata module, not shown in Fig. 2, that generates the metadata of the target data. The metadata of the target data can be used to determine the storage manner of the target data. Optionally, the metadata may include information on the correspondence between the at least one check block and the plurality of data blocks of the target data.
At this point, the target data is stored with the at least one check block in the first storage area and the plurality of data blocks in the third storage area. Because both the first and third storage areas have high storage cost, the data migration module may, to reduce storage cost, archive the at least one check block and the plurality of data blocks, for example by migrating them to a storage area with low storage cost after the target data has cooled down. However, target data does not cool instantaneously. Archiving it too late does little to reduce storage cost, whereas archiving it too early may degrade the read performance of the target data and affect user experience.
In view of this, the data migration module may perform S205 and S206 to reduce storage cost while limiting the impact of archiving on the read performance of the data.
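The placement of S204 and the role of the metadata can be sketched with in-memory stand-ins. Everything here is hypothetical: `StorageArea`, `TargetMetadata`, the `C1`/`D1` block-id scheme and the `migrate` helper are illustrative names, and a real system would write to storage nodes rather than Python dictionaries. The `migrate` helper shows how a later migration (S205 or S206) would also update the metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageArea:
    """Hypothetical in-memory stand-in for a storage area (storage pool)."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class TargetMetadata:
    """Hypothetical metadata: block correspondence plus where each block lives."""
    object_id: str
    data_block_ids: list[str] = field(default_factory=list)
    check_block_ids: list[str] = field(default_factory=list)
    location: dict[str, str] = field(default_factory=dict)  # block id -> area name


def store_encoded(object_id: str, data_blocks: list[bytes], check_blocks: list[bytes],
                  first_area: StorageArea, third_area: StorageArea) -> TargetMetadata:
    """S204: check blocks go to the first area, data blocks to the third area."""
    meta = TargetMetadata(object_id)
    for i, block in enumerate(check_blocks, start=1):
        block_id = f"{object_id}/C{i}"
        first_area.blocks[block_id] = block
        meta.check_block_ids.append(block_id)
        meta.location[block_id] = first_area.name
    for i, block in enumerate(data_blocks, start=1):
        block_id = f"{object_id}/D{i}"
        third_area.blocks[block_id] = block
        meta.data_block_ids.append(block_id)
        meta.location[block_id] = third_area.name
    return meta


def migrate(meta: TargetMetadata, block_ids: list[str],
            src: StorageArea, dst: StorageArea) -> None:
    """Move blocks between areas and record the new location in the metadata."""
    for block_id in block_ids:
        dst.blocks[block_id] = src.blocks.pop(block_id)
        meta.location[block_id] = dst.name
```

Keeping the check-block ids and data-block ids in separate lists is what lets the two groups be migrated at different times while the correspondence between them is preserved.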
S205: the data migration module migrates at least one check block in the first storage area to the second storage area.
The second storage area may be a storage area with low read performance, that is, one that supports a low data read speed or from which only a small amount of data can be read per unit time. For example, the storage medium corresponding to the second storage area may be a storage medium (or memory) with low read performance and low storage cost, such as magnetic tape or an optical disc, but is not limited thereto.
Specifically, the read performance of the second storage area is lower than that of the first storage area, i.e., the read performance of the first storage area is higher than that of the second storage area. The reading performance of the second storage area is lower than that of the first storage area, which can be understood that the data reading speed supported by the second storage area is lower than that supported by the first storage area; alternatively, it is also understood that the amount of data read from the second storage area per unit time is smaller than the amount of data read from the first storage area per unit time, or the like. Accordingly, the storage cost of the storage medium corresponding to the second storage area is lower than the storage cost of the storage medium corresponding to the first storage area, i.e. the storage cost of the storage medium corresponding to the first storage area is higher than the storage cost of the storage medium corresponding to the second storage area.
The read performance of the second storage area is lower than the read performance of the third storage area, i.e. the read performance of the third storage area is higher than the read performance of the second storage area. The reading performance of the second storage area is lower than that of the third storage area, which can be understood that the data reading speed supported by the second storage area is lower than that supported by the third storage area; alternatively, it is also understood that the amount of data read from the second storage area per unit time is smaller than the amount of data read from the third storage area per unit time, and the like. Accordingly, the storage cost of the storage medium corresponding to the second storage area is lower than the storage cost of the storage medium corresponding to the third storage area, that is, the storage cost of the storage medium corresponding to the third storage area is higher than the storage cost of the storage medium corresponding to the second storage area.
Optionally, the second storage area may or may not be a storage area in the local domain; this is not limited. The second storage area and the first storage area may be located in the same region or in different regions, and likewise the third storage area and the second storage area may be located in the same region or in different regions; this is not limited.
In one possible implementation, the data migration module may migrate the at least one check block stored in the first storage area to the second storage area when a condition is met, for example according to a predefined migration rule (or predefined migration policy). For example, when the data migration module determines that the target data satisfies a first condition, it migrates the at least one check block stored in the first storage area to the second storage area.
The first condition may include one or more of the following: the storage duration of the target data is greater than or equal to a first threshold, or the access frequency of the target data is less than or equal to a second threshold. For example, when the data migration module detects (or monitors) that the storage duration of the target data is greater than or equal to the first threshold and/or that the access frequency of the target data is less than or equal to the second threshold, it determines that the first condition is met and migrates the at least one check block in the first storage area to the second storage area. The first condition may, for example, be set by the tenant or by an administrator, or be a predefined migration rule; this is not limited.
The storage duration of the target data can be understood as the duration counted from when the upload of the target data is completed; this is not limited. It should be noted that after the target data is encoded and stored as at least one check block and a plurality of data blocks, and before the at least one check block is migrated to the second storage area, the storage duration of the target data may include the time from completion of the upload until the at least one check block is stored in the first storage area, plus the time the at least one check block has been stored in the first storage area; alternatively, it may include the time from completion of the upload until the plurality of data blocks are stored in the third storage area, plus the time the plurality of data blocks have been stored in the third storage area.
The access frequency of the target data can be determined from the access records of the target data; this is not limited.
In another possible implementation, the data migration module may migrate the at least one check block stored in the first storage area to the second storage area according to a data migration model. The data migration model may be obtained by the data migration module through model training on data such as data storage durations, based on historical data access records (for example, records obtained through interaction with the data interface module). For example, the data migration module may input the storage duration of the target data, the access frequency of the target data, or both into the data migration model to obtain a first output result.
The first output result may indicate either that the at least one check block of the target data is to be migrated or that it is not to be migrated. This embodiment takes as an example the case where the first output result indicates that the at least one check block is to be migrated.
In the above implementation, the output of the data migration model is indication information indicating whether to migrate the at least one check block. In another example, the output of the data migration model is one or more parameters: the data migration module may output these parameters through the data interface module, receive through the data interface module an instruction indicating whether to migrate the at least one check block, and decide accordingly. The instruction may come from a tenant or from an administrator, without limitation.
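The first condition can be sketched as follows. The storage duration is counted from upload completion as described above; the access frequency is defined here, as an assumption, as accesses per day over a trailing window of the access record. The function names and the `require_both` switch (choosing between "and" and "or" for the two criteria) are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def storage_duration(upload_completed_at: datetime, now: datetime) -> timedelta:
    """Storage duration counted from the moment the upload of the target data completed."""
    return now - upload_completed_at


def access_frequency(access_times: list[datetime], now: datetime, window: timedelta) -> float:
    """Accesses per day over a trailing window (one possible definition, assumed here)."""
    recent = [t for t in access_times if now - window <= t <= now]
    return len(recent) / (window / timedelta(days=1))


def first_condition_met(duration: timedelta, frequency: float,
                        first_threshold: timedelta, second_threshold: float,
                        require_both: bool = False) -> bool:
    """First condition: duration >= first threshold and/or frequency <= second threshold."""
    old_enough = duration >= first_threshold
    cold_enough = frequency <= second_threshold
    return (old_enough and cold_enough) if require_both else (old_enough or cold_enough)
```

For an object uploaded on 2023-01-01 and checked on 2023-03-02, the storage duration is 60 days; with two accesses in the last 10 days the frequency is 0.2 per day, so a 30-day first threshold alone is enough in "or" mode, while "and" mode with a second threshold of 0.1 per day keeps the check blocks where they are.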
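Since the training algorithm is left open, the following is only one possible data migration model, chosen as an assumption: a logistic regression over (storage duration in days, accesses per day), fitted by batch gradient descent on historical records labelled 1 when migrating at that point turned out to be safe. The function names, features, labels, learning rate and epoch count are all hypothetical; the boolean returned by `predict_migrate` plays the role of the first output result.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_migration_model(samples, labels, lr=1.0, epochs=5000):
    """Fit a logistic-regression migration model by batch gradient descent.

    samples: (storage_days, accesses_per_day) pairs from historical records;
    labels: 1 if migrating at that point turned out to be safe, else 0.
    """
    # Scale each feature by its historical maximum so both lie in [0, 1].
    max_days = max(s[0] for s in samples) or 1.0
    max_freq = max(s[1] for s in samples) or 1.0
    w = [0.0, 0.0, 0.0]  # bias, storage-duration weight, access-frequency weight
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (days, freq), y in zip(samples, labels):
            x = (1.0, days / max_days, freq / max_freq)
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w, max_days, max_freq


def predict_migrate(model, days: float, freq: float) -> bool:
    """True means 'migrate the check block(s)'; False means 'do not migrate'."""
    w, max_days, max_freq = model
    z = w[0] + w[1] * days / max_days + w[2] * freq / max_freq
    return _sigmoid(z) >= 0.5
```

A linear model keeps the decision interpretable (it is equivalent to a learned threshold line in the duration/frequency plane); the same code could produce the second output result of S206 if trained with labels for data-block migration instead.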
It should be noted that the training process of the data migration model and the training algorithm used in the embodiments of the present application are not limited.
Further, the MDC may update the metadata of the target data to reflect its updated storage manner (for the MDC, refer to the description of Fig. 1). Alternatively, the management platform may include a metadata module, not shown in Fig. 2, that updates the metadata of the target data to reflect its updated storage manner.
S206: after migrating the at least one check block to the second storage area, the data migration module migrates the plurality of data blocks in the third storage area to the second storage area.
The read performance of the second storage area is lower than that of the third storage area; refer to the description of S205, which is not repeated here. Optionally, the storage cost of the storage medium corresponding to the second storage area is lower than that of the storage medium corresponding to the third storage area; again, refer to the description of S205.
In one possible implementation, after migrating the at least one check block to the second storage area, the data migration module may migrate the plurality of data blocks stored in the third storage area to the second storage area when a condition is met, for example according to a predefined migration rule (or predefined migration policy). For example, after the at least one check block has been migrated, when the data migration module determines that the target data satisfies a second condition, it migrates the plurality of data blocks stored in the third storage area to the second storage area.
The second condition may include one or more of the following: the storage duration of the target data is greater than or equal to a third threshold, or the access frequency of the target data is less than or equal to a fourth threshold. For example, after the at least one check block has been migrated to the second storage area, when the data migration module detects (or monitors) that the storage duration of the target data is greater than or equal to the third threshold and/or that the access frequency of the target data is less than or equal to the fourth threshold, it determines that the second condition is met and migrates the plurality of data blocks in the third storage area to the second storage area. The second condition may, for example, be set by the tenant or by an administrator, or be a predefined migration rule; this is not limited.
Optionally, the third threshold may be greater than the first threshold, and the fourth threshold may be less than the second threshold. This ensures that the data migration module archives the plurality of data blocks only after the target data has cooled further, which mitigates the loss of read performance caused by archiving the target data too early.
In another possible implementation, after migrating the at least one check block to the second storage area, the data migration module may migrate the plurality of data blocks stored in the third storage area to the second storage area according to the data migration model (see S205 for a description of the data migration model). For example, after the at least one check block has been migrated, the data migration module may input the storage duration of the target data, the access frequency of the target data, or both into the data migration model to obtain a second output result.
The second output result may indicate either that the plurality of data blocks of the target data are to be migrated or that they are not to be migrated. This embodiment takes as an example the case where the second output result indicates that the plurality of data blocks are to be migrated.
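The two conditions and their optional threshold ordering can be captured in a small policy object. This is a sketch under stated assumptions: the class and function names are hypothetical, durations are in days and frequencies in accesses per day, the layout names follow Fig. 4 (standard, hybrid, archive), and both criteria of each condition are required to hold (the source also allows either one alone). The sketch chooses to enforce the optional ordering, third > first and fourth < second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TwoStagePolicy:
    """Hypothetical thresholds: durations in days, frequencies in accesses per day."""
    first_threshold: float   # storage duration that lets check blocks move
    second_threshold: float  # access frequency that lets check blocks move
    third_threshold: float   # storage duration that lets data blocks move
    fourth_threshold: float  # access frequency that lets data blocks move

    def __post_init__(self) -> None:
        # Optional ordering from the text, enforced here so the second condition is stricter.
        if not (self.third_threshold > self.first_threshold
                and self.fourth_threshold < self.second_threshold):
            raise ValueError("expected third > first and fourth < second")


def next_action(policy: TwoStagePolicy, layout: str, days: float, freq: float) -> Optional[str]:
    """Return the migration to perform for a layout of 'standard', 'hybrid' or 'archive'."""
    if layout == "standard" and days >= policy.first_threshold and freq <= policy.second_threshold:
        return "migrate_check_blocks"  # S205
    if layout == "hybrid" and days >= policy.third_threshold and freq <= policy.fourth_threshold:
        return "migrate_data_blocks"   # S206, reachable only after S205
    return None
```

Gating the second check on the "hybrid" layout is what guarantees the order of S205 and S206: data blocks are never considered while the check blocks are still in the first storage area.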
In the above implementation manner, the output of the data migration model is indication information for indicating whether to migrate the plurality of data blocks. In another example, the output of the data migration model is one or more parameters, the data migration module may output the one or more parameters through the data interface module, receive an instruction indicating whether to migrate the plurality of data blocks through the data interface module, and determine whether to migrate the plurality of data blocks according to the instruction. The instruction may be from a tenant, or from a manager, without limitation.
It should be noted that the data migration model used by the data migration module in determining whether to migrate at least one check block may be the same as or different from the data migration model used in determining whether to migrate a plurality of data blocks, and is not limited.
It should be noted that after the at least one check block has been migrated to the second storage area but before the plurality of data blocks are migrated to it, the storage duration of the target data may include: the time from completion of the upload until the at least one check block is stored in the first storage area, the time the at least one check block was stored in the first storage area, the time taken to migrate the at least one check block to the second storage area, and the time the at least one check block has been stored in the second storage area; alternatively, it may include the time from completion of the upload until the plurality of data blocks are stored in the third storage area, plus the time the plurality of data blocks have been stored in the third storage area.
Further, the MDC may update the metadata of the target data to reflect its updated storage manner (for the MDC, refer to the description of Fig. 1). Alternatively, the management platform may include a metadata module, not shown in Fig. 2, that updates the metadata of the target data to reflect its updated storage manner.
It should be noted that the data migration module may migrate the at least one check block to the second storage area when the target data satisfies the first condition. Then, after the at least one check block has been migrated, the data migration module may migrate the plurality of data blocks to the second storage area either when the target data satisfies the second condition or according to the data migration model, without limitation.
Similarly, the data migration module may migrate the at least one check block to the second storage area according to the data migration model. Then, after the at least one check block has been migrated, the data migration module may migrate the plurality of data blocks to the second storage area either when the target data satisfies the second condition or according to the data migration model, without limitation.
In the embodiment shown in Fig. 2, the data migration module may determine the data migration model itself. In another possible implementation, shown in Fig. 2a, the management platform may further include a data analysis module. The data analysis module provides data analysis functions, such as generating a data migration model and/or migration policy using big data, machine learning, statistical analysis, or other methods. For example, the data analysis module may generate a data migration model and/or migration policy based on historical data storage durations and/or historical data access frequencies, and send the data migration model (and/or migration policy) to the data migration module, which accordingly obtains it. Optionally, the data analysis module may also obtain data access records from the data interface module.
Fig. 2b illustrates the storage process of the target data using an example in which erasure code encoding of the target data yields 4 data blocks, denoted "D1", "D2", "D3", and "D4", and 2 check blocks, denoted "C1" and "C2". For the specific implementation, refer to the description of Fig. 2, which is not repeated here.
In the above embodiment, the data migration module first migrates the at least one check block of the target data to the second storage area, which has low read performance, and only later migrates the plurality of data blocks of the target data to the second storage area. From the start of migrating the at least one check block until the start of migrating the plurality of data blocks, the data blocks remain in the third storage area, which has high read performance, so the target data can still be read by accessing the third storage area. This preserves the read performance of the target data and reduces the impact of archiving too early. In addition, because a storage area with low read performance corresponds to a storage medium with low storage cost, migrating the at least one check block to the second storage area first reduces storage cost while preserving the read performance of the target data.
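As one example of the statistical analysis the data analysis module might perform, the sketch below derives the first and third thresholds from historical records. It assumes, hypothetically, that the history provides, for each past object, its age in days at its last access, and it uses the nearest-rank percentile; the percentile values 80 and 95 and the function names are illustrative, not taken from the source.

```python
from __future__ import annotations


def nearest_rank_percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) in integer arithmetic
    return ordered[max(rank, 1) - 1]


def derive_duration_thresholds(last_access_ages_days: list[float],
                               check_pct: int = 80, data_pct: int = 95) -> tuple[float, float]:
    """Return (first_threshold, third_threshold) from historical ages at last access.

    Check blocks may move once an object is older than check_pct % of historical
    objects were at their final access; data blocks wait for the stricter data_pct.
    """
    if not last_access_ages_days:
        raise ValueError("need at least one historical record")
    if not 0 < check_pct < data_pct <= 100:
        raise ValueError("expected 0 < check_pct < data_pct <= 100")
    return (nearest_rank_percentile(last_access_ages_days, check_pct),
            nearest_rank_percentile(last_access_ages_days, data_pct))
```

With historical ages `[3, 1, 10, 7, 2, 30, 5, 60, 4, 20]`, this yields a first threshold of 20 days and a third threshold of 60 days. Because a higher percentile can never be smaller, the third threshold is at least the first one, though on skewed data the two may coincide.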
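The 4+2 layout of Fig. 2b requires a code that survives the loss of any two blocks. The source does not name the code; as one well-known option, the sketch below assumes the RAID-6 style P+Q construction over GF(2^8) (primitive polynomial 0x11D, generator 2), treating C1 as the XOR parity P and C2 as the weighted sum Q. Function names are hypothetical, and for brevity only the case where C1 and C2 themselves survive is handled.

```python
from __future__ import annotations

# GF(2^8) log/antilog tables for the primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D), generator 2.
EXP = [0] * 510
LOG = [0] * 256
_value = 1
for _power in range(255):
    EXP[_power] = _value
    LOG[_value] = _power
    _value <<= 1
    if _value & 0x100:
        _value ^= 0x11D
for _power in range(255, 510):
    EXP[_power] = EXP[_power - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def encode_4_2(data: list[bytes]) -> tuple[bytes, bytes]:
    """Return check blocks (C1, C2) = (P, Q) for four equal-length data blocks D1..D4."""
    if len(data) != 4 or len({len(d) for d in data}) != 1:
        raise ValueError("expected four equal-length data blocks")
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    for i, block in enumerate(data):
        for j, byte in enumerate(block):
            p[j] ^= byte                  # P = D1 ^ D2 ^ D3 ^ D4
            q[j] ^= gf_mul(EXP[i], byte)  # Q = sum over i of g^i * D(i+1)
    return bytes(p), bytes(q)


def recover_4_2(data: list[bytes | None], c1: bytes, c2: bytes) -> list[bytes]:
    """Rebuild up to two missing data blocks (None entries), assuming C1 and C2 are intact."""
    missing = [i for i, d in enumerate(data) if d is None]
    if len(missing) > 2:
        raise ValueError("a 4+2 code tolerates at most two lost blocks")
    # Cancel the surviving blocks out of P and Q; the remainders involve only the missing ones.
    p_rem, q_rem = bytearray(c1), bytearray(c2)
    for i, block in enumerate(data):
        if block is not None:
            for j, byte in enumerate(block):
                p_rem[j] ^= byte
                q_rem[j] ^= gf_mul(EXP[i], byte)
    out = list(data)
    if len(missing) == 1:
        out[missing[0]] = bytes(p_rem)
    elif len(missing) == 2:
        x, y = missing
        # p_rem = Dx ^ Dy and q_rem = g^x*Dx ^ g^y*Dy, so Dx = (g^y*p_rem ^ q_rem) / (g^x ^ g^y).
        denom = EXP[x] ^ EXP[y]
        dx = bytes(gf_div(gf_mul(EXP[y], pb) ^ qb, denom) for pb, qb in zip(p_rem, q_rem))
        out[x] = dx
        out[y] = bytes(pb ^ b for pb, b in zip(p_rem, dx))
    return out
```

Because any two of D1 to D4 can be rebuilt from C1 and C2, the check blocks are what make the data durable; this is also why they can sit on the slower second storage area without affecting normal reads, which only touch D1 to D4.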
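The storage saving of the intermediate state can be made concrete with a back-of-the-envelope calculation. Assuming, for illustration, that all $k$ data blocks and $m$ check blocks have the same size $B$ (as with erasure codes that split the data evenly) and that the cost of the high-read-performance tier scales with the bytes stored there, the Fig. 2b example ($k = 4$, $m = 2$) gives:

```latex
\begin{aligned}
S_{\text{hot}}^{\text{standard}} &= (k+m)\,B = 6B, \qquad
S_{\text{hot}}^{\text{hybrid}} = k\,B = 4B,\\[4pt]
\frac{S_{\text{hot}}^{\text{standard}} - S_{\text{hot}}^{\text{hybrid}}}{S_{\text{hot}}^{\text{standard}}}
&= \frac{m}{k+m} = \frac{2}{4+2} = \frac{1}{3}.
\end{aligned}
```

In other words, under these assumptions one third of the hot-tier footprint is released as soon as C1 and C2 move, while D1 to D4, which together hold the original data ($kB$), remain on the high-read-performance tier.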
Fig. 3 is a schematic flow chart of still another data storage method according to an embodiment of the present application. The method may be performed by a computing device, or may be performed by a cluster of computing devices, or may be performed by a component in a computing device, without limitation. The embodiments of the present application are described by taking a computing device as an example. Alternatively, the computing device may be the management platform of any of the embodiments shown in fig. 1, 2a, or 2b, without limitation. As shown in fig. 3, the method may include the following procedure.
S301: the computing device receives the target data. For example, a tenant uploads the target data through a requesting client; accordingly, the computing device receives the target data.
S302: the computing device performs erasure code encoding on the target data to obtain at least one check block and a plurality of data blocks.
S303: the computing device stores at least one check block into a first storage area and a plurality of data blocks into a third storage area.
Alternatively, the computing device may generate metadata of the target data, where the metadata of the target data may refer to the relevant content of the embodiment shown in fig. 2, which is not described herein.
The description of the first storage area and the third storage area may refer to the relevant content of the embodiment shown in fig. 2, which is not repeated.
S304: the computing device determines whether to migrate the at least one check block into the second storage area.
For example, the computing device may determine whether to migrate the at least one check block to the second storage area according to the first condition or the data migration model; for the specific implementation, refer to the embodiment shown in Fig. 2, which is not repeated here.
If the computing device determines to migrate the at least one check block to the second storage area, S305 is performed; otherwise, S304 is repeated. Optionally, if the number of times S304 has been executed reaches a set threshold, the flow ends.
The description of the second storage area may refer to the relevant content of the embodiment shown in fig. 2, which is not repeated.
S305: the computing device migrates at least one check block into the second storage area.
Alternatively, the computing device may update the metadata of the target data to update the manner in which the target data is stored.
S306: the computing device determines whether to migrate the plurality of data blocks into the second storage area.
For example, after the computing device migrates at least one check block into the second storage area, whether to migrate the plurality of data blocks into the second storage area may be determined according to the second condition or the data migration model, and the specific implementation process may refer to the relevant content of the embodiment shown in fig. 2 and will not be described in detail.
If the computing device determines to migrate the plurality of data blocks to the second storage area, S307 is performed; otherwise, S306 is repeated. Optionally, if the number of times S306 has been executed reaches a set threshold, the flow ends.
S307: the computing device migrates the plurality of data blocks into the second storage area.
Alternatively, the computing device may update the metadata of the target data to update the manner in which the target data is stored.
At this point, archiving of the target data is complete.
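The S304 to S307 loop, including the optional cap on how many times S304 and S306 are executed, can be sketched as a small driver. The callables and the `max_checks` parameter are hypothetical; a real implementation would also wait between checks and update the metadata after each migration, both of which are left to the callables here.

```python
from __future__ import annotations

from typing import Callable


def archive_in_two_stages(
    should_migrate_check_blocks: Callable[[], bool],
    should_migrate_data_blocks: Callable[[], bool],
    migrate_check_blocks: Callable[[], None],
    migrate_data_blocks: Callable[[], None],
    max_checks: int = 10,
) -> str:
    """Run S304 to S307 and return the resulting storage manner."""
    # S304/S305: re-check until the check blocks may move, or give up after max_checks.
    for _ in range(max_checks):
        if should_migrate_check_blocks():
            migrate_check_blocks()  # S305
            break
    else:
        return "standard"  # cap reached; check and data blocks stay where they are
    # S306/S307: only after the check blocks have moved, re-check for the data blocks.
    for _ in range(max_checks):
        if should_migrate_data_blocks():
            migrate_data_blocks()  # S307
            return "archive"
    return "hybrid"  # cap reached; data blocks remain in the third storage area
```

The `for ... else` form makes the ordering explicit: the data-block stage cannot start unless the check-block stage ended with a migration.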
It can be understood that, when the computing device is the management platform in any of the embodiments shown in fig. 2, fig. 2a and fig. 2b, S301 may be executed by the data interface module, S302 and S303 may be executed by the data encoding module, and S304 to S307 may be executed by the data migration module, which may refer to details of the embodiment shown in fig. 2, and will not be described again.
The embodiments shown in fig. 2, 2a, 2b and 3 above describe how the target data is stored. Optionally, on the basis of any of these embodiments, the computing device may also receive a read request for the target data. In response to the read request, the computing device determines the storage manner of the target data according to the metadata of the target data, and then reads the target data according to that storage manner.
The reading process of the target data is described next.
Before the process of reading the target data is described, the storage manners of the target data in the embodiments shown in fig. 2, 2a, 2b and 3 are introduced. In fig. 4, erasure code encoding of the target data yields 4 data blocks, denoted "D1", "D2", "D3" and "D4", and 2 check blocks, denoted "C1" and "C2".
As shown in fig. 4 (a), the target data is stored in a standard manner, that is, at least one check block is stored in the first storage area and a plurality of data blocks are stored in the third storage area.
As shown in (b) of fig. 4, the target data is stored in a hybrid storage manner, that is, at least one check block is stored in the second storage area and a plurality of data blocks are stored in the third storage area.
As shown in fig. 4 (c), the target data is stored in an archive storage manner, that is, both the at least one check block and the plurality of data blocks are stored in the second storage area.
It can be understood that the embodiments of the present application do not limit the names of these storage manners.
Fig. 5 shows a flowchart of a data reading method according to an embodiment of the present application. The method may be performed by a computing device, by a cluster of computing devices, or by a component in a computing device, without limitation. The computing device is, for example, the management platform of fig. 1, without limitation. In fig. 5, the method is described by taking the management platform as the execution body. The management platform includes, for example, a data interface module, a data decoding module and a data reading module.
The data interface module serves as the data access interface: it provides data upload and access capabilities and sends and receives data. The data decoding module provides various types of data decoding capabilities, for example for recovering the target data. The data reading module reads data. Optionally, the management platform may further include a data encoding module, a data migration module, a data analysis module and the like, which are not shown in fig. 5. For these modules, refer to the relevant content of any of the embodiments shown in fig. 2, fig. 2a and fig. 2b. The data encoding module and the data decoding module may be two independent modules or may be integrated into one module, such as a data encoding and decoding module, without limitation.
In particular, the method may include the following procedure.
S501: the data interface module receives a read request for target data.
For example, a tenant may send a read request for reading target data through a requesting client; accordingly, the data interface module receives a read request for the target data.
S502: the data interface module sends a read request of the target data to the data reading module. Accordingly, the data reading module receives a read request of the target data from the data interface module.
S503: the data reading module determines the storage mode of the target data according to the metadata of the target data. For example, the data reading module responds to the reading request and determines the storage mode of the target data according to the metadata of the target data.
For example, the data reading module interacts with the MDC (see the description of fig. 1) to obtain the metadata of the target data. Alternatively, the management platform may further include a metadata module, not shown in fig. 5, with which the data reading module interacts to obtain the metadata of the target data. The storage manner of the target data may be standard storage, hybrid storage or archive storage, as described above.
It is understood that the data reading module may read the target data in response to a read request of the target data, or may actively read the target data, which is not limited.
For example, before migrating at least one check block into the second storage area, the data reading module receives a read request of the target data through the data interface module, and in response, the data reading module may determine, according to metadata of the target data, that a storage manner of the target data is standard storage.
For another example, after migrating at least one check block into the second storage area and before migrating a plurality of data blocks into the second storage area, the data reading module receives a read request of the target data through the data interface module, and in response, the data reading module may determine, according to metadata of the target data, that the storage mode of the target data is hybrid storage.
For another example, after migrating the plurality of data blocks into the second storage area, the data reading module receives a read request of the target data through the data interface module, and in response, the data reading module may determine, according to metadata of the target data, that the storage mode of the target data is archive storage.
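The three placements in fig. 4 map one-to-one onto storage modes, so S503 can be sketched as a lookup on the metadata. The sketch makes a hypothetical assumption: the metadata records the area holding the check blocks and the area holding the data blocks under the keys `check_area` and `data_area`. The actual metadata format is not specified here.

```python
from enum import Enum


class StorageMode(Enum):
    STANDARD = "standard"  # fig. 4 (a)
    HYBRID = "hybrid"      # fig. 4 (b)
    ARCHIVE = "archive"    # fig. 4 (c)


# (area of the check blocks, area of the data blocks) -> storage mode
_MODES = {
    ("first", "third"): StorageMode.STANDARD,
    ("second", "third"): StorageMode.HYBRID,
    ("second", "second"): StorageMode.ARCHIVE,
}


def storage_mode(metadata: dict) -> StorageMode:
    """Derive the storage mode from a (hypothetical) metadata record."""
    key = (metadata["check_area"], metadata["data_area"])
    try:
        return _MODES[key]
    except KeyError:
        raise ValueError(f"unexpected block placement: {key}") from None
```

The placement (check blocks in the first area, data blocks in the second) has no entry. Check blocks always migrate before data blocks, so that placement should not arise, and the sketch treats it as an error.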
S504: the data reading module reads the target data according to the storage mode of the target data.
As described above, the target data can be stored in several manners; accordingly, S504 can be implemented in several ways.
For example, the storage mode of the target data is standard storage, and the data reading module pulls a plurality of data blocks from the third storage area; when the plurality of data blocks are complete, the data reading module acquires target data according to the plurality of data blocks (or through the data decoding module), or when the plurality of data blocks are incomplete, the data reading module pulls at least one check block from the first storage area, sends the at least one check block and the plurality of data blocks to the data decoding module, and further, the data decoding module recovers the target data according to the plurality of data blocks and the at least one check block.
For another example, the storage mode of the target data is hybrid storage, and the data reading module pulls a plurality of data blocks from the third storage area; when the plurality of data blocks are complete, the data reading module acquires target data according to the plurality of data blocks (or through the data decoding module), or when the plurality of data blocks are incomplete, the data reading module pulls at least one check block from the second storage area, sends the at least one check block and the plurality of data blocks to the data decoding module, and further, the data decoding module recovers the target data according to the plurality of data blocks and the at least one check block.
For another example, the storage mode of the target data is archive storage, and the data reading module pulls a plurality of data blocks from the second storage area; when the plurality of data blocks are complete, the data reading module acquires target data according to the plurality of data blocks (or through the data decoding module), or when the plurality of data blocks are incomplete, the data reading module pulls at least one check block from the second storage area, sends the at least one check block and the plurality of data blocks to the data decoding module, and further, the data decoding module recovers the target data according to the plurality of data blocks and the at least one check block.
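The three read paths of S504 differ only in which areas are read, so they can be sketched as one function driven by a placement table. This is a hedged sketch under these assumptions:

* `fetch` and `decode` are hypothetical callbacks standing in for the storage back end and the data decoding module.
* A lost or corrupted block is represented as `None`.

```python
from typing import Callable, List, Optional

Blocks = List[Optional[bytes]]  # None marks a lost or corrupted block

# storage mode -> (area of the check blocks, area of the data blocks), per fig. 4
PLACEMENT = {
    "standard": ("first", "third"),
    "hybrid": ("second", "third"),
    "archive": ("second", "second"),
}


def read_target(mode: str,
                fetch: Callable[[str, str], Blocks],
                decode: Callable[[Blocks, Blocks], bytes]) -> bytes:
    check_area, data_area = PLACEMENT[mode]
    data = fetch(data_area, "data")       # always pull the data blocks first
    if all(b is not None for b in data):  # complete: no decoding needed
        return b"".join(data)
    checks = fetch(check_area, "check")   # only now pull the check blocks
    return decode(data, checks)           # recover via the decoding step
```

In this sketch, a hybrid-mode read touches the second storage area only when a data block is missing. This matches the role of the check blocks, which are needed only for recovery.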
Optionally, after the data reading module reads the target data, the target data may be sent through the data interface module, for example, to a target data requester, which is not shown in fig. 5.
Fig. 5a illustrates the reading process for a case in which the target data is stored in hybrid storage. Erasure code encoding of the target data yields 4 data blocks, denoted "D1", "D2", "D3" and "D4", and 2 check blocks, denoted "C1" and "C2". As shown in fig. 5a, in S503 the data reading module determines, according to the metadata of the target data, that the storage mode of the target data is hybrid storage. S504 in fig. 5 is implemented by S504a alone, or by S504a and S504b. In S504a, the data reading module pulls the plurality of data blocks from the third storage area. Optionally, when the plurality of data blocks are incomplete, the data reading module pulls at least one check block from the second storage area, that is, performs S504b. For the specific implementation, refer to the relevant content of fig. 5.
Fig. 6 is a schematic flow chart of still another data reading method according to an embodiment of the present application. The method may be performed by a computing device, or may be performed by a cluster of computing devices, or may be performed by a component in a computing device, without limitation. The embodiments of the present application are described by taking a computing device as an example. Alternatively, the computing device may be the management platform of any of the embodiments shown in fig. 1, 5 or 5a, without limitation. As shown in fig. 6, the method may include the following procedure.
S601: the computing device receives a read request for target data.
S602: the computing device determines a storage mode of the target data according to the metadata of the target data.
The specific implementation process of S602 may refer to the content of S503, which is not described herein.
If the storage manner of the target data is standard storage, S603 to S607 are performed. If the storage manner of the target data is hybrid storage, S608 to S612 are performed. If the storage manner of the target data is archive storage, S613 to S617 are performed.
S603: the computing device pulls the plurality of data blocks from the third storage area.
S604: the computing device determines whether the plurality of data blocks are complete.
For example, the computing device may check internal information among the plurality of data blocks, such as verification information, to determine whether the data blocks are complete. If any of the data blocks is corrupted and/or lost, the computing device determines that the plurality of data blocks are incomplete; otherwise, it determines that they are complete.
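One possible way to implement the completeness check of S604 is sketched below. As an illustration only, it assumes that the metadata holds a SHA-256 digest for each data block. The text above says only that verification information may be checked. A data block counts as bad if it is missing or its digest does not match.

```python
import hashlib
from typing import List, Optional, Sequence


def missing_or_corrupt(blocks: Sequence[Optional[bytes]],
                       digests: Sequence[str]) -> List[int]:
    """Return the indices of data blocks that are lost or fail their digest check."""
    bad = []
    for i, digest in enumerate(digests):
        block = blocks[i] if i < len(blocks) else None  # absent -> treated as lost
        if block is None or hashlib.sha256(block).hexdigest() != digest:
            bad.append(i)
    return bad


def is_complete(blocks: Sequence[Optional[bytes]], digests: Sequence[str]) -> bool:
    return not missing_or_corrupt(blocks, digests)
```

Returning the indices of the bad blocks, rather than a bare boolean, also tells the recovery step which blocks need to be rebuilt.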
In S604, when the plurality of data blocks are complete, the computing device performs S605. Otherwise, when the plurality of data blocks are incomplete, the computing device performs S606 and S607.
S605: the computing device obtains target data from the plurality of data blocks.
The embodiment of the application does not limit the specific implementation process of the computing device for acquiring the target data according to the plurality of data blocks.
S606: the computing device pulls at least one check block from the first storage area.
S607: the computing device recovers the target data from the at least one check block and the plurality of data blocks.
The specific implementation process of the computing device for recovering the target data according to the at least one check block and the plurality of data blocks is not limited in the embodiments of the present application.
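The recovery algorithm is left open here. As a stand-in, the sketch below uses the simplest erasure code: k data blocks plus a single XOR parity (check) block, which can rebuild any one lost data block. This technique is swapped in for illustration only. The 4 + 2 layout of fig. 4 tolerates two losses and would need a code such as Reed-Solomon instead.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> Tuple[List[bytes], bytes]:
    """Split data into k equal-size blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks, reduce(xor_bytes, blocks)


def recover(blocks: List[Optional[bytes]], parity: bytes, length: int) -> bytes:
    """Rebuild at most one missing data block, then strip the padding."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild at most one block")
    if missing:
        present = [b for b in blocks if b is not None]
        blocks = list(blocks)
        # parity XOR (all surviving blocks) = the missing block
        blocks[missing[0]] = reduce(xor_bytes, present, parity)
    return b"".join(blocks)[:length]
```

The original length must be stored somewhere (for example in the metadata) so that the zero padding added by `encode` can be removed after recovery.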
At this point, the computing device has finished reading target data stored in standard storage.
S608: the computing device pulls the plurality of data blocks from the third storage area.
S609: the computing device determines whether the plurality of data blocks are complete.
The specific implementation process of S609 may refer to S604, and will not be described in detail.
In S609, when the plurality of data blocks are complete, the computing device performs S610. Otherwise, when the plurality of data blocks are incomplete, the computing device performs S611 and S612.
S610: the computing device obtains target data from the plurality of data blocks.
S611: the computing device pulls at least one check block from the second storage area.
S612: the computing device recovers the target data from the at least one check block and the plurality of data blocks.
At this point, the computing device has finished reading target data stored in hybrid storage.
S613: the computing device pulls the plurality of data blocks from the second storage area.
S614: the computing device determines whether the plurality of data blocks are complete.
The specific implementation process of S614 may refer to S604, which is not described herein.
In S614, when the plurality of data blocks are complete, the computing device performs S615. Otherwise, when the plurality of data blocks are incomplete, the computing device performs S616 and S617.
S615: the computing device obtains target data from the plurality of data blocks.
S616: the computing device pulls at least one check block from the second storage area.
S617: the computing device recovers the target data from the at least one check block and the plurality of data blocks.
At this point, the computing device has finished reading target data stored in archive storage.
It can be understood that, when the computing device is the management platform in the embodiment shown in fig. 5 or fig. 5a, S601 may be performed by the data interface module; S605, S607, S610, S612, S615 and S617 by the data decoding module; and the remaining steps in fig. 6 by the data reading module. For details, refer to the embodiments shown in fig. 5 and fig. 5a.
It should be noted that the division of modules in the embodiments shown in any of fig. 2, 2a, 2b, 5 and 5a is taken as an example and is not limited thereto. For example, the management platform may be divided into a processing module and a transceiver module. The processing module may be configured to implement the functionality of one or more of a data encoding module, a data decoding module, a data analysis module, a data migration module, and a data reading module. The transceiver module may be used to implement the functionality of the data interface module.
Based on the same technical concept as the method embodiments, an embodiment of the present application further provides a computing device 100, which may be a physical machine or a virtual machine server. The computing device 100 may be used to implement the method of any of the embodiments shown in fig. 2, 2a, 2b, 3, 5a and 6, and can achieve the beneficial effects of those method embodiments.
In some embodiments, the computing device 100 may be a management platform in the various method embodiments described above, and may be used to implement the methods described in the various method embodiments described above.
As shown in fig. 7, the computing device 100 may include a processor 110 and a memory 120 coupled to the processor 110. The processor 110 and the memory 120 may be interconnected by a bus, and the processor 110 may serve as the main processor of the computing device 100, that is, its control core. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The embodiments of the present application do not limit the specific connection medium between the processor 110 and the memory 120.
Memory 120 may be a volatile memory, such as a random access memory. It may also be a non-volatile memory, such as, but not limited to, a read-only memory, a flash memory, a hard disk drive (HDD) or a solid state drive (SSD). Alternatively, the memory 120 may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 120 may also be a combination of the above memories. Memory 120 may be used to store software programs and modules.
The processor 110 may include one or more processors, and the processor 110 executes various functional applications and data processing of the computing device 100 by running software programs and modules stored in the memory 120, such as a data storage method provided by an embodiment of the present application and/or a data reading method provided by an embodiment of the present application.
For example, the processor 110 may be configured to migrate at least one check block in a first storage area into a second storage area, where the read performance of the second storage area is lower than that of the first storage area. After migrating the at least one check block into the second storage area, the processor 110 migrates a plurality of data blocks in a third storage area into the second storage area. The at least one check block and the plurality of data blocks are obtained by erasure code encoding of target data. When the plurality of data blocks are incomplete, the at least one check block is used together with them to recover the target data.
In some embodiments, as shown in fig. 8, computing device 100 may further include a communication module 130, communication module 130 being connected to processor 110 and memory 120 by a bus. The communication module 130 is configured to communicate with a terminal device of a tenant through a network.
It should be understood that the architecture illustrated in the embodiments of the present application does not constitute a specific limitation on the computing device. In other embodiments of the present application, the computing device may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the present application also provide a computing device cluster including at least one computing device, each of which may employ the structure shown in fig. 7 or fig. 8, including a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to perform the various method embodiments described above.
The steps of the method in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing a computer program or instructions. The computer program or instructions may constitute a computer program product.
The embodiment of the application also provides a data storage system, which comprises a computing device, a first storage area, a second storage area and a third storage area. The computing device may employ the structures shown in fig. 7 or 8 for performing the various method embodiments described above.
Embodiments of the present application also provide a computer program product comprising computer-executable instructions. In one embodiment, the computer-executable instructions are for causing a computer to perform the functions of the method embodiments shown in any one of fig. 2, 2a, 2b, 3, 5a, and 6.
Embodiments of the present application also provide a computer-readable storage medium having executable instructions stored therein. In one embodiment, the computer-executable instructions are for causing a computer to perform the functions of the method embodiments shown in any one of fig. 2, 2a, 2b, 3, 5a, and 6.
The computer readable storage medium provided by the embodiments of the present application may be a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of computer readable storage medium known in the art.
The computer-executable instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer program or instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium, such as a floppy disk, hard disk or magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state disk.
In the various embodiments of the application, if there is no specific description or logical conflict, terms and/or descriptions between the various embodiments are consistent and may reference each other, and features of the various embodiments may be combined to form new embodiments according to their inherent logical relationships. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a series of steps or elements. The method, system, article, or apparatus is not necessarily limited to those explicitly listed but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary of the arrangements defined in the appended claims and are to be construed as covering any and all modifications, variations, combinations, or equivalents that are within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to encompass such modifications and variations.

Claims (11)

1. A method of data storage, the method comprising:
migrating at least one check block in a first storage area into a second storage area, wherein the reading performance of the second storage area is lower than that of the first storage area;
after migrating the at least one check block into the second storage area, migrating a plurality of data blocks in a third storage area into the second storage area, the second storage area having a read performance lower than that of the first storage area;
wherein the at least one check block and the plurality of data blocks are obtained by erasure code encoding of target data, the at least one check block is used together with the plurality of data blocks to recover the target data when the plurality of data blocks are incomplete, and the read performance of the first storage area is the same as or different from the read performance of the third storage area.
2. The method of claim 1, wherein before migrating the at least one check block in the first storage area into the second storage area, the method further comprises:
determining that a first condition is satisfied, wherein the first condition includes one or more of:
the storage time length of the target data is greater than or equal to a first threshold; or
the access frequency of the target data is less than or equal to a second threshold.
3. The method of claim 1 or 2, wherein prior to migrating the plurality of data blocks in the third storage area into the second storage area, the method further comprises:
determining that a second condition is satisfied, wherein the second condition comprises one or more of:
the storage time length of the target data is greater than or equal to a third threshold; or
the access frequency of the target data is less than or equal to a fourth threshold.
4. A method according to claim 1 or 3, wherein before migrating at least one check block in the first storage area into the second storage area, the method further comprises:
inputting the storage time length of the target data and/or the access frequency of the target data into a data migration model to obtain a first output result, wherein the first output result indicates migration of the at least one check block.
5. The method of claim 1, 2 or 4, wherein prior to migrating the plurality of data blocks in the third storage area into the second storage area, the method further comprises:
inputting the storage time length of the target data and/or the access frequency of the target data into a data migration model to obtain a second output result, wherein the second output result indicates migration of the plurality of data blocks.
6. The method of any one of claims 1 to 5, wherein after migrating the at least one check block into the second storage area and before migrating the plurality of data blocks in the third storage area into the second storage area, the method further comprises:
receiving a read request for the target data;
in response to the read request, determining that the at least one check block is stored in the second storage area and that the plurality of data blocks are stored in the third storage area according to metadata of the target data;
pulling the plurality of data blocks from the third storage area;
when the plurality of data blocks are complete, acquiring the target data according to the plurality of data blocks; or when the plurality of data blocks are incomplete, pulling the at least one check block from the second storage area, and recovering the target data according to the at least one check block and the plurality of data blocks.
7. A computing device comprising a memory and a processor;
the memory stores a computer program;
the processor is configured to invoke a computer program stored in the memory to perform the method of any of claims 1 to 6.
8. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1 to 6.
9. A data storage system comprising a first storage area, a second storage area, a third storage area, and a computing device, wherein the computing device is configured to perform the method of any of claims 1-6.
10. A computer readable storage medium having instructions stored therein which, when executed on a processor, cause the processor to perform the method of any one of claims 1 to 6.
11. A computer program product comprising instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 6.
CN202310233483.9A 2023-02-28 2023-02-28 Data storage method, computing device and data storage system Pending CN116382573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310233483.9A CN116382573A (en) 2023-02-28 2023-02-28 Data storage method, computing device and data storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310233483.9A CN116382573A (en) 2023-02-28 2023-02-28 Data storage method, computing device and data storage system

Publications (1)

Publication Number Publication Date
CN116382573A true CN116382573A (en) 2023-07-04

Family

ID=86963976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310233483.9A Pending CN116382573A (en) 2023-02-28 2023-02-28 Data storage method, computing device and data storage system

Country Status (1)

Country Link
CN (1) CN116382573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407124A (en) * 2023-12-13 2024-01-16 之江实验室 Service execution method based on constructed data arrangement strategy generation model
CN117407124B (en) * 2023-12-13 2024-03-12 之江实验室 Service execution method based on constructed data arrangement strategy generation model

Similar Documents

Publication Publication Date Title
US11907585B2 (en) Storing data sequentially in zones in a dispersed storage network
US10248506B2 (en) Storing data and associated metadata in a dispersed storage network
US11693789B2 (en) System and method for mapping objects to regions
US10969962B2 (en) Compacting data in a dispersed storage network
US10356150B1 (en) Automated repartitioning of streaming data
US11416166B2 (en) Distributed function processing with estimate-based scheduler
US11449280B1 (en) Dynamic provisioning and activation of storage pools
WO2021213281A1 (en) Data reading method and system
US11226778B2 (en) Method, apparatus and computer program product for managing metadata migration
CN116382573A (en) Data storage method, computing device and data storage system
US10831714B2 (en) Consistent hashing configurations supporting multi-site replication
US10078468B2 (en) Slice migration in a dispersed storage network
CN109840051B (en) Data storage method and device of storage system
EP4170499A1 (en) Data storage method, storage system, storage device, and storage medium
US20180341697A1 (en) Pre-allocating filesystem metadata within an object storage system
EP4369170A1 (en) Method and apparatus for data storage in storage system
US10241878B2 (en) System and method of data allocation providing increased reliability of storage
CN116594551A (en) Data storage method and device
US11726658B2 (en) Method, device, and computer program product for storage management
US11544387B2 (en) Hash protection within an object storage library
US11971902B1 (en) Data retrieval latency management system
US11347596B2 (en) Preliminary data protection using composite copies of data in a data storage system
US11093157B2 (en) Method, electronic device and computer program product for storage management
US11435913B2 (en) Storage policy matching using difference-based scoring
US11347553B2 (en) Data distribution for fast recovery in cluster-based storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination