CN116028276B

CN116028276B - Delay data reconstruction method, delay data reconstruction device, storage node and storage medium

Info

Publication number: CN116028276B
Application number: CN202310166673.3A
Authority: CN
Inventors: 经宁; 樊官跃; 齐泽青
Original assignee: Shenzhen Fanlian Information Technology Co ltd
Current assignee: Shenzhen Fanlian Information Technology Co ltd
Priority date: 2023-02-27
Filing date: 2023-02-27
Publication date: 2023-06-09
Anticipated expiration: 2043-02-27
Also published as: CN116028276A

Abstract

The embodiment of the invention provides a delayed data reconstruction method, a delayed data reconstruction device, a storage node and a storage medium. The data objects to be recovered and the associated hard disks are obtained according to a failed hard disk. A data object whose life cycle has expired skips data reconstruction. When the data reliability of a data object to be recovered that is still within its life cycle is lower than a reliability threshold, the data object is determined to be a reconstruction object, and data reconstruction is carried out on all the reconstruction objects in order of priority from high to low; the priority is positively correlated with the life cycle and negatively correlated with the data reliability probability. In this way, only the data objects to be recovered that are within their life cycle and have low data reliability are selected for data reconstruction. During data reconstruction, data objects with low data reliability are reconstructed preferentially, and data objects near the end of their life cycle are reconstructed with lower priority, so that the amount of reconstructed data and the number of reconstructions are greatly reduced, the reconstruction efficiency is improved, and the influence of reconstruction on system performance is effectively reduced.

Description

Delay data reconstruction method, delay data reconstruction device, storage node and storage medium

Technical Field

The present invention relates to the field of distributed storage systems, and in particular to a delayed data reconstruction method, a delayed data reconstruction device, a storage node, and a storage medium.

Background

In a distributed storage system, a storage device is typically composed of a plurality of servers, which are generally referred to as distributed storage nodes, or nodes for short. Each node holds hard disks in its hard disk slots to provide storage space. The industry uses Self-Monitoring, Analysis and Reporting Technology (SMART) to monitor the health state of a hard disk and give early warning, so that faults are recognized in advance and can be handled in time. Common hard disk types include the mechanical hard disk and the solid state disk (SSD). The nodes are managed uniformly by distributed storage software to form a logically unified storage resource pool in which users read and write files or objects (data objects for short). Distributed storage systems rely primarily on data redundancy to provide data security and reliability; copies and erasure codes are the common redundancy modes at present. When a failed hard disk is replaced, the data on the remaining nodes is read, and the lost data is recovered according to the corresponding redundancy mode and written to the new hard disk.

In the prior art, in order to avoid data loss, data reconstruction is performed immediately after a failed hard disk is replaced by a new hard disk. During data reconstruction, all data stored on the failed hard disk is recovered and reconstructed, even data whose life cycle has expired or is close to expiration, which increases the amount of reconstructed data and affects system performance. When a plurality of hard disks fail in sequence, data reconstruction is repeated a plurality of times. As a result, system resources such as CPU and memory are continuously occupied and the read-write pressure on the hard disks increases, which further reduces system performance and affects the normal business of users.

Disclosure of Invention

In view of the above, the present invention provides a delayed data reconstruction method, a delayed data reconstruction device, a storage node and a storage medium, which select for data reconstruction only the data objects to be recovered that are within their life cycle and have low data reliability. During data reconstruction, data objects with low data reliability are reconstructed preferentially, and data objects near the end of their life cycle are reconstructed with lower priority, so that the amount of reconstructed data and the number of reconstructions are greatly reduced, the reconstruction efficiency is improved, and the influence of reconstruction on system performance is effectively reduced.

In order to achieve the above object, the technical scheme adopted by the embodiment of the invention is as follows:

In a first aspect, the present invention provides a delayed data reconstruction method, applied to a storage node, the method comprising:

acquiring a data object to be recovered, an associated hard disk list, a life cycle of the data object to be recovered and a reliability factor of the associated hard disk; the data object to be recovered is a data object that has data located on the failed hard disk; the associated hard disk list records information on the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered; the reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list;

acquiring, for each target data object, its target associated hard disks and the mean time between failures (MTBF) data of the target associated hard disks; a target associated hard disk is at least one hard disk in the associated hard disk list; a target data object is a data object to be recovered that is within its life cycle;

calculating to obtain the data reliability probability of the target data object according to the life cycle, the MTBF data and the reliability factor of the target associated hard disk;

determining the target data object as a reconstruction object when the data reliability probability is below a reliability threshold;

carrying out data reconstruction on all the reconstruction objects according to the order of the priority from high to low; the priority is positively correlated with the lifecycle and negatively correlated with the data reliability probability.

In an alternative embodiment, the step of obtaining the data object to be restored and the associated hard disk list includes:

receiving a data reconstruction instruction; the data reconstruction instruction comprises fault hard disk information;

and traversing the hard disk data distribution metadata to obtain the data object to be recovered and the associated hard disk list which are related to the fault hard disk.

In an alternative embodiment, the method further comprises:

skipping data reconstruction for the data object to be recovered when the life cycle of the data object to be recovered has expired.

In an optional embodiment, the step of calculating the data reliability probability of the target data object according to the life cycle, the MTBF data and the reliability factor of the target associated hard disk respectively includes:

according to the life cycle and the MTBF data, calculating to obtain the fault probability of the target associated hard disk;

and obtaining the data reliability probability according to the fault probability and the reliability factor of the target associated hard disk.

In an optional embodiment, the step of carrying out data reconstruction on all the reconstruction objects in order of priority from high to low includes:

dividing at least one priority according to the numerical range of the data reliability probability; each priority corresponds to a reconstruction queue;

adding all the reconstruction objects into the corresponding reconstruction queues according to the priority;

and sequentially taking out the reconstruction objects from the corresponding reconstruction queues according to the order of the priority from high to low, and carrying out data reconstruction.

In an optional embodiment, after the step of adding all the reconstruction objects to the corresponding reconstruction queues according to the priority, the method further includes:

when a reconstruction object meets a removal condition, removing the reconstruction object from its reconstruction queue; the removal condition indicates that the reconstruction object has been deleted or migrated.
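The queue handling in the two embodiments above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the band boundaries in `BANDS`, the class name `ReconstructionQueues` and its method names are hypothetical and do not come from the patent text. Priority 0 is treated as the highest priority, and ordering by life cycle within a band is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical probability bands: (exclusive upper bound, priority).
# Priority 0 is the highest; lower reliability maps to higher priority.
BANDS = [(0.99, 0), (0.999, 1), (0.9999, 2)]


def priority_for(reliability: float) -> int:
    """Map a data reliability probability to a priority band."""
    for upper_bound, priority in BANDS:
        if reliability < upper_bound:
            return priority
    raise ValueError("probability is not below the highest band bound")


class ReconstructionQueues:
    """One FIFO reconstruction queue per priority."""

    def __init__(self) -> None:
        self.queues: Dict[int, Deque[str]] = {p: deque() for _, p in BANDS}

    def add(self, name: str, reliability: float) -> None:
        """Put a reconstruction object into the queue of its priority."""
        self.queues[priority_for(reliability)].append(name)

    def remove(self, name: str) -> None:
        """Drop an object that was deleted or migrated before its turn."""
        for queue in self.queues.values():
            if name in queue:
                queue.remove(name)

    def drain(self) -> List[str]:
        """Return objects in reconstruction order, highest priority first."""
        order: List[str] = []
        for priority in sorted(self.queues):
            queue = self.queues[priority]
            while queue:
                order.append(queue.popleft())
        return order
```

Keeping one queue per band means a newly added low-reliability object is still served before everything in the lower-priority queues, without re-sorting the objects that are already waiting.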

In an alternative embodiment, the formula of the data reliability probability of the data object to be restored is:

wherein y is the probability of data loss of the data object to be recovered in the life cycle; x is the data reliability probability of the data object to be recovered; n is the number of data blocks of the data object to be restored; m is the number of check blocks of the data object to be recovered or the number of copies corresponding to the data blocks; b is the life cycle of the data object to be restored; a is MTBF data of the target associated hard disk; d is the reliability factor of the target associated hard disk.

In a second aspect, the present invention provides a delayed data reconstruction device for use in a storage node, the device comprising:

the acquisition module is used for acquiring the data object to be recovered, the associated hard disk list, the life cycle of the data object to be recovered and the reliability factor of the associated hard disk; the data object to be recovered is a data object that has data located on the failed hard disk; the associated hard disk list records information on the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered; the reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list; the acquisition module is further used for acquiring, for each target data object, its target associated hard disks and the mean time between failures (MTBF) data of the target associated hard disks; a target associated hard disk is at least one hard disk in the associated hard disk list; a target data object is a data object to be recovered that is within its life cycle;

The decision module is used for calculating the data reliability probability of the target data object according to the life cycle, the MTBF data and the reliability factor of the target associated hard disk respectively; determining the target data object as a reconstruction object when the data reliability probability is below a reliability threshold;

the reconstruction module is used for reconstructing data according to the order of priority from high to low for all the reconstruction objects; the priority is positively correlated with the lifecycle and negatively correlated with the data reliability probability.

In a third aspect, the present invention provides a storage node comprising a memory for storing a computer program and a processor for performing the delayed data reconstruction method of any of the previous embodiments when the computer program is invoked.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a delayed data reconstruction method as in any of the previous embodiments.

Compared with the prior art, the delayed data reconstruction method, device, storage node and storage medium provided by the embodiment of the invention acquire the data object to be recovered, the associated hard disk list, the life cycle of the data object to be recovered and the reliability factor of the associated hard disk; the data object to be recovered is a data object that has data located on the failed hard disk; the associated hard disk list records information on the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered; the reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the SMART information of the associated hard disks in the associated hard disk list. For each target data object, the target associated hard disks and the MTBF data of the target associated hard disks are acquired; a target associated hard disk is at least one hard disk in the associated hard disk list; a target data object is a data object to be recovered that is within its life cycle. The data reliability probability of the target data object is calculated according to the life cycle, the MTBF data and the reliability factor of the target associated hard disks; the target data object is determined to be a reconstruction object when the data reliability probability is below a reliability threshold; data reconstruction is carried out on all the reconstruction objects in order of priority from high to low; the priority is positively correlated with the life cycle and negatively correlated with the data reliability probability.
In this way, only the data objects to be recovered that are within their life cycle and have low data reliability are selected for data reconstruction. During data reconstruction, data objects with low data reliability are reconstructed preferentially, and data objects near the end of their life cycle are reconstructed with lower priority, so that the amount of reconstructed data and the number of reconstructions are greatly reduced, the reconstruction efficiency is improved, and the influence of reconstruction on system performance is effectively reduced.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 shows a schematic diagram of a prior art distributed storage system.

Fig. 2 is a schematic flow chart of a delayed data reconstruction method according to an embodiment of the present invention.

Fig. 3 shows a schematic flow chart of the sub-steps of step S103 and step S105 in Fig. 2.

Fig. 4 shows a schematic diagram of a check block failure scenario of erasure codes.

Fig. 5 shows a schematic diagram of a lifecycle of a data object.

Fig. 6 shows a schematic diagram of an erasure coded data block failure scenario.

Fig. 7 is a schematic diagram of a scenario of a delayed data reconstruction method according to an embodiment of the present disclosure.

Fig. 8 is a schematic flow chart of a delayed data reconstruction method according to an embodiment of the present invention.

Fig. 9 is a block diagram of a delayed data reconstruction device according to an embodiment of the present invention.

Fig. 10 shows a block schematic diagram of a storage node according to an embodiment of the present invention.

Reference numerals: 100 - storage node; 110 - memory; 120 - processor; 130 - communication module; 200 - delayed data reconstruction device; 201 - acquisition module; 202 - decision module; 203 - reconstruction module.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.

It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

To ensure the reliability of a distributed storage system, data is generally stored with data redundancy. Because of the redundant data, the system can tolerate failures at the hard disk, node and even cabinet level. When one or more hard disks, nodes or even cabinets fail, the affected data can be recovered from the redundant data. Erasure codes are increasingly widely used because of advantages such as high hard disk utilization and low cost. Although data can be recovered from redundant data, failures that exceed the redundancy capability result in data corruption or loss.

Therefore, to avoid data damage or loss, the prior art generally performs data reconstruction and recovery immediately after a failed hard disk is replaced with a new one; when a plurality of hard disks, nodes or even cabinets fail in sequence, data reconstruction is performed multiple times. During data reconstruction, all affected data is reconstructed and recovered, including data whose life cycle has expired or is about to expire. This mode of data reconstruction produces a large amount of reconstruction data and therefore more read-write IO, which further increases the IO load of the hard disks and affects the performance of the storage system.

To make the relationships among the distributed storage system, nodes, hard disks and data objects clearer, Fig. 1 is described in detail as an example. Assume that the distributed storage system has 6 nodes, that each node has 2 hard disks inserted into its hard disk slots, and that 4 data objects already exist in the distributed storage system. To better ensure data security and to prevent a single node failure from losing several data blocks and thereby affecting data reliability, the data blocks of each data object are stored on hard disks of different nodes. When a single node fails, only one data block is lost, and it can be recovered through redundancy.

Because the hard disks in a node are relatively independent and usually do not belong to the same stripe, a node failure can be decomposed into a plurality of independent hard disk failures; after a node failure occurs, the hard disks in the node can be processed sequentially or in parallel.

In Fig. 1, it is assumed that data object 1 guarantees data reliability with erasure code 4+2 redundancy. The erasure code 4+2 mechanism divides the data into 4 data blocks, computes 2 check blocks from them, and stores the 6 blocks on different hard disks of different nodes. When some hard disks or nodes in the system fail, system reliability decreases. To avoid data damage or loss caused by multiple disk failures exceeding the redundancy recovery capability, the current practice is to replace the failed hard disk or node as soon as possible and start data reconstruction: the remaining blocks are read, a recovery calculation yields the data of the failed blocks, and the recovered data is written to the replacement hard disk. This keeps the period in which the system is in an unstable state as short as possible; when the repair is completed, the redundancy capability of the system is restored and the preset reliability is achieved again.

Similarly, in the copy redundancy mode, multiple copies of the same data are stored in the system. When some hard disks or nodes in the system fail, the number of available copies decreases and the redundancy capability of the data is reduced; when the number of failed hard disks or nodes exceeds the number of copies, the data is lost. Therefore, when a hard disk fails, the failure needs to be repaired in time and data reconstruction started, so as to shorten the period in which the system is in an unstable state; when the repair is completed, the redundancy capability of the system is restored and the preset reliability is achieved again.

When copies are used to improve data reliability, taking 100TB of usable capacity as an example, three copies require 300TB of hard disk space, a hard disk utilization of 33%; increasing to 4 copies requires 400TB, a utilization of 25%. Thus, although data reliability improves as the number of copies increases, the operating cost rises and the read-write performance falls. Therefore, in practical applications, 2 or 3 copies are usually configured.
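The capacity arithmetic above can be checked with a few lines of Python. The function names are hypothetical and used only for illustration; the erasure code case simply applies the 4+2 block counts from the Fig. 1 example to the same 100TB of usable capacity.

```python
def raw_capacity_for_copies(usable_tb: float, copies: int) -> float:
    """Raw disk capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_for_erasure(usable_tb: float, data_blocks: int,
                             check_blocks: int) -> float:
    """Raw disk capacity needed for an erasure code with n data + m check blocks."""
    return usable_tb * (data_blocks + check_blocks) / data_blocks


def utilization(usable_tb: float, raw_tb: float) -> float:
    """Fraction of raw capacity that holds user data."""
    return usable_tb / raw_tb


if __name__ == "__main__":
    for copies in (3, 4):
        raw = raw_capacity_for_copies(100, copies)
        print(f"{copies} copies: {raw:.0f} TB raw, {utilization(100, raw):.0%} utilization")
    raw = raw_capacity_for_erasure(100, 4, 2)
    print(f"erasure 4+2: {raw:.0f} TB raw, {utilization(100, raw):.0%} utilization")
```

With these block counts, erasure code 4+2 needs 150TB of raw capacity for 100TB of data, which is the higher hard disk utilization referred to earlier in the description.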

Clearly, when the prior art performs failure recovery, in order to restore the reliability of the distributed storage system as soon as possible, data reconstruction of all affected data is started immediately after the failed hard disk is replaced, and when a plurality of failed hard disks are replaced with new hard disks in sequence, data reconstruction is performed multiple times. This produces a large amount of reconstruction data and more read-write IO, which increases the IO load of the disks and affects the performance of the storage system. It is worth noting that recovering data whose life cycle has expired or is about to expire has little practical value.

Based on the above, the embodiment of the invention provides a delayed data reconstruction method, a delayed data reconstruction device, a storage node and a storage medium. Only the data objects to be recovered that are within their life cycle and have low data reliability are selected for data reconstruction. During data reconstruction, data objects with low data reliability are reconstructed preferentially, and data objects near the end of their life cycle are reconstructed with lower priority, so that the amount of reconstructed data and the number of reconstructions are greatly reduced, the reconstruction efficiency is improved, and the influence of reconstruction on system performance is effectively reduced.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to Fig. 2, a schematic flow chart of a delayed data reconstruction method according to an embodiment of the present invention is shown. The method includes the following steps:

Step S101, acquiring a data object to be recovered, an associated hard disk list, a life cycle of the data object to be recovered, and a reliability factor of the associated hard disk.

The data object to be recovered is a data object that has data located on the failed hard disk; the associated hard disk list records information on the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered; the reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list.

In the distributed storage system, in order to ensure the reliability and the security of data, the data of one data object is distributed in a plurality of nodes and a plurality of hard disks for storage. When a hard disk fails, all data objects with data on the failed hard disk are affected, that is, the reliability of the data objects is reduced.

Specifically, taking Fig. 1 as an example, assume that the data objects in Fig. 1 all store data with erasure code 4+2 redundancy. Each data object can then tolerate the failure of at most two hard disks; when more than two hard disks fail, the data that the data object stored on the failed hard disks cannot be recovered, which may cause data damage or loss. When hard disk 1 fails and is replaced with a new disk, the system issues a data reconstruction notification. According to the notification, the failed hard disk is hard disk 1. Before data reconstruction, the data objects affected by hard disk 1 are obtained, namely data object 1 and data object 2, which are the data objects to be recovered; the associated hard disk list, hard disk 2 through hard disk 6, consists of the hard disks in the same redundancy stripe as hard disk 1.
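The lookup described in this example can be sketched as a traversal of the data distribution metadata. In this minimal sketch the metadata is assumed to be a plain mapping from data object name to the list of hard disks in its stripe; that layout, the function name `find_affected` and the sample placement are hypothetical, since Fig. 1 is not reproduced in the text.

```python
from typing import Dict, List, Set, Tuple


def find_affected(placement: Dict[str, List[str]],
                  failed_disk: str) -> Tuple[List[str], Set[str]]:
    """Return the data objects to be recovered and the associated hard disks.

    `placement` maps each data object to the hard disks holding its stripe.
    """
    to_recover: List[str] = []
    associated: Set[str] = set()
    for obj, disks in placement.items():
        if failed_disk in disks:
            to_recover.append(obj)
            # Every other disk in the same stripe is still online.
            associated.update(d for d in disks if d != failed_disk)
    return to_recover, associated


if __name__ == "__main__":
    # Hypothetical placement consistent with the example in the text:
    # objects 1 and 2 share a stripe with hard disk 1, objects 3 and 4 do not.
    placement = {
        "object1": [f"disk{i}" for i in range(1, 7)],
        "object2": [f"disk{i}" for i in range(1, 7)],
        "object3": [f"disk{i}" for i in range(7, 13)],
        "object4": [f"disk{i}" for i in range(7, 13)],
    }
    print(find_affected(placement, "disk1"))
```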

Data objects written to a storage system typically have a set life cycle, that is, a set period of validity. For example, in a video surveillance scenario, the life cycle of stored data is typically one month or three months, and data exceeding its life cycle is deleted directly or dumped to an external storage medium. Reconstructing failed data whose life cycle has expired or is about to expire has little practical value, because the data will be deleted or dumped soon after recovery. This makes it possible to leave such data unrecovered after a hard disk failure.

The hard disk is a core component for data storage in a storage system, and its reliability is very important to the reliability of the data. SMART is an automatic hard disk state detection and early warning system and specification. Detection instructions in the hard disk hardware monitor and record the operating condition of components such as the magnetic heads, platters, motor and circuits, and compare it with safety values preset by the manufacturer. If a monitored condition is about to exceed or has exceeded the preset safe range, the monitoring hardware or software of the host can automatically issue a warning and perform minor automatic repairs, so that the safety of the data on the hard disk is protected in advance.

In the embodiment of the invention, in order to reduce the resource consumption of data reconstruction, the data objects to be recovered that actually undergo data reconstruction are determined by combining the life cycle of the data object with the SMART information of the associated hard disks, so that as little data as possible, or even none, is reconstructed.

Step S102, acquiring the target associated hard disks of each target data object and the mean time between failures (MTBF) data of the target associated hard disks.

A target associated hard disk is at least one hard disk in the associated hard disk list; a target data object is a data object to be recovered that is within its life cycle.

In the embodiment of the invention, when data reconstruction is performed for hard disk failure recovery, a data object to be recovered that is still within its life cycle validity period is taken as a target data object. Part of the data of the target data object is stored on the failed hard disk, and the hard disks holding its other data are online hard disks; these online hard disks are the target associated hard disks. Finally, the MTBF data of the target associated hard disks is acquired.

Specifically, the MTBF data of a hard disk is generally provided by the hard disk vendor. It is an indicator of hard disk reliability, expressed in hours; it reflects the time quality of the hard disk, that is, its ability to maintain its function within a specified time.
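The selection of target data objects can be illustrated with a short Python sketch. It assumes, purely for illustration, that each data object records the hour at which it was written and the length of its life cycle in hours; the `DataObject` fields and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataObject:
    name: str
    written_at_hour: float   # assumed timestamp, in hours
    lifecycle_hours: float   # configured life cycle, in hours


def remaining_hours(obj: DataObject, now_hour: float) -> float:
    """Hours left before the object's life cycle expires (<= 0 if expired)."""
    return obj.written_at_hour + obj.lifecycle_hours - now_hour


def select_target_objects(to_recover: List[DataObject],
                          now_hour: float) -> List[DataObject]:
    """Keep only objects still within their life cycle; expired ones are skipped."""
    return [obj for obj in to_recover if remaining_hours(obj, now_hour) > 0]
```

Objects filtered out here are never read or rewritten, which is where the reduction in reconstructed data comes from.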

In the embodiment of the invention, the MTBF data may be preset by an administrator through an interactive interface or a third-party server, and may also be set as a default value during system initialization, which is not limited in the invention.

Step S103, calculating the data reliability probability of the target data object according to the life cycle, the MTBF data and the reliability factor of the target associated hard disks.

In the embodiment of the invention, the life cycle of the target data object, the MTBF data of the hard disks and the reliability factor obtained from the SMART information of the target associated hard disks are considered together to obtain the data reliability probability of the target data object. Because the target data object and its associated hard disks are evaluated from multiple angles, a data reliability probability that matches the actual situation relatively well can be obtained, and the screening of reconstruction objects based on this probability is more accurate, so that as little data as possible, or even none, is recovered and the performance impact of data reconstruction on the storage system is reduced as far as possible.
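The two-stage calculation (per-disk failure probability first, then the data reliability probability) can be illustrated as follows. This sketch does not reproduce the patent's formula, which is not available in this text. It substitutes a common textbook model as an assumption: an exponential failure law over the life cycle, a reliability factor in (0, 1] that scales the MTBF down, and independent disk failures. All function and parameter names are hypothetical.

```python
import math
from typing import List


def disk_failure_probability(lifecycle_hours: float, mtbf_hours: float,
                             reliability_factor: float) -> float:
    """Assumed model: P(fail within life cycle) = 1 - exp(-b / (a * d)).

    reliability_factor must be > 0; a value below 1 shortens the effective MTBF.
    """
    effective_mtbf = mtbf_hours * reliability_factor
    return 1.0 - math.exp(-lifecycle_hours / effective_mtbf)


def data_reliability_probability(fail_probs: List[float],
                                 tolerable_failures: int) -> float:
    """Probability that at most `tolerable_failures` of the disks fail.

    Disks are assumed independent; dist[k] is the probability that
    exactly k of the disks processed so far have failed.
    """
    dist = [1.0]
    for p in fail_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)      # this disk survives
            nxt[k + 1] += q * p          # this disk fails
        dist = nxt
    return sum(dist[: tolerable_failures + 1])


if __name__ == "__main__":
    # Erasure code 4+2 with one disk already failed: 5 disks remain and
    # one more failure can still be tolerated.
    probs = [disk_failure_probability(720, 1_000_000, d)
             for d in (1.0, 1.0, 0.8, 1.0, 0.5)]
    print(data_reliability_probability(probs, tolerable_failures=1))
```

Under this assumed model, a shorter remaining life cycle or healthier disks push the probability toward 1, so fewer objects fall below the threshold and need reconstruction.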

Step S104, when the data reliability probability is lower than the reliability threshold, determining the target data object as a reconstruction object.

In the embodiment of the invention, the data objects to be recovered are traversed to analyze whether data reconstruction is needed. When the data reliability probability of a data object to be recovered that is within its life cycle is lower than the reliability threshold, the data reliability of the target data object is low and data reconstruction is needed, so as to avoid data damage or loss if further hard disk failures exceed the redundancy capability.

It should be noted that the reliability threshold may be preset by an administrator through an interactive interface or a third-party server. The administrator may modify the reliability threshold according to the actual application scenario, or the system default may be used, for example a default reliability threshold of 99.99%; the present invention does not limit this.

Step S105, data reconstruction is performed for all the reconstruction objects in order of priority from high to low.

The priority is positively correlated with the life cycle and negatively correlated with the data reliability probability.

In the embodiment of the invention, the length of the life cycle of each reconstruction object and the level of its data reliability are considered, all the reconstruction objects are divided into different priorities, and data reconstruction is carried out in order of priority from high to low.

In summary, the delayed data reconstruction method provided by the embodiment of the present invention obtains the data objects to be recovered, the associated hard disk list, the life cycle of each data object to be recovered, and the reliability factor of each associated hard disk. A data object to be recovered is a data object that has data located on the failed hard disk. The associated hard disk list records the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered. The reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list. The method then obtains, for each target data object, its target associated hard disk and the mean time between failures (MTBF) data of that hard disk, where the target associated hard disk is at least one hard disk in the associated hard disk list and a target data object is a data object to be recovered that is still within its life cycle. The data reliability probability of each target data object is calculated from the life cycle, the MTBF data, and the reliability factor of the target associated hard disk. When the data reliability probability is below the reliability threshold, the target data object is determined to be a reconstruction object, and data reconstruction is performed on all reconstruction objects in order of priority from high to low, the priority being positively correlated with the life cycle and negatively correlated with the data reliability probability.
Therefore, only the data objects to be recovered whose data reliability within the life cycle is low are selected for data reconstruction. During reconstruction, data objects with low data reliability are reconstructed preferentially and data objects near the end of their life cycle are reconstructed at lower priority, which greatly reduces the amount of data reconstructed and the number of reconstructions, improves reconstruction efficiency, and effectively reduces the impact of reconstruction on system performance.

Optionally, in practical applications, the data of a data object is stored across different hard disks, and the data objects stored on the hard disks and their association with those hard disks are generally managed through metadata. With continued reference to fig. 2, the sub-steps of obtaining the data objects to be recovered and the associated hard disk list in step S101 may include:

receiving a data reconstruction instruction, the data reconstruction instruction including failed hard disk information; and traversing the hard disk data distribution metadata to obtain the data objects to be recovered and the associated hard disk list related to the failed hard disk.

In the embodiment of the invention, after the failed hard disk has been replaced with a new hard disk, the hard disk management module notifies the system to perform data reconstruction. On receiving the data reconstruction instruction, the system reads the failed hard disk information from the instruction, traverses the hard disk data distribution metadata, and takes every data object that has data on the failed hard disk as a data object to be recovered. Because data objects are stored with data redundancy, the data of each data object is distributed across different hard disks. To determine the reliability of a data object's data, all online hard disks other than the failed hard disk, that is, the associated hard disks, need to be obtained. The information of all the associated hard disks is recorded in the associated hard disk list.
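The traversal can be sketched as follows, assuming a deliberately simplified metadata layout in which each object identifier maps to the list of hard disks holding its blocks. This layout, the function name, and the placement of "object1" on disks 1 to 6 are hypothetical; the sketch also assumes that every disk other than the failed one is online.

```python
def find_objects_to_recover(distribution_metadata, failed_disk):
    """Return (objects_to_recover, associated_disks) for one failed disk.

    distribution_metadata maps an object id to the disks holding its
    blocks (a simplified, hypothetical stand-in for the real metadata).
    """
    objects_to_recover = []
    associated_disks = {}
    for object_id, disks in distribution_metadata.items():
        if failed_disk in disks:
            objects_to_recover.append(object_id)
            # Associated disks: the remaining disks of the same stripe.
            associated_disks[object_id] = [d for d in disks if d != failed_disk]
    return objects_to_recover, associated_disks


metadata = {
    "object1": ["disk1", "disk2", "disk3", "disk4", "disk5", "disk6"],
    "object3": ["disk7", "disk8", "disk9", "disk10", "disk11", "disk12"],
}
to_recover, associated = find_objects_to_recover(metadata, "disk1")
print(to_recover, associated)
```

An object with no block on the failed disk, such as "object3" here, never enters the list and is therefore untouched by the delayed reconstruction flow.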

Optionally, in practical applications, a life cycle and a processing policy are generally set for user data according to the application scenario, and the processing policy on expiration of the life cycle is generally automatic deletion or migration by the system. For data objects that have expired or are out of date, the method further comprises the following step:

when the life cycle of a data object to be recovered has expired or the data object is out of date, skipping data reconstruction for that data object.

In the embodiment of the invention, the vast majority of data whose life cycle has expired has typically already been deleted or migrated. When delayed data reconstruction obtains the data objects to be recovered, it may pick up expired or out-of-date data objects that have not yet been deleted or migrated; data reconstruction is skipped for these data objects, and the storage system deletes or migrates them automatically. This reduces the amount of data to be reconstructed and lessens the performance impact of data reconstruction on the storage system.
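A minimal sketch of this skip, assuming each data object carries an expiry timestamp (a hypothetical representation of its life cycle), might look like this:

```python
from datetime import datetime, timedelta


def remaining_lifecycle_hours(expires_at, now):
    """Remaining life cycle in hours; zero or negative means expired."""
    return (expires_at - now).total_seconds() / 3600.0


def split_expired(objects, now):
    """Partition (object_id, expires_at) pairs into in-life-cycle and skipped ids."""
    in_lifecycle, skipped = [], []
    for object_id, expires_at in objects:
        if remaining_lifecycle_hours(expires_at, now) <= 0:
            skipped.append(object_id)       # left to the deletion/migration policy
        else:
            in_lifecycle.append(object_id)  # goes on to the reliability analysis
    return in_lifecycle, skipped


now = datetime(2023, 2, 27, 12, 0, 0)
objects = [
    ("obj-a", now + timedelta(days=30)),   # still within its life cycle
    ("obj-b", now - timedelta(hours=1)),   # expired but not yet deleted
]
in_lifecycle, skipped = split_expired(objects, now)
print(in_lifecycle, skipped)
```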

Optionally, in practical applications, data reliability within the life cycle is mainly affected by factors such as the data life cycle, the hard disk MTBF data, and the reliability factor reflecting hard disk health. Referring to fig. 3 on the basis of fig. 2, the sub-steps of step S103 may include:

Step S1031, calculating the failure probability of the target associated hard disk according to the life cycle and the MTBF data.

In the embodiment of the invention, the life cycle may be the remaining life cycle of the data, that is, the period for which the data remains valid; data that has passed its remaining life cycle is deleted or migrated from the current storage system. The longer this period, the higher the probability that a data fault occurs and the lower the data reliability. The failure probability of the target associated hard disk is calculated from the life cycle and the MTBF data. For example, if the MTBF data of the hard disk is a hours and the life cycle is b days, the failure probability c of the associated hard disk is expressed in terms of a and b [formula not reproduced in this text].
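Since the expression for the failure probability is not available in this text, the following sketch substitutes a common reading of MTBF: an exponential failure model with rate 1/MTBF. This model is an assumption and is not necessarily the formula used in the embodiment; the function name is likewise hypothetical. The example values (2.5 million hours, 30 days) are figures that appear elsewhere in the description.

```python
import math


def disk_failure_probability(mtbf_hours, lifecycle_days):
    """Probability that one disk fails within the life cycle.

    Assumption: failures follow an exponential model with rate 1/MTBF.
    """
    lifecycle_hours = lifecycle_days * 24.0  # convert days to hours
    return -math.expm1(-lifecycle_hours / mtbf_hours)


c = disk_failure_probability(mtbf_hours=2_500_000, lifecycle_days=30)
linear = 30 * 24 / 2_500_000  # small-ratio approximation: life cycle / MTBF
print(c, linear)
```

When the life cycle is much shorter than the MTBF, the exponential model and the simple ratio of life cycle to MTBF are nearly identical, so either would serve for a rough estimate.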

Step S1032, obtaining the data reliability probability according to the fault probability and the reliability factor of the target associated hard disk.

In the embodiment of the invention, the health status of a hard disk can be obtained from its SMART information, and the reliability factor of a single hard disk can be derived from one or more items of health status information according to the actual application scenario. The reliability of the target data object is affected by the health of its target associated hard disks; when there are several associated hard disks, either the worst value or the average value of their reliability factors may be selected as the reliability factor of the target associated hard disk. The invention is not limited in this respect.

Specifically, let d denote the reliability factor of the target associated hard disk; the data reliability probability is calculated from the failure probability c and the reliability factor d. Taking the actual power-on time (Power-On Time Count) of the hard disk as an example, the SMART information gives the actual power-on time of the nth hard disk among the target associated hard disks, the MTBF data of that hard disk is known, and the reliability factor of the nth hard disk is derived from these two values [formula not reproduced in this text]. The reliability factor of every hard disk among the target associated hard disks can be obtained in the same way. Considering the reliability of the data object as a whole, the worst of these values can be selected as the reliability factor of the target associated hard disk.
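The per-disk expression is not available in this text, so the sketch below uses a purely hypothetical factor, one minus the ratio of power-on hours to MTBF, clamped at zero, to show how the worst or average value might be selected across the associated hard disks. Only the choice between "worst" and "average" comes from the description.

```python
def disk_reliability_factor(power_on_hours, mtbf_hours):
    """Hypothetical factor in [0, 1]: 1.0 for a new disk, 0.0 at or past its MTBF."""
    return max(0.0, 1.0 - power_on_hours / mtbf_hours)


def stripe_reliability_factor(power_on_hours_per_disk, mtbf_hours, mode="worst"):
    """Combine per-disk factors into one factor for the target associated disks."""
    factors = [disk_reliability_factor(h, mtbf_hours) for h in power_on_hours_per_disk]
    if mode == "worst":
        return min(factors)  # the least healthy disk dominates
    if mode == "average":
        return sum(factors) / len(factors)
    raise ValueError(f"unknown mode: {mode}")


hours = [10_000, 25_000, 40_000]  # SMART power-on hours of three associated disks
d_worst = stripe_reliability_factor(hours, mtbf_hours=2_500_000)
d_avg = stripe_reliability_factor(hours, mtbf_hours=2_500_000, mode="average")
print(d_worst, d_avg)
```

Selecting the worst value is the more conservative choice, because a single unhealthy disk lowers the factor for the whole data object.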

Optionally, in practical applications, every data object whose data reliability probability is lower than the reliability threshold needs data reconstruction. To ensure that data objects with low data reliability are recovered first, the reconstruction objects can be divided into different priorities according to data reliability and then reconstructed in priority order. Referring to fig. 3 on the basis of fig. 2, the sub-steps of step S105 may include:

Step S1051, dividing at least one priority according to the numerical range of the data reliability probability, each priority corresponding to one reconstruction queue.

In the embodiment of the invention, the numerical range of the data reliability probability for each priority can be defined according to the preset reliability threshold, and each priority corresponds to one reconstruction queue that holds the reconstruction objects of that priority. For example, with a reliability threshold of 99.99999%, priority L1 covers data reliability probabilities of 99.99% or less, and priorities L2, L3 and L4 each cover a further range of data reliability probability [the ranges for L2 to L4 are not reproduced in this text]. Data with a higher data reliability probability is less likely to be lost or damaged, so data objects with low data reliability should be processed first. The priorities are therefore ranked, from high to low, as L1, L2, L3, L4.

Step S1052, adding all the reconstruction objects into the corresponding reconstruction queues according to the priority.

In the embodiment of the invention, the priority of a reconstruction object is determined according to its data reliability probability, and the reconstruction task of the reconstruction object is added to the corresponding reconstruction queue according to that priority. For example, the tasks in each reconstruction queue may be executed in first-in, first-out order.

Step S1053, sequentially taking out the reconstruction objects from the corresponding reconstruction queues in order of priority from high to low, and performing data reconstruction.

In the embodiment of the invention, the data reconstruction execution process takes the reconstruction tasks of the reconstruction objects out of the corresponding queues in order of priority from high to low and performs data reconstruction; each reconstruction task is deleted from its reconstruction queue after it has been executed.
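Steps S1051 to S1053 can be sketched with one first-in, first-out queue per priority. In the sketch below only the L1 bound (99.99%) is taken from the text; the remaining bounds, like all the names, are illustrative assumptions. For brevity the priority here depends on the data reliability probability alone.

```python
from bisect import bisect_left
from collections import deque

# Upper bound of each priority's reliability range. Only the L1 bound
# (99.99%) comes from the text; the other bounds are assumptions.
UPPER_BOUNDS = [0.9999, 0.99999, 0.999999, 0.9999999]
PRIORITIES = ["L1", "L2", "L3", "L4"]  # highest priority first


def priority_for(reliability):
    """Step S1051: map a reliability probability to a priority level."""
    index = bisect_left(UPPER_BOUNDS, reliability)
    return PRIORITIES[min(index, len(PRIORITIES) - 1)]


def enqueue_all(reconstruction_objects):
    """Step S1052: place each (object_id, reliability) pair in its queue."""
    queues = {p: deque() for p in PRIORITIES}
    for object_id, reliability in reconstruction_objects:
        queues[priority_for(reliability)].append(object_id)
    return queues


def drain(queues):
    """Step S1053: yield objects queue by queue, FIFO within each queue."""
    for p in PRIORITIES:
        while queues[p]:
            yield queues[p].popleft()  # the task leaves the queue once taken


queues = enqueue_all([
    ("obj-a", 0.99995),
    ("obj-b", 0.9990),
    ("obj-c", 0.9999995),
    ("obj-d", 0.9999),
])
order = list(drain(queues))
print(order)
```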

Optionally, in practical applications, a data object that has been added to a reconstruction queue may be deleted automatically on expiration of its life cycle or deleted manually by the user, and when the data object is deleted its data reconstruction task needs to be removed as well. The method may include the following step:

when a reconstruction object meets a removal condition, removing the reconstruction object from the reconstruction queue.

Wherein the removal condition indicates that the reconstruction object has been deleted or migrated.

As a specific embodiment, when a data object is deleted or migrated, it is checked whether the data object is waiting for data reconstruction in a reconstruction queue; if so, it is removed from the reconstruction queue, and the data object is then deleted or migrated. After the data object has been removed from the reconstruction queue, the data reconstruction execution process continues to reconstruct the remaining data objects in the queue in sequence.

As yet another embodiment, when a data object is deleted or migrated, the deletion or migration is carried out directly. If the data object is in a reconstruction queue, then when the data reconstruction execution process reaches that data object, the data object is removed from the reconstruction queue and data reconstruction continues with the remaining data objects in the queue in sequence.
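The two embodiments correspond to eager and lazy removal from the queue. A minimal sketch of both, using a hypothetical ReconstructionQueue class, is given below.

```python
from collections import deque


class ReconstructionQueue:
    """FIFO queue of object ids supporting eager and lazy removal."""

    def __init__(self):
        self._queue = deque()
        self._removed = set()  # ids deleted or migrated while still queued

    def add(self, object_id):
        self._queue.append(object_id)

    def remove_now(self, object_id):
        """First embodiment: drop the task when the object is deleted or migrated."""
        try:
            self._queue.remove(object_id)
        except ValueError:
            pass  # the object was not waiting for reconstruction

    def mark_removed(self, object_id):
        """Second embodiment: only record the deletion; drop the task at dequeue time."""
        self._removed.add(object_id)

    def next_task(self):
        """Return the next object that still exists, or None when the queue is empty."""
        while self._queue:
            object_id = self._queue.popleft()
            if object_id in self._removed:
                self._removed.discard(object_id)
                continue  # skip the deleted object and move on
            return object_id
        return None


q = ReconstructionQueue()
for name in ("obj-a", "obj-b", "obj-c"):
    q.add(name)
q.remove_now("obj-a")    # eager removal
q.mark_removed("obj-b")  # lazy removal
first = q.next_task()
second = q.next_task()
print(first, second)
```

Eager removal keeps the queue exact at the cost of a linear scan on every deletion; lazy removal makes deletion cheap and defers the check to the reconstruction process.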

Optionally, in practical applications, in order to reconstruct data objects with low data reliability first, the data reliability probabilities of all the data objects to be recovered need to be calculated. The data reliability probability of a data object to be recovered is given by the following formula:

[formula not reproduced in this text]

wherein y is the probability of data loss of the data object to be recovered within the life cycle; x is the data reliability probability of the data object to be recovered; N is the number of data blocks of the data object to be recovered; M is the number of check blocks of the data object to be recovered, or the number of copies corresponding to the data blocks; b is the life cycle of the data object to be recovered; a is the MTBF data of the target associated hard disk; d is the reliability factor of the target associated hard disk.

In the embodiment of the present invention, the life cycle b of the data object to be recovered may be expressed in years, days, hours, or minutes, and the invention is not limited in this respect; the value is converted to hours before the formula is applied. The MTBF data of the target associated hard disk is expressed in hours.

Specifically, taking erasure code N+M as an example, the service has N data blocks and M check blocks. When a hard disk fails, the larger the total number of data blocks and check blocks still available, the higher the degree of redundancy and the higher the data reliability. Data is lost when more than M of the N+M hard disks fail. Replication is handled in the same way, except that the number of data blocks N is 1 and the number of copies of the data block is M; with replication, the data can be accessed as long as one copy of the data block is available.
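To make the roles of N, M, a, b and d concrete, the following sketch computes a reliability probability under a simple binomial model: each of the N+M hard disks fails independently within the life cycle, and data is lost when more than M of them fail. Both the binomial model and the way d scales the effective MTBF are assumptions of this sketch; they are not presented as the formula of the embodiment, which is not available in this text.

```python
from math import comb


def data_reliability_probability(n, m, lifecycle_hours, mtbf_hours, reliability_factor):
    """Return x = 1 - y under a binomial failure model (an assumption of this sketch)."""
    if reliability_factor <= 0:
        return 0.0  # disks judged completely unreliable
    # Assumed per-disk failure probability over the life cycle; the
    # reliability factor d scales the effective MTBF.
    p = min(1.0, lifecycle_hours / (mtbf_hours * reliability_factor))
    total = n + m
    # y: probability that more than m of the n + m disks fail.
    y = sum(
        comb(total, k) * p**k * (1.0 - p) ** (total - k)
        for k in range(m + 1, total + 1)
    )
    return 1.0 - y


LIFECYCLE_HOURS = 30 * 24  # a 30-day life cycle, converted to hours
MTBF_HOURS = 2_500_000     # 2.5 million hours
x_ec = data_reliability_probability(14, 2, LIFECYCLE_HOURS, MTBF_HOURS, 1.0)
x_ec_worn = data_reliability_probability(14, 2, LIFECYCLE_HOURS, MTBF_HOURS, 0.5)
x_replica = data_reliability_probability(1, 2, LIFECYCLE_HOURS, MTBF_HOURS, 1.0)
print(x_ec, x_ec_worn, x_replica)
```

Replication is handled by the same function with n = 1, matching the treatment of replicas as 1+M in the description.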

To illustrate the delayed data reconstruction method provided by the embodiment of the invention more clearly, consider the point at which a failed hard disk has been replaced with a new hard disk and the storage system is notified to recover the data of the failed hard disk. The following examples use erasure codes and cover different hard disk failure scenarios; replication can be handled in the same way as erasure codes, with replicas treated as 1+M.

As an embodiment, taking fig. 1 as an example, the data of data object 3 and data object 4 are stored on hard disks 7-12. When the hard disk 1 fails, since the data of the data object 3 and the data object 4 are not stored on the hard disk 1, there is no need to perform the delayed data reconstruction processing of the present embodiment. The read-write processing of the data object 3 and the data object 4 is the same as the normal erasure coding mode, so that normal read-write can be performed, and the normal business flow is not affected.

As another embodiment, taking fig. 4 as an example, a hard disk storing a check block fails. After the failed hard disk has been replaced with a new hard disk, the system calculates the data reliability probability of the data object within its life cycle; when the data reliability probability is greater than or equal to the reliability threshold, no data reconstruction is performed, which reduces the system performance loss caused by reconstruction.

Taking erasure code N+6 as an example, assume that a hard disk storing a check block fails, the life cycle of the affected data object is 30 days, and the preset reliability threshold is 99.99%. The data reliability probability is calculated from the life cycle of the data object, the MTBF data of the hard disk, and the reliability factor of the target associated hard disk. If the data reliability probability is greater than 99.99%, then although the data reliability is degraded, it still exceeds the reliability threshold, and no active data reconstruction is needed.

Because the failed block is a check block, a user reading the data object can read the data normally, without recovery through the erasure code algorithm and without reading any additional check block, so system performance is not affected. Once the hard disk fault has been repaired, new data can be written normally. Under the life cycle rules of the data object, the data whose reliability was degraded reaches the end of its life cycle over time and is progressively deleted. Thus, 30 days after the hardware repair is completed, all data in the degraded state has reached the end of its life cycle and been deleted, the data redundancy capability has returned to normal, and the system state has fully returned to normal, as shown in fig. 5.

As yet another embodiment, taking fig. 6 as an example, a hard disk storing a data block fails. After the failed hard disk has been replaced with a new hard disk, if the data reliability probability of the data object within its life cycle is higher than the reliability threshold, the failed data does not need to be reconstructed, and the data returns to normal automatically once the life cycle expires.

Because the failed block is a data block, when a user reads the failed data, the system needs to read the remaining available blocks, recover the failed data from them, and return the recovered data to the user. At this point, data reconstruction for that block can be completed simply by storing the recovered data on the new replacement hard disk. When the data is read again later, it is read directly without recovery, which effectively improves system performance.
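The recover-on-read behaviour can be sketched with a single XOR parity block standing in for the erasure code. This substitution is a simplification made only for illustration: a real N+M erasure code tolerates M lost blocks, whereas XOR parity tolerates one. The function names and the list-based stripe are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def read_block(stripe, index):
    """Return block `index`; if it is missing, recover it and write it back.

    `stripe` is a list of blocks (data blocks plus one XOR parity block);
    a missing block is represented by None.
    """
    if stripe[index] is None:
        survivors = [b for i, b in enumerate(stripe) if i != index]
        if any(b is None for b in survivors):
            raise ValueError("more than one block missing; XOR parity cannot recover")
        stripe[index] = xor_blocks(survivors)  # store on the replacement disk
    return stripe[index]


data = [b"abcd", b"efgh", b"ijkl"]
stripe = data + [xor_blocks(data)]  # three data blocks and one parity block
stripe[1] = None                    # the disk holding block 1 has failed
recovered = read_block(stripe, 1)   # degraded read: recover, return, write back
print(recovered)
```

After the first degraded read the block is present again, so later reads of the same block take the direct path.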

As another embodiment, taking fig. 7 as an example, several hard disks holding the data object fail. After the failed hard disks have been replaced with new hard disks, if the data reliability probability of the data object within its life cycle is lower than the reliability threshold, data reconstruction is performed on the data object to ensure data reliability.

Taking erasure codes as an example, the preset reliability threshold is 99.99%. The data reliability probability is calculated from the life cycle of the data object, the MTBF data of the hard disk, and the reliability factor of the target associated hard disk. If the data reliability probability is less than 99.99%, the data reliability has fallen below the reliability threshold and the data object is actively reconstructed. Newly written data is stored normally on the new replacement hard disk, and the failed data is reconstructed in order of priority from high to low. The delayed data reconstruction flow ends once all the data objects related to the failed hard disk have been analyzed; the flow is shown in fig. 8. During reconstruction, the remaining data is read once and several failed data blocks are recovered through the erasure code algorithm, which greatly reduces the number of data reads and writes and effectively improves system performance.

The invention decides whether to reconstruct data by analyzing the reliability of the data within its life cycle, and reconstructs data in order of priority from high to low. Only data whose reliability cannot meet the threshold requirement is reconstructed, which greatly reduces the impact of data reconstruction on system performance, addresses the technical problem in the prior art that reconstruction with large-ratio erasure codes heavily affects system performance, and provides a basic technical scheme for further increasing the erasure code ratio.

To compare clearly the hard disk utilization and data reliability of erasure codes at different ratios in the prior art, a system with 64 nodes (hard disks) is taken as an example.

According to the erasure code principle, the hard disk utilization is directly determined by the number of data disks N and the number of check disks M: hard disk utilization = N / (N + M). When N and M are increased in the same proportion, the hard disk utilization is unchanged, but higher fault tolerance can be obtained.
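The relationship between N, M and hard disk utilization can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of utilization as the share of disks that hold data, N / (N + M), which agrees with the 87.5% quoted for erasure code 14+2; the function name is hypothetical.

```python
from fractions import Fraction


def disk_utilization(n, m):
    """Share of hard disks holding data blocks in an N+M erasure code."""
    return Fraction(n, n + m)


print(float(disk_utilization(14, 2)))  # 0.875
print(float(disk_utilization(60, 4)))  # 0.9375

# Scaling N and M by the same factor leaves the utilization unchanged
# while raising the number of tolerated failures from 2 to 4.
same_ratio = disk_utilization(28, 4) == disk_utilization(14, 2)
print(same_ratio)
```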

With 4 stripes configured as erasure code 14+2, the hard disk utilization is 87.5% and at most any 2 hard disks may fail simultaneously; assuming the hard disk MTBF data is 2.5 million hours, the data reliability probability is 99.998%. With 2 stripes configured as erasure code 30+2, the hard disk utilization is 93.75%, at most any 2 hard disks may fail simultaneously, and the data reliability probability is 99.98%. With 1 stripe configured as erasure code 60+4, the hard disk utilization is 93.75%, at most any 4 hard disks may fail simultaneously, and the data reliability probability is 99.9996%. Compared with the small-ratio erasure code, the large-ratio erasure code greatly improves reliability at the same hard disk utilization.

Clearly, large-ratio erasure codes spread data more widely when it is stored, and in a failure scenario data must be read from more nodes to compute the recovery, which has a large impact on system performance. In addition, the larger the value of M, the higher the computational complexity and the higher the hardware resource configuration required. Weighing data reliability, hard disk utilization, and the impact of data reconstruction on system performance, mainstream distributed storage in the industry currently mostly adopts erasure code N+2 and supports at most N+4, with N+M spanning at most 32 nodes.

The redundancy capability of small-ratio erasure codes in the prior art is therefore relatively poor. When a failure occurs, data reliability must be restored immediately, so sufficient spare equipment must be kept on hand or a high-reliability spare parts service must be purchased so that faulty parts can be replaced quickly once a system failure is identified, which greatly increases operation and maintenance costs.

In this scheme, the data reliability probability of each data object within its life cycle is calculated, and data objects whose data reliability probability is lower than the reliability threshold are reconstructed by priority, the priority being positively correlated with the life cycle and negatively correlated with the data reliability probability. Thus, data reconstruction is not performed immediately after a hard disk fault is repaired; recovery starts only when the data reliability falls below the threshold. Several hard disk failures can be handled by one concentrated data reconstruction, and several pieces of failed data can be recovered with a single read of the data during recovery. Combined with data life cycle management, data close to the end of its life cycle is reconstructed at lower priority and data whose life cycle has already expired is not reconstructed at all, which greatly reduces the amount of data reconstructed, greatly improves reconstruction efficiency, and effectively reduces the impact of data reconstruction on system performance.

As a result, the amount of data reconstructed is greatly reduced and the impact on system performance is effectively reduced, so that, with the same number of nodes, large-ratio erasure codes can deliver the same hard disk utilization with higher reliability. A failed hard disk does not need to be replaced or repaired immediately, and the system still maintains high reliability. Operation and maintenance personnel can therefore carry out periodic inspections and handle a failure promptly once it is found, without monitoring the failure state of the equipment at all times. This reduces operation and maintenance pressure, allows fewer spare devices to be stocked, and effectively reduces operating costs.

Based on the same inventive concept, an embodiment of the present invention further provides a delayed data reconstruction device. Referring to fig. 9, a block diagram of the delayed data reconstruction device 200 provided by the embodiment of the present invention is shown; the device is applied to a storage node. The delayed data reconstruction device 200 includes an acquisition module 201, a decision module 202 and a reconstruction module 203.

The acquisition module 201 is configured to obtain the data objects to be recovered, the associated hard disk list, the life cycle of each data object to be recovered, and the reliability factor of each associated hard disk, where a data object to be recovered is a data object that has data located on the failed hard disk, the associated hard disk list records the hard disks that have not failed in the stripes corresponding to all the data objects to be recovered, and the reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list. The acquisition module 201 is further configured to obtain, for each target data object, its target associated hard disk and the mean time between failures (MTBF) data of that hard disk, where the target associated hard disk is at least one hard disk in the associated hard disk list and a target data object is a data object to be recovered that is still within its life cycle.

The decision module 202 is configured to calculate the data reliability probability of each target data object according to the life cycle, the MTBF data, and the reliability factor of the target associated hard disk, and to determine the target data object to be a reconstruction object when the data reliability probability is below the reliability threshold.

The reconstruction module 203 is configured to perform data reconstruction on all reconstruction objects in order of priority from high to low, the priority being positively correlated with the life cycle and negatively correlated with the data reliability probability.

Optionally, the acquisition module 201 is specifically configured to receive a data reconstruction instruction that includes failed hard disk information, and to traverse the hard disk data distribution metadata to obtain the data objects to be recovered and the associated hard disk list related to the failed hard disk.

Optionally, the decision module 202 is specifically configured to skip data reconstruction for a data object to be recovered when its life cycle has expired or the data object is out of date.

Optionally, the decision module 202 is specifically configured to calculate, according to the life cycle and the MTBF data, a failure probability of the target associated hard disk; and obtaining the data reliability probability according to the fault probability and the reliability factor.

Optionally, the reconstruction module 203 is specifically configured to divide at least one priority according to a numerical range of the probability of data reliability; each priority corresponds to a reconstruction queue; adding all the reconstruction objects into corresponding reconstruction queues according to the priority; and sequentially taking out the reconstruction objects from the corresponding reconstruction queues according to the order of the priority from high to low, and carrying out data reconstruction.

Optionally, the reconstruction module 203 is specifically configured to remove a reconstruction object from the reconstruction queue when the reconstruction object meets a removal condition, the removal condition indicating that the reconstruction object has been deleted or migrated.

Optionally, the decision module 202 is specifically configured to obtain the data reliability probability of the data object to be recovered according to the following formula:

[formula not reproduced in this text]

Referring to fig. 10, a block diagram of a storage node 100 according to an embodiment of the present invention is shown. Storage node 100 includes memory 110, processor 120, and communication module 130. The memory 110, the processor 120, and the communication module 130 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.

Wherein the memory 110 is used for storing programs or data. The memory 110 may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.

The processor 120 is used to read/write data or programs stored in the memory 110 and perform corresponding functions. For example, the delayed data reconstruction methods disclosed in the above embodiments may be implemented when a computer program stored in the memory 110 is executed by the processor 120.

The communication module 130 is used for establishing communication connection between the storage node 100 and other communication terminals through a network, and for transceiving data through the network.

It should be understood that the structure shown in fig. 10 is merely a schematic diagram of the structure of the storage node 100, and that the storage node 100 may also include more or fewer components than shown in fig. 10, or have a different configuration than shown in fig. 10. The components shown in fig. 10 may be implemented in hardware, software, or a combination thereof.

The embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by the processor 120, implements the delayed data reconstruction method disclosed in the above embodiments.

In summary, the delayed data reconstruction method, device, storage node and storage medium provided by the embodiments of the present invention acquire the data objects to be recovered, the associated hard disk list, the life cycle of each data object to be recovered and the reliability factor of each associated hard disk. A data object to be recovered is a data object whose data is located on the faulty hard disk. The associated hard disk list records the non-faulty hard disks in the stripes corresponding to all the data objects to be recovered. The reliability factor represents the reliability of an associated hard disk and is obtained by evaluating the Self-Monitoring, Analysis and Reporting Technology (SMART) information of the associated hard disks in the associated hard disk list. For each target data object, that is, each data object to be recovered that is still within its life cycle, the target associated hard disks and their mean time between failures (MTBF) data are acquired; the target associated hard disks are at least one hard disk in the associated hard disk list. The data reliability probability of each target data object is calculated from its life cycle, the MTBF data and the reliability factors of its target associated hard disks. A target data object is determined to be a reconstruction object when its data reliability probability is below a reliability threshold. Data reconstruction is then performed on all reconstruction objects in order of priority from high to low, where the priority is positively correlated with the life cycle and negatively correlated with the data reliability probability.
Therefore, only the data objects to be recovered that are within their life cycle and have low data reliability are selected for data reconstruction. During reconstruction, data objects with low data reliability are reconstructed first, and data objects near the end of their life cycle are reconstructed with lower priority. This greatly reduces the volume of reconstructed data and the number of reconstructions, improves reconstruction efficiency, and effectively reduces the impact of reconstruction on system performance.
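The screening and ordering summarized above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the formula of the embodiments: the per-disk survival model exp(-t / (MTBF × factor)), the convention that a smaller reliability factor means a less reliable disk, the priority expression, and all names (DataObject, reliability_probability, plan_reconstruction) are hypothetical choices made here so that the example is runnable.

```python
import math
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    remaining_life_h: float  # hours left in the life cycle; <= 0 means expired
    disks: list              # (mtbf_hours, reliability_factor) per target associated disk


def reliability_probability(obj: DataObject) -> float:
    """ASSUMED model: each disk survives the remaining life with probability
    exp(-t / (MTBF * factor)); the object counts as safe only if all survive."""
    p = 1.0
    for mtbf_h, factor in obj.disks:
        p *= math.exp(-obj.remaining_life_h / (mtbf_h * factor))
    return p


def plan_reconstruction(objects, threshold):
    """Return the names of the objects to rebuild, highest priority first."""
    candidates = []
    for obj in objects:
        if obj.remaining_life_h <= 0:
            continue  # life cycle expired: skip reconstruction entirely
        p = reliability_probability(obj)
        if p < threshold:
            # ASSUMED priority: grows with remaining life, shrinks as p grows.
            priority = obj.remaining_life_h * (1.0 - p)
            candidates.append((priority, obj.name))
    candidates.sort(reverse=True)
    return [name for _, name in candidates]


if __name__ == "__main__":
    objects = [
        DataObject("expired", -1.0, [(100_000.0, 1.0)]),
        DataObject("A", 1_000.0, [(100_000.0, 1.0), (100_000.0, 1.0)]),
        DataObject("B", 10_000.0, [(100_000.0, 0.5), (100_000.0, 1.0)]),
        DataObject("C", 2_000.0, [(100_000.0, 0.2)]),
    ]
    print(plan_reconstruction(objects, threshold=0.95))
```

In this sketch an expired object and an object whose probability stays above the threshold never enter the queue, which is where the reduction in reconstructed data would come from; any other monotone priority function with the same correlations would serve equally well.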

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A delayed data reconstruction method, applied to a storage node, the method comprising:

2. The delayed data reconstruction method according to claim 1, wherein said step of obtaining the data objects to be recovered and the associated hard disk list comprises:

3. The method of delayed data reconstruction of claim 1, further comprising:

4. The method for reconstructing delayed data according to claim 1, wherein the step of calculating the data reliability probability of the target data object according to the life cycle, the MTBF data, and the reliability factor of the target associated hard disk, respectively, comprises:

5. The delayed data reconstruction method according to claim 1, wherein said step of performing data reconstruction in order of priority from high to low for all of said reconstructed objects comprises:

6. The method of claim 5, wherein after the step of adding all the reconstructed objects to the corresponding reconstruction queue according to priorities, the method further comprises:

7. The method for reconstructing delayed data according to claim 1, wherein the formula of the probability of data reliability of the data object to be restored is:

8. A delayed data reconstruction device for use with a storage node, said device comprising:

9. A storage node, characterized in that the storage node comprises a memory for storing a computer program and a processor for performing the delayed data reconstruction method according to any of claims 1-7 when the computer program is invoked.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the delayed data reconstruction method as claimed in any of claims 1-7.