WO2018001110A1

WO2018001110A1 - Method and device for reconstructing stored data based on erasure coding, and storage node

Info

Publication number: WO2018001110A1
Application number: PCT/CN2017/088477
Authority: WO
Inventors: Jiang Ying (江滢); Wang Zhikun (王志坤)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2016-06-29
Filing date: 2017-06-15
Publication date: 2018-01-04
Also published as: CN107544862A; CN107544862B

Abstract

A method and a device for reconstructing stored data based on erasure coding, and a storage node. The method comprises: determining a fault recovery initiation threshold, the fault recovery initiation threshold being less than or equal to the difference between the number of blocks of stored data of a stripe and the minimum number of blocks for erasure coding reconstruction and being greater than or equal to 1 (S301); for a stripe of which the number of faulty data blocks reaches the fault recovery initiation threshold, initiating fault recovery of the stripe (S302); using non-faulty data blocks of the stripe to perform data reconstruction (S303). Compared to the related art, the invention reduces the number of times fault recovery must be performed, thereby reducing the bandwidth consumption of the system, so that the system is more stable and has improved service performance.

Description

Method and device for reconstructing stored data based on erasure coding, and storage node

Technical field

The present application relates to the field of communications, for example, to a method and apparatus for reconstructing stored data based on erasure coding, and a storage node.

Background

In recent years, with the explosive growth of information resources and data, distributed storage systems, offering high performance, high scalability, high availability, and easy management, have become the foundation and core of cloud storage and big data. However, hardware damage and software failures can corrupt or lose data while it is stored. Cloud storage systems therefore generally use erasure coding to improve fault tolerance as well as resource utilization and system performance. Erasure coding guarantees high data reliability and availability through reasonable redundant coding without requiring excessive extra storage space. Compared with full replication, storing data with erasure codes greatly reduces the space overhead of a cloud storage system; however, data reconstruction incurs heavy network overhead, which may congest the network of the whole system or of some nodes so that they cannot provide services, degrading system performance. Moreover, as system scale and disk capacity grow, more nodes are deployed in storage systems and more nodes fail every day, so data recovery traffic takes an ever larger share of total network traffic and seriously affects access to everyday business data. How to reduce bandwidth consumption under erasure coding while guaranteeing service performance is therefore a problem worth addressing.

Summary of the invention

The embodiments of the present disclosure provide a method and apparatus for reconstructing stored data based on erasure coding, and a storage node, to solve the problems in the related art that stored-data reconstruction consumes much bandwidth, makes the system unstable, and degrades service performance.

To solve the above technical problem, an embodiment of the present disclosure provides a method for reconstructing stored data based on erasure coding, including:

Determining a fault recovery initiation threshold, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

For a stripe whose number of faulty data blocks reaches the fault recovery initiation threshold, initiating fault recovery of the stripe;

Performing data reconstruction using the non-faulty data blocks of the stripe.

An embodiment of the present disclosure further provides an apparatus for reconstructing stored data based on erasure coding, including:

A fault recovery initiation threshold determining module, configured to determine a fault recovery initiation threshold, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

A fault recovery initiation module, configured to initiate fault recovery of a stripe whose number of faulty data blocks reaches the fault recovery initiation threshold;

A data reconstruction module, configured to perform data reconstruction using the non-faulty data blocks of the stripe.

An embodiment of the present disclosure further provides a storage node based on erasure coding, including a physical storage medium and a processor, the processor being configured to:

Determine the fault recovery initiation threshold and distribute it to other storage nodes, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

Scan the fault status of the stripes for which the storage node is responsible, and initiate fault recovery of a stripe when its number of faulty data blocks reaches the fault recovery initiation threshold;

Extract the non-faulty data blocks of the stripe from the physical storage medium of the storage node and from the physical storage media of other storage nodes to perform data reconstruction.

An embodiment of the present disclosure further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform any one of the foregoing methods for reconstructing stored data based on erasure coding.

The computer storage medium may be a transitory computer readable storage medium or a non-transitory computer readable storage medium.

The beneficial effects of the present disclosure are:

With the method and apparatus for reconstructing stored data based on erasure coding, the storage node, and the computer storage medium provided by the embodiments of the present disclosure, a fault recovery initiation threshold is determined that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; fault recovery is initiated for a stripe whose number of faulty data blocks reaches the threshold; and data reconstruction is performed using the non-faulty data blocks of the stripe. Compared with the related art, this reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving its service performance.

Brief description of the drawings

FIG. 1 is a schematic diagram of the principle of an erasure code according to any embodiment of the present disclosure;

FIG. 2 is a schematic diagram of distributed data storage with erasure codes according to any embodiment of the present disclosure;

FIG. 3 is a flowchart of a method for reconstructing stored data based on erasure coding according to Embodiment 1 of the present disclosure;

FIG. 4 is a schematic diagram of an apparatus for reconstructing stored data based on erasure coding according to Embodiment 2 of the present disclosure;

FIG. 5 is a schematic diagram of a storage node based on erasure coding according to Embodiment 3 of the present disclosure;

FIG. 6 is a schematic diagram of a storage cluster based on erasure coding according to Embodiment 4 of the present disclosure;

FIG. 7 is a flowchart of a method for reconstructing stored data based on erasure coding according to Embodiment 4 of the present disclosure.

Detailed description

The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Data is stored using erasure coding as follows. Referring to FIG. 1, the original file is split into k source data blocks, and the k source data blocks are encoded to generate n coded data blocks; that is, an (n, k) erasure code encodes k source data blocks into n data blocks. During data reconstruction, any k of the n data blocks can be decoded to restore the k source data blocks, which are then combined to reconstruct the original file. A distributed data storage model based on erasure coding is shown in FIG. 2. Assume the system contains n storage nodes, of which k are data nodes and m are coding nodes, so that n = k + m. The k data nodes store the original data blocks, labeled $D_0, D_1, \ldots, D_{k-1}$; the m coding nodes store the coded data blocks, labeled $C_0, C_1, \ldots, C_{m-1}$. The erasure code algorithm cuts the original file into k equal parts and stores them in turn on the k data nodes, and places the m blocks generated by encoding on the m coding nodes. When a large file is stored, it must be cut repeatedly: each time, data of a specified size is read from the file and encoded, and the original data and encoded data involved in one such encoding pass are together called a stripe. A stripe independently constitutes an encoded information set, and different stripes are independent of each other.
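To make the (n, k) property concrete, the Python sketch below builds a small systematic Reed-Solomon-style code by polynomial interpolation over the prime field GF(257). This is an illustrative assumption: the patent does not name a particular erasure code algorithm, and `encode_stripe`, `decode_stripe` and the prime `P` are hypothetical. Each block is a list of byte values; blocks 0 to k-1 play the role of $D_0, \ldots, D_{k-1}$ and blocks k to n-1 the role of $C_0, \ldots, C_{m-1}$. Because any k surviving blocks determine the interpolating polynomial, they suffice to recover the source.

```python
# Hypothetical sketch: the patent does not specify the erasure code; this uses
# Lagrange interpolation over GF(257) purely for illustration.
P = 257  # prime modulus: GF(257) can hold every byte value 0..255


def _lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division mod P is multiplication by the modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode_stripe(source_blocks, n):
    """Systematic (n, k) encode: blocks 0..k-1 are the source blocks D_0..D_{k-1},
    blocks k..n-1 are the coded blocks C_0..C_{m-1} with m = n - k."""
    k = len(source_blocks)
    if not 0 < k < n <= P:
        raise ValueError("need 0 < k < n <= P")
    length = len(source_blocks[0])
    coded = []
    for x in range(k, n):
        # For every byte position, treat the k source bytes as points (i, D_i[pos])
        # and extend the polynomial through them to evaluation point x.
        block = [
            _lagrange_eval([(i, source_blocks[i][pos]) for i in range(k)], x)
            for pos in range(length)
        ]
        coded.append(block)
    return [list(b) for b in source_blocks] + coded


def decode_stripe(available, k):
    """Recover the k source blocks from any k surviving blocks, given as {index: block}."""
    if len(available) < k:
        raise ValueError("fewer than k blocks survive; the stripe cannot be reconstructed")
    chosen = sorted(available.items())[:k]
    length = len(chosen[0][1])
    return [
        [_lagrange_eval([(idx, blk[pos]) for idx, blk in chosen], x) for pos in range(length)]
        for x in range(k)
    ]
```

For example, `encode_stripe([[1, 2, 3], [4, 5, 6], [7, 8, 9]], n=5)` yields a five-block stripe, and passing any three of its blocks to `decode_stripe(..., k=3)` returns the three source blocks. A coded symbol can take the value 256, which does not fit in a byte; this is one reason practical implementations usually work in GF(2^8) instead.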

In general, data reconstruction of a stripe is triggered as soon as any one data block in the stripe fails. During reconstruction, the new node must first download data from k nodes to recover the original file and then re-encode it to regenerate the failed data, so the amount of data transmitted is k times the amount of failed data. When multiple blocks in multiple stripes fail across the system, a large amount of reconstruction traffic is triggered. The related art limits the network bandwidth available for data recovery, which inevitably slows down node reconstruction. For a distributed storage system in which failures occur continually, the reconstruction rate directly affects system reliability: if reconstruction is slower than the rate at which nodes fail, the system cannot maintain its reliability. Moreover, limiting recovery bandwidth only reduces network bandwidth consumption in the short term; in the long run, the bandwidth occupied by data recovery is not substantially reduced. A more reasonable and reliable data reconstruction method is therefore needed to reduce system bandwidth usage and ensure system stability.
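The traffic claim can be checked with a back-of-the-envelope calculation. The Python sketch below is a simplification: it assumes each reconstruction pass reads k surviving blocks and can regenerate several lost blocks of the same stripe at once, and it ignores the writes of regenerated blocks. `repair_read_traffic` and the 64 MiB block size used below are hypothetical.

```python
# Hypothetical back-of-the-envelope model of repair read traffic for one stripe.
MiB = 2**20


def repair_read_traffic(k, block_size, failures, blocks_per_pass):
    """Bytes read from surviving nodes to repair `failures` lost blocks of one stripe.

    Each reconstruction pass reads k surviving blocks and regenerates up to
    `blocks_per_pass` lost blocks. blocks_per_pass=1 models repairing each block
    as soon as it fails; writes of the regenerated blocks are ignored.
    """
    if failures <= 0:
        return 0
    passes = -(-failures // blocks_per_pass)  # ceiling division
    return passes * k * block_size
```

With k = 6, 64 MiB blocks and three failures in one stripe, repairing each failure separately reads 3 × 6 × 64 MiB = 1152 MiB, whereas a single pass that rebuilds all three reads 384 MiB.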

Embodiment 1:

To solve the problems of high bandwidth consumption, system instability, and poor service performance in the related art, this embodiment provides a method for reconstructing stored data based on erasure coding. Referring to FIG. 3, the method includes:

Step S301: Determine a fault recovery initiation threshold, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1.

To reduce the number of times stored data must be recovered, the method of this embodiment introduces the concept of a fault recovery initiation threshold. A threshold for starting data recovery is set for each stripe according to the resource condition of the system, and fault recovery is initiated for a stripe once its number of faulty data blocks reaches the threshold. Denote the fault recovery initiation threshold by r: for each stripe, fault recovery starts immediately when the number of faulty data blocks reaches r. For an (n, k) erasure code, to preserve the reliability of erasure coding, r is at most n - k and at least 1, where n is the number of data blocks stored in a stripe (corresponding to the n storage nodes in the system) and k is the minimum number of data blocks required for erasure code reconstruction (corresponding to the k data nodes). In the related art, data recovery and reconstruction are triggered as soon as one storage node fails; by contrast, setting the threshold and initiating recovery only for stripes whose faulty data blocks reach it effectively reduces the frequency of data recovery and the bandwidth it occupies, thereby ensuring service performance and improving system stability.
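The bound on r and the trigger condition can be sketched in Python as follows. The function names are hypothetical, and `fault_counts` is assumed to map each stripe id to its current number of faulty data blocks. For a (9, 6) code, for instance, r may range from 1 to 3.

```python
# Hypothetical sketch of the threshold bound and the per-stripe trigger.
def validate_threshold(r, n, k):
    """Return r if it is a legal fault recovery initiation threshold for an (n, k) code."""
    if not 1 <= r <= n - k:
        raise ValueError(f"threshold r={r} must satisfy 1 <= r <= n - k = {n - k}")
    return r


def should_start_recovery(faulty_blocks, r):
    """A stripe's recovery starts once its faulty-block count reaches r."""
    return faulty_blocks >= r


def stripes_to_recover(fault_counts, r):
    """Ids of stripes whose faulty-block count has reached the threshold r."""
    return sorted(sid for sid, count in fault_counts.items() if should_start_recovery(count, r))
```

With r = 1 this reproduces the related-art behaviour of recovering on the first failure; larger r defers recovery until more blocks of the same stripe are lost.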

The method of this embodiment may further include: setting the fault recovery initiation threshold to an initial value, and dynamically adjusting the threshold according to the system load, where the heavier the system load, the larger the threshold.

Because the load in the system changes constantly, the threshold is adjusted in real time so that the timing of data recovery better matches the real-time state of the system, reducing recovery frequency and the bandwidth occupied by recovery more reasonably while guaranteeing service performance as far as possible. When the storage system is initialized, the determined threshold is set to an initial value; the threshold is then dynamically adjusted according to the real-time resource state of the system, being larger when the system load is heavier. Real-time adjustment includes setting an adjustment period and adjusting the threshold once per period. To guarantee maximum system redundancy and the highest reliability at initialization, the initial value of r may be set to 1. Dynamically adjusting the threshold according to the system load then includes: periodically calculating the load information of the system and judging, according to a preset rule, whether the system load is heavy or light; under heavy load, increasing the threshold for the next period by a preset step value, and under light load, decreasing it by the preset step value, where the preset step value is a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction.
For example, with a step of 1: under heavy load the threshold for the next period is incremented by 1 but never exceeds n - k; under light load it is decremented by 1 but never falls below 1. Thus, while the system is judged to be lightly loaded and bandwidth is not a bottleneck, the threshold stays at 1, ensuring rapid recovery of system data. Under sustained heavy load the threshold rises to at most n - k, and a stripe is recovered immediately once it reaches that limit, which maintains system reliability while improving service performance.
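A minimal sketch of the per-period adjustment rule, assuming the heavy/light judgement for each period is already available as a boolean. The class and method names are hypothetical; `step` corresponds to the preset step value, and clamping keeps r within [1, n - k].

```python
# Hypothetical sketch of the periodic threshold adjustment.
class ThresholdController:
    """Per-period adjustment of the fault recovery initiation threshold r."""

    def __init__(self, n, k, step=1, initial=1):
        if not 1 <= step <= n - k:
            raise ValueError("step must satisfy 1 <= step <= n - k")
        if not 1 <= initial <= n - k:
            raise ValueError("initial threshold must satisfy 1 <= r <= n - k")
        self.r_min, self.r_max = 1, n - k
        self.step = step
        self.r = initial  # 1 at initialization gives maximum redundancy

    def end_of_period(self, heavy_load):
        """Compute r for the next period from this period's load judgement."""
        if heavy_load:
            self.r = min(self.r + self.step, self.r_max)  # heavy: raise r, capped at n - k
        else:
            self.r = max(self.r - self.step, self.r_min)  # light: lower r, floored at 1
        return self.r
```

With n = 9, k = 6 and step 1, four heavy periods followed by three light ones move r through 2, 3, 3, 3, 2, 1, 1.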

Periodically calculating the system load information and judging whether the load is heavy or light according to a preset rule includes the following. Let $\mathrm{Num}_i$ be the number of user I/O requests completed in time period $P_i$, and let $\mathrm{Latency}_i(k)$ be the service time of the k-th user I/O in period $P_i$. Let $\mathrm{Limit}_i$ be the maximum delay allowed in period $P_i$; the delay agreement for each user I/O requires $\mathrm{Latency}_i(k) \le \mathrm{Limit}_i$. Define $\mathrm{Violate}_i$ as the fraction of user I/Os that violate the delay agreement:

$$\mathrm{Violate}_i = \frac{\left|\{\, k : \mathrm{Latency}_i(k) > \mathrm{Limit}_i \,\}\right|}{\mathrm{Num}_i}$$

If system congestion occurs in period $P_i$, or $\mathrm{Violate}_i > \delta$, where $\delta$ is called the relaxation factor, the load is judged to be heavy; if no system congestion occurs in period $P_i$ and $\mathrm{Violate}_i \le \delta$, the load is judged to be light. $\delta$ can be set as needed and is not limited in this embodiment.
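The load rule can be expressed directly. In this sketch the list `latencies` holds $\mathrm{Latency}_i(k)$ for the I/Os completed in period $P_i$, so its length is $\mathrm{Num}_i$; how congestion is detected is left to the caller as a boolean, since the text does not specify it. Treating a period with no completed I/O as having $\mathrm{Violate}_i = 0$ is an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch of the heavy/light load judgement for one period P_i.
def violate_ratio(latencies, limit):
    """Violate_i: fraction of the period's completed user I/Os with latency above Limit_i."""
    if not latencies:  # assumption: no completed I/O in the period means nothing violated
        return 0.0
    return sum(1 for t in latencies if t > limit) / len(latencies)


def is_heavy_load(latencies, limit, delta, congested):
    """Heavy if the period saw congestion or Violate_i exceeds the relaxation factor delta."""
    return congested or violate_ratio(latencies, limit) > delta
```

For latencies of 5, 12, 8 and 20 ms against a 10 ms limit, two of four I/Os violate the agreement, so $\mathrm{Violate}_i = 0.5$; the period is heavy for $\delta = 0.3$ but light for $\delta = 0.5$ unless congestion occurred.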

Step S302: For a stripe whose number of faulty data blocks reaches the fault recovery initiation threshold, initiate fault recovery of the stripe.

The faulty data blocks in the system are detected and the number of faulty data blocks in each stripe is counted; when a stripe's number of faulty data blocks reaches the fault recovery initiation threshold, fault recovery of that stripe is initiated.
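Counting faulty data blocks per stripe can be sketched as below, assuming (hypothetically) that the system knows which node holds each block of a stripe and which nodes have been detected as failed; the patent does not describe the detection mechanism, and `count_faulty_blocks` is an illustrative name.

```python
# Hypothetical sketch: per-stripe faulty-block counts from a placement map.
def count_faulty_blocks(placement, failed_nodes):
    """Map each damaged stripe id to its number of faulty data blocks.

    placement: stripe id -> list of the node ids holding that stripe's n blocks.
    failed_nodes: set of node ids currently detected as failed.
    """
    counts = {}
    for stripe_id, nodes in placement.items():
        lost = sum(1 for node in nodes if node in failed_nodes)
        if lost:  # undamaged stripes are left out
            counts[stripe_id] = lost
    return counts
```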

Step S303: Perform data reconstruction using the non-faulty data blocks of the stripe.

To count faulty data blocks accurately and conveniently when initiating recovery for stripes whose faulty-block count reaches the threshold, at least one queue to be reconstructed is built for the stripes that contain faulty data blocks. Each queue records stripe identification information, and all stripes in the same queue have the same number of faulty data blocks. Among the queues that reach the fault recovery initiation threshold, queues are selected in descending order of the number of faulty data blocks of their stripes, and fault recovery is initiated for the stripes in each selected queue. In other words, faulty data blocks and stripes are tracked through the queues to be reconstructed, and fault recovery is performed according to these statistics, with stripes that have more faulty data blocks recovered first. To reconstruct a stripe, the k normal data blocks stored for the stripe are read from the system and decoded and combined to obtain the original file; a new set of n nodes for placing the stripe is then calculated from the stripe id and the current node and network availability; n data blocks are encoded with the erasure code algorithm, and the stripe information and data blocks are sent over the network to the respective new nodes; each new node updates its local information accordingly and writes the data, completing the data reconstruction.
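A sketch of the queues to be reconstructed, assuming one in-memory queue per fault count (the text only requires that all stripes in a queue share the same count). The class and method names are hypothetical. `due_for_recovery(r)` returns stripes from the most-damaged queue first, following the descending selection order described above.

```python
# Hypothetical sketch of queues to be reconstructed, keyed by faulty-block count.
from collections import defaultdict


class ReconstructionQueues:
    """Queues of stripe ids; every stripe in one queue has the same fault count."""

    def __init__(self):
        self._queues = defaultdict(list)  # fault count -> stripe ids with that count
        self._count = {}                  # stripe id -> current fault count

    def _leave_queue(self, stripe_id, count):
        self._queues[count].remove(stripe_id)
        if not self._queues[count]:
            del self._queues[count]  # drop empty queues

    def record_fault(self, stripe_id):
        """Move a stripe to the queue for its incremented fault count."""
        old = self._count.get(stripe_id, 0)
        if old:
            self._leave_queue(stripe_id, old)
        self._count[stripe_id] = old + 1
        self._queues[old + 1].append(stripe_id)

    def due_for_recovery(self, r):
        """Stripes in queues whose count reaches r, most-damaged queue first."""
        order = []
        for count in sorted(self._queues, reverse=True):
            if count >= r:
                order.extend(self._queues[count])
        return order

    def mark_recovered(self, stripe_id):
        """Remove a stripe once its reconstruction has completed."""
        self._leave_queue(stripe_id, self._count.pop(stripe_id))
```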

With the method provided by this embodiment, a fault recovery initiation threshold is determined that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; fault recovery is initiated for a stripe whose number of faulty data blocks reaches the threshold; and data reconstruction is performed using the non-faulty data blocks of the stripe. Compared with the related art, this reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving its service performance.

Embodiment 2:

This embodiment provides an apparatus for reconstructing stored data based on erasure coding. Referring to FIG. 4, the apparatus includes a fault recovery initiation threshold determining module 41, a fault recovery initiation module 42, and a data reconstruction module 43. The threshold determining module 41 is configured to determine a fault recovery initiation threshold, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; the fault recovery initiation module 42 is configured to initiate fault recovery of a stripe whose number of faulty data blocks reaches the threshold; and the data reconstruction module 43 is configured to perform data reconstruction using the non-faulty data blocks of the stripe.
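The module decomposition of FIG. 4 can be pictured as one object whose methods stand for modules 41, 42 and 43. This is a structural sketch only: the class, its fields and the `rebuild` callback (standing in for data reconstruction module 43) are hypothetical, and the actual decode, re-encode and write steps are left to that callback.

```python
# Hypothetical structural sketch of the FIG. 4 apparatus.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReconstructionApparatus:
    n: int
    k: int
    rebuild: Callable[[str], None]  # stands in for data reconstruction module 43
    r: int = 1
    fault_counts: Dict[str, int] = field(default_factory=dict)  # stripe id -> faulty blocks

    def determine_threshold(self, r: int) -> None:
        """Module 41: accept r only if 1 <= r <= n - k."""
        if not 1 <= r <= self.n - self.k:
            raise ValueError(f"threshold must satisfy 1 <= r <= {self.n - self.k}")
        self.r = r

    def initiate_recovery(self) -> List[str]:
        """Module 42: start recovery for every stripe whose fault count reaches r."""
        ranked = sorted(self.fault_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        due = [stripe_id for stripe_id, count in ranked if count >= self.r]
        for stripe_id in due:
            self.rebuild(stripe_id)  # module 43 reconstructs from non-faulty blocks
            del self.fault_counts[stripe_id]
        return due
```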

In the apparatus of this embodiment, to reduce the number of times stored data must be recovered, the threshold determining module 41 sets a fault recovery initiation threshold for data recovery on each stripe according to the resource condition of the system, and fault recovery is initiated for a stripe whose number of faulty data blocks reaches the threshold. Denote the threshold by r: for each stripe, fault recovery starts immediately when the number of faulty data blocks reaches r. For an (n, k) erasure code, to preserve the reliability of erasure coding, r is at most n - k and at least 1. In the related art, data recovery and reconstruction are triggered as soon as one data block in a stripe fails; by contrast, initiating recovery only for stripes whose faulty data blocks reach the threshold effectively reduces the frequency of data recovery and bandwidth usage, thereby ensuring service performance and improving system stability.

The apparatus of this embodiment may further include a threshold adjustment module 44, configured to set the fault recovery initiation threshold to an initial value and dynamically adjust it according to the system load, where the heavier the system load, the larger the threshold.

Because the load in the system changes constantly, the threshold is adjusted dynamically so that the timing of data recovery better matches the real-time state of the system, reducing recovery frequency and recovery bandwidth more reasonably while guaranteeing service performance as far as possible. When the storage system is initialized, the determined threshold is set to an initial value; it is then adjusted according to the real-time resource state of the system, being larger when the system load is heavier. To guarantee maximum system redundancy and the highest reliability at initialization, the threshold adjustment module may set the initial value of r to 1. Dynamically adjusting the threshold according to the system load then includes: periodically calculating the load information of the system and judging, according to a preset rule, whether the load is heavy or light; under heavy load, incrementing the threshold for the next period by 1, but never beyond the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction; under light load, decrementing the threshold for the next period by 1, but never below 1. Thus, while the system is judged to be lightly loaded and bandwidth is not a bottleneck, the threshold stays at 1, ensuring rapid recovery of system data.
Under sustained heavy load the threshold rises to at most n - k, and a stripe is recovered immediately once it reaches that limit, which maintains system reliability while improving service performance. Whether the load is heavy or light can be judged by whether system congestion occurs in period $P_i$ or whether $\mathrm{Violate}_i$ is greater than the relaxation factor $\delta$.

The fault recovery initiation module 42 identifies stripes whose number of faulty data blocks reaches the threshold by detecting the failed data blocks in the system and counting the faulty data blocks of each stripe; when a stripe's faulty data blocks reach the threshold, fault recovery of that stripe is initiated.

The apparatus of this embodiment further includes a reconstruction queue processing module 45, configured to build at least one queue to be reconstructed for the stripes that contain faulty data blocks, each queue recording stripe identification information, with all stripes in the same queue having the same number of faulty data blocks; and, among the queues that reach the threshold, to select queues in descending order of the number of faulty data blocks of their stripes and initiate fault recovery for the stripes in each selected queue.

The data reconstruction module 43 performs data reconstruction using the non-faulty data blocks of the stripe. Stripes with more faulty data blocks are recovered first: the k normal data blocks stored for the stripe are read over the network to obtain the original file; a new set of n nodes for placing the stripe is calculated from the stripe id and the current node and network availability; n data blocks are encoded with the erasure code algorithm, and the stripe information and data blocks are sent over the network to the respective new nodes; each new node updates its local information accordingly and writes the data, completing the data reconstruction.
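The text says the new set of n nodes is calculated from the stripe id and node and network availability, but does not give the function. One possible choice, swapped in here purely for illustration, is rendezvous (highest-random-weight) hashing over the currently available nodes: it is deterministic for a given stripe id and, when a node not chosen for a stripe disappears, leaves that stripe's placement unchanged. `place_stripe` is hypothetical, and `available_nodes` is assumed to be already filtered by node and network availability.

```python
# Hypothetical placement function using rendezvous hashing (not from the source).
import hashlib


def place_stripe(stripe_id, available_nodes, n):
    """Pick n distinct available nodes for a stripe by rendezvous hashing."""
    nodes = list(available_nodes)
    if len(nodes) < n:
        raise ValueError("not enough available nodes to place the stripe")

    def weight(node):
        # Pseudo-random but deterministic score for the (stripe, node) pair.
        digest = hashlib.sha256(f"{stripe_id}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Highest scores win; the node name breaks any (astronomically unlikely) tie.
    ranked = sorted(nodes, key=lambda node: (weight(node), node), reverse=True)
    return ranked[:n]
```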

With the apparatus provided by this embodiment, a fault recovery initiation threshold is determined that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; fault recovery is initiated for a stripe whose number of faulty data blocks reaches the threshold; and data reconstruction is performed using the non-faulty data blocks of the stripe. Compared with the related art, this effectively reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving its service performance.

Embodiment 3:

This embodiment provides a storage node based on erasure coding. Referring to FIG. 5, the storage node includes a processor 51 and a physical storage medium 52. The processor 51 is configured to: determine a fault recovery initiation threshold and distribute it to other storage nodes, the threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; scan the fault status of each stripe for which the storage node is responsible and initiate fault recovery of a stripe whose number of faulty data blocks reaches the threshold; and extract the non-faulty data blocks of the stripe from the physical storage medium of this storage node and from the physical storage media of other storage nodes to perform data reconstruction.

When the system is initialized, the processor 51 sets the fault recovery initiation threshold in the system to an initial value; then, while the system performs file read and write operations, it dynamically adjusts the threshold according to the system load, the threshold being larger when the load is heavier. Through this threshold, the frequency of data reconstruction in the system is adjusted dynamically according to the system load, effectively reducing the bandwidth consumption of the system.

To collect statistics on faulty data blocks and stripes conveniently, the processor 51 may be configured to build at least one queue to be reconstructed for the stripes that contain faulty data blocks, each queue recording stripe identification information, with all stripes in the same queue having the same number of faulty data blocks; and, among the queues that reach the threshold, to select queues in descending order of the number of faulty data blocks of their stripes and initiate fault recovery for the stripes in each selected queue. During data reconstruction, the processor 51 may obtain the non-faulty data blocks of the stripe from the physical storage medium of its own storage node or from the physical storage media of other storage nodes. The physical storage medium in this embodiment may be any storage unit configured to store data.
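A sketch of how the processor might gather the non-faulty blocks of a stripe, taking blocks from its own physical storage medium before remote ones to save network bandwidth. The local-first ordering is an assumption (the text only says blocks may come from either source), and `gather_surviving_blocks`, `local_store` and the `fetch_remote` callable, which stands in for a network read from another storage node, are hypothetical.

```python
# Hypothetical sketch: collect k non-faulty blocks, preferring the local medium.
def gather_surviving_blocks(healthy_blocks, local_node, local_store, fetch_remote, k):
    """Collect k non-faulty blocks of one stripe, reading local blocks first.

    healthy_blocks: list of (block_index, node_id) for the stripe's non-faulty blocks.
    local_store: block_index -> bytes held on this node's physical storage medium.
    fetch_remote: callable (node_id, block_index) -> bytes, standing in for a network read.
    """
    # Local reads cost no network bandwidth, so they are taken before remote ones.
    ordered = sorted(healthy_blocks, key=lambda item: item[1] != local_node)
    gathered = {}
    for index, node in ordered:
        if len(gathered) == k:
            break
        if node == local_node:
            gathered[index] = local_store[index]
        else:
            gathered[index] = fetch_remote(node, index)
    if len(gathered) < k:
        raise ValueError("fewer than k non-faulty blocks; the stripe cannot be rebuilt")
    return gathered
```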

It should be understood that the processor 51 in this embodiment may be a single processor provided with different functional modules that perform the processes described above, or a plurality of processors with different processing functions, each performing one or several of the above processes.

The storage node of this embodiment determines a fault recovery initiation threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; initiates fault recovery for a stripe whose number of faulty data blocks reaches the threshold; and performs data reconstruction using the non-faulty data blocks of the stripe. Compared with the related art, this effectively reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving service performance.

Embodiment 4:

In a storage cluster that stores data, each storage node is typically the storage node provided in Embodiment 3. As shown in FIG. 6, a storage node typically includes a management center 61, a management agent 62, a distributed file storage client 63, a data routing 64, a local data storage service 65, and the like. The management center 61 is configured to maintain the cluster membership and state, as well as data distribution rules, data recovery rules, and the like, and to make strongly consistent decisions. By default it is deployed on three storage nodes to form a management center cluster; it can also be deployed on a separate server for cluster management. The management center cluster implements a consistency election algorithm adapted from the Paxos algorithm, so that node state changes are applied uniformly on all nodes of the system. The management agent 62 is configured to handle communication between its node and the management center 61: it periodically reports node health information to the management center 61 and receives control instructions from it. A management agent 62 is deployed on each storage node. The distributed file storage client 63 is configured to provide the access point service of the distributed cluster and can be regarded as the agent through which applications access the storage system. It provides file operation interfaces commonly used by applications, such as a C API, a Java API, NFS (Network File System), and CIFS (Common Internet File System), and exchanges data with the client 60, which may be a user client of the storage cluster. The data routing 64 is responsible for file access control, distribution and management of data files, and metadata storage.
The data routing 64 communicates with the local data storage service process: it responds to read and write requests from the distributed file storage client and routes each request to the local data storage service process on the corresponding node to implement data access, replica distribution, and the like. It is deployed in cluster mode on every storage node; the data routing instances can share in-memory data with zero failover time and can easily be scaled out to provide massive metadata capacity. The data routing also maintains the queues to be reconstructed Q_i. The local data storage service is responsible for managing the actual physical media resources and their space, for storing and looking up local objects, and for performing I/O operations; it is the process that actually handles data reads and writes, interacting with the physical storage devices to implement the read and write functions. The storage cluster may be a NAS storage cluster or any other storage cluster configured for data storage.

Based on the foregoing storage cluster, this embodiment provides a method for reconstructing stored data based on erasure code. Referring to FIG. 7, the method includes the following steps:

Step S701: initialize the storage system.

The initial setup of the system includes the following. The management center 61 sets the erasure code (n, k) to be used. At most n-k faulty data blocks can be tolerated: when n-k data blocks are faulty, the original file can be restored from the remaining k normal data blocks, and n data blocks are then regenerated to maintain system redundancy. The system must maintain the necessary data reliability by writing the extra redundant data to new nodes. Next, the startup fault recovery threshold r is initialized; the smaller r is, the higher the reliability, so for the highest reliability r may be initialized to 1. Each storage node then initializes its stripe list, in which each entry contains the stripe id, the stripe primary id, and the ids of all nodes and disks involved in the stripe. Each storage node also initializes its queues to be reconstructed Q_i, where i is the number of faulty data blocks of the stripes in the queue (1 <= i <= n-k): every stripe in Q_1 has 1 faulty data block, and likewise every stripe in Q_{n-k} has n-k faulty data blocks. Each record in a queue is a stripe id. Therefore, when the storage system is initialized, each storage node has n-k queues to be reconstructed, all of them empty.
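The per-node state created in step S701 can be sketched as a small data structure. The class and field names below are hypothetical; the sketch only mirrors what the text lists: a stripe list keyed by stripe id (holding the primary id and the (node id, disk id) of every block), the n-k empty queues Q_1 to Q_{n-k}, and the threshold r initialized to 1.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class StripeEntry:
    """One entry of a node's stripe list (hypothetical field names)."""
    stripe_id: int
    primary_id: tuple[int, int]        # (node id, disk id) of the primary block
    locations: list[tuple[int, int]]   # (node id, disk id) of all n blocks


@dataclass
class NodeState:
    """Per-node state after initialization (a sketch, not the patented layout)."""
    n: int
    k: int
    r: int = 1                         # startup fault recovery threshold
    stripe_list: dict[int, StripeEntry] = field(default_factory=dict)
    queues: dict[int, deque] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0 < self.k < self.n:
            raise ValueError("need 0 < k < n")
        if not 1 <= self.r <= self.n - self.k:
            raise ValueError("need 1 <= r <= n - k")
        # Q_i holds ids of stripes with exactly i faulty blocks, 1 <= i <= n - k.
        self.queues = {i: deque() for i in range(1, self.n - self.k + 1)}


state = NodeState(n=9, k=6)            # a (9, 6) code: up to 3 faulty blocks
state.stripe_list[7] = StripeEntry(7, (1, 0), [(node, 0) for node in range(1, 10)])
```

Keying the queues by fault count makes "pick the stripes with the most faults first" a matter of iterating the keys in descending order.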

Step S702: perform a file write operation.

After system initialization is completed, file write operations are performed as follows. The distributed file storage client 63 dynamically selects, according to the load-balancing principle, the data routing of one storage node to respond to the write request. The data routing looks up or calculates, according to the current storage system rules, the n (node id, disk id) tuples to which the file should be written, and designates one of these tuples as the primary id. It then encodes the file into n data blocks using the (n, k) erasure code and sends the stripe information and the data blocks to the n nodes respectively. On each of these n nodes, the data routing records the stripe information into the stripe list, and the local data storage service writes the data to the local disk.
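The text does not fix a particular (n, k) code. As a minimal stand-in, the sketch below uses the simplest erasure code, a single XOR parity block (n = k + 1), to show the two properties the write and rebuild paths rely on: a file is encoded into n blocks, and any k surviving blocks are enough to decode it. Production systems typically use codes such as Reed-Solomon that tolerate n - k > 1 losses; with n - k = 1 the startup fault recovery threshold can only be 1. The function names are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k zero-padded blocks and append one XOR parity block (n = k + 1)."""
    size = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]


def decode(blocks: list, k: int, length: int) -> bytes:
    """Recover the original bytes from any k of the k + 1 blocks (a lost block is None)."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost block")
    if missing:
        survivors = [b for b in blocks if b is not None]
        blocks = list(blocks)
        blocks[missing[0]] = reduce(xor_bytes, survivors)  # XOR of the other k blocks
    return b"".join(blocks[:k])[:length]


data = b"erasure coded stripe"
blocks = encode(data, k=4)          # n = 5 blocks of 5 bytes each
blocks[2] = None                    # one data block lost with its node
restored = decode(blocks, k=4, length=len(data))
```

The `length` argument strips the zero padding, so files whose size is not a multiple of k round-trip exactly.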

Step S703: detect the status information of the system.

After the storage system is initialized, users frequently read and write files while the system runs. During these reads and writes, hardware damage and software failures may corrupt or lose stored data. To keep the system stable, the state of the stored data blocks must therefore be detected so that the stripes containing faulty data blocks can be recovered in time. Detecting the status information of the system includes the following. The management center 61 periodically reads, from the management agent 62 of each node, the system load information and the system availability status information for the current time period P_i; the availability status information includes the status of each node, disk, and network link. The management center 61 then processes the collected information, including filtering out dirty data obtained from faulty nodes, and determines the node and network fault conditions from the processed availability information and active heartbeats, thereby confirming the fault condition of the system. At the same time, it determines the startup fault recovery threshold r for the next period P_{i+1} according to the system load information. Finally, it sends the determined startup fault recovery threshold, together with the global fault condition, to the data routing 64 of each storage node.
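Claims 3 and 4 below spell out one rule for choosing r from the load of period P_i: the load is heavy if congestion occurred or the share of user I/Os exceeding the latency limit Limit_i is above the relaxation factor δ; r then rises by a preset step under heavy load and falls by it under light load. The sketch below implements that rule under stated assumptions: the congestion flag is taken as an input (the text does not say how congestion is detected), the new threshold is clamped to [1, n-k] so it stays in the range claim 1 requires, and the function names and millisecond units are hypothetical.

```python
def violate_ratio(latencies, limit):
    """Violate_i: share of user I/Os in period P_i whose service time exceeds Limit_i."""
    if not latencies:
        return 0.0
    return sum(1 for t in latencies if t > limit) / len(latencies)


def is_heavy(latencies, limit, delta, congested):
    """Heavy load if congestion occurred or Violate_i > delta (the relaxation factor)."""
    return congested or violate_ratio(latencies, limit) > delta


def next_threshold(r, heavy, step, n, k):
    """Threshold r for period P_{i+1}: +step under heavy load, -step under light load.

    Clamping to [1, n - k] is an assumption that keeps r in the range claim 1 allows.
    """
    if not 1 <= step <= n - k:
        raise ValueError("step must lie in [1, n - k]")
    r = r + step if heavy else r - step
    return max(1, min(n - k, r))


# Example period with a (9, 6) code, r = 1, Limit_i = 10 ms, delta = 0.1.
latencies_ms = [5, 12, 8, 20]             # two of four I/Os exceed 10 ms
heavy = is_heavy(latencies_ms, limit=10, delta=0.1, congested=False)
r_next = next_threshold(1, heavy, step=1, n=9, k=6)   # heavy load, so r rises to 2
```

Raising r under heavy load defers recovery of lightly damaged stripes, so more faults accumulate per stripe and are repaired in one pass; lowering it under light load restores redundancy sooner.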

Step S704: perform data reconstruction.

When there are faulty data blocks in the system, fault recovery and data reconstruction are performed for the stripes that meet the startup fault recovery threshold. After faulty data blocks are detected, the queues to be reconstructed are set up, or the previously built queues are updated. During data reads and writes, the data routing 64 scans each stripe for which the node is responsible (that is, each stripe whose primary data block is on the local node) and refreshes the queues to be reconstructed as follows: if the nodes or disks holding all data blocks of stripe S are normal and S is not in any queue to be reconstructed, S is skipped and the next stripe is scanned; if all data blocks of S are normal but S was in queue Q_j in the previous cycle, S is deleted from Q_j and its queue information is updated; if the number of data blocks of S located on faulty nodes or disks is i (that is, the nodes or disks holding some data blocks of S are abnormal) and S was in queue Q_j in the previous cycle, S is deleted from Q_j, inserted into Q_i, and its queue information is updated; if the number of data blocks of S located on faulty nodes or disks is i and S was not in any queue in the previous cycle, S is inserted at the tail of Q_i and its queue information is updated. Through this process, stripes with the same number of faults are placed in the same queue to be reconstructed, and during reconstruction the queues whose stripes have more faults are reconstructed first.
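The queue refresh and the "most faults first, down to r" selection order can be sketched together. This is an illustration under assumptions, not the patented code: `refresh`, `next_to_rebuild`, and the reverse index `where` (stripe id to the i of the queue holding it) are hypothetical, and a stripe whose fault count has not changed is left at its current position rather than moved to the tail.

```python
from collections import deque


def refresh(queues, where, stripe_id, faulty):
    """Move one stripe to the queue matching its current fault count.

    queues: {i: deque of stripe ids} for 1 <= i <= n - k      (Q_i)
    where:  {stripe id: i of the queue holding it}           (hypothetical index)
    faulty: data blocks of the stripe on failed nodes or disks
    """
    if faulty and faulty not in queues:
        raise ValueError("more than n - k faulty blocks: stripe is unrecoverable")
    prev = where.get(stripe_id)
    if prev == faulty:
        return                              # assumption: unchanged count keeps its place
    if prev is not None:
        queues[prev].remove(stripe_id)      # leave last cycle's queue Q_j
        del where[stripe_id]
    if faulty:
        queues[faulty].append(stripe_id)    # join the tail of Q_i
        where[stripe_id] = faulty


def next_to_rebuild(queues, where, r):
    """Pop the head of the non-empty Q_i with the largest i, scanning i = n-k down to r."""
    for i in sorted(queues, reverse=True):
        if i < r:
            break
        if queues[i]:
            stripe_id = queues[i].popleft()
            del where[stripe_id]
            return stripe_id
    return None


queues = {i: deque() for i in range(1, 4)}   # n = 9, k = 6
where = {}
refresh(queues, where, "s1", 1)
refresh(queues, where, "s2", 3)
refresh(queues, where, "s3", 2)
refresh(queues, where, "s1", 2)              # s1 lost another block: Q_1 -> tail of Q_2
refresh(queues, where, "s4", 1)
order = [next_to_rebuild(queues, where, r=2) for _ in range(4)]
# order == ["s2", "s3", "s1", None]; "s4" waits in Q_1 until r drops to 1
```

Stripes below the threshold stay queued rather than discarded, so they are picked up as soon as a lighter load lowers r.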

If Q_i (starting from i = n-k) is not empty, stripe ids are taken in turn from the head of Q_i and the reconstruction process is started for each; if Q_i is empty, then i = i-1 and the scan is repeated, until i is less than r. When there are so many stripes to be reconstructed that not all of them can be reconstructed within a single cycle, the queues Q_i may be adjusted by the queuing module during reconstruction; in this case, reconstruction always starts again from Q_i with i = n-k.

The reconstruction of a stripe S proceeds as follows. According to the correspondence between the stripes recorded in the local queues and the nodes, the data routing 64 obtains the set Set of n nodes on which stripe S is currently stored; determines k normal nodes according to the node and network fault states of the system; reads the data from the local data storage services of those k nodes over the network; and decodes the original file using the erasure code algorithm. Then, according to the stripe id and the current availability of nodes and network, it calculates a new set Set' of n nodes on which the stripe should be placed, and sends the stripe information and the n data blocks re-encoded with the erasure code algorithm over the network to all reachable nodes in Set ∪ Set'; each of these nodes then updates its local information. For a node N in Set': if N ∈ Set ∩ Set', the data routing of N records the stripe information into the stripe list; if N ∈ Set' - Set, the data routing of N records the stripe information into the stripe list, and the local data storage service module writes the data to N, completing the data reconstruction. For a node N in Set, if N ∈ Set - Set', the space reclamation module deletes the data corresponding to stripe S and reclaims the space, and the data routing deletes the corresponding stripe record from the stripe list.
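Read as set operations, the end of the reconstruction procedure gives each node in Set ∪ Set' one of three roles: nodes in both sets only record the stripe information, nodes only in Set' also receive the rebuilt block, and nodes only in Set delete their block and stripe record. The conditions for the last two roles are partly lost in the extracted text, so this three-way split is an interpretation. The sketch also assumes, as the text says blocks go only to reachable nodes, that unreachable nodes are simply left out of every role; `placement_plan` is a hypothetical name.

```python
def placement_plan(old, new, reachable):
    """Classify the nodes of Set ∪ Set' after stripe S is re-encoded.

    old: Set, nodes currently holding S; new: Set', nodes that should hold S;
    reachable: nodes the data routing can contact now (assumed input).
    """
    return {
        # in Set ∩ Set': only the stripe record is written
        "record": (old & new) & reachable,
        # in Set' - Set: record the stripe and write the rebuilt block
        "record_and_write": (new - old) & reachable,
        # in Set - Set': delete the block, reclaim space, drop the record
        "reclaim": (old - new) & reachable,
    }


old_set = set(range(1, 10))                  # Set: nodes 1..9, node 5 has failed
new_set = (old_set - {5}) | {10}             # Set': node 10 replaces node 5
plan = placement_plan(old_set, new_set, reachable=set(range(1, 11)) - {5})
```

Here node 10 is the only node that receives data, and node 5 appears in no role because it cannot be reached; its space would have to be reclaimed once it comes back.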

Embodiments of the present disclosure also provide a computer readable storage medium storing computer executable instructions arranged to perform the method of any of the above embodiments.

The method for reconstructing stored data based on erasure code provided in this embodiment merges the recovery of multiple faulty data blocks of the same stripe into a single recovery operation, according to the availability and load of the system, thereby effectively reducing the bandwidth occupied by data recovery. In the related art, recovering one faulty data block takes up to k times the block size in bandwidth, so recovering multiple data blocks (say f) requires f*k times. With the method provided in this embodiment, recovering f data blocks requires k times the bandwidth, which amounts to only k/f times per recovered data block; unnecessary data recovery is avoided and bandwidth consumption is greatly reduced. Reducing the bandwidth consumed by data recovery also effectively reduces network communication cost and improves service performance. In addition, the startup fault recovery threshold is dynamically adjusted according to the load: when the load is light, system data is restored quickly, and when the load is heavy, stripes with severe faults are still restored quickly, which effectively ensures system reliability and achieves a good balance between system reliability and service performance. Finally, the method is simple to implement, requires no modification of the underlying kernel, and is applicable to various operating systems such as Windows and Linux; it is also platform-independent, that is, applicable to distributed storage systems of various architectures.
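The bandwidth comparison above is simple arithmetic. The hypothetical helper below reproduces it for a stripe that needs k blocks to decode and has f faulty blocks, counting only the blocks read over the network; writing the f rebuilt blocks costs the same in both approaches and is left out.

```python
def recovery_traffic(k, f, block_size):
    """Network reads needed to repair f faulty blocks of one stripe."""
    if not 1 <= f:
        raise ValueError("f must be at least 1")
    separate = f * k * block_size      # related art: each repair reads k blocks
    merged = k * block_size            # merged repair: one decode serves all f blocks
    return {"separate": separate, "merged": merged, "merged_per_block": merged / f}


# A (9, 6) stripe with 3 faulty blocks, sizes counted in blocks:
traffic = recovery_traffic(k=6, f=3, block_size=1)
# separate = 18 blocks, merged = 6 blocks, i.e. k/f = 2 blocks per repaired block
```

The saving grows linearly with f, which is why letting faults accumulate up to the threshold r before recovering pays off.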

Those skilled in the art should understand that the modules or steps of the above embodiments of the present disclosure may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a computer storage medium (ROM/RAM, magnetic disk, optical disk) and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described here. Alternatively, they may each be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Therefore, the present disclosure is not limited to any specific combination of hardware and software.

The above is a further detailed description of the present disclosure in connection with specific embodiments, and the implementation of the present disclosure should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the present disclosure belongs may make several simple deductions or substitutions without departing from the concept of the present disclosure, and all of these should be regarded as falling within the protection scope of the present disclosure.

Industrial applicability

The method and device for reconstructing stored data based on erasure code and the storage node provided by the present application reduce the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving its service performance.

Claims

1. A method for reconstructing stored data based on erasure code, comprising:

Determining a startup fault recovery threshold, the startup fault recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

For a stripe whose number of faulty data blocks reaches the startup fault recovery threshold, initiating fault recovery of the stripe; and

Performing data reconstruction using the non-faulty data blocks of the stripe.
2. The method of claim 1, further comprising:

Setting the startup fault recovery threshold to an initial value; and

Dynamically adjusting the startup fault recovery threshold according to the system load condition, wherein the heavier the system load, the larger the startup fault recovery threshold.
3. The method according to claim 2, wherein dynamically adjusting the startup fault recovery threshold according to the system load condition comprises: setting the initial value of the startup fault recovery threshold to 1; periodically calculating load information of the system and determining, according to a preset rule, whether the system load is heavy or light; when the load is heavy, increasing the startup fault recovery threshold of the next cycle by a preset step value, and when the load is light, decreasing the startup fault recovery threshold of the next cycle by the preset step value; wherein the preset step value is a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction.
4. The method according to claim 3, wherein periodically calculating the load information of the system and determining, according to the preset rule, whether the system load is heavy or light comprises:

Denoting by Num_i the number of user I/O requests completed in time period P_i, and by Latency_i(k) the service time of the k-th user I/O in period P_i;

Setting the maximum latency for period P_i to Limit_i, where the latency agreement requires every user I/O to satisfy Latency_i(k) ≤ Limit_i, and defining Violate_i as the percentage of user I/Os that violate the latency agreement:

$$\mathrm{Violate}_i = \frac{\left|\{\, k : \mathrm{Latency}_i(k) > \mathrm{Limit}_i,\ 1 \le k \le \mathrm{Num}_i \,\}\right|}{\mathrm{Num}_i} \times 100\%$$

If system congestion occurs in period P_i or Violate_i > δ, where δ is called the relaxation factor, the load is judged to be heavy; if no system congestion occurs in period P_i and Violate_i ≤ δ, the load is judged to be light.
5. The method according to any one of claims 1 to 4, further comprising:

For the stripes having faulty data blocks, constructing at least one queue to be reconstructed and recording stripe identification information in the queue to be reconstructed, wherein all stripes corresponding to the same queue to be reconstructed have the same number of faulty data blocks; and

Selecting the queues to be reconstructed in turn, in descending order of the number of faulty data blocks of the stripes corresponding to each queue, and initiating fault recovery for each of the stripes in the selected queue to be reconstructed.
6. A device for reconstructing stored data based on erasure code, comprising:

A startup fault recovery threshold determining module, configured to determine a startup fault recovery threshold, the startup fault recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

A fault recovery startup module, configured to initiate fault recovery of a stripe whose number of faulty data blocks reaches the startup fault recovery threshold; and

A data reconstruction module, configured to perform data reconstruction using the non-faulty data blocks of the stripe.
7. The device of claim 6, further comprising a startup fault recovery threshold adjustment module, the startup fault recovery threshold adjustment module being configured to:

Set the startup fault recovery threshold to an initial value; and

Dynamically adjust the startup fault recovery threshold according to the system load condition, wherein the heavier the system load, the larger the startup fault recovery threshold.
8. The device according to claim 7, wherein the startup fault recovery threshold adjustment module is further configured to: set the initial value of the startup fault recovery threshold to 1; periodically calculate load information of the system and determine, according to a preset rule, whether the system load is heavy or light; when the load is heavy, increase the startup fault recovery threshold of the next cycle by a preset step value, and when the load is light, decrease the startup fault recovery threshold of the next cycle by the preset step value; wherein the preset step value is a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction.
9. The device according to any one of claims 6 to 8, further comprising a reconstruction queue processing module, the reconstruction queue processing module being configured to:

For the stripes having faulty data blocks, construct at least one queue to be reconstructed and record stripe identification information in the queue to be reconstructed, wherein all stripes corresponding to the same queue to be reconstructed have the same number of faulty data blocks; and

Select the queues to be reconstructed in turn, in descending order of the number of faulty data blocks of the stripes corresponding to each queue, and initiate fault recovery for each of the stripes in the selected queue to be reconstructed.
10. An erasure code based storage node, comprising a physical storage medium and a processor, wherein the processor is configured to:

Determine a startup fault recovery threshold and distribute the startup fault recovery threshold to other storage nodes, the startup fault recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1;

Scan the fault condition of each stripe for which the storage node is responsible, and initiate fault recovery for a stripe whose number of faulty data blocks reaches the startup fault recovery threshold; and

Extract the non-faulty data blocks of the stripe from the physical storage medium of the storage node and the physical storage media of other storage nodes to perform data reconstruction.
11. The storage node of claim 10, wherein the processor is further configured to:

Set the startup fault recovery threshold to an initial value; and

Dynamically adjust the startup fault recovery threshold according to the system load condition, wherein the heavier the system load, the larger the startup fault recovery threshold.
12. The storage node according to claim 10 or 11, wherein the processor is further configured to:

For the stripes having faulty data blocks in the storage node, construct at least one queue to be reconstructed and record stripe identification information in the queue to be reconstructed, wherein all stripes corresponding to the same queue to be reconstructed have the same number of faulty data blocks; and

Select the queues to be reconstructed in turn, in descending order of the number of faulty data blocks of the stripes corresponding to each queue, and initiate fault recovery for each of the stripes in the selected queue to be reconstructed.
13. A computer readable storage medium storing computer executable instructions arranged to perform the method of any one of claims 1 to 5.