WO2018001110A1 - Method and device for reconstructing stored data based on erasure coding, and storage node - Google Patents

Method and device for reconstructing stored data based on erasure coding, and storage node

Info

Publication number
WO2018001110A1
Authority
WO
WIPO (PCT)
Prior art keywords
reconstructed
queue
data
stripe
recovery threshold
Prior art date
Application number
PCT/CN2017/088477
Other languages
French (fr)
Chinese (zh)
Inventor
江滢
王志坤
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2018001110A1 publication Critical patent/WO2018001110A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Definitions

  • The present application relates to the field of communications, for example, to an erasure-code-based method and apparatus for reconstructing stored data, and to a storage node.
  • In recent years, with the explosive growth of information resources and data, distributed storage systems have become the foundation and core of cloud storage and big data, owing to their high performance, high scalability, high availability, and ease of management. However, hardware damage and software failures can corrupt or lose data during storage. Cloud storage systems therefore generally use erasure coding to improve fault tolerance, data resource utilization, and system performance. Erasure coding does not require excessive extra storage space and usually ensures high data reliability and availability through a reasonable amount of redundant coding.
  • Compared with full replication, storing data with erasure coding greatly reduces the system's space overhead. However, data reconstruction incurs a large network overhead, so adopting this type of erasure coding may congest the network of the entire system, or of some nodes, to the point where service cannot be provided, which degrades system performance.
  • The number of nodes deployed in current storage systems keeps increasing, and so does the number of nodes that fail every day. Data recovery traffic will therefore account for a growing share of total network traffic, which greatly affects access to everyday service data. How to reduce the bandwidth consumed by erasure coding while ensuring service performance is thus a problem worth considering.
  • The embodiments of the present disclosure provide an erasure-code-based method and apparatus for reconstructing stored data, and a storage node, to solve the problems in the related art that stored data reconstruction consumes a large amount of bandwidth, the system is unstable, and service performance is poor.
  • An embodiment of the present disclosure provides a method for reconstructing stored data based on erasure code, including:
  • determining a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
  • for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, starting fault recovery of the stripe; and
  • performing data reconstruction using the non-faulty data blocks of the stripe.
  • An embodiment of the present disclosure further provides an apparatus for reconstructing stored data based on erasure code, including:
  • a startup failure recovery threshold determining module, configured to determine a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
  • a fault recovery startup module, configured to start fault recovery of a stripe whose number of faulty data blocks reaches the startup failure recovery threshold; and
  • a data reconstruction module, configured to perform data reconstruction using the non-faulty data blocks of the stripe.
  • An embodiment of the present disclosure further provides a storage node based on erasure code, including a physical storage medium and a processor, where the processor is configured to:
  • determine a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
  • start fault recovery of a stripe whose number of faulty data blocks reaches the startup failure recovery threshold; and
  • extract the non-faulty data blocks of the stripe from the physical storage medium of this storage node and the physical storage media of other storage nodes for data reconstruction.
  • An embodiment of the present disclosure further provides a computer storage medium storing computer-executable instructions for executing any of the foregoing erasure-code-based stored data reconstruction methods.
  • The computer storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
  • In the embodiments of the present disclosure, a startup failure recovery threshold is determined, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1; fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold; and the non-faulty data blocks of the stripe are used for data reconstruction. This reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving its service performance.
  • FIG. 1 is a schematic diagram of a principle of an erasure code according to any embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of distributed data storage of erasure codes according to any embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for reconstructing stored data based on erasure code according to Embodiment 1 of the present disclosure
  • FIG. 4 is a schematic diagram of a storage data reconstruction apparatus based on erasure code according to Embodiment 2 of the present disclosure
  • FIG. 5 is a schematic diagram of a storage node based on an erasure code according to Embodiment 3 of the present disclosure
  • FIG. 6 is a schematic diagram of a storage cluster based on an erasure code according to Embodiment 4 of the present disclosure
  • FIG. 7 is a flowchart of a method for reconstructing stored data based on erasure code according to Embodiment 4 of the present disclosure.
  • Data is stored using erasure coding: the original file is split into k source data blocks, and the k source data blocks are then encoded to generate n coded data blocks. That is, an (n, k) erasure code encodes k source data blocks into n data blocks. During data reconstruction, the k source data blocks can be restored by decoding any k of the n data blocks, and the k source data blocks are then combined to reconstruct the original file.
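  • As a minimal illustration of the (n, k) principle, the sketch below uses a single XOR parity block, so n = k + 1 and any k of the n blocks suffice. This is a deliberately simplified stand-in and not the code used by the disclosure, which does not name a specific erasure code; the function names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional


def encode(data: bytes, k: int = 2) -> List[bytes]:
    """Split data into k equal source blocks and append one XOR parity block (n = k + 1)."""
    if len(data) % k:
        data += b"\x00" * (k - len(data) % k)  # pad so all blocks have equal size
    size = len(data) // k
    blocks = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for block in blocks:
        for j, byte in enumerate(block):
            parity[j] ^= byte
    return blocks + [bytes(parity)]


def decode(blocks: List[Optional[bytes]], k: int = 2) -> bytes:
    """Rebuild the (padded) data from any k of the n = k + 1 blocks; None marks a lost block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("this code tolerates only n - k = 1 lost block")
    blocks = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        rebuilt = bytearray(len(survivors[0]))
        for block in survivors:
            for j, byte in enumerate(block):
                rebuilt[j] ^= byte
        blocks[missing[0]] = bytes(rebuilt)  # XOR of the survivors equals the lost block
    return b"".join(blocks[:k])


stripe = encode(b"erasure!")  # (n, k) = (3, 2): two source blocks, one parity block
stripe[0] = None              # lose one block
assert decode(stripe) == b"erasure!"
```

  • Production systems typically use codes that tolerate more than one lost block; the single-parity case is chosen here only because it fits in a few lines.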
  • The distributed data storage model based on erasure code is shown in FIG. 2. The k data nodes store the original data blocks, labeled $D_0, D_1, \ldots, D_{k-1}$; the m coding nodes store the coded data blocks, labeled $C_0, C_1, \ldots, C_{m-1}$.
  • The erasure code algorithm cuts the original file into k equal parts and stores them on the k data nodes in turn, that is, it splits the original file into k source data blocks, and it places the m blocks generated by encoding on the m coding nodes.
  • A large file must be split twice: each time, data of a specified size is read from the file and encoded. The original data and encoded data involved in one encoding pass are called a stripe. Each stripe independently forms an encoded set of information, and different stripes are independent of each other.
  • In the related art, data reconstruction is triggered as soon as a node fails. The new node must first download the data from k nodes to recover the original file and then re-encode it to regenerate the lost data, so the amount of data transmitted in this process is k times the amount of lost data.
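  • The k-fold amplification can be made concrete with a small calculation. The helper name `repair_download_bytes` and the block size used in the example are illustrative assumptions, not values from the disclosure.

```python
MIB = 1024 * 1024


def repair_download_bytes(k: int, lost_bytes: int) -> int:
    """Bytes a replacement node downloads to rebuild `lost_bytes` of data.

    It reads one block of the same size from each of k surviving nodes.
    """
    return k * lost_bytes


# Rebuilding a 64 MiB block under a code with k = 4 moves 256 MiB over the network.
print(repair_download_bytes(k=4, lost_bytes=64 * MIB) // MIB)  # 256
```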
  • The related art limits the network bandwidth available for data recovery, which inevitably slows down node reconstruction. The node reconstruction rate directly affects system reliability: if reconstruction is slower than the rate at which nodes fail, the system cannot maintain its reliability. Moreover, limiting the data recovery bandwidth only reduces network bandwidth consumption in the short term; in the long run, the bandwidth occupied by data recovery is not substantially reduced. A more reasonable and reliable data reconstruction method is therefore needed to reduce system bandwidth usage and ensure system stability.
  • Embodiment 1:
  • This embodiment provides a method for reconstructing stored data based on erasure code. Referring to FIG. 3, the method includes:
  • Step S301: Determine a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1.
  • To reduce the number of stored data recoveries, this embodiment introduces the concept of a startup failure recovery threshold. A startup failure recovery threshold for data recovery is set for each stripe according to the resource condition of the system, and fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold.
  • The startup failure recovery threshold is denoted r. For each stripe, fault recovery starts immediately once the number of faulty data blocks reaches r. The maximum value of r is n-k and the minimum is 1, where n is the number of data blocks stored in a stripe, corresponding to n storage nodes in the system, and k is the minimum number of data blocks required for erasure code reconstruction, corresponding to k data nodes.
  • Setting a startup failure recovery threshold, and starting fault recovery only for stripes whose number of faulty data blocks reaches it, effectively reduces the frequency of data recovery and the bandwidth it occupies, compared with the related art in which data recovery and reconstruction are triggered as soon as one storage node fails. This ensures service performance and improves system stability.
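  • The bounds on the threshold and the trigger condition can be sketched as follows. The function names `check_threshold` and `should_start_recovery` are hypothetical and chosen only for illustration.

```python
def check_threshold(r: int, n: int, k: int) -> int:
    """Return r if it lies in the allowed range 1 <= r <= n - k, else raise."""
    if not 1 <= r <= n - k:
        raise ValueError(f"threshold {r} outside [1, {n - k}]")
    return r


def should_start_recovery(faulty_blocks: int, r: int) -> bool:
    """Fault recovery of a stripe starts once its faulty block count reaches r."""
    return faulty_blocks >= r


# With a (6, 4) code the threshold may be 1 or 2.
r = check_threshold(2, n=6, k=4)
assert not should_start_recovery(1, r)  # one failure: recovery is deferred
assert should_start_recovery(2, r)      # second failure: recovery starts
```

  • With r = 1 this reduces to the related-art behaviour of repairing on every failure.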
  • The method provided in this embodiment may further include: setting the startup failure recovery threshold to an initial value, and dynamically adjusting it according to the system load, where the heavier the system load, the larger the threshold. In this way, the data recovery frequency and the bandwidth occupied by recovery are reduced more reasonably, and service performance is guaranteed as far as possible.
  • When the storage system is initialized, the determined startup failure recovery threshold is set to an initial value in the storage system, and it is then dynamically adjusted according to the real-time resource state of the system, where the heavier the system load, the larger the threshold. Real-time dynamic adjustment includes setting an adjustment period and adjusting the threshold once per period.
  • The initial value of the startup failure recovery threshold r can be set to 1.
  • Dynamically adjusting the startup failure recovery threshold according to the system load includes: periodically calculating the load information of the system and determining, according to a preset rule, whether the system load is heavy or light; under heavy load, increasing the threshold for the next period by a preset step value; and under light load, decreasing the threshold for the next period by the preset step value. The preset step value is a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction.
  • For example, under heavy load the threshold for the next period is incremented by 1, without exceeding the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction; under light load the threshold for the next period is decremented by 1, without falling below 1. That is, when the system load is judged to be light and system bandwidth is not a bottleneck, the threshold stays at 1 to ensure rapid recovery of system data.
  • The startup failure recovery threshold never exceeds n-k, which ensures that a faulty stripe is still recovered in time, thus effectively ensuring system reliability and improving system service performance.
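  • A minimal sketch of the per-period adjustment rule, assuming the load has already been classified as heavy or light. The function name `next_threshold` and its parameters are hypothetical; the clamping to the range 1 to n-k follows the description above.

```python
def next_threshold(r: int, heavy_load: bool, n: int, k: int, step: int = 1) -> int:
    """Threshold for the next period: raise it under heavy load, lower it under light load.

    The result is clamped to the allowed range [1, n - k].
    """
    if heavy_load:
        return min(r + step, n - k)
    return max(r - step, 1)


# (6, 4) code, so the threshold stays within [1, 2].
r = 1
r = next_threshold(r, heavy_load=True, n=6, k=4)   # 1 -> 2
r = next_threshold(r, heavy_load=True, n=6, k=4)   # stays at the cap, 2
r = next_threshold(r, heavy_load=False, n=6, k=4)  # 2 -> 1
assert r == 1
```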
  • Periodically calculating the system load information and determining whether the system load is heavy or light according to a preset rule includes the following. Let $\mathrm{Num}_i$ be the number of user I/O requests completed in period $P_i$, and let $\mathrm{Latency}_i(k)$ be the service time of the k-th user I/O request in $P_i$. A maximum delay $\mathrm{Limit}_i$ is set for period $P_i$, and the delay agreement requires every user I/O request to satisfy $\mathrm{Latency}_i(k) \le \mathrm{Limit}_i$. $\mathrm{Violate}_i$ is defined as the proportion of user I/O requests that violate the delay agreement:
  • $$\mathrm{Violate}_i = \frac{\left|\{\,k : \mathrm{Latency}_i(k) > \mathrm{Limit}_i\,\}\right|}{\mathrm{Num}_i}$$
  • If system congestion occurs in period $P_i$, or $\mathrm{Violate}_i$ exceeds a preset value called the relaxation factor, the load is judged to be heavy; if system congestion does not occur in period $P_i$ and $\mathrm{Violate}_i$ does not exceed the relaxation factor, the load is judged to be light.
  • The relaxation factor can be set as needed, which is not limited in this embodiment.
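  • The load rule can be sketched as below, assuming that the violation ratio is the fraction of completed requests in the period whose latency exceeds the limit. The function names, the `congested` flag, and the sample numbers are illustrative assumptions.

```python
from typing import Sequence


def violation_ratio(latencies: Sequence[float], limit: float) -> float:
    """Fraction of completed I/O requests in a period whose latency exceeds the limit."""
    if not latencies:
        return 0.0
    return sum(1 for latency in latencies if latency > limit) / len(latencies)


def is_heavy_load(latencies: Sequence[float], limit: float,
                  relaxation: float, congested: bool = False) -> bool:
    """Heavy load if congestion occurred or the violation ratio exceeds the relaxation factor."""
    return congested or violation_ratio(latencies, limit) > relaxation


# Two of four requests exceed a 10 ms limit, so the ratio is 0.5.
assert violation_ratio([5, 12, 8, 20], limit=10) == 0.5
assert is_heavy_load([5, 12, 8, 20], limit=10, relaxation=0.25)
assert not is_heavy_load([5, 6], limit=10, relaxation=0.25)
```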
  • Step S302: For a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, start fault recovery of the stripe.
  • The number of faulty data blocks in the system is detected, and the number of faulty data blocks is counted for each stripe. When the number of faulty data blocks of a stripe reaches the startup failure recovery threshold, recovery of the stripe is started.
  • Step S303: Perform data reconstruction using the non-faulty data blocks of the stripe.
  • At least one queue to be reconstructed is constructed for the stripes that have faulty data blocks. Stripe identification information is recorded in each queue to be reconstructed, and all stripes in the same queue have the same number of faulty data blocks. Among the queues to be reconstructed that have reached the startup failure recovery threshold, queues are selected in descending order of the number of faulty data blocks of their stripes, and fault recovery is started for the stripes in the selected queue. In this way, faulty data blocks and stripes can be tallied through the queues to be reconstructed, and fault recovery is then performed according to the tally.
  • Data is reconstructed for stripes whose number of faulty data blocks reaches the startup failure recovery threshold. Stripes with more faulty data blocks are selected first for recovery, and the k normally stored data blocks corresponding to the stripe are read from the system.
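  • The queue handling can be sketched as follows: stripes are grouped by their number of faulty blocks, and queues at or above the threshold are drained from the most damaged downward. The function names and the stripe ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_queues(faulty_counts: Dict[str, int]) -> Dict[int, List[str]]:
    """Group stripe ids into queues keyed by their number of faulty data blocks."""
    queues: Dict[int, List[str]] = defaultdict(list)
    for stripe_id, count in faulty_counts.items():
        if count > 0:  # healthy stripes are not queued
            queues[count].append(stripe_id)
    return dict(queues)


def recovery_order(queues: Dict[int, List[str]], r: int) -> List[str]:
    """Stripes to recover, most faulty blocks first, skipping queues below threshold r."""
    order: List[str] = []
    for count in sorted(queues, reverse=True):
        if count >= r:
            order.extend(queues[count])
    return order


queues = build_queues({"s1": 1, "s2": 3, "s3": 2, "s4": 0, "s5": 2})
assert recovery_order(queues, r=2) == ["s2", "s3", "s5"]  # s1 waits, s4 is healthy
```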
  • In the method provided by this embodiment, a startup failure recovery threshold is determined, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1; fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold; and the non-faulty data blocks of the stripe are used for data reconstruction. This reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving system service performance.
  • Embodiment 2:
  • This embodiment provides an apparatus for reconstructing stored data based on erasure code. Referring to FIG. 4, the apparatus includes a startup failure recovery threshold determination module 41, a fault recovery startup module 42, and a data reconstruction module 43. The startup failure recovery threshold determination module 41 is configured to determine a startup failure recovery threshold, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1. The fault recovery startup module 42 is configured to start fault recovery of a stripe whose number of faulty data blocks reaches the startup failure recovery threshold. The data reconstruction module 43 is configured to perform data reconstruction using the non-faulty data blocks of the stripe.
  • The startup failure recovery threshold determination module 41 sets the startup failure recovery threshold in order to reduce the number of stored data recoveries. A startup failure recovery threshold for data recovery is set for each stripe, and fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold.
  • The startup failure recovery threshold is denoted r. For each stripe, fault recovery starts immediately once the number of faulty data blocks reaches r. r is at most n-k and at least 1.
  • A startup failure recovery threshold is set, and fault recovery is started for a stripe whose number of faulty data blocks reaches it. Compared with the related art, in which data recovery and reconstruction are triggered as soon as one storage node fails, this effectively reduces the frequency of data recovery and the bandwidth it occupies, thereby ensuring service performance and improving system stability.
  • The apparatus provided in this embodiment may further include a startup failure recovery threshold adjustment module 44, configured to set the startup failure recovery threshold to an initial value and to dynamically adjust it according to the system load, where the heavier the system load, the larger the threshold. In this way, the data recovery frequency and the bandwidth occupied by recovery are reduced more reasonably, and service performance is guaranteed as far as possible.
  • When the storage system is initialized, the determined startup failure recovery threshold is set to an initial value in the storage system, and it is then dynamically adjusted according to the real-time resource state of the system, where the heavier the system load, the larger the threshold.
  • The initial value of the startup failure recovery threshold r may be set to 1 by the startup failure recovery threshold adjustment module 44.
  • Dynamically adjusting the startup failure recovery threshold according to the system load includes: periodically calculating the load information of the system and determining, according to a preset rule, whether the system load is heavy or light; under heavy load, incrementing the threshold for the next period by 1, without exceeding the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction; and under light load, decrementing the threshold for the next period by 1, without falling below 1. That is, when the system load is judged to be light and system bandwidth is not a bottleneck, the threshold stays at 1 to ensure rapid recovery of system data.
  • The startup failure recovery threshold never exceeds n-k, which ensures that a faulty stripe is still recovered in time, thus effectively ensuring system reliability and improving system service performance.
  • Whether the system load is heavy or light can be judged by whether system congestion occurs in period $P_i$ or whether $\mathrm{Violate}_i$ exceeds a preset value called the relaxation factor.
  • The fault recovery startup module 42 determines the stripes whose number of faulty data blocks reaches the startup failure recovery threshold as follows: it detects the number of data blocks that have failed in the system and counts the faulty data blocks for each stripe; when the number of faulty data blocks of a stripe reaches the threshold, it starts fault recovery of the stripe.
  • The apparatus provided in this embodiment further includes a reconstruction queue processing module 45, configured to: construct at least one queue to be reconstructed for the stripes that have faulty data blocks, where stripe identification information is recorded in each queue and all stripes in the same queue have the same number of faulty data blocks; and, among the queues to be reconstructed that have reached the startup failure recovery threshold, select queues in descending order of the number of faulty data blocks of their stripes and start fault recovery for the stripes in the selected queue.
  • The data reconstruction module 43 performs data reconstruction using the non-faulty data blocks of the stripe as follows: it selects a stripe with more faulty data blocks for recovery; reads the k normally stored data blocks of the stripe over the network and recovers the original file; computes a new set of n nodes on which to place the stripe, according to the stripe id and the current node and network availability; encodes n data blocks according to the erasure code algorithm; and sends the stripe information and data blocks to the new nodes over the network. Each new node updates its local information as appropriate and writes the data to the node, which completes the data reconstruction.
  • In the apparatus provided by this embodiment, a startup failure recovery threshold is determined, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1; fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold; and the non-faulty data blocks of the stripe are used for data reconstruction. This effectively reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving the service performance of the system.
  • Embodiment 3:
  • This embodiment provides a storage node based on erasure code. Referring to FIG. 5, the storage node includes a processor 51 and a physical storage medium 52. The processor 51 is configured to: determine a startup failure recovery threshold and distribute it to other storage nodes, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1; scan the fault status of each stripe that the storage node is responsible for, and start fault recovery of a stripe whose number of faulty data blocks reaches the threshold; and extract the non-faulty data blocks of the stripe from the physical storage medium of this storage node and the physical storage media of other storage nodes for data reconstruction.
  • At system initialization, the processor 51 sets a startup failure recovery threshold in the system and sets it to an initial value. While the system performs file read and write operations, the processor 51 dynamically adjusts the threshold according to the system load, where the heavier the system load, the larger the threshold. By setting the startup failure recovery threshold, the frequency of data reconstruction in the system is dynamically adjusted according to the system load, which effectively reduces the bandwidth consumption of the system.
  • To make it convenient to tally faulty data blocks and stripe information, the processor 51 may be configured to: construct at least one queue to be reconstructed for the stripes that have faulty data blocks, and record stripe identification information in each queue, where all stripes in the same queue have the same number of faulty data blocks; and, among the queues to be reconstructed that have reached the startup failure recovery threshold, select queues in descending order of the number of faulty data blocks of their stripes and start fault recovery for the stripes in the selected queue.
  • The processor 51 may obtain the non-faulty data blocks of the stripe from the physical storage medium of the storage node where it is located, or from the physical storage media of other storage nodes.
  • The physical storage medium in this embodiment may be a storage unit configured to store data. The processor 51 in this embodiment may be a single processor provided with different functional modules that perform the different processes described above, or it may be a plurality of processors with different processing functions, each of which performs one or several of those processes.
  • In the storage node provided by this embodiment, a startup failure recovery threshold is determined, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1; fault recovery is started for a stripe whose number of faulty data blocks reaches the threshold; and the non-faulty data blocks of the stripe are used for data reconstruction. This effectively reduces the number of fault recoveries, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving service performance.
  • Embodiment 4:
  • This embodiment provides a storage cluster based on erasure code, as shown in FIG. 6. Each storage node in the cluster is typically the storage node provided in Embodiment 3, and usually includes a management center 61 and a management agent 62.
  • The management center 61 is configured to maintain the members and status of the cluster, as well as data distribution rules, data recovery rules, and so on, and to provide strongly consistent decisions. By default it is usually deployed on three storage nodes to form a management center cluster; the management center 61 can also be deployed on a separate server for cluster management.
  • The management center cluster implements a consistency election algorithm designed on the basis of the Paxos algorithm, so that changes in node state are applied uniformly on all nodes of the system.
  • The management agent 62 is configured to implement communication between each node and the management center 61, to periodically provide node health information to the management center 61, and to receive control instructions from the management center 61.
  • A management center 61 can be deployed on each storage node.
  • The distributed file storage client 63 is configured to provide a distributed cluster access point service. It can also be regarded as an agent through which applications access the storage system, providing file operation interfaces common to applications, such as a C API, a Java API, NFS (Network File System), and CIFS (Common Internet File System). It exchanges data with the client 60, which may be a user client corresponding to the storage cluster.
  • Data routing 64 is responsible for file access control, the distribution of data files, the management of various data, and metadata storage. It communicates with the local data storage service process, responds to read and write requests from the distributed file storage client, and routes each request to the local data storage service process on the node to implement data access, replica distribution, and the like.
  • The data positioning modules can share in-memory data and have zero failover time, and can be easily scaled out to provide massive metadata capacity.
  • Data routing also maintains the queues to be reconstructed, $Q_i$. The local data storage service is responsible for managing and maintaining the space of the actual physical media, and can be responsible for storing and looking up local objects and performing I/O operations.
  • The local data storage service is the process that actually handles data reads and writes; it interacts with the physical storage devices to implement the data read and write functions.
  • The storage cluster may be a NAS storage cluster or any other storage cluster configured for data storage.
  • the embodiment provides a method for reconstructing stored data based on the erasure code.
  • the method includes:
  • Step S701: initialize the storage system.
  • the initial setting of the system includes: setting, through the management center 61, the (n, k) erasure code to be used. The maximum number of faulty data blocks that can be tolerated is n-k; when n-k data blocks are faulty, the original file can still be restored from the other k normal data blocks, and n data blocks are regenerated to maintain system redundancy.
  • the system must maintain the necessary data reliability by writing additional redundant data to the new node.
  • the startup failure recovery threshold is initialized. To give the system maximum redundancy and the highest reliability, the initial startup failure recovery threshold r can be 1.
  • each storage node initializes its stripe list; each item in the stripe list includes the stripe id, the id of the stripe's master, and the ids of all nodes and disks that the stripe involves.
  • all stripes in Q_1 have 1 failed data block.
  • all stripes in Q_{n-k} have n-k failed data blocks.
  • Each record in a queue is a stripe id. Therefore, when the storage system is initialized, there are n-k queues to be reconstructed on each storage node, and each queue is empty.
  • Step S702: perform a file write operation.
  • the file write operation includes: the distributed file storage client 63 dynamically selects, according to the load balancing principle, the data routing of one storage node to respond to the write request; the data routing looks up or calculates, according to the current storage system rules, the n node and disk ids to which the file should be written, and designates one of the (node id, disk id) tuples as the primary id; it then encodes the file into n data blocks according to the (n, k) erasure code, and sends the stripe information and the data blocks to the n nodes respectively. On each of these n nodes, the data routing records the stripe information in the stripe list, and the local data storage service writes the data to the local disk.
  • Step S703: detect the status information of the system.
  • detecting the status information of the system includes: the management center 61 periodically reads, from each node management agent 62, the system load information and the system availability status information for the current time period P_i, where the system availability status information includes the status of each node, disk, and network link.
  • the management center 61 needs to process the collected information, including filtering out dirty data obtained from faulty nodes; based on the processed availability information, it determines the node and network fault conditions and, combined with active heartbeat detection, confirms the fault condition of the system.
  • the startup failure recovery threshold r of the next period P_{i+1} is determined. The determined threshold is then sent to the data routing 64 of each storage node, and the global fault condition is also sent to each storage node.
  • Step S704: perform data reconstruction.
  • the data routing 64 scans each stripe for which this node is responsible (the stripes whose primary data block is on the local node) and refreshes the queues to be reconstructed, as follows:
  • if the nodes and disks holding all data blocks of stripe S are normal and S is not in any queue to be reconstructed, the stripe is skipped and the next stripe is scanned;
  • if all data blocks of stripe S are normal but S was in a queue Q_i to be reconstructed in the previous period, S is deleted from Q_i and the queue information of S is updated;
  • if stripe S has i data blocks on faulty nodes or disks (that is, the nodes or disks holding some of its data blocks are abnormal) and S was in a queue to be reconstructed in the previous period, S is deleted from that queue and inserted into Q_i, and the queue information of S is updated;
  • if stripe S has i such faulty data blocks and was not in any queue to be reconstructed in the previous period, S is inserted at the tail of Q_i and the queue information of S is updated.
  • through this process, stripes with the same number of faults are placed in the same queue to be reconstructed; during reconstruction, stripes in the queues with more faults are reconstructed first.
  • the queue Q_i may be adjusted by the queuing module during the reconstruction process.
  • the blocks are sent to all reachable nodes in the set Set ⁇ Set' respectively; each new node updates its local information according to the situation.
  • the data routing of node n records the stripe information in the stripe list;
  • the data routing of node n records the stripe information in the stripe list, and the local data storage service module writes the data to the node to complete the data reconstruction;
  • for a node n in the set, the space reclamation module deletes the data corresponding to stripe S and reclaims the space; at the same time, the data routing deletes the corresponding stripe record from the stripe list.
  • Embodiments of the present disclosure also provide a computer readable storage medium storing computer executable instructions arranged to perform the method of any of the above embodiments.
  • the method for reconstructing stored data based on the erasure code provided in this embodiment merges the recovery of multiple data blocks of the same stripe into a single operation, according to the availability and the load condition of the system, thereby effectively reducing the bandwidth occupied by data recovery. In the related art, recovering one failed block takes up to k times the bandwidth, so recovering multiple data blocks (say f) requires f*k times the bandwidth.
  • with the method provided in this embodiment, recovering f data blocks requires k times the bandwidth, which amounts to only k/f times the bandwidth per recovered data block, so unnecessary data recovery traffic is avoided.
  • the bandwidth consumption caused by data recovery is thus greatly reduced, which effectively lowers the network communication cost and improves service performance. The startup failure recovery threshold is dynamically adjusted according to the load: when the load is light, system data is restored quickly; when the load is heavy, the stripes with severe faults are restored quickly, thereby effectively ensuring system reliability and achieving a good balance between system reliability and system service performance.
  • the method for reconstructing stored data based on the erasure code provided by this embodiment is simple to implement and does not require modifying the underlying kernel; it is applicable to various operating systems such as Windows and Linux, and it is platform-independent, that is, applicable to distributed storage systems of various architectures.
  • the modules or steps of the above embodiments of the present disclosure may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a computer storage medium (ROM/RAM, magnetic disk, optical disk) and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from the one given here; alternatively, they may be made separately into individual integrated circuit modules, or a plurality of the modules or steps may be made into a single integrated circuit module. Therefore, the present disclosure is not limited to any specific combination of hardware and software.
  • the method and device for reconstructing stored data based on erasure code and the storage node provided by the present application reduce the number of times fault recovery is performed, thereby reducing the bandwidth consumption of the system, making the system more stable and improving the service performance of the system.
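
The per-period queue refresh of step S704 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the names `refresh_queues`, `queues`, and `stripe_queue` are hypothetical, the queues are plain in-memory deques keyed by fault count, a moved stripe is appended at the tail of its new queue, and a stripe's fault count is assumed to lie between 0 and n-k.

```python
from collections import deque


def refresh_queues(queues, stripe_queue, stripe_id, failed_blocks):
    """Place one stripe in the queue that matches its current fault count.

    queues:        dict mapping fault count i (1..n-k) to a deque of stripe ids
    stripe_queue:  dict mapping stripe id -> fault count of the queue it is in
    failed_blocks: number of the stripe's blocks on faulty nodes or disks
                   (assumed to be within 0..n-k)
    """
    previous = stripe_queue.get(stripe_id)
    if failed_blocks == 0 and previous is None:
        return  # healthy and not queued: skip to the next stripe
    if previous is not None:
        # The stripe was queued in the previous period: take it out first.
        queues[previous].remove(stripe_id)
        del stripe_queue[stripe_id]
    if failed_blocks > 0:
        # Insert at the tail of the queue for its current fault count.
        queues[failed_blocks].append(stripe_id)
        stripe_queue[stripe_id] = failed_blocks


# Hypothetical (n, k) = (6, 4) code: queues Q_1 and Q_2, both empty at start.
n, k = 6, 4
queues = {i: deque() for i in range(1, n - k + 1)}
stripe_queue = {}
refresh_queues(queues, stripe_queue, "S", 1)  # one block fails: S joins Q_1
refresh_queues(queues, stripe_queue, "S", 2)  # a second fails: S moves to Q_2
```

Keeping a reverse map from stripe id to queue avoids scanning every queue to find where a stripe was in the previous period; whether the real data routing stores this "queue information" the same way is not stated in the text.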

Abstract

A method and a device for reconstructing stored data based on erasure coding, and a storage node. The method comprises: determining a fault recovery initiation threshold, the fault recovery initiation threshold being less than or equal to the difference between the number of blocks of stored data of a stripe and the minimum number of blocks for erasure coding reconstruction and being greater than or equal to 1 (S301); for a stripe, of which the number of fault data blocks reaches the fault recovery initiation threshold, initiating fault recovery of the stripe (S302); using non-fault data blocks of the stripe to perform data reconstruction (S303). Compared to the related art, the invention reduces the number of times fault recovery is required to be performed, thereby reducing the bandwidth consumption of the system, such that the system is more stable and has improved service performance.

Description

Method and device for reconstructing stored data based on erasure code, and storage node
Technical field
The present application relates to the field of communications, and for example to a method and device for reconstructing stored data based on erasure code, and a storage node.
Background
In recent years, with the explosive growth of information resources and data, distributed storage systems have become the foundation and core of cloud storage and big data, thanks to their high performance, high scalability, high availability, and ease of management. However, because of hardware damage and software failures, data may be corrupted or lost during storage. Cloud storage systems generally use erasure code technology to improve fault tolerance, data resource utilization, and system performance. Without adding excessive storage space, erasure codes usually ensure high data reliability and availability through reasonable redundancy coding. Storing data with erasure code technology in a cloud storage system greatly reduces the space overhead of the system compared with full replication. At the same time, however, data reconstruction brings a huge network overhead, so erasure code technology may congest the network of the entire system or of some nodes, leaving them unable to provide service and degrading system performance. Moreover, as system scale and disk capacity grow, more and more nodes are deployed in current storage systems and the number of nodes that fail every day increases accordingly; the share of data recovery traffic in total network traffic keeps growing, which greatly affects access to day-to-day business data. How to reduce the bandwidth consumption of erasure code technology and safeguard service performance is therefore a problem that currently deserves consideration.
Summary
The embodiments of the present disclosure provide a method and device for reconstructing stored data based on erasure code, and a storage node, to solve the problem in the related art that reconstructing stored data with erasure code technology consumes high bandwidth, which makes the system unstable and service performance poor.
To solve the above technical problem, an embodiment of the present disclosure provides a method for reconstructing stored data based on erasure code, including:
determining a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, starting fault recovery of the stripe;
performing data reconstruction using the non-faulty data blocks of the stripe.
An embodiment of the present disclosure further provides a device for reconstructing stored data based on erasure code, including:
a startup failure recovery threshold determination module, configured to determine a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
a fault recovery startup module, configured to start, for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, fault recovery of the stripe;
a data reconstruction module, configured to perform data reconstruction using the non-faulty data blocks of the stripe.
An embodiment of the present disclosure further provides a storage node based on erasure code, including a physical storage medium and a processor, where the processor is configured to:
determine a startup failure recovery threshold and distribute it to the other storage nodes, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1;
scan the fault condition of each stripe for which this storage node is responsible and, for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, start fault recovery of the stripe;
extract the non-faulty data blocks of the stripe from the physical storage medium of this storage node and the physical storage media of other storage nodes, and perform data reconstruction.
An embodiment of the present disclosure further provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform any one of the foregoing methods for reconstructing stored data based on erasure code.
The computer storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The beneficial effects of the present disclosure are as follows:
The method and device for reconstructing stored data based on erasure code, the storage node, and the computer storage medium provided by the embodiments of the present disclosure determine a startup failure recovery threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and greater than or equal to 1; for a stripe whose number of faulty data blocks reaches the threshold, fault recovery of the stripe is started; and data reconstruction is performed using the non-faulty data blocks of the stripe. Compared with the related art, this reduces the number of times fault recovery is performed, which reduces the bandwidth consumption of the system, makes the system more stable, and improves its service performance.
Brief description of the drawings
FIG. 1 is a schematic diagram of the principle of erasure code technology according to any embodiment of the present disclosure;
FIG. 2 is a schematic diagram of erasure-coded distributed data storage according to any embodiment of the present disclosure;
FIG. 3 is a flowchart of the method for reconstructing stored data based on erasure code according to Embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of the device for reconstructing stored data based on erasure code according to Embodiment 2 of the present disclosure;
FIG. 5 is a schematic diagram of the storage node based on erasure code according to Embodiment 3 of the present disclosure;
FIG. 6 is a schematic diagram of the storage cluster based on erasure code according to Embodiment 4 of the present disclosure;
FIG. 7 is a flowchart of the method for reconstructing stored data based on erasure code according to Embodiment 4 of the present disclosure.
Detailed description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Storing data with erasure code technology, as shown in FIG. 1, includes: splitting the original file into blocks to obtain k source data blocks, and then encoding the k source data blocks to generate n coded data blocks; that is, an (n, k) erasure code produces n data blocks from k source data blocks through an encoding operation. During data reconstruction, any k of these n data blocks can be decoded to recover the original k source data blocks, and combining the k source data blocks reconstructs the original file. The distributed data storage model based on erasure code is shown in FIG. 2. Assume that the system contains n storage nodes, of which k are data nodes and m are coding nodes, so that n = k + m. The k data nodes store the original data blocks, labeled D_0, D_1, ..., D_{k-1}; the m coding nodes store the coded data blocks, labeled C_0, C_1, ..., C_{m-1}. The erasure code algorithm cuts the original file into k equal parts and stores them in the k data nodes in order (that is, the original file is split into k source data blocks), and places the m pieces of data generated by encoding into the m coding nodes. When a large file is stored, the original file needs a second level of cutting: each time, a specified amount of data is read from the file and encoded. The original data and the coded data involved in one encoding pass are called a stripe. A stripe independently constitutes one coded set of information, and different stripes are independent of each other.
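The encode-then-reconstruct cycle for one stripe can be illustrated with the simplest possible erasure code, a single XOR parity block (m = 1, so n = k + 1). This is only a sketch for illustration: the text does not say which erasure code the system uses, a production system would typically use a code that tolerates more than one failure, and the function names `xor_blocks`, `encode`, and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal source blocks and append one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad so all blocks are equal
    source = [padded[i * size:(i + 1) * size] for i in range(k)]
    return source + [xor_blocks(source)]


def reconstruct(blocks):
    """Rebuild the single missing block (marked None) from the k survivors."""
    missing = blocks.index(None)
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    repaired[missing] = xor_blocks(survivors)
    return repaired


stripe = encode(b"erasure coded stripe", k=4)  # 4 source blocks + 1 parity
damaged = list(stripe)
damaged[2] = None                              # lose one source block
restored = reconstruct(damaged)
```

Note that even in this toy code the repair reads all k surviving blocks to rebuild one lost block, which is the k-fold transfer cost that the rest of the description sets out to reduce.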
For a stripe, data reconstruction is normally triggered as soon as one data block in the stripe fails. During data reconstruction, the new node must first download all the data from k nodes to recover the original file, and then re-encode it to regenerate the failed data; the amount of data transmitted in this process is k times the amount of failed data. When many stripes and many data blocks fail across the whole system, a large amount of data reconstruction traffic results. Simply limiting the network bandwidth available to data recovery, as the related art does, inevitably slows down node reconstruction; for a distributed storage system in which failures occur continually, the node reconstruction rate directly affects system reliability. If reconstruction is too slow to keep up with the rate at which nodes fail, the system cannot maintain its reliability. Moreover, limiting the data recovery bandwidth only reduces network bandwidth occupation in the short term; in the long run, the bandwidth occupied by data recovery is not substantially reduced. A more reasonable and reliable data reconstruction method is therefore needed to reduce system bandwidth occupation and ensure system stability.
Embodiment 1:
To solve the problem in the related art that reconstructing stored data with erasure code technology consumes high bandwidth, which makes the system unstable and service performance poor, this embodiment provides a method for reconstructing stored data based on erasure code. Referring to FIG. 3, the method includes:
Step S301: determine a startup failure recovery threshold, where the startup failure recovery threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction, and is greater than or equal to 1.
To reduce the number of stored-data recoveries while maintaining system redundancy, the method provided in this embodiment introduces the concept of a startup failure recovery threshold: when the threshold is set, a startup failure recovery threshold for data recovery is set for each stripe according to resource conditions such as the system load, and fault recovery of a stripe is started only when its number of faulty data blocks reaches the threshold. Denote the startup failure recovery threshold by r; for each stripe, fault recovery is started immediately when the number of faulty data blocks reaches r. For an (n, k) erasure code, to preserve the reliability of the erasure code technology, r is at most n-k and at least 1, where n is the number of data blocks stored in a stripe, corresponding to the n storage nodes in the system, and k is the minimum number of data blocks required for erasure code reconstruction, corresponding to the k data nodes. Compared with the related art, in which data recovery and reconstruction are triggered as soon as a single storage node fails, setting a startup failure recovery threshold and starting fault recovery only for stripes whose number of faulty data blocks reaches it effectively lowers the frequency of data recovery and reduces bandwidth occupation, which safeguards service performance and improves system stability.
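The threshold rule can be expressed as a small check. This is a minimal sketch, assuming hypothetical function names `valid_threshold` and `should_start_recovery`; it only encodes the two conditions stated above, namely 1 ≤ r ≤ n-k and "start recovery once the fault count reaches r".

```python
def valid_threshold(r: int, n: int, k: int) -> bool:
    """The threshold r must satisfy 1 <= r <= n - k for an (n, k) code."""
    return 1 <= r <= n - k


def should_start_recovery(failed_blocks: int, r: int, n: int, k: int) -> bool:
    """Return True once a stripe's faulty-block count reaches the threshold."""
    if not valid_threshold(r, n, k):
        raise ValueError("threshold must lie between 1 and n - k")
    return failed_blocks >= r
```

With r = 1 this reduces to the related-art behavior of repairing on the first failed block; larger values of r defer the repair so that several failed blocks of a stripe are rebuilt in one pass.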
The method for reconstructing stored data based on erasure code provided in this embodiment may further include: setting the startup failure recovery threshold to an initial value; and dynamically adjusting the startup failure recovery threshold according to the system load, where the heavier the system load, the larger the threshold.
Because the load in the system changes constantly, the number of data fault recoveries and reconstructions should track the real-time state of the system, so that the data recovery frequency can be lowered more reasonably, the bandwidth occupied by recovery can be reduced, and service performance can be safeguarded as far as possible. To this end, when the storage system is initialized, the determined startup failure recovery threshold is set as the initial value in the storage system, and the threshold is then adjusted dynamically according to the real-time resource state of the system, where the heavier the system load, the larger the threshold. The real-time dynamic adjustment includes setting an adjustment period and adjusting the threshold once per period. In addition, when the storage system is initialized, the initial value of the threshold r can be set to 1 to give the system maximum redundancy and the highest reliability. Dynamically adjusting the startup failure recovery threshold according to the system load then includes: periodically calculating the load information of the system and determining, according to a preset rule, whether the system load is heavy or light; under heavy load, the threshold for the next period is increased by a preset step value, and under light load it is decreased by the preset step value. The preset step value is a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure code reconstruction. Under heavy load, the threshold for the next period is increased by 1 and does not exceed that difference; under light load, the threshold for the next period is decreased by 1 and does not fall below 1. That is, when the system is judged to be lightly loaded and system bandwidth is not yet a bottleneck, the threshold is driven toward 1 so that system data is recovered quickly. When the load is heavy, the threshold is at most n-k, which ensures that severely faulty stripes are recovered quickly, thereby effectively safeguarding system reliability and improving system service performance.
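The per-period adjustment described above can be sketched as a single update function. The name `next_threshold` and its parameters are hypothetical, and the clamping to the range 1 to n-k follows the bounds stated in the text; how the heavy or light decision is made is left to the caller.

```python
def next_threshold(r: int, heavy_load: bool, n: int, k: int, step: int = 1) -> int:
    """Return the startup failure recovery threshold for the next period.

    Heavy load raises the threshold by `step` (capped at n - k);
    light load lowers it by `step` (never below 1).
    """
    if heavy_load:
        return min(r + step, n - k)
    return max(r - step, 1)


# Hypothetical (n, k) = (9, 6) code, starting from the initial value r = 1.
r = 1
history = []
for heavy in [True, True, True, False, False, False]:
    r = next_threshold(r, heavy, n=9, k=6)
    history.append(r)
```

Under sustained heavy load the threshold saturates at n-k, and under sustained light load it returns to 1, which matches the two limiting behaviors described in the paragraph above.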
Periodically calculating the load information of the system and determining, according to the preset rule, whether the system load is heavy or light includes: let Num_i be the number of user I/O requests completed in time period P_i, and let Latency_i(k) be the service time of the k-th user I/O in period P_i; let the latency upper limit of period P_i be Limit_i, where the latency agreement requires Latency_i(k) ≤ Limit_i for every user I/O; define Violate_i as the proportion of user I/Os that violate the latency agreement:
$$\mathrm{Violate}_i = \frac{\left|\{\,k : \mathrm{Latency}_i(k) > \mathrm{Limit}_i\,\}\right|}{\mathrm{Num}_i}$$
If system congestion occurs in period P_i or Violate_i > δ, where δ is called the relaxation factor, the load is judged to be heavy; if no system congestion occurs in period P_i and Violate_i > δ does not hold, the load is judged to be light. The value of δ can be set as needed, which is not limited in this embodiment.
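The heavy or light decision can be sketched as follows, taking Violate_i to be the fraction of completed user I/Os in the period whose service time exceeds Limit_i, as defined in the text. The function names are hypothetical, and returning a ratio of 0 when no I/O completed in the period is an assumption of this sketch, since the text does not cover that case.

```python
def violation_ratio(latencies, limit):
    """Fraction of user I/Os in the period whose service time exceeds limit."""
    if not latencies:
        return 0.0  # assumption: an idle period has no violations
    violations = sum(1 for t in latencies if t > limit)
    return violations / len(latencies)


def is_heavy_load(latencies, limit, delta, congested=False):
    """Heavy if the system was congested or the violation ratio exceeds delta."""
    return congested or violation_ratio(latencies, limit) > delta


# Hypothetical period: four I/Os, latency limit 10 (same unit as the samples).
samples = [5, 12, 8, 20]
ratio = violation_ratio(samples, limit=10)
heavy = is_heavy_load(samples, limit=10, delta=0.25)
```

The comparison is strict, so a violation ratio exactly equal to δ still counts as light load, in line with the wording "Violate_i > δ".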
Step S302: for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, start fault recovery of the stripe.
The faulty data blocks in the system are detected and the number of faulty data blocks of each stripe is counted; when the number of faulty data blocks of a stripe reaches the startup failure recovery threshold, fault recovery is performed on that stripe.
Step S303: perform data reconstruction using the non-faulty data blocks of the stripe.
While performing fault recovery for stripes whose number of faulty data blocks reaches the startup failure recovery threshold, to make counting faulty data blocks accurate and convenient, at least one queue to be reconstructed is constructed for the stripes that have faulty data blocks. A queue to be reconstructed records stripe identification information, and all stripes in one queue have the same number of faulty data blocks. Among the queues that reach the startup failure recovery threshold, queues are selected in descending order of the number of faulty data blocks of their stripes, and fault recovery is started for each stripe in the selected queue. In other words, the queues to be reconstructed are used to keep statistics on faulty data blocks and stripes, and fault recovery is performed according to those statistics. Data reconstruction for stripes whose number of faulty data blocks reaches the threshold proceeds as follows: stripes with more faulty data blocks are selected first; k normal stored data blocks of the stripe are read from the system, and the original file is obtained by decoding and combining the data; a new set of n nodes on which to place the stripe is then calculated from the stripe id and the current node and network availability; n data blocks are obtained by encoding with the erasure code algorithm, and the stripe information and the data blocks are sent over the network to the new nodes; each new node updates its local information as appropriate and writes the data locally, completing the data reconstruction.
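The grouping of stripes into queues and the "most faults first" selection can be sketched as below. The function name `recovery_order` and the input format (a mapping from stripe id to faulty-block count) are assumptions made for illustration; the actual decode, placement, and re-encode steps are outside this sketch.

```python
def recovery_order(fault_counts, r):
    """Return stripe ids in the order their fault recovery should start.

    fault_counts: dict mapping stripe id -> number of faulty data blocks
    r:            startup failure recovery threshold
    """
    # One queue per fault count; every stripe in a queue has the same count.
    queues = {}
    for stripe_id, failed in fault_counts.items():
        if failed > 0:
            queues.setdefault(failed, []).append(stripe_id)

    # Visit queues that reach the threshold, largest fault count first.
    order = []
    for failed in sorted(queues, reverse=True):
        if failed >= r:
            order.extend(queues[failed])
    return order


# Hypothetical stripes: "b" and "e" have 3 faulty blocks, "c" has 2, "a" has 1.
faults = {"a": 1, "b": 3, "c": 2, "d": 0, "e": 3}
order = recovery_order(faults, r=2)
```

Stripe "a" stays queued but is not repaired yet because its fault count is below the threshold; it would be picked up once r drops to 1 or another of its blocks fails.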
The erasure-code-based stored data reconstruction method provided by this embodiment determines a startup failure recovery threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction, and greater than or equal to 1; starts failure recovery for any stripe whose number of failed data blocks reaches the threshold; and performs data reconstruction using the non-failed data blocks of that stripe. Compared with the related art, this reduces the number of failure recoveries, which lowers the system's bandwidth consumption, makes the system more stable, and improves service performance.
Embodiment 2:
This embodiment provides an erasure-code-based stored data reconstruction apparatus. Referring to FIG. 4, the apparatus includes a startup failure recovery threshold determination module 41, a failure recovery start module 42, and a data reconstruction module 43. The startup failure recovery threshold determination module 41 is configured to determine a startup failure recovery threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction, and greater than or equal to 1. The failure recovery start module 42 is configured to start failure recovery for any stripe whose number of failed data blocks reaches the startup failure recovery threshold. The data reconstruction module 43 is configured to perform data reconstruction using the non-failed data blocks of that stripe.
To reduce the number of stored data recoveries while preserving system redundancy, the apparatus provided by this embodiment sets the startup failure recovery threshold through the startup failure recovery threshold determination module 41. Based on resource conditions such as system load, a startup failure recovery threshold for data recovery is set for each stripe, and failure recovery of a stripe is started only when its number of failed data blocks reaches that threshold. The threshold is denoted r: for each stripe, failure recovery starts immediately once the number of failed data blocks reaches r. For an (n, k) erasure code, to preserve the reliability of the erasure code technique, r is at most n-k and at least 1. In the related art, data recovery and reconstruction are triggered as soon as a single data block in a stripe fails. By contrast, starting recovery only for stripes that reach the threshold lowers the frequency of data recovery and reduces bandwidth usage, which protects service performance and improves system stability.
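The range constraint on r and the trigger condition can be written down as a minimal sketch. The helper names validate_threshold and should_start_recovery are hypothetical and serve only to make the two rules concrete.

```python
def validate_threshold(r, n, k):
    """Return r if 1 <= r <= n - k holds for an (n, k) erasure code, else raise."""
    if not 1 <= r <= n - k:
        raise ValueError(f"threshold {r} is outside [1, {n - k}]")
    return r


def should_start_recovery(failed_blocks, r):
    """Recovery of a stripe starts once its failed-block count reaches r."""
    return failed_blocks >= r


r = validate_threshold(2, n=6, k=4)   # n - k = 2, so r = 2 is the largest legal value
print(should_start_recovery(1, r))    # one failed block: recovery is deferred
print(should_start_recovery(2, r))    # two failed blocks: recovery starts
```

Setting r = 1 reproduces the related-art behavior in which every single block failure triggers a recovery.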
The apparatus provided by this embodiment may further include a startup failure recovery threshold adjustment module 44, configured to set the startup failure recovery threshold to an initial value and to adjust it dynamically according to system load: the heavier the system load, the larger the threshold.
Because the load on the system changes continuously, the number of data failure recoveries and reconstructions should track the real-time state of the system, so that the recovery frequency and the bandwidth consumed by recovery are reduced sensibly while service performance is protected as far as possible. To that end, when the storage system is initialized the determined startup failure recovery threshold is set as the initial value, and the threshold is then adjusted dynamically according to the real-time resource state of the system: the heavier the load, the larger the threshold. To maximize redundancy and reliability at initialization, the startup failure recovery threshold adjustment module may set the initial value of r to 1. Dynamically adjusting the threshold according to system load includes: periodically computing the load information of the system and classifying the load as heavy or light according to a preset rule; under heavy load, increasing the threshold for the next period by 1, without exceeding the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction; under light load, decreasing the threshold for the next period by 1, without going below 1. That is, when the load is judged to be light and bandwidth is not yet a bottleneck, the threshold is driven toward 1 so that system data is recovered quickly. When the load is heavy, the threshold is at most n-k, so that severely damaged stripes are still recovered quickly; this preserves system reliability while improving service performance. Whether the load is heavy or light can be judged by whether system congestion occurred within period P_i or whether Violate_i is greater than δ, where δ is called the relaxation factor.
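The per-period adjustment rule can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not define how Violate_i is computed, so is_heavy simply takes it as a number, and the sample measurements and δ = 0.5 are invented for illustration.

```python
def is_heavy(congested, violate, delta):
    """Assumed rule: heavy if congestion occurred in the period or Violate_i > delta."""
    return congested or violate > delta


def next_threshold(r, heavy, n, k):
    """Threshold for the next period: +1 under heavy load, -1 under light load."""
    if heavy:
        return min(r + 1, n - k)  # never above n - k
    return max(r - 1, 1)          # never below 1


# Hypothetical (congested, Violate_i) measurements for four periods, (n, k) = (6, 3).
r = 1  # initial value chosen for maximum redundancy
history = []
for congested, violate in [(True, 0.0), (False, 0.9), (True, 0.2), (False, 0.1)]:
    r = next_threshold(r, is_heavy(congested, violate, delta=0.5), n=6, k=3)
    history.append(r)
print(history)  # [2, 3, 3, 2]
```

The third period shows the cap at n-k = 3: the load is still heavy, but the threshold cannot grow further without risking data loss.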
To handle stripes whose number of failed data blocks reaches the startup failure recovery threshold, the failure recovery start module 42 detects the failed data blocks in the system and counts the failed data blocks of each stripe; when the number of failed data blocks of a stripe reaches the startup failure recovery threshold, failure recovery of that stripe is started.
The apparatus provided by this embodiment further includes a reconstruction queue processing module 45, configured to: build at least one to-be-reconstructed queue for the stripes that contain failed data blocks, where each queue records stripe identification information and all stripes in a given queue have the same number of failed data blocks; and, among the queues that reach the startup failure recovery threshold, select queues in descending order of the number of failed data blocks per stripe and start failure recovery for each stripe in the selected queue.
The data reconstruction module 43 performs data reconstruction using the non-failed data blocks of the stripe. Stripes with more failed data blocks are selected first. The k intact stored blocks of the stripe are read over the network to obtain the original file; a new set of n nodes for placing the stripe is computed from the stripe id and the current node and network availability; n data blocks are encoded according to the erasure code algorithm, and the stripe information and data blocks are sent over the network to the new nodes; each new node updates its local information as needed and writes the data locally, which completes the data reconstruction.
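To make the "read the surviving blocks, then re-encode" step concrete, the sketch below uses a single XOR parity block, that is, an (n, k) code with n = k + 1. This is a deliberately simplified stand-in chosen for this example: the embodiment does not name a specific erasure code, and a single-parity code can repair only one missing block per stripe, whereas a general (n, k) code repairs up to n-k. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return a stripe: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe):
    """Rebuild a stripe in which exactly one block is missing (marked None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("a single-parity code repairs exactly one missing block")
    survivors = [block for block in stripe if block is not None]  # k intact blocks
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = encode([b"ab", b"cd", b"ef"])            # k = 3, n = 4
damaged = [stripe[0], None, stripe[2], stripe[3]]  # block 1 has failed
print(reconstruct(damaged) == stripe)
```

Note that the repair reads all k surviving blocks to rebuild one block, which is the k-fold read cost that motivates batching several failures of one stripe into a single recovery.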
The erasure-code-based stored data reconstruction apparatus provided by this embodiment determines a startup failure recovery threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction, and greater than or equal to 1; starts failure recovery for any stripe whose number of failed data blocks reaches the threshold; and performs data reconstruction using the non-failed data blocks of that stripe. Compared with the related art, this reduces the number of failure recoveries, which lowers the system's bandwidth consumption, makes the system more stable, and improves service performance.
Embodiment 3:
This embodiment provides an erasure-code-based storage node. Referring to FIG. 5, the storage node includes a processor 51 and a physical storage medium 52. The processor 51 is configured to: determine a startup failure recovery threshold and distribute it to the other storage nodes, where the threshold is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction, and greater than or equal to 1; scan the failure state of each stripe that this storage node is responsible for, and start failure recovery for any stripe whose number of failed data blocks reaches the threshold; and fetch the non-failed data blocks of that stripe from the physical storage medium of this storage node and the physical storage media of other storage nodes to perform data reconstruction.
At system initialization, the processor 51 sets the startup failure recovery threshold to an initial value. Then, while the system performs file read and write operations, it adjusts the threshold dynamically according to system load: the heavier the load, the larger the threshold. Setting the threshold in this way adjusts the frequency of data reconstruction to the system load and reduces the system's bandwidth consumption.
To make it easy to keep statistics on failed data blocks and stripes, the processor 51 may further be configured to: build at least one to-be-reconstructed queue for the stripes of this storage node that contain failed data blocks, where each queue records stripe identification information and all stripes in a given queue have the same number of failed data blocks; and, among the queues that reach the startup failure recovery threshold, select queues in descending order of the number of failed data blocks per stripe and start failure recovery for each stripe in the selected queue. When performing data reconstruction, the processor 51 may obtain the non-failed data blocks of the stripe from the physical storage medium of its own storage node, or from the physical storage media of other storage nodes. The physical storage medium in this embodiment may be a storage unit configured to store data.
It should be understood that the processor 51 in this embodiment may be a single processor in which different functional modules carry out the different processing operations described above, or it may be multiple processors with different processing functions, each of which carries out one or several of those operations.
The erasure-code-based storage node provided by this embodiment determines a startup failure recovery threshold that is less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks needed for erasure code reconstruction, and greater than or equal to 1; starts failure recovery for any stripe whose number of failed data blocks reaches the threshold; and performs data reconstruction using the non-failed data blocks of that stripe. Compared with the related art, this reduces the number of failure recoveries, which lowers the system's bandwidth consumption, makes the system more stable, and improves service performance.
Embodiment 4:
A storage cluster that stores data usually includes multiple storage nodes, each of which may be the storage node provided by Embodiment 3. As shown in FIG. 6, a storage node usually includes a management center 61, a management agent 62, a distributed file storage client 63, a data routing 64, and a local data storage service 65. The management center 61 is configured to maintain cluster membership and state, as well as data distribution rules, data recovery rules, and so on, and to provide strongly consistent decisions. By default it is usually deployed on three storage nodes, which form a management center cluster; the management center 61 may also be deployed on a standalone server for cluster management. The management center cluster implements a consistency election algorithm adapted from the Paxos algorithm, so that changes in node state are applied uniformly on all nodes of the system. The management agent 62 is configured to handle communication between each node and the management center 61, periodically reporting node health information to the management center 61 and receiving control instructions from it. A management center 61 can be deployed on each storage node. The distributed file storage client 63 is configured to provide the distributed cluster access point service. It can also be seen as the proxy through which applications access the storage system: it offers applications common file operation interfaces such as a C API, a Java API, NFS (Network File System), and CIFS (Common Internet File System), and it exchanges data with the client 60, which may be a user client of the storage cluster. The data routing 64 is responsible for file access control, data file distribution, management of the various kinds of data, and metadata storage. Through its data locating function it communicates with the local data storage service process, responds to read and write requests from the distributed file storage client, and routes each request to the local data storage service process on the appropriate node, which implements data access, replica distribution, and so on. It is deployed on every storage node in cluster mode; the data locating modules can share in-memory data, which gives zero failover time, and capacity can be expanded easily to provide massive metadata capacity. The data routing also maintains the to-be-reconstructed queues Q_i. The local data storage service 65 manages and maintains the space resources of the actual physical media, and is responsible for storing and looking up local objects and performing I/O operations. It is the process that actually handles data reads and writes, interacting with the physical storage devices. The storage cluster may be a NAS storage cluster or any other storage cluster configured for data storage.
Based on the storage cluster described above, this embodiment provides an erasure-code-based stored data reconstruction method. Referring to FIG. 7, the method includes the following steps.
Step S701: initialize the storage system.
Initializing the system includes the following. The management center 61 configures the (n, k) erasure code to be used, so the maximum tolerable number of failed data blocks is n-k: when n-k data blocks fail, the original file can be recovered from the other k intact data blocks, and n data blocks are regenerated to maintain system redundancy. The system must maintain the required data reliability by writing additional redundant data to new nodes. The current startup failure recovery threshold is then initialized; to maximize redundancy and reliability, the initial threshold r may be 1. Each storage node then initializes its stripe list, in which each entry contains the stripe id, the stripe's primary id information, and the ids of all nodes and disks the stripe involves. Each storage node also initializes its to-be-reconstructed queues Q_i, where i is the number of failed data blocks of the stripes in the queue (1 <= i <= n-k). Every stripe in Q_1 has 1 failed data block and, likewise, every stripe in Q_(n-k) has n-k failed data blocks. Each entry in a queue records a stripe id. When the storage system is initialized, each storage node therefore holds n-k to-be-reconstructed queues, all of them empty.
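The per-node state after initialization can be sketched as a pair of small data classes. The class and field names (StripeEntry, NodeState, primary, locations) are assumptions made for this sketch; the text only specifies what information each stripe list entry and each queue carries.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StripeEntry:
    stripe_id: str
    primary: tuple    # (node id, disk id) chosen as the stripe's primary id
    locations: list   # (node id, disk id) tuples for all n blocks of the stripe


@dataclass
class NodeState:
    n: int
    k: int
    r: int = 1                                    # initial threshold
    stripes: dict = field(default_factory=dict)   # stripe id -> StripeEntry
    queues: dict = field(init=False, default_factory=dict)

    def __post_init__(self):
        # One empty queue Q_i for every failure count i in 1..n-k.
        self.queues = {i: deque() for i in range(1, self.n - self.k + 1)}


state = NodeState(n=6, k=4)
print(sorted(state.queues))                          # queue indices 1..n-k
print(all(len(q) == 0 for q in state.queues.values()))
```

Keying the queues by failure count makes the later "start from Q_(n-k) and work down to Q_r" scan a simple loop over integer indices.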
Step S702: perform the file write operation.
After system initialization is complete, a file write proceeds as follows. Following the load balancing principle, the distributed file storage client 63 dynamically selects the data routing of one storage node to serve the write request. According to the current storage system rules, that data routing looks up or computes the n nodes and disk ids to which the file should be written, and designates one of the (node id, disk id) tuples as the primary id. It then encodes the file into n data blocks with the (n, k) erasure code and sends the stripe information and the data blocks to the n nodes. On each of these n nodes, the data routing records the stripe information in the stripe list, and the local data storage service writes the data to the local disk.
Step S703: detect the status information of the system.
Once the storage system has been initialized, users frequently issue file read and write operations while the system is running. During these operations, hardware damage, software faults, and similar causes may lead to data failures such as corruption or loss of stored data. To keep the system stable, the state of the stored data blocks must be monitored so that the stripes containing failed data blocks are recovered in time. Detecting the status information of the system includes the following. The management center 61 periodically reads, from the management agent 62 of each node, the system load information and system availability state information for the current time period P_i; the availability state information covers the state of each node, disk, network link, and so on. The management center 61 also processes the collected information, for example by filtering out dirty data obtained from failed nodes. From the processed availability information, together with active heartbeats and similar means, the management center 61 determines the node and network failures and thereby confirms the failure state of the system. From the system load information, it determines the startup failure recovery threshold r for the next period P_(i+1). It then sends the determined threshold to the data routing 64 of each storage node, and also sends the global failure state to each storage node.
Step S704: perform data reconstruction.
When failed data blocks exist in the system, failure recovery is started for the stripes that meet the startup failure recovery threshold, and data reconstruction is performed. Once failed data blocks are detected, the to-be-reconstructed queues are set up, or the previously built queues are updated. During data reads and writes, the data routing 64 scans each stripe that this node is responsible for (that is, each stripe whose primary data block is on this node) and refreshes the to-be-reconstructed queues as follows. If the nodes and disks holding all data blocks of stripe S are healthy and S was not in any to-be-reconstructed queue in the previous period, the stripe is skipped and the next stripe is scanned. If the nodes and disks holding all data blocks of S are healthy but S was in queue Q_i in the previous period, S is removed from Q_i and the queue information of S is updated. If S involves i failed nodes or disk blocks (that is, the nodes or disks holding some of its data blocks are abnormal) and S was in queue Q_i in the previous period, S is removed from Q_i and then inserted at the tail of Q_i, and the queue information of S is updated. If S involves i failed nodes or disk blocks and S was not in any to-be-reconstructed queue in the previous period, S is inserted at the tail of Q_i and the queue information of S is updated. This refresh ensures that all stripes in a given queue have the same number of failures; during reconstruction, stripes in the queue with the most failures are reconstructed first.
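The four refresh cases above collapse into two operations: leave the previous queue if there was one, then join the tail of the queue that matches the current failure count if that count is nonzero. The sketch below is one possible reading. The names refresh and queue_of are hypothetical, and allowing the previous queue index to differ from the current failure count is an assumption that goes slightly beyond the literal text.

```python
from collections import deque


def refresh(queues, queue_of, stripe_id, failed_now):
    """Update the to-be-reconstructed queues for one scanned stripe.

    queues:   dict mapping failure count i -> deque of stripe ids (Q_i)
    queue_of: dict mapping stripe id -> index of the queue it was in last period
    """
    previous = queue_of.pop(stripe_id, None)
    if previous is not None:
        queues[previous].remove(stripe_id)      # leave the old queue in every case
    if failed_now > 0:
        queues.setdefault(failed_now, deque()).append(stripe_id)  # tail of Q_i
        queue_of[stripe_id] = failed_now
    # failed_now == 0 and no previous queue: the stripe is simply skipped


queues = {1: deque(), 2: deque()}   # (n, k) = (6, 4)
queue_of = {}
refresh(queues, queue_of, "s1", 1)  # newly failed: appended to Q_1
refresh(queues, queue_of, "s2", 1)
refresh(queues, queue_of, "s1", 1)  # still 1 failure: moved to the tail of Q_1
refresh(queues, queue_of, "s2", 2)  # second block failed: moves to Q_2
print(list(queues[1]), list(queues[2]))
```

Tracking queue_of alongside the queues corresponds to the "queue information of S" that the text says is updated on every change.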
If Q_i (i = n-k) is not empty, stripe ids are taken from the head of Q_i one after another and the reconstruction flow is started for each; if Q_i is empty, i is decremented (i = i-1) and the check is repeated, until i is less than r. When there are so many stripes to reconstruct that they cannot all be completed within a single period, queue Q_i may be adjusted by the queuing module while reconstruction is in progress. In that case, reconstruction always starts again from Q_i (i = n-k). The flow is as follows. From the queue and node correspondence recorded in the local stripe queue, the data routing 64 obtains the set Set of n nodes on which stripe S is currently stored. It determines k healthy nodes from the node and network failure state of the system, reads the data over the network from the local data storage services of those k nodes, and decodes it with the erasure code algorithm to obtain the original file. It then computes a new set Set' of n nodes for placing the stripe from the stripe id and the current node and network availability, encodes n data blocks with the erasure code algorithm, and sends the stripe information and data blocks over the network to all reachable nodes in Set∪Set'. Each new node updates its local information as needed. For a node n in Set', if n∈Set∩Set', the data routing of node n records the stripe information in the stripe list; if
Figure PCTCN2017088477-appb-000002
the data routing of node n records the stripe information in the stripe list, and the local data storage service module writes the data to this node, which completes the data reconstruction. For a node n in Set, if
Figure PCTCN2017088477-appb-000003
the space reclamation module deletes the data corresponding to stripe S and reclaims the space, and the data routing deletes the corresponding stripe information record from the stripe list.
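The per-node handling after re-placement can be sketched with set operations. This sketch rests on an explicit assumption: the two conditions shown only as figures are not reproduced in the text, and the code below assumes they are "n is in Set' but not in Set" (write the block) and "n is in Set but not in Set'" (reclaim the space), which is what the surrounding sentences suggest. The function name plan_relocation and the action labels are hypothetical.

```python
def plan_relocation(old_nodes, new_nodes):
    """Classify nodes by the action they take after a stripe is re-placed.

    old_nodes: the set Set on which the stripe was stored before reconstruction
    new_nodes: the set Set' computed for the reconstructed stripe
    """
    old, new = set(old_nodes), set(new_nodes)
    return {
        "record_stripe_info": sorted(old & new),  # n in Set ∩ Set'
        "record_and_write": sorted(new - old),    # ASSUMED: n in Set' only
        "reclaim_space": sorted(old - new),       # ASSUMED: n in Set only
    }


plan = plan_relocation(["a", "b", "c"], ["b", "c", "d"])
print(plan)
```

In this hypothetical run, node d receives a new block, node a frees the space of its stale block, and nodes b and c only refresh their stripe list entries.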
Embodiments of the present disclosure further provide a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method of any of the above embodiments.
The erasure-code-based stored data reconstruction method provided by this embodiment merges the recovery of multiple data blocks of the same stripe into a single operation, according to system availability and system load, which reduces the bandwidth consumed by data recovery. In the related art, recovering one failed data block consumes k times the bandwidth of one block, so recovering multiple data blocks (say f of them) consumes f*k times. With the method of this embodiment, recovering f data blocks requires k times the bandwidth in total, which amounts to only k/f times per recovered data block. This avoids unnecessary data recovery and greatly reduces bandwidth usage. Reducing the bandwidth consumed by data recovery in turn lowers network communication cost and improves service performance. The startup failure recovery threshold is adjusted dynamically according to load: when the load is light, system data is recovered quickly; when the load is heavy, severely damaged stripes are still recovered quickly. This preserves system reliability and strikes a good balance between system reliability and service performance. In addition, the method is simple to implement, requires no modification of the underlying kernel, and applies to operating systems such as Windows and Linux; it is also platform independent, that is, it applies to distributed storage systems of various architectures.
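The bandwidth comparison above is simple arithmetic and can be checked with a short sketch. Costs are counted in block reads per stripe, and the function names are hypothetical.

```python
def blocks_read(failures, k, batched):
    """Blocks read to repair `failures` failed blocks of one stripe.

    Unbatched (related art): every failed block triggers its own k-block read.
    Batched (this embodiment): one k-block read repairs all of them together.
    """
    return k if batched else failures * k


def cost_per_repaired_block(failures, k):
    """Amortized reads per repaired block when the repairs are batched."""
    return k / failures


k, f = 4, 2  # hypothetical (6, 4) code with two failed blocks in one stripe
print(blocks_read(f, k, batched=False))  # f * k = 8
print(blocks_read(f, k, batched=True))   # k = 4
print(cost_per_repaired_block(f, k))     # k / f = 2.0
```

The saving grows with f, which is why a larger threshold r is attractive precisely when bandwidth is scarce.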
Those skilled in the art should understand that the modules or steps of the above embodiments of the present disclosure may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a computer storage medium (ROM/RAM, magnetic disk, optical disc) and executed by a computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or they may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. The present disclosure is therefore not limited to any specific combination of hardware and software.
The above is a detailed description of the embodiments of the present disclosure in conjunction with specific implementations, and the implementation of the present disclosure should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the present disclosure belongs may make several simple deductions or substitutions without departing from the scope of the embodiments of the present disclosure, and all of these shall be regarded as falling within the protection scope of the present disclosure.
Industrial applicability
The erasure-code-based method and device for reconstructing stored data, and the storage node, provided by the present application reduce the number of failure recoveries performed, thereby reducing the bandwidth consumption of the system, making the system more stable, and improving the service performance of the system.

Claims (13)

  1. An erasure-code-based method for reconstructing stored data, comprising:
    determining a startup failure recovery threshold, the startup failure recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure-code reconstruction, and greater than or equal to 1;
    for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, starting failure recovery of that stripe; and
    performing data reconstruction using the non-faulty data blocks of that stripe.
  2. The method according to claim 1, further comprising:
    setting the startup failure recovery threshold to an initial value; and
    dynamically adjusting the startup failure recovery threshold according to system load, wherein the heavier the system load, the larger the startup failure recovery threshold.
  3. The method according to claim 2, wherein dynamically adjusting the startup failure recovery threshold according to system load comprises: setting the initial value of the startup failure recovery threshold to 1; periodically calculating load information of the system and determining, according to a preset rule, whether the system load is heavy or light; under heavy load, increasing the startup failure recovery threshold of the next period by a preset step value; and under light load, decreasing the startup failure recovery threshold of the next period by the preset step value; wherein the preset step value comprises a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure-code reconstruction.
  4. The method according to claim 3, wherein periodically calculating load information of the system and determining, according to a preset rule, whether the system load is heavy or light comprises:
    letting Num_i denote the number of user I/O requests completed in time period P_i, and Latency_i(k) denote the service time of the k-th user I/O in period P_i;
    letting the latency upper limit of period P_i be Limit_i, with the latency agreement requiring Latency_i(k) ≤ Limit_i for every user I/O; and defining Violate_i as the proportion of user I/Os that violate the latency agreement:
    $$\mathrm{Violate}_i = \frac{\left|\{\,k : \mathrm{Latency}_i(k) > \mathrm{Limit}_i\,\}\right|}{\mathrm{Num}_i}$$
    if system congestion occurs in period P_i or Violate_i > δ, where δ is called the relaxation factor, determining that the load is heavy; and if no system congestion occurs in period P_i and Violate_i > δ is not satisfied, determining that the load is light.
  5. The method according to any one of claims 1 to 4, further comprising:
    for the stripes that have faulty data blocks, constructing at least one to-be-reconstructed queue, wherein stripe identification information is recorded in the to-be-reconstructed queue, and the stripes corresponding to each to-be-reconstructed queue have the same number of faulty data blocks; and
    for the to-be-reconstructed queues that reach the startup failure recovery threshold, selecting the queues in descending order of the number of faulty data blocks of their corresponding stripes, and starting failure recovery for each stripe in the selected queue.
  6. An erasure-code-based device for reconstructing stored data, comprising:
    a startup failure recovery threshold determining module, configured to determine a startup failure recovery threshold, the startup failure recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure-code reconstruction, and greater than or equal to 1;
    a failure recovery starting module, configured to start, for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, failure recovery of that stripe; and
    a data reconstruction module, configured to perform data reconstruction using the non-faulty data blocks of that stripe.
  7. The device according to claim 6, further comprising a startup failure recovery threshold adjustment module configured to:
    set the startup failure recovery threshold to an initial value; and
    dynamically adjust the startup failure recovery threshold according to system load, wherein the heavier the system load, the larger the startup failure recovery threshold.
  8. The device according to claim 7, wherein the startup failure recovery threshold adjustment module is further configured to: set the initial value of the startup failure recovery threshold to 1; periodically calculate load information of the system and determine, according to a preset rule, whether the system load is heavy or light; under heavy load, increase the startup failure recovery threshold of the next period by a preset step value; and under light load, decrease the startup failure recovery threshold of the next period by the preset step value; wherein the preset step value comprises a positive integer greater than or equal to 1 and less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure-code reconstruction.
  9. The device according to any one of claims 6 to 8, further comprising a reconstruction queue processing module configured to:
    for the stripes that have faulty data blocks, construct at least one to-be-reconstructed queue, wherein stripe identification information is recorded in the to-be-reconstructed queue, and the stripes corresponding to each to-be-reconstructed queue have the same number of faulty data blocks; and
    for the to-be-reconstructed queues that reach the startup failure recovery threshold, select the queues in descending order of the number of faulty data blocks of their corresponding stripes, and start failure recovery for each stripe in the selected queue.
  10. An erasure-code-based storage node, comprising a physical storage medium and a processor, the processor being configured to:
    determine a startup failure recovery threshold and distribute the startup failure recovery threshold to other storage nodes, the startup failure recovery threshold being less than or equal to the difference between the number of data blocks stored in a stripe and the minimum number of data blocks required for erasure-code reconstruction, and greater than or equal to 1;
    scan the fault status of each stripe for which this storage node is responsible and, for a stripe whose number of faulty data blocks reaches the startup failure recovery threshold, start failure recovery of that stripe; and
    retrieve the non-faulty data blocks of that stripe from the physical storage medium of this storage node and the physical storage media of other storage nodes, and perform data reconstruction.
  11. The storage node according to claim 10, wherein the processor is further configured to:
    set the startup failure recovery threshold to an initial value; and
    dynamically adjust the startup failure recovery threshold according to system load, wherein the heavier the system load, the larger the startup failure recovery threshold.
  12. The storage node according to claim 10 or 11, wherein the processor is further configured to:
    for the stripes of this storage node that have faulty data blocks, construct at least one to-be-reconstructed queue, wherein stripe identification information is recorded in the to-be-reconstructed queue, and the stripes corresponding to each to-be-reconstructed queue have the same number of faulty data blocks; and
    for the to-be-reconstructed queues that reach the startup failure recovery threshold, select the queues in descending order of the number of faulty data blocks of their corresponding stripes, and start failure recovery for each stripe in the selected queue.
  13. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 5.
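
The control flow recited in claims 1 to 5 can be illustrated with a short sketch. It is a simplified, non-authoritative illustration rather than the applicant's implementation. All function names, the sample latencies, the (9, 6) code parameters, and the stripe identifiers are hypothetical. Clamping the threshold to the range of claim 1 and treating a period with no completed I/O as light load are assumptions that the claims do not spell out.

```python
from collections import defaultdict


def is_heavy_load(latencies, limit, delta, congested=False):
    """Claim 4 rule: heavy if congestion occurred or the violation ratio exceeds delta."""
    if congested:
        return True
    if not latencies:
        return False  # assumption: a period with no completed I/O counts as light load
    violate = sum(1 for t in latencies if t > limit) / len(latencies)
    return violate > delta


def next_threshold(current, heavy, step, n, k):
    """Claim 3 rule: add the step under heavy load, subtract it under light load.

    Assumption: the result is clamped to [1, n - k], the range given in claim 1.
    """
    candidate = current + step if heavy else current - step
    return max(1, min(n - k, candidate))


def stripes_to_recover(faulty_counts, threshold):
    """Claim 5 rule: one queue per faulty-block count, visited from most to fewest."""
    queues = defaultdict(list)
    for stripe_id, faulty in faulty_counts.items():
        if faulty > 0:
            queues[faulty].append(stripe_id)
    order = []
    for faulty in sorted(queues, reverse=True):
        if faulty >= threshold:  # only queues that reach the threshold are recovered
            order.extend(queues[faulty])
    return order


# Hypothetical (n, k) code: 9 blocks per stripe, any 6 of them rebuild the stripe.
n, k = 9, 6
threshold = 1  # initial value per claim 3

# Two of four I/Os exceed the limit, so the violation ratio is 0.5 > 0.2.
heavy = is_heavy_load([4.0, 12.0, 5.0, 15.0], limit=10.0, delta=0.2)
threshold = next_threshold(threshold, heavy, step=1, n=n, k=k)

plan = stripes_to_recover({"s1": 1, "s2": 3, "s3": 2, "s4": 0, "s5": 2}, threshold)
print(heavy, threshold, plan)  # True 2 ['s2', 's3', 's5']
```

In this run the heavy-load period raises the threshold from 1 to 2, so stripe s1 (one faulty block) waits, while s2 (three faulty blocks) is recovered ahead of s3 and s5 (two faulty blocks each).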
PCT/CN2017/088477 2016-06-29 2017-06-15 Method and device for reconstructing stored data based on erasure coding, and storage node WO2018001110A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610495313.8A CN107544862B (en) 2016-06-29 2016-06-29 Stored data reconstruction method and device based on erasure codes and storage node
CN201610495313.8 2016-06-29

Publications (1)

Publication Number Publication Date
WO2018001110A1 true WO2018001110A1 (en) 2018-01-04

Family

ID=60786768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/088477 WO2018001110A1 (en) 2016-06-29 2017-06-15 Method and device for reconstructing stored data based on erasure coding, and storage node

Country Status (2)

Country Link
CN (1) CN107544862B (en)
WO (1) WO2018001110A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763107B (en) * 2018-06-04 2022-03-01 平安科技(深圳)有限公司 Background disc writing flow control method and device, electronic equipment and storage medium
CN108959399B (en) * 2018-06-04 2022-07-15 平安科技(深圳)有限公司 Distributed data deletion flow control method and device, electronic equipment and storage medium
CN108804039B (en) * 2018-06-04 2021-01-29 平安科技(深圳)有限公司 Adaptive data recovery flow control method and device, electronic equipment and storage medium
CN110865901B (en) * 2018-08-28 2021-05-04 华为技术有限公司 Method and device for building EC (embedded control) strip
CN111506450B (en) * 2019-01-31 2024-01-02 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for data processing
CN111176900A (en) * 2019-12-30 2020-05-19 浪潮电子信息产业股份有限公司 Distributed storage system and data recovery method, device and medium thereof
CN111475329B (en) * 2020-02-25 2023-07-18 成都信息工程大学 Method and device for reducing predictive erasure code repair under big data application platform
CN111614720B (en) * 2020-04-13 2022-02-18 厦门大学 Cross-cluster flow optimization method for single-point failure recovery of cluster storage system
CN111679793B (en) * 2020-06-16 2023-03-14 成都信息工程大学 Single-disk fault rapid recovery method based on STAR code
CN111917823B (en) * 2020-06-17 2022-02-18 烽火通信科技股份有限公司 Data reconstruction method and device based on distributed storage Ceph
CN112799882A (en) * 2021-02-08 2021-05-14 上海交通大学 File perception recovery method and device based on graph algorithm
CN112783688B (en) * 2021-02-10 2022-06-03 上海交通大学 Erasure code data recovery method and device based on available partition level
CN113205836A (en) * 2021-03-26 2021-08-03 重庆冷存科技有限公司 Cold data reconstruction system and method based on erasure codes
CN113504875B (en) * 2021-06-24 2023-08-01 中国科学院计算技术研究所 Method and system for recovering erasure code system based on multistage scheduling
CN115657965B (en) * 2022-11-16 2023-04-07 苏州浪潮智能科技有限公司 Method, device and medium for configuring metadata

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130173996A1 (en) * 2011-12-30 2013-07-04 Michael H. Anderson Accelerated erasure coding system and method
CN103207761A (en) * 2013-04-17 2013-07-17 浪潮(北京)电子信息产业有限公司 Data backup method and data reconfiguration method for RAID (redundant arrays of independent disks) 5 system hot backup disks
CN103577274A (en) * 2012-07-31 2014-02-12 国际商业机器公司 Management method and device of memory array
US20140208022A1 (en) * 2013-01-21 2014-07-24 Kaminario Technologies Ltd. Raid erasure code applied to partitioned stripe
CN103955343A (en) * 2014-04-16 2014-07-30 华中科技大学 Failure node data reconstruction and optimization method based on I/O (input/output) flow line

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6993701B2 (en) * 2001-12-28 2006-01-31 Network Appliance, Inc. Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array
JP2006171957A (en) * 2004-12-14 2006-06-29 Fujitsu Ltd Storage controller unit and storage control method
JP5752267B2 (en) * 2011-01-11 2015-07-22 ヒューレット−パッカード デベロップメント カンパニー エル.ピー.Hewlett‐Packard Development Company, L.P. Simultaneous request scheduling
US9582355B2 (en) * 2014-07-09 2017-02-28 Qualcomm Incorporated Systems and methods for reliably storing data using liquid distributed storage
CN104391759B (en) * 2014-11-11 2017-06-13 华中科技大学 The data archiving method of Load-aware in a kind of correcting and eleting codes storage
EP3230863B1 (en) * 2014-12-09 2022-03-02 Hitachi Vantara LLC A system and method for providing thin-provisioned block storage with multiple data protection classes
CN104881370B (en) * 2015-05-11 2018-01-12 中国人民解放军国防科学技术大学 Collaboration uses correcting and eleting codes and the reliable flash-memory storage system construction method of error correcting code
CN104935481B (en) * 2015-06-24 2018-03-09 华中科技大学 Data reconstruction method based on redundancy scheme under a kind of distributed storage

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874284A (en) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 Data processing method and device
CN110874284B (en) * 2018-09-03 2024-03-22 阿里巴巴集团控股有限公司 Data processing method and device
CN109213637A (en) * 2018-11-09 2019-01-15 浪潮电子信息产业股份有限公司 Data reconstruction method, device and the medium of distributed file system clustered node
CN110597655B (en) * 2019-06-26 2023-04-28 云链网科技(广东)有限公司 Migration and erasure code-based reconstruction coupling rapid prediction repair method and device
CN110597655A (en) * 2019-06-26 2019-12-20 中大编码有限公司 Fast predictive restoration method for coupling migration and erasure code-based reconstruction and implementation
CN110568993A (en) * 2019-08-06 2019-12-13 新华三技术有限公司成都分公司 Data updating method and related device
CN110568993B (en) * 2019-08-06 2022-04-12 新华三技术有限公司成都分公司 Data updating method and related device
CN111400083A (en) * 2020-03-17 2020-07-10 上海七牛信息技术有限公司 Data storage method and system and storage medium
CN111400083B (en) * 2020-03-17 2024-02-23 上海七牛信息技术有限公司 Data storage method and system and storage medium
CN111581020A (en) * 2020-04-22 2020-08-25 上海天玑科技股份有限公司 Method and device for data recovery in distributed block storage system
CN111581020B (en) * 2020-04-22 2024-03-19 上海天玑科技股份有限公司 Method and device for recovering data in distributed block storage system
CN111625394A (en) * 2020-05-27 2020-09-04 成都信息工程大学 Data recovery method, device and equipment based on erasure codes and storage medium
US11182249B1 (en) 2020-06-24 2021-11-23 International Business Machines Corporation Block ID encoding in an erasure coded storage system
WO2021260538A1 (en) * 2020-06-24 2021-12-30 International Business Machines Corporation Block id encoding in erasure coded storage system
CN113190384B (en) * 2021-05-21 2022-07-22 重庆紫光华山智安科技有限公司 Data recovery control method, device, equipment and medium based on erasure codes
CN113190384A (en) * 2021-05-21 2021-07-30 重庆紫光华山智安科技有限公司 Data recovery control method, device, equipment and medium based on erasure codes
CN114415970A (en) * 2022-03-25 2022-04-29 北京金山云网络技术有限公司 Disk fault processing method and device for distributed storage system and server

Also Published As

Publication number Publication date
CN107544862A (en) 2018-01-05
CN107544862B (en) 2022-03-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17819107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17819107

Country of ref document: EP

Kind code of ref document: A1