CN115454727B - Data recovery method, device and equipment and readable storage medium - Google Patents

Data recovery method, device and equipment and readable storage medium

Info

Publication number
CN115454727B
Authority
CN
China
Prior art keywords
data
disk
fault
raid array
node
Prior art date
Legal status
Active
Application number
CN202211409858.4A
Other languages
Chinese (zh)
Other versions
CN115454727A (en)
Inventor
李飞龙
许永良
孙明刚
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211409858.4A priority Critical patent/CN115454727B/en
Publication of CN115454727A publication Critical patent/CN115454727A/en
Application granted granted Critical
Publication of CN115454727B publication Critical patent/CN115454727B/en
Priority to PCT/CN2023/093083 priority patent/WO2024098696A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0614 Improving the reliability of storage systems
    • G06F 3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0662 Virtualisation aspects
    • G06F 3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data recovery method, device and equipment, and a readable storage medium. The method comprises the following steps: backing up data of a main control node in an auxiliary node among the multi-control nodes, so that the data in the auxiliary node serves as mirror image data; monitoring whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array; if so, sending information about the data lost in the failed disk to the auxiliary node, which acquires the data corresponding to the information from the mirror image data and sends it back; receiving the data corresponding to the information, and writing it into the corresponding partition of the hot spare disk corresponding to the failed disk; and replacing the corresponding failed disk with the hot spare disk so as to form a RAID array with the normal disks. According to the technical scheme disclosed by the application, mirror image data is added in the auxiliary node and is used for data recovery once the number of failed disks exceeds the fault tolerance, so that the data reliability of the storage system is improved.

Description

Data recovery method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a data recovery method, apparatus, device, and readable storage medium.
Background
Society has now entered the era of big data, and massive amounts of data need to be stored safely and reliably. In many industries the volume of data to be stored is growing exponentially, and the reliability requirements on stored data are extremely high: losing even a small amount of data can cause fatal business disasters, for example in the banking and military sectors. Therefore, storage systems have long sought breakthroughs both in increasing data reliability and in improving I/O (Input/Output) performance.
In terms of increasing data reliability, the industry uses RAID (Redundant Array of Independent Disks) technology, in which the redundant disks of a RAID array are used to recover the data of a failed disk. In terms of improving I/O performance, the industry uses multiple control nodes to form a cluster. Specifically, to ensure high availability of the storage system, at least two nodes form an IOGROUP (input/output group); each of these nodes is connected to one port of a dual-port hard disk, the nodes within an IOGROUP are peer nodes of each other, one or more IOGROUPs form a cluster, and the nodes in the cluster can communicate with each other. The primary node is responsible for processing I/O requests from the host, while the secondary node is responsible for background tasks of the storage system (such as RAID array initialization, patrol inspection and reconstruction tasks), which improves the I/O performance of the storage. However, when the host node stores I/O data using RAID technology, if the number of failed disks in the storage system exceeds the maximum number of failed disks that the RAID array can recover, the data of the failed disks cannot be recovered by the internal mechanism of the RAID array, and the data reliability of the storage system is therefore low.
In summary, how to recover the data of a failed disk so as to improve data reliability is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data recovery method, apparatus, device and readable storage medium for recovering failed disk data to improve data reliability.
In order to achieve the above purpose, the present application provides the following technical solutions:
a method of data recovery, comprising:
backing up data of a main control node in an auxiliary node among the multi-control nodes, so that the data in the auxiliary node serves as mirror image data of the main control node;
monitoring whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array;
if so, sending information about the data lost in the failed disk to the auxiliary node, so that the auxiliary node acquires the data corresponding to the information from the mirror image data and sends it;
receiving the data corresponding to the information, and writing the data corresponding to the information into the corresponding partition of the hot spare disk corresponding to the failed disk;
and replacing the corresponding failed disk with the hot spare disk so as to form a RAID array with the normal disks.
Preferably, the backing up the data in the primary control node in the secondary node in the multi-control node includes:
backing up the logical volume in the main control node in the auxiliary node to obtain a corresponding mirror image logical volume; the logical volume and the mirror image logical volume each comprise a plurality of blocks, the blocks have a mapping relationship with the strips in the disks, and the mapping relationship is stored in both the main control node and the auxiliary node;
sending information of the lost data in the failed disk to the auxiliary node, including:
according to the strips contained in the failed disk and the mapping relationship, forming data loss metadata corresponding to the logical volume, and sending the data loss metadata to the auxiliary node; the data loss metadata includes information on whether the blocks included in the logical volume are lost;
the auxiliary node acquires the data corresponding to the information from the mirror image data and sends the data, and the method comprises the following steps:
and the auxiliary node acquires the block number of the lost data according to the mapping relation and the data loss metadata, and sends the data corresponding to the block number in the mirror image logical volume to the main control node.
Preferably, the forming of the data loss metadata corresponding to the logical volume according to the partitions included in the failed disk and the mapping relationship includes:
according to the strips (partitions) contained in the failed disk and the mapping relationship, forming data loss metadata which corresponds to the logical volume and uses a bitmap as its data organization; wherein each bit in the bitmap indicates whether a corresponding block contained in the logical volume is lost.
Preferably, before replacing the corresponding failed disk with the hot spare disk, the method further includes:
judging whether each stripe in the RAID array needs its check block recalculated;
and if there is a stripe whose check block needs to be recalculated, calculating the check block according to the blocks in the stripe, and writing the calculated check block into the corresponding partition of the hot spare disk corresponding to the failed disk.
Preferably, the receiving the data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk includes:
and receiving the data corresponding to the information block by block, and writing the data corresponding to the information block by block into a corresponding partition of the hot spare disk corresponding to the fault disk.
Preferably, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the method further includes:
redirecting the host's I/O request to the secondary node;
after replacing the corresponding failed disk with the hot spare disk to reconstruct a RAID array with a normal disk, the method further includes:
redirecting the host's I/O request to the master node.
Preferably, when redirecting the I/O request of the host to the secondary node, the method further comprises:
and storing the data newly sent by the host in the main control node, so that the data newly sent by the host stored in the main control node is used as mirror image data of the auxiliary node.
Preferably, if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the method further includes:
calculating by using the blocks of the normal disk included in each stripe in the RAID array to obtain the blocks of the hot spare disk corresponding to the fault disk;
and after all the data lost by the failed disk are recovered, replacing the failed disk by the hot standby disk so as to form a RAID array together with the normal disk.
Preferably, the monitoring whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array includes:
and monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array at regular time.
Preferably, the method further comprises the following steps:
if the number of the fault disks of the RAID array does not exceed the fault tolerance of the RAID array, returning to the step of monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array at regular time.
Preferably, if the number of the auxiliary nodes is greater than 1, sending the information of the missing data in the failed disk to the auxiliary nodes, including:
and sending the information of the lost data in the fault disk to one auxiliary node selected from the plurality of auxiliary nodes according to a preset selection strategy.
Preferably, after replacing the corresponding failed disk with the hot spare disk to reconstruct the RAID array with a normal disk, the method further includes:
and cleaning mirror image data in the auxiliary node regularly.
A data recovery apparatus comprising:
the backup module is used for backing up data in a main control node in an auxiliary node in a multi-control node so as to take the data in the auxiliary node as mirror image data of the main control node;
the monitoring module is used for monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array;
the sending module is used for sending the information of the lost data in the fault disk to the auxiliary node if the number of the fault disks of the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node acquires the data corresponding to the information from the mirror image data and sends the data;
the writing module is used for receiving the data corresponding to the information and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the fault disk;
and the first replacement module is used for replacing the corresponding fault disk by using the hot standby disk so as to reconstruct the RAID array with the normal disk.
A data recovery apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data recovery method as claimed in any one of the above when said computer program is executed.
A readable storage medium, having stored therein a computer program which, when executed by a processor, carries out the steps of the data recovery method as claimed in any one of the preceding claims.
The application provides a data recovery method, a device, equipment and a readable storage medium, wherein the method comprises the following steps: backing up data in a main control node in an auxiliary node in a multi-control node so as to take the data in the auxiliary node as mirror image data of the main control node; monitoring whether the number of fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array; if so, sending the information of the lost data in the fault disk to the auxiliary node, and acquiring the data corresponding to the information from the mirror image data by the auxiliary node and sending the data; receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the fault disk; and replacing the corresponding fault disk by the hot standby disk so as to reconstruct the RAID array with the normal disk.
According to the technical scheme, the data in the main control node is backed up in the auxiliary node in the multi-control node, and the data backed up in the auxiliary node is used as mirror image data of the main control node. When the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array, the information of the lost data in the fault disks is sent to the auxiliary node, and the auxiliary node acquires the data corresponding to the information of the lost data from the mirror image data and sends the data out. After receiving the data sent by the auxiliary node, writing the data into a corresponding partition of a hot standby disk corresponding to the failed disk, and then replacing the failed disk by the hot standby disk to form a RAID array with a normal disk. Therefore, the mirror image data is added in the auxiliary node, and the data recovery is carried out by using the mirror image data in the auxiliary node after the number of the fault disks exceeds the fault tolerance, so that the data reliability of the storage system can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a data recovery method according to an embodiment of the present application;
fig. 2 is a flowchart of another data recovery method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a master node according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a reconstruction process when the number of failed disks exceeds a fault tolerance, according to an embodiment of the present disclosure;
fig. 5 is another schematic structural diagram of a master node according to an embodiment of the present application;
fig. 6 is a schematic diagram of an auxiliary node according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating reconstruction and recovery of data of a failed disk when the number of failed disks does not exceed the fault tolerance according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data recovery apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a data recovery device according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a data recovery method, device, equipment and readable storage medium, which are used for recovering data of a failed disk so as to improve the reliability of the data.
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some, and not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Referring to fig. 1 and fig. 2, where fig. 1 shows a flowchart of a data recovery method provided in an embodiment of the present application, and fig. 2 shows a flowchart of another data recovery method provided in an embodiment of the present application. The data recovery method provided by the embodiment of the application can include:
s11: and backing up the data in the main control node in the auxiliary node in the multi-control node so as to take the data in the auxiliary node as mirror image data of the main control node.
It should be noted that the execution subject in the present application may be the master node of the multi-control nodes (which comprise a master node and an auxiliary node), or may be the storage system; the present application is described by taking the multi-control node as the execution subject by way of example.
First, the main control node may process the I/O requests issued by the host and store the corresponding data (namely, the I/O data), and this data is backed up in an auxiliary node of the multi-control nodes; that is, the data stored by the main control node for the I/O requests issued by the host is also backed up in the auxiliary node. The data backed up in the auxiliary node may specifically be stored in the memory of the auxiliary node, so that it can be acquired and transmitted quickly.
By backing up the data of the main control node in the auxiliary node, the data backed up in the auxiliary node can serve as mirror image data of the main control node, so that the data of a failed disk in the main control node can subsequently be reconstructed and recovered using the mirror image data in the auxiliary node, which effectively improves the data reliability of the storage system.
S12: monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array; if yes, executing step S13; if not, the process returns to step S12.
While the main control node processes the I/O request issued by the host, or after processing the I/O request issued by the host, the main control node may monitor and acquire the number of failed disks in the RAID array in the main control node, and determine whether the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array (that is, the maximum number of failed disks that can be recovered by the RAID array).
Each RAID array level has its own internal data redundancy. For example, a RAID5 array provides the data redundancy of one P check block; the RAID5 array recovers the data of a failed disk by using the single redundancy of the P check block (the P check block of a stripe is obtained by an exclusive-or operation over the data blocks, the data blocks being the effective data issued by the host). That is, the fault tolerance of a RAID5 array is 1, so when the number of failed disks is 1, the internal data redundancy can be used to recover the data of the failed disk. A RAID6 array provides the dual data redundancy of a P check block and a Q check block (in a RAID6 array the Q check block and the P check block are used together, so that two failed disks in the RAID6 array can be recovered); the RAID6 array recovers the data of two failed disks by using the dual redundancy of the P check and the Q check. That is, the fault tolerance of a RAID6 array is 2, so when the number of failed disks is not more than 2, the internal data redundancy can be used to recover the data of the failed disks. It should be noted that the RAID array mentioned in this application may specifically be a RAID5 array or a RAID6 array, and of course may also be another type of RAID array, which is not limited in this application.
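As a purely illustrative sketch (not taken from the patent, with made-up strip contents and a tiny strip size), the following C fragment shows the single-redundancy mechanism of RAID5 described above: the P check block is the XOR of the data strips of a stripe, and the same XOR applied to the surviving strips plus the P check block reproduces one lost strip.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define STRIP_SIZE 8    /* bytes per strip; a tiny illustrative value */
    #define DATA_DISKS 4    /* data strips per stripe, excluding the P check block */

    /* XOR the n given strips together into dst. */
    static void xor_strips(uint8_t dst[STRIP_SIZE], uint8_t strips[][STRIP_SIZE], int n)
    {
        memset(dst, 0, STRIP_SIZE);
        for (int i = 0; i < n; i++)
            for (int b = 0; b < STRIP_SIZE; b++)
                dst[b] ^= strips[i][b];
    }

    int main(void)
    {
        uint8_t data[DATA_DISKS][STRIP_SIZE] = { "strip1A", "strip2A", "strip3B", "strip4A" };
        uint8_t parity[STRIP_SIZE], rebuilt[STRIP_SIZE];

        /* The P check block is the XOR of the data strips in the stripe. */
        xor_strips(parity, data, DATA_DISKS);

        /* Suppose disk 1 fails: rebuild strip1A from the surviving strips plus P. */
        uint8_t survivors[DATA_DISKS][STRIP_SIZE];
        memcpy(survivors[0], data[1], STRIP_SIZE);
        memcpy(survivors[1], data[2], STRIP_SIZE);
        memcpy(survivors[2], data[3], STRIP_SIZE);
        memcpy(survivors[3], parity,  STRIP_SIZE);
        xor_strips(rebuilt, survivors, DATA_DISKS);

        printf("recovered strip: %s\n", (char *)rebuilt);  /* prints "strip1A" */
        return 0;
    }

A RAID6 array additionally maintains a Q check block, typically computed with a Reed-Solomon style code over a Galois field rather than a plain XOR, which is what allows it to tolerate two simultaneous disk failures.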
If the number of the failed disks in the RAID array exceeds the fault tolerance of the RAID array, the data of the failed disks cannot be recovered by using a data redundancy mechanism inside the RAID array, because the number of the failed disks that have failed at the same time exceeds the fault tolerance. In this case, the primary node needs to reconstruct the lost data in the failed disk using the mirror data of the secondary node. If the number of the failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the process may return to step S12, that is, whether the number of the failed disks in the RAID array in the main control node exceeds the fault tolerance of the RAID array is continuously monitored.
S13: and sending the information of the lost data in the fault disk to the auxiliary node, and acquiring the data corresponding to the information from the mirror image data by the auxiliary node and sending the data.
In step S12, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the data of the failed disks may be recovered by using the external redundancy provided by the auxiliary node. Specifically, the master node may obtain information about the data lost in the failed disk, so that the lost data can be retrieved from the auxiliary node based on this information. The information about the lost data may be, for example, which strips in the failed disk are lost, where a strip is a partition of the physical storage medium on the disk and is the granularity at which the RAID array reconstructs data.
After acquiring the information about the data lost in the failed disk, the main control node may send this information to the auxiliary node. After receiving it, the auxiliary node can acquire the data corresponding to the information about the lost data from the mirror image data stored in the auxiliary node, and send the acquired data to the main control node, so that the main control node can reconstruct and recover the data accordingly.
In addition, the auxiliary node only retrieves the data corresponding to the lost data information and transmits the data to the main control node for data reconstruction, so that the data quantity required to be retrieved and transmitted can be minimized, and the time window of data reconstruction is also minimized.
S14: and receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the fault disk.
Based on step S13, the main control node may receive the data corresponding to the information about the lost data sent by the auxiliary node, and may write this data into the corresponding partition of the hot spare disk corresponding to the failed disk. The hot spare disk is also called a spare or a spare storage drive.
S15: and replacing the corresponding fault disk by using the hot standby disk so as to form a RAID array with the normal disk.
After all the lost data in the failed disk are recovered and written into the corresponding partition of the corresponding hot spare disk, the hot spare disk can be used for replacing the corresponding failed disk so as to form the RAID array with the normal disk again, and therefore recovery of the data of the failed disk is completed.
For example, as shown in fig. 3 and fig. 4, fig. 3 shows a schematic structural diagram of a master node provided in the embodiment of the present application, and fig. 4 shows a schematic diagram of the reconstruction process when the number of failed disks exceeds the fault tolerance. In fig. 3 and fig. 4, the master node has a RAID5 array composed of five hard disks, and fig. 4 shows the data reconstruction process when two disks of the master node of fig. 3 fail at the same time. Disk 1 and disk 2 fail at the same time (which exceeds the fault tolerance of RAID5), so the data of strips 1-2, 5, 9, 13-14 and 17-18 is lost, and this is beyond what the internal mechanism of the RAID5 array can recover. Therefore, data recovery is performed with the scheme described above in this application: a hot spare disk 1 corresponding to disk 1 and a hot spare disk 2 corresponding to disk 2 are obtained, hot spare disk 1 is used to replace disk 1 and hot spare disk 2 is used to replace disk 2, and the RAID array is rebuilt with the normal disks (disk 3, disk 4 and disk 5).
It can be seen from the above process that, for the data loss events that occur with relatively high probability in a storage system, in a clustered multi-control storage system mirror image data is added in the auxiliary node of the cluster, and when the number of failed disks exceeds the maximum number of failed disks that the RAID array can recover, the mirror image data of the auxiliary node is used to reconstruct and recover the data of the failed disks. Specifically, the main control node performs offline reconstruction using the data of the normal disks together with the data transmitted from the auxiliary node (data that should have been written to the failed disks but could not be written successfully because the disks failed, so the corresponding mirror image data of the auxiliary node is transmitted to the main control node), thereby effectively improving the data reliability of the storage system.
According to the technical scheme, the data in the main control node is backed up in the auxiliary node in the multi-control node, and the data backed up in the auxiliary node is used as mirror image data of the main control node. When the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array, the information of the lost data in the fault disks is sent to the auxiliary node, and the auxiliary node acquires the data corresponding to the information of the lost data from the mirror image data and sends the data out. After receiving the data sent by the auxiliary node, writing the data into a corresponding partition of a hot standby disk corresponding to the failed disk, and then replacing the failed disk by the hot standby disk to form a RAID array with a normal disk. Therefore, the mirror image data is added in the auxiliary node, and the data recovery is carried out by using the mirror image data in the auxiliary node after the number of the fault disks exceeds the fault tolerance, so that the data reliability of the storage system can be improved.
In the data recovery method provided in the embodiment of the present application, backing up data in a main control node in an auxiliary node in a multi-control node may include:
backing up a logical volume of the main control node in the auxiliary node to obtain a corresponding mirror image logical volume; the logical volume and the mirror image logical volume each comprise a plurality of blocks, the blocks have a mapping relationship with the strips in the disks, and both the main control node and the auxiliary node store the mapping relationship;
sending information of lost data in the failed disk to the secondary node may include:
forming data loss metadata corresponding to the logical volume according to the strips contained in the failed disk and the mapping relationship, and sending the data loss metadata to the auxiliary node; the data loss metadata includes information on whether the blocks included in the logical volume are lost;
the method for acquiring and sending the data corresponding to the information from the mirror image data by the auxiliary node may include:
and the auxiliary node acquires the block number of the lost data according to the mapping relation and the data loss metadata, and sends the data corresponding to the block number in the mirror image logical volume to the main control node.
In the present application, considering that the host performs I/O access through a logical volume (volume) rather than directly through blocks, the data corresponding to an I/O request is first stored in the logical volume when the primary node stores the data; therefore, when the data in the primary node is backed up in the secondary node, the logical volume in the primary node may be backed up in the secondary node to obtain the corresponding mirror image logical volume. The logical volume in the main control node and the mirror image logical volume both include a plurality of blocks (a block is the granularity at which the host I/O accesses data, and is a logical unit). A certain mapping relationship exists between blocks and strips; through this mapping relationship a block can be located to a specific strip on a physical disk, and the blocks make up a volume. Both the main control node and the auxiliary node store the mapping relationship, that is, both nodes maintain the mapping between blocks and strips. It should be noted that the relationship between blocks and strips may be one strip corresponding to several blocks (so that one strip is equally divided into several blocks), or one strip corresponding to one block.
Specifically, referring to fig. 5 and fig. 6, fig. 5 shows another schematic structural diagram of a master node provided in the embodiment of the present application, and fig. 6 shows a schematic diagram of an auxiliary node provided in the embodiment of the present application. Fig. 5 shows a master node configured with a RAID5 array; mirror volume A in the auxiliary node of fig. 6 corresponds to volume A in the master node of fig. 5, and mirror volume B in the auxiliary node of fig. 6 corresponds to volume B in the master node of fig. 5. The main control node, acting as the primary server, processes the I/O requests of the host, and the data of these I/O requests is distributed over a RAID5 array formed by five hard disks. In fig. 5, a strip is the granularity at which the RAID array reconstructs data, while a block is the granularity at which the host I/O accesses data and also the granularity at which data is mirrored between the main control node and the auxiliary node. From another perspective, a strip is a physical data unit of a disk, a block is a logical data unit, and a volume consists of multiple blocks. The main control node and the auxiliary node both maintain the mapping relationship between blocks and strips; this mapping can be one strip corresponding to several blocks (so that one strip is equally divided into several blocks) or one strip corresponding to one block. For easier understanding, fig. 5 and fig. 6 are drawn with one strip corresponding to one block, for example blocks 0-99 in fig. 5 correspond to strip1A. An example of the mapping relationship organized by volume is provided on the left side of fig. 5: the data in volume A is distributed across the five disks, as indicated by the rectangular boxes containing the letter "A", and the data in volume B is likewise distributed across the five disks, as indicated by the rectangular boxes containing the letter "B". Specifically, the data in volume A is distributed among strips 1-2, 4-5, 9, 11-12, 16 and 19-20, and the data in volume B is distributed among strips 3, 6-8, 10, 13-15 and 17-18.
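To make the mapping concrete, the following sketch shows one simple way in which both the main control node and the auxiliary node could record a block-to-strip mapping under the one-block-per-strip design described above. The structure, field names and sample entries are hypothetical and introduced only for illustration; the patent does not prescribe any particular data layout.

    #include <stdio.h>

    /* One entry of the block-to-strip mapping: logical block number in the
     * volume -> (disk index, strip index on that disk). */
    struct block_map_entry {
        int block_no;   /* logical block number within the volume */
        int disk;       /* physical disk holding the mapped strip  */
        int strip;      /* strip index on that disk                */
    };

    struct volume_map {
        const char                   *volume_name;
        const struct block_map_entry *entries;
        int                           count;
    };

    /* An invented fragment of a mapping table for a volume. */
    static const struct block_map_entry volume_a_entries[] = {
        { 1, 1, 1 }, { 2, 2, 2 }, { 4, 4, 4 }, { 5, 2, 5 },
    };

    static const struct volume_map volume_a = { "volume A", volume_a_entries, 4 };

    /* Locate the strip that stores a given logical block, as either node can do. */
    static const struct block_map_entry *lookup_block(const struct volume_map *vm, int block_no)
    {
        for (int i = 0; i < vm->count; i++)
            if (vm->entries[i].block_no == block_no)
                return &vm->entries[i];
        return NULL;
    }

    int main(void)
    {
        const struct block_map_entry *e = lookup_block(&volume_a, 5);
        if (e != NULL)
            printf("block %d of %s is on disk %d, strip %d\n",
                   e->block_no, volume_a.volume_name, e->disk, e->strip);
        return 0;
    }

Because both nodes hold the same table, the auxiliary node can translate the block-level loss information it receives back into the mirror data it must return, without knowing anything about the physical layout of the master node's disks beyond this mapping.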
On the basis, when the primary control node sends the information of the data loss in the failed disk to the secondary node, the primary control node may specifically form the data loss metadata corresponding to the logical volume in the primary control node according to the partition included in the failed disk and the mapping relationship between the stored blocks and the partitions. The data loss metadata is also metadata (managing the data structure of the stripe, which may be a bitmap or a hash table, etc.), except that the data loss metadata not only manages the stripe but also identifies the data unit of the lost data in the failed disk (the data unit is a block, i.e., the data loss metadata is for a block in the logical volume). Specifically, the data loss metadata includes information on whether or not a block included in the logical volume in the master node is lost (information on whether or not all blocks included in the logical volume are lost is included in the data loss metadata). It should be noted that the data loss metadata referred to in this application is a global variable for implementing a specified logic function, and may be implemented in C language or C + +, in particular.
The primary node may then send the formed data loss metadata to the secondary node. Accordingly, after receiving the data loss metadata, the auxiliary node may scan it and obtain the block numbers of the lost data from the stored mapping relationship between blocks and strips together with the received data loss metadata. For example, in fig. 6 the block numbers of the lost data finally obtained by the auxiliary node are 1-2, 5, 9, 13-14 and 17-18. The auxiliary node may then send the data corresponding to these block numbers in the mirror image logical volume to the main control node; taking fig. 6 as an example, the data corresponding to block numbers 1-2, 5, 9, 13-14 and 17-18 is sent to the main control node.
In this way, a mirror image logical volume is formed in the auxiliary node, so that data is mirrored between the main control node and the auxiliary node at block granularity, which better matches how the host and the user see the data. Only the data loss metadata is transmitted between the main control node and the auxiliary node, and the auxiliary node retrieves only the lost blocks (specifically, the data corresponding to the blocks on the failed disk) based on the data loss metadata and transmits them to the main control node for data reconstruction, so that the amount of data that needs to be retrieved and transmitted is minimized, and the time window for data reconstruction is also minimized.
In the data recovery method provided in the embodiment of the present application, forming the data loss metadata corresponding to the logical volume according to the strips included in the failed disk and the mapping relationship may include:
forming, according to the strips contained in the failed disk and the mapping relationship, data loss metadata that corresponds to the logical volume and uses a bitmap as its data organization; wherein each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost.
In this application, data loss metadata corresponding to the logical volume and organized in a bitmap manner may be specifically formed according to the blocks and the mapping relationship included in the failed disk, and each bit in the bitmap indicates whether a corresponding block included in the logical volume is lost, for example, when the bit is designed to be 0, it indicates that the corresponding block is a lost block, and when the bit is designed to be 1, it indicates that the corresponding block is a non-lost block (of course, other designs may also be adopted as needed).
Specifically, taking fig. 6 as an example, both disk 1 and disk 2 fail, so strips 1-2, 5, 9, 13-14 and 17-18 lose their data, and the bit of the block corresponding to each of these strips is set to 0. According to the one-to-one relationship between strips and blocks, the data loss metadata of volume A and of volume B in the master control node is expressed as follows: first-row bitmap (representing volume A): {0 (block 1), 0 (block 2), 1 (block 4), 0 (block 5), 0 (block 9), 1 (block 11), 1 (block 12), 1 (block 16), 1 (block 19), 1 (block 20)}; second-row bitmap (representing volume B): {1 (block 3), 1 (block 6), 1 (block 7), 1 (block 8), 1 (block 10), 0 (block 13), 0 (block 14), 1 (block 15), 0 (block 17), 0 (block 18)}. It should be noted that the bitmap metadata organization designed in the present application is a two-dimensional bitmap organization: each row holds the bits of the blocks of one logical volume, the bits of blocks belonging to different logical volumes lie in the respective columns, and multiple logical volumes give multiple rows, which together form a two-dimensional bitmap; that is, the data loss metadata is organized as a two-dimensional bitmap. In summary, for the logical volumes of the master node shown in fig. 5, the final data loss metadata (i.e., the bitmap metadata) has the first row: 0 0 1 0 0 1 1 1 1 1, and the second row: 1 1 1 1 1 0 0 1 0 0. Of course, the bitmap organization of the data loss metadata may also be adjusted according to actual needs, which is not limited in the present application.
The data loss metadata which takes the bitmap as a data organization mode is not only simple and clear, but also is convenient for the auxiliary node to quickly determine the block number of the lost data based on the data loss metadata.
Of course, the data loss metadata may also use other data organization methods, such as a hash table, as long as it can indicate whether the blocks included in the logical volume are lost and the auxiliary node can determine the block numbers of the lost data based on it. In addition, it should be noted that the two-dimensional bitmap metadata organization designed in the present application can be applied not only in this service scenario but also in other service scenarios.
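As a minimal sketch of how such two-dimensional bitmap metadata could be built and scanned (the column positions follow the worked example above; everything else, including storing one bit per char rather than packing bits, is an illustrative assumption rather than anything the patent specifies), consider:

    #include <stdio.h>
    #include <string.h>

    #define NUM_VOLUMES       2     /* volume A and volume B */
    #define BLOCKS_PER_VOLUME 10    /* 10 blocks per volume, as in the example above */

    /* Two-dimensional data loss metadata: one row per logical volume, one entry
     * per block. 1 = block intact, 0 = block lost (the convention used above).
     * For brevity each bit is stored in a char; a real bitmap would pack bits. */
    static char loss_bitmap[NUM_VOLUMES][BLOCKS_PER_VOLUME];

    static void mark_lost(int volume, int column) { loss_bitmap[volume][column] = 0; }

    int main(void)
    {
        memset(loss_bitmap, 1, sizeof(loss_bitmap));   /* assume nothing lost at first */

        /* Blocks mapped to strips on the failed disks 1 and 2 (columns are the
         * 0-based positions of the blocks listed for each volume above). */
        int lost_a[] = { 0, 1, 3, 4 };   /* volume A: blocks 1, 2, 5, 9     */
        int lost_b[] = { 5, 6, 8, 9 };   /* volume B: blocks 13, 14, 17, 18 */
        for (unsigned i = 0; i < sizeof(lost_a) / sizeof(lost_a[0]); i++) mark_lost(0, lost_a[i]);
        for (unsigned i = 0; i < sizeof(lost_b) / sizeof(lost_b[0]); i++) mark_lost(1, lost_b[i]);

        /* Print the two rows; this reproduces 0 0 1 0 0 1 1 1 1 1 / 1 1 1 1 1 0 0 1 0 0. */
        for (int v = 0; v < NUM_VOLUMES; v++) {
            for (int c = 0; c < BLOCKS_PER_VOLUME; c++)
                printf("%d ", loss_bitmap[v][c]);
            printf("\n");
        }
        return 0;
    }

Scanning a row for zero bits is exactly how the auxiliary node in the example obtains the lost block numbers 1-2, 5, 9, 13-14 and 17-18 before looking them up in the mirror image logical volume.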
Before replacing the corresponding failed disk with the hot spare disk, the data recovery method provided by the embodiment of the application may further include:
judging whether each stripe in the RAID array needs its check block recalculated;
and if there is a stripe whose check block needs to be recalculated, calculating the check block according to the blocks in the stripe, and writing the calculated check block into the corresponding partition of the hot spare disk corresponding to the failed disk.
In this application, before replacing the corresponding failed disk with the hot spare disk, the main control node may further determine whether each stripe in the RAID array needs its check block recalculated, and the main control node may examine the stripes one by one. It should be noted that a stripe is a collection of strips at related positions on different disks of the array, and is the unit for organizing strips across different disks. Referring specifically to fig. 5, a stripe is represented by the dashed rectangle 101 in fig. 5. Since data is reconstructed, and the P check block is calculated, by performing an exclusive-or operation over the strips within a stripe, the redundancy of the RAID array is maintained in units of stripes. As shown in fig. 5, stripe 101 is composed of strip1A, strip2A, strip3B, strip4A and a P check block (i.e., Parity1). Strips 1A-4A may be data blocks issued by the host, and the P check block is a redundant block, storing redundant data, obtained by XOR-ing strips 1A-4A of stripe 101.
When the master control node determines whether a stripe in the RAID array needs to recalculate the parity chunks, it may specifically determine whether the parity chunks in the stripe are located on the failed disk, or determine whether the parity chunks in the stripe have been lost. If the check block in the stripe is located on the failed disk, or the stripe has a missing check block, determining that the stripe needs to recalculate the check block; if the parity chunks in the stripe are not located on the failed disk, or the stripe does not lack parity chunks, it is determined that the stripe does not need to recalculate parity chunks.
If it is determined that the Parity chunks need to be recalculated for the stripe, the Parity chunks are calculated according to the chunks in the stripe (the chunks mentioned herein specifically refer to the chunks in the normal disk in the stripe and the chunks in the hot spare disk corresponding to the failed disk in the stripe), specifically, the data chunks in the stripe are used to perform an exclusive or operation to obtain the Parity chunks, for example, the stripe 102 in fig. 4 and fig. 6 may perform an exclusive or operation based on the strip6A, the strip7B, the strip8B, and the recovered strip5A (i.e., the strip5A in the hot spare disk 2 corresponding to the failed disk 2) to obtain the P Parity chunk 2 (Parity 2). After the check blocks of the stripe are obtained through calculation, the calculated check blocks may be written into the corresponding partition of the hot spare disk corresponding to the failed disk, and corresponding to fig. 6, the obtained Parity2 is written into the corresponding partition of the hot spare disk 1. Through the above process, complete recovery of the corresponding stripe (i.e. not only the lost block data but also the lost check blocks are recovered) can be achieved, so as to improve the reliability of data recovery and improve the reliability of RAID array reorganization.
It should be noted that fig. 2 shows one way of performing the check block judgement and writing data into the corresponding partition of the hot spare disk: after writing the block data received from the auxiliary node into the corresponding partition of the hot spare disk, the main control node judges whether the corresponding stripe in the RAID array (specifically, the stripe to which the written block data belongs) needs its check block recalculated; when recalculation is needed, it calculates the check block and writes the recalculated check block into the corresponding partition of the hot spare disk, so that the stripe is completely recovered. When it is determined that the check block does not need to be recalculated, or once the recalculated check block has been written into the corresponding partition of the hot spare disk, the main control node judges whether reconstruction of all the data of the failed disk has been completed. If so, all the lost data has been recovered and written into the corresponding partitions of the hot spare disk, and the hot spare disk is used to replace the failed disk and is recombined with the other non-failed disks into a RAID array; if not, the main control node moves to the next stripe, continues writing the received block data into the corresponding partition of the hot spare disk so as to perform block reconstruction within that stripe, and this cycle repeats until all the lost data has been recovered. Of course, it is also possible to first write the data corresponding to all the information of the lost data into the corresponding partitions of the hot spare disk corresponding to the failed disk, and only then determine, stripe by stripe, whether the check block needs to be recalculated.
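The stripe-by-stripe judgement can be pictured with the following sketch. The names and the simplified stripe layout are hypothetical and only illustrate the decision described above; the check block recomputation itself is the same XOR shown in the earlier sketch.

    #include <stdbool.h>
    #include <stdio.h>

    #define DISKS_PER_STRIPE 5     /* 4 data strips + 1 P check block, as in RAID5 */

    struct stripe {
        int id;
        int parity_disk;   /* index of the disk that holds the P check block */
    };

    /* A stripe needs its check block recalculated when the check block sat on a
     * disk that has failed (equivalently, when the check block itself was lost). */
    static bool needs_parity_rebuild(const struct stripe *s,
                                     const bool failed[DISKS_PER_STRIPE])
    {
        return failed[s->parity_disk];
    }

    int main(void)
    {
        /* Disks 0 and 1 have failed (0-based indices for disk 1 and disk 2). */
        bool failed[DISKS_PER_STRIPE] = { true, true, false, false, false };

        struct stripe stripes[] = {
            { 101, 4 },   /* Parity1 on disk 5: no recalculation needed            */
            { 102, 0 },   /* Parity2 on disk 1: recompute and write to hot spare 1 */
        };

        for (unsigned i = 0; i < sizeof(stripes) / sizeof(stripes[0]); i++)
            printf("stripe %d: %s\n", stripes[i].id,
                   needs_parity_rebuild(&stripes[i], failed)
                       ? "recalculate check block and write it to the hot spare"
                       : "check block intact");
        return 0;
    }

Under these assumptions the output matches the example above: stripe 101 keeps Parity1 untouched, while stripe 102 has Parity2 recomputed and written into the corresponding partition of hot spare disk 1.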
The data recovery method provided in the embodiment of the present application receives data corresponding to information, and writes the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to a failed disk, where the method may include:
and receiving the data corresponding to the information block by block, and writing the data corresponding to the information block by block into a corresponding partition of the hot spare disk corresponding to the fault disk.
In the application, when receiving the data corresponding to the information sent by the auxiliary node, the main control node may receive the data corresponding to the information block by block, that is, receive the data corresponding to each block in the failure disk in a serial manner, so as to implement orderly data reception. Moreover, when the data corresponding to the information is written into the corresponding partition of the hot spare disk corresponding to the failed disk, the data corresponding to the information may also be written into the corresponding partition of the hot spare disk corresponding to the failed disk block by block, that is, the data corresponding to each block in the failed disk is sequentially written into the corresponding partition of the hot spare disk corresponding to the failed disk in a serial manner, so as to implement data writing and recovery in order.
It should be noted that the receiving of the data corresponding to the partition and the writing of the data corresponding to the partition by the master control node may be performed simultaneously, that is, while receiving the data corresponding to one partition, the data corresponding to the previously received partition may be written into the corresponding partition of the corresponding hot spare disk. Of course, the main control node may also receive data corresponding to one partition and then receive data corresponding to another partition after writing the data into the corresponding partition of the corresponding hot spare disk. This is not a limitation of the present application.
In the data recovery method provided in the embodiment of the present application, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the method may further include:
redirecting the I/O request of the host to the secondary node;
after replacing the corresponding failed disk with the hot spare disk to reconstruct the RAID array with the normal disk, the method may further include:
the host's I/O request is redirected to the master node.
In this application, if the number of failed disks of the RAID array exceeds the fault tolerance of the RAID array, the I/O requests of the host may be redirected to the auxiliary node, that is, the auxiliary node receives and processes the host's I/O requests and stores the corresponding data, which ensures that the host's I/O requests can still be processed normally. Moreover, after the redirection, the mirroring relationship between the main control node and the auxiliary node can still be maintained (as described below, the main control node then keeps a mirror of the data newly issued by the host). In addition, once the host's I/O requests are redirected, the main control node can go offline (i.e., it no longer receives and processes host I/O requests) and perform the reconstruction and recovery of the data offline.
In addition, after the hot spare disk is used to replace the corresponding failed disk to form a RAID array with the normal disk again, the main control node may redirect the I/O request of the host to the main control node, that is, the main control node continues to receive, process, and execute the I/O request from the host, so that the storage system recovers to a normal state.
When redirecting an I/O request of a host to an auxiliary node, a data recovery method provided in an embodiment of the present application may further include:
and storing the data newly sent by the host in the main control node, and taking the data newly sent by the host stored in the main control node as mirror image data of the auxiliary node.
In this application, when the host's I/O requests are redirected to the auxiliary node, the data newly sent by the host may also be stored in the main control node (that is, the data corresponding to the newly issued I/O requests is stored not only in the auxiliary node but also in the main control node), so that the newly sent data held by the main control node serves as mirror image data of the auxiliary node. Therefore, if, after the auxiliary node writes the data to disk, the number of failed disks in the auxiliary node exceeds the fault tolerance, the mirror image data in the main control node can be used to recover the data of the failed disks in the auxiliary node, in a process similar to the one by which the main control node recovers the data of its own failed disks, which improves the reliability of data storage in the auxiliary node.
In the data recovery method provided in the embodiment of the present application, if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the method may further include:
calculating by using the blocks of the normal disk contained in each stripe in the RAID array to obtain the blocks of the hot spare disk corresponding to the fault disk;
and after all the data lost by the failed disk is recovered, replacing the failed disk by using the hot standby disk so as to form a RAID array together with the normal disk.
In this application, in step S12, if it is monitored that the number of failed disks of the RAID array in the main control node does not exceed the fault tolerance of the RAID array, data recovery may be performed by using a data redundancy mechanism inside the RAID array. Specifically, the exclusive or calculation is performed by using the blocks of the normal disk included in each stripe in the RAID array to obtain the blocks of the hot spare disk corresponding to the failed disk. Then, after all the data lost by the failed disk are recovered, that is, after the data of the failed disk of all the stripes in the RAID array are recovered, the failed disk is replaced by the hot spare disk corresponding to the failed disk to form the RAID array with the normal disk again, so that the data of the failed disk can be recovered when the number of the failed disk does not exceed the fault tolerance, and the data reliability of the storage system is improved.
Specifically, referring to fig. 7, which shows a schematic diagram of reconstructing and recovering the data of a failed disk when the number of failed disks does not exceed the fault tolerance, as provided in the embodiment of the present application, the master node in fig. 7 has a RAID5 array formed by five disks; if only one disk of the master node fails, the data can be recovered through the internal mechanism of the RAID5 array. Fig. 7 illustrates the case where disk 1 in the master node fails and a hot spare disk is used to reconstruct the data lost by disk 1: for example, strip1 in the hot spare disk is obtained by XOR-ing strip2A on disk 2, strip3B on disk 3, strip4A on disk 4 and the P check block (Parity1) on disk 5, and the disk-1 data of the other stripes can be recovered similarly. When the hot spare disk has recovered all the data lost by disk 1, it replaces disk 1 and forms a new RAID5 array with disks 2-5 to process the host's I/O requests.
In the data recovery method provided in the embodiment of the present application, monitoring whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array may include:
and regularly monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array.
In the present application, the main control node may monitor at regular intervals whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array. Specifically, a timer may be used for the periodic monitoring, so that the relationship between the number of failed disks and the fault tolerance is found in time and the appropriate data reconstruction and recovery method is then applied in time, which improves the data reliability of the storage system. The timing interval may be set according to practical experience, which is not limited in this application.
Of course, the master control node may also monitor whether the number of failed disks of the RAID array in the master control node exceeds the fault tolerance of the RAID array in real time.
The data recovery method provided by the embodiment of the application can further include:
and if the number of the fault disks of the RAID array does not exceed the fault tolerance of the RAID array, returning to the step of executing the timed monitoring of whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array.
In the present application, when the main control node performs the timed monitoring, if it is monitored that the number of failed disks of the RAID array does not exceed the fault tolerance of the RAID array, the method may return to the step of monitoring at regular intervals whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array, that is, the monitoring is performed in a timed loop. In this way, the relationship between the number of failed disks and the fault tolerance is found in time, and the appropriate data reconstruction and recovery method is then applied in time, which improves the data reliability of the storage system.
It should be noted that, while returning to the step of monitoring at regular intervals whether the number of failed disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array, the main control node may also perform data recovery using the data redundancy mechanism inside the RAID array, so as to ensure the data reliability of the storage system.
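A timer-driven monitoring loop of the kind described above might look like the sketch below; the raid_array attributes and the two recovery callbacks are assumptions introduced for illustration, not names used by this application.

import time

def monitor_failed_disks(raid_array, recover_via_mirror, rebuild_internally,
                         interval_seconds=60):
    # Periodically compare the number of failed disks with the fault tolerance
    # of the RAID array and dispatch to the matching recovery path.
    while True:
        failed = raid_array.count_failed_disks()
        if failed > raid_array.fault_tolerance:
            # Beyond internal redundancy: recover using the auxiliary node's mirror data.
            recover_via_mirror(raid_array)
        elif failed > 0:
            # Within the fault tolerance: rebuild onto the hot spare disk from
            # the surviving blocks and parity of each stripe.
            rebuild_internally(raid_array)
        time.sleep(interval_seconds)  # interval set according to practical experience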
In the data recovery method provided in the embodiment of the present application, if the number of the auxiliary nodes is greater than 1, sending information of data lost in the failed disk to the auxiliary nodes, where the sending may include:
and sending the information of the lost data in the fault disk to one auxiliary node selected from the plurality of auxiliary nodes according to a preset selection strategy.
In this application, if the number of the auxiliary nodes is greater than 1, the main control node may select one auxiliary node from the multiple auxiliary nodes according to a preset selection policy (for example, a minimum workload selection policy or a best working performance (e.g., lowest failure rate) selection policy, etc.), and then the main control node may send the information of the missing data to the selected one auxiliary node, so as to reconstruct and recover the data in the failed disk by using the mirror image data in the auxiliary node.
In this way, only one of the plurality of auxiliary nodes participates in data recovery, so that the data recovery proceeds in a stable and orderly manner, which improves data reliability.
If the number of auxiliary nodes is 1, the main control node directly sends the information of the data lost in the failed disk to that auxiliary node, so that the auxiliary node participates in the recovery of the failed disk data.
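A minimal sketch of such a selection step is given below, assuming a least-workload policy and a hypothetical current_workload attribute on each auxiliary node; other policies (for example, lowest failure rate) would only change the key function.

def select_auxiliary_node(auxiliary_nodes):
    # Use the single auxiliary node directly; otherwise pick the one with the
    # smallest current workload (a stand-in for the preset selection policy).
    if len(auxiliary_nodes) == 1:
        return auxiliary_nodes[0]
    return min(auxiliary_nodes, key=lambda node: node.current_workload)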
In the data recovery method provided in the embodiment of the present application, after replacing the corresponding failed disk with the hot spare disk to reconstruct the RAID array with the normal disk, the method may further include:
and cleaning the mirror image data in the auxiliary node regularly.
In the present application, after the hot spare disk has replaced the corresponding failed disk to re-form the RAID array with the normal disk, the mirror image data in the auxiliary node may also be cleaned at regular intervals (specifically, the mirror image data in the auxiliary node may be cleaned by overwriting), so as to reduce memory occupation; the auxiliary node can then take more of the data newly sent by the host as mirror image data and participate in the recovery of failed disk data.
An embodiment of the present application further provides a data recovery apparatus, see fig. 8, which shows a schematic structural diagram of the data recovery apparatus provided in the embodiment of the present application, and the data recovery apparatus may include:
the backup module 81 is configured to backup data in a main control node in an auxiliary node in the multi-control node, so that the data in the auxiliary node is used as mirror image data of the main control node;
the monitoring module 82 is used for monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array;
the sending module 83 is configured to send information of data lost in the failed disk to the auxiliary node if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node obtains data corresponding to the information from the mirror image data and sends the data;
a write module 84, configured to receive data corresponding to the information, and write the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk;
and a first replacement module 85, configured to replace the corresponding failed disk with the hot spare disk to reconstruct the RAID array with the normal disk.
In an embodiment of the data recovery apparatus, the backup module 81 may include:
the backup unit is used for backing up the logical volume in the main control node in the auxiliary node to obtain a corresponding mirror image logical volume; the logical volume and the mirror image logical volume comprise a plurality of blocks, the blocks have mapping relations with the sub-blocks in the disk, and the main control node and the auxiliary nodes both store the mapping relations;
the sending module 83 may include:
the forming unit is used for forming data loss metadata corresponding to the logical volume according to the blocks and the mapping relation contained in the fault disk and sending the data loss metadata to the auxiliary node; the data loss metadata includes information on whether or not a block included in the logical volume is lost;
the auxiliary node is specifically configured to obtain a block number of the lost data according to the mapping relationship and the data loss metadata, and send data corresponding to the block number in the mirror image logical volume to the main control node.
In an embodiment of the present application, a forming unit of a data recovery apparatus may include:
a forming subunit, configured to form, according to the partition and the mapping relationship included in the failed disk, data loss metadata that corresponds to the logical volume and is organized by using a bitmap as data; wherein, each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost.
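The bitmap organization of the data loss metadata can be illustrated with the following sketch, in which the mapping relation is simplified to a dictionary from the blocks of the failed disk to logical-volume block numbers; the function names and this simplified mapping are assumptions made for illustration.

def build_loss_bitmap(failed_disk_blocks, block_mapping, total_blocks):
    # One bit per logical-volume block; a set bit marks a block whose data
    # was lost together with the failed disk.
    bitmap = bytearray((total_blocks + 7) // 8)
    for disk_block in failed_disk_blocks:
        block_no = block_mapping[disk_block]        # mapping relation lookup
        bitmap[block_no // 8] |= 1 << (block_no % 8)
    return bytes(bitmap)

def lost_block_numbers(bitmap, total_blocks):
    # On the auxiliary node: decode the bitmap back into the numbers of the
    # lost blocks, whose data is then read from the mirror logical volume.
    return [n for n in range(total_blocks) if bitmap[n // 8] & (1 << (n % 8))]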
The data recovery device provided in the embodiment of the present application may further include:
the judging module is used for judging whether each strip in the RAID array needs to recalculate the check blocks before the hot spare disk is used for replacing the corresponding fault disk;
and the first calculation module is used for calculating the check blocks according to the blocks in the stripe if the stripe which needs to recalculate the check blocks exists, and writing the calculated check blocks into the corresponding partitions of the hot spare disk corresponding to the failed disk.
In an embodiment of the present application, the writing module 84 may include:
and the writing unit is used for receiving the data corresponding to the information block by block and writing the data corresponding to the information block by block into a corresponding partition of the hot spare disk corresponding to the fault disk.
The data recovery device provided in the embodiment of the present application may further include:
the first redirection module is used for redirecting the I/O request of the host to the auxiliary node if the number of the fault disks of the RAID array exceeds the fault tolerance of the RAID array;
and the second redirection module is used for redirecting the I/O request of the host to the main control node after the corresponding fault disk is replaced by the hot standby disk so as to form a RAID array with the normal disk again.
The data recovery device provided in the embodiment of the present application may further include:
and the storage module is used for storing the data newly sent by the host into the main control node when the I/O request of the host is redirected to the auxiliary node, so that the data newly sent by the host stored in the main control node is used as the mirror image data of the auxiliary node.
The data recovery device provided in the embodiment of the present application may further include:
the second calculation module is used for calculating by using the blocks of the normal disks contained in each strip in the RAID array to obtain the blocks of the hot spare disks corresponding to the fault disks if the number of the fault disks of the RAID array does not exceed the fault tolerance of the RAID array;
and the second replacement module is used for replacing the failed disk by using the hot standby disk after all the data lost by the failed disk are recovered so as to form the RAID array together with the normal disk.
In an embodiment of the data recovery apparatus, the monitoring module 82 may include:
and the timing monitoring unit is used for regularly monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array.
In an embodiment of the data recovery apparatus provided in this application, the monitoring module 82 may further include:
and the return execution unit is used for returning to execute the step of regularly monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array if the number of the fault disks of the RAID array does not exceed the fault tolerance of the RAID array.
In the data recovery apparatus provided in the embodiment of the present application, if the number of the auxiliary nodes is greater than 1, the sending module 83 may include:
and the sending unit is used for sending the information of the lost data in the fault disk to one auxiliary node selected from the plurality of auxiliary nodes according to a preset selection strategy.
The data recovery device provided in the embodiment of the present application may further include:
and the timing cleaning module is used for regularly cleaning the mirror image data in the auxiliary node after the corresponding fault disk is replaced by the hot standby disk so as to form the RAID array with the normal disk again.
An embodiment of the present application further provides a data recovery device, see fig. 9, which shows a schematic structural diagram of the data recovery device provided in the embodiment of the present application, and the data recovery device may include:
a memory 91 for storing a computer program;
the processor 92, when executing the computer program stored in the memory 91, may implement the following steps:
backing up data in a main control node in an auxiliary node in a multi-control node to take the data in the auxiliary node as mirror image data of the main control node; monitoring whether the number of fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array; if yes, sending the information of the lost data in the fault disk to the auxiliary node, and acquiring and sending data corresponding to the information from the mirror image data by the auxiliary node; receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the fault disk; and replacing the corresponding fault disk by the hot standby disk so as to reconstruct the RAID array with the normal disk.
An embodiment of the present application further provides a readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the following steps may be implemented:
backing up data in a main control node in an auxiliary node in a multi-control node so as to take the data in the auxiliary node as mirror image data of the main control node; monitoring whether the number of fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array; if so, sending the information of the lost data in the fault disk to the auxiliary node, and acquiring the data corresponding to the information from the mirror image data by the auxiliary node and sending the data; receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the fault disk; and replacing the corresponding fault disk by the hot standby disk so as to reconstruct the RAID array with the normal disk.
The readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
For a description of a relevant part in a data recovery device, a device, and a readable storage medium provided by the present application, reference may be made to a detailed description of a corresponding part in a data recovery method provided by an embodiment of the present application, which is not described herein again.
It is noted that, herein, relational terms such as first and second, and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. In addition, the parts of the above technical solutions provided in the embodiments of the present application that are consistent with the implementation principles of corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for data recovery, comprising:
backing up data in a main control node in an auxiliary node in a multi-control node to take the data in the auxiliary node as mirror image data of the main control node;
monitoring whether the number of fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array;
if so, sending the information of the lost data in the fault disk to the auxiliary node, and acquiring and sending data corresponding to the information from the mirror image data by the auxiliary node;
receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the fault disk;
replacing the corresponding fault disk by the hot standby disk to form a RAID array with a normal disk;
before replacing the corresponding failed disk with the hot spare disk, the method further comprises:
judging whether each strip in the RAID array needs to recalculate check blocks;
and if the stripe which needs to recalculate the check blocks exists, calculating the check blocks according to the blocks in the stripe, and writing the calculated check blocks into the corresponding subareas of the hot standby disk corresponding to the fault disk.
2. The data recovery method of claim 1, wherein backing up data in the primary control node in a secondary node in the multi-control node comprises:
backing up the logical volume in the main control node in the auxiliary node to obtain a corresponding mirror image logical volume; the logical volume and the mirror image logical volume comprise a plurality of blocks, the blocks have mapping relations with sub-blocks in a disk, and the mapping relations are stored in the main control node and the auxiliary nodes;
sending information of the lost data in the failed disk to the auxiliary node, including:
forming data loss metadata corresponding to the logical volume according to the blocks contained in the fault disk and the mapping relation, and sending the data loss metadata to the auxiliary node; the data loss metadata includes information on whether or not a block included in the logical volume is lost;
the auxiliary node acquires the data corresponding to the information from the mirror image data and sends the data, and the method comprises the following steps:
and the auxiliary node acquires the block number of the lost data according to the mapping relation and the data loss metadata, and sends the data corresponding to the block number in the mirror image logical volume to the main control node.
3. The data recovery method according to claim 2, wherein forming the metadata of data loss corresponding to the logical volume according to the partitions included in the failed disk and the mapping relationship comprises:
according to the blocks contained in the fault disk and the mapping relation, the data loss metadata which corresponds to the logical volume and takes a bitmap as a data organization mode is formed; wherein each bit in the bitmap indicates whether a corresponding block contained in the logical volume is lost.
4. The data recovery method according to claim 1, wherein receiving the data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk includes:
and receiving the data corresponding to the information block by block, and writing the data corresponding to the information block by block into a corresponding partition of the hot spare disk corresponding to the fault disk.
5. The data recovery method of claim 1, wherein if the number of failed disks of the RAID array exceeds the fault tolerance of the RAID array, further comprising:
redirecting the host's I/O requests to the secondary node;
after replacing the corresponding failed disk with the hot spare disk to reconstruct a RAID array with a normal disk, the method further includes:
redirecting the host's I/O request to the master node.
6. The data recovery method of claim 5, further comprising, when redirecting the host's I/O requests to the secondary node:
and storing the data newly sent by the host in the main control node, so that the data newly sent by the host stored in the main control node is used as mirror image data of the auxiliary node.
7. The data recovery method of claim 1, wherein if the number of failed disks of the RAID array does not exceed the fault tolerance of the RAID array, further comprising:
calculating by using the blocks of the normal disk included in each stripe in the RAID array to obtain the blocks of the hot spare disk corresponding to the fault disk;
and after all the data lost by the failed disk are recovered, replacing the failed disk by the hot standby disk so as to form a RAID array together with the normal disk.
8. The method of claim 1, wherein monitoring whether the number of failed disks of the RAID array in the master node exceeds the fault tolerance of the RAID array comprises:
and monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array at regular time.
9. The data recovery method of claim 8, further comprising:
if the number of the fault disks of the RAID array does not exceed the fault tolerance of the RAID array, returning to the step of monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array at regular time.
10. The data recovery method according to claim 1, wherein if the number of the secondary nodes is greater than 1, sending information of the missing data in the failed disk to the secondary nodes, includes:
and sending the information of the lost data in the fault disk to one auxiliary node selected from a plurality of auxiliary nodes according to a preset selection strategy.
11. The method according to claim 1, further comprising, after replacing the corresponding failed disk with the hot spare disk to reconstruct a RAID array with a normal disk:
and cleaning mirror image data in the auxiliary node regularly.
12. A data recovery apparatus, comprising:
the backup module is used for backing up data in a main control node in an auxiliary node in a multi-control node so as to take the data in the auxiliary node as mirror image data of the main control node;
the monitoring module is used for monitoring whether the number of the fault disks of the RAID array in the main control node exceeds the fault tolerance of the RAID array;
the sending module is used for sending the information of the lost data in the fault disk to the auxiliary node if the number of the fault disks of the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node acquires the data corresponding to the information from the mirror image data and sends the data;
the writing module is used for receiving the data corresponding to the information and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the fault disk;
the first replacement module is used for replacing the corresponding fault disk by using the hot standby disk so as to form a RAID array together with a normal disk;
the judging module is used for judging whether each strip in the RAID array needs to recalculate the check blocks before the hot spare disk is used for replacing the corresponding fault disk;
and the first calculation module is used for calculating the check blocks according to the blocks in the stripe if the stripe which needs to recalculate the check blocks exists, and writing the calculated check blocks into the corresponding partitions of the hot spare disk corresponding to the failed disk.
13. A data recovery apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data recovery method of any one of claims 1 to 11 when executing said computer program.
14. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data recovery method according to any one of claims 1 to 11.
CN202211409858.4A 2022-11-11 2022-11-11 Data recovery method, device and equipment and readable storage medium Active CN115454727B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211409858.4A CN115454727B (en) 2022-11-11 2022-11-11 Data recovery method, device and equipment and readable storage medium
PCT/CN2023/093083 WO2024098696A1 (en) 2022-11-11 2023-05-09 Data recovery method, apparatus and device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211409858.4A CN115454727B (en) 2022-11-11 2022-11-11 Data recovery method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN115454727A CN115454727A (en) 2022-12-09
CN115454727B true CN115454727B (en) 2023-03-10

Family

ID=84295788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211409858.4A Active CN115454727B (en) 2022-11-11 2022-11-11 Data recovery method, device and equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN115454727B (en)
WO (1) WO2024098696A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115454727B (en) * 2022-11-11 2023-03-10 苏州浪潮智能科技有限公司 Data recovery method, device and equipment and readable storage medium
CN116166203B (en) * 2023-04-19 2023-07-14 苏州浪潮智能科技有限公司 Method, device, equipment and medium for managing naming space of RAID card

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140258612A1 (en) * 2013-03-07 2014-09-11 Dot Hill Systems Corporation Mirrored data storage with improved data reliability
CN104317678A (en) * 2014-10-30 2015-01-28 浙江宇视科技有限公司 Method and device for repairing RAID (redundant array of independent disks) without interrupting data storage service
CN114281591A (en) * 2021-12-30 2022-04-05 郑州云海信息技术有限公司 Storage node fault processing method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255739B1 (en) * 2008-06-30 2012-08-28 American Megatrends, Inc. Achieving data consistency in a node failover with a degraded RAID array
CN106371947B (en) * 2016-09-14 2019-07-26 郑州云海信息技术有限公司 A kind of multiple faults disk data reconstruction method and its system for RAID
US10417069B2 (en) * 2017-12-01 2019-09-17 International Business Machines Corporation Handling zero fault tolerance events in machines where failure likely results in unacceptable loss
CN115454727B (en) * 2022-11-11 2023-03-10 苏州浪潮智能科技有限公司 Data recovery method, device and equipment and readable storage medium


Also Published As

Publication number Publication date
CN115454727A (en) 2022-12-09
WO2024098696A1 (en) 2024-05-16

Similar Documents

Publication Publication Date Title
CN115454727B (en) Data recovery method, device and equipment and readable storage medium
US10372537B2 (en) Elastic metadata and multiple tray allocation
EP2250563B1 (en) Storage redundant array of independent drives
US7681104B1 (en) Method for erasure coding data across a plurality of data stores in a network
US7480909B2 (en) Method and apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking
US7159150B2 (en) Distributed storage system capable of restoring data in case of a storage failure
US7058762B2 (en) Method and apparatus for selecting among multiple data reconstruction techniques
US7231493B2 (en) System and method for updating firmware of a storage drive in a storage network
US7281158B2 (en) Method and apparatus for the takeover of primary volume in multiple volume mirroring
CN104813290B (en) RAID investigation machines
US20070239952A1 (en) System And Method For Remote Mirror Data Backup Over A Network
US20070050667A1 (en) Method and apparatus for ensuring data integrity in redundant mass storage systems
US20120089799A1 (en) Data backup processing method, data storage node apparatus and data storage device
CN102521058A (en) Disk data pre-migration method of RAID (Redundant Array of Independent Disks) group
CN101567211A (en) Method for improving usability of disk and disk array controller
US20060242540A1 (en) System and method for handling write commands to prevent corrupted parity information in a storage array
CN103942112A (en) Magnetic disk fault-tolerance method, device and system
US8726129B1 (en) Methods of writing and recovering erasure coded data
CN107729536A (en) A kind of date storage method and device
US11093339B2 (en) Storage utilizing a distributed cache chain and a checkpoint drive in response to a data drive corruption
CN106933707B (en) Data recovery method and system of data storage device based on raid technology
CN115981926B (en) Method, device and equipment for improving disk array performance
US11860746B2 (en) Resilient data storage system with efficient space management
CN115048061A (en) Cold data storage method based on Raft
US11494090B2 (en) Systems and methods of maintaining fault tolerance for new writes in degraded erasure coded distributed storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant