WO2021072917A1 - RAID write hole protection method, system, and storage medium - Google Patents

RAID write hole protection method, system, and storage medium

Info

Publication number
WO2021072917A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
striped
raid
data
data block
Prior art date
Application number
PCT/CN2019/121096
Other languages
English (en)
French (fr)
Inventor
施培任
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US 17/642,643 (published as US11650880B2)
Publication of WO2021072917A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; error correction; monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F 11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F 11/1084 Degraded mode, e.g. caused by single or multiple storage removals or disk failures
    • G06F 11/1096 Parity calculation or recalculation after configuration or reconfiguration of the system
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 Management of the data involved in backup or backup restore
    • G06F 11/1456 Hardware arrangements for backup
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1469 Backup restoration techniques

Definitions

  • The present invention relates to the technical field of computer data storage, and in particular to a RAID write hole protection method, system, and storage medium.
  • RAID stands for Redundant Array of Independent Disks.
  • Among RAID levels, RAID5 and RAID6 offer the best overall balance of usable capacity and performance, and are the most widely used.
  • A stripe includes multiple regular data blocks and one parity data block, and each data block is stored on a different RAID member disk.
  • When a member disk fails, its data blocks can be recalculated from the other member disks. For example, if member disk No. 1 fails, its data blocks can be restored from the regular data blocks on member disks No. 2 to No. 4 and the parity data block on member disk No. 5. After one member disk fails, and before it is repaired, the RAID5 array is in a degraded state.
  • Compared with RAID5, which tolerates the failure of one member disk, a RAID6 stripe has two parity data blocks, so the failure of up to two member disks is tolerated. The cause of the write hole problem is the same as in RAID5 and is not repeated here.
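  • The parity arithmetic underlying the above can be illustrated with a short sketch. The following Python snippet is illustrative only (it is not part of the patent text): it shows that a RAID5 parity block is the XOR of the regular data blocks in the stripe, and that a failed member disk's block is recoverable from the surviving blocks plus parity.

```python
import os

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A 5-disk RAID5 stripe: member disks 1-4 hold regular data blocks,
# member disk 5 holds the parity block (the XOR of blocks 1-4).
data_blocks = [os.urandom(16) for _ in range(4)]
parity = xor_blocks(data_blocks)

# If member disk 1 fails, its block is recovered from disks 2-4 plus parity.
recovered = xor_blocks(data_blocks[1:] + [parity])
assert recovered == data_blocks[0]
```

Because XOR is its own inverse, any single missing block in the stripe can be rebuilt this way, which is exactly why a torn stripe write (the write hole) leaves the array unable to tell which block is stale.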
  • The purpose of the present invention is to provide a RAID write hole protection method, system, and storage medium that avoid the RAID write hole problem.
  • To achieve the above purpose, the present invention provides the following technical solution:
  • A RAID write hole protection method, including:
  • presetting a log area, and setting the log area to the enabled state after the RAID is degraded; when the log area is enabled, before each stripe write operation, judging whether the data block of the RAID's failed member disk in the stripe is a parity data block; if not, judging whether the data blocks to be written in the stripe include the data block to be written to the failed member disk; if so, backing up the data block to be written to the failed member disk in the log area; if not, calculating the failed member disk's data block according to the RAID algorithm and backing it up in the log area, or backing up each data block to be written in the log area;
  • when the degraded RAID is started after a failure, using the log area to perform data repair.
  • The method further includes: each time data blocks are backed up in the log area, saving the associated information of the backup operation to form a striped log;
  • accordingly, using the log area to perform data repair includes:
  • determining each valid striped log from the associated information of the striped logs in the log area, and performing data repair based on the determined valid striped logs.
  • The saved associated information includes at least: the stripe sequence number and the log number of the striped log;
  • determining each valid striped log from the associated information of the striped logs in the log area includes:
  • traversing each striped log in the log area and, for the striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  • The saved associated information includes at least: the stripe sequence number, the log number, and the striped log check value of the striped log;
  • determining each valid striped log from the associated information of the striped logs in the log area includes:
  • traversing each striped log in the log area, filtering out incomplete striped logs according to the striped log check values, and, for the remaining striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  • The saved associated information includes at least: the stripe sequence number, log number, striped log check value, and log version number of the striped log;
  • after data repair is completed, the method further includes: updating the log area version number in the overall information recorded in the log area;
  • determining each valid striped log from the associated information of the striped logs in the log area includes:
  • traversing each striped log in the log area, filtering out the striped logs whose log version number does not match the current log area version number, filtering out incomplete striped logs according to the striped log check values, and, for the remaining striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  • Performing data repair based on the determined valid striped logs includes:
  • for a valid striped log that backs up the data block of the failed member disk, calculating the parity block data from the backed-up data and the data read from the other member disks, and using it to repair the parity disk; for a valid striped log that backs up each data block to be written, writing each data block to be written to the corresponding healthy member disk to complete data repair.
  • the preset log area includes:
  • a space of a preset size is divided from each member disk of the RAID, and the set log area is formed by each divided space.
  • the preset log area includes:
  • the log area is set on a solid state disk other than the RAID member disks.
  • a RAID write hole protection system including:
  • the log area setting module is used to set the log area in advance, and set the log area to the active state after the RAID is degraded;
  • the first judging module is configured to judge, each time before a stripe write operation is performed when the log area is in the enabled state, whether the data block of the RAID's failed member disk in the stripe is a parity data block;
  • the second judgment module is configured to judge whether each data block to be written in the stripe includes the data block to be written to the failed member disk;
  • if included, the first backup module is triggered; if not included, the second backup module is triggered;
  • the first backup module is configured to back up the data blocks to be written to the failed member disk in the log area;
  • the second backup module is configured to calculate the data block of the failed member disk according to the RAID algorithm and perform backup in the log area, or back up each data block to be written in the log area;
  • the data repair module is configured to use the log area to perform data repair when the degraded RAID is started after a failure.
  • a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, implements the steps of the RAID write hole protection method described in any one of the above items.
  • Application of the technical solution provided by the embodiments of the present invention does not involve communication between multiple controllers; that is, the solution of the present application can be applied to single-controller as well as multi-controller scenarios.
  • A log area is preset; after the RAID is degraded, the log area is set to the enabled state, and the corresponding backup operation is performed before each stripe write operation, thereby avoiding the write hole problem.
  • If the data block of the failed member disk is backed up, the backup is a complete copy: the failed member disk's data block can be obtained directly from the log area, and that data is correct.
  • The data blocks of the other healthy member disks can then be read and combined with the backed-up data block of the failed member disk to calculate the correct parity data block, avoiding errors in the parity data block. If each data block to be written is backed up, i.e., the data blocks to be written to healthy member disks are backed up, the write operation does not involve the failed member disk, and each backed-up data block can be written directly to the corresponding healthy member disk, so that the data on each healthy member disk is correct. The solution of the present application therefore avoids the RAID write hole problem.
  • Figure 1 is an implementation flow chart of a RAID write hole protection method in the present invention
  • Figure 2 is a schematic diagram of the structure of a RAID write hole protection system in the present invention.
  • the core of the present invention is to provide a RAID write hole protection method, which avoids the RAID write hole problem.
  • Figure 1 is a flowchart of an implementation of a RAID write hole protection method in the present invention.
  • the RAID write hole protection method may include the following steps:
  • Step S101 Pre-set a log area, and set the log area to an enabled state after the RAID is degraded.
  • The log area can be set based on each member disk of the RAID, or hard disks other than the RAID member disks can be used as the log space.
  • the preset log area may be specifically: a space of a preset size is divided from each member disk of the RAID, and the set log area is formed by each divided space.
  • a space is divided from each hard disk in the RAID, and these spaces can constitute the set log area.
  • the set log area is in the inactive state.
  • the log area needs to be set to the enabled state. It is understandable that after the RAID is degraded, a log area that can be used is composed of the corresponding space of each healthy disk, so as to perform data backup in the subsequent steps based on the log area.
  • the log area is set based on each member disk of the RAID, which has low cost and is convenient for implementation of the solution.
  • Solid state disks other than the RAID member disks may also be used for the log area, for example a higher-performance NVMe solid state disk, or a 3D NAND solid state disk with higher performance and lifespan.
  • Step S102 When the log area is in the enabled state, each time before performing a stripe write operation, it is determined whether the data block in the stripe of the failed member disk of the RAID is a parity data block. If not, step S103 is executed.
  • each regular data block in the data block to be written can be written to the corresponding healthy member disk.
  • the stripe write operation can be directly started.
  • Step S103 Determine whether each data block to be written in the stripe includes the data block to be written to the failed member disk. If it is included, step S104 is executed, and if it is not included, step S105 is executed.
  • If the data block of the RAID's failed member disk in the stripe is not a parity data block, it is a regular data block; however, the regular data blocks involved in each stripe write operation can differ.
  • For example, suppose a stripe includes data blocks No. 1 to No. 5, of which No. 1 to No. 4 are regular data blocks stored in sequence on member disks No. 1 to No. 4, and No. 5 is the parity data block stored on member disk No. 5.
  • If a stripe write operation needs to write data blocks No. 1, No. 2, and No. 3 to the corresponding member disks, and the failed member disk is No. 1, then it can be determined that the data blocks to be written include the data block to be written to the failed member disk.
  • If a stripe write operation needs to write data blocks No. 2 and No. 4 to the corresponding member disks, and the failed member disk is No. 1, then it can be determined that the data blocks to be written do not include the data block to be written to the failed member disk.
  • Step S104 Back up the data blocks to be written to the failed member disk in the log area.
  • For example, if the stripe write operation needs to write data blocks No. 1, No. 2, and No. 3 to the corresponding member disks, the data blocks to be written are No. 1, No. 2, and No. 3, of which the No. 1 data block is the one to be written to the failed member disk; the No. 1 data block is therefore backed up in the log area.
  • Step S105 Calculate the data block of the failed member disk according to the RAID algorithm and perform backup in the log area, or back up each data block to be written in the log area.
  • Step S105 offers two backup methods once it is triggered. In practice, either can be chosen according to actual needs, and the choice can be switched at any time; whichever is used, subsequent data repair can be achieved.
  • For example, suppose a stripe includes data blocks No. 1 to No. 5, of which No. 1 to No. 4 are regular data blocks stored in sequence on member disks No. 1 to No. 4, and No. 5 is the parity data block stored on member disk No. 5.
  • Suppose a stripe write operation needs to write data blocks No. 1, No. 2, and No. 5 to the corresponding member disks, and the failed member disk is No. 4; then the data blocks to be written do not include the data block to be written to the failed member disk.
  • One backup method is based on the RAID algorithm: the failed member disk's data block is calculated from the data blocks No. 1, No. 2, and No. 5 to be written, combined with the No. 3 data block read from its member disk; that is, the data block of member disk No. 4 is calculated and then backed up in the log area.
  • Another backup method is to directly back up each data block to be written in the log area.
  • the data blocks No. 1, 2, and 5 to be written are directly backed up in the log area.
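  • As a hedged illustration of the two backup methods described above (the block values and the log-entry layout are invented for the example and are not from the patent), the failed disk No. 4's block can either be derived by XOR from the blocks being written plus the healthy block No. 3, or every block to be written can simply be logged as-is:

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Stripe layout: blocks 1-4 are regular data, block 5 is parity (XOR of 1-4).
d3_on_disk = bytes([3]) * 8                      # block 3, read from healthy disk 3
new_d1, new_d2 = bytes([1]) * 8, bytes([2]) * 8  # new data for disks 1 and 2
d4 = bytes([4]) * 8                              # block on the failed disk 4
new_parity = xor_blocks([new_d1, new_d2, d3_on_disk, d4])  # new block 5

# Backup method one: derive the failed disk's block from the blocks being
# written (No. 1, 2, 5) plus the remaining healthy block (No. 3), then log it.
derived_d4 = xor_blocks([new_d1, new_d2, new_parity, d3_on_disk])
assert derived_d4 == d4

# Backup method two: simply log every block to be written as-is.
log_entry = {"stripe": 7, "blocks": {1: new_d1, 2: new_d2, 5: new_parity}}
```

Method one logs one block per write; method two logs several but needs no XOR at backup time, which matches the text's point that either choice supports later repair.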
  • After the backup, the current stripe write operation can be performed. That is, for any stripe write operation, once the data blocks that need to be backed up have been determined and backed up, the stripe write operation itself can be executed.
  • Each time data blocks are backed up in the log area, the method may further include: saving the associated information of the backup operation to form a striped log.
  • each valid stripe log can be determined from the associated information of each stripe log in the log area, and data can be restored based on the determined valid stripe logs.
  • Associated information refers to the relevant parameter data generated when the data block is backed up, and the specific items included can be set and adjusted as needed.
  • each data block in this backup and the corresponding associated information constitute a striped log.
  • the saved associated information includes at least: the strip sequence number and the log number of the striped log.
  • each striped log in the log area is traversed, and for each striped log with the same strip sequence number, the striped log with the largest log number is taken as the effective striped log.
  • Multiple write operations may be performed on the same stripe, so there may be multiple striped logs for the same stripe; the log number and the stripe sequence number are used to distinguish them.
  • The stripe sequence number is the sequence number, within the RAID, of the stripe that the striped log targets; the corresponding stripe, and the start and end sectors on the corresponding member disks, can be found from the stripe sequence number.
  • In this embodiment, the log number is an incrementing number, so that the order of the striped logs for the same stripe can be identified: when the stripe sequence numbers are the same, the striped log with the largest log number is the valid one.
  • Other, non-incrementing log numbering schemes can also be used, as long as the order of the striped logs for the same stripe can be distinguished; this does not affect the implementation of the present invention.
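  • A minimal sketch of this selection rule follows (field names such as `stripe_seq` and `log_no` are illustrative, not from the patent):

```python
def pick_valid_logs(logs):
    """For each stripe sequence number, keep only the newest striped log,
    identified by the largest (incrementing) log number."""
    newest = {}
    for log in logs:
        seq = log["stripe_seq"]
        if seq not in newest or log["log_no"] > newest[seq]["log_no"]:
            newest[seq] = log
    return list(newest.values())

logs = [
    {"stripe_seq": 3, "log_no": 10},
    {"stripe_seq": 3, "log_no": 14},  # a later write to stripe 3 supersedes log 10
    {"stripe_seq": 8, "log_no": 11},
]
valid = pick_valid_logs(logs)
```

Here stripe 3 was written twice, so only the log numbered 14 remains valid, together with the single log for stripe 8.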
  • the saved associated information includes at least: the strip sequence number, the log number, and the strip log check value of the strip log;
  • determining each valid striped log through the associated information of each striped log in the log area may specifically include the following two steps:
  • In the first step, the associated information of each striped log in the log area is traversed, and incomplete striped logs are filtered out according to the striped log check value in each piece of associated information;
  • in the second step, for the remaining striped logs with the same stripe sequence number, the striped log with the largest log number is taken as the valid striped log.
  • Here the associated information also includes the striped log check value, so that whether the data blocks in a striped log are complete can be determined from it; a public checksum algorithm such as CRC32 can be chosen. Because the check value is added to the associated information in this embodiment, incomplete striped logs are filtered out, which helps detect striped log abnormalities in time and in turn improves the accuracy of data repair.
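  • For illustration, an integrity check of this kind can be sketched with Python's standard `zlib.crc32` (the log-entry layout is invented for the example):

```python
import zlib

def make_log(stripe_seq, log_no, payload):
    """Build a striped-log entry whose check value covers the backed-up data."""
    entry = {"stripe_seq": stripe_seq, "log_no": log_no, "payload": payload}
    entry["crc"] = zlib.crc32(payload)
    return entry

def is_complete(entry):
    """A striped log whose stored check value no longer matches its payload
    (e.g. a backup interrupted mid-write) is treated as incomplete."""
    return zlib.crc32(entry["payload"]) == entry["crc"]

good = make_log(1, 5, b"block data")
torn = make_log(2, 6, b"block data")
torn["payload"] = b"block dat\x00"   # simulate a partially written log

survivors = [e for e in (good, torn) if is_complete(e)]
```

Only `good` survives the filter; the torn entry is discarded before the max-log-number selection runs.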
  • the saved associated information includes at least: the strip sequence number, log number, strip log check value, and log version number of the strip log;
  • After step S106, the method also needs to include: updating the log area version number in the overall information recorded in the log area. That is, the log area version number needs to be updated after each repair is completed.
  • determining each valid striped log through the associated information of each striped log in the log area may specifically include the following three steps:
  • In the first step, the associated information of each striped log in the log area is traversed, and the striped logs whose log version number does not match the current log area version number are filtered out;
  • in the second step, incomplete striped logs are filtered out according to the striped log check value in each piece of associated information;
  • in the third step, for the remaining striped logs with the same stripe sequence number, the striped log with the largest log number is taken as the valid striped log.
  • After a repair is completed, the RAID may still be in a degraded state, i.e., the staff may not have been able to replace the failed disk with a healthy one in time. Write hole protection therefore needs to continue in the degraded state, meaning the log area keeps backing up data blocks and generating striped logs. To distinguish which striped logs have already been used and which have not, this embodiment includes a log version number in the associated information, used in conjunction with the log area version number in the overall information.
  • The log area version number, combined with each striped log's log version number, determines which striped logs are unused, where "unused" means data repair has not yet been performed based on that striped log.
  • For example, suppose the log area is enabled and 16 stripe write operations are performed, with 16 striped logs backed up in the log area accordingly.
  • Suppose the log version number of each of these striped logs is 001,
  • and the log area version number is also 001.
  • Data repair is then performed based on these 16 striped logs.
  • Afterwards the log area version number is updated, for example to 002.
  • If 20 more stripe write operations are then performed, 20 more striped logs are backed up in the log area, and the log version number of each of these striped logs is 002.
  • At the next repair, the process can be as follows: first, the striped logs whose log version number does not match the current log area version number are filtered out, i.e., the 16 striped logs with version number 001 are discarded; then, according to the striped log check value in each piece of associated information, the incomplete striped logs among the remaining 20 are filtered out, for example 3 incomplete striped logs, leaving 17.
  • Finally, for striped logs with the same stripe sequence number, the one with the largest log number is taken as the valid striped log. For example, if 3 of the 17 striped logs share the same stripe sequence number and the remaining 14 all have distinct stripe sequence numbers, 15 valid striped logs are determined. After repairing with these 15 striped logs, the log area version number can be updated to 003.
  • the log area version number is updated after each repair, so as to cooperate with the log version number to determine which striped logs have been used.
  • When the log area version number is updated, it is incremented, i.e., increased by one.
  • Other update methods can also be used, as long as the updated log area version number differs from every historical log area version number; for example, the value can be increased by 2 each time, or a larger random number can be used.
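  • The three-step filtering described above, applied to the 16-log/20-log example, can be sketched as follows (the data layout and field names are illustrative, not from the patent):

```python
import zlib

def make_log(stripe_seq, log_no, version, payload=b"d"):
    """Illustrative striped-log entry: associated info plus backed-up payload."""
    return {"stripe_seq": stripe_seq, "log_no": log_no,
            "version": version, "payload": payload,
            "crc": zlib.crc32(payload)}

def valid_logs(logs, area_version):
    # Step 1: drop logs left over from earlier repairs (version mismatch).
    live = [l for l in logs if l["version"] == area_version]
    # Step 2: drop incomplete logs (stored check value no longer matches).
    live = [l for l in live if zlib.crc32(l["payload"]) == l["crc"]]
    # Step 3: per stripe sequence number, keep the largest log number.
    newest = {}
    for l in live:
        s = l["stripe_seq"]
        if s not in newest or l["log_no"] > newest[s]["log_no"]:
            newest[s] = l
    return list(newest.values())

logs = [make_log(i, i, version=1) for i in range(16)]              # already repaired
logs += [make_log(100, 100 + i, version=2) for i in range(3)]      # 3 writes, same stripe
logs += [make_log(200 + i, 120 + i, version=2) for i in range(14)] # 14 distinct stripes
for i in range(3):                                                 # 3 torn, incomplete logs
    torn = make_log(300 + i, 140 + i, version=2)
    torn["crc"] ^= 1
    logs.append(torn)

assert len(valid_logs(logs, area_version=2)) == 15                 # 14 + 1, as in the text
```

The 16 version-001 logs are discarded by the version filter, the 3 torn logs by the check value, and the 3 logs for the same stripe collapse to one, leaving the 15 valid striped logs of the worked example.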
  • the associated information may also include other parameters, for example, the log size of the striped log may also be included, and the log size represents the continuous occupied space size of the striped log. It should be noted that because the striped log is composed of backed up data blocks and corresponding associated information, the log size includes the backed up data blocks and associated information, that is, the size of the entire striped log. For another example, the associated information may also include the member disk number, data length, and so on.
  • the overall information in the log area can also include other parameters.
  • the overall information also includes: the enabled status of the log area, the log space size, the start address of the striped logs, the striped log space size, the number of striped log alignment areas, the log area check value, and so on, so that the staff can quickly and conveniently learn the status of the log area from the overall information.
  • the enabled status of the log area is used to indicate whether the log area is enabled, that is, after the RAID is degraded, the log area is enabled, and it is not enabled if it is not degraded.
  • If the RAID type is not RAID5 or RAID6, the log area is not enabled.
  • the striped log space is a fixed value, and the striped log space plus the space occupied by the overall information is the log space size.
  • the start address of the striped log refers to the start address of the first striped log.
  • Dividing the striped log space size by the number of striped log alignment areas gives the size of each log alignment area.
  • For example, if the striped log space size is 2 GB
  • and the preset number of striped log alignment areas is 1024, then the size of each log alignment area is 2 MB.
  • Here 2 MB means that each 2 MB space stores one or more striped logs, but no striped log crosses a 2 MB boundary; that is, the head of each log alignment area is the start position of some striped log. In this way, the start address of each log alignment area is aligned with the start address of the corresponding striped log, so striped logs can be located and searched more quickly.
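  • A small sketch of the alignment arithmetic (the addresses and constants are illustrative):

```python
STRIPED_LOG_SPACE = 2 * 1024 ** 3   # 2 GiB of striped-log space
ALIGN_AREAS = 1024                  # preset number of log alignment areas

# Striped log space size divided by the number of alignment areas
# gives the size of each log alignment area: here 2 MiB.
align_area_size = STRIPED_LOG_SPACE // ALIGN_AREAS
assert align_area_size == 2 * 1024 ** 2

def align_area_start(log_start_addr, index):
    """Each alignment area begins at the start address of some striped log,
    so a log can be located by jumping straight to its area boundary."""
    return log_start_addr + index * align_area_size

assert align_area_start(0, 3) == 6 * 1024 ** 2
```

Because no striped log straddles a 2 MiB boundary, a lookup only needs the area index, not a scan of the whole log space.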
  • the log area check value can realize the verification of the data integrity of the log area.
  • Step S106 When the degraded RAID is started after a failure, the log area is used to perform data repair.
  • the data can be repaired according to each striped log in the log area.
  • the data repairing method used can be different.
  • Usually, each valid striped log can be determined from the associated information of the striped logs in the log area, and data repair can be performed based on the determined valid striped logs. After the repair is completed, the overall information of the log area can be reinitialized; specifically, the log area version number can be updated.
  • If each data block to be written was backed up in the striped log, i.e., the data blocks to be written to healthy member disks were backed up, the write operation corresponding to that striped log did not involve the failed member disk; each backed-up data block can be written directly to the corresponding healthy member disk, so that the data on each healthy member disk is correct. Since the failed member disk's data block is derived from the correct healthy member disks, it is naturally correct as well, thus avoiding the write hole problem.
  • However, with this backup method the log area takes up more space.
  • the backup is an overall backup, that is, the data blocks of the failed member disks can be obtained directly through the striped log, so the write hole problem can also be avoided.
  • Specifically, the correct parity data block can be calculated and written to the parity disk, so that the data on the parity disk is accurate. Afterwards, if the data block of the failed member disk is needed, it can be accurately calculated from the healthy member disks.
  • the application can also obtain correct data on the failed member disk.
  • Specifically, performing data repair based on the determined valid striped logs may include:
  • Step 1: for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculate the parity block data from the backed-up data in the striped log and the data read from the member disks, and use the calculated parity block data to repair the parity disk;
  • Step 2: for any valid striped log in the log area, when the striped log backs up each data block to be written, write each data block to be written to the corresponding healthy member disk to complete the data repair.
  • That is, this implementation can verify and repair the parity disk based on the backed-up data: read the data blocks of the other healthy member disks and combine them with the failed member disk's data block backed up in the striped log to calculate the correct parity data block.
  • The calculated parity block data can be written directly to the parity disk, so that the failed member disk's data block can later be accurately determined from the data blocks of the healthy member disks, realizing write hole protection.
  • Alternatively, the parity data block can first be read from the parity disk; if the calculated parity data block is inconsistent with the one read, the parity data block on the parity disk is wrong, and an event can be recorded before the calculated parity block data is written to the parity disk.
  • In addition, the failed disk's data block can be calculated from the data blocks backed up in the striped log and the data blocks of the other member disks, denoted A; the failed disk's data block can also be calculated by reading the stripe's data blocks from all healthy member disks, denoted B. If A and B differ, a data block on a healthy member disk is wrong.
  • In that case, the correct data block is written to the corresponding healthy member disk to complete the data repair.
  • Here, the cause of such an error is a data error on the parity disk.
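  • The comparison of A and B and the subsequent parity repair can be sketched as follows (block values are invented for the example; a real implementation would read the blocks from the member disks):

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# 5-disk RAID5 stripe, disk 4 failed; its block was backed up in the log.
backed_up_d4 = bytes([4]) * 8
healthy_data = [bytes([1]) * 8, bytes([2]) * 8, bytes([3]) * 8]  # disks 1-3
parity_on_disk = bytes(8)   # stale parity left by an interrupted stripe write

# A: the failed disk's block as recorded in the striped log.
# B: the failed disk's block recomputed from all healthy disks.
a = backed_up_d4
b = xor_blocks(healthy_data + [parity_on_disk])

if a != b:
    # A healthy member disk (here the parity disk) holds wrong data:
    # recompute parity from the healthy data blocks plus the backed-up
    # block, then rewrite the parity disk.
    parity_on_disk = xor_blocks(healthy_data + [backed_up_d4])

# After repair, the failed disk's block is again derivable from the array.
assert xor_blocks(healthy_data + [parity_on_disk]) == backed_up_d4
```

In this sketch the mismatch pinpoints the stale parity block, and rewriting it restores the invariant that any single block can be rebuilt from the rest of the stripe.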
  • the embodiment of the present invention also provides a RAID write hole protection system, which can be cross-referenced with the above.
  • FIG. 2 is a schematic structural diagram of a RAID write hole protection system in the present invention, including:
  • the log area setting module 201 is used to set the log area in advance, and set the log area to an active state after the RAID is degraded;
  • the first judgment module 202 is configured to judge whether the data block of the failed member disk of the RAID in the stripe is a parity data block each time before performing a stripe write operation when the log area is in the enabled state;
  • the second judging module 203 is used to judge whether each data block to be written in the strip includes the data block to be written to the failed member disk;
  • if included, the first backup module 204 is triggered; if not included, the second backup module 205 is triggered;
  • the first backup module 204 is configured to back up data blocks to be written to the failed member disk in the log area;
  • the second backup module 205 is configured to calculate the data block of the failed member disk according to the RAID algorithm and perform backup in the log area, or back up each data block to be written in the log area;
  • the data repair module 206 is configured to use the log area to repair data when the degraded RAID is started after a failure.
  • a striped log generation module is further included, which is used to: each time a data block is backed up in the log area, save the associated information of that backup operation to form a striped log.
  • the data repair module 206 is specifically configured to: when the degraded RAID is started after a failure, determine each valid striped log through the associated information of each striped log in the log area, and Perform data repair based on each valid stripe log determined.
  • the associated information saved by the log generation module at least includes: the stripe sequence number and the log number of the striped log;
  • the data repair module 206 is specifically used for:
  • traversing the associated information of each striped log in the log area and, among striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  • the associated information saved by the log generation module includes at least: the strip sequence number, log number, and strip log check value of the strip log;
  • the data repair module 206 is specifically used for:
  • the associated information of each striped log in the log area is traversed, and incomplete striped logs are filtered out according to the striped log check value in each piece of associated information;
  • among the filtered striped logs with the same stripe sequence number, the striped log with the largest log number is taken as the valid striped log.
  • the associated information saved by the log generation module includes at least: the strip sequence number, log number, strip log check value, and log version number of the strip log;
  • a log area version number update module is further included, which is used to update the log area version number in the overall information recorded in the log area after the data repair module 206 uses the log area to perform data repair;
  • the data repair module 206 is specifically used for:
  • traversing the associated information of each striped log in the log area, filtering out striped logs whose log version number does not match the current log area version number, filtering out incomplete striped logs according to the striped log check value in each piece of associated information, and, among the filtered striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  • the data repair module 206 is also specifically used for: for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculating the parity block data based on the backup data in the striped log and the data read from each member disk, and using the calculated parity block data to repair the parity disk; and when the striped log backs up each data block to be written, writing each data block to be written to the corresponding healthy member disk to complete the data repair.
  • the log area setting module 201 is specifically configured to:
  • a space of a preset size is divided from each member disk of the RAID, and the set log area is formed by each divided space.
  • the log area setting module 201 is specifically configured to:
  • the log area is set by a solid state disk other than the RAID member disk.
  • the embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the RAID write hole protection method are implemented.
  • the computer-readable storage medium mentioned here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.


Abstract

A RAID write hole protection method, system, and storage medium, comprising: presetting a log area, and setting it to the enabled state after the RAID is degraded; while the log area is enabled, before each stripe write operation, judging whether the data block of the RAID's failed member disk in that stripe is a parity data block; if it is not a parity data block, judging whether the data blocks to be written in the stripe include a data block to be written to the failed member disk; if so, backing up the data block to be written to the failed member disk in the log area; if not, calculating the failed member disk's data block and backing it up in the log area, or backing up each data block to be written in the log area; and when the degraded RAID is started after a failure, using the log area to perform data repair. Applying this solution avoids the RAID write hole problem.

Description

RAID write hole protection method, system, and storage medium
This application claims priority to Chinese patent application No. 201910995233.2, entitled "RAID write hole protection method, system and storage medium", filed with the China Patent Office on October 18, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of computer data storage, and in particular to a RAID write hole protection method, system, and storage medium.
Background
Mechanical hard disks and solid state disks are currently the most commonly used data storage devices in computer systems, and modern servers and storage systems are generally equipped with multiple disks. To increase the capacity and performance of a single logical disk, RAID (Redundant Array of Independent Disks) technology can be used to combine multiple disks into an array. Mainstream RAID types include RAID0, RAID1, RAID5, RAID6, etc. Among them, RAID5 and RAID6 offer the best overall value in usable capacity and performance and are the most widely used.
However, for RAID5 and RAID6, after the array has been degraded by a failure, if a further system fault occurs that interrupts reads and writes, data errors become possible. This is the "write hole" problem of RAID5 and RAID6.
Taking RAID5 as an example: a stripe contains multiple regular data blocks and one parity data block, and each data block is stored on a different RAID member disk. For example, a stripe may contain data blocks No. 1 to No. 5, where No. 1 to No. 4 are regular data blocks stored on member disks No. 1 to No. 4 respectively, and No. 5 is the parity data block stored on member disk No. 5. When any member disk fails, its data block can be calculated from the other member disks; for example, if member disk No. 1 fails, the data block on disk No. 1 can be recovered from the regular data blocks on disks No. 2 to No. 4 and the parity data block on disk No. 5. After one member disk has failed and before it is repaired, the RAID5 is in a degraded state.
Because the writes to the member disks during a RAID stripe write are not strictly synchronized in progress, if another fault occurs after the RAID5 has been degraded, for example a power failure that interrupts reads and writes, then for a given stripe the amount of data written and its start and end positions will differ across the healthy member disks. After the system recovers, calculating the failed member disk's data from the healthy member disks may then produce errors. This is the RAID5 "write hole" problem.
Compared with RAID5, which tolerates one member disk failure, a RAID6 stripe has two parity data blocks and tolerates at most two member disk failures; the cause of its write hole problem is the same as for RAID5 and is not repeated here.
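To make the degraded-state recovery concrete, the XOR-based reconstruction that RAID5 relies on can be sketched as follows (a minimal illustration, not the patented method; the block contents and disk layout are hypothetical):

```python
from functools import reduce

def xor_blocks(blocks):
    """Column-wise XOR of equal-length byte strings; this is RAID5's parity operation."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A stripe of four regular data blocks; the parity block is their XOR.
data = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b", b"\x03\x30"]
parity = xor_blocks(data)

# If member disk No. 1 fails, its block is recovered from the survivors.
recovered = xor_blocks(data[1:] + [parity])
assert recovered == data[0]
```

The write hole arises precisely because this recovery assumes all surviving blocks of the stripe are mutually consistent, which an interrupted stripe write violates.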
To address the RAID5 and RAID6 write hole problem, one proposal uses a storage system with dual controllers and solves the problem by synchronously backing up the stripe write data between the two controllers. However, as solid state disks have become widespread and faster, the performance of multiple disks now exceeds the communication performance between controller systems, so in such a scheme the inter-controller communication becomes the bottleneck; moreover, such a scheme is inapplicable to single-controller systems.
In summary, how to effectively solve the write hole problem of RAID5 and RAID6 is a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The object of the present invention is to provide a RAID write hole protection method, system, and storage medium, so as to avoid the RAID write hole problem.
To solve the above technical problem, the present invention provides the following technical solutions:
A RAID write hole protection method, comprising:
presetting a log area, and setting the log area to the enabled state after the RAID is degraded;
while the log area is enabled, before each stripe write operation, judging whether the data block of the failed member disk of the RAID in that stripe is a parity data block;
if it is not a parity data block, judging whether the data blocks to be written in the stripe include a data block to be written to the failed member disk;
if so, backing up the data block to be written to the failed member disk in the log area;
if not, calculating the data block of the failed member disk according to the RAID algorithm and backing it up in the log area, or backing up each data block to be written in the log area;
when the degraded RAID is started after a failure, using the log area to perform data repair.
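The per-write decision described by these steps can be sketched as follows (a schematic outline with hypothetical names such as `plan_backup` and `recompute`, not the claimed implementation):

```python
def plan_backup(writes, failed_disk, parity_disk, recompute):
    """Return the blocks to back up in the log area before one stripe write.

    writes: dict {disk_id: block} of blocks to be written in this stripe.
    recompute(): yields the failed disk's block via the RAID algorithm.
    """
    if failed_disk == parity_disk:
        return {}                                  # failed disk holds parity: no backup
    if failed_disk in writes:
        return {failed_disk: writes[failed_disk]}  # back up that block as a whole
    # the write does not touch the failed disk: back up its recomputed block
    # (the method's stated alternative is backing up every block in `writes`)
    return {failed_disk: recompute()}

# usage: failed disk 1 is among the writes, so its block is backed up
plan = plan_backup({1: b"A", 2: b"B"}, failed_disk=1, parity_disk=5,
                   recompute=lambda: b"?")
assert plan == {1: b"A"}
```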
Preferably, each time a data block is backed up in the log area, the method further comprises: saving the associated information of that backup operation to form a striped log;
correspondingly, using the log area to perform data repair comprises:
determining each valid striped log through the associated information of each striped log in the log area, and performing data repair based on each determined valid striped log.
Preferably, the saved associated information at least includes: the stripe sequence number and the log number of the striped log;
correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
traversing the associated information of each striped log in the log area, and, among striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
Preferably, the saved associated information at least includes: the stripe sequence number, the log number, and the striped log check value of the striped log;
correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
traversing the associated information of each striped log in the log area, and filtering out incomplete striped logs according to the striped log check value in each piece of associated information;
among the filtered striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
Preferably, the saved associated information at least includes: the stripe sequence number, the log number, the striped log check value, and the log version number of the striped log;
after using the log area to perform data repair, the method further comprises: updating the log area version number in the overall information recorded in the log area;
correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
traversing the associated information of each striped log in the log area, and filtering out striped logs whose log version number does not match the current log area version number of the log area;
filtering out incomplete striped logs according to the striped log check value in each piece of associated information;
among the filtered striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
Preferably, performing data repair based on each determined valid striped log comprises:
for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculating the parity block data based on the backup data in the striped log and the data read from each member disk, and using the calculated parity block data to repair the data of the parity disk;
for any valid striped log in the log area, when the striped log backs up each data block to be written, writing each data block to be written to the corresponding healthy member disk to complete the data repair.
Preferably, presetting the log area comprises:
dividing a space of a preset size from each member disk of the RAID, and forming the set log area from the divided spaces.
Preferably, presetting the log area comprises:
setting the log area using a solid state disk other than the RAID member disks.
A RAID write hole protection system, comprising:
a log area setting module, configured to preset a log area and set the log area to the enabled state after the RAID is degraded;
a first judgment module, configured to judge, while the log area is enabled and each time before a stripe write operation is performed, whether the data block of the failed member disk of the RAID in that stripe is a parity data block;
if it is not a parity data block, a second judgment module is triggered;
the second judgment module is configured to judge whether the data blocks to be written in the stripe include a data block to be written to the failed member disk;
if so, a first backup module is triggered; if not, a second backup module is triggered;
the first backup module is configured to back up the data block to be written to the failed member disk in the log area;
the second backup module is configured to calculate the data block of the failed member disk according to the RAID algorithm and back it up in the log area, or back up each data block to be written in the log area;
a data repair module, configured to use the log area to perform data repair when the degraded RAID is started after a failure.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the RAID write hole protection methods described above.
The technical solution provided by the embodiments of the present invention does not involve communication between multiple controllers; that is, the solution of the present application can be applied in single-controller as well as multi-controller scenarios. In the solution of the present application, a log area is preset; after the RAID is degraded, the log area is set to the enabled state, and the corresponding backup operation is performed before each stripe write operation, thereby avoiding the write hole problem. Specifically, when the data block of the failed member disk is backed up, the backup is a whole-block backup, i.e., the failed member disk's data block can be obtained directly from the log area and is correct data. In addition, the data blocks of the other healthy member disks can be read and combined with the backed-up data block of the failed member disk to calculate the correct parity data block, avoiding parity block errors. If each data block to be written is backed up, i.e., what is backed up are data blocks to be written to healthy member disks, the write operation does not involve the failed member disk; the backed-up data blocks can then be written directly to the corresponding healthy member disks, so that the data on every healthy member disk is correct. The solution of the present application therefore avoids the RAID write hole problem.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is an implementation flowchart of a RAID write hole protection method in the present invention;
FIG. 2 is a schematic structural diagram of a RAID write hole protection system in the present invention.
Detailed Description
The core of the present invention is to provide a RAID write hole protection method that avoids the RAID write hole problem.
To enable those skilled in the art to better understand the solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is an implementation flowchart of a RAID write hole protection method in the present invention. The method may include the following steps:
Step S101: preset a log area, and set the log area to the enabled state after the RAID is degraded.
When setting the log area, it may be set based on the member disks of the RAID, or other hard disks outside the RAID member disks may be used as the log space.
For example, in a specific embodiment, presetting the log area may specifically be: dividing a space of a preset size from each member disk of the RAID, and forming the set log area from the divided spaces.
In this embodiment, a space is divided from each hard disk in the RAID, and these spaces constitute the set log area. When the RAID is not degraded, the set log area is in the disabled state; correspondingly, after the RAID is degraded, the log area needs to be set to the enabled state. It will be appreciated that after the RAID is degraded, the corresponding spaces of the healthy disks form the usable log area, so that the data backup in the subsequent steps can be performed based on this log area.
Setting the log area based on the RAID member disks, as in this embodiment, is low-cost and easy to implement. In some embodiments, the log area may instead be set on solid state disks outside the RAID member disks, for example higher-performance NVMe solid state disks, or 3D NAND solid state disks with higher performance and endurance.
Step S102: while the log area is enabled, before each stripe write operation, judge whether the data block of the failed member disk of the RAID in that stripe is a parity data block. If not, perform step S103.
For any stripe write operation, if the failed member disk's data block in the stripe is a parity data block, that block is not user data, so no backup is needed; each regular data block among the blocks to be written can be written to the corresponding healthy member disk. In other words, when it is determined that the failed member disk's data block in the stripe is a parity data block, the stripe write operation can be started directly.
Step S103: judge whether the data blocks to be written in the stripe include a data block to be written to the failed member disk. If so, perform step S104; if not, perform step S105.
If the failed member disk's data block in the stripe is not a parity data block, it is a regular data block. However, each stripe write operation may involve different regular data blocks. For example, a stripe contains data blocks No. 1 to No. 5, where No. 1 to No. 4 are regular data blocks stored on member disks No. 1 to No. 4 respectively, and No. 5 is the parity data block stored on member disk No. 5. If a stripe write operation needs to write data blocks No. 1, No. 2, and No. 3 to the corresponding member disks and the failed member disk is No. 1, it can be determined that the data blocks to be written include a data block to be written to the failed member disk.
As another example, if a stripe write operation needs to write data blocks No. 2 and No. 4 to the corresponding member disks and the failed member disk is No. 1, it can be determined that the data blocks to be written do not include a data block to be written to the failed member disk.
Step S104: back up the data block to be written to the failed member disk in the log area.
For example, in the above example, the stripe write operation needs to write data blocks No. 1, No. 2, and No. 3 to the corresponding member disks, i.e., the data blocks to be written are No. 1, No. 2, and No. 3, and data block No. 1 is the one to be written to the failed member disk; data block No. 1 is then backed up in the log area.
Step S105: calculate the data block of the failed member disk according to the RAID algorithm and back it up in the log area, or back up each data block to be written in the log area.
Step S105 describes two backup approaches available once step S105 is triggered; which one is used in practice can be chosen according to actual needs and can be switched at any time. Whichever is chosen, data repair can be achieved later.
Again assume a stripe contains data blocks No. 1 to No. 5, where No. 1 to No. 4 are regular data blocks stored on member disks No. 1 to No. 4 respectively, and No. 5 is the parity data block stored on member disk No. 5. If a stripe write operation needs to write data blocks No. 1, No. 2, and No. 5 to the corresponding member disks and the failed member disk is No. 4, then the data blocks to be written do not include a data block to be written to the failed member disk.
In this case, one backup approach is based on the RAID algorithm: from the to-be-written data blocks No. 1, No. 2, and No. 5, combined with data block No. 3, the data block of the failed member disk, i.e., the data block of member disk No. 4, is calculated and then backed up in the log area.
The other backup approach is to back up each data block to be written directly in the log area; in this example, the to-be-written data blocks No. 1, No. 2, and No. 5 are backed up in the log area directly.
It will be appreciated that after step S104 or step S105 has been performed, the stripe write operation can be carried out. That is, for any stripe write operation, once the data blocks that need to be backed up for that operation have been determined and the backup completed, the stripe write operation can proceed.
It should also be noted that each time a data block is backed up in the log area, the method may further include: saving the associated information of that backup operation to form a striped log. Thus, during subsequent data repair, each valid striped log can be determined through the associated information of each striped log in the log area, and data repair can be performed based on the determined valid striped logs.
The associated information refers to the related parameter data produced when data blocks are backed up; the specific items it includes can be set and adjusted as needed. For any backup of data blocks, the backed-up data blocks and the corresponding associated information constitute one striped log.
For example, in a specific embodiment of the present invention, the saved associated information at least includes: the stripe sequence number and the log number of the striped log.
The operation of determining each valid striped log through the associated information of each striped log in the log area may then specifically be:
traversing the associated information of each striped log in the log area, and, among striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
This embodiment takes into account that one or more write operations may be performed on the same stripe, i.e., multiple striped logs may all target the same stripe; the logs are therefore distinguished by the log number and the stripe sequence number.
For any striped log, the stripe sequence number refers to the sequence number, within the RAID, of the stripe that the striped log targets; the corresponding stripe and the start and end sectors on the corresponding member disks can be found from the stripe sequence number. In this embodiment the log number is an increasing number, used to identify the order of the striped logs targeting the same stripe: when stripe sequence numbers are equal, the striped log with the largest log number is the valid one. In other embodiments, non-increasing forms of log numbers may also be used, as long as the order of striped logs targeting the same stripe can be distinguished; this does not affect the implementation of the present invention.
Further, in a specific embodiment of the present invention, the saved associated information at least includes: the stripe sequence number, the log number, and the striped log check value of the striped log;
correspondingly, determining each valid striped log through the associated information of each striped log in the log area may specifically include the following two steps:
First step: traverse the associated information of each striped log in the log area, and filter out incomplete striped logs according to the striped log check value in each piece of associated information;
Second step: among the filtered striped logs with the same stripe sequence number, take the striped log with the largest log number as the valid striped log.
In this embodiment the associated information also includes a striped log check value, so that whether the data blocks in a striped log are complete can be determined based on the check value; the checking algorithm may be a public algorithm such as CRC32. Since this embodiment adds the striped log check value to the associated information and filters out incomplete striped logs, abnormal striped logs can be discovered in time, which in turn benefits accurate data repair.
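A striped-log record of this kind could look as follows (a minimal sketch with hypothetical field names; the patent does not fix an on-disk layout, only that the log carries a sequence number, a log number, a version number, and a CRC32-style check value):

```python
import zlib

def make_striped_log(stripe_seq, log_number, version, payload):
    """Bundle backed-up blocks with associated info, including a CRC32
    check value so incomplete logs can be filtered during repair."""
    return {
        "stripe_seq": stripe_seq,   # which stripe this log targets
        "log_number": log_number,   # increasing, orders logs per stripe
        "version": version,         # log version number
        "payload": payload,         # the backed-up data blocks
        "crc": zlib.crc32(payload),
    }

def is_complete(log):
    """A log whose stored CRC no longer matches its payload is incomplete."""
    return zlib.crc32(log["payload"]) == log["crc"]

log = make_striped_log(7, 3, 1, b"backup-bytes")
assert is_complete(log)
log["payload"] = b"torn-write"     # simulate an interrupted backup
assert not is_complete(log)
```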
In a specific embodiment of the present invention, the saved associated information at least includes: the stripe sequence number, the log number, the striped log check value, and the log version number of the striped log;
in this embodiment, after step S106, the method also needs to include: updating the log area version number in the overall information recorded in the log area. That is, after the repair is completed, the log area version number needs to be updated.
Correspondingly, in this embodiment, determining each valid striped log through the associated information of each striped log in the log area may specifically include the following three steps:
First step: traverse the associated information of each striped log in the log area, and filter out striped logs whose log version number does not match the current log area version number of the log area;
Second step: filter out incomplete striped logs according to the striped log check value in each piece of associated information;
Third step: among the filtered striped logs with the same stripe sequence number, take the striped log with the largest log number as the valid striped log.
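The three filtering steps can be sketched as one pass over the log area (hypothetical record fields as dict keys; `is_complete` stands in for the check-value test):

```python
def select_valid_logs(logs, area_version, is_complete):
    """Apply the three-step filter: version match, completeness,
    then largest log number per stripe sequence number."""
    candidates = [log for log in logs
                  if log["version"] == area_version and is_complete(log)]
    valid = {}
    for log in candidates:
        seq = log["stripe_seq"]
        if seq not in valid or log["log_number"] > valid[seq]["log_number"]:
            valid[seq] = log
    return list(valid.values())

logs = [
    {"stripe_seq": 7, "log_number": 1, "version": 2},
    {"stripe_seq": 7, "log_number": 2, "version": 2},  # newer, wins
    {"stripe_seq": 9, "log_number": 1, "version": 1},  # stale version, dropped
]
valid = select_valid_logs(logs, area_version=2, is_complete=lambda l: True)
assert [(l["stripe_seq"], l["log_number"]) for l in valid] == [(7, 2)]
```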
This embodiment takes into account that in some situations, after data repair using the log area, the RAID may still be in the degraded state, i.e., operators may not be able to restore the failed disk to a healthy disk in time. The RAID therefore needs to continue write hole protection in the degraded state, i.e., the log area continues backing up data blocks and producing corresponding striped logs. To distinguish which striped logs have already been used and which have not, in this embodiment the associated information includes the log version number, which is used together with the log area version number in the overall information.
Specifically, in this embodiment the log area needs to record overall information, whose data describe the attributes of the log area. The log area version number, used together with the log version numbers of the striped logs, makes it possible to determine which striped logs are unused. "Unused" here means that data repair has not yet been performed based on that striped log.
For example, after a degradation of the RAID, the log area is enabled, 16 stripe write operations are performed, and 16 striped logs are correspondingly backed up in the log area; the log version number of each striped log is, say, 001, and the log area version number is also 001. The RAID then fails, and data repair is performed based on these 16 striped logs. After the repair flow finishes, the log area version number is updated, for example to 002; afterwards, say 20 more stripe write operations are performed and 20 striped logs are correspondingly backed up in the log area, each with log version number 002. If the RAID then fails again and data repair is needed, the log area at this point holds 16 + 20 = 36 striped logs, and the repair based on these 36 striped logs may proceed as follows: first, filter out the striped logs whose log version number does not match the current log area version number, i.e., filter out the 16 striped logs with log version number 001; then, according to the striped log check values in the associated information, filter out the incomplete logs among the remaining 20 striped logs; say 3 incomplete striped logs are filtered out, leaving 17. Finally, among these 17 striped logs, for those with the same stripe sequence number, take the one with the largest log number as the valid striped log. For example, if 3 of the 17 striped logs share the same stripe sequence number and the other 14 all have distinct stripe sequence numbers, 15 valid striped logs are finally determined. After repair using these 15 striped logs, the log area version number can be set to 003.
In this example, the log area version number is updated after each repair so that, together with the log version numbers, it identifies which striped logs have been used; other concrete implementations are possible in other embodiments and do not affect the implementation of the present invention.
Moreover, in the foregoing embodiment the log area version number is updated incrementally, i.e., increased by 1. In other embodiments other update schemes may be used, as long as the updated log area version number differs from every historical log area version number; for example, the value may be increased by 2 each time, or updated with a large random number.
It should also be noted that in practice the associated information may include other parameters, for example the log size of the striped log, which indicates the contiguous space occupied by that striped log. Since a striped log consists of the backed-up data blocks and the corresponding associated information, the log size covers both, i.e., it is the size of the whole striped log. The associated information may also include, for example, the member disk number and the data length.
Of course, besides the log area version number, the overall information in the log area may include other parameters, for example: the enabled state of the log area, the log space size, the striped log start address, the striped log space size, the number of striped log alignment regions, and the log area check value, so that operators can quickly and conveniently learn the state of the log area from the overall information.
The enabled state of the log area indicates whether the log area is enabled: the log area is enabled after the RAID is degraded and not enabled otherwise. It is also not enabled when the RAID type is neither RAID5 nor RAID6.
The striped log space size is a fixed value; the striped log space size plus the space occupied by the overall information equals the log space size. The striped log start address refers to the start address of the first striped log.
Dividing the striped log space size by the number of striped log alignment regions yields the size of each log alignment region. For example, if the striped log space size is 2 GB and the preset number of striped log alignment regions is 1024, each log alignment region is 2 MB. This 2 MB means that one or more striped logs are stored within that 2 MB of space, but no striped log crosses the 2 MB boundary, i.e., the head of each log alignment region is the start position of some striped log. In this way, the start address of each log alignment region is aligned with the start address of a corresponding striped log, and striped logs can be located more quickly.
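The alignment rule (a log never crosses a region boundary) can be sketched as a small placement helper; the sizes follow the 2 GB / 1024-region example above, and the function name is illustrative only:

```python
REGION_SIZE = (2 * 1024**3) // 1024   # 2 GB of log space, 1024 regions -> 2 MB each

def place_log(offset, log_size, region_size=REGION_SIZE):
    """Return the write offset for a striped log, skipping to the next
    region boundary if the log would otherwise cross one."""
    if log_size > region_size:
        raise ValueError("a striped log must fit within one alignment region")
    region_end = (offset // region_size + 1) * region_size
    if offset + log_size > region_end:
        offset = region_end           # start at the next region's head
    return offset

assert REGION_SIZE == 2 * 1024**2                          # 2 MB per region
assert place_log(0, 512 * 1024) == 0                       # fits in region 0
assert place_log(REGION_SIZE - 1024, 4096) == REGION_SIZE  # would cross: skip
```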
It should also be pointed out that writing striped logs into each log alignment region can usually be done sequentially, which better preserves disk endurance.
The log area check value enables verification of the data integrity of the log area.
Step S106: when the degraded RAID is started after a failure, use the log area to perform data repair.
Specifically, data repair can be performed according to the striped logs in the log area; of course, the repair means used may differ depending on the content of the data blocks backed up in a striped log.
Before using the log area for data repair, it can be determined whether the log area is valid. After validity is confirmed, when performing data repair using the striped logs, as described above, each valid striped log can usually be determined through the associated information of the striped logs in the log area, and data repair is performed based on the determined valid striped logs. After the repair is complete, the overall information of the log area can be re-initialized; specifically, the log area version number can be updated.
For a striped log determined to be valid, if what it backs up are the data blocks to be written, i.e., data blocks to be written to healthy member disks, the write operation corresponding to that striped log did not involve the failed member disk; the backed-up data blocks can then be written directly to the corresponding healthy member disks, so that every healthy member disk's data is correct. The failed member disk's data block determined from the correct healthy member disks is then naturally also correct, so the write hole problem is avoided. It should be noted, however, that backing up the data blocks to be written to healthy member disks occupies more log area space.
If, on the other hand, the striped log backs up the failed member disk's data block, then since the backup is a whole-block backup, i.e., the failed member disk's data block can be obtained directly from the striped log, the write hole problem can also be avoided. Specifically, the correct parity data block can be calculated from the backed-up failed member disk's data block and the regular data blocks on the healthy member disks; writing that parity data block to the parity disk makes the parity disk's data accurate. Afterwards, if the failed member disk's data block is needed, it can be accurately calculated from the healthy member disks.
Therefore, even if a write hole situation occurs, i.e., the write progress of the healthy member disks is not synchronized, the present application can still obtain the correct data of the failed member disk.
In a specific embodiment of the present invention, performing data repair based on each determined valid striped log may specifically include:
Step one: for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculate the parity block data based on the backup data in that striped log and the data read from each member disk, and use the calculated parity block data to repair the data of the parity disk;
Step two: for any valid striped log in the log area, when the striped log backs up each data block to be written, write each data block to be written to the corresponding healthy member disk to complete the data repair.
Specifically, for any valid striped log in the log area, when the striped log backs up the failed member disk's data block, this embodiment can verify and repair the parity disk's data based on the backup data: the data blocks of the other healthy member disks are read and, combined with the backed-up failed member disk's data block in the striped log, the correct parity data block can be calculated.
After the correct parity data block is calculated, it can be written directly to the parity disk, so that subsequently the failed member disk's data block can be accurately determined from the healthy member disks' data blocks, achieving write hole protection.
Further, before writing the calculated parity block data directly to the parity disk, the parity data block can first be read from the parity disk; if the calculated parity data block is inconsistent with the one read, the parity data block on the parity disk is wrong, and an event can be recorded, after which the calculated parity block data is written to the parity disk.
For any striped log in the log area, when the striped log backs up the data blocks to be written, the striped log backs up data blocks of healthy member disks. The failed disk's data block can then be calculated from the data blocks backed up in the striped log and the data blocks of the other member disks, denoted A. The stripe's data blocks on all healthy member disks are read and the failed disk's data block is calculated again, denoted B. If A and B differ, a data block on a healthy member disk is wrong; an event can be recorded, and each data block to be written is written to the corresponding healthy member disk to complete the data repair, i.e., the data blocks backed up in the striped log are written to the corresponding healthy member disks. Usually such an error is caused by incorrect data on the parity disk. Of course, the comparison in this embodiment may also be skipped, and each data block to be written may be written directly to the corresponding healthy member disks to complete the repair, since every backed-up data block is correct data.
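The consistency check in this repair path (compute A from the backed-up blocks, B from what is actually on disk, then flag a mismatch and rewrite the backed-up blocks) can be sketched with XOR parity (a RAID5-style illustration; the block values and disk numbering are hypothetical):

```python
from functools import reduce

def xor_blocks(blocks):
    """Column-wise XOR of equal-length byte strings (RAID5 parity math)."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# Disk 1 failed. The striped log backed up the blocks destined for healthy
# disks 2 and 3; disk 4 and the parity disk "P" are read from media.
backed_up = {2: b"\x22", 3: b"\x33"}
on_disk   = {2: b"\x22", 3: b"\x30", 4: b"\x44", "P": b"\x55"}  # torn write on 3

a = xor_blocks([backed_up[2], backed_up[3], on_disk[4], on_disk["P"]])
b = xor_blocks([on_disk[2], on_disk[3], on_disk[4], on_disk["P"]])
if a != b:
    # a healthy disk holds stale data: rewrite the backed-up blocks,
    # which the method treats as the correct copies
    for disk, block in backed_up.items():
        on_disk[disk] = block
assert on_disk[3] == b"\x33"
```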
The technical solution provided by the embodiments of the present invention does not involve communication between multiple controllers; that is, the solution of the present application can be applied in single-controller as well as multi-controller scenarios. In the solution of the present application, a log area is preset; after the RAID is degraded, the log area is set to the enabled state, and the corresponding backup operation is performed before each stripe write operation, thereby avoiding the write hole problem. Specifically, when the data block of the failed member disk is backed up, the backup is a whole-block backup, i.e., the failed member disk's data block can be obtained directly from the log area and is correct data. In addition, the data blocks of the other healthy member disks can be read and combined with the backed-up data block of the failed member disk to calculate the correct parity data block, avoiding parity block errors. If each data block to be written is backed up, i.e., what is backed up are data blocks to be written to healthy member disks, the write operation does not involve the failed member disk; the backed-up data blocks can then be written directly to the corresponding healthy member disks, so that the data on every healthy member disk is correct.
Corresponding to the above method embodiments, an embodiment of the present invention further provides a RAID write hole protection system, which may be cross-referenced with the above.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a RAID write hole protection system in the present invention, comprising:
a log area setting module 201, configured to preset a log area and set the log area to the enabled state after the RAID is degraded;
a first judgment module 202, configured to judge, while the log area is enabled and each time before a stripe write operation is performed, whether the data block of the failed member disk of the RAID in that stripe is a parity data block;
if it is not a parity data block, a second judgment module 203 is triggered;
the second judgment module 203 is configured to judge whether the data blocks to be written in the stripe include a data block to be written to the failed member disk;
if so, a first backup module 204 is triggered; if not, a second backup module 205 is triggered;
the first backup module 204 is configured to back up the data block to be written to the failed member disk in the log area;
the second backup module 205 is configured to calculate the data block of the failed member disk according to the RAID algorithm and back it up in the log area, or back up each data block to be written in the log area;
a data repair module 206, configured to use the log area to perform data repair when the degraded RAID is started after a failure.
In a specific embodiment of the present invention, the system further includes a striped log generation module, configured to: each time a data block is backed up in the log area, save the associated information of that backup operation to form a striped log;
correspondingly, the data repair module 206 is specifically configured to: when the degraded RAID is started after a failure, determine each valid striped log through the associated information of each striped log in the log area, and perform data repair based on each determined valid striped log.
In a specific embodiment of the present invention, the associated information saved by the log generation module at least includes: the stripe sequence number and the log number of the striped log;
correspondingly, the data repair module 206 is specifically configured to:
when the degraded RAID is started after a failure, traverse the associated information of each striped log in the log area and, among striped logs with the same stripe sequence number, take the striped log with the largest log number as the valid striped log.
In a specific embodiment of the present invention, the associated information saved by the log generation module at least includes: the stripe sequence number, the log number, and the striped log check value of the striped log;
correspondingly, the data repair module 206 is specifically configured to:
when the degraded RAID is started after a failure, traverse the associated information of each striped log in the log area and filter out incomplete striped logs according to the striped log check value in each piece of associated information;
among the filtered striped logs with the same stripe sequence number, take the striped log with the largest log number as the valid striped log.
In a specific embodiment of the present invention, the associated information saved by the log generation module at least includes: the stripe sequence number, the log number, the striped log check value, and the log version number of the striped log;
the system further includes a log area version number update module, configured to update the log area version number in the overall information recorded in the log area after the data repair module 206 uses the log area to perform data repair;
correspondingly, the data repair module 206 is specifically configured to:
when the degraded RAID is started after a failure, traverse the associated information of each striped log in the log area and filter out striped logs whose log version number does not match the current log area version number of the log area;
filter out incomplete striped logs according to the striped log check value in each piece of associated information;
among the filtered striped logs with the same stripe sequence number, take the striped log with the largest log number as the valid striped log.
In a specific embodiment of the present invention, the data repair module 206 is specifically configured to:
when the degraded RAID is started after a failure, for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculate the parity block data based on the backup data in the striped log and the data read from each member disk, and use the calculated parity block data to repair the data of the parity disk;
when the degraded RAID is started after a failure, for any valid striped log in the log area, when the striped log backs up each data block to be written, write each data block to be written to the corresponding healthy member disk to complete the data repair.
In a specific embodiment of the present invention, the log area setting module 201 is specifically configured to:
divide a space of a preset size from each member disk of the RAID, and form the set log area from the divided spaces.
In a specific embodiment of the present invention, the log area setting module 201 is specifically configured to:
set the log area using a solid state disk other than the RAID member disks.
Corresponding to the above method and system embodiments, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above RAID write hole protection methods are implemented. The computer-readable storage medium mentioned here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of functionality. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the technical solution of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (10)

  1. A RAID write hole protection method, characterized by comprising:
    presetting a log area, and setting the log area to the enabled state after the RAID is degraded;
    while the log area is enabled, before each stripe write operation, judging whether the data block of the failed member disk of the RAID in that stripe is a parity data block;
    if it is not a parity data block, judging whether the data blocks to be written in the stripe include a data block to be written to the failed member disk;
    if so, backing up the data block to be written to the failed member disk in the log area;
    if not, calculating the data block of the failed member disk according to the RAID algorithm and backing it up in the log area, or backing up each data block to be written in the log area;
    when the degraded RAID is started after a failure, using the log area to perform data repair.
  2. The RAID write hole protection method according to claim 1, characterized in that each time a data block is backed up in the log area, the method further comprises: saving the associated information of that backup operation to form a striped log;
    correspondingly, using the log area to perform data repair comprises:
    determining each valid striped log through the associated information of each striped log in the log area, and performing data repair based on each determined valid striped log.
  3. The RAID write hole protection method according to claim 2, characterized in that the saved associated information at least includes: the stripe sequence number and the log number of the striped log;
    correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
    traversing the associated information of each striped log in the log area, and, among striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  4. The RAID write hole protection method according to claim 2, characterized in that the saved associated information at least includes: the stripe sequence number, the log number, and the striped log check value of the striped log;
    correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
    traversing the associated information of each striped log in the log area, and filtering out incomplete striped logs according to the striped log check value in each piece of associated information;
    among the filtered striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  5. The RAID write hole protection method according to claim 2, characterized in that the saved associated information at least includes: the stripe sequence number, the log number, the striped log check value, and the log version number of the striped log;
    after using the log area to perform data repair, the method further comprises: updating the log area version number in the overall information recorded in the log area;
    correspondingly, determining each valid striped log through the associated information of each striped log in the log area comprises:
    traversing the associated information of each striped log in the log area, and filtering out striped logs whose log version number does not match the current log area version number of the log area;
    filtering out incomplete striped logs according to the striped log check value in each piece of associated information;
    among the filtered striped logs with the same stripe sequence number, taking the striped log with the largest log number as the valid striped log.
  6. The RAID write hole protection method according to any one of claims 2 to 5, characterized in that performing data repair based on each determined valid striped log comprises:
    for any valid striped log in the log area, when the striped log backs up the data block of the failed member disk, calculating the parity block data based on the backup data in the striped log and the data read from each member disk, and using the calculated parity block data to repair the data of the parity disk;
    for any valid striped log in the log area, when the striped log backs up each data block to be written, writing each data block to be written to the corresponding healthy member disk to complete the data repair.
  7. The RAID write hole protection method according to claim 1, characterized in that presetting the log area comprises:
    dividing a space of a preset size from each member disk of the RAID, and forming the set log area from the divided spaces.
  8. The RAID write hole protection method according to claim 1, characterized in that presetting the log area comprises:
    setting the log area using a solid state disk other than the RAID member disks.
  9. A RAID write hole protection system, characterized by comprising:
    a log area setting module, configured to preset a log area and set the log area to the enabled state after the RAID is degraded;
    a first judgment module, configured to judge, while the log area is enabled and each time before a stripe write operation is performed, whether the data block of the failed member disk of the RAID in that stripe is a parity data block;
    if it is not a parity data block, a second judgment module is triggered;
    the second judgment module is configured to judge whether the data blocks to be written in the stripe include a data block to be written to the failed member disk;
    if so, a first backup module is triggered; if not, a second backup module is triggered;
    the first backup module is configured to back up the data block to be written to the failed member disk in the log area;
    the second backup module is configured to calculate the data block of the failed member disk according to the RAID algorithm and back it up in the log area, or back up each data block to be written in the log area;
    a data repair module, configured to use the log area to perform data repair when the degraded RAID is started after a failure.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the RAID write hole protection method according to any one of claims 1 to 8 are implemented.
PCT/CN2019/121096 2019-10-18 2019-11-27 RAID write hole protection method and system, and storage medium WO2021072917A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/642,643 US11650880B2 (en) 2019-10-18 2019-11-27 Write hole protection method and system for raid, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910995233.2A 2019-10-18 RAID write hole protection method, system and storage medium
CN201910995233.2 2019-10-18

Publications (1)

Publication Number Publication Date
WO2021072917A1 true WO2021072917A1 (zh) 2021-04-22

Family

ID=69439329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121096 WO2021072917A1 (zh) 2019-10-18 2019-11-27 一种raid的写洞保护方法、系统及存储介质

Country Status (3)

Country Link
US (1) US11650880B2 (zh)
CN (1) CN110795273B (zh)
WO (1) WO2021072917A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113687A1 (zh) * 2022-11-29 2024-06-06 苏州元脑智能科技有限公司 Data recovery method and related apparatus

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181298B (zh) * 2020-09-25 2022-05-17 杭州宏杉科技股份有限公司 Array access method and apparatus, storage device, and machine-readable storage medium
CN113626248B (zh) * 2021-06-30 2023-07-18 苏州浪潮智能科技有限公司 Method and system for repairing stripe data inconsistency in RAID
CN113791731A (zh) * 2021-08-26 2021-12-14 深圳创云科软件技术有限公司 Processing method for solving the write hole of a storage disk array
CN115599607B (zh) * 2022-11-29 2023-06-16 苏州浪潮智能科技有限公司 Data recovery method for RAID array and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035830A (zh) * 2014-06-24 2014-09-10 浙江宇视科技有限公司 Data recovery method and apparatus
CN104407821A (zh) * 2014-12-12 2015-03-11 浪潮(北京)电子信息产业有限公司 Method and apparatus for implementing RAID reconstruction
CN104881242A (zh) * 2014-02-28 2015-09-02 中兴通讯股份有限公司 Data writing method and apparatus
CN106062721B (zh) * 2014-12-31 2018-11-16 华为技术有限公司 Method for writing data into a storage system, and storage system
CN109725831A (zh) * 2017-10-27 2019-05-07 伊姆西Ip控股有限责任公司 Method, system and computer program product for managing a storage system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090300282A1 (en) * 2008-05-30 2009-12-03 Promise Technology, Inc. Redundant array of independent disks write recovery system
CN102023809B (zh) * 2009-09-21 2012-10-17 成都市华为赛门铁克科技有限公司 Storage system, method for reading data from the storage system, and method for writing data
US8839028B1 (en) * 2011-12-23 2014-09-16 Emc Corporation Managing data availability in storage systems
CN102968361A (zh) * 2012-11-19 2013-03-13 浪潮电子信息产业股份有限公司 RAID data self-repair method
CN104407813B (zh) * 2014-11-20 2019-02-19 上海宝存信息科技有限公司 RAID system and method based on solid-state storage media
CN107273046B (zh) * 2017-06-06 2019-08-13 华中科技大学 Data processing method and system based on a solid state disk array
CN110413205B (zh) * 2018-04-28 2023-07-07 伊姆西Ip控股有限责任公司 Method, device and computer-readable storage medium for writing to a disk array
CN110413439B (zh) * 2018-04-28 2023-10-20 伊姆西Ip控股有限责任公司 Method, device and computer-readable medium for detecting incomplete writes of data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881242A (zh) * 2014-02-28 2015-09-02 中兴通讯股份有限公司 Data writing method and apparatus
CN104035830A (zh) * 2014-06-24 2014-09-10 浙江宇视科技有限公司 Data recovery method and apparatus
CN104407821A (zh) * 2014-12-12 2015-03-11 浪潮(北京)电子信息产业有限公司 Method and apparatus for implementing RAID reconstruction
CN106062721B (zh) * 2014-12-31 2018-11-16 华为技术有限公司 Method for writing data into a storage system, and storage system
CN109725831A (zh) * 2017-10-27 2019-05-07 伊姆西Ip控股有限责任公司 Method, system and computer program product for managing a storage system


Also Published As

Publication number Publication date
US11650880B2 (en) 2023-05-16
CN110795273B (zh) 2021-06-15
US20220350703A1 (en) 2022-11-03
CN110795273A (zh) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021072917A1 (zh) 一种raid的写洞保护方法、系统及存储介质
US9189311B2 (en) Rebuilding a storage array
JP5768587B2 (ja) ストレージシステム、ストレージ制御装置およびストレージ制御方法
US8700951B1 (en) System and method for improving a data redundancy scheme in a solid state subsystem with additional metadata
JP2959901B2 (ja) 記憶装置の冗長アレイおよびオンライン再構成方法
CN102184129B (zh) 磁盘阵列的容错方法和装置
US8356292B2 (en) Method for updating control program of physical storage device in storage virtualization system and storage virtualization controller and system thereof
JPWO2006123416A1 (ja) ディスク故障復旧方法及びディスクアレイ装置
JP4792490B2 (ja) 記憶制御装置及びraidグループの拡張方法
TWI295021B (en) Storage system and method for handling bad storage device data therefor
US8041891B2 (en) Method and system for performing RAID level migration
JP2005122338A (ja) スペアディスクドライブをもつディスクアレイ装置及びデータスペアリング方法
JP2016512365A (ja) 不揮発性メモリシステムにおける同期ミラーリング
CN104035830A (zh) 一种数据恢复方法和装置
JP2008052547A (ja) 記憶制御装置及び記憶制御装置の障害回復方法
US7958432B2 (en) Verification of non volatile storage storing preserved unneeded data
US7360018B2 (en) Storage control device and storage device error control method
WO2012089152A1 (zh) 一种文件系统内实现独立磁盘冗余阵列保护的方法及装置
TW201525687A (zh) 檔案系統的日誌子系統寫入方法、錯誤追蹤方法及處理器
WO2024113685A1 (zh) 一种raid阵列的数据恢复方法及相关装置
CN108874312B (zh) 数据存储方法以及存储设备
WO2021098041A1 (zh) 存储集群bbu故障时的节点模式调整方法及相关组件
KR20140086223A (ko) 디스크 어레이의 패리티 재동기화 장치 및 방법
US20070036055A1 (en) Device, method and program for recovering from media error in disk array device
EP2613258A1 (en) Automatic remapping in redundant array of independent disks and related raid

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19949354

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19949354

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.10.2022)
