WO2024113685A1 - Data recovery method for a RAID array and related apparatus - Google Patents

Data recovery method for a RAID array and related apparatus

Info

Publication number
WO2024113685A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
stripe
raid array
written
secure memory
Prior art date
Application number
PCT/CN2023/093953
Other languages
English (en)
French (fr)
Inventor
李飞龙
许永良
孙明刚
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024113685A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1456 Hardware arrangements for backup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of storage technology, and in particular to a data recovery method for a RAID array; and also to a data recovery device, equipment and non-volatile readable storage medium for a RAID array.
  • RAID technology is an important technology in the storage field. It uses striping, mirroring, and parity checking to ensure data reliability.
  • the industry often uses multiple control nodes to form a cluster.
  • the master node is responsible for processing the host's I/O requests
  • the auxiliary node is responsible for the background tasks of the storage system (for example, RAID array initialization, inspection, and reconstruction tasks, etc.), so as to improve the I/O performance of the storage system.
  • the industry currently uses redundant disks in RAID arrays to recover data from failed disks.
  • to solve the Write Hole problem, the industry currently mainly adopts the following two methods: 1. Adopt the design concept of the file system's Journal to achieve atomic processing of write requests; 2. Use non-volatile memory as a write cache to achieve the purpose of atomic write operations.
  • the first method requires multiple reads and writes to the underlying file system, which will seriously affect performance.
  • the second method requires the addition of NVRAM hardware resources, which are expensive and have limited storage resources. Limited storage resources cannot store large amounts of data.
  • the purpose of the present application is to provide a data recovery method for a RAID array, which can solve the write hole problem without affecting the performance of the storage system and without increasing hardware resources.
  • Another purpose of the present application is to provide a data recovery device, equipment and non-volatile readable storage medium for a RAID array, all of which have the above technical effects.
  • the present application provides a data recovery method for a RAID array, comprising:
  • saving stripe information of a stripe in the RAID array;
  • saving the data and verification data to be written to the stripe in the secure memory;
  • the secure memory is memory in which data preservation is completed while the battery backup unit powers the storage system after the storage system loses power;
  • after the storage system recovers, reading the data and the verification data from the secure memory;
  • reading the stripe information, and writing the read data and the verification data into the stripe according to the stripe information.
  • saving the stripe information of the stripe includes:
  • saving the stripe information in each node of the I/O group.
  • saving the data and verification data to be written to the stripe in the secure memory includes:
  • saving the data and verification data in the secure memory of the node in the I/O group that corresponds to the write request.
  • before the data and the verification data are read from the secure memory, it is determined whether the secure memory is available;
  • if the secure memory is available, the data and the verification data are read from the secure memory.
  • if the secure memory is unavailable, the data not written to the stripe is recovered based on the data already written to the stripe.
  • before the unwritten data is recovered, it is determined whether the RAID array has data recovery capability;
  • if the RAID array has data recovery capability, the data not written to the stripe is recovered based on the data already written to the stripe.
  • determining whether the RAID array has data recovery capability includes determining whether the RAID array has a failed disk;
  • if the RAID array has no failed disk, the RAID array has data recovery capability.
  • if the RAID array has a failed disk and the number of failed disks does not exceed the allowed value, the RAID array has data recovery capability;
  • if the number of failed disks exceeds the allowed value, the RAID array does not have data recovery capability.
  • before the unwritten data is recovered, it is determined whether the secure memory becomes available again after a first preset time;
  • if the secure memory has not become available again after the first preset time, the data not written to the stripe is recovered based on the data already written to the stripe;
  • if the secure memory becomes available again after the first preset time, the data and the verification data are read from the secure memory, and the read data and the verification data are written into the corresponding stripe.
  • if the RAID array does not have data recovery capability and the secure memory becomes available again after a second preset time, the data and the verification data are read from the secure memory, and the read data and the verification data are written into the corresponding stripe.
  • saving the stripe information of the stripes in the RAID array may include:
  • saving the stripe address of the stripe and/or the stripe number of the stripe.
  • saving the stripe information of the stripes in the RAID array may include saving the stripe information in the secure memory.
  • after the verification data is saved in the secure memory and the data is written to the disk, a write success signal is sent to the host.
  • the present application also provides a data recovery device for a RAID array, comprising:
  • a first storage module used for storing stripe information of stripes in the RAID array
  • the second storage module is used to store the data and verification data to be written to the stripe in the secure memory
  • the secure memory is the memory that completes data storage when the battery backup unit supplies power to the storage system after the storage system loses power
  • a reading module is used to read the data and the verification data from the secure memory after the storage system recovers from the power loss;
  • the writing module is used to read the stripe information and write the read data and the verification data into the stripe according to the stripe information.
  • the present application also provides a RAID array data recovery device, including:
  • a processor is used to implement the steps of any of the above methods for recovering data from a RAID array when executing a computer program.
  • the present application also provides a non-volatile readable storage medium, on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the data recovery method of the RAID array as described above are implemented.
  • the data recovery method of the RAID array includes: saving stripe information of stripes in the RAID array; saving data and verification data to be written into the stripe in a secure memory, where the secure memory is memory in which data preservation is completed while a battery backup unit powers the storage system after the storage system loses power; after the storage system is restored, reading the data and verification data from the secure memory; reading the stripe information, and writing the read data and verification data into the stripe according to the stripe information.
  • the data to be written to the stripe and the verification data are saved in the memory protected by the battery backup unit. Even if the storage system loses power abnormally, the battery backup unit supplies power long enough for the data to be written to the stripe and the verification data to be completely saved in that memory. After the storage system is restored, the data can be read from the memory protected by the battery backup unit and rewritten to the stripe to restore the consistency of the stripe. The whole process does not require multiple reads and writes of the underlying file system, does not affect the performance of the storage system, and does not require additional hardware resources.
  • the RAID array data recovery device, equipment, and non-volatile readable storage medium provided in this application all have the above-mentioned technical effects.
  • FIG. 1 is a flow chart of a data recovery method for a RAID array provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a host I/O divided into stripes according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data recovery device for a RAID array provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a data recovery device for a RAID array provided in an embodiment of the present application.
  • the core of this application is to provide a data recovery method for a RAID array, which can solve the write hole problem without affecting the performance of the storage system and without increasing hardware resources.
  • Another core of this application is to provide a data recovery device, equipment and non-volatile readable storage medium for a RAID array, all of which have the above technical effects.
  • FIG. 1 is a flow chart of a data recovery method for a RAID array provided in an embodiment of the present application. Referring to FIG. 1, the method includes:
  • the stripe information of the stripe is saved so that the consistency of the stripe can be restored based on the stripe information.
  • the specific content of the stripe information can be chosen differently, so that the stripe data can be restored in a targeted manner.
  • saving stripe information for stripes in a RAID array includes:
  • the stripe information may be a stripe address. When restoring stripe data, data may be rewritten to the corresponding stripe according to the saved stripe address.
  • the stripe information may also be a stripe number. When restoring stripe data, data may be rewritten to the corresponding stripe according to the saved stripe number.
  • the specific location where the stripe information is stored can also be set differently.
  • saving stripe information for stripes in a RAID array includes:
  • Stripe information is stored in each node of the I/O group.
  • two nodes form an I/O GROUP, i.e., an I/O group.
  • the two nodes in the I/O group are each other's peer nodes.
  • One or more I/O groups form a cluster, and the nodes in the cluster can communicate with each other.
  • some embodiments save the stripe information to the two nodes of the I/O group, and the stripe information can be read from any node of the I/O group later to ensure that the stripe information can be obtained.
  • the stripe information can be stored in the memory protected by the battery backup unit.
  • S102 Save the data and verification data to be written into the stripe in the secure memory;
  • the secure memory is the memory that is powered by the battery backup unit to save data after the storage system loses power;
  • some embodiments recover stripe data based on secure memory, i.e., memory protected by a battery backup unit.
  • memory protected by a battery backup unit means that, after the storage system loses power, the battery backup unit continues to supply power to the storage system, and the data is saved while the battery backup unit supplies power.
  • some embodiments save all of the data and verification data to be written to the stripe in the memory protected by the battery backup unit. Even if the storage system loses power, the data and verification data can still be saved normally in the memory protected by the battery backup unit.
  • storing the data to be written to the stripe and the verification data in the secure memory includes:
  • the data and verification data are stored in the secure memory of the node corresponding to the write request in the I/O group.
  • when saving the data and verification data to be written to the stripe, some embodiments save them only to the node in the I/O group that corresponds to the write request; the other node in the I/O group does not save the data and verification data to be written to the stripe.
  • it may also include:
  • S104 Read the stripe information, and write the read data and the verification data into the stripe according to the stripe information.
  • the data and verification data are read from the memory protected by the battery backup unit, and the data and verification data are rewritten to the corresponding stripe according to the read stripe information to restore the consistency of the stripe.
  • the following steps may also be included:
  • if the secure memory is available, the data and the verification data are read from the secure memory.
  • the premise for reading the data and verification data stored in the memory protected by the battery backup unit is that this memory is available. Available means that the data and verification data can be read from the memory protected by the battery backup unit and that the data and verification data are valid. If the memory is available, the data and verification data are read from it. If it is not available, the stripe data recovery process can be terminated, or other methods can be used to recover the stripe data.
  • the data not written to the stripe is recovered based on the data written to the stripe.
  • a backup solution may be adopted to restore the data not written to the stripe based on the data written to the stripe when the storage system loses power, thereby restoring the stripe consistency.
  • the check data is reconstructed based on the written data, or the unwritten data is restored based on the written data and the check data.
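The two fallback options above can be illustrated with a minimal Python sketch. It assumes a single XOR check block per stripe, which matches the XOR parity described for the write path; the function names `xor_blocks`, `rebuild_parity`, and `rebuild_missing` are hypothetical and do not come from the application.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_parity(data_blocks):
    """Option 1: recompute the check block from the data blocks on disk."""
    return xor_blocks(list(data_blocks))


def rebuild_missing(surviving_blocks, parity):
    """Option 2: recover one unwritten data block from the others plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

For example, with three data blocks, `rebuild_missing([d0, d2], rebuild_parity([d0, d1, d2]))` returns `d1`. This only works if the parity used is consistent with the surviving blocks, which is exactly what a torn stripe update puts at risk.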
  • the method before restoring the data not written to the stripe according to the data written to the stripe when the storage system loses power, the method further includes:
  • if the RAID array has data recovery capability, the data not written to the stripe is recovered based on the data that had been written to the stripe when the storage system lost power.
  • recovering the data not written to the stripe from the data that had been written to the stripe when the storage system lost power is possible only when the RAID array has data recovery capability. If the RAID array does not have data recovery capability, the unwritten stripe data cannot be recovered in this way.
  • determining whether the RAID array has data recovery capability may include:
  • if the RAID array has no faulty disk, the RAID array has data recovery capability.
  • if the RAID array has no faulty disk, data can be read from each data disk, so the data that has not been written to the stripe can be restored based on the data that had been written to the stripe when the storage system lost power. If the RAID array has a faulty disk, it may not be possible to restore the unwritten data in this way.
  • if the RAID array has a faulty disk, it is determined whether the number of faulty disks exceeds the allowed value; if the number of faulty disks does not exceed the allowed value, the RAID array has data recovery capability; if the number of faulty disks exceeds the allowed value, the RAID array does not have data recovery capability.
  • the method before restoring the data not written to the stripe according to the data written to the stripe, the method further includes:
  • if the secure memory has not become available again after the first preset time, the data not written to the stripe is restored based on the data that had been written to the stripe when the power was lost;
  • if the secure memory becomes available again after the first preset time, the data and the verification data are read from the secure memory, and the read data and the verification data are written into the corresponding stripe.
  • the unavailability of the memory protected by the battery backup unit may be temporary. Therefore, when this memory is unavailable, the system can wait for a period of time. If the memory becomes available again after that period, the data and verification data are read from the secure memory and written to the corresponding stripe. If the memory is still unavailable after that period, the data that was not written to the stripe is restored based on the data that had been written to the stripe when the storage system lost power.
  • it further includes:
  • if the secure memory becomes available again after the second preset time, the data and the verification data are read from the secure memory, and the read data and the verification data are written into the corresponding stripe.
  • if the RAID array does not have data recovery capability, some embodiments wait for a second preset time. If the memory protected by the battery backup unit becomes available again after the second preset time, data and verification data are read from the secure memory, and the read data and verification data are written to the corresponding stripe. If the memory protected by the battery backup unit is still unavailable after the second preset time, the stripe data recovery process is terminated.
  • it further includes:
  • after a third preset time period has elapsed since stripe data recovery started, it can be determined whether the data and verification data have been successfully written into the corresponding stripe. If they have been successfully written, the data and verification data in the memory protected by the battery backup unit are deleted to free up memory space. If they have not been successfully written after the third preset time period, a write abnormality event can be recorded so that management personnel can troubleshoot and maintain the storage system accordingly.
  • it further includes:
  • the verification data is saved in the secure memory, and after the data is written to the disk, a write success signal is sent to the host.
  • some embodiments directly send a write success signal to the host after the verification data is saved in the memory protected by the battery backup unit and the data of the data blocks are written to the disk, instead of waiting for the verification data to be written to the disk before sending the write success signal to the host, thereby effectively improving the write performance of the internal RAID array of a single node.
  • the write data requested by the host is divided into stripes.
  • the write data is divided into stripes stripe0, stripe1 and stripe2, and each stripe has data blocks (strips) and a check block (parity).
  • the divided stripe is further split into write requests for multiple blocks in the stripe, and each data block strip is processed separately.
  • Each data block strip begins to be written to the disk, and the check block parity is obtained by XOR operation of multiple data block strips in the stripe; the check data of the check block parity is saved in the memory protected by the battery backup unit. After all data block strips are written, a write success signal is sent directly to the host.
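The write path just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation from the application: disks are modeled as dictionaries keyed by stripe number, the secure memory is a plain dictionary, short final stripes are zero-padded, and the names `split_into_stripes`, `xor_parity`, and `handle_write` are hypothetical.

```python
from functools import reduce


def split_into_stripes(payload, strip_size, strips_per_stripe):
    """Split host write data into stripes, each a list of data strips."""
    stripe_bytes = strip_size * strips_per_stripe
    stripes = []
    for off in range(0, len(payload), stripe_bytes):
        # Assumption: a short final stripe is padded with zero bytes.
        chunk = payload[off:off + stripe_bytes].ljust(stripe_bytes, b"\0")
        stripes.append([chunk[i:i + strip_size]
                        for i in range(0, stripe_bytes, strip_size)])
    return stripes


def xor_parity(strips):
    """Check block: byte-wise XOR of all data strips in one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))


def handle_write(payload, data_disks, secure_memory, strip_size=4):
    """Write data strips to disk, keep parity in secure memory, then ack."""
    stripes = split_into_stripes(payload, strip_size, len(data_disks))
    for stripe_no, strips in enumerate(stripes):
        # Parity stays in battery-backed memory instead of going to disk first.
        secure_memory[stripe_no] = xor_parity(strips)
        for disk, strip in zip(data_disks, strips):
            disk[stripe_no] = strip
    # The ack is sent once all data strips are written, without waiting
    # for the parity to reach disk.
    return "write-success"
```

The point of the ordering is that the host sees the acknowledgement without waiting for the parity write, while the parity remains recoverable from the secure memory if power is lost.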
  • with the data recovery method for the RAID array provided by the present application, the data to be written to the stripe and the verification data are saved in the memory protected by the battery backup unit. Even if the storage system loses power abnormally, the battery backup unit supplies power long enough for the data to be written to the stripe and the verification data to be completely saved in that memory. After the storage system is restored, the data can be read from the memory protected by the battery backup unit and rewritten to the stripe to restore the consistency of the stripe. The entire process does not require multiple reads and writes to the underlying file system, does not affect the performance of the storage system, and does not require additional hardware resources.
  • the present application also provides a data recovery device for a RAID array.
  • the device described below and the method described above can be read in correspondence with each other.
  • FIG. 3 is a schematic diagram of a data recovery device for a RAID array provided by an embodiment of the present application. As shown in FIG. 3, the device includes:
  • a first storage module 10 is used to store stripe information of stripes in the RAID array
  • the second storage module 20 is used to store the data to be written into the stripe and the verification data in the secure memory;
  • the secure memory is the memory that completes the data storage when the battery backup unit supplies power to the storage system after the storage system loses power;
  • the reading module 30 is used to read the data and the verification data from the secure memory after the storage system recovers from the power loss;
  • the writing module 40 is used to read the stripe information and write the read data and the verification data into the stripe according to the stripe information.
  • the first storage module 10 is specifically used for:
  • Stripe information is stored in each node of the I/O group.
  • the second storage module 20 is specifically used for:
  • the data and verification data are stored in the secure memory of the node corresponding to the write request in the I/O group.
  • a first judgment module is used to determine whether the secure memory is available;
  • if the secure memory is available, the reading module 30 reads the data and the verification data from the secure memory.
  • the recovery module is used to recover the data not written into the stripe according to the data written into the stripe if the secure memory is unavailable.
  • a second judgment module is used to judge whether the RAID array has data recovery capability;
  • if the RAID array has data recovery capability, the recovery module recovers the data not written into the stripe according to the data written into the stripe.
  • the second judgment module is specifically used to:
  • the first determination module is used to determine that if there is no failed disk in the RAID array, the RAID array has data recovery capability.
  • the third judgment module is used to judge whether the number of faulty disks exceeds the allowed value if there are faulty disks in the RAID array;
  • a second determination module is used to determine that the RAID array has data recovery capability if the number of failed disks does not exceed an allowable value
  • the third determination module is used to determine that the RAID array does not have data recovery capability if the number of failed disks exceeds an allowed value.
  • a fourth judgment module is used to determine whether the secure memory is restored to be usable after a first preset time period;
  • if the secure memory is not restored to be usable after the first preset time period, the recovery module recovers the data not written to the stripe based on the data written to the stripe;
  • if the secure memory is restored to be usable after the first preset time period, the reading module 30 reads the data and the verification data from the secure memory, and the writing module 40 writes the read data and the verification data into the corresponding stripe.
  • a fifth judgment module is used to determine whether the secure memory is restored to be usable after a second preset time period if the RAID array does not have data recovery capability;
  • an ending module is used for ending the stripe data recovery process if the secure memory is not restored to be usable after the second preset time period;
  • if the secure memory is restored to be usable after the second preset time period, the reading module 30 reads the data and the verification data from the secure memory, and the writing module 40 writes the read data and the verification data into the corresponding stripe.
  • a sixth judgment module used to judge whether the data and the verification data are successfully written into the stripe after the third preset time period
  • the deletion module is used to delete the data and verification data in the secure memory if the data and verification data are successfully written into the stripe after a third preset time period.
  • the recording module is used to record a write abnormality event if the data and the verification data are not successfully written into the stripe after a third preset time period.
  • the first storage module 10 is specifically used for:
  • the stripe address of the stripe and/or the stripe number of the stripe are saved.
  • the first storage module 10 is specifically used for saving the stripe information in the secure memory.
  • the backup module is used to back up the data and the verification data to another node in the I/O group.
  • the sending module is used to send a write success signal to the host after the verification data is saved in the secure memory and the data is written to the disk.
  • the data to be written into the stripe and the verification data are saved in the memory protected by the battery backup unit. Even if the storage system loses power abnormally, the battery backup unit supplies power long enough for the data to be written into the stripe and the verification data to be completely saved in that memory. After the storage system is restored, the data can be read from the memory protected by the battery backup unit and rewritten into the stripe to restore the consistency of the stripe. The entire process does not require multiple reads and writes to the underlying file system, does not affect the performance of the storage system, and does not require additional hardware resources.
  • the present application also provides a data recovery device for a RAID array.
  • the device includes a memory 1 and a processor 2.
  • Memory 1 is used for storing a computer program;
  • Processor 2 is used to execute the computer program to implement the following steps:
  • saving stripe information of stripes in the RAID array; saving the data and verification data to be written into the stripe in the secure memory, where the secure memory is memory in which data preservation is completed while the battery backup unit powers the storage system after the storage system loses power; after the storage system is restored, reading the data and verification data from the secure memory; reading the stripe information, and writing the read data and verification data into the stripe according to the stripe information.
  • the present application also provides a non-volatile readable storage medium, on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented:
  • saving stripe information of stripes in the RAID array; saving the data and verification data to be written into the stripe in the secure memory, where the secure memory is memory in which data preservation is completed while the battery backup unit powers the storage system after the storage system loses power; after the storage system is restored, reading the data and verification data from the secure memory; reading the stripe information, and writing the read data and verification data into the stripe according to the stripe information.
  • the non-volatile readable storage medium may include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media that can store program code.
  • the steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly using hardware, a software module executed by a processor, or a combination of the two.
  • the software module may be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

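The present application discloses a data recovery method for a RAID array, relating to the field of storage technology, comprising: saving stripe information of a stripe in the RAID array; saving, in a secure memory, the data and verification data (parity) to be written to the stripe, where the secure memory is memory in which data preservation is completed while a battery backup unit powers the storage system after the storage system loses power; after the storage system recovers, reading the data and the verification data from the secure memory; and reading the stripe information and, according to the stripe information, writing the read data and verification data to the stripe. The method can solve the write hole problem without affecting the performance of the storage system and without additional hardware resources. The present application also discloses a data recovery apparatus, a device, and a computer-readable storage medium for a RAID array, all of which have the above technical effects.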

Description

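Data recovery method for a RAID array and related apparatus
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211508424.X, entitled "Data recovery method for a RAID array and related apparatus", filed with the China Patent Office on November 29, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of storage technology, and in particular to a data recovery method for a RAID array; it also relates to a data recovery apparatus and device for a RAID array and a non-volatile readable storage medium.
Background
RAID is an important technology in the storage field; it uses striping, mirroring, and parity to ensure data reliability. To improve I/O performance, the industry commonly forms clusters from multiple controller nodes: the primary node handles host I/O requests, and the secondary node handles the storage system's background tasks (for example, RAID array initialization, patrol, and rebuild tasks). To increase data reliability, the industry currently uses the redundant disks in a RAID array to recover the data of a failed disk. However, if the system suddenly loses power or suffers another failure while the RAID array is updating data within a stripe, part of the data in the stripe will have been updated while the rest has not. After the system restarts, the data in the stripe is therefore incomplete and the stripe is in an inconsistent state; this is the Write Hole problem.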
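The inconsistency described above can be made concrete with a small sketch. It assumes single XOR parity over equally sized strips; the helper names `xor_blocks` and `stripe_is_consistent` and the sample bytes are illustrative only and do not come from the application.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_strips, parity):
    """A stripe is consistent when its parity equals the XOR of its data strips."""
    return xor_blocks(data_strips) == parity


# A consistent stripe: three data strips plus their XOR parity.
old_data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_blocks(old_data)

# Power is lost after strip 0 has been rewritten but before parity is updated.
torn_data = [b"\xff\xff", old_data[1], old_data[2]]
```

After the simulated power loss the stale parity no longer matches the data strips, which is the inconsistent state the text calls a write hole.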
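To solve the Write Hole problem, the industry currently uses two main approaches: 1. adopting the journal design of file systems to make the handling of write requests atomic; 2. using non-volatile memory as a write cache to achieve atomic writes. However, the first approach requires reading and writing the underlying file system multiple times, which severely degrades performance. The second approach requires additional NVRAM hardware, which is expensive and offers limited capacity, so it cannot store data in bulk.
Summary
The purpose of the present application is to provide a data recovery method for a RAID array that can solve the write hole problem without affecting the performance of the storage system and without requiring additional hardware resources. Another purpose of the present application is to provide a data recovery apparatus and device for a RAID array and a non-volatile readable storage medium, all of which have the above technical effects.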
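To solve the above technical problem, the present application provides a data recovery method for a RAID array, comprising:
saving stripe information of a stripe in the RAID array;
saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
after the storage system recovers, reading the data and the parity data from the secure memory;
reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
Optionally, saving the stripe information of the stripe comprises:
saving the stripe information in each node of an I/O group.
Optionally, saving, in the secure memory, the data and parity data to be written to the stripe comprises:
saving the data and the parity data in the secure memory of the node in the I/O group that corresponds to the write request.
Optionally, before reading the data and the parity data from the secure memory, the method further comprises:
determining whether the secure memory is available;
if the secure memory is available, reading the data and the parity data from the secure memory.
Optionally, the method further comprises:
if the secure memory is unavailable, recovering the data not yet written to the stripe from the data already written to the stripe.
Optionally, before recovering the data not yet written to the stripe from the data already written to the stripe, the method further comprises:
determining whether the RAID array has data recovery capability;
if the RAID array has data recovery capability, recovering the data not yet written to the stripe from the data already written to the stripe.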
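Optionally, determining whether the RAID array has data recovery capability comprises:
determining whether the RAID array has a failed disk;
if the RAID array has no failed disk, the RAID array has data recovery capability.
Optionally, the method further comprises:
if the RAID array has a failed disk, determining whether the number of failed disks exceeds an allowed value;
if the number of failed disks does not exceed the allowed value, the RAID array has data recovery capability;
if the number of failed disks exceeds the allowed value, the RAID array does not have data recovery capability.
Optionally, before recovering the data not yet written to the stripe from the data already written to the stripe, the method further comprises:
determining whether the secure memory has become available again after a first preset duration;
if the secure memory has not become available again after the first preset duration, recovering the data not yet written to the stripe from the data already written to the stripe;
if the secure memory has become available again after the first preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.
Optionally, the method further comprises:
if the RAID array does not have data recovery capability, determining whether the secure memory has become available again after a second preset duration;
if the secure memory has not become available again after the second preset duration, ending the stripe data recovery process;
if the secure memory has become available again after the second preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.
Optionally, the method further comprises:
determining whether the data and the parity data have been successfully written to the stripe after a third preset duration;
if the data and the parity data have been successfully written to the stripe after the third preset duration, deleting the data and the parity data from the secure memory.
Optionally, the method further comprises:
if the data and the parity data have not been successfully written to the stripe after the third preset duration, recording a write exception event.
Optionally, saving the stripe information of the stripe in the RAID array comprises:
saving the stripe address and/or the stripe number of the stripe.
Optionally, saving the stripe information of the stripe in the RAID array comprises:
saving the stripe information in the secure memory.
Optionally, the method further comprises:
backing up the data and the parity data to the other node in the I/O group.
Optionally, the method further comprises:
sending a write success signal to the host after the parity data has been saved to the secure memory and the data has been written to disk.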
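To solve the above technical problem, the present application further provides a data recovery apparatus for a RAID array, comprising:
a first saving module, configured to save stripe information of a stripe in the RAID array;
a second saving module, configured to save, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
a reading module, configured to read the data and the parity data from the secure memory after the storage system recovers from the power loss;
a writing module, configured to read the stripe information and, according to the stripe information, write the read data and parity data to the stripe.
To solve the above technical problem, the present application further provides a data recovery device for a RAID array, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the data recovery method for a RAID array according to any of the above when executing the computer program.
To solve the above technical problem, the present application further provides a non-volatile readable storage medium storing a computer program that, when executed by a processor, implements the steps of the data recovery method for a RAID array according to any of the above.
The data recovery method for a RAID array provided by the present application comprises: saving stripe information of a stripe in the RAID array; saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power; after the storage system recovers, reading the data and the parity data from the secure memory; and reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
It can be seen that, in the data recovery method for a RAID array provided by the present application, the data and parity data to be written to a stripe are saved to memory protected by the battery backup unit. Even if the storage system loses power unexpectedly, the power supplied by the battery backup unit ensures that the data and parity data to be written to the stripe are completely saved in that memory, so that after the storage system recovers, the data can be read from the memory protected by the battery backup unit and rewritten to the stripe, restoring the stripe's consistency. The whole process does not require reading and writing the underlying file system multiple times, does not affect the performance of the storage system, and does not require additional hardware resources.
The data recovery apparatus and device for a RAID array and the non-volatile readable storage medium provided by the present application all have the above technical effects.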
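Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data recovery method for a RAID array provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of host I/O split by stripe provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a data recovery apparatus for a RAID array provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a data recovery device for a RAID array provided by an embodiment of the present application.
Detailed description
The core of the present application is to provide a data recovery method for a RAID array that can solve the write hole problem without affecting the performance of the storage system and without requiring additional hardware resources. Another core of the present application is to provide a data recovery apparatus and device for a RAID array and a non-volatile readable storage medium, all of which have the above technical effects.
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, which is a schematic flowchart of a data recovery method for a RAID array provided by an embodiment of the present application, the method comprises: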
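S101: saving stripe information of a stripe in the RAID array;
For a stripe to which data is being written, the stripe information of that stripe is saved so that the stripe's consistency can be restored based on it. The specific type of stripe information can be chosen as needed, as long as it allows the stripe data to be recovered in a targeted way.
In some embodiments, saving the stripe information of the stripe in the RAID array comprises:
saving the stripe address and/or the stripe number of the stripe;
The stripe information may be a stripe address. When recovering stripe data, data can be rewritten to the corresponding stripe according to the saved stripe address. The stripe information may also be a stripe number. When recovering stripe data, data can be rewritten to the corresponding stripe according to the saved stripe number.
The specific location where the stripe information is saved can likewise be chosen as needed.
In some embodiments, saving the stripe information of the stripe in the RAID array comprises:
saving the stripe information in each node of an I/O group.
To ensure high availability of the storage system, two nodes form an I/O GROUP, i.e. an I/O group; the two nodes in an I/O group are peer nodes of each other, one or more I/O groups form a cluster, and the nodes in the cluster can communicate with each other. When saving the stripe information, some embodiments save it to both nodes of the I/O group, so that it can later be read from either node, ensuring that the stripe information can be obtained. Optionally, the stripe information may be saved in memory protected by the battery backup unit.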
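S102: saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
Unlike the traditional file-system-based and non-volatile-memory-based approaches to recovering stripe data, some embodiments recover stripe data based on secure memory, i.e. memory protected by a battery backup unit. Memory protected by a battery backup unit is memory in which data saving is completed under the power that the battery backup unit continues to supply to the storage system after the storage system loses power. Some embodiments save both the data and the parity data to be written to a stripe in memory protected by the battery backup unit. Even if the storage system suffers a power failure, the data and parity data can still be saved normally in that memory.
In some embodiments, saving, in the secure memory, the data and parity data to be written to the stripe comprises:
saving the data and the parity data in the secure memory of the node in the I/O group that corresponds to the write request.
When saving the data and parity data to be written to the stripe, some embodiments save them only to the node in the I/O group that corresponds to the write request; the other node in the I/O group does not save them.
Optionally, some embodiments may further comprise:
backing up the data and the parity data to the other node in the I/O group, so that if the node corresponding to the write request later cannot read the saved data and parity data, they can be read from the other node.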
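Steps S101 and S102 can be sketched as follows. This is a minimal in-memory model under stated assumptions: the `StripeInfo` and `Node` classes, the `save_pending_write` function, and the use of plain dictionaries to stand in for battery-protected memory are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StripeInfo:
    """Hypothetical record: the text allows a stripe address and/or a stripe number."""
    stripe_number: int
    stripe_address: int


@dataclass
class Node:
    """One node of a two-node I/O group (illustrative model only)."""
    name: str
    stripe_info: dict = field(default_factory=dict)    # stripe_number -> StripeInfo
    secure_memory: dict = field(default_factory=dict)  # stripe_number -> (data_strips, parity)


def save_pending_write(owner, peer, info, data_strips, parity, backup_to_peer=False):
    """Save stripe info on both nodes (S101) and data plus parity on the owner (S102)."""
    for node in (owner, peer):
        node.stripe_info[info.stripe_number] = info
    owner.secure_memory[info.stripe_number] = (list(data_strips), parity)
    if backup_to_peer:
        # Optional variant from the text: keep a second copy on the peer node.
        peer.secure_memory[info.stripe_number] = (list(data_strips), parity)
```

With `backup_to_peer=False` only the node that received the write request holds the pending data, which matches the default embodiment; passing `True` models the optional backup.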
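S103: after the storage system recovers, reading the data and the parity data from the secure memory;
S104: reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
After the storage system recovers, the data and parity data are read from the memory protected by the battery backup unit and rewritten to the corresponding stripe according to the stripe information that was read, restoring that stripe's consistency.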
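A minimal sketch of steps S103 and S104 follows. It assumes the stripe information is a mapping from stripe number to stripe address and models the disks as a dictionary keyed by address; the function name `replay_pending_writes` and these data shapes are assumptions made for illustration.

```python
def replay_pending_writes(secure_memory, stripe_info, disk):
    """Rewrite every pending stripe from secure memory to its saved address.

    secure_memory: stripe_number -> (data_strips, parity)
    stripe_info:   stripe_number -> stripe_address
    disk:          stripe_address -> (data_strips, parity), updated in place
    Returns the list of stripe numbers that were rewritten.
    """
    restored = []
    for number, (data_strips, parity) in secure_memory.items():
        address = stripe_info[number]                # S104: look up where the stripe lives
        disk[address] = (list(data_strips), parity)  # rewrite data and parity together
        restored.append(number)
    return restored
```

Because data and parity are rewritten as a pair, a stripe that was only partly updated before the power loss ends up consistent again.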
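In some embodiments, before reading the data and the parity data from the secure memory, the method may further comprise:
determining whether the secure memory is available;
if the secure memory is available, reading the data and the parity data from the secure memory.
Reading the data and parity data saved in the memory protected by the battery backup unit presupposes that this memory is available. Available means that the data and parity data can be read from the memory protected by the battery backup unit and that they are valid. If the memory is available, the data and parity data are read from it. If it is unavailable, the current stripe data recovery process can be ended, or another method can optionally be used to recover the stripe data.
In some embodiments, if the secure memory is unavailable, the data not yet written to the stripe is recovered from the data already written to the stripe.
When the memory protected by the battery backup unit is unavailable, some embodiments can fall back to recovering the data not yet written to the stripe from the data that had already been written to the stripe when the storage system lost power, thereby restoring stripe consistency.
For example, the parity data is rebuilt from the data already written. Alternatively, the unwritten data is recovered from the data and parity data already written.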
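Both fallback variants can be sketched with XOR, assuming a single XOR parity strip per stripe (the kind of parity the description later derives by XORing data strips). The function names and sample bytes are illustrative assumptions; arrays with a second parity, such as RAID6, would need additional arithmetic that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_parity(data_strips):
    """Variant 1: all data strips were written, so recompute parity from them."""
    return xor_blocks(data_strips)


def rebuild_missing_strip(surviving_strips, parity):
    """Variant 2: one data strip is missing; XOR the other strips with the parity."""
    return xor_blocks(list(surviving_strips) + [parity])


strips = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = rebuild_parity(strips)
recovered = rebuild_missing_strip([strips[0], strips[2]], parity)
```

Here `recovered` equals the middle strip, because XORing the parity with every other strip cancels their contribution.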
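In some embodiments, before recovering the data not yet written to the stripe from the data already written to the stripe when the storage system lost power, the method further comprises:
determining whether the RAID array has data recovery capability;
if the RAID array has data recovery capability, recovering the data not yet written to the stripe from the data already written to the stripe when the storage system lost power.
Recovering the data not yet written to the stripe from the data already written when the storage system lost power is only possible when the RAID array has data recovery capability. If the RAID array does not have data recovery capability, this recovery cannot be performed.
Determining whether the RAID array has data recovery capability may comprise:
determining whether the RAID array has a failed disk;
if the RAID array has no failed disk, the RAID array has data recovery capability.
If the RAID array has no failed disk, data can be read from every data disk, so the data not yet written to the stripe can be recovered from the data already written when the storage system lost power. If the RAID array has a failed disk, this recovery may not be possible. Therefore, in some embodiments, if the RAID array has a failed disk, it is determined whether the number of failed disks exceeds the allowed value; if the number of failed disks does not exceed the allowed value, the RAID array has data recovery capability; if the number of failed disks exceeds the allowed value, the RAID array does not have data recovery capability.
For example, a RAID5 array with more than one failed disk does not have data recovery capability; in that case the data not yet written to the stripe cannot be recovered from the data already written when the storage system lost power.
For example, a RAID6 array with more than two failed disks does not have data recovery capability; in that case the data not yet written to the stripe cannot be recovered from the data already written when the storage system lost power.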
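The capability check reduces to comparing the failed-disk count with a per-level limit. The sketch below encodes only the two examples given in the text (RAID5 tolerates one failed disk, RAID6 two); the table name `ALLOWED_FAILED_DISKS` and the function `can_recover` are hypothetical.

```python
# Allowed number of failed disks per RAID level, taken from the two examples in the text.
ALLOWED_FAILED_DISKS = {"RAID5": 1, "RAID6": 2}


def can_recover(raid_level, failed_disks):
    """Return True when the array still has data recovery capability."""
    if failed_disks == 0:
        # No failed disk: every data disk can be read.
        return True
    return failed_disks <= ALLOWED_FAILED_DISKS[raid_level]
```

For instance, `can_recover("RAID5", 2)` is `False`, while `can_recover("RAID6", 2)` is `True`.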
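In addition, in some embodiments, before recovering the data not yet written to the stripe from the data already written to the stripe, the method further comprises:
determining whether the secure memory has become available again after a first preset duration;
if the secure memory has not become available again after the first preset duration, recovering the data not yet written to the stripe from the data already written to the stripe at the time of the power loss;
if the secure memory has become available again after the first preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.
The memory protected by the battery backup unit may be only temporarily unavailable for some reason. Therefore, when it is unavailable, the system can wait for a period of time. If the memory has become available again after that period, the data and parity data are read from the secure memory and written to the corresponding stripe. If it is still unavailable after that period, the data not yet written to the stripe is recovered from the data already written when the storage system lost power.
Optionally, some embodiments further comprise:
if the RAID array does not have data recovery capability, determining whether the secure memory has become available again after a second preset duration;
if the secure memory has not become available again after the second preset duration, ending the stripe data recovery process;
if the secure memory has become available again after the second preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.
When some embodiments determine that the RAID array does not have data recovery capability, they wait for the second preset duration. If the memory protected by the battery backup unit has become available again after the second preset duration, the data and parity data are read from the secure memory and written to the corresponding stripe. If it is still unavailable after the second preset duration, the current stripe data recovery process is ended.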
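The optional checks can be combined into one decision routine. The ordering below (availability check, first wait, capability check, second wait) is one plausible way to chain the separately described embodiments and is an assumption, as are the `Action` enum, the callable parameters, and the injectable `sleep` used to keep the sketch testable.

```python
import time
from enum import Enum


class Action(Enum):
    REPLAY_FROM_SECURE_MEMORY = "replay"
    REBUILD_FROM_WRITTEN_DATA = "rebuild"
    END_RECOVERY = "end"


def choose_recovery_action(secure_memory_available, raid_can_recover,
                           first_wait_s, second_wait_s, sleep=time.sleep):
    """Pick a recovery path; both predicates are zero-argument callables."""
    if secure_memory_available():
        return Action.REPLAY_FROM_SECURE_MEMORY
    sleep(first_wait_s)                      # first preset duration
    if secure_memory_available():
        return Action.REPLAY_FROM_SECURE_MEMORY
    if raid_can_recover():
        return Action.REBUILD_FROM_WRITTEN_DATA
    sleep(second_wait_s)                     # second preset duration
    if secure_memory_available():
        return Action.REPLAY_FROM_SECURE_MEMORY
    return Action.END_RECOVERY
```

Passing a no-op `sleep` makes the routine easy to exercise without real delays; a production controller would more likely poll or react to an event than block.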
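Optionally, some embodiments further comprise:
determining whether the data and the parity data have been successfully written to the stripe after a third preset duration;
if the data and the parity data have been successfully written to the stripe after the third preset duration, deleting the data and the parity data from the secure memory.
After the third preset duration has elapsed since stripe data recovery started, it can be determined whether the data and parity data have been successfully written to the corresponding stripe. If they have, that data and parity data are deleted from the memory protected by the battery backup unit to free memory space. If the data and parity data have not been successfully written to the stripe after the third preset duration, a write exception event can be recorded so that administrators can use it to troubleshoot and maintain the storage system.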
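The cleanup step can be sketched as below. The caller is assumed to have already waited the third preset duration and to pass in the outcome as `written_ok`; the function name, the list used as an event log, and the message text are illustrative assumptions.

```python
def finalize_recovery(secure_memory, stripe_number, written_ok, event_log):
    """Free the secure-memory copy on success, otherwise record a write exception.

    secure_memory: stripe_number -> (data_strips, parity), updated in place
    written_ok:    result of the check made after the third preset duration
    event_log:     list that collects write exception events
    """
    if written_ok:
        secure_memory.pop(stripe_number, None)  # release battery-protected memory
        return True
    event_log.append(f"stripe {stripe_number}: write to stripe not confirmed")
    return False
```

Keeping the secure-memory copy on failure means a later retry still has the data and parity to work from.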
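Optionally, some embodiments further comprise:
sending a write success signal to the host after the parity data has been saved to the secure memory and the data has been written to disk.
To improve the write performance of the RAID array, some embodiments send the write success signal to the host as soon as the parity data has been saved to the memory protected by the battery backup unit and the data of all data strips has been written to disk, rather than waiting until the parity data has also been written to disk. This can effectively improve the write performance of the RAID array within a single node.
Specifically, as shown in FIG. 2, the write data requested by the host is split by stripe; in FIG. 2 the write data is split into stripes stripe0, stripe1, and stripe2, and each stripe has data strips (strip) and a parity strip (parity). After the write data has been split by stripe, each stripe is further split into write requests for its individual strips, and each data strip is handled separately. Each data strip starts being written to disk, and the parity strip is obtained by XORing the data strips of the stripe; the parity data of the parity strip is saved to the memory protected by the battery backup unit. Once all data strips have been written, the write success signal is sent directly to the host.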
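The write path described for FIG. 2 can be sketched as follows. The strip size, the zero padding of a short final stripe, the dictionaries standing in for disks and battery-protected memory, and the returned string are assumptions made for illustration; only the split-by-stripe, XOR parity, and early acknowledgement steps come from the text.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_into_strips(payload, strip_size, strips_per_stripe):
    """Cut host data into stripes, each a list of data strips (zero-padded at the end)."""
    stripe_bytes = strip_size * strips_per_stripe
    stripes = []
    for start in range(0, len(payload), stripe_bytes):
        chunk = payload[start:start + stripe_bytes].ljust(stripe_bytes, b"\x00")
        stripes.append([chunk[i:i + strip_size] for i in range(0, stripe_bytes, strip_size)])
    return stripes


def handle_host_write(payload, strip_size, strips_per_stripe, disk, secure_memory):
    """Write data strips to disk, keep parity in secure memory, then acknowledge."""
    for number, strips in enumerate(split_into_strips(payload, strip_size, strips_per_stripe)):
        secure_memory[number] = xor_blocks(strips)  # parity held in battery-protected memory
        disk[number] = strips                       # data strips written to disk
    # Acknowledge without waiting for the parity to reach disk.
    return "write-success"
```

The acknowledgement is returned once every data strip is on disk and the parity sits in secure memory, which is the early-completion point the text describes.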
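In summary, in the data recovery method for a RAID array provided by the present application, the data and parity data to be written to a stripe are saved to memory protected by the battery backup unit. Even if the storage system loses power unexpectedly, the power supplied by the battery backup unit ensures that the data and parity data to be written to the stripe are completely saved in that memory, so that after the storage system recovers, the data can be read from the memory protected by the battery backup unit and rewritten to the stripe, restoring the stripe's consistency. The whole process does not require reading and writing the underlying file system multiple times, does not affect the performance of the storage system, and does not require additional hardware resources.
The present application further provides a data recovery apparatus for a RAID array; the apparatus described below and the method described above may be read in conjunction with each other. Referring to FIG. 3, which is a schematic diagram of a data recovery apparatus for a RAID array provided by an embodiment of the present application, the apparatus comprises:
a first saving module 10, configured to save stripe information of a stripe in the RAID array;
a second saving module 20, configured to save, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
a reading module 30, configured to read the data and the parity data from the secure memory after the storage system recovers from the power loss;
a writing module 40, configured to read the stripe information and, according to the stripe information, write the read data and parity data to the stripe.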
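On the basis of some of the above embodiments, as a specific implementation, the first saving module 10 is specifically configured to:
save the stripe information in each node of an I/O group.
On the basis of some of the above embodiments, as a specific implementation, the second saving module 20 is specifically configured to:
save the data and the parity data in the secure memory of the node in the I/O group that corresponds to the write request.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a first judging module, configured to determine whether the secure memory is available;
if the secure memory is available, the reading module 30 reads the data and the parity data from the secure memory.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a recovery module, configured to, if the secure memory is unavailable, recover the data not yet written to the stripe from the data already written to the stripe.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a second judging module, configured to determine whether the RAID array has data recovery capability;
if the RAID array has data recovery capability, the recovery module recovers the data not yet written to the stripe from the data already written to the stripe.
On the basis of some of the above embodiments, as a specific implementation, the second judging module is specifically configured to:
determine whether the RAID array has a failed disk;
a first determining module, configured to determine that the RAID array has data recovery capability if the RAID array has no failed disk.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a third judging module, configured to, if the RAID array has a failed disk, determine whether the number of failed disks exceeds an allowed value;
a second determining module, configured to determine that the RAID array has data recovery capability if the number of failed disks does not exceed the allowed value;
a third determining module, configured to determine that the RAID array does not have data recovery capability if the number of failed disks exceeds the allowed value.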
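On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a fourth judging module, configured to determine whether the secure memory has become available again after a first preset duration;
if the secure memory has not become available again after the first preset duration, the recovery module recovers the data not yet written to the stripe from the data already written to the stripe;
if the secure memory has become available again after the first preset duration, the reading module 30 reads the data and the parity data from the secure memory, and the writing module 40 writes the read data and parity data to the corresponding stripe.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a fifth judging module, configured to, if the RAID array does not have data recovery capability, determine whether the secure memory has become available again after a second preset duration;
an ending module, configured to end the stripe data recovery process if the secure memory has not become available again after the second preset duration;
if the secure memory has become available again after the second preset duration, the reading module 30 reads the data and the parity data from the secure memory, and the writing module 40 writes the read data and parity data to the corresponding stripe.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a sixth judging module, configured to determine whether the data and the parity data have been successfully written to the stripe after a third preset duration;
a deleting module, configured to delete the data and the parity data from the secure memory if they have been successfully written to the stripe after the third preset duration.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a recording module, configured to record a write exception event if the data and the parity data have not been successfully written to the stripe after the third preset duration.
On the basis of some of the above embodiments, as a specific implementation, the first saving module 10 is specifically configured to:
save the stripe address and/or the stripe number of the stripe.
On the basis of some of the above embodiments, as a specific implementation, the first saving module 10 is specifically configured to:
save the stripe information in the secure memory.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a backup module, configured to back up the data and the parity data to the other node in the I/O group.
On the basis of some of the above embodiments, as a specific implementation, the apparatus further comprises:
a sending module, configured to send a write success signal to the host after the parity data has been saved to the secure memory and the data has been written to disk.
In the data recovery apparatus for a RAID array provided by the present application, the data and parity data to be written to a stripe are saved to memory protected by the battery backup unit. Even if the storage system loses power unexpectedly, the power supplied by the battery backup unit ensures that the data and parity data to be written to the stripe are completely saved in that memory, so that after the storage system recovers, the data can be read from the memory protected by the battery backup unit and rewritten to the stripe, restoring the stripe's consistency. The whole process does not require reading and writing the underlying file system multiple times, does not affect the performance of the storage system, and does not require additional hardware resources.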
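The present application further provides a data recovery device for a RAID array. As shown in FIG. 4, the device comprises a memory 1 and a processor 2.
The memory 1 is configured to store a computer program;
the processor 2 is configured to execute the computer program to implement the following steps:
saving stripe information of a stripe in the RAID array; saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power; after the storage system recovers, reading the data and the parity data from the secure memory; and reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
For a description of the device provided by the present application, refer to the method embodiments above; it is not repeated here.
The present application further provides a non-volatile readable storage medium storing a computer program that, when executed by a processor, implements the following steps:
saving stripe information of a stripe in the RAID array; saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power; after the storage system recovers, reading the data and the parity data from the secure memory; and reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
The non-volatile readable storage medium may include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
For a description of the non-volatile readable storage medium provided by the present application, refer to the method embodiments above; it is not repeated here.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts can be cross-referenced. Because the apparatus, device, and non-volatile readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, they are described relatively briefly; refer to the method description for the relevant details.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The data recovery method, apparatus, device, and non-volatile readable storage medium for a RAID array provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.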

Claims (20)

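  1. A data recovery method for a RAID array, characterized by comprising:
    saving stripe information of a stripe in the RAID array;
    saving, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
    after the storage system recovers, reading the data and the parity data from the secure memory;
    reading the stripe information and, according to the stripe information, writing the read data and parity data to the stripe.
  2. The data recovery method for a RAID array according to claim 1, characterized in that saving the stripe information of the stripe in the RAID array comprises:
    saving the stripe information in each node of an I/O group.
  3. The data recovery method for a RAID array according to claim 1, characterized in that saving, in the secure memory, the data and parity data to be written to the stripe comprises:
    saving the data and the parity data in the secure memory of the node in the I/O group that corresponds to the write request.
  4. The data recovery method for a RAID array according to claim 1, characterized in that, before reading the data and the parity data from the secure memory, the method further comprises:
    determining whether the secure memory is available;
    if the secure memory is available, reading the data and the parity data from the secure memory.
  5. The data recovery method for a RAID array according to claim 4, characterized by further comprising:
    if the secure memory is unavailable, recovering the data not yet written to the stripe from the data already written to the stripe.
  6. The data recovery method for a RAID array according to claim 5, characterized in that, before recovering the data not yet written to the stripe from the data already written to the stripe, the method further comprises:
    determining whether the RAID array has data recovery capability;
    if the RAID array has data recovery capability, recovering the data not yet written to the stripe from the data already written to the stripe.
  7. The data recovery method for a RAID array according to claim 6, characterized in that determining whether the RAID array has data recovery capability comprises:
    determining whether the RAID array has a failed disk;
    if the RAID array has no failed disk, the RAID array has data recovery capability.
  8. The data recovery method for a RAID array according to claim 7, characterized by further comprising:
    if the RAID array has a failed disk, determining whether the number of failed disks exceeds an allowed value;
    if the number of failed disks does not exceed the allowed value, the RAID array has data recovery capability;
    if the number of failed disks exceeds the allowed value, the RAID array does not have data recovery capability.
  9. The data recovery method for a RAID array according to claim 5, characterized in that, before recovering the data not yet written to the stripe from the data already written to the stripe, the method further comprises:
    determining whether the secure memory has become available again after a first preset duration;
    if the secure memory has not become available again after the first preset duration, recovering the data not yet written to the stripe from the data already written to the stripe;
    if the secure memory has become available again after the first preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.
  10. The data recovery method for a RAID array according to claim 6, characterized by further comprising:
    if the RAID array does not have data recovery capability, determining whether the secure memory has become available again after a second preset duration;
    if the secure memory has not become available again after the second preset duration, ending the stripe data recovery process;
    if the secure memory has become available again after the second preset duration, reading the data and the parity data from the secure memory and writing the read data and parity data to the corresponding stripe.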
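  11. The data recovery method for a RAID array according to claim 1, characterized by further comprising:
    determining whether the data and the parity data have been successfully written to the stripe after a third preset duration;
    if the data and the parity data have been successfully written to the stripe after the third preset duration, deleting the data and the parity data from the secure memory.
  12. The data recovery method for a RAID array according to claim 11, characterized by further comprising:
    if the data and the parity data have not been successfully written to the stripe after the third preset duration, recording a write exception event.
  13. The data recovery method for a RAID array according to claim 1, characterized in that saving the stripe information of the stripe in the RAID array comprises:
    saving the stripe address and/or the stripe number of the stripe.
  14. The data recovery method for a RAID array according to claim 1, characterized in that saving the stripe information of the stripe in the RAID array comprises:
    saving the stripe information in the secure memory.
  15. The data recovery method for a RAID array according to claim 3, characterized by further comprising:
    backing up the data and the parity data to the other node in the I/O group.
  16. The data recovery method for a RAID array according to claim 1, characterized by further comprising:
    sending a write success signal to the host after the parity data has been saved to the secure memory and the data has been written to disk.
  17. The data recovery method for a RAID array according to claim 16, characterized in that the step of sending a write success signal to the host after the parity data has been saved to the secure memory and the data has been written to disk comprises:
    sending the write success signal to the host after the parity data has been saved to the secure memory and the data of the data strips has been written to disk, where the data of the data strips is the data split by stripe.
  18. A data recovery apparatus for a RAID array, characterized by comprising:
    a first saving module, configured to save stripe information of a stripe in the RAID array;
    a second saving module, configured to save, in a secure memory, the data and parity data to be written to the stripe, where the secure memory is memory in which data saving is completed while a battery backup unit powers the storage system after the storage system loses power;
    a reading module, configured to read the data and the parity data from the secure memory after the storage system recovers from the power loss;
    a writing module, configured to read the stripe information and, according to the stripe information, write the read data and parity data to the stripe.
  19. A data recovery device for a RAID array, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the data recovery method for a RAID array according to any one of claims 1 to 17 when executing the computer program.
  20. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium stores a computer program that, when executed by a processor, implements the steps of the data recovery method for a RAID array according to any one of claims 1 to 17.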
PCT/CN2023/093953 2022-11-29 2023-05-12 Data recovery method for a RAID array and related apparatus WO2024113685A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211508424.X 2022-11-29
CN202211508424.XA CN115599607B (zh) 2022-11-29 2022-11-29 Data recovery method for a RAID array and related apparatus

Publications (1)

Publication Number Publication Date
WO2024113685A1 (zh) 2024-06-06

Family

ID=84853336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093953 WO2024113685A1 (zh) 2022-11-29 2023-05-12 Data recovery method for a RAID array and related apparatus

Country Status (2)

Country Link
CN (1) CN115599607B (zh)
WO (1) WO2024113685A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
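CN115878047B (zh) * 2023-01-19 2023-06-16 苏州浪潮智能科技有限公司 Data consistency check method, apparatus, device, and storage medium
CN117707854B (zh) * 2023-12-22 2024-07-02 深圳奥束科技有限公司 Self-recovery method and apparatus for abnormal reading of IC card information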

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
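CN104035830A (zh) * 2014-06-24 2014-09-10 浙江宇视科技有限公司 Data recovery method and apparatus
CN106528001A (zh) * 2016-12-05 2017-03-22 北京航空航天大学 Cache system based on non-volatile memory and software RAID
CN113391947A (zh) * 2021-06-22 2021-09-14 深圳忆联信息系统有限公司 Method, apparatus, computer device, and storage medium for fast power-loss recovery of SSD RAID stripes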
US20220350703A1 (en) * 2019-10-18 2022-11-03 Inspur Suzhou Intelligent Technology Co., Ltd. Write hole protection method and system for raid, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100456253C (zh) * 2005-12-28 2009-01-28 英业达股份有限公司 Method for protecting cache data of a storage system
CN104881242A (zh) * 2014-02-28 2015-09-02 中兴通讯股份有限公司 Data writing method and apparatus
US9921914B2 (en) * 2015-11-03 2018-03-20 Intel Corporation Redundant array of independent disks (RAID) write hole solutions
US11467777B1 (en) * 2020-10-12 2022-10-11 iodyne, LLC Method and system for storing data in portable storage devices

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035830A (zh) * 2014-06-24 2014-09-10 浙江宇视科技有限公司 Data recovery method and apparatus
CN106528001A (zh) * 2016-12-05 2017-03-22 北京航空航天大学 Cache system based on non-volatile memory and software RAID
US20220350703A1 (en) * 2019-10-18 2022-11-03 Inspur Suzhou Intelligent Technology Co., Ltd. Write hole protection method and system for raid, and storage medium
CN113391947A (zh) * 2021-06-22 2021-09-14 深圳忆联信息系统有限公司 Method, apparatus, computer device, and storage medium for fast power-loss recovery of SSD RAID stripes

Also Published As

Publication number Publication date
CN115599607B (zh) 2023-06-16
CN115599607A (zh) 2023-01-13

Similar Documents

Publication Publication Date Title
US8356292B2 (en) Method for updating control program of physical storage device in storage virtualization system and storage virtualization controller and system thereof
WO2024113685A1 (zh) Data recovery method for a RAID array and related apparatus
US6990611B2 (en) Recovering data from arrays of storage devices after certain failures
US6523087B2 (en) Utilizing parity caching and parity logging while closing the RAID5 write hole
US7809979B2 (en) Storage control apparatus and method
US8904129B2 (en) Method and apparatus for backup and restore in a dynamic chunk allocation storage system
US7721143B2 (en) Method for reducing rebuild time on a RAID device
US5819109A (en) System for storing pending parity update log entries, calculating new parity, updating the parity block, and removing each entry from the log when update is complete
US10303560B2 (en) Systems and methods for eliminating write-hole problems on parity-based storage resources during an unexpected power loss
US8402210B2 (en) Disk array system
US7103811B2 (en) Mechanisms for detecting silent errors in streaming media devices
US8225136B2 (en) Control method and storage device
US20040044705A1 (en) Optimized disk repository for the storage and retrieval of mostly sequential data
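EP0768605A2 (en) Reconstructing data blocks in a RAID array data storage system having storage device metadata and RAIDset metadata
JP2004118837A (ja) Method for storing data in a fault-tolerant storage subsystem, storage subsystem, and data organization management program for the system
WO2015058542A1 (zh) Reconstruction method and apparatus for a redundant array of independent disks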
US10503620B1 (en) Parity log with delta bitmap
US5421003A (en) Disk storage system with fault tolerant media maintenance
US20040250028A1 (en) Method and apparatus for data version checking
EP3794451A1 (en) Parity log with by-pass
US7577804B2 (en) Detecting data integrity
WO2024113687A1 (zh) Data recovery method and related apparatus
JP2001075741A (ja) Disk control system and data preservation method
US7174476B2 (en) Methods and structure for improved fault tolerance during initialization of a RAID logical unit
US20140173337A1 (en) Storage apparatus, control method, and control program