CN115565598B - Data storage and repair method and system for temporary failure of RAID array disk - Google Patents

Info

Publication number
CN115565598B
Authority
CN
China
Prior art keywords
list
target
raid array
disks
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211209814.7A
Other languages
Chinese (zh)
Other versions
CN115565598A (en)
Inventor
麻昊志
傅智康
宫永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS
Priority to CN202211209814.7A
Publication of CN115565598A
Application granted
Publication of CN115565598B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/44 Indication or identification of errors, e.g. for repair
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data storage and repair method and system for temporary failure of RAID array disks. The method comprises the following steps: when the running state of a disk in the RAID array changes, generating a first list containing all currently accessible disks; when the first list does not contain all the disks, generating a degraded IO list according to the first list, and judging whether the number of disks in the first list meets a preset condition; if yes, generating a new second list from the disks contained in both the second list and the first list, and judging whether the number of disks in the new second list meets the preset condition; if yes, judging whether the number of disks in the first list is larger than the number of disks in the new second list; if yes, repairing the data of the target RAID array in a preset repair mode. The invention improves the stability of data storage and data repair when a disk temporarily fails, without interrupting normal access service.

Description

Data storage and repair method and system for temporary failure of RAID array disk
Technical Field
The invention relates to the field of data storage, in particular to a data storage and repair method and system for temporary failure of RAID array disks.
Background
Disk failures take a variety of forms, including permanent failures and temporary failures. A temporary failure is a failure mode in which the disk cannot provide normal access for a period of time but may later resume normal access. If a disk fails temporarily, it is missing only the data written during the temporary failure period. The existing RAID data repair mode nevertheless rebuilds all data, which causes long-term shutdown of the system. If the temporary failure rate is high, the RAID will be unavailable for long periods because of frequent rebuilding.
To address these problems, a series of autonomous RAID data repair methods have been proposed in the industry. The main approach is to degrade the RAID when a disk failure occurs and to set up a degraded IO list that records the addresses of all write IOs occurring during the failure, the positions of the failed disks at the time of writing, and other information. When the RAID is read, the degraded IO list is first searched for the IO; if the IO is found, its data is rebuilt according to the disk failure condition at the time the IO was written, using the data redundancy and verification function of the RAID, thereby achieving reliable data reading. When the failed disk recovers its normal function, the data that the IOs in the degraded IO list should hold on the recovered disk is reconstructed and rewritten to that disk, achieving autonomous repair.
Although the existing scheme effectively reduces the amount of data to be recovered after a temporary failure, the following problems exist: 1) After a disk fails and before the repair is completed, every read IO must first search the degraded IO list before accessing data; the list grows continuously as the failure lasts longer, so the search delay increases and RAID access performance is seriously reduced. 2) The degraded IO list is necessary for reliable reading of the RAID. If it is stored in an additional reliable storage device, then, because the list keeps growing with the failure time, a larger high-reliability memory must be added to the system, which significantly increases the RAID cost. 3) If the degraded IO list is stored directly in the RAID, writing the list itself generates write IO, which changes the degraded IO list; storing the changed list causes new write IO, so the writing process restarts repeatedly and may never complete. If the write IOs generated by the degraded IO list itself are not recorded, the correctness and availability of the degraded IO list cannot be reliably judged when different disks fail temporarily.
Disclosure of Invention
In order to solve the technical problems, the invention provides a data storage and repair method and a system for temporary failure of RAID array disks.
The technical scheme of the data storage and repair method for temporary failure of RAID array disk is as follows:
S1, when a target RAID array is initialized to run, setting a first list for data writing and a second list for data reading, each according to all disks of the target RAID array;
S2, when the running state of any disk in the target RAID array changes, updating the first list according to all currently accessible disks in the target RAID array; when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list; and judging whether the number of disks in the updated first list meets a first preset condition, to obtain a first judgment result;
S3, when the first judgment result is yes, updating the second list to the disks contained in both the second list as it was before the running state changed and the updated first list, and judging whether the number of disks in the updated second list meets a second preset condition, to obtain a second judgment result;
S4, when the second judgment result is yes, judging whether the number of disks in the first list is larger than the number of disks in the new second list, to obtain a third judgment result;
S5, when the third judgment result is yes, repairing the data of the target RAID array in a preset repair mode, and, when the repair is completed, updating the second list according to the first list, thereby completing the data repair of the target RAID array.
The data storage and repair method for temporary failure of RAID array disk has the following beneficial effects:
the method of the invention improves the stability of data storage and data repair when a disk temporarily fails, without interrupting normal access service.
On the basis of the scheme, the data storage and repair method for temporary failure of the RAID array disk can be improved as follows.
Further, the preset repair mode includes:
based on a preset order, sequentially judging whether each degraded IO list contains all disks in the updated first list, to obtain a fourth judgment result for each degraded IO list, and determining each degraded IO list whose fourth judgment result is no as an IO list to be processed;
reading the target data corresponding to each piece of IO information in each IO list to be processed, rewriting each piece of target data into the target RAID array, and deleting each IO list to be processed.
Further, the process of reading the target data corresponding to any piece of IO information in any IO list to be processed includes:
acquiring the read data corresponding to target IO information from all disks in the new second list, the target IO information being any piece of IO information in any IO list to be processed;
obtaining the target data from the read data corresponding to the target IO information through the RAID data redundancy and verification function.
Further, the process of rewriting the target data of any piece of IO information into the corresponding disks of the target RAID array includes:
judging whether the updated first list contains all disks in the target RAID array, to obtain a fifth judgment result;
when the fifth judgment result is yes, writing the target data corresponding to the target IO information into the corresponding disks, the target IO information being any piece of IO information in any IO list to be processed.
Further, the process of rewriting the target data of any piece of IO information into the corresponding disks of the target RAID array further includes:
when the fifth judgment result is no, writing the target data corresponding to the target IO information into the corresponding disks, and storing the target IO information in the degraded IO list corresponding to the updated first list.
Further, the method further includes:
when the updated first list does not contain all disks in the target RAID array, storing all write IO information in the degraded IO list corresponding to the first list.
Further, the method further includes:
when the target RAID array stops running, storing the first list and the second list of the target RAID array in its current running state;
when the target RAID array is powered up to run, loading the first list and the second list that were stored when the target RAID array stopped running.
The technical scheme of the data storage and repair system for temporary failure of RAID array disk is as follows:
the system comprises: an initialization module, a first processing module, a second processing module, a third processing module and an operation module;
the initialization module is used for: when a target RAID array is initialized to run, setting a first list for data writing and a second list for data reading, each according to all disks of the target RAID array;
the first processing module is used for: when the running state of any disk in the target RAID array changes, updating the first list according to all currently accessible disks in the target RAID array; when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list; and judging whether the number of disks in the updated first list meets a first preset condition, to obtain a first judgment result;
the second processing module is used for: when the first judgment result is yes, updating the second list to the disks contained in both the second list as it was before the running state changed and the updated first list, and judging whether the number of disks in the updated second list meets a second preset condition, to obtain a second judgment result;
the third processing module is used for: when the second judgment result is yes, judging whether the number of disks in the first list is larger than the number of disks in the new second list, to obtain a third judgment result;
the operation module is used for: when the third judgment result is yes, repairing the data of the target RAID array in a preset repair mode, and, when the repair is completed, updating the second list according to the first list, thereby completing the data repair of the target RAID array.
The data storage and repair system for temporary failure of RAID array disk has the following beneficial effects:
the system of the invention improves the stability of data storage and data repair when a disk temporarily fails, without interrupting normal access service.
Based on the scheme, the data storage and repair system for temporary failure of RAID array disks can be improved as follows.
Further, the preset repair mode includes:
based on a preset order, sequentially judging whether each degraded IO list contains all disks in the updated first list, to obtain a fourth judgment result for each degraded IO list, and determining each degraded IO list whose fourth judgment result is no as an IO list to be processed;
reading the target data corresponding to each piece of IO information in each IO list to be processed, rewriting each piece of target data into the target RAID array, and deleting each IO list to be processed.
Further, the process of reading the target data corresponding to any piece of IO information in any IO list to be processed includes:
acquiring the read data corresponding to target IO information from all disks in the new second list, the target IO information being any piece of IO information in any IO list to be processed;
obtaining the target data from the read data corresponding to the target IO information through the RAID data redundancy and verification function.
Drawings
FIG. 1 is a schematic flow chart of a method for data storage and repair of temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target RAID array in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 3 is a first schematic diagram of a disk failure in the target RAID array in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the target RAID array after a disk has recovered but before its data has been repaired, in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of a disk failure in the target RAID array in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the storage state after data repair of a target RAID array disk in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of normal operation after the data of the target RAID array disks has been repaired, in a data storage and repair method for temporary failure of RAID array disks according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a data storage and repair system for temporary failure of RAID array disks according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention includes the following steps:
S1, when a target RAID array is initialized to run, a first list for data writing and a second list for data reading are respectively set according to all disks of the target RAID array.
The disks in the first list are used for writing data of the target RAID array, and the disks in the second list are used for reading data of the target RAID array. The first list is identical to the second list upon initialization.
Specifically, assuming that the target RAID array includes disks 1/2/3/4/5, the first list is disks 1/2/3/4/5 and the second list is disks 1/2/3/4/5.
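As a minimal illustration of step S1, the two lists can be modelled as sets of disk numbers. The names `init_lists`, `write_list` and `read_list` are hypothetical and chosen here only for illustration; the patent refers to them as the first list and the second list.

```python
# Sketch of step S1 under the assumption that disks are identified by integers.
ALL_DISKS = frozenset({1, 2, 3, 4, 5})

def init_lists(all_disks):
    """Return (write_list, read_list); both start as the full set of disks."""
    write_list = set(all_disks)   # "first list": disks used for writing
    read_list = set(all_disks)    # "second list": disks used for reading
    return write_list, read_list

write_list, read_list = init_lists(ALL_DISKS)
print(sorted(write_list), sorted(read_list))
```

The two lists are kept as separate objects because later steps update them independently.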
S2, when the running state of any disk in the target RAID array changes, the first list is updated according to all currently accessible disks in the target RAID array; when the updated first list does not contain all disks in the target RAID array, at least one degraded IO list is generated according to the updated first list; and whether the number of disks in the updated first list meets a first preset condition is judged, to obtain a first judgment result.
The target RAID array is any type of RAID disk array, including but not limited to RAID3, RAID5, RAID6 and RAID7. The running state is either normal operation or temporary failure. The first list is the disk write list of the target RAID array and contains all currently accessible disks in the target RAID array. The first preset condition is: the number of disks in the first list is greater than or equal to the number of disks required by the RAID data redundancy and verification function.
The degraded IO list is stored in the target RAID array, at a location that may be fixed or configured by a program.
The process of obtaining at least one degraded IO list according to the first list is as follows: the degraded IO list associated with the first list can be a new list or an existing list. If a degraded IO list is newly created, the available disks in the first list are recorded in it at the time of creation. If an existing list is used, the available disks of that degraded IO list should be identical to the available disks in the first list.
Specifically, when the running state of any disk in the target RAID array changes, the RAID disk write list from before the change is cleared and a first list containing all currently accessible disks of the target RAID array is generated. When the first list does not contain all disks in the target RAID array, at least one degraded IO list is set up according to the first list. Whether the number of disks in the first list reaches the number of disks required by the RAID data redundancy and verification function is then judged, to obtain the first judgment result.
It should be noted that the number of disks required by the RAID data redundancy and verification function differs between types of target RAID array. For example, RAID5 fails if more than 1 disk is damaged and remains valid otherwise; RAID6 can tolerate the failure of up to 2 disks.
The target RAID array monitors the running state of each disk during operation, and determines that the running state has changed when a disk is found to be abnormal (temporarily failed) or when an abnormal (temporarily failed) disk returns to normal. Methods for monitoring disk state include detecting whether the disk is connected normally, whether a read or write has timed out, and so on.
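Step S2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the names `DegradedIOList`, `min_disks` and `on_state_change` are hypothetical, and the fault-tolerance table covers only RAID5 and RAID6 as described above.

```python
from dataclasses import dataclass, field

# Number of missing disks each level can tolerate (RAID5: 1, RAID6: 2).
FAULT_TOLERANCE = {"RAID5": 1, "RAID6": 2}

@dataclass
class DegradedIOList:
    available_disks: frozenset                    # write list at creation time
    records: list = field(default_factory=list)   # (address, length) tuples

def min_disks(level, total_disks):
    """Disks required by the redundancy and verification function."""
    return total_disks - FAULT_TOLERANCE[level]

def on_state_change(all_disks, accessible, degraded_lists, level):
    """Step S2: rebuild the write list and evaluate the first condition."""
    all_disks = frozenset(all_disks)
    write_list = all_disks & frozenset(accessible)
    if write_list != all_disks:
        # Reuse an existing list with identical available disks, else create one.
        if not any(d.available_disks == write_list for d in degraded_lists):
            degraded_lists.append(DegradedIOList(write_list))
    first_result = len(write_list) >= min_disks(level, len(all_disks))
    return write_list, first_result

lists = []
wl, ok = on_state_change(range(1, 6), {1, 2, 3, 4}, lists, "RAID6")
print(sorted(wl), ok, len(lists))
```

Calling `on_state_change` again with the same accessible disks reuses the existing degraded IO list rather than creating a second one, which mirrors the "new list or existing list" rule.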
S3, when the first judgment result is yes, the second list is updated to the disks contained in both the second list as it was before the running state changed and the updated first list, and whether the number of disks in the updated second list meets a second preset condition is judged, to obtain a second judgment result.
For example, before the running state of the disk changes, the first list is 1/2/3/4/5 and the second list is 1/2/3/4/5. If disk 5 then fails, the updated first list is 1/2/3/4, the second list before the state change is 1/2/3/4/5, and the updated second list is 1/2/3/4. When disk 5 returns to normal, the updated first list is 1/2/3/4/5, the second list before the state change is 1/2/3/4, and the updated second list is 1/2/3/4.
The second preset condition is: the number of disks in the new second list is greater than or equal to the number of disks required by the RAID data redundancy and verification function.
Specifically, when the number of disks in the first list reaches the number of disks required by the RAID data redundancy and verification function, a new second list is generated according to the first list and the second list, and whether the number of disks in the new second list is greater than or equal to the number of disks required by the RAID data redundancy and verification function is judged, to obtain the second judgment result.
When the first judgment result is no, operation is suspended and resumes after the available disks are restored to the number of disks required by the RAID data redundancy and verification function.
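The update in step S3 amounts to a set intersection. The sketch below reproduces the disk 5 example from the text; the function name `update_read_list` is hypothetical, and `required_disks=3` is an assumption taken from the five-disk example later in the description, where any three disks suffice.

```python
def update_read_list(old_read_list, new_write_list, required_disks):
    """Step S3: keep only disks present in both lists, then check the count."""
    new_read_list = set(old_read_list) & set(new_write_list)
    second_result = len(new_read_list) >= required_disks
    return new_read_list, second_result

# Disk 5 fails: write list becomes 1/2/3/4.
r1, ok1 = update_read_list({1, 2, 3, 4, 5}, {1, 2, 3, 4}, required_disks=3)
# Disk 5 recovers: write list is 1/2/3/4/5 again, read list stays 1/2/3/4.
r2, ok2 = update_read_list(r1, {1, 2, 3, 4, 5}, required_disks=3)
print(sorted(r1), sorted(r2))
```

Because a recovered disk is never in the old read list, the intersection keeps it out of the read path until repair finishes.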
S4, when the second judgment result is yes, whether the number of disks in the first list is larger than the number of disks in the new second list is judged, to obtain a third judgment result.
Specifically, when the number of disks in the new second list is greater than or equal to the number of disks required by the RAID data redundancy and verification function, whether the first list contains disks that do not exist in the second list is judged, to obtain the third judgment result.
When the second judgment result is no, operation is suspended and resumes after the available disks are restored to the number of disks required by the RAID data redundancy and verification function.
S5, when the third judgment result is yes, the data of the target RAID array is repaired in a preset repair mode, and, when the repair is completed, the second list is updated according to the first list, thereby completing the data repair of the target RAID array.
Updating the second list according to the first list means: after all the degraded IO lists have been traversed, the second list is reset according to the first list, so that the disks in the second list become the disks in the first list. For example, if the disks in the first list are 1/2/3 and the disks in the second list are 1/2, the second list is updated according to the first list and becomes 1/2/3.
Specifically, when the number of disks in the first list is larger than that in the new second list, the data of the target RAID array is repaired in the preset repair mode, and after the repair is completed the second list (RAID disk read list) is updated according to the first list (RAID disk write list).
When the third judgment result is no, the data repair process is not performed.
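The third judgment and the final reset of the second list can be sketched as below. The helper names `needs_repair` and `finish_repair` are hypothetical. Since step S3 makes the read list a subset of the write list, comparing the counts is equivalent to asking whether the write list contains a disk missing from the read list.

```python
def needs_repair(write_list, read_list):
    """Third judgment: the write list has more disks than the read list."""
    return len(set(write_list)) > len(set(read_list))

def finish_repair(write_list):
    """After all degraded IO lists are traversed, reset the read list."""
    return set(write_list)

write_list, read_list = {1, 2, 3}, {1, 2}
if needs_repair(write_list, read_list):
    # ... the preset repair mode would run here ...
    read_list = finish_repair(write_list)
print(sorted(read_list))
```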
Preferably, the preset repairing method includes:
based on a preset order, whether each degraded IO list contains all disks in the updated first list is judged in turn, a fourth judgment result is obtained for each degraded IO list, and each degraded IO list whose fourth judgment result is no is determined to be an IO list to be processed.
The preset order may be forward order, reverse order or any other order, and is not limited here.
An IO list to be processed is a degraded IO list on which data repair needs to be performed.
Specifically, all the degraded IO lists are traversed, and whether the available disks of each degraded IO list contain all the disks in the first list is judged in turn, to obtain the fourth judgment result of each degraded IO list. If a degraded IO list does not contain all the disks in the first list, its fourth judgment result is no and it is determined to be an IO list to be processed; this continues until every degraded IO list whose fourth judgment result is no has been determined to be an IO list to be processed. For example, suppose the available disks of a degraded IO list are 1/2/4/5 and the first list is 1/2/3/4/5. Disk 3 does not exist in the degraded IO list, so the IO information in this list is repaired.
It should be noted that, when the fourth judgment result of a degraded IO list is yes, data repair is not performed for that degraded IO list.
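The fourth judgment is a subset test. In the sketch below, each degraded IO list is represented only by its set of available disks, keyed by a hypothetical label; `select_pending` is an illustrative name.

```python
def select_pending(degraded_lists, write_list):
    """Return labels of degraded IO lists that lack some disk of the write list.

    degraded_lists: dict mapping a label to the list's set of available disks.
    """
    write_list = set(write_list)
    return [label for label, disks in degraded_lists.items()
            if not write_list <= set(disks)]   # fourth judgment result is "no"

degraded = {
    "created_when_disk5_failed": {1, 2, 3, 4},
    "created_when_disk4_failed": {1, 2, 3, 5},
}
pending = select_pending(degraded, write_list={1, 2, 3, 5})
print(pending)
```

Here only the list created while disk 5 was unavailable is selected, because disk 5 is now in the write list but absent from that list's available disks.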
The target data corresponding to each piece of IO information in each IO list to be processed is read, each piece of target data is rewritten into the target RAID array, and each IO list to be processed is deleted.
Each IO list to be processed contains at least one piece of IO information, and each piece of IO information comprises an IO address and a length.
It should be noted that, after the data for a piece of IO information has been read and rewritten, the record of that IO information is deleted from the corresponding IO list to be processed. After all IO information in a degraded IO list has been deleted, the degraded IO list itself is deleted.
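The read, rewrite and delete cycle can be sketched as follows. The types `IOInfo` and `PendingList` and the callback parameters `read_target` and `rewrite_target` are hypothetical stand-ins for the real array read and write paths, which the text does not specify at this level.

```python
from dataclasses import dataclass, field

@dataclass
class IOInfo:
    address: int
    length: int

@dataclass(eq=False)          # identity-based equality so removal is exact
class PendingList:
    available_disks: frozenset
    records: list = field(default_factory=list)

def process_pending(pending_lists, read_target, rewrite_target):
    """Read and rewrite every record, deleting records and then empty lists."""
    rewritten = []
    for plist in list(pending_lists):
        while plist.records:
            info = plist.records[0]
            data = read_target(info)
            rewrite_target(info, data)
            plist.records.pop(0)          # delete the record once rewritten
            rewritten.append((info.address, info.length))
        pending_lists.remove(plist)       # delete the list once it is empty
    return rewritten

store = {0x100: b"abc"}                   # stand-in for reconstructed data
out = {}
lists = [PendingList(frozenset({1, 2, 3, 4}), [IOInfo(0x100, 3)])]
done = process_pending(lists,
                       lambda info: store[info.address],
                       lambda info, data: out.__setitem__(info.address, data))
print(done, lists)
```

Deleting each record immediately after it is rewritten means an interrupted repair can resume from the remaining records.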
Preferably, the process of reading target data corresponding to any IO information in any IO list to be processed includes:
the read data corresponding to the target IO information is acquired from all the disks in the new second list.
The target IO information is any piece of IO information in any IO list to be processed.
Specifically, the read data corresponding to the target IO information is read from all disks in the new second list (the updated RAID disk read list).
The target data is then obtained from the read data corresponding to the target IO information through the RAID data redundancy and verification function.
Specifically, when the updated second list does not contain all the disks in the target RAID array, the target data is calculated from the read data corresponding to the target IO information through the RAID data redundancy and verification function.
Moreover, because the target RAID array is under repair, the new second list never contains all disks in the target RAID array.
The target data is the data actually written and read by the user. To improve reliability, the target RAID array adds some redundant information to the target data (that is, the read data includes some redundant information). When a failure occurs and part of the target data is lost, the lost part can be recovered from the redundant information and the remaining valid data.
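The text does not fix a particular redundancy code. As one concrete illustration, the sketch below assumes a single XOR parity block per stripe (RAID5-style), which can recover one missing data block; an array that tolerates two missing disks, as in the five-disk example later on, would need a stronger code. All names (`xor_blocks`, `read_stripe`, the stripe layout) are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_stripe(stripe, read_list, data_disks, parity_disk):
    """Return the target data using only disks in the read list.

    stripe: dict mapping disk number to the block stored on that disk.
    """
    missing = [d for d in data_disks if d not in read_list]
    if len(missing) > 1 or (missing and parity_disk not in read_list):
        raise ValueError("not enough disks to reconstruct")
    blocks = {d: stripe[d] for d in data_disks if d in read_list}
    if missing:
        survivors = list(blocks.values()) + [stripe[parity_disk]]
        blocks[missing[0]] = xor_blocks(survivors)   # rebuild the lost block
    return b"".join(blocks[d] for d in data_disks)

data = {1: b"AB", 2: b"CD", 3: b"EF"}
stripe = dict(data)
stripe[4] = xor_blocks(list(data.values()))   # parity on disk 4
stripe[3] = b"\x00\x00"                       # disk 3 holds stale content
target = read_stripe(stripe, read_list={1, 2, 4},
                     data_disks=[1, 2, 3], parity_disk=4)
print(target)
```

Disk 3 is excluded from the read list, so its stale block is never read; its content is recomputed from disks 1, 2 and the parity.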
Preferably, the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array includes:
whether the updated first list contains all disks in the target RAID array is judged, to obtain a fifth judgment result;
when the fifth judgment result is yes, the target data corresponding to the target IO information is written into the corresponding disks.
The target IO information is any piece of IO information in any IO list to be processed.
Specifically, when the first list contains all the disks in the target RAID array, the target data corresponding to the target IO information (including an IO address and a length) is written to the corresponding disks.
Preferably, the process of rewriting the target data of any piece of IO information into the corresponding disks of the target RAID array further includes:
when the fifth judgment result is no, the target data corresponding to the target IO information is written into the corresponding disks, and the target IO information is stored in the degraded IO list corresponding to the updated first list.
Specifically, when the first list does not contain all the disks in the target RAID array, a failed disk exists in the target RAID array; the target data corresponding to the target IO information is written into the corresponding disks, and the target IO information is stored in the degraded IO list corresponding to the first list. For example, if data of length 0x200 is written to array address 0x100 while disk 2 is failed, the recorded IO information is: address 0x100, write length 0x200.
It should be noted that, before judging whether the first list contains all the disks in the target RAID array, the method further includes: calculating the data to be written to each disk in the target RAID array according to the requirements of the RAID data redundancy and verification function; and performing the actual writes only to the disks in the first list, ignoring the remaining disks.
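The write path with the fifth judgment can be sketched as below. As a simplification, the same payload is stored for every disk in place of the per-disk data and redundancy that a real array would compute; `write_io`, `disks_store` and `degraded_records` are hypothetical names.

```python
def write_io(address, data, all_disks, write_list, disks_store, degraded_records):
    """Write to disks in the write list; log the IO if the array is degraded."""
    for disk in write_list:                  # remaining disks are ignored
        # Placeholder: a real array would write this disk's stripe portion.
        disks_store.setdefault(disk, {})[address] = data
    if set(write_list) != set(all_disks):    # fifth judgment result is "no"
        degraded_records.append((address, len(data)))

disks_store, degraded_records = {}, []
# Disk 2 has failed; write 0x200 bytes at array address 0x100.
write_io(0x100, bytes(0x200), {1, 2, 3, 4, 5}, {1, 3, 4, 5},
         disks_store, degraded_records)
print(degraded_records)
```

With a full write list the same call stores the data and leaves `degraded_records` unchanged.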
Preferably, the method further comprises:
when the updated first list does not contain all the disks in the target RAID array, all write IO information is stored in the degraded IO list corresponding to the updated first list.
Specifically, when the running state of the disks in the target RAID array is stable (that is, no disk is newly failing or newly recovered), data is written through the first list and read through the second list. For example, with 5 disks in total, there are two cases: (1) the first list is 1/2/3/4/5 and the second list is 1/2/3/4/5; (2) the first list is 1/2/3/4 and the second list is 1/2/3/4.
Preferably, the method further comprises:
and when the target RAID array stops running, storing a first list and a second list corresponding to the target RAID array in the current running state.
When the target RAID array is powered up to run, the first list and the second list stored when the target RAID array stops running are loaded.
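Saving and reloading the two lists might look like the sketch below. The JSON format, the file name `raid_lists.json` and the function names are assumptions made for illustration; the text does not say how or where the lists are persisted.

```python
import json
import os
import tempfile

def save_lists(path, write_list, read_list):
    """Persist both lists when the array stops running."""
    state = {"write_list": sorted(write_list), "read_list": sorted(read_list)}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)

def load_lists(path):
    """Reload both lists when the array is powered up."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return set(state["write_list"]), set(state["read_list"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raid_lists.json")
    save_lists(path, {1, 2, 3, 4, 5}, {1, 2, 3, 4})
    restored = load_lists(path)
print(restored)
```

Restoring a read list that is smaller than the write list lets the array resume an unfinished repair after a restart instead of treating the recovered disk as fully valid.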
To better illustrate the technical solution of this embodiment, the following examples are used for illustration.
The target RAID array is composed of 5 disks, and correct data in any three disks can recover correct data of the other two disks.
(1) As shown in FIG. 2, the target RAID array is in normal access and there are no failed disks. All read and write accesses are now performed to disk 1/2/3/4/5, with the first list being 1/2/3/4/5 and the second list being 1/2/3/4/5.
(2) As shown in fig. 3, assuming that the disk 5 fails at this time, the first list is updated to 1/2/3/4, and a degraded IO list is associated; the second list is updated to a new second list, which is 1/2/3/4. The method comprises the steps that at the moment, data writing only carries out operation on 1/2/3/4 of a disk, and IO information is added in an associated degradation IO table; the reading of data is performed only for 1/2/3/4 of the disk at this time.
(3) As shown in FIG. 4, assume that disk 5 recovers at this time. The first list is updated to 1/2/3/4/5, and the new second list remains 1/2/3/4 until autonomous repair is completed. Data is now written to disks 1/2/3/4/5, and no degraded IO list record is added. Until autonomous repair is completed, data is still read only from disks 1/2/3/4.
(4) As shown in FIG. 5, assume that disk 4 fails before the data repair is completed. The first list is updated to 1/2/3/5 and a degraded IO list is associated with it; the new second list is updated to 1/2/3. Data is now written only to disks 1/2/3/5, and the IO information of each write is added to the associated degraded IO list; data is read only from disks 1/2/3.
(5) At this time, the first list is longer than the new second list, so the data repair process is performed. The autonomous repair process retrieves the existing degraded IO lists and finds that the first list contains a disk (disk 5) that is absent from the RAID disk information of the degraded IO list created when disk 5 failed; it therefore performs a read and write-back operation on all IOs in that list. During this process, data is read only from disks 1/2/3 and written only to disks 1/2/3/5, and the IO information is added to the degraded IO list associated with the current first list. After execution is completed, the processed list is deleted. For the degraded IO list corresponding to the temporary failure of disk 4, the data repair step is skipped because the autonomous repair condition is not satisfied. As shown in FIG. 6, once all degraded IO lists have been traversed, the second list is updated according to the first list, that is, to 1/2/3/5.
(6) Assume that disk 4 recovers. The first list is updated to 1/2/3/4/5, and the new second list is determined from the existing second list to be 1/2/3/5. Data is now written to disks 1/2/3/4/5, and no degraded IO list record is added. Until autonomous repair is completed, data is still read only from disks 1/2/3/5. Since the first list is now longer than the new second list, the data repair process is performed.
The autonomous repair process retrieves the existing degraded IO list and finds that the first list contains a disk (disk 4) that is absent from the degraded IO list created when disk 4 temporarily failed; it therefore performs a read and write-back operation on all IO information in that list. After execution is completed, the list is deleted. As shown in FIG. 7, once the degraded IO list has been traversed, the second list is updated according to the first list. At this point, all disks of the target RAID array have resumed normal operation.
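The six steps of this example can be replayed with a small bookkeeping model. The sketch below tracks only the lists, not the stored data, and rests on several assumptions: the class name `RaidLists`, the method names, and the representation of each degraded IO list as a dictionary entry keyed by the first list that was in force when its IOs were written are all hypothetical.

```python
class RaidLists:
    """Tracks the write list, read list, and degraded IO lists of one array."""

    def __init__(self, disks):
        self.all = frozenset(disks)
        self.first = set(disks)    # disks that receive writes
        self.second = set(disks)   # disks that serve reads
        self.degraded = {}         # frozenset(first list) -> recorded IO ids

    def on_state_change(self, accessible):
        """A disk failed or recovered: refresh both lists."""
        self.first = set(accessible)
        if self.first != self.all:
            self.degraded.setdefault(frozenset(self.first), [])
        # Reads may only use disks that were readable before AND are writable now.
        self.second &= self.first

    def write(self, io_id):
        """Record the IO only while the array is degraded."""
        if self.first != self.all:
            self.degraded.setdefault(frozenset(self.first), []).append(io_id)

    def repair(self):
        """Replay every degraded IO list that lacks a disk now in the first list."""
        if len(self.first) <= len(self.second):
            return
        for members in list(self.degraded):
            if self.first - members:              # list misses a writable disk
                ios = self.degraded.pop(members)  # read, write back, then delete
                if self.first != self.all:        # still degraded: log rewrites
                    self.degraded.setdefault(frozenset(self.first), []).extend(ios)
        self.second = set(self.first)


def snapshot(r):
    return sorted(r.first), sorted(r.second)


raid = RaidLists([1, 2, 3, 4, 5])                 # step (1)
raid.on_state_change({1, 2, 3, 4})                # step (2): disk 5 fails
raid.write("io-a")
raid.on_state_change({1, 2, 3, 4, 5})             # step (3): disk 5 recovers
raid.write("io-b")
raid.on_state_change({1, 2, 3, 5})                # step (4): disk 4 fails
raid.write("io-c")
step4 = snapshot(raid)
raid.repair()                                     # step (5)
step5 = snapshot(raid)
step5_degraded = dict(raid.degraded)
raid.on_state_change({1, 2, 3, 4, 5})             # step (6): disk 4 recovers
raid.repair()
final = snapshot(raid)
```

In this model the list recorded during the failure of disk 5 is replayed in step (5) and its IOs move into the list tied to first list 1/2/3/5, which is then replayed and deleted in step (6).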
The technical solution of this embodiment improves the stability of data storage and data repair during temporary disk failures without interrupting normal access service.
As shown in FIG. 8, the data storage and repair system 200 for temporary failure of a RAID array disk according to an embodiment of the present invention includes an initialization module 210, a first processing module 220, a second processing module 230, a third processing module 240, and an operation module 250;
the initialization module 210 is configured to: when a target RAID array is initialized to run, a first list for data writing and a second list for data reading are respectively set according to all disks of the target RAID array;
the first processing module 220 is configured to: when the running state of any disk in the target RAID array changes, update the first list according to all currently accessible disks in the target RAID array; when the updated first list does not contain all disks in the target RAID array, generate at least one degraded IO list according to the updated first list; and judge whether the number of disks in the updated first list meets a first preset condition, to obtain a first judgment result;
the second processing module 230 is configured to: when the first judgment result is yes, obtain and update the second list according to all disks commonly contained in the second list as it stood before the running state of the disk changed and in the updated first list, and judge whether the number of disks in the updated second list meets a second preset condition, to obtain a second judgment result;
the third processing module 240 is configured to: when the second judgment result is yes, judge whether the number of disks in the first list is greater than the number of disks in the new second list, to obtain a third judgment result;
the operation module 250 is configured to: when the third judgment result is yes, repair the data of the target RAID array using a preset repair mode, and, when the repair of the target RAID array is completed, update the second list according to the first list so as to complete the data repair of the target RAID array.
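The chain of judgments carried out by the processing modules can be sketched as one pure function. The form of the first and second preset conditions is not given in this passage; the sketch assumes both mean "at least `min_disks` disks remain", with `min_disks` equal to 3 in the five-disk example. Returning `None` when a condition fails is only a placeholder, since the handling of that case is not described here. The names `Decision` and `evaluate_change` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Decision:
    first: Set[int]            # updated first (write) list
    second: Set[int]           # updated second (read) list
    needs_degraded_list: bool  # first list does not cover the whole array
    needs_repair: bool         # third judgment result


def evaluate_change(all_disks: Iterable[int], old_second: Iterable[int],
                    accessible: Iterable[int], min_disks: int) -> Optional[Decision]:
    """Apply the first, second, and third judgments after a disk state change."""
    everything = set(all_disks)
    first = set(accessible) & everything
    if len(first) < min_disks:          # first judgment (assumed form)
        return None
    second = set(old_second) & first    # disks common to both lists
    if len(second) < min_disks:         # second judgment (assumed form)
        return None
    return Decision(first, second, first != everything, len(first) > len(second))


# Disk 4 fails while disk 5 is still awaiting repair (step (4) of the example).
d = evaluate_change(range(1, 6), {1, 2, 3, 4}, {1, 2, 3, 5}, min_disks=3)
# Too few disks remain readable for the assumed threshold.
too_few = evaluate_change(range(1, 6), {1, 2, 3}, {1, 2}, min_disks=3)
```

Keeping the decision separate from the repair itself makes each judgment easy to test in isolation.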
Preferably, the preset repair mode includes:
based on a preset order, sequentially judging whether each degraded IO list contains all disks in the updated first list, obtaining a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list whose fourth judgment result is no as an IO list to be processed;
and reading the target data corresponding to each piece of IO information in each IO list to be processed, rewriting each piece of target data into the target RAID array, and deleting each IO list to be processed.
Preferably, the process of reading the target data corresponding to any IO information in any IO list to be processed includes:
acquiring read data corresponding to target IO information from all disks of the new second list, the target IO information being any IO information in any IO list to be processed;
and obtaining the target data from the read data corresponding to the target IO information through the RAID data redundancy and verification function.
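The read and write-back of a single IO can be illustrated with real data. The text does not specify the RAID data redundancy and verification function, so the sketch below substitutes plain XOR single parity, which can rebuild only one missing disk. The five-disk example above, where any three disks recover the other two, would need a stronger erasure code. All function names and the in-memory disk model are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Per-disk payloads: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_target(io_id, disks, second_list):
    """Rebuild the full stripe of one IO using only disks in the second list."""
    readable = {d: disks[d][io_id] for d in second_list}
    missing = [d for d in range(len(disks)) if d not in readable]
    if len(missing) > 1:
        raise ValueError("single parity cannot rebuild more than one disk")
    if missing:
        # With XOR parity, any one block equals the XOR of all the others.
        readable[missing[0]] = xor_blocks(list(readable.values()))
    return [readable[d] for d in range(len(disks))]


def write_back(io_id, stripe, disks, first_list):
    """Rewrite the rebuilt stripe to every disk in the first list."""
    for d in first_list:
        disks[d][io_id] = stripe[d]


# Four hypothetical disks (0..3); disk 2 was down when "io-1" was written.
disks = [dict() for _ in range(4)]
stripe = encode([b"ab", b"cd", b"ef"])
for d in (0, 1, 3):
    disks[d]["io-1"] = stripe[d]

# Disk 2 is back: reads use the second list, writes use the first list.
rebuilt = read_target("io-1", disks, second_list=[0, 1, 3])
write_back("io-1", rebuilt, disks, first_list=[0, 1, 2, 3])
```

The recovered disk receives its block without ever being read, which matches the rule that reads go through the second list until repair completes.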
The technical solution of this embodiment improves the stability of data storage and data repair during temporary disk failures without interrupting normal access service.
For the steps by which the parameters and modules of the data storage and repair system 200 for temporary failure of a RAID array disk implement their corresponding functions, refer to the embodiments of the data storage and repair method for temporary failure of a RAID array disk above; they are not repeated here.
In the description provided herein, numerous specific details are set forth. It will be appreciated, however, that embodiments of the invention may be practiced without such specific details. Similarly, in the above description of exemplary embodiments of the invention, various features of embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. Accordingly, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (8)

1. A method for data storage and repair of temporary failure of a RAID array disk, comprising:
S1, when a target RAID array is initialized to run, a first list for data writing and a second list for data reading are respectively set according to all disks of the target RAID array;
S2, when the running state of any disk in the target RAID array changes, updating the first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degradation IO list according to the updated first list, and judging whether the number of disks in the updated first list meets a first preset condition or not to obtain a first judgment result;
S3, when the first judging result is yes, acquiring and updating the second list according to all disks commonly contained in the second list before the running state of any disk is changed and in the updated first list, and judging whether the number of the disks in the updated second list meets a second preset condition or not to obtain a second judging result;
S4, when the second judging result is yes, judging whether the number of the disks in the first list is larger than that in the updated second list, and obtaining a third judging result;
S5, when the third judging result is yes, repairing the data of the target RAID array in a preset repairing mode, and when the repairing of the target RAID array is completed, updating the second list according to the first list to complete the repairing of the data of the target RAID array;
the preset repairing mode comprises the following steps:
based on a preset sequence, sequentially judging whether each degradation IO list contains all disks in the updated first list, obtaining a fourth judgment result corresponding to each degradation IO list, and determining each degradation IO list with the fourth judgment result being negative as an IO list to be processed; the IO list to be processed is a degradation IO list needing to execute data restoration;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
2. The method for data storage and repair of temporary failure of a RAID array disk according to claim 1, wherein the process of reading target data corresponding to any IO information in any IO list to be processed includes:
acquiring read data corresponding to target IO information from all the disks of the updated second list; the target IO information is as follows: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through a RAID data redundancy and verification function.
3. The method for data storage and repair of temporary failure of a RAID array disk according to claim 1 wherein the process of rewriting target data of any IO information into a corresponding disk of said target RAID array comprises:
judging whether the updated first list contains all disks in the target RAID array or not to obtain a fifth judging result;
when the fifth judging result is yes, respectively writing target data corresponding to the target IO information into the corresponding disks; the target IO information is as follows: any IO information in any IO list to be processed.
4. A method of data storage and repair for temporary failure of a RAID array disk according to claim 3 wherein said process of rewriting target data of any IO information to a corresponding disk of said target RAID array further comprises:
and when the fifth judging result is negative, respectively writing the target data corresponding to the target IO information into the corresponding disks, and correspondingly storing the target IO information into a degradation IO list corresponding to the updated first list.
5. The method for data storage and repair of temporary failure of a RAID array disk of claim 1 further comprising:
and when the updated first list does not contain all the disks in the target RAID array, storing all the write IO information into a degradation IO list corresponding to the first list.
6. The method for data storage and repair of temporary failure of a RAID array disk of claim 1 further comprising:
when the target RAID array stops running, storing a first list and a second list corresponding to the target RAID array in a current running state;
when the target RAID array is powered up to run, the first list and the second list stored when the target RAID array stops running are loaded.
7. A data storage and repair system for temporary failure of a RAID array disk, comprising: the device comprises an initialization module, a first processing module, a second processing module, a third processing module and an operation module;
the initialization module is used for: when a target RAID array is initialized to run, a first list for data writing and a second list for data reading are respectively set according to all disks of the target RAID array;
the first processing module is used for: when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when all disks in the target RAID array are not contained in the updated first list, generating at least one degradation IO list according to the updated first list, and judging whether the number of disks in the updated first list meets a first preset condition or not to obtain a first judgment result;
the second processing module is used for: when the first judging result is yes, acquiring and updating the second list according to all disks commonly contained in the second list before the running state of any disk is changed and in the updated first list, and judging whether the number of the disks in the updated second list meets a second preset condition or not to obtain a second judging result;
the third processing module is used for: when the second judging result is yes, judging whether the number of the disks in the first list is larger than the number of the disks in the updated second list, and obtaining a third judging result;
the operation module is used for: when the third judging result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the repairing of the target RAID array is completed, updating the second list according to the first list so as to complete the repairing of the data of the target RAID array;
the preset repairing mode comprises the following steps:
based on a preset sequence, sequentially judging whether each degradation IO list contains all disks in the updated first list, obtaining a fourth judgment result corresponding to each degradation IO list, and determining each degradation IO list with the fourth judgment result being negative as an IO list to be processed; the IO list to be processed is a degradation IO list needing to execute data restoration;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
8. The data storage and repair system for temporary failure of a RAID array disk according to claim 7, wherein the process of reading target data corresponding to any IO information in any IO list to be processed comprises:
acquiring read data corresponding to target IO information from all the disks of the updated second list; the target IO information is as follows: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through a RAID data redundancy and verification function.
CN202211209814.7A 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk Active CN115565598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211209814.7A CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211209814.7A CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Publications (2)

Publication Number Publication Date
CN115565598A CN115565598A (en) 2023-01-03
CN115565598B true CN115565598B (en) 2023-06-02

Family

ID=84743081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211209814.7A Active CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Country Status (1)

Country Link
CN (1) CN115565598B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984400A (en) * 2010-11-05 2011-03-09 成都市华为赛门铁克科技有限公司 RAID control method, device and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0701715A4 (en) * 1993-06-04 1999-11-17 Network Appliance Corp A method for providing parity in a raid sub-system using a non-volatile memory
CN102043685A (en) * 2010-12-31 2011-05-04 成都市华为赛门铁克科技有限公司 RAID (redundant array of independent disk) system and data recovery method thereof
CN105808170B (en) * 2016-03-22 2018-06-26 华东交通大学 A kind of RAID6 coding methods that can repair single disk error
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
CN108733518B (en) * 2017-04-17 2021-07-09 伊姆西Ip控股有限责任公司 Method, apparatus, and computer-readable medium for managing a storage system

Also Published As

Publication number Publication date
CN115565598A (en) 2023-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant