CN115565598A - Data storage and repair method and system for temporary failure of RAID array disk - Google Patents

Publication number
CN115565598A
Authority
CN
China
Prior art keywords
list
target
raid array
disks
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211209814.7A
Other languages
Chinese (zh)
Other versions
CN115565598B (en)
Inventor
麻昊志
傅智康
宫永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS
Priority to CN202211209814.7A
Publication of CN115565598A
Application granted
Publication of CN115565598B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell constructional details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/44 Indication or identification of errors, e.g. for repair
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The invention discloses a data storage and repair method and system for temporary failure of a RAID array disk. The method comprises the following steps: when the running state of a disk in the RAID array changes, generating a first list containing all currently accessible disks; when the first list does not contain all the disks, generating a degraded IO list according to the first list, and judging whether the number of disks in the first list meets a preset condition; if so, generating a new second list from the disks contained in both the second list and the first list, and judging whether the number of disks in the new second list meets a preset condition; if so, judging whether the first list contains more disks than the new second list; and if so, repairing the data of the target RAID array in a preset repair mode. The invention improves the stability of data storage and data repair when a disk fails temporarily, without interrupting normal access service.

Description

Data storage and repair method and system for temporary failure of RAID array disk
Technical Field
The invention relates to the field of data storage, in particular to a data storage and repair method and system for temporary failure of a RAID array disk.
Background
Disk failures take many forms, both permanent and temporary. A temporary failure means that a disk cannot provide normal access for a period of time but can resume normal operation afterwards. If a disk fails temporarily, it only misses the data written during the failure period. The existing RAID data repair mode nevertheless rebuilds all data, which keeps the system out of service for a long time. If temporary failures occur frequently, the RAID remains unavailable for long periods because of frequent reconstruction.
To solve the above problems, the industry has proposed a series of RAID autonomous data repair methods. The main approach is to degrade the RAID when a disk fails and to set up a degraded IO list that records information such as the addresses of all write IOs occurring during the failure period and the location of the failed disk at the time of writing. When the RAID is read, the list is first searched for the IO; if it is found, the IO data is rebuilt according to the disk failure condition at the time the IO was written and the data redundancy and check function of the RAID, so that data is read reliably. When the failed disk resumes normal operation, the data corresponding to the IOs in the degraded IO list is reconstructed and rewritten to the recovered disk, realizing autonomous repair.
Although the existing scheme effectively reduces the amount of data the system must recover after a temporary failure, it has the following problems: 1) After a disk fails and before the repair is completed, every read IO must first search the degraded IO list before data can be accessed. As the failure time increases, the list keeps growing, which increases retrieval delay and seriously reduces RAID access performance. 2) The degraded IO list is required for reliable RAID reading. If it is stored in an additional reliable storage device, a large-capacity, high-reliability storage device must be added to the system because the list grows with the failure time, which significantly increases the RAID cost. 3) If the degraded IO list is stored directly in the RAID, writing the list itself generates write IO, which changes the degraded IO list; storing the changed list causes further write IO, so the writing process can never complete. If the write IO of the degraded IO list is not recorded, the correctness and availability of the degraded IO list cannot be reliably determined when different disks fail temporarily.
Disclosure of Invention
In order to solve the technical problem, the invention provides a data storage and repair method and a data storage and repair system for temporary failure of a RAID array disk.
The technical scheme of the data storage and repair method for temporary failure of the RAID array disk comprises the following steps:
S1, when a target RAID array is initialized and starts running, setting a first list for data writing and a second list for data reading, each according to all disks of the target RAID array;
S2, when the running state of any disk in the target RAID array changes, updating the first list according to all currently accessible disks in the target RAID array; when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list; and judging whether the number of disks in the updated first list meets a first preset condition to obtain a first judgment result;
S3, when the first judgment result is yes, updating the second list to the disks contained in both the second list as it was before the running state of the disk changed and the updated first list, and judging whether the number of disks in the updated second list meets a second preset condition to obtain a second judgment result;
S4, when the second judgment result is yes, judging whether the number of disks in the updated first list is larger than that in the updated second list to obtain a third judgment result;
S5, when the third judgment result is yes, repairing the data of the target RAID array in a preset repair mode, and when the repair of the target RAID array is finished, updating the second list according to the first list so as to complete the data repair of the target RAID array.
The data storage and repair method for temporary failure of the RAID array disk has the following beneficial effects:
the method of the invention improves the stability of the data storage and data repair process when the disk fails temporarily while not interrupting the normal access service.
On the basis of the scheme, the data storage and repair method for temporary failure of the RAID array disk can be further improved as follows.
Further, the preset repairing mode comprises the following steps:
in a preset order, sequentially judging whether each degraded IO list contains all the disks in the updated first list to obtain a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list whose fourth judgment result is no as an IO list to be processed;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
Further, the process of reading target data corresponding to any IO information in any to-be-processed IO list includes:
acquiring read data corresponding to the target IO information from all the disks of the new second list; wherein, the target IO information is: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
Further, the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array includes:
judging whether the updated first list contains all the disks in the target RAID array or not to obtain a fifth judgment result;
when the fifth judgment result is yes, respectively writing the target data corresponding to the target IO information into the corresponding disks; wherein, the target IO information is: any IO information in any IO list to be processed.
Further, the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array further includes:
and when the fifth judgment result is negative, respectively writing the target data corresponding to the target IO information into corresponding disks, and correspondingly storing the target IO information into the degraded IO list corresponding to the updated first list.
Further, the method also includes:
when the updated first list does not contain all the disks in the target RAID array, storing the information of all write IOs in the degraded IO list corresponding to the updated first list.
Further, the method also includes:
when the target RAID array stops running, storing a first list and a second list corresponding to the target RAID array in the current running state;
and when the target RAID array is powered on and operated, loading the first list and the second list stored when the target RAID array stops operating.
The technical scheme of the data storage and repair system for temporary failure of the RAID array disk is as follows:
the system comprises: an initialization module, a first processing module, a second processing module, a third processing module and an operation module;
the initialization module is configured to: when a target RAID array is initialized to operate, respectively setting a first list for data writing and a second list for data reading according to all disks of the target RAID array;
the first processing module is configured to: when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list, and judging whether the number of disks in the updated first list meets a first preset condition to obtain a first judgment result;
the second processing module is configured to: when the first judgment result is yes, update the second list to the disks contained in both the second list as it was before the running state of the disk changed and the updated first list, and judge whether the number of disks in the updated second list meets a second preset condition to obtain a second judgment result;
the third processing module is configured to: when the second judgment result is yes, judge whether the number of disks in the updated first list is larger than that in the updated second list to obtain a third judgment result;
the operation module is used for: and when the third judgment result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the target RAID array is repaired, updating the second list according to the first list so as to complete the data repair of the target RAID array.
The data storage and repair system for temporary failure of the RAID array disk has the following beneficial effects:
the system of the invention improves the stability of data storage and data repair process when the disk fails temporarily while not interrupting normal access service.
Based on the above scheme, the data storage and repair system for temporary failure of a RAID array disk according to the present invention may be further improved as follows.
Further, the preset repairing mode comprises the following steps:
on the basis of a preset sequence, sequentially judging whether each degraded IO list contains all the disks in the updated first list to obtain a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list with the fourth judgment result of no as an IO list to be processed;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
Further, the process of reading target data corresponding to any IO information in any to-be-processed IO list includes:
acquiring read data corresponding to target IO information from all the disks of the new second list; wherein, the target IO information is: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for storing and repairing temporarily failed RAID array disks according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a normal operation of a target RAID array in a data storage and repair method for temporary failure of a RAID array disk according to an embodiment of the present invention;
FIG. 3 is a first schematic diagram of a failure condition of a target RAID array disk in the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a target RAID array disk that has recovered while its data has not yet been repaired, in the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention;
FIG. 5 is a second schematic diagram of a failure condition of a target RAID array disk in the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of the storage state of a target RAID array disk after data repair in the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention;
FIG. 7 is a schematic diagram of normal operation after the data of a target RAID array disk is repaired in the data storage and repair method for temporary failure of a RAID array disk according to the embodiment of the present invention;
FIG. 8 is a block diagram of a RAID array disk temporary failure data storage and repair system according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, a method for storing and repairing data of a RAID array disk that temporarily fails according to an embodiment of the present invention includes the following steps:
S1, when a target RAID array is initialized and starts running, setting a first list for data writing and a second list for data reading, each according to all disks of the target RAID array.
The disks in the first list are used for data writing of the target RAID array, and the disks in the second list are used for data reading of the target RAID array. At initialization, the first list is identical to the second list.
Specifically, assuming that the target RAID array consists of disks 1/2/3/4/5, the first list is disks 1/2/3/4/5, and the second list is disks 1/2/3/4/5.
S2, when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list, and judging whether the number of the disks in the updated first list meets a first preset condition to obtain a first judgment result.
The target RAID array may be any type of RAID disk array, including but not limited to RAID3, RAID5, RAID6 and RAID7. The running state is either normal operation or temporary failure. The first list is the disk write list of the target RAID array and contains all currently accessible disks in the target RAID array. The first preset condition is: the number of disks in the first list is greater than or equal to the number of disks required for the RAID data redundancy and check function.
The degraded IO lists are all stored in the target RAID array, and specific locations may be designated or configured by a program.
The process of obtaining at least one degraded IO list according to the first list is as follows: the degraded IO list associated with the first list may be a newly created list or an existing list. If a degraded IO list is newly created, the available disks in the first list at the time of creation are recorded in it. If an existing list is used, the available disks of that degraded IO list must be identical to the available disks in the first list.
Specifically, when the running state of any disk in the target RAID array changes, the RAID disk write-in list before the running state changes is cleared, a first list including all currently accessible disks of the target RAID array is generated, when the first list does not include all disks in the target RAID array, at least one degradation IO list is obtained according to the first list setting, and whether the number of disks in the first list meets the number of disks required by the RAID data redundancy and check function is determined, so that a first determination result is obtained.
It should be noted that the number of disks required for the RAID data redundancy and check function differs between types of target RAID arrays. For example, a RAID5 array becomes invalid if more than 1 disk is damaged and remains valid otherwise, whereas a RAID6 array can tolerate up to 2 damaged disks.
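The first preset condition can be illustrated with a small check. This is a minimal sketch, not the patented implementation: the table `MAX_FAILED` and the function `meets_preset_condition` are hypothetical names, and the table only lists the commonly cited fault tolerance of three RAID levels.

```python
# Hypothetical lookup table: how many disks each RAID level can lose
# while data can still be recovered (illustrative, not from the patent).
MAX_FAILED = {"RAID3": 1, "RAID5": 1, "RAID6": 2}


def meets_preset_condition(disk_list, total_disks, level):
    """Return True if the list still holds enough disks for redundancy and checking."""
    min_required = total_disks - MAX_FAILED[level]
    return len(disk_list) >= min_required


# A 5-disk RAID5 array survives one missing disk but not two.
print(meets_preset_condition({1, 2, 3, 4}, 5, "RAID5"))  # True
print(meets_preset_condition({1, 2, 3}, 5, "RAID5"))     # False
```

The same check serves for the second preset condition when it is applied to the second list instead of the first.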
It should be noted that, when the target RAID array operates, the operating state of each disk is monitored, and when it is found that an original normal disk is abnormal (temporarily failed), or an abnormal (temporarily failed) disk returns to normal, it is determined that the operating state changes. The monitoring method of the disk state comprises the following steps: and detecting whether the disk is normally connected or whether reading and writing are overtime, and the like.
S3, when the first judgment result is yes, updating the second list to the disks contained in both the second list as it was before the running state of the disk changed and the updated first list, and judging whether the number of disks in the updated second list meets a second preset condition to obtain a second judgment result.
For example, before the running state of the disk changes, the first list is: 1/2/3/4/5, the second list is: 1/2/3/4/5; at this time, if the disk 5 fails, the updated first list is 1/2/3/4, the second list before the change of the operating state is 1/2/3/4/5, and the updated second list is 1/2/3/4. When the disk 5 returns to normal, the updated first list is 1/2/3/4/5, the second list before the change of the running state is 1/2/3/4, and the updated second list is 1/2/3/4.
Wherein the second preset condition is as follows: the number of disks in the new second list is greater than or equal to the number of disks required for the RAID data redundancy and check function.
Specifically, when the number of disks in the first list meets the number of disks required by the RAID data redundancy and check function, a new second list is generated according to the first list and the second list, and whether the number of disks in the new second list is greater than or equal to the number of disks required by the RAID data redundancy and check function is determined, so that a second determination result is obtained.
It should be noted that, when the first determination result is negative, the operation is temporarily stopped, and the operation is continued after the available disks are restored to the number of disks required by the RAID data redundancy and check function.
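The second-list update in S3 amounts to a set intersection. The sketch below reproduces the disk 5 example from the text; the function name `update_read_list` is an assumption introduced for illustration.

```python
def update_read_list(old_read_list, new_write_list):
    """S3: keep only disks present in both the old second list and the updated first list."""
    return set(old_read_list) & set(new_write_list)


# Disk 5 fails: the first list shrinks to 1/2/3/4, and so does the second list.
after_failure = update_read_list({1, 2, 3, 4, 5}, {1, 2, 3, 4})
print(sorted(after_failure))   # [1, 2, 3, 4]

# Disk 5 recovers: the first list is 1/2/3/4/5 again, but the second list
# stays 1/2/3/4 because disk 5 does not yet hold repaired data.
after_recovery = update_read_list(after_failure, {1, 2, 3, 4, 5})
print(sorted(after_recovery))  # [1, 2, 3, 4]
```

Because an intersection can only shrink the second list, a recovered disk is never read from until the repair in S5 resets the second list.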
And S4, when the second judgment result is yes, judging whether the number of the disks in the first list is larger than that in the new second list or not, and obtaining a third judgment result.
Specifically, when the number of disks in the new second list is greater than or equal to the number of disks required by the RAID data redundancy and check function, whether the first list includes a disk that does not exist in the second list is determined, and a third determination result is obtained.
It should be noted that, when the second determination result is negative, the operation is temporarily stopped, and the operation is continued after the available disks are recovered to the number of disks required by the RAID data redundancy and check function.
And S5, when the third judgment result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the target RAID array is repaired, updating the second list according to the first list so as to complete the data repairing of the target RAID array.
The process of updating the second list according to the first list is as follows: after all the degraded IO lists have been traversed, the second list is reset according to the first list, so that the disks in the second list become the disks in the first list. For example, if the first list contains disks 1/2/3 and the second list contains disks 1/2, the second list is updated according to the first list, and the updated second list is 1/2/3.
Specifically, when the number of disks in the first list is greater than that in the new second list, a preset repair mode is adopted to repair data of the target RAID array, and after the disk repair is completed, the second list (RAID disk read list) is updated according to the first list (RAID disk write list).
It should be noted that, when the third determination result is negative, the process of performing data repair is stopped.
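The decisions in S2 to S5 can be summarized as one handler that runs whenever a disk changes state. This is a sketch under stated assumptions: `on_state_change`, its parameters, and the action strings "suspend", "repair" and "none" are illustrative names, and `min_required` stands for the number of disks required by the RAID data redundancy and check function.

```python
def on_state_change(all_disks, accessible, read_list, min_required):
    """Return (write_list, read_list, action) after a disk state change."""
    write_list = set(accessible) & set(all_disks)   # S2: update the first list
    if len(write_list) < min_required:               # first judgment is no
        return write_list, set(read_list), "suspend"
    new_read = set(read_list) & write_list           # S3: disks common to both lists
    if len(new_read) < min_required:                 # second judgment is no
        return write_list, new_read, "suspend"
    if len(write_list) > len(new_read):              # S4: third judgment is yes
        return write_list, new_read, "repair"        # S5 would start here
    return write_list, new_read, "none"


all_disks = {1, 2, 3, 4, 5}
# Disk 5 fails: both lists shrink and nothing needs repair yet.
print(on_state_change(all_disks, {1, 2, 3, 4}, all_disks, 3)[2])          # none
# Disk 5 returns: the first list is longer than the second, so repair starts.
print(on_state_change(all_disks, all_disks, {1, 2, 3, 4}, 3)[2])           # repair
```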
Preferably, the preset repairing mode comprises:
and on the basis of a preset sequence, sequentially judging whether each degraded IO list contains all the disks in the updated first list, obtaining a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list with the fourth judgment result of no as an IO list to be processed.
The preset order may be forward order, reverse order or any other order, and is not limited herein.
The IO list to be processed is a degraded IO list that needs to perform data repair.
Specifically, traversing all the degraded IO lists, sequentially judging whether the available disks of each degraded IO list include all the disks in the first list, and obtaining a fourth judgment result of each degraded IO list; and if any degraded IO list does not contain all the disks in the first list, determining that the fourth judgment result of the degraded IO list is negative, and determining the degraded IO list as the IO list to be processed until each degraded IO list with the fourth judgment result of negative is determined as the IO list to be processed. For example, the available disks in any degraded IO list are 1/2/4/5, and the first list is 1/2/3/4/5. Wherein, the disk 3 is a disk that does not exist in the degraded IO list, so the IO information in the list is repaired.
It should be noted that, when the fourth determination result of any degraded IO list is yes, the degraded IO list does not perform data repair.
And respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
Each IO list to be processed comprises at least one IO message, and each IO message comprises an IO address and a length.
It should be noted that, after the reading and writing of any IO information is completed, the IO information record is deleted from the corresponding to-be-processed IO list. And when all the IO information in any degraded IO list is deleted, deleting the degraded IO list.
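The preset repair mode can be sketched as a selection step followed by a read and write-back loop. The class `DegradedIOList`, the functions `select_pending` and `repair`, and the callbacks `read_io` and `write_io` are all hypothetical; the callbacks stand in for whatever array read and write routines a real system would provide.

```python
from dataclasses import dataclass, field


@dataclass
class DegradedIOList:
    available_disks: frozenset                    # disks accessible when the list was created
    records: list = field(default_factory=list)   # (address, length) of each write IO


def select_pending(degraded_lists, write_list):
    """Fourth judgment: a list is pending if it lacks some disk of the first list."""
    return [d for d in degraded_lists if not set(write_list) <= d.available_disks]


def repair(degraded_lists, write_list, read_io, write_io):
    """Read and write back every IO of each pending list, then delete the list."""
    for pending in select_pending(degraded_lists, write_list):
        while pending.records:
            address, length = pending.records[0]
            data = read_io(address, length)       # read through the second list
            write_io(address, length, data)       # write through the first list
            pending.records.pop(0)                # drop the record once rewritten
        degraded_lists[:] = [d for d in degraded_lists if d is not pending]


lists = [DegradedIOList(frozenset({1, 2, 3, 4}), [(0x100, 0x200)]),
         DegradedIOList(frozenset({1, 2, 3, 5}), [(0x400, 0x80)])]
rewritten = []
repair(lists, {1, 2, 3, 5},
       read_io=lambda address, length: bytes(length),
       write_io=lambda address, length, data: rewritten.append((address, length)))
print(rewritten)    # [(256, 512)]
print(len(lists))   # 1
```

Only the list created while disk 5 was absent is repaired, because it is the only one that lacks a disk of the current first list.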
Preferably, the process of reading the target data corresponding to any IO information in any to-be-processed IO list includes:
and acquiring read data corresponding to the target IO information from all the disks of the new second list.
Wherein, the target IO information is any IO information in any IO list to be processed.
Specifically, the read data corresponding to the target IO information is read from all the disks in the new second list (the updated RAID disk read list).
And acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
Specifically, when the updated second list does not include all the disks in the target RAID array, the target data in the read data corresponding to the target IO information is obtained through RAID data redundancy and check function calculation.
Furthermore, because the target RAID array is still being repaired, the new second list never contains all of the disks in the target RAID array at this stage.
It should be noted that the target data refers to data actually written and read by the user. To improve reliability, the target RAID array may add some redundant information to the target data (i.e., the read data includes some redundant information). When a failure occurs, if a portion of the target data is lost, the portion of the lost data can be recovered by the redundant information and the remaining valid data.
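Recovering lost data from redundant information can be illustrated with single-parity XOR, as used in RAID5-style layouts. This is an assumption made for illustration only: the text does not fix a particular redundancy code, and an array that tolerates two lost disks would need a stronger code. The names `xor_blocks` and `reconstruct_missing` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks):
    """With single parity, the missing block is the XOR of all surviving blocks."""
    return xor_blocks(surviving_blocks)


# Three data blocks plus one parity block form a stripe.
d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d1, d2, d3])

# The disk holding d2 is unavailable: rebuild d2 from the remaining blocks.
rebuilt = reconstruct_missing([d1, d3, parity])
print(rebuilt == d2)  # True
```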
Preferably, the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array includes:
judging whether the updated first list contains all the disks in the target RAID array or not to obtain a fifth judgment result;
and when the fifth judgment result is yes, respectively writing the target data corresponding to the target IO information into the corresponding disks.
Wherein, the target IO information is: any IO information in any IO list to be processed.
Specifically, when all the disks in the target RAID array are included in the first list, the target data corresponding to the target IO information (including the IO address and the length) is written into the corresponding disks respectively.
Preferably, the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array further includes:
and when the fifth judgment result is negative, respectively writing the target data corresponding to the target IO information into corresponding disks, and correspondingly storing the target IO information into the degraded IO list corresponding to the updated first list.
Specifically, when the first list does not include all the disks in the target RAID array, that is, when a failed disk exists in the target RAID array, the target data corresponding to the target IO information is written into the corresponding disks, and the target IO information is stored in the degraded IO list corresponding to the first list. For example, when disk 2 has failed and data of length 0x200 is written to address 0x100 of the array, the recorded IO information is: address 0x100, data length 0x200.
It should be noted that, before the process of determining whether the first list includes all the disks in the target RAID array, the method further includes: calculating write-in data of each disk in a target RAID array according to the RAID data redundancy and check function requirements; the actual writes are made to the disks in the first list, ignoring the remaining disks.
Preferably, the method further comprises the following steps:
and when the updated first list does not contain all the disks in the target RAID array, storing all the written IO information to a degraded IO list corresponding to the updated first list.
Specifically, while the running state of the disks in the target RAID array is unchanged (that is, no disk is temporarily failing or returning to normal), data is written through the first list and read through the second list. For example, assuming a total of 5 disks, there are two cases: (1) the first list is 1/2/3/4/5 and the second list is 1/2/3/4/5; (2) the first list is 1/2/3/4 and the second list is 1/2/3/4.
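The write path described above can be sketched as follows. The function `handle_write` and the dictionary `degraded_lists`, keyed by the set of disks available when the write occurred, are assumptions introduced for illustration; the actual disk write is omitted.

```python
def handle_write(address, length, all_disks, write_list, degraded_lists):
    """Return the disks to write to, logging the IO if the array is degraded."""
    targets = sorted(write_list)                  # only disks in the first list are written
    if set(write_list) != set(all_disks):         # fifth judgment is no: a disk is missing
        key = frozenset(write_list)               # degraded IO list tied to this first list
        degraded_lists.setdefault(key, []).append((address, length))
    return targets


degraded_lists = {}
all_disks = {1, 2, 3, 4, 5}

# Disk 2 has failed: data of length 0x200 is written to address 0x100.
targets = handle_write(0x100, 0x200, all_disks, {1, 3, 4, 5}, degraded_lists)
print(targets)                                   # [1, 3, 4, 5]
print(degraded_lists[frozenset({1, 3, 4, 5})])   # [(256, 512)]
```

When the first list holds every disk, the function logs nothing, which matches the case where no degraded IO list record needs to be added.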
Preferably, the method further comprises the following steps:
and when the target RAID array stops running, storing a first list and a second list corresponding to the target RAID array in the current running state.
And when the target RAID array is powered on and operated, loading the first list and the second list stored when the target RAID array stops operating.
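Saving and reloading the two lists across a shutdown could look like the sketch below. The JSON file format, the key names `first_list` and `second_list`, and the functions `save_lists` and `load_lists` are assumptions; the text does not specify where or how the lists are stored.

```python
import json
import os
import tempfile


def save_lists(path, write_list, read_list):
    """Persist the first (write) and second (read) lists when the array stops."""
    with open(path, "w") as f:
        json.dump({"first_list": sorted(write_list),
                   "second_list": sorted(read_list)}, f)


def load_lists(path):
    """Reload both lists when the array is powered on again."""
    with open(path) as f:
        state = json.load(f)
    return set(state["first_list"]), set(state["second_list"])


# The array stops while disk 5 has recovered but its data is not yet repaired.
path = os.path.join(tempfile.mkdtemp(), "raid_lists.json")
save_lists(path, {1, 2, 3, 4, 5}, {1, 2, 3, 4})
restored_write, restored_read = load_lists(path)
print(sorted(restored_write), sorted(restored_read))
```

Restoring both lists lets an interrupted repair resume after power-on, since the first list is still longer than the second.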
To better illustrate the technical solution of the present embodiment, the following examples are used for illustration.
The target RAID array is composed of 5 disks, and correct data in any three disks can recover correct data of the other two disks.
(1) As shown in FIG. 2, the target RAID array has no failed disk during normal access. At this time, all read and write accesses are executed on disks 1/2/3/4/5; the first list is 1/2/3/4/5, and the second list is 1/2/3/4/5.
(2) As shown in FIG. 3, assuming that disk 5 fails at this time, the first list is updated to 1/2/3/4 and a degraded IO list is associated with it; the second list is updated to a new second list, which is 1/2/3/4. At this time, data is written only to disks 1/2/3/4, and the IO information is added to the associated degraded IO list; data is read only from disks 1/2/3/4.
(3) As shown in FIG. 4, assuming that disk 5 is restored at this time, the first list is updated to 1/2/3/4/5, and the new second list remains 1/2/3/4 until the autonomous repair is completed. At this time, data is written to disks 1/2/3/4/5, and no degraded IO list record needs to be added. Until the autonomous repair is completed, data is still read only from disks 1/2/3/4.
(4) As shown in FIG. 5, assuming that disk 4 fails before the data repair is completed, the first list is updated to 1/2/3/5 and a degraded IO list is associated with it; the new second list is updated to 1/2/3. At this time, data is written only to disks 1/2/3/5, and the IO information is added to the associated degraded IO list; data is read only from disks 1/2/3.
(5) At this time, the first list is longer than the new second list, so the data repair process is performed. The autonomous repair process retrieves the existing degraded IO lists and finds that the first list contains a disk (disk 5) that is not present in the RAID disk information of the degraded IO list created when disk 5 failed; it therefore performs read and write-back operations on all IOs in that list. During this process, data is read only from disks 1/2/3 and written only to disks 1/2/3/5, and the IO information is added to the associated degraded IO list. After execution is complete, the processed list is deleted. For the degraded IO list corresponding to the temporary failure of disk 4, the data repair step is skipped because the autonomous repair condition is not satisfied. As shown in FIG. 6, when the traversal of the degraded IO lists is complete, the second list is updated according to the first list.
(6) Assuming that disk 4 is then restored, the first list is updated to 1/2/3/4/5, and the new second list, determined from the previous second list, is 1/2/3/5. At this time, data is written to disks 1/2/3/4/5, and no degraded IO list record needs to be added. Until the autonomous repair is completed, data is still read only from disks 1/2/3/5. Since the first list is now longer than the new second list, the data repair process is performed.
The autonomous repair process retrieves the existing degraded IO lists and finds that the first list contains a disk (disk 4) that is absent from the degraded IO list created when disk 4 temporarily failed; it therefore performs read and write-back operations on all IO information in that list. After execution is complete, the list is deleted. As shown in FIG. 7, when the traversal of the degraded IO lists is complete, the second list is updated from the first list. At this point, all disks of the target RAID array have resumed normal operation.
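The list updates in steps (1) to (6) can be traced with a small model. This is a sketch under stated assumptions: the class `ListTracker` and its method names are hypothetical, both preset conditions are assumed to be "at least `min_disks` disks" (3 in this example, since any three disks can recover the other two), and the repair step tracks only which degraded IO lists are processed, not the data itself.

```python
class ListTracker:
    """Tracks the first (write) list, second (read) list and degraded IO lists."""

    def __init__(self, all_disks, min_disks):
        self.all_disks = frozenset(all_disks)
        self.min_disks = min_disks       # assumed form of both preset conditions
        self.first = set(all_disks)      # disks that receive writes
        self.second = set(all_disks)     # disks that serve reads
        self.degraded = []               # disk set of each degraded IO list

    def on_state_change(self, accessible):
        """Apply a disk failure or recovery; return True if repair should run."""
        self.first = set(accessible)
        if self.first != self.all_disks and self.first not in self.degraded:
            self.degraded.append(set(self.first))
        if len(self.first) < self.min_disks:        # first judgment
            return False
        self.second &= self.first                   # disks common to both lists
        if len(self.second) < self.min_disks:       # second judgment
            return False
        return len(self.first) > len(self.second)   # third judgment

    def repair(self):
        """Process degraded IO lists that lack a disk now present in the first list."""
        pending = [d for d in self.degraded if not self.first <= d]
        self.degraded = [d for d in self.degraded if d not in pending]
        self.second = set(self.first)               # reads may now use every write disk
        return pending


t = ListTracker([1, 2, 3, 4, 5], min_disks=3)
snap = lambda: (sorted(t.first), sorted(t.second))
history = []
t.on_state_change({1, 2, 3, 4}); history.append(snap())      # (2) disk 5 fails
t.on_state_change({1, 2, 3, 4, 5}); history.append(snap())   # (3) disk 5 restored
t.on_state_change({1, 2, 3, 5}); history.append(snap())      # (4) disk 4 fails
t.repair(); history.append(snap())                           # (5) repair for disk 5
t.on_state_change({1, 2, 3, 4, 5}); history.append(snap())   # (6) disk 4 restored
t.repair(); history.append(snap())                           # (6) repair for disk 4
```

In this model the repair is a separate call so that a second failure can arrive before it runs, which is the situation in step (4). The recorded `history` follows the first and second lists given in the steps above.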
This technical solution improves the stability of data storage and data repair when a disk temporarily fails, without interrupting normal access service.
As shown in FIG. 8, a system 200 for data storage and repair of a RAID array disk temporary failure according to an embodiment of the present invention includes an initialization module 210, a first processing module 220, a second processing module 230, a third processing module 240, and an operation module 250;
the initialization module 210 is configured to: when a target RAID array is initialized to operate, respectively setting a first list for data writing and a second list for data reading according to all disks of the target RAID array;
the first processing module 220 is configured to: when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list, and judging whether the number of disks in the updated first list meets a first preset condition to obtain a first judgment result;
the second processing module 230 is configured to: when the first judgment result is yes, obtaining the disks that are contained in both the second list as it was before the operating state of the disk changed and the updated first list, updating the second list with those disks, and judging whether the number of disks in the updated second list meets a second preset condition, to obtain a second judgment result;
the third processing module 240 is configured to: when the second judgment result is yes, judging whether the number of the disks in the first list is larger than that in the new second list or not to obtain a third judgment result;
the operation module 250 is configured to: and when the third judgment result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the target RAID array is repaired, updating the second list according to the first list so as to complete the data repairing of the target RAID array.
Preferably, the preset repairing mode comprises:
on the basis of a preset sequence, sequentially judging whether each degradation IO list contains all the disks in the updated first list, obtaining a fourth judgment result corresponding to each degradation IO list, and determining each degradation IO list with the fourth judgment result of no as an IO list to be processed;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
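The preset repair mode can be sketched as a loop over the degraded IO lists. The function `repair_pending`, the dictionary layout, and the `read_fn` and `write_fn` callbacks are hypothetical; the re-recording of IO information while the first list is still incomplete follows the behaviour described for the case where a failed disk remains in the array.

```python
def repair_pending(degraded_lists, first_list, all_disks, read_fn, write_fn):
    """Replay every degraded IO list that lacks a disk present in the first list.

    degraded_lists maps frozenset(disks) -> list of (address, length) records.
    read_fn(address, length) reads via the second list; write_fn(address, data)
    writes via the first list. Both are supplied by the caller.
    """
    first = frozenset(first_list)
    still_degraded = first != frozenset(all_disks)
    # Fourth judgment: a list is pending if it does not contain all first-list disks.
    pending = [disks for disks in degraded_lists if not first <= disks]
    for disks in pending:
        for address, length in degraded_lists[disks]:
            data = read_fn(address, length)
            write_fn(address, data)
            if still_degraded:
                # The rewrite itself happens in a degraded array, so record it again.
                degraded_lists.setdefault(first, []).append((address, length))
        del degraded_lists[disks]
    return pending


# Disk 5 is back but disk 4 has failed: the first list is 1/2/3/5.
lists = {
    frozenset({1, 2, 3, 4}): [(0x100, 0x200)],   # recorded while disk 5 was down
    frozenset({1, 2, 3, 5}): [(0x400, 0x80)],    # recorded while disk 4 is down
}
write_log = []
repair_pending(lists, [1, 2, 3, 5], [1, 2, 3, 4, 5],
               read_fn=lambda address, length: bytes(length),
               write_fn=lambda address, data: write_log.append((address, len(data))))
```

Only the list recorded while disk 5 was down is replayed and deleted. Its record moves into the list for disks 1/2/3/5, so it will be replayed again once disk 4 returns.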
Preferably, the process of reading the target data corresponding to any IO information in any to-be-processed IO list includes:
acquiring read data corresponding to target IO information from all the disks of the new second list; wherein, the target IO information is: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
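As an illustration of obtaining target data through redundancy, the sketch below uses single-parity XOR, which is a deliberately simpler scheme than the one in the embodiment (where any three of five disks recover the other two). The layout of three data disks plus one parity disk and the names `xor_blocks` and `read_target` are assumptions made for this example only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_target(stripe, second_list, data_disks, parity_disk):
    """Read one stripe using only disks in the second list, rebuilding if needed."""
    available = {d: stripe[d] for d in second_list}
    missing = [d for d in data_disks if d not in available]
    if len(missing) > 1 or (missing and parity_disk not in available):
        raise ValueError("single parity cannot rebuild this stripe")
    if missing:
        # XOR of the surviving data blocks and the parity block yields the lost block.
        available[missing[0]] = xor_blocks(list(available.values()))
    return b"".join(available[d] for d in data_disks)


d1, d2, d3 = b"AB", b"CD", b"EF"
stripe = {1: d1, 2: d2, 3: d3, 4: xor_blocks([d1, d2, d3])}
# Disk 2 is not in the second list, so its block is rebuilt from disks 1, 3 and 4.
target = read_target(stripe, [1, 3, 4], data_disks=[1, 2, 3], parity_disk=4)
```

The point carried over to the embodiment is that reads touch only disks in the second list, and the redundancy function supplies whatever those disks do not hold directly.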
This technical solution improves the stability of data storage and data repair when a disk temporarily fails, without interrupting normal access service.
For the steps by which each parameter and each module of the system 200 for data storage and repair of a RAID array disk temporary failure in this embodiment implements its corresponding function, reference may be made to the parameters and steps in the above embodiments of the data storage and repair method for temporary failure of a RAID array disk; details are not repeated here.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. Similarly, in the above description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A data storage and repair method for temporary failure of a RAID array disk is characterized by comprising the following steps:
s1, when a target RAID array is initialized to operate, respectively setting a first list for data writing and a second list for data reading according to all disks of the target RAID array;
s2, when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list, and judging whether the number of the disks in the updated first list meets a first preset condition to obtain a first judgment result;
s3, when the first judgment result is yes, obtaining the disks that are contained in both the second list as it was before the operating state of the disk changed and the updated first list, updating the second list with those disks, and judging whether the number of the disks in the updated second list meets a second preset condition, to obtain a second judgment result;
s4, when the second judgment result is yes, judging whether the number of the disks in the first list is larger than that in the new second list or not to obtain a third judgment result;
and S5, when the third judgment result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the target RAID array is repaired, updating the second list according to the first list so as to complete the data repairing of the target RAID array.
2. The method for storing and repairing temporarily failed data in a RAID array disk of claim 1 wherein the predetermined repair manner comprises:
on the basis of a preset sequence, sequentially judging whether each degraded IO list contains all the disks in the updated first list to obtain a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list with the fourth judgment result of no as an IO list to be processed;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
3. The method for storing and repairing temporarily failed data in a RAID array disk according to claim 2, wherein the step of reading target data corresponding to any one IO information in any one to-be-processed IO list includes:
acquiring read data corresponding to the target IO information from all the disks of the new second list; wherein, the target IO information is: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
4. The method for storing and repairing temporarily failed data in a RAID array disk according to claim 2, wherein the process of rewriting the target data of any IO information into the corresponding disk of the target RAID array includes:
judging whether the updated first list contains all the disks in the target RAID array or not to obtain a fifth judgment result;
when the fifth judgment result is yes, respectively writing the target data corresponding to the target IO information into corresponding disks; wherein, the target IO information is: any IO information in any IO list to be processed.
5. The method for storing and repairing temporarily failed data in RAID array disk according to claim 4, wherein said process of rewriting target data of any IO information into corresponding disk of the target RAID array further comprises:
and when the fifth judgment result is negative, respectively writing the target data corresponding to the target IO information into corresponding disks, and correspondingly storing the target IO information into the degradation IO list corresponding to the updated first list.
6. The method of claim 1, further comprising:
and when the updated first list does not contain all the disks in the target RAID array, storing all the written IO information to a degraded IO list corresponding to the first list.
7. The method of claim 1, further comprising:
when the target RAID array stops operating, storing a first list and a second list corresponding to the target RAID array in the current operating state;
and when the target RAID array is powered on and operated, loading the first list and the second list stored when the target RAID array stops operating.
8. A data storage and repair system for temporary failure of a RAID array disk comprising: the system comprises an initialization module, a first processing module, a second processing module, a third processing module and an operation module;
the initialization module is configured to: when a target RAID array is initialized to operate, respectively setting a first list for data writing and a second list for data reading according to all disks of the target RAID array;
the first processing module is configured to: when the running state of any disk in a target RAID array changes, updating a first list according to all currently accessible disks in the target RAID array, and when the updated first list does not contain all disks in the target RAID array, generating at least one degraded IO list according to the updated first list, and judging whether the number of disks in the updated first list meets a first preset condition to obtain a first judgment result;
the second processing module is configured to: when the first judgment result is yes, obtaining the disks that are contained in both the second list as it was before the operating state of the disk changed and the updated first list, updating the second list with those disks, and judging whether the number of disks in the updated second list meets a second preset condition, to obtain a second judgment result;
the third processing module is configured to: when the second judgment result is yes, judging whether the number of the disks in the first list is larger than that in the new second list or not to obtain a third judgment result;
the operation module is used for: and when the third judgment result is yes, repairing the data of the target RAID array by adopting a preset repairing mode, and when the target RAID array is repaired, updating the second list according to the first list so as to complete the data repairing of the target RAID array.
9. The system of claim 8, wherein the predetermined repair mode comprises:
on the basis of a preset sequence, sequentially judging whether each degraded IO list contains all the disks in the updated first list to obtain a fourth judgment result corresponding to each degraded IO list, and determining each degraded IO list with the fourth judgment result of no as an IO list to be processed;
and respectively reading target data corresponding to each IO information in each IO list to be processed, respectively rewriting each target data into the target RAID array, and deleting each IO list to be processed.
10. The system for storing and repairing temporarily failed data in RAID array disk according to claim 9, wherein the process of reading target data corresponding to any one IO information in any one to-be-processed IO list includes:
acquiring read data corresponding to target IO information from all the disks of the new second list; wherein, the target IO information is: any IO information in any IO list to be processed;
and acquiring target data in the read data corresponding to the target IO information through RAID data redundancy and check functions.
CN202211209814.7A 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk Active CN115565598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211209814.7A CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211209814.7A CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Publications (2)

Publication Number Publication Date
CN115565598A true CN115565598A (en) 2023-01-03
CN115565598B CN115565598B (en) 2023-06-02

Family

ID=84743081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211209814.7A Active CN115565598B (en) 2022-09-30 2022-09-30 Data storage and repair method and system for temporary failure of RAID array disk

Country Status (1)

Country Link
CN (1) CN115565598B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030037281A1 (en) * 1993-06-04 2003-02-20 Network Appliance, Inc. Providing parity in a raid sub-system using non-volatile memory
CN101984400A (en) * 2010-11-05 2011-03-09 成都市华为赛门铁克科技有限公司 RAID control method, device and system
CN102043685A (en) * 2010-12-31 2011-05-04 成都市华为赛门铁克科技有限公司 RAID (redundant array of independent disk) system and data recovery method thereof
CN105808170A (en) * 2016-03-22 2016-07-27 华东交通大学 RAID6 (Redundant Array of Independent Disks 6) encoding method capable of repairing single-disk error by minimum disk accessing
CN106250055A (en) * 2016-07-12 2016-12-21 乐视控股(北京)有限公司 A kind of date storage method and system
US20180300211A1 (en) * 2017-04-17 2018-10-18 EMC IP Holding Company LLC Methods, devices and computer readable mediums for managing storage system

Also Published As

Publication number Publication date
CN115565598B (en) 2023-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant