CN110795276A - Storage medium repairing method, computer equipment and storage medium - Google Patents

Storage medium repairing method, computer equipment and storage medium

Info

Publication number
CN110795276A
Authority
CN
China
Prior art keywords
storage medium
target storage
processing parameters
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810864296.XA
Other languages
Chinese (zh)
Inventor
王勇
闫宁
王鹏
朱家稷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810864296.XA
Publication of CN110795276A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application disclose a method and a device for repairing a storage medium. The method comprises the following steps: monitoring processing parameters of storage media for access requests, determining a target storage medium according to the processing parameters, and executing a repair strategy on the target storage medium, wherein the repair strategy comprises: data backup, formatting, or re-enabling. The applicant has found that, before a storage medium becomes unusable due to a fault, its processing parameters become abnormal first. A target storage medium that is likely to fail can therefore be found earlier, while the storage medium is running online, which prevents the fault of the target storage medium from affecting the whole storage system. Because the method does not rely on historical log data of the storage medium or on an external monitoring program, and because the target storage medium is repaired, the target storage medium does not need to be removed from the storage system. This improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost.

Description

Storage medium repairing method, computer equipment and storage medium
Technical Field
The present application relates to the field of medium repair technologies, and in particular, to a method for repairing a storage medium, a computer device, and a computer-readable storage medium.
Background
A distributed storage system manages a large number of mechanical hard disks. Mechanical hard disks generally have an annual failure rate of 3% to 4%, so hard disk failures occur frequently in a storage system deployed at large scale. The causes of hard disk failure include factors in the hard disk itself, the file system, the operating system, and so on, and in most cases the hard disk does not need to be returned to the factory for maintenance.
The applicant has found that the method commonly used in the industry relies on log data of the hard disk to find a problematic hard disk and then removes it from the storage system. This method depends on historical data and on a monitoring program outside the storage system, can hardly prevent a hard disk fault from affecting the whole storage system, yields low system stability and resource utilization, and incurs high operation and maintenance cost.
Disclosure of Invention
In view of the above, the present application is proposed in order to provide a method of repairing a storage medium, as well as a computer device, a computer readable storage medium, which overcome the above problems or at least partially solve the above problems.
According to an aspect of the present application, there is provided a repair method of a storage medium, including:
monitoring processing parameters of the storage medium for the access request;
determining a target storage medium according to the processing parameters;
executing a repair policy on the target storage medium, wherein the repair policy comprises: data backup, formatting, or re-enabling.
Optionally, the monitoring processing parameters of the storage medium for the access request includes:
for each storage medium, recording a processing parameter for each access request for the storage medium.
Optionally, the recording, for each storage medium, the processing parameters of each access request for the storage medium includes:
respectively establishing a request queue for each storage medium on each storage device, wherein each storage device comprises at least one storage medium;
and recording the processing parameters of each access request in the request queue.
Optionally, the determining a target storage medium according to the processing parameter includes:
determining that the processing parameters recorded for the storage medium satisfy a condition to be repaired;
and determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
Optionally, before the determining a target storage medium according to the processing parameter, the method further includes:
extracting a processing parameter of a preset quantile point from a plurality of processing parameters recorded for each storage medium;
the determining that the processing parameters recorded for the storage medium satisfy the condition to be repaired includes:
searching the processing parameter of the maximum preset quantile point in each storage medium;
and determining that the processing parameter of the maximum preset quantile point exceeds a preset processing threshold value.
Optionally, before the determining that the processing parameter recorded for the storage medium satisfies the condition to be repaired, the determining the target storage medium according to the processing parameter further includes:
determining that the storage medium satisfies at least one of the following conditions: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number.
Optionally, before the executing the repair policy on the target storage medium, the method includes:
deactivating the target storage medium.
Optionally, before the deactivating the target storage medium, the method further comprises:
and if the number of the storage media which are stopped in a preset time period in the storage system does not exceed a preset threshold value, determining that the target storage media can be stopped.
Optionally, the deactivating the target storage medium comprises:
marking the target storage medium in the storage system as not accepting new access requests.
Optionally, before the executing the repair policy on the target storage medium, the method includes:
determining that all access requests being processed on the target storage medium have completed.
Optionally, the executing the repair policy on the target storage medium includes:
backing up data on the target storage medium;
deleting metadata of a target storage medium in the storage system;
and carrying out formatting processing on the target storage medium.
Optionally, the executing the repair policy on the target storage medium includes:
identifying the repaired target storage medium;
and generating management data of the storage system on the repaired target storage medium.
Optionally, the processing parameter comprises an access delay.
Accordingly, according to another aspect of the present application, there is also provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to one or more of the above when executing the computer program.
Accordingly, according to another aspect of the present application, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as one or more of the above.
According to the embodiments of the application, the processing parameters of storage media for access requests are monitored, a target storage medium is determined according to the processing parameters, and a repair strategy is executed on the target storage medium, wherein the repair strategy comprises: data backup, formatting, or re-enabling. The applicant has found that, before a storage medium becomes unusable due to a fault, its processing parameters become abnormal first. A target storage medium that is likely to fail can therefore be found earlier, while the storage medium is running online, which prevents the fault of the target storage medium from affecting the whole storage system. Because the method does not rely on historical log data of the storage medium or on an external monitoring program, and because the target storage medium is repaired, the target storage medium does not need to be removed from the storage system. This improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost.
Further, before it is determined that the processing parameters satisfy the condition to be repaired, at least one of the following conditions must be met: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number. This prevents a diagnosis from being based on a recording time that is too short or on too few records to be representative, and thus reduces the probability of misdiagnosing the storage medium.
Further, whether the condition to be repaired is satisfied is judged according to the processing parameter at a preset quantile rather than the maximum processing parameter, which may be an unrepresentative special case. This avoids misdiagnosis caused by unrepresentative extreme values of the processing parameters and improves the accuracy of determining the storage medium to be repaired.
The foregoing description is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application are more readily understandable, a detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic diagram of a repair process for a storage medium;
FIG. 2 is a flowchart illustrating an embodiment of a method for repairing a storage medium according to a first embodiment of the present application;
FIG. 3 is a flowchart illustrating an embodiment of a method for repairing a storage medium according to a second embodiment of the present application;
FIG. 4 is a schematic diagram showing the automatic handling of an abnormal disk;
FIG. 5 is a block diagram illustrating an embodiment of a storage medium repair apparatus according to a third embodiment of the present application;
FIG. 6 illustrates an exemplary system that can be used to implement the various embodiments described in this disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To enable those skilled in the art to better understand the present application, the following description is made of the concepts related to the present application:
The storage medium includes a hard disk, a flash memory, a floppy disk, an optical disk, a memory card, etc., or any other suitable storage medium, which is not limited in this application. For example, a distributed storage system manages a large number of storage media, especially mechanical hard disks (HDDs), which are widely used in various storage systems because of their higher capacity-to-price ratio.
In a storage system composed of storage media, there are a large number of access requests for each storage media. The access request includes a request for reading or writing to the storage medium, or any other applicable access request, which is not limited in this embodiment of the present application. For example, a distributed storage system may be comprised of a plurality of storage servers (storage nodes), each of which may have at least one storage medium thereon, and may handle a large number of access requests from a network, the access requests being distributed to the storage servers and then handled by the storage media on the storage servers.
While the storage medium processes access requests, the processing parameters of the storage medium for the access requests can be obtained through monitoring. The processing parameters include the access delay of a single access request, the number of errors, the average access delay over multiple access requests, and the like, or any other suitable processing parameters, which is not limited in this embodiment of the present application. The processing parameters may be obtained for each access request, or may be obtained for multiple access requests.
Before a storage medium becomes unusable due to a fault, symptoms such as slow access, unstable access delay, and even hanging of the application program that accesses the disk often appear; that is, the processing parameters of the storage medium for access requests become abnormal. When the processing parameters are abnormal, the corresponding storage medium is determined as the target storage medium. For example, if the access delay of a storage medium for access requests is unstable, that storage medium is determined as the target storage medium.
To repair the target storage medium and eliminate the errors occurring on it, a repair policy may be executed on the target storage medium. The repair policy includes automatically executed processing such as data backup, deleting the metadata of the storage medium, formatting, or re-enabling, or any other suitable processing, which is not limited in this embodiment of the present application.
In an optional embodiment of the present application, the processing parameter includes access delay. The access delay may be monitored by recording, for each access request, the time between the start of execution and the return of the result, or in any other suitable manner, which is not limited in this embodiment of the present application.
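As an illustration only, the measurement of access delay described above (the time between the start of execution and the return of the result) might be sketched as follows. The function name `timed_access`, the `handler` callable, and the injectable `clock` are hypothetical names introduced for this sketch and do not appear in the application.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def timed_access(handler: Callable[[], T],
                 latencies: List[float],
                 clock: Callable[[], float] = time.monotonic) -> T:
    """Run one access request and record its access delay in seconds.

    `handler` stands in for whatever performs the read or write (assumption).
    The delay is recorded even if the handler raises, so that failing
    requests are not silently missing from the statistics.
    """
    start = clock()
    try:
        return handler()
    finally:
        latencies.append(clock() - start)
```

A monotonic clock is used in this sketch so that wall-clock adjustments cannot produce negative delays; the clock is a parameter only to make the function easy to exercise with fixed timestamps.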
In an optional embodiment of the present application, the storage system comprises at least one storage device comprising at least one storage medium. The storage device includes a storage server, etc., or any other suitable storage device, which is not limited in this embodiment of the present application. A request queue may be established on the storage device for each storage medium, and the request queue includes access requests for the storage medium.
In an alternative embodiment of the present application, the storage system stores metadata of the storage medium for describing data stored in the storage medium, and the storage system can determine on which storage medium the data is stored according to the metadata.
In an optional embodiment of the present application, the storage system generates corresponding management data for each storage medium used, where the management data includes a location, a total capacity, a remaining capacity, and the like of the storage medium, or any other suitable data for managing the storage medium, and this is not limited in this embodiment of the present application.
In the existing approach, during operation of the storage system, a problematic hard disk is found by relying on log data of the hard disk and is then removed from the storage system. This carries the risk that a hard disk fault affects the whole storage system, and leads to problems such as low system stability and resource utilization and high operation and maintenance cost. As shown in the schematic diagram of the repair process of a storage medium in FIG. 1, according to an embodiment of the present application, a repair mechanism for a storage medium is provided, which monitors processing parameters of the storage medium for access requests, determines a target storage medium according to the processing parameters, and executes a repair policy on the target storage medium, where the repair policy includes: data backup, formatting processing and re-enabling. The applicant has found that, before the target storage medium becomes unusable due to a fault, its processing parameters become abnormal first. A target storage medium that is likely to fail can therefore be found earlier, while the storage medium is running online, which prevents the fault of the target storage medium from affecting the whole storage system. Because the mechanism does not rely on historical log data of the target storage medium or on a monitoring program outside the storage system, and because the target storage medium is repaired, the target storage medium does not have to be removed from the storage system after a fault occurs. This improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost. The present application is applicable to, but not limited to, the above application scenarios.
Referring to fig. 2, a flowchart of an embodiment of a method for repairing a storage medium according to a first embodiment of the present application is shown, where the method may specifically include the following steps:
Step 101, monitoring processing parameters of the storage medium for the access request.
In the embodiment of the present application, access requests for each storage medium arrive at the storage system continuously, each storage medium processes its own access requests, and the processing parameter of each access request may be monitored during processing. One implementation of monitoring the processing parameters of the storage medium for the access request may include recording, for each storage medium of the storage system, the processing parameter of each access request for that storage medium, or any other applicable monitoring manner, which is not limited in this embodiment of the present application.
Step 102, determining a target storage medium according to the processing parameters.
In this embodiment of the present application, a storage medium with an abnormal processing parameter may be determined as a target storage medium. This may be implemented in multiple manners, for example, by determining that the processing parameters recorded for a storage medium satisfy a condition to be repaired and determining that storage medium as the target storage medium, or in any other applicable manner, which is not limited in this embodiment of the present application.
For example, the storage system includes a plurality of storage devices. On each storage device, a request queue is established for each disk and the access delay of each request is recorded. When the recording time exceeds 5 minutes and the total number of records is not less than 1000, the delay value at the 99.9% quantile is found from the recorded access delays, that is, the value located at the 99.9% position after the access delays are sorted in ascending order. It should be noted that commonly used quantiles include the 2-quantile (i.e., the median), the quartiles, and the like; the preset quantile in the present application is determined empirically and can be adjusted according to actual conditions, which is not limited in the embodiments of the present application.
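A minimal sketch of extracting the delay value at a preset quantile from recorded access delays is given below. The application only says that the value is taken at the quantile position after sorting in ascending order; the nearest-rank rule used here, and the name `quantile_nearest_rank`, are assumptions made for illustration.

```python
import math
from typing import Sequence


def quantile_nearest_rank(values: Sequence[float], q: float) -> float:
    """Return the value at quantile `q` (0 < q <= 1) using the nearest-rank rule.

    The values are sorted in ascending order and the element at 1-based
    rank ceil(q * n) is returned (assumed interpretation of the text).
    """
    if not values:
        raise ValueError("no processing parameters recorded")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based rank
    return ordered[rank - 1]
```

With 1000 recorded delays and q = 0.999, this rule selects the 999th smallest delay, so a single extreme outlier does not determine the result.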
Step 103, executing a repair strategy on the target storage medium.
In the embodiment of the present application, after the target storage medium is determined, a repair policy may be executed on it so that the fault of the target storage medium is eliminated, which prevents the target storage medium from becoming completely unusable or even affecting the entire storage system. The repair policy is executed automatically. The storage medium to be repaired does not need to be removed from the storage system and is only isolated. After the automatic repair succeeds, the storage system can automatically identify the repaired target storage medium and continue to use it as a new storage medium.
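The automatic repair sequence (back up the data, delete the metadata of the medium, format it, then re-enable it) could be orchestrated roughly as in the sketch below. The four callables are hypothetical placeholders for operations that the application does not specify in code, and stopping at the first failed step is an assumption of this sketch rather than something the application states.

```python
from typing import Callable


def run_repair(medium_id: str,
               backup: Callable[[str], None],
               delete_metadata: Callable[[str], None],
               format_medium: Callable[[str], None],
               re_enable: Callable[[str], None]) -> bool:
    """Run the repair steps in order and stop at the first failure.

    Returns True only if every step succeeded, so a medium whose backup
    failed is never formatted (assumed safety rule).
    """
    for step in (backup, delete_metadata, format_medium, re_enable):
        try:
            step(medium_id)
        except Exception:
            return False
    return True
```

Ordering the backup first means that formatting, which destroys the data on the medium, is only reached after the data has been copied elsewhere.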
It is worth noting that a storage system deployed at large scale manages tens of thousands of storage media, which typically have a certain annual failure rate. According to the present application, while the storage system is online, a storage medium at risk of becoming completely unusable or even affecting the whole storage system is found according to its processing parameters for access requests and is determined as the target storage medium. The target storage medium is then repaired automatically, which greatly reduces the probability that it becomes completely unusable due to a fault or even adversely affects the whole storage system.
According to the embodiment of the application, the processing parameters of the storage medium for access requests are monitored, the target storage medium is determined according to the processing parameters, and the repair strategy is executed on the target storage medium, wherein the repair strategy comprises: data backup, formatting processing and re-enabling. The applicant has found that, before the target storage medium becomes unusable due to a fault, its processing parameters become abnormal first. A target storage medium that is likely to fail can therefore be found earlier, while the storage medium is running online, which prevents the fault of the target storage medium from affecting the whole storage system. Because the method does not rely on historical log data of the storage medium or on a monitoring program outside the storage system, and because the storage medium is repaired, the storage medium does not have to be removed from the storage system after a fault occurs. This improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost.
Referring to fig. 3, a flowchart of an embodiment of a method for repairing a storage medium according to the second embodiment of the present application is shown, where the method specifically includes the following steps:
Step 201, for each storage medium, recording a processing parameter for each access request for the storage medium.
In the embodiment of the present application, for each storage medium, the processing parameter of each access request for the storage medium is recorded separately. For example, the access latency of each access request is recorded.
In an embodiment of the present application, optionally, the storage device includes at least one storage medium, and one implementation manner of recording, for each storage medium, the processing parameter of each access request for the storage medium includes: and respectively establishing a request queue for each storage medium on each storage device, and recording the processing parameters of each access request in the request queue.
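One way to picture a storage device that keeps a separate request queue per storage medium and records the access delay of each request is sketched below. The class `StorageDevice`, its method names, and the representation of a request as a zero-argument callable are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, List


class StorageDevice:
    """Hypothetical storage device with one request queue per storage medium."""

    def __init__(self, medium_ids: Iterable[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.queues: Dict[str, Deque[Callable[[], Any]]] = {}
        self.latencies: Dict[str, List[float]] = {}
        for medium_id in medium_ids:
            self.queues[medium_id] = deque()
            self.latencies[medium_id] = []

    def submit(self, medium_id: str, request: Callable[[], Any]) -> None:
        """Append an access request to the queue of the given medium."""
        self.queues[medium_id].append(request)

    def process_next(self, medium_id: str) -> Any:
        """Execute the oldest queued request and record its access delay."""
        request = self.queues[medium_id].popleft()
        start = self._clock()
        try:
            return request()
        finally:
            self.latencies[medium_id].append(self._clock() - start)
```

Keeping the delays per medium, rather than per device, is what later allows one slow disk to be singled out among the healthy disks on the same server.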
Step 202, determining that the storage medium satisfies at least one of the following conditions: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number.
In the embodiment of the application, before it is determined that the processing parameters satisfy the condition to be repaired, at least one of the following conditions must be met: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number. This avoids diagnosing on a recording time that is too short or on too few records to be representative, and thereby reduces the probability of misdiagnosing the storage medium. The preset period and the preset number can be adjusted continuously according to operating conditions and may be any applicable values, which is not limited in the embodiment of the application.
For example, if the recording period of the access delay exceeds 5 minutes and the recording number exceeds 1000 times, a diagnosis is triggered to determine whether the processing parameters satisfy the conditions to be repaired.
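The trigger check in this example might look like the following sketch. The step is worded as "at least one of" the two conditions, while the example requires both, so the sketch exposes the choice as a `require_both` flag; that flag, the `RecordingWindow` type, and the default values (taken from the 5 minute and 1000 record example) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RecordingWindow:
    """Hypothetical bookkeeping for one storage medium."""
    started_at: float  # time at which recording began, in seconds
    count: int = 0     # number of processing parameters recorded so far


def ready_for_diagnosis(window: RecordingWindow, now: float,
                        min_period_s: float = 300.0,
                        min_count: int = 1000,
                        require_both: bool = True) -> bool:
    """Decide whether enough data has been recorded to run a diagnosis."""
    period_ok = (now - window.started_at) > min_period_s
    count_ok = window.count > min_count
    if require_both:
        return period_ok and count_ok
    return period_ok or count_ok
```

Requiring both conditions is the stricter reading and yields fewer diagnoses on thin data; accepting either one reacts sooner on a busy disk.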
Step 203, determining that the processing parameters recorded for the storage medium satisfy the condition to be repaired.
In the embodiment of the present application, the condition to be repaired may take multiple forms. In one implementation, when the recording period of the access delay exceeds a preset period and the recording number exceeds a preset number, the access delay at the 99.9% quantile is found from the processing parameters recorded for each storage medium, and the largest of these 99.9% quantile access delays across all storage media is found; if that access delay exceeds a set threshold, the processing parameters of the corresponding storage medium satisfy the condition to be repaired. In another implementation, when the recording period of the access delay exceeds a preset period and the recording number exceeds a preset number, the recorded processing parameters are averaged, and if the average exceeds a set threshold, the processing parameters of the storage medium satisfy the condition to be repaired. Any other suitable condition to be repaired may also be used, which is not limited in this application embodiment.
In an embodiment of the present application, optionally, before determining the target storage medium in the storage system according to the processing parameter, the method may further include: extracting a processing parameter of a preset quantile point from a plurality of processing parameters recorded for each storage medium; one implementation of determining that a processing parameter recorded for the storage medium satisfies a condition to be repaired may include: searching the processing parameter of the maximum preset quantile point in each storage medium; and determining that the processing parameter of the maximum preset quantile point exceeds a preset processing threshold value.
The processing parameter at the preset quantile is extracted for each storage medium. It is the value located at the preset quantile position after all the processing parameters recorded for that storage medium are sorted in ascending order. For example, the access delay at the 99.9% quantile is found from the recorded processing parameters.
After the processing parameter at the preset quantile has been extracted for each storage medium, the largest of these values across the storage media is found. Specifically, the largest value may be found separately for each storage device or jointly for a plurality of storage devices, which is not limited in the embodiment of the present application.
It is then judged whether the largest processing parameter at the preset quantile exceeds a preset processing threshold. If so, the processing parameters of that storage medium satisfy the condition to be repaired; otherwise, they do not. The preset processing threshold may be set to any applicable value and adjusted according to operating conditions, which is not limited in the embodiment of the present application. Judging the condition to be repaired by the processing parameter at a preset quantile, rather than by the maximum processing parameter, which may be an unrepresentative special case, avoids misdiagnosis caused by unrepresentative extreme values and improves the accuracy of determining the storage medium to be repaired.
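The selection just described, finding the storage medium with the largest value at the preset quantile and comparing that value with the preset processing threshold, can be sketched as follows. The function name and the dictionary keyed by medium identifier are hypothetical; the per-medium quantile values are assumed to have been computed beforehand.

```python
from typing import Dict, Optional


def find_repair_candidate(quantile_by_medium: Dict[str, float],
                          threshold: float) -> Optional[str]:
    """Return the medium whose preset-quantile value is largest, if it
    exceeds `threshold`; otherwise return None.

    `quantile_by_medium` maps a medium identifier to its processing
    parameter at the preset quantile (for example the 99.9% access delay).
    """
    if not quantile_by_medium:
        return None
    medium_id, worst = max(quantile_by_medium.items(), key=lambda kv: kv[1])
    return medium_id if worst > threshold else None
```

Because only the single worst medium is considered per round, at most one medium is nominated for repair each time the check runs, whether the dictionary covers one storage device or several.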
And step 204, determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
In the embodiment of the present application, a storage medium whose processing parameter satisfies a condition to be repaired, that is, a storage medium whose processing parameter is completely unusable and even has an influence on the entire storage system, may be determined as a target storage medium. Specifically, the storage medium meeting the condition to be repaired may be determined separately for each storage device, or the storage medium meeting the condition to be repaired may be determined together for a plurality of storage devices, or any other arbitrary use manner, which is not limited in this embodiment of the present application.
Step 205, if the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold, determining that the target storage medium can be deactivated.
In an embodiment of the present application, after the target storage medium is determined, a request to deactivate it may be made to the storage system. The storage system judges whether the number of storage media deactivated within the preset time period exceeds the preset threshold; if not, it determines that the target storage medium can be deactivated. This avoids deactivating a large number of storage media because of a problem in the storage system itself.
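One way to picture this check is a sliding-window counter kept by the storage system. The sketch below is illustrative only: the class name `DeactivationGate`, the default threshold and window length, and the injectable clock are assumptions, and "does not exceed" is read literally as "count of prior deactivations in the window is less than or equal to the threshold".

```python
import time
from collections import deque


class DeactivationGate:
    """Approves a deactivation only if the number of deactivations already
    approved within the window does not exceed the threshold."""

    def __init__(self, threshold=1, window_seconds=3600.0, clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._events = deque()      # (timestamp, medium_id), oldest first

    def request(self, medium_id):
        now = self.clock()
        # Forget approvals that have fallen out of the window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        if len(self._events) > self.threshold:
            return False            # too many recent deactivations: refuse
        self._events.append((now, medium_id))
        return True
```

A refusal here does not mean the medium is healthy; it only means that so many media have recently been deactivated that the cause is more likely a system-wide problem than individual faults.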
Step 206, deactivating the target storage medium.
In this embodiment of the present application, the target storage medium is deactivated before the repair policy is executed. Specifically, a request is initiated to the storage system and the target storage medium is marked as unusable in the storage system, or any other applicable deactivation manner is used, which is not limited in this embodiment of the present application.
Optionally, in an embodiment of the present application, one implementation of deactivating the target storage medium may include marking the target storage medium in the storage system as not accepting new access requests.
Step 207, determining that all access requests being processed for the target storage medium are completed.
In the embodiment of the application, after the target storage medium is deactivated, the processing state of the access requests already in progress on the target storage medium is monitored, and the repair policy is executed only once all of them have completed, so that in-progress access requests do not fail.
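Steps 206 and 207 together amount to "stop admitting new requests, then wait for in-flight requests to drain". A minimal sketch using a condition variable is given below; the class `MediumGate` and its method names are hypothetical and are not taken from the application.

```python
import threading


class MediumGate:
    """Tracks whether a medium accepts new requests and how many are in flight."""

    def __init__(self):
        self._cond = threading.Condition()
        self._accepting = True
        self._in_flight = 0

    def begin_request(self):
        """Called before dispatching a request; returns False once deactivated."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Called when a previously admitted request finishes."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def deactivate(self):
        """Mark the medium as not accepting new access requests."""
        with self._cond:
            self._accepting = False

    def wait_drained(self, timeout=None):
        """Block until no requests are in flight; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A repair routine would call `deactivate()` and then `wait_drained()` before touching the data on the medium.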
Step 208, backing up the data on the target storage medium.
In this embodiment of the present application, repairing the storage medium requires first backing up the data on the target storage medium. The data may be backed up to other storage media of the storage system, and any applicable backup manner may be used, which is not limited in this embodiment of the present application.
Step 209, deleting the metadata of the target storage medium in the storage system.
In the embodiment of the present application, after the data is backed up, the metadata of the target storage medium in the storage system may be deleted, so that the storage system no longer uses the storage medium; that is, access requests that would originally have been directed to the storage medium are no longer allocated to it by the storage system.
Step 210, performing formatting processing on the target storage medium.
In this embodiment of the present application, to eliminate the fault on the storage medium, formatting processing is finally performed on the target storage medium. Any applicable formatting processing may be adopted, for example low-level formatting, or a recovery means such as a hardware reset of the storage medium, which is not limited in this embodiment of the present application.
Step 211, identifying the repaired target storage medium.
In the embodiment of the present application, after the repair is completed, the storage medium is a new storage medium from the point of view of the storage system or the storage device, and is no longer the previous storage medium. The storage system or storage device may identify the new storage medium (i.e., the repaired target storage medium).
Step 212, generating management data of the storage system for the repaired target storage medium.
In the embodiment of the present application, after the repaired target storage medium is identified, the storage system generates management data for it in order to use it, so that the storage medium can be incorporated into the storage system again, that is, re-added to the storage system.
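The order of steps 208 to 212 can be illustrated with a small in-memory simulation. Everything here is a stand-in chosen for the example: the `Medium` and `StorageSystem` classes, the dictionaries used as "data" and "metadata", clearing a dictionary in place of real formatting, and assigning a fresh random identifier to model the medium being seen as new.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Medium:
    medium_id: str
    blocks: dict = field(default_factory=dict)   # key -> data


@dataclass
class StorageSystem:
    media: dict = field(default_factory=dict)     # medium_id -> Medium
    metadata: dict = field(default_factory=dict)  # medium_id -> management data

    def add(self, medium):
        """Register a medium and generate its management data."""
        self.media[medium.medium_id] = medium
        self.metadata[medium.medium_id] = {"state": "in_use"}


def repair(system, target_id, backup_id):
    """Run the repair sequence on an already deactivated, drained medium."""
    target = system.media[target_id]
    backup = system.media[backup_id]
    # Step 208: back up the data to another medium of the system.
    for key, data in target.blocks.items():
        backup.blocks[(target_id, key)] = data
    # Step 209: delete the system's metadata for the target medium.
    del system.metadata[target_id]
    del system.media[target_id]
    # Step 210: formatting (simulated by discarding all blocks).
    target.blocks.clear()
    # Step 211: the repaired medium is identified as a new medium.
    target.medium_id = uuid.uuid4().hex
    # Step 212: generate management data so it rejoins the system.
    system.add(target)
    return target.medium_id
```

The point of the ordering is that data is copied away before the metadata is deleted and before formatting discards the contents, and that the medium re-enters the system only after formatting.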
According to the embodiment of the application, for each storage medium, the processing parameters of each access request for the storage medium are recorded, and it is determined that the storage medium satisfies at least one of the following conditions: the recording period of the processing parameters exceeds a preset period, or the recorded number of processing parameters exceeds a preset number. It is then determined that the processing parameters recorded for the storage medium meet the condition to be repaired, and the storage medium whose processing parameters meet that condition is determined as the target storage medium. If the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold, it is determined that the target storage medium can be deactivated, and the target storage medium is deactivated. After it is determined that all access requests being processed for the target storage medium are completed, the data on the target storage medium is backed up, the metadata of the target storage medium in the storage system is deleted, formatting processing is performed on the target storage medium, the repaired target storage medium is identified, and management data of the storage system for the repaired target storage medium is generated.
The applicant has found that processing parameters become abnormal before a storage medium becomes unusable due to a fault. A storage medium that is likely to fail can therefore be found earlier, while it is running online, which avoids the impact of the storage medium's fault on the whole storage system without relying on historical log data of the storage medium or on a monitoring program outside the storage system. The storage medium is repaired and added to the storage system again, rather than being removed from the storage system after the fault occurs, which improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost.
In order to make the present application better understood by those skilled in the art, an implementation of the present application is described below by way of specific examples.
Fig. 4 is a schematic diagram of an automatic processing procedure of an abnormal disk.
Step 1, counting the access delay of each access request. A request queue is established for each disk, and the access delay of each request is recorded. Recording continues for more than one period (e.g., more than 5 minutes) and until the total is no less than a threshold, such as 1000 request records. The target delay value at the 99.9% quantile is then found.
Step 2, finding the disk with unstable delay. Among all disks, the disk with the largest target delay value is found, where that largest target delay value exceeds a certain delay threshold.
Step 3, judging whether the bad disk can be isolated. A request to deactivate the disk is sent to the central decision-making program of the storage system. Based on whether many disks have requested deactivation in the recent past, the central decision-making program prevents a large number of disks from being processed because of a system problem.
Step 4, deactivating the disk from the storage system. If not many disks have requested deactivation in the recent past, the found disk is marked as unavailable in the storage system, and all requests on the disk are waited on until they complete. Data repair is then triggered, that is, the data is backed up to other disks of the system. Finally, the metadata describing the disk is purged from the storage system. Deactivation does not remove the disk from the storage system; it isolates the disk.
Step 5, online formatting. The disk is reformatted and its data is cleared. Under Linux, this generally means recreating the file system; the process discards the data already on the original disk, and the faults in accessing the disk disappear.
Step 6, rejoining the storage system. The storage system re-identifies the formatted disk and generates a management data structure for it.
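The recording conditions of Step 1 in the example above can be sketched as a small per-disk recorder. This is an assumption-laden illustration: the class `LatencyRecorder`, its method names and the injectable clock are hypothetical, and the sketch requires both the period and the count condition, as the example describes, although other embodiments only require at least one of them.

```python
import time


class LatencyRecorder:
    """Collects per-disk access delays and reports when enough data exists
    for a quantile to be meaningful."""

    def __init__(self, min_period_s=300.0, min_records=1000, clock=time.monotonic):
        self.min_period_s = min_period_s
        self.min_records = min_records
        self.clock = clock
        self._started = {}     # disk_id -> time of first sample
        self._latencies = {}   # disk_id -> list of delays in milliseconds

    def record(self, disk_id, latency_ms):
        self._started.setdefault(disk_id, self.clock())
        self._latencies.setdefault(disk_id, []).append(latency_ms)

    def ready(self, disk_id):
        """True once the disk has been observed long enough and often enough."""
        samples = self._latencies.get(disk_id, [])
        if not samples:
            return False
        elapsed = self.clock() - self._started[disk_id]
        return elapsed > self.min_period_s and len(samples) >= self.min_records

    def drain(self, disk_id):
        """Return the collected delays and start a fresh recording period."""
        self._started.pop(disk_id, None)
        return self._latencies.pop(disk_id, [])
```

Once `ready()` returns true for a disk, the drained samples would be passed to whatever quantile computation the system uses to obtain the 99.9% target delay value.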
Referring to fig. 5, a block diagram of a storage medium repair apparatus according to a third embodiment of the present application is shown, which may specifically include:
a parameter monitoring module 301, configured to monitor a processing parameter of the storage medium for the access request;
a medium determining module 302, configured to determine a target storage medium according to the processing parameter;
a media repair module 303, configured to execute a repair policy on the target storage media, where the repair policy includes: data backup, formatting, or re-enabling.
In an embodiment of the present application, optionally, the parameter monitoring module includes:
a parameter recording submodule, configured to record, for each storage medium, the processing parameter of each access request for the storage medium.
In an embodiment of the present application, optionally, the parameter recording sub-module includes:
a queue establishing unit, configured to establish a request queue for each storage medium on each storage device, wherein each storage device comprises at least one storage medium;
and the parameter recording unit is used for recording the processing parameters of each access request in the request queue.
In an embodiment of the application, optionally, the medium determining module includes:
a first condition determining submodule for determining that a processing parameter recorded for the storage medium satisfies a condition to be repaired;
and the medium determining submodule is used for determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
In an embodiment of the present application, optionally, the apparatus further includes:
a parameter extraction module, configured to extract a processing parameter of a preset quantile from a plurality of processing parameters recorded for each storage medium before determining a target storage medium according to the processing parameter;
the first condition determination submodule includes:
the parameter searching unit is used for searching the processing parameter of the maximum preset quantile point in each storage medium;
and the threshold value determining unit is used for determining that the processing parameter of the maximum preset quantile point exceeds a preset processing threshold value.
In an embodiment of the application, optionally, the medium determining module further includes:
a second condition determining sub-module, configured to determine that the storage medium satisfies at least one of the following conditions before the determination that the processing parameter recorded for the storage medium satisfies a condition to be repaired: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number.
In an embodiment of the present application, optionally, the apparatus includes:
a medium disabling module to disable the target storage medium before the repair policy is executed on the target storage medium.
In an embodiment of the present application, optionally, the apparatus further includes:
a deactivation determining module, configured to determine, before the target storage medium is deactivated, that the target storage medium can be deactivated if the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold.
In an embodiment of the present application, optionally, the medium disabling module includes:
a request marking submodule, configured to mark the target storage medium in the storage system as not accepting new access requests.
In an embodiment of the present application, optionally, the apparatus includes:
a completion determination module, configured to determine that all of the access requests processed for the target storage medium are completed before the repair policy is executed on the target storage medium.
In an embodiment of the present application, optionally, the media repair module includes:
the backup submodule is used for backing up the data on the target storage medium;
the data deleting submodule is used for deleting the metadata of the target storage medium in the storage system;
and the formatting submodule is used for carrying out formatting processing on the target storage medium.
In an embodiment of the present application, optionally, the media repair module includes:
the medium identification submodule is used for identifying the repaired target storage medium;
and the management data generation submodule is used for generating the management data of the storage system to the repaired target storage medium.
In an embodiment of the application, optionally, the processing parameter includes an access delay.
According to the embodiment of the application, processing parameters of the storage medium for access requests are monitored, a target storage medium is determined according to the processing parameters, and a repair policy is executed on the target storage medium, wherein the repair policy comprises: data backup, formatting processing and re-enabling. The applicant has found that processing parameters become abnormal before a storage medium becomes unusable due to a fault. A target storage medium that is likely to fail can therefore be found earlier, while it is running online, which avoids the impact of the target storage medium's fault on the whole storage system without relying on historical log data of the storage medium or on an external monitoring program. Because the target storage medium is repaired, it does not need to be removed from the storage system, which improves the stability and resource utilization of the storage system and reduces its operation and maintenance cost.
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant points, refer to the corresponding description of the method embodiment.
Embodiments of the disclosure may be implemented as a system using any suitable hardware, firmware, software, or any combination thereof, in a desired configuration. Fig. 6 schematically illustrates an exemplary system (or apparatus) 400 that can be used to implement various embodiments described in this disclosure.
For one embodiment, fig. 6 illustrates an exemplary system 400 having one or more processors 402, a system control module (chipset) 404 coupled to at least one of the processor(s) 402, system memory 406 coupled to the system control module 404, non-volatile memory (NVM)/storage 408 coupled to the system control module 404, one or more input/output devices 410 coupled to the system control module 404, and a network interface 412 coupled to the system control module 404.
Processor 402 may include one or more single-core or multi-core processors, and processor 402 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the system 400 can function as a browser as described in embodiments of the present application.
In some embodiments, system 400 may include one or more computer-readable media (e.g., system memory 406 or NVM/storage 408) having instructions and one or more processors 402 in combination with the one or more computer-readable media configured to execute the instructions to implement modules to perform the actions described in this disclosure.
For one embodiment, system control module 404 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 402 and/or any suitable device or component in communication with system control module 404.
The system control module 404 may include a memory controller module to provide an interface to the system memory 406. The memory controller module may be a hardware module, a software module, and/or a firmware module.
System memory 406 may be used, for example, to load and store data and/or instructions for system 400. For one embodiment, system memory 406 may comprise any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 406 may include a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 404 may include one or more input/output controllers to provide an interface to NVM/storage 408 and input/output device(s) 410.
For example, NVM/storage 408 may be used to store data and/or instructions. NVM/storage 408 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drive(s) (HDD (s)), one or more Compact Disc (CD) drive(s), and/or one or more Digital Versatile Disc (DVD) drive (s)).
NVM/storage 408 may include storage resources that are physically part of the device on which system 400 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 408 may be accessed over a network via input/output device(s) 410.
Input/output device(s) 410 may provide an interface for system 400 to communicate with any other suitable device, and input/output devices 410 may include communication components, audio components, sensor components, and the like. Network interface 412 may provide an interface for system 400 to communicate over one or more networks, and system 400 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, such as to access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
For one embodiment, at least one of the processor(s) 402 may be packaged together with logic for one or more controller(s) (e.g., memory controller module) of the system control module 404. For one embodiment, at least one of the processor(s) 402 may be packaged together with logic for one or more controller(s) of the system control module 404 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 402 may be integrated on the same die with logic for one or more controller(s) of the system control module 404. For one embodiment, at least one of the processor(s) 402 may be integrated on the same die with logic for one or more controller(s) of the system control module 404 to form a system on a chip (SoC).
In various embodiments, system 400 may be, but is not limited to being: a browser, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 400 may have more or fewer components and/or different architectures. For example, in some embodiments, system 400 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
Wherein, if the display includes a touch panel, the display screen may be implemented as a touch screen display to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The present application further provides a non-volatile readable storage medium storing one or more modules (programs) which, when applied to a terminal device, cause the terminal device to execute the instructions of the method steps in the present application.
In one example, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to the embodiments of the present application when executing the computer program.
In one example, there is also provided a computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements a method as in one or more of the embodiments of the application.
An embodiment of the application discloses a method and a device for repairing a storage medium, and example 1 includes a method for repairing a storage medium, including:
monitoring processing parameters of the storage medium for the access request;
determining a target storage medium according to the processing parameters;
executing a repair policy on the target storage medium, wherein the repair policy comprises: data backup, formatting, or re-enabling.
Example 2 may include the method of example 1, wherein the monitoring processing parameters of the storage medium for the access request includes:
for each storage medium, recording a processing parameter for each access request for the storage medium.
Example 3 may include the method of example 1 and/or example 2, wherein the recording, for each storage medium of the storage system, the processing parameters for each access request for the storage medium includes:
respectively establishing a request queue for each storage medium on each storage device, wherein each storage device comprises at least one storage medium;
and recording the processing parameters of each access request in the request queue.
Example 4 may include the method of one or more of examples 1-3, wherein the determining a target storage medium according to the processing parameter includes:
determining that the processing parameters recorded for the storage medium satisfy a condition to be repaired;
and determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
Example 5 may include the method of one or more of examples 1-4, wherein prior to the determining a target storage medium from the processing parameters, the method further comprises:
extracting a processing parameter of a preset quantile point from a plurality of processing parameters recorded for each storage medium;
the determining that the processing parameters recorded for the storage medium satisfy the condition to be repaired includes:
searching the processing parameter of the maximum preset quantile point in each storage medium;
and determining that the processing parameter of the maximum preset quantile point exceeds a preset processing threshold value.
Example 6 may include the method of one or more of examples 1-5, wherein, prior to the determining that the processing parameter recorded for the storage medium satisfies the condition to be repaired, the determining the target storage medium according to the processing parameter further comprises:
determining that the storage medium satisfies at least one of the following conditions: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number.
Example 7 may include the method of one or more of examples 1-6, wherein prior to the performing a repair policy on the target storage media, the method includes:
deactivating the target storage medium.
Example 8 may include the method of one or more of examples 1-7, wherein, prior to the deactivating the target storage medium, the method further comprises:
if the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold, determining that the target storage medium can be deactivated.
Example 9 may include the method of one or more of examples 1-8, wherein the deactivating the target storage medium comprises:
marking the target storage medium in the storage system as not accepting new access requests.
Example 10 may include the method of one or more of examples 1-9, wherein prior to the performing a repair policy on the target storage media, the method includes:
determining that the access requests for the target storage media that have been processed are all completed.
Example 11 may include the method of one or more of examples 1-10, wherein the performing a repair policy on the target storage media comprises:
backing up data on the target storage medium;
deleting metadata of a target storage medium in the storage system;
and carrying out formatting processing on the target storage medium.
Example 12 may include the method of one or more of examples 1-11, wherein the performing a repair policy on the target storage media comprises:
identifying the repaired target storage medium;
and generating management data of the storage system on the repaired target storage medium.
Example 13 may include the method of one or more of examples 1-12, wherein the processing parameter comprises an access latency.
Example 14 includes a storage medium repair apparatus comprising:
the parameter monitoring module is used for monitoring the processing parameters of the storage medium to the access request;
the medium determining module is used for determining a target storage medium according to the processing parameters;
a media repair module configured to execute a repair policy on the target storage media, wherein the repair policy includes: data backup, formatting, or re-enabling.
Example 15 may include the apparatus of example 14, wherein the parameter monitoring module comprises:
a parameter recording submodule, configured to record, for each storage medium, the processing parameter of each access request for the storage medium.
Example 16 may include the apparatus of example 14 and/or example 15, wherein the parameter logging submodule includes:
a queue establishing unit, configured to establish a request queue for each storage medium on each storage device, wherein each storage device comprises at least one storage medium;
and the parameter recording unit is used for recording the processing parameters of each access request in the request queue.
Example 17 may include the apparatus of one or more of examples 14-16, wherein the medium determination module comprises:
a first condition determining submodule for determining that a processing parameter recorded for the storage medium satisfies a condition to be repaired;
and the medium determining submodule is used for determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
Example 18 may include the apparatus of one or more of examples 14-17, wherein the apparatus further comprises:
a parameter extraction module, configured to extract a processing parameter of a preset quantile from a plurality of processing parameters recorded for each storage medium before determining a target storage medium according to the processing parameter;
the first condition determination submodule includes:
the parameter searching unit is used for searching the processing parameter of the maximum preset quantile point in each storage medium;
and the threshold value determining unit is used for determining that the processing parameter of the maximum preset quantile point exceeds a preset processing threshold value.
Example 19 may include the apparatus of one or more of examples 14-18, wherein the medium determination module further comprises:
a second condition determining sub-module, configured to determine that the storage medium satisfies at least one of the following conditions before the determination that the processing parameter recorded for the storage medium satisfies a condition to be repaired: the recording period of the processing parameters exceeds the preset period, and the recording number of the processing parameters exceeds the preset number.
Example 20 may include the apparatus of one or more of examples 14-19, wherein the apparatus comprises:
a medium disabling module to disable the target storage medium before the repair policy is executed on the target storage medium.
Example 21 may include the apparatus of one or more of examples 14-20, wherein the apparatus further comprises:
a deactivation determining module, configured to determine, before the target storage medium is deactivated, that the target storage medium can be deactivated if the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold.
Example 22 may include the apparatus of one or more of examples 14-21, wherein the medium disabling module comprises:
a request marking submodule, configured to mark the target storage medium in the storage system as not accepting new access requests.
Example 23 may include the apparatus of one or more of examples 14-22, wherein the apparatus comprises:
a completion determination module, configured to determine that all of the access requests processed for the target storage medium are completed before the repair policy is executed on the target storage medium.
Example 24 may include the apparatus of one or more of examples 14-23, wherein the media repair module comprises:
the backup submodule is used for backing up the data on the target storage medium;
the data deleting submodule is used for deleting the metadata of the target storage medium in the storage system;
and the formatting submodule is used for carrying out formatting processing on the target storage medium.
Example 25 may include the apparatus of one or more of examples 14-24, wherein the media repair module comprises:
the medium identification submodule is used for identifying the repaired target storage medium;
and the management data generation submodule is used for generating the management data of the storage system to the repaired target storage medium.
Example 26 may include the apparatus of one or more of examples 14-25, wherein the processing parameter comprises an access latency.
Example 27 includes a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method as in one or more of examples 1-13 when executing the computer program.
Example 28 includes a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements a method as in one or more of examples 1-13.
Although certain examples have been illustrated and described for purposes of description, a wide variety of alternate and/or equivalent implementations, or calculations, may be made to achieve the same objectives without departing from the scope of practice of the present application. This application is intended to cover any adaptations or variations of the embodiments discussed herein. Therefore, it is manifestly intended that the embodiments described herein be limited only by the claims and the equivalents thereof.

Claims (15)

1. A method of repairing a storage medium, comprising:
monitoring processing parameters of storage media for access requests;
determining a target storage medium according to the processing parameters;
executing a repair policy on the target storage medium, wherein the repair policy comprises: data backup, formatting, or re-enabling.
2. The method of claim 1, wherein the monitoring of processing parameters of storage media for access requests comprises:
for each storage medium, recording a processing parameter for each access request for the storage medium.
3. The method of claim 2, wherein the recording, for each storage medium, the processing parameters for each access request for the storage medium comprises:
respectively establishing a request queue for each storage medium on each storage device, wherein each storage device comprises at least one storage medium;
and recording the processing parameters of each access request in the request queue.
4. The method of claim 1, wherein determining the target storage medium based on the processing parameters comprises:
determining that the processing parameters recorded for the storage medium satisfy a condition to be repaired;
and determining the storage medium with the processing parameter meeting the condition to be repaired as the target storage medium.
5. The method of claim 4, wherein prior to said determining a target storage medium from said processing parameters, said method further comprises:
extracting, from the plurality of processing parameters recorded for each storage medium, the processing parameter at a preset quantile;
the determining that the processing parameters recorded for the storage medium satisfy the condition to be repaired includes:
finding the maximum among the preset-quantile processing parameters of the storage media;
and determining that this maximum exceeds a preset processing threshold.
6. The method according to claim 4, wherein before the determining that the processing parameters recorded for the storage medium satisfy the condition to be repaired, the determining the target storage medium according to the processing parameters further comprises:
determining that the storage medium satisfies at least one of the following conditions: the recording period of the processing parameters exceeds a preset period; the number of recorded processing parameters exceeds a preset number.
7. The method of claim 1, wherein prior to said executing a repair policy on said target storage medium, said method further comprises:
deactivating the target storage medium.
8. The method of claim 7, wherein prior to said deactivating the target storage medium, the method further comprises:
if the number of storage media deactivated in the storage system within a preset time period does not exceed a preset threshold, determining that the target storage medium can be deactivated.
9. The method of claim 7, wherein the deactivating the target storage medium comprises:
marking the target storage medium in the storage system as not accepting new access requests.
10. The method of claim 1, wherein prior to said executing a repair policy on the target storage medium, the method further comprises:
determining that all access requests already being processed for the target storage medium have completed.
11. The method of claim 1, wherein the performing a repair policy on the target storage medium comprises:
backing up data on the target storage medium;
deleting metadata of the target storage medium in the storage system;
and carrying out formatting processing on the target storage medium.
12. The method of claim 1, wherein the performing a repair policy on the target storage medium comprises:
identifying the repaired target storage medium;
and generating the storage system's management data for the repaired target storage medium.
13. The method of claim 1, wherein the processing parameter comprises an access latency.
14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to one or more of claims 1-13 when executing the computer program.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to one or more of claims 1-13.
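The detection and deactivation steps of claims 2 to 9 can be illustrated with a minimal sketch, assuming that the processing parameter is access latency (claim 13). The class `LatencyMonitor`, its parameter names, and the nearest-rank quantile rule are assumptions made for illustration; the claims do not prescribe a quantile method. A plain per-medium list stands in for the request queue of claim 3, and only the record-count condition of claim 6 is modeled.

```python
import math
from collections import defaultdict, deque


class LatencyMonitor:
    """Tracks access latency per storage medium and picks a repair target."""

    def __init__(self, quantile, threshold_s, min_samples,
                 max_recent_deactivations, window_s):
        self.quantile = quantile            # preset quantile (claim 5)
        self.threshold_s = threshold_s      # preset processing threshold (claim 5)
        self.min_samples = min_samples      # preset number of records (claim 6)
        self.max_recent = max_recent_deactivations  # preset threshold (claim 8)
        self.window_s = window_s            # preset time period (claim 8)
        self.latencies = defaultdict(list)  # medium id -> recorded latencies
        self.deactivated = set()            # media not accepting new requests
        self.recent = deque()               # times of recent deactivations

    def record(self, medium_id, latency_s):
        """Record the latency of one access request for a medium."""
        self.latencies[medium_id].append(latency_s)

    def quantile_latency(self, medium_id):
        """Nearest-rank quantile of the latencies recorded for one medium."""
        samples = sorted(self.latencies[medium_id])
        rank = max(1, math.ceil(self.quantile * len(samples)))
        return samples[rank - 1]

    def find_target(self):
        """Return the medium whose quantile latency is largest, if over threshold."""
        candidates = {
            m: self.quantile_latency(m)
            for m, samples in self.latencies.items()
            if len(samples) > self.min_samples and m not in self.deactivated
        }
        if not candidates:
            return None
        worst = max(candidates, key=candidates.get)
        return worst if candidates[worst] > self.threshold_s else None

    def try_deactivate(self, medium_id, now):
        """Deactivate the medium unless too many were deactivated recently."""
        # Forget deactivations that fall outside the preset time period.
        while self.recent and now - self.recent[0] > self.window_s:
            self.recent.popleft()
        # Allowed only if the recent count does not exceed the preset threshold.
        if len(self.recent) > self.max_recent:
            return False
        self.recent.append(now)
        self.deactivated.add(medium_id)  # marked as not accepting new requests
        return True
```

Limiting the number of deactivations within a time window keeps a burst of slow media from removing too much capacity at once, which is the apparent purpose of the check in claim 8.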

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810864296.XA CN110795276A (en) 2018-08-01 2018-08-01 Storage medium repairing method, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110795276A true CN110795276A (en) 2020-02-14

Family

ID=69426143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810864296.XA Pending CN110795276A (en) 2018-08-01 2018-08-01 Storage medium repairing method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110795276A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020065833A1 (en) * 2000-11-30 2002-05-30 Emc Corporation System and method for evaluating changes in performance arising from reallocation of files among disk storage units
CN105468484A (en) * 2014-09-30 2016-04-06 伊姆西公司 Method and apparatus for determining fault location in storage system
CN106407083A (en) * 2016-10-26 2017-02-15 华为技术有限公司 Fault detection method and device
CN107273231A (en) * 2016-04-07 2017-10-20 阿里巴巴集团控股有限公司 Distributed memory system hard disk tangles fault detect, processing method and processing device
CN107577545A (en) * 2016-07-05 2018-01-12 北京金山云网络技术有限公司 A kind of failed disk detection and restorative procedure and device
CN107643877A (en) * 2016-07-22 2018-01-30 中国电信股份有限公司 Disk failure detection method and device
CN107844381A (en) * 2016-09-21 2018-03-27 中国电信股份有限公司 The fault handling method and device of storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Liu et al.: "Disk fault detection mechanism in distributed storage systems", Information Technology (《信息技术》) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984195A (en) * 2020-08-19 2020-11-24 广州邦讯信息系统有限公司 Method and device for improving stability of embedded Linux system
CN115114065A (en) * 2022-06-24 2022-09-27 苏州浪潮智能科技有限公司 Memory repair method, system, storage medium and equipment
CN115114065B (en) * 2022-06-24 2024-06-28 苏州浪潮智能科技有限公司 Memory repair method, system, storage medium and equipment
CN116110562A (en) * 2023-04-12 2023-05-12 深圳英美达医疗技术有限公司 Error management method and device for medical equipment, computer equipment and storage medium
CN116110562B (en) * 2023-04-12 2023-11-24 深圳英美达医疗技术有限公司 Error management method and device for medical equipment, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US10599536B1 (en) Preventing storage errors using problem signatures
US20200387311A1 (en) Disk detection method and apparatus
US12014791B2 (en) Memory fault handling method and apparatus, device, and storage medium
US9389937B2 (en) Managing faulty memory pages in a computing system
JP6818014B2 (en) Operation retry method and equipment for jobs
US20170300505A1 (en) Snapshot creation
US20100083043A1 (en) Information processing device, recording medium that records an operation state monitoring program, and operation state monitoring method
US10324794B2 (en) Method for storage management and storage device
CN104685474A (en) Notification of address range including non-correctable error
CN110795276A (en) Storage medium repairing method, computer equipment and storage medium
US20150286548A1 (en) Information processing device and method
US20150074455A1 (en) Method for maintaining file system of computer system
CN111124818B (en) Monitoring method, device and equipment for Expander
CN108845772B (en) Hard disk fault processing method, system, equipment and computer storage medium
CN111130856A (en) Server configuration method, system, equipment and computer readable storage medium
CN113905092B (en) Method, device, terminal and storage medium for determining reusable agent queue
CN107154960B (en) Method and apparatus for determining service availability information for distributed storage systems
US20160085649A1 (en) Disk drive repair
CN113312197A (en) Method and apparatus for determining batch faults, computer storage medium and electronic device
CN111309532A (en) PCIE equipment abnormity detection method, system, electronic equipment and storage medium
US9552247B2 (en) Method for detection of soft media errors for hard drive
CN115061641B (en) Disk fault processing method, device, equipment and storage medium
CN114510495B (en) Database service data consistency processing method and system
US11113122B1 (en) Event loop diagnostics
US11409566B2 (en) Resource control device, resource control method, and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination