CN113886128B - SSD (solid State disk) fault diagnosis and data recovery method and system - Google Patents

SSD (solid state disk) fault diagnosis and data recovery method and system

Info

Publication number
CN113886128B
Authority
CN
China
Prior art keywords
diagnosis
ssd
fault
data
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111224279.8A
Other languages
Chinese (zh)
Other versions
CN113886128A (en)
Inventor
林梓梁
周雄伟
方智武
李红生
廖慧容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Eastic Technology Co ltd
Original Assignee
Shenzhen Eastic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Eastic Technology Co ltd filed Critical Shenzhen Eastic Technology Co ltd
Priority to CN202111224279.8A priority Critical patent/CN113886128B/en
Publication of CN113886128A publication Critical patent/CN113886128A/en
Application granted granted Critical
Publication of CN113886128B publication Critical patent/CN113886128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/20Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides a method and a system for SSD fault diagnosis and data recovery. The method comprises the following steps: the health condition of the SSD is diagnosed in real time, a quantized reference value is calculated, and whether the SSD has a potential fault is judged from the quantized value. If a command issued by the host is received during diagnosis, the diagnosis is suspended and the host command is executed; after the host command has been executed, diagnosis of the SSD health condition resumes until it is finished. The fault type is judged from the quantized value. A repair method for that fault type is matched in the sample library and automatic recovery is performed. A backup is made immediately after automatic recovery succeeds. If automatic recovery fails, a full-disk format is attempted and the data is restored from the backup file. If the full-disk format does not resolve the fault, the fault type is sent to a display and manual repair is required; after the repair succeeds, the manual repair process and the fault type are added to the sample library.

Description

SSD fault diagnosis and data recovery method and system
Technical Field
The invention relates to the technical field of solid state disk testing, in particular to a method and a system for SSD fault diagnosis and data recovery.
Background
Hard disks are among the most important storage devices of a computer and are divided into mechanical hard disk drives (HDDs) and solid state disks (SSDs). SSDs are now used worldwide and have overwhelming advantages over mechanical hard disks in read/write speed, service life and other respects, and with the rapid development of the internet, the demand for data storage keeps growing.
Although an SSD is tested many times during production, various problems still occur in operation. In most cases the SSD must be returned to the factory for handling, which is very inefficient: it often takes months from finding a problem to locating its cause, and the stored data is very likely to be lost. Grasping the running condition of the SSD in real time and judging the fault type once a fault is found are therefore key to locating the cause of a fault quickly.
Disclosure of Invention
The invention provides a method and a system for SSD fault diagnosis and data recovery that diagnose the running condition of the system in real time and automatically repair a system diagnosed as abnormal according to the fault type. If the repair fails, the error information is collected and the fault point is shown on a display, so that troubleshooting time is greatly reduced and the system can be recovered quickly.
To achieve the above purpose and solve the technical problems described in the background, a method and a system for SSD fault diagnosis and data recovery are provided.
The invention provides a method for SSD fault diagnosis and data recovery, which comprises the following steps:
S100, diagnosing the SSD health condition in real time and calculating a quantized reference value;
S200, if a command issued by the host is received during diagnosis, suspending the diagnosis and executing the host command; after the host command has been executed, resuming diagnosis of the SSD health condition until the diagnosis is finished;
S300, matching the fault type in a sample library according to the quantized reference value, looking up the repair process for that fault type, and repairing automatically;
S400, if the automatic repair in S300 cannot restore the quantized value to the normal range, or the same kind of fault occurs three times in succession, judging that the repair has failed, performing a full-disk format, and restoring data from the backup file;
S500, if the full-disk format still does not resolve the fault, sending the fault type to a display and requiring manual repair; the manual repair process and the fault code are recorded in the sample library.
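The escalation in steps S300 to S500 can be sketched as a small control loop. This is a minimal illustration, not the patented implementation: the names `run_cycle`, `Outcome` and the three callables are hypothetical, the "three times in succession" condition is simplified to three failed automatic repair attempts, and the host-command suspension of S200 is not modelled here.

```python
from enum import Enum
from typing import Callable

FULL_SCORE = 15          # all four checks passed (1 + 2 + 4 + 8)
MAX_REPAIR_ATTEMPTS = 3  # assumption: "three times" read as three failed auto-repair attempts


class Outcome(Enum):
    HEALTHY = "healthy"
    AUTO_REPAIRED = "auto_repaired"
    RESTORED_FROM_BACKUP = "restored_from_backup"
    MANUAL_REPAIR_NEEDED = "manual_repair_needed"


def run_cycle(
    diagnose: Callable[[], int],
    auto_repair: Callable[[int], None],
    format_and_restore: Callable[[], None],
) -> Outcome:
    """Run one diagnosis cycle and escalate as in S300 -> S400 -> S500."""
    score = diagnose()                      # S100: quantized reference value
    if score == FULL_SCORE:
        return Outcome.HEALTHY

    for _ in range(MAX_REPAIR_ATTEMPTS):    # S300: automatic repair
        auto_repair(score)
        score = diagnose()
        if score == FULL_SCORE:
            return Outcome.AUTO_REPAIRED

    format_and_restore()                    # S400: full-disk format + restore
    if diagnose() == FULL_SCORE:
        return Outcome.RESTORED_FROM_BACKUP

    return Outcome.MANUAL_REPAIR_NEEDED     # S500: show fault type, repair manually
```

A caller would pass in its own diagnosis, repair and restore routines; on `MANUAL_REPAIR_NEEDED` it would display the fault type and later record the manual repair process in the sample library.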
Preferably, S100 includes:
S101, diagnosing in real time whether the SSD has bad blocks, wherein if the diagnosis passes, the quantized reference value is 1 point;
S102, diagnosing in real time whether the SSD system data is normal, wherein if it is normal, the quantized reference value is 2 points;
S103, diagnosing in real time whether the SSD user data is normal, wherein if it is normal, the quantized reference value is 4 points;
S104, diagnosing in real time whether the SSD peripheral connection is normal, wherein if it is normal, the quantized reference value is 8 points;
S105, if the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period begins; if the diagnosis result is less than 15 points, the fault type is judged from the final diagnosis score, a repair process is matched in the sample library according to the fault type, and intelligent repair is carried out according to that process;
S106, storing the diagnosis result in a diagnosis log.
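Because the four checks are worth 1, 2, 4 and 8 points, every combination of passed checks yields a distinct total. The following sketch shows one way the score of S101 to S105 could be accumulated; the dictionary keys and the function name `diagnosis_score` are illustrative assumptions, not names from the patent.

```python
# Points awarded when each check passes (values taken from S101-S104).
WEIGHTS = {
    "bad_block": 1,
    "system_data": 2,
    "user_data": 4,
    "peripheral_connection": 8,
}
FULL_SCORE = sum(WEIGHTS.values())  # 15


def diagnosis_score(passed: dict) -> int:
    """Sum the points of every check that passed; a missing key counts as failed."""
    return sum(points for name, points in WEIGHTS.items() if passed.get(name, False))


if __name__ == "__main__":
    result = diagnosis_score(
        {"bad_block": False, "system_data": True, "user_data": True, "peripheral_connection": True}
    )
    print(result, result == FULL_SCORE)
```

Since the weights are powers of two, the total works like a bitmask, which is what lets a single number identify exactly which checks failed.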
The backup files in S400 comprise timed backups and diagnostic backups;
a timed backup is made by setting a backup plan and backing up according to the backup requirement; the backup plan includes: setting a time parameter so that data is backed up periodically, and storing the backup file on a peripheral device;
a diagnostic backup is a backup file saved after each diagnosis finishes;
the diagnostic backup is given a higher priority than the timed backup to ensure that the two backup modes do not conflict;
the backup process comprises: on receiving a backup request, acquiring the data log of the running system, computing the data backup log of the most recent backup file, and determining the data log to be backed up;
the data backup log of the current system is obtained by the following formulas:
$$T_k = S_{i+1} \cdot \{1 - \mathrm{sim}(S_{i+1}, B_i)\}$$
$$S_{i+1} = B_i \pm T_k$$
where $B_i$ denotes the data backup log of the $i$-th backup file; $S_{i+1}$ denotes the data backup log of the system at the $(i+1)$-th backup; $T_k$ denotes the data to be backed up; and $\mathrm{sim}(S_{i+1}, B_i)$ denotes the correlation between the data backup log of the system at the $(i+1)$-th backup and the data backup log of the $i$-th backup file;
restoring data from the backup file in S400 includes: sending a factory reset command, parsing the latest backup file or the factory-supplied backup file, and setting the backup file as the first boot item; after parsing is complete, the host issues a restart command, and once the restart succeeds, the restoration is complete.
The sample library in S300 comprises: the diagnosis result, the fault type and the repair process, saved as a dbe-format file;
the diagnosis result is one of the numbers 1 to 15; each number represents one or several fault types;
the fault types include: bad block failure, system data failure, user data failure and peripheral connection failure.
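The role of the sample library, looking up a repair process by diagnosis result and learning new ones from manual repairs, can be sketched as a keyed store. The class and method names below are hypothetical, the repair steps are placeholder strings, and the dbe file format mentioned above is not modelled; the sketch keeps everything in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleLibrary:
    """In-memory stand-in for the sample library: diagnosis score -> repair steps."""

    repair_steps: Dict[int, List[str]] = field(default_factory=dict)

    def lookup(self, score: int) -> Optional[List[str]]:
        """Return the stored repair process for a diagnosis score, or None if unknown."""
        return self.repair_steps.get(score)

    def record_manual_repair(self, score: int, steps: List[str]) -> None:
        """Store the process of a successful manual repair for future automatic use."""
        self.repair_steps[score] = list(steps)


if __name__ == "__main__":
    library = SampleLibrary({14: ["placeholder: bad-block repair step"]})
    print(library.lookup(14))
    print(library.lookup(7))  # None: no known repair process yet
    library.record_manual_repair(7, ["placeholder: manual repair step"])
    print(library.lookup(7))
```

Each recorded manual repair widens the set of faults the system can later handle automatically.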
The diagnosis log in S106 comprises: diagnosis time, diagnosis duration, fault causes and repair results;
the diagnosis log records the SSD operating condition and the various faults that occur in daily operation; the diagnosis log must be stored on a peripheral device so that it is not lost when the SSD is formatted.
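A diagnosis log entry with the four fields listed above could be represented and persisted as follows. This is a sketch under assumptions: the JSON-lines layout, the field names and the function `append_entry` are illustrative choices, and `log_path` is expected to point at storage outside the SSD under diagnosis.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DiagnosisLogEntry:
    diagnosis_time: str        # e.g. an ISO 8601 timestamp
    duration_seconds: float    # how long the diagnosis took
    fault_causes: List[str]    # empty when the diagnosis passed
    repair_result: str         # e.g. "not needed", "repaired", "failed"


def append_entry(log_path: str, entry: DiagnosisLogEntry) -> None:
    """Append one entry as a JSON line; log_path should be on a peripheral device."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(entry)) + "\n")
```

Appending one line per diagnosis keeps earlier records intact, which matters when the log is later handed over for fault analysis.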
The invention also provides a system for SSD fault diagnosis and data recovery, which comprises a fault diagnosis module, a data repair module, a data backup module and a diagnosis log module;
the fault diagnosis module is used for diagnosing the health condition of the SSD in real time and calculating a quantized reference value; if the diagnosis passes, a diagnosis record is stored in the diagnosis log module; if a fault is diagnosed, the fault type is provided to the data repair module and the repair process is recorded in the diagnosis log module;
the data repair module is used for looking up a repair process in the sample library according to the fault type when the fault diagnosis module diagnoses that the SSD has a fault, and repairing the system according to that process;
the data backup module is used for backing up the data of the SSD, and the data backup is a precondition for data recovery;
the diagnosis log module is used for recording the running condition of the SSD system in detail and storing the SSD running and diagnosis logs, which are the key basis for investigating the cause of a failure;
if a command issued by the host is received during diagnosis, the diagnosis is suspended and the host command is executed; after the host command has been executed, diagnosis of the SSD health condition resumes until it is finished; the fault type is matched in the sample library according to the quantized reference value, and the repair process for that fault type is looked up and carried out automatically; if the automatic repair cannot restore the quantized value to the normal range, or the same kind of fault occurs three times in succession, a full-disk format is performed and data is restored from the backup file;
if the full-disk format does not resolve the fault, the fault type is sent to a display and manual repair is required; the manual repair process and the fault code are recorded in the sample library.
In the system for SSD fault diagnosis and data recovery, the fault diagnosis module comprises:
a first submodule, which diagnoses in real time whether the SSD has bad blocks, wherein if the diagnosis passes, the quantized reference value is 1 point;
a second submodule, which diagnoses in real time whether the SSD system data is normal, wherein if it is normal, the quantized reference value is 2 points;
a third submodule, which diagnoses in real time whether the SSD user data is normal, wherein if it is normal, the quantized reference value is 4 points;
a fourth submodule, which diagnoses in real time whether the SSD peripheral connection is normal, wherein if the diagnosis passes, the quantized reference value is 8 points;
a diagnosis result determining submodule, wherein if the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period begins; if the diagnosis result is less than 15 points, the fault type is judged from the final diagnosis score, a repair process is matched in the sample library according to the fault type, and intelligent repair is carried out according to that process;
a diagnosis log storage submodule, which stores the diagnosis result in the diagnosis log.
In the system for SSD fault diagnosis and data recovery, the backup files comprise timed backups and diagnostic backups, and the backup files are stored on a peripheral device;
a timed backup is made by setting a backup plan and backing up according to the backup requirement; the backup plan includes: setting a time parameter so that data is backed up periodically;
a diagnostic backup is a backup file saved after each diagnosis finishes;
the data backup module comprises:
a priority setting submodule, which sets the diagnostic backup priority higher than the timed backup priority to ensure that the two backup modes do not conflict;
a backup process submodule, which backs up files, the backup process comprising: on receiving a backup request, acquiring the data log of the running system, computing the data backup log of the most recent backup file, and determining the data log to be backed up;
the data backup log of the current system is obtained by the following formulas:
$$T_k = S_{i+1} \cdot \{1 - \mathrm{sim}(S_{i+1}, B_i)\}$$
$$S_{i+1} = B_i \pm T_k$$
where $B_i$ denotes the data backup log of the $i$-th backup file; $S_{i+1}$ denotes the data backup log of the system at the $(i+1)$-th backup; $T_k$ denotes the data to be backed up; and $\mathrm{sim}(S_{i+1}, B_i)$ denotes the correlation between the data backup log of the system at the $(i+1)$-th backup and the data backup log of the $i$-th backup file;
a data recovery submodule, which restores data from the backup file by: sending a factory reset command, parsing the latest backup file or the factory-supplied backup file, and setting the backup file as the first boot item; after parsing is complete, the host issues a restart command, and once the restart succeeds, the restoration is complete.
The sample library comprises: the diagnosis result, the fault type and the repair process, saved as a dbe-format file;
the diagnosis result is one of the numbers 1 to 15; each number represents one or several fault types;
the fault types include: bad block failure, system data failure, user data failure and peripheral connection failure.
The diagnosis log module records: diagnosis time, diagnosis duration, fault causes and repair results;
the diagnosis log records the SSD operating condition and the various faults that occur in daily operation; the diagnosis log must be stored on a peripheral device so that it is not lost when the SSD is formatted.
The invention provides a method and a system for SSD fault diagnosis and data recovery, which diagnose the SSD health condition in real time and calculate a quantized reference value. If a command issued by the host is received during diagnosis, the diagnosis is suspended and the host command is executed; after the host command has been executed, diagnosis of the SSD health condition resumes until it is finished. The fault type is matched in the sample library according to the quantized reference value, and the repair process for that fault type is looked up and carried out automatically. If more than one fault type is diagnosed, the faults are repaired in sequence until all repairs are complete. If the quantized value cannot be restored to the normal range, or the same kind of fault occurs three times in succession, the repair is judged to have failed, a full-disk format is performed, and data is restored from the backup file.
If the full-disk format does not resolve the fault, the fault type is sent to a display and manual repair is required; the manual repair process and the fault code are recorded in the sample library.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of SSD fault diagnosis and data recovery steps in the prior art;
FIG. 2 is a schematic diagram of an SSD fault diagnosis and data recovery procedure according to the present invention;
FIG. 3 is a flowchart of a method for SSD failure diagnosis and data recovery in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a system structure for SSD fault diagnosis and data recovery according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
FIG. 1 is a schematic diagram of SSD fault diagnosis and data recovery in the prior art, in which fault diagnosis and data recovery include the following steps:
S1, the end user collects the basic SSD data logs.
B1, the server vendor provides the basic SSD data to the SSD vendor for analysis.
B2, the SSD vendor cannot locate the cause of the failure from the basic data and needs more data to be collected.
S2, the request for the data required by the SSD vendor is received.
S3, data logs are collected again as requested by the SSD vendor.
B3, the collected SSD data is sent to the SSD vendor.
As the above flow shows, in conventional SSD fault diagnosis and data recovery the end user sends the generic basic SSD data log to the server vendor, and the server vendor passes it to the SSD vendor for analysis. The basic data log may not contain the data needed to analyze the fault, in which case the end user must collect the relevant data logs again. Fault analysis in the prior art therefore takes a long time, and problems are located inefficiently.
Accordingly, in one embodiment, as shown in FIG. 3, the present invention provides a method for SSD fault diagnosis and data recovery, the method comprising:
S100, diagnosing the health condition of the SSD in real time and calculating a quantized reference value;
S200, if a command issued by the host is received during diagnosis, suspending the diagnosis and executing the host command; after the host command has been executed, resuming diagnosis of the SSD health condition until the diagnosis is finished;
S300, matching a repair process in the sample library according to the quantized reference value and repairing automatically;
S400, if the automatic repair in S300 cannot restore the quantized value to the normal range, or the same kind of fault occurs three times in succession, performing a full-disk format and restoring data from the backup file;
S500, if the full-disk format still does not resolve the fault, sending the fault type to a display and requiring manual repair; the manual repair process and the fault code are recorded in the sample library.
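Step S200 gives host commands priority over diagnosis. One simple way to picture this is a diagnosis loop that drains a queue of pending host commands before each check. The sketch below assumes suspension happens only between checks, which is a simplification the patent does not specify; `run_diagnosis`, the queue and the `execute` callback are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Tuple


def run_diagnosis(
    checks: Iterable[Tuple[int, Callable[[], bool]]],
    host_commands: Deque[str],
    execute: Callable[[str], None],
) -> int:
    """Run each (points, check) pair, serving pending host commands first.

    Returns the quantized reference value: the sum of points of passed checks.
    """
    score = 0
    for points, check in checks:
        # Diagnosis is suspended while any host command is waiting.
        while host_commands:
            execute(host_commands.popleft())
        # Host work is done, so diagnosis resumes with the next check.
        if check():
            score += points
    return score
```

With this structure a command that arrives while one check is running is served before the next check starts, and the diagnosis still runs to completion.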
The working principle of this technical scheme (see FIG. 2) is as follows: when the system diagnoses an SSD fault, it repairs the fault automatically according to the fault type; if the repair fails, the fault cause and the diagnosis log are sent to the SSD vendor with a request to analyze the fault; after the repair is finished, the fault cause and the repair process are added to the sample library.
Compared with the prior art, in the SSD fault diagnosis and data recovery method provided by the invention the end user provides the fault type and the corresponding diagnosis log file directly to the SSD vendor, and the SSD vendor locates the fault point directly, which reduces troubleshooting time. Over time, the number of repair methods for the fault types in the sample library grows, and the SSD's self-healing capability becomes stronger.
In this technical scheme, the fault diagnosis test diagnoses in real time whether the SSD has bad blocks, and if the diagnosis passes, the quantized reference value is 1 point; it diagnoses in real time whether the SSD system data is normal, and if so, the quantized reference value is 2 points; it diagnoses in real time whether the SSD user data is normal, and if so, the quantized reference value is 4 points; it diagnoses in real time whether the SSD peripheral connection is normal, and if so, the quantized reference value is 8 points. If the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period begins. If the diagnosis result is less than 15 points, the fault type is judged from the final diagnosis score, a repair process is matched in the sample library according to the fault type, and intelligent repair is carried out according to that process. The diagnosis result is stored in the diagnosis log. For example, a diagnosis result of 14 points (15 minus the 1 point for the bad-block check) indicates that the SSD has a bad block; the fault type is judged accordingly, the bad-block repair process is looked up in the sample library, the system repairs the SSD bad block automatically, and diagnosis continues after the repair is complete.
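The 14-point example generalizes: since each passed check contributes a distinct power of two, the failed checks are exactly the bits missing from the score. A short sketch of that decoding follows; the function name `fault_types` is an assumption, while the fault type names are the four listed in the description.

```python
# Points per check mapped to the fault type reported when that check fails.
FAULT_BY_POINTS = {
    1: "bad block failure",
    2: "system data failure",
    4: "user data failure",
    8: "peripheral connection failure",
}


def fault_types(score: int) -> list:
    """Return the fault types implied by a diagnosis score between 0 and 15."""
    if not 0 <= score <= 15:
        raise ValueError("diagnosis score must be between 0 and 15")
    # A check failed exactly when its bit is absent from the score.
    return [name for points, name in FAULT_BY_POINTS.items() if not score & points]


if __name__ == "__main__":
    print(fault_types(14))  # ['bad block failure']
    print(fault_types(5))   # ['system data failure', 'peripheral connection failure']
```

A score that decodes to several fault types corresponds to the case where the faults are repaired one after another.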
In the technical scheme of the invention, the backup files comprise timed backups and diagnostic backups; a timed backup is made by setting a backup plan and backing up according to the backup requirement; the backup plan includes: setting a time parameter so that data is backed up periodically;
a diagnostic backup is a backup file saved after each diagnosis finishes;
the diagnostic backup priority is higher than the timed backup priority, which ensures that the two backup modes do not conflict;
the backup process submodule backs up files, the backup process comprising: on receiving a backup request, acquiring the data log of the running system, computing the data backup log of the most recent backup file, and determining the data log to be backed up;
the data backup log of the current system is obtained by the following formulas:
$$T_k = S_{i+1} \cdot \{1 - \mathrm{sim}(S_{i+1}, B_i)\}$$
$$S_{i+1} = B_i \pm T_k$$
where $B_i$ denotes the data backup log of the $i$-th backup file; $S_{i+1}$ denotes the data backup log of the system at the $(i+1)$-th backup; $T_k$ denotes the data to be backed up; and $\mathrm{sim}(S_{i+1}, B_i)$ denotes the correlation between the data backup log of the system at the $(i+1)$-th backup and the data backup log of the $i$-th backup file.
The working principle of the backup file is as follows: on receiving a backup request, the system first collects its current data log, then parses the most recent backup file, and uses the correlation between the two to calculate the data that needs to be backed up this time. The data to be backed up is added to the most recent backup file to form the content of the current backup file, so that only the changed data has to be backed up each time, which saves system resources.
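The incremental idea above can be illustrated with sets of log entries. The patent does not define how the correlation sim is computed, so the sketch assumes one particular reading: sim is the fraction of current entries already present in the last backup. Under that assumption the size of the data to back up equals |S| × (1 − sim), matching the first formula. All function names are hypothetical.

```python
def sim(current: set, last_backup: set) -> float:
    """Assumed correlation: share of current entries already in the last backup."""
    if not current:
        return 1.0
    return len(current & last_backup) / len(current)


def data_to_back_up(current: set, last_backup: set) -> set:
    """T_k: entries of the current system log that the last backup lacks."""
    return current - last_backup


def next_backup(current: set, last_backup: set) -> set:
    """Content of the new backup: the last backup plus the data to back up."""
    return last_backup | data_to_back_up(current, last_backup)


if __name__ == "__main__":
    last = {"entry-1", "entry-2", "entry-3"}
    now = {"entry-1", "entry-2", "entry-3", "entry-4"}
    delta = data_to_back_up(now, last)
    print(sorted(delta))
    # Size check against T_k = S * (1 - sim) under the assumed sim.
    print(len(delta) == len(now) * (1 - sim(now, last)))
```

Only the delta has to be written on each backup, which is where the resource saving comes from; handling of entries that were removed since the last backup (the minus branch of the second formula) is left out of this sketch.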
The data recovery submodule restores data from the backup file as follows: a factory reset command is sent, the latest backup file or the factory-supplied backup file is parsed, and the backup file is set as the first boot item; after parsing is complete, the host issues a restart command, and once the restart succeeds, the restoration is complete.
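The restoration sequence can be written down as an ordered plan. The sketch below is illustrative only: the step labels are made-up strings rather than real device commands, and falling back to the factory-supplied backup only when no other backup exists is an assumed selection rule that the patent does not state.

```python
from typing import List, Sequence, Tuple


def choose_backup(backups: Sequence[Tuple[float, str]], factory_backup: str) -> str:
    """Pick the newest (timestamp, name) backup, or the factory backup if none exist."""
    if not backups:
        return factory_backup
    return max(backups, key=lambda item: item[0])[1]


def restore_plan(backups: Sequence[Tuple[float, str]], factory_backup: str) -> List[str]:
    """Return the restoration steps in the order given in the description."""
    source = choose_backup(backups, factory_backup)
    return [
        "factory_reset",                   # send the factory reset command
        f"parse_backup:{source}",          # parse the chosen backup file
        f"set_first_boot_item:{source}",   # make it the first boot item
        "host_restart",                    # host issues the restart command
    ]


if __name__ == "__main__":
    print(restore_plan([(10.0, "old.bak"), (20.0, "new.bak")], "factory.bak"))
    print(restore_plan([], "factory.bak"))
```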
In the technical scheme of the invention, the sample library comprises: the diagnosis result, the fault type and the repair process, saved as a dbe-format file;
the diagnosis result is one of the numbers 1 to 15; each number represents one or several fault types;
the fault types include: bad block failure, system data failure, user data failure and peripheral connection failure.
The working principle of the sample library is as follows: when the system diagnoses an SSD fault, it looks up a repair process in the sample library according to the fault type; the completeness of the sample library is a key indicator of the system's self-recovery capability.
In the technical scheme of the invention, the diagnosis log comprises: diagnosis time, diagnosis duration, fault causes and repair results;
the diagnosis log records the SSD operating condition and the various faults that occur in daily operation; the diagnosis log must be stored on a peripheral device so that it is not lost when the SSD is formatted.
In one embodiment, as shown in FIG. 4, a system for SSD fault diagnosis and data recovery is provided, the system comprising: a fault diagnosis module, a data repair module, a data backup module and a diagnosis log module;
the fault diagnosis module diagnoses the health condition of the SSD in real time and calculates a quantized reference value; if the diagnosis passes, a diagnosis record is stored in the diagnosis log module; if a fault is diagnosed, the fault type is provided to the data repair module and the repair process is recorded in the diagnosis log module;
the data repair module, when the fault diagnosis module diagnoses that the SSD has a fault, looks up a repair process in the sample library according to the fault type and repairs the system according to that process;
the data backup module backs up the data of the SSD, and the data backup is a precondition for data recovery;
the diagnosis log module records the running condition of the SSD system in detail and stores the SSD running and diagnosis logs, which are the key basis for investigating the cause of a failure;
if a command issued by the host is received during diagnosis, the diagnosis is suspended and the host command is executed; after the host command has been executed, diagnosis of the SSD health condition resumes until it is finished; a repair process is matched in the sample library according to the quantized reference value and carried out automatically; if the automatic repair cannot restore the quantized value to the normal range, a full-disk format is performed and data is restored from the backup file;
if the full-disk format does not resolve the fault, the fault type is sent to a display and manual repair is required; the manual repair process and the fault code are recorded in the sample library.
In one embodiment, the fault diagnosis module specifically comprises:
a first sub-module, which diagnoses in real time whether the SSD has a bad block; if the diagnosis passes, the quantitative reference value is 1 point;
a second sub-module, which diagnoses in real time whether the SSD system data is normal; if it is normal, the quantitative reference value is 2 points;
a third sub-module, which diagnoses in real time whether the SSD user data is normal; if it is normal, the quantitative reference value is 4 points;
a fourth sub-module, which diagnoses in real time whether the SSD peripheral connection is normal; if the diagnosis passes, the quantitative reference value is 8 points;
a diagnosis result determining sub-module: if the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period starts; if the diagnosis result is less than 15 points, the fault type is determined from the final diagnosis score, a repair process is matched in the sample library according to the fault type, and intelligent repair is performed according to that repair process;
a diagnosis log storage sub-module, which stores the diagnosis result in the diagnosis log.
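Because the four checks are weighted 1, 2, 4 and 8, each total from 0 to 15 corresponds to exactly one combination of passed checks, so the failed checks can be read back from the score bit by bit. A minimal sketch of that encoding follows; the function names and the dictionary layout are illustrative assumptions, not part of the patent.

```python
# Point values taken from the four sub-modules described above.
WEIGHTS = {
    "bad block": 1,
    "system data": 2,
    "user data": 4,
    "peripheral connection": 8,
}
FULL_SCORE = sum(WEIGHTS.values())  # 15


def diagnosis_score(results: dict[str, bool]) -> int:
    """Add up the points of every check that passed."""
    return sum(w for name, w in WEIGHTS.items() if results.get(name, False))


def fault_types(score: int) -> list[str]:
    """Return the fault types implied by a score (missing bits = failed checks)."""
    if not 0 <= score <= FULL_SCORE:
        raise ValueError(f"score must be between 0 and {FULL_SCORE}")
    return [f"{name} failure" for name, w in WEIGHTS.items() if not score & w]
```

For example, a score of 13 (1 + 4 + 8) is missing only the 2-point check, so `fault_types(13)` reports a system data failure.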
In one embodiment, the backup file includes a timed backup and a diagnosis backup.
The timed backup follows a configured backup plan and is performed according to backup requirements; the backup plan includes setting time parameters so that data is backed up regularly;
the diagnosis backup means that a backup file is saved after each diagnosis is finished.
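This embodiment gives the diagnosis backup a higher priority than the timed backup so that the two modes do not conflict. One way to picture that rule is a priority queue in which pending diagnosis backups always run before pending timed backups. The sketch below is an assumption about how such a rule could be realized; the `BackupQueue` class and the numeric priorities are hypothetical.

```python
import heapq
import itertools

# Lower number runs first: diagnosis backups outrank timed backups.
PRIORITY = {"diagnosis": 0, "timed": 1}


class BackupQueue:
    """Serializes backup requests so the two backup modes never overlap."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def submit(self, kind: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind))

    def run_all(self) -> list[str]:
        """Pop requests in priority order and return the execution order."""
        order = []
        while self._heap:
            _, seq, kind = heapq.heappop(self._heap)
            order.append(f"{kind}#{seq}")
        return order
```

The sequence counter is the tie-breaker, so two requests of the same kind run in the order they were submitted.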
The data backup module comprises:
a priority setting sub-module, configured to set the priority of the diagnosis backup higher than that of the timed backup, ensuring that the two backup modes do not conflict;
a backup process sub-module, configured to back up files, wherein the backup process comprises: on receiving a backup request, acquiring the data log of the running system, computing the data backup log of the latest backup file, and determining the data log to be backed up;
the data to be backed up for the current system is obtained by the following formulas:
$$T_k = S_{i+1} \cdot \{1 - \mathrm{sim}(S_{i+1}, B_i)\}$$
$$S_{i+1} = B_i \pm T_k$$
where $B_i$ represents the data backup log of the i-th backup file; $S_{i+1}$ represents the data backup log of the system at the (i+1)-th backup; $T_k$ represents the data to be backed up; and $\mathrm{sim}(S_{i+1}, B_i)$ represents the correlation between the data backup log of the system at the (i+1)-th backup and the data backup log of the i-th backup file;
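The text does not define how the correlation sim is computed. As a worked illustration only, assume the logs are sets of entries and that sim is the fraction of current system entries already present in the latest backup. Under that assumption, the first formula gives the number of entries still to be backed up, and the second formula (taking the plus sign) says that the backup plus the new entries reproduces the current log. All function names below are hypothetical.

```python
def sim(current: set[str], backup: set[str]) -> float:
    """Assumed correlation: share of current entries already in the backup."""
    if not current:
        return 1.0  # nothing to back up, treat as fully covered
    return len(current & backup) / len(current)


def pending_size(current: set[str], backup: set[str]) -> float:
    """T_k = S_{i+1} * (1 - sim(S_{i+1}, B_i)), measured in entries."""
    return len(current) * (1.0 - sim(current, backup))


def entries_to_back_up(current: set[str], backup: set[str]) -> set[str]:
    """The entries that make up T_k: present in the system, absent from the backup."""
    return current - backup
```

With a backup of `{"a", "b", "c"}` and a current log of `{"a", "b", "c", "d"}`, sim is 0.75, the pending size is 1 entry, and that entry is `"d"`.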
a data recovery sub-module, configured to recover data using the backup file, wherein the recovery comprises: sending a factory reset command, parsing the latest backup file or the factory-supplied backup file, and setting the backup file as the first boot item; after parsing is completed, the host issues a restart command, and once the restart succeeds, the recovery is complete.
In one embodiment, the diagnosis log kept by the diagnosis log module includes: diagnosis time, diagnosis duration, fault cause and repair result;
the diagnosis log records the SSD operating conditions and the various faults that occur during daily operation; the diagnosis log needs to be stored on a peripheral device so that it is not lost when the SSD is formatted.
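A diagnosis log entry with the four fields listed above could be kept as one JSON object per line in a file that lives outside the SSD under diagnosis. The sketch below assumes that layout; the `DiagnosisRecord` class, the field names and the JSON Lines format are illustrative choices, not something the patent specifies.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DiagnosisRecord:
    diagnosis_time: str   # when the diagnosis ran, e.g. an ISO 8601 timestamp
    duration_s: float     # how long the diagnosis took, in seconds
    fault_cause: str      # e.g. "system data failure", or "none"
    repair_result: str    # e.g. "auto repaired", "manual repair needed"


def append_record(log_path: Path, record: DiagnosisRecord) -> None:
    """Append one record as a JSON line; log_path should be on a peripheral device."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def read_records(log_path: Path) -> list[DiagnosisRecord]:
    """Load every record back, skipping blank lines."""
    with log_path.open(encoding="utf-8") as f:
        return [DiagnosisRecord(**json.loads(line)) for line in f if line.strip()]
```

Appending line by line means earlier entries are never rewritten, so the history survives even if a later diagnosis run is interrupted.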
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention, and it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (6)

1. A method for SSD fault diagnosis and data recovery, comprising:
S100, diagnosing the health condition of the SSD in real time and calculating a quantitative reference value;
S200, if a command issued by a host is received during diagnosis, suspending the diagnosis and executing the command issued by the host; after the command issued by the host has been executed, resuming diagnosis of the health condition of the SSD until the diagnosis is finished;
S300, matching a fault type in a sample library according to the quantitative reference value, querying a repair process according to the fault type, and performing automatic repair;
S400, if in S300 the quantitative reference value cannot be restored to the normal range, or the same type of fault occurs three times in succession, determining that the repair has failed, formatting the full disk, and recovering data using the backup file;
S500, if full-disk formatting still cannot resolve the fault, sending the fault type to a display for manual repair, and updating the manual repair process and the fault code in the sample library;
wherein S100 comprises:
S101, diagnosing in real time whether the SSD has a bad block, wherein if the diagnosis passes, the quantitative reference value is 1 point;
S102, diagnosing in real time whether the SSD system data is normal, wherein if it is normal, the quantitative reference value is 2 points;
S103, diagnosing in real time whether the SSD user data is normal, wherein if it is normal, the quantitative reference value is 4 points;
S104, diagnosing in real time whether the SSD peripheral connection is normal, wherein if it is normal, the quantitative reference value is 8 points;
S105, if the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period starts; if the diagnosis result is less than 15 points, determining the fault type from the final diagnosis score, matching a repair process in the sample library according to the fault type, and performing intelligent repair according to that repair process;
S106, storing the diagnosis result in a diagnosis log.
2. The method according to claim 1, wherein the sample library in S300 comprises: the diagnosis result, the fault type and the repair process, saved as a dbe format file;
the diagnosis result takes a value from 1 to 15; each value represents one or more fault types;
the fault types include: bad block failure, system data failure, user data failure, and peripheral connection failure.
3. The method according to claim 1, wherein the diagnosis log in S106 comprises: diagnosis time, diagnosis duration, fault cause and repair result;
the diagnosis log records the SSD operating conditions and the various faults that occur during daily operation; and the diagnosis log is stored on a peripheral device so that it is not lost when the SSD is formatted.
4. A system for SSD fault diagnosis and data recovery, characterized by comprising a fault diagnosis module, a data repair module, a data backup module and a diagnosis log module;
the fault diagnosis module is configured to diagnose the health condition of the SSD in real time and calculate a quantitative reference value; if the diagnosis passes, a diagnosis record is stored in the diagnosis log module; if a fault is diagnosed, the fault type is provided to the data repair module and the repair process is recorded in the diagnosis log module;
the data repair module is configured to, when the fault diagnosis module diagnoses that the SSD has a fault, query a repair process in the sample library according to the fault type and repair the system according to that repair process;
the data backup module is configured to perform data backup on the SSD, the data backup being a precondition for data recovery;
the diagnosis log module is configured to record the running condition of the SSD system in detail and to store the SSD running and diagnosis logs, which are the key basis for finding the cause of a failure;
if a command issued by the host is received during diagnosis, the diagnosis is suspended and the command issued by the host is executed; after the command issued by the host has been executed, diagnosis of the health condition of the SSD resumes until the diagnosis is finished; a fault type is matched in the sample library according to the quantitative reference value, a repair process is queried according to the fault type, and automatic repair is performed; if automatic repair cannot restore the quantitative reference value to the normal range, or the same type of fault occurs three times in succession, the full disk is formatted and data is recovered using the backup file;
if full-disk formatting still cannot resolve the fault, the fault type is sent to a display for manual repair, and the manual repair process and the fault code are updated in the sample library;
wherein the fault diagnosis module comprises:
a first sub-module, configured to diagnose in real time whether the SSD has a bad block, wherein if the diagnosis passes, the quantitative reference value is 1 point;
a second sub-module, configured to diagnose in real time whether the SSD system data is normal, wherein if it is normal, the quantitative reference value is 2 points;
a third sub-module, configured to diagnose in real time whether the SSD user data is normal, wherein if it is normal, the quantitative reference value is 4 points;
a fourth sub-module, configured to diagnose in real time whether the SSD peripheral connection is normal, wherein if it is normal, the quantitative reference value is 8 points;
a diagnosis result determining sub-module, configured such that if the diagnosis result is the full score of 15 points, the system is operating normally, the diagnostic test passes, and detection for the next period starts; and if the diagnosis result is less than 15 points, the fault type is determined from the final diagnosis score, a repair process is matched in the sample library according to the fault type, and intelligent repair is performed according to that repair process;
a diagnosis log storage sub-module, configured to store the diagnosis result in the diagnosis log.
5. The system for SSD fault diagnosis and data recovery according to claim 4, wherein the sample library comprises: the diagnosis result, the fault type and the repair process, saved as a dbe format file;
the diagnosis result takes a value from 1 to 15; each value represents one or more fault types;
the fault types include: bad block failure, system data failure, user data failure, and peripheral connection failure.
6. The system for SSD fault diagnosis and data recovery according to claim 4, wherein the diagnosis log kept by the diagnosis log module comprises: diagnosis time, diagnosis duration, fault cause and repair result;
the diagnosis log records the SSD operating conditions and the various faults that occur during daily operation; and the diagnosis log is stored on a peripheral device so that it is not lost when the SSD is formatted.
CN202111224279.8A 2021-10-20 2021-10-20 SSD (solid State disk) fault diagnosis and data recovery method and system Active CN113886128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111224279.8A CN113886128B (en) 2021-10-20 2021-10-20 SSD (solid State disk) fault diagnosis and data recovery method and system

Publications (2)

Publication Number Publication Date
CN113886128A CN113886128A (en) 2022-01-04
CN113886128B true CN113886128B (en) 2022-09-09

Family

ID=79003882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111224279.8A Active CN113886128B (en) 2021-10-20 2021-10-20 SSD (solid State disk) fault diagnosis and data recovery method and system

Country Status (1)

Country Link
CN (1) CN113886128B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260279A (en) * 2015-11-04 2016-01-20 四川效率源信息安全技术股份有限公司 Method and device of dynamically diagnosing hard disk failure based on S.M.A.R.T (Self-Monitoring Analysis and Reporting Technology) data
US10223224B1 (en) * 2016-06-27 2019-03-05 EMC IP Holding Company LLC Method and system for automatic disk failure isolation, diagnosis, and remediation
CN110502386A (en) * 2019-08-30 2019-11-26 西安易朴通讯技术有限公司 The on-line fault diagnosis method and apparatus of hard disk
CN111897686A (en) * 2020-08-05 2020-11-06 腾讯科技(深圳)有限公司 Server cluster hard disk fault processing method and device, electronic equipment and storage medium
CN112988467A (en) * 2021-04-19 2021-06-18 深圳市安信达存储技术有限公司 Solid state disk, data recovery method thereof and terminal equipment
CN113284547A (en) * 2021-06-18 2021-08-20 公安部物证鉴定中心 SSD hard disk fault diagnosis and data recovery tool

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949443B (en) * 2018-09-05 2022-07-22 华为技术有限公司 Hard disk failure processing method, array controller and hard disk
CN111625405A (en) * 2020-04-22 2020-09-04 深圳忆联信息系统有限公司 SSD terminal fault diagnosis method, system, computer device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Investigating Power Outage Effects on Reliability of Solid-State Drives;Saba Ahmadian等;《 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)》;20180423;第207-212页 *
浅谈数据恢复技术;陈钢;《中小企业管理与科技(上旬刊)》;20150305(第03期);第305-306页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant