CN111124785B - Method, device, equipment and storage medium for hard disk fault detection - Google Patents

Info

Publication number
CN111124785B
Authority
CN
China
Prior art keywords
hard disk
determining
target
fault
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911332551.7A
Other languages
Chinese (zh)
Other versions
CN111124785A (en)
Inventor
陈树成
张猛
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority to CN201911332551.7A
Publication of CN111124785A
Application granted
Publication of CN111124785B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2273: Test methods
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G06F11/32: Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324: Display of status information
    • G06F11/327: Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for hard disk fault detection. The method comprises: monitoring the in-place signal of each hard disk slot on a storage server and, when it is determined from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot, taking that hard disk as the target hard disk; starting a timer at the moment the target hard disk is inserted into the slot, and determining that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk still cannot establish a normal connection with the corresponding expander; and sending information indicating that the target hard disk is a fault disk to the corresponding operating system while lighting the fault lamp of the hard disk slot into which the target hard disk is inserted. In this way, when a hard disk fails and cannot connect to the system, the fault can still be identified and a corresponding alarm raised, prompting the user to replace the hard disk in time.

Description

Method, device, equipment and storage medium for hard disk fault detection
Technical Field
The present invention relates to the field of storage devices, and in particular, to a method, an apparatus, a device, and a storage medium for hard disk failure detection.
Background
A storage server is a server product that provides storage space to users. Its front end is generally connected to user equipment through optical fiber or similar links, and its back end is connected to a large number of hard disks and hard disk expansion cabinets to provide mass storage service.
At the interface level of the system, the CPU of the storage server processes the service data. A SAS (Serial Attached SCSI) controller converts the PCIe bus led out from the CPU into the SAS bus protocol, and the hard disks are then connected through the SAS bus. To connect more hard disks, a SAS bus expander (SAS expander) is often provided on the storage server to expand a small number of SAS buses into a large number of SAS buses; the basic hardware structure is shown in FIG. 1. While the storage server is in use, the SAS bus expander scans all the hard disks connected to it and broadcasts the result to the SAS controller; the operating system on the CPU learns through the SAS controller how many hard disks are in the equipment and uniformly manages the allocation of a drive letter to each hard disk. If a hard disk cannot respond to an instruction of the operating system within a certain time, the operating system judges that the hard disk is faulty and generates an alarm. This approach has the following drawback: if a hard disk has a serious problem and cannot connect to the system at all, the SAS bus expander cannot identify it, so the operating system is unaware that the hard disk exists and no alarm is generated.
Disclosure of Invention
The invention aims to provide a method, a device, equipment and a storage medium for hard disk fault detection which, when a hard disk fails and cannot connect to the system, can identify the faulty state of the hard disk and generate a corresponding alarm, prompting the user to replace the hard disk in time.
In order to achieve the above object, the present invention provides the following technical solutions:
A method for hard disk fault detection, comprising:
monitoring the in-place signal of each hard disk slot on a storage server and, when it is determined from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot, taking that hard disk as the target hard disk;
starting a timer at the moment the target hard disk is inserted into the corresponding hard disk slot, and determining that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander;
and sending information indicating that the target hard disk is a fault disk to the corresponding operating system while lighting the fault lamp of the hard disk slot into which the target hard disk is inserted.
Preferably, the method further comprises:
if the timed duration has not reached the duration threshold and the target hard disk has established a normal connection with the corresponding expander, periodically acquiring the self-check information that the target hard disk produces by checking itself;
extracting the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, executing the step of determining that the target hard disk is a fault disk; otherwise, determining that the target hard disk can continue to be used; the bad block proportion is the proportion of the bad blocks in the target hard disk to all the data blocks in the target hard disk.
Preferably, after extracting the bad block proportion contained in the self-check information, the method further comprises:
if the bad block proportion is not zero, acquiring the priority of the target hard disk; if the priority reaches a priority threshold, executing the step of determining that the target hard disk is a fault disk; if the priority does not reach the priority threshold, executing the step of determining whether the bad block proportion reaches the proportion threshold; and if the bad block proportion is zero, determining that the target hard disk can work normally.
Preferably, after determining that the timed duration has reached the duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander, the method further comprises:
attempting to control the target hard disk to establish a normal connection with the corresponding expander; if the attempt succeeds, determining that the target hard disk can establish a normal connection with the corresponding expander; and if the attempt fails, executing the step of determining that the target hard disk is a fault disk.
Preferably, determining from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot comprises:
if the in-place signal of any hard disk slot changes from high level to low level, determining that a hard disk has been inserted into that hard disk slot.
Preferably, after determining that the target hard disk is a fault disk, the method further comprises:
sending information indicating that the target hard disk is a fault disk to a preset management terminal.
A device for hard disk fault detection, comprising:
a first determining module, configured to: monitor the in-place signal of each hard disk slot on a storage server and, when it is determined from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot, take that hard disk as the target hard disk;
a second determining module, configured to: start a timer at the moment the target hard disk is inserted into the corresponding hard disk slot, and determine that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander;
a fault reporting module, configured to: send information indicating that the target hard disk is a fault disk to the corresponding operating system while lighting the fault lamp of the hard disk slot into which the target hard disk is inserted.
Preferably, the device further comprises:
a third determining module, configured to: if the timed duration has not reached the duration threshold and the target hard disk has established a normal connection with the corresponding expander, periodically acquire the self-check information that the target hard disk produces by checking itself; extract the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, execute the step of determining that the target hard disk is a fault disk; otherwise, determine that the target hard disk can continue to be used; the bad block proportion is the proportion of the bad blocks in the target hard disk to all the data blocks in the target hard disk.
Equipment for hard disk fault detection, comprising:
a memory for storing a computer program;
a processor for implementing, when executing the computer program, the steps of the method for hard disk fault detection described in any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for hard disk fault detection described in any one of the above.
The invention provides a method, a device, equipment and a storage medium for hard disk fault detection. The method comprises: monitoring the in-place signal of each hard disk slot on a storage server and, when it is determined from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot, taking that hard disk as the target hard disk; starting a timer at the moment the target hard disk is inserted into the corresponding hard disk slot, and determining that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander; and sending information indicating that the target hard disk is a fault disk to the corresponding operating system while lighting the fault lamp of the hard disk slot into which the target hard disk is inserted. In this technical scheme, the in-place signals of all hard disk slots on the storage server are monitored, and timing starts at the moment any hard disk is inserted into its slot. If the timed duration reaches the duration threshold and the hard disk still cannot establish a normal connection with the corresponding expander, the connection has timed out. The user is then reminded to replace the hard disk by sending this information to the operating system, and the location of the hard disk in the machine room is indicated by lighting the fault lamp of the corresponding hard disk slot. As a result, when a hard disk fails and cannot connect to the system, the fault can still be identified and a corresponding alarm raised, prompting the user to replace the hard disk in time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the basic hardware links of a storage server of the prior art;
FIG. 2 is a flowchart of a method for hard disk failure detection according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for hard disk failure detection according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 2, which shows a flowchart of a method for hard disk fault detection provided in an embodiment of the present invention, the method may include:
S11: monitoring the in-place signal of each hard disk slot on the storage server and, when it is determined from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot, taking that hard disk as the target hard disk.
The method for hard disk fault detection provided by the embodiment of the invention can be executed by a corresponding device, and that device can be arranged on a SAS bus controller, so the SAS bus controller itself can also be the executing entity; the following description takes the SAS bus controller as the executing entity. Inserting a hard disk into the system means inserting the hard disk into a hard disk slot connected to the SAS bus controller. In addition, the CPLD (Complex Programmable Logic Device) is the module that monitors hardware signals on the storage server. After a hard disk is inserted into the system, its in-place signal is pulled low (because the hard disk occupies a hard disk slot, the in-place signal of the hard disk is the in-place signal of that slot). From this the CPLD judges whether a hard disk is in place in the slot and writes the resulting in-place state (the value of the in-place signal) into a CPLD register. The SAS bus expander queries the in-place states of all hard disk slots from the CPLD register. If a slot that was not in place at the previous query (for example, one second earlier) is now in place, a hard disk has just been inserted, and the SAS bus expander records the current time as the reference time for judging whether that hard disk is faulty.
The SAS bus expander can query the in-place states of all hard disks in real time or at fixed intervals, for example once every second or once every ten seconds, as actual needs dictate. Alternatively, the CPLD register can report the in-place states of all hard disks to the SAS bus expander in real time or at fixed intervals, so that the SAS bus expander determines the in-place state of each hard disk from the reported information.
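As an illustration of the polling described above, the comparison of two successive in-place snapshots might be sketched as follows. This is a minimal sketch and not the patented implementation: the function name `detect_insertions`, the list-of-booleans snapshot format and the `insert_times` dictionary are all assumptions introduced for the example.

```python
from typing import Dict, List


def detect_insertions(previous: List[bool], current: List[bool], now: float,
                      insert_times: Dict[int, float]) -> List[int]:
    """Compare two in-place snapshots (hypothetical format: one bool per slot).

    Records `now` as the reference time of every slot that changed from
    "not in place" to "in place", and forgets slots whose disk was removed.
    Returns the list of slots that have just received a disk.
    """
    inserted = []
    for slot, (was_present, is_present) in enumerate(zip(previous, current)):
        if not was_present and is_present:
            insert_times[slot] = now      # reference time for the later timeout check
            inserted.append(slot)
        elif was_present and not is_present:
            insert_times.pop(slot, None)  # disk pulled out: stop timing this slot
    return inserted
```

A caller would invoke this once per polling interval, passing the snapshot from the previous poll and the one just read.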
S12: starting a timer at the moment the target hard disk is inserted into the corresponding hard disk slot, and determining that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander.
It should be noted that, when it is determined from the in-place signal that a hard disk has just been inserted into its hard disk slot, the insertion time can be recorded to support the subsequent timing. The expander is the SAS bus expander: a hard disk is inserted into a hard disk slot in order to attach to the corresponding SAS bus expander and, through it, to connect to the system. After a hard disk is inserted into the system, it can normally establish a connection with the corresponding expander within tens of seconds, after which it can work normally, including responding to upper-layer instructions; this state can be called "hard disk connected". The process from insertion to normal connection includes the hard disk starting up and establishing a signal link with the SAS bus expander, after which it receives upper-layer instructions and returns the corresponding feedback through the SAS bus expander.
After the target hard disk is determined, its connection state can be queried in real time or at fixed intervals, that is, whether the hard disk is connected, together with whether the time elapsed since the target hard disk was inserted into the system has reached the duration threshold. If the duration threshold has been reached but the hard disk is still not connected (it cannot establish a normal connection with the corresponding SAS bus expander), the connection of the target hard disk is considered to have timed out, and the target hard disk is determined to be a fault disk. If the hard disk connects before the elapsed time reaches the duration threshold, the target hard disk is normal. The duration threshold can be set according to the time a hard disk normally needs to connect; specifically, it can be set to a value that exceeds that time by no more than ten seconds. For example, if a hard disk normally needs fifteen seconds to connect, the duration threshold can be set to twenty seconds, which ensures that the threshold fully covers the time needed for the hard disk to connect.
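The timeout decision of step S12 can be illustrated with a small sketch. It assumes the twenty-second threshold from the example above; the names `DiskState` and `classify_disk` and the `linked` flag (whether the expander reports a normal connection) are hypothetical and not taken from the patent.

```python
from enum import Enum


class DiskState(Enum):
    CONNECTED = "connected"  # normal connection with the expander established
    WAITING = "waiting"      # not connected yet, threshold not reached
    FAULT = "fault"          # threshold reached without a connection


def classify_disk(insert_time: float, now: float, linked: bool,
                  threshold_s: float = 20.0) -> DiskState:
    """Classify a target disk from its elapsed time since insertion.

    `threshold_s` defaults to the 20 s example value (assumed, tunable).
    """
    if linked:
        return DiskState.CONNECTED
    if now - insert_time >= threshold_s:
        return DiskState.FAULT
    return DiskState.WAITING
```

The function is pure, so the same check can be repeated on every query without keeping extra state beyond the recorded insertion time.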
S13: sending information indicating that the target hard disk is a fault disk to the corresponding operating system while lighting the fault lamp of the hard disk slot into which the target hard disk is inserted.
When a hard disk fails to connect, the operating system cannot identify it and therefore cannot raise a fault alarm for the failed connection. In this embodiment, after the hard disk connection is determined to have timed out and the hard disk is determined to be a fault disk, information indicating that the hard disk is a fault disk is sent to the operating system, so that the user can learn from the operating system which hard disk failed to connect and is thereby reminded to replace it. At the same time, the SAS bus expander can light, through the CPLD, the fault lamp of the corresponding hard disk slot to show the user where the fault disk is located in the machine room, making it easier to replace.
In addition, after the target hard disk is determined to be a fault disk, it can be marked as a fault disk; specifically, the mark can be placed at the corresponding position in the SAS bus expander, or on the hard disk slot into which the hard disk is inserted, so that the state of the hard disk can be checked conveniently.
In this technical scheme, the in-place signals of all hard disk slots on the storage server are monitored, and timing starts at the moment any hard disk is inserted into its slot. If the timed duration reaches the duration threshold and the hard disk still cannot establish a normal connection with the corresponding expander, the connection has timed out. The user is then reminded to replace the hard disk by sending this information to the operating system, and the location of the hard disk in the machine room is indicated by lighting the fault lamp of the corresponding hard disk slot. As a result, when a hard disk fails and cannot connect to the system, the fault can still be identified and a corresponding alarm raised, prompting the user to replace the hard disk in time.
The method for hard disk fault detection provided by the embodiment of the invention can further comprise the following steps:
if the timed duration has not reached the duration threshold and the target hard disk has established a normal connection with the corresponding expander, periodically acquiring the self-check information that the target hard disk produces by checking itself;
extracting the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, executing the step of determining that the target hard disk is a fault disk; otherwise, determining that the target hard disk can continue to be used; the bad block proportion is the proportion of the bad blocks in the target hard disk to all the data blocks in the target hard disk.
It should be noted that a hard disk can check itself at fixed intervals. The result includes whether the hard disk contains bad blocks and the proportion of bad blocks to all data blocks in the hard disk (the bad block proportion), and is stored as self-check information. If a hard disk connects successfully within a reasonable time (the timed duration has not reached the duration threshold), the self-check information of each connected hard disk can be acquired at fixed intervals or in real time. If the bad block proportion in the self-check information reaches the proportion threshold, the hard disk is seriously damaged, so the step of determining that the hard disk is a fault disk is executed; otherwise the hard disk is considered usable. This provides further effective monitoring of whether a hard disk can work normally. The proportion threshold can be set according to actual needs, for example fifty percent.
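The bad block check can be written out in a few lines. The sketch below assumes the self-check information exposes a bad block count and a total block count; the patent does not specify that format, and the function names are hypothetical. The 0.5 default mirrors the fifty percent example.

```python
def bad_block_ratio(bad_blocks: int, total_blocks: int) -> float:
    """Proportion of bad blocks to all data blocks of the disk."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return bad_blocks / total_blocks


def is_fault_by_ratio(bad_blocks: int, total_blocks: int,
                      ratio_threshold: float = 0.5) -> bool:
    """True when the bad block proportion reaches the (assumed) threshold."""
    return bad_block_ratio(bad_blocks, total_blocks) >= ratio_threshold
```

"Reaches" is read here as greater than or equal to, so a disk exactly at the threshold is treated as a fault disk.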
In addition, if the timed duration has not reached the duration threshold and the target hard disk has established a normal connection with the corresponding expander, the SAS bus expander scans all the hard disks normally connected to it, including the target hard disk, and broadcasts the result to the SAS controller. The operating system on the CPU learns through the SAS controller how many hard disks are in the equipment and uniformly manages the allocation of a drive letter to each hard disk. If a hard disk cannot respond to an instruction of the operating system within a certain time, the operating system judges that the hard disk is faulty and generates an alarm.
In the method for hard disk fault detection provided by the embodiment of the invention, after the bad block proportion contained in the self-check information is extracted, the method can further comprise:
if the bad block proportion is not zero, acquiring the priority of the target hard disk; if the priority reaches the priority threshold, executing the step of determining that the target hard disk is a fault disk; if the priority does not reach the priority threshold, executing the step of determining whether the bad block proportion reaches the proportion threshold; and if the bad block proportion is zero, determining that the target hard disk can work normally.
The priority threshold can be set according to actual needs; for example, each hard disk can be assigned a priority from 1 to 10, and the priority threshold can be set to 7 or 8. A non-zero bad block proportion means that the hard disk contains bad blocks, and the priority of the hard disk is then checked. If the priority reaches the priority threshold, the hard disk is important enough that the safety and reliability requirements on its data are high, so the presence of any bad block is sufficient to execute the step of determining that the hard disk is a fault disk, which avoids the adverse effects of bad blocks. If the priority does not reach the priority threshold, a small amount of damage to the hard disk is tolerable, so the step of determining whether the bad block proportion reaches the proportion threshold is executed instead. This further ensures that the condition of each hard disk meets the demands currently placed on it.
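The combined priority and bad block decision can be sketched as a single function. The thresholds use the example values from the text (priority threshold 7 on a 1 to 10 scale, proportion threshold fifty percent); the name `evaluate_disk` and the string results are assumptions made for illustration.

```python
def evaluate_disk(bad_ratio: float, priority: int,
                  priority_threshold: int = 7,
                  ratio_threshold: float = 0.5) -> str:
    """Return "normal", "fault" or "usable" for a connected disk.

    - no bad blocks at all           -> "normal"
    - bad blocks on a high-priority  -> "fault" regardless of the proportion
    - otherwise fall back to the proportion threshold check
    """
    if bad_ratio == 0:
        return "normal"
    if priority >= priority_threshold:
        return "fault"
    return "fault" if bad_ratio >= ratio_threshold else "usable"
```

With these defaults, a priority 9 disk with one percent bad blocks is reported as a fault disk, while a priority 3 disk with the same proportion remains usable.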
In the method for hard disk fault detection provided by the embodiment of the invention, after it is determined that the timed duration has reached the duration threshold and the target hard disk cannot establish a normal connection with the corresponding expander, the method can further comprise:
attempting to control the target hard disk to establish a normal connection with the corresponding expander; if the attempt succeeds, determining that the target hard disk can establish a normal connection with the corresponding expander; and if the attempt fails, executing the step of determining that the target hard disk is a fault disk.
After the hard disk connection times out, the SAS bus controller can try to connect to the hard disk again, for example by sending a signal to the hard disk and judging from whether the hard disk responds whether it can connect. This attempt further ensures the accuracy of the judgment of whether the hard disk is a fault disk.
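The confirmation attempt can be illustrated as follows. The probe itself is hardware specific, so it is passed in as a callable; `confirm_fault`, `try_connect` and the `attempts` parameter are hypothetical names, and allowing more than one attempt is an assumption that goes beyond the single attempt described in the text.

```python
from typing import Callable


def confirm_fault(try_connect: Callable[[], bool], attempts: int = 1) -> bool:
    """Return True if the disk should be declared a fault disk.

    `try_connect` stands in for "send a signal and see whether the disk
    responds"; it returns True when the disk answers.
    """
    for _ in range(attempts):
        if try_connect():
            return False  # the disk responded, so it is not a fault disk
    return True
```

Injecting the probe keeps the decision logic testable without real hardware.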
In the method for hard disk fault detection provided by the embodiment of the invention, determining from the in-place signals that a hard disk has just been inserted into its corresponding hard disk slot can comprise:
if the in-place signal of any hard disk slot changes from high level to low level, determining that a hard disk has been inserted into that hard disk slot.
The SAS bus expander checks the in-place state of each hard disk at fixed intervals or in real time, and when the in-place state changes from "not in place" to "in place", it can be determined that a hard disk has been inserted into the system. To make this information easy to obtain, in this embodiment the in-place signal of a hard disk slot can be held at high level while no hard disk is inserted and pulled low once a hard disk is inserted, so that a change of the in-place signal from high level to low level indicates an insertion. This way of obtaining the information is fast and effective.
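The high-to-low transition check can be illustrated on a register value. The sketch assumes that the in-place levels of all slots are packed into one integer, with bit i holding the level of slot i (1 for high, 0 for low); the patent does not describe the register layout, and `falling_edges` is a hypothetical name.

```python
from typing import List


def falling_edges(prev_levels: int, curr_levels: int, slot_count: int) -> List[int]:
    """Return the slots whose in-place signal went from high (1) to low (0)."""
    mask = (1 << slot_count) - 1
    # A bit is set in `fell` only if it was 1 before and is 0 now.
    fell = prev_levels & ~curr_levels & mask
    return [slot for slot in range(slot_count) if (fell >> slot) & 1]
```

A low-to-high change (a disk being removed) produces no entry, since only insertions start the timer.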
In the method for hard disk fault detection provided by the embodiment of the invention, after the target hard disk is determined to be a fault disk, the method can further comprise:
sending information indicating that the target hard disk is a fault disk to a preset management terminal.
To let the user learn remotely that a hard disk is a fault disk, the information can be sent to the user's terminal, that is, the preset management terminal, at the same time as it is sent to the operating system and the fault lamp is lit, which further ensures that the user replaces the hard disk in time.
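The three notification paths (operating system, fault lamp, management terminal) can be modelled as a list of sinks that each receive the slot number. This is only a sketch: `report_fault` and the sink callables are hypothetical, and continuing after a failed sink is a design assumption, not something the patent states.

```python
from typing import Callable, Iterable


def report_fault(slot: int, sinks: Iterable[Callable[[int], None]]) -> int:
    """Deliver a fault notification for `slot` to every sink.

    Returns the number of sinks that accepted the notification. A failing
    sink does not prevent delivery to the remaining ones.
    """
    delivered = 0
    for sink in sinks:
        try:
            sink(slot)
            delivered += 1
        except Exception:
            continue  # e.g. the management terminal is unreachable
    return delivered
```

In this arrangement an unreachable management terminal would not stop the fault lamp from being lit.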
The SAS bus expander periodically queries the in-place state and the connection state of the hard disk, so that a hard disk that cannot be normally connected to the system is identified and the user is prompted by a corresponding alarm to replace it; this solves the problem that the operating system cannot identify a faulty hard disk in the scenario where the hard disk fails to connect.
The embodiment of the invention also provides a device for checking the hard disk faults, as shown in fig. 3, which specifically comprises:
a first determining module 11, configured to: monitor the in-place signal of each hard disk slot on a storage server, and, when it is determined based on the in-place signal that a hard disk has currently been inserted into its corresponding hard disk slot, determine that hard disk to be the target hard disk;
a second determining module 12, configured to: start timing from the moment the target hard disk is inserted into the corresponding hard disk slot, and determine that the target hard disk is a fault disk if the timed duration reaches a duration threshold and the target hard disk has not achieved a normal connection with the corresponding expander;
a fault reporting module 13, configured to: send information that the target hard disk is a fault disk to the corresponding operating system, and at the same time light the fault lamp corresponding to the hard disk slot into which the target hard disk is inserted.
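The timing check performed by the second determining module might look like the following sketch. It assumes times are expressed in seconds and that the expander exposes a boolean connection state; `SlotState`, `is_fault_by_timeout` and `threshold_s` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlotState:
    inserted_at: float       # time at which insertion was detected, in seconds
    connected: bool = False  # whether a normal connection to the expander exists


def is_fault_by_timeout(state: SlotState, now: float, threshold_s: float) -> bool:
    """Return True when the timed duration has reached the threshold
    and the disk still has no normal connection to the expander."""
    elapsed = now - state.inserted_at
    return elapsed >= threshold_s and not state.connected
```

A disk that connects before the threshold is never flagged by this check, whatever the elapsed time.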
The device for checking the hard disk fault provided by the embodiment of the invention can further comprise:
a third determining module, configured to: if the timed duration has not reached the duration threshold and the target hard disk has achieved a normal connection with the corresponding expander, periodically obtain the self-check information that the target hard disk produces by checking itself; extract the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, execute the step of determining that the target hard disk is a fault disk, and otherwise determine that the target hard disk can continue to be used; the bad block proportion is the ratio of the bad blocks contained in the target hard disk to all data blocks contained in the target hard disk.
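As a rough illustration of the bad block proportion test, the sketch below computes the ratio of bad blocks to all data blocks and compares it with a proportion threshold. The function names and the idea of passing raw block counts are assumptions; the embodiment only states that the proportion is extracted from the self-check information.

```python
def bad_block_ratio(bad_blocks: int, total_blocks: int) -> float:
    """Ratio of bad blocks to all data blocks on the disk."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return bad_blocks / total_blocks


def is_fault_by_ratio(bad_blocks: int, total_blocks: int, ratio_threshold: float) -> bool:
    """Return True when the bad block proportion reaches the threshold."""
    return bad_block_ratio(bad_blocks, total_blocks) >= ratio_threshold
```

For example, 5 bad blocks out of 1000 gives a proportion of 0.005, which stays below a threshold of 0.01, so the disk would be treated as still usable.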
The device for checking the hard disk fault provided by the embodiment of the invention can further comprise:
a judging module, configured to: after the bad block proportion contained in the self-check information is extracted, if the bad block proportion is not zero, obtain the priority of the target hard disk; if the priority of the target hard disk reaches a priority threshold, execute the step of determining that the target hard disk is a fault disk; if the priority of the target hard disk does not reach the priority threshold, execute the step of determining whether the bad block proportion reaches the proportion threshold; and if the bad block proportion is zero, determine that the target hard disk can work normally.
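The decision order applied by the judging module can be sketched as follows. The sketch assumes that a larger number means a higher priority and that "reaches" means greater than or equal to; `classify_disk` and its string results are hypothetical and serve only to make the branching explicit.

```python
def classify_disk(ratio: float, priority: int,
                  priority_threshold: int, ratio_threshold: float) -> str:
    """Apply the three checks in order: zero proportion, priority, proportion threshold."""
    if ratio == 0:
        return "normal"   # no bad blocks: the disk can work normally
    if priority >= priority_threshold:
        return "fault"    # high-priority disk with any bad block
    # Lower-priority disk: fall back to the proportion threshold.
    return "fault" if ratio >= ratio_threshold else "usable"
```

Under these assumptions a high-priority disk is marked as a fault disk as soon as any bad block appears, while a lower-priority disk is tolerated until its bad block proportion reaches the proportion threshold.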
The device for checking the hard disk fault provided by the embodiment of the invention can further comprise:
a connection attempt module, configured to: after it is determined that the timed duration reaches the duration threshold and the target hard disk has not achieved a normal connection with the corresponding expander, attempt to control the target hard disk to achieve a normal connection with the corresponding expander; if the attempt succeeds, determine that the target hard disk can achieve a normal connection with the corresponding expander; and if the attempt fails, execute the step of determining that the target hard disk is a fault disk.
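A minimal sketch of the reconnection attempt is given below. The callable `try_connect` stands in for whatever mechanism the expander uses to re-establish the link and is purely hypothetical; the `attempts` parameter is also an assumption, since the embodiment describes a single attempt.

```python
from typing import Callable


def confirm_fault_after_retry(try_connect: Callable[[], bool], attempts: int = 1) -> bool:
    """Return True if every reconnection attempt fails, meaning the disk
    should be determined to be a fault disk; return False on the first success."""
    for _ in range(attempts):
        if try_connect():
            return False
    return True
```

Only a disk that still cannot connect after the retry proceeds to the fault determination, which reduces false reports caused by a transient link problem.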
The first determining module of the device for hard disk fault checking provided by the embodiment of the invention may include:
a determining unit, configured to: if the in-place signal of any hard disk slot changes from high level to low level, determine that a hard disk has been inserted into that hard disk slot.
The device for checking the hard disk fault provided by the embodiment of the invention can further comprise:
a sending module, configured to: after the target hard disk is determined to be a fault disk, send information that the target hard disk is a fault disk to a preset management terminal.
The embodiment of the invention also provides equipment for checking the hard disk faults, which can comprise:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above methods for hard disk fault checking when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any one of the above methods for hard disk fault checking.
It should be noted that, for the description of the related parts of the device, the equipment and the storage medium for hard disk fault checking provided by the embodiment of the present invention, please refer to the detailed description of the corresponding parts of the method for hard disk fault checking provided by the embodiment of the present invention; the details are not repeated here. In addition, the parts of the above technical solutions that follow the same implementation principles as the corresponding technical solutions in the prior art are not described in detail, to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for hard disk failure detection, comprising:
monitoring the in-place signal of each hard disk slot on a storage server, and, when it is determined based on the in-place signal that a hard disk has currently been inserted into its corresponding hard disk slot, determining that hard disk to be the target hard disk;
starting timing from the moment that the target hard disk is inserted into the corresponding hard disk slot, and determining that the target hard disk is a fault disk if the time length obtained by timing reaches a time length threshold and the target hard disk cannot realize normal connection with the corresponding expander;
transmitting the information of the target hard disk as a fault disk to a corresponding operating system, and simultaneously, lighting a fault lamp corresponding to a hard disk slot position in which the target hard disk is inserted;
if the timed duration has not reached the duration threshold and the target hard disk has achieved a normal connection with the corresponding expander, periodically obtaining the self-check information that the target hard disk produces by checking itself; extracting the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, executing the step of determining that the target hard disk is a fault disk, and otherwise determining that the target hard disk can continue to be used; the bad block proportion is the ratio of the bad blocks contained in the target hard disk to all the data blocks contained in the target hard disk;
after extracting the bad block proportion contained in the self-checking information, the method further comprises the following steps:
if the bad block proportion is not zero, acquiring the priority of the target hard disk; if the priority of the target hard disk reaches a priority threshold, executing the step of determining that the target hard disk is a fault disk; if the priority of the target hard disk does not reach the priority threshold, executing the step of determining whether the bad block proportion reaches the proportion threshold; and if the bad block proportion is zero, determining that the target hard disk can work normally.
2. The method of claim 1, wherein after determining that the timed duration reaches the duration threshold and the target hard disk has not achieved a normal connection with the corresponding expander, the method further comprises:
attempting to control the target hard disk to achieve a normal connection with the corresponding expander; if the attempt succeeds, determining that the target hard disk can achieve a normal connection with the corresponding expander; and if the attempt fails, executing the step of determining that the target hard disk is a fault disk.
3. The method of claim 2, wherein determining, based on the in-place signal, that a hard disk is currently inserted into the corresponding hard disk slot comprises:
if the in-place signal of any hard disk slot changes from high level to low level, determining that a hard disk has been inserted into that hard disk slot.
4. The method of claim 3, further comprising, after determining that the target hard disk is a fault disk:
and sending the information of the target hard disk as the fault disk to a preset management terminal.
5. An apparatus for hard disk failure detection, comprising:
a first determining module, configured to: monitor the in-place signal of each hard disk slot on a storage server, and, when it is determined based on the in-place signal that a hard disk has currently been inserted into its corresponding hard disk slot, determine that hard disk to be the target hard disk;
a second determining module, configured to: starting timing from the moment that the target hard disk is inserted into the corresponding hard disk slot, and determining that the target hard disk is a fault disk if the time length obtained by timing reaches a time length threshold and the target hard disk cannot realize normal connection with the corresponding expander;
a fault reporting module for: transmitting the information of the target hard disk as a fault disk to a corresponding operating system, and simultaneously, lighting a fault lamp corresponding to a hard disk slot position in which the target hard disk is inserted;
a third determining module, configured to: if the timed duration has not reached the duration threshold and the target hard disk has achieved a normal connection with the corresponding expander, periodically obtain the self-check information that the target hard disk produces by checking itself; extract the bad block proportion contained in the self-check information; if the bad block proportion reaches a proportion threshold, execute the step of determining that the target hard disk is a fault disk, and otherwise determine that the target hard disk can continue to be used; the bad block proportion is the ratio of the bad blocks contained in the target hard disk to all the data blocks contained in the target hard disk;
a judging module, configured to: after the bad block proportion contained in the self-check information is extracted, if the bad block proportion is not zero, obtain the priority of the target hard disk; if the priority of the target hard disk reaches a priority threshold, execute the step of determining that the target hard disk is a fault disk; if the priority of the target hard disk does not reach the priority threshold, execute the step of determining whether the bad block proportion reaches the proportion threshold; and if the bad block proportion is zero, determine that the target hard disk can work normally.
6. Equipment for hard disk failure detection, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for hard disk failure checking according to any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the method for hard disk failure checking according to any of claims 1 to 4.
CN201911332551.7A 2019-12-22 2019-12-22 Method, device, equipment and storage medium for hard disk fault detection Active CN111124785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911332551.7A CN111124785B (en) 2019-12-22 2019-12-22 Method, device, equipment and storage medium for hard disk fault detection

Publications (2)

Publication Number Publication Date
CN111124785A 2020-05-08
CN111124785B 2024-02-09

Family

ID=70501364

Country Status (1)

Country Link
CN (1) CN111124785B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112379832B (en) * 2020-11-05 2023-04-25 杭州海康威视数字技术股份有限公司 Storage medium detection method and device
CN113868009A (en) * 2021-10-20 2021-12-31 南昌逸勤科技有限公司 Automatic repairing method, equipment and storage medium of SAS expander

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0747817A2 (en) * 1995-06-07 1996-12-11 Tandem Computers Incorporated Data communication method in a fail-fast, fail-functional, fault-tolerant multiprocessor system
CA2251455A1 (en) * 1997-12-24 1999-06-24 Barry E. Wood Computing system having fault containment
CN101149696A (en) * 2006-09-22 2008-03-26 鸿富锦精密工业(深圳)有限公司 Hard disk test system
CN101359309A (en) * 2007-08-03 2009-02-04 中兴通讯股份有限公司 Status indication apparatus for hard disc of serial connection small computer system interface and method
JP4503173B2 (en) * 1998-01-30 2010-07-14 オブジェクト テクノロジー ライセンシング コーポレイション Apparatus and method for modeling the operation of an expansion board in a computer system
CN105279057A (en) * 2015-11-10 2016-01-27 浪潮(北京)电子信息产业有限公司 Disk bad track detection method and system
CN106990919A (en) * 2017-03-04 2017-07-28 郑州云海信息技术有限公司 The memory management method and device of automatic separating fault disk
CN207020663U (en) * 2017-07-17 2018-02-16 环达电脑(上海)有限公司 PCIe device
CN109359016A (en) * 2018-09-27 2019-02-19 郑州云海信息技术有限公司 A kind of hard disk alarm method and device
CN109766249A (en) * 2019-01-09 2019-05-17 郑州云海信息技术有限公司 A kind of state display device of array hard disk
CN109815074A (en) * 2019-01-22 2019-05-28 郑州云海信息技术有限公司 A kind of method and system checking hard disk sequence in disk plug test process

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009003587A (en) * 2007-06-20 2009-01-08 Fujitsu Ltd Testing device, testing card and testing system
CN106649011A (en) * 2016-12-02 2017-05-10 曙光信息产业(北京)有限公司 Detection method and detection device for server equipment

Also Published As

Publication number Publication date
CN111124785A (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant