CN116795610A - Hard disk fault detection method, system, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116795610A
Authority
CN
China
Legal status
Pending
Application number
CN202310778421.6A
Other languages
Chinese (zh)
Inventor
刘波
宋成磊
胡令超
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310778421.6A
Publication of CN116795610A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Abstract

The invention relates to the technical field of server systems and discloses a hard disk fault detection method, system, device, computer equipment and storage medium. The method comprises: acquiring the in-place states of a first number of hard disks to be detected; judging, according to the in-place state, whether each hard disk to be detected is in place; when a hard disk to be detected is in place, acquiring its connection state and judging, according to the connection state, whether it is faulty; and sending out fault information when the hard disk to be detected is determined to be a faulty hard disk. The invention solves the problems that fault detection cannot be performed on every hard disk in a storage server and that the specific faulty hard disk cannot be identified.

Description

Hard disk fault detection method, system, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of server systems, and in particular, to a method, a system, an apparatus, a computer device, and a storage medium for detecting a hard disk failure.
Background
Because the number of hard disks that can be attached directly to the CPU is limited, storage servers use a SAS Expander (Serial Attached SCSI expander) to attach a large number of high-capacity mechanical hard disks and thereby achieve mass storage. A storage server holds tens or even hundreds of hard disks; even with a failure rate of only 1-2%, a server with 100 hard disks can be expected to contain one or two bad disks. It is therefore necessary to identify a failed hard disk among many hard disks quickly and accurately.
There are currently two methods for detecting hard disk failures. The first determines whether a hard disk has failed by detecting its transition from normal to failed. However, if a hard disk is already bad when it is inserted into the storage server, the system cannot recognize it or even determine whether it is in place, so the failed disk cannot be detected automatically by watching for a change. The second method first counts the hard disks that are in place on the SAS Expander card and then, after a preset delay, counts the hard disks that are in a connected state; if the two counts differ, a hard disk in the storage server is determined to be faulty. However, this method cannot determine which specific hard disk has failed.
Therefore, the conventional technology cannot perform fault detection on all hard disks in a storage server, nor can it identify the specific hard disk that has failed.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, an apparatus, a computer device and a storage medium for detecting hard disk failures, so as to solve the problems in the prior art that fault detection cannot be performed on all hard disks in a storage server and that the specific faulty hard disk cannot be identified.
In a first aspect, the present invention provides a method for detecting a hard disk failure, the method comprising:
acquiring the in-place states of a first number of hard disks to be detected;
judging, according to the in-place state, whether each hard disk to be detected is in place;
when a hard disk to be detected is in place, acquiring its connection state and judging, according to the connection state, whether it is faulty;
and sending out fault information when the hard disk to be detected is determined to be a faulty hard disk.
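The four steps of the first aspect can be illustrated with a short Python sketch. It is a hedged illustration rather than the patented implementation: the `SlotStatus` record, the `detect_faults` function, 0-based slot numbers and the three callables standing in for the presence query, the connection judgment and the fault output are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SlotStatus:
    """Hypothetical per-slot record used only for this sketch."""
    slot: int
    present: bool          # in-place flag for the disk in this slot
    faulty: bool = False   # every slot starts in the normal state


def detect_faults(
    num_slots: int,
    is_present: Callable[[int], bool],
    link_is_healthy: Callable[[int], bool],
    report_fault: Callable[[int], None],
) -> List[SlotStatus]:
    """Acquire presence, judge presence, judge connection, report faults."""
    statuses = []
    for slot in range(num_slots):
        # Step 1: acquire the in-place state of the disk in this slot.
        status = SlotStatus(slot=slot, present=is_present(slot))
        # Step 2: an empty slot stays normal and needs no further work.
        if status.present:
            # Step 3: only a seated disk has its connection state judged.
            if not link_is_healthy(slot):
                status.faulty = True
                # Step 4: send fault information identifying the slot.
                report_fault(slot)
        statuses.append(status)
    return statuses
```

Passing the hardware queries in as callables keeps the decision logic separate from how the in-place and connection states are actually read, which the patent assigns to the logic device and the hard disk expansion module.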
In the hard disk fault detection method provided by this embodiment, the in-place state and the connection state of each hard disk to be detected are collected, so that a hard disk that is in place but abnormally connected is identified as a faulty hard disk. The method is accurate, convenient and fast, uses simple logic and reduces false alarms, so it can determine which of all the hard disks in the storage server has failed. This solves the problems that fault detection cannot be performed on all hard disks in a storage server and that the specific faulty hard disk cannot be identified.
In an alternative embodiment, acquiring the connection state of the hard disk to be detected and judging whether it is faulty according to the connection state includes:
acquiring the connection state of the hard disk to be detected once every preset time step within a preset time period, to obtain a second number of connection states, the second number being determined by the preset time period and the preset time step;
judging, according to the second number of connection states, whether the hard disk to be detected is successfully connected;
when the hard disk to be detected is not successfully connected, determining it to be a faulty hard disk;
when the hard disk to be detected is successfully connected, judging, according to the second number of connection states, whether its connection rate is within a preset range;
and when the connection rate of the hard disk to be detected is not within the preset range, determining it to be a faulty hard disk.
In this embodiment, when the hard disk to be detected is in place, its connection state is acquired and used first to judge whether the disk is successfully connected and then, if it is, whether its connection rate is normal. A hard disk that is in place but abnormally connected is thus identified as a faulty hard disk. The process is accurate, convenient and fast, uses simple logic and is unlikely to produce false alarms.
In an alternative embodiment, after issuing the fault information, the method further comprises:
determining the slot corresponding to the faulty hard disk, the slot being where the hard disk to be detected is inserted;
when no hard disk in the in-place state is present in the slot, canceling the fault information;
when a hard disk in the in-place state is present in the slot, acquiring its connection state and judging, according to the connection state, whether it is successfully connected;
when the hard disk is successfully connected, judging whether its connection rate is within the preset range;
and canceling the fault information when the connection rate of the hard disk is within the preset range.
In this embodiment, after the fault information is sent out, the slot of the faulty hard disk is determined. The fault information is canceled promptly when no hard disk is in place in the slot, or when the hard disk in place in the slot is connected at a rate within the preset range. This makes the alarms generated by the invention more accurate, avoids false alarms, and helps locate the slot of the faulty hard disk quickly.
In an alternative embodiment, sending out fault information when the hard disk to be detected is determined to be a faulty hard disk includes:
acquiring the slot information of the faulty hard disk;
and sending out fault information according to the slot information.
In this embodiment, when the hard disk to be detected is determined to be a faulty hard disk, its slot information is acquired and fault information reflecting that slot is sent out, so that the faulty hard disk can be located from the fault information. The logic is simple, the fault information is more accurate, and the faulty hard disk is easy to identify.
In a second aspect, the present invention provides a hard disk fault detection system, the system comprising a hard disk expansion module, a logic device, a hard disk connection module and a hard disk state display module;
the logic device is connected with the hard disk connection module and is used for acquiring the in-place states of a first number of hard disks to be detected from the hard disk connection module;
the hard disk expansion module is connected with the logic device and is used for acquiring an in-place state from the logic device and judging whether the hard disk to be detected is in place or not according to the in-place state;
the hard disk expansion module is connected with the hard disk connection module and is used for acquiring the connection state of the hard disk to be detected from the hard disk connection module under the condition that the hard disk to be detected is in place;
the hard disk expansion module is used for judging, according to the connection state, whether the hard disk to be detected is faulty, and for sending a fault display instruction to the logic device when the hard disk to be detected is determined to be a faulty hard disk;
The logic device is connected with the hard disk state display module and is used for controlling the hard disk state display module to send out fault information under the condition that a fault display instruction is received.
In the hard disk fault detection system provided by this embodiment, the hard disk expansion module obtains the in-place state of each hard disk to be detected from the logic device and its connection state from the hard disk connection module, so that a hard disk that is in place but abnormally connected is identified as a faulty hard disk. The system is accurate, convenient and fast, uses simple logic and reduces false alarms, so it can determine which of all the hard disks in the storage server has failed. This solves the problems that fault detection cannot be performed on all hard disks in a storage server and that the specific faulty hard disk cannot be identified.
In an alternative embodiment, the hard disk expansion module is connected with the logic device through a first link and a second link;
the hard disk expansion module acquires an in-place state from the logic device through a first link;
the hard disk expansion module sends a fault display instruction to the logic device through the second link.
In this embodiment, the hard disk expansion module obtains the in-place state from the logic device over the first link and sends the fault display instruction to the logic device over the second link. The two links carry the two kinds of data separately, which makes it straightforward to detect faulty hard disks and send out fault information.
In an alternative embodiment, the hard disk state display module includes a first number of hard disk state display units;
the hard disk expansion module is used for determining the slot information of the faulty hard disk when the hard disk to be detected is determined to be faulty, generating a fault display instruction according to the slot information, and sending the fault display instruction to the logic device;
the logic device controls the hard disk state display unit corresponding to the slot position information to send out fault information according to the fault display instruction.
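The division of work between the hard disk expansion module and the logic device, including the two links and the per-slot display units, can be modeled roughly as follows. This is a software stand-in under stated assumptions: the class names, the `FaultDisplayInstruction` fields and the representation of the display units as a dictionary of LED flags are hypothetical and do not describe the real SAS Expander or CPLD interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FaultDisplayInstruction:
    """Hypothetical message carried over the second link."""
    slot: int
    fault_on: bool


@dataclass
class LogicDevice:
    """Stand-in for the CPLD: holds presence flags, drives per-slot fault LEDs."""
    presence: List[bool]
    fault_leds: Dict[int, bool] = field(default_factory=dict)

    def read_presence(self, slot: int) -> bool:
        # First link: the expander reads the in-place state from the logic device.
        return self.presence[slot]

    def apply(self, instr: FaultDisplayInstruction) -> None:
        # Second link: drive the display unit that belongs to this slot.
        self.fault_leds[instr.slot] = instr.fault_on


@dataclass
class ExpanderModule:
    """Stand-in for the SAS Expander: judges faults, instructs the logic device."""
    cpld: LogicDevice
    link_up: List[bool]  # connection state per slot, as seen by the expander

    def scan(self) -> List[int]:
        failed = []
        for slot in range(len(self.cpld.presence)):
            # In place but no link: treat as faulty and light that slot's unit.
            if self.cpld.read_presence(slot) and not self.link_up[slot]:
                failed.append(slot)
                self.cpld.apply(FaultDisplayInstruction(slot=slot, fault_on=True))
        return failed
```

For example, with presence `[True, True, False]` and link states `[True, False, False]`, only slot 1 is reported and only its fault LED is lit; the empty slot 2 is skipped.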
In this embodiment, when the hard disk to be detected is determined to be a faulty hard disk, the hard disk expansion module acquires its slot information, and the logic device controls the corresponding hard disk state display unit to send out fault information reflecting that slot, so that the faulty hard disk can be located from the fault information. The logic is simple, the fault information is more accurate, and the faulty hard disk is easy to identify.
In a third aspect, the present invention provides a hard disk failure detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring the in-place states of the first number of hard disks to be detected;
the first judging module is used for judging whether the hard disk to be detected is in place or not according to the in-place state;
The second judging module is used for acquiring the connection state of the hard disk to be detected under the condition that the hard disk to be detected is in place and judging whether the hard disk to be detected fails or not according to the connection state;
and the sending module is used for sending out fault information when the hard disk to be detected is determined to be a faulty hard disk.
In a fourth aspect, the present invention provides a computer device comprising a memory and a processor communicatively connected to each other, the memory storing computer instructions and the processor executing the computer instructions so as to perform the hard disk fault detection method of the first aspect or any embodiment corresponding to the first aspect.
In a fifth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the hard disk failure detection method of the first aspect or any of the embodiments corresponding thereto.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for detecting hard disk failures according to an embodiment of the present invention;
FIG. 2 is a flow diagram of single hard disk failure detection according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of a failed hard disk detected in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a hard disk failure detection system according to an embodiment of the present invention;
FIG. 5 is a block diagram of a hard disk failure detection apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, storage servers are commonly used to manage network data. Because the number of hard disks that the CPU in a storage server can attach directly is limited, the server adds a SAS HBA (host bus adapter) card over PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard). Each port on the SAS HBA card is connected to a SAS Expander card through several physical interfaces, and multiple hard disks are attached to the SAS Expander (Serial Attached SCSI expander) card. In this way a large number of high-capacity mechanical hard disks are attached through the SAS Expander while data storage performance is maintained, and network data is read and written through the attached hard disks. These hard disks must be monitored for failures, and a failed hard disk must be identified among them quickly and accurately.
The current method for monitoring hard disk faults is as follows. After the SAS Expander card is powered on, the number of hard disks in place on the card is counted, and after a preset delay the number of hard disks in a connected state is counted; if the two numbers differ, abnormal hard disk identification is confirmed. The SAS Expander card is then reset, and after a further delay the number of connected hard disks is counted again and compared with the number in place; this repeats until the identification problem is resolved or a preset number of cycles has been reached. The method treats the hard disks as a whole in order to detect abnormal identification while preserving the start-up speed of the SAS Expander card and the responsiveness of the network system. It cannot identify the fault of an individual hard disk, and once a fault is detected it cannot output alarm information for the specific faulty hard disk.
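The limitation of the count-based approach shows up clearly when it is compared with a per-slot check on the same data. In this hypothetical sketch the in-place and connected disks are given as sets of slot numbers; both function names are illustrative and not taken from any product.

```python
from typing import List, Set


def count_based_check(present: Set[int], linked: Set[int]) -> bool:
    """Prior-art style check: only says that *some* disk is abnormal."""
    return len(present) != len(linked)


def per_slot_check(present: Set[int], linked: Set[int]) -> List[int]:
    """Per-slot check: lists the slots that are in place but have no link."""
    return sorted(present - linked)
```

With disks in place in slots 0 to 3 and links only on slots 0, 2 and 3, both checks flag a problem, but only the per-slot set difference names slot 1 as the one to service.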
The embodiment of the invention provides a hard disk fault detection method that collects the physical in-place information and connection (link) information of each hard disk, so that a hard disk that is in place but whose link is abnormal is identified as a faulty hard disk. This reduces false alarms and makes it possible to determine which of all the hard disks in the storage server has failed.
According to an embodiment of the present invention, an embodiment of a hard disk failure detection method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a server device having data processing capability, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In this embodiment, a hard disk failure detection method is provided that may be used in the server device described above. FIG. 1 is a flowchart of a hard disk failure detection method according to an embodiment of the present invention; as shown in FIG. 1, the flow includes the following steps:
Step S101, obtaining the in-place states of a first number of hard disks to be detected.
Specifically, the hard disk failure detection method of the present invention is implemented with a SAS Expander (hereinafter the hard disk expansion module) and a CPLD (Complex Programmable Logic Device, hereinafter the logic device). Because fault detection is performed on every hard disk in the storage server, every hard disk is a hard disk to be detected, and the first number equals the number of hard disks installed in the target storage server. The hard disk expansion module first initializes all hard disks to be detected to the normal state and then obtains, through the logic device, the in-place state of each hard disk to be detected in the target storage server. As shown in FIG. 2, the hard disk expansion module begins by initializing each hard disk to the normal state.
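Step S101 can be sketched as below, under an explicitly hypothetical assumption about the hardware interface: that the logic device reports presence as one integer bitmap in which bit i is set when the disk in slot i is in place. The patent does not specify this encoding, and `init_states` and `decode_presence` are illustrative names only.

```python
from typing import List

NORMAL = "normal"


def init_states(num_slots: int) -> List[str]:
    # Every hard disk to be detected starts out in the normal state.
    return [NORMAL] * num_slots


def decode_presence(bitmap: int, num_slots: int) -> List[bool]:
    """Unpack a presence bitmap, assuming bit i set means slot i is in place."""
    return [bool((bitmap >> slot) & 1) for slot in range(num_slots)]
```

For instance, `decode_presence(0b1011, 4)` yields `[True, True, False, True]`, so only slot 2 would skip the connection judgment.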
Step S102, judging whether the hard disk to be detected is in place or not according to the in-place state.
Specifically, whether the hard disk to be detected is in place is determined from its in-place state. If it is not in place, the slot is treated as normal and no further operation is performed; if it is in place, whether it is faulty is judged from its connection (link) state. As shown in FIG. 2, the hard disk expansion module performs the in-place judgment: if the hard disk is not in place, it is considered normal; if it is in place, the connection judgment is performed.
Step S103, under the condition that the hard disk to be detected is in place, the connection state of the hard disk to be detected is obtained, and whether the hard disk to be detected fails or not is judged according to the connection state.
Specifically, when the hard disk to be detected is in place, the hard disk expansion module acquires its connection (link) state and judges from it whether the disk is faulty: if the connection state is normal, the hard disk is determined to be normal; if the connection state remains abnormal for a long time, the hard disk is determined to be faulty. As shown in FIG. 2, the hard disk expansion module performs the connection judgment: if the connection is normal, the hard disk is considered normal; if the connection remains abnormal for a long time, the hard disk is determined to be faulty.
Step S104, sending out fault information under the condition that the hard disk to be detected is determined to be a fault hard disk.
Specifically, when the hard disk to be detected is determined to be a faulty hard disk, the hard disk expansion module controls the logic device to send out fault information, which tells production and maintenance personnel which slot holds the faulty hard disk. The fault information may take forms such as lighting a fault lamp, triggering an alarm, recording a log or showing a pop-up reminder, each of which conveys the slot information. As shown in FIG. 2, when a hard disk fault is recognized, the hard disk expansion module lights the fault lamp through the logic device.
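Because every form of fault information carries the slot number, the output step can be sketched as one message fanned out to several sinks. The `emit_fault` dispatcher, the message wording and the sink functions are hypothetical; only the `logging` calls are standard Python. In a real system the lamp sink would be a command to the logic device.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("disk_fault")

# A sink receives the slot number and the formatted message.
Sink = Callable[[int, str], None]


def format_fault(slot: int) -> str:
    # The slot number is what the technician needs to find the disk.
    return f"Hard disk fault detected in slot {slot}"


def log_sink(slot: int, message: str) -> None:
    # Record the fault in the system log.
    logger.error(message)


def emit_fault(slot: int, sinks: List[Sink]) -> str:
    """Send one fault message to every configured output (lamp, alarm, log, pop-up)."""
    message = format_fault(slot)
    for sink in sinks:
        sink(slot, message)
    return message
```

Keeping the outputs as a list makes it easy to enable only the fault lamp on a production line and add alarms or pop-ups in a managed data center.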
As shown in FIG. 3, the hard disk failure detection method of the present invention was applied to all hard disks in a storage server with 60 hard disks, and two failed hard disks were detected. With this method a failed hard disk can be identified within 1 minute, after which the fault lamp of its slot is lit, so that production and maintenance personnel can quickly locate and replace the two failed hard disks in FIG. 3 without mistakenly replacing a normal hard disk.
In the hard disk fault detection method provided by this embodiment, the in-place state and the connection state of each hard disk to be detected are collected, so that a hard disk that is in place but abnormally connected is identified as a faulty hard disk. The method is accurate, convenient and fast, uses simple logic and reduces false alarms, so it can determine which of all the hard disks in the storage server has failed. This solves the problems that fault detection cannot be performed on all hard disks in a storage server and that the specific faulty hard disk cannot be identified.
In some optional embodiments, acquiring the connection state of the hard disk to be detected and judging whether it is faulty according to the connection state includes:
acquiring the connection state of the hard disk to be detected once every preset time step within a preset time period, to obtain a second number of connection states, the second number being determined by the preset time period and the preset time step;
judging, according to the second number of connection states, whether the hard disk to be detected is successfully connected;
when the hard disk to be detected is not successfully connected, determining it to be a faulty hard disk;
when the hard disk to be detected is successfully connected, judging, according to the second number of connection states, whether its connection rate is within a preset range;
and when the connection rate of the hard disk to be detected is not within the preset range, determining it to be a faulty hard disk.
Specifically, when the hard disk to be detected is in place, the hard disk expansion module acquires its connection (link) state once every preset time step throughout a preset time period, obtaining a second number of connection states. For example, with a preset time period of 60 seconds and a preset time step of 2 seconds, the connection (link) state of the hard disk is checked every 2 s for 60 s, giving 30 connection states in total.
Whether the hard disk to be detected has established a connection (link) is judged from the second number of connection states. If it has not, the connection is considered unsuccessful and the hard disk is directly identified as a faulty hard disk. If a link is established, the second number of connection states is further used to judge whether the connection (link) rate is abnormal, for example whether it is too low or fails to meet the standard requirement. If the connection rate is not within the preset range, the hard disk to be detected is considered faulty. For example, if the connection states acquired 30 times within 60 seconds show that the connection is still abnormal when the 60 seconds have elapsed, the hard disk is determined to be faulty.
In this embodiment, when the hard disk to be detected is in place, its connection state is acquired and used first to judge whether the disk is successfully connected and then, if it is, whether its connection rate is normal. A hard disk that is in place but abnormally connected is thus identified as a faulty hard disk. The process is accurate, convenient and fast, uses simple logic and is unlikely to produce false alarms.
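The sampling scheme and the two-stage judgment (link established, then rate in range) can be sketched as follows. Several details are assumptions not fixed by the text: a sample is modeled as a `(link_up, rate_gbps)` pair, the disk is accepted as soon as one sample shows a link at an acceptable rate, and the 6 to 12 Gb/s default range is purely illustrative. The 60 s period and 2 s step match the worked example in the description, giving 60 / 2 = 30 samples.

```python
import time
from typing import Callable, Optional, Tuple

# One sample: (link_up, negotiated_rate_gbps); the rate is None when the link is down.
Sample = Tuple[bool, Optional[float]]


def sample_count(period_s: float, step_s: float) -> int:
    """Second number = preset period / preset step, e.g. 60 s / 2 s -> 30."""
    return int(period_s // step_s)


def judge_link(
    read_link: Callable[[], Sample],
    period_s: float = 60.0,
    step_s: float = 2.0,
    rate_range: Tuple[float, float] = (6.0, 12.0),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the link state and return "ok", "no_link" or "bad_rate"."""
    ever_linked = False
    for _ in range(sample_count(period_s, step_s)):
        link_up, rate = read_link()
        if link_up:
            ever_linked = True
            if rate is not None and rate_range[0] <= rate <= rate_range[1]:
                return "ok"  # linked at an acceptable rate: the disk is normal
        sleep(step_s)
    # Still abnormal when the window closes: the disk is treated as faulty.
    return "bad_rate" if ever_linked else "no_link"
```

Injecting `sleep` lets the window be exercised in tests without real delays; with the default `time.sleep`, samples are taken one step apart as in the 60 s example.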
In some alternative embodiments, after issuing the fault information, the method further comprises:
determining the slot corresponding to the faulty hard disk, the slot being where the hard disk to be detected is inserted;
when no hard disk in the in-place state is present in the slot, canceling the fault information;
when a hard disk in the in-place state is present in the slot, acquiring its connection state and judging, according to the connection state, whether it is successfully connected;
when the hard disk is successfully connected, judging whether its connection rate is within the preset range;
and canceling the fault information when the connection rate of the hard disk is within the preset range.
Specifically, after the fault information is issued, production and maintenance personnel can quickly locate the slot of the faulty hard disk from the fault information and replace the disk. The hard disk expansion module detects this replacement and determines whether the fault information should be canceled, for example by turning off the fault lamp.
The hard disk expansion module first determines the slot corresponding to the faulty hard disk and judges whether a hard disk in the in-place state exists in that slot. If no disk is in place, the module regards the slot as normal and controls the logic device to cancel the fault information. If a disk is in place, the module obtains the connection state of that disk and judges from it whether the disk is successfully connected; if it is, the module further judges from the connection state whether the connection rate is abnormal, that is, whether it is within the preset range. If the connection rate of the in-place disk is within the preset range, the connection is normal, and the hard disk expansion module controls the logic device to cancel the fault information.
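The cancellation check above can be sketched as a single decision function. This is a hedged illustration, not the patented logic: the `SlotReading` record, its field names, and the 6 to 12 Gbit/s preset range are hypothetical, and for brevity it judges one reading, whereas a real module could reuse the windowed sampling of the detection step.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SlotReading:
    """Hypothetical snapshot of the slot that carried the faulty disk."""
    present: bool            # in-place state of whatever disk is now in the slot
    linked: bool = False     # connection (link) established?
    rate_gbps: float = 0.0   # connection rate in Gbit/s


def should_cancel_fault(reading: SlotReading,
                        rate_range: Tuple[float, float] = (6.0, 12.0)) -> bool:
    """Return True if the slot's fault information can be canceled."""
    if not reading.present:
        return True                    # faulty disk removed, nothing in place
    if not reading.linked:
        return False                   # a disk is present but not linked
    low, high = rate_range
    return low <= reading.rate_gbps <= high   # linked: cancel only at a normal rate


print(should_cancel_fault(SlotReading(present=False)))      # True
print(should_cancel_fault(SlotReading(True, True, 12.0)))   # True
print(should_cancel_fault(SlotReading(True, True, 1.5)))    # False
```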
In this embodiment, after the fault information is issued, the slot of the faulty hard disk is determined. The fault information is canceled promptly when no hard disk is in place in that slot, or when the hard disk in place in that slot is successfully connected at a connection rate within the preset range. This makes the alarms generated by the invention more accurate, avoids false alarms, and helps locate the slot of the faulty hard disk quickly.
In some optional embodiments, issuing fault information when the hard disk to be detected is determined to be a faulty hard disk includes:
obtaining slot position information of a fault hard disk;
and sending out fault information according to the slot position information.
Specifically, after the hard disk expansion module recognizes that a hard disk is in place but its connection (link) state has remained abnormal for a long time, it controls the logic device to issue fault information as follows. The hard disk expansion module first acquires the slot information of the faulty hard disk and sends a control instruction to the logic device according to that slot information. According to the control instruction, the logic device controls the component that displays hard disk status to issue fault information that indicates the slot. The fault information tells production and maintenance personnel which slot holds the faulty hard disk, and can take forms such as lighting a fault lamp, triggering an alarm, recording a log, or showing a pop-up reminder, each of which outputs the slot information.
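Because the fault information can take several forms that all carry the slot, one way to organize it is to build one slot-bearing message and fan it out to every configured output. This is a sketch under assumptions: the message text, the `hdd_fault` logger name, the `issue_fault` helper, and the two stand-in outputs are hypothetical; only Python's standard `logging` module is a real API here.

```python
import logging
from typing import Callable, Iterable, Set

log = logging.getLogger("hdd_fault")   # hypothetical logger name

Notifier = Callable[[int, str], None]  # receives (slot, message)


def fault_message(slot: int) -> str:
    """Fault information that carries the slot of the faulty disk."""
    return f"Hard disk fault in slot {slot}"


def issue_fault(slot: int, notifiers: Iterable[Notifier]) -> str:
    """Send the same slot-bearing message to every configured output form."""
    message = fault_message(slot)
    for notify in notifiers:
        notify(slot, message)
    return message


# Usage with two stand-in outputs: a fault-lamp table and a log record.
lit_lamps: Set[int] = set()


def light_fault_lamp(slot: int, message: str) -> None:
    lit_lamps.add(slot)


def write_log(slot: int, message: str) -> None:
    log.error(message)


issue_fault(17, [light_fault_lamp, write_log])
```

Keeping the outputs as plain callables means an alarm or pop-up channel could be added without changing the detection code.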
In this embodiment, when the hard disk to be detected is determined to be faulty, its slot information is acquired and fault information reflecting that slot is issued, so the faulty hard disk can be located from the fault information. The method has simple logic, the fault information it generates is more accurate, and the faulty hard disk is easy to identify.
In this embodiment, a hard disk failure detection system is provided, which may be deployed in the server device described above and includes a hard disk expansion module, a logic device, a hard disk connection module and a hard disk state display module;
the logic device is connected with the hard disk connection module and is used for acquiring the in-place states of a first number of hard disks to be detected from the hard disk connection module;
the hard disk expansion module is connected with the logic device and is used for acquiring an in-place state from the logic device and judging whether the hard disk to be detected is in place or not according to the in-place state;
the hard disk expansion module is connected with the hard disk connection module and is used for acquiring the connection state of the hard disk to be detected from the hard disk connection module under the condition that the hard disk to be detected is in place;
the hard disk expansion module is used for judging whether the hard disk to be detected fails or not according to the connection state, and sending a failure display instruction to the logic device under the condition that the hard disk to be detected is determined to be the failed hard disk;
the logic device is connected with the hard disk state display module and is used for controlling the hard disk state display module to send out fault information under the condition that a fault display instruction is received.
Specifically, fig. 4 is a block diagram of a hard disk failure detection system according to an embodiment of the present invention. As shown in fig. 4, the system includes a hard disk expansion module, a logic device, a hard disk connection module and a hard disk state display module. The hard disk connection module comprises a plurality of hard disk connectors, one for each hard disk to be detected; the hard disk expansion module is a Serial Attached SCSI (SAS) hard disk expansion module; and the logic device is a Complex Programmable Logic Device (CPLD).
The in-place states of the first number of hard disks to be detected are generated on the hard disk connection module, which is connected to the logic device through pins, so the logic device can receive these in-place states. The first number denotes how many in-place states there are, and is equal to the number of hard disks installed in the target storage server. The hard disk expansion module is connected with the logic device; it acquires the in-place states of all hard disks to be detected from the logic device and judges in turn whether each hard disk to be detected is in place.
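As an illustration of how the expansion module might turn the in-place states collected by the logic device into one flag per slot, the sketch below decodes a presence bitmap. The register layout (slot i in bit i % 8 of byte i // 8, set bit meaning "in place") is an assumption for the example and is not stated in the source.

```python
from typing import List, Sequence


def decode_in_place(register_bytes: Sequence[int], first_number: int) -> List[bool]:
    """Expand a presence bitmap into one in-place flag per slot.

    Assumed layout (hypothetical): slot i is bit (i % 8) of byte i // 8,
    and a set bit means a disk is in place.
    """
    needed = (first_number + 7) // 8
    if len(register_bytes) < needed:
        raise ValueError(f"{first_number} slots need {needed} bytes")
    return [bool((register_bytes[i // 8] >> (i % 8)) & 1) for i in range(first_number)]


# 60 slots fit in 8 bytes; here slot 5 is the only empty one.
raw = [0xFF] * 8
raw[0] &= ~(1 << 5)
flags = decode_in_place(raw, 60)
print([slot for slot, present in enumerate(flags) if not present])  # [5]
```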
The hard disk expansion module is connected with the hard disk connection module through a SAS (Serial Attached SCSI) channel, and once it has determined that the hard disk to be detected is in place, it acquires the connection (link) state of that hard disk from the hard disk connection module through the SAS channel.
The hard disk expansion module judges whether the hard disk to be detected is faulty according to the connection state, and sends a fault display instruction to the logic device when the hard disk to be detected is determined to be a faulty hard disk. The logic device is connected with the hard disk state display module and controls the hard disk state display module to issue fault information when the fault display instruction is received.
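One detection pass of the hard disk expansion module across all slots can be sketched with the three connections modeled as injected callables. The names `run_detection`, `read_samples` and `send_fault_display`, the `(linked, rate_gbps)` sample tuple, and the 6 to 12 Gbit/s range are hypothetical; in a real system these would correspond to reading the in-place states from the logic device, reading link states over the SAS channel, and sending the fault display instruction back to the logic device.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical connection-state sample: (linked, rate_gbps).
Sample = Tuple[bool, float]


def is_faulty(samples: Sequence[Sample],
              rate_range: Tuple[float, float] = (6.0, 12.0)) -> bool:
    """Faulty if no sample in the window shows a link at an in-range rate."""
    low, high = rate_range
    return not any(linked and low <= rate <= high for linked, rate in samples)


def run_detection(in_place: Sequence[bool],
                  read_samples: Callable[[int], Sequence[Sample]],
                  send_fault_display: Callable[[List[int]], None]) -> List[int]:
    """One detection pass of the hard disk expansion module over all slots.

    in_place: in-place states obtained from the logic device
    read_samples: reads a slot's connection states from the connection module
    send_fault_display: sends the fault display instruction to the logic device
    """
    # Connection states are read only for slots whose disk is in place.
    faulty = [slot for slot, present in enumerate(in_place)
              if present and is_faulty(read_samples(slot))]
    if faulty:
        send_fault_display(faulty)
    return faulty


# Usage with fakes: slot 2 is empty, so its connection state is never read.
in_place = [True, True, False, True]
fake_links = {0: [(True, 12.0)], 1: [(False, 0.0)], 3: [(True, 1.5)]}
sent: List[List[int]] = []
print(run_detection(in_place, lambda slot: fake_links[slot], sent.append))  # [1, 3]
```

Injecting the three links as functions keeps the decision logic testable without hardware, which is the main reason for this shape.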
It should be noted that the hard disk expansion module and the hard disk connection module may also be connected in other ways, such as Fibre Channel. The SAS channel, however, is more flexible: it is compatible with SATA (Serial Advanced Technology Attachment) drives, which can save investment. It is also more scalable: a large number of devices can be connected directly through SAS channels, and because of its point-to-point architecture, SAS performance increases as the number of ports increases. SAS also has a more rational cable design that provides more effective heat dissipation in high-density environments. Therefore, in the present invention, the hard disk expansion module is preferably connected to the hard disk connection module through a SAS channel.
As shown in fig. 3, the hard disk failure detection method of the present invention was used to detect faults in all hard disks of a storage server with 60 hard disks, and two faulty hard disks were detected. With this method a faulty hard disk can be identified within 1 minute. Once it is identified, the fault lamp of its slot is lit, so production and maintenance personnel can quickly locate and replace the two faulty hard disks in fig. 3, avoiding the mistake of replacing a normal hard disk as if it were faulty.
In the hard disk fault detection system of this embodiment, the hard disk expansion module obtains the in-place state of the hard disk to be detected from the logic device and its connection state from the hard disk connection module, and identifies a hard disk that is in place but abnormally connected as a faulty hard disk. This is accurate, convenient and fast, has simple logic, and reduces false alarms, so that it can be determined which of all the hard disks in a storage server is faulty. It solves the problem that fault detection could not be performed on all hard disks in a storage server and that the failure of a specific hard disk could not be detected.
In some alternative embodiments, the hard disk expansion module is connected to the logic device via a first link and a second link;
the hard disk expansion module acquires an in-place state from the logic device through a first link;
the hard disk expansion module sends a fault display instruction to the logic device through the second link.
Specifically, as shown in fig. 4, the hard disk expansion module is connected to the logic device through a first link and a second link, where the first link may be an IIC (Inter-Integrated Circuit, a simple bidirectional two-wire synchronous serial bus) link, and the second link may be an SGPIO (Serial General Purpose Input/Output) link.
Through the first link (for example, the IIC link), the hard disk expansion module polls the logic device to acquire the in-place states of the hard disks to be detected. After detecting a faulty hard disk, the hard disk expansion module generates a fault display instruction and sends it to the logic device through the second link (for example, the SGPIO link).
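The fault display instruction sent over the second link can be pictured as a serial bit stream with a small group of bits per slot, encoded by the expansion module and decoded by the logic device. The three-bits-per-slot layout (activity, locate, fault) and the position of the fault bit below are assumptions for illustration and are not taken from the SGPIO specification or from the source.

```python
from typing import Iterable, List, Set

BITS_PER_SLOT = 3   # assumed: activity, locate, fault
FAULT_BIT = 2       # assumed position of the fault bit within a slot's group


def encode_fault_display(faulty_slots: Iterable[int], first_number: int) -> List[int]:
    """Expansion-module side: set the fault bit of each faulty slot in the stream."""
    stream = [0] * (first_number * BITS_PER_SLOT)
    for slot in faulty_slots:
        if not 0 <= slot < first_number:
            raise ValueError(f"slot {slot} out of range")
        stream[slot * BITS_PER_SLOT + FAULT_BIT] = 1
    return stream


def decode_fault_display(stream: List[int]) -> Set[int]:
    """Logic-device side: recover the slots whose fault lamp should be lit."""
    if len(stream) % BITS_PER_SLOT:
        raise ValueError("stream length must be a multiple of BITS_PER_SLOT")
    return {i // BITS_PER_SLOT
            for i in range(FAULT_BIT, len(stream), BITS_PER_SLOT) if stream[i]}


s = encode_fault_display([1, 59], 60)
print(len(s), sorted(decode_fault_display(s)))  # 180 [1, 59]
```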
It should be noted that IIC is a bidirectional, two-wire, synchronous serial communication standard that needs only two lines to transfer data between devices on the bus. It has few interfaces, simple control and high communication efficiency. Besides simple single-master communication, it can be used in multi-master systems, where its collision detection and arbitration functions prevent data corruption. Serial General Purpose Input/Output (SGPIO) defines communication between an initiator device and a target device and is a way of serializing general-purpose IO signals.
In this embodiment, the hard disk expansion module obtains the in-place state from the logic device through the first link and sends the fault display instruction to the logic device through the second link, so both kinds of data are transmitted, which makes it convenient to detect hard disk faults and issue fault information.
In some alternative embodiments, the hard disk state display module includes a first number of hard disk state display units;
the hard disk expansion module is used for determining the slot information of the fault hard disk under the condition that the hard disk to be detected is determined to be the fault hard disk, generating a fault display instruction according to the slot information, and sending the fault display instruction to the logic device;
the logic device controls the hard disk state display unit corresponding to the slot position information to send out fault information according to the fault display instruction.
Specifically, the hard disk state display module includes a first number of hard disk state display units, where the number of the hard disk state display units is the same as the number of hard disks to be detected, and the hard disk state display units may be hard disk state lamps.
After the hard disk expansion module identifies a hard disk that is in place but whose connection (link) state has remained abnormal for a long time as a faulty hard disk, it acquires the slot information of the faulty hard disk, generates a fault display instruction according to the slot information, and sends the fault display instruction to the logic device. When the hard disk state display unit is a hard disk status lamp, the fault display instruction may be a command to light the fault lamp.
After receiving the fault display instruction, the logic device controls, according to the instruction, the hard disk state display unit corresponding to the slot of the faulty hard disk to issue fault information. The fault information tells production and maintenance personnel the slot where the faulty hard disk is located, and can take forms such as lighting a fault lamp, triggering an alarm, recording a log, or showing a pop-up reminder, each of which outputs the slot information. When the hard disk state display unit is a hard disk status lamp, the logic device can light the fault lamp corresponding to the slot where the faulty hard disk is located.
In this embodiment, when the hard disk to be detected is determined to be faulty, the hard disk expansion module acquires the slot information of the faulty hard disk, and the logic device controls the hard disk state display unit to issue fault information reflecting that slot, so the faulty hard disk can be located from the fault information. The approach has simple logic, the fault information it generates is more accurate, and the faulty hard disk is easy to identify.
This embodiment further provides a hard disk failure detection apparatus, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a hard disk failure detection apparatus, as shown in fig. 5, including:
the obtaining module 501 is configured to obtain in-place states of a first number of hard disks to be detected;
the first judging module 502 is configured to judge whether the hard disk to be detected is in place according to the in-place state;
the second judging module 503 is configured to obtain the connection state of the hard disk to be detected when the hard disk to be detected is in place, and judge whether the hard disk to be detected fails according to the connection state;
and the sending module 504 is configured to send out fault information when it is determined that the hard disk to be detected is a faulty hard disk.
In some alternative embodiments, the second judging module 503 includes:
the first acquisition unit is used for acquiring the connection state of the hard disk to be detected once every time a preset time step is passed in a preset time period to obtain a second number of connection states, wherein the second number is determined according to the preset time period and the preset time step;
the first judging unit is used for judging whether the hard disk to be detected is successfully connected according to the second number of connection states;
the first identification unit is used for identifying the hard disk to be detected as a fault hard disk under the condition that the hard disk to be detected is not connected successfully;
the second judging unit is used for judging whether the connection rate of the hard disk to be detected is in a preset range according to the second number of connection states under the condition that the hard disk to be detected is successfully connected;
and the second identification unit is used for identifying the hard disk to be detected as a fault hard disk under the condition that the connection rate of the hard disk to be detected is not in the preset range.
In some alternative embodiments, the apparatus further comprises:
the determining module is used for determining a slot position corresponding to the fault hard disk, wherein the slot position is used for inserting the hard disk to be detected;
the first cancellation module is used for canceling the fault information under the condition that the hard disk in the in-place state does not exist on the slot position;
the third judging module is used for acquiring the connection state of the hard disk under the condition that the hard disk in the in-place state exists on the slot position, and judging whether the hard disk is successfully connected according to the connection state;
the fourth judging module is used for judging whether the connection rate of the hard disk is in a preset range or not under the condition that the hard disk is successfully connected;
and the second cancellation module is used for canceling the fault information under the condition that the connection rate of the hard disk is in a preset range.
In some alternative embodiments, the sending module 504 includes:
the second acquisition unit is used for acquiring slot position information of the fault hard disk;
and the sending unit is used for sending out fault information according to the slot position information.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The hard disk failure detection apparatus in this embodiment is presented in the form of functional units, where a unit may be an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or fixed programs, and/or another device that can provide the above functions.
The embodiment of the invention also provides computer equipment, which is provided with the hard disk fault detection device shown in the figure 5.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an optional embodiment of the present invention. As shown in fig. 6, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are communicatively coupled to each other by different buses and may be mounted on a common motherboard or in other ways as required. The processor can process instructions executed within the computer device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some optional embodiments, multiple processors and/or multiple buses may be used together with multiple memories if required. Likewise, multiple computer devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). One processor 10 is taken as an example in fig. 6.
The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, which may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a program storage area and a data storage area: the program storage area may store an operating system and at least one application program required for the functions, and the data storage area may store data created through use of the computer device. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some optional embodiments, the memory 20 may include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The methods of the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network and stored on a local storage medium, so that the methods described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; the storage medium may also comprise a combination of the above kinds of memory. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods shown in the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method for detecting a hard disk failure, the method comprising:
acquiring the in-place states of a first number of hard disks to be detected;
judging whether the hard disk to be detected is in place or not according to the in-place state;
under the condition that the hard disk to be detected is in place, acquiring the connection state of the hard disk to be detected, and judging whether the hard disk to be detected fails or not according to the connection state;
and sending out fault information under the condition that the hard disk to be detected is determined to be a fault hard disk.
2. The method of claim 1, wherein the obtaining the connection state of the hard disk to be detected, and determining whether the hard disk to be detected is faulty according to the connection state, comprises:
obtaining the connection state of the hard disk to be detected once every time a preset time step elapses within a preset time period, so as to obtain a second number of connection states, wherein the second number is determined according to the preset time period and the preset time step;
judging whether the hard disk to be detected is successfully connected or not according to the second number of connection states;
under the condition that the hard disk to be detected is not successfully connected, the hard disk to be detected is considered to be the fault hard disk;
under the condition that the hard disk to be detected is successfully connected, judging whether the connection rate of the hard disk to be detected is in a preset range or not according to a second number of connection states;
and under the condition that the connection rate of the hard disk to be detected is not in the preset range, the hard disk to be detected is considered to be the fault hard disk.
3. The method of claim 1, wherein after the issuing of the fault information, the method further comprises:
determining a slot position corresponding to the fault hard disk, wherein the slot position is used for inserting the hard disk to be detected;
under the condition that the hard disk in an in-place state does not exist on the slot position, canceling the fault information;
under the condition that a hard disk in an in-place state exists on the slot position, acquiring the connection state of the hard disk, and judging whether the hard disk is successfully connected according to the connection state;
judging whether the connection rate of the hard disk is in a preset range or not under the condition that the hard disk is successfully connected;
and canceling the fault information under the condition that the connection rate of the hard disk is in the preset range.
4. The method according to claim 1, wherein the issuing fault information in the case that the hard disk to be detected is determined to be a faulty hard disk includes:
acquiring slot position information of the fault hard disk;
and sending out the fault information according to the slot position information.
5. A hard disk failure detection system, the system comprising a hard disk expansion module, a logic device, a hard disk connection module and a hard disk state display module;
the logic device is connected with the hard disk connection module and is used for acquiring the in-place states of a first number of hard disks to be detected from the hard disk connection module;
the hard disk expansion module is connected with the logic device and is used for acquiring the in-place state from the logic device and judging whether the hard disk to be detected is in place or not according to the in-place state;
the hard disk expansion module is connected with the hard disk connection module and is used for acquiring the connection state of the hard disk to be detected from the hard disk connection module under the condition that the hard disk to be detected is in place;
the hard disk expansion module is used for judging whether the hard disk to be detected is faulty according to the connection state, and sending a fault display instruction to the logic device under the condition that the hard disk to be detected is determined to be the faulty hard disk;
the logic device is connected with the hard disk state display module and is used for controlling the hard disk state display module to send out fault information under the condition that the fault display instruction is received.
6. The system of claim 5, wherein the hard disk expansion module is coupled to the logic device via a first link and a second link;
the hard disk expansion module acquires the in-place state from the logic device through the first link;
and the hard disk expansion module sends the fault display instruction to the logic device through the second link.
7. The system of claim 5, wherein the hard disk status display module comprises a first number of hard disk status display units;
the hard disk expansion module is used for determining the slot information of the fault hard disk under the condition that the hard disk to be detected is the fault hard disk, generating a fault display instruction according to the slot information, and sending the fault display instruction to the logic device;
and the logic device controls the hard disk state display unit corresponding to the slot position information to send out the fault information according to the fault display instruction.
8. A hard disk failure detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring the in-place states of the first number of hard disks to be detected;
the first judging module is used for judging whether the hard disk to be detected is in place or not according to the in-place state;
the second judging module is used for acquiring the connection state of the hard disk to be detected under the condition that the hard disk to be detected is in place and judging whether the hard disk to be detected fails or not according to the connection state;
and the sending module is used for sending out fault information under the condition that the hard disk to be detected is determined to be a fault hard disk.
9. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the hard disk failure detection method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the hard disk failure detection method according to any one of claims 1 to 7.
CN202310778421.6A 2023-06-28 2023-06-28 Hard disk fault detection method, system, device, computer equipment and storage medium Pending CN116795610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310778421.6A CN116795610A (en) 2023-06-28 2023-06-28 Hard disk fault detection method, system, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310778421.6A CN116795610A (en) 2023-06-28 2023-06-28 Hard disk fault detection method, system, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116795610A true CN116795610A (en) 2023-09-22

Family

ID=88039909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310778421.6A Pending CN116795610A (en) 2023-06-28 2023-06-28 Hard disk fault detection method, system, device, computer equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN116795610A

Similar Documents

Publication | Title
US7496694B2 | Circuit, systems and methods for monitoring storage controller status
US9594641B2 | Techniques for updating memory of a chassis management module
CN107678909B | Circuit and method for monitoring chip configuration state in server
CN111722990A | Method and device for checking cable connection between main back boards
US10515042B1 | DAS storage cable identification
WO2014082275A1 | Method and apparatus for detecting cable connection condition
US9916273B2 | Sideband serial channel for PCI express peripheral devices
US9158646B2 | Abnormal information output system for a computer system
CN104239174A | BMC (baseboard management controller) remote debugging system and method
CN111048138A | Hard disk fault detection method and related device
CN111124785B | Method, device, equipment and storage medium for hard disk fault detection
CN112000535A | SAS Expander card-based hard disk abnormality identification method and processing method
CN116627729A | External connection cable, external connection cable in-place detection device, startup self-checking method and system
CN210721440U | PCIe card abnormality recovery device, PCIe card and PCIe expansion system
CN116795610A | Hard disk fault detection method, system, device, computer equipment and storage medium
CN109753396A | Cable self-checking method and system for a storage system, and server
CN116010141A | Method, device and medium for locating startup abnormalities of a multipath server
TWI802951B | Method, computer system and computer program product for storing state data of finite state machine
JP6094685B2 | Information processing apparatus and information processing apparatus control program
CN110781042B | Method, device and medium for detecting a UBM (Universal Backplane Management) backplane based on a BMC (baseboard management controller)
CN112596983A | Monitoring method for connector in server
CN113835971A | Monitoring method for abnormal lighting of server backplane and related components
US9639438B2 | Methods and systems of managing an interconnection
CN117573455B | PCIe equipment detection system, method, device and product
US11486926B1 | Wearout card use count

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination