CN114328141A - Hard disk fault early warning method and related components - Google Patents

Hard disk fault early warning method and related components

Info

Publication number
CN114328141A
Authority
CN
China
Prior art keywords
hard disk
parameter
fault
early warning
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111632373.7A
Other languages
Chinese (zh)
Inventor
龚树青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202111632373.7A priority Critical patent/CN114328141A/en
Publication of CN114328141A publication Critical patent/CN114328141A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a hard disk fault early warning method and related components. The method comprises: acquiring SMART information of a hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk; judging whether the current value of a hard disk parameter exceeds the threshold of that parameter; if so, determining that the hard disk is at risk of failure and generating alarm information indicating that the hard disk is at risk of failure; if not, obtaining a composite score of the hard disk based on the plurality of parameters; and, when the hard disk is determined to be at risk of failure based on the composite score, generating the alarm information. Operation and maintenance personnel are thus notified before the hard disk fails, rather than being alerted only after the disk is already damaged, which reduces the loss caused by data loss.

Description

Hard disk fault early warning method and related components
Technical Field
The invention relates to the field of hard disk faults, in particular to a hard disk fault early warning method and a related component.
Background
The health of the hard disks plays a crucial role in the stable operation of a server. However, the number of hard disks is large and the time at which a hard disk fails is relatively random. In the prior art, a BMC (Baseboard Management Controller) alarm is therefore often generated only after a hard disk has failed, by which point the stored data has been lost and the services of the server are affected.
Disclosure of Invention
The invention aims to provide a hard disk fault early warning method and related components, so that operation and maintenance personnel are notified before a hard disk fails rather than being alerted only after the disk is already damaged, which reduces the loss caused by data loss.
In order to solve the above technical problem, the invention provides a hard disk fault early warning method, which comprises the following steps:
acquiring SMART information of a hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
judging whether the current value of a hard disk parameter exceeds the threshold of that parameter;
if so, determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure;
if not, obtaining a composite score of the hard disk based on the plurality of parameters;
and, when the hard disk is determined to be at risk of failure based on the composite score, generating alarm information indicating that the hard disk is at risk of failure.
Preferably, the plurality of parameters includes at least two of a reallocated (remapped) sectors parameter, an uncorrectable errors parameter, a current pending sectors parameter, an offline uncorrectable sectors parameter, and a command timeout parameter.
Preferably, the acquiring SMART information of the hard disk includes:
SMART information of the hard disk is obtained through the disk array controller.
Preferably, the acquiring SMART information of the hard disk includes:
SMART information of the hard disk is acquired periodically.
Preferably, after generating the alarm information indicating that the hard disk is at risk of failure, the method further includes:
controlling an alarm module to raise an alarm for the hard disk that is at risk of failure.
Preferably, obtaining a composite score of the hard disk based on the plurality of parameters includes:
calculating an individual score of each parameter as A × C / B, wherein A is the current value of the parameter, B is the threshold corresponding to the parameter, and C is the weight corresponding to the parameter;
adding the individual scores of the parameters to obtain the composite score of the hard disk;
and generating the alarm information when the hard disk is determined to be at risk of failure based on the composite score includes:
if the composite score exceeds a preset score threshold, determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure.
In order to solve the above technical problem, the present invention further provides a hard disk failure early warning system, including:
the information acquisition unit is used for acquiring SMART information of the hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
the fault judging unit is used for judging whether the current value of the hard disk parameter exceeds the threshold value of the hard disk parameter;
and the fault alarm unit is used for generating alarm information of the hard disk with the fault risk.
In order to solve the above technical problem, the present invention further provides a BMC, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the hard disk fault early warning method when executing the computer program.
In order to solve the technical problem, the invention also provides a hard disk fault early warning device which comprises the BMC.
In order to solve the technical problem, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the hard disk failure early warning method are implemented.
The application provides a hard disk fault early warning method and related components. The method comprises: acquiring SMART information of a hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk; judging whether the current value of a hard disk parameter exceeds the threshold of that parameter; if so, determining that the hard disk is at risk of failure and generating alarm information indicating that the hard disk is at risk of failure; if not, obtaining a composite score of the hard disk based on the plurality of parameters; and, when the hard disk is determined to be at risk of failure based on the composite score, generating the alarm information. Operation and maintenance personnel are thus notified before the hard disk fails, rather than being alerted only after the disk is already damaged, which reduces the loss caused by data loss.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a hard disk failure early warning method according to the present invention;
FIG. 2 is a schematic structural diagram of a BMC according to the present invention.
Detailed Description
The core of the invention is to provide a hard disk fault early warning method and related components, so that operation and maintenance personnel are notified before a hard disk fails rather than being alerted only after the disk is already damaged, which reduces the loss caused by data loss.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a hard disk failure early warning method provided by the present invention, including:
S11: acquiring SMART information of the hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
S12: judging whether the current value of a hard disk parameter exceeds the threshold of that parameter; if so, going to step S13; if not, going to step S14;
S13: determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure;
S14: obtaining a composite score of the hard disk based on the plurality of parameters;
S15: when the hard disk is determined to be at risk of failure based on the composite score, generating alarm information indicating that the hard disk is at risk of failure.
Because the time at which a hard disk fails is random, in the prior art a BMC alarm is often generated only after the hard disk has failed, by which point the stored data has been lost and the services of the server are affected. The application therefore provides a hard disk fault early warning method that warns in advance about hard disks that are likely to fail.
Specifically, SMART information of the hard disk is obtained first. The SMART information includes a plurality of parameters of the hard disk, and these parameters can be used to judge whether the hard disk is likely to fail. The current value of each hard disk parameter is compared with the threshold of that parameter. If the current value of one or more parameters exceeds the corresponding threshold, the hard disk is determined to be at risk of failure and alarm information indicating the risk is generated. If no parameter exceeds its threshold, the hard disk is evaluated further: a composite score of the hard disk is obtained from its parameters, and when the composite score falls within the range that indicates a failure risk, the hard disk is determined to be at risk of failure and alarm information indicating the risk is generated.
With this hard disk fault early warning method, operation and maintenance personnel are notified before the hard disk fails, instead of being alerted only after the failure has already caused a loss, which reduces the loss caused by data loss.
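The flow of steps S12 to S15 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `evaluate_disk`, the `Readings` shape, the alarm message strings, and the choice to pass the scoring rule in as a callable are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Parameter name -> (current value, threshold). The shape is illustrative only.
Readings = Dict[str, Tuple[float, float]]


def evaluate_disk(
    readings: Readings,
    composite_score: Callable[[Readings], float],
    score_threshold: float,
) -> Optional[str]:
    """Return an alarm message if the disk is judged at risk, otherwise None."""
    # S12 / S13: any single parameter above its own threshold raises the alarm.
    exceeded = sorted(name for name, (cur, thr) in readings.items() if cur > thr)
    if exceeded:
        return "failure risk: threshold exceeded for " + ", ".join(exceeded)

    # S14: no parameter exceeded its threshold, so compute a composite score.
    score = composite_score(readings)

    # S15: alarm when the composite score exceeds the preset score threshold.
    if score > score_threshold:
        return f"failure risk: composite score {score:.1f} exceeds {score_threshold}"
    return None
```

Keeping the scoring rule as a parameter separates the two-stage decision from whichever scoring formula an embodiment chooses.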
On the basis of the above-described embodiment:
As a preferred embodiment, the plurality of parameters includes at least two of a reallocated (remapped) sectors parameter, an uncorrectable errors parameter, a current pending sectors parameter, an offline uncorrectable sectors parameter, and a command timeout parameter.
For a mechanical hard disk, the following five parameters can characterize the health of the hard disk: Reported Uncorrectable Errors (errors that could not be corrected), Reallocated Sectors (sectors that have been remapped), Current Pending Sector (sectors currently waiting to be remapped), Offline Uncorrectable Sectors (uncorrectable sectors found during offline scanning), and Command Timeout. The specific values of these five parameters can subsequently be used to judge whether the hard disk is at risk of failure.
In addition, the parameters include, but are not limited to, the above five, which are not limited herein.
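As an illustration, the five parameters can be represented as an enumeration together with a check that at least two of them are available. The numeric IDs below are the attribute IDs commonly reported by ATA drives for these attributes; they are an assumption added for this sketch and do not come from the patent text, and the names `HealthParameter` and `has_enough_parameters` are hypothetical.

```python
from enum import Enum
from typing import Iterable


class HealthParameter(Enum):
    """The five parameters named in the text.

    The values are commonly used SMART attribute IDs (an assumption; actual
    IDs can vary by vendor and are not specified by the patent).
    """
    REALLOCATED_SECTORS = 5
    REPORTED_UNCORRECTABLE_ERRORS = 187
    COMMAND_TIMEOUT = 188
    CURRENT_PENDING_SECTOR = 197
    OFFLINE_UNCORRECTABLE_SECTORS = 198


def has_enough_parameters(available_ids: Iterable[int], minimum: int = 2) -> bool:
    """Check that at least `minimum` of the five parameters are reported."""
    known = {p.value for p in HealthParameter}
    return len(known & set(available_ids)) >= minimum
```

For example, a drive that reports attributes 5 and 197 satisfies the "at least two" condition, while a drive that reports only attribute 5 among the five does not.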
As a preferred embodiment, acquiring SMART information of a hard disk includes:
SMART information of the hard disk is obtained through the disk array controller.
The BMC is connected with the hard disk through the disk array controller. The BMC is communicated with the disk array controller through an out-of-band management channel so as to acquire SMART information of the hard disk.
As a preferred embodiment, acquiring SMART information of a hard disk includes:
SMART information of the hard disk is acquired periodically.
The parameters of a hard disk change slowly, so acquiring the SMART information periodically rather than continuously is sufficient and saves energy.
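A periodic acquisition loop might look like the sketch below. The function `poll_smart` and its parameters are hypothetical; in particular, `read_smart` stands in for whatever mechanism actually returns the SMART parameters (the text obtains them through the disk array controller), and the polling interval is not specified by the source.

```python
import time
from typing import Callable, Dict, Optional


def poll_smart(
    read_smart: Callable[[], Dict[str, float]],
    handle: Callable[[Dict[str, float]], None],
    interval_seconds: float,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read SMART data every `interval_seconds` and pass it to `handle`.

    `max_polls=None` means poll forever; `sleep` is injectable so the loop
    can be exercised without actually waiting.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        handle(read_smart())
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break  # no need to wait after the final poll
        sleep(interval_seconds)
    return polls
```

Because the parameters change slowly, a long interval (for example, hours rather than seconds) would be a reasonable starting point, though the appropriate value depends on the deployment.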
As a preferred embodiment, after generating the alarm information of the hard disk with the failure risk, the method further includes:
controlling an alarm module to raise an alarm for the hard disk that is at risk of failure.
When a hard disk is at risk of failure, alarm information indicating the risk is generated to inform operation and maintenance personnel. In addition, the alarm module can be controlled to raise an alarm for that hard disk, so that operation and maintenance personnel know exactly which hard disk is at risk, which facilitates subsequent handling of the hard disk.
As a preferred embodiment, deriving a composite score for a hard disk based on a plurality of parameters includes:
calculating the individual score of each parameter as A × C / B, wherein A is the current value of the parameter, B is the threshold corresponding to the parameter, and C is the weight corresponding to the parameter;
adding the individual scores of the plurality of parameters to obtain the composite score of the hard disk;
and generating the alarm information when the hard disk is determined to be at risk of failure based on the composite score includes:
if the composite score exceeds a preset score threshold, determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure.
Because each parameter affects the hard disk to a different degree, scoring must take into account not only the current value and the threshold of each parameter but also the weight corresponding to that parameter. The individual score of each parameter is A × C / B, and the composite score is the sum of the individual scores of all parameters.
Specifically, taking two parameters as an example: the current value of the first parameter is 50, its threshold is 55, and its weight is 70; the current value of the second parameter is 27, its threshold is 70, and its weight is 30. The individual score of the first parameter is 50 × 70 / 55 ≈ 63.6, the individual score of the second parameter is 27 × 30 / 70 ≈ 11.6, and the composite score of the hard disk is 63.6 + 11.6 = 75.2.
A score threshold is set, and if the composite score exceeds this threshold, the hard disk is determined to be at risk of failure. In the example above, if the score threshold is 60, the composite score of 75.2 exceeds it, so the hard disk is at risk of failure and alarm information indicating the risk needs to be generated.
In addition, the relationship between the composite score and the failure risk is not limited to the case in which a higher composite score indicates a greater failure risk; a scheme in which a lower composite score indicates a greater failure risk may also be used, which is not limited herein.
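The scoring rule and the worked example above can be reproduced with a short sketch. The function names are hypothetical, thresholds are assumed to be non-zero, and the code follows only the convention in which a higher composite score means a greater risk.

```python
from typing import Iterable, Tuple


def individual_score(current: float, threshold: float, weight: float) -> float:
    """Score of one parameter: A * C / B (threshold must be non-zero)."""
    return current * weight / threshold


def composite_score(params: Iterable[Tuple[float, float, float]]) -> float:
    """Sum of individual scores over (current, threshold, weight) triples."""
    return sum(individual_score(a, b, c) for a, b, c in params)


# Values from the worked example: (current value A, threshold B, weight C).
example = [(50, 55, 70), (27, 70, 30)]
total = composite_score(example)
print(f"{total:.1f}")  # 75.2

# With the example score threshold of 60, the disk is judged at risk.
at_risk = total > 60
```

Note that because each individual score is proportional to A / B, a parameter close to its threshold contributes nearly its full weight to the composite score.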
The invention also provides a hard disk fault early warning system, which comprises:
the information acquisition unit is used for acquiring SMART information of the hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
the fault judging unit is used for judging whether the current value of the hard disk parameter exceeds the threshold value of the hard disk parameter;
and the fault alarm unit is used for generating alarm information of the hard disk with the fault risk.
For a description of the hard disk failure early warning system, refer to the above embodiments; the details are not repeated here.
Fig. 2 is a schematic structural diagram of a BMC according to the present invention, which includes:
a memory 21 for storing a computer program;
and the processor 22 is used for implementing the steps of the hard disk failure early warning method when executing the computer program.
For a description of the BMC, refer to the above embodiments; the details are not repeated here.
The invention also provides a hard disk fault early warning device which comprises the BMC.
For a description of the hard disk failure early warning device, refer to the above embodiments; the details are not repeated here.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the hard disk failure warning method according to any one of claims 1 to 6.
For a description of the computer-readable storage medium, refer to the above embodiments; the details are not repeated here.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A hard disk fault early warning method, characterized by comprising the following steps:
acquiring SMART information of a hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
judging whether the current value of a hard disk parameter exceeds the threshold of that parameter;
if so, determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure;
if not, obtaining a composite score of the hard disk based on the plurality of parameters;
and, when the hard disk is determined to be at risk of failure based on the composite score, generating alarm information indicating that the hard disk is at risk of failure.
2. The hard disk fault early warning method of claim 1, wherein the plurality of parameters comprise at least two of a reallocated (remapped) sectors parameter, an uncorrectable errors parameter, a current pending sectors parameter, an offline uncorrectable sectors parameter, and a command timeout parameter.
3. The hard disk fault early warning method of claim 1, wherein obtaining SMART information of a hard disk comprises:
SMART information of the hard disk is obtained through the disk array controller.
4. The hard disk fault early warning method of claim 1, wherein obtaining SMART information of a hard disk comprises:
SMART information of the hard disk is acquired periodically.
5. The hard disk fault early warning method according to claim 1, wherein after generating the alarm information indicating that the hard disk is at risk of failure, the method further comprises:
controlling an alarm module to raise an alarm for the hard disk that is at risk of failure.
6. The hard disk fault early warning method of any one of claims 1 to 5, wherein obtaining a composite score of the hard disk based on the plurality of parameters comprises:
calculating an individual score of each parameter as A × C / B, wherein A is the current value of the parameter, B is the threshold corresponding to the parameter, and C is the weight corresponding to the parameter;
adding the individual scores of the parameters to obtain the composite score of the hard disk;
and generating the alarm information when the hard disk is determined to be at risk of failure based on the composite score comprises:
if the composite score exceeds a preset score threshold, determining that the hard disk is at risk of failure, and generating alarm information indicating that the hard disk is at risk of failure.
7. A hard disk fault early warning system is characterized by comprising:
the information acquisition unit is used for acquiring SMART information of the hard disk, wherein the SMART information comprises a plurality of parameters of the hard disk;
the fault judging unit is used for judging whether the current value of the hard disk parameter exceeds the threshold value of the hard disk parameter;
and the fault alarm unit is used for generating alarm information of the hard disk with the fault risk.
8. A BMC, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the hard disk failure warning method according to any one of claims 1 to 6 when executing the computer program.
9. A hard disk failure early warning device, comprising the BMC of claim 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the hard disk failure warning method according to any one of claims 1 to 6.
CN202111632373.7A 2021-12-28 2021-12-28 Hard disk fault early warning method and related components Pending CN114328141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632373.7A CN114328141A (en) 2021-12-28 2021-12-28 Hard disk fault early warning method and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111632373.7A CN114328141A (en) 2021-12-28 2021-12-28 Hard disk fault early warning method and related components

Publications (1)

Publication Number Publication Date
CN114328141A true CN114328141A (en) 2022-04-12

Family

ID=81015481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632373.7A Pending CN114328141A (en) 2021-12-28 2021-12-28 Hard disk fault early warning method and related components

Country Status (1)

Country Link
CN (1) CN114328141A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362452A (en) * 2023-03-15 2023-06-30 东莞先知大数据有限公司 Grinder fault early warning method, grinder fault early warning device and storage medium
CN116362452B (en) * 2023-03-15 2023-09-12 东莞先知大数据有限公司 Grinder fault early warning method, grinder fault early warning device and storage medium

Similar Documents

Publication Publication Date Title
US20230214568A1 (en) Detection method, system, electronic equipment, and storage medium of product test data
CN104778111A (en) Alarm method and alarm device
CN110968061B (en) Equipment fault early warning method and device, storage medium and computer equipment
CN107426033B (en) Method and device for predicting state of access terminal of Internet of things
CN111104283B (en) Fault detection method, device, equipment and medium of distributed storage system
CN114328141A (en) Hard disk fault early warning method and related components
CN104426696A (en) Fault processing method and device
CN115994044B (en) Database fault processing method and device based on monitoring service and distributed cluster
CN115718450A (en) Equipment wire-stopping monitoring method and device, electronic equipment and system
CN113487182B (en) Device health state evaluation method, device, computer device and medium
CN116820820A (en) Server fault monitoring method and system
CN114860487A (en) Memory fault identification method and memory fault isolation method
CN108899059B (en) Detection method and equipment for solid state disk
US7664797B1 (en) Method and apparatus for using statistical process control within a storage management system
CN114168435A (en) Alarm processing recommendation method, device, equipment and readable storage medium
CN107092551B (en) Server system performance optimization method and device
CN116541222A (en) Hard disk state data generation method, system, equipment and medium
CN108964992B (en) Node fault detection method and device and computer readable storage medium
US20120005426A1 (en) Storage device, controller of storage device, and control method of storage device
CN112160868B (en) Monitoring method, system, equipment and medium of variable pitch system
CN113722179B (en) Method, system and device for monitoring health state of magnetic disk
CN113808725B (en) Equipment early warning system and method
CN110289977B (en) Fault detection method, system, equipment and storage medium for logistics warehouse system
CN116412087A (en) Abnormality detection method and related device for wind generating set
CN111949485A (en) SAS port monitoring method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination