CN113553222B - Storage hard disk detection early warning method and system - Google Patents

Info

Publication number
CN113553222B
Authority
CN
China
Prior art keywords
hard disk
information
state information
service life
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110683186.5A
Other languages
Chinese (zh)
Other versions
CN113553222A (en)
Inventor
宋柏森
唐卓
纪军刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Zhengtong Cloud Calculating Co ltd
Shenzhen Zhengtong Electronics Co Ltd
Original Assignee
Changsha Zhengtong Cloud Calculating Co ltd
Shenzhen Zhengtong Electronics Co Ltd
Application filed by Changsha Zhengtong Cloud Calculating Co ltd, Shenzhen Zhengtong Electronics Co Ltd
Priority to CN202110683186.5A
Publication of CN113553222A
Application granted
Publication of CN113553222B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test input/output devices or peripheral units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a storage hard disk detection and early warning method and system. The method comprises the following steps: collecting physical hard disk state information; acquiring the detection types of the hard disk state information; acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold; obtaining the predicted service life of the hard disk based on a Bayesian prediction algorithm; and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold. After the detection type of the hard disk state information is obtained, the detection result is obtained according to that detection type, and the running state of the physical hard disks on the server is monitored through detection results of multiple types of hard disk state information. This helps keep the distributed storage system running stably and normally and reduces the impact of physical hard disk problems on the user's service system.

Description

Storage hard disk detection early warning method and system
Technical Field
The invention relates to the technical field of storage hard disk management, in particular to a storage hard disk detection early warning method and system.
Background
With the advent of the big data era, a large amount of data is generated every day and must be stored. Legacy centralized SAN storage can no longer meet users' requirements, which has led to distributed storage: capacity is not limited, nodes can be added continuously, and the nodes interact over a network. Distributed storage removes the bottlenecks in capacity and performance, but it also brings the weakest-link problem described by the barrel principle. In a distributed storage system, the client's IO requests are spread across all the physical hard disks of a node, so if one physical hard disk is a bad disk or a slow disk, every client access whose IO lands on that hard disk is affected, which gives the client a bad experience.
It is therefore desirable to actively discover slow disks and bad disks in a distributed storage system so as to reduce the impact on the customer's service system. In addition, data reliability in a distributed system relies on replicas or erasure codes across hosts for disaster tolerance; if one or more hard disks are damaged, data must be rebalanced between hosts to reconstruct the lost replica data. In a distributed storage system that communicates over a network, data reconstruction puts pressure on the network switches, and the physical hard disks that are reconstructing data may also be the hard disks serving the client's traffic, so this double pressure easily affects the client's service.
Therefore, how to learn as early as possible which physical hard disk is about to be damaged, and then intervene actively so that a slow disk or a bad disk does not affect the client's business system, has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Based on this, the invention aims to provide a storage hard disk detection and early warning method and system that can actively discover slow disks and bad disks, or predict the service life of a physical hard disk from its various parameters using a naive Bayes prediction method, so that feedback about impending damage to a physical hard disk is obtained early for active intervention, and the influence of a slow disk or a bad disk on the client's business system is avoided.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention provides a detection and early warning method for a storage hard disk, which comprises the following steps:
step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
step S130, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S140; if yes, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S150; if not, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
and step S170, finishing the hard disk detection early warning.
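The branching in steps S130 to S170 above can be sketched as a short function. This is a minimal illustration, not the patented implementation: the `DiskState` container, its field names and the `check_disk` function are hypothetical, and the predicted life is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskState:
    """Hypothetical container for the collected state of one physical disk (S110/S120)."""
    name: str
    has_damage_error: bool        # SMART error or bad track detection error
    io_rate_mb_s: float           # measured IO read-write rate
    predicted_life_months: float  # output of the Bayesian predictor (assumed given)


def check_disk(state: DiskState,
               rate_threshold_mb_s: float,
               life_threshold_months: float = 1.0) -> Optional[str]:
    """Return a warning reason (step S160) or None when the disk passes (step S170)."""
    if state.has_damage_error:                                   # S130
        return "damage information reports an error"
    if not state.io_rate_mb_s > rate_threshold_mb_s:             # S140
        return "IO read-write rate is not above the threshold"
    if not state.predicted_life_months > life_threshold_months:  # S150
        return "predicted service life is not above the threshold"
    return None


healthy = DiskState("sda", False, 150.0, 12.0)
slow = DiskState("sdb", False, 40.0, 12.0)
print(check_disk(healthy, rate_threshold_mb_s=80.0))
print(check_disk(slow, rate_threshold_mb_s=80.0))
```

Returning the first failing reason mirrors the flow in the steps: a disk that fails an earlier check goes straight to the warning step without the later checks being evaluated.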
In one embodiment, after step S150 (acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold; if yes, executing step S170; if not, executing step S160), the method further includes:
step S180, judging whether all the physical hard disks of the server have been detected; if yes, go to step S170; if not, go to step S110.
In one embodiment, the method for obtaining the predicted lifetime of the hard disk based on the bayesian prediction algorithm in step S150 includes the following steps:
forming training samples from the index information in the SMART information and the predicted service life of the hard disk, based on the naive Bayes prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$; wherein the items of SMART parameter information are defined as $F_1$ to $F_N$, and $N$ is the total number of items of SMART parameter information; the predicted service life of the hard disk is defined as the classification categories $C_1$ to $C_M$, and $M$ is the total number of time classes corresponding to the predicted service life of the hard disk;
simplifying the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$ to obtain the simplified naive Bayes prediction formula $P(C \mid F_1 F_2 \ldots F_N) \propto P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$;
acquiring the conditional probability of each SMART parameter information feature under each classification category through the formula $P(C \mid F_1 F_2 \ldots F_N) \propto P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$, based on the SMART parameter information of the physical hard disk combined with the acquired state information of other physical hard disks;
acquiring the predicted service life of the hard disk by the formula [formula image GDA0003577466170000031].
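The formula image referenced in the last step is not reproduced in this text. Because the denominator of the naive Bayes formula is the same for every classification category, one plausible reading (an assumption, not confirmed by the source) is the standard naive Bayes decision rule, which selects the life category with the largest score:

```latex
\hat{C} = \arg\max_{C_j \in \{C_1, \ldots, C_M\}} \; P(C_j) \prod_{i=1}^{N} P(F_i \mid C_j)
```

Here $\hat{C}$ is a symbol introduced for illustration; it denotes the predicted service life category of the hard disk.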
In one embodiment, step S140 (acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold; if yes, executing step S150; if not, executing step S160) includes the following steps:
acquiring IO read-write rate information of a physical hard disk on a server;
setting IO read-write rate reference values for the different types of hard disks, and acquiring a preset read-write rate threshold;
comparing the IO read-write rate information of the physical hard disk with the preset read-write rate threshold; if the rate is higher than the threshold, executing step S150; if not, executing step S160.
In one embodiment, the hard disk damage information includes whether the SMART information contains error information, or whether the hard disk bad track detection result contains error information.
A storage hard disk detection early warning method comprises the following steps:
step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
step S130, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S140; if not, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S150; if yes, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
and step S170, finishing the hard disk detection early warning.
In one embodiment, after step S150 (acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold; if yes, executing step S170; if not, executing step S160), the method further includes:
step S180, judging whether all physical hard disks of the server have been detected; if yes, go to step S170; if not, go to step S110.
In one embodiment, the hard disk damage information includes whether the SMART information contains error information, or whether the hard disk bad track detection result contains error information.
A storage hard disk detection early warning system, comprising:
the acquisition module is used for acquiring the state information of the physical hard disk;
the classification module is used for classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
the first detection result module is used for obtaining a hard disk state information detection result according to whether the hard disk damage information indicates an error;
the second detection result module is used for acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value or not;
and the third detection result module is used for acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value.
In one embodiment, the system further includes a determining module, configured to determine whether all the physical hard disks of the server have been detected.
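The module structure described above can be sketched as a small class that wires an acquisition callable to three detection callables. All names below (`EarlyWarningSystem`, the dictionary keys, the detector functions) are hypothetical and chosen only for illustration; the classification module is represented simply by the keys of the per-disk dictionary.

```python
from typing import Callable, Dict, List, Optional

DiskInfo = Dict[str, float]
Detector = Callable[[DiskInfo], Optional[str]]  # returns a warning reason or None


def first_detector(info: DiskInfo) -> Optional[str]:
    """First detection result module: hard disk damage information."""
    return "damage error" if info["damage_errors"] > 0 else None


def second_detector(info: DiskInfo) -> Optional[str]:
    """Second detection result module: IO read-write rate against its threshold."""
    return None if info["io_rate"] > info["rate_threshold"] else "slow disk"


def third_detector(info: DiskInfo) -> Optional[str]:
    """Third detection result module: predicted service life against its threshold."""
    ok = info["predicted_life_months"] > info["life_threshold_months"]
    return None if ok else "short predicted life"


class EarlyWarningSystem:
    def __init__(self, acquire: Callable[[], Dict[str, DiskInfo]],
                 detectors: List[Detector]) -> None:
        self.acquire = acquire      # acquisition module
        self.detectors = detectors  # detection result modules, in order

    def run(self) -> Dict[str, Optional[str]]:
        results: Dict[str, Optional[str]] = {}
        # Determining module: the loop ends once every disk has been detected.
        for disk, info in self.acquire().items():
            reasons = (detect(info) for detect in self.detectors)
            results[disk] = next((r for r in reasons if r is not None), None)
        return results


def fake_acquire() -> Dict[str, DiskInfo]:
    base = {"damage_errors": 0, "io_rate": 150.0, "rate_threshold": 80.0,
            "predicted_life_months": 12.0, "life_threshold_months": 1.0}
    return {"sda": dict(base), "sdb": dict(base, io_rate=30.0)}


system = EarlyWarningSystem(fake_acquire, [first_detector, second_detector, third_detector])
print(system.run())
```

Passing the detectors as an ordered list keeps the modules independent of one another, so the order of the checks can be changed without touching the modules themselves.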
In summary, the storage hard disk detection and early warning method and system provided by the invention acquire the detection types of the hard disk state information, acquire the detection results according to those detection types, and monitor the operation state of the physical hard disks on the server through the detection results of multiple types of hard disk state information. This facilitates the processing and early warning of common hard disk faults, helps ensure the stable and normal operation of the distributed storage system, reduces the impact of physical hard disk problems on the user's service system, and improves the stability and reliability of the distributed storage system.
Drawings
Fig. 1 is a schematic flowchart of a first storage hard disk detection and early warning method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a second method for detecting and warning a storage hard disk according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a third method for detecting and warning a storage hard disk according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a fourth method for detecting and warning a storage hard disk according to an embodiment of the present invention;
Fig. 5 is a block diagram of a detection and early warning system for a storage hard disk according to an embodiment of the present invention;
Fig. 6 is a block diagram of a detection and warning system for a storage hard disk according to another embodiment of the present invention.
Detailed Description
For further understanding of the features and technical means of the present invention, as well as the specific objects and functions attained by the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description.
Fig. 1 is a schematic flow diagram of a first storage hard disk detection and early warning method provided in an embodiment of the present invention, and as shown in fig. 1, the storage hard disk detection and early warning method specifically includes the following steps:
Step S110, collecting physical hard disk state information. The physical hard disk state information includes, but is not limited to, SMART information, IO read-write rate information of the hard disk and the predicted service life of the hard disk. The SMART information includes, but is not limited to, parameter information such as hard disk bad track analysis information within a preset time period, and the read error rate, power-on count, reallocated sector count, spin retry count and parity check error rate detected by SMART when an error occurs. In this embodiment, the hard disk bad track analysis information within the preset time period is bad track analysis information covering a one-month interval; this is a known technique and is not described again here.
The distributed storage system is composed of a plurality of servers connected over a network, and each server can be provided with a plurality of physical hard disks as needed. In this embodiment, 10 to 20 physical hard disks are mounted on each server, and the state information of the physical hard disks of each server needs to be collected and stored.
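As an illustration of the collection step, the sketch below extracts SMART attribute values from a JSON report. The source does not say how SMART information is read; the report layout here (an `ata_smart_attributes.table` list with `name` and `raw.value` fields) is an assumption modelled on the JSON output of smartmontools and should be checked against the tool actually used. The function name and the sample data are hypothetical.

```python
import json
from typing import Dict


def parse_smart_attributes(report_json: str) -> Dict[str, int]:
    """Map SMART attribute names to raw values from a JSON report (assumed layout)."""
    report = json.loads(report_json)
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {row["name"]: int(row["raw"]["value"]) for row in table}


# Hypothetical sample report for one disk; the values are made up for illustration.
SAMPLE_REPORT = json.dumps({
    "device": {"name": "/dev/sda"},
    "ata_smart_attributes": {
        "table": [
            {"id": 1, "name": "Raw_Read_Error_Rate", "raw": {"value": 0}},
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
            {"id": 10, "name": "Spin_Retry_Count", "raw": {"value": 0}},
        ]
    },
})

attributes = parse_smart_attributes(SAMPLE_REPORT)
print(attributes["Reallocated_Sector_Ct"])
```

A report without an attribute table yields an empty dictionary, which a caller could treat as missing state information for that disk.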
Step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk.
Specifically, when the detection type is hard disk damage information, the hard disk damage information includes whether the SMART information contains error information, or whether the hard disk bad track detection result contains error information. When the detection type is IO read-write rate information of the hard disk, whether the hard disk is a slow disk is judged from that rate information. A slow disk is a hard disk that reads and writes normally but whose read-write speed is much worse than that of hard disks of the same kind; such a hard disk degrades the read-write IO performance of the whole distributed storage system.
Step S130, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S140; if yes, step S160 is executed to notify the operation and maintenance manager to handle the hard disk that has a bad track or a SMART information error.
Specifically, the hard disk damage information further includes position information of the hard disk, and the operation and maintenance manager can quickly acquire the server corresponding to the failed hard disk and the corresponding physical hard disk through the hard disk failure warning information.
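A warning that carries position information could look like the following sketch. The source only states that the position of the hard disk is included; the field names (`server`, `slot`, `device`, `reason`) and the JSON encoding are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DiskFailureWarning:
    """Hypothetical warning payload sent in step S160."""
    server: str   # server that hosts the failed hard disk
    slot: int     # physical slot of the hard disk on that server
    device: str   # device name of the hard disk
    reason: str   # which check failed


def format_warning(warning: DiskFailureWarning) -> str:
    """Serialize the warning so it can be delivered to the operation and maintenance manager."""
    return json.dumps(asdict(warning), sort_keys=True)


message = format_warning(DiskFailureWarning("node-03", 7, "/dev/sdh", "bad track detected"))
print(message)
```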
Step S140, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S150; if not, step S160 is executed to notify the operation and maintenance manager to handle the hard disk with poor read-write speed.
Step S140 (acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold; if yes, executing step S150; if not, executing step S160) includes the following steps:
Step S141, collecting IO read-write rate information of a physical hard disk on a server; in this embodiment, each physical hard disk used by the server has a corresponding hard disk model, and the hard disk model corresponds to IO read-write performance data of the hard disk.
Step S142, setting IO read-write rate reference values for the different types of hard disks, and acquiring a preset read-write rate threshold; in this embodiment, the IO read-write rate reference value of a hard disk is 40% of the maximum IO read-write rate of that hard disk.
Step S143, comparing the IO read-write rate information of the physical hard disk with the preset read-write rate threshold; if the rate is higher than the threshold, executing step S150; if not, step S160 is executed to notify the operation and maintenance manager to handle the hard disk with poor read-write speed.
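Steps S141 to S143 can be sketched as a lookup of the per-model maximum rate followed by a comparison. The 40% fraction comes from this embodiment; the model names, the maximum rates and the function names are hypothetical, and the sketch assumes the reference value is used directly as the preset threshold.

```python
from typing import Dict

# Hypothetical maximum IO read-write rates (MB/s) per hard disk model.
MAX_RATE_BY_MODEL: Dict[str, float] = {
    "HDD-7200-A": 200.0,
    "SSD-SATA-B": 500.0,
}

REFERENCE_FRACTION = 0.40  # this embodiment: 40% of the maximum IO read-write rate


def rate_threshold(model: str) -> float:
    """S142: derive the preset threshold from the model's maximum rate."""
    return MAX_RATE_BY_MODEL[model] * REFERENCE_FRACTION


def is_slow_disk(model: str, measured_rate: float) -> bool:
    """S143: a disk whose rate is not higher than the threshold is treated as slow."""
    return not measured_rate > rate_threshold(model)


print(is_slow_disk("HDD-7200-A", 60.0))   # below 40% of 200 MB/s
print(is_slow_disk("HDD-7200-A", 120.0))  # above 40% of 200 MB/s
```

Keying the threshold on the hard disk model means an SSD and a mechanical disk are each compared against their own expected performance rather than a single global number.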
In the invention, after the detection type of the hard disk state information is obtained, the detection result is obtained according to that detection type, and the subsequent steps are then executed. The operation state of the physical hard disks on the server is monitored through the detection results of multiple types of hard disk state information, which facilitates the processing and early warning of common hard disk faults, helps ensure the stable and normal operation of the distributed storage system, reduces the impact of physical hard disk problems on the user's service system, and improves the stability and reliability of the distributed storage system.
Step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold; if yes, go to step S170; if not, executing step S160 to notify the operation and maintenance manager. The preset service life threshold may be set by the user as needed; in this embodiment, it is set to 1 month.
Specifically, the predicted service life of the hard disk comprises the remaining service life information of the hard disk. Whether the hard disk is about to be damaged is judged from the predicted service life of the hard disk, so that active intervention can be performed and the impact of hard disk damage on the client's service system is avoided.
In one embodiment, the method for obtaining the predicted lifetime of the hard disk based on the bayesian prediction algorithm in step S150 includes the following steps:
Step S151, collecting the index information in the SMART information and the predicted service life of the hard disk, based on the naive Bayes prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$, and forming training samples; wherein the items of SMART parameter information are defined as $F_1$ to $F_N$, and $N$ is the total number of items of SMART parameter information; the predicted service life of the hard disk is defined as the classification categories $C_1$ to $C_M$, and $M$ is the total number of time classes corresponding to the predicted service life of the hard disk.
In this embodiment, each item of SMART index information of the hard disk, such as the read error rate, power-on count, reallocated sector count, spin retry count and parity check error rate detected by SMART when an error occurs, can be used in the training samples. These items of SMART parameter information are defined as $F_1$ to $F_N$, where $N$ is the total number of items of SMART parameter information. The predicted service life of the hard disk is defined as the classification categories $C_1$ to $C_M$, where $M$ is the total number of time classes corresponding to the predicted service life; in this embodiment, the predicted service life is classified by the number of months of remaining service life of the hard disk.
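One way to form such a training sample is to discretize each SMART value into a bucket and to use the whole number of remaining months as the class label. The source does not describe how the values are discretized, so the bucket edges, the feature subset, the cap on the month label and all names below are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of SMART parameters used as features F1..FN.
FEATURES: List[str] = ["read_error_rate", "reallocated_sectors", "spin_retry_count"]


def bucket(value: float, edges: List[float]) -> int:
    """Return the index of the first edge that the value does not exceed."""
    for index, edge in enumerate(edges):
        if value <= edge:
            return index
    return len(edges)


def make_sample(smart: Dict[str, float],
                months_remaining: float,
                edges: Dict[str, List[float]],
                max_class: int = 12) -> Tuple[Tuple[int, ...], int]:
    """Build one (features, class) pair; the class is remaining life in whole months."""
    features = tuple(bucket(smart[name], edges[name]) for name in FEATURES)
    label = min(int(months_remaining), max_class)
    return features, label


EDGES = {
    "read_error_rate": [0, 10],
    "reallocated_sectors": [0, 5, 50],
    "spin_retry_count": [0],
}
sample = make_sample(
    {"read_error_rate": 3, "reallocated_sectors": 60, "spin_retry_count": 0},
    months_remaining=2.7,
    edges=EDGES,
)
print(sample)
```

Discretizing the raw values keeps the number of distinct feature values small, which makes the counting-based probability estimates in the later steps workable with a modest number of disks.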
Step S152, simplifying the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$ to obtain the simplified naive Bayes prediction formula $P(C \mid F_1 F_2 \ldots F_N) \propto P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$. Specifically, the denominator $P(F_1 F_2 \ldots F_N)$ is constant for all classification categories, so only the numerator needs to be maximized; and because the SMART parameter information features are treated as independent of each other, $P(F_1 F_2 \ldots F_N \mid C)$ factors into $P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)$, which gives the simplified formula.
Step S153, based on the SMART parameter information of the physical hard disk combined with the acquired state information of other physical hard disks, obtaining through the formula $P(C \mid F_1 F_2 \ldots F_N) \propto P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$ the conditional probability of each SMART parameter information feature under each classification category, namely $P(F_1 \mid C_1), P(F_2 \mid C_1), \ldots, P(F_N \mid C_1)$; $P(F_1 \mid C_2), P(F_2 \mid C_2), \ldots, P(F_N \mid C_2)$; $\ldots$; $P(F_1 \mid C_M), P(F_2 \mid C_M), \ldots, P(F_N \mid C_M)$.
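The conditional probabilities listed above can be estimated by counting over the training samples. The sketch below assumes discrete feature values and adds Laplace smoothing (`alpha`), which the source does not mention; it is included only so that a feature value never seen in a category does not get probability zero. The function and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (discrete feature values, life category)


def fit(samples: List[Sample], n_values: List[int], alpha: float = 1.0):
    """Estimate P(C) and P(F_i = v | C) by counting, with Laplace smoothing alpha."""
    class_counts = Counter(label for _, label in samples)
    total = len(samples)
    priors: Dict[int, float] = {c: n / total for c, n in class_counts.items()}

    # (feature index, category) -> how often each feature value was observed
    value_counts = defaultdict(Counter)
    for features, label in samples:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1

    cond: Dict[Tuple[int, int, int], float] = {}
    for c, n_c in class_counts.items():
        for i, k in enumerate(n_values):  # k = number of possible values of feature i
            for v in range(k):
                cond[(i, v, c)] = (value_counts[(i, c)][v] + alpha) / (n_c + alpha * k)
    return priors, cond


# Four toy samples with one binary feature and life categories of 1 and 6 months.
toy = [((0,), 1), ((0,), 1), ((1,), 1), ((1,), 6)]
priors, cond = fit(toy, n_values=[2])
print(priors)
print(cond[(0, 0, 1)], cond[(0, 1, 6)])
```

For each feature and category the smoothed estimates still sum to one over the possible values, so they remain valid conditional distributions.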
Step S154, acquiring the predicted service life of the hard disk by the formula [formula image GDA0003577466170000101].
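The formula image for this step is not reproduced in this text. Assuming it is the usual naive Bayes decision rule (pick the category that maximizes the prior times the product of the conditional probabilities), the prediction and the comparison with the life threshold can be sketched as follows. The probability tables are toy values, the categories are assumed to be months of remaining life, and the function names are hypothetical; sums of logarithms are used instead of products to avoid numerical underflow.

```python
import math
from typing import Dict, Optional, Tuple


def predict_life_class(features: Tuple[int, ...],
                       priors: Dict[int, float],
                       cond: Dict[Tuple[int, int, int], float]) -> Optional[int]:
    """Return the category C maximizing log P(C) + sum_i log P(F_i | C)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        score += sum(math.log(cond[(i, v, c)]) for i, v in enumerate(features))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def needs_warning(predicted_months: int, life_threshold_months: int = 1) -> bool:
    """S150: warn when the predicted life is not higher than the preset threshold."""
    return not predicted_months > life_threshold_months


# Toy tables: one binary feature, categories of 1 and 6 months of remaining life.
PRIORS = {1: 0.5, 6: 0.5}
COND = {(0, 0, 1): 0.2, (0, 1, 1): 0.8, (0, 0, 6): 0.9, (0, 1, 6): 0.1}

risky = predict_life_class((1,), PRIORS, COND)
print(risky, needs_warning(risky))
```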
Step S160, sending hard disk failure warning information.
Step S170, finishing the hard disk detection early warning.
In one embodiment, Fig. 2 is a schematic flowchart of a second method for detecting and warning a storage hard disk according to an embodiment of the present invention. As shown in Fig. 2, after step S150 (acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold; if yes, executing step S170; if not, executing step S160), the method further includes:
step S180, judging whether all physical hard disks of the server have been detected; if yes, go to step S170; if not, go to step S110.
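The loop implied by step S180 can be sketched as an iteration over every physical hard disk of a server. The per-disk check and the warning sender are passed in as callables; their names and signatures are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def scan_server(disks: Iterable[str],
                check: Callable[[str], Optional[str]],
                send_warning: Callable[[str, str], None]) -> int:
    """Check every disk of one server and return the number of warnings sent."""
    warnings = 0
    for disk in disks:                  # S180: repeat until all disks are detected
        reason = check(disk)            # S110 to S150 for this disk
        if reason is not None:
            send_warning(disk, reason)  # S160
            warnings += 1
    return warnings                     # S170: detection finished


sent: List[Tuple[str, str]] = []
count = scan_server(
    ["sda", "sdb", "sdc"],
    check=lambda disk: "slow disk" if disk == "sdb" else None,
    send_warning=lambda disk, reason: sent.append((disk, reason)),
)
print(count, sent)
```

In this sketch a warning for one disk does not stop the scan, so the remaining disks of the server are still checked in the same pass.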
In order to make the technical solution of the present invention clearer, the following describes a preferred embodiment.
Step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
step S130, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S140; if yes, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S150; if not, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
step S170, finishing the hard disk detection early warning;
step S180, judging whether all the physical hard disks of the server are detected completely; if yes, go to step S170; if not, go to step S110.
In the invention, after the detection type of the hard disk state information is obtained, the detection result is obtained according to that detection type, and the subsequent steps are then executed; step S140 may also be placed before step S130. The operation state of the physical hard disks on the server is monitored through the detection results of multiple types of hard disk state information, which facilitates the processing and early warning of common hard disk faults, helps ensure the stable and normal operation of the distributed storage system, reduces the impact of physical hard disk problems on the user's service system, and improves the stability and reliability of the distributed storage system.
Fig. 3 is a schematic flow chart of a third method for detecting and warning a storage hard disk according to an embodiment of the present invention, and as shown in fig. 3, the method for detecting and warning a storage hard disk specifically includes the following steps:
step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
specifically, the hard disk damage information includes whether SMART information has error information or not, or whether error information exists in a result of hard disk bad track detection or not.
Step S130, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S140; if not, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S150; if yes, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
and step S170, finishing the hard disk detection early warning.
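The ordering of checks in steps S130 to S170 can be illustrated with a minimal sketch. The `DiskState` container, the `check_disk` function, the units (MB/s and days) and the `predict_life_days` callback are all hypothetical names introduced for illustration; the source only fixes the order of the three checks and their yes/no branches.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DiskState:
    """Hypothetical container for the state collected in step S110."""
    io_rate_mb_s: float      # measured IO read-write rate (unit is an assumption)
    smart_has_error: bool    # SMART information contains error information
    bad_track_found: bool    # bad track detection result contains error information


def check_disk(
    state: DiskState,
    rate_threshold_mb_s: float,
    life_threshold_days: float,
    predict_life_days: Callable[[DiskState], float],
) -> str:
    """Return "warn" (step S160) or "done" (step S170) for one disk."""
    # Step S130: the IO read-write rate must be higher than the preset threshold.
    if not state.io_rate_mb_s > rate_threshold_mb_s:
        return "warn"
    # Step S140: any error in the hard disk damage information triggers a warning.
    if state.smart_has_error or state.bad_track_found:
        return "warn"
    # Step S150: the predicted life must be higher than the preset life threshold.
    if predict_life_days(state) > life_threshold_days:
        return "done"
    return "warn"
```

Passing the life predictor as a callback keeps the flow independent of how the Bayesian prediction itself is implemented.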
Fig. 4 is a schematic flowchart of a fourth method for detecting and warning a storage hard disk according to an embodiment of the present invention. As shown in fig. 4, step S150 acquires the predicted life of the hard disk based on a Bayesian prediction algorithm and acquires a hard disk state information detection result according to whether the predicted life of the hard disk is higher than a preset life threshold; if yes, go to step S170; if not, go to step S160. After step S160 is executed, the method further includes:
step S180, judging whether all physical hard disks of the server are detected completely; if yes, go to step S170; if not, go to step S110.
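Step S180 amounts to repeating the per-disk check until every physical hard disk of the server has been examined. The sketch below is one possible reading, not the patented implementation: `scan_server`, `check_one` and `send_warning` are hypothetical names, and the loop is assumed to move on to the next disk whether or not a warning was sent, although the text only places step S180 after step S160.

```python
from typing import Callable, Iterable, List


def scan_server(
    disks: Iterable[str],
    check_one: Callable[[str], bool],
    send_warning: Callable[[str], None],
) -> List[str]:
    """Check every disk in turn and return the disks that raised a warning."""
    warned = []
    for disk in disks:            # step S180: continue until no disk is left
        if not check_one(disk):   # steps S110 to S150 for this disk
            send_warning(disk)    # step S160
            warned.append(disk)
    return warned                 # step S170: detection finished
```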
Fig. 5 shows a block diagram of a detection and early warning system for a storage hard disk according to the present invention. As shown in fig. 5, corresponding to the above detection and early warning method for a storage hard disk, the present invention further provides a detection and early warning system for a storage hard disk, which includes modules for executing the method and can be deployed on a cloud platform. After obtaining the detection type of the hard disk state information, the system obtains the detection result of the hard disk state information according to that detection type, and monitors the operating state of the physical hard disks on the server through the detection results of multiple types of hard disk state information. This makes it convenient to handle and give early warning of common hard disk faults, helps ensure the stable and normal operation of the distributed storage system, reduces the impact of physical hard disk problems on the user's service system, and improves the stability and reliability of the distributed storage system.
Specifically, referring to fig. 5, the system for detecting and warning a hard disk includes an acquisition module 110, a classification module 120, a first detection result module 130, a second detection result module 140, and a third detection result module 150.
The acquisition module 110 is used for acquiring the state information of the physical hard disk;
the classification module 120 is configured to classify the collected physical hard disk state information to obtain a hard disk state information detection type; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
a first detection result module 130, configured to obtain a hard disk state information detection result according to whether the hard disk damage information indicates an error;
the second detection result module 140 is configured to obtain a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold;
and a third detection result module 150, configured to obtain a predicted lifetime of the hard disk based on a bayesian prediction algorithm, and obtain a detection result of the state information of the hard disk according to whether the predicted lifetime of the hard disk is higher than a preset lifetime threshold.
Fig. 6 shows a block diagram of another embodiment of the detection and early warning system for a storage hard disk provided by the present invention. As shown in fig. 6, this embodiment adds a determination module 160 to the detection and early warning system described above; the determination module 160 is used for determining whether all physical hard disks of the server have been detected.
According to the detection and early warning system for the storage hard disk, the detection types of the hard disk state information are obtained, the detection results of the hard disk state information are obtained according to the detection types of the hard disk state information, the running state of the physical hard disk on the server is monitored through the detection results of the hard disk state information of multiple types, so that common faults of the hard disk can be conveniently processed and early warned, the stable and normal running of the distributed storage system can be well guaranteed, the influence on a service system of a user due to the problem of the physical hard disk is reduced, and the stability and the reliability of the distributed storage system are improved.
It should be noted that, as can be clearly understood by those skilled in the art, the specific implementation process of the detection and early warning system for a storage hard disk and each module may refer to the corresponding description in the foregoing method embodiment, and for convenience and conciseness of description, no further description is provided herein.
In summary, according to the detection and early warning method and system for the storage hard disk, the detection type of the hard disk state information is acquired, the detection result of the hard disk state information is acquired according to the detection type of the hard disk state information, and the operation state of the physical hard disk on the server is monitored through the detection results of the hard disk state information of multiple types, so that common faults of the hard disk can be processed and early warned conveniently, the stable and normal operation of the distributed storage system can be well guaranteed, the influence on a service system of a user due to the problem of the physical hard disk is reduced, and the stability and reliability of the distributed storage system are improved.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that, to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the system embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be reordered, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
The above examples merely illustrate several embodiments of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (9)

1. A detection and early warning method for a storage hard disk is characterized by comprising the following steps,
step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
step S130, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S140; if yes, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S150; if not, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
step S170, finishing the hard disk detection early warning;
in step S150, the method for obtaining the predicted lifetime of the hard disk based on the bayesian prediction algorithm includes the following steps:
acquiring index information in the SMART information and the predicted service life of the hard disk based on the naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$, and forming a training sample; wherein the items of SMART parameter information are respectively defined as $F_1$ to $F_N$, N being the total number of items of the SMART parameter information; the predicted service life of the hard disk, as the classification category, is respectively defined as $C_1$ to $C_M$, M being the total number of time classification items corresponding to the predicted service life of the hard disk;
processing the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$ to obtain the simplified naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$;
acquiring, through the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$, the conditional probability of each SMART parameter information feature under each classification category, based on the SMART parameter information of the physical hard disk combined with the acquired state information of other physical hard disks;
and acquiring the predicted service life of the hard disk by the formula shown in Figure FDA0003577466160000021.
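The naive Bayes steps above can be sketched as a small categorical classifier. This is an illustrative sketch under stated assumptions, not the patented implementation: the SMART values are assumed to be discretised into categories beforehand, Laplace (add-one) smoothing is added to avoid zero probabilities, and the final decision rule is assumed to pick the life class with the highest score, since the closing formula of the claim is only available as an image. The names `train` and `predict` are hypothetical. The denominator $P(F_1 F_2 \ldots F_N)$ is the same for every class, so omitting it does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# One training sample: (discretised SMART values F1..FN, life class C)
Sample = Tuple[Sequence[Hashable], str]


def train(samples: List[Sample]):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(label for _, label in samples)
    # value_counts[(feature index, class)][value] = number of occurrences
    value_counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    # distinct values seen for each feature, used by the smoothing term
    vocab: Dict[int, set] = defaultdict(set)
    for features, label in samples:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, features: Sequence[Hashable]) -> str:
    """Return the class C with the largest P(C) * P(F1|C) * ... * P(FN|C)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log P(C)
        for i, value in enumerate(features):
            count = value_counts[(i, label)][value]
            # Laplace smoothing (an assumption, not taken from the source)
            score += math.log((count + 1) / (n_label + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space turns the product of conditional probabilities into a sum, which avoids numerical underflow when N is large.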
2. The method for detecting and warning the storage hard disk according to claim 1, wherein in the step S150, the predicted service life of the hard disk is obtained based on a Bayesian prediction algorithm, and the detection result of the state information of the hard disk is obtained according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, after step S160 is executed, the method further includes:
step S180, judging whether all physical hard disks of the server are detected completely; if yes, go to step S170; if not, go to step S110.
3. The method for detecting and warning a storage hard disk according to claim 1 or 2, wherein step S140, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold, if yes, go to step S150, and if not, go to step S160, includes the following steps:
acquiring IO read-write rate information of a physical hard disk on a server;
setting IO read-write rate reference values of hard disks corresponding to different types of hard disks, and acquiring a preset read-write rate threshold;
comparing the IO read-write rate information of the physical hard disk with the preset read-write rate threshold; if the rate is higher than the threshold, executing step S150; if not, executing step S160.
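The per-type reference values and the threshold comparison can be sketched as follows. The disk type names, the reference rates in MB/s and the rule of taking a fixed fraction of the reference value as the threshold are all assumptions made for illustration; the source only states that reference values are set per hard disk type and that a preset threshold is obtained from them.

```python
# Hypothetical reference rates in MB/s; real values depend on the hardware.
REFERENCE_RATE_MB_S = {"hdd": 150.0, "sata_ssd": 500.0, "nvme_ssd": 2000.0}


def rate_threshold(disk_type: str, fraction: float = 0.5) -> float:
    """Derive the preset threshold as a fraction of the type's reference rate."""
    return REFERENCE_RATE_MB_S[disk_type] * fraction


def io_rate_ok(disk_type: str, measured_mb_s: float, fraction: float = 0.5) -> bool:
    """True: continue to step S150. False: send the warning of step S160."""
    return measured_mb_s > rate_threshold(disk_type, fraction)
```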
4. The storage hard disk detection early warning method according to claim 1 or 2, characterized in that: the hard disk damage information indicates whether the SMART information contains error information, or whether the result of hard disk bad track detection contains error information.
5. A detection and early warning method for a storage hard disk is characterized by comprising the following steps,
step S110, collecting physical hard disk state information;
step S120, classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
step S130, acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value; if yes, go to step S140; if not, go to step S160;
step S140, acquiring a hard disk state information detection result according to whether the hard disk damage information indicates an error; if not, go to step S150; if yes, go to step S160;
step S150, acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm, and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, go to step S160;
step S160, sending hard disk failure warning information;
step S170, finishing the hard disk detection early warning;
in step S150, the method for obtaining the predicted lifetime of the hard disk based on the bayesian prediction algorithm includes the following steps:
collecting index information in the SMART information and the predicted service life of the hard disk based on the naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$, and forming a training sample; wherein the items of SMART parameter information are respectively defined as $F_1$ to $F_N$, N being the total number of items of the SMART parameter information; the predicted service life of the hard disk, as the classification category, is respectively defined as $C_1$ to $C_M$, M being the total number of time classification items corresponding to the predicted service life of the hard disk;
processing the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$ to obtain the simplified naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$;
acquiring, through the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$, the conditional probability of each SMART parameter information feature under each classification category, based on the SMART parameter information of the physical hard disk combined with the acquired state information of other physical hard disks;
and acquiring the predicted service life of the hard disk by the formula shown in Figure FDA0003577466160000041.
6. The method for detecting and warning the storage hard disk according to claim 5, wherein in the step S150, the predicted service life of the hard disk is obtained based on a Bayesian prediction algorithm, and the detection result of the state information of the hard disk is obtained according to whether the predicted service life of the hard disk is higher than a preset service life threshold value; if yes, go to step S170; if not, after step S160 is executed, the method further includes:
step S180, judging whether all physical hard disks of the server are detected completely; if yes, go to step S170; if not, go to step S110.
7. The storage hard disk detection early warning method according to claim 5, characterized in that: the hard disk damage information indicates whether the SMART information contains error information, or whether the result of hard disk bad track detection contains error information.
8. A detection and early warning system for a storage hard disk is characterized by comprising,
the acquisition module is used for acquiring the state information of the physical hard disk;
the classification module is used for classifying the collected physical hard disk state information to obtain the detection type of the hard disk state information; the detection types of the hard disk state information comprise hard disk damage information and IO read-write rate information of the hard disk;
the first detection result module is used for obtaining a hard disk state information detection result according to whether the hard disk damage information indicates an error;
the second detection result module is used for acquiring a hard disk state information detection result according to whether the IO read-write rate information of the hard disk is higher than a preset read-write rate threshold value or not;
the third detection result module is used for acquiring the predicted service life of the hard disk based on a Bayesian prediction algorithm and acquiring a hard disk state information detection result according to whether the predicted service life of the hard disk is higher than a preset service life threshold value;
wherein the method for obtaining the predicted service life of the hard disk based on the Bayesian prediction algorithm comprises the following steps:
acquiring index information in the SMART information and the predicted service life of the hard disk based on the naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$, and forming a training sample; wherein the items of SMART parameter information are respectively defined as $F_1$ to $F_N$, N being the total number of items of the SMART parameter information; the predicted service life of the hard disk, as the classification category, is respectively defined as $C_1$ to $C_M$, M being the total number of time classification items corresponding to the predicted service life of the hard disk;
processing the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 F_2 \ldots F_N \mid C)\,P(C) / P(F_1 F_2 \ldots F_N)$ to obtain the simplified naive Bayesian prediction formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$;
acquiring, through the formula $P(C \mid F_1 F_2 \ldots F_N) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_N \mid C)\,P(C)$, the conditional probability of each SMART parameter information feature under each classification category, based on the SMART parameter information of the physical hard disk combined with the acquired state information of other physical hard disks;
and acquiring the predicted service life of the hard disk by the formula shown in Figure FDA0003577466160000051.
9. The storage hard disk detection and early warning system of claim 8, wherein: the system also comprises a judging module used for judging whether all the physical hard disks of the server are detected completely.
CN202110683186.5A 2021-06-21 2021-06-21 Storage hard disk detection early warning method and system Active CN113553222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110683186.5A CN113553222B (en) 2021-06-21 2021-06-21 Storage hard disk detection early warning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110683186.5A CN113553222B (en) 2021-06-21 2021-06-21 Storage hard disk detection early warning method and system

Publications (2)

Publication Number Publication Date
CN113553222A CN113553222A (en) 2021-10-26
CN113553222B CN113553222B (en) 2022-05-13

Family

ID=78130729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110683186.5A Active CN113553222B (en) 2021-06-21 2021-06-21 Storage hard disk detection early warning method and system

Country Status (1)

Country Link
CN (1) CN113553222B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647136A (en) * 2018-05-10 2018-10-12 南京道熵信息技术有限公司 Hard disk corruptions prediction technique and device based on SMART information and deep learning
CN110413227A (en) * 2019-06-22 2019-11-05 华中科技大学 A kind of remaining life on-line prediction method and system of hard disc apparatus
CN110413430A (en) * 2019-07-19 2019-11-05 苏州浪潮智能科技有限公司 A kind of life-span prediction method of solid state hard disk, device and equipment
CN111309502A (en) * 2020-02-16 2020-06-19 西安奥卡云数据科技有限公司 Solid state disk service life prediction method
CN112115004A (en) * 2020-07-29 2020-12-22 西安交通大学 Hard disk service life prediction method based on back propagation Bayes deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11579951B2 (en) * 2018-09-27 2023-02-14 Oracle International Corporation Disk drive failure prediction with neural networks
CN112446557B (en) * 2021-01-29 2021-05-07 北京蒙帕信创科技有限公司 Disk failure prediction evasion method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
On Predictability of System Anomalies in Real World;Yongmin Tan ET AL;《2010 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems》;20101231;第1-8页 *
基于深度学习的硬盘故障预测技术研究;康艳龙;《中国优秀硕士学位论文全文数据库(电子期刊)》;20200228;第29-39页 *

Also Published As

Publication number Publication date
CN113553222A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
Xu et al. Improving service availability of cloud systems by predicting disk error
US10147048B2 (en) Storage device lifetime monitoring system and storage device lifetime monitoring method thereof
CN110413227B (en) Method and system for predicting remaining service life of hard disk device on line
CN107025153B (en) Disk failure prediction method and device
US20150074467A1 (en) Method and System for Predicting Storage Device Failures
KR101948634B1 (en) Failure prediction method of system resource for smart computing
JP6581648B2 (en) Recovering cloud-based service availability from system failures
CN105335250B (en) A kind of data reconstruction method and device based on distributed file system
WO2022001125A1 (en) Method, system and device for predicting storage failure in storage system
CN109918313B (en) GBDT decision tree-based SaaS software performance fault diagnosis method
US11165668B2 (en) Quality assessment and decision recommendation for continuous deployment of cloud infrastructure components
US11363042B2 (en) Detection of anomalies in communities based on access patterns by users
CN112951311B (en) Hard disk fault prediction method and system based on variable weight random forest
CN111984511A (en) Multi-model disk fault prediction method and system based on two-classification
US20220358380A1 (en) Method for failure prediction and apparatus implementing the same method
Zhang et al. Multi-view feature-based SSD failure prediction: What, when, and why
CN113553222B (en) Storage hard disk detection early warning method and system
Pinciroli et al. The life and death of SSDs and HDDs: Similarities, differences, and prediction models
Fadaei Tehrani et al. A threshold sensitive failure prediction method using support vector machine
Agarwal et al. Discovering rules from disk events for predicting hard drive failures
CN111400122A (en) Hard disk health degree assessment method and device
KR102266416B1 (en) Method for failure prediction and apparatus implementing the same method
Bayram et al. Improving reliability with dynamic syndrome allocation in intelligent software defined data centers
CN113539352A (en) Solid state disk hidden fault detection method and related equipment
Rombach et al. SmartPred: Unsupervised hard disk failure detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant