CN108415810B - Hard disk state monitoring method and device - Google Patents

Hard disk state monitoring method and device

Info

Publication number
CN108415810B
Authority
CN
China
Prior art keywords
hard disk
neural network
recurrent neural
smart
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810212464.7A
Other languages
Chinese (zh)
Other versions
CN108415810A (en)
Inventor
包卫东
朱晓敏
王吉
周文
张耀鸿
陈超
马力
张国良
陈俊杰
杨骋
吴冠霖
韩浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201810212464.7A priority Critical patent/CN108415810B/en
Publication of CN108415810A publication Critical patent/CN108415810A/en
Application granted granted Critical
Publication of CN108415810B publication Critical patent/CN108415810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for monitoring the state of a hard disk. The method comprises: periodically acquiring different SMART attribute values of the monitored hard disk; and, each time the SMART attribute values of the hard disk are acquired, performing the following operations: obtaining the attribute integration of the hard disk from the currently acquired SMART attribute values; inputting the attribute integration into a recurrent neural network that incorporates gated recurrent units, and taking the hidden layer state of the recurrent neural network as output; and monitoring whether the hard disk is heading toward a fault state according to information obtained from the output of the recurrent neural network. The invention can trace a deviation of the hard disk drive from its normal state back to an earlier period, which helps improve the fault detection rate or the fault prediction capability.

Description

Hard disk state monitoring method and device
Technical Field
The invention relates to the technical field of hard disk fault monitoring, in particular to a hard disk state monitoring method and device.
Background
In the big data era, data centers equipped with large storage systems play an important role in storing and processing data. However, complex systems pose a serious problem of IT equipment failure, with hard disks being the most common failing components. While a single hard disk failure may be rare, stacking thousands of hard disks together amplifies the probability of failure, making failure events common rather than exceptional in data center storage systems. Considering the huge economic losses caused by data loss and service interruption, the hard disk reliability problem is one of the most concerned problems for data center administrators.
Currently, almost all hard disks are equipped with Self-Monitoring, Analysis and Reporting Technology (SMART), which detects and reports various indicators of drive reliability. Studies have shown that impending hard disk failures can be predicted from SMART data, and some hard disk manufacturers even build a SMART-based failure prediction model into their products. However, the built-in model provides only a basic threshold-based evaluation, which is rather weak at failure prediction. To improve failure prediction capability, researchers have proposed statistical and machine learning methods based on SMART data. While these approaches perform well in terms of fault detection rate and false alarm rate, some key challenges remain unsolved:
A failing hard disk usually degrades progressively from health to failure. Most existing methods, however, predict faults from a single time-stamped snapshot of SMART data, ignoring the time-dependent degradation process. Some methods, based on Markov models and simple recurrent neural networks (RNNs), attempt to capture the temporal dependencies in SMART data. However, limited by problems inherent in these models, such as vanishing and exploding gradients in RNNs, these methods can only capture short-term dependencies spanning a few days.
However, according to the observation and analysis of the present inventors, the deviation of some hard disk drives from their normal state can be traced back tens of days or even months, which greatly exceeds the capability of these methods.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for monitoring a status of a hard disk drive, which can trace back a deviation of a normal status of the hard disk drive to an earlier time, so as to improve a failure detection rate or a failure prediction capability.
The invention provides a hard disk state monitoring method based on the above purpose, which comprises the following steps:
periodically acquiring different SMART attribute values of the monitored hard disk; after the SMART attribute value of the hard disk is obtained each time, the following operations are carried out:
obtaining attribute integration of the hard disk according to the SMART attribute value of the hard disk which is obtained currently;
integrating and inputting the attribute into a recurrent neural network with a gated recurrent unit introduced, and taking the hidden layer state of the recurrent neural network as output;
and monitoring whether the hard disk has a fault state according to the information acquired from the output of the recurrent neural network.
Obtaining the attribute integration of the hard disk according to the currently acquired SMART attribute values of the hard disk specifically comprises:
from the SMART vector $s_t \in \mathbb{R}^{d_s}$ composed of the currently acquired SMART attribute values of the hard disk on day $t$, the resulting attribute integration, represented as $v_t \in \mathbb{R}^{d_v}$, is calculated according to the following expression one:
$v_t = \mathrm{ReLU}(W_V s_t + b_v)$ (expression one)
where $W_V \in \mathbb{R}^{d_v \times d_s}$ is the weight matrix applied to the SMART attribute values and $b_v \in \mathbb{R}^{d_v}$ is a bias vector. ReLU is an activation function defined as $\mathrm{ReLU}(x) = x^{+} = \max(0, x)$, where max is an element-wise operation; $W_V$ and $b_v$ are parameters learned in advance during training; $\mathbb{R}^{d_s}$ denotes the set of real vectors of dimension $d_s$; $\mathbb{R}^{d_v}$ denotes the set of real vectors of dimension $d_v$; $d_s$ is the number of SMART attribute values, and $d_v$ is the number of attribute integration values.
The gated recursion unit in the recurrent neural network comprises a gating unit and a recursion unit, wherein
the gating unit controls the information flow of the recursion unit so that the recursion unit captures long-time-scale dependence; and
the gating unit includes a reset gate and an update gate, which allow the recursion unit to hold its existing content or to update the content on the basis of the existing content.
The relationship between the input and the output of the recurrent neural network is specifically realized by a recursive algorithm consisting of the following four expressions:
$r_t = \mathrm{sigmoid}(W_r v_t + U_r h_{t-1})$ (expression two)
$z_t = \mathrm{sigmoid}(W_z v_t + U_z h_{t-1})$ (expression three)
$h'_t = \tanh(W v_t + U (r_t \odot h_{t-1}))$ (expression four)
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$ (expression five)
where $\odot$ denotes element-wise multiplication; $r_t$ denotes the reset gate, $h'_t$ the candidate (alternative) state, $z_t$ the update gate, and $h_t$ the current hidden layer state of the recurrent neural network, namely its current output; $h_{t-1}$ denotes the hidden layer state obtained at the previous time step of the recurrent neural network; the parameters $W_r$, $U_r$, $W_z$, $U_z$, $W$ and $U$ are weight matrices learned in advance during training.
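As a reading aid, the limiting cases of the two gates follow directly from expressions four and five. This is only a restatement of what the formulas imply, not additional claimed matter:

```latex
\begin{aligned}
z_t \to 1 &: \quad h_t \to h_{t-1} && \text{(existing content is held)} \\
z_t \to 0 &: \quad h_t \to h'_t && \text{(content is replaced by the candidate state)} \\
r_t \to 0 &: \quad h'_t \to \tanh(W v_t) && \text{(candidate state depends only on the current input)}
\end{aligned}
```

Because $h_t$ can copy $h_{t-1}$ almost unchanged when $z_t$ is close to 1, information can persist across many time steps, which is what allows the recursion unit to capture long-time-scale dependence.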
Wherein the monitoring whether the hard disk will have a fault state according to information obtained from the output of the recurrent neural network specifically includes:
generating an attention distribution vector according to the hidden layer state output by the recurrent neural network; wherein, the attention distribution vector is used as a weight vector of the current hidden layer state, and reflects the difference between the current hidden layer state and the health state of the hard disk;
and determining whether the hard disk is in a fault state or not by monitoring the weight value in the weight vector.
Further, before periodically acquiring different SMART attribute values of the monitored hard disk, the method further includes:
training the recurrent neural network using SMART data of healthy and failed hard disks.
The present invention also provides a hard disk state monitoring device, including:
the characteristic integration module is used for periodically acquiring different SMART attribute values of the monitored hard disk; after the SMART attribute value of the hard disk is obtained every time, obtaining the attribute integration of the hard disk according to the currently obtained SMART attribute value of the hard disk;
the time dependence extraction module is used for integrating and inputting the attribute into a recurrent neural network with a gating recurrent unit and taking the hidden layer state of the recurrent neural network as output;
and the fault information monitoring module is used for monitoring whether the hard disk has a fault state according to the information acquired from the output of the recurrent neural network.
Further, the apparatus further comprises:
and the training module is used for training the recurrent neural network by using SMART data of the healthy hard disk and the fault hard disk.
In the technical solution of the invention, to capture the long-term temporal dependence in SMART data, a gated recurrent unit (GRU) is introduced on top of the existing simple RNN, which avoids the vanishing and exploding gradient problems that arise when processing long sequences. The deviation of the hard disk drive from its normal state can therefore be traced back to an earlier period, which improves the fault detection rate or the fault prediction capability.
Furthermore, the technical solution of the invention also includes an attention mechanism: an attention distribution vector is generated from the hidden layer state output by the recurrent neural network to reflect the difference between the current hidden layer state and the health state of the hard disk. Because a higher attention weight means a more important role, analyzing the attention distribution shows which past days have the greatest influence on the current state of the hard disk. The degradation process of the hard disk can thus be revealed automatically, and the cause of the hard disk failure can be tracked.
Drawings
Fig. 1 is a flowchart of a hard disk state monitoring method according to an embodiment of the present invention;
fig. 2 is a diagram illustrating an internal connection structure of a gating recursive unit according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of a hard disk state monitoring apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two different entities or two different parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
The inventors observed and analyzed the changes in the SMART attribute values of hard disk drives and found that the degradation of SMART attribute values can be traced back 15 days or even 40 days. For example, for SMART_197_RAW, the change point typically occurs within 20 days before failure; for SMART_7_RAW and SMART_242_RAW, however, one usually has to look back 40 days to find the change point. This is clearly beyond the capabilities of a Markov model or a simple RNN. A high-performance predictive model therefore requires a method that can extract long-term temporal dependence.
To solve the above problems, the technical solution of the invention introduces a gated recurrent unit (GRU) on top of the existing simple RNN to capture the long-term temporal dependence in SMART data, which avoids the vanishing and exploding gradient problems that arise when processing long sequences. The deviation of the hard disk drive from its normal state can therefore be traced back to an earlier period, which improves the fault detection rate or the fault prediction capability.
The technical solution of the embodiments of the present invention is described in detail below with reference to the accompanying drawings.
The hard disk state monitoring method provided by the embodiment of the invention monitors the hard disk fault state periodically: different SMART attribute values of the monitored hard disk are acquired periodically, for example every day, and each time the SMART attribute values are acquired, whether the hard disk is heading toward a fault state is monitored according to the acquired values.
The specific method for monitoring the state of the hard disk after one acquisition of the different SMART attribute values of the monitored hard disk is shown in fig. 1 and comprises the following steps:
Step S101: acquire different SMART attribute values of the monitored hard disk and obtain the current attribute integration of the hard disk.
Specifically, the attribute integration of the hard disk is obtained from the currently acquired SMART attribute values of the hard disk; for example, the attribute integration of the monitored hard disk on day t may be obtained from the different SMART attribute values of the monitored hard disk acquired on day t.
For example, from the SMART vector $s_t \in \mathbb{R}^{d_s}$ composed of the SMART attribute values acquired on day $t$, the resulting attribute integration, represented as $v_t \in \mathbb{R}^{d_v}$, can be calculated according to the following expression one:
$v_t = \mathrm{ReLU}(W_V s_t + b_v)$ (expression one)
where $W_V \in \mathbb{R}^{d_v \times d_s}$ is the weight matrix applied to the SMART attribute values and $b_v \in \mathbb{R}^{d_v}$ is a bias vector. ReLU is an activation function defined as $\mathrm{ReLU}(x) = x^{+} = \max(0, x)$, where max is an element-wise operation; $W_V$ and $b_v$ are parameters learned in advance during training; $\mathbb{R}^{d_s}$ denotes the set of real vectors of dimension $d_s$; $\mathbb{R}^{d_v}$ denotes the set of real vectors of dimension $d_v$; $d_s$ is the number of SMART attribute values, and $d_v$ is the number of attribute integration values.
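A minimal NumPy sketch of expression one may help make the dimensions concrete. The sizes (3 SMART attributes, 2 integrated values) and all numeric values below are illustrative assumptions; in the method, W_V and b_v are learned during training rather than set by hand.

```python
import numpy as np

def integrate_attributes(s_t, W_V, b_v):
    """Expression one: v_t = ReLU(W_V s_t + b_v), applied element-wise."""
    return np.maximum(0.0, W_V @ s_t + b_v)

# Hypothetical sizes: d_s = 3 SMART attributes, d_v = 2 integrated values.
W_V = np.array([[1.0, 0.0, -1.0],
                [0.5, 0.5, 0.5]])   # shape (d_v, d_s), made-up weights
b_v = np.array([0.0, -1.0])         # shape (d_v,), made-up bias
s_t = np.array([2.0, 1.0, 3.0])     # one day's SMART vector, made-up values

v_t = integrate_attributes(s_t, W_V, b_v)
print(v_t)  # first component is clipped to 0 by ReLU, second stays positive
```

The first pre-activation is 2 - 3 = -1, which ReLU clips to 0; the second is 3 - 1 = 2, which passes through unchanged.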
Step S102: input the obtained attribute integration into a recurrent neural network that incorporates gated recurrent units, and take the hidden layer state of the recurrent neural network as output.
Specifically, the recurrent neural network of the embodiment of the present invention includes a plurality of gated recurrent units, and when the obtained attribute integration is input to the recurrent neural network, each vector element of the attribute integration is used as an input of each gated recurrent unit.
The gate control recursion unit in the recursion neural network comprises a gate control unit and a recursion unit; in one gated recursion unit, the gated unit is used for controlling the information flow of the recursion unit, so that the recursion unit can capture the long-time scale dependence; wherein a gating cell includes a reset gate and an update gate to allow the recursion cell to hold existing content or update content on the basis of existing content. Fig. 2 shows the internal connection structure of the gated recursion unit.
The recurrent neural network (RNN) maintains a recurrent hidden state that is updated at each time step from the current input and the previous hidden state. For the recurrent neural network with gated recursion units, the relationship between the input $v_t$ and the output $h_t$ can be realized by a recursive algorithm consisting of the following four expressions:
$r_t = \mathrm{sigmoid}(W_r v_t + U_r h_{t-1})$ (expression two)
$z_t = \mathrm{sigmoid}(W_z v_t + U_z h_{t-1})$ (expression three)
$h'_t = \tanh(W v_t + U (r_t \odot h_{t-1}))$ (expression four)
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$ (expression five)
where $\odot$ denotes element-wise multiplication; the parameters $W_r$, $U_r$, $W_z$, $U_z$, $W$ and $U$ are weight matrices learned in advance during training; the sigmoid function maps any real value into the range $[0, 1]$; the tanh function maps any real value into the range $[-1, 1]$.
$r_t$ denotes the reset gate, $h'_t$ the candidate (alternative) state, $z_t$ the update gate, and $h_t$ the current hidden layer state of the recurrent neural network (its hidden layer state on day $t$), namely the current output of the recurrent neural network; $h_{t-1}$ denotes the hidden layer state obtained at the previous time step (the hidden layer state on day $t-1$).
When the reset gate $r_t$ is close to 0, the candidate state $h'_t$ forgets the previous hidden layer state and is reset to depend only on the current input; the update gate $z_t$ controls how much information flows into $h_t$ from the previous hidden state $h_{t-1}$ and from the candidate state $h'_t$.
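The recursion in expressions two to five can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the patented implementation: the sizes, the 30-day window, the random stand-in weights and the helper names `gru_step` and `init_params` are all hypothetical, and in the method the weights come from training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(v_t, h_prev, p):
    """One time step of expressions two to five."""
    r_t = sigmoid(p["W_r"] @ v_t + p["U_r"] @ h_prev)         # reset gate (two)
    z_t = sigmoid(p["W_z"] @ v_t + p["U_z"] @ h_prev)         # update gate (three)
    h_cand = np.tanh(p["W"] @ v_t + p["U"] @ (r_t * h_prev))  # candidate state (four)
    return z_t * h_prev + (1.0 - z_t) * h_cand                # new hidden state (five)

def init_params(d_v, d_r, seed=0):
    """Random stand-in weights; real values would be learned in training."""
    rng = np.random.default_rng(seed)
    shapes = {"W_r": (d_r, d_v), "U_r": (d_r, d_r),
              "W_z": (d_r, d_v), "U_z": (d_r, d_r),
              "W": (d_r, d_v), "U": (d_r, d_r)}
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

# Hypothetical sizes: d_v = 4 integrated attributes, d_r = 3 hidden units, 30 days.
d_v, d_r, num_days = 4, 3, 30
params = init_params(d_v, d_r)
window = np.random.default_rng(1).normal(size=(num_days, d_v))  # stand-in v_t sequence

h = np.zeros(d_r)            # initial hidden state
hidden_states = []
for v_t in window:           # one step per day, oldest first
    h = gru_step(v_t, h, params)
    hidden_states.append(h)
hidden_states = np.stack(hidden_states)   # shape (num_days, d_r)
```

Since each new hidden state is an element-wise convex combination of the previous hidden state and a tanh output, every component stays within [-1, 1] when the recursion starts from zero.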
Step S103: monitor whether the hard disk is heading toward a fault state according to information obtained from the output of the recurrent neural network.
In this step, long-term information of SMART data of the hard disk can be acquired from the output of the recurrent neural network, and according to the acquired long-term information of the SMART data, the deviation of the normal state of the hard disk drive can be traced back to an earlier period, so that whether the hard disk is in a fault state or not can be monitored in advance, and the fault detection rate or the fault prediction capability can be improved.
In addition, in this step, when monitoring whether the hard disk will have a failure state according to the information obtained from the output of the recurrent neural network, a more preferable technical solution may be adopted: on the basis of obtaining long-term information of SMART data of the hard disk reflected by the hidden layer state output by the recurrent neural network, an attention mechanism is designed, and the attention mechanism can automatically focus on the degradation process of the hard disk. The attention mechanism can display which information has the greatest influence on fault prediction, and fault tracing diagnosis is provided. The method is helpful for managers to trace back to specific days and find out the reasons of the faults.
Specifically, an attention distribution vector may be generated according to a hidden layer state output by the recurrent neural network; wherein, the attention distribution vector is used as a weight vector of the current hidden layer state, and reflects the difference between the current hidden layer state and the health state of the hard disk; and determining whether the hard disk is in a fault state or not by monitoring the weight value in the weight vector.
Specifically, an attention distribution vector $a_t$ can be generated from the hidden layer state $h_t$ output by the recurrent neural network according to the following expression six:
[Expression six: equation image BDA0001597597820000081, not reproduced in this text]
where $i$ is a natural number whose summation range in expression six is $[t - T + 1, t]$; $T$ is the size of the time window of the sequence input into the recurrent neural network; $u_t$ is the position-based representation obtained by passing the hidden layer state $h_t$ through the tanh activation function, as shown in expression seven below. The health state vector (its symbol appears only in equation images BDA0001597597820000091 and BDA0001597597820000092) can be viewed as a high-order representation of the characteristics of a healthy hard disk and is learned in advance during training. Expression six compares the health state vector with the current hidden state and derives a weight from the difference.
$u_t = \tanh(W_a h_t + b_a)$ (expression seven)
where $W_a \in \mathbb{R}^{d_r \times d_r}$ is a real matrix of dimension $d_r \times d_r$ and $b_a \in \mathbb{R}^{d_r}$ is a real vector of dimension $d_r$; both are parameters learned in advance during training; $d_r$ is the number of gated recursion units of the recurrent neural network.
After the attention distribution vector $a_t$ is obtained, the hidden layer state with attention weights (its symbol appears only in equation image BDA0001597597820000095) can be obtained according to the following expression eight:
[Expression eight: equation image BDA0001597597820000096, not reproduced in this text]
With the help of a mechanism of attention, the most abundant part of the fault information can be focused, and therefore better evaluation and prediction can often be made. More importantly, by analyzing the attention distribution, higher attention weight means more important role, so that the influence of which days in the past on the current state of the hard disk is the greatest can be deeply known; it can automatically reveal the degradation process of the hard disk and help us to track the cause of the hard disk failure.
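The following sketch shows one common way such an attention distribution can be computed over a window of hidden states. It is an assumption, not a reconstruction: expressions six and eight are equation images that are not reproduced in this text, so the softmax over dot products with a learned vector and the weighted sum below are swapped-in standard forms. Only the tanh projection follows expression seven. The names `u_health`, `attention_weights` and `attended_state` are hypothetical.

```python
import numpy as np

def attention_weights(H, W_a, b_a, u_health):
    """Attention over a window of hidden states H with shape (T, d_r).

    Row i of U is u_i = tanh(W_a h_i + b_a), as in expression seven.
    ASSUMPTION: scores are dot products with u_health, normalized by softmax.
    """
    U = np.tanh(H @ W_a.T + b_a)
    scores = U @ u_health
    exp_scores = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

def attended_state(H, a):
    """ASSUMPTION: attention-weighted sum of the hidden states in the window."""
    return a @ H

# Hypothetical window of T = 3 days with d_r = 3 hidden units.
H = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0]])          # only the last day departs from zero
W_a = np.eye(3)                          # stand-in for learned parameters
b_a = np.zeros(3)
u_health = np.array([1.0, 0.0, 0.0])     # stand-in for the learned vector

a = attention_weights(H, W_a, b_a, u_health)
h_att = attended_state(H, a)
print(a)   # the largest weight falls on the last day
```

Inspecting `a` day by day is the kind of analysis the paragraph above describes: the index with the largest weight points to the day that contributes most to the attended state.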
In fact, the training process may be performed before step S101, i.e. before the different SMART attribute values of the monitored hard disk are periodically acquired. The training process includes training the recurrent neural network with SMART data of healthy and failed hard disks, that is, training to obtain the parameters of the recurrent neural network; the training process may also include learning other parameters.
Specifically, during training, SMART data of healthy and failed hard disks can be used for computation and validation to determine the parameters $W_r$, $U_r$, $W_z$, $U_z$, $W$ and $U$ of the recurrent neural network. The parameters $W_V$ and $b_v$ and the parameters of the attention mechanism (the health state vector, $W_a$ and $b_a$) can also be obtained at the same time during training. The training method may be a gradient descent method well known to those skilled in the art and is not described further here.
Based on the foregoing method, an internal structure of the hard disk state monitoring device provided in the embodiment of the present invention is shown in fig. 3, and includes: the system comprises a feature integration module 301, a time dependence extraction module 302 and a fault information monitoring module 303.
The feature integration module 301 is configured to periodically acquire different SMART attribute values of the monitored hard disk and, each time the SMART attribute values are acquired, to obtain the attribute integration of the hard disk from the currently acquired values. Specifically, for the SMART vector $s_t \in \mathbb{R}^{d_s}$ composed of the SMART attribute values of the hard disk acquired on day $t$, the resulting attribute integration may be calculated according to expression one above.
The time-dependent extraction module 302 is configured to integrate and input the attributes obtained by the feature integration module 301 into a recurrent neural network that introduces a gated recurrent unit, and take a hidden layer state of the recurrent neural network as an output; the relationship between the input and the output of the recurrent neural network is realized by the recurrent algorithm of the above expression two, three, four and five.
The failure information monitoring module 303 is configured to monitor whether the hard disk will have a failure state according to information obtained from the output of the recurrent neural network. Specifically, the fault information monitoring module 303 generates an attention distribution vector according to the hidden layer state output by the recurrent neural network; wherein, the attention distribution vector is used as a weight vector of the current hidden layer state, and reflects the difference between the current hidden layer state and the health state of the hard disk; and determining whether the hard disk is in a fault state or not by monitoring the weight value in the weight vector. The fault information monitoring module 303 may calculate the attention distribution vector according to the above expressions six and seven.
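The division of labour between modules 301, 302 and 303 can be sketched as a small pipeline object. Everything below is a hypothetical illustration: the class name, the stand-in callables and the 0.9 threshold are placeholders chosen so the example runs, and they stand in for expression one, the gated recurrent network and the attention-based decision respectively.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class HardDiskStateMonitor:
    feature_integration: Callable[[np.ndarray], np.ndarray]          # module 301
    time_dependence: Callable[[np.ndarray, np.ndarray], np.ndarray]  # module 302
    fault_monitoring: Callable[[Sequence[np.ndarray]], bool]         # module 303
    hidden_size: int
    window: int

    def __post_init__(self):
        self._h = np.zeros(self.hidden_size)   # current hidden layer state
        self._history = []                     # most recent hidden states

    def observe(self, s_t):
        """Process one periodic SMART reading and return the fault decision."""
        v_t = self.feature_integration(s_t)
        self._h = self.time_dependence(v_t, self._h)
        self._history.append(self._h)
        self._history = self._history[-self.window:]   # keep the last `window` days
        return self.fault_monitoring(self._history)

# Stand-in modules (placeholders, not the trained models of the method).
monitor = HardDiskStateMonitor(
    feature_integration=lambda s: np.maximum(0.0, s),
    time_dependence=lambda v, h: 0.5 * h + 0.5 * np.tanh(v),
    fault_monitoring=lambda hist: bool(np.max(hist[-1]) > 0.9),
    hidden_size=2,
    window=3,
)

# Feed the same abnormal reading for five days; the state drifts until flagged.
results = [monitor.observe(np.array([10.0, 10.0])) for _ in range(5)]
print(results)
```

Keeping the three modules as separate callables mirrors the device structure: any one of them can be replaced, for example by a trained network, without changing the periodic `observe` loop.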
Further, the apparatus for monitoring a hard disk state provided in the embodiment of the present invention may further include: a training module 304.
The training module 304 is configured to train the recurrent neural network with SMART data of healthy and failed hard disks and thereby determine the parameters $W_r$, $U_r$, $W_z$, $U_z$, $W$ and $U$ of the recurrent neural network.
The training module 304 can also obtain, through training, the parameters $W_V$ and $b_v$ and the parameters of the attention mechanism (the health state vector, $W_a$ and $b_a$).
In the technical solution of the invention, to capture the long-term temporal dependence in SMART data, a gated recurrent unit (GRU) is introduced on top of the existing simple RNN, which avoids the vanishing and exploding gradient problems that arise when processing long sequences. The deviation of the hard disk drive from its normal state can therefore be traced back to an earlier period, which improves the fault detection rate or the fault prediction capability.
Furthermore, the technical solution of the invention also includes an attention mechanism: an attention distribution vector is generated from the hidden layer state output by the recurrent neural network to reflect the difference between the current hidden layer state and the health state of the hard disk. Because a higher attention weight means a more important role, analyzing the attention distribution shows which past days have the greatest influence on the current state of the hard disk. The degradation process of the hard disk can thus be revealed automatically, and the cause of the hard disk failure can be tracked.
Those skilled in the art will appreciate that the present invention includes apparatus directed to performing one or more of the operations described in the present application. These devices may be specially designed and manufactured for the required purposes, or they may comprise known devices in general-purpose computers. These devices have stored therein computer programs that are selectively activated or reconfigured. Such a computer program may be stored in a device-readable (e.g., computer-readable) medium, including, but not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a bus. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions may be implemented by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the features specified in the block or blocks of the block diagrams and/or flowchart illustrations of the present disclosure.
Those of skill in the art will appreciate that various operations, methods, steps in the processes, acts, or solutions discussed in the present application may be alternated, modified, combined, or deleted. Further, various operations, methods, steps in the flows, which have been discussed in the present application, may be interchanged, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the various operations, methods, procedures disclosed in the prior art and the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A hard disk state monitoring method, comprising:
periodically acquiring different Self-Monitoring, Analysis and Reporting Technology (SMART) attribute values of a monitored hard disk, and performing the following operations each time the SMART attribute values of the hard disk are acquired:
obtaining an attribute integration of the hard disk according to the currently acquired SMART attribute values of the hard disk;
inputting the attribute integration into a recurrent neural network into which a gated recurrent unit has been introduced, and taking the hidden layer state of the recurrent neural network as output;
monitoring whether the hard disk will fail according to information obtained from the output of the recurrent neural network: generating an attention distribution vector according to the hidden layer state output by the recurrent neural network, wherein the attention distribution vector serves as a weight vector of the current hidden layer state and reflects the difference between the current hidden layer state and the healthy state of the hard disk; and determining whether the hard disk is in a fault state by monitoring the weight values in the weight vector.
2. The method according to claim 1, wherein obtaining the attribute integration of the hard disk according to the currently acquired SMART attribute values of the hard disk specifically comprises:
from the SMART vector $s_t \in \mathbb{R}^{d_s}$ composed of the currently acquired SMART attribute values of the hard disk on day $t$, the resulting attribute integration, denoted $v_t \in \mathbb{R}^{d_v}$, is calculated according to the following expression one:
$v_t = \mathrm{ReLU}(W_V s_t + b_v)$ (expression one)
wherein $W_V \in \mathbb{R}^{d_v \times d_s}$ is the weight matrix applied to the SMART attribute values and $b_v \in \mathbb{R}^{d_v}$ is a bias vector; ReLU is an activation function defined as $\mathrm{ReLU}(x) = x^{+} = \max(0, x)$, where max is an element-wise operation; $W_V$ and $b_v$ are parameters learned in advance during training; $\mathbb{R}^{d_s}$ denotes the set of real vectors of dimension $d_s$; $\mathbb{R}^{d_v}$ denotes the set of real vectors of dimension $d_v$; $d_s$ is the number of SMART attribute values, and $d_v$ is the number of attribute integration values.
3. The method of claim 2, wherein the gated recurrent unit in the recurrent neural network comprises a gating unit and a recurrent unit; wherein
the gating unit is used to control the information flow of the recurrent unit so that the recurrent unit captures dependencies over long time scales; and
the gating unit includes a reset gate and an update gate, which allow the recurrent unit to retain its existing content or to update content on the basis of the existing content.
4. The method according to claim 3, wherein the relation between the input and the output of the recurrent neural network is realized by a recurrence defined by the following four expressions:
$r_t = \mathrm{sigmoid}(W_r v_t + U_r h_{t-1})$ (expression two)
$z_t = \mathrm{sigmoid}(W_z v_t + U_z h_{t-1})$ (expression three)
$h'_t = \tanh(W v_t + U(r_t \odot h_{t-1}))$ (expression four)
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$ (expression five)
wherein $\odot$ denotes element-wise multiplication; $r_t$ denotes the reset gate, $h'_t$ the candidate state, $z_t$ the update gate, and $h_t$ the current hidden layer state of the recurrent neural network, that is, its current output; $h_{t-1}$ denotes the hidden layer state obtained at the previous time step of the recurrent neural network; and the parameters $W_r$, $U_r$, $W_z$, $U_z$, $W$ and $U$ are learned in advance during training.
5. The method of any of claims 1-4, further comprising, prior to periodically obtaining different SMART attribute values for the monitored hard disk:
training the recurrent neural network using SMART data of healthy and failed hard disks.
6. A hard disk state monitoring device, comprising:
a feature integration module, configured to periodically acquire different SMART attribute values of a monitored hard disk and, each time the SMART attribute values of the hard disk are acquired, to obtain an attribute integration of the hard disk according to the currently acquired SMART attribute values of the hard disk;
a time dependence extraction module, configured to input the attribute integration into a recurrent neural network into which a gated recurrent unit has been introduced, and to take the hidden layer state of the recurrent neural network as output;
a fault information monitoring module, configured to monitor whether the hard disk will fail according to information obtained from the output of the recurrent neural network: generating an attention distribution vector according to the hidden layer state output by the recurrent neural network, wherein the attention distribution vector serves as a weight vector of the current hidden layer state and reflects the difference between the current hidden layer state and the healthy state of the hard disk; and determining whether the hard disk is in a fault state by monitoring the weight values in the weight vector.
7. The apparatus of claim 6, wherein the gated recurrent unit in the recurrent neural network comprises a gating unit and a recurrent unit; wherein
the gating unit is used to control the information flow of the recurrent unit so that the recurrent unit captures dependencies over long time scales; and
the gating unit includes a reset gate and an update gate, which allow the recurrent unit to retain its existing content or to update content on the basis of the existing content.
8. The apparatus of any of claims 6-7, further comprising:
a training module, configured to train the recurrent neural network using SMART data of healthy hard disks and failed hard disks.
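For readers who want to see the attribute integration of expression one in executable form, the following is a minimal NumPy sketch. The function names `integrate_attributes` and `integrate_history` are hypothetical, and the small matrices in the usage note are made-up numbers, not real SMART readings or learned parameters.

```python
import numpy as np

def integrate_attributes(s_t, W_V, b_v):
    """Expression one: v_t = ReLU(W_V s_t + b_v).

    s_t : SMART vector for day t, shape (d_s,)
    W_V : weight matrix, shape (d_v, d_s)
    b_v : bias vector, shape (d_v,)
    """
    return np.maximum(0.0, W_V @ s_t + b_v)  # element-wise max(0, x)

def integrate_history(S, W_V, b_v):
    """Apply expression one to every row of S, shape (T, d_s).

    Returns an array of shape (T, d_v), one attribute integration per day.
    """
    return np.maximum(0.0, S @ W_V.T + b_v)
```

With `W_V = [[1, -1], [0.5, 0.5], [-1, 0]]`, `b_v = [0, 0, 0.5]` and `s_t = [2, 1]`, the pre-activation is `[1, 1.5, -1.5]`, so the ReLU zeroes the last component and the result is `[1, 1.5, 0]`.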
CN201810212464.7A 2018-03-15 2018-03-15 Hard disk state monitoring method and device Active CN108415810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810212464.7A CN108415810B (en) 2018-03-15 2018-03-15 Hard disk state monitoring method and device

Publications (2)

Publication Number Publication Date
CN108415810A CN108415810A (en) 2018-08-17
CN108415810B true CN108415810B (en) 2021-05-11

Family

ID=63131486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810212464.7A Active CN108415810B (en) 2018-03-15 2018-03-15 Hard disk state monitoring method and device

Country Status (1)

Country Link
CN (1) CN108415810B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919744B (en) * 2018-11-23 2023-01-10 创新先进技术有限公司 Neural network-based detection method and device
CN111966569A (en) * 2019-05-20 2020-11-20 中国电信股份有限公司 Hard disk health degree evaluation method and device and computer readable storage medium
CN110929305A (en) * 2019-08-08 2020-03-27 北京盛赞科技有限公司 Hard disk protection method, device, equipment and computer readable storage medium
CN110737732A (en) * 2019-10-25 2020-01-31 广西交通科学研究院有限公司 electromechanical equipment fault early warning method
CN114428709B (en) * 2022-01-17 2022-08-05 广州鲁邦通物联网科技股份有限公司 SDS state detection method and system in cloud management platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0801340A1 (en) * 1996-04-10 1997-10-15 A.G. für industrielle Elektronik AGIE Losone bei Locarno Method and device for controlling a machine tool, in particular an electro-erosion machine
CN102799449A (en) * 2012-06-26 2012-11-28 杭州海康威视数字技术股份有限公司 Multi-system starting method and system
CN106651007A (en) * 2016-11-24 2017-05-10 北京理工大学 Method and device for GRU-based medium and long-term prediction of irradiance of photovoltaic power station
CN107578124A (en) * 2017-08-28 2018-01-12 国网山东省电力公司电力科学研究院 The Short-Term Load Forecasting Method of GRU neutral nets is improved based on multilayer

Also Published As

Publication number Publication date
CN108415810A (en) 2018-08-17

Similar Documents

Publication Publication Date Title
CN108415810B (en) Hard disk state monitoring method and device
CN111459700B (en) Equipment fault diagnosis method, diagnosis device, diagnosis equipment and storage medium
CN109359698B (en) Leakage identification method based on long-time memory neural network model
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN111191897B (en) Business process online compliance prediction method and system based on bidirectional GRU neural network
CN108052528A (en) A kind of storage device sequential classification method for early warning
US9632859B2 (en) Generating problem signatures from snapshots of time series data
CN108491861A (en) Power transmission and transformation equipment state abnormal patterns recognition methods based on multi-source multi-parameter fusion and device
CN114785666B (en) Network troubleshooting method and system
Su et al. Detecting outlier machine instances through gaussian mixture variational autoencoder with one dimensional cnn
CN109471698B (en) System and method for detecting abnormal behavior of virtual machine in cloud environment
CN109918313B (en) GBDT decision tree-based SaaS software performance fault diagnosis method
CN108415819B (en) Hard disk fault tracking method and device
CN109684320B (en) Method and equipment for online cleaning of monitoring data
CN112560269B (en) Rhapbody state machine-based high fault tolerance electronic system task reliability simulation analysis method
CN112083244A (en) Integrated avionics equipment fault intelligent diagnosis system
CN117034143B (en) Distributed system fault diagnosis method and device based on machine learning
CN116112283A (en) CNN-LSTM-based power system network security situation prediction method and system
CN115964258A (en) Internet of things network card abnormal behavior grading monitoring method and system based on multi-time sequence analysis
CN114528942A (en) Construction method of data sample library of engineering machinery, failure prediction method and engineering machinery
CN107911762A (en) A kind of ONU method for diagnosing faults based on decision tree
WO2024087404A1 (en) Nuclear reactor fault determination method, apparatus, device, storage medium, and product
CN116149895A (en) Big data cluster performance prediction method and device and computer equipment
Li et al. Meteorological radar fault diagnosis based on deep learning
US20220050763A1 (en) Detecting regime change in time series data to manage a technology platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant