CN107479836A

CN107479836A - Disk failure monitoring method, device and storage system

Info

Publication number: CN107479836A
Application number: CN201710757310.1A
Authority: CN
Inventors: 王勇
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2017-08-29
Filing date: 2017-08-29
Publication date: 2017-12-15

Abstract

This application provides a disk failure prediction method, device, and storage system. The method is applied to a monitoring server in a storage system, where the storage system includes multiple storage nodes and the storage nodes contain disks. The method includes: obtaining current operating state information of a disk in a storage node; based on the current operating state information of the disk, and using a state change model obtained by training in advance, predicting fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time; and sending the failure prediction result of the disk to a user's terminal. The scheme of this application helps disk failures to be handled in time and reduces storage system instability caused by disk failures.

Description

Disk failure monitoring method, device and storage system

Technical field

This application relates to the technical field of data storage, and in particular to a disk failure monitoring method, device, and storage system.

Background technology

To improve the reliability of data storage, a storage system needs to store data redundantly, and therefore needs to deploy a large number of disks to meet its storage capacity requirements. As the number of disks in a storage system grows, disk failures become correspondingly more frequent.

To prevent a disk failure from causing abnormal data reads in the storage system, the administrators of the storage system need to replace a failed disk with a spare disk prepared in advance. However, because disk failures are unpredictable, how to ensure that failed disks are handled and replaced in time, and thereby reduce storage system instability caused by disk failures, is a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

In view of this, this application provides a disk failure monitoring method, device, and storage system, so that disk failures can be handled in time and storage system instability caused by disk failures can be reduced.

To achieve the above object, in one aspect, an embodiment of this application provides a disk failure monitoring method, the method including:

A disk failure prediction method, applied to a monitoring server in a storage system, where the storage system includes multiple storage nodes and the storage nodes contain disks, the method including:

obtaining current operating state information of a disk in the storage node;

based on the current operating state information of the disk, and using a state change model obtained by training in advance, predicting fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time;

sending the failure prediction result of the disk to a user's terminal.

Preferably, obtaining the current operating state information of the disk in the storage node includes:

obtaining the current operating state information of the disk in the storage node as reported by the storage node.

Preferably, the current operating state information of the disk includes any one or more of the following:

the operating duration of the disk, the load condition of the disk, and the current number of bad blocks in the disk.

Preferably, sending the failure prediction result of the disk to the user's terminal includes:

when a failure prediction request from the user is received, sending the failure prediction result of the disk to the user's terminal;

or, when a preset prediction result sending time is reached, sending the failure prediction result of the disk to the user's terminal.

Preferably, after predicting the fault occurrences of the disk after the current time based on the current operating state information of the disk and using the state change model obtained by training in advance, the method further includes:

obtaining the actual fault occurrences of the disk after the current time and, if the disk has failed, obtaining fault status information of the disk, where the fault status information includes the cause of the fault and the time of the fault;

correcting the parameters of the state change model according to the actual fault occurrences and the fault status information.

In another aspect, this application also provides a disk failure prediction device, applied to a monitoring server in a storage system, where the storage system includes multiple storage nodes and the storage nodes contain disks. The device includes:

a state acquisition unit, configured to obtain current operating state information of a disk in the storage node;

an analysis and prediction unit, configured to predict, based on the current operating state information of the disk and using a state change model obtained by training in advance, fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time;

a user interaction unit, configured to send the failure prediction result of the disk to a user's terminal.

Preferably, the state acquisition unit includes:

a state acquisition subunit, configured to obtain the current operating state information of the disk in the storage node as reported by the storage node.

Preferably, the user interaction unit includes:

a first interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a failure prediction request from the user is received;

or a second interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a preset prediction result sending time is reached.

Preferably, the device further includes:

an actual information acquisition unit, configured to, after the analysis and prediction unit predicts the fault occurrences of the disk after the current time, obtain the actual fault occurrences of the disk after the current time and, if the disk has failed, obtain fault status information of the disk, where the fault status information includes the cause of the fault and the time of the fault;

a model correction unit, configured to correct the parameters of the state change model according to the actual fault occurrences and the fault status information.

In another aspect, this application also provides a storage system, including:

a monitoring server and multiple storage nodes, where the storage nodes contain disks;

the monitoring server is configured to: obtain current operating state information of a disk in the storage node; based on the current operating state information of the disk, and using a state change model obtained by training in advance, predict fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time; and send the failure prediction result of the disk to a user's terminal.

As can be seen from the above technical scheme, in the storage system of this application, the monitoring server can obtain the current operating state information of each disk in the storage system, use the state change model obtained by training in advance to predict, from that information, the fault occurrences of each disk after the current time, and send the resulting failure prediction result to the user's terminal. The user can thus learn which disks are likely to fail before they actually fail and prepare for fault handling in advance. This helps failures to be found and handled in time and reduces storage system instability caused by disk failures.

Furthermore, because this application can predict in advance which disks may fail, the number of spare disks that need to be provided can be determined from the anticipated disk failures. This allows the number of spare disks to be configured more reasonably, which both avoids wasting resources and reduces the cases in which a disk failure cannot be handled in time because too few spare disks are available.

Brief description of the drawings

To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of this application, and those of ordinary skill in the art may derive other drawings from the provided drawings without creative effort.

Fig. 1 shows a schematic structural diagram of a storage system to which a disk failure prediction method of this application is applicable;

Fig. 2 shows a schematic flowchart of one embodiment of a disk failure prediction method of this application;

Fig. 3 shows a schematic flowchart of another embodiment of a disk failure prediction method of this application;

Fig. 4 shows a schematic structural diagram of an embodiment of a disk failure prediction device of this application.

Detailed description of the embodiments

The disk failure prediction method of this application is suitable for predicting failures of disks in the storage nodes of a storage system. Fig. 1 shows a schematic structural diagram of a storage system of this application. As can be seen from Fig. 1, the storage system includes multiple storage nodes 101 and at least one monitoring server 102, where the monitoring server may also be regarded as a storage node with data collection and data processing functions.

The storage node 101 contains disks for storing data.

In the embodiments of this application, the storage node can obtain the operating state information of a disk, such as its operating duration, load condition, and health status (for example, whether bad blocks exist and how many), and report the obtained operating state information to the monitoring server.

For example, the storage node can obtain information such as the current operating state of a disk through the disk driver and the interface connected to the disk. For instance, a state collection module for information gathering may be preset in the storage node, so that the operating state information and other disk-related information are obtained through that module.

Of course, in addition to the operating state information, the storage node can also obtain attribute information of the disk, such as its model and type, so that the monitoring server can later normalize, based on disk attributes, the disk-related data reported by different storage nodes for disks with different attributes.

Optionally, when a disk fails, the storage node can also obtain fault status information of the disk, where the fault status information may include the time of the fault, the cause of the fault, and so on.

Correspondingly, the monitoring server 102 is configured to predict, based on the operating state information reported by the storage nodes, the fault occurrences of a disk after the current time, and to notify the user's terminal of the resulting failure prediction result.

Building on the common features above, the disk failure prediction method of this application is described in detail below.

For example, Fig. 2 shows a schematic flowchart of one embodiment of a disk failure prediction method of this application. The method of this embodiment is described from the perspective of the monitoring server in the storage system and may include:

S201: obtain the current operating state information of a disk in a storage node.

The current operating state information reflects the current state of the disk. From it, one can analyze whether the disk is likely to fail, how high the risk of failure is, and so on.

For example, the current operating state of the disk may include: the operating duration of the disk, its duration of use, the load condition of the disk (for example, the ratio of occupied space to total capacity), the current number of bad blocks in the disk, and so on. If the disk suffers strong vibration or a sudden power loss while reading or writing, the disk head can easily scratch the medium and produce bad blocks. When a disk has too many bad blocks, data storage on the disk may become abnormal, causing the disk to fail.
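
The state items listed above can be pictured as a small record. The following is only an illustrative sketch: the class name `DiskState`, its field names, and the choice of units are assumptions made for this example and are not specified by this application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiskState:
    """Current operating state of one disk (all names are hypothetical)."""
    disk_id: str
    power_on_hours: float  # operating duration
    used_bytes: int        # space currently occupied
    total_bytes: int       # total capacity
    bad_blocks: int        # current number of bad blocks

    @property
    def load_ratio(self) -> float:
        """Load condition: occupied space divided by total capacity."""
        if self.total_bytes <= 0:
            raise ValueError("total_bytes must be positive")
        return self.used_bytes / self.total_bytes

    def features(self) -> List[float]:
        """Flatten the state into a numeric vector for a prediction model."""
        return [self.power_on_hours, self.load_ratio, float(self.bad_blocks)]


state = DiskState("node1/sda", power_on_hours=12000.0,
                  used_bytes=750, total_bytes=1000, bad_blocks=3)
print(state.features())  # [12000.0, 0.75, 3.0]
```

Keeping the load as a ratio rather than as raw byte counts makes disks of different capacities directly comparable.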

As mentioned above, the storage node can monitor the current operating state information of the disk and report it to the monitoring server. For the current operating state information reported by each storage node in the storage system, the monitoring server can go through the disks in each storage node in turn and predict the fault occurrences of each disk based on that disk's current operating state information.

Of course, instead of having the storage nodes report the current operating state information, the monitoring server can also request it from each storage node in the storage system in turn.
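
The two collection modes described above (reporting by the storage node, and requesting by the monitoring server) can be sketched as follows. This is a minimal in-process illustration: the class `MonitoringServer`, its methods, and the modeling of a storage node as a plain callable are assumptions for this example; a real system would use some network protocol that this application does not specify.

```python
from typing import Callable, Dict, Tuple

# A storage node is modeled as a callable returning {disk_id: state}.
NodeQuery = Callable[[], Dict[str, dict]]


class MonitoringServer:
    """Keeps the latest known state of every disk (hypothetical sketch)."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[str, str], dict] = {}

    def on_report(self, node_id: str, disks: Dict[str, dict]) -> None:
        """Push mode: a storage node reports the state of its disks."""
        for disk_id, disk_state in disks.items():
            self.latest[(node_id, disk_id)] = disk_state

    def poll(self, nodes: Dict[str, NodeQuery]) -> None:
        """Pull mode: request the state from each storage node in turn."""
        for node_id, query in nodes.items():
            self.on_report(node_id, query())


server = MonitoringServer()
server.on_report("node-1", {"sda": {"bad_blocks": 0}})
server.poll({"node-2": lambda: {"sdb": {"bad_blocks": 7}}})
```

Both modes end up in the same `latest` table, so the prediction step does not need to know how a state record arrived.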

S202: based on the current operating state information of the disk, and using the state change model obtained by training in advance, predict the fault occurrences of the disk after the current time to obtain a failure prediction result.

The state change model is a model for predicting the fault occurrences of a disk. It is trained on the historical operating state information and the historical actual fault occurrences of the disks in the multiple storage nodes of the storage system before the current time. For ease of distinction, the operating state information of a disk obtained before the current time is called historical operating state information, and the actual fault occurrences of a disk before the current time are called historical actual fault occurrences.

For example, the state change model may be a deep neural network model, a convolutional neural network model, or the like.

The state change model may be trained as follows: the obtained historical operating state information of the disks in the multiple storage nodes is input into the state change model to be trained (for example, a deep neural network model); the fault occurrences that the model predicts for each disk are compared with the historical actual fault occurrences of that disk; and the parameters of the model are adjusted repeatedly until the degree of matching between the predicted fault occurrences and the historical actual fault occurrences exceeds a preset threshold, at which point the trained state change model is obtained.
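
The training procedure above (predict, compare with recorded outcomes, adjust parameters until the matching degree exceeds a preset threshold) can be illustrated with a deliberately small stand-in. The sketch below swaps the deep neural network mentioned in the text for a plain logistic model trained by stochastic gradient descent, purely to keep the example short and dependency-free. The function names, the sample data, the learning rate, and the threshold value are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

# (normalized features, 1 if the disk actually failed, else 0)
Sample = Tuple[Sequence[float], int]


def predict_risk(weights: Sequence[float], bias: float,
                 x: Sequence[float]) -> float:
    """Return a failure risk in (0, 1) from a logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-60.0, min(60.0, z))  # clamp so exp() stays in range
    return 1.0 / (1.0 + math.exp(-z))


def match_degree(weights: Sequence[float], bias: float,
                 samples: Sequence[Sample]) -> float:
    """Fraction of disks whose predicted outcome equals the recorded one."""
    hits = sum((predict_risk(weights, bias, x) >= 0.5) == bool(failed)
               for x, failed in samples)
    return hits / len(samples)


def train(samples: Sequence[Sample], threshold: float = 0.9,
          lr: float = 0.5, max_epochs: int = 5000) -> Tuple[List[float], float]:
    """Adjust parameters until the match degree exceeds the threshold."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        if match_degree(weights, bias, samples) > threshold:
            break
        for x, failed in samples:
            err = predict_risk(weights, bias, x) - failed
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Hypothetical history: (age, load ratio, bad blocks), each scaled to [0, 1].
history: List[Sample] = [
    ([0.1, 0.2, 0.0], 0), ([0.2, 0.5, 0.0], 0), ([0.3, 0.4, 0.1], 0),
    ([0.8, 0.7, 0.9], 1), ([0.9, 0.9, 0.8], 1), ([0.7, 0.6, 1.0], 1),
]
weights, bias = train(history)
```

The stopping test inside `train` mirrors the "matching degree exceeds a preset threshold" condition; `max_epochs` is an added safeguard so that the loop also ends on data the model cannot fit.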

The predicted fault occurrences of the disk may include: whether the disk has a fault, the risk that the disk will fail, the time at which the disk may fail, and so on. For ease of distinction, the predicted fault occurrences of the disk are called the failure prediction result of the disk.

Optionally, while the current operating state information of the disk is obtained in step S201, the attribute information of the disk can also be obtained. For example, the storage node may report the attribute information of its disks, or the monitoring server may read disk attribute information that it has stored in advance. Because different disks have different attributes, the content or data format of the current operating state information collected from different disks may differ. Therefore, before using the state change model to produce the failure prediction result, the monitoring server can first convert the current operating state information of the disk into a preset data format based on the attribute information, so that the information format is unified, and then use the unified current operating state information to predict the failure prediction result of the disk.
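
The format unification described above can be sketched as a lookup from a disk attribute (here, the model) to a converter that produces one preset record layout. The model names, the raw field names, and the target layout below are hypothetical and serve only to show the idea.

```python
from typing import Callable, Dict


def _from_model_a(raw: dict) -> dict:
    # Hypothetical model A reports hours and a used fraction directly.
    return {"power_on_hours": float(raw["hours"]),
            "load_ratio": float(raw["used_frac"]),
            "bad_blocks": int(raw["bad"])}


def _from_model_b(raw: dict) -> dict:
    # Hypothetical model B reports days and a used percentage instead.
    return {"power_on_hours": raw["days"] * 24.0,
            "load_ratio": raw["used_pct"] / 100.0,
            "bad_blocks": int(raw["realloc"])}


CONVERTERS: Dict[str, Callable[[dict], dict]] = {
    "model-a": _from_model_a,
    "model-b": _from_model_b,
}


def normalize(disk_model: str, raw: dict) -> dict:
    """Convert a raw report into the preset format, based on the disk model."""
    try:
        convert = CONVERTERS[disk_model]
    except KeyError:
        raise ValueError(f"no converter for disk model {disk_model!r}") from None
    return convert(raw)


print(normalize("model-b", {"days": 10, "used_pct": 50, "realloc": 2}))
```

Rejecting unknown models explicitly, rather than guessing a format, keeps malformed inputs out of the prediction step.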

S203: send the failure prediction result of the disk to the user's terminal.

After the failure prediction result is sent to the user's terminal, the user can judge from it whether a disk is likely to fail, when the failure may occur, and so on. The user can thus prepare countermeasures before the disk fails and handle the failed disk promptly when it does, which improves the timeliness and efficiency of fault handling.

It can be understood that, in the embodiments of this application, the terminal that receives the failure prediction result may be the terminal of an administrator of the storage system, the terminal of another preset user, and so on.

The monitoring server can send the failure prediction result to the user's terminal in several ways:

For example, in one possible case, when the monitoring server determines from the failure prediction result that a disk is at risk of failure, it sends the failure prediction result of that disk to the user's terminal.

As another example, in another possible case, the user can send a failure prediction request to the monitoring server as needed. On receiving the request, the monitoring server can send the currently predicted failure prediction results of every disk in the storage system to the user's terminal. Alternatively, the failure prediction request can carry the identifier of at least one disk, and the monitoring server can, based on those identifiers, send the predicted failure prediction results of the identified disks to the user's terminal.

As yet another example, in another possible case, the user can preset push conditions for failure prediction results, such as a push time or a push period. The monitoring server can then judge, according to the push conditions preset by the user, whether the prediction result sending time has been reached. Correspondingly, when the monitoring server determines that the preset prediction result sending time has been reached, it sends the failure prediction result of the disk to the user's terminal.
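
The three sending modes described above (on detected risk, on user request, and at a preset time) can be sketched as three small selection functions. The function names, the representation of a prediction result as a risk value per disk, and the default risk threshold are assumptions for this illustration.

```python
from typing import Dict, Iterable, Optional


def select_at_risk(results: Dict[str, float],
                   risk_threshold: float = 0.5) -> Dict[str, float]:
    """Mode 1: keep only disks whose predicted risk reaches the threshold."""
    return {d: r for d, r in results.items() if r >= risk_threshold}


def answer_request(results: Dict[str, float],
                   disk_ids: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Mode 2: answer a user request for all disks or for named disks only."""
    if disk_ids is None:
        return dict(results)
    return {d: results[d] for d in disk_ids if d in results}


def push_due(now: float, last_push: float, period: float) -> bool:
    """Mode 3: report whether the preset push period has elapsed."""
    return now - last_push >= period


results = {"sda": 0.91, "sdb": 0.12, "sdc": 0.55}
print(select_at_risk(results))            # {'sda': 0.91, 'sdc': 0.55}
print(answer_request(results, ["sdb"]))   # {'sdb': 0.12}
print(push_due(3600.0, 0.0, 3600.0))      # True
```

Unknown disk identifiers in a request are silently skipped here; an implementation could equally report them back to the user.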

Of course, in practical applications, the monitoring server may send the failure prediction result to the user's terminal in other ways as well, which are not limited here.

It can be seen that, in the storage system of this application, the monitoring server can obtain the current operating state information of each disk in the storage system, use the state change model obtained by training in advance to predict, from that information, the fault occurrences of each disk after the current time, and send the resulting failure prediction result to the user's terminal. The user can thus learn which disks are likely to fail before they actually fail and prepare for fault handling in advance. This helps failures to be found and handled in time and reduces storage system instability caused by disk failures.

At the same time, because the disks that may fail can be predicted in advance, the number of spare disks that need to be provided can be determined from the anticipated disk failures. This allows the number of spare disks to be configured more reasonably, which both avoids wasting resources and reduces the cases in which a disk failure cannot be handled in time because too few spare disks are available.
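
This application does not say how the number of spare disks is derived from the prediction results. One simple possibility, shown here purely as an assumption, is to treat each disk's predicted risk as a failure probability, sum the risks to get an expected number of failures, and round up. The function name and the optional safety margin are hypothetical.

```python
import math
from typing import Iterable


def spares_needed(risks: Iterable[float], safety_margin: int = 0) -> int:
    """Estimate spare disks as the rounded-up expected number of failures."""
    expected_failures = sum(risks)
    return math.ceil(expected_failures) + safety_margin


print(spares_needed([0.9, 0.8, 0.1, 0.05]))  # 2
```

A larger `safety_margin` trades some extra hardware for a lower chance of running out of spares.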

For example, Fig. 3 shows a schematic flowchart of another embodiment of a disk failure prediction method of this application. The method of this embodiment is described from the perspective of the monitoring server and may include:

S301: obtain the current operating state information of a disk in a storage node as reported by the storage node.

This embodiment is described using the example in which the storage node reports the current operating state information of the disk to the monitoring server, but it applies equally to other cases.

S302: based on the current operating state information of the disk, and using the state change model obtained by training in advance, predict the fault occurrences of the disk after the current time to obtain a failure prediction result.

The state change model is trained on the historical operating state information and the historical actual fault occurrences of the disks in the multiple storage nodes before the current time.

S303: send the failure prediction result of the disk to the user's terminal.

For steps S301 to S303 above, refer to the related description of the preceding embodiment; details are not repeated here.

S304: obtain the actual fault occurrences of the disk after the current time and, if the disk has failed, obtain the fault status information of the disk.

The actual fault occurrences may include: whether the disk has failed, the actual time of the failure, and so on. Correspondingly, the fault status information includes the cause of the fault and the time of the fault.

The actual fault occurrences of the disk, together with the fault status information obtained after a disk failure, can later be used to analyze whether the failure prediction result produced by the state change model was accurate.

S305: correct the parameters of the state change model according to the actual fault occurrences and the fault status information.

Specifically, for each disk, the monitoring server can analyze whether the failure prediction result most recently predicted by the state change model for that disk is consistent with the actual fault situation currently obtained for the disk. If it is not, the actual fault situation and the fault status information of the disk are used as correction data for the state change model, and the model is corrected with this data to improve the accuracy with which it predicts disk failures.
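
The comparison step in S305 can be sketched as follows: for each disk, the most recent prediction is compared with the observed outcome, and only the mismatches are collected as correction data. The names `FaultStatus` and `collect_corrections` and the record layout are assumptions for this example; how the correction data is then fed back into the model (for instance, by further training) is left open, as it is in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FaultStatus:
    """Fault status information for a failed disk (hypothetical layout)."""
    cause: str
    occurred_at: float  # time of the fault, e.g. a Unix timestamp


Correction = Tuple[str, bool, Optional[FaultStatus]]


def collect_corrections(
    predicted_fail: Dict[str, bool],
    actual: Dict[str, Optional[FaultStatus]],
) -> List[Correction]:
    """Return (disk_id, actually_failed, fault_status) for each mismatch."""
    corrections: List[Correction] = []
    for disk_id, predicted in predicted_fail.items():
        if disk_id not in actual:
            continue  # no observation yet for this disk
        fault = actual[disk_id]
        failed = fault is not None
        if predicted != failed:
            corrections.append((disk_id, failed, fault))
    return corrections


predicted = {"sda": False, "sdb": True, "sdc": False}
observed = {"sda": FaultStatus("bad blocks", 1700000000.0),
            "sdb": None, "sdc": None}
corrections = collect_corrections(predicted, observed)
```

In this example `sda` (an unpredicted failure) and `sdb` (a predicted failure that did not happen) become correction data, while `sdc` is left out because its prediction was consistent with the outcome.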

Corresponding to the disk failure prediction method of this application, an embodiment of this application also provides a disk failure prediction device. For example, Fig. 4 shows a schematic structural diagram of an embodiment of a disk failure prediction device of this application. The device of this embodiment is applied to a monitoring server in a storage system, where the storage system includes multiple storage nodes and the storage nodes contain disks. The device includes:

a state acquisition unit 401, configured to obtain current operating state information of a disk in the storage node;

an analysis and prediction unit 402, configured to predict, based on the current operating state information of the disk and using a state change model obtained by training in advance, fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time;

a user interaction unit 403, configured to send the failure prediction result of the disk to a user's terminal.

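In one implementation, the state acquisition unit includes:

a state acquisition subunit, configured to obtain the current operating state information of the disk in the storage node as reported by the storage node.

In one implementation, the user interaction unit includes:

a first interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a failure prediction request from the user is received;

or a second interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a preset prediction result sending time is reached.

Optionally, the device may also include:

an actual information acquisition unit, configured to, after the analysis and prediction unit predicts the fault occurrences of the disk after the current time, obtain the actual fault occurrences of the disk after the current time and, if the disk has failed, obtain fault status information of the disk, where the fault status information includes the cause of the fault and the time of the fault;

a model correction unit, configured to correct the parameters of the state change model according to the actual fault occurrences and the fault status information.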

It should be noted that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.

Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.

The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A disk failure prediction method, characterized in that it is applied to a monitoring server in a storage system, the storage system includes multiple storage nodes, the storage nodes contain disks, and the method includes:

obtaining current operating state information of a disk in the storage node;

based on the current operating state information of the disk, and using a state change model obtained by training in advance, predicting fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time;

sending the failure prediction result of the disk to a user's terminal.

2. The disk failure prediction method according to claim 1, characterized in that obtaining the current operating state information of the disk in the storage node includes:

obtaining the current operating state information of the disk in the storage node as reported by the storage node.

3. The disk failure prediction method according to claim 1 or 2, characterized in that the current operating state information of the disk includes any one or more of the following:

the operating duration of the disk, the load condition of the disk, and the current number of bad blocks in the disk.

4. The disk failure prediction method according to claim 1, characterized in that sending the failure prediction result of the disk to the user's terminal includes:

when a failure prediction request from the user is received, sending the failure prediction result of the disk to the user's terminal;

or, when a preset prediction result sending time is reached, sending the failure prediction result of the disk to the user's terminal.

5. The disk failure prediction method according to claim 1, characterized in that, after predicting the fault occurrences of the disk after the current time based on the current operating state information of the disk and using the state change model obtained by training in advance, the method further includes:

obtaining the actual fault occurrences of the disk after the current time and, if the disk has failed, obtaining fault status information of the disk, where the fault status information includes the cause of the fault and the time of the fault;

correcting the parameters of the state change model according to the actual fault occurrences and the fault status information.

6. A disk failure prediction device, characterized in that it is applied to a monitoring server in a storage system, the storage system includes multiple storage nodes, the storage nodes contain disks, and the device includes:

a state acquisition unit, configured to obtain current operating state information of a disk in the storage node;

an analysis and prediction unit, configured to predict, based on the current operating state information of the disk and using a state change model obtained by training in advance, fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time;

a user interaction unit, configured to send the failure prediction result of the disk to a user's terminal.

7. The disk failure prediction device according to claim 6, characterized in that the state acquisition unit includes:

a state acquisition subunit, configured to obtain the current operating state information of the disk in the storage node as reported by the storage node.

8. The disk failure prediction device according to claim 6, characterized in that the user interaction unit includes:

a first interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a failure prediction request from the user is received;

or a second interaction unit, configured to send the failure prediction result of the disk to the user's terminal when a preset prediction result sending time is reached.

9. The disk failure prediction device according to claim 6, characterized in that it further includes:

an actual information acquisition unit, configured to, after the analysis and prediction unit predicts the fault occurrences of the disk after the current time, obtain the actual fault occurrences of the disk after the current time and, if the disk has failed, obtain fault status information of the disk, where the fault status information includes the cause of the fault and the time of the fault;

a model correction unit, configured to correct the parameters of the state change model according to the actual fault occurrences and the fault status information.

10. A storage system, characterized in that it includes:

a monitoring server and multiple storage nodes, where the storage nodes contain disks;

the monitoring server is configured to: obtain current operating state information of a disk in the storage node; based on the current operating state information of the disk, and using a state change model obtained by training in advance, predict fault occurrences of the disk after the current time to obtain a failure prediction result, where the state change model is trained on historical operating state information and historical actual fault occurrences of the disks in the multiple storage nodes before the current time; and send the failure prediction result of the disk to a user's terminal.