CN115858265A - Disk failure prediction method and device, electronic equipment and readable storage medium

Info

Publication number: CN115858265A
Application number: CN202211624477.8A
Authority: CN
Inventors: 杨聪敏; 曲新华; 李人杰; 赵立伟
Original assignee: Beijing Sino Bridge Technology Co ltd
Current assignee: Beijing Sino Bridge Technology Co ltd
Priority date: 2022-12-16
Filing date: 2022-12-16
Publication date: 2023-03-28

Abstract

The present disclosure relates to a disk failure prediction method, device, electronic device and readable storage medium, the method comprising: acquiring disk information of a first disk, wherein the disk information comprises a logical disk error report log, physical disk state information and disk SMART data; inputting the collected disk information into a disk failure prediction model to predict the failure probability of the first disk; the disk failure prediction model comprises a first classifier and a second classifier, the first classifier performs disk failure prediction by using data features corresponding to disk information, the second classifier performs disk failure prediction by using difference values of the data features corresponding to the disk information, and a weighted average value of a probability value of a first disk failure obtained by the first classifier and a probability value of the first disk failure obtained by the second classifier is a probability value of the first disk failure obtained by the disk failure prediction model.

Description

Disk failure prediction method and device, electronic equipment and readable storage medium

Technical Field

The present disclosure relates to the field of computer storage, and in particular, to a disk failure prediction method, apparatus, electronic device, and readable storage medium.

Background

In recent years, the security and reliability of cloud computing have become a major concern for enterprises. Cloud computing is built on cloud storage, and approximately 90% of the data in data centers worldwide is stored on disks. Because of how disk storage works, once a disk is damaged, the data stored on it will be permanently lost. Although advances in disk manufacturing keep lowering the failure probability of an individual disk, disk failures remain frequent in cloud environments because of the extremely large number of disks deployed in cloud storage. Early failure prediction for disks is therefore necessary.

There has been much research on the prediction of early disk failures, but the following challenges still remain:

With the rise of artificial intelligence, much current research uses machine learning to improve prediction accuracy, with good results. However, most existing techniques are based on Self-Monitoring, Analysis and Reporting Technology (SMART) data, and the accuracy of their predictions is not high enough. In addition, when a new disk model has zero fault samples, it is difficult to build an early failure prediction model.

Disclosure of Invention

In order to solve the problems in the related art, in a first aspect, an embodiment of the present disclosure provides a disk failure prediction method, including:

acquiring disk information of a first disk, wherein the disk information comprises a logical disk error report log, physical disk state information and disk SMART data;

inputting the collected disk information into a disk failure prediction model to predict the failure probability of the first disk;

the disk failure prediction model comprises a first classifier and a second classifier, the first classifier performs disk failure prediction by using data features corresponding to the disk information, the second classifier performs disk failure prediction by using a difference value of the data features corresponding to the disk information, and a weighted average value of a probability value of the first disk failure predicted by the first classifier and a probability value of the first disk failure predicted by the second classifier is the probability value of the first disk failure predicted by the disk failure prediction model; the disk failure prediction model is obtained by training at least by utilizing disk information of the second disk.

According to the embodiment of the present disclosure, in a case that the second disk is of the same model as the first disk, the training process of the disk failure prediction model includes:

storing the disk information of the second disk into a training sample data pool, wherein each disk information of each second disk corresponds to an array, and each array represents one sample data;

sequentially inputting sample data in the training sample data pool into a time window in chronological order, wherein the time window comprises an earlier semi-supervised learning window and a later active learning window;

in response to the sample data most recently input into the semi-supervised learning window indicating that the corresponding disk has failed, predicting each sample data in the semi-supervised learning window by using a semi-supervised learning algorithm, to obtain, for each sample data, a probability value that the disk is in a healthy state and a probability value that the disk is in a fault state;

determining the value of each sample data according to its probability value for the healthy state and its probability value for the fault state; and

selecting sample data whose value is greater than a preset threshold, and adding the selected sample data to the training set.

According to the embodiment of the present disclosure, the training process of the disk failure prediction model further includes:

determining the proportion of positive samples and negative samples in the current training set;

in response to determining that the proportion of positive samples and negative samples in the current training set is smaller than a preset proportion and that the active learning window is full of input sample data, executing a first operation,

wherein the first operation comprises: predicting the sample data in the active learning window in sequence by using an active learning algorithm, to obtain a probability value that the corresponding sample data represents the disk as being in a healthy state; and selecting sample data whose probability value falls within a preset probability interval, and adding the selected sample data to the training set.

According to an embodiment of the present disclosure, the training set comprises a first training set and a second training set, wherein:

in the case that the training set comprises sample data selected by adopting the active learning algorithm, the first training set comprises a first subset and a second subset; in the case that the training set does not include sample data selected using the active learning algorithm, the first training set includes only the second subset;

in the case that the training set comprises sample data selected by adopting the active learning algorithm, the second training set comprises a third subset and a fourth subset; in the case that the training set does not include sample data selected using the active learning algorithm, the second training set includes only the fourth subset;

wherein:

the sample data included in each of the first subset and the third subset is selected using the active learning algorithm, and the sample data included in each of the second subset and the fourth subset is selected using the semi-supervised learning algorithm;

the first training set is used for training of the first classifier; the second training set is used for training of the second classifier.

According to the embodiment of the disclosure, the training process of the disk failure prediction model further comprises:

configuring a first tag value and a second tag value, wherein the first tag value is used for representing that the disk state corresponding to the sample data is a healthy state, and the second tag value is used for representing that the disk state corresponding to the sample data is a fault state;

sample data in both the first subset and the third subset is tagged with the first tag value;

the tag value with which each sample data in the second subset is tagged is determined by:

predicting data characteristics corresponding to each sample data in the second subset by adopting the semi-supervised learning algorithm to obtain a probability value of each sample data representing that the disk is in a healthy state and a probability value of the disk in a fault state;

marking the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and marking the second label value for the sample data with the probability value representing the health state being less than the probability value representing the fault state;

the tag value with which each sample data in the fourth subset is tagged is determined by:

predicting the difference value of the data characteristics corresponding to each sample data in the fourth subset by adopting the semi-supervised learning algorithm to obtain the probability value of each sample data representing that the disk is in a healthy state and the probability value of each sample data representing that the disk is in a fault state;

and marking the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and marking the second label value for the sample data with the probability value representing the health state being less than the probability value representing the fault state.
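
The tagging rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the integer encodings `HEALTHY_TAG` and `FAULT_TAG`, the function names, and the handling of ties (which the text does not address) are all hypothetical.

```python
from typing import List, Optional, Tuple

HEALTHY_TAG = 0  # first tag value (the numeric encoding is an assumption)
FAULT_TAG = 1    # second tag value (the numeric encoding is an assumption)


def tag_from_probabilities(p_healthy: float, p_fault: float) -> Optional[int]:
    """Tag a semi-supervised sample by comparing its two probabilities."""
    if p_healthy > p_fault:
        return HEALTHY_TAG
    if p_healthy < p_fault:
        return FAULT_TAG
    return None  # tie: not covered by the text, so leave the sample untagged


def build_training_set(
    active_samples: List[list],
    semi_samples: List[Tuple[list, float, float]],
) -> List[Tuple[list, int]]:
    """Combine an active-learning subset with a semi-supervised subset."""
    # Samples chosen by active learning always carry the first tag value.
    tagged = [(x, HEALTHY_TAG) for x in active_samples]
    for x, p_healthy, p_fault in semi_samples:
        tag = tag_from_probabilities(p_healthy, p_fault)
        if tag is not None:
            tagged.append((x, tag))
    return tagged


first_training_set = build_training_set(
    active_samples=[[0.1, 0.2]],
    semi_samples=[([0.3, 0.1], 0.8, 0.2), ([0.9, 0.7], 0.1, 0.9)],
)
```

The same helper would apply to the second training set if the inputs were the feature differences rather than the features themselves.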

According to an embodiment of the present disclosure, after selecting sample data corresponding to a probability value falling within a preset probability interval and adding the selected sample data to the training set, the training process of the disk failure prediction model further includes:

in response to the proportion of positive samples and negative samples in the current training set being smaller than the preset proportion, returning to execute the first operation again, until the proportion of positive samples and negative samples in the current training set is equal to the preset proportion.

According to the embodiment of the present disclosure, in a case that the model of the second disk is different from that of the first disk, a training process of the disk failure prediction model includes:

firstly, training by using the disk information of the second disk to obtain an intermediate prediction model;

and then, performing transfer learning on the intermediate prediction model by using the collected disk information of the first disk to obtain the disk failure prediction model, wherein the collected disk information of the first disk is positive sample data.
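
The two-stage training for a different disk model can be illustrated with a deliberately small stand-in. The sketch below assumes a hand-written logistic regression (`TinyLogit`), label 1 for a fault and 0 for a healthy disk, and invented feature values; the source does not specify the classifier type or the fine-tuning procedure, so everything here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label); 1 = fault, 0 = healthy


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyLogit:
    """Minimal logistic-regression stand-in for a failure classifier."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_fault(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, data: List[Sample], epochs: int, lr: float) -> None:
        """Plain SGD; calling fit again continues from the current weights."""
        for _ in range(epochs):
            for x, y in data:
                err = self.predict_fault(x) - y
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


# Step 1: intermediate model trained on the second disk (a different model).
second_disk = [([0.1, 0.0], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
model = TinyLogit(n_features=2)
model.fit(second_disk, epochs=200, lr=0.5)

# Step 2: fine-tune on the first disk, for which only healthy samples exist.
first_disk_healthy = [([0.4, 0.3], 0), ([0.5, 0.2], 0)]
before = model.predict_fault([0.4, 0.3])
model.fit(first_disk_healthy, epochs=20, lr=0.05)
after = model.predict_fault([0.4, 0.3])
```

Using a smaller learning rate and fewer epochs in the second stage is one common way to adapt a model without discarding what it learned in the first stage; it is a design choice of this sketch, not something the source states.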

In a second aspect, an embodiment of the present disclosure provides a disk failure prediction apparatus, including: an acquisition module configured to acquire disk information of a first disk, wherein the disk information comprises a logical disk error report log, physical disk state information and disk SMART data; and

an input module configured to input the collected disk information into a disk failure prediction model so as to predict the probability of the first disk failing.

In a third aspect, the disclosed embodiments provide an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method according to the first aspect.

In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method according to the first aspect.

In a fifth aspect, the disclosed embodiments provide a computer program product comprising computer instructions which, when executed by a processor, implement the method steps as described in the first aspect.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 shows a schematic diagram of a disk failure prediction scenario, according to an embodiment of the present disclosure.

FIG. 2 shows a flow diagram of a disk failure prediction method according to an embodiment of the present disclosure.

FIG. 3 shows a flow chart of a disk failure prediction model training method according to an embodiment of the present disclosure.

FIG. 4 shows a flow diagram of a disk failure prediction model training method according to an embodiment of the disclosure.

FIG. 5 shows a block diagram of a configuration of a disk failure prediction apparatus according to an embodiment of the present disclosure.

FIG. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

FIG. 7 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.

In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numerals, steps, actions, components, parts, or combinations thereof in the specification, and are not intended to preclude the possibility that one or more other features, numerals, steps, actions, components, parts, or combinations thereof are present or added.

It should also be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

In the present disclosure, if an operation of acquiring user information or user data or an operation of presenting user information or user data to others is involved, the operations are all operations authorized, confirmed, or actively selected by a user.

First, an application scenario to which the technical solution of the present disclosure is applied is described with reference to FIG. 1.

As shown in FIG. 1, it is assumed that a data center is provided with a storage server cluster 110, which includes a storage server 111, a storage server 112, a storage server 113, and a storage server 114. For simplicity, this disclosure shows only four storage servers, but it is understood that a cluster of storage servers of one data center may deploy more or fewer storage servers.

The storage server cluster 110 is communicatively connected to the server 120, which is responsible for collecting disk information (including the logical disk error report log, physical disk state information, and disk SMART data) from the storage server 111, the storage server 112, the storage server 113, and the storage server 114. That is, the server 120 may receive the disk information of each disk installed on these storage servers and store it according to the ID of each disk and the generation time of the disk information.

The server 120 is in communication connection with the server 130, and the server 130 is responsible for acquiring the disk information from the server 120 and storing the disk information into a local training sample data pool. When the training condition of the disk failure prediction model is satisfied, the server 130 is responsible for training the disk failure prediction model required for predicting the disk failure based on the acquired disk information.

The server 120, the server 130, and the terminal device 150 are all communicatively connected to the server 140. The server 140 is responsible for acquiring the trained disk failure prediction model from the server 130 and using it, together with the disk information collected by the server 120, to perform failure prediction on the disks installed on each storage server in the storage server cluster 110. When the prediction result indicates that the failure probability of one or more of these disks exceeds a preset value, the server 140 generates alarm information for each such disk and sends the alarm information to the terminal device 150 of the user.

The embodiment of the disclosure provides a disk failure prediction method, which includes: acquiring disk information of a first disk, wherein the disk information comprises a logical disk error report log, physical disk state information and disk SMART data; and inputting the acquired disk information into a disk failure prediction model to predict the failure probability of the first disk. The disk failure prediction model comprises a first classifier and a second classifier, the first classifier performs disk failure prediction by using data features corresponding to the disk information, the second classifier performs disk failure prediction by using a difference value of the data features corresponding to the disk information, and a weighted average value of a probability value of the first disk failure predicted by the first classifier and a probability value of the first disk failure predicted by the second classifier is the probability value of the first disk failure predicted by the disk failure prediction model; the disk failure prediction model is obtained by training at least by utilizing disk information of the second disk.

According to an embodiment of the present disclosure, the first disk may be any disk installed on any one of the storage server 111, the storage server 112, the storage server 113, and the storage server 114, any disk on a designated one of these storage servers, or a designated disk. The second disk may be one or more disks of the same model as the first disk installed on the storage server 111, the storage server 112, the storage server 113, or the storage server 114, in which case, for example, the second disk may comprise the first disk. Alternatively, the second disk may be one or more disks of a different model than the first disk installed on the storage server 111, the storage server 112, the storage server 113, or the storage server 114.

FIG. 2 shows a flow diagram of a disk failure prediction method according to an embodiment of the present disclosure. As shown in FIG. 2, the disk failure prediction method includes the following steps S201 to S202:

in step S201, collecting disk information of a first disk, where the disk information includes a logical disk error log, physical disk state information, and disk SMART data;

in step S202, the collected disk information is input into a disk failure prediction model to predict the probability of the first disk failing.

According to the embodiment of the disclosure, the disk failure prediction model comprises a first classifier and a second classifier, the first classifier is used for predicting disk failures by using data characteristics corresponding to the disk information, the second classifier is used for predicting disk failures by using difference values of the data characteristics corresponding to the disk information, and a weighted average value of a probability value of the first disk failure obtained by the first classifier and a probability value of the first disk failure obtained by the second classifier is the probability value of the first disk failure obtained by the disk failure prediction model; the disk failure prediction model is obtained by training at least by utilizing disk information of the second disk.
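
The combination of the two classifiers can be sketched as below. The sketch assumes that the "difference values" are element-wise changes between two consecutive samples of the same disk and that the weight is a free parameter; the source states neither, and the toy classifiers are placeholders for illustration only.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], float]  # returns P(failure) in [0, 1]


def feature_differences(current: Features, previous: Features) -> List[float]:
    """Element-wise change of each feature between two consecutive samples."""
    return [c - p for c, p in zip(current, previous)]


def predict_failure_probability(
    clf_raw: Classifier,
    clf_diff: Classifier,
    current: Features,
    previous: Features,
    weight_raw: float = 0.5,  # assumed weight; the source does not give one
) -> float:
    """Weighted average of the two classifiers' failure probabilities."""
    p_raw = clf_raw(current)
    p_diff = clf_diff(feature_differences(current, previous))
    return weight_raw * p_raw + (1.0 - weight_raw) * p_diff


# Toy stand-in classifiers, for illustration only.
def toy_raw(x: Features) -> float:
    return min(1.0, x[0] / 100.0)


def toy_diff(d: Features) -> float:
    return min(1.0, max(0.0, d[0] / 10.0))


p = predict_failure_probability(
    toy_raw, toy_diff, current=[40.0, 3.0], previous=[35.0, 3.0], weight_raw=0.6
)
```

With these toy inputs the first classifier returns 0.4 and the second returns 0.5, so the weighted average is 0.6 × 0.4 + 0.4 × 0.5 = 0.44.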

According to the embodiment of the disclosure, when the health status of a disk is judged from the logical disk error report log, for example, PASSED indicates that the disk is healthy, and any other result indicates that the disk has failed or is about to fail. When the health status is judged from the physical disk state information, for example, mediaErrorCount:0 indicates that the disk has no bad tracks, and any other value indicates that the disk has, or will soon have, bad tracks; otherErrorCount:0 indicates that the disk has not come loose and does not need to be reseated, and any other value indicates that the disk has come loose, or will soon come loose, and needs to be reseated. When the health status is judged from the disk SMART data, for example, the disk is healthy when every index parameter describing the operating condition of the disk's heads, platters, motor, circuitry and the like is within its corresponding threshold range; otherwise, the disk has failed or is about to fail.
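
The three rule-of-thumb checks above can be written down directly. In this sketch only `PASSED`, `mediaErrorCount` and `otherErrorCount` come from the text; the function names, the dictionary layout, and the SMART parameter names and ranges are hypothetical.

```python
from typing import Dict, Tuple


def logical_disk_healthy(overall_result: str) -> bool:
    """PASSED means healthy; any other result means failed or failing soon."""
    return overall_result.strip() == "PASSED"


def physical_disk_risks(state: Dict[str, int]) -> Dict[str, bool]:
    """Map the two error counters to the risks described in the text."""
    return {
        "bad_track": state["mediaErrorCount"] != 0,
        "needs_reseat": state["otherErrorCount"] != 0,
    }


def smart_healthy(
    values: Dict[str, float], ranges: Dict[str, Tuple[float, float]]
) -> bool:
    """Healthy only if every monitored parameter is inside its range."""
    return all(low <= values[name] <= high for name, (low, high) in ranges.items())


# Hypothetical readings for one disk.
risks = physical_disk_risks({"mediaErrorCount": 0, "otherErrorCount": 3})
smart_ok = smart_healthy(
    {"temperature_c": 41.0, "reallocated_sectors": 0.0},
    {"temperature_c": (0.0, 60.0), "reallocated_sectors": (0.0, 10.0)},
)
```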

Therefore, for any disk, any one of the logical disk error report log, the physical disk state information and the disk SMART data can be used to predict whether the disk will fail. Compared with predicting from the disk SMART data alone, predicting from all three sources at the same time improves the accuracy of the prediction result. Compared with predicting for the same disk from each of the three sources in turn, predicting from all three together yields a result in a single pass, which preserves the accuracy of the prediction result while improving prediction efficiency.

According to the embodiment of the disclosure, the logical disk error report log, the physical disk state information and the disk SMART data of the first disk can be input into the disk failure prediction model in a predetermined array format. If the output probability prediction value is smaller than a probability threshold, the first disk is healthy; otherwise, the first disk has failed or is about to fail.
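
One possible shape for the predetermined array format and the threshold decision is sketched below. The array layout, the SMART parameter names, and the function names are assumptions made for illustration; the source does not define the format.

```python
from typing import Dict, List, Sequence


def encode_sample(
    logical_result: str,
    physical_state: Dict[str, int],
    smart_values: Dict[str, float],
    smart_order: Sequence[str],
) -> List[float]:
    """Flatten the three information sources into one fixed-order array."""
    return [
        0.0 if logical_result == "PASSED" else 1.0,
        float(physical_state["mediaErrorCount"]),
        float(physical_state["otherErrorCount"]),
        *[float(smart_values[name]) for name in smart_order],
    ]


def is_failing(failure_probability: float, threshold: float) -> bool:
    """Below the threshold means healthy; otherwise failed or failing soon."""
    return failure_probability >= threshold


sample = encode_sample(
    "PASSED",
    {"mediaErrorCount": 2, "otherErrorCount": 0},
    {"temperature_c": 41.0, "reallocated_sectors": 5.0},
    smart_order=["temperature_c", "reallocated_sectors"],
)
```

Fixing the SMART parameter order in `smart_order` keeps every sample aligned column by column, which any array-based model input would require.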

Taking the scenario shown in FIG. 1 as an example, the method shown in FIG. 2 may be performed by the server 140. The server 120 receives disk information about the first disk from the storage server cluster 110 and sends it to the server 140. The server 140 inputs the acquired disk information of the first disk into the disk failure prediction model trained by the server 130, which outputs a predicted probability that the first disk will fail. When the predicted probability indicates that the first disk has failed or is about to fail, the server 140 generates alarm information for the first disk and sends it to the terminal device 150 of the user. When the predicted probability indicates that the first disk is healthy, no alarm information is generated for the first disk, and the server 140 continues to monitor the first disk in the storage server cluster 110 and predict its failures. It should be noted that, while the server 140 uses the disk failure prediction model to monitor and predict failures of the first disk, it may also use the same model to monitor and predict failures of other disks of the same model as the first disk in the storage server cluster 110.

In this way, a user can predict whether the disk fails by using the logical disk error log, the physical disk state information and the disk SMART data of the disk, so that the prediction efficiency is improved while the accuracy is ensured.

FIG. 3 shows a flow chart of a disk failure prediction model training method according to an embodiment of the present disclosure. As shown in FIG. 3, in the case that the second disk is of the same model as the first disk, the disk failure prediction model training method includes the following steps S301 to S305:

in step S301, storing the disk information of the second disk into a training sample data pool, where each disk information of each second disk corresponds to an array, and each array represents one sample data;

in step S302, sequentially inputting sample data in the training sample data pool into a time window in chronological order, where the time window includes an earlier semi-supervised learning window and a later active learning window;

in step S303, in response to the sample data most recently input into the semi-supervised learning window indicating that the corresponding disk has failed, predicting each sample data in the semi-supervised learning window by using a semi-supervised learning algorithm, to obtain, for each sample data, a probability value that the disk is in a healthy state and a probability value that the disk is in a fault state;

in step S304, determining the value of each sample data according to its probability value for the healthy state and its probability value for the fault state;

in step S305, sample data with a value greater than a preset threshold is selected, and the selected sample data is added to the training set.

According to an embodiment of the present disclosure, the second disk includes one or more disks, and the second disk includes the first disk when the second disk is of the same model as the first disk. Therefore, the disk failure prediction model required in the embodiment shown in FIG. 2 can be obtained by training with the disk information of the second disk, which reduces the difficulty of constructing an early disk failure prediction model while preserving the generalization capability of the model.

According to the embodiment of the disclosure, the logical disk error log, the physical disk state information and the disk SMART data generated by each disk in the second disk at each time point may be stored in a training sample data pool (hereinafter referred to as a data pool) in a predetermined array format.

According to the embodiment of the disclosure, a time window comprising two sub-windows, namely a semi-supervised learning window and an active learning window, can be preset. The semi-supervised learning window precedes the active learning window and is longer than it. When sample data is selected from the data pool to form a training set, the sample data in the data pool is input into the semi-supervised learning window and the active learning window in the order in which it was generated. While sample data is being input into the time window, in response to the semi-supervised learning trigger condition being met, that is, the sample data most recently input into the semi-supervised learning window indicating that the corresponding disk has failed, each sample data in the semi-supervised learning window is predicted in first-in-first-out order by using a semi-supervised learning algorithm, to obtain a probability value that the disk is in a healthy state (probability value 1) and a probability value that the disk is in a fault state (probability value 2). The value of each sample data is then determined from its probability value 1 and probability value 2 according to a preset value calculation rule, which can be set according to the actual requirements of the user. In this way, sample data in the semi-supervised learning window whose value is greater than a preset threshold can be selected and put into the training set, and training the disk failure prediction model on a training set obtained in this way yields a model with more accurate prediction results.
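
The scoring and selection step can be sketched as follows. Because the text leaves the value calculation rule to the user, the rule used here (the absolute gap between the two probabilities) is purely an assumption, as are the function names and the toy predictor.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Predictor = Callable[[Sample], Tuple[float, float]]  # -> (p_healthy, p_fault)


def sample_value(p_healthy: float, p_fault: float) -> float:
    """Assumed value rule: how decisively the model separates the two states."""
    return abs(p_healthy - p_fault)


def select_valuable_samples(
    semi_window: List[Sample], predict: Predictor, threshold: float
) -> List[Sample]:
    """Score the window in first-in-first-out order and keep high-value samples."""
    selected = []
    for sample in semi_window:  # index 0 is the oldest sample
        p_healthy, p_fault = predict(sample)
        if sample_value(p_healthy, p_fault) > threshold:
            selected.append(sample)
    return selected


# Toy predictor, for illustration only: the first feature is read as P(fault).
def toy_predict(sample: Sample) -> Tuple[float, float]:
    p_fault = sample[0]
    return 1.0 - p_fault, p_fault


window = [[0.05], [0.45], [0.95]]
chosen = select_valuable_samples(window, toy_predict, threshold=0.5)
```

Under this assumed rule the ambiguous middle sample is dropped, while the two samples the predictor is confident about are kept. A different rule, for example one that favors uncertain samples, would plug into `sample_value` without changing the rest.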

Taking the scenario shown in FIG. 1 as an example, the method shown in FIG. 3 may be performed by the server 130. The server 120 receives disk information of the second disk from the storage server cluster 110 and sends it to the server 130. The server 130 stores the acquired disk information of the second disk in a local data pool in a preset array format, and then inputs the sample data in the data pool into the semi-supervised learning window and the active learning window in chronological order. When the sample data most recently input into the semi-supervised learning window indicates that the corresponding disk has failed, the server 130 predicts each sample data in the semi-supervised learning window by using a semi-supervised learning algorithm, thereby obtaining, for each sample data, a probability value 1 that the disk is in a healthy state and a probability value 2 that the disk is in a fault state. The server 130 then determines the value of each sample data from its probability value 1 and probability value 2, selects the sample data in the window whose value is greater than a preset threshold, puts the selected sample data into a training set, and finally trains the disk failure prediction model by using the training set.

According to the embodiment of the disclosure, in the process of inputting sample data into the time window, if no sample data indicating a disk failure has appeared in the semi-supervised learning window by the time both sub-windows (the semi-supervised learning window and the active learning window) are filled with data, the sample data in both sub-windows is cleared and the next round of sample data input begins.
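
The feed, trigger and clear behavior of the time window can be sketched as a small state holder. The sketch assumes that samples fill the semi-supervised window first and the active window afterwards, and that each sample carries a boolean `failed` field; both are interpretations, since the source does not spell out the mechanics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimeWindow:
    semi_len: int    # length of the semi-supervised learning window
    active_len: int  # length of the active learning window (the shorter one)
    semi: List[dict] = field(default_factory=list)
    active: List[dict] = field(default_factory=list)

    def reset(self) -> None:
        """Empty both windows so the next round of input can start."""
        self.semi.clear()
        self.active.clear()

    def push(self, sample: dict) -> str:
        """Feed one sample in time order and report what happened."""
        if len(self.semi) < self.semi_len:
            self.semi.append(sample)
            # Trigger: the newest sample in the semi-supervised window is a failure.
            return "trigger" if sample["failed"] else "wait"
        if len(self.active) >= self.active_len:
            raise RuntimeError("both windows are full; call reset() first")
        self.active.append(sample)
        if len(self.active) < self.active_len:
            return "wait"
        if any(s["failed"] for s in self.semi):
            return "active_full"  # caller may run active learning, then reset()
        self.reset()  # no failure seen in this round: discard and start over
        return "flush"


window = TimeWindow(semi_len=3, active_len=2)
events = [window.push({"failed": False}) for _ in range(5)]
next_event = window.push({"failed": True})
```

Here five healthy samples fill both windows without any failure, so the fifth push clears them, and the following failed sample lands in the empty semi-supervised window and fires the trigger.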

In this way, semi-supervised learning can be performed online, which reduces the time cost of training the disk failure prediction model and improves its iteration efficiency. In addition, the extra storage that large-scale sample data would otherwise occupy is reduced without compromising the model training effect.

FIG. 4 shows a flow diagram of a disk failure prediction model training method according to an embodiment of the disclosure. As shown in FIG. 4, the disk failure prediction model training method includes the following steps S301-S305 and S406-S408:

in step S301, the disk information of the second disk is stored in a training sample data pool, each disk information of each second disk corresponds to an array, and each array represents one sample data;

in step S305, selecting sample data with a value greater than a preset threshold, and adding the selected sample data to a training set;

in step S406, determining the ratio of positive samples to negative samples in the current training set;

in step S407, in response to determining that the ratio of the positive samples to the negative samples in the current training set is smaller than a preset ratio and that the input sample data in the active learning window is full, sequentially predicting the sample data in the active learning window by using an active learning algorithm to obtain a probability value that the corresponding sample data represents that the disk is in a healthy state;

in step S408, sample data corresponding to the probability value falling within the preset probability interval is selected, and the selected sample data is also added to the training set.

According to the embodiment of the disclosure, if the training set obtained in steps S301 to S305 is used directly to train the disk failure prediction model, the generalization capability of the trained model may be poor because the ratio of positive samples to negative samples in the training set is smaller than the preset ratio. Therefore, after steps S301 to S305 are executed, steps S406 to S408 are executed as well. That is, in response to the active learning trigger condition being satisfied, a first operation is executed to select, from the active learning window, the sample data whose probability value of representing a healthy disk falls within a preset probability interval, and to place that sample data into the training set. This increases the ratio of positive samples to negative samples in the training set and thereby improves the generalization capability of the disk failure prediction model trained with it.

According to an embodiment of the present disclosure, the first operation includes: sequentially predicting the sample data in the active learning window by using an active learning algorithm to obtain a probability value that the corresponding sample data represents that the disk is in a healthy state; and selecting the sample data whose probability value falls within a preset probability interval, and adding the selected sample data to the training set.

According to the embodiment of the present disclosure, after selecting sample data corresponding to a probability value falling within a preset probability interval and adding the selected sample data to the training set, the training process of the disk failure prediction model further includes:

and in response to the fact that the proportion of the positive samples and the negative samples in the current training set is smaller than the preset proportion, returning to continue executing the first operation until the proportion of the positive samples and the negative samples in the current training set is equal to the preset proportion. Illustratively, the preset ratio may be 2 to 4.

In this way, the ratio of the positive samples to the negative samples in the training set can be increased to an ideal value (the preset ratio), and thus the generalization capability of the disk failure prediction model trained by using the training set obtained in this way can be improved to the greatest extent possible.

According to the embodiment of the present disclosure, after step S406 is performed, if it is determined that the ratio of the positive samples to the negative samples in the current training set is equal to the preset ratio, no sample data representing the positive samples is selected from the active learning window.
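The repeated first operation can be sketched as follows. This is only an illustration under stated assumptions: label 0 marks a healthy (positive) sample and label 1 a fault (negative) sample, the probability interval is treated as closed, each repetition consumes a fresh active learning window, and the loop stops once the ratio reaches or exceeds the preset ratio (the text speaks of the ratio becoming equal to it). The function names and the `predict_healthy` callable are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Features = Sequence[float]
Labeled = Tuple[Features, int]  # label 0 = healthy (positive), 1 = fault (negative)


def pos_neg_ratio(training_set: List[Labeled]) -> float:
    """Ratio of positive (healthy) to negative (fault) samples."""
    pos = sum(1 for _, label in training_set if label == 0)
    neg = sum(1 for _, label in training_set if label == 1)
    return float("inf") if neg == 0 else pos / neg


def first_operation(
    active_window: Iterable[Features],
    predict_healthy: Callable[[Features], float],
    interval: Tuple[float, float],
) -> List[Features]:
    """Predict each sample in order and keep those inside the interval."""
    low, high = interval
    return [x for x in active_window if low <= predict_healthy(x) <= high]


def rebalance(
    training_set: List[Labeled],
    active_windows: Iterable[Iterable[Features]],
    predict_healthy: Callable[[Features], float],
    interval: Tuple[float, float],
    preset_ratio: float,
) -> int:
    """Repeat the first operation until the ratio is no longer too small.

    Returns the number of times the first operation was executed.
    """
    rounds = 0
    for window in active_windows:
        if pos_neg_ratio(training_set) >= preset_ratio:
            break  # ratio reached: nothing more is selected
        selected = first_operation(window, predict_healthy, interval)
        training_set.extend((x, 0) for x in selected)  # selected samples are healthy
        rounds += 1
    return rounds
```

Because samples are added one window at a time, the ratio may overshoot the preset ratio slightly; an implementation that must hit the ratio exactly would need to stop within a window.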

According to an embodiment of the present disclosure, the training set includes a first training set and a second training set. Wherein, in the case that the training set comprises sample data selected by adopting the active learning algorithm, the first training set comprises a first subset and a second subset; in the case that the training set does not include sample data selected using the active learning algorithm, the first training set includes only the second subset; under the condition that the training set comprises sample data selected by adopting the active learning algorithm, the second training set comprises a third subset and a fourth subset; in the case that the training set does not include sample data selected using the active learning algorithm, the second training set includes only the fourth subset; wherein the sample data included in each of the first subset and the third subset is selected using the active learning algorithm, and the sample data included in each of the second subset and the fourth subset is selected using the semi-supervised learning algorithm.

The first training set is used for training of the first classifier; the second training set is used for training of the second classifier. The initial networks corresponding to the first classifier and the second classifier may be selected according to actual requirements, for example.
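As a minimal sketch of how the two training sets are composed, the helper below concatenates the actively selected subsets with the semi-supervised subsets. The function name and argument names are hypothetical; the sketch assumes the first and second subsets hold data features and the third and fourth subsets hold differences of data features, in line with what each classifier consumes.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def build_training_sets(
    active_features: List[Features],  # first subset (may be empty)
    active_diffs: List[Features],     # third subset (may be empty)
    semi_features: List[Features],    # second subset
    semi_diffs: List[Features],       # fourth subset
) -> Tuple[List[Features], List[Features]]:
    """Return (first training set, second training set)."""
    first_training_set = list(active_features) + list(semi_features)
    second_training_set = list(active_diffs) + list(semi_diffs)
    return first_training_set, second_training_set
```

When active learning selected nothing, the empty first and third subsets leave each training set with only its semi-supervised subset.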

According to an embodiment of the present disclosure, the training process of the disk failure prediction model further includes:

configuring a first tag value (such as 0) and a second tag value (such as 1), wherein the first tag value is used for representing that the disk state corresponding to the sample data is a healthy state, and the second tag value is used for representing that the disk state corresponding to the sample data is a fault state; for example, 0 indicates healthy and 1 indicates fault;

the first label value is marked on the sample data in the first subset and the third subset, that is, the disk states corresponding to the sample data in the first subset and the third subset are healthy;

the tag value of each sample data marker in the second subset is determined by: predicting data characteristics corresponding to each sample data in the second subset by adopting the semi-supervised learning algorithm to obtain a probability value of each sample data representing that the disk is in a healthy state and a probability value of the disk in a fault state; marking the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and marking the second label value for the sample data with the probability value representing the health state being less than the probability value representing the fault state;

the tag value of each sample data marker in the fourth subset is determined by: predicting the difference value of the data characteristics corresponding to each sample data in the fourth subset by adopting the semi-supervised learning algorithm to obtain the probability value of each sample data representing that the disk is in a healthy state and the probability value of the disk in a fault state; and marking the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and marking the second label value for the sample data with the probability value representing the health state being less than the probability value representing the fault state.

In this way, large-scale sample data can be labeled semi-automatically. This improves labeling efficiency and shortens labeling time while maintaining labeling accuracy, reduces or even eliminates the inconsistencies introduced by manual labeling, and saves the cost of manual labeling.
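The tagging rule for the second and fourth subsets can be sketched as a single comparison. The constants follow the example values given above (0 for healthy, 1 for fault). The text does not say what happens when the two probability values are equal, so this sketch returns `None` in that case as an assumption; the function name is hypothetical.

```python
from typing import Optional

HEALTHY_TAG = 0  # first tag value
FAULT_TAG = 1    # second tag value


def tag_sample(p_healthy: float, p_fault: float) -> Optional[int]:
    """Choose a tag from the two predicted probability values."""
    if p_healthy > p_fault:
        return HEALTHY_TAG
    if p_healthy < p_fault:
        return FAULT_TAG
    return None  # tie: not covered by the text, left untagged in this sketch
```

The same function applies to both subsets; only the classifier input differs (data features for the second subset, differences of data features for the fourth).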

Disks from different manufacturers and of different models have different attribute distributions, so predicting with a single failure prediction model gives inaccurate results. Most studies address this inaccuracy with transfer learning: normal and fault sample data from disks of different models are used for training, the difference between the samples is reduced, and the model is then migrated. For a new disk model, however, there are often few disk samples and it takes a long time to obtain fault samples, and the normal samples available at an early stage are not enough to migrate the failure prediction model.

According to an embodiment of the present disclosure, in a case that the second disk is different from the first disk in model, a training process of the disk failure prediction model includes:

In this way, the problem that in the case of a zero fault sample of a new type of disk in the related technology, the construction of an early fault prediction model is difficult can be overcome, and the disk fault prediction model with strong generalization capability can be realized.

According to the embodiment of the present disclosure, the method for obtaining the intermediate prediction model by using the disk information training of the second disk is the same as or similar to the method for obtaining the disk failure prediction model by using the disk information training of the second disk in the foregoing embodiment, and this embodiment is not described herein again.

Fig. 5 shows a block diagram of a configuration of a disk failure prediction apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.

As shown in fig. 5, the disk failure prediction apparatus 500 includes an acquisition module 501 and a first input module 502.

The acquisition module 501 is configured to acquire disk information of a first disk, where the disk information includes a logical disk error log, physical disk state information, and disk SMART data;

a first input module 502, configured to input the acquired disk information into a disk failure prediction model to predict a probability of failure of the first disk;

According to the embodiment of the disclosure, in the case that the second disk is the same in model as the first disk, the training of the disk failure prediction model is implemented by a prediction model training apparatus. The prediction model training apparatus includes a storage module, a second input module, a first prediction module, a first determination module, and a first screening module.

The storage module is configured to store the disk information of the second disk into a training sample data pool, each disk information of each second disk corresponds to an array, and each array represents one sample data;

the second input module is configured to input the sample data in the training sample data pool into time windows in time order, wherein the time windows comprise a semi-supervised learning window that is earlier in time and an active learning window that is later in time;

the first prediction module is configured to, in response to the sample data most recently input into the semi-supervised learning window indicating that the corresponding disk has failed, predict each sample data in the semi-supervised learning window by using a semi-supervised learning algorithm, to obtain a probability value that each sample data represents that the disk is in a healthy state and a probability value that the disk is in a fault state;

the first determining module is configured to determine the value of each sample data according to the probability value that the sample data represents that the disk is in a healthy state and the probability value that the disk is in a fault state; and

the first screening module is configured to select sample data whose value is greater than a preset threshold and add the selected sample data to the training set.

According to an embodiment of the present disclosure, in addition to the storage module, the second input module, the first prediction module, the first determination module, and the first screening module, the prediction model training apparatus further includes a second determination module, a second prediction module, and a second screening module.

The second determination module is configured to determine the ratio of positive samples to negative samples in the current training set;

the second prediction module is configured to, in response to the ratio of positive samples to negative samples in the current training set being smaller than a preset ratio and the active learning window being full of input sample data, sequentially predict the sample data in the active learning window by using an active learning algorithm, to obtain a probability value that the corresponding sample data represents that the disk is in a healthy state; and

the second screening module is configured to select the sample data whose probability value falls within the preset probability interval and add the selected sample data to the training set.

According to an embodiment of the present disclosure, the training set includes a first training set and a second training set.

Under the condition that the training set comprises sample data selected by adopting the active learning algorithm, the first training set comprises a first subset and a second subset; in the case where no sample data selected using the active learning algorithm is included in the training set, the first training set includes only the second subset.

Under the condition that the training set comprises sample data selected by adopting the active learning algorithm, the second training set comprises a third subset and a fourth subset; in the case where the training set does not include sample data selected using the active learning algorithm, the second training set includes only the fourth subset.

Sample data included in each of the first subset and the third subset is selected using the active learning algorithm, and sample data included in each of the second subset and the fourth subset is selected using the semi-supervised learning algorithm.

According to an embodiment of the present disclosure, in addition to the storage module, the second input module, the first prediction module, the first determination module, the first screening module, the second determination module, the second prediction module, and the second screening module, the prediction model training apparatus further includes a configuration module, a third determination module, and a fourth determination module.

The configuration module is configured to configure a first tag value and a second tag value, wherein the first tag value is used for representing that the disk state corresponding to the sample data is a healthy state, and the second tag value is used for representing that the disk state corresponding to the sample data is a fault state; sample data in both the first subset and the third subset is tagged with the first tag value;

The label value marked on each sample data in the second subset is determined by executing the function corresponding to the third determining module. The third determining module includes a first prediction unit and a first marking unit.

The first prediction unit is configured to predict the data characteristics corresponding to each sample data in the second subset by adopting the semi-supervised learning algorithm to obtain a probability value that each sample data represents that the disk is in a healthy state and a probability value that the disk is in a fault state;

the first marking unit is configured to mark the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and mark the second label value for the sample data with the probability value representing the health state being smaller than the probability value representing the fault state.

The label value marked on each sample data in the fourth subset is determined by executing the function corresponding to the fourth determining module. The fourth determining module includes a second prediction unit and a second marking unit.

The second prediction unit is configured to predict the difference value of the data characteristics corresponding to each sample data in the fourth subset by adopting the semi-supervised learning algorithm to obtain a probability value that each sample data represents that the disk is in a healthy state and a probability value that the disk is in a fault state;

The second marking unit is configured to mark the first label value for the sample data with the probability value representing the health state being greater than the probability value representing the fault state, and mark the second label value for the sample data with the probability value representing the health state being less than the probability value representing the fault state.

According to an embodiment of the present disclosure, in addition to the storage module, the second input module, the first prediction module, the first determination module, the first screening module, the second determination module, the second prediction module, the second screening module, the configuration module, the third determination module, and the fourth determination module, the prediction model training apparatus further includes a skipping module. The skipping module is configured to, after the sample data whose probability value falls within the preset probability interval has been selected and added to the training set, and in response to the ratio of positive samples to negative samples in the current training set being smaller than the preset ratio, return to executing the functions corresponding to the second prediction module and the second screening module until the ratio of positive samples to negative samples in the current training set is equal to the preset ratio.

According to the embodiment of the disclosure, in the case that the second disk is different in model from the first disk, the training of the disk failure prediction model is implemented by a prediction model training apparatus. The prediction model training apparatus includes a first training module and a transfer learning module.

The first training module is configured to train by using the disk information of the second disk to obtain an intermediate prediction model;

The transfer learning module is configured to perform transfer learning on the intermediate prediction model by using the collected disk information of the first disk to obtain the disk failure prediction model, wherein the collected disk information of the first disk is positive sample data.

The present disclosure also discloses an electronic device, and fig. 6 shows a block diagram of the electronic device according to an embodiment of the present disclosure.

As shown in fig. 6, the electronic device includes a memory and a processor, where the memory is to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement a method according to an embodiment of the disclosure.

According to an embodiment of the present disclosure, a disk failure prediction method includes:

the disk failure prediction model comprises a first classifier and a second classifier, the first classifier is used for predicting disk failures by using data features corresponding to the disk information, the second classifier is used for predicting disk failures by using difference values of the data features corresponding to the disk information, and a weighted average value of a probability value of the first disk failure obtained by the first classifier and a probability value of the first disk failure obtained by the second classifier is the probability value of the first disk failure obtained by the disk failure prediction model; the disk failure prediction model is obtained by training at least by utilizing disk information of the second disk.
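The combination of the two classifiers can be sketched as a weighted average. This sketch makes several assumptions not fixed by the text: the difference values are taken element-wise between the current feature array and a previous one from the same disk, the weights `w1` and `w2` default to equal values, and each classifier is any callable that returns a failure probability. All names are illustrative.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], float]  # returns a failure probability in [0, 1]


def feature_differences(current: Features, previous: Features) -> List[float]:
    """Element-wise difference between two feature arrays (assumed definition)."""
    return [c - p for c, p in zip(current, previous)]


def failure_probability(
    current: Features,
    previous: Features,
    first_classifier: Classifier,
    second_classifier: Classifier,
    w1: float = 0.5,  # hypothetical weight of the first classifier
    w2: float = 0.5,  # hypothetical weight of the second classifier
) -> float:
    """Weighted average of the two classifiers' failure probabilities."""
    p1 = first_classifier(current)  # uses the data features themselves
    p2 = second_classifier(feature_differences(current, previous))  # uses differences
    return (w1 * p1 + w2 * p2) / (w1 + w2)
```

Dividing by `w1 + w2` keeps the result a probability even when the weights are not normalized.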

According to an embodiment of the present disclosure, in a case that the second disk is the same as the first disk in model, a training process of the disk failure prediction model includes:

determining the value of each sample data according to the probability value that the sample data represents that the disk is in a healthy state and the probability value that the disk is in a fault state; and

wherein the first operation includes: sequentially predicting the sample data in the active learning window by using an active learning algorithm to obtain a probability value that the corresponding sample data represents that the disk is in a healthy state; and selecting the sample data whose probability value falls within the preset probability interval, and adding the selected sample data to the training set.

According to an embodiment of the present disclosure, the training set comprises a first training set and a second training set; wherein:

under the condition that the training set comprises sample data selected by adopting the active learning algorithm, the first training set comprises a first subset and a second subset; in the case that the training set does not include sample data selected using the active learning algorithm, the first training set includes only the second subset;

wherein:

the tag value of each sample data marker in the second subset is determined by:

predicting data characteristics corresponding to each sample data in the second subset by adopting the semi-supervised learning algorithm to obtain a probability value that each sample data represents that the disk is in a healthy state and a probability value that the disk is in a fault state;

the tag value of each sample data marker in the fourth subset is determined by:

predicting the difference value of the data characteristics corresponding to each sample data in the fourth subset by adopting the semi-supervised learning algorithm to obtain the probability value of each sample data representing that the disk is in a healthy state and the probability value of the disk in a fault state;

and then performing transfer learning on the intermediate prediction model by using the collected disk information of the first disk, wherein the collected disk information of the first disk is positive sample data.

As shown in fig. 7, the computer system includes a processing unit that can execute the various methods in the above-described embodiments according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the computer system are also stored. The processing unit, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.

The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card, a modem, or the like. The communication section performs communication processing via a network such as the internet. A drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive as necessary, so that a computer program read from it is installed into the storage section as necessary. The processing unit may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or another processing unit.

In particular, the above described methods may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation on the units or modules themselves.

As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept. For example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure are also encompassed.

Claims

1. A disk failure prediction method comprises the following steps:

2. The method of claim 1, wherein in the case that the second disk is the same as the first disk in model, the training process of the disk failure prediction model comprises:

in response to the sample data most recently input into the semi-supervised learning window indicating that the corresponding disk has failed, predicting each sample data in the semi-supervised learning window by using a semi-supervised learning algorithm, to obtain a probability value that each sample data represents that the disk is in a healthy state and a probability value that the disk is in a fault state;

3. The method of claim 2, wherein the training process of the disk failure prediction model further comprises:

4. The method of claim 3, wherein the training set comprises a first training set and a second training set; wherein:

wherein:

the sample data included in each of the first subset and the third subset is selected using the active learning algorithm, and the sample data included in each of the second subset and the fourth subset is selected using the semi-supervised learning algorithm.

5. The method of claim 4, wherein the training process of the disk failure prediction model further comprises:

the tag value of each sample data marker in the second subset is determined by:

the tag value of each sample data marker in the fourth subset is determined by:

6. The method of claim 3, wherein after selecting sample data corresponding to the probability value falling within a preset probability interval and adding the selected sample data to the training set, the training process of the disk failure prediction model further comprises:

7. The method of claim 1, wherein in the case that the second disk is different in model from the first disk, the training process of the disk failure prediction model comprises:

8. An electronic device comprising a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method steps of any one of claims 1-7.

9. A computer-readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1-7.

10. A computer program product comprising computer instructions which, when executed by a processor, carry out the method steps of any of claims 1 to 7.