WO2022001125A1

WO2022001125A1 - Method, system and device for predicting storage failure in storage system

Info

Publication number: WO2022001125A1
Application number: PCT/CN2021/076815
Authority: WO
Inventors: 晏海龙 (Yan Hailong); 张东 (Zhang Dong); 颜秉珩 (Yan Bingheng)
Original assignee: 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.)
Priority date: 2020-06-30
Filing date: 2021-02-19
Publication date: 2022-01-06
Also published as: CN111858265A

Abstract

A method, system and device for predicting storage failure in a storage system. Failure prediction for a storage medium is achieved on the basis of operating state data of the storage medium, which has time-series characteristics, and a recurrent neural network for processing time-series data. This can significantly advance the time at which a failure is predicted, so that the failure of the storage medium can be predicted at least a few days in advance, thereby increasing system security. Furthermore, the operating state data of the storage medium is processed to obtain operating state data whose correlation with the operating variation of the storage medium is higher than a certain value, and model training is performed on that data. The amount of training data is thus reduced while keeping the loss of important information small, which increases the speed of model training.

Description

A storage failure prediction method, system and device for a storage system

This application claims priority to Chinese patent application No. 202010616525.3, filed on June 30, 2020 and entitled "A storage failure prediction method, system and device for a storage system", the entire contents of which are incorporated herein by reference.

Technical Field

The present invention relates to the field of storage, and in particular to a storage failure prediction method, system and device for a storage system.

Background Art

With the development of the Internet, industries of all kinds are becoming digitized, and the amount of data that needs to be stored is growing explosively. At present, most of this data is stored in Internet storage systems, specifically in the storage media of those systems, so the quality of the storage media determines the storage performance of the storage system. Once a storage medium fails, the data services the storage system provides externally may become unavailable, or the stored data may be permanently lost, causing huge losses to users.

In the prior art, the storage fault handling mechanisms of storage systems fall mainly into two types:

1) Passive fault tolerance mechanism: after a storage medium fails, the system restores itself from backups of the data stored on that medium. However, backing up data requires a large number of storage media, which increases the operating burden of the system; moreover, if a user initiates a data request while the system is backing up data, the request experiences a noticeable response delay, which degrades the user experience.

2) Active fault tolerance mechanism: the system predicts the failure of a storage medium in advance, so that data on a medium that is about to fail can be migrated and backed up beforehand, greatly reducing the risk of data loss. At present, the commonly used method for predicting storage failures is to set safety thresholds in advance for multiple operating parameters of the storage medium and to monitor the values of these parameters while the storage system is running. When an operating parameter value exceeds its corresponding safety threshold, the storage medium is considered likely to fail within 24 hours, and the system issues an early warning message. However, this method can only predict a failure a short time (within 24 hours) before it occurs, leaving managers little time to deal with the system data, which is detrimental to the overall security of the system.
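The threshold-based prior-art scheme described above can be sketched as follows. This is a minimal illustration only: the parameter names and threshold values are hypothetical and are not taken from any real product.

```python
# Hypothetical parameter names and safety thresholds, for illustration only.
SAFETY_THRESHOLDS = {
    "reallocated_sector_count": 100,
    "seek_error_rate": 50,
    "temperature_celsius": 60,
}

def threshold_warning(readings: dict[str, float]) -> list[str]:
    """Return the parameters whose current value exceeds its preset safety threshold."""
    return [
        name
        for name, limit in SAFETY_THRESHOLDS.items()
        if readings.get(name, 0.0) > limit
    ]

exceeded = threshold_warning({"reallocated_sector_count": 120, "temperature_celsius": 45})
if exceeded:
    # Under this scheme the medium is assumed to fail within about 24 hours.
    print("early warning:", exceeded)
```

Because each check looks only at the current value of one parameter, the warning fires only after a parameter has already drifted past its limit, which is why the lead time of this approach is short.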

Therefore, how to solve the above technical problem is an issue that those skilled in the art currently need to address.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide a storage failure prediction method, system and device for a storage system that predict failures of a storage medium on the basis of operating state data of the storage medium, which has time-series characteristics, and a recurrent neural network for processing time-series data. The failure prediction time can be significantly advanced, so that the failure of the storage medium can be predicted at least a few days in advance, thereby improving system security. Moreover, the present application processes the operating state data of the storage medium to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value, and performs model training on that data, thereby reducing the amount of training data while keeping the loss of important information small, which speeds up model training.

In order to solve the above technical problems, the present invention provides a storage failure prediction method of a storage system, including:

Acquiring in advance first operating state data of the storage medium of the storage system running normally within a preset first time and second operating state data of running within a preset second time before the failure occurs;

Preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value;

training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium;

During the operation of the storage system, the current operating state data of the storage medium is analyzed and processed based on the recurrent neural network model to obtain a failure prediction result of the storage medium.

Preferably, the process of acquiring in advance the first operating state data of the storage medium of the storage system operating normally within the preset first time and the second operating state data of the storage medium operating within the preset second time before a failure occurs includes:

Acquiring in advance a plurality of first operating state data in which the storage medium of the storage system operates normally within a preset first time, and using the plurality of first operating state data as negative samples;

Acquiring multiple pieces of second operating state data that are run by the storage medium within a preset second time before the failure occurs, and using the multiple pieces of second operating state data as positive samples;

The proportions of the positive samples and the negative samples are balanced, and the two together form a sample set for training the recurrent neural network model.

Preferably, the process of preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating variation of the storage medium is higher than a certain value, and training the pre-established recurrent neural network model based on that operating state data to obtain the recurrent neural network model for predicting the failure of the storage medium, includes:

A sample matrix is constructed from the n acquired samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$, where each sample is p-dimensional vector data $x = (x_1, x_2, \ldots, x_p)^T$, $n > p$, and both n and p are positive integers;

Based on the standard transformation relation $z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$, where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ and $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, a standard transformation is performed on the sample matrix to obtain a standardized matrix $Z$;

Based on the sample correlation matrix relation $R = [r_{ij}]_{p \times p} = \frac{Z^T Z}{n-1}$, the sample correlation matrix $R$ is obtained, and the characteristic equation $|R - \lambda I_p| = 0$ of $R$ is solved to obtain p characteristic roots $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$;

Based on $\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge Q$, the value of m is determined, and for each $\lambda_j$, $j = 1, 2, \ldots, m$, the equation $Rb = \lambda_j b$ is solved to obtain the unit eigenvector $b_j^o$;

where Q is a preset minimum information utilization rate, $p > m$, and m is a positive integer;

Based on the indicator conversion relation $U_{ij} = z_i^T b_j^o$, $j = 1, 2, \ldots, m$, new sample variables $U_{ij}$ are obtained, and the pre-established recurrent neural network model is trained based on the new sample variables $U_{ij}$ to obtain a recurrent neural network model for predicting the failure of the storage medium.

Preferably, the process of training the pre-established recurrent neural network model based on the new sample variables $U_{ij}$ to obtain a recurrent neural network model for predicting the failure of the storage medium includes:

Obtaining the arithmetic mean μ and standard deviation σ of the new variables $U_{ij}$ of each sample, and standardizing each new variable based on the standardization relation g2 = (g1 - μ)/σ to obtain the value of each standardized variable, where g1 is the value of a new variable before standardization and g2 is its value after standardization;

Training the pre-established recurrent neural network model based on the absolute value of each standardized variable value, so as to obtain a recurrent neural network model for predicting the failure of the storage medium.

Preferably, the process of training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium includes:

dividing the sample set composed of the operating state data into a training set, a verification set and a test set;

Training the pre-established recurrent neural network model based on the training set to obtain a first recurrent neural network model;

Verifying the first recurrent neural network model based on the verification set, and judging according to the verification result whether the training of the first recurrent neural network model meets the standard;

If the training meets the standard, testing the first recurrent neural network model based on the test set, and judging according to the test result whether the test of the first recurrent neural network model passes;

If the test passes, using the first recurrent neural network model that has passed the test as the recurrent neural network model for predicting the failure of the storage medium;

If the test fails, acquiring a new sample set to continue training the first recurrent neural network model, and returning to the step of testing the first recurrent neural network model based on the test set;

If the training fails to meet the standard, acquiring a new sample set to continue training the first recurrent neural network model, and returning to the step of verifying the first recurrent neural network model based on the verification set.

Preferably, both the first operating state data and the second operating state data of the storage medium are specifically SMART data of the storage medium.

Preferably, the recurrent neural network model is specifically BERT or Transformer.

Preferably, the storage failure prediction method further includes:

The failure prediction result of the storage medium is recorded in a system log, and the failure prediction result is displayed on the management interface of the storage system.

In order to solve the above technical problems, the present invention also provides a storage failure prediction system of a storage system, including:

a data acquisition module, used for pre-acquiring first operating state data of the storage medium of the storage system operating normally within a preset first time and second operating state data of operating within a preset second time before the failure occurs;

a data extraction module, configured to preprocess the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value;

a model training module for training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium;

a failure prediction module, configured to analyze and process the current operating state data of the storage medium based on the recurrent neural network model during the operation of the storage system to obtain a failure prediction result of the storage medium.

In order to solve the above technical problems, the present invention also provides a storage failure prediction device of a storage system, including:

a memory, configured to store a computer program;

a processor, configured to implement the steps of any one of the above storage failure prediction methods for a storage system when executing the computer program.

The present invention provides a storage failure prediction method for a storage system, which includes: acquiring in advance first operating state data of a storage medium of the storage system operating normally within a preset first time, and second operating state data of the storage medium operating within a preset second time before a failure occurs; preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value; training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium; and, during the operation of the storage system, analyzing and processing the current operating state data of the storage medium based on the recurrent neural network model to obtain a failure prediction result of the storage medium. It can be seen that the present application predicts failures of the storage medium on the basis of operating state data with time-series characteristics and a recurrent neural network for processing time-series data, which can significantly advance the failure prediction time, to at least several days in advance, thereby improving system security. In addition, the present application processes the operating state data of the storage medium to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value for model training, which reduces the amount of training data while keeping the loss of important information small, and thus speeds up model training.

The present invention also provides a storage failure prediction system and device for a storage system, which have the same beneficial effects as the above storage failure prediction method.

Description of Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the prior art and the accompanying drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flowchart of a method for predicting a storage failure of a storage system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an overall prediction of a storage system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of training a recurrent neural network according to an embodiment of the present invention.

Detailed Description

The core of the present invention is to provide a storage failure prediction method, system and device for a storage system that predict failures of a storage medium on the basis of operating state data of the storage medium, which has time-series characteristics, and a recurrent neural network for processing time-series data. The failure prediction time can be significantly advanced, so that the failure of the storage medium can be predicted at least a few days in advance, thereby improving system security.

In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Please refer to FIG. 1. FIG. 1 is a flowchart of a method for predicting a storage failure of a storage system according to an embodiment of the present invention.

The storage failure prediction method of the storage system includes:

Step S1: Acquire in advance first operating state data of the storage medium of the storage system operating normally within a preset first time, and second operating state data of the storage medium operating within a preset second time before a failure occurs.

It should be noted that "preset" in this application means set in advance: a preset value only needs to be set once and does not need to be reset unless it has to be modified according to actual conditions.

Specifically, the present application acquires in advance the first operating state data of the storage medium (such as a mechanical hard disk, a solid-state drive or flash memory) of a storage system (such as a cloud server) operating normally within the preset first time, and also acquires the second operating state data of the storage medium operating within the preset second time before a failure occurs. The purpose is to obtain operating state data with time-series characteristics for subsequently training a recurrent neural network model, which is well suited to processing time-series data.
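One way to turn daily readings into first (normal) and second (pre-failure) operating state data is to slice them into fixed-length time windows. The sketch below assumes one feature vector per day per medium, a window length `window`, and a pre-failure horizon `horizon` standing in for the preset second time; the function and parameter names are hypothetical, and discarding a failed medium's earlier windows is a simplifying assumption rather than something the text specifies.

```python
from typing import Optional, Sequence

Window = list[list[float]]

def label_windows(
    daily: Sequence[Sequence[float]],
    failure_day: Optional[int],
    window: int,
    horizon: int,
) -> tuple[list[Window], list[Window]]:
    """Split one medium's daily readings into (negative, positive) windows.

    daily[d] is the feature vector recorded on day d. If failure_day is None the
    medium ran normally throughout, so every full window is a negative sample.
    Otherwise only windows ending within `horizon` days before failure_day are
    kept, as positive samples.
    """
    negatives: list[Window] = []
    positives: list[Window] = []
    last = len(daily) if failure_day is None else min(len(daily), failure_day)
    for end in range(window, last + 1):
        seq = [list(v) for v in daily[end - window:end]]  # days end-window .. end-1
        if failure_day is None:
            negatives.append(seq)
        elif end > failure_day - horizon:
            positives.append(seq)
    return negatives, positives
```

For example, with ten days of readings, a failure on day 8, `window=3` and `horizon=2`, only the two windows ending on days 6 and 7 become positive samples.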

Step S2: Preprocess the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value.

Specifically, considering that not all of the first and second operating state data acquired in step S1 represent the operating changes of the storage medium well, the present application preprocesses the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value. Subsequent model training is performed on the processed data, which reduces the amount of training data while keeping the loss of important information small, thereby speeding up model training.

Step S3: Train the pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium.

Specifically, the first operating state data indicates that the storage medium operates normally within the preset first time, and serves as sample data that informs the recurrent neural network model of the normal operating state of the storage medium; the second operating state data represents the operation of the storage medium within the preset second time before a failure occurs, and serves as sample data that informs the recurrent neural network model of the operating state of the storage medium before a failure.

The pre-established recurrent neural network model is trained based on the operating state data obtained by processing the first operating state data and the second operating state data, yielding a recurrent neural network model for predicting the failure of the storage medium, which is used for subsequent prediction of storage medium failures.

Step S4: During the operation of the storage system, analyze and process the current operating state data of the storage medium based on the recurrent neural network model to obtain a failure prediction result of the storage medium.

Specifically, during the operation of the storage system, the operating state data of its storage medium is acquired in real time and analyzed based on the recurrent neural network model to obtain the failure prediction result of the storage medium for the managers' reference. It should be noted that, because the prediction is based on operating state data with time-series characteristics and a recurrent neural network for processing time-series data, the failure of the storage medium can be predicted at least a few days in advance, which leaves managers more time to deal with the system data and benefits the overall security of the system.
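To make step S4 concrete, the sketch below runs a single-layer Elman-style recurrent cell over a window of readings and maps the final hidden state to a failure probability with a sigmoid. It is a simplified stand-in, not the model of this application: the weights are assumed to come from the training in step S3, and the architecture (one tanh layer, one logistic output) and all names are assumptions chosen for brevity.

```python
import math
from typing import Sequence

def rnn_failure_probability(
    sequence: Sequence[Sequence[float]],  # one feature vector per time step
    w_xh: Sequence[Sequence[float]],      # hidden_size x input_size
    w_hh: Sequence[Sequence[float]],      # hidden_size x hidden_size
    b_h: Sequence[float],                 # hidden_size
    w_hy: Sequence[float],                # hidden_size
    b_y: float,
) -> float:
    """Run h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) over the sequence and return
    sigmoid(w_hy . h_T + b_y) as the predicted probability of failure."""
    h = [0.0] * len(b_h)
    for x in sequence:
        # The comprehension reads the previous h; h is rebound only afterwards.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_xh[k], x))
                + sum(w * hj for w, hj in zip(w_hh[k], h))
                + b_h[k]
            )
            for k in range(len(b_h))
        ]
    logit = sum(w * hk for w, hk in zip(w_hy, h)) + b_y
    return 1.0 / (1.0 + math.exp(-logit))
```

A medium whose probability exceeds a chosen cut-off (0.5, say, which is itself an assumption) would then be reported as at risk.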

The present invention provides a storage failure prediction method for a storage system, which includes: acquiring in advance first operating state data of a storage medium of the storage system operating normally within a preset first time, and second operating state data of the storage medium operating within a preset second time before a failure occurs; preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value; training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium; and, during the operation of the storage system, analyzing and processing the current operating state data of the storage medium based on the recurrent neural network model to obtain a failure prediction result of the storage medium. It can be seen that the present application predicts failures of the storage medium on the basis of operating state data with time-series characteristics and a recurrent neural network for processing time-series data, which can significantly advance the failure prediction time, to at least several days in advance, thereby improving system security. In addition, the present application processes the operating state data of the storage medium to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value for model training, which reduces the amount of training data while keeping the loss of important information small, and thus speeds up model training.

On the basis of the above-mentioned embodiment:

Please refer to FIG. 2 , which is a schematic diagram of an overall prediction of a storage system according to an embodiment of the present invention.

As an optional embodiment, the process of acquiring in advance the first operating state data of the storage medium of the storage system operating normally within the preset first time and the second operating state data of the storage medium operating within the preset second time before a failure occurs includes:

Acquiring in advance a plurality of pieces of first operating state data of the storage medium of the storage system operating normally within the preset first time, and using them as negative samples;

Acquiring a plurality of pieces of second operating state data of the storage medium operating within the preset second time before a failure occurs, and using them as positive samples;

wherein the proportions of the positive samples and the negative samples are balanced, and together they form a sample set for training the recurrent neural network model.

Specifically, the present application acquires in advance a plurality of pieces of first operating state data of the storage medium operating normally within the preset first time, and uses them as negative samples for training the recurrent neural network model; similarly, it acquires in advance a plurality of pieces of second operating state data of the storage medium operating within the preset second time before a failure occurs, and uses them as positive samples for training the recurrent neural network model.

It should be noted that the ratio of positive samples and negative samples here should be as balanced as possible, that is, the amount of data that constitutes a positive sample and the amount of data that constitutes a negative sample should be as equal as possible.
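The balancing requirement can be met in several ways; the sketch below uses random undersampling of the larger class, which is an assumption (the text only asks that the two counts be as equal as possible). The function name, the fixed seed, and the labels 1 for positive (pre-failure) and 0 for negative (normal) samples are likewise illustrative.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")

def balanced_sample_set(
    positives: Sequence[T], negatives: Sequence[T], seed: int = 0
) -> list[tuple[T, int]]:
    """Randomly undersample the larger class so both classes have equal counts.

    Returns shuffled (sample, label) pairs: label 1 marks a positive (pre-failure)
    sample and label 0 a negative (normal) sample.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    pos = rng.sample(list(positives), k)
    neg = rng.sample(list(negatives), k)
    dataset = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(dataset)
    return dataset
```

Oversampling the minority class or weighting the training loss are common alternatives when discarding normal samples is undesirable.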

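As an optional embodiment, the process of preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating changes of the storage medium is higher than a certain value, and training the pre-established recurrent neural network model based on that operating state data to obtain the recurrent neural network model for predicting the failure of the storage medium, includes:

Constructing a sample matrix from the n acquired samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$;

Performing a standard transformation on the sample matrix based on the standard transformation relation $z_{ij} = (x_{ij} - \bar{x}_j)/s_j$ to obtain a standardized matrix $Z$, where $\bar{x}_j$ and $s_j$ are the mean and sample standard deviation of the j-th variable;

Obtaining the sample correlation matrix $R = Z^T Z/(n-1)$ based on the sample correlation matrix relation, and solving $|R - \lambda I_p| = 0$ to obtain p characteristic roots;

Determining m based on $\sum_{j=1}^{m} \lambda_j / \sum_{j=1}^{p} \lambda_j \ge Q$, and solving $Rb = \lambda_j b$ for each $\lambda_j$, $j = 1, 2, \ldots, m$, to obtain the unit eigenvector $b_j^o$;

Obtaining new sample variables $U_{ij} = z_i^T b_j^o$ based on the indicator conversion relation, and training the pre-established recurrent neural network model based on the new sample variables $U_{ij}$ to obtain a recurrent neural network model for predicting the failure of the storage medium.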

Specifically, when training the recurrent neural network model based on the first operating state data and the second operating state data, the first operating state data and the second operating state data may be preprocessed as follows:

When the first operating state data and the second operating state data are acquired, specifically n samples are acquired, expressed as $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$. Each sample contains p-dimensional vector data: the data of p types of operating states within a period of time constitute p-dimensional operating state data, which forms one sample, expressed as $x = (x_1, x_2, \ldots, x_p)^T$.

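A sample matrix is constructed from the acquired n samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$, and a standard transformation is performed on it based on the standard transformation relation

$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$, where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ and $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$, $i = 1, \ldots, n$, $j = 1, \ldots, p$,

to obtain the standardized matrix $Z$. The sample correlation matrix is then obtained from the relation

$R = [r_{ij}]_{p \times p} = \frac{Z^T Z}{n-1}$,

and the characteristic equation $|R - \lambda I_p| = 0$ of the sample correlation matrix R is solved to obtain p characteristic roots, denoted $\lambda_j$, $j = 1, 2, \ldots, p$, and ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.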
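The standardization and correlation steps can be sketched in plain Python as follows. This is a minimal illustration assuming the sample standard deviation with divisor n - 1, under which $Z^T Z/(n-1)$ is exactly the sample correlation matrix of the original variables; the function names are hypothetical.

```python
import math
from typing import Sequence

def standardize(x: Sequence[Sequence[float]]) -> list[list[float]]:
    """z_ij = (x_ij - mean_j) / s_j, with s_j the sample standard deviation (n - 1)."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / (n - 1))
        for j in range(p)
    ]
    # A constant column has s_j = 0 and would divide by zero; such a column
    # carries no information and is assumed to have been dropped beforehand.
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in x]

def correlation_matrix(z: Sequence[Sequence[float]]) -> list[list[float]]:
    """R = Z^T Z / (n - 1), computed from the standardized matrix Z."""
    n, p = len(z), len(z[0])
    return [
        [sum(row[a] * row[b] for row in z) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
```

The diagonal of R is 1 by construction, and each off-diagonal entry is the Pearson correlation between two of the p operating-state variables.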

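Based on

$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge Q$,

the value of m is determined. Here Q = 85% is set, so that the information utilization rate reaches at least 85%. For each $\lambda_j$, $j = 1, 2, \ldots, m$, the equation $Rb = \lambda_j b$ is solved to obtain the unit eigenvector $b_j^o$.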
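The eigen-decomposition of R and the choice of m can be sketched as follows. The text does not prescribe an eigenvalue solver; this sketch uses the classical cyclic Jacobi rotation method because it needs no external library and suits small symmetric matrices such as R. The function names, tolerance and sweep limit are assumptions.

```python
import math
from typing import Sequence

def jacobi_eigen(r: Sequence[Sequence[float]], tol: float = 1e-20, max_sweeps: int = 100):
    """Eigenvalues and unit eigenvectors of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[j] is the
    unit eigenvector for values[j]. tol bounds the sum of squared off-diagonals.
    """
    p = len(r)
    a = [[float(v) for v in row] for row in r]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if a[k][l] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[k][l] is zero.
                theta = (a[l][l] - a[k][k]) / (2.0 * a[k][l])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for i in range(p):  # A <- A P (columns k and l)
                    aik, ail = a[i][k], a[i][l]
                    a[i][k], a[i][l] = c * aik - s * ail, s * aik + c * ail
                for j in range(p):  # A <- P^T A (rows k and l)
                    akj, alj = a[k][j], a[l][j]
                    a[k][j], a[l][j] = c * akj - s * alj, s * akj + c * alj
                for i in range(p):  # accumulate eigenvectors: V <- V P
                    vik, vil = v[i][k], v[i][l]
                    v[i][k], v[i][l] = c * vik - s * vil, s * vik + c * vil
    order = sorted(range(p), key=lambda j: a[j][j], reverse=True)
    values = [a[j][j] for j in order]
    vectors = [[v[i][j] for i in range(p)] for j in order]
    return values, vectors

def choose_m(eigenvalues: Sequence[float], q: float = 0.85) -> int:
    """Smallest m with (lambda_1 + ... + lambda_m) / (lambda_1 + ... + lambda_p) >= q."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for m, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam
        if cumulative / total >= q:
            return m
    return len(eigenvalues)
```

For eigenvalues 2.5, 1.0, 0.3 and 0.2 (p = 4), the cumulative ratios are 0.625 and then 0.875, so `choose_m` returns m = 2 with Q = 85%.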

The indicators are then converted based on the relation

$U_{ij} = z_i^T b_j^o$, $j = 1, 2, \ldots, m$,

where $z_i$ is the i-th row of Z, giving the new sample variables $U_{ij}$: the new data contain n samples, each of which is m-dimensional new vector data.
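The indicator conversion, which in standard principal component analysis projects each standardized sample $z_i$ onto the first m unit eigenvectors, can be sketched as a small matrix product. The function name is hypothetical, and the eigenvectors are assumed to be ordered by decreasing eigenvalue.

```python
from typing import Sequence

def principal_components(
    z: Sequence[Sequence[float]], vectors: Sequence[Sequence[float]], m: int
) -> list[list[float]]:
    """U_ij = z_i . b_j for i = 1..n and j = 1..m (an n x m matrix).

    z holds the n standardized p-dimensional samples; vectors[j] is the unit
    eigenvector for the j-th largest eigenvalue of R.
    """
    return [
        [sum(zk * bk for zk, bk in zip(zi, vectors[j])) for j in range(m)]
        for zi in z
    ]
```

The result has n rows and m columns, so the data passed to training shrink from n × p values to n × m values.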

It can be seen that the present application replaces the p-dimensional X space with an m-dimensional Y space (m < p), which is the best comprehensive simplification of multivariate data, and that very little important information is lost when the low-dimensional Y space replaces the high-dimensional X space. In other words, the high-dimensional variable space is reduced in dimension while keeping the loss of important information small, which reduces the amount of training data and speeds up training of the recurrent neural network model.

On this basis, the present application trains the pre-established recurrent neural network model based on the new sample variables $U_{ij}$ to obtain a recurrent neural network model for predicting the failure of the storage medium.

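As an optional embodiment, the process of training the pre-established recurrent neural network model based on the new sample variables $U_{ij}$ to obtain the recurrent neural network model for predicting the failure of the storage medium includes:

Obtaining the arithmetic mean μ and standard deviation σ of the new variables $U_{ij}$ of each sample, and standardizing each new variable based on the standardization relation g2 = (g1 - μ)/σ to obtain the value of each standardized variable, where g1 is the value of a new variable before standardization and g2 is its value after standardization;

Training the pre-established recurrent neural network model based on the absolute value of each standardized variable value, so as to obtain a recurrent neural network model for predicting the failure of the storage medium.

Specifically, when the recurrent neural network model is trained based on the first operating state data and the second operating state data, the first operating state data and the second operating state data can also be processed as follows: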

In a multivariate system, variables of different natures usually have different dimensions and orders of magnitude. When the levels of the variables differ greatly, analyzing the original variable values directly would exaggerate the role of variables with higher values in the comprehensive analysis and relatively weaken the role of variables with lower values. Therefore, to ensure the reliability of the comprehensive analysis results, the present application also standardizes the new variables $U_{ij}$ of each sample. Specifically, the arithmetic mean μ and standard deviation σ of the new variables $U_{ij}$ of each sample are obtained, and each new variable is standardized based on the standardization relation g2 = (g1 - μ)/σ to obtain the value of each standardized variable.

Based on this, the present application trains a pre-established recurrent neural network model based on the absolute value of each standardized variable value, so as to obtain a recurrent neural network model for predicting the failure of the storage medium.
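The standardization g2 = (g1 - μ)/σ followed by taking absolute values can be sketched as below. The text does not say whether μ and σ are taken per new variable or per sample, nor which standard deviation is meant; this sketch assumes per new variable (column) over all samples and the population standard deviation, and maps a constant column to 0. The function name is hypothetical.

```python
import statistics
from typing import Sequence

def standardize_abs(u: Sequence[Sequence[float]]) -> list[list[float]]:
    """Apply g2 = (g1 - mu) / sigma column by column, then return |g2|.

    Assumes mu and sigma are computed per new variable (column) over all samples,
    with sigma the population standard deviation; constant columns become 0.0.
    """
    m = len(u[0])
    cols = [[row[j] for row in u] for j in range(m)]
    mus = [statistics.fmean(c) for c in cols]
    sigmas = [statistics.pstdev(c) for c in cols]
    return [
        [abs((row[j] - mus[j]) / sigmas[j]) if sigmas[j] > 0 else 0.0 for j in range(m)]
        for row in u
    ]
```

Taking the absolute value keeps only the size of each deviation from the mean and discards its direction, which is the input form the text describes for training.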

Please refer to FIG. 3 , which is a schematic diagram of training a recurrent neural network according to an embodiment of the present invention.

As an optional embodiment, the process of training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium includes:

Divide the sample set composed of running state data into training set, validation set and test set;

The pre-established recurrent neural network model is trained based on the training set to obtain the first recurrent neural network model;

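Verifying the first recurrent neural network model based on the validation set, and judging according to the verification result whether the training of the first recurrent neural network model meets the standard;

If the training meets the standard, testing the first recurrent neural network model based on the test set, and judging according to the test result whether the test of the first recurrent neural network model passes;

If the test passes, using the first recurrent neural network model that has passed the test as the recurrent neural network model for predicting the failure of the storage medium;

If the test fails, acquiring a new sample set to continue training the first recurrent neural network model, and returning to the step of testing the first recurrent neural network model based on the test set;

If the training fails to meet the standard, acquiring a new sample set to continue training the first recurrent neural network model, and returning to the step of verifying the first recurrent neural network model based on the validation set.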

Specifically, the present application divides the sample set of operating state data into a training set, a validation set and a test set in advance: the training set is used to train the recurrent neural network model, the validation set is used to verify the trained model, and the test set is used to test the trained model, so as to ensure high prediction accuracy of the recurrent neural network model.

Based on this, the entire training process of the recurrent neural network model includes the following steps.

1) Training the pre-established recurrent neural network model on the training set to obtain a trained model (referred to as the first recurrent neural network model).

2) Verifying the first recurrent neural network model on the verification set, and judging from the verification result whether the training of the first recurrent neural network model meets the standard (the training meets the standard if the first recurrent neural network model can accurately predict the failure condition of the storage media represented by the verification set). If the training meets the standard, the subsequent step of testing the first recurrent neural network model on the test set is executed. If the training does not meet the standard, the test step is not executed; instead, a new sample set is acquired, the first recurrent neural network model continues to be trained on the new sample set, and the process returns to the step of verifying the first recurrent neural network model on the verification set. The test step is executed only once the verification result shows that the training of the first recurrent neural network model meets the standard.

3) Testing the first recurrent neural network model on the test set, and judging from the test result whether the test of the first recurrent neural network model passes (the test passes if the first recurrent neural network model can accurately predict the failure condition of the storage media represented by the test set). If the test passes, the first recurrent neural network model that passed the test is used as the recurrent neural network model for predicting failure of the storage medium and can be put into use. If the test fails, a new sample set is acquired, the first recurrent neural network model continues to be trained on the new sample set, and the process returns to the step of testing the first recurrent neural network model on the test set; the model is not put into use until the test result shows that the test passes.
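The control flow of steps 1) to 3) can be sketched in Python as follows. This is an illustration only: `train`, `meets_standard` and `acquire_new_samples` are hypothetical hooks standing in for the actual training routine, the accuracy criterion and sample re-acquisition, and the `max_rounds` cap is an added safeguard that the application does not describe.

```python
from typing import Callable, Optional, TypeVar

Model = TypeVar("Model")
Samples = TypeVar("Samples")


def train_until_accepted(
    model: Model,
    train_set: Samples,
    val_set: Samples,
    test_set: Samples,
    train: Callable[[Model, Samples], Model],
    meets_standard: Callable[[Model, Samples], bool],
    acquire_new_samples: Callable[[], Samples],
    max_rounds: int = 10,
) -> Optional[Model]:
    """Train, verify and test a model; return None if max_rounds retrainings are exceeded."""
    model = train(model, train_set)  # step 1: obtain the first model
    rounds = 0
    # Step 2: retrain on newly acquired samples until verification passes.
    while not meets_standard(model, val_set):
        rounds += 1
        if rounds > max_rounds:
            return None
        model = train(model, acquire_new_samples())
    # Step 3: retrain on newly acquired samples until the test passes.
    # A failed test returns to the test step, not to verification.
    while not meets_standard(model, test_set):
        rounds += 1
        if rounds > max_rounds:
            return None
        model = train(model, acquire_new_samples())
    return model  # ready to be put into use
```

The same `meets_standard` hook is reused for verification and testing because the text applies one criterion (accurate prediction on the respective set) to both.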

As an optional embodiment, both the first operating state data and the second operating state data of the storage medium are specifically SMART data of the storage medium.

Specifically, the first operating state data and the second operating state data of the storage medium in the present application can directly use the SMART (Self-Monitoring, Analysis and Reporting Technology) data of the storage medium. SMART data consists of attributes closely related to the health of the storage medium, such as seek error rate, spin-up time, reallocated sector count, power-on hours, head flying height during writes, and temperature.
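A minimal Python sketch of turning one SMART reading into a fixed-order feature vector for the model. The attribute IDs are the commonly used ATA SMART IDs for the items named above (7 seek error rate, 3 spin-up time, 5 reallocated sectors count, 9 power-on hours, 194 temperature); the head-height item is left out because it does not map unambiguously to a single attribute ID. Raw values are assumed to be already parsed, for example from `smartctl -A` output, and their interpretation is vendor-specific. The names `SMART_FEATURES` and `smart_feature_vector` are hypothetical.

```python
from typing import Dict, List, Mapping

# Commonly used ATA SMART attribute IDs; vendors may omit or redefine some.
SMART_FEATURES: Dict[int, str] = {
    7: "seek_error_rate",
    3: "spin_up_time",
    5: "reallocated_sectors_count",
    9: "power_on_hours",
    194: "temperature_celsius",
}


def smart_feature_vector(raw_values: Mapping[int, float]) -> List[float]:
    """Map one reading {attribute_id: raw_value} to a vector in SMART_FEATURES order.

    Attributes the drive does not report become NaN so that later
    preprocessing can decide how to fill or drop them.
    """
    return [float(raw_values.get(attr_id, float("nan"))) for attr_id in SMART_FEATURES]
```

A time series of such vectors, one per sampling interval, is the kind of input the recurrent model consumes.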

As an optional embodiment, the recurrent neural network model is specifically BERT or Transformer.

Specifically, the recurrent neural network model of the present application can be a high-precision model such as BERT (Bidirectional Encoder Representations from Transformers) or Transformer (an attention-based sequence model), or an LSTM (Long Short-Term Memory network); this application does not specifically limit the choice.
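As an illustration of the LSTM option named above, the following Python sketch runs the forward pass of an LSTM with a single hidden unit over a sequence of feature vectors and maps the final hidden state to a failure probability. It is a simplified sketch under stated assumptions: the class name, the hidden size of one and the logistic output layer are illustrative, the weights are supplied by hand rather than learned, and a practical model would be built and trained with a deep learning framework. BERT and Transformer, which rely on attention rather than recurrence, are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


@dataclass
class ScalarLSTM:
    """Hypothetical LSTM with one hidden unit followed by a logistic output layer."""
    w: Dict[str, List[float]]  # per-gate input weights, one per feature
    u: Dict[str, float]        # per-gate recurrent weight on the previous hidden state
    b: Dict[str, float]        # per-gate bias
    out_w: float
    out_b: float

    def failure_probability(self, sequence: Sequence[Sequence[float]]) -> float:
        h, c = 0.0, 0.0  # hidden state and cell state start at zero
        for x in sequence:  # one feature vector per time step
            pre = {k: sum(wk * xk for wk, xk in zip(self.w[k], x)) + self.u[k] * h + self.b[k]
                   for k in GATES}
            i = _sigmoid(pre["i"])   # how much new information to write
            f = _sigmoid(pre["f"])   # how much old cell memory to keep
            o = _sigmoid(pre["o"])   # how much of the cell to expose
            g = math.tanh(pre["g"])  # candidate cell content
            c = f * c + i * g
            h = o * math.tanh(c)
        return _sigmoid(self.out_w * h + self.out_b)
```

The recurrence through `h` and `c` is what lets the model weigh how the operating state evolves over time rather than a single snapshot.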

As an optional embodiment, the storage failure prediction method further includes:

Record the fault prediction result of the storage medium in the system log, and display the fault prediction result on the management interface of the storage system.

Further, the application can record the storage failure prediction result of the storage system in the system log as a basis for subsequent analysis of storage failures of the system; at the same time, the application can also display the storage failure prediction result on the management interface of the storage system so that administrators can check it promptly.
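A minimal Python sketch of recording and displaying a result, assuming a hypothetical `FaultPrediction` record, logger name and 0.5 decision threshold, none of which are specified by the application. The standard `logging` module writes the log entry; in a deployment a handler such as `logging.handlers.SysLogHandler` could forward it to the system log, and the returned string stands in for what the management interface would display.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("storage.failure_prediction")  # hypothetical logger name


@dataclass
class FaultPrediction:
    disk_id: str            # hypothetical identifier of the storage medium
    probability: float      # model output in [0, 1]
    threshold: float = 0.5  # assumed decision threshold

    @property
    def likely_to_fail(self) -> bool:
        return self.probability >= self.threshold


def report_prediction(result: FaultPrediction) -> str:
    """Log the prediction and return a one-line summary for the management interface."""
    level = logging.WARNING if result.likely_to_fail else logging.INFO
    logger.log(level, "disk=%s failure_probability=%.3f", result.disk_id, result.probability)
    status = "AT RISK" if result.likely_to_fail else "healthy"
    return f"{result.disk_id}: {status} (p={result.probability:.2f})"
```

Logging at-risk media at WARNING level makes them easy to filter during later failure analysis.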

The present application also provides a storage fault prediction system for a storage system, including:

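The data acquisition module is used to acquire in advance first operating state data of the storage medium of the storage system running normally within a preset first time period and second operating state data of the storage medium within a preset second time period before a failure occurs;

The data extraction module is used to preprocess the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value;

The model training module is used to train the pre-established recurrent neural network model based on the operating state data, so as to obtain the recurrent neural network model for predicting the failure of the storage medium;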

The fault prediction module is used to analyze and process the current operating state data of the storage medium based on the recurrent neural network model during the operation of the storage system, so as to obtain the fault prediction result of the storage medium.
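The division into a training module and a prediction module can be sketched in Python as below. The class names mirror the module names, while the `fit` callable, the predictor signature and the 0.5 threshold are hypothetical placeholders; the sketch shows only how the modules hand data to each other, not a real training routine.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], float]  # maps an operating state to a failure score


class ModelTrainingModule:
    """Trains a model from operating state data via a hypothetical fit routine."""

    def __init__(self, fit: Callable[[Sequence[Vector], Sequence[int]], Predictor]) -> None:
        self._fit = fit

    def train(self, samples: Sequence[Vector], labels: Sequence[int]) -> Predictor:
        return self._fit(samples, labels)


class FaultPredictionModule:
    """Scores the current operating state of a storage medium with the trained model."""

    def __init__(self, model: Predictor, threshold: float = 0.5) -> None:
        self._model = model
        self._threshold = threshold  # assumed decision threshold

    def predict(self, current_state: Vector) -> bool:
        """Return True if the storage medium is predicted to fail."""
        return self._model(current_state) >= self._threshold
```

In use, the predictor returned by the training module is handed to the prediction module, which is then called on the current operating state data of each storage medium while the storage system runs.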

For details of the storage failure prediction system provided by the present application, refer to the above embodiments of the storage failure prediction method; they are not repeated here.

The present application also provides a storage failure prediction device for a storage system, including:

A memory for storing computer programs;

A processor for implementing the steps of the storage failure prediction method of the storage system described above when executing the computer program.

For details of the storage failure prediction device provided by the present application, refer to the above embodiments of the storage failure prediction method; they are not repeated here.

It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or device that includes the element.

The above description of the disclosed embodiments enables any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A storage failure prediction method for a storage system, comprising:

Acquiring in advance first operating state data of a storage medium of the storage system running normally within a preset first time period, and second operating state data of the storage medium within a preset second time period before a failure occurs;

Preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value;

training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium;

During the operation of the storage system, the current operating state data of the storage medium is analyzed and processed based on the recurrent neural network model to obtain a fault prediction result of the storage medium.
The storage failure prediction method of a storage system according to claim 1, wherein the process of acquiring in advance the first operating state data of the storage medium of the storage system running normally within the preset first time period and the second operating state data of the storage medium within the preset second time period before the failure occurs comprises:

Acquiring in advance multiple pieces of first operating state data of the storage medium of the storage system operating normally within the preset first time period, and using the multiple pieces of first operating state data as negative samples;

Acquiring multiple pieces of second operating state data of the storage medium within the preset second time period before the failure occurs, and using the multiple pieces of second operating state data as positive samples;

The proportions of the positive samples and the negative samples are balanced, and the two together form a sample set for training the recurrent neural network model.
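One common way to balance the proportions of positive and negative samples is random undersampling of the larger class, sketched below in Python. The claim does not specify a balancing method, so undersampling, the function name and the seed are assumptions; oversampling the positive class would be an equally valid alternative. In practice, positive samples (pre-failure windows) are usually far rarer than negative ones.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_balanced_sample_set(
    negatives: Sequence[T], positives: Sequence[T], seed: int = 0
) -> List[Tuple[T, int]]:
    """Undersample the larger class so both classes contribute equally many samples.

    Label 1 marks a positive (pre-failure) sample, label 0 a negative (normal) one.
    """
    rng = random.Random(seed)
    k = min(len(negatives), len(positives))
    chosen_neg = rng.sample(list(negatives), k)
    chosen_pos = rng.sample(list(positives), k)
    samples = [(x, 0) for x in chosen_neg] + [(x, 1) for x in chosen_pos]
    rng.shuffle(samples)  # mix the classes before splitting into train/verification/test
    return samples
```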
The storage failure prediction method of a storage system according to claim 2, wherein the process of preprocessing the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value, and training the pre-established recurrent neural network model based on the operating state data to obtain the recurrent neural network model for predicting the failure of the storage medium, comprises:

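A sample matrix $X = (x_{ij})_{n \times p}$ is constructed from the n acquired samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, $i = 1, 2, \ldots, n$, wherein each sample collects p-dimensional vector data $x = (x_1, x_2, \ldots, x_p)^T$, $n > p$, and n and p are both positive integers;

based on the standard transformation relation $z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, a standard transformation is performed on the sample matrix to obtain a standardized matrix $Z = (z_1, z_2, \ldots, z_n)^T$, wherein $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ and $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$;

based on the relational expression of the sample correlation matrix $R = [r_{ij}]_{p \times p} = \frac{Z^T Z}{n-1}$, the sample correlation matrix R is obtained, and the characteristic equation $|R - \lambda I_p| = 0$ of the sample correlation matrix R is solved to obtain p characteristic roots $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$;

based on $\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge Q$, the value of m is determined, and for each $\lambda_j$, $j = 1, 2, \ldots, m$, the equation $R b = \lambda_j b$ is solved to obtain the unit eigenvector $b_j^{o}$, wherein Q is the preset minimum information utilization rate, $p > m$, and m is a positive integer;

based on the indicator conversion relation $U_{ij} = z_i^T b_j^{o}$, $j = 1, 2, \ldots, m$, new sample variables $U_{ij}$ are obtained, and the pre-established recurrent neural network model is trained based on the new sample variables $U_{ij}$ to obtain a recurrent neural network model for predicting the failure of the storage medium.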
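The principal component analysis steps above can be sketched in pure Python as follows. This is an illustrative sketch, not the applicant's implementation: the function names, the default Q = 0.85 and the use of power iteration with deflation to find the characteristic roots are assumptions, and power iteration is adequate only for small matrices with well-separated eigenvalues (a production version would use a library eigensolver). Because every diagonal entry of R equals one, the sum of all p characteristic roots equals the trace of R, so the sketch stops as soon as the retained roots reach the share Q without computing the rest.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else v


def standardize(x: Sequence[Sequence[float]]) -> Matrix:
    """z_ij = (x_ij - mean_j) / s_j, with s_j the sample standard deviation."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / (n - 1)) for j in range(p)]
    if any(s == 0.0 for s in stds):
        raise ValueError("constant feature columns must be removed first")
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in x]


def correlation_matrix(z: Matrix) -> Matrix:
    """R = Z^T Z / (n - 1) for a standardized matrix Z."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def pca_scores(x: Sequence[Sequence[float]], q: float = 0.85,
               iters: int = 1000, seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Return the new variables U (n x m) and the m retained characteristic roots."""
    z = standardize(x)
    a = correlation_matrix(z)
    p = len(a)
    total = sum(a[j][j] for j in range(p))  # trace(R) = sum of all p roots
    rng = random.Random(seed)
    eigvals: List[float] = []
    eigvecs: List[List[float]] = []
    while sum(eigvals) < q * total and len(eigvals) < p:
        b = _normalize([rng.uniform(0.5, 1.5) for _ in range(p)])
        for _ in range(iters):  # power iteration converges to the largest remaining root
            b = _normalize(_matvec(a, b))
        lam = sum(bi * abi for bi, abi in zip(b, _matvec(a, b)))  # Rayleigh quotient
        eigvals.append(lam)
        eigvecs.append(b)
        # Deflation: remove the found component so the next pass finds the next root.
        a = [[a[i][j] - lam * b[i] * b[j] for j in range(p)] for i in range(p)]
    # Indicator conversion U_ij = z_i^T b_j for each retained unit eigenvector b_j.
    scores = [[sum(zi * bj for zi, bj in zip(row, vec)) for vec in eigvecs] for row in z]
    return scores, eigvals
```

Eigenvectors are determined only up to sign, so the columns of U may be sign-flipped relative to another eigensolver.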
The storage failure prediction method of a storage system according to claim 3, wherein the process of training the pre-established recurrent neural network model based on the new sample variables $U_{ij}$ to obtain the recurrent neural network model for predicting the failure of the storage medium comprises:

Obtaining the arithmetic mean μ and standard deviation σ of the new variables $U_{ij}$ of the samples, and standardizing each new variable based on the standardization relation $g_2 = (g_1 - \mu)/\sigma$ to obtain each standardized variable value, wherein $g_1$ is the value of a new variable before standardization and $g_2$ is its value after standardization;

The pre-established recurrent neural network model is trained based on the absolute value of each standardized variable value, so as to obtain a recurrent neural network model for predicting the failure of the storage medium.
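The standardization and absolute-value step of this claim can be sketched in Python as below. The sketch assumes μ and σ are computed per new variable (column j of U) across the samples and that σ is the sample standard deviation; the claim does not say which estimator is used, and a column with zero spread would need to be removed first. The function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def standardized_abs(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply g2 = (g1 - mu) / sigma to each new variable (column), then take |g2|."""
    columns = list(zip(*scores))
    stats = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
    return [[abs((g1 - mu) / sigma) for g1, (mu, sigma) in zip(row, stats)]
            for row in scores]
```

Taking absolute values also makes the result unchanged when a column of U is sign-flipped, which is relevant because eigenvectors are determined only up to sign.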
The storage failure prediction method of a storage system according to claim 2, wherein the process of training the pre-established recurrent neural network model based on the operating state data to obtain the recurrent neural network model for predicting the failure of the storage medium comprises:

dividing the sample set composed of the operating state data into a training set, a verification set and a test set;

The pre-established recurrent neural network model is trained based on the training set to obtain a first recurrent neural network model;

Verifying the first RNN model based on the verification set, and judging whether the training of the first RNN model meets the standard according to the verification result;

If the training meets the standard, then test the first recurrent neural network model based on the test set, and determine whether the test of the first recurrent neural network model passes according to the test result;

If the test is passed, the first recurrent neural network model that has passed the test is used as the recurrent neural network model for predicting the failure of the storage medium;

If the test fails, obtain a new sample set again to continue training the first recurrent neural network model, and return to the step of testing the first recurrent neural network model based on the test set;

If the training fails to meet the standard, obtain a new sample set again to continue training the first recurrent neural network model, and return to the step of verifying the first recurrent neural network model based on the verification set.
The storage failure prediction method of a storage system according to claim 1, wherein the first operating state data and the second operating state data of the storage medium are both specifically SMART data of the storage medium.
The storage fault prediction method of a storage system according to claim 1, wherein the recurrent neural network model is specifically a BERT or a Transformer.
The storage failure prediction method of a storage system according to claim 1, wherein the storage failure prediction method further comprises:

The failure prediction result of the storage medium is recorded in a system log, and the failure prediction result is displayed on the management interface of the storage system.
A storage failure prediction system for a storage system, characterized in that it includes:

a data acquisition module, configured to acquire in advance first operating state data of a storage medium of the storage system running normally within a preset first time period, and second operating state data of the storage medium within a preset second time period before a failure occurs;

a data extraction module, configured to preprocess the first operating state data and the second operating state data to obtain operating state data whose correlation with the operating change of the storage medium is higher than a certain value;

a model training module for training a pre-established recurrent neural network model based on the operating state data to obtain a recurrent neural network model for predicting the failure of the storage medium;

The fault prediction module is configured to analyze and process the current operating state data of the storage medium based on the recurrent neural network model during the operation of the storage system to obtain a fault prediction result of the storage medium.
A storage failure prediction device for a storage system, characterized in that it includes:

memory for storing computer programs;

The processor is configured to implement the steps of the storage failure prediction method of the storage system according to any one of claims 1-8 when executing the computer program.