CN115373879A - Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center - Google Patents


Info

Publication number
CN115373879A
CN115373879A (application CN202211039310.5A)
Authority
CN
China
Prior art keywords
data
disk
data set
value
disk failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211039310.5A
Other languages
Chinese (zh)
Inventor
徐小龙
徐诗成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211039310.5A priority Critical patent/CN115373879A/en
Publication of CN115373879A publication Critical patent/CN115373879A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/008 - Reliability or availability analysis
    • G06F 11/30 - Monitoring
    • G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3447 - Performance evaluation by modeling
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a disk failure prediction method for intelligent operation and maintenance of a large-scale cloud data center, comprising the following steps. First, information-entropy feature processing is applied to the imbalanced data to select the more important features. The processed imbalanced data are then divided, and the minority-class sample data, i.e. the failure samples, are extracted. Next, data enhancement is performed on the failure samples using the time-progressive sampling method TPS to generate synthetic data; by generating more failure sample data through TPS, the ratio of the number of healthy samples to the number of failure samples reaches a better balance. The synthetic data with good generation quality are then combined with the original data to produce the integrated data. Finally, the integrated data are input into a disk failure prediction model for training, a time window of 7 days is selected to predict whether a failure will occur after 7 days, and the data are labeled accordingly.

Description

Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center
Technical Field
The invention belongs to the field of operation and maintenance of cloud data centers, and particularly relates to a disk fault prediction method for intelligent operation and maintenance of a large-scale cloud data center.
Background
Magnetic disks are widely used as the general-purpose primary storage devices of large storage systems in modern large-scale data centers. In such data centers, ensuring high availability and reliability is a very challenging task, because disk failures occur continuously in the field and are a main cause of service interruption in the cloud data center; if no data-redundancy scheme is deployed, a disk failure may cause temporary data loss and thus system unavailability, or even permanent data loss. Disk failure prediction is a key link in the intelligent operation and maintenance of a cloud data center: it allows operators to anticipate problems and resolve them quickly, so that servers keep running normally and the consequences of failures are effectively mitigated.
With the arrival of the big data era and the rapid development of technologies such as machine learning and deep learning, complex neural network models can be used, with the support of strong computing power, to mine and extract key information from massive data.

Meanwhile, with intensive research on SMART data, its time-series characteristics have gradually attracted researchers' attention. Therefore, more and more researchers are attempting to implement disk failure prediction using time-series processing methods.
However, in practical application, the SMART data of a disk failure acquired by the data center is degradation data from a healthy state to a failure state, and the actual failure time of the disk is unknown. That is to say, for a piece of failed-disk data, we can only say that there is failure data somewhere in its data sequence, but cannot locate it precisely. Faced with this problem, a natural solution is to treat the degradation data of the disk as one sample.

However, the degradation data of a disk is often long-term serial data of varying length, and how to classify such non-fixed-length time-series data is an important and challenging problem in data mining. Even LSTM neural networks do not work well in the face of such long time-series data. In addition, the failure phase of a disk during operation is fast and short. Therefore, the proportion of abnormal data in the life-cycle data is very small, so that the error information is buried in a large amount of healthy data. This is known as the imbalance problem, which poses a serious challenge to conventional classification methods.
The current disk failure prediction methods are mainly divided into two types: based on a traditional machine learning method and a deep neural network method.
(1) Based on traditional machine learning methods, some research works use the SMART attributes and a Bayesian network to predict disk failures: a subset of the SMART attributes that best describes the data is selected through feature selection, a binning process and feature creation, and is used together with a group of trend indicators based on the same SMART attributes; however, the time-series features of the data are not well considered, and the effect of a dynamic Bayesian network remains to be studied. Other research works treat failure prediction as a binary classification problem while taking into account the mean time between the predicted and actual failures, and evaluate model performance by the failure detection rate (FDR), defined as the proportion of failed drives correctly classified as failed, and the false alarm rate (FAR), defined as the proportion of good drives incorrectly classified as failed.
(2) Based on deep neural network methods, some research works use the long short-term memory model (LSTM) and different data balancing methods to predict disk failures 5-7 days in advance, which alleviates the problem of model aging and widens the time range of disk failure prediction. However, prediction is made in units of days and the IO criterion for raising an alarm is ignored, so some false positives remain in the prediction. Other research works focus on predicting disk failures using sequential information: they use a data set collected from a real-world data center containing 3 different disk models (denoted as W, S and M), build prediction models for these disk models separately, model the long-term dependencies in the sequential SMART data, and demonstrate its predictive ability.
Disclosure of Invention
The purpose of the invention is as follows: in order to solve the problems in the prior art, the invention provides a disk failure prediction method for intelligent operation and maintenance of a large-scale cloud data center, which fuses the advantages of the Transformer and optimizes it with the time-progressive sampling TPS method, realizing data enhancement of imbalanced data and the classification and prediction of disk failures.
The technical scheme is as follows: in order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect, a disk failure prediction method for intelligent operation and maintenance of a large-scale cloud data center is provided, and includes:
step 1, carrying out missing value filling, data normalization and information entropy processing on the imbalanced original data set to obtain the most relevant and frequently changing characteristic attributes, namely a data set G;

step 2, dividing the data set G according to the labels into a source data set S1 formed by the minority samples with failure labels and a source data set S2 formed by the majority samples with non-failure labels;
step 3, performing data enhancement on the source data set S1 by adopting a time progressive sampling TPS method to generate synthetic data to obtain a synthetic data set T;
step 4, integrating the source data set S1, the source data set S2 and the synthetic data set T to form an integrated data set Q, and dividing the integrated data set Q into a training set M and a test set N;
step 5, training the disk failure prediction model by using the training set M, and testing the trained disk failure prediction model by using the test set N until the model prediction effect meets the requirement, so as to obtain the trained disk failure prediction model;
step 6, inputting SMART data of the disk to be detected into a trained disk failure prediction model;
and 7, determining a disk failure prediction result according to the output of the disk failure prediction model.
In some embodiments, the missing value filling comprises: if two or more values are missing in succession, the mode of that SMART entry on the disk is used as the filling value; if only one value is missing, the mean of the values before and after it is used as the filling value.
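The filling rule above can be sketched in code. This is an illustrative reconstruction, not the patent's implementation; the function name and the list-of-floats-with-None representation are assumptions:

```python
# Illustrative sketch of the missing-value filling rule: a run of two or
# more gaps takes the mode of the disk's observed SMART values, while a
# lone gap takes the mean of its two neighbors.
from statistics import mode

def fill_missing(series):
    """series: list of floats with None marking missing entries."""
    observed = [v for v in series if v is not None]
    series_mode = mode(observed)  # most frequent observed value
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # find the end of the gap run
            if j - i >= 2:
                for k in range(i, j):
                    out[k] = series_mode    # long gap: use the mode
            else:
                prev = out[i - 1] if i > 0 else None
                nxt = out[j] if j < len(out) else None
                if prev is not None and nxt is not None:
                    out[i] = (prev + nxt) / 2  # lone gap: neighbor mean
                else:
                    out[i] = prev if prev is not None else nxt
            i = j
        else:
            i += 1
    return out
```

How boundary gaps (a missing first or last value) are treated is not specified in the text; the sketch falls back to the single available neighbor.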
In some embodiments, the data normalization comprises:
scaling all values into the interval [0,1] using the maximum and minimum values of each feature, with the following scaling formula:

x' = (x - x_min) / (x_max - x_min)

where x is the original value of the feature, x_max and x_min are respectively the maximum and minimum values of the feature in the data set, and x' is the scaled feature value.
In some embodiments, the information entropy processing comprises: calculating a value for each characteristic attribute to express its information content, with the following formula:

H(U) = -∑_{i=1}^{n} p_i log₂ p_i

where i denotes the i-th sample out of n samples in total, and p_i is the probability of each value appearing in the SMART attribute; the higher the information entropy H(U) of a feature, the more information it contains, meaning the fluctuation of the feature attribute is more pronounced, so the most relevant and frequently changing feature attributes are selected.
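The entropy-based selection can be sketched as follows. `attribute_entropy` and `top_k_features` are hypothetical helper names, and the base-2 logarithm is an assumption:

```python
# Hedged sketch of entropy-based feature ranking: compute the Shannon
# entropy of each SMART attribute's value distribution and keep the
# highest-entropy (most variable) attributes.
import numpy as np

def attribute_entropy(column):
    """Shannon entropy of one SMART attribute's observed values."""
    values, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def top_k_features(X, k):
    """Indices of the k highest-entropy feature columns of X."""
    scores = [attribute_entropy(X[:, j]) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: -scores[j])[:k]
```

A constant attribute has entropy 0 and is discarded first, matching the intuition that unchanging SMART attributes carry no predictive signal.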
In some embodiments, in step 3, performing data enhancement on the source data set S1 by the time-progressive sampling TPS method comprises:

generating and collecting failure data from the source data set S1 with the time-progressive sampling TPS method, calculating the loss between the generated data and the original data, and judging whether the loss is smaller than a set threshold; if so, the generated data are collected, otherwise the operation is repeated, until the synthetic data set T is obtained.
In some embodiments, for a given failed disk, suppose the disk failure occurs at timestamp t and the prediction operation occurs at timestamp t-i; the time period of length i between the prediction action at t-i and the disk failure at t is denoted as the lead period i;

during model training, for each failed disk, the TPS gradually collects more failure data samples over the lead periods, i.e. the lead period i ranges from 1 to I, where I is a hyper-parameter of the TPS;

there are also two important parameters in the TPS method:

the window_length is defined as the size of the time window of the training network input data in each sequence sample, and predict_failure_days is defined as the number of days before failure.

Further, in some embodiments, if the window length is 5, one training sample will contain the SMART attribute information of the disk over the past 5 days; the value of predict_failure_days lies within 5-7 days.
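Under the definitions above, the progressive collection of failure windows can be sketched as follows. This is an illustrative reconstruction (the patent does not publish code, and the function signature is an assumption): for one failed disk, one window is taken per lead period i = 1..I, each ending i steps before the failure timestamp.

```python
# Illustrative sketch of TPS sample collection for one failed disk whose
# SMART sequence is ordered oldest-first and fails at the last index.
def tps_collect(sequence, window_len, max_lead_I):
    """Returns a list of (lead_period, window) pairs."""
    t = len(sequence)  # failure occurs at timestamp t (end of sequence)
    samples = []
    for i in range(1, max_lead_I + 1):
        end = t - i            # window ends i steps before the failure
        start = end - window_len
        if start < 0:
            break              # not enough history for this lead period
        samples.append((i, sequence[start:end]))
    return samples
```

Each failed disk thus yields up to I training samples instead of one, which is how TPS rebalances the failure class.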
In some embodiments, the disk failure prediction model comprises: an input module, an encoder block, a decoder block, and an output module;
in the input module, a convolutional Transformer model uses convolution layers of kernel size k (i.e., "Conv, k") and stride 1 to convert the input data L (plus appropriate padding) into H different query matrices Q_h, key matrices K_h and value matrices V_h, where h = 1, …, H, and the projection weights W_h^Q, W_h^K and W_h^V are all learnable parameters;
a position-wise feed-forward sublayer is stacked on the output of the encoder block and of the decoder block respectively; this sublayer consists of two fully connected networks with a ReLU activation in between, with the following formula:

max(0, XW_1 + b_1)W_2 + b_2 (3)

where X is the input, W_1 and W_2 are learnable weight matrices, b_1 and b_2 are biases, and the dimension of the output matrix finally obtained by the feed-forward sublayer is consistent with that of X;
attention calculation operations are performed by convolutional projection instead of the existing position-based linear projection, and queries, keys and value embedding are performed by convolutional projection to enhance the attention to local context information.
In a second aspect, the invention provides a disk failure prediction device for intelligent operation and maintenance of a large-scale cloud data center, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.
In a third aspect, the invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
The invention has the following beneficial effects:
(1) For disk failure prediction in a cloud data center, the advantages of the time-progressive sampling (TPS) method are exploited and fused with the advantages of the Transformer for optimization. The method can fully use the failure data to extract the relationships within the data, endows the generated data with the original characteristics of the failure data as well as the data distribution and hidden patterns in the latent space, achieves excellent results on public data, and has good practicability in disk failure prediction systems with high requirements on the F1 value and the Matthews correlation coefficient (MCC).
(2) In the method, for long-time sequence data, a multi-head self-attention mechanism is utilized to superpose a multi-layer encoder-decoder model to learn and obtain the dependency relationship between time sequence data, and the time correlation between different time step data is established.
(3) In the method, it is recognized that the self-attention calculation of the original Transformer is insensitive to local information, so the model is easily affected by outliers, which brings a potential optimization problem. Therefore, convolutional projection replaces the existing position-based linear projection in the attention calculation, and queries, keys and values are embedded by convolutional projection to enhance the attention to local context information, making prediction more accurate.
(4) In the method, a Time Progressive Sampling (TPS) method is utilized to perform data enhancement so as to solve the problem of data imbalance. The TPS can generate multiple failed samples for each failed disk, which not only preserves all the characteristics of a healthy disk, but also brings more failure modes.
(5) The algorithm of the method is simple in structure and low in time complexity.
Drawings
Fig. 1 is a schematic flow diagram of a disk failure prediction method for intelligent operation and maintenance of a large-scale cloud data center, which is designed in the embodiment of the invention.
FIG. 2 is a diagram of a disk failure prediction model in an embodiment of the present invention.
Fig. 3 is a design diagram of a time progressive sampling TPS method in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the drawings in the specification.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "above", "below", "exceeding" and the like are understood to exclude the stated number, while "not less than", "not more than", "within" and the like are understood to include it. If "first" and "second" are described only for the purpose of distinguishing technical features, they are not to be understood as indicating or implying relative importance, implicitly indicating the number of the technical features indicated, or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Example 1
A disk fault prediction method for intelligent operation and maintenance of a large-scale cloud data center comprises the following steps:
step 1, carrying out missing value filling, data normalization and information entropy processing on the imbalanced original data set to obtain the most relevant and frequently changing characteristic attributes, namely a data set G;

step 2, dividing the data set G according to the labels into a source data set S1 formed by the minority samples with failure labels and a source data set S2 formed by the majority samples with non-failure labels;
step 3, performing data enhancement on the source data set S1 by adopting a time progressive sampling TPS method to generate synthetic data to obtain a synthetic data set T;
step 4, integrating the source data set S1, the source data set S2 and the synthetic data set T to form an integrated data set Q, and dividing the integrated data set Q into a training set M and a test set N;
step 5, training the disk failure prediction model by using the training set M, and testing the trained disk failure prediction model by using the test set N until the model prediction effect meets the requirement, so as to obtain the trained disk failure prediction model;
step 6, inputting SMART data of the disk to be detected into a trained disk failure prediction model;
and 7, determining a disk failure prediction result according to the output of the disk failure prediction model.
In some embodiments, the missing value filling comprises: if two or more values are missing in succession, the mode of that SMART entry on the disk is used as the filling value; if only one value is missing, the mean of the values before and after it is used as the filling value.
In some embodiments, the data normalization comprises:
scaling all values into the interval [0,1] using the maximum and minimum values of each feature, with the following scaling formula:

x' = (x - x_min) / (x_max - x_min)

where x is the original value of the feature, x_max and x_min are respectively the maximum and minimum values of the feature in the data set, and x' is the scaled feature value.
In some embodiments, the information entropy processing comprises: calculating a value for each characteristic attribute to express its information content, with the following formula:

H(U) = -∑_{i=1}^{n} p_i log₂ p_i

where i denotes the i-th sample out of n samples in total, and p_i is the probability of each value appearing in the SMART attribute; the higher the information entropy H(U) of a feature, the more information it contains, meaning the fluctuation of the feature attribute is more pronounced, so the most relevant and frequently changing feature attributes are selected.
In some embodiments, in step 3, performing data enhancement on the source data set S1 by the time-progressive sampling TPS method comprises:

generating and collecting failure data from the source data set S1 with the time-progressive sampling TPS method, calculating the loss between the generated data and the original data, and judging whether the loss is smaller than a set threshold; if so, the generated data are collected, otherwise the operation is repeated, until the synthetic data set T is collected.
In some embodiments, for a given failed disk, suppose the disk failure occurs at timestamp t and the prediction operation occurs at timestamp t-i; the time period of length i between the prediction action at t-i and the disk failure at t is denoted as the lead period i;

during model training, for each failed disk, the TPS gradually collects more failure data samples over the lead periods, i.e. the lead period i ranges from 1 to I, where I is a hyper-parameter of the TPS;

there are also two important parameters in the TPS method:

the window_length is defined as the size of the time window of the training network input data in each sequence sample, and predict_failure_days is defined as the number of days before failure.

Further, in some embodiments, if the window length is 5, one training sample will contain the SMART attribute information of the disk over the past 5 days; the value of predict_failure_days lies within 5-7 days.
In some embodiments, the disk failure prediction model comprises: an input module, an encoder block, a decoder block, and an output module;
in the input module, a convolutional Transformer model uses convolution layers of kernel size k (i.e., "Conv, k") and stride 1 to convert the input data L (plus appropriate padding) into H different query matrices Q_h, key matrices K_h and value matrices V_h, where h = 1, …, H, and the projection weights W_h^Q, W_h^K and W_h^V are all learnable parameters;
a position-wise feed-forward sublayer is stacked on the output of the encoder block and of the decoder block respectively; this sublayer consists of two fully connected networks with a ReLU activation in between, with the following formula:

max(0, XW_1 + b_1)W_2 + b_2 (3)

where X is the input, W_1 and W_2 are learnable weight matrices, b_1 and b_2 are biases, and the dimension of the output matrix finally obtained by the feed-forward sublayer is consistent with that of X;
attention calculation operations are performed by convolutional projection instead of the existing position-based linear projection, and queries, keys and values are embedded by convolutional projection to enhance the attention to local context information.
In some specific embodiments, as shown in fig. 1, a disk failure prediction method for intelligent operation and maintenance of a large-scale cloud data center includes the following steps. First, information-entropy feature processing is performed on the imbalanced data to select the more important features. The processed imbalanced data are then divided, and the minority-class sample data, i.e. the failure samples, are extracted. Next, data enhancement is performed on the failure samples using the time-progressive sampling method TPS to generate synthetic data; by generating more failure sample data through TPS, the ratio of the number of healthy samples to the number of failure samples reaches a better balance. The synthetic data with good generation quality are then combined with the original data to produce the integrated data. Finally, the integrated data are input into a disk failure prediction model for training, a time window of 7 days is selected to predict whether a failure will occur after 7 days, and the data are labeled accordingly. The method can fully use the failure data to extract the dependency relationships among the time-series data, endows the generated data with the original characteristics of the failure data as well as the data distribution and hidden patterns in the latent space, and has good practicability in disk failure prediction systems with high requirements on the F1 value and the Matthews correlation coefficient (MCC).
The disk failure prediction method is used for performing failure prediction on a disk of a large-scale cloud data center intelligent operation and maintenance, and in the practical application process, the method specifically comprises the following steps:
step 1, filling feature missing values in an original data set, then carrying out data normalization, scaling all values between intervals of [0,1], and then carrying out information entropy feature processing, so that the most relevant and frequently changed features are selected, and finally a feature-processed data set G is obtained.
The missing value filling adopts the following method: if two or more values are missing in succession, the mode of that SMART entry on the disk is used as the filling value; if only one value is missing, the mean of the values before and after it is used as the filling value. Data normalization follows: all values are scaled into [0,1] using the maximum and minimum values of each feature, according to the following formula:
x' = (x - x_min) / (x_max - x_min)

where x is the original value of the feature, x_max and x_min are respectively the maximum and minimum values of the feature in the data set, and x' is the scaled feature value. Next, we perform information-entropy processing on the features after missing-value filling and data normalization; this method calculates a value for each feature attribute to express its information content, with the following formula:

H(U) = -∑_{i=1}^{n} p_i log₂ p_i

where i denotes the i-th sample out of n samples in total, and p_i is the probability of each value appearing in the SMART attribute. The higher the information entropy H(U) of a feature, the more information it contains, which means the fluctuation of the feature attribute is more pronounced, so that the most relevant and frequently changing feature attributes can be selected.
Step 2, divide the labeled imbalanced failure data in the data set G: the minority-class sample data, i.e. those labeled 1, are screened out as the source data set S1, and the samples of the other class, i.e. those labeled 0, form the source data set S2;
and 3, generating and collecting fault data of the source data set S1 (namely the few samples) through TPS by adopting a time progressive sampling TPS method, performing loss calculation on the generated data and the original data, judging whether the loss is smaller than a set threshold value, collecting if the loss is smaller than the set threshold value, and otherwise, repeating the operation in the step 2. The time progressive sampling TPS method is used for carrying out data enhancement on the few types of sample fault data. Before describing TPS, there is an important concept called lead time. For a given failed disk, assuming that the disk failure occurs at timestamp t, the prediction operation occurs at timestamp t-i, then a time period t-i of length i between the occurrence of the prediction action at t, the occurrence of the disk failure at t being denoted as lead period i. During model training, for each failed disk, the TPS will collect progressively more failure data samples during lead period I (i.e. lead period I ranges from 1 to I, where I is the hyper-parameter of the TPS). There are also two important parameters in the TPS method:
the window length is defined as The time window size of The training network input data in each sequence sample, e.g. h is 5, then one training sample will contain SMART attribute information for The disk in The last 5 days. h needs to have a proper value. If too small, less potential information is provided to the ConvTrans-TPS. If it is too large, it corresponds to a long time sequence. Data that is too far from the ultimate failure has little, if any, misleading impact on the prediction of the ultimate failure trend.
The predict_failure_days is defined as the number of days before failure, which serves as the alarm boundary. Its value also needs to be appropriate: a time interval that is too long or too short affects the effectiveness of disk failure handling. A predict_failure_days value within 5-7 days is reasonable; the method selects a predict_failure_days value of 7 days.
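The labeling rule implied by predict_failure_days can be sketched as follows. This is a hypothetical helper (day indexing and the treatment of healthy disks are assumptions, not the patent's code): a daily sample is marked 1 if the disk fails within the next predict_failure_days days, and 0 otherwise.

```python
# Illustrative sketch of sample labeling with predict_failure_days = 7:
# days within 7 days of the failure day (inclusive) are positive.
def label_samples(num_days, failure_day, predict_failure_days=7):
    """Label each day 0..num_days-1; failure_day is None for healthy disks."""
    labels = []
    for day in range(num_days):
        if failure_day is not None and 0 <= failure_day - day <= predict_failure_days:
            labels.append(1)
        else:
            labels.append(0)
    return labels
```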
Step 4, the generation in step 3 is repeated until generation from the source data set S1 is finished, and the synthesized data are collected as the synthetic data set T;
and 5, integrating the source data set S1, the source data set S2 and the synthetic data set T to form a final integrated data set Q, and dividing the final integrated data set Q into a training set M and a test set N. The counts of each tag in the selected data are shown in Table 1
Table 1 data set tag statistics table
Step 6, the disk failure prediction model is constructed and trained with the training set M, then used to predict on the test set N and output the possible failures in the test set N. The disk failure prediction model is modified on the basis of the Transformer model: the position-based linear-projection attention calculation of the original Transformer is abandoned, and queries, keys and values are embedded by convolutional projection to enhance attention to local context information, while the original encoder block and decoder block are retained. In the original Transformer model, long-term and short-term dependencies are captured by a multi-head self-attention mechanism, with different attention heads learning to focus on different aspects of the temporal patterns.
In the self-attention layer, a multi-head self-attention sublayer (applying the same model at each time step, which simplifies the notation) simultaneously converts the input data L into H different query matrices Q_h = L W_h^Q, key matrices K_h = L W_h^K, and value matrices V_h = L W_h^V, where h = 1, …, H, and W_h^Q, W_h^K, W_h^V are all learnable parameters. After these linear projections, scaled dot-product attention computes the vector output sequence:

A_h = softmax((Q_h K_h^T + M) / √d_k) V_h (3)

where the mask matrix M is used to avoid future information leakage by setting all its upper triangular elements to −∞, and d_k is the number of columns (the vector dimension) of the Q_h and K_h matrices. Then A_1, A_2, …, A_H are concatenated and linearly projected again. A feed-forward sublayer is stacked at the output, with two fully connected networks and an intermediate ReLU activation, as follows:
max(0, XW_1 + b_1)W_2 + b_2 (4)
where X is the input; the dimension of the output matrix finally obtained by the feed-forward sublayer is consistent with that of X.
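A minimal NumPy sketch of one attention head with the upper-triangular mask and the feed-forward sublayer described above (single head only; the concatenation and final projection of A_1, …, A_H are omitted, and all names are illustrative):

```python
import numpy as np

def softmax(s):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def masked_attention(X, W_q, W_k, W_v):
    """One head of masked scaled dot-product self-attention over a sequence X
    of shape [L, d]; an upper-triangular -inf mask blocks future time steps."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]                                    # vector dimension
    scores = (Q @ K.T) / np.sqrt(d_k)
    mask = np.triu(np.full(scores.shape, -np.inf), k=1)  # future positions -> -inf
    return softmax(scores + mask) @ V

def feed_forward(X, W1, b1, W2, b2):
    """Position-wise feed-forward sublayer: max(0, XW1 + b1)W2 + b2."""
    return np.maximum(0.0, X @ W1 + b1) @ W2 + b2
```

With the mask in place, time step 0 can attend only to itself, so its output row equals its own value vector, which is one way to check the masking is correct.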
However, in the original Transformer model, the similarity between a query and a key is computed by their dot product, which can make the attention focus abnormal: the original calculation cannot take the context of the current data point into account and is insensitive to the local context. The attention score then expresses only the correlation between single time points, which differs from the original purpose of time-series prediction; the self-attention module may confuse whether an observed value is an outlier, a change point, or part of a pattern, which brings potential optimization problems.
Thus, a convolutional self-attention mechanism is used to alleviate this problem. It converts the input (plus appropriate padding) into queries and keys using a convolutional layer with kernel size k and stride 1, rather than a kernel size of 1 and stride 1 (i.e. matrix multiplication). The attention calculation is performed with convolutional projection instead of the existing position-based linear projection, and query, key and value embedding is carried out by convolutional projection to enhance the attention to local context information.
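A rough NumPy illustration (shapes and names assumed, not the patent's implementation) of replacing the kernel-size-1 projection with a kernel-size-k, stride-1 convolution over time; with k = 1 it degenerates to the plain matrix multiplication the text contrasts against:

```python
import numpy as np

def conv_projection(X, kernel):
    """Project a sequence X (shape [L, d]) with a stride-1 convolution over
    time; `kernel` has shape [k, d, d_out] and zero padding keeps the output
    length L, so each projected query/key mixes its local context of width k."""
    k, d, d_out = kernel.shape
    pad = k // 2
    Xp = np.concatenate([np.zeros((pad, d)), X, np.zeros((pad, d))], axis=0)
    out = np.zeros((X.shape[0], d_out))
    for t in range(X.shape[0]):
        window = Xp[t:t + k]                    # local context around step t
        out[t] = np.einsum("kd,kde->e", window, kernel)
    return out
```

With k = 1 the call reduces to X @ kernel[0], i.e. the position-wise linear projection of the original Transformer; a larger k lets each query/key see its temporal neighborhood.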
Step 7: mark all the instances in the test set N using the output of the disk failure prediction model to obtain a marking result c, where a marking value of 0 represents a non-failed instance and a marking value of 1 represents a failed instance. In step 7, the marking result c is obtained with formula (5):
[Formula (5): equation image not reproduced in the source text.]

In formula (5), x_i^k is the ith feature of sample k, D is the set of all data available for model training, D_k is a subset of D, y_j is the eigenvalue of sample j, a is a parameter, and p is a prior value.
Step 8: output the fault according to the marking result c.
The system for the disk failure prediction method for intelligent operation and maintenance of the large-scale cloud data center comprises:
the data set characteristic preprocessing module is used for carrying out missing value filling, data normalization and information entropy processing on the original data set so as to obtain the most relevant and frequently changed characteristic attributes;
the data set dividing module is used for screening a source data set S1 formed by a few samples and a source data set S2 formed by a plurality of samples from the unbalanced data set;
the time progressive sampling TPS module is used for performing data enhancement on the source data set S1 so as to generate high-quality synthetic data serving as a synthetic data set T;
the disk failure prediction model is used for training on the training set M divided from the final integrated data set Q, predicting on the test set N, and outputting the possible failure of each instance in the test set N;
the marking module is used for marking all the examples in the test set N to obtain a marking result;
and the display module is used for predicting and displaying the fault in the network according to the marking result.
The time progressive sampling TPS module generates the synthetic data set T, which is merged with the source data set S1 and the source data set S2 to form the integrated data set Q; the integrated data set Q is used to construct the disk failure prediction model.
Example 2
In a second aspect, the embodiment provides a disk failure prediction device for intelligent operation and maintenance of a large-scale cloud data center, which includes a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to embodiment 1.
Example 3
In a third aspect, the present embodiment provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (10)

1. A disk failure prediction method is characterized by comprising the following steps:
step 1, carrying out missing value filling, data normalization and information entropy processing on an unbalanced original data set to obtain a most relevant and frequently changed characteristic attribute, namely a data set G;
step 2, dividing the data set G into a source data set S1 formed by few samples with fault labels and a source data set S2 formed by multiple samples with non-fault labels according to the labels;
step 3, performing data enhancement on the source data set S1 by adopting a time progressive sampling TPS method to generate synthetic data to obtain a synthetic data set T;
step 4, integrating the source data set S1, the source data set S2 and the synthetic data set T to form an integrated data set Q, and dividing the integrated data set Q into a training set M and a test set N;
step 5, training the disk failure prediction model by using the training set M, and testing the trained disk failure prediction model by using the test set N until the model prediction effect meets the requirement, so as to obtain the trained disk failure prediction model;
step 6, inputting SMART data of the disk to be detected into a trained disk failure prediction model;
and 7, determining a disk failure prediction result according to the output of the disk failure prediction model.
2. The disk failure prediction method of claim 1, wherein the missing value padding comprises: if two or more values are missing in succession, using the mode (most frequent value) of the SMART entry on the disk as the padding value; if only one value is missing, using the average of the values before and after it as the padding value.
3. The disk failure prediction method of claim 1, wherein the data normalization comprises:
scaling all values between [0,1] using the maximum and minimum values in the features, the scaling formula is as follows:
x' = (x − x_min) / (x_max − x_min) (1)
where x is the original value of the feature, x max And x min Maximum and minimum values of features in the dataset, respectively; and x' is the scaled eigenvalue.
4. The disk failure prediction method of claim 1,
the information entropy processing comprises: calculating, for each characteristic attribute, the amount of information expressed by its values, according to the following formula:
H(U) = −∑_{i=1}^{n} p_i log p_i (2)
where i represents the ith sample, for a total of n samples; p represents the probability of each value appearing in each SMART attribute; the higher the information entropy H (U) of a feature, the more information it contains, meaning the more pronounced the volatility of the feature attributes, so that the most relevant and frequently changing feature attributes are selected.
5. The disk failure prediction method according to claim 1, wherein in the step 3, performing data enhancement on the source data set S1 by using a time progressive sampling TPS method includes:
and generating and collecting fault data for the source data set S1 by adopting a time progressive sampling TPS method, performing loss calculation on the generated data and the original data, judging whether the loss is smaller than a set threshold value, if so, collecting, and otherwise, repeating the step operation until a synthetic data set T is obtained.
6. The disk failure prediction method of claim 5, wherein, for a given failed disk, assuming that the disk failure occurs at timestamp t and the prediction operation occurs at timestamp t−i, the time period of length i between the prediction action at t−i and the occurrence of the disk failure at t is denoted as the lead period i;
during model training, for each failed disk, the TPS gradually collects more failure data samples within a lead period I, namely the range of the lead period I is 1 to I, wherein I is a hyper-parameter of the TPS;
there are two important parameters in the TPS process:
the window length is defined as the size of the time window of the training network input data in each sequence sample, and predict_failure_days is defined as the number of days before failure.
7. The disk failure prediction method of claim 6, wherein the window length is 5, so that a training sample contains the SMART attribute information of the disk in the last 5 days; and the value of predict_failure_days is within 5-7 days.
8. The disk failure prediction method of claim 1, wherein the disk failure prediction model comprises: an input module, an encoder block, a decoder block, and an output module;
in the input module, the convolutional Transformer model uses convolution layers with kernel size k and stride 1 to convert the input data L into H different query matrices Q_h, key matrices K_h, and value matrices V_h, where h = 1, …, H, and the corresponding projection weights W_h^Q, W_h^K, W_h^V are all learnable parameters;
stacking a feedforward sublayer at the output of the encoder block and the decoder block, respectively, the position feedforward sublayer having two fully connected networks and a middle ReLU activation, the formula is as follows:
max(0, XW_1 + b_1)W_2 + b_2 (3)
wherein X is the input, W_1 and W_2 are learnable parameters, and b_1 and b_2 are preset bias terms used for spatial dimension conversion in the connection layers; the dimension of the output matrix finally obtained by the feed-forward sublayer is consistent with that of X.
9. A disk failure prediction device is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 8.
10. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, performing the steps of the method of any one of claims 1 to 8.
CN202211039310.5A 2022-08-29 2022-08-29 Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center Pending CN115373879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211039310.5A CN115373879A (en) 2022-08-29 2022-08-29 Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211039310.5A CN115373879A (en) 2022-08-29 2022-08-29 Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center

Publications (1)

Publication Number Publication Date
CN115373879A true CN115373879A (en) 2022-11-22

Family

ID=84069789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211039310.5A Pending CN115373879A (en) 2022-08-29 2022-08-29 Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center

Country Status (1)

Country Link
CN (1) CN115373879A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777046A (en) * 2023-05-11 2023-09-19 中国科学院自动化研究所 Traffic pre-training model construction and traffic prediction method and device and electronic equipment
CN116956197A (en) * 2023-09-14 2023-10-27 山东理工昊明新能源有限公司 Deep learning-based energy facility fault prediction method and device and electronic equipment
CN116956197B (en) * 2023-09-14 2024-01-19 山东理工昊明新能源有限公司 Deep learning-based energy facility fault prediction method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN105893256B (en) software fault positioning method based on machine learning algorithm
CN115373879A (en) Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN109408389A (en) A kind of aacode defect detection method and device based on deep learning
CN106570513A (en) Fault diagnosis method and apparatus for big data network system
CN111427775B (en) Method level defect positioning method based on Bert model
CN109242149A (en) A kind of student performance early warning method and system excavated based on educational data
CN115577114A (en) Event detection method and device based on time sequence knowledge graph
CN113312447A (en) Semi-supervised log anomaly detection method based on probability label estimation
CN110851654A (en) Industrial equipment fault detection and classification method based on tensor data dimension reduction
CN115617554A (en) System fault prediction method, device, equipment and medium based on time perception
CN115309575A (en) Micro-service fault diagnosis method, device and equipment based on graph convolution neural network
CN114266289A (en) Complex equipment health state assessment method
Du et al. Convolutional neural network-based data anomaly detection considering class imbalance with limited data
CN115705501A (en) Hyper-parametric spatial optimization of machine learning data processing pipeline
Sudharson et al. Improved EM algorithm in software reliability growth models
Dhurandhar et al. Enhancing simple models by exploiting what they already know
CN117495421A (en) Power grid communication engineering cost prediction method based on power communication network construction
Cohen et al. To trust or not: Towards efficient uncertainty quantification for stochastic shapley explanations
CN116894113A (en) Data security classification method and data security management system based on deep learning
CN115470854A (en) Information system fault classification method and classification system
CN116883709A (en) Carbonate fracture-cavity identification method and system based on channel attention mechanism
JP2022082525A (en) Method and apparatus for providing information based on machine learning
CN111221704B (en) Method and system for determining running state of office management application system
Bonabi Mobaraki et al. A demonstration of interpretability methods for graph neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination