CN117421145A - Heterogeneous hard disk system fault early warning method and device - Google Patents

Publication number
CN117421145A
Authority
CN
China
Prior art keywords
hard disk
attribute data
state attribute
heterogeneous
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311736825.5A
Other languages
Chinese (zh)
Other versions
CN117421145B (en
Inventor
王东清
李道童
张炳会
李婷婷
陈衍东
孙秀强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311736825.5A
Publication of CN117421145A
Application granted
Publication of CN117421145B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a fault early warning method and device for a heterogeneous hard disk system, and relates to the technical field of equipment detection. The method comprises the following steps: obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system; clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining the hard disk cluster to which the hard disk state attribute data belongs; inputting the data corresponding to each hard disk cluster into a hard disk fault prediction model of a multi-tower structure for abnormality detection processing to obtain hard disk health index information, wherein the hard disk fault prediction model is trained on sample hard disk state attribute data and the corresponding hard disk health label information; and performing fault early warning on the heterogeneous hard disk system based on the hard disk health index information. The method can effectively improve the accuracy and efficiency of fault early warning for the heterogeneous hard disk system, thereby improving the stability and safety of its operation.

Description

Heterogeneous hard disk system fault early warning method and device
Technical Field
The application relates to the technical field of intelligent detection, and in particular to a heterogeneous hard disk system fault early warning method and device. The application also relates to an electronic device and a processor-readable storage medium.
Background
In recent years, with the rapid development of technologies such as big data and cloud computing, data volume has grown explosively. Cloud service providers build huge data centers to provide high-quality services for users, and the stable operation of these data centers has become key to user experience. In a data center, hard disk failures are the most common hardware failure, accounting for about 78% of data center hardware failures, and because of the physical characteristics of hard disks, the probability of failure increases with service time. A hard disk failure can have unpredictable consequences: on the one hand, tasks running on the hard disk may fail or the system may crash, interrupting service; on the other hand, a large amount of user data may be lost. To improve the reliability and security of a data center, fault tolerance mechanisms are adopted, which are commonly divided into passive fault tolerance and active fault tolerance. Passive fault tolerance takes remedial measures after a hard disk failure occurs. One example is the redundant array of disks (Redundant Arrays of Inexpensive Disks, RAID) technique, which uses virtualized storage to combine multiple hard disks into one or more hard disk groups to achieve data redundancy and improved performance. Another is the multi-copy strategy that data centers adopt to improve storage reliability, for example HDFS (Hadoop Distributed File System), which addresses storage fault tolerance by keeping multiple backups of the data. Although these passive fault tolerance techniques can guarantee data security and reliability, they suffer from problems such as high cost and poor storage space utilization.
Unlike passive fault tolerance, active fault tolerance predicts hard disk failures in advance so that corresponding measures can be taken in time, which reduces operation and maintenance cost and improves the reliability and user experience of the data center. Because of these advantages, it has become a hot research direction in hard disk fault diagnosis.
S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a typical active fault tolerance technique that can detect and record attributes related to drive reliability. In recent years, many studies have built machine learning and deep learning models for hard disk failure prediction based on S.M.A.R.T. information, but these methods generally assume that the training data and the test data come from the same distribution. In a real data center, however, the storage system consists of thousands or even millions of hard disks, which typically come from different vendors or are different models from the same vendor; such a system is referred to as a heterogeneous hard disk system. In addition, the number of hard disks in the heterogeneous hard disk system increases as hard disk failures occur. Heterogeneous hard disks usually have different S.M.A.R.T. data distributions, so a fault prediction model trained on data from a single hard disk model is not suitable for hard disks of other models. The transfer learning methods adopted in the prior art are highly limited: they depend on the number of heterogeneous hard disks and do not consider factors such as the influence of the number of hard disks of each model on the model parameters, so actual fault early warning efficiency and accuracy are poor. Therefore, how to design a more efficient and easy-to-use fault early warning scheme for heterogeneous hard disk systems has become an urgent problem.
Disclosure of Invention
Therefore, the application provides a heterogeneous hard disk system fault early warning method and device to overcome the defect in the prior art that fault early warning schemes for heterogeneous hard disk systems are highly limited, which results in poor fault early warning accuracy and efficiency in practical applications.
In a first aspect, the present application provides a method for early warning of a failure of a heterogeneous hard disk system, including:
obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining a hard disk cluster to which the hard disk state attribute data belongs;
respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk failure prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk failure prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data;
and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
Further, the step of inputting the hard disk state attribute data corresponding to each hard disk cluster to a preset hard disk failure prediction model of the multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk failure prediction model of the multi-tower structure specifically includes:
according to the hard disk class cluster to which the current hard disk state attribute data belongs, selecting and inputting the hard disk state attribute data to a corresponding hard disk personality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain personality characteristic parameters corresponding to each hard disk class cluster;
inputting the hard disk state attribute data to a hard disk commonality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain commonality characteristic parameters of different hard disk clusters;
determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameter and the commonality characteristic parameter;
and inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module.
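The data flow described in these steps can be illustrated with a minimal sketch. It is not the model of the application: the class name `MultiTowerModel`, the single tanh layer per module, the random initial weights, and the rescaling of the output to a score between 0 and 1 are all assumptions made for illustration. The combination step uses an element-wise product, which is one reading of combining personality and commonality characteristic parameters.

```python
import math
import random


def dense(weights, bias, x):
    """Apply one fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Create small random weights; a trained model would load learned values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTowerModel:
    """Hypothetical layout: one personality tower per cluster, one shared
    commonality tower, and one early-warning head."""

    def __init__(self, n_features, n_hidden, cluster_ids, seed=0):
        rng = random.Random(seed)
        self.personality = {cid: init_layer(n_hidden, n_features, rng)
                            for cid in cluster_ids}
        self.commonality = init_layer(n_hidden, n_features, rng)
        self.head = init_layer(1, n_hidden, rng)

    def health_score(self, x, cluster_id):
        p = dense(*self.personality[cluster_id], x)   # cluster-specific features
        c = dense(*self.commonality, x)               # features shared by all clusters
        target = [pi * ci for pi, ci in zip(p, c)]    # element-wise combination
        (z,) = dense(*self.head, target)              # single tanh output
        return (z + 1.0) / 2.0                        # rescale to the 0..1 range
```

Only the personality tower is selected by cluster; the commonality tower and the head see every sample, which is what lets low-volume clusters benefit from data of the other clusters.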
Further, the clustering grouping is performed on the hard disk state attribute data according to the distribution difference of the hard disk model and the hard disk state attribute data, and the determining of the hard disk class cluster to which the hard disk state attribute data belongs specifically includes:
according to the hard disk model in the heterogeneous hard disk system, distributing hard disk state attribute data belonging to the same hard disk model into the same hard disk class cluster; if the data volume of the hard disk state attribute data of one or more hard disk models is greater than or equal to a preset data volume threshold, splitting the corresponding one or more first hard disk class clusters into a plurality of second hard disk class clusters respectively; the data volume of the first hard disk cluster is larger than the data volume of the second hard disk cluster; or if the data volume of the hard disk state attribute data of one or more hard disk models is smaller than the data volume threshold, merging the corresponding one or more third hard disk class clusters into a fourth hard disk class cluster; the data volume of the fourth hard disk cluster is larger than the data volume of the third hard disk cluster;
clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs.
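The split-or-merge rule above can be sketched as follows. The function name `group_by_model`, the parameter `n_splits`, and the round-robin split are assumptions for illustration; the application assigns data to the split or merged clusters by the distribution difference of the attribute data, which this sketch replaces with a placeholder.

```python
from collections import defaultdict


def group_by_model(records, volume_threshold, n_splits=2):
    """records: iterable of (model_name, feature_vector) pairs.

    Returns a dict mapping a cluster name to its list of feature vectors.
    """
    by_model = defaultdict(list)
    for model_name, features in records:
        by_model[model_name].append(features)

    clusters = {}
    merged = []
    for model_name, rows in sorted(by_model.items()):
        if len(rows) >= volume_threshold:
            # Large model: split its cluster into several smaller ones.
            # Round-robin is a placeholder for distribution-based assignment.
            for i in range(n_splits):
                clusters[f"{model_name}/part{i}"] = rows[i::n_splits]
        else:
            # Small model: pool its rows with other low-volume models.
            merged.extend(rows)
    if merged:
        clusters["merged"] = merged
    return clusters
```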
Further, the hard disk health index information is a hard disk health score value obtained by performing abnormality detection processing on the hard disk state attribute data by the hard disk fault prediction model of the multi-tower structure; the hard disk health score value is in direct proportion to the hard disk health degree;
the fault early warning is carried out on the heterogeneous hard disk system based on the hard disk health index information, and the fault early warning specifically comprises the following steps: and comparing and analyzing the hard disk health score value with a scoring threshold value selected currently, judging that the heterogeneous hard disk system fails under the condition that the hard disk health score value is smaller than the scoring threshold value, and generating corresponding failure early warning prompt information.
Further, after comparing the hard disk health score value with the currently selected scoring threshold value, the method further comprises: and under the condition that the hard disk health score value is larger than or equal to the scoring threshold value, judging that the heterogeneous hard disk system is in a health state.
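The comparison against the scoring threshold amounts to a single branch. A minimal sketch, in which the function name `check_health` and the message format are hypothetical:

```python
def check_health(score, scoring_threshold):
    """Return (is_healthy, message) for one hard disk health score."""
    if score < scoring_threshold:
        # Below the threshold: judged faulty, so raise an early warning.
        return False, (f"WARNING: health score {score:.2f} "
                       f"below threshold {scoring_threshold:.2f}")
    # Equal to or above the threshold: judged healthy.
    return True, "healthy"
```

A score exactly equal to the threshold is treated as healthy, matching the "greater than or equal to" condition.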
Further, the heterogeneous hard disk system fault early warning method further comprises the following steps: under the condition that the heterogeneous hard disk system is in a healthy state, the hard disk state attribute data with the hard disk health score value being greater than or equal to the scoring threshold value is used as a new training sample to carry out self-adaptive gradient training on the hard disk fault prediction model of the multi-tower structure so as to update parameters of each module in the hard disk fault prediction model of the multi-tower structure in real time, obtain a new hard disk fault prediction model of the multi-tower structure, and carry out abnormality detection processing on the subsequently input hard disk state attribute data by utilizing the new hard disk fault prediction model of the multi-tower structure.
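The online update can be sketched under strong assumptions. The application does not name a specific adaptive gradient algorithm; the sketch below uses AdaGrad as one well-known example and replaces the multi-tower model with a toy logistic scorer. The names `AdaGradScorer` and `online_step` are hypothetical.

```python
import math


class AdaGradScorer:
    """Toy logistic health scorer updated online with AdaGrad."""

    def __init__(self, n_features, lr=0.1, eps=1e-8):
        self.w = [0.0] * n_features
        self.g2 = [0.0] * n_features   # running sum of squared gradients
        self.lr = lr
        self.eps = eps

    def score(self, x):
        z = sum(w * v for w, v in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """One AdaGrad step on the log-loss for a single sample."""
        err = self.score(x) - label
        for i, v in enumerate(x):
            g = err * v
            self.g2[i] += g * g
            self.w[i] -= self.lr * g / (math.sqrt(self.g2[i]) + self.eps)


def online_step(model, x, scoring_threshold):
    """Score a sample; if it is judged healthy, reuse it as a training sample."""
    s = model.score(x)
    if s >= scoring_threshold:
        model.update(x, 1.0)   # healthy label
    return s
```

The per-weight step size shrinks as squared gradients accumulate, so frequently updated weights change more slowly than rarely updated ones.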
Further, before obtaining the hard disk state attribute data of the hard disks of different types in the heterogeneous hard disk system, the method further comprises: model training is carried out in an offline state, and a hard disk fault prediction model of the multi-tower structure is obtained;
model training is carried out in an offline state, and a hard disk fault prediction model of the multi-tower structure is obtained, which comprises the following steps:
acquiring sample hard disk state attribute data of a sample hard disk; the sample hard disk state attribute data comprise state attribute data corresponding to a healthy hard disk and state attribute data corresponding to an abnormal hard disk;
clustering and grouping the sample hard disk state attribute data according to the hard disk model of the sample hard disk and the distribution difference of the sample hard disk state attribute data, and determining a sample hard disk cluster to which the sample hard disk state attribute data belongs;
training an initial hard disk fault prediction model of the multi-tower structure based on the sample hard disk state attribute data corresponding to the sample hard disk class cluster, screening a plurality of hard disk abnormality detection thresholds by comparing the influence condition of the plurality of hard disk abnormality detection thresholds on a final abnormality detection result, determining a hard disk abnormality detection threshold meeting a preset probability confidence condition, and updating a scoring threshold of model parameters based on the hard disk abnormality detection threshold meeting the preset probability confidence condition to obtain the hard disk fault prediction model of the multi-tower structure after final training; the scoring threshold is a hard disk abnormality detection threshold with probability confidence degree larger than or equal to preset probability confidence degree.
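The threshold screening step can be illustrated as follows. The application does not define how the probability confidence is computed; this sketch assumes it is the fraction of labelled validation samples that a candidate threshold classifies correctly. The function names are hypothetical.

```python
def threshold_confidence(threshold, scores, labels):
    """Fraction of validation samples classified correctly at this threshold.

    labels: True for a healthy disk, False for an abnormal one.
    """
    correct = sum((s >= threshold) == healthy
                  for s, healthy in zip(scores, labels))
    return correct / len(scores)


def select_scoring_threshold(candidates, scores, labels, min_confidence):
    """Keep candidates meeting the confidence condition; return the best or None."""
    rated = [(threshold_confidence(t, scores, labels), t) for t in candidates]
    qualified = [(conf, t) for conf, t in rated if conf >= min_confidence]
    if not qualified:
        return None
    # Highest confidence wins; ties go to the larger threshold.
    return max(qualified)[1]
```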
Further, the obtaining hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system specifically includes: acquiring original hard disk state attribute data of different types of hard disks in a heterogeneous hard disk system, and filling missing values of the original hard disk state attribute data to acquire first hard disk state attribute data; feature screening is carried out on the first hard disk state attribute data to obtain second hard disk state attribute data; and carrying out normalization processing on the second hard disk state attribute data to obtain the hard disk state attribute data.
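The three preprocessing stages can be sketched in order. The concrete choices here are assumptions, since the application does not specify them: missing values are filled with the column mean, feature screening drops columns that never vary, and normalization is min-max scaling. The function name `preprocess` is hypothetical.

```python
def preprocess(rows):
    """rows: list of equal-length lists; None marks a missing value.

    Returns (normalized_rows, kept_column_indices).
    """
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]

    # Step 1: fill missing values with the column mean (0.0 if all missing).
    filled = []
    for col in columns:
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present) if present else 0.0
        filled.append([mean if v is None else v for v in col])

    # Step 2: feature screening; drop columns that never change.
    kept = [j for j, col in enumerate(filled) if max(col) > min(col)]

    # Step 3: min-max normalization of the remaining columns to [0, 1].
    normalized_cols = []
    for j in kept:
        lo, hi = min(filled[j]), max(filled[j])
        normalized_cols.append([(v - lo) / (hi - lo) for v in filled[j]])

    return [list(r) for r in zip(*normalized_cols)], kept
```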
Further, the determining, based on the personality characteristic parameter and the commonality characteristic parameter, the target attribute representation corresponding to different types of hard disks in the heterogeneous hard disk system specifically includes:
multiplying the personality characteristic parameter and the commonality characteristic parameter to obtain target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system.
Further, the clustering of the hard disk state attribute data based on the distribution difference of the hard disk state attribute data, so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs specifically includes:
determining a measurement criterion for the clustering;
and clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data and the measurement criterion to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster so as to obtain the hard disk cluster to which the hard disk state attribute data belongs.
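One way to read these two steps is nearest-centroid assignment under a chosen distance function. This is a sketch under that assumption; the application does not name the measurement criterion, and Euclidean and Manhattan distance are shown only as interchangeable examples. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(rows):
    """Mean vector of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def assign_to_cluster(x, clusters, metric=euclidean):
    """Return the name of the cluster whose centroid is closest to x."""
    return min(clusters, key=lambda name: metric(x, centroid(clusters[name])))
```

Passing the metric as a parameter keeps the choice of measurement criterion separate from the assignment logic.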
Further, according to the hard disk cluster to which the current hard disk state attribute data belongs, the hard disk state attribute data is selectively input to a corresponding hard disk personality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure, so as to obtain personality characteristic parameters corresponding to each hard disk cluster, which specifically includes:
determining identification information of a corresponding hard disk personality characteristic extraction module according to a hard disk class cluster to which the current hard disk state attribute data belongs;
based on the identification information, the hard disk state attribute data are selected and input to a corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure, and personality characteristic parameters corresponding to each hard disk cluster are obtained.
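Selecting a module by identification information is essentially a table lookup. A minimal sketch, assuming the identification information is a dictionary key and each personality characteristic extraction module is a callable; `build_router` and its parameter names are hypothetical.

```python
def build_router(cluster_to_module_id, modules):
    """Return a function that sends a sample to the module for its cluster.

    cluster_to_module_id: dict mapping cluster name -> module identifier.
    modules: dict mapping module identifier -> callable feature extractor.
    """
    def route(x, cluster_name):
        module_id = cluster_to_module_id[cluster_name]   # identification info
        return modules[module_id](x)                     # personality features
    return route
```

Keeping the cluster-to-identifier table separate from the modules means a cluster can be remapped to a different module without touching the modules themselves.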
Further, the hard disk fault prediction model of the multi-tower structure comprises a plurality of hard disk personality characteristic extraction modules, each hard disk personality characteristic extraction module correspondingly processes hard disk state attribute data of one hard disk class cluster, and each hard disk personality characteristic extraction module corresponds to one target weight parameter.
In a second aspect, the present application further provides a heterogeneous hard disk system fault early warning device, including:
the clustering grouping unit is used for obtaining hard disk state attribute data of hard disks of different types in the heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining the hard disk class cluster to which the hard disk state attribute data belong;
the fault analysis unit is used for respectively inputting the hard disk state attribute data corresponding to each hard disk cluster into a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk fault prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data;
and the fault early warning unit is used for carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
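The composition of the three units can be sketched as a small container with injected behaviour. The class name `EarlyWarningDevice` and the callable signatures are assumptions; the application describes the units functionally and does not prescribe an interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class EarlyWarningDevice:
    """Wires the three units together; each unit is an injected callable."""
    group: Callable[[Sequence[Sequence[float]]], Dict[str, List[Sequence[float]]]]
    analyze: Callable[[Sequence[float], str], float]
    warn: Callable[[float], bool]

    def run(self, rows):
        """Return a list of (cluster_name, health_score, warning_raised)."""
        results = []
        for cluster_name, members in self.group(rows).items():
            for x in members:
                score = self.analyze(x, cluster_name)
                results.append((cluster_name, score, self.warn(score)))
        return results
```

For example, `EarlyWarningDevice(group=lambda rows: {"all": list(rows)}, analyze=lambda x, c: sum(x) / len(x), warn=lambda s: s < 0.5)` uses trivial stand-ins for each unit.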
Further, the fault analysis unit is specifically configured to:
according to the hard disk class cluster to which the current hard disk state attribute data belongs, selecting and inputting the hard disk state attribute data to a corresponding hard disk personality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain personality characteristic parameters corresponding to each hard disk class cluster;
inputting the hard disk state attribute data to a hard disk commonality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain commonality characteristic parameters of different hard disk clusters;
determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameter and the commonality characteristic parameter;
and inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module.
Further, the clustering grouping is performed on the hard disk state attribute data according to the distribution difference of the hard disk model and the hard disk state attribute data, and the determining of the hard disk class cluster to which the hard disk state attribute data belongs specifically includes:
according to the hard disk model in the heterogeneous hard disk system, distributing hard disk state attribute data belonging to the same hard disk model into the same hard disk class cluster; if the data volume of the hard disk state attribute data of one or more hard disk models is greater than or equal to a preset data volume threshold, splitting the corresponding one or more first hard disk class clusters into a plurality of second hard disk class clusters respectively; the data volume of the first hard disk cluster is larger than the data volume of the second hard disk cluster; or if the data volume of the hard disk state attribute data of one or more hard disk models is smaller than the data volume threshold, merging the corresponding one or more third hard disk class clusters into a fourth hard disk class cluster; the data volume of the fourth hard disk cluster is larger than the data volume of the third hard disk cluster;
clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs.
Further, the hard disk health index information is a hard disk health score value obtained by performing abnormality detection processing on the hard disk state attribute data by the hard disk fault prediction model of the multi-tower structure; the hard disk health score value is in direct proportion to the hard disk health degree;
the fault early warning unit is specifically configured to: and comparing and analyzing the hard disk health score value with a scoring threshold value selected currently, judging that the heterogeneous hard disk system fails under the condition that the hard disk health score value is smaller than the scoring threshold value, and generating corresponding failure early warning prompt information.
Further, after comparing the hard disk health score value with the scoring threshold value selected currently, the fault early warning unit is further configured to: and under the condition that the hard disk health score value is larger than or equal to the scoring threshold value, judging that the heterogeneous hard disk system is in a health state.
Further, the heterogeneous hard disk system fault early warning device further comprises: and the model parameter updating unit is used for carrying out self-adaptive gradient training on the hard disk fault prediction model of the multi-tower structure by taking the hard disk state attribute data with the hard disk health score value larger than or equal to the scoring threshold value as a new training sample under the condition that the heterogeneous hard disk system is in a health state so as to update the parameters of each module in the hard disk fault prediction model of the multi-tower structure in real time, so as to obtain a new hard disk fault prediction model of the multi-tower structure, and carrying out abnormality detection processing on the subsequently input hard disk state attribute data by utilizing the new hard disk fault prediction model of the multi-tower structure.
Further, the heterogeneous hard disk system fault early warning device further comprises: a model offline training unit, used for performing model training in an offline state, before the hard disk state attribute data of the hard disks of different types in the heterogeneous hard disk system is obtained, to obtain the hard disk fault prediction model of the multi-tower structure;
the model offline training unit is specifically configured to:
acquiring sample hard disk state attribute data of a sample hard disk; the sample hard disk state attribute data comprise state attribute data corresponding to a healthy hard disk and state attribute data corresponding to an abnormal hard disk;
clustering and grouping the sample hard disk state attribute data according to the hard disk model of the sample hard disk and the distribution difference of the sample hard disk state attribute data, and determining a sample hard disk cluster to which the sample hard disk state attribute data belongs;
training an initial hard disk fault prediction model of the multi-tower structure based on the sample hard disk state attribute data corresponding to the sample hard disk class cluster, screening a plurality of hard disk abnormality detection thresholds by comparing the influence condition of the plurality of hard disk abnormality detection thresholds on a final abnormality detection result, determining a hard disk abnormality detection threshold meeting a preset probability confidence condition, and updating a scoring threshold of model parameters based on the hard disk abnormality detection threshold meeting the preset probability confidence condition to obtain the hard disk fault prediction model of the multi-tower structure after final training; the scoring threshold is a hard disk abnormality detection threshold with probability confidence degree larger than or equal to preset probability confidence degree.
Further, the clustering grouping unit is specifically configured to: acquiring original hard disk state attribute data of different types of hard disks in a heterogeneous hard disk system, and filling missing values of the original hard disk state attribute data to acquire first hard disk state attribute data; feature screening is carried out on the first hard disk state attribute data to obtain second hard disk state attribute data; and carrying out normalization processing on the second hard disk state attribute data to obtain the hard disk state attribute data.
Further, the determining, based on the personality characteristic parameter and the commonality characteristic parameter, the target attribute representation corresponding to different types of hard disks in the heterogeneous hard disk system specifically includes:
multiplying the personality characteristic parameter and the commonality characteristic parameter to obtain target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system.
Further, the clustering of the hard disk state attribute data based on the distribution difference of the hard disk state attribute data, so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs specifically includes:
determining a measurement criterion for the clustering;
and clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data and the measurement criterion to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster so as to obtain the hard disk cluster to which the hard disk state attribute data belongs.
Further, according to the hard disk cluster to which the current hard disk state attribute data belongs, the hard disk state attribute data is selectively input to a corresponding hard disk personality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure, so as to obtain personality characteristic parameters corresponding to each hard disk cluster, which specifically includes:
determining identification information of a corresponding hard disk personality characteristic extraction module according to a hard disk class cluster to which the current hard disk state attribute data belongs;
based on the identification information, the hard disk state attribute data are selected and input to a corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure, and personality characteristic parameters corresponding to each hard disk cluster are obtained.
Further, the hard disk fault prediction model of the multi-tower structure comprises a plurality of hard disk individual feature extraction modules, each hard disk individual feature extraction module correspondingly processes hard disk state attribute data of one hard disk class cluster, and each hard disk individual feature extraction module corresponds to one target weight parameter.
In a third aspect, the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the heterogeneous hard disk system failure warning method according to any one of the above when the computer program is executed.
In a fourth aspect, the present application further provides a processor readable storage medium, where a computer program is stored on the processor readable storage medium, where the computer program when executed by a processor implements the steps of the heterogeneous hard disk system fault early warning method according to any one of the above.
According to the heterogeneous hard disk system fault early warning method, hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system are obtained, clustering grouping is conducted on the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, hard disk clusters to which the hard disk state attribute data belong are determined, the hard disk state attribute data corresponding to each hard disk cluster are respectively input into a preset hard disk fault prediction model of a multi-tower structure to conduct abnormal detection processing, hard disk health index information output by the hard disk fault prediction model of the multi-tower structure is obtained, fault early warning is conducted on the heterogeneous hard disk system based on the hard disk health index information, accuracy and efficiency of heterogeneous hard disk system fault early warning can be effectively improved, and therefore stability and safety of heterogeneous hard disk system operation are improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without any inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a heterogeneous hard disk system fault early warning method provided in an embodiment of the present application;
Fig. 2 is a complete flow diagram of a fault early warning method for a heterogeneous hard disk system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a hard disk failure prediction model of a multi-tower structure in the heterogeneous hard disk system failure early warning method provided in the embodiment of the present application;
Fig. 4 is a schematic structural diagram of a fault early warning device for a heterogeneous hard disk system according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a hardware environment of a heterogeneous hard disk system fault early warning method according to an embodiment of the present application;
Fig. 6 is a schematic entity structure of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden based on the embodiments herein, are within the scope of the present application.
The following describes embodiments of the heterogeneous hard disk system fault early warning method in detail. Fig. 1 is a flow chart of the heterogeneous hard disk system fault early warning method provided in an embodiment of the present application; the specific implementation process includes the following steps:
Step 101: obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining the hard disk class cluster to which the hard disk state attribute data belongs.
In the embodiment of the invention, after the hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system are obtained, the hard disk state attribute data belonging to the same hard disk model can be distributed into the same hard disk class cluster according to the hard disk model in the heterogeneous hard disk system; if the data volume of the hard disk state attribute data of one or more hard disk models is greater than or equal to a preset data volume threshold, splitting the corresponding one or more first hard disk class clusters into a plurality of second hard disk class clusters respectively; the data volume of the first hard disk cluster is larger than the data volume of the second hard disk cluster; or if the data volume of the hard disk state attribute data of one or more hard disk models is smaller than the data volume threshold, merging the corresponding one or more third hard disk class clusters into a fourth hard disk class cluster; and the data volume of the fourth hard disk cluster is larger than that of the third hard disk cluster. Clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs. 
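The coarse grouping described above can be illustrated with a minimal sketch. The function name `coarse_group`, the parameters `threshold` and `split_size`, and the choice to split a large cluster into fixed-size slices are all assumptions made for illustration; the embodiment does not specify how a first hard disk class cluster is divided.

```python
from collections import defaultdict

def coarse_group(records, threshold, split_size):
    """Group (model, features) records by model, then split or merge.

    threshold  : data volume threshold (assumed to be a record count)
    split_size : size of each second cluster (assumed, split_size < threshold)
    """
    by_model = defaultdict(list)
    for model, features in records:
        by_model[model].append(features)

    clusters = []     # resulting coarse clusters
    small_pool = []   # data of models below the threshold
    for model in sorted(by_model):
        rows = by_model[model]
        if len(rows) >= threshold:
            # Split a large first cluster into several second clusters.
            for start in range(0, len(rows), split_size):
                clusters.append(rows[start:start + split_size])
        else:
            # Collect small third clusters for merging.
            small_pool.extend(rows)
    if small_pool:
        clusters.append(small_pool)  # merged fourth cluster
    return clusters
```

With six records of model "A" and one each of models "B" and "C", `coarse_group(records, threshold=4, split_size=3)` would yield two clusters of three records for "A" and one merged cluster holding the "B" and "C" records.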
The method for obtaining the hard disk state attribute data of the hard disks of different types in the heterogeneous hard disk system comprises the following specific implementation processes: acquiring original hard disk state attribute data of different types of hard disks in a heterogeneous hard disk system, and filling missing values of the original hard disk state attribute data to acquire first hard disk state attribute data; feature screening is carried out on the first hard disk state attribute data to obtain second hard disk state attribute data; and carrying out normalization processing on the second hard disk state attribute data to obtain the hard disk state attribute data. In the process of clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster and determining the hard disk cluster to which the hard disk state attribute data belongs, a measurement criterion for clustering needs to be determined first, and then clustering is performed on the hard disk state attribute data based on the measurement criterion and the distribution difference of the hard disk state attribute data to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster so as to obtain the hard disk cluster to which the hard disk state attribute data belongs. The hard disk state attribute data includes, but is not limited to, data read error rate, disk start time, motor start-stop count, remapped sector number, seek error rate, seek time, hard disk power-on time, etc. The data read error rate is a hardware read error rate which occurs when data is read from the surface of the magnetic disk, and the smaller the value is, the better the value is. 
The disk start time is the average time taken for the disk to start from rest and accelerate to its nominal rotational speed; the smaller the value, the better. The remaining attributes are not described in detail here. A hard disk class cluster is a set of hard disk state attribute data belonging to hard disks of the same type or with similar attribute states.
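The three preprocessing steps (missing value filling, feature screening, normalization) can be sketched as follows. The concrete strategies are assumptions chosen for illustration, since the embodiment does not name them: column-mean filling, dropping columns that never vary, and min-max scaling.

```python
def preprocess(rows):
    """rows: list of equal-length lists; None marks a missing value."""
    n_cols = len(rows[0])
    # 1) Missing value filling with the column mean (assumed strategy).
    filled = [list(r) for r in rows]
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        mean = sum(present) / len(present) if present else 0.0
        for r in filled:
            if r[j] is None:
                r[j] = mean
    # 2) Feature screening: keep only columns that vary (assumed criterion).
    keep = [j for j in range(n_cols)
            if max(r[j] for r in filled) > min(r[j] for r in filled)]
    # 3) Min-max normalization of the kept columns to [0, 1].
    out = [[] for _ in filled]
    for j in keep:
        lo = min(r[j] for r in filled)
        hi = max(r[j] for r in filled)
        for r, o in zip(filled, out):
            o.append((r[j] - lo) / (hi - lo))
    return out, keep
```

The function returns the normalized rows together with the indices of the retained attribute columns, so that the same screening can be applied to data arriving later.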
It should be noted that, as shown in fig. 2, the embodiment of the present invention includes an offline training phase and an online prediction phase, and an online model parameter updating phase. Wherein the offline training phase comprises a solid line connection part in fig. 2, and the online prediction phase and the online model parameter updating phase comprise a broken line connection part in fig. 2, namely, the solid line represents the offline training phase, and the broken line represents the online prediction phase and the online model parameter updating phase. That is, before executing this step, model training needs to be performed in advance in an offline state to obtain a hard disk failure prediction model of the multi-tower structure. Model training is carried out in an offline state, a hard disk fault prediction model of the multi-tower structure is obtained, and the corresponding specific implementation process comprises the following steps: acquiring sample hard disk state attribute data of a sample hard disk; the sample hard disk state attribute data comprise state attribute data corresponding to a healthy hard disk and state attribute data corresponding to an abnormal hard disk. And clustering and grouping the sample hard disk state attribute data according to the hard disk model of the sample hard disk and the distribution difference of the sample hard disk state attribute data, and determining the sample hard disk class cluster to which the sample hard disk state attribute data belongs. 
Training an initial hard disk fault prediction model of the multi-tower structure based on the sample hard disk state attribute data corresponding to the sample hard disk class cluster, screening a plurality of hard disk abnormality detection thresholds by comparing the influence condition of the plurality of hard disk abnormality detection thresholds on a final abnormality detection result, determining a hard disk abnormality detection threshold meeting a preset probability confidence condition, and updating a scoring threshold of model parameters based on the hard disk abnormality detection threshold meeting the preset probability confidence condition to obtain the hard disk fault prediction model of the multi-tower structure after final training; the scoring threshold is a hard disk abnormality detection threshold with probability confidence degree larger than or equal to preset probability confidence degree. That is, when screening the hard disk abnormality detection threshold, the screening result of the abnormality detection threshold (i.e. scoring threshold) can be determined according to the existing normal hard disk data (i.e. hard disk health status data) and fault hard disk data (i.e. hard disk abnormality status data), and the quality of the influence of different thresholds on the final prediction result can be compared, so as to update the high confidence scoring threshold of the model parameters.
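One possible way to compare candidate abnormality detection thresholds on labeled healthy and faulty data is sketched below. The selection criterion used here (plain accuracy) and the name `select_threshold` are assumptions; the embodiment only states that the influence of different thresholds on the final detection result is compared. A sample is treated as faulty when its health score falls below the threshold.

```python
def select_threshold(scores, is_faulty, candidates):
    """Return (best_threshold, best_accuracy) over the candidate thresholds."""
    best_t, best_acc = None, -1.0
    for t in candidates:
        # A score below the threshold is predicted as faulty.
        correct = sum((s < t) == f for s, f in zip(scores, is_faulty))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

In practice a criterion that weighs missed faults and false alarms differently may be preferable to accuracy, because faulty samples are usually rare.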
Specifically, in the offline training stage, a clustering algorithm is first used to divide the hard disk related data of hard disks of different types into a plurality of hard disk clusters according to the hard disk model, the hard disk state attribute data (i.e., S.M.A.R.T. data), and the like, and the data of the hard disk clusters is used to train a multi-tower anomaly detection model (i.e., an initial hard disk failure prediction model of the multi-tower structure). In the online prediction stage, new data is preprocessed, and the multi-tower anomaly detection model (i.e., the hard disk failure prediction model of the multi-tower structure obtained from the offline training stage) is used to predict whether a fault risk exists at that moment, so as to send out early warning information. In addition, whether the hard disk health score meets the high confidence condition is judged, and if so, the current test sample is used to update the model parameters. The specific implementation process of the algorithm is as follows: first, the S.M.A.R.T. data of hard disks of different types is preprocessed, where the preprocessing includes missing value filling, normalization processing, feature screening, and the like.
Then cluster grouping is carried out according to the hard disk model and the differences in the hard disk state attribute data: the preprocessed S.M.A.R.T. data of the hard disks is clustered. To account for the differing proportions of data volume among hard disks of different models, the clustering of the hard disk data follows two stages, namely a rough classification stage and a subdivision stage. In the rough classification stage, hard disk data of the same model may be placed in the same hard disk cluster. If the data volume of one or more hard disk models is too large, the hard disk cluster of that model is split, that is, one large hard disk cluster (i.e., a first hard disk cluster) is split into a plurality of smaller clusters (i.e., second hard disk clusters); and one or more hard disk clusters with smaller data volume (i.e., third hard disk clusters) are merged to obtain one large hard disk cluster (i.e., a fourth hard disk cluster). In the subdivision stage, because the above splitting or merging is performed according to the hard disk model, a clustering algorithm (such as K-means, DBSCAN, or hierarchical clustering) may be used to cluster the hard disk state attribute data under a preset measurement criterion, so as to further ensure that hard disk data with similar distributions is grouped together.
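The subdivision stage can be illustrated with a small K-means sketch, K-means being one of the algorithms named above. The Euclidean distance used as the measurement criterion, the deterministic initialization from the first k points, and the fixed iteration count are assumptions made to keep the example short and reproducible.

```python
import math

def kmeans(points, k, iters=20):
    """Cluster feature vectors; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # deterministic init (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under the Euclidean metric.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Each resulting label would then identify the hard disk cluster to which a piece of hard disk state attribute data is assigned.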
When training the multi-tower anomaly detection model, the obtained hard disk state attribute data of the plurality of hard disk clusters is used as sample data, so that the multi-tower anomaly detection model can be effectively trained to obtain the hard disk failure prediction model of the multi-tower structure, whose structure is shown in Fig. 3. The specific process is as follows: the sample data used for training typically includes S.M.A.R.T. feature information, label information indicating whether the hard disk is abnormal at that time, and an indicator of the cluster to which the sample data belongs (Input in Fig. 3).
The hard disk state attribute data of the plurality of hard disk clusters is input into the multi-tower anomaly detection model. According to the hard disk cluster to which the current sample data belongs, the Gate Net network structure (Gate Net in Fig. 3) in the multi-tower anomaly detection model selects the corresponding hard disk specific weight training module (i.e., Disk-1 Weight, Disk-2 Weight, …, Disk-N Weight in Fig. 3, where N is a positive integer greater than 1) to which the sample data is input. The hard disk specific weight training module, also called the hard disk personality feature extraction module or the hard disk specific parameter module, outputs the specific parameters (i.e., personality characteristic parameters) of the hard disk cluster to which the hard disk belongs. In addition, the sample data may be input into the hard disk shared weight training module (i.e., Disk Shared Weight in Fig. 3, also called the hard disk commonality feature extraction module or the hard disk shared parameter module) to obtain the shared characteristic parameters (i.e., commonality characteristic parameters) of the different hard disk clusters, and the outputs of the hard disk commonality feature extraction module and the hard disk personality feature extraction module are multiplied to obtain the final target attribute representation of the hard disk. The final target attribute representation of the hard disk is input into the hard disk fault early warning function module (corresponding to Output in Fig. 3) for fault reasoning analysis to obtain a hard disk score (i.e., a hard disk health score value), and the hard disk health score value is used as the hard disk health index information.
The training objective of the multi-tower anomaly detection model is to maximize the similarity between the predicted value $\hat{y}_i^{c}$ computed and output by the model and the input actual value $y_i^{c}$, taken over the sample data of all hard disk clusters,
where $\hat{y}_i^{c}$ represents the predicted value of the $i$-th sample data in hard disk cluster $c$ (i.e., the output hard disk health score value); $y_i^{c}$ is the actual value corresponding to that sample data (i.e., the input hard disk health score value); $N$ is the number of hard disk clusters divided in the embodiment of the invention; and $N_c$ is the total number of sample data in hard disk cluster $c$, i.e., the total number of hard disk state attribute data of hard disks of different types.
Step 102: respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk failure prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk failure prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data.
In the embodiment of the invention, the hard disk state attribute data can be selectively input to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure according to the hard disk class cluster to which the current hard disk state attribute data belongs, so as to obtain personality characteristic parameters corresponding to each hard disk class cluster. And inputting the hard disk state attribute data to a hard disk common characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain common characteristic parameters of different hard disk clusters. And determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameter and the commonality characteristic parameter. And inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module. The determining, based on the personality characteristic parameter and the commonality characteristic parameter, target attribute characterizations corresponding to different types of hard disks in the heterogeneous hard disk system, where the corresponding specific implementation process may include: multiplying the individual characteristic parameters and the common characteristic parameters to obtain target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system.
And in the process of selecting and inputting the hard disk state attribute data to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure according to the hard disk class cluster to which the current hard disk state attribute data belongs, determining the identification information of the corresponding hard disk personality characteristic extraction module according to the hard disk class cluster to which the current hard disk state attribute data belongs, and further selecting and inputting the hard disk state attribute data to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure based on the identification information to obtain the personality characteristic parameter corresponding to each hard disk class cluster. The hard disk fault prediction model of the multi-tower structure comprises a plurality of hard disk individual feature extraction modules, wherein each hard disk individual feature extraction module correspondingly processes hard disk state attribute data of one hard disk cluster, and each hard disk individual feature extraction module corresponds to one target weight parameter.
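Routing by identification information can be sketched as a simple lookup. The identifiers `"disk-1"` and `"disk-2"`, the helper `make_tower`, and the scaling functions standing in for trained personality feature extraction modules are hypothetical and serve only to show the selection step.

```python
def make_tower(scale):
    # Hypothetical stand-in for a trained personality feature extraction module.
    return lambda x: [scale * v for v in x]

# Identification information of each module, keyed by hard disk cluster.
TOWERS = {"disk-1": make_tower(1.0), "disk-2": make_tower(2.0)}

def route(cluster_id, x, towers=TOWERS):
    """Send attribute data x to the module registered for its cluster."""
    if cluster_id not in towers:
        raise KeyError(f"no personality module registered for {cluster_id!r}")
    return towers[cluster_id](x)
```

Failing loudly on an unknown cluster is a design choice of this sketch; a deployed system might instead fall back to the merged cluster's module.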
As shown in Fig. 3, the network structure of the hard disk failure prediction model of the multi-tower structure includes a plurality of sub-networks, such as hard disk personality feature extraction module 1, hard disk personality feature extraction module 2, …, hard disk personality feature extraction module N (corresponding to Disk-1 Weight, Disk-2 Weight, …, Disk-N Weight, respectively), a hard disk commonality feature extraction module (i.e., Disk Shared Weight), a hard disk fault early warning function module (i.e., Output), and a weight factor prediction module (i.e., the Gate Net network structure). The Gate Net network structure is used to extract the weight factors corresponding to the hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system (i.e., the probability values with which the data is input to each hard disk personality feature extraction module). For example, if N is 3, the weight factor of the Disk-1 Weight sub-network may be 0.1, the weight factor of the Disk-2 Weight sub-network may be 0.8, and the weight factor of the Disk-3 Weight sub-network may be 0.1. The weight factors are respectively multiplied by the personality features extracted by the three hard disk personality feature extraction modules, and the results are then concatenated to obtain the personality characteristic parameters corresponding to each hard disk cluster. The hard disk state attribute data is input to the hard disk commonality feature extraction module in the hard disk failure prediction model of the multi-tower structure to obtain the commonality characteristic parameters of the different hard disk clusters.
And determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on multiplication of the individual characteristic parameter and the common characteristic parameter. And inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module.
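The weighting, concatenation, and multiplication steps can be sketched as follows. The function name `combine` is hypothetical, and the sketch assumes that the multiplication is element-wise and that the commonality features have the same length as the concatenated personality features; the embodiment does not state the dimensions.

```python
def combine(gate_weights, tower_features, shared_features):
    """Build the target attribute representation from module outputs."""
    # Scale each tower's features by its weight factor, then concatenate.
    personality = []
    for w, feats in zip(gate_weights, tower_features):
        personality.extend(w * f for f in feats)
    # Element-wise product with the commonality features (assumed equal length).
    if len(personality) != len(shared_features):
        raise ValueError("shared features must match the concatenated length")
    return [p * s for p, s in zip(personality, shared_features)]
```

With the weight factors 0.1, 0.8, and 0.1 from the example above, the features of the second tower dominate the resulting representation.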
In the embodiment of the present invention, the algorithm formulas corresponding to the individual characteristic extraction modules of the plurality of hard disks respectively may be:
FFN(x) = relu(relu(xW1 + b1)W2 + b2)
In the formula, x is the hard disk state attribute data of hard disks of different types; W1 and b1 are respectively the first weight coefficient and the first deviation coefficient of the first-layer network structure in the module; the relu inside the brackets is the activation function of the first layer; W2 and b2 are respectively the second weight coefficient and the second deviation coefficient of the second-layer network in the module; and the relu outside the brackets is the activation function of the second-layer network.
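The two-layer formula above can be written out directly. The list-of-rows matrix layout and the helper names `relu`, `affine`, and `ffn` are assumptions of this sketch; any numerical library would serve equally well.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def affine(x, W, b):
    """Compute xW + b for a length-n vector x and an n x m matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def ffn(x, W1, b1, W2, b2):
    """FFN(x) = relu(relu(xW1 + b1)W2 + b2)."""
    return relu(affine(relu(affine(x, W1, b1)), W2, b2))
```

The inner `relu` zeroes negative first-layer activations before the second layer is applied, which matches the order of operations in the formula.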
In the embodiment of the present invention, the algorithm formula corresponding to the hard disk commonality feature extraction module may be: FFN(x) = relu(relu(xW1 + b1)W2 + b2)
In the formula, x is the hard disk state attribute data of hard disks of different types; W1 and b1 are respectively the first weight coefficient and the first deviation coefficient of the first-layer network structure in the module; the relu inside the brackets is the activation function of the first layer; W2 and b2 are respectively the second weight coefficient and the second deviation coefficient of the second-layer network in the module; and the relu outside the brackets is the activation function of the second-layer network.
In the embodiment of the present invention, the algorithm formula corresponding to the hard disk fault early warning function module may be: FFN(x) = sigmoid(xW1 + b1)
In the formula, x is the target attribute representation obtained by multiplying the hard disk shared representation and the hard disk specific representation; W1 and b1 are respectively the first weight coefficient and the first deviation coefficient of the network structure in the module; and sigmoid is the activation function of the network structure, whose output lies between 0 and 1 and indicates whether the hard disk fails: the larger the score value, the larger the failure probability.
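A minimal sketch of the single-layer sigmoid output follows. It assumes a scalar output, so the weight is a vector `w` and the deviation coefficient a scalar `b`; the branch on the sign of `z` is an implementation detail added here to avoid overflow in `math.exp` for large negative inputs.

```python
import math

def output_head(x, w, b):
    """sigmoid(x . w + b), evaluated in a numerically stable way."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The result always lies between 0 and 1 and increases monotonically with the pre-activation value `z`.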
In the embodiment of the present invention, the algorithm formula corresponding to the weight factor prediction module may be:
FFN(x) = softmax(relu(xW1 + b1)W2 + b2)
In the formula, x is related data of hard disks of different types, such as hard disk types, manufacturer identifiers and the like; W1 and b1 are respectively the first weight coefficient and the first deviation coefficient of the first-layer network structure in the module; the relu inside the brackets is the activation function of the first layer; W2 and b2 are respectively the weight coefficient and the deviation coefficient of the second layer; and softmax is a normalization function that converts the corresponding outputs into weight factors between 0 and 1, the sum of all the weight factors being equal to 1.
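The weight factor formula can be sketched in the same style. The sketch assumes that the hard disk type and manufacturer information has already been encoded as a numeric vector `x`; the helper names are hypothetical, and subtracting the maximum logit inside `softmax` is a standard stabilization step that does not change the result.

```python
import math

def affine(x, W, b):
    """Compute xW + b for a length-n vector x and an n x m matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(z):
    """Normalize logits into weight factors that sum to 1."""
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def gate(x, W1, b1, W2, b2):
    """FFN(x) = softmax(relu(xW1 + b1)W2 + b2)."""
    hidden = [max(0.0, h) for h in affine(x, W1, b1)]
    return softmax(affine(hidden, W2, b2))
```

The length of the returned list equals the number of personality feature extraction modules, one weight factor per module.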
Step 103: and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
In the embodiment of the invention, the hard disk health index information is a hard disk health score value obtained by performing abnormality detection processing on the hard disk state attribute data by the hard disk fault prediction model of the multi-tower structure; the hard disk health score value is in direct proportion to the hard disk health degree.
Specifically, the hard disk health score value is compared with the currently selected scoring threshold, and when the hard disk health score value is smaller than the scoring threshold, the heterogeneous hard disk system is judged to be faulty and corresponding fault early warning prompt information is generated. In other words, in the online prediction stage, after the new data is preprocessed, it is input into the hard disk fault prediction model of the multi-tower structure to score the health level of the hard disk at that moment, and whether to send out an early warning is judged according to the detection threshold.
In addition, after comparing the hard disk health score value with the currently selected scoring threshold value, the heterogeneous hard disk system may be determined to be in a health state if the hard disk health score value is greater than or equal to the scoring threshold value. Under the condition that the heterogeneous hard disk system is in a healthy state, the hard disk state attribute data with the hard disk health score value being greater than or equal to the scoring threshold value can be used as a new training sample to carry out self-adaptive gradient training on the hard disk fault prediction model of the multi-tower structure so as to update the parameters of each module in the hard disk fault prediction model of the multi-tower structure in real time, obtain a new hard disk fault prediction model of the multi-tower structure, and carry out abnormality detection processing on the subsequently input hard disk state attribute data by utilizing the new hard disk fault prediction model of the multi-tower structure. That is, in the process of updating the model parameters online, whether the current hard disk score value meets the preset confidence coefficient condition can be judged according to the obtained high confidence coefficient scoring threshold value, if so, the current sample is used for updating the model parameters, otherwise, the model parameters are not updated.
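The online decision and update logic can be illustrated with a deliberately reduced model. Everything model-specific here is an assumption: a single logistic unit stands in for the full multi-tower model, a plain gradient step on the cross-entropy loss with pseudo-label 1.0 (healthy) stands in for the adaptive gradient training, and the names `online_step` and `lr` are hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def online_step(w, b, x, threshold, lr=0.1):
    """Return (score, warn, w, b) after processing one new sample x."""
    score = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    if score < threshold:
        # Below the scoring threshold: warn and leave the model unchanged.
        return score, True, w, b
    # High-confidence healthy sample: one gradient step toward label 1.0.
    grad = score - 1.0
    new_w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    new_b = b - lr * grad
    return score, False, new_w, new_b
```

Only samples at or above the scoring threshold change the parameters, which mirrors the rule that samples failing the confidence condition are not used for updating.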
The embodiment of the invention can provide a solution to problems such as the influence of imbalanced data across hard disk models on model parameter updating and the adaptive updating of model parameters. The specific steps are as follows: 1) According to information such as the hard disk model and the distribution difference of the S.M.A.R.T. data, a clustering algorithm is used to cluster and group the S.M.A.R.T. data of hard disks of different models, so as to ensure consistency of data volume and of data distribution difference among the hard disk clusters, and consistency of data distribution within each hard disk cluster; 2) Based on the S.M.A.R.T. data of the different hard disk clusters, a multi-tower fault early warning model (i.e., the hard disk fault prediction model of the multi-tower structure) that can adapt to different types of hard disks is established. The multi-tower fault early warning model is mainly divided into a hard disk shared parameter module, a hard disk specific parameter module, and a hard disk fault early warning function module. The hard disk shared parameters capture the commonality of hard disks of different models, i.e., the commonality characteristic parameters; the hard disk specific parameters characterize the data distribution attributes of the hard disk cluster, i.e., the personality characteristic parameters. The two are multiplied to give the final target attribute representation of the current hard disk, which is a feature vector, and the target attribute representation is input into the hard disk fault early warning function module to obtain information on whether the hard disk will fail at the current moment; 3) Adaptive gradient updating is performed on the parameters of each module of the multi-tower fault early warning model according to the probability confidence of the predicted hard disk fault, so as to update the parameters of each module in the multi-tower fault early warning model in real time and ensure the prediction precision of the multi-tower fault early warning model.
It should be noted that the influence of imbalanced hard disk model data can be addressed by clustering hard disks of different models. The imbalance problem arises when machine learning is applied to anomaly detection: most parameter updates to the model come from the sample data that makes up the larger share (i.e., the normal hard disk data herein), so the characteristics of the minority classes of samples cannot be well represented; in the prediction results, the overall accuracy is high while the accuracy on the classes with fewer samples is low. In the proposed scheme, this manifests in two aspects. First, the data volumes of hard disks of different models differ greatly, and the scheme uses coarse clustering followed by fine clustering to balance the amounts of data across the different hard disk types. Second, the data of faulty hard disks and of normal hard disks differ greatly, which can be addressed for the faulty hard disk data by means such as up-sampling and adversarial learning, and is not described in detail here. In addition, the multi-tower structure of the hard disk fault prediction model handles the commonality and the differences of hard disk data of different models at the level of model structure; that is, the scheme of the present application is divided into a plurality of towers according to the number of hard disk clusters, which prevents, at the structural level, the model parameters from being biased by the clusters with large amounts of data. Furthermore, the towers are kept independent of one another, so the model remains effective even for hard disk clusters with little data, which further mitigates the sample imbalance problem. The present application can also screen high-confidence samples to update the parameters of each module in the multi-tower fault early warning model.
In hard disk fault early warning, robustness is reflected in the metrics: the early warning rate for faulty hard disks is raised and the false alarm rate for normal hard disks is lowered. The application updates the model parameters in real time in several ways, so that the model can capture the data distribution characteristics of the heterogeneous hard disk system in real time. When the model is updated, only high-confidence samples are selected, which further prevents misjudged samples from affecting the model parameters. In addition, because disk behavior varies over time, when the multi-tower fault early warning model is constructed, sequence data within a time window is fed to the Input module in fig. 3 and passed to the LSTM model to better capture this time variation. Specifically, for each sample in a time series, the input includes the SMART data of the sample itself and may also include the LSTM hidden state carried over from the previous samples.
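As an illustration of the time-window input described above, the following minimal sketch cuts a time-ordered list of SMART records for one disk into fixed-length windows that a sequence model such as an LSTM could consume. The function name `sliding_windows`, the `window` and `stride` parameters, and the sample values are assumptions made for illustration; the application does not specify the window length or how windows are formed.

```python
from typing import List, Sequence


def sliding_windows(
    records: Sequence[Sequence[float]], window: int, stride: int = 1
) -> List[List[List[float]]]:
    """Cut time-ordered SMART records of one disk into fixed-length windows.

    Each window is a list of `window` consecutive records; a sequence model
    would read one window at a time. Window length and stride are
    hypothetical parameters, not values taken from the application.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    # The last valid start index leaves exactly `window` records to read.
    for start in range(0, len(records) - window + 1, stride):
        windows.append([list(r) for r in records[start:start + window]])
    return windows


# Four daily records, each with two (hypothetical) SMART attributes.
daily = [[5.0, 0.0], [5.0, 1.0], [6.0, 1.0], [9.0, 4.0]]
windows = sliding_windows(daily, window=3)
```

With four records and a window of three, two overlapping windows are produced; a series shorter than the window yields no windows at all.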
According to the heterogeneous hard disk system fault early warning method, hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system are obtained, the hard disk state attribute data are clustered and grouped according to the hard disk model and the distribution difference of the hard disk state attribute data, hard disk clusters to which the hard disk state attribute data belong are determined, the hard disk state attribute data corresponding to each hard disk cluster are respectively input into a hard disk fault prediction model of a preset multi-tower structure for carrying out abnormality detection processing, hard disk health index information output by the hard disk fault prediction model of the multi-tower structure is obtained, fault early warning is carried out on the heterogeneous hard disk system based on the hard disk health index information, and therefore accuracy and efficiency of heterogeneous hard disk system fault early warning can be effectively improved, and stability and safety of heterogeneous hard disk system operation are improved.
Corresponding to the heterogeneous hard disk system fault early warning method provided by the application, the application also provides a heterogeneous hard disk system fault early warning device. Since the embodiment of the device is similar to the embodiment of the method, the description is relatively brief; for relevant details, refer to the description of the method embodiment. The embodiment of the heterogeneous hard disk system fault early warning device described below is only illustrative. Fig. 4 is a schematic structural diagram of a fault early warning device for a heterogeneous hard disk system according to an embodiment of the present application. The heterogeneous hard disk system fault early warning device includes the following parts:
The clustering grouping unit 401 is configured to obtain hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, perform clustering grouping on the hard disk state attribute data according to a hard disk model and a distribution difference of the hard disk state attribute data, and determine a hard disk cluster to which the hard disk state attribute data belongs;
the fault analysis unit 402 is configured to input hard disk state attribute data corresponding to each hard disk cluster to a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, so as to obtain hard disk health index information output by the hard disk fault prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data;
and the fault early warning unit 403 is configured to perform fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
Further, the fault analysis unit is specifically configured to:
according to the hard disk class cluster to which the current hard disk state attribute data belongs, selectively inputting the hard disk state attribute data to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure to obtain personality characteristic parameters corresponding to each hard disk class cluster;
Inputting the hard disk state attribute data to a hard disk common characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain common characteristic parameters of different hard disk clusters;
determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameter and the commonality characteristic parameter;
and inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module.
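The four steps above can be sketched as a single forward pass. This is a minimal sketch under stated assumptions, not the model of the application: plain linear layers stand in for the personality and commonality extraction modules, a sigmoid over a weighted sum stands in for the fault early-warning functional module, and the class name `MultiTowerSketch` and all weights are hypothetical. The element-wise multiplication of the two feature vectors follows the fusion by multiplication described elsewhere in this specification.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; a stand-in for a feature extraction module."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MultiTowerSketch:
    """One personality tower per hard disk cluster plus one shared module."""

    def __init__(self, towers: Dict[int, Matrix], shared: Matrix, head: Vector):
        self.towers = towers  # cluster id -> personality weights (assumed)
        self.shared = shared  # commonality weights used by every cluster
        self.head = head      # weights of the early-warning head (assumed)

    def health_score(self, cluster_id: int, x: Vector) -> float:
        # Step 1: route the record to the tower of its cluster.
        personality = linear(self.towers[cluster_id], x)
        # Step 2: every record also goes through the shared module.
        commonality = linear(self.shared, x)
        # Step 3: fuse by element-wise multiplication.
        fused = [p * c for p, c in zip(personality, commonality)]
        # Step 4: map the fused representation to a score in (0, 1).
        return sigmoid(sum(h * f for h, f in zip(self.head, fused)))


model = MultiTowerSketch(
    towers={0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.5, 0.0], [0.0, 0.5]]},
    shared=[[1.0, 1.0], [1.0, -1.0]],
    head=[1.0, 1.0],
)
score_cluster_0 = model.health_score(0, [1.0, 2.0])
score_cluster_1 = model.health_score(1, [1.0, 2.0])
```

The same record receives different scores depending on the tower it is routed to, while the shared module contributes identically in both cases.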
Further, the clustering grouping is performed on the hard disk state attribute data according to the distribution difference of the hard disk model and the hard disk state attribute data, and the determining of the hard disk class cluster to which the hard disk state attribute data belongs specifically includes:
according to the hard disk model in the heterogeneous hard disk system, distributing hard disk state attribute data belonging to the same hard disk model into the same hard disk class cluster; if the data volume of the hard disk state attribute data of one or more hard disk models is greater than or equal to a preset data volume threshold, splitting the corresponding one or more first hard disk class clusters into a plurality of second hard disk class clusters respectively; the data volume of the first hard disk cluster is larger than the data volume of the second hard disk cluster; or if the data volume of the hard disk state attribute data of one or more hard disk models is smaller than the data volume threshold, merging the corresponding one or more third hard disk class clusters into a fourth hard disk class cluster; the data volume of the fourth hard disk cluster is larger than the data volume of the third hard disk cluster;
Clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs.
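The coarse split-or-merge step can be illustrated as follows. This is a sketch under assumptions: the function name `plan_coarse_clusters`, the cluster labels, and the rule for how many sub-clusters a large model is split into (ceiling of volume over threshold, at least two) are hypothetical, since the application only states that large clusters are split into a plurality of clusters and small ones are merged.

```python
import math
from collections import Counter
from typing import Dict, List


def plan_coarse_clusters(
    model_per_record: List[str], volume_threshold: int
) -> Dict[str, List[str]]:
    """Map each hard disk model to the cluster labels its records may join.

    Models at or above the threshold are split into several sub-clusters;
    models below it share a single merged cluster. The fine clustering step
    would then pick one of the listed labels for each record.
    """
    if volume_threshold <= 0:
        raise ValueError("volume_threshold must be positive")
    counts = Counter(model_per_record)
    plan: Dict[str, List[str]] = {}
    for model, volume in sorted(counts.items()):
        if volume >= volume_threshold:
            # Assumed rule: one sub-cluster per threshold-sized chunk.
            parts = max(2, math.ceil(volume / volume_threshold))
            plan[model] = [f"{model}#{i}" for i in range(parts)]
        else:
            plan[model] = ["merged"]
    return plan


records = ["A"] * 10 + ["B"] * 2 + ["C"] * 1
plan = plan_coarse_clusters(records, volume_threshold=4)
```

Here model A (10 records) is split into three sub-clusters, while models B and C fall below the threshold and share the merged cluster.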
Further, the hard disk health index information is a hard disk health score value obtained by performing abnormality detection processing on the hard disk state attribute data by the hard disk fault prediction model of the multi-tower structure; the hard disk health score value is in direct proportion to the hard disk health degree;
the fault early warning unit is specifically configured to: and comparing and analyzing the hard disk health score value with a scoring threshold value selected currently, judging that the heterogeneous hard disk system fails under the condition that the hard disk health score value is smaller than the scoring threshold value, and generating corresponding failure early warning prompt information.
Further, after comparing the hard disk health score value with the scoring threshold value selected currently, the fault early warning unit is further configured to: and under the condition that the hard disk health score value is larger than or equal to the scoring threshold value, judging that the heterogeneous hard disk system is in a health state.
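The comparison against the scoring threshold can be sketched as below. The `Verdict` type, the `evaluate` function, and the message wording are hypothetical; only the rule itself (a score below the threshold triggers a warning, a score greater than or equal to it means healthy) comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    healthy: bool
    message: Optional[str]  # warning text, or None when healthy


def evaluate(score: float, threshold: float, disk_id: str) -> Verdict:
    """Compare a hard disk health score with the current scoring threshold."""
    if score < threshold:
        text = (
            f"fault warning: disk {disk_id} health score "
            f"{score:.2f} is below threshold {threshold:.2f}"
        )
        return Verdict(healthy=False, message=text)
    # A score equal to the threshold counts as healthy.
    return Verdict(healthy=True, message=None)


warning = evaluate(0.30, 0.50, "sda")
ok = evaluate(0.50, 0.50, "sdb")
```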
Further, the heterogeneous hard disk system fault early warning device further comprises: and the model parameter updating unit is used for carrying out self-adaptive gradient training on the hard disk fault prediction model of the multi-tower structure by taking the hard disk state attribute data with the hard disk health score value larger than or equal to the scoring threshold value as a new training sample under the condition that the heterogeneous hard disk system is in a health state so as to update the parameters of each module in the hard disk fault prediction model of the multi-tower structure in real time, so as to obtain a new hard disk fault prediction model of the multi-tower structure, and carrying out abnormality detection processing on the subsequently input hard disk state attribute data by utilizing the new hard disk fault prediction model of the multi-tower structure.
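The update loop described above has two parts: selecting the samples whose score meets the threshold, and applying a gradient step to the model parameters. The application names "self-adaptive gradient training" without giving the update rule. The sketch below reads it as an AdaGrad-style update, which is an assumption; the function names, learning rate, and example gradients are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def select_new_samples(
    samples: Sequence[str], scores: Sequence[float], threshold: float
) -> List[str]:
    """Keep only samples whose health score is at or above the threshold."""
    return [s for s, score in zip(samples, scores) if score >= threshold]


def adagrad_step(
    weights: Sequence[float],
    grads: Sequence[float],
    accum: Sequence[float],
    lr: float = 0.1,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """One AdaGrad update: each weight gets its own effective step size.

    `accum` holds the running sum of squared gradients per weight; larger
    accumulated gradients shrink the step for that weight.
    """
    new_weights, new_accum = [], []
    for w, g, a in zip(weights, grads, accum):
        a = a + g * g
        new_weights.append(w - lr * g / (math.sqrt(a) + eps))
        new_accum.append(a)
    return new_weights, new_accum


kept = select_new_samples(["d1", "d2", "d3"], [0.9, 0.4, 0.5], threshold=0.5)
weights, accum = adagrad_step([1.0], [2.0], [0.0])
```

In a real deployment the gradients would come from the loss of the multi-tower model on the kept samples; here they are supplied directly to keep the sketch self-contained.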
Further, the heterogeneous hard disk system fault early warning device further comprises a model offline training unit, configured to perform model training in an offline state before the hard disk state attribute data of the hard disks of different types in the heterogeneous hard disk system is obtained, so as to obtain the hard disk fault prediction model of the multi-tower structure;
the model offline training unit is specifically configured to:
acquiring sample hard disk state attribute data of a sample hard disk; the sample hard disk state attribute data comprise state attribute data corresponding to a healthy hard disk and state attribute data corresponding to an abnormal hard disk;
Clustering and grouping the sample hard disk state attribute data according to the hard disk model of the sample hard disk and the distribution difference of the sample hard disk state attribute data, and determining a sample hard disk cluster to which the sample hard disk state attribute data belongs;
training an initial hard disk fault prediction model of the multi-tower structure based on the sample hard disk state attribute data corresponding to the sample hard disk class cluster, screening a plurality of hard disk abnormality detection thresholds by comparing the influence condition of the plurality of hard disk abnormality detection thresholds on a final abnormality detection result, determining a hard disk abnormality detection threshold meeting a preset probability confidence condition, and updating a scoring threshold of model parameters based on the hard disk abnormality detection threshold meeting the preset probability confidence condition to obtain the hard disk fault prediction model of the multi-tower structure after final training; the scoring threshold is a hard disk abnormality detection threshold with probability confidence degree larger than or equal to preset probability confidence degree.
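The threshold screening step can be illustrated with a small sketch. The application does not define how the probability confidence of a candidate threshold is computed; the sketch assumes it is the fraction of labeled validation samples that the threshold classifies correctly. The function name `screen_thresholds` and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def screen_thresholds(
    scores: Sequence[float],
    is_healthy: Sequence[bool],
    candidates: Sequence[float],
    min_confidence: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, confidence) for the best qualifying candidate.

    A sample is predicted healthy when its score is >= the candidate.
    Confidence is assumed to be the share of correct predictions.
    Returns None when no candidate reaches `min_confidence`.
    """
    if not scores or len(scores) != len(is_healthy):
        raise ValueError("scores and labels must be non-empty and aligned")
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        correct = sum((s >= t) == h for s, h in zip(scores, is_healthy))
        confidence = correct / len(scores)
        if confidence >= min_confidence and (best is None or confidence > best[1]):
            best = (t, confidence)
    return best


val_scores = [0.9, 0.8, 0.3, 0.2]
val_labels = [True, True, False, False]
chosen = screen_thresholds(val_scores, val_labels, [0.1, 0.5, 0.95], 0.9)
```

Of the three candidates, only 0.5 separates the healthy and abnormal validation samples, so it is the one retained as the scoring threshold.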
Further, the clustering grouping unit is specifically configured to: acquiring original hard disk state attribute data of different types of hard disks in a heterogeneous hard disk system, and filling missing values of the original hard disk state attribute data to acquire first hard disk state attribute data; feature screening is carried out on the first hard disk state attribute data to obtain second hard disk state attribute data; and carrying out normalization processing on the second hard disk state attribute data to obtain the hard disk state attribute data.
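The three preprocessing stages can be sketched as follows. The concrete choices are assumptions, since the application names the stages but not the techniques: missing values are filled with the column mean, feature screening drops columns that never vary, and normalization is min-max scaling to [0, 1]. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def fill_missing(rows: List[Row]) -> List[List[float]]:
    """Replace None with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def screen_features(rows: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Drop constant columns; return the reduced rows and kept column indices."""
    keep = [j for j, col in enumerate(zip(*rows)) if max(col) > min(col)]
    return [[r[j] for j in keep] for r in rows], keep


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(r)]
        for r in rows
    ]


raw = [[1.0, 7.0, None], [3.0, 7.0, 10.0], [None, 7.0, 30.0]]
first = fill_missing(raw)              # first hard disk state attribute data
second, kept = screen_features(first)  # second hard disk state attribute data
final = min_max_normalize(second)      # data passed on to clustering
```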
Further, the determining, based on the personality characteristic parameter and the commonality characteristic parameter, the target attribute representation corresponding to different types of hard disks in the heterogeneous hard disk system specifically includes:
multiplying the personality characteristic parameter and the commonality characteristic parameter to obtain target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system.
Further, the clustering of the hard disk state attribute data based on the distribution difference of the hard disk state attribute data, so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs specifically includes:
determining a metric for the clustering;
and clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data and the measurement criterion to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster so as to obtain the hard disk cluster to which the hard disk state attribute data belongs.
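A minimal sketch of assigning a record to a cluster under a chosen metric is given below. It assumes that each candidate cluster is summarized by a centroid and that the metric is a distance function, with Euclidean distance as the default; the application does not name the metric or the clustering algorithm, and the cluster labels and centroid values are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_cluster(
    record: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    metric: Metric = euclidean,
) -> str:
    """Return the label of the centroid closest to the record."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    return min(centroids, key=lambda label: metric(record, centroids[label]))


centroids = {"A#0": [0.0, 0.0], "A#1": [1.0, 1.0], "merged": [5.0, 5.0]}
label = assign_cluster([0.9, 0.8], centroids)
label_l1 = assign_cluster([4.0, 6.0], centroids, metric=manhattan)
```

Passing the metric as a parameter keeps the assignment rule unchanged when a different criterion is selected.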
Further, according to the hard disk cluster to which the current hard disk state attribute data belongs, the hard disk state attribute data is selectively input to a corresponding hard disk personality characteristic extraction module in a hard disk fault prediction model of the multi-tower structure, so as to obtain personality characteristic parameters corresponding to each hard disk cluster, which specifically includes:
Determining identification information of a corresponding hard disk personality characteristic extraction module according to a hard disk class cluster to which the current hard disk state attribute data belongs;
based on the identification information, the hard disk state attribute data are selected and input to a corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure, and personality characteristic parameters corresponding to each hard disk cluster are obtained.
Further, the hard disk fault prediction model of the multi-tower structure comprises a plurality of hard disk personality characteristic extraction modules; each hard disk personality characteristic extraction module processes the hard disk state attribute data of one hard disk class cluster and corresponds to one target weight parameter.
According to the heterogeneous hard disk system fault early warning device, the hard disk state attribute data of different types of hard disks in the heterogeneous hard disk system are obtained, the hard disk state attribute data are clustered and grouped according to the distribution difference of the hard disk model and the hard disk state attribute data, the hard disk type clusters to which the hard disk state attribute data belong are determined, the hard disk state attribute data corresponding to each hard disk type cluster are respectively input into a hard disk fault prediction model of a preset multi-tower structure for carrying out abnormality detection processing, the hard disk health index information output by the hard disk fault prediction model of the multi-tower structure is obtained, and fault early warning is carried out on the heterogeneous hard disk system based on the hard disk health index information, so that the accuracy and the efficiency of fault early warning of the heterogeneous hard disk system can be effectively improved, and the stability and the safety of the heterogeneous hard disk system are improved.
The method embodiments provided in the embodiments of the present application may be performed in a computer terminal, a device terminal, or a similar computing apparatus. Taking a computer terminal as an example, fig. 5 is a schematic diagram of a hardware environment of a heterogeneous hard disk system fault early warning method according to an embodiment of the present application. As shown in fig. 5, the computer terminal may include one or more (only one is shown in fig. 5) first processors 502 (the first processors 502 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a first memory 504 for storing data. In one exemplary embodiment, the computer terminal may further include a transmission device 506 for communication functions and an input-output device 508. It will be appreciated by those skilled in the art that the configuration shown in fig. 5 is merely illustrative and is not intended to limit the configuration of the computer terminal described above. For example, the computer terminal may include more or fewer components than shown in fig. 5, or have a different configuration that provides functions equivalent to or more than those shown in fig. 5. The first memory 504 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the heterogeneous hard disk system fault early warning method in the embodiment of the present application. The first processor 502 executes the computer program stored in the first memory 504, thereby performing various functional applications and data processing, that is, implementing the method described above. The first memory 504 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
In some examples, the first memory 504 may further include memory located remotely from the first processor 502, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The transmission device 506 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 506 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 506 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly. In this embodiment, a heterogeneous hard disk system fault early warning method is provided, which is applied to the computer terminal.
Alternatively, corresponding to the heterogeneous hard disk system fault early warning method, the application also provides an electronic device. Since the embodiments of the electronic device are similar to the method embodiments described above, the description is relatively brief; for relevant details, refer to the description of the method embodiments above. The electronic device described below is merely illustrative. Fig. 6 is a schematic physical structure of an electronic device according to an embodiment of the present disclosure. The electronic device may include: a processor 601, a memory 602, a communication bus 603, and a communication interface 604, wherein the processor 601 and the memory 602 communicate with each other through the communication bus 603 and with the outside through the communication interface 604. The processor 601 may call logic instructions in the memory 602 to perform a heterogeneous hard disk system fault early warning method, the method comprising: obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining a hard disk cluster to which the hard disk state attribute data belongs; respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk fault prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data; and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
Further, the logic instructions in the memory 602 described above may be implemented in the form of software functional modules and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a memory chip, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, embodiments of the present application further provide a computer program product. The computer program product includes a computer program stored on a processor-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the heterogeneous hard disk system fault early warning method provided in the foregoing method embodiments. The method comprises the following steps: obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining a hard disk cluster to which the hard disk state attribute data belongs; respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk fault prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data; and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
In yet another aspect, embodiments of the present application further provide a processor-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the heterogeneous hard disk system fault early warning method provided in the foregoing embodiments. The method comprises the following steps: obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining a hard disk cluster to which the hard disk state attribute data belongs; respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk fault prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data; and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid-state disk (SSD)), and the like.
The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (15)

1. The heterogeneous hard disk system fault early warning method is characterized by comprising the following steps of:
obtaining hard disk state attribute data of hard disks of different types in a heterogeneous hard disk system, clustering and grouping the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determining a hard disk cluster to which the hard disk state attribute data belongs;
respectively inputting hard disk state attribute data corresponding to each hard disk cluster into a hard disk failure prediction model of a preset multi-tower structure to perform abnormality detection processing, and obtaining hard disk health index information output by the hard disk failure prediction model of the multi-tower structure; the hard disk fault prediction model is obtained by training based on sample hard disk state attribute data and hard disk health tag information corresponding to the sample hard disk state attribute data;
and carrying out fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
2. The heterogeneous hard disk system fault early warning method according to claim 1, wherein the method is characterized in that the hard disk state attribute data corresponding to each hard disk cluster is respectively input into a hard disk fault prediction model of a preset multi-tower structure to perform abnormality detection processing, and hard disk health index information output by the hard disk fault prediction model of the multi-tower structure is obtained, and specifically comprises the following steps:
according to the hard disk class cluster to which the current hard disk state attribute data belongs, selectively inputting the hard disk state attribute data to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure to obtain personality characteristic parameters corresponding to each hard disk class cluster;
inputting the hard disk state attribute data to a hard disk common characteristic extraction module in a hard disk fault prediction model of the multi-tower structure to obtain common characteristic parameters of different hard disk clusters;
determining target attribute characterization corresponding to different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameter and the commonality characteristic parameter;
and inputting the target attribute representation to a hard disk fault early-warning functional module in a hard disk fault prediction model of the multi-tower structure to perform fault reasoning analysis, so as to obtain hard disk health index information output by the hard disk fault early-warning functional module.
3. The heterogeneous hard disk system fault early warning method according to claim 1, wherein the clustering grouping is performed on the hard disk state attribute data according to the distribution difference of the hard disk model and the hard disk state attribute data, and the determining of the hard disk class cluster to which the hard disk state attribute data belongs specifically comprises:
according to the hard disk model in the heterogeneous hard disk system, distributing hard disk state attribute data belonging to the same hard disk model into the same hard disk class cluster; if the data volume of the hard disk state attribute data of one or more hard disk models is greater than or equal to a preset data volume threshold, splitting the corresponding one or more first hard disk class clusters into a plurality of second hard disk class clusters respectively; the data volume of the first hard disk cluster is larger than the data volume of the second hard disk cluster; or if the data volume of the hard disk state attribute data of one or more hard disk models is smaller than the data volume threshold, merging the corresponding one or more third hard disk class clusters into a fourth hard disk class cluster; the data volume of the fourth hard disk cluster is larger than the data volume of the third hard disk cluster;
clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data so as to distribute the hard disk state attribute data to the second hard disk cluster or the fourth hard disk cluster, and determining the hard disk cluster to which the hard disk state attribute data belongs.
4. The heterogeneous hard disk system fault early warning method according to claim 1, wherein the hard disk health index information is a hard disk health score value obtained by performing anomaly detection processing on the hard disk state attribute data by a hard disk fault prediction model of the multi-tower structure; the hard disk health score value is in direct proportion to the hard disk health degree;
the fault early warning is carried out on the heterogeneous hard disk system based on the hard disk health index information, and the fault early warning specifically comprises the following steps: and comparing and analyzing the hard disk health score value with a scoring threshold value selected currently, judging that the heterogeneous hard disk system fails under the condition that the hard disk health score value is smaller than the scoring threshold value, and generating corresponding failure early warning prompt information.
5. The heterogeneous hard disk system fault early warning method according to claim 4, further comprising, after comparing the hard disk health score value with the currently selected scoring threshold: when the hard disk health score value is greater than or equal to the scoring threshold, determining that the heterogeneous hard disk system is in a healthy state.
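The threshold comparison of claims 4 and 5 reduces to a single branch. The sketch below is illustrative only: the function name `assess_health`, the returned dictionary layout, and the wording of the alert message are assumptions, not part of the claims.

```python
def assess_health(score, scoring_threshold):
    """Compare a health score against the currently selected threshold.

    Higher scores mean healthier disks, so a score below the threshold
    is treated as a fault and produces an early-warning message.
    """
    if score < scoring_threshold:
        return {
            "state": "fault",
            "alert": f"health score {score:.2f} is below threshold {scoring_threshold:.2f}",
        }
    # A score equal to the threshold counts as healthy (claim 5).
    return {"state": "healthy", "alert": None}
```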
6. The heterogeneous hard disk system fault early warning method according to claim 5, further comprising: when the heterogeneous hard disk system is in a healthy state, using the hard disk state attribute data whose hard disk health score value is greater than or equal to the scoring threshold as new training samples to perform adaptive gradient training on the hard disk fault prediction model of the multi-tower structure, so as to update the parameters of each module in the model in real time and obtain a new hard disk fault prediction model of the multi-tower structure; and performing anomaly detection processing on subsequently input hard disk state attribute data using the new hard disk fault prediction model of the multi-tower structure.
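One way to picture the online update of claim 6 is a toy scorer that is retrained only on samples judged healthy. The sketch reads the adaptive gradient training of the claim as an AdaGrad-style update, which is an assumption; the linear scorer, the squared-error loss with target score 1.0, and the names `AdaGradLinearScorer` and `online_step` are likewise hypothetical stand-ins for the multi-tower model.

```python
import math


class AdaGradLinearScorer:
    """Toy linear health scorer updated online with AdaGrad (assumed optimizer)."""

    def __init__(self, n_features, lr=0.1, eps=1e-8):
        self.w = [0.0] * n_features
        self.g2 = [0.0] * n_features  # running sum of squared gradients
        self.lr = lr
        self.eps = eps

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, target=1.0):
        # Gradient of the squared error for a sample treated as healthy.
        err = self.score(x) - target
        for i, xi in enumerate(x):
            g = 2.0 * err * xi
            self.g2[i] += g * g
            # Per-parameter step size shrinks as squared gradients accumulate.
            self.w[i] -= self.lr * g / (math.sqrt(self.g2[i]) + self.eps)


def online_step(model, x, scoring_threshold):
    """Score a sample; if it is judged healthy, reuse it as a training sample."""
    s = model.score(x)
    if s >= scoring_threshold:
        model.update(x)
    return s
```

Samples scoring below the threshold leave the parameters untouched, so only data from disks judged healthy influences the updated model.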
7. The heterogeneous hard disk system fault early warning method according to claim 1, further comprising, before obtaining the hard disk state attribute data of the different types of hard disks in the heterogeneous hard disk system: performing model training in an offline state to obtain the hard disk fault prediction model of the multi-tower structure;
wherein performing model training in an offline state to obtain the hard disk fault prediction model of the multi-tower structure comprises:
acquiring sample hard disk state attribute data of a sample hard disk; the sample hard disk state attribute data comprise state attribute data corresponding to a healthy hard disk and state attribute data corresponding to an abnormal hard disk;
clustering and grouping the sample hard disk state attribute data according to the hard disk model of the sample hard disk and the distribution difference of the sample hard disk state attribute data, and determining a sample hard disk cluster to which the sample hard disk state attribute data belongs;
training an initial hard disk fault prediction model of the multi-tower structure based on the sample hard disk state attribute data corresponding to the sample hard disk class cluster; screening a plurality of hard disk anomaly detection thresholds by comparing their influence on the final anomaly detection result, so as to determine a hard disk anomaly detection threshold that meets a preset probability confidence condition; and updating the scoring threshold in the model parameters based on the hard disk anomaly detection threshold that meets the preset probability confidence condition, to obtain the finally trained hard disk fault prediction model of the multi-tower structure; wherein the scoring threshold is a hard disk anomaly detection threshold whose probability confidence is greater than or equal to a preset probability confidence.
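The threshold screening step of claim 7 can be sketched as a search over candidate thresholds on labeled validation scores. The claim does not define how probability confidence is computed; the sketch below assumes it is the fraction of samples classified correctly at a given threshold, and the function name `select_scoring_threshold` and its parameters are hypothetical.

```python
def select_scoring_threshold(scores, is_healthy, candidates, min_confidence):
    """Return (threshold, confidence) for the best qualifying candidate, else None.

    `scores` are model health scores for labeled sample disks and
    `is_healthy` holds the matching boolean labels. Confidence is
    taken to be classification accuracy (an illustrative assumption).
    """
    best = None
    for t in candidates:
        # A disk is predicted healthy when its score reaches the threshold.
        correct = sum((s >= t) == h for s, h in zip(scores, is_healthy))
        confidence = correct / len(scores)
        if confidence >= min_confidence and (best is None or confidence > best[1]):
            best = (t, confidence)
    return best
```

Returning `None` when no candidate qualifies leaves the caller free to keep the previous scoring threshold or to widen the candidate set.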
8. The heterogeneous hard disk system fault early warning method according to claim 1, wherein obtaining the hard disk state attribute data of the different types of hard disks in the heterogeneous hard disk system specifically comprises: acquiring original hard disk state attribute data of the different types of hard disks in the heterogeneous hard disk system, and filling missing values in the original hard disk state attribute data to obtain first hard disk state attribute data; performing feature screening on the first hard disk state attribute data to obtain second hard disk state attribute data; and normalizing the second hard disk state attribute data to obtain the hard disk state attribute data.
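The three preprocessing stages of claim 8 can be sketched in plain Python. The claim names the stages but not the techniques, so the specific choices here are assumptions: missing values are filled with the column mean, feature screening drops near-constant columns, and normalization is min-max scaling to [0, 1]. The function name `preprocess` and the `min_variance` parameter are hypothetical.

```python
def preprocess(rows, min_variance=1e-12):
    """Fill missing values, screen features, and normalize.

    `rows` is a list of equal-length lists in which None marks a
    missing value. Returns (normalized_rows, kept_column_indices).
    """
    n_cols = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_cols)]

    # 1. Missing-value filling: use the column mean (assumed strategy).
    filled = []
    for col in cols:
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present) if present else 0.0
        filled.append([mean if v is None else v for v in col])

    # 2. Feature screening: drop near-constant columns (assumed criterion).
    kept = []
    for j, col in enumerate(filled):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > min_variance:
            kept.append(j)

    # 3. Normalization: min-max scaling; kept columns always have hi > lo.
    normalized_cols = []
    for j in kept:
        col = filled[j]
        lo, hi = min(col), max(col)
        normalized_cols.append([(v - lo) / (hi - lo) for v in col])

    return [list(vals) for vals in zip(*normalized_cols)], kept
```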
9. The heterogeneous hard disk system fault early warning method according to claim 2, wherein determining the target attribute representations corresponding to the different types of hard disks in the heterogeneous hard disk system based on the personality characteristic parameters and the commonality characteristic parameters specifically comprises:
multiplying the personality characteristic parameters by the commonality characteristic parameters to obtain the target attribute representations corresponding to the different types of hard disks in the heterogeneous hard disk system.
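The multiplication in claim 9 can be read as an element-wise product of two feature vectors of equal length. That reading is an assumption, since the claim does not state the shapes involved; the function name `target_representation` is hypothetical.

```python
def target_representation(personality, commonality):
    """Element-wise product of per-cluster and shared feature vectors."""
    if len(personality) != len(commonality):
        raise ValueError("feature vectors must have the same length")
    return [p * c for p, c in zip(personality, commonality)]
```

Under this reading, the per-cluster features act as a gate that rescales the shared features for each hard disk class cluster.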
10. The heterogeneous hard disk system fault early warning method according to claim 3, wherein clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data, so as to assign the hard disk state attribute data to the second hard disk class cluster or the fourth hard disk class cluster and determine the hard disk class cluster to which the hard disk state attribute data belongs, specifically comprises:
determining a metric criterion for the clustering;
and clustering the hard disk state attribute data based on the distribution difference of the hard disk state attribute data and the metric criterion, so as to assign the hard disk state attribute data to the second hard disk class cluster or the fourth hard disk class cluster and obtain the hard disk class cluster to which the hard disk state attribute data belongs.
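Claim 10 separates the choice of metric from the clustering itself. A minimal sketch of that separation is a nearest-centroid assignment with a pluggable distance function. Both the nearest-centroid rule and the Euclidean default are assumptions, as the claim names neither; `euclidean` and `assign_cluster` are hypothetical helpers.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_cluster(attrs, centroids, metric=euclidean):
    """Return the name of the cluster whose centroid is nearest under `metric`.

    `centroids` maps cluster names to representative attribute vectors.
    """
    return min(centroids, key=lambda name: metric(attrs, centroids[name]))
```

Passing a different callable as `metric` changes the clustering criterion without touching the assignment logic.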
11. The heterogeneous hard disk system fault early warning method according to claim 2, wherein selecting, according to the hard disk class cluster to which the current hard disk state attribute data belongs, the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure and inputting the hard disk state attribute data to it, so as to obtain the personality characteristic parameters corresponding to each hard disk class cluster, specifically comprises:
determining identification information of the corresponding hard disk personality characteristic extraction module according to the hard disk class cluster to which the current hard disk state attribute data belongs;
and, based on the identification information, inputting the hard disk state attribute data to the corresponding hard disk personality characteristic extraction module in the hard disk fault prediction model of the multi-tower structure, to obtain the personality characteristic parameters corresponding to each hard disk class cluster.
12. The heterogeneous hard disk system fault early warning method according to claim 2, wherein the hard disk fault prediction model of the multi-tower structure comprises a plurality of hard disk personality characteristic extraction modules, and each hard disk personality characteristic extraction module processes the hard disk state attribute data of one hard disk class cluster and corresponds to one target weight parameter.
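The dispatch described in claims 11 and 12, where each class cluster has its own extraction module selected by identification information, can be sketched as a lookup table of per-cluster modules. Everything named here is hypothetical: the `PersonalityTower` class, its per-attribute weight vector standing in for the target weight parameter, and the `tower:<cluster>` identifier convention.

```python
class PersonalityTower:
    """Hypothetical per-cluster module: scales attributes by its own weights."""

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, attrs):
        return [w * a for w, a in zip(self.weights, attrs)]


class MultiTowerRouter:
    """Routes attribute data to the tower registered for its class cluster."""

    def __init__(self, towers):
        self.towers = towers  # maps module identifier -> PersonalityTower

    @staticmethod
    def module_id_for(cluster):
        # Assumed convention: the identifier is derived from the cluster name.
        return f"tower:{cluster}"

    def extract(self, cluster, attrs):
        module_id = self.module_id_for(cluster)
        if module_id not in self.towers:
            raise KeyError(f"no extraction module registered for {module_id}")
        return self.towers[module_id](attrs)
```

Keeping one module per cluster means adding a new hard disk class cluster only requires registering a new tower, leaving the others unchanged.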
13. A heterogeneous hard disk system fault early warning device, comprising:
a cluster grouping unit, configured to obtain hard disk state attribute data of hard disks of different types in the heterogeneous hard disk system, cluster and group the hard disk state attribute data according to the hard disk model and the distribution difference of the hard disk state attribute data, and determine the hard disk class cluster to which the hard disk state attribute data belongs;
a fault analysis unit, configured to input the hard disk state attribute data corresponding to each hard disk class cluster into a preset hard disk fault prediction model of a multi-tower structure for anomaly detection processing, and obtain hard disk health index information output by the hard disk fault prediction model of the multi-tower structure, wherein the hard disk fault prediction model is trained on sample hard disk state attribute data and the hard disk health tag information corresponding to the sample hard disk state attribute data;
and a fault early warning unit, configured to perform fault early warning on the heterogeneous hard disk system based on the hard disk health index information.
14. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the heterogeneous hard disk system fault early warning method according to any one of claims 1 to 12.
15. A processor-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the heterogeneous hard disk system fault early warning method according to any one of claims 1 to 12.
CN202311736825.5A 2023-12-18 2023-12-18 Heterogeneous hard disk system fault early warning method and device Active CN117421145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311736825.5A CN117421145B (en) 2023-12-18 2023-12-18 Heterogeneous hard disk system fault early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311736825.5A CN117421145B (en) 2023-12-18 2023-12-18 Heterogeneous hard disk system fault early warning method and device

Publications (2)

Publication Number Publication Date
CN117421145A (en) 2024-01-19
CN117421145B (en) 2024-03-01

Family

ID=89525098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311736825.5A Active CN117421145B (en) 2023-12-18 2023-12-18 Heterogeneous hard disk system fault early warning method and device

Country Status (1)

Country Link
CN (1) CN117421145B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9612896B1 (en) * 2015-08-24 2017-04-04 EMC IP Holding Company LLC Prediction of disk failure
CN113778766A (en) * 2021-08-17 2021-12-10 华中科技大学 Hard disk failure prediction model establishing method based on multi-dimensional characteristics and application thereof
CN114116292A (en) * 2022-01-27 2022-03-01 华南理工大学 Hard disk fault prediction method fusing AP clustering and width learning system
CN115480948A (en) * 2022-10-21 2022-12-16 济南浪潮数据技术有限公司 Hard disk failure prediction method and related equipment
CN115729761A (en) * 2022-11-23 2023-03-03 中国人民解放军陆军装甲兵学院 Hard disk fault prediction method, system, device and medium

Also Published As

Publication number Publication date
CN117421145B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
Xu et al. Improving service availability of cloud systems by predicting disk error
WO2017129032A1 (en) Disk failure prediction method and apparatus
CN110164501B (en) Hard disk detection method, device, storage medium and equipment
CN108052528A (en) A kind of storage device sequential classification method for early warning
US10878335B1 (en) Scalable text analysis using probabilistic data structures
CN112214369A (en) Hard disk fault prediction model establishing method based on model fusion and application thereof
WO2013055311A1 (en) Methods and systems for identifying action for responding to anomaly in cloud computing system
CN107168995B (en) Data processing method and server
WO2022166481A1 (en) Fault prediction method for storage drive, apparatus, and device
CN112771504A (en) Multi-factor cloud service storage device error prediction
US11416321B2 (en) Component failure prediction
US10891181B2 (en) Smart system dump
CN109918313B (en) GBDT decision tree-based SaaS software performance fault diagnosis method
CN108536548A (en) A kind of processing method of Bad Track, device and computer storage media
CN112532455B (en) Abnormal root cause positioning method and device
CN114036826A (en) Model training method, root cause determination method, device, equipment and storage medium
CN112951311A (en) Hard disk fault prediction method and system based on variable weight random forest
CN111367782B (en) Regression testing data automatic generation method and device
CN117421145B (en) Heterogeneous hard disk system fault early warning method and device
CN116302795A (en) Terminal operation and maintenance system and method based on artificial intelligence
Lin et al. Edits: An easy-to-difficult training strategy for cloud failure prediction
CN114116122A (en) High-availability load platform for application container
CN115981911A (en) Memory failure prediction method, electronic device and computer-readable storage medium
CN117093433B (en) Fault detection method and device, electronic equipment and storage medium
CN116781495A (en) Pulsar Proxy node selection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant