CN117520104B - System for predicting abnormal state of hard disk - Google Patents


Info

Publication number
CN117520104B
Authority
CN
China
Prior art keywords
hard disk
data
list
target
data list
Legal status
Active
Application number
CN202410024906.0A
Other languages
Chinese (zh)
Other versions
CN117520104A (en)
Inventor
李国�
侯雪雪
李静
Current Assignee
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Application filed by Civil Aviation University of China
Priority to CN202410024906.0A
Publication of CN117520104A
Application granted
Publication of CN117520104B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a system for predicting abnormal states of a hard disk. The system comprises a processor and a memory storing a computer program which, when executed by the processor, performs the following steps: acquiring a target hard disk data list set; performing feature screening on the target hard disk data list set to obtain a first hard disk data list and a second hard disk data list; inputting the first hard disk data list into a first module to obtain a third hard disk data list; and inputting the second hard disk data list and the third hard disk data list into a second module to obtain tag data corresponding to the target hard disk. Because feature screening is performed on the SMART data acquired from the hard disk, the retained hard disk data are more informative, which improves the accuracy of the model that predicts hard disk abnormal states.

Description

System for predicting abnormal state of hard disk
Technical Field
The invention relates to the technical field of data processing, in particular to a system for predicting abnormal states of a hard disk.
Background
As storage infrastructure supporting the digital transformation of the economy and society, cloud storage systems provide data storage services through hard disks, and, considering factors such as service life and cost, mechanical hard disks are currently the main storage medium. During operation, a hard disk inevitably enters abnormal states, i.e. faults. However, a hard disk is in a normal state for most of its operating life and abnormal states occur with low probability, which leads to an imbalance between positive and negative samples, i.e. abnormal-state data and non-abnormal-state data, that any method for predicting hard disk abnormal states has to deal with.
In the prior art, hard disk abnormal states are predicted as follows: first, a first classifier is trained to identify the degradation window, i.e. the time-series window in which fault signs begin to appear; a GAN is then used to generate faulty-disk data; finally, a second classifier is constructed to predict hard disk faults.
In summary, existing methods for predicting hard disk abnormal states have the following problems: SMART data are labeled as abnormal only on the day the abnormal state occurs, so information from the period before the abnormal state is not captured, abnormal-state data are easily lost, and the positive and negative samples become imbalanced; in addition, because no feature screening is performed after the SMART data of the hard disk are acquired, extra noise is easily introduced, which reduces the accuracy of the model that predicts hard disk abnormal states.
Disclosure of Invention
To solve the above technical problems, the invention adopts the following technical solution: a system for predicting abnormal states of a hard disk, comprising a processor and a memory storing a computer program which, when executed by the processor, performs the following steps:
S100, acquiring a target hard disk data list set corresponding to a target hard disk, wherein the target hard disk is the hard disk to be detected; the target hard disk data list set comprises a plurality of target hard disk data lists, each comprising a plurality of target hard disk data; the target hard disk data are the SMART data of the target hard disk acquired for the initial hard disk operation features, which are the features exhibited by a hard disk during operation.
S200, acquiring a first hard disk data list and a second hard disk data list according to the target hard disk data list set, wherein the first hard disk data list comprises a plurality of first hard disk data, which are the SMART data acquired from the target hard disk data lists that contain only the target hard disk operation features and correspond to the abnormal state; the second hard disk data list comprises a plurality of second hard disk data, which are the SMART data acquired from the target hard disk data lists that contain only the target hard disk operation features and are not first hard disk data. In S200, the target hard disk operation features are obtained through the following steps.
S201, acquiring a key hard disk data list set $A = \{A_1, \ldots, A_i, \ldots, A_n\}$, $A_i = \{A_{i1}, \ldots, A_{ij}, \ldots, A_{im}\}$, where $A_{ij}$ is the j-th key hard disk data list of the i-th key hard disk and comprises a plurality of key hard disk data; the key hard disk data are the SMART data of the key hard disks acquired for the initial hard disk operation features; $j = 1, \ldots, m$, where m is the number of initial hard disk operation features; $i = 1, \ldots, n$, where n is the number of key hard disks; the key hard disks are hard disks in abnormal states that are used for training to obtain the target hard disk operation features.
S203, obtaining, according to A, the candidate score list set $B = \{B_1, \ldots, B_i, \ldots, B_n\}$ corresponding to A, where $B_i = \{B_i^1, \ldots, B_i^r, \ldots, B_i^s\}$, $B_i^r = \{B_i^{r1}, \ldots, B_i^{rj}, \ldots, B_i^{rm}\}$, and $B_i^{rj}$ is the j-th candidate score in the r-th candidate score list corresponding to $A_i$; $r = 1, \ldots, s$, where s is the number of candidate score types. The candidate scores are the scores of each initial hard disk operation feature obtained from A with different feature-importance algorithms.
S205, obtaining a candidate priority list $D^0 = \{D^0_1, \ldots, D^0_j, \ldots, D^0_m\}$ according to A and B, where $D^0_j$ is the candidate priority of the j-th initial hard disk operation feature and satisfies the following condition:
where $\omega_i^{rj}$ is the number of digits in the fractional part of $B_i^{rj}$ counted from the first decimal place up to and including the first non-zero digit, and $\varepsilon_i^{rj}$ is the position of $B_i^{rj}$ after the candidate scores in $B_i^r$ are sorted in descending order.
S207, obtaining a target priority list $D = \{D_1, \ldots, D_i, \ldots, D_m\}$ according to $D^0$, where $D_i$ is the target priority of the i-th initial hard disk operation feature and satisfies:
$D_i = (D^0_i - D^0_{\min}) / (D^0_{\max} - D^0_{\min})$, where $D^0_{\min}$ is the smallest and $D^0_{\max}$ the largest candidate priority in $D^0$.
S209, when $D_i \geq FD$, selecting the corresponding initial hard disk operation feature as a target hard disk operation feature, where FD is a preset priority threshold.
S300, inputting the first hard disk data list into a first module to obtain a third hard disk data list, wherein the third hard disk data list comprises a plurality of third hard disk data obtained by performing data augmentation on the first hard disk data, and the first module is a module for performing data augmentation.
S400, inputting the second hard disk data list and the third hard disk data list into a second module to obtain tag data corresponding to the target hard disk, thereby predicting the abnormal state of the target hard disk, wherein the second module is a module for obtaining the tag data of the target hard disk.
Compared with the prior art, the system for predicting abnormal states of a hard disk provided by the invention has at least the following beneficial effects:
The system comprises a processor and a memory storing a computer program which, when executed by the processor, performs the following steps: acquiring a target hard disk data list set corresponding to a target hard disk; acquiring a first hard disk data list and a second hard disk data list according to the target hard disk data list set, the first hard disk data being the abnormal-state SMART data acquired from the target hard disk data lists that contain only the target hard disk operation features, where the target hard disk operation features are obtained by acquiring a key hard disk data list set, obtaining the corresponding candidate score list set, obtaining a candidate priority list from the key hard disk data list set and the candidate score list set, obtaining a target priority list from the candidate priority list, and selecting the target hard disk operation features according to the target priority list; inputting the first hard disk data list into a first module to obtain a third hard disk data list; and inputting the second hard disk data list and the third hard disk data list into a second module to obtain the tag data corresponding to the target hard disk, thereby predicting its abnormal state. Because feature screening is performed on the SMART data acquired from the hard disk, the retained hard disk data are more informative, which improves the accuracy of the model that predicts hard disk abnormal states.
The foregoing is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart of the steps performed when the processor of the system for predicting abnormal states of a hard disk executes the computer program, according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S200 according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Examples
This embodiment provides a system for predicting abnormal states of a hard disk, comprising a processor and a memory storing a computer program which, when executed by the processor, performs the following steps, as shown in FIG. 1:
S100, acquiring a target hard disk data list set corresponding to a target hard disk, wherein the target hard disk is the hard disk to be detected; the target hard disk data list set comprises a plurality of target hard disk data lists, each comprising a plurality of target hard disk data; the target hard disk data are the SMART data of the target hard disk acquired for the initial hard disk operation features, which are the features exhibited by a hard disk during operation.
Specifically, each target hard disk data list holds the data of the hard disk for one hard disk operation feature at different times.
Specifically, SMART data are time-series data.
Specifically, the initial hard disk operation features are the features a hard disk exhibits during operation, for example its power-on hours or the time the magnetic head spends moving over the platter surface while the disk is powered on.
S200, acquiring a first hard disk data list and a second hard disk data list according to the target hard disk data list set, wherein the first hard disk data list comprises a plurality of first hard disk data, which are the SMART data acquired from the target hard disk data lists that contain only the target hard disk operation features and correspond to the abnormal state; the second hard disk data list comprises a plurality of second hard disk data, which are the SMART data acquired from the target hard disk data lists that contain only the target hard disk operation features and are not first hard disk data.
Specifically, in S200, the target hard disk operation features are obtained through the following steps, as shown in FIG. 2:
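The split in S200 can be sketched as follows. This is a minimal illustration under stated assumptions: the per-feature lists have been regrouped into one record per time step, the tag data follow the 1 = abnormal / 0 = normal convention given in this embodiment, and the function name, record layout and SMART attribute names (`smart_5`, `smart_9`, `smart_194`) are hypothetical.

```python
from typing import Mapping, Sequence

# One record per time step: hypothetical SMART attribute name -> value.
Record = Mapping[str, float]


def split_by_state(
    records: Sequence[Record],
    labels: Sequence[int],
    target_features: Sequence[str],
) -> tuple[list[dict[str, float]], list[dict[str, float]]]:
    """Keep only the target features, then split the records by tag data.

    Records tagged 1 (abnormal) go to the first list; all others go to the second.
    """
    if len(records) != len(labels):
        raise ValueError("records and labels must have the same length")
    first: list[dict[str, float]] = []
    second: list[dict[str, float]] = []
    for record, label in zip(records, labels):
        kept = {name: record[name] for name in target_features}  # feature screening
        (first if label == 1 else second).append(kept)
    return first, second


days = [
    {"smart_5": 0.0, "smart_9": 1200.0, "smart_194": 31.0},
    {"smart_5": 2.0, "smart_9": 1224.0, "smart_194": 33.0},
    {"smart_5": 8.0, "smart_9": 1248.0, "smart_194": 36.0},
]
first, second = split_by_state(days, [0, 0, 1], ["smart_5", "smart_194"])
print(first)   # [{'smart_5': 8.0, 'smart_194': 36.0}]
print(second)  # the two normal-state records, restricted to the same two features
```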
S201, acquiring a key hard disk data list set $A = \{A_1, \ldots, A_i, \ldots, A_n\}$, $A_i = \{A_{i1}, \ldots, A_{ij}, \ldots, A_{im}\}$, where $A_{ij}$ is the j-th key hard disk data list of the i-th key hard disk and comprises a plurality of key hard disk data; the key hard disk data are the SMART data of the key hard disks acquired for the initial hard disk operation features; $j = 1, \ldots, m$, where m is the number of initial hard disk operation features; $i = 1, \ldots, n$, where n is the number of key hard disks; the key hard disks are hard disks in abnormal states that are used for training to obtain the target hard disk operation features.
S203, obtaining, according to A, the candidate score list set $B = \{B_1, \ldots, B_i, \ldots, B_n\}$ corresponding to A, where $B_i = \{B_i^1, \ldots, B_i^r, \ldots, B_i^s\}$, $B_i^r = \{B_i^{r1}, \ldots, B_i^{rj}, \ldots, B_i^{rm}\}$, and $B_i^{rj}$ is the j-th candidate score in the r-th candidate score list corresponding to $A_i$; $r = 1, \ldots, s$, where s is the number of candidate score types. The candidate scores are the scores of each initial hard disk operation feature obtained from A with different feature-importance algorithms.
Preferably, s=5.
Further, when r = 1, $B_i^{1j}$ is a score measuring the strength of the relationship between two variables. Those skilled in the art know that any prior-art method for measuring this strength, such as the Pearson correlation coefficient, falls within the protection scope of the invention and is not described here.
Further, when the hard disk is in an abnormal state, the tag data is "1"; when the hard disk is in a normal state, the tag data is "0".
Further, when r = 2, $B_i^{2j}$ is a score measuring the direction of the correlation between two variables. Those skilled in the art know that any prior-art method for this, such as the Spearman correlation coefficient, falls within the protection scope of the invention and is not described here.
Further, when r = 3, $B_i^{3j}$ is obtained with a random forest model: noise data are added to the data of the j-th initial hard disk operation feature, and the score reflects the resulting reduction in classification accuracy.
Further, when r = 4, $B_i^{4j}$ is the score of the j-th initial hard disk operation feature of the i-th key hard disk obtained with an XGBoost model.
Further, when r = 5, $B_i^{5j}$ is the score of the j-th initial hard disk operation feature of the i-th key hard disk obtained with the Relief feature selection algorithm.
In this way, the score of each initial hard disk operation feature is obtained with several different algorithms. Because the five selected algorithms assess features from different dimensions, the candidate scores cover multiple perspectives, which makes the subsequent feature selection more accurate.
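As a rough illustration of three of the five score types, the pure-Python sketch below computes a Pearson coefficient (r = 1), a Spearman coefficient (r = 2) and basic two-class Relief weights (r = 5). It assumes that the two correlated variables are one feature's time series and the 0/1 tag series, and it uses the deterministic Relief variant that visits every sample once instead of sampling at random. Neither choice, nor how the raw coefficients become $B_i^{1j}$ and $B_i^{2j}$ (for example, whether absolute values are taken), is specified in the text. The random forest (r = 3) and XGBoost (r = 4) scores would normally come from those libraries and are omitted here.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # undefined; treated as "no relationship" in this sketch
    return cov / (sd_x * sd_y)


def average_ranks(v: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def relief(rows: list[list[float]], labels: list[int]) -> list[float]:
    """Basic two-class Relief weights; every row is used once as the reference."""
    n, m = len(rows), len(rows[0])
    span = [(max(r[f] for r in rows) - min(r[f] for r in rows)) or 1.0 for f in range(m)]

    def diff(f: int, a: list[float], b: list[float]) -> float:
        return abs(a[f] - b[f]) / span[f]  # per-feature difference scaled to [0, 1]

    def dist(a: list[float], b: list[float]) -> float:
        return sum(diff(f, a, b) for f in range(m))

    weights = [0.0] * m
    for i, ref in enumerate(rows):
        hits = [rows[k] for k in range(n) if k != i and labels[k] == labels[i]]
        misses = [rows[k] for k in range(n) if labels[k] != labels[i]]
        if not hits or not misses:
            continue  # no nearest hit or nearest miss to compare against
        hit = min(hits, key=lambda o: dist(ref, o))
        miss = min(misses, key=lambda o: dist(ref, o))
        for f in range(m):
            # Reward features that differ across classes, penalize those that differ within a class.
            weights[f] += (diff(f, ref, miss) - diff(f, ref, hit)) / n
    return weights


feature = [10.0, 11.0, 11.0, 14.0, 19.0, 30.0]  # hypothetical daily values of one feature
tags = [0, 0, 0, 0, 0, 1]                        # 1 = abnormal on the last day
print(pearson(feature, tags), spearman(feature, tags))
print(relief([[0.0, 5.0], [0.0, 1.0], [1.0, 5.0], [1.0, 1.0]], [0, 0, 1, 1]))  # [1.0, -1.0]
```

In the Relief example, feature 0 separates the two classes perfectly and feature 1 does not, so feature 0 receives the highest weight.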
S205, obtaining a candidate priority list $D^0 = \{D^0_1, \ldots, D^0_j, \ldots, D^0_m\}$ according to A and B, where $D^0_j$ is the candidate priority of the j-th initial hard disk operation feature and satisfies the following condition:
where $\omega_i^{rj}$ is the number of digits in the fractional part of $B_i^{rj}$ counted from the first decimal place up to and including the first non-zero digit, and $\varepsilon_i^{rj}$ is the position of $B_i^{rj}$ after the candidate scores in $B_i^r$ are sorted in descending order.
Specifically, when $B_i^{rj} = 0.0457$, the fractional part contains two digits from the first decimal place up to the first non-zero digit (0 and 4), so $\omega_i^{rj} = 2$; when $B_i^{rj} = 0.4057$, $\omega_i^{rj} = 1$; when $B_i^{rj} = 0.0057$, $\omega_i^{rj} = 3$.
Specifically, $\varepsilon_i^{rj}$ takes values from 1 to m; for example, if $B_i^{rj}$ is in the 5th position after the candidate scores in $B_i^r$ are sorted in descending order, then $\varepsilon_i^{rj} = 5$.
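The two quantities used in the S205 condition can be computed as below. This is a sketch under stated assumptions: it parses the float's shortest `repr` with `decimal.Decimal` so that 0.0457 is handled exactly as written, and it breaks ties in the ranking by original feature order, since the text does not say how ties are handled. The function names `omega` and `epsilon` are hypothetical, and the sketch does not combine the two values into $D^0_j$.

```python
from decimal import Decimal


def omega(score: float) -> int:
    """Digits in the fractional part, counted from the first decimal place up to
    and including the first non-zero digit (0.0457 -> 2, 0.4057 -> 1)."""
    frac = abs(Decimal(repr(score))) % 1  # fractional part with exact decimal digits
    if frac == 0:
        raise ValueError("score has no non-zero fractional digit")
    count = 0
    while frac < 1:  # shift one decimal place until the first non-zero digit reaches the units place
        frac *= 10
        count += 1
    return count


def epsilon(scores: list[float]) -> list[int]:
    """1-based position of each score after sorting in descending order.

    Ties keep their original order, because Python's sort is stable even with reverse=True.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


print([omega(x) for x in (0.0457, 0.4057, 0.0057)])  # [2, 1, 3]
print(epsilon([0.3, 0.9, 0.1]))                      # [2, 1, 3]
```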
S207, obtaining a target priority list $D = \{D_1, \ldots, D_i, \ldots, D_m\}$ according to $D^0$, where $D_i$ is the target priority of the i-th initial hard disk operation feature and satisfies:
$D_i = (D^0_i - D^0_{\min}) / (D^0_{\max} - D^0_{\min})$, where $D^0_{\min}$ is the smallest and $D^0_{\max}$ the largest candidate priority in $D^0$.
S209, when $D_i \geq FD$, selecting the corresponding initial hard disk operation feature as a target hard disk operation feature, where FD is a preset priority threshold.
Specifically, FD ranges from 0.05 to 0.1. Those skilled in the art know that the preset priority threshold may be chosen according to actual requirements, which falls within the protection scope of the invention and is not described here.
In this way, the target priority of each initial hard disk operation feature is derived from its candidate scores. Both the weight of each feature and its position in the data set are taken into account, so the resulting priorities are more accurate; performing feature screening on the acquired SMART data makes the retained hard disk data more informative and improves the accuracy of the model that predicts hard disk abnormal states.
S300, inputting the first hard disk data list into a first module to obtain a third hard disk data list, wherein the third hard disk data list comprises a plurality of third hard disk data obtained by performing data augmentation on the first hard disk data, and the first module is a module for performing data augmentation.
Specifically, in S300, the first module is obtained as follows:
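Steps S207 and S209 amount to min-max normalization followed by a threshold. The sketch below takes the candidate priorities $D^0$ as a given input, uses FD = 0.05 from the lower end of the stated range, and raises an error when all candidate priorities are equal, a case the text does not address. The function names, feature names and numbers are hypothetical.

```python
def target_priorities(candidate: list[float]) -> list[float]:
    """Min-max normalize the candidate priorities D^0 into target priorities D."""
    lo, hi = min(candidate), max(candidate)
    if hi == lo:
        raise ValueError("all candidate priorities are equal; normalization is undefined")
    return [(d - lo) / (hi - lo) for d in candidate]


def select_features(names: list[str], candidate: list[float], fd: float = 0.05) -> list[str]:
    """Keep the features whose target priority is at least the threshold FD."""
    return [n for n, p in zip(names, target_priorities(candidate)) if p >= fd]


names = ["smart_5", "smart_9", "smart_187", "smart_194"]
d0 = [3.0, 1.0, 5.0, 2.0]
print(target_priorities(d0))       # [0.5, 0.0, 1.0, 0.25]
print(select_features(names, d0))  # ['smart_5', 'smart_187', 'smart_194']
```

Because the smallest candidate priority always maps to 0, at least one feature is always dropped whenever FD is positive.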
S1, acquiring a sample hard disk data list comprising a plurality of sample hard disk data, which are the SMART data of sample hard disks in an abnormal state; the sample hard disks are hard disks used for training.
S2, obtaining a first sample hard disk data list P and a second sample hard disk data list Q from the sample hard disk data list, wherein P comprises a plurality of first sample hard disk data, i.e. sample hard disk data that are not associated with time, and Q comprises a plurality of second sample hard disk data, i.e. sample hard disk data that are associated with time.
Specifically, data not associated with time do not depend on how long the hard disk has been running, such as the transfer rate of the hard disk; conversely, data associated with time change with the running time of the hard disk, such as the head flying hours, i.e. the time the magnetic head moves over the platter surface after power-on, which is governed by the running time of the disk.
S3, inputting P and Q into a generator to obtain a first candidate priority $L_1$, where $L_1$ is obtained in S3 through the following steps:
S31, inputting P into the encoding function to obtain the encoded data list $P_1$ corresponding to P.
S32, inputting $P_1$ into the decoding function to obtain the decoded data list $P_0$ corresponding to P.
Specifically, the encoding function and the decoding function are realized through an LSTM model.
S33, inputting Q into the encoding function to obtain the encoded data list $Q_1$ corresponding to Q.
S34, inputting $Q_1$ into the decoding function to obtain the decoded data list $Q_0$ corresponding to Q.
S35, obtaining the first candidate priority $L_1$ according to $P_0$ and $Q_0$, where $L_1$ is the sum of the similarities between each data item in P and the data item at the same position in $P_0$, plus the similarities between each data item in Q and the data item at the same position in $Q_0$.
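A minimal sketch of the S35 quantity, assuming each data item is a numeric vector and using cosine similarity, the example measure named in this embodiment. By default it sums similarities as the text states; because S4 minimizes $L_1$, a distance such as 1 - cosine similarity may be the intended reading, so the measure is a parameter. That reading and all function names are assumptions, and the LSTM encoder and decoder that produce $P_0$ and $Q_0$ are not shown. The same helper could be reused for the similarities in S9 (P against E, Q against F).

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Measure = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, larger as they diverge."""
    return 1.0 - cosine_similarity(a, b)


def positional_sum(xs: Sequence[Vector], ys: Sequence[Vector], measure: Measure) -> float:
    """Sum of measure(x_k, y_k) over items at the same position k."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    return sum(measure(x, y) for x, y in zip(xs, ys))


def first_candidate_priority(
    P: Sequence[Vector], P0: Sequence[Vector],
    Q: Sequence[Vector], Q0: Sequence[Vector],
    measure: Measure = cosine_similarity,
) -> float:
    """L1 = positional sum over (P, P0) plus positional sum over (Q, Q0)."""
    return positional_sum(P, P0, measure) + positional_sum(Q, Q0, measure)


P, P0 = [[1.0, 0.0]], [[1.0, 0.0]]
Q, Q0 = [[0.0, 2.0]], [[0.0, 5.0]]
print(first_candidate_priority(P, P0, Q, Q0))                          # 2.0
print(first_candidate_priority(P, P0, Q, Q0, measure=cosine_distance))  # 0.0
```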
Specifically, the generator is a generator in a GAN model.
Specifically, those skilled in the art know that any prior-art method for computing the similarity between two data items, such as cosine similarity, falls within the protection scope of the invention and is not described here.
S4, continuously adjusting the parameters of the generator until $L_1$ reaches its minimum, and then obtaining a first vector list and a second vector list, wherein the first vector list comprises a plurality of first vectors randomly generated based on the first sample hard disk data in P, and the second vector list comprises a plurality of second vectors randomly generated based on the second sample hard disk data in Q.
S5, inputting the first vector list and the second vector list into a generator after parameter tuning, and obtaining a coded vector list E corresponding to the first vector list and a coded vector list F corresponding to the second vector list.
S6, inputting $P_1$ and E, i.e. the encoded data of the first sample hard disk data and the encoded vectors of the first vectors, into a discriminator to obtain a first designated data list corresponding to P, wherein the first designated data list comprises a plurality of first designated data, which are the classification results produced by the adversarial function in the discriminator from the encoded data of the first sample hard disk data and the encoded vectors of the first vectors.
Specifically, the discriminator is the discriminator of the GAN model and is used to distinguish real data from data synthesized by the generator.
S7, inputting $Q_1$ and F, i.e. the encoded data of the second sample hard disk data and the encoded vectors of the second vectors, into the discriminator to obtain a second designated data list corresponding to Q, wherein the second designated data list comprises a plurality of second designated data, which are the classification results produced by the adversarial function in the discriminator from the encoded data of the second sample hard disk data and the encoded vectors of the second vectors.
S8, obtaining a second candidate priority $L_2$ according to the first designated data list and the second designated data list, where $L_2$ is the accuracy with which the discriminator identifies P, Q, E and F, as determined from the two designated data lists.
Specifically, those skilled in the art know that any prior-art method for measuring the discrimination capability of a discriminator, i.e. the loss function of the discriminator, falls within the protection scope of the invention and is not described here.
S9, continuously adjusting the parameters of the discriminator until $L_2$ reaches its maximum, and then obtaining a third candidate priority $L_3$, which is the sum of the similarity between P and E and the similarity between Q and F.
Specifically, the similarity between P and E and the similarity between Q and F are computed in the same way as the similarities in step S35.
S10, continuously adjusting the parameters of a preset initial module until the third candidate priority $L_3$ reaches its minimum, thereby obtaining the first module.
In summary, after the SMART data of the hard disk have been feature-screened, the abnormal-state data among them are augmented. This captures information from before the hard disk entered the abnormal state and makes loss of abnormal-state data less likely; it thus preserves the pre-failure information while also enriching the negative-sample information, so that positive and negative samples are more balanced.
S700, inputting the second hard disk data list and the third hard disk data list into a second module to obtain tag data corresponding to the target hard disk, thereby predicting the abnormal state of the target hard disk, wherein the second module is a module for obtaining the tag data of the target hard disk.
Specifically, the step S700 further includes the following steps:
s701, acquiring a training hard disk data set, wherein the training hard disk data set comprises a plurality of training hard disk data, and the training hard disk data are used for predicting model training data.
Specifically, training the prediction model to obtain the second module.
Specifically, the training hard disk data set includes SMART data when the training hard disk is in a normal state, SMART data when the training hard disk is in an abnormal state, and data obtained by performing data enhancement on the SMART data when the training hard disk is in the abnormal state.
S702, acquiring the first data set $V_t = \{V_t^1, \ldots, V_t^d, \ldots, V_t^a\}$ of the training hard disk data set at time t, where $V_t^d$ is the d-th first data, $d = 1, \ldots, a$, and a is the number of first data. The first data are obtained based on the hard disk operation features, and the number of first data equals the number of hard disk operation features.
S703, inputting $V_t$ into the first self-attention mechanism to obtain the corresponding second data set $DV_t = \{DV_t^1, \ldots, DV_t^d, \ldots, DV_t^a\}$, where $DV_t^d = V_t^d \times q_t^d$ and $q_t^d$ is the first weight corresponding to $V_t^d$.
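A minimal sketch of the element-wise reweighting in S703, assuming the first weights $q_t^d$ are a softmax over one raw attention score per feature. The patent leaves the weighting method open, so the attention scores and the names `softmax` and `reweight` are hypothetical. The same multiplication pattern applies to $DG_t^d = G_t^d \times b_t^d$ in S705.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(v_t: Sequence[float], attention_scores: Sequence[float]) -> List[float]:
    """Return DV_t, where DV_t^d = V_t^d * q_t^d and q_t = softmax(attention_scores).

    Assumption: one raw attention score per feature is already available
    (e.g. from a learned projection); the patent does not fix how it is computed.
    """
    if len(v_t) != len(attention_scores):
        raise ValueError("expected one attention score per feature")
    q_t = softmax(attention_scores)
    return [v * q for v, q in zip(v_t, q_t)]


if __name__ == "__main__":
    # Equal scores give equal weights of 0.5, so each value is halved.
    print(reweight([10.0, 20.0], [0.0, 0.0]))  # [5.0, 10.0]
```

Because the weights sum to 1, features with higher attention scores keep a larger share of their original value.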
Specifically, as is known to those skilled in the art, any prior-art method of obtaining feature weights based on a self-attention mechanism, for example one using an activation function, falls within the protection scope of the present invention and is not described further here.
S704, inputting $DV_t$ into the LSTM encoding unit to obtain the third data set $G_t$, where $G_t$ is the data set obtained by encoding $V_t$.
S705, inputting $G_t$ into the second self-attention mechanism to obtain the corresponding fourth data set $DG_t = \{DG_t^1, \ldots, DG_t^d, \ldots, DG_t^a\}$, where $DG_t^d = G_t^d \times b_t^d$ and $b_t^d$ is the second weight corresponding to $G_t^d$.
Specifically, the second weight is the weight corresponding to each encoder in the LSTM encoding unit. As is known to those skilled in the art, any prior-art method of obtaining encoder weights through a self-attention mechanism, such as using the convolution values of a CNN convolution kernel, falls within the protection scope of the present invention and is not described further here.
S706, inputting $DG_t$, $G_{t-1}$ and $y_{t-1}$ into the LSTM decoding unit to obtain the label data of the training hard disk at time t, where $G_{t-1}$ is the third data set of the training hard disk at time (t-1) and $y_{t-1}$ is the label data of the training hard disk at time (t-1).
S707, continuously adjusting the parameters of the prediction model until the final prediction result meets a preset condition, thereby obtaining the second module.
In summary, feature screening is first performed on the SMART data obtained from the hard disk based on the initial hard disk operation features, yielding SMART data that contain only the target hard disk operation features. The abnormal-state SMART data are then selected and augmented. This captures the information preceding the abnormal state of the hard disk without easily losing abnormal-state data, preserves that information during augmentation, and makes the positive and negative samples more balanced. Feature screening of the SMART data also makes the obtained hard disk data more effective and improves the accuracy of the model that predicts the abnormal state of the hard disk.
This embodiment provides a system for predicting abnormal states of a hard disk, comprising a processor and a memory storing a computer program which, when executed by the processor, performs the following steps: acquiring a target hard disk data list set corresponding to a target hard disk; acquiring a first hard disk data list and a second hard disk data list from the target hard disk data list set, where the first hard disk data are abnormal-state SMART data, taken from the target hard disk data lists, that contain only the target hard disk operation features, and the target hard disk operation features are obtained by acquiring a key hard disk data list set, obtaining the corresponding candidate score list set, obtaining a candidate priority list from the key hard disk data list set and the candidate score list set, obtaining a target priority list from the candidate priority list, and obtaining the target hard disk operation features from the target priority list; inputting the first hard disk data list into the first module to obtain a third hard disk data list; and inputting the second hard disk data list and the third hard disk data list into the second module to obtain the tag data of the target hard disk, thereby predicting its abnormal state. Feature screening of the SMART data makes the obtained hard disk data more effective and improves the accuracy of the model that predicts the abnormal state of the hard disk.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (6)

1. A system for predicting abnormal states of a hard disk, the system comprising: a processor and a memory storing a computer program which, when executed by the processor, performs the steps of:
S100, acquiring a target hard disk data list set corresponding to a target hard disk, where the target hard disk is the hard disk to be detected, the target hard disk data list set comprises a plurality of target hard disk data lists, each target hard disk data list comprises a plurality of target hard disk data, the target hard disk data are SMART data of the target hard disk acquired based on initial hard disk operation features, and the initial hard disk operation features are features of a hard disk during operation;
S200, acquiring a first hard disk data list and a second hard disk data list from the target hard disk data list set, where the first hard disk data list comprises a plurality of first hard disk data, which are SMART data taken from the target hard disk data lists that contain only the target hard disk operation features and are in the abnormal state; the second hard disk data list comprises a plurality of second hard disk data, which are the SMART data taken from the target hard disk data lists that contain only the target hard disk operation features, other than the first hard disk data; and the target hard disk operation features are obtained in S200 through the following steps:
S201, acquiring a key hard disk data list set $A = \{A_1, \ldots, A_i, \ldots, A_n\}$, $A_i = \{A_{i1}, \ldots, A_{ij}, \ldots, A_{im}\}$, where $A_{ij}$ is the j-th key hard disk data list corresponding to the i-th key hard disk; each key hard disk data list comprises a plurality of key hard disk data, which are SMART data of the key hard disks acquired based on the initial hard disk operation features; $j = 1, \ldots, m$, where m is the number of initial hard disk operation features; $i = 1, \ldots, n$, where n is the number of key hard disks; and the key hard disks are hard disks in the abnormal state that are used for training to obtain the target hard disk operation features;
S203, obtaining from A the corresponding candidate score list set $B = \{B_1, \ldots, B_i, \ldots, B_n\}$, $B_i = \{B_i^1, \ldots, B_i^r, \ldots, B_i^s\}$, $B_i^r = \{B_i^{r1}, \ldots, B_i^{rj}, \ldots, B_i^{rm}\}$, where $B_i^{rj}$ is the j-th candidate score in the r-th class of candidate score list corresponding to $A_i$, $r = 1, \ldots, s$, and s is the number of candidate score types. A candidate score is the score of an initial hard disk operation feature obtained from A with one of several feature-importance algorithms, and s = 5: when r = 1, $B_i^{1j}$ is a score for the i-th key hard disk based on the strength of the relationship between the data of the j-th initial hard disk operation feature and the label data of the key hard disk; when r = 2, $B_i^{2j}$ is a score for the i-th key hard disk based on the direction of the correlation between the data of the j-th initial hard disk operation feature and the label data of the key hard disk; when r = 3, $B_i^{3j}$ is the degree to which the classification accuracy of a random forest model for the i-th key hard disk decreases after noise data are added to the data of the j-th initial hard disk operation feature; when r = 4, $B_i^{4j}$ is the score of the j-th initial hard disk operation feature for the i-th key hard disk obtained from an XGBoost model; and when r = 5, $B_i^{5j}$ is the score of the j-th initial hard disk operation feature for the i-th key hard disk obtained from the Relief feature selection algorithm;
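The first two score types could be computed, for example, from the Pearson correlation between a feature's values and the labels: its magnitude as the relation strength (r = 1) and its sign as the correlation direction (r = 2). The claim does not name the correlation measure, so Pearson correlation, the zero-variance convention, and the function names below are assumptions.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a constant feature or constant labels count as uncorrelated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def strength_and_direction(feature: Sequence[float],
                           labels: Sequence[int]) -> Tuple[float, float]:
    """Return (|r|, sign of r) as hypothetical stand-ins for B_i^{1j} and B_i^{2j}."""
    r = pearson(feature, labels)
    direction = 0.0 if r == 0 else math.copysign(1.0, r)
    return abs(r), direction


if __name__ == "__main__":
    # Labels follow claim 4: 1 = abnormal, 0 = normal.
    print(strength_and_direction([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))
```

Splitting one coefficient into magnitude and sign keeps the two score types independent, so a strongly negative correlation still earns a high strength score.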
S205, obtaining a candidate priority list $D^0 = \{D^0_1, \ldots, D^0_j, \ldots, D^0_m\}$ from A and B, where $D^0_j$ is the candidate priority of the j-th initial hard disk operation feature and satisfies the following condition:
[Formula for $D^0_j$ not reproduced in the source text.]
where $\omega_i^{rj}$ is the number of digits in the fractional part of $B_i^{rj}$ counted from the first decimal digit up to the first non-zero digit, and $\varepsilon_i^{rj}$ is the position (serial number) of $B_i^{rj}$ after the candidate scores in $B_i^r$ are sorted in descending order;
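The formula that combines these quantities into $D^0_j$ does not appear in this text, so only the two auxiliary quantities are sketched here. The sketch assumes that ω counts decimal places up to and including the first non-zero digit of the fractional part of |B| (returning 0 if that part is zero), and that ε is a 1-based rank, consistent with claim 2, with ties kept in input order. These conventions and the function names are assumptions.

```python
from decimal import Decimal
from typing import List, Sequence


def omega(score: float) -> int:
    """Digits from the first decimal place through the first non-zero decimal digit.

    Uses the fractional part of |score|; returns 0 when that part is zero
    (assumption). Decimal(repr(...)) avoids binary floating-point noise.
    """
    frac = abs(Decimal(repr(score))) % 1
    if frac == 0:
        return 0
    decimals = format(frac, "f").split(".")[1]
    for position, digit in enumerate(decimals, start=1):
        if digit != "0":
            return position
    return 0


def epsilon_ranks(scores: Sequence[float]) -> List[int]:
    """1-based rank of each score after sorting in descending order (stable for ties)."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


if __name__ == "__main__":
    print(omega(0.0034), omega(0.5), omega(2.0))  # 3 1 0
    print(epsilon_ranks([0.2, 0.9, 0.5]))          # [3, 1, 2]
```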
S207, obtaining a target priority list $D = \{D_1, \ldots, D_i, \ldots, D_m\}$ from $D^0$, where $D_i$ is the target priority of the i-th initial hard disk operation feature and satisfies the following condition:
$D_i = (D^0_i - D_{\min}) / (D_{\max} - D_{\min})$, where $D_{\min}$ is the smallest candidate priority in $D^0$ and $D_{\max}$ is the largest candidate priority in $D^0$;
S209, when $D_i \geq FD$, taking the corresponding initial hard disk operation feature as a target hard disk operation feature, where FD is a preset priority threshold;
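Steps S207 and S209 amount to min-max normalization of the candidate priorities followed by a threshold test. A minimal sketch, assuming the candidate priorities $D^0$ are already computed; the default FD = 0.05 is taken from the range in claim 3, while the handling of identical priorities (all features kept), the feature names, and the function names are assumptions.

```python
from typing import List, Sequence


def target_priorities(d0: Sequence[float]) -> List[float]:
    """Min-max normalise candidate priorities to [0, 1] (step S207)."""
    d_min, d_max = min(d0), max(d0)
    if d_max == d_min:
        # Assumption: if every feature has the same priority, keep them all.
        return [1.0] * len(d0)
    return [(x - d_min) / (d_max - d_min) for x in d0]


def select_features(names: Sequence[str], d0: Sequence[float],
                    fd: float = 0.05) -> List[str]:
    """Keep features whose normalised priority D_i satisfies D_i >= FD (step S209)."""
    if len(names) != len(d0):
        raise ValueError("one candidate priority per feature is required")
    return [name for name, d_i in zip(names, target_priorities(d0)) if d_i >= fd]


if __name__ == "__main__":
    print(target_priorities([2.0, 4.0, 10.0]))  # [0.0, 0.25, 1.0]
    # Hypothetical feature names; f4 normalises to about 0.025, below FD.
    print(select_features(["f1", "f2", "f3", "f4"], [2.0, 4.0, 10.0, 2.2]))  # ['f2', 'f3']
```

Note that the feature with the smallest candidate priority always normalises to 0 and is therefore never selected for any FD > 0.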
S300, inputting the first hard disk data list into the first module to obtain a third hard disk data list, where the third hard disk data list comprises a plurality of third hard disk data obtained by performing data augmentation on the first hard disk data, and the first module is a module for performing data augmentation;
S400, inputting the second hard disk data list and the third hard disk data list into the second module to obtain the tag data of the target hard disk, thereby predicting the abnormal state of the target hard disk, where the second module is a module for obtaining the tag data of the target hard disk.
2. The system for predicting abnormal states of a hard disk of claim 1, wherein the value of $\varepsilon_i^{rj}$ ranges from 1 to m.
3. The system for predicting abnormal states of a hard disk of claim 1, wherein the FD has a value ranging from 0.05 to 0.1.
4. The system for predicting an abnormal state of a hard disk of claim 1, wherein the tag data is "1" when the hard disk is in the abnormal state; when the hard disk is in a normal state, the tag data is "0".
5. The system for predicting abnormal states of a hard disk of claim 1, wherein the first module is acquired in S300 by:
S1, acquiring a sample hard disk data list comprising a plurality of sample hard disk data, which are SMART data of sample hard disks in the abnormal state, the sample hard disks being hard disks used for training;
S2, acquiring a first sample hard disk data list P and a second sample hard disk data list Q from the sample hard disk data list, where P comprises a plurality of first sample hard disk data, which are the sample hard disk data in the list that have no association with time, and Q comprises a plurality of second sample hard disk data, which are the sample hard disk data in the list that are associated with time;
S3, inputting P and Q into a generator to obtain a first candidate priority $L_1$, where, in S3, $L_1$ is obtained through the following steps:
S31, inputting P into the encoding function to obtain an encoded data list $P_1$ corresponding to P;
S32, inputting $P_1$ into the decoding function for decoding to obtain a decoded data list $P_0$ corresponding to P;
S33, inputting Q into the encoding function to obtain an encoded data list $Q_1$ corresponding to Q;
S34, inputting $Q_1$ into the decoding function for decoding to obtain a decoded data list $Q_0$ corresponding to Q;
S35, obtaining the first candidate priority $L_1$ from $P_0$ and $Q_0$, where $L_1$ is the sum of the similarities between each data item in P and the data item at the corresponding position in $P_0$ and the similarities between each data item in Q and the data item at the corresponding position in $Q_0$;
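The claim does not name the position-wise similarity measure. Because S4 drives $L_1$ toward its minimum, the sketch below assumes a distance-style score (Euclidean distance, smaller meaning closer) between each row of P or Q and the row at the same position in $P_0$ or $Q_0$. The measure and the function names are assumptions, not the patented method.

```python
import math
from typing import Sequence

Row = Sequence[float]


def positionwise_score(original: Sequence[Row], decoded: Sequence[Row]) -> float:
    """Sum of Euclidean distances between rows at the same position (assumed measure)."""
    if len(original) != len(decoded):
        raise ValueError("original and decoded lists must have the same length")
    return sum(math.dist(a, b) for a, b in zip(original, decoded))


def first_candidate_priority(p: Sequence[Row], p0: Sequence[Row],
                             q: Sequence[Row], q0: Sequence[Row]) -> float:
    """L_1 = score(P, P_0) + score(Q, Q_0) under the distance assumption above."""
    return positionwise_score(p, p0) + positionwise_score(q, q0)


if __name__ == "__main__":
    p, p0 = [[1.0, 0.0]], [[1.0, 0.0]]  # perfect reconstruction: distance 0
    q, q0 = [[0.0, 3.0]], [[4.0, 3.0]]  # rows differ by 4 in the first column
    print(first_candidate_priority(p, p0, q, q0))  # 4.0
```

Under this assumption, minimizing $L_1$ in S4 is the usual autoencoder objective of minimizing reconstruction error.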
S4, continuously adjusting the parameters of the generator until $L_1$ reaches its minimum, and then acquiring a first vector list and a second vector list, where the first vector list comprises a plurality of first vectors randomly generated based on the first sample hard disk data in P, and the second vector list comprises a plurality of second vectors randomly generated based on the second sample hard disk data in Q;
S5, inputting the first vector list and the second vector list into the tuned generator to obtain an encoded vector list E corresponding to the first vector list and an encoded vector list F corresponding to the second vector list;
S6, inputting $P_1$ and E into a discriminator to obtain a first specified data list corresponding to P, where the first specified data list comprises a plurality of first specified data, each being the classification result obtained by passing the encoded data of the first sample hard disk data and the encoded vector of the first vector through the adversarial function in the discriminator;
S7, inputting $Q_1$ and F into the discriminator to obtain a second specified data list corresponding to Q, where the second specified data list comprises a plurality of second specified data, each being the classification result obtained by passing the encoded data of the second sample hard disk data and the encoded vector of the second vector through the adversarial function in the discriminator;
S8, obtaining a second candidate priority $L_2$ from the first specified data list and the second specified data list, where $L_2$ is the accuracy, derived from the two lists, with which the discriminator identifies P, Q, E and F;
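As a minimal sketch of S8, assume each specified data item is a discriminator output in [0, 1] paired with its true origin: outputs at or above a threshold mean "real" (an encoding of actual sample data, from $P_1$ or $Q_1$) and outputs below it mean "generated" (from E or F). $L_2$ is then the fraction of correct decisions across both lists. The output format, the 0.5 threshold, and the names are assumptions.

```python
from typing import Sequence, Tuple

# Assumed item format: (discriminator output in [0, 1],
#                       True if the input came from P_1/Q_1, False if from E/F)
Judgement = Tuple[float, bool]


def discriminator_accuracy(first_list: Sequence[Judgement],
                           second_list: Sequence[Judgement],
                           threshold: float = 0.5) -> float:
    """Fraction of items whose thresholded output matches their true origin."""
    items = list(first_list) + list(second_list)
    if not items:
        raise ValueError("no classification results supplied")
    correct = sum(1 for score, is_real in items if (score >= threshold) == is_real)
    return correct / len(items)


if __name__ == "__main__":
    first = [(0.9, True), (0.2, False)]   # both judged correctly
    second = [(0.4, True), (0.7, False)]  # both judged wrongly
    print(discriminator_accuracy(first, second))  # 0.5
```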
S9, continuously adjusting the parameters of the discriminator until $L_2$ reaches its maximum, and then obtaining the third candidate priority $L_3$, where $L_3$ is the sum of the similarity between P and E and the similarity between Q and F;
S10, continuously adjusting the parameters of the preset initial module until the third candidate priority $L_3$ reaches its minimum, thereby obtaining the first module.
6. The system for predicting abnormal states of a hard disk of claim 5, wherein the encoding function and the decoding function are implemented by an LSTM model.
CN202410024906.0A 2024-01-08 2024-01-08 System for predicting abnormal state of hard disk Active CN117520104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410024906.0A CN117520104B (en) 2024-01-08 2024-01-08 System for predicting abnormal state of hard disk

Publications (2)

Publication Number Publication Date
CN117520104A CN117520104A (en) 2024-02-06
CN117520104B CN117520104B (en) 2024-03-29

Family

ID=89751756

Country Status (1)

Country Link
CN (1) CN117520104B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260279A (en) * 2015-11-04 2016-01-20 四川效率源信息安全技术股份有限公司 Method and device of dynamically diagnosing hard disk failure based on S.M.A.R.T (Self-Monitoring Analysis and Reporting Technology) data
CN107797899A (en) * 2017-10-12 2018-03-13 记忆科技(深圳)有限公司 A kind of method of solid state hard disc data safety write-in
CN110164501A (en) * 2018-06-29 2019-08-23 腾讯科技(深圳)有限公司 A kind of hard disk detection method, device, storage medium and equipment
CN112951311A (en) * 2021-04-16 2021-06-11 中国民航大学 Hard disk fault prediction method and system based on variable weight random forest
CN113822336A (en) * 2021-08-20 2021-12-21 济南浪潮数据技术有限公司 Cloud hard disk fault prediction method, device and system and readable storage medium
WO2022227373A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Hard disk health evaluation method and storage device
CN115543707A (en) * 2022-09-29 2022-12-30 苏州浪潮智能科技有限公司 Hard disk fault detection method, system and device, storage medium and electronic device
CN116302870A (en) * 2022-12-14 2023-06-23 苏州华启智能科技有限公司 Mechanical hard disk health assessment method, system and storage medium based on evolutionary diagram

Non-Patent Citations (1)

Title
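Failed hard disk prediction and handling method based on deep reinforcement learning (基于深度强化学习的故障硬盘预测与处理方法); Guan Wenbai, Fang Xiaoyu, Xia Bin; Software Guide (软件导刊); 2023-03-31; full text *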

Similar Documents

Publication Publication Date Title
CN108228915B (en) Video retrieval method based on deep learning
Lin et al. A general two-step approach to learning-based hashing
JP4697670B2 (en) Identification data learning system, learning device, identification device, and learning method
US11232141B2 (en) Method and device for processing an electronic document
US20120207387A1 (en) Method and Apparatus for Multi-Dimensional Content Search and Video Identification
CN111950728B (en) Image feature extraction model construction method, image retrieval method and storage medium
CN111143838B (en) Database user abnormal behavior detection method
CN109829065B (en) Image retrieval method, device, equipment and computer readable storage medium
CN112052451A (en) Webshell detection method and device
Choi et al. Face video retrieval based on the deep CNN with RBF loss
CN113691542B (en) Web attack detection method and related equipment based on HTTP request text
CN111783088B (en) Malicious code family clustering method and device and computer equipment
CN117520104B (en) System for predicting abnormal state of hard disk
Yu et al. Towards artificially intelligent recycling Improving image processing for waste classification
CN115761837A (en) Face recognition quality detection method, system, device and medium
CN115170813A (en) Network supervision fine-grained image identification method based on partial label learning
CN114528908A (en) Network request data classification model training method, classification method and storage medium
CN115512693A (en) Audio recognition method, acoustic model training method, device and storage medium
CN111581640A (en) Malicious software detection method, device and equipment and storage medium
CN112766312B (en) User information acquisition method, electronic equipment and medium
CN116049660B (en) Data processing method, apparatus, device, storage medium, and program product
CN115730234A (en) User behavior prediction method, device, equipment and medium based on artificial intelligence
CN114298304A (en) Active learning method and device, electronic equipment and readable storage medium
Ahmad et al. A Data-Driven Approach for Online Phishing Activity Detection
CN117218477A (en) Image recognition and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant