WO2020134442A1 - Disk failure prediction method, device, and storage medium - Google Patents

Disk failure prediction method, device, and storage medium

Info

Publication number
WO2020134442A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
disk failure
failure prediction
prediction model
information file
Prior art date
Application number
PCT/CN2019/113115
Other languages
English (en)
French (fr)
Inventor
弄庆鹏
屠要峰
李忠良
杨洪章
沈文全
林阳
祁鹏
郭斌
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2020134442A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment

Definitions

  • The present invention relates to the technical field of storage systems, and in particular to a disk failure prediction method, device, and storage medium.
  • Disks are at the core of data storage.
  • The normal operation of disks is the foundation for ensuring data validity and security.
  • To keep storage systems running normally, software detection combined with manual inspection is currently in widespread use.
  • Under this management approach, the efficiency and effectiveness of storage system operation and maintenance are unsatisfactory.
  • On the one hand, to keep the storage system running around the clock, every data storage site requires substantial manpower for 24-hour on-site maintenance, which places a heavy burden on operation and maintenance personnel.
  • On the other hand, automated detection software and manual inspection can usually only find and handle disks that are online and have already failed. By then, the performance of the computer network system has at best been degraded, and at worst the computer system may be brought down.
  • At present, some disk failure detection approaches can only detect disk failures and cannot predict them in advance. Others predict disk failures only through simple threshold checks and cannot predict the time of failure with reasonable accuracy. Therefore, in some situations, no effective solution has been given for how to improve proactive prediction of disk failures so as to keep disks operating normally and thereby ensure the reliability and security of storage systems in the network.
  • A disk failure prediction method in an embodiment of the present invention includes: constructing input features based on disk-related information files collected online; and, according to the input features, loading the current disk failure prediction model to predict disk failures, or performing incremental training on the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures.
  • A disk failure prediction device in an embodiment of the present invention includes a processor and a memory; the memory is used to store a disk failure prediction program, and the processor is used to run the program to implement the steps of the method described above.
  • A computer storage medium in an embodiment of the present invention stores a disk failure prediction program; the program can be executed by at least one processor to implement the steps of the method described above.
  • In a computer program product in an embodiment of the present invention, the computer program product includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method described in the above aspects.
  • FIG. 1 is a main flowchart of a disk failure prediction method in an embodiment of the present invention.
  • FIG. 2 is an architecture diagram of a disk failure prediction model in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of software modules involved in implementing the prediction method in the embodiment of the present invention.
  • FIG. 4 is an implementation framework diagram of a disk SMART data acquisition module in an embodiment of the present invention.
  • FIG. 5 is an implementation framework diagram of a disk SMART data parsing module in an embodiment of the present invention.
  • FIG. 6 is an implementation framework diagram of a disk failure prediction feature construction module in an embodiment of the present invention.
  • FIG. 7 is an implementation framework diagram of a disk failure training feature construction module in an embodiment of the present invention.
  • FIG. 8 is a flowchart of online prediction implementation of a disk failure prediction model in an embodiment of the present invention.
  • FIG. 9 is a flowchart of online training implementation of a disk failure prediction model in an embodiment of the present invention.
  • FIG. 11 is a flowchart of an optional disk failure prediction method in an embodiment of the present invention.
  • FIG. 12 is a frame diagram of a neural network model in an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of the remapped sector count parameter value and raw_value values in an embodiment of the present invention.
  • FIG. 15 is a flowchart of another optional disk failure prediction method in an embodiment of the present invention.
  • In the following description, suffixes such as "module", "component" or "unit" used to denote elements are used only to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module", "component" or "unit" can be used interchangeably.
  • An embodiment of the present invention provides a disk failure prediction method. As shown in FIG. 1, the method includes: S101, constructing input features based on disk-related information files collected online; S102, according to the input features, loading the current disk failure prediction model to predict disk failures, or performing incremental training on the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures.
  • The disk-related information files can include a disk basic information file, a SMART (Self-Monitoring, Analysis and Reporting Technology) data information file, and an offline disk file.
  • The input features may include disk failure prediction features, used to predict disk failures, and disk failure training features and labels, used to incrementally train the disk failure prediction model.
  • For clarity and brevity, the disk failure prediction model may be referred to as the prediction model or the failure prediction model.
  • Embodiments of the present invention effectively improve the accuracy of proactive disk failure prediction and provide both online training and online prediction for the disk failure prediction model. This avoids the lag of traditional offline training, makes the prediction model more dynamic and adaptive, substantially improves the operational stability of the network storage system, and reduces its operation and maintenance costs. In addition, the input features for disk prediction are constructed from the disk's full monitoring history rather than only the most recent period (for example, the last 5 weeks or 10 days), which greatly improves the reliability of disk prediction.
  • In some implementations, constructing the input features based on the disk-related information files collected online and, according to the input features, either loading the current disk failure prediction model to predict disk failures or performing incremental training on the current disk failure prediction model includes, in an exemplary embodiment: determining, based on the disk-related information files, whether to predict disk failures directly; if so, constructing disk failure prediction features from the disk-related information files and, according to those features, loading the current disk failure prediction model to predict disk failures; if not, automatically constructing disk failure training features and labels from the disk-related information files and, according to those features and labels, performing incremental training on the current disk failure prediction model.
  • In the first training pass, the current disk failure prediction model is a pre-constructed fully connected artificial neural network model.
  • As shown in FIG. 2, the disk prediction model is a fully connected artificial neural network model (neural network model for short) rather than a commonly used machine learning model (such as a support vector machine or decision tree).
  • An artificial neural network model has a strong feature learning capability, which can further improve the accuracy of proactive disk failure prediction.
  • Before performing incremental training on the current disk failure prediction model according to the disk failure training features and labels, an exemplary embodiment includes: upsampling the disk failure training features using the synthetic minority oversampling technique (SMOTE).
  • To further improve the accuracy of proactive disk failure prediction, in some implementations the input features of the prediction model are constructed by analyzing the variance and mean of specific disk SMART data, rather than using the raw SMART data and thresholds directly as model input features.
  • That is, constructing the input features based on the disk-related information files collected online includes: obtaining multi-dimensional input features from the variance and mean of each parameter in the disk-related information files.
  • Obtaining the multi-dimensional input features from the variance and mean of each parameter in the disk-related information files may include: obtaining the multi-dimensional input features from the variance and mean of the raw value (raw_value) and converted value (value) of each parameter. That is, the SMART data items (parameters) in the disk-related information files include, but are not limited to, the raw_value and value of: underlying data read error rate, spindle spin-up time, remapped sector count, seek error rate, accumulated power-on time, I/O error detection and correction, temperature, current count of sectors to be mapped, offline uncorrectable sector count, and Ultra ATA access check error rate.
  • To reduce the complexity of prediction and incremental training and improve efficiency while still improving prediction accuracy, in some implementations principal component analysis (PCA) is performed on the multi-dimensional input features and the first N dimensions of the principal components are kept, where N is greater than 0.
  • In some implementations, determining whether to predict disk failures directly based on the disk-related information files may include: when the disk-related information files are the disk basic information file and the SMART data information file, determining that disk failures are predicted directly; when the disk-related information files are the disk basic information file, the SMART data information file, and the offline disk file, determining that disk failures are not predicted directly.
  • In some implementations, before constructing the input features based on the disk-related information files collected online, the method includes: acquiring the SMART (self-monitoring, analysis and reporting technology) data of the disk collected online; and obtaining the disk-related information files from the SMART data.
  • For example, the disk's SMART data is collected periodically and the collected data is parsed to produce the disk-related information files.
  • In the embodiment of the present invention, the disk's SMART data is used to analyze the disk's current health: statistics are computed over the SMART data items that are highly correlated with failures, and these statistics form the input features of the disk failure prediction model.
  • Various implementations of the embodiments of the present invention are described in detail below in terms of software modules; that is, in a specific implementation, the corresponding steps can be carried out by software modules.
  • As shown in FIG. 3, the embodiment of the present invention may include the following software modules: a disk SMART data acquisition module, a disk SMART data parsing module, a model prediction feature construction module, a model training feature construction module, a neural network model online prediction module, a neural network model online training module, and a disk failure prediction result feedback module.
  • The specific implementation steps of each software module are as follows:
  • Disk SMART data acquisition module: this module is an automated disk data collection script implemented in python (a computer programming language). As shown in FIG. 4, it consists of a timer, a disk SMART data collection script, and a data file upload script. The timer periodically triggers the collection script to collect SMART data; when collection finishes, the data file upload script uploads the data in .txt form to the disk SMART data parsing module.
  • Disk SMART data parsing module: this module is a script program implemented in python. As shown in FIG. 5, it consists of two parts, a timer and a data parsing module (that is, a data parsing script). The timer periodically triggers the data parsing script, which parses the disk SMART data uploaded by the disk SMART data acquisition module, automatically filters out the basic information of bad disks, and generates three files: the disk basic information file, the SMART data information file, and the offline disk file. It then passes the files to the model prediction feature construction module and the model training feature construction module for the corresponding feature construction.
  • Model prediction feature construction module: this module is a script program implemented in python, and its structure is shown in FIG. 6. It periodically checks whether the parsed disk SMART data files have been updated, constructs disk failure prediction features from the disk basic information file and the SMART data information file, and generates a disk failure prediction feature file that is passed to the artificial neural network model prediction module for prediction.
  • Model training feature construction module: this module is a script program implemented in python, and its structure is shown in FIG. 7. It periodically checks whether the parsed disk SMART data files have been updated, constructs disk failure training features and labels from the disk basic information file, the SMART data information file, and the offline disk file, and generates a disk failure training feature and label file that is passed to the neural network model training module for prediction model training.
  • Artificial neural network model online prediction module: this module is a script program implemented in python, and its structure is shown in FIG. 8. It mainly implements loading of the failure prediction model, prediction, and result output. After obtaining the disk failure prediction feature file, the module triggers the disk failure prediction script, runs prediction on the data to be predicted, and feeds the prediction result and the disk's basic information back to the disk failure prediction result feedback module.
  • Artificial neural network model online training module: this module is a script program implemented in python, and its structure is shown in FIG. 9. It mainly implements training of the failure prediction model, model output, and output of the update signal. After obtaining the disk failure training feature and label file, the module triggers the disk failure training script; once prediction model training is complete, it saves the model and issues a model update signal.
  • Disk failure prediction result feedback module: this module is a script program implemented in python, and its structure is shown in FIG. 10. It parses the feedback information from the neural network model prediction module and returns it to the monitoring software interface of the computer network storage system.
  • An embodiment of the present invention provides an optional disk failure prediction method.
  • Failure prediction is described in detail below using Western Digital SATA_HDD disks as an example.
  • The disk failure prediction method in this embodiment may include: Step S0: construct a disk failure prediction model for Western Digital SATA_HDD disks. Specifically, the embodiment of the present invention uses a fully connected artificial neural network model to predict disk failures.
  • The model is a 4-layer fully connected artificial neural network with 1 input layer, 2 hidden layers, and 1 output layer; the number of input-layer neurons is determined by the input feature dimension, and the number of output-layer nodes is determined by the output dimension.
  • The remaining configuration of the fully connected artificial neural network covers: the neuron activation function type, the random initialization method for the model weights and bias values, the model solving method, the cost function (a cross-entropy loss function or a quadratic cost function), whether to use regularization, the number of training iterations, the learning rate, and so on.
  • After the model is constructed in S0, it waits for step S8 to train it.
  • Step S1: Accept the disk state data collection instruction.
  • The disk SMART data collection instruction is triggered automatically on a schedule. Specifically, a scheduled task is set in the automatic collection script to periodically trigger the SMART data collection task script for Western Digital SATA_HDD interface disks, with a time granularity of 1 hour.
  • Step S2: Collect the SMART data of the specified Western Digital SATA_HDD disks.
  • A scheduled task triggers the data collection task to collect SMART data from the Western Digital SATA_HDD disks.
  • The SMART data comprise the raw_value and value of 10 IDs in total: underlying data read error rate, spindle spin-up time, remapped sector count, seek error rate, accumulated power-on time, serial port slowdown error count, temperature, current count of sectors to be mapped, offline uncorrectable sector count, and Ultra ATA access check error rate. After the raw SMART data of the Western Digital SATA_HDD disks is obtained, it is uploaded to the server as text.
  • Step S3: Parse the raw SMART data collected from the Western Digital SATA_HDD disks and obtain the disk-related information files.
  • The related information files comprise three parts: the disk basic information file (base.csv), the offline disk basic information file (offline.csv), and the SMART data information file (param.csv).
  • The disk basic information file (base.csv) includes, for all disks: IP information, site information, model information, brand/manufacturer information, interface type information, storage medium type information, collection timestamp, and other information. The offline disk basic information file (offline.csv) includes the same fields for all offline disks (that is, bad disks). The SMART data information file (param.csv) includes the SMART monitoring data of all disks, namely the raw_value and value of 10 IDs in total: underlying data read error rate, spindle spin-up time, remapped sector count, seek error rate, accumulated power-on time, I/O error detection and correction, temperature, current count of sectors to be mapped, offline uncorrectable sector count, and Ultra_ATA access check error rate.
  • Step S4: Determine whether to predict directly on the data; if yes, jump to S5; if no, jump to S6.
  • Step S5: Construct the disk failure prediction features.
  • The embodiment of the present invention selects statistics of the corresponding disk SMART parameters as the disk failure prediction features.
  • The selected statistics are: the variance and mean of the value of the underlying data read error rate parameter (001), and the variance and mean of its raw_value; the variance of the value of the spindle spin-up time parameter (003), and the variance of its raw_value; the variance and mean of the value of the remapped sector count parameter (005), and the variance and mean of its raw_value; the variance and mean of the value of the seek error rate (007), and the variance and mean of its raw_value; the variance and mean of the value of the accumulated power-on time parameter (009), and the variance and mean of its raw_value; the variance and mean of the value of the serial port slowdown error count (183), and the variance and mean of its raw_value; the variance and mean of the value of the temperature (194), and the variance and mean of its raw_value; the variance and mean of the value of the current count of sectors to be mapped (197), and the variance and mean of its raw_value; the variance and mean of the value of the offline uncorrectable sector count (198), and the variance and mean of its raw_value; the variance and mean of the value of the Ultra ATA access check error rate (199), ...
  • Step S6: Construct the disk failure training features and labels.
  • Step S6 also labels the training features, assigning a class to each feature record for training.
  • In addition, the SMOTE technique is used to upsample the training sample features so that good-disk and bad-disk samples are equal in number during training. After the disk training features are obtained, jump to S8.
  • Step S7: Load the neural network model (that is, the disk failure prediction model) and predict the disk life status (that is, disk failure) online.
  • Specifically, the disk prediction feature file generated after S5 is loaded, and the disk failure prediction neural network model is then called to predict on the sample data and return the prediction result.
  • The returned information includes: disk failure classification prediction information, disk IP information, disk site information, disk interface type, disk brand/manufacturer information, and disk model information.
  • Step S8: Perform online incremental training on the current disk life prediction model, and jump to S9 after the prediction model is trained.
  • Step S9: Update the disk failure prediction model. After the model update is completed, wait for the jump to S7.
  • Step S10: Output the disk failure prediction result.
  • An embodiment of the present invention provides another optional disk failure prediction method.
  • Failure prediction is described in detail below, with reference to FIG. 15, using Seagate SATA_HDD interface disks as an example.
  • The disk failure prediction method in this embodiment may include the following steps: Step S0': construct a disk failure prediction model for Seagate SATA_HDD disks. This can be implemented in the same way as S0 in the second embodiment.
  • Step S1': Accept the disk state data collection instruction.
  • Step S2': Collect the SMART data of the specified Seagate SATA_HDD disks.
  • A scheduled task triggers the data collection task to collect SMART data.
  • The SMART data comprise the raw_value and value of 8 IDs in total: underlying data read error rate, spindle spin-up time, remapped sector count, seek error rate, serial port slowdown error count, temperature, offline uncorrectable sector count, and Ultra ATA access check error rate. This is two SMART collection items fewer than for the Western Digital SATA_HDD disks. After the raw SMART data of the Seagate SATA_HDD disks is obtained, it is uploaded to the server as text.
  • Step S3': Parse the raw SMART data collected from the Seagate SATA_HDD disks and obtain the disk-related information files.
  • Step S4': Determine whether to predict directly on the data; if yes, jump to S5'; if no, jump to S6'.
  • Step S5': Construct the disk failure prediction features.
  • The embodiment of the present invention selects statistics of the corresponding disk SMART parameters as the disk failure prediction features.
  • The selected statistics are: the variance and mean of the value of the underlying data read error rate parameter (001), and the variance and mean of its raw_value; the variance of the value of the spindle spin-up time parameter (003), and the variance of its raw_value; the variance and mean of the value of the remapped sector count parameter (005), and the variance and mean of its raw_value; the variance and mean of the value of the seek error rate (007), and the variance and mean of its raw_value; the variance and mean of the value of the serial port slowdown error count (183), and the variance and mean of its raw_value; the variance and mean of the value of the temperature (194), and the variance and mean of its raw_value; the variance and mean of the value of the offline uncorrectable sector count (198), and the variance and mean of its raw_value; and, for the Ultra ATA access check error rate (199), the variance ...
  • Step S6': Construct the disk failure training features and labels.
  • Step S7': Load the neural network model and predict the disk life status online.
  • Step S8': Perform online incremental training on the current disk life prediction model, and jump to S9' after the prediction model is trained.
  • Step S9': Update the disk failure prediction model. After the model update is completed, wait for the jump to S7'.
  • Step S10': Output the disk failure prediction result.
  • The embodiment of the present invention provides a method for directly predicting the disk failure state based on a fully connected artificial neural network model and disk SMART parameters. It implements online training and online prediction for the fully connected artificial neural network prediction model, automates disk failure prediction and warning in computer network storage systems, and improves the efficiency of storage system operation and maintenance.
  • An embodiment of the present invention provides a disk failure prediction device. The device includes a processor and a memory; the memory is used to store a disk failure prediction program, and the processor is used to run the program to implement the steps of the method described in any one of Embodiments 1 to 3.
  • An embodiment of the present invention provides a computer storage medium that stores a disk failure prediction program; the program can be executed by at least one processor to implement the steps of the method described in any one of Embodiments 1 to 3.
  • An embodiment of the present invention provides a computer program product.
  • The computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
  • The computer program includes program instructions which, when executed by a computer, cause the computer to perform the method described in the above aspects.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • The term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • Communication media generally contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

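A disk failure prediction method, device, and storage medium. The prediction method includes: constructing input features based on disk-related information files collected online (S101); and, according to the input features, loading the current disk failure prediction model to predict disk failures, or performing incremental training on the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures (S102).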

Description

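Disk failure prediction method, device, and storage medium
Cross-Reference
The present invention claims priority to Chinese patent application No. 201811632163.6, entitled "磁盘故障的预测方法、设备及存储介质" (Disk failure prediction method, device, and storage medium), filed with the Chinese Patent Office on December 28, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of storage systems, and in particular to a disk failure prediction method, device, and storage medium.
Background
Disks are at the core of data storage, and their normal operation is the foundation for ensuring data validity and security. To keep storage systems running normally, software detection combined with manual inspection is currently in widespread use, but under this management approach the efficiency and effectiveness of storage system operation and maintenance are unsatisfactory. On the one hand, to keep the storage system running around the clock, every data storage site requires substantial manpower for 24-hour on-site maintenance, which places a heavy burden on operation and maintenance personnel. On the other hand, automated detection software and manual inspection can usually only find and handle disks that are online and have already failed; by then, the performance of the computer network system has at best been degraded, and at worst the computer system may be brought down.
At present, some disk failure detection approaches can only detect disk failures and cannot predict them in advance. Others predict disk failures only through simple threshold checks and cannot predict the time of failure with reasonable accuracy. Therefore, in some situations, no effective solution has been given for how to improve proactive prediction of disk failures so as to keep disks operating normally and thereby ensure the reliability and security of storage systems in the network.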
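Summary
A disk failure prediction method in an embodiment of the present invention includes: constructing input features based on disk-related information files collected online; and, according to the input features, loading the current disk failure prediction model to predict disk failures, or performing incremental training on the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures.
A disk failure prediction device in an embodiment of the present invention includes a processor and a memory; the memory is used to store a disk failure prediction program, and the processor is used to run the program to implement the steps of the method described above.
A computer storage medium in an embodiment of the present invention stores a disk failure prediction program; the program can be executed by at least one processor to implement the steps of the method described above.
In a computer program product in an embodiment of the present invention, the computer program product includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method described in the above aspects.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set out below.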
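Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a main flowchart of a disk failure prediction method in an embodiment of the present invention;
FIG. 2 is an architecture diagram of the disk failure prediction model in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the software modules involved in implementing the prediction method in an embodiment of the present invention;
FIG. 4 is an implementation framework diagram of the disk SMART data acquisition module in an embodiment of the present invention;
FIG. 5 is an implementation framework diagram of the disk SMART data parsing module in an embodiment of the present invention;
FIG. 6 is an implementation framework diagram of the disk failure prediction feature construction module in an embodiment of the present invention;
FIG. 7 is an implementation framework diagram of the disk failure training feature construction module in an embodiment of the present invention;
FIG. 8 is a flowchart of the online prediction implementation of the disk failure prediction model in an embodiment of the present invention;
FIG. 9 is a flowchart of the online training implementation of the disk failure prediction model in an embodiment of the present invention;
FIG. 10 is an implementation flowchart of the disk failure prediction result feedback module in an embodiment of the present invention;
FIG. 11 is a flowchart of an optional disk failure prediction method in an embodiment of the present invention;
FIG. 12 is a framework diagram of the neural network model in an embodiment of the present invention;
FIG. 13 is a schematic diagram of the value and raw_value of the remapped sector count parameter in an embodiment of the present invention;
FIG. 14 is a schematic diagram of the value and raw_value of the current count of sectors to be mapped parameter in an embodiment of the present invention;
FIG. 15 is a flowchart of another optional disk failure prediction method in an embodiment of the present invention.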
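Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are used only to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module", "component" or "unit" can be used interchangeably.
Prefixes such as "first" and "second" used to distinguish elements are likewise used only to facilitate the description of the present invention and have no specific meaning in themselves.
Embodiment 1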
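An embodiment of the present invention provides a disk failure prediction method. As shown in FIG. 1, the method includes: S101, constructing input features based on disk-related information files collected online; S102, according to the input features, loading the current disk failure prediction model to predict disk failures, or performing incremental training on the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures.
The disk-related information files can include a disk basic information file, a SMART (Self-Monitoring, Analysis and Reporting Technology) data information file, and an offline disk file. The input features may include disk failure prediction features, used to predict disk failures, and disk failure training features and labels, used to incrementally train the disk failure prediction model. For clarity and brevity, the disk failure prediction model may be referred to in the embodiments as the prediction model or the failure prediction model.
Embodiments of the present invention effectively improve the accuracy of proactive disk failure prediction and provide both online training and online prediction for the disk failure prediction model. This avoids the lag of traditional offline training, makes the prediction model more dynamic and adaptive, substantially improves the operational stability of the network storage system, and reduces its operation and maintenance costs. In addition, the input features for disk prediction are constructed from the disk's full monitoring history rather than only the most recent period (for example, the last 5 weeks or 10 days), which greatly improves the reliability of disk prediction.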
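In some implementations, constructing the input features based on the disk-related information files collected online and, according to the input features, either loading the current disk failure prediction model to predict disk failures or performing incremental training on the current disk failure prediction model includes, in an exemplary embodiment: determining, based on the disk-related information files, whether to predict disk failures directly; if so, constructing disk failure prediction features from the disk-related information files and, according to those features, loading the current disk failure prediction model to predict disk failures; if not, automatically constructing disk failure training features and labels from the disk-related information files and, according to those features and labels, performing incremental training on the current disk failure prediction model.
In the first training pass, the current disk failure prediction model is a pre-constructed fully connected artificial neural network model. In the embodiment of the present invention, as shown in FIG. 2, the disk prediction model is a fully connected artificial neural network model (neural network model for short) rather than a commonly used machine learning model (such as a support vector machine or decision tree). An artificial neural network model has a strong feature learning capability, which can further improve the accuracy of proactive disk failure prediction.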
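As a rough illustration of the fully connected model described above, the following sketch runs a forward pass through a network with one input layer, two hidden layers, and one output layer, the layout the patent gives for its Western Digital embodiment. The hidden-layer sizes, the sigmoid and softmax activations, the uniform random initialization, and all function names are assumptions made for illustration; the patent leaves these choices to configuration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer, randomly initialized."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def dense(inputs, layer):
    """Affine transform of one layer: W x + b."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    # Assumed activation; adequate for the small inputs used in this sketch.
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(n_features, hidden_sizes=(8, 4), n_classes=2, seed=0):
    """Input layer sized by the feature dimension, two hidden layers, one output layer."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden_sizes, n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(model, features):
    """Forward pass: sigmoid on hidden layers, softmax on the output layer."""
    activations = features
    for layer in model[:-1]:
        activations = sigmoid(dense(activations, layer))
    return softmax(dense(activations, model[-1]))


model = build_model(n_features=6)
probs = predict_proba(model, [0.1, 0.0, 0.3, 0.2, 0.5, 0.4])
print(probs)  # two class probabilities (healthy vs. failing is an assumed labeling)
```

The model here is untrained, so the probabilities carry no meaning; the point is only how the input dimension fixes the first layer and the output dimension fixes the last.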
为了解决磁盘故障预测中模型训练数据样本不平衡问题,避免了预测模型无法对数量少的类别样本识别的问题,提高模型的准确率,获得更好的效果;在一些实施方式中,所述根据所述磁盘故障训练特征和标签,对当前的磁盘故障预测模型进行增量训练之前,在一示例性实施例中包括:通过合成少数类过采样技术SMOTE对所述磁盘故障训练特征进行上采样处理。
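As a rough illustration of the up-sampling step, the following is a minimal standard-library sketch of SMOTE-style interpolation, not the embodiment's actual implementation. It assumes feature vectors are plain lists of floats; the function name `smote_oversample`, the neighbor count `k`, and the sample data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbors (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: math.dist(base, s))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical data: 6 good-disk samples and 2 bad-disk samples.
healthy = [[0.1 * i, 1.0] for i in range(6)]
failed = [[5.0, 9.0], [6.0, 8.0]]
failed_balanced = failed + smote_oversample(failed, len(healthy) - len(failed), k=1)
```

Each synthetic sample lies on the line segment between two real bad-disk samples, so the minority class grows without simply duplicating existing rows.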
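To further improve the accuracy of proactive disk failure prediction, in some implementations the input features of the prediction model are constructed by analyzing the variance and mean of specific disk SMART data, rather than by using the raw SMART data and thresholds directly as model input. That is, constructing the input features from the disk-related information files collected online includes: obtaining multi-dimensional input features from the variance and mean of each parameter in the disk-related information files.
Obtaining the multi-dimensional input features from the variance and mean of each parameter in the disk-related information files may include: obtaining the multi-dimensional input features from the variance and mean of the raw value (raw_value) and the converted value (value) of each parameter. That is, the SMART data items (parameters) in the disk-related information files include, but are not limited to, the raw_value and value SMART data of: raw read error rate, spin-up time, reallocated sector count, seek error rate, power-on hours, I/O error detection and correction, temperature, current pending sector count, offline uncorrectable sector count, and Ultra ATA CRC error rate.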
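The statistic-based feature construction can be sketched as follows. This is an illustrative example under stated assumptions: the `history` layout (attribute ID mapped to a list of `(value, raw_value)` readings), the function name `build_features`, and the use of population variance are all hypothetical, since the source does not say which variance estimator is used.

```python
from statistics import mean, pvariance


def build_features(history):
    """history maps a SMART attribute ID to a list of (value, raw_value)
    readings; returns variance and mean of both columns per attribute."""
    features = []
    for attr_id in sorted(history):
        values = [v for v, _ in history[attr_id]]
        raws = [r for _, r in history[attr_id]]
        features += [pvariance(values), mean(values), pvariance(raws), mean(raws)]
    return features


# Hypothetical readings for attribute 5 (reallocated sector count)
# and attribute 194 (temperature).
history = {
    5: [(100, 0), (100, 0), (98, 16), (98, 16)],
    194: [(70, 30), (68, 32), (70, 30), (68, 32)],
}
features = build_features(history)
```

Because the statistics are computed over all readings in `history`, the same function covers the whole monitored lifetime of a disk rather than a recent window.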
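Of course, to reduce the complexity of prediction and incremental training and to improve efficiency while still improving the accuracy of proactive disk failure prediction, in some implementations principal component analysis (PCA) is applied to the multi-dimensional input features and the first N dimensions of the principal components are kept, where N is greater than 0.
In some implementations, determining, according to the disk-related information files, whether to predict disk failures directly may also include: when the disk-related information files are the disk basic information file and the SMART data information file, determining that disk failures are to be predicted directly; when the disk-related information files are the disk basic information file, the SMART data information file, and the offline disk file, determining that disk failures are not to be predicted directly.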
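The decision rule above can be sketched in a few lines. This is a minimal sketch, assuming the three files are identified by the names base.csv, param.csv, and offline.csv used in the later Western Digital example; the function name `should_predict_directly` is hypothetical.

```python
def should_predict_directly(available_files):
    """Return True for direct prediction, False for incremental training."""
    files = set(available_files)
    required = {"base.csv", "param.csv"}
    if not required <= files:
        raise ValueError("basic information and SMART data files are required")
    # An offline disk file means labeled bad-disk data exists, so train instead.
    return "offline.csv" not in files
```

For example, `should_predict_directly(["base.csv", "param.csv"])` selects the prediction branch, while adding `"offline.csv"` to the list selects the training branch.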
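In some implementations, before the input features are constructed from the disk-related information files collected online, the method includes: obtaining the SMART data of the disk collected online; and obtaining the disk-related information files from the SMART data.
For example, the SMART data of the disks is collected periodically and parsed to produce the disk-related information files. In the embodiment of the present invention, the current health of a disk is analyzed through its SMART data, statistics are computed over the SMART data that is highly correlated with failures, and the input features of the disk failure prediction model are constructed.
The implementations of the embodiment are described in detail below in the form of software modules; that is, in a specific implementation the corresponding steps may be carried out by software modules. As shown in Figure 3, the embodiment may include the following software modules: a disk SMART data collection module, a disk SMART data parsing module, a model prediction feature construction module, a model training feature construction module, a neural network model online prediction module, a neural network model online training module, and a disk failure prediction result feedback module. The specific steps implemented by each software module are as follows: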
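Disk SMART data collection module: this module is an automated disk data collection script implemented in Python (a computer programming language). As shown in Figure 4, it consists of a timer, a disk SMART data collection script, and a data file upload script. The timer periodically triggers the collection script to collect SMART data; once collection is complete, the data file upload script uploads the data in .txt form to the disk SMART data parsing module.
Disk SMART data parsing module: this module is a script program implemented in Python. As shown in Figure 5, it consists of two parts, a timer and a data parsing module (that is, a data parsing script). The timer periodically triggers the data parsing script to parse the disk SMART data uploaded by the disk SMART data collection module, automatically filter out the basic information of bad disks, generate three files (the disk basic information file, the SMART data information file, and the offline disk file), and pass the files to the model prediction feature construction module and the model training feature construction module for the corresponding feature construction.
Model prediction feature construction module: this module is a script program implemented in Python, structured as shown in Figure 6. It periodically checks whether the parsed disk SMART data files have been updated, constructs disk failure prediction features from the disk basic information file and the SMART data information file, and generates a disk failure prediction feature file that is passed to the artificial neural network model prediction module for prediction.
Model training feature construction module: this module is a script program implemented in Python, structured as shown in Figure 7. It periodically checks whether the parsed disk SMART data files have been updated, constructs disk failure training features and labels from the disk basic information file, the SMART data information file, and the offline disk file, and generates a disk failure training feature and label file that is passed to the neural network model training module for prediction model training.
Artificial neural network model online prediction module: this module is a script program implemented in Python, structured as shown in Figure 8. It mainly implements loading of the failure prediction model, prediction, and result output. After obtaining the disk failure prediction feature file, the module triggers the disk failure prediction script, runs prediction on the data, and returns the prediction results and the basic information of the disks to the disk failure prediction result feedback module.
Artificial neural network model online training module: this module is a script program implemented in Python, structured as shown in Figure 9. It mainly implements training of the failure prediction model, model output, and output of the update signal. After obtaining the disk failure training feature and label file, the module triggers the disk failure training script; once prediction model training is complete, it saves the model and issues a model update signal.
Disk failure prediction result feedback module: this module is a script program implemented in Python, structured as shown in Figure 10. It parses the information returned by the neural network model prediction module and reports it to the monitoring software interface of the computer network storage system.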
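Embodiment 2
Based on the implementations of Embodiment 1, an embodiment of the present invention provides an optional disk failure prediction method, described in detail below using SATA_HDD disks of the Western Digital brand. As shown in Figure 11, the disk failure prediction method in this embodiment may include: Step S0: construct the disk failure prediction model for Western Digital SATA_HDD disks. Specifically, this embodiment uses a fully connected artificial neural network model to predict disk failures; the design of the model is shown in Figure 12. The model is a 4-layer fully connected artificial neural network with 1 input layer, 2 hidden layers, and 1 output layer; the number of neuron nodes in the input layer is determined by the input feature dimension, and the number of nodes in the output layer is determined by the output dimension.
The remaining configuration of the fully connected artificial neural network includes: the choice of neuron activation function type; initialization of the model weights and bias values, which uses random-number initialization; the choice of the model solving method; the construction of the cost function (a cross-entropy loss function or a quadratic cost function); the choice of regularization method; the number of training iterations; the learning rate; and so on.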
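A 4-layer fully connected network of this shape can be sketched with the standard library alone. The sketch below is illustrative and makes several assumptions the source leaves open: a sigmoid activation, hidden layer sizes of 16 and 8, a 30-dimensional input, a single output neuron, and uniform random initialization in [-0.5, 0.5]. All function names are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialize weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, features):
    """Propagate one feature vector through every layer in turn."""
    activations = features
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def cross_entropy(predicted, label):
    """Binary cross-entropy for a single output neuron."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(predicted + eps)
             + (1 - label) * math.log(1 - predicted + eps))


rng = random.Random(0)
sizes = [30, 16, 8, 1]  # input layer, two hidden layers, output layer (assumed sizes)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
score = forward(layers, [0.0] * 30)[0]
```

Here `score` is the untrained network's output for an all-zero feature vector; training would adjust `layers` to reduce `cross_entropy(score, label)` over labeled samples.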
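After the model in S0 has been constructed, it waits for step S8 to train it.
Step S1: receive the disk status data collection instruction.
In this embodiment, the disk SMART data collection instruction is triggered automatically and periodically. Specifically, a scheduled task is set in the automated collection script to periodically trigger the SMART data collection script for Western Digital SATA_HDD interface disks, at a time granularity of 1 hour.
Step S2: collect the specified disk SMART data of the Western Digital SATA_HDD disks.
For example, the scheduled task triggers the data collection task to collect SMART data from the Western Digital SATA_HDD disks. The SMART data consists of the raw_value and value data of 10 IDs in total: raw read error rate, spin-up time, reallocated sector count, seek error rate, power-on hours, SATA downshift error count, temperature, current pending sector count, offline uncorrectable sector count, and Ultra ATA CRC error rate. After the raw SMART data of the Western Digital SATA_HDD disks is obtained, it is uploaded to the server as text.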
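Step S3: parse the raw SMART data collected from the Western Digital SATA_HDD disks and obtain the disk-related information files.
For example, a scheduled task is set in S3 to periodically parse the SMART monitoring data of Western Digital SATA_HDD disks uploaded by the front end, at a time granularity of one day; that is, the SMART monitoring data is processed and the disk lifetime status is evaluated once a day. The related information files consist of three parts: the disk basic information file (base.csv), the offline disk basic information file (offline.csv), and the SMART data information file (param.csv).
The disk basic information file (base.csv) includes, for all disks: IP information, site information, model information, brand/vendor information, interface type information, storage medium type information, collection timestamp, and so on. The offline disk basic information file (offline.csv) includes, for all offline disks (that is, bad disks): IP information, site information, model information, brand/vendor information, interface type information, storage medium type information, collection timestamp, and so on. The SMART data information file (param.csv) includes the SMART monitoring data of all disks, namely the raw_value and value data of 10 IDs in total: raw read error rate, spin-up time, reallocated sector count, seek error rate, power-on hours, I/O error detection and correction, temperature, current pending sector count, offline uncorrectable sector count, and Ultra ATA CRC error rate.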
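Step S4: determine whether the data is to be used for direct prediction. If so, go to S5; if not, go to S6.
Step S5: construct the disk failure prediction features.
For example, based on the characteristics of the disk SMART data in step S3 and on analysis of the SMART parameters shown in Figures 13 and 14, where the left plot in each figure corresponds to the value and the right plot to the raw_value, the embodiment of the present invention selects statistics of the relevant disk SMART parameters as the features for disk failure prediction. The specific selection is as follows: for the raw read error rate parameter (001), the variance and mean of its value and the variance and mean of its raw_value; for the spin-up time parameter (003), the variance of its value and the variance of its raw_value; for the reallocated sector count parameter (005), the variance and mean of its value and the variance and mean of its raw_value; for the seek error rate (007), the variance and mean of its value and the variance and mean of its raw_value; for the power-on hours parameter (009), the variance and mean of its value and the variance and mean of its raw_value; for the SATA downshift error count (183), the variance and mean of its value and the variance and mean of its raw_value; for the temperature (194), the variance and mean of its value and the variance and mean of its raw_value; for the current pending sector count (197), the variance and mean of its value and the variance and mean of its raw_value; for the offline uncorrectable sector count (198), the variance and mean of its value and the variance and mean of its raw_value; for the Ultra ATA CRC error rate (199), the variance and mean of its value and the variance and mean of its raw_value. This gives 38 feature dimensions in total. PCA is then applied to the constructed features and the first 30 dimensions of the principal components are kept. After the disk failure prediction features are obtained, go to S7.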
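The count of 38 feature dimensions follows from the list above: nine attributes contribute four statistics each and spin-up time (003) contributes two. The short sketch below enumerates them; the naming scheme `<id>_<column>_<stat>` and the function name `feature_names` are hypothetical and used only for illustration.

```python
# SMART attribute IDs listed in this example.
ATTRIBUTE_IDS = ["001", "003", "005", "007", "009", "183", "194", "197", "198", "199"]
# Spin-up time (003) contributes variances only; all others add means as well.
VARIANCE_ONLY = {"003"}


def feature_names(attribute_ids=ATTRIBUTE_IDS):
    """List one name per feature dimension, in attribute order."""
    names = []
    for attr_id in attribute_ids:
        stats = ["var"] if attr_id in VARIANCE_ONLY else ["var", "mean"]
        for column in ("value", "raw_value"):
            for stat in stats:
                names.append(f"{attr_id}_{column}_{stat}")
    return names


names = feature_names()
```

`len(names)` is 9 × 4 + 2 = 38, matching the dimension stated for the Western Digital example before PCA reduces it to 30.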
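Step S6: construct the disk failure training features and labels.
For example, the feature analysis and construction in this step are the same as in S5 and are not repeated here. This step differs from S5 in two respects. First, step S6 labels the training features, classifying each feature record for training. Second, because good disks and bad disks are unevenly distributed, SMOTE is used to up-sample the training sample features so that the numbers of good-disk and bad-disk samples in training are equal. After the disk training features are obtained, go to S8.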
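One plausible way to derive the labels, sketched here under assumptions, is to mark every disk listed in the offline disk file as bad and all others as good. The column name `ip` used as the disk key, the sample file contents, and the function name `label_disks` are hypothetical; a real deployment would likely need a key that identifies an individual disk rather than a host.

```python
import csv
import io


def label_disks(base_csv, offline_csv, key="ip"):
    """Label every disk in the basic information file: 1 if it also appears
    in the offline (bad) disk file, otherwise 0."""
    bad = {row[key] for row in csv.DictReader(io.StringIO(offline_csv))}
    return {
        row[key]: int(row[key] in bad)
        for row in csv.DictReader(io.StringIO(base_csv))
    }


# Hypothetical file contents; real files carry more columns.
base_csv = "ip,model\n10.0.0.1,WD-A\n10.0.0.2,WD-A\n10.0.0.3,WD-B\n"
offline_csv = "ip,model\n10.0.0.2,WD-A\n"
labels = label_disks(base_csv, offline_csv)
```

The resulting mapping pairs each disk with a 0/1 label that can be joined to its feature vector before up-sampling.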
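Step S7: load the neural network model (that is, the disk failure prediction model) and predict the disk lifetime status (that is, disk failure) online.
For example, after the disk prediction feature file is generated following S5, the disk failure prediction neural network model is invoked to predict on the sample data to be predicted, and the prediction result is returned. The returned information includes: the disk failure classification prediction, the disk's IP information, site information, interface type, brand/vendor information, model information, and so on.
Step S8: perform online incremental training of the current disk lifetime prediction model; after training is complete, go to S9.
Step S9: update the disk failure prediction model; after the update is complete, wait to go to S7.
Step S10: output the disk failure prediction result.
Specifically, the prediction results and information generated in S7 are parsed to obtain the disk's failure classification prediction, IP information, site information, interface type, brand/vendor information, and model information, which are displayed in the interactive terminal of the disk monitoring software.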
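Embodiment 3
Based on the implementations of Embodiment 1, an embodiment of the present invention provides an optional disk failure prediction method, described in detail below with reference to the drawings using SATA_HDD interface disks of the Seagate brand. As shown in Figure 15, the disk failure prediction method in this embodiment may include: Step S0’: construct the disk failure prediction model for the Seagate SATA_HDD disk embodiment. This may be implemented as in S0 of Embodiment 2.
Step S1’: receive the disk status data collection instruction.
Step S2’: collect the specified disk SMART data of the Seagate SATA_HDD disks.
For example, the scheduled task triggers the data collection task to collect SMART data. The SMART data consists of the raw_value and value data of 8 IDs in total: raw read error rate, spin-up time, reallocated sector count, seek error rate, SATA downshift error count, temperature, offline uncorrectable sector count, and Ultra ATA CRC error rate, which is two fewer SMART collection items than for Western Digital SATA_HDD disks. After the raw SMART data of the Seagate SATA_HDD disks is obtained, it is uploaded to the server as text.
Step S3’: parse the raw SMART data collected from the Seagate SATA_HDD disks and obtain the disk-related information files.
Step S4’: determine whether the data is to be used for prediction. If so, go to S5’; if not, go to S6’.
Step S5’: construct the disk failure prediction features.
For example, based on the characteristics of the disk SMART data in step S3’ and on analysis of the SMART parameters shown in Figures 13 and 14, the embodiment of the present invention selects statistics of the relevant disk SMART parameters as the features for disk failure prediction. The specific selection is as follows: for the raw read error rate parameter (001), the variance and mean of its value and the variance and mean of its raw_value; for the spin-up time parameter (003), the variance of its value and the variance of its raw_value; for the reallocated sector count parameter (005), the variance and mean of its value and the variance and mean of its raw_value; for the seek error rate (007), the variance and mean of its value and the variance and mean of its raw_value; for the SATA downshift error count (183), the variance and mean of its value and the variance and mean of its raw_value; for the temperature (194), the variance and mean of its value and the variance and mean of its raw_value; for the offline uncorrectable sector count (198), the variance and mean of its value and the variance and mean of its raw_value; for the Ultra ATA CRC error rate (199), the variance and mean of its value and the variance and mean of its raw_value. This gives 38 feature dimensions in total. PCA is then applied to the constructed features and the first 30 dimensions of the principal components are kept. After the disk failure prediction features are obtained, go to S7’.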
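Step S6’: construct the disk failure training features and labels.
Step S7’: load the neural network model and predict the disk lifetime status online.
Step S8’: perform online incremental training of the current disk lifetime prediction model; after training is complete, go to S9’.
Step S9’: update the disk failure prediction model; after the update is complete, wait to go to S7’.
Step S10’: output the disk failure prediction result.
For the specific implementation of steps S0’-S10’ above, refer to S0-S10 in Embodiment 2.
The embodiment of the present invention provides a method for directly predicting disk failure status based on a fully connected artificial neural network model and SMART disk parameters, implements online training and prediction for the fully connected artificial neural network prediction model, automates disk failure prediction and warning in computer network storage systems, and improves the efficiency of storage system operation and maintenance.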
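Embodiment 4
An embodiment of the present invention provides a disk failure prediction device. The device includes a processor and a memory; the memory is configured to store a disk failure prediction program, and the processor is configured to run the program to implement the steps of the method described in any implementation of Embodiments 1 to 3.
Embodiment 5
An embodiment of the present invention provides a computer storage medium. The medium stores a disk failure prediction program; the program can be executed by at least one processor to implement the steps of the method described in any implementation of Embodiments 1 to 3.
For the specific implementation of Embodiments 4 and 5, refer to Embodiments 1 to 3; they have the corresponding technical effects.
Embodiment 6
An embodiment of the present invention provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to perform the method described in the above aspects.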
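It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific implementations described above, which are merely illustrative and not restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many further forms without departing from the spirit of the present invention and the scope protected by the claims, and all of these fall within the protection of the present invention.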

Claims (10)

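  1. A disk failure prediction method, wherein the method comprises:
    constructing input features from disk-related information files obtained through online collection;
    according to the input features, loading the current disk failure prediction model to predict disk failures, or incrementally training the current disk failure prediction model to obtain an updated disk failure prediction model and loading the updated disk failure prediction model to predict disk failures.
  2. The method according to claim 1, wherein constructing the input features from the disk-related information files obtained through online collection, and, according to the input features, loading the current disk failure prediction model to predict disk failures or incrementally training the current disk failure prediction model, comprises:
    determining, according to the disk-related information files, whether to predict disk failures directly;
    if so, constructing disk failure prediction features from the disk-related information files, and loading the current disk failure prediction model to predict disk failures according to the disk failure prediction features;
    if not, constructing disk failure training features and labels from the disk-related information files, and incrementally training the current disk failure prediction model according to the disk failure training features and labels.
  3. The method according to claim 2, wherein, before the current disk failure prediction model is incrementally trained according to the disk failure training features and labels, the method comprises:
    up-sampling the disk failure training features using the Synthetic Minority Over-sampling Technique (SMOTE).
  4. The method according to claim 2, wherein the current disk failure prediction model in the first training pass is a pre-constructed fully connected artificial neural network model.
  5. The method according to claim 2, wherein, before the input features are constructed from the disk-related information files obtained through online collection, the method comprises:
    obtaining Self-Monitoring, Analysis and Reporting Technology (SMART) data of the disk collected online;
    obtaining the disk-related information files from the SMART data.
  6. The method according to claim 5, wherein determining, according to the disk-related information files, whether to predict disk failures directly comprises:
    when the disk-related information files are the disk basic information file and the SMART data information file, determining that disk failures are to be predicted directly;
    when the disk-related information files are the disk basic information file, the SMART data information file, and the offline disk file, determining that disk failures are not to be predicted directly.
  7. The method according to any one of claims 1 to 6, wherein constructing the input features from the disk-related information files obtained through online collection comprises:
    obtaining multi-dimensional input features from the variance and mean of each parameter in the disk-related information files.
  8. The method according to claim 7, wherein obtaining the multi-dimensional input features from the variance and mean of each parameter in the disk-related information files comprises:
    obtaining the multi-dimensional input features from the variance and mean of the raw value and the converted value of each parameter.
  9. A disk failure prediction device, wherein the device comprises a processor and a memory; the memory is configured to store a disk failure prediction program, and the processor is configured to run the program to implement the steps of the method according to any one of claims 1 to 8.
  10. A computer storage medium, wherein the medium stores a disk failure prediction program; the program can be executed by at least one processor to implement the steps of the method according to any one of claims 1 to 8.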
PCT/CN2019/113115 2018-12-28 2019-10-24 Disk failure prediction method, device, and storage medium WO2020134442A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811632163.6A CN109739739B (zh) 2018-12-28 2018-12-28 Disk failure prediction method, device, and storage medium
CN201811632163.6 2018-12-28

Publications (1)

Publication Number Publication Date
WO2020134442A1 true WO2020134442A1 (zh) 2020-07-02

Family

ID=66362137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/113115 WO2020134442A1 (zh) 2018-12-28 2019-10-24 Disk failure prediction method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN109739739B (zh)
WO (1) WO2020134442A1 (zh)

Also Published As

Publication number Publication date
CN109739739A (zh) 2019-05-10
CN109739739B (zh) 2020-10-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19906184

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19906184

Country of ref document: EP

Kind code of ref document: A1