CN111475319A - Hard disk screening method and device based on machine learning - Google Patents

Hard disk screening method and device based on machine learning

Info

Publication number
CN111475319A
Authority
CN
China
Prior art keywords
hard disk
data
model
training
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010154855.5A
Other languages
Chinese (zh)
Inventor
汪荣义
王思明
陈宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010154855.5A
Publication of CN111475319A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a hard disk screening method and device based on machine learning. The method comprises: creating a BP neural network model and training it; deploying the trained machine learning model online; extracting hard disk performance parameters and user information data from a data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively; and processing the extracted data and feeding it into the machine learning model to predict the reliability of the hard disk. The model is trained on historical hard disk performance parameters, user service scenario data and user machine room environment data. Once deployed online, it predicts hard disk reliability from the hard disk performance parameters combined with the user service scenario and user machine room environment data, which enables targeted selection of hard disks. This improves the reliability of server hard disks in actual service and therefore the safety and reliability of the user's service data.

Description

Hard disk screening method and device based on machine learning
Technical Field
The invention relates to the technical field of hard disk testing, and in particular to a hard disk screening method and device based on machine learning.
Background
In the era of Internet big data, servers are widely used in many fields as businesses grow, and the requirements on server reliability keep rising. The hard disk is one of a server's main data storage media and holds massive amounts of data, so ensuring its reliability has become an important problem. The traditional server production flow uses several test procedures to guarantee the basic performance of a hard disk, but these tests cannot cover every user service scenario and user machine room environment in which a server's hard disks will be used, so the reliability of the hard disk in the field becomes uncontrollable.
Disclosure of Invention
The invention provides a hard disk screening method and device based on machine learning. It addresses the following problem: in the traditional server production flow, several test procedures guarantee the basic performance of a hard disk, but they cannot cover every user service scenario and user machine room environment in which a server's hard disks will be used, so the reliability of the hard disk becomes uncontrollable.
The technical scheme of the invention is as follows:
In one aspect, the technical solution of the invention provides a hard disk screening method based on machine learning, comprising the following steps:
creating a BP neural network model and training the created model;
deploying the trained model online, extracting hard disk performance parameters and user information data from a data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk;
and screening the hard disk according to the prediction result.
Further, the steps of creating a BP neural network model and training the created model include:
creating a BP neural network model, and training and adjusting the model through a training data set;
and verifying and evaluating the built model by using the test data set.
Further, before the step of creating the BP neural network model, training and tuning the model by using the training data set, the method further includes:
acquiring historical hard disk performance parameters and user information data;
carrying out data preprocessing on the acquired data;
and performing data characteristic engineering processing on the preprocessed data to generate a training data set and a test data set.
Further, the data preprocessing comprises: missing-value handling for data variables, quantification of categorical variables, and text quantification and serialization.
Further, the step of performing data feature engineering processing on the preprocessed data comprises:
obtaining a historical hard disk reliability classification, in which each hard disk is classified as reliable or unreliable according to how its actual failure time compares with its predicted failure time, and the classification is quantized as 1 or 0.
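The label quantization described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source only says that actual and predicted failure times are compared, so the direction of the comparison (reliable when the disk lasted at least as long as predicted), the function name `reliability_label`, the hour unit and the sample records are all assumptions.

```python
def reliability_label(actual_failure_hours: float, predicted_failure_hours: float) -> int:
    """Return 1 (reliable) or 0 (unreliable) for one historical hard disk.

    Assumption: a disk counts as reliable when it lasted at least as long as
    predicted. The source only states that the two times are compared.
    """
    return 1 if actual_failure_hours >= predicted_failure_hours else 0


# Hypothetical historical records: (serial number, actual hours, predicted hours).
history = [
    ("SN-A", 42000.0, 40000.0),
    ("SN-B", 15000.0, 40000.0),
    ("SN-C", 40000.0, 40000.0),
]

# One quantized label per disk, usable as the training target.
labels = {sn: reliability_label(actual, predicted) for sn, actual, predicted in history}
print(labels)
```

The resulting 0/1 values serve as the target column of the training and test data sets.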
Further, the user information data includes user service scenario data and user machine room environment data;
Further, the step of acquiring the historical hard disk performance parameters and user information data further comprises:
processing the acquired hard disk performance parameter data, user service scenario data and user machine room environment data through data ETL and storing them in a data warehouse.
The method uses machine learning to establish the correlation between the user service scenario, the user machine room environment and the hard disk performance parameters. The machine learning model is trained on historical data, then predicts reliability from the current user service scenario, user machine room environment and hard disk performance parameters, and the hard disks are screened according to the model's prediction.
Furthermore, to keep the prediction accurate after the trained model is deployed online, the model is retrained periodically: historical data and data warehouse data are integrated at regular intervals, and the machine learning model is retrained, verified and evaluated, and then redeployed.
In another aspect, the technical solution of the invention provides a hard disk screening device based on machine learning, which comprises a model training module, a reliability prediction module and a prediction result analysis module;
the model training module is used for creating a BP neural network model and training the created model;
the reliability prediction module is used for deploying the trained model online, extracting hard disk performance parameters and user information data from the data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk;
and the prediction result analysis module is used for screening the hard disk according to the prediction result.
Furthermore, the model training module comprises a creating unit, a training unit and a verification and evaluation unit;
the creating unit is used for creating a BP neural network model;
the training unit is used for training and optimizing the model through a training data set;
and the verification evaluation unit is used for verifying and evaluating the built model by using the test data set.
Further, the apparatus further comprises: the system comprises a data acquisition module, a data preprocessing module and a data characteristic processing module;
the data acquisition module is used for acquiring historical hard disk performance parameters and user information data;
the data preprocessing module is used for preprocessing the acquired data;
and the data characteristic processing module is used for performing data feature engineering processing on the preprocessed data to generate a training data set and a test data set. Before a hard disk goes online, its reliability is predicted by the machine learning model from the hard disk performance parameters combined with the user service scenario and user machine room environment data. After the trained model is deployed online, prediction accuracy is maintained through periodic retraining.
According to the above technical solution, the invention has the following advantages: the machine learning model is trained on historical hard disk performance parameters, user service scenario data and user machine room environment data. After the trained model is deployed online, it predicts the reliability of a hard disk from the performance parameters provided by the hard disk manufacturer combined with the user service scenario and user machine room environment data, which enables targeted selection. This improves the reliability of server hard disks in actual service and further improves the safety and reliability of the user's service data.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As shown in fig. 1, the technical solution of the present invention provides a hard disk screening method based on machine learning, which includes the following steps:
S11: creating a BP neural network model and training the created model;
S12: deploying the trained machine learning model online, extracting hard disk performance parameters and user information data from a data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk; the user information data comprises user service scenario data and user machine room environment data;
s13: and screening the hard disk according to the prediction result.
The method uses machine learning to establish the correlation between the user service scenario, the user machine room environment and the hard disk performance parameters. The machine learning model is trained on historical data, then predicts reliability from the current user service scenario, user machine room environment and hard disk performance parameters, and the hard disks are screened according to the model's prediction.
Example two
The technical scheme of the invention provides a hard disk screening method based on machine learning, which comprises the following steps:
S11: creating a BP neural network model and training the created model. Specifically: creating a BP neural network model, and training and tuning the model with a training data set; then verifying and evaluating the built model with the test data set.
It should be noted that, before the step of creating the BP neural network model and training and tuning it with the training data set, the method further includes: acquiring historical hard disk performance parameters and user information data, where the user information data comprises user service scenario data and user machine room environment data; preprocessing the acquired data, where the preprocessing comprises missing-value handling for data variables, quantification of categorical variables, and text quantification and serialization; and performing data feature engineering processing on the preprocessed data to generate a training data set and a test data set. A historical hard disk reliability classification is also obtained: each hard disk is classified as reliable or unreliable according to how its actual failure time compares with its predicted failure time, and the classification is quantized as 1 or 0.
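The three preprocessing operations named above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the field names (`seek_ms`, `room_temp_c`, `scenario`, `note`), the sample records, mean imputation for missing values, integer codes for categories, and whitespace tokenization for text are all hypothetical choices; the source does not specify how each operation is carried out.

```python
from statistics import mean

# Hypothetical raw records; field names are illustrative, not from the source.
records = [
    {"seek_ms": 8.5, "room_temp_c": 24.0, "scenario": "database", "note": "high write load"},
    {"seek_ms": None, "room_temp_c": 30.0, "scenario": "backup", "note": "cold storage"},
    {"seek_ms": 9.5, "room_temp_c": None, "scenario": "database", "note": "high read load"},
]


def fill_missing(rows, field):
    """Missing-value handling: replace None with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fallback = mean(observed)
    for r in rows:
        if r[field] is None:
            r[field] = fallback


def encode_categories(rows, field):
    """Category quantification: map each distinct category to an integer code."""
    codes = {}
    for r in rows:
        r[field] = codes.setdefault(r[field], len(codes))
    return codes


def serialize_text(rows, field):
    """Text serialization: turn each text into a list of integer token ids."""
    vocab = {}
    for r in rows:
        r[field] = [vocab.setdefault(tok, len(vocab) + 1) for tok in r[field].split()]
    return vocab


for numeric_field in ("seek_ms", "room_temp_c"):
    fill_missing(records, numeric_field)
scenario_codes = encode_categories(records, "scenario")
vocab = serialize_text(records, "note")
print(records)
```

After these steps every field is numeric, which is what the feature engineering step needs in order to build the training and test data sets.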
The BP neural network (Back-Propagation Network), also called the back-propagation neural network, is trained on sample data: the weights and thresholds of the network are corrected repeatedly so that the error function descends along the negative gradient direction and the output approaches the expected output. It is one of the most widely used neural network models.
The BP network consists of an input layer, one or more hidden layers, and an output layer. The network uses an S-shaped (sigmoid) transfer function (formula image BDA0002403713550000071 in the original publication; the standard form is $f(x) = \frac{1}{1 + e^{-x}}$). By back-propagating the error function (formula image BDA0002403713550000072 in the original publication; the standard form is $E = \frac{1}{2}\sum_i (t_i - O_i)^2$, where $t_i$ is the expected output and $O_i$ is the output computed by the network), the network weights and thresholds are adjusted continuously until the error function $E$ is minimized.
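As a worked illustration of how back-propagating $E$ adjusts a weight and a threshold, the derivation below covers one output neuron. It assumes the standard logistic transfer function and the half-sum-of-squares error; the symbols $w_{ij}$ (weight from hidden unit $j$ to output $i$), $\theta_i$ (threshold), $h_j$ (hidden activation), $net_i$ and the learning rate $\eta$ are notation introduced here and do not appear in the source.

```latex
% Assumed forms: f(x) = 1 / (1 + e^{-x}),  E = (1/2) \sum_i (t_i - O_i)^2
\begin{align*}
net_i &= \sum_j w_{ij} h_j - \theta_i, \qquad O_i = f(net_i), \qquad f'(x) = f(x)\bigl(1 - f(x)\bigr) \\
\frac{\partial E}{\partial w_{ij}}
  &= \frac{\partial E}{\partial O_i}\,\frac{\partial O_i}{\partial net_i}\,\frac{\partial net_i}{\partial w_{ij}}
   = -(t_i - O_i)\, O_i (1 - O_i)\, h_j \\
\frac{\partial E}{\partial \theta_i}
  &= -(t_i - O_i)\, O_i (1 - O_i) \cdot (-1)
   = (t_i - O_i)\, O_i (1 - O_i) \\
\Delta w_{ij} &= -\eta\, \frac{\partial E}{\partial w_{ij}} = \eta\,(t_i - O_i)\, O_i (1 - O_i)\, h_j \\
\Delta \theta_i &= -\eta\, \frac{\partial E}{\partial \theta_i} = -\eta\,(t_i - O_i)\, O_i (1 - O_i)
\end{align*}
```

Both updates move against the gradient, which is the "descent along the negative gradient direction" the text refers to; the same chain rule, applied one layer further back, gives the hidden-layer updates.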
The Neural Network Toolbox in MATLAB is used for network training in this prediction. The prediction model is implemented as follows:
The training sample data is normalized and then fed into the network. The excitation (transfer) functions of the hidden layer and the output layer are set to tansig and logsig respectively, the network training function is traingdx, the network performance function is mse, and the number of hidden-layer neurons is initially set to 6. The network parameters are then set: the number of iterations (epochs) is 5000, the expected error (goal) is 0.00000001 (1e-8), and the learning rate is 0.01. After the parameters are set, network training starts.
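For readers without MATLAB, the configuration above can be approximated in plain Python. The sketch below is a stand-in, not the toolbox: it uses a tanh hidden layer and a logistic output (the shapes of tansig and logsig), mean squared error, 6 hidden neurons, 5000 epochs, a 1e-8 goal and a 0.01 learning rate as stated in the text. The training loop is a simplified gradient descent with momentum and an adaptive learning rate in the spirit of traingdx; the momentum value, the rate increase and decrease factors, the 1.04 error tolerance, the weight initialization and the toy data are assumptions.

```python
import copy
import math
import random


def minmax_normalize(rows):
    """Scale each feature column to [0, 1] before it enters the network."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def logsig(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def forward(net, x):
    """tanh hidden layer (tansig-like) followed by a logistic output (logsig-like)."""
    h = [math.tanh(b + sum(w * v for w, v in zip(ws, x)))
         for ws, b in zip(net["w1"], net["b1"])]
    return h, logsig(net["b2"][0] + sum(w * v for w, v in zip(net["w2"], h)))


def mse(net, xs, ts):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


def gradients(net, xs, ts):
    """Back-propagate the MSE through both layers (batch gradient)."""
    n_hidden, n_in = len(net["w2"]), len(xs[0])
    g = {"w1": [[0.0] * n_in for _ in range(n_hidden)], "b1": [0.0] * n_hidden,
         "w2": [0.0] * n_hidden, "b2": [0.0]}
    for x, t in zip(xs, ts):
        h, out = forward(net, x)
        d_out = 2.0 * (out - t) * out * (1.0 - out) / len(xs)
        g["b2"][0] += d_out
        for j in range(n_hidden):
            g["w2"][j] += d_out * h[j]
            d_h = d_out * net["w2"][j] * (1.0 - h[j] ** 2)
            g["b1"][j] += d_h
            for i in range(n_in):
                g["w1"][j][i] += d_h * x[i]
    return g


def train(xs, ts, hidden=6, epochs=5000, goal=1e-8, lr=0.01,
          momentum=0.9, lr_inc=1.05, lr_dec=0.7, max_increase=1.04, seed=0):
    rng = random.Random(seed)
    n_in = len(xs[0])
    net = {"w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)],
           "b1": [0.0] * hidden,
           "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
           "b2": [0.0]}
    vel = {"w1": [[0.0] * n_in for _ in range(hidden)], "b1": [0.0] * hidden,
           "w2": [0.0] * hidden, "b2": [0.0]}
    err = mse(net, xs, ts)
    for _ in range(epochs):
        if err <= goal:
            break
        g = gradients(net, xs, ts)
        trial, trial_vel = copy.deepcopy(net), copy.deepcopy(vel)
        rows = [(trial[k], trial_vel[k], g[k]) for k in ("b1", "w2", "b2")]
        rows += list(zip(trial["w1"], trial_vel["w1"], g["w1"]))
        for weights, velocity, grad in rows:
            for i in range(len(weights)):
                velocity[i] = momentum * velocity[i] - lr * grad[i]
                weights[i] += velocity[i]
        trial_err = mse(trial, xs, ts)
        if trial_err > err * max_increase:
            lr *= lr_dec  # reject the step, shrink the rate, drop the momentum
            vel = {k: ([[0.0] * n_in for _ in range(hidden)] if k == "w1"
                       else [0.0] * len(v)) for k, v in vel.items()}
        else:
            if trial_err < err:
                lr *= lr_inc  # error fell, so grow the rate
            net, vel, err = trial, trial_vel, trial_err
    return net, err


# Hypothetical toy data: [seek error rate, room temperature in C], label 1 = reliable.
raw = [[0.1, 22.0], [0.2, 24.0], [0.8, 35.0], [0.9, 38.0]]
targets = [1.0, 1.0, 0.0, 0.0]
inputs = minmax_normalize(raw)
net, final_mse = train(inputs, targets, epochs=500)
print(final_mse, [round(forward(net, x)[1], 3) for x in inputs])
```

The logistic output lies between 0 and 1, so it can be read as a reliability score and compared directly against the quantized 1/0 labels.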
S12: The trained model is deployed online. Hard disk performance parameters and user information data are extracted from a data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively. The extracted data is processed and fed into the machine learning model to predict the reliability of the hard disk; that is, after network training is completed, all quality indexes are input into the network to obtain the predicted data. The user information data comprises user service scenario data and user machine room environment data. The acquired hard disk performance parameter data, user service scenario data and user machine room environment data are processed through data ETL and stored in the data warehouse.
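The extraction step in S12 can be sketched with an in-memory SQLite database standing in for the data warehouse. Everything concrete here is hypothetical: the table names (`disk_params`, `order_info`), the column names, the sample rows and the function `build_features` are assumptions made for illustration; the source only states that the two scanned keys drive the lookup.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disk_params (serial_no TEXT PRIMARY KEY, seek_error_rate REAL, spin_up_ms REAL);
CREATE TABLE order_info (order_no TEXT PRIMARY KEY, scenario_code INTEGER, room_temp_c REAL);
""")

# ETL "load" step with hypothetical, already cleaned rows.
conn.execute("INSERT INTO disk_params VALUES (?, ?, ?)", ("SN001", 0.02, 410.0))
conn.execute("INSERT INTO order_info VALUES (?, ?, ?)", ("PO-7", 1, 26.5))
conn.commit()


def build_features(conn, serial_no, order_no):
    """Look up both records by their scanned keys and join them into one feature vector."""
    disk = conn.execute(
        "SELECT seek_error_rate, spin_up_ms FROM disk_params WHERE serial_no = ?",
        (serial_no,)).fetchone()
    user = conn.execute(
        "SELECT scenario_code, room_temp_c FROM order_info WHERE order_no = ?",
        (order_no,)).fetchone()
    if disk is None or user is None:
        raise KeyError(f"no warehouse record for {serial_no!r} / {order_no!r}")
    # Disk performance parameters first, then user scenario and environment data.
    return list(disk) + list(user)


features = build_features(conn, "SN001", "PO-7")
print(features)
```

The combined vector would then be normalized in the same way as the training data before it is passed to the deployed model.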
S13: screening the hard disk according to the prediction result. Before a hard disk goes online, its reliability is predicted by the machine learning model from the hard disk performance parameters combined with the user service scenario and user machine room environment data. After the trained model is deployed online, prediction accuracy is maintained through periodic retraining.
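Screening on the prediction result might look like the sketch below. The source does not say how the model output is turned into an accept or reject decision, so the 0.5 threshold, the function name `screen_disks` and the sample scores are assumptions.

```python
def screen_disks(predictions, threshold=0.5):
    """Split disks into accepted and rejected lists by predicted reliability score.

    Assumption: scores lie in [0, 1] and a disk passes when its score is at or
    above the threshold.
    """
    accepted, rejected = [], []
    for serial_no, score in predictions.items():
        (accepted if score >= threshold else rejected).append(serial_no)
    return accepted, rejected


# Hypothetical model outputs keyed by hard disk serial number.
scores = {"SN001": 0.93, "SN002": 0.41, "SN003": 0.50}
accepted, rejected = screen_disks(scores)
print(accepted, rejected)
```

A stricter threshold trades more rejected disks for higher expected reliability in the target service scenario.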
The method uses machine learning to establish the correlation between the user service scenario, the user machine room environment and the hard disk performance parameters. The machine learning model is trained on historical data, then predicts reliability from the current user service scenario, user machine room environment and hard disk performance parameters, and the hard disks are screened according to the model's prediction.
To keep the prediction accurate after the trained model is deployed online, the model is retrained periodically: historical data and data warehouse data are integrated at regular intervals, and the machine learning model is retrained, verified and evaluated, and then redeployed.
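One retraining cycle of this kind can be sketched as follows. It is an illustration under assumptions: the rule "redeploy only if the retrained model is at least as accurate on the test set" is not stated in the source, and the single-feature samples and the threshold learner `train_threshold` are toy stand-ins for the real feature vectors and BP network training.

```python
def accuracy(model, samples):
    """Fraction of (feature, label) samples the model classifies correctly."""
    hits = sum(1 for feature, label in samples if model(feature) == label)
    return hits / len(samples)


def retrain_cycle(current_model, train_fn, historical, warehouse, test_set):
    """Merge data, retrain, evaluate, and redeploy only if the candidate is not worse."""
    merged = historical + warehouse
    candidate = train_fn(merged)
    if accuracy(candidate, test_set) >= accuracy(current_model, test_set):
        return candidate, merged
    return current_model, merged


def train_threshold(samples):
    """Toy stand-in for BP training: cut midway between the two class means."""
    ones = [f for f, y in samples if y == 1]
    zeros = [f for f, y in samples if y == 0]
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda f: 1 if f >= cut else 0


# Hypothetical (health score, reliability label) samples.
historical = [(0.9, 1), (0.8, 1), (0.2, 0)]
warehouse = [(0.3, 0), (0.7, 1)]
test_set = [(0.85, 1), (0.25, 0), (0.6, 1)]
stale_model = lambda f: 1 if f >= 0.8 else 0

model, merged = retrain_cycle(stale_model, train_threshold, historical, warehouse, test_set)
print(accuracy(stale_model, test_set), accuracy(model, test_set))
```

In a real deployment this function would be invoked by a scheduler at the chosen retraining interval, with `merged` becoming the historical data of the next cycle.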
EXAMPLE III
The technical scheme of the invention provides a hard disk screening device based on machine learning, which comprises a model training module, a reliability prediction module and a prediction result analysis module;
the model training module is used for creating a BP neural network model and training the created model;
the reliability prediction module is used for deploying the trained model online, extracting hard disk performance parameters and user information data from the data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk;
and the prediction result analysis module is used for screening the hard disk according to the prediction result.
The model training module comprises a creating unit, a training unit and a verification and evaluation unit; the creating unit is used for creating a BP neural network model; the training unit is used for training and optimizing the model through a training data set; and the verification evaluation unit is used for verifying and evaluating the built model by using the test data set.
The device further includes a data acquisition module, a data preprocessing module and a data characteristic processing module. The data acquisition module is used for acquiring historical hard disk performance parameters and user information data; the data preprocessing module is used for preprocessing the acquired data; and the data characteristic processing module is used for performing data feature engineering processing on the preprocessed data to generate a training data set and a test data set. Before a hard disk goes online, its reliability is predicted by the machine learning model from the hard disk performance parameters combined with the user service scenario and user machine room environment data. After the trained model is deployed online, prediction accuracy is maintained through periodic retraining.
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments without departing from the spirit and scope of the present invention, and any changes or substitutions that a person skilled in the art can easily conceive within the technical scope disclosed herein fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A hard disk screening method based on machine learning is characterized by comprising the following steps:
creating a BP neural network model and training the created model;
deploying the trained machine learning model online, extracting hard disk performance parameters and user information data from a data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk;
and screening the hard disk according to the prediction result.
2. The hard disk screening method based on machine learning of claim 1, wherein the step of creating a BP neural network model and training the created model comprises:
creating a BP neural network model, and training and adjusting the model through a training data set;
and carrying out verification evaluation on the created model by using the test data set.
3. The hard disk screening method based on machine learning of claim 2, wherein before the step of creating the BP neural network model and performing model training and tuning through the training data set, the method further comprises:
acquiring historical hard disk performance parameters and user information data;
carrying out data preprocessing on the acquired data;
and performing data characteristic engineering processing on the preprocessed data to generate a training data set and a test data set.
4. The hard disk screening method based on machine learning of claim 3, wherein the data preprocessing comprises: missing-value handling for data variables, quantification of categorical variables, and text quantification and serialization.
5. The hard disk screening method based on machine learning of claim 4, wherein the step of performing data feature engineering processing on the preprocessed data comprises:
and obtaining historical hard disk reliability classification, wherein the historical hard disk reliability classification is divided into reliable and unreliable according to the size relation between the actual failure time and the predicted failure time of the hard disk, and the reliability classification is quantized into 1 and 0.
6. The hard disk screening method based on machine learning of claim 3, wherein the user information data comprises user service scenario data and user room environment data.
7. The hard disk screening method based on machine learning of claim 6, wherein the step of obtaining historical hard disk performance parameters and user information data further comprises:
and processing the acquired hard disk performance parameter data, the user service scenario data and the user machine room environment data through data ETL and storing the data into a data warehouse.
8. A hard disk screening device based on machine learning is characterized by comprising a model training module, a reliability prediction module and a prediction result analysis module;
the model training module is used for creating a BP neural network model and training the created model;
the reliability prediction module is used for deploying the trained model online, extracting hard disk performance parameters and user information data from the data warehouse by scanning the serial number of the hard disk and the production order number of the server, respectively, processing the extracted hard disk performance parameters and user information data, and feeding them into the machine learning model to predict the reliability of the hard disk;
and the prediction result analysis module is used for screening the hard disk according to the prediction result.
9. The hard disk screening device based on machine learning of claim 8, wherein the model training module comprises a creation unit, a training unit, a verification evaluation unit;
the creating unit is used for creating a BP neural network model;
the training unit is used for training and optimizing the model through a training data set;
and the verification evaluation unit is used for verifying and evaluating the built model by using the test data set.
10. The hard disk screening device based on machine learning of claim 9, wherein the device further comprises: the system comprises a data acquisition module, a data preprocessing module and a data characteristic processing module;
the data acquisition module is used for acquiring historical hard disk performance parameters and user information data;
the data preprocessing module is used for preprocessing the acquired data;
and the data characteristic processing module is used for performing data characteristic engineering processing on the preprocessed data to generate a training data set and a test data set.
CN202010154855.5A 2020-03-08 2020-03-08 Hard disk screening method and device based on machine learning Withdrawn CN111475319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010154855.5A CN111475319A (en) 2020-03-08 2020-03-08 Hard disk screening method and device based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010154855.5A CN111475319A (en) 2020-03-08 2020-03-08 Hard disk screening method and device based on machine learning

Publications (1)

Publication Number Publication Date
CN111475319A (en) 2020-07-31

Family

ID=71747262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010154855.5A Withdrawn CN111475319A (en) 2020-03-08 2020-03-08 Hard disk screening method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN111475319A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540893A (en) * 2020-12-16 2021-03-23 北京同有飞骥科技股份有限公司 Performance test method for distributed storage

Similar Documents

Publication Publication Date Title
US10600005B2 (en) System for automatic, simultaneous feature selection and hyperparameter tuning for a machine learning model
CN111177095B (en) Log analysis method, device, computer equipment and storage medium
US11461847B2 (en) Applying a trained model to predict a future value using contextualized sentiment data
CN110366734B (en) Optimizing neural network architecture
US20220121906A1 (en) Task-aware neural network architecture search
CN109615116A (en) A kind of telecommunication fraud event detecting method and detection system
CN110336838B (en) Account abnormity detection method, device, terminal and storage medium
CN106803799B (en) Performance test method and device
CN110490304B (en) Data processing method and device
CN111158964B (en) Disk failure prediction method, system, device and storage medium
Huang et al. Reliable machine prognostic health management in the presence of missing data
CN109670549A (en) The data screening method, apparatus and computer equipment of fired power generating unit
CN110674100B (en) User demand prediction method and framework based on full-channel operation data
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN112966778B (en) Data processing method and device for unbalanced sample data
CN111475319A (en) Hard disk screening method and device based on machine learning
CN113761193A (en) Log classification method and device, computer equipment and storage medium
CN104580109A (en) Method and device for generating click verification code
Wu et al. A multi-sensor fusion-based prognostic model for systems with partially observable failure modes
CN115904916A (en) Hard disk failure prediction method and device, electronic equipment and storage medium
WO2023048807A1 (en) Hierarchical representation learning of user interest
CN115543762A (en) Method and system for expanding SMART data of disk and electronic equipment
CN111612783B (en) Data quality assessment method and system
Patel et al. Rumour detection using graph neural network and oversampling in benchmark Twitter dataset
CN112801327A (en) Method, device, equipment and storage medium for predicting logistics flow and modeling thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2020-07-31)