CN110489299A - Method and system for optimizing hard disk service life - Google Patents

Method and system for optimizing hard disk service life

Info

Publication number
CN110489299A
Authority
CN
China
Prior art keywords
hard disk
test parameter
cluster
module
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910681285.2A
Other languages
Chinese (zh)
Inventor
曾宪力
梁永堂
徐景鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Ruijiang Cloud Computing Co Ltd
Guangdong Eflycloud Computing Co Ltd
Original Assignee
Guangdong Ruijiang Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Ruijiang Cloud Computing Co Ltd
Priority to CN201910681285.2A
Publication of CN110489299A
Priority to JP2020008082A (JP6760619B1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and system for optimizing hard disk service life. The method comprises the following steps: S1, add a monitoring point to each hard disk in a hard disk cluster and measure the test parameters at the monitoring point of each disk; S2, collect and record the values of the test parameters and compare them with a preset test parameter threshold; S3, for any hard disk whose value has reached the test parameter threshold, remove the OSD role of that disk from the cluster and take it offline, with other hard disks taking over its work; S4, repeat steps S1 to S3, and when every hard disk in the cluster has reached the test parameter threshold, bring all the disks that were removed and taken offline back online. The invention optimizes cluster storage, places every hard disk under unified monitoring and scheduling, and keeps the service lives of the disks in the cluster consistent.

Description

Method and system for optimizing hard disk service life
Technical field
The present invention relates to the technical field of computer hard disks, and in particular to a method and system for optimizing hard disk service life.
Background
With the rapid development of cloud computing, various forms of distributed computing have emerged. The development of distributed storage in particular has greatly reduced the cost of the centralized, shared storage used in the past. However, this transition is not yet complete: x86 servers are still used at the bottom layer, and distributed storage still has to be built on the hard disks of those servers.
On the Internet, distributed storage is essentially built by adapting and optimizing the open-source Ceph project. Ceph uses multiple servers to form one storage cluster, and the actual data and its replicas all reside on physical hard disks. As time passes, read/write volume grows and components age, so the hard disks gradually become damaged or wear out.
Hard disk aging is unavoidable. Across the many hard disks of such a group of servers, however, it would be desirable for the life cycles of the disks to be roughly the same, so that they retire together, rather than having some disk fail on a given day and handling each failure one by one as it is discovered. This is difficult to achieve.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for optimizing hard disk service life that optimizes cluster storage, places every hard disk under unified monitoring and scheduling, and keeps the service lives of the disks in a hard disk cluster consistent.
To solve the above technical problem, the present invention provides the following technical solution: a method for optimizing hard disk service life, comprising the following steps:
S1. Add a monitoring point to each hard disk in the hard disk cluster, and measure the test parameters at the monitoring point of each hard disk;
S2. Collect and record the values of the test parameters, and compare each value with a preset test parameter threshold;
S3. For any hard disk whose value has reached the test parameter threshold, remove the OSD role of that hard disk from the hard disk cluster and take it offline, with other hard disks taking over its work;
S4. Repeat steps S1 to S3; when every hard disk in the hard disk cluster has reached the test parameter threshold, bring all the hard disks that were removed and taken offline back online.
Preferably, the test parameters of step S1 are one or more of: start count, remapped sector count, hard disk power-on count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, underlying data read error rate, parity error rate, write error rate, read/write count, read/write capacity, and hard disk temperature.
Preferably, the test parameter threshold is preset according to the selected test parameter type; the user sets the specific value of the threshold according to their own needs.
Preferably, the specific values of the test parameters are collected using the smartctl tool.
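As an illustration of how such values might be collected, the following Python sketch parses the JSON report that smartctl can emit (for example `smartctl -A -j <device>`, available in smartmontools 7.0 and later). The function name, the returned key `temperature_celsius` and the sample report are assumptions made for this example; the patent only states that smartctl is used.

```python
import json

def parse_smart_report(report_json: str) -> dict:
    """Extract raw SMART attribute values from a smartctl JSON report."""
    report = json.loads(report_json)
    table = report.get("ata_smart_attributes", {}).get("table", [])
    # Map attribute name -> raw counter value (e.g. Power_On_Hours -> 12345).
    values = {row["name"]: row["raw"]["value"] for row in table}
    # smartctl also reports the current temperature in a separate block.
    temperature = report.get("temperature", {}).get("current")
    if temperature is not None:
        values["temperature_celsius"] = temperature
    return values

# Hand-written sample shaped like smartctl JSON output
# (illustrative only, not a capture from a real disk).
SAMPLE_REPORT = json.dumps({
    "ata_smart_attributes": {"table": [
        {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
        {"id": 9, "name": "Power_On_Hours", "raw": {"value": 12345}},
        {"id": 12, "name": "Power_Cycle_Count", "raw": {"value": 42}},
    ]},
    "temperature": {"current": 38},
})

smart_values = parse_smart_report(SAMPLE_REPORT)
```

On a real host the report text would come from running smartctl against each device, which typically requires root privileges.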
Another object of the present invention is to provide a system for optimizing hard disk service life, comprising:
a monitoring module, configured to set a monitoring point on each hard disk in the hard disk cluster and to monitor the monitoring points;
a test module, configured to measure the test parameters at the monitoring points;
a test parameter collection module, configured to collect and record the specific test parameter values measured by the test module;
a test parameter threshold preset module, configured to preset and store the test parameter thresholds;
a comparison module, configured to compare the specific test parameter values collected by the test parameter collection module against the test parameter thresholds preset by the test parameter threshold preset module;
a hard disk removal and offline module, configured to remove and take offline any hard disk whose specific test parameter value has reached the test parameter threshold;
and a re-online module, configured to bring all hard disks in the hard disk cluster back online for use once all of them have been removed and taken offline by the hard disk removal and offline module.
Preferably, the test parameter collection module uses the smartctl tool.
By adopting the above technical solution, the present invention has at least the following beneficial effects: test parameters are collected and compared against test parameter thresholds, and hard disks that reach the threshold are taken offline one by one; once all hard disks have been taken offline, they are brought back online for use, so that the service lives of all hard disks remain consistent. While a hard disk is offline, its tasks are taken over by the other hard disks, which reduces use of the disk that has reached the threshold and lowers the chance that individual disks fail prematurely.
Description of the drawings
Fig. 1 is a flowchart of the steps of a method for optimizing hard disk service life according to Embodiment 1 of the present invention;
Fig. 2 is a structural block diagram of a system for optimizing hard disk service life according to Embodiment 4 of the present invention.
Detailed description
It should be noted that, provided there is no conflict, the embodiments of the present application and the features of those embodiments may be combined with one another. The application is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
In Ceph distributed storage (i.e. a hard disk cluster), each individual hard disk serves as an OSD that stores objects, and data and replicas are written to these disks separately. A batch of disks in the same hard disk cluster therefore shares a unified scheduling and monitoring plane. The present invention uses a series of test parameters of the underlying hard disks, such as service life, read/write state and bad-sector condition, to take the hard disks in the cluster offline and bring them back online. The goal is to keep the disks' indicators consistent and ultimately to extend service life.
As shown in Fig. 1, this embodiment provides a method for optimizing hard disk service life, comprising the following steps:
S1. Add a monitoring point to each hard disk in the hard disk cluster, and measure the test parameters at the monitoring point of each hard disk; generally, the specific values of the test parameters are preferably collected using the smartctl tool;
S2. Collect and record the values of the test parameters, and compare each value with a preset test parameter threshold; the test parameter threshold is preset according to the selected test parameter type, and the user sets its specific value according to their own needs;
S3. For any hard disk whose value has reached the test parameter threshold, remove the OSD role of that hard disk from the hard disk cluster and take it offline, with other hard disks taking over its work;
S4. Repeat steps S1 to S3; when every hard disk in the hard disk cluster has reached the test parameter threshold, bring all the hard disks that were removed and taken offline back online, so that the service lives of all hard disks in the cluster remain consistent.
The test parameters include start count, remapped sector count, hard disk power-on count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, underlying data read error rate, parity error rate, write error rate, read/write count, read/write capacity and hard disk temperature. In an actual application, one or more of these test parameters are used, as determined by the user's needs.
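The cycle of steps S2 to S4 can be sketched as a small state update in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Disk` class, the `apply_threshold_cycle` function and the sample readings are hypothetical names and values introduced here, and actually changing an OSD's state in Ceph is left out.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    osd_id: int
    value: float        # latest reading of the chosen test parameter
    online: bool = True

def apply_threshold_cycle(disks, threshold):
    """Run one pass of S2-S4 and return the action taken per OSD id."""
    actions = {}
    # S2/S3: take every online disk that has reached the threshold offline.
    for disk in disks:
        if disk.online and disk.value >= threshold:
            disk.online = False
            actions[disk.osd_id] = "offline"
    # S4: once every disk has been taken offline, bring them all back.
    if disks and all(not disk.online for disk in disks):
        for disk in disks:
            disk.online = True
            actions[disk.osd_id] = "online"
    return actions

cluster = [Disk(0, 10), Disk(1, 60), Disk(2, 30)]
first_pass = apply_threshold_cycle(cluster, threshold=50)
```

In this sketch only disk 1 is taken offline on the first pass; the others keep working, and absorbing disk 1's load, until their own readings reach the threshold.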
Embodiment 2
This embodiment builds on Embodiment 1 and describes the method specifically with hard disk temperature as the test parameter. A method for optimizing hard disk service life comprises the following steps:
S11. Add a monitoring point to each hard disk in the hard disk cluster, and measure the hard disk temperature at the monitoring point of each hard disk;
S12. Collect and record the real-time temperature of each hard disk, and compare the real-time temperature with a preset hard disk temperature threshold; here the hard disk temperature threshold is preferably set to 50 °C;
S13. For any hard disk whose temperature has reached or exceeded 50 °C, remove the OSD role of that hard disk and take it offline, with other hard disks taking over its work;
S14. Repeat steps S11 to S13 to compare the temperatures of the hard disks in the cluster; once the temperatures of all hard disks in the cluster have reached 50 °C and all of them have been taken offline, bring all hard disks in the cluster back online.
Each of the remaining test parameters, such as start count, remapped sector count, hard disk power-on count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, underlying data read error rate, parity error rate, write error rate, read/write count and read/write capacity, can likewise be used on its own as the test parameter for the threshold check: disks that reach the threshold are taken offline one by one, and then all disks are brought back online together, so that the service lives of all hard disks in the cluster remain consistent.
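One way to picture steps S12 to S14 is as a planner that turns temperature readings into Ceph commands. The sketch below is an assumption-laden illustration: the patent does not name any commands, so mapping "removing the OSD role and taking it offline" to `ceph osd out <id>` and "back online" to `ceph osd in <id>` is this example's choice, and `plan_commands` and its arguments are hypothetical. The function only builds command strings; it does not execute them.

```python
TEMP_THRESHOLD_C = 50  # preferred threshold named in Embodiment 2

def plan_commands(temps_by_osd, already_out):
    """Return the ceph CLI commands for one pass of S12-S14.

    temps_by_osd: {osd_id: current temperature in degrees Celsius}
    already_out:  set of OSD ids taken offline in earlier passes
    """
    # S13: disks that have reached or exceeded the threshold in this pass.
    out_now = {osd for osd, temp in temps_by_osd.items()
               if temp >= TEMP_THRESHOLD_C}
    all_out = set(already_out) | out_now
    # S14: every disk has reached the threshold, so bring all back online.
    if temps_by_osd and all_out >= set(temps_by_osd):
        return [f"ceph osd in {osd}" for osd in sorted(temps_by_osd)]
    return [f"ceph osd out {osd}" for osd in sorted(out_now - set(already_out))]

commands = plan_commands({0: 45, 1: 52, 2: 48}, already_out=set())
```

Keeping the planner free of side effects makes the decision logic easy to check before any command is run against a live cluster.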
Embodiment 3
This embodiment discloses a method for optimizing hard disk service life that is also an optimization based on Embodiment 1. Specifically, the preset threshold is removed, and the hard disk cluster system program handles all test parameters itself. The method of this embodiment comprises the following steps:
S21. Add a monitoring point to each hard disk in the hard disk cluster, and measure the test parameters at the monitoring point of each hard disk;
S22. Collect and record the values of the test parameters; sum the values of the same test parameter and divide by the number of hard disks reporting that parameter to obtain the average value of the test parameter;
S23. For any hard disk whose value is higher than the average, remove the OSD role of that hard disk from the hard disk cluster and take it offline, with other hard disks taking over its work;
S24. Repeat steps S22 and S23, continuously obtaining new averages of the test parameters; the average keeps rising. As the average of a test parameter rises, the hard disks that were taken offline gradually come back online, until the test parameters of all hard disks converge; all hard disks then work online together again. In this way the hard disk lifetimes also become consistent, reducing the chance that individual disks fail prematurely.
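The averaging rule of steps S22 and S23 can be written in a few lines of Python. This is a sketch only: the function name `disks_above_average` and the sample readings are hypothetical, and the readings stand for any single cumulative test parameter such as accumulated power-on time.

```python
def disks_above_average(values_by_osd):
    """S22/S23: compute the mean of one test parameter and list the
    OSD ids whose value is strictly higher than that mean."""
    mean = sum(values_by_osd.values()) / len(values_by_osd)
    above = sorted(osd for osd, value in values_by_osd.items() if value > mean)
    return mean, above

# Early on, disk 2 is far ahead of the others and is taken offline.
mean_1, offline_1 = disks_above_average({0: 100, 1: 200, 2: 300})

# Later, disks 0 and 1 have caught up while disk 2 sat idle; the average
# has risen, so disk 2 is no longer above it and can return to service.
mean_2, offline_2 = disks_above_average({0: 310, 1: 305, 2: 300})
```

Because an offline disk's cumulative counters stop growing while the remaining disks keep accumulating, the average rises over repeated passes, which is the mechanism S24 relies on to let offline disks return.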
Embodiment 4
As shown in Fig. 2, this embodiment provides a system for optimizing hard disk service life, the system comprising:
a monitoring module, configured to set a monitoring point on each hard disk in the hard disk cluster and to monitor the monitoring points;
a test module, configured to measure the test parameters at the monitoring points;
a test parameter collection module, configured to collect and record the specific test parameter values measured by the test module; preferably, the test parameter collection module uses the smartctl tool;
a test parameter threshold preset module, configured to preset and store the test parameter thresholds;
a comparison module, configured to compare the specific test parameter values collected by the test parameter collection module against the test parameter thresholds preset by the test parameter threshold preset module;
a hard disk removal and offline module, configured to remove and take offline any hard disk whose specific test parameter value has reached the test parameter threshold;
and a re-online module, configured to bring all hard disks in the hard disk cluster back online for use once all of them have been removed and taken offline by the hard disk removal and offline module.
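The module structure above could be wired together roughly as follows. This Python sketch is one possible arrangement, not the structure shown in Fig. 2: the class name `LifespanBalancer`, its method names and the injected `collect` callable are all assumptions introduced for illustration, and the monitoring, test and collection modules are collapsed into that single callable.

```python
from typing import Callable, Dict, Set

class LifespanBalancer:
    def __init__(self, collect: Callable[[], Dict[int, float]], threshold: float):
        self.collect = collect           # monitoring, test and collection modules
        self.threshold = threshold       # test parameter threshold preset module
        self.offline: Set[int] = set()   # disks currently removed and offline

    def compare(self, readings: Dict[int, float]) -> Set[int]:
        """Comparison module: OSD ids whose value has reached the threshold."""
        return {osd for osd, value in readings.items() if value >= self.threshold}

    def step(self) -> str:
        readings = self.collect()
        # Hard disk removal and offline module.
        self.offline |= self.compare(readings)
        # Re-online module: all disks are offline, so bring every one back.
        if readings and self.offline >= set(readings):
            self.offline.clear()
            return "all_online"
        return "partial"

# Two hypothetical rounds of readings for a two-disk cluster.
rounds = iter([{0: 10, 1: 60}, {0: 70, 1: 60}])
balancer = LifespanBalancer(lambda: next(rounds), threshold=50)
first_result = balancer.step()
```

Injecting the collector as a callable keeps the comparison and scheduling logic testable without access to real disks.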
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various equivalent changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. A method for optimizing hard disk service life, characterized by comprising the following steps:
S1. Add a monitoring point to each hard disk in the hard disk cluster, and measure the test parameters at the monitoring point of each hard disk;
S2. Collect and record the values of the test parameters, and compare each value with a preset test parameter threshold;
S3. For any hard disk whose value has reached the test parameter threshold, remove the OSD role of that hard disk from the hard disk cluster and take it offline, with other hard disks taking over its work;
S4. Repeat steps S1 to S3; when every hard disk in the hard disk cluster has reached the test parameter threshold, bring all the hard disks that were removed and taken offline back online.
2. The method for optimizing hard disk service life according to claim 1, characterized in that the test parameters of step S1 are one or more of: start count, remapped sector count, hard disk power-on count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, underlying data read error rate, parity error rate, write error rate, read/write count, read/write capacity, and hard disk temperature.
3. The method for optimizing hard disk service life according to claim 2, characterized in that the test parameter threshold is preset according to the selected test parameter type, and the user sets the specific value of the test parameter threshold according to their own needs.
4. The method for optimizing hard disk service life according to any one of claims 1 to 3, characterized in that the specific values of the test parameters are collected using the smartctl tool.
5. A system for optimizing hard disk service life, characterized by comprising:
a monitoring module, configured to set a monitoring point on each hard disk in the hard disk cluster and to monitor the monitoring points;
a test module, configured to measure the test parameters at the monitoring points;
a test parameter collection module, configured to collect and record the specific test parameter values measured by the test module;
a test parameter threshold preset module, configured to preset and store the test parameter thresholds;
a comparison module, configured to compare the specific test parameter values collected by the test parameter collection module against the test parameter thresholds preset by the test parameter threshold preset module;
a hard disk removal and offline module, configured to remove and take offline any hard disk whose specific test parameter value has reached the test parameter threshold;
and a re-online module, configured to bring all hard disks in the hard disk cluster back online for use once all of them have been removed and taken offline by the hard disk removal and offline module.
6. The system for optimizing hard disk service life according to claim 5, characterized in that the test parameter collection module uses the smartctl tool.
CN201910681285.2A 2019-07-26 2019-07-26 Method and system for optimizing hard disk service life Pending CN110489299A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910681285.2A CN110489299A (en) 2019-07-26 2019-07-26 Method and system for optimizing hard disk service life
JP2020008082A JP6760619B1 (en) 2019-07-26 2020-01-22 Hard disk service life optimization method and its system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910681285.2A CN110489299A (en) 2019-07-26 2019-07-26 Method and system for optimizing hard disk service life

Publications (1)

Publication Number Publication Date
CN110489299A (en) 2019-11-22

Family

ID=68547681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910681285.2A Pending CN110489299A (en) 2019-07-26 2019-07-26 Method and system for optimizing hard disk service life

Country Status (2)

Country Link
JP (1) JP6760619B1 (en)
CN (1) CN110489299A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112162705A (en) * 2020-09-30 2021-01-01 新浪网技术(中国)有限公司 RAID (redundant array of independent disk) set fault automatic offline repair reporting method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625957B (en) * 2021-06-30 2024-02-13 济南浪潮数据技术有限公司 Method, device and equipment for detecting hard disk faults
CN113608915A (en) * 2021-08-31 2021-11-05 新华三技术有限公司成都分公司 Disk fault detection method and device

Also Published As

Publication number Publication date
JP6760619B1 (en) 2020-09-23
JP2021022416A (en) 2021-02-18

Similar Documents

Publication Publication Date Title
CN110489299A (en) A kind of method and its system optimizing hard disk service life
US10397076B2 (en) Predicting hardware failures in a server
US8892959B2 (en) Automatic problem diagnosis
US9075838B2 (en) Method and apparatus for an improved file repository
JP4756675B2 (en) System, method and program for predicting computer resource capacity
US8914598B2 (en) Distributed storage resource scheduler and load balancer
Li et al. Being accurate is not enough: New metrics for disk failure prediction
Cao et al. Carver: Finding important parameters for storage system tuning
US10606722B2 (en) Method and system for diagnosing remaining lifetime of storages in data center
JP2019191929A (en) Performance analysis method and management computer
EP2620837A1 (en) Operation management method of information processing system
US7181364B2 (en) Automated detecting and reporting on field reliability of components
US10310935B2 (en) Dynamically restoring disks based on array properties
US20160210640A1 (en) Assortment optimization using incremental swapping with demand transference
CN101246727A (en) Buffer management method and optical disc drive
US20230073644A1 (en) Systems and methods for identification of issue resolutions using collaborative filtering
JP2019132457A (en) Control program, control method, and control device
US20230136274A1 (en) Ceph Media Failure and Remediation
CN111767162B (en) Fault prediction method for hard disks of different models and electronic device
WO2020263335A1 (en) Use of error correction-based metric for identifying poorly performing data storage devices
US20080208930A1 (en) Management of redundancy in data arrays
CN108376553B (en) Monitoring method and system for magnetic disk of video server
WO2021100800A1 (en) Failure probability evaluation system
US11113163B2 (en) Storage array drive recovery
CN112005223A (en) Device state assessment

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191122