CN110489299A - Method and system for optimizing hard disk service life

Info

Publication number: CN110489299A
Application number: CN201910681285.2A
Authority: CN
Inventors: 曾宪力; 梁永堂; 徐景鸿
Original assignee: Guangdong Ruijiang Cloud Computing Co Ltd
Current assignee: Guangdong Ruijiang Cloud Computing Co Ltd; Guangdong Eflycloud Computing Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2019-11-22
Also published as: JP6760619B1; JP2021022416A

Abstract

The invention discloses a method and a system for optimizing hard disk service life. The method comprises: S1, adding a monitoring point to each hard disk in a hard disk cluster and testing the test parameter at the monitoring point of each hard disk; S2, collecting and recording the values of the test parameter and comparing each value with a preset test parameter threshold; S3, for each hard disk whose value has reached the test parameter threshold, removing the OSD role of that hard disk from the cluster and taking it offline, with other hard disks taking over its work; S4, repeating steps S1 to S3 and, once every hard disk in the cluster has reached the test parameter threshold, bringing all the hard disks that were removed and taken offline back online. The invention optimizes cluster storage so that every hard disk is monitored and scheduled uniformly, keeping the service life of the hard disks in the cluster consistent.

Description

Method and system for optimizing hard disk service life

Technical field

The present invention relates to the technical field of computer hard disks, and in particular to a method and a system for optimizing hard disk service life.

Background technique

With the rapid development of cloud computing, various forms of distributed computing have emerged. The development of distributed storage in particular has greatly reduced the cost of the centralized, shared storage used in the past. However, this transformation is not yet complete: the underlying layer still uses x86 server architecture, and distributed storage still has to be built on the hard disks of servers.

Distributed storage on the Internet is for the most part built by adapting the open-source project Ceph, which uses multiple servers to form a storage cluster. The actual data and its replicas all reside on physical hard disks. As time passes, the volume of reads and writes grows and components age, so the hard disks gradually degrade or fail.

Hard disk aging is unavoidable. Across the many hard disks of such a group of servers, however, it is desirable to keep the lifetimes of the hard disks roughly consistent so that they can be retired together, rather than having individual hard disks fail on arbitrary days and handling each failure as it is discovered. This is difficult to achieve.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and a system for optimizing hard disk service life that optimize cluster storage so that every hard disk is monitored and scheduled uniformly, keeping the service life of the hard disks in the cluster consistent.

To solve the above technical problem, the invention provides the following technical solution: a method for optimizing hard disk service life, comprising the following steps:

S1, adding a monitoring point to each hard disk in the hard disk cluster, and testing the test parameter at the monitoring point of each hard disk;

S2, collecting and recording the values of the test parameter, and comparing each value with a preset test parameter threshold;

S3, for each hard disk whose value has reached the test parameter threshold, removing the OSD role of that hard disk from the hard disk cluster and taking it offline, with other hard disks taking over its work;

S4, repeating steps S1 to S3 and, once every hard disk in the hard disk cluster has reached the test parameter threshold, bringing all the hard disks that were removed and taken offline back online.
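
Steps S1 to S4 can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the patented implementation: the names `Disk`, `rotation_step` and `run_until_relaunch` are hypothetical, the test parameter is reduced to a single number per disk, and wear is simulated by assuming that only online disks accumulate it at a fixed rate per pass.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    value: float          # current reading of the chosen test parameter
    online: bool = True   # whether the disk currently holds its OSD role


def rotation_step(disks, threshold, wear_per_step=1.0):
    """Run one pass of S1-S4; return True if all disks were brought back online."""
    # S1/S2 (simulated): only online disks accumulate wear, then values are compared.
    for d in disks:
        if d.online:
            d.value += wear_per_step
    # S3: disks that reached the threshold are taken offline.
    for d in disks:
        if d.online and d.value >= threshold:
            d.online = False
    # S4: once every disk has reached the threshold, all go back online.
    if all(d.value >= threshold for d in disks):
        for d in disks:
            d.online = True
        return True
    return False


def run_until_relaunch(disks, threshold, max_steps=1000):
    """Repeat the pass until the whole cluster is brought back online."""
    for step in range(1, max_steps + 1):
        if rotation_step(disks, threshold):
            return step
    raise RuntimeError("not every disk reached the threshold")


disks = [Disk("a", 7), Disk("b", 4), Disk("c", 1)]
steps = run_until_relaunch(disks, threshold=10)
print(steps, [(d.name, d.value, d.online) for d in disks])
```

In this toy run the most worn disk goes offline first and waits while the others catch up, so all three disks carry the same value when they are brought back online together.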

Preferably, the test parameter of step S1 is one or more of: start count, remapped sector count, hard disk power cycle count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, raw data read error rate, parity error rate, write error rate, read/write count, read/write volume, and hard disk temperature.

Preferably, the test parameter threshold is preset according to the selected type of test parameter, and is set as follows: the user sets the specific value of the test parameter threshold according to their own needs.

Preferably, the specific values of the test parameter are collected using the smartctl tool.
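
As an illustration of collecting values with smartctl, the sketch below parses the attribute table in the layout printed by `smartctl -A <device>`. It is an assumption-laden example: the sample text and its numbers are made up, the function name `parse_smart_attributes` is hypothetical, and the tool itself is not invoked here (running it normally requires elevated privileges and a real device path).

```python
import re

# Illustrative excerpt in the layout printed by `smartctl -A <device>`;
# the numbers are invented for this example.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4312
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36 (Min/Max 18/48)
"""

# ID, name, flag, value, worst, thresh, type, updated, when_failed, leading raw number.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(text):
    """Map each attribute name to the leading integer of its RAW_VALUE column."""
    attributes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line and any non-table lines are skipped
            attributes[match.group(2)] = int(match.group(3))
    return attributes


print(parse_smart_attributes(SAMPLE_OUTPUT))
```

Attribute names and the meaning of raw values differ between vendors, so a real collector would need per-model handling that this sketch leaves out.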

Another object of the present invention is to provide a system for optimizing hard disk service life, comprising:

a monitoring module, for setting a monitoring point on each hard disk in the hard disk cluster and monitoring the monitoring point;

a test module, for testing the test parameter at the monitoring point;

a test parameter collection module, for collecting and recording the specific values of the test parameter measured by the test module;

a test parameter threshold preset module, for presetting and saving the test parameter threshold;

a comparison module, for comparing the specific values of the test parameter collected by the test parameter collection module against the test parameter threshold preset by the test parameter threshold preset module;

a hard disk removal and offline module, for removing and taking offline each hard disk whose test parameter value has reached the test parameter threshold;

and a re-online module, for bringing all hard disks in the hard disk cluster back online for use once every hard disk in the cluster has been removed and taken offline by the hard disk removal and offline module.

Preferably, the test parameter collection module uses the smartctl tool.

With the above technical solution, the present invention has at least the following beneficial effects: by collecting test parameters and comparing them against test parameter thresholds, the invention takes hard disks offline one by one as they reach the threshold, and brings them all back online once every hard disk has been taken offline, so that the service lives of all hard disks stay consistent; while a hard disk is offline, other hard disks take over its tasks, which reduces the use of hard disks that have reached the threshold and lowers the likelihood of individual hard disks failing prematurely.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of a method for optimizing hard disk service life according to Embodiment 1 of the present invention;

Fig. 2 is a structural block diagram of a system for optimizing hard disk service life according to Embodiment 4 of the present invention.

Specific embodiment

It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The application is described in further detail below with reference to the drawings and specific embodiments.

Embodiment 1

In Ceph distributed storage (i.e. a hard disk cluster), each individual hard disk serves as an OSD storage object, and data and replicas are written to these storage objects. The hard disks in the same cluster therefore share a unified scheduling and monitoring plane. The present invention uses a series of test parameters from the underlying hard disks, such as service life, read/write status and bad sector status, to take hard disks in the cluster offline and bring them back online. The goal is to keep the indicators reported by the hard disks consistent and ultimately to extend service life.

As shown in Fig. 1, this embodiment provides a method for optimizing hard disk service life, comprising the following steps:

S1, adding a monitoring point to each hard disk in the hard disk cluster, and testing the test parameter at the monitoring point of each hard disk; generally, the specific values of the test parameter are preferably collected using the smartctl tool;

S2, collecting and recording the values of the test parameter, and comparing each value with a preset test parameter threshold; the test parameter threshold is preset according to the selected type of test parameter, and is set as follows: the user sets the specific value of the test parameter threshold according to their own needs;

S3, for each hard disk whose value has reached the test parameter threshold, removing the OSD role of that hard disk from the hard disk cluster and taking it offline, with other hard disks taking over its work;

S4, repeating steps S1 to S3 and, once every hard disk in the hard disk cluster has reached the test parameter threshold, bringing all the hard disks that were removed and taken offline back online, thereby keeping the service life of all hard disks in the cluster consistent.

The test parameters include start count, remapped sector count, hard disk power cycle count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, raw data read error rate, parity error rate, write error rate, read/write count, read/write volume, and hard disk temperature. In actual testing, whether one or several of these test parameters are used is determined by the user's needs.
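
The choice of one or several test parameters, each with a user-set threshold, could be expressed as a simple lookup. The sketch below is illustrative only: the mapping from the parameter names used here to smartctl attribute names is an assumption (names vary by vendor, and parameters without an obvious counterpart are left out), and `PARAMETER_TO_ATTRIBUTE` and `exceeded_parameters` are hypothetical names.

```python
# Hypothetical mapping from the test parameters named above to attribute
# names as commonly reported by smartctl; this is an assumption, not part
# of the original text, and it is deliberately incomplete.
PARAMETER_TO_ATTRIBUTE = {
    "start count": "Start_Stop_Count",
    "remapped sector count": "Reallocated_Sector_Ct",
    "power cycle count": "Power_Cycle_Count",
    "accumulated power-on time": "Power_On_Hours",
    "spindle spin-up retry count": "Spin_Retry_Count",
    "calibration retry count": "Calibration_Retry_Count",
    "raw data read error rate": "Raw_Read_Error_Rate",
    "hard disk temperature": "Temperature_Celsius",
}


def exceeded_parameters(readings, thresholds):
    """Return the selected parameters whose reading has reached its threshold.

    readings:   attribute name -> current value for one disk
    thresholds: parameter name -> user-chosen threshold (only selected parameters)
    """
    hits = []
    for parameter, limit in thresholds.items():
        value = readings.get(PARAMETER_TO_ATTRIBUTE[parameter])
        if value is not None and value >= limit:
            hits.append(parameter)
    return hits


readings = {"Power_On_Hours": 30000, "Temperature_Celsius": 41, "Reallocated_Sector_Ct": 0}
user_thresholds = {"accumulated power-on time": 26280, "hard disk temperature": 50}
print(exceeded_parameters(readings, user_thresholds))
```

A disk would be taken offline when the returned list is non-empty; whether one hit or all selected parameters must be reached is a policy choice the text leaves to the user.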

Embodiment 2

This embodiment builds on Embodiment 1 and illustrates the method using hard disk temperature as the test parameter. A method for optimizing hard disk service life comprises the following steps:

S11, adding a monitoring point to each hard disk in the hard disk cluster, and testing the hard disk temperature at the monitoring point of each hard disk;

S12, collecting and recording the real-time hard disk temperature of each hard disk, and comparing it with a preset hard disk temperature threshold; here the hard disk temperature threshold is preferably set to 50 °C;

S13, for each hard disk whose temperature has reached or exceeded 50 °C, removing the OSD role of that hard disk and taking it offline, with other hard disks taking over its work;

S14, repeating steps S11 to S13 to compare the hard disk temperatures of the hard disks in the cluster; once the temperatures of all hard disks in the cluster have reached 50 °C and all of them have been taken offline, bringing all hard disks in the cluster back online.

Each of the remaining test parameters, such as start count, remapped sector count, hard disk power cycle count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, raw data read error rate, parity error rate, write error rate, read/write count and read/write volume, can likewise be used on its own as the test parameter for the threshold check: hard disks that reach the threshold are taken offline one by one, and then all hard disks are brought back online together, thereby keeping the service life of all hard disks in the cluster consistent.
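
Steps S12 to S14 with the 50 °C threshold can be sketched as a planner that decides which OSDs to take out and when to bring them all back. This is a hedged example: the text does not say which Ceph commands implement "removing the OSD role", so mapping it to `ceph osd out` and `ceph osd in` is an assumption, the function name `plan_osd_commands` is hypothetical, and the commands are only built as argument lists, never executed.

```python
TEMPERATURE_THRESHOLD_C = 50  # threshold value given in this embodiment


def plan_osd_commands(osd_temperatures, already_out=frozenset(),
                      threshold=TEMPERATURE_THRESHOLD_C):
    """Plan Ceph CLI calls for one monitoring pass.

    osd_temperatures: OSD id -> current temperature in degrees Celsius
    already_out:      OSD ids taken offline in earlier passes
    Returns a list of argument lists; nothing is executed here.
    """
    hot = {osd for osd, temp in osd_temperatures.items() if temp >= threshold}
    reached = set(already_out) | hot
    # S14: every disk has reached the threshold, so all are brought back online.
    if reached >= set(osd_temperatures):
        return [["ceph", "osd", "in", str(osd)] for osd in sorted(osd_temperatures)]
    # S13: disks that newly reached the threshold are taken offline.
    return [["ceph", "osd", "out", str(osd)] for osd in sorted(hot - set(already_out))]


print(plan_osd_commands({0: 52, 1: 47, 2: 44}))
print(plan_osd_commands({0: 40, 1: 51, 2: 50}, already_out={0}))
```

The caller is expected to remember which OSDs are out between passes, because a disk that has cooled down while offline still counts as having reached the threshold.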

Embodiment 3

This embodiment discloses a method for optimizing hard disk service life that is a further optimization of Embodiment 1: the preset threshold is removed, and the cluster's hard disk system program handles all test parameters by itself. The method of this embodiment comprises the following steps:

S21, adding a monitoring point to each hard disk in the hard disk cluster, and testing the test parameter at the monitoring point of each hard disk;

S22, collecting and recording the values of the test parameter, summing the values of the same test parameter, and dividing the sum by the number of hard disks reporting that test parameter to obtain the average value of the test parameter;

S23, for each hard disk whose value is higher than the average, removing the OSD role of that hard disk from the hard disk cluster and taking it offline, with other hard disks taking over its work;

S24, repeating steps S22 and S23 to keep obtaining new averages of the test parameters, which rise continuously; as the average of a test parameter rises, the hard disks that were taken offline gradually come back online, until the test parameters of all hard disks converge, at which point all hard disks are online and working together again. In this way the hard disk lifetimes also become consistent, lowering the likelihood of individual hard disks failing prematurely.
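
The averaging scheme of S22 to S24 can be tried out numerically. The sketch below is a toy model, not the described system: the function name `mean_rotation` is hypothetical, a single test parameter is tracked per disk, and it is assumed that only online disks accumulate wear, at the same fixed rate per round.

```python
def mean_rotation(values, wear_per_round=1.0, rounds=50):
    """Simulate S22-S24 for one test parameter and return the final values."""
    values = [float(v) for v in values]
    for _ in range(rounds):
        # S22: average of the parameter over all disks.
        average = sum(values) / len(values)
        # S23: disks above the average are offline this round; the rest stay online.
        online = [v <= average for v in values]
        # S24 (simulated): online disks keep wearing, which raises the average
        # until offline disks fall at or below it and rejoin.
        values = [v + wear_per_round if on else v for v, on in zip(values, online)]
    return values


start = [10, 4, 1]
final = mean_rotation(start)
print(max(start) - min(start), max(final) - min(final))
```

In this model the gap between the most and least worn disk narrows round by round, which is the consistency effect the embodiment describes; real wear rates would depend on how load is redistributed.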

Embodiment 4

As shown in Fig. 2, this embodiment provides a system for optimizing hard disk service life, comprising:

a test module, for testing the test parameter at the monitoring point;

a test parameter collection module, for collecting and recording the specific values of the test parameter measured by the test module; preferably, the test parameter collection module uses the smartctl tool.

Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various equivalent changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A method for optimizing hard disk service life, characterized by comprising the following steps:

S1, adding a monitoring point to each hard disk in the hard disk cluster, and testing the test parameter at the monitoring point of each hard disk;

S2, collecting and recording the values of the test parameter, and comparing each value with a preset test parameter threshold;

S3, for each hard disk whose value has reached the test parameter threshold, removing the OSD role of that hard disk from the hard disk cluster and taking it offline, with other hard disks taking over its work;

S4, repeating steps S1 to S3 and, once every hard disk in the hard disk cluster has reached the test parameter threshold, bringing all the hard disks that were removed and taken offline back online.

2. The method for optimizing hard disk service life according to claim 1, characterized in that the test parameter of step S1 is one or more of: start count, remapped sector count, hard disk power cycle count, accumulated power-on time, spindle spin-up retry count, hard disk calibration retry count, raw data read error rate, parity error rate, write error rate, read/write count, read/write volume, and hard disk temperature.

3. The method for optimizing hard disk service life according to claim 2, characterized in that the test parameter threshold is preset according to the selected type of test parameter, and is set as follows: the user sets the specific value of the test parameter threshold according to their own needs.

4. The method for optimizing hard disk service life according to any one of claims 1 to 3, characterized in that the specific values of the test parameter are collected using the smartctl tool.

5. A system for optimizing hard disk service life, characterized by comprising:

a test module, for testing the test parameter at the monitoring point;

a test parameter collection module, for collecting and recording the specific values of the test parameter measured by the test module;

a comparison module, for comparing the specific values of the test parameter collected by the test parameter collection module against the test parameter threshold preset by the test parameter threshold preset module;

a hard disk removal and offline module, for removing and taking offline each hard disk whose test parameter value has reached the test parameter threshold;

and a re-online module, for bringing all hard disks in the hard disk cluster back online for use once every hard disk in the cluster has been removed and taken offline by the hard disk removal and offline module.

6. The system for optimizing hard disk service life according to claim 5, characterized in that the test parameter collection module uses the smartctl tool.