CN110737924A

CN110737924A - Method and equipment for data protection

Info

Publication number: CN110737924A
Application number: CN201810803843.3A
Authority: CN
Inventors: 李宏杰; 张绍文; 郭占东
Original assignee: Zhongchang (suzhou) Software Technology Co Ltd; China Mobile Communications Group Co Ltd
Current assignee: Zhongchang (suzhou) Software Technology Co Ltd; China Mobile Communications Group Co Ltd
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2020-01-31
Anticipated expiration: 2038-07-20
Also published as: CN110737924B

Abstract

The invention discloses a data protection method and device, which address the problem in the prior art that, after a fault domain fails, the performance of a distributed storage cluster degrades and service operation is affected. A distributed storage cluster repartitions the hard disks in the fault domains that have not failed to obtain at least one new fault domain, reselects the hard disks for data fragment distribution in the new fault domains according to an initial data fragment distribution rule, and recovers and rebalances the data fragments on the reselected hard disks.

Description

Method and equipment for data protection

Technical Field

The invention relates to the technical field of computers, and in particular to a data protection method and device.

Background

With the explosive growth of data, traditional data storage modes can no longer meet requirements, and distributed storage has become the first choice. Distributed storage falls into several classes: distributed block storage, distributed file storage, and distributed object storage.

Existing distributed storage clusters usually protect data through data redundancy. Typical data redundancy modes are the copy (replication) strategy and the erasure code strategy, both of which generate data fragments (such as copies and erasure code check blocks). An existing distributed storage cluster is divided into fault domains and, by establishing corresponding distribution rules, places the data fragments in different fault domains. Fault domains can be divided manually based on the physical topology, for example with each machine room as one fault domain, so that data is not lost when one fault domain fails and becomes unavailable.

In the prior art, once fault domains and the rules for distributing data fragments across fault domains have been set for a distributed storage cluster, any change in the physical topology of the cluster (whether caused by a fault, capacity expansion, or capacity reduction) means that the original fault domain division and cross-fault-domain distribution rules no longer meet the requirements, so the performance of the distributed storage cluster degrades and service operation is affected.

Disclosure of Invention

The invention provides a data protection method and device, which address the problem in the prior art that, after a fault domain fails, the performance of a distributed storage cluster degrades and service operation is affected.

In a first aspect, in the data protection method provided by an embodiment of the invention, after the hard disks in the non-failed fault domains of a distributed storage cluster are repartitioned to obtain at least one new fault domain, the hard disks on which data fragments are distributed are reselected in the new fault domains according to the initial data fragment distribution rule, and the data fragments of the failed fault domain are recovered and rebalanced on the reselected hard disks.

In the embodiment of the invention, after a fault domain fails, the distributed storage cluster can promptly repartition the hard disks in the fault domains that have not failed to obtain new fault domains, and then recover the lost data based on the repartitioned fault domains. This reduces the probability that performance degrades and services cannot run normally because a fault domain failure is not handled in time.

In a specific implementation, before the hard disks in the non-failed fault domains of the distributed storage cluster are repartitioned to obtain at least one new fault domain, it is necessary to determine that the hard disk parameter values in at least one fault domain of the distributed storage cluster fall outside the preset range of hard disk parameter values, or to determine that the failure rate of at least one fault domain of the distributed storage cluster reaches the first preset value, where the failure rate may be determined in the following manner:

obtaining operation indexes of at least one fault domain of the distributed storage cluster in the operating state, and determining the failure rate of the at least one fault domain according to the operation indexes.

In the embodiment of the invention, it is determined whether the failure rate of at least one fault domain in the distributed storage cluster reaches the first preset value. When the failure rate of a fault domain reaches the first preset value, that fault domain is determined to have failed and the distributed storage cluster is repartitioned, which reduces the probability that performance degrades and services cannot run normally because a fault domain failure is not handled in time.

In a specific implementation, when the hard disks in the non-failed fault domains of the distributed storage cluster are repartitioned to obtain at least one new fault domain, the operation parameters of at least one hard disk in the non-failed fault domains are obtained first, and the hard disks in the non-failed fault domains are then repartitioned according to the initial fault domain division rule and the operation parameters to obtain at least one new fault domain.

In the embodiment of the invention, after a fault domain fails, the hard disks in the fault domains that have not failed are repartitioned based on the initial fault domain division rule and the operation parameters to obtain new fault domains, so that the new fault domain division of the distributed storage cluster is more reasonable.

In a specific implementation, before the data fragments of the failed fault domain are recovered and rebalanced on the reselected hard disks, it is further necessary to determine that the traffic volume running in the distributed storage cluster is smaller than a second preset value.

In the embodiment of the invention, recovery and rebalancing are carried out only when the traffic volume running in the distributed storage cluster is smaller than the second preset value, so that recovery does not coincide with heavy service load, stalls during operation of the distributed storage cluster are reduced, and the user experience is improved.

In a second aspect, an embodiment of the invention provides an apparatus for data protection, comprising at least one processing unit and at least one storage unit, wherein the storage unit stores program code that, when executed by the processing unit, causes the processing unit to:

repartition the hard disks in the non-failed fault domains of a distributed storage cluster to obtain at least one new fault domain; reselect, in the new fault domains, the hard disks on which data fragments are distributed according to an initial data fragment distribution rule; and recover and rebalance the data fragments of the failed fault domain on the reselected hard disks.

In a specific implementation, the processing unit is further configured to:

before the hard disks in the non-failed fault domains of the distributed storage cluster are repartitioned to obtain at least one new fault domain, determine that the hard disk parameter values in at least one fault domain of the distributed storage cluster fall outside the preset range of hard disk parameter values, or

determine that the failure rate of at least one fault domain of the distributed storage cluster reaches a first preset value.

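In a specific implementation, the processing unit is specifically configured to:

obtain operation indexes of at least one fault domain of the distributed storage cluster in the operating state, and determine the failure rate of the at least one fault domain according to the operation indexes.

In a specific implementation, the processing unit is further configured to:

obtain the operation parameters of at least one hard disk in the non-failed fault domains, and repartition the hard disks in the non-failed fault domains according to the initial fault domain division rule and the operation parameters to obtain at least one new fault domain.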

In a specific implementation, the processing unit is further configured to:

before the data fragments of the failed fault domain are recovered and rebalanced on the reselected hard disks, determine that the traffic volume running in the distributed storage cluster is smaller than a second preset value.

For the technical effects of any implementation of the second aspect, refer to the technical effects of the corresponding implementation of the first aspect; they are not described here again.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a method for data protection according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of initially partitioning fault domains of a distributed storage cluster according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of repartitioning a fault domain of a distributed storage cluster according to an embodiment of the present invention;

FIG. 4 is a flowchart of a complete method for data protection according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an apparatus for data protection according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of another data protection device according to an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a data protection method and device, which can be used in application scenarios where a distributed storage cluster stores data. A distributed storage cluster consists of a large number of PC servers interconnected through the Internet that provide services to the outside as a whole, and has the following characteristics:

(1) Scalability: a distributed storage cluster can be scaled to hundreds or even thousands of nodes, with the overall performance of the system increasing linearly.

(2) Low cost: the automatic fault tolerance and automatic load balancing of a distributed storage cluster allow it to be built on low-cost servers, and linear scalability also allows server cost to be reduced.

(3) High performance: high performance is required both of each single server and of the distributed storage cluster as a whole.

(4) Ease of use: the distributed storage cluster needs to provide a convenient, easy-to-use external interface, needs complete monitoring and operation-and-maintenance tools, and should be easy to integrate with other systems.

However, once the physical topology of the distributed storage cluster changes, the original fault domain division and the rules for distributing data fragments across fault domains no longer meet the requirements, which degrades the performance of the distributed storage cluster and affects service operation.

In the embodiment of the invention, once a fault domain fails, the distributed storage cluster can promptly repartition the hard disks in the fault domains that have not failed to obtain new fault domains, and then recover the lost data based on the repartitioned fault domains, thereby avoiding performance degradation of the distributed storage cluster and ensuring normal operation of the service.

To make the purposes, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present invention.

As shown in FIG. 1, an embodiment of the present invention provides a method for data protection, where the method includes:

step 100, repartitioning the hard disks in the non-failed fault domains of the distributed storage cluster to obtain at least one new fault domain;

step 101, reselecting, in the new fault domains, the hard disks on which data fragments are distributed according to an initial data fragment distribution rule;

step 102, recovering and rebalancing the data fragments of the failed fault domain on the reselected hard disks.

In the embodiment of the invention, the distributed storage cluster can repartition the hard disks in the fault domains that have not failed to obtain at least one new fault domain, reselect the hard disks for data fragment distribution in the new fault domains according to the initial data fragment distribution rule, and recover and rebalance the data fragments on the reselected hard disks.

The fault domains of the distributed storage cluster are divided using a reasonable initial fault domain division rule according to the physical topology of the cluster, and hard disks are designated in the divided fault domains according to the initial data fragment distribution rule; the distributed storage cluster can then operate based on the divided fault domains.

Here, a fault domain is a logically isolated area in the distributed storage cluster: an internal fault in one area does not affect the other isolated areas (other fault domains), so a fault domain may also be called an isolation domain.

The initial fault domain division rule specifies how many fault domains the distributed storage cluster is to be divided into and which regions serve as fault domains.

For example, the initial fault domain division rule may be that two servers in the distributed storage cluster form the region to be divided, and each of the two servers is one fault domain.

For another example, the initial fault domain division rule may be to treat all servers in one machine room of the distributed storage cluster as one fault domain.
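As a minimal sketch of the two example rules above, the following Python models a cluster as a list of disks tagged with hypothetical `server` and `room` labels and groups them into fault domains by either label. The `Disk` type, its field names, and the `partition_by` helper are illustrative assumptions, not part of the embodiment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    disk_id: int
    server: str  # hypothetical label: the server hosting the disk
    room: str    # hypothetical label: the machine room hosting the server


def partition_by(disks, key):
    """Group disk ids into fault domains keyed by the given attribute name."""
    domains = defaultdict(list)
    for disk in disks:
        domains[getattr(disk, key)].append(disk.disk_id)
    return dict(domains)


disks = [
    Disk(1, "server-1", "room-A"),
    Disk(2, "server-1", "room-A"),
    Disk(3, "server-2", "room-A"),
    Disk(4, "server-2", "room-A"),
]

# Rule 1: each server is one fault domain.
by_server = partition_by(disks, "server")
# Rule 2: all servers in one machine room form one fault domain.
by_room = partition_by(disks, "room")
```

Keeping the rule as a key name makes it easy to reapply the same rule later, which is what the repartitioning step relies on.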

During operation, operation indexes of at least one fault domain of the distributed storage cluster in the operating state are obtained based on the usage of the cluster, and the failure rate of the at least one fault domain is then determined according to the obtained operation indexes.

The failure rate of a fault domain refers to the probability that the fault domain fails.

The operation indexes of a fault domain are a logical concept; they specifically refer to the operation indexes of the hard disks in that fault domain, and may be some or all of the following parameters:

the utilization of the hard disk, the read-write speed, the annual failure probability of the hard disk, the temperature, the average erase count, the read error count, the rotational speed of the hard disk, the power-on time of the hard disk, and the like.

The failure rate of at least one fault domain is determined according to the obtained operation indexes. In one optional embodiment, the annual failure probability of the hard disks in a fault domain is derived from the operation indexes and used as the failure rate of that fault domain.

For example, the annual failure probability of a hard disk is calculated according to the following formula:

In the above formula, AFR is the annual failure probability of the hard disk, and MTBF is the mean time between failures, which can be provided by the hard disk manufacturer.

It should be noted that the AFR of other hardware (such as the CPU or fan) can also be obtained with the above formula; to simplify the description, the embodiment of the present invention considers only hard disk damage when determining the failure rate of a fault domain.

Correspondingly, the probability of hard disk failures within a fixed time t is obtained according to the Poisson distribution, specifically as follows:

where n represents the number of failed hard disks within time t, λ represents the expected number of hard disk damage events per unit time, and t represents the preset time. λ can be obtained from the annual failure probability AFR, for example:

where N represents the number of hard disks in the entire storage cluster.

Therefore, the probability that more than one hard disk is damaged within one year can be obtained by the following formula:
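The formulas themselves do not appear in this text. One reconstruction that is consistent with the symbols defined above is sketched below. It assumes a constant failure rate, an MTBF expressed in hours, and λ measured per hour over all N hard disks; the threshold m in the last line is a symbol introduced here for illustration. These are assumptions and may differ from the formulas of the original filing.

```latex
\begin{align}
\mathrm{AFR} &\approx \frac{24 \times 365}{\mathrm{MTBF}}
  && \text{(MTBF in hours; first-order form of } 1 - e^{-8760/\mathrm{MTBF}}\text{)} \\
P\bigl(N(t) = n\bigr) &= \frac{(\lambda t)^{n}\, e^{-\lambda t}}{n!} \\
\lambda &= \frac{N \cdot \mathrm{AFR}}{24 \times 365}
  && \text{(expected failures per hour across } N \text{ disks)} \\
P\bigl(N(t) > m\bigr) &= 1 - \sum_{n=0}^{m} \frac{(\lambda t)^{n}\, e^{-\lambda t}}{n!}
\end{align}
```

With t set to 8760 hours, the last line gives the probability of more than m disk failures within one year.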

It should be noted that using the annual failure probability of the hard disk as the failure rate is only an example; using other parameters as the failure rate in a specific implementation, for example the hard disk utilization, also falls within the protection scope of the embodiment of the present invention.
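The failure-rate calculation described above can be sketched numerically as follows. This is an illustration under stated assumptions: AFR is approximated as hours per year divided by MTBF (MTBF in hours), λ is the expected number of failures per hour across all N disks, and the function names and input values are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def afr_from_mtbf(mtbf_hours):
    """Annual failure probability, assuming AFR ~= hours per year / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def poisson_pmf(n, lam, t):
    """Probability of exactly n failures within time t at rate lam."""
    mean = lam * t
    return mean ** n * math.exp(-mean) / math.factorial(n)


def prob_more_than(m, lam, t):
    """Probability of more than m failures within time t."""
    return 1.0 - sum(poisson_pmf(n, lam, t) for n in range(m + 1))


# Hypothetical inputs: 12 disks, each with an MTBF of 1,200,000 hours.
num_disks = 12
afr = afr_from_mtbf(1_200_000)
lam = num_disks * afr / HOURS_PER_YEAR  # expected failures per hour (assumption)
p_any = prob_more_than(0, lam, HOURS_PER_YEAR)  # at least one failure in a year
```

Either `afr` or a probability such as `p_any` could then be compared against the first preset value, depending on which quantity is chosen as the failure rate.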

After the failure rate of at least one fault domain in the distributed storage cluster is obtained, it is judged whether that failure rate reaches the first preset value; if it does, the fault domain is considered to have failed.

For example, the first preset value is set to 10, and a first fault domain, a second fault domain, and a third fault domain exist in the distributed storage cluster. During operation, the operation indexes of the three fault domains in the operating state are obtained based on the usage of the cluster, and according to these indexes the failure rate of the first fault domain is determined to be 5, that of the second fault domain 5, and that of the third fault domain 11. Because the failure rate of the third fault domain exceeds the first preset value of 10, the third fault domain is determined to have failed.
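The threshold check in this example can be sketched in a few lines of Python. The function name and the domain labels are hypothetical; "reaches" is interpreted here as greater than or equal to the first preset value.

```python
def failed_domains(failure_rates, first_preset_value):
    """Return the fault domains whose failure rate reaches the threshold."""
    return [
        name
        for name, rate in failure_rates.items()
        if rate >= first_preset_value
    ]


# Values taken from the example above: rates 5, 5, and 11 against a threshold of 10.
rates = {"domain-1": 5, "domain-2": 5, "domain-3": 11}
failed = failed_domains(rates, first_preset_value=10)
```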

Optionally, during operation, the hard disk parameter values in at least one fault domain of the distributed storage cluster may also be obtained based on the usage of the cluster, and it is judged whether these values fall within the preset range of hard disk parameter values; when they do not, it is determined that at least one fault domain in the distributed storage cluster has failed.

For example, for some hard disk parameters, a nonzero value indicates a problem, and the larger the value, the greater the possibility of failure; when the obtained value exceeds the preset range, the fault domain containing that hard disk is considered to have failed. There are also other parameters, such as the state of the hard disk: if the obtained value indicates that the hard disk is offline, the hard disk is considered damaged, and when the proportion of damaged hard disks in a fault domain exceeds a preset range, such as 50%, that fault domain is considered to have failed.
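A minimal sketch of these two parameter-based checks follows. The function names, the state strings, and the sample readings are hypothetical; the 50% limit is the example value from the text.

```python
def out_of_range(value, low, high):
    """True when a hard disk parameter value falls outside [low, high]."""
    return not (low <= value <= high)


def domain_failed_by_state(disk_states, max_offline_fraction=0.5):
    """True when the share of offline disks exceeds max_offline_fraction."""
    if not disk_states:
        raise ValueError("fault domain has no disks")
    offline = sum(1 for state in disk_states if state == "offline")
    return offline / len(disk_states) > max_offline_fraction


# Hypothetical readings for one fault domain with four disks.
states = ["online", "offline", "offline", "offline"]
read_errors_bad = out_of_range(37, low=0, high=10)
domain_failed = domain_failed_by_state(states)
```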

It should be noted that a fault can also be recognized from phenomena observed in the distributed storage cluster. For example, when a hard disk is damaged, the processes built on that hard disk cannot run normally, the cluster state becomes abnormal, and the number of normal processes decreases; in that case it is determined that the fault domain containing the damaged hard disk has failed.

After it is determined that at least one fault domain in the distributed storage cluster has failed, the operation parameters of at least one hard disk in the non-failed fault domains are obtained, and the hard disks in the non-failed fault domains are then repartitioned according to the initial fault domain division rule and the operation parameters to obtain at least one new fault domain.

The operation parameter of a hard disk here refers to a parameter value of the hard disk in the operating state, such as the hard disk utilization or the annual failure probability of the hard disk.

For example, the distributed storage cluster is divided into a first fault domain, a second fault domain, and a third fault domain according to the initial fault domain division rule. During operation, the third fault domain fails. The utilization of at least one hard disk in the first and second fault domains is then obtained, showing that the utilization of the hard disks in the first fault domain is 50% and that of the hard disks in the second fault domain is 80%.

Since the initial fault domain division rule is to divide the cluster into 3 fault domains, the hard disks of the first fault domain and of the second fault domain together need to be divided into 3 fault domains.

For example, one new fault domain is obtained by combining some of the hard disks in the first fault domain with some of the hard disks in the second fault domain, and the remaining hard disks in the first fault domain and the remaining hard disks in the second fault domain each form one of two further new fault domains.

For another example, the distributed storage cluster is divided into a first fault domain and a second fault domain according to the initial fault domain division rule. During operation, the second fault domain is found to have failed. The annual failure probability of the hard disks in the first fault domain is then obtained, showing that two hard disks in the first fault domain have an annual failure probability of 50% and the remaining 4 hard disks have an annual failure probability of 80%.

Since the initial fault domain division rule is to divide the cluster into 2 fault domains, the hard disks in the first fault domain need to be subdivided into 2 fault domains. One hard disk with a 50% annual failure probability and two hard disks with an 80% annual failure probability can be combined into one new fault domain, and the remaining hard disk with a 50% annual failure probability and the remaining two hard disks with an 80% annual failure probability can be combined into the other new fault domain.

It should be noted that the above way of repartitioning the hard disks in the non-failed fault domains according to the initial fault domain division rule and the operation parameters is only an example; other ways of dividing and combining also fall within the protection scope of the embodiment of the present invention.
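One way to obtain the balanced split of the second example is to sort the surviving disks by the operating parameter and deal them out round-robin. The sketch below uses that approach as an assumption; the text does not prescribe a specific algorithm, and the function name and disk ids are hypothetical.

```python
def repartition(disks, num_domains):
    """Split surviving disks into num_domains groups with similar parameter mixes.

    disks maps a disk id to one operating parameter (here: annual failure
    probability). Disks are sorted by that parameter and dealt out round-robin.
    """
    if num_domains < 1 or num_domains > len(disks):
        raise ValueError("need between 1 and len(disks) fault domains")
    ordered = sorted(disks, key=lambda disk_id: (disks[disk_id], disk_id))
    domains = [[] for _ in range(num_domains)]
    for index, disk_id in enumerate(ordered):
        domains[index % num_domains].append(disk_id)
    return domains


# Six surviving disks: two at 50% and four at 80% annual failure probability.
surviving = {1: 0.5, 2: 0.5, 3: 0.8, 4: 0.8, 5: 0.8, 6: 0.8}
new_domains = repartition(surviving, num_domains=2)
```

With these inputs each new fault domain receives one 50% disk and two 80% disks, matching the combination described in the example.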

After at least one new fault domain is obtained, the hard disks on which data fragments are distributed are reselected in the new fault domains according to the initial data fragment distribution rule.

The initial data fragment distribution rule refers to the number of hard disks selected in each divided fault domain after the distributed storage cluster is created.

For example, if the initial data fragment distribution rule is to select a given number of hard disks in each fault domain, the same number of hard disks is still selected in each new fault domain after repartitioning.
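Reselection under an unchanged per-domain count can be sketched as below. Preferring the least-utilized disks is an assumption made for illustration; the text only fixes how many disks are selected per fault domain. All names and values are hypothetical.

```python
def reselect_disks(new_domains, disks_per_domain, utilization):
    """Pick disks_per_domain disks in every new fault domain, least utilized first."""
    selection = []
    for domain in new_domains:
        if len(domain) < disks_per_domain:
            raise ValueError("fault domain has too few disks")
        ranked = sorted(domain, key=lambda disk_id: (utilization[disk_id], disk_id))
        selection.append(ranked[:disks_per_domain])
    return selection


# Hypothetical new fault domains and per-disk utilization.
domains = [[1, 2], [3, 4, 5, 6], [7, 8]]
utilization = {1: 0.5, 2: 0.3, 3: 0.8, 4: 0.2, 5: 0.6, 6: 0.9, 7: 0.4, 8: 0.4}
selected = reselect_disks(domains, disks_per_domain=1, utilization=utilization)
```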

Correspondingly, after it is determined that the traffic volume running in the distributed storage cluster is smaller than the second preset value, the data fragments of the failed fault domain are recovered and rebalanced on the reselected hard disks until the distributed storage cluster returns to normal.

The running traffic volume may be the service traffic in use while the distributed storage cluster is running. The specific values of the first preset value and the second preset value may be set according to actual needs, which is not limited in the embodiment of the present invention.

The following detailed description of embodiments of the invention refers to the accompanying drawings.

Suppose there are 3 servers in a distributed storage cluster, each with 4 hard disks. Each server includes hardware modules such as a CPU (Central Processing Unit), memory, a network card, a fan, and a power lamp. The corresponding hardware indexes may be the read error count of the hard disks, the power-on time of the hard disks, the throughput (bandwidth/IOPS) of the hard disks, the network bandwidth, and so on; other indexes may be the response time, the request error rate, the cluster operating state, the process operating state, and so on. Indexes that can be obtained from the hardware can be obtained by prior-art methods, which are not described here again.

When the distributed storage cluster is created, the fault domains and the data fragment distribution rule are specified. In this embodiment, each server is one fault domain, giving 3 fault domains in total; hard disks are then selected in each fault domain based on the data fragment distribution rule, with the specific effect shown in FIG. 2.

In the embodiment of the invention, the distributed storage cluster adopts a three-copy data redundancy strategy, so 3 copies of the data need to be written into the distributed storage cluster; the 3 copies are placed on the 3 servers, with each server storing 1 copy.

During operation, the operation indexes of the 3 fault domains of the distributed storage cluster in the operating state are obtained based on the usage of the cluster, and the failure rates of the 3 fault domains are then determined according to the obtained operation indexes. In one optional embodiment, the annual failure probability of the hard disks in a fault domain is derived from the operation indexes and used as the failure rate of that fault domain.

Correspondingly, when the annual failure probability of the hard disks of server 2, obtained through the above formula, reaches the first preset value, server 2 is considered to have failed, and only 2 fault domains remain in the distributed storage cluster. The operation parameters of the hard disks in the non-failed fault domains then need to be obtained in order to repartition the fault domains. For example, two hard disks in server 1 serve as new fault domain 1, the two remaining hard disks in server 1 and two hard disks in server 3 are combined as new fault domain 2, and the two remaining hard disks in server 3 serve as new fault domain 3, with the specific effect shown in FIG. 3.

It should be noted that the above way of combining hard disks is only an example; other ways of dividing and combining also fall within the protection scope of the embodiment of the present invention.
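The layout of this example can be written down directly to check that the three new fault domains cover every surviving hard disk exactly once. The disk labels below are hypothetical; only the 2 / 4 / 2 grouping comes from the text.

```python
# Hypothetical disk labels: "<server>-<slot>".
servers = {
    "server-1": ["1-1", "1-2", "1-3", "1-4"],
    "server-2": ["2-1", "2-2", "2-3", "2-4"],
    "server-3": ["3-1", "3-2", "3-3", "3-4"],
}
failed_server = "server-2"

s1, s3 = servers["server-1"], servers["server-3"]
new_domains = {
    "new-domain-1": s1[:2],           # two disks of server 1
    "new-domain-2": s1[2:] + s3[:2],  # remaining two of server 1 plus two of server 3
    "new-domain-3": s3[2:],           # remaining two of server 3
}

# Every disk outside the failed server should be assigned exactly once.
surviving = [
    disk
    for name, disks in servers.items()
    if name != failed_server
    for disk in disks
]
assigned = [disk for disks in new_domains.values() for disk in disks]
```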

Correspondingly, hard disks are selected in the 3 new fault domains based on the initial data fragment distribution rule so that a data copy is placed in each new fault domain, and when the traffic volume running in the distributed storage cluster is smaller than the second preset value, the data fragments of the failed fault domain are recovered and rebalanced on the reselected hard disks.

For example, if the traffic volume in the early morning is smaller than the set second preset value, the early morning is chosen for recovering and rebalancing the data fragments of the failed fault domain on the reselected hard disks.

One optional way of recovering and rebalancing the data fragments of the failed fault domain on the reselected hard disks is as follows:

For example, three data copies exist in the hard disk group {1, 3, 5}: they are placed on hard disk 1, hard disk 3, and hard disk 5, which belong to three different fault domains. Hard disk 5 then becomes unusable. After the fault domains are repartitioned, hard disk 7 is selected in a new fault domain to replace hard disk 5, so the hard disk group storing the data copies becomes {1, 3, 7}. Hard disk 1 and hard disk 3 each hold complete data while hard disk 7 holds none, so complete data must be written to hard disk 7 from the data on hard disk 1 and hard disk 3 to reconstruct the three data copies.
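The replica-group recovery in this example can be sketched as follows. The in-memory `store` dictionary stands in for the contents of each hard disk, and the function name is hypothetical; a real cluster would stream data between disks rather than copy a value.

```python
def recover_group(group, failed_disk, replacement_disk, store):
    """Swap failed_disk for replacement_disk and rebuild its copy from a survivor."""
    if failed_disk not in group:
        raise ValueError("failed disk is not part of the group")
    survivors = [disk for disk in group if disk != failed_disk]
    if not survivors:
        raise ValueError("no surviving replica to copy from")
    # Write complete data to the replacement from any surviving copy.
    store[replacement_disk] = store[survivors[0]]
    store.pop(failed_disk, None)
    return [replacement_disk if disk == failed_disk else disk for disk in group]


# Disk group {1, 3, 5} from the example; disk 5 becomes unusable, disk 7 replaces it.
store = {1: b"payload", 3: b"payload", 5: b"payload"}
new_group = recover_group([1, 3, 5], failed_disk=5, replacement_disk=7, store=store)
```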

As shown in FIG. 4, the complete method for data protection provided by an embodiment of the present invention includes:

step 400, dividing the distributed storage cluster into initial fault domains and specifying the data fragment distribution rule;

step 401, judging whether at least one fault domain in the distributed storage cluster has failed; if so, executing step 402; otherwise, the distributed storage cluster is operating normally;

step 402, repartitioning the hard disks in the non-failed fault domains of the distributed storage cluster to obtain at least one new fault domain;

step 403, reselecting, in the new fault domains, the hard disks on which data fragments are distributed according to the initial data fragment distribution rule;

step 404, judging whether the traffic volume running in the distributed storage cluster is smaller than the second preset value; if so, executing step 405; otherwise, executing step 406;

step 405, recovering and rebalancing the data fragments of the failed fault domain on the reselected hard disks;

step 406, continuing to wait until the traffic volume is smaller than the second preset value, and then executing step 405;

step 407, the distributed storage cluster returns to normal.
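The control flow of steps 401 to 407 can be sketched as a small simulation that records which actions are taken. The function name, the action strings, and the list of traffic samples (standing in for repeated measurements while waiting) are hypothetical.

```python
def run_protection(failure_rates, first_preset, traffic_samples, second_preset):
    """Walk through steps 401-407 and return the list of actions taken."""
    failed = [d for d, rate in failure_rates.items() if rate >= first_preset]
    if not failed:  # step 401: no fault domain has failed
        return ["cluster normal"]
    actions = [
        "repartition surviving disks",    # step 402
        "reselect disks in new domains",  # step 403
    ]
    for traffic in traffic_samples:       # steps 404 and 406
        if traffic < second_preset:
            actions.append("recover and rebalance")  # step 405
            actions.append("cluster recovered")      # step 407
            return actions
        actions.append("wait")
    return actions  # traffic never dropped below the threshold in the samples


actions = run_protection(
    {"domain-1": 5, "domain-2": 5, "domain-3": 11},
    first_preset=10,
    traffic_samples=[900, 700, 100],
    second_preset=500,
)
```

Separating detection, repartitioning, and the traffic gate in this way keeps recovery from competing with peak service load, which is the stated purpose of the second preset value.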

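As shown in FIG. 5, a data protection device according to an embodiment of the present invention includes at least one processing unit 500 and at least one storage unit 501, wherein the storage unit 501 stores program code, and when the program code is executed by the processing unit 500, the processing unit 500 executes the following process:

repartitioning the hard disks in the non-failed fault domains of the distributed storage cluster to obtain at least one new fault domain; reselecting, in the new fault domains, the hard disks on which data fragments are distributed according to an initial data fragment distribution rule; and recovering and rebalancing the data fragments of the failed fault domain on the reselected hard disks.

Optionally, the processing unit 500 is further configured to:

determine that the hard disk parameter values in at least one fault domain of the distributed storage cluster fall outside the preset range of hard disk parameter values, or

determine that the failure rate of at least one fault domain of the distributed storage cluster reaches a first preset value.

Optionally, the processing unit 500 is specifically configured to:

obtain operation indexes of at least one fault domain of the distributed storage cluster in the operating state, and determine the failure rate of the at least one fault domain according to the operation indexes.

Optionally, the processing unit 500 is further specifically configured to:

obtain the operation parameters of at least one hard disk in the non-failed fault domains, and repartition the hard disks in the non-failed fault domains according to the initial fault domain division rule and the operation parameters to obtain at least one new fault domain.

Optionally, the processing unit 500 is further configured to:

determine that the traffic volume running in the distributed storage cluster is smaller than a second preset value.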

As shown in fig. 6, kinds of data protection devices according to an embodiment of the present invention include:

the partitioning module 600 is configured to re-partition the hard disks in the failed domain that does not have a failure in the distributed storage cluster to obtain at least new failed domains;

a selecting module 601, configured to reselect a hard disk for data fragment distribution in the new failure domain according to an initial data fragment distribution rule;

and a recovery module 602, configured to recover and rebalance the data fragments in the failed domain in the reselected hard disk.

Optionally, the dividing module 600 is further configured to:

determining that the hard disk parameter values in at least fault domains in the distributed storage cluster do not meet the preset range of the preset hard disk parameter values, or

Optionally, the dividing module 600 is further configured to:

acquiring operation indexes of at least one fault domain of the distributed storage cluster in an operating state;

determining the failure rate of the at least one fault domain according to the operation indexes.
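One possible reading of this determination is sketched below in Python. The text here does not define the operation indexes or the formula for the failure rate, so the sketch assumes, purely for illustration, that the operation index is a per-disk health flag and that the failure rate is the fraction of disks in a fault domain reported as unhealthy. The names failure_rate and domains_over_threshold are hypothetical.

```python
from typing import Dict, List


def failure_rate(disk_ok: List[bool]) -> float:
    """Fraction of disks in one fault domain reported as not OK.

    Assumption: an empty domain is treated as having a rate of 0.0.
    """
    if not disk_ok:
        return 0.0
    return sum(1 for ok in disk_ok if not ok) / len(disk_ok)


def domains_over_threshold(
    indexes: Dict[str, List[bool]], first_preset_value: float
) -> List[str]:
    """Names of fault domains whose failure rate reaches the first preset value."""
    return sorted(
        name
        for name, flags in indexes.items()
        if failure_rate(flags) >= first_preset_value
    )
```

With a first preset value of 0.5, a fault domain in which three of four disks are unhealthy (rate 0.75) would be reported, while a fully healthy one would not.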

Optionally, the dividing module 600 is specifically configured to:

acquiring the operation parameters of at least one hard disk in the fault domains that have not failed;

and re-dividing, according to the initial fault domain division rule and the operation parameters, the hard disks in the fault domains that have not failed to obtain at least one new fault domain.
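The re-division based on operation parameters can be illustrated with the following Python sketch. It assumes, hypothetically, that the operation parameter is each disk's free capacity in GB and that the initial division rule only fixes the number of fault domains; a greedy heuristic then balances capacity across the new fault domains. The name redivide and the balancing strategy are assumptions of this sketch, not taken from the embodiment.

```python
from typing import Dict, List


def redivide(free_capacity_gb: Dict[str, float], domain_count: int) -> List[List[str]]:
    """Group surviving disks into `domain_count` new fault domains.

    Greedy balancing: take disks from largest to smallest free capacity and
    put each one into the domain whose total capacity is currently lowest.
    """
    if domain_count < 1:
        raise ValueError("domain_count must be at least 1")
    if len(free_capacity_gb) < domain_count:
        raise ValueError("fewer disks than fault domains")
    groups: List[List[str]] = [[] for _ in range(domain_count)]
    totals = [0.0] * domain_count
    # Ties on capacity are broken by disk name so the result is deterministic.
    ordered = sorted(free_capacity_gb, key=lambda d: (-free_capacity_gb[d], d))
    for disk in ordered:
        target = totals.index(min(totals))  # first domain with the lowest total
        groups[target].append(disk)
        totals[target] += free_capacity_gb[disk]
    return groups
```

Other operation parameters, such as load or latency, could be substituted for free capacity without changing the structure of the sketch.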

Optionally, the recovery module 602 is further configured to:

determining that the traffic volume running in the distributed storage cluster is smaller than a second preset value.

In some possible embodiments, aspects of the data protection provided by the embodiments of the present invention may also be implemented in the form of a program product comprising program code which, when run on a computer device, causes the computer device to perform the steps of the data protection method according to the various exemplary embodiments of the present invention described in this specification.

More specific examples (a non-exhaustive list) of the readable storage medium include an electrical connection having one or more wires, a portable diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A program product for data protection according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a server device. However, the program product of the present invention is not limited thereto. In this document, the readable storage medium may be any tangible medium that contains or stores a program, and the program can be used by or in connection with an information transmission apparatus or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.

An embodiment of the present invention further provides a computer-readable storage medium for the data protection method, that is, a storage medium whose content is not lost after power is turned off. The storage medium stores a software program comprising program code which, when read and executed by one or more processors on a computing device, implements the data protection scheme according to an embodiment of the present invention.

It will be appreciated that blocks of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart illustrations.

This application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method of data protection, the method comprising:

re-dividing the hard disks in the fault domains that have not failed in a distributed storage cluster to obtain at least one new fault domain;

reselecting, in the new fault domain, hard disks for data fragment distribution according to an initial data fragment distribution rule;

and recovering and rebalancing the data fragments of the failed fault domain onto the reselected hard disks.
2. The method of claim 1, wherein before re-dividing the hard disks in the fault domains that have not failed in the distributed storage cluster to obtain at least one new fault domain, the method further comprises:

determining that the hard disk parameter values in at least one fault domain in the distributed storage cluster fall outside the preset range of hard disk parameter values; or

determining that the failure rate of at least one fault domain in the distributed storage cluster reaches a first preset value.
3. The method of claim 2, wherein the failure rate is determined by:

acquiring operation indexes of at least one fault domain of the distributed storage cluster in an operating state;

determining the failure rate of the at least one fault domain according to the operation indexes.
4. The method of claim 1, wherein re-dividing the hard disks in the fault domains that have not failed in the distributed storage cluster to obtain at least one new fault domain comprises:

acquiring the operation parameters of at least one hard disk in the fault domains that have not failed;

and re-dividing, according to the initial fault domain division rule and the operation parameters, the hard disks in the fault domains that have not failed to obtain at least one new fault domain.
5. The method of claim 1, wherein before recovering and rebalancing the data fragments of the failed fault domain onto the reselected hard disks, the method further comprises:

determining that the traffic volume running in the distributed storage cluster is smaller than a second preset value.
6. A device for data protection, comprising at least one processing unit and at least one storage unit, wherein the storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform the following:

re-dividing the hard disks in the fault domains that have not failed in a distributed storage cluster to obtain at least one new fault domain; reselecting, in the new fault domain, hard disks for data fragment distribution according to an initial data fragment distribution rule; and recovering and rebalancing the data fragments of the failed fault domain onto the reselected hard disks.
7. The device of claim 6, wherein the processing unit is further configured to:

before re-dividing the hard disks in the fault domains that have not failed in the distributed storage cluster to obtain at least one new fault domain, determine that the hard disk parameter values in at least one fault domain in the distributed storage cluster fall outside the preset range of hard disk parameter values; or

determine that the failure rate of at least one fault domain in the distributed storage cluster reaches a first preset value.
8. The device of claim 7, wherein the processing unit is specifically configured to:

acquire operation indexes of at least one fault domain of the distributed storage cluster in an operating state, and determine the failure rate of the at least one fault domain according to the operation indexes.
9. The device of claim 6, wherein the processing unit is further configured to:

re-divide, according to the initial fault domain division rule and the operation parameters, the hard disks in the fault domains that have not failed to obtain at least one new fault domain.
10. The device of claim 6, wherein the processing unit is further configured to:

before recovering and rebalancing the data fragments of the failed fault domain onto the reselected hard disks, determine that the traffic volume running in the distributed storage cluster is smaller than a second preset value.