CN111176900A

CN111176900A - Distributed storage system and data recovery method, device and medium thereof

Info

Publication number: CN111176900A
Application number: CN201911402944.0A
Authority: CN
Inventors: 丁纯杰; 孟祥瑞
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2020-05-19

Abstract

The application discloses a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium. A plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group. The method is applied to the master storage node in the redundant backup group and comprises the following steps: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading locally each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node. The application can reasonably optimize the data recovery mechanism, reduce the occurrence of servers going down, and improve the stability and sustainability of the service.

Description

Distributed storage system and data recovery method, device and medium thereof

Technical Field

The present application relates to the field of distributed storage technologies, and in particular, to a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium.

Background

The distributed storage system is a storage system which divides and scatters data according to a certain rule and stores the data on a plurality of independent general storage servers.

In order to ensure the security of user data, multiple copies of the same data are usually made and stored in the storage devices of different storage nodes, and the storage nodes that back up the same data form a redundant backup group. For example, under a three-copy backup rule, each piece of data is copied into three copies that are stored on three different hard disks, and the redundant backup group comprises those three hard disks. Data maintenance can be realized on each storage node of each redundant backup group by deploying a Placement Group (PG) service.
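The three-copy rule described above can be illustrated with a minimal sketch. The class names `StorageNode` and `RedundantBackupGroup`, the disk names, and the in-memory dictionaries are hypothetical and chosen only for illustration; they are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """One storage device (for example, a hard disk) holding object copies."""
    name: str
    objects: dict = field(default_factory=dict)  # object id -> payload bytes


@dataclass
class RedundantBackupGroup:
    """All storage nodes that keep a copy of the same data."""
    nodes: list

    def write(self, object_id, payload):
        # Store the same payload on every node of the group.
        for node in self.nodes:
            node.objects[object_id] = payload

    def copies_of(self, object_id):
        # Names of the nodes that currently hold a copy of the object.
        return [n.name for n in self.nodes if object_id in n.objects]


# Three-copy rule: one piece of data ends up on three different disks.
group = RedundantBackupGroup(
    [StorageNode("disk-a"), StorageNode("disk-b"), StorageNode("disk-c")]
)
group.write("obj-1", b"user data")
holders = group.copies_of("obj-1")
```

In this sketch, losing any single disk still leaves two copies of `obj-1` in the group, which is the property the recovery method relies on.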

If a storage node needs to update its data, either because it has been offline for a period of time or because its data is damaged, the distributed system may restore the data on that node based on the data held by the master storage node in the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node does not check the data for abnormality after reading it locally, but sends it out directly. On receiving abnormal data, the receiving storage node concludes that it has lost data consistency, and the server of that storage node then goes down. This interrupts the storage service and increases the load on the other storage nodes in the distributed storage system, especially when the storage system is busy. In addition, after a storage node goes down, the distributed system generally initiates a new round of master node election automatically, which further reduces the working efficiency of the distributed storage system.

In view of the above, it is an important need for those skilled in the art to provide a solution to the above technical problems.

Disclosure of Invention

The application aims to provide a distributed storage system, a data recovery method and a data recovery device thereof, and a computer-readable storage medium, so as to optimize a data recovery mechanism, improve the success rate of data recovery, reduce the phenomenon of server hang-up, and improve the working stability of each storage node and the sustainability of services.

In order to solve the above technical problem, in a first aspect, the present application discloses a data recovery method in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the method is applied to the master storage node in the redundant backup group and comprises the following steps:

receiving a data acquisition request sent by a slave storage node in the redundant backup group;

reading locally each piece of data that needs to be sent to the slave storage node;

judging whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data;

judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;

and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.

Optionally, reading locally each piece of data that needs to be sent to the slave storage node includes:

generating a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;

and reading the corresponding data locally according to the to-be-sent data information queue.

Optionally, generating the to-be-sent data information queue includes:

comparing the local data change record with the data change record of the slave storage node;

and determining, according to the comparison result, the data information of each piece of data that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.

Optionally, determining the abnormal data as target data and suspending the sending of the target data includes:

determining the abnormal data as the target data;

deleting the data information of the target data from the to-be-sent data information queue;

and sending the data that is not abnormal to the slave storage node according to the to-be-sent data information queue, and sending a deferred-sending message for the target data to the slave storage node.

Optionally, after the determining whether there is a target slave storage node storing a full backup of the target data in the redundant backup group, the method further includes:

and if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.

The application also discloses another data recovery method in the distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the method is applied to a slave storage node in the redundant backup group and comprises the following steps:

when a data update requirement arises, sending a data acquisition request to the master storage node in the redundant backup group;

after the master storage node has read locally each piece of data that needs to be sent, judged whether it is abnormal, and determined the abnormal data as target data, receiving the data that is not abnormal sent by the master storage node;

and after the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, if the target slave storage node exists, receiving the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.
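The slave-side behaviour described above can be sketched as a small state holder. This is only an illustration under stated assumptions: the class name `SlaveRecoveryState` and the message tuples `("DATA", id, payload)` and `("DEFERRED", id, None)` are hypothetical; the application does not specify a message format.

```python
class SlaveRecoveryState:
    """Hypothetical slave-side bookkeeping for one data acquisition request."""

    def __init__(self):
        self.objects = {}     # object id -> payload received from the master
        self.pending = set()  # target data whose sending the master deferred

    def handle(self, message):
        kind, object_id, payload = message
        if kind == "DATA":
            # Either normal data sent right away, or target data sent after
            # the master finished its own local recovery.
            self.objects[object_id] = payload
            self.pending.discard(object_id)
        elif kind == "DEFERRED":
            # Wait for the data instead of treating it as an inconsistency.
            self.pending.add(object_id)
        else:
            raise ValueError(f"unknown message kind: {kind}")

    @property
    def complete(self):
        # True once no deferred target data is outstanding.
        return not self.pending


state = SlaveRecoveryState()
for msg in [("DATA", "obj-1", b"a"), ("DEFERRED", "obj-2", None)]:
    state.handle(msg)
waiting = state.complete  # obj-2 has not arrived yet
state.handle(("DATA", "obj-2", b"b"))  # sent after the master's local recovery
```

The point of the sketch is that the slave never receives abnormal data, so it has no reason to conclude that it has lost consistency while waiting.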

In a second aspect, the present application further discloses a data recovery apparatus in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the data recovery apparatus is applied to the master storage node in the redundant backup group and comprises:

a request receiving module, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;

a data reading module, configured to read locally each piece of data that needs to be sent to the slave storage node;

an abnormality judgment module, configured to judge whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data;

a data query module, configured to judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;

and a data recovery module, configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.

Optionally, the data reading module specifically includes:

the queue generating unit is used for generating a data information queue to be sent; the data information queue to be sent stores the data information of each data to be sent to the slave storage node;

and the data reading unit is used for reading corresponding data from the local according to the to-be-sent data information queue.

Optionally, the queue generating unit is specifically configured to:

comparing the local data change record with the data change record of the slave storage node; and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate the data information queue to be sent.

Optionally, the abnormality determining module specifically includes:

a determination unit configured to determine abnormal data as the target data;

a deleting unit, configured to delete the data information of the target data from the to-be-sent data information queue;

and a sending unit, configured to send the data that is not abnormal to the slave storage node according to the to-be-sent data information queue, and to send a deferred-sending message for the target data to the slave storage node.

Optionally, the apparatus further comprises:

a data error reporting module, configured to, after the data query module judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, mark the target data as unrecoverable data and generate corresponding prompt information if the target slave storage node does not exist.

The application also discloses another data recovery apparatus in the distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the data recovery apparatus is applied to a slave storage node in the redundant backup group and comprises:

a data request module, configured to send a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;

and a data receiving module, configured to receive the data that is not abnormal sent by the master storage node after the master storage node has read locally each piece of data that needs to be sent, judged whether it is abnormal, and determined the abnormal data as target data; and, after the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, if the target slave storage node exists, to receive the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.

In a third aspect, the present application further discloses a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group, and a placement group (PG) service for maintaining data consistency in the redundant backup group is deployed on each storage node;

a slave storage node in the redundant backup group is configured to send, through the placement group service, a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;

the master storage node is configured to, through the placement group service, read locally each piece of data that needs to be sent to the slave storage node and judge whether it is abnormal, determine the abnormal data as target data and suspend the sending of the target data, judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.

In a fourth aspect, the present application further discloses a computer-readable storage medium in which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the data recovery methods in a distributed storage system described above.

The application provides a data recovery method in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group, and a placement group (PG) service for maintaining data consistency in the redundant backup group is deployed on each storage node. The placement group service deployed on the master storage node in the redundant backup group is used to implement the method, which comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading locally each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.

It can be seen that, in the data recovery method in the distributed storage system provided by the application, before sending locally read data to the slave storage node, the master storage node first judges whether the data is abnormal and suspends the sending of abnormal data. This effectively prevents the slave storage node from unnecessarily going down and offline after receiving abnormal data, avoids aggravating the service load, and improves the stability and sustainability of the storage service. In addition, the application attempts local data recovery based on the other slave storage nodes, and sends the recovered data to the slave storage node once the local data has been recovered. This avoids reporting an error immediately when only the local data is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery. The distributed storage system, the data recovery apparatus, and the computer-readable storage medium provided by the application have the same beneficial effects.

Drawings

In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the drawings that are needed to be used in the description of the prior art and the embodiments of the present application will be briefly described below. Of course, the following description of the drawings related to the embodiments of the present application is only a part of the embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any creative effort, and the obtained other drawings also belong to the protection scope of the present application.

Fig. 1 is a flowchart of a data recovery method in a distributed storage system according to an embodiment of the present application;

Fig. 2 is a block diagram illustrating the structure of a data recovery apparatus in a distributed storage system according to an embodiment of the present application;

Fig. 3 is a flowchart of another data recovery method in a distributed storage system according to an embodiment of the present application;

Fig. 4 is a block diagram illustrating the structure of another data recovery apparatus in a distributed storage system according to an embodiment of the present application.

Detailed Description

The core of the application is to provide a distributed storage system, a data recovery method and device thereof, and a computer-readable storage medium, so as to optimize a data recovery mechanism, improve a data recovery success rate, reduce a server hang-up phenomenon, and improve the working stability of each storage node and the sustainability of services.

In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

If a storage node needs to update its data, either because it has been offline for a period of time or because its data is damaged, the distributed system may restore the data on that node based on the data held by the master storage node in the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node does not check the data for abnormality after reading it locally, but sends it out directly. On receiving abnormal data, the receiving storage node concludes that it has lost data consistency, and the server of that storage node then goes down. This interrupts the service and increases the load on the other storage nodes in the distributed storage system, especially when the storage system is busy. In addition, after a storage node goes down, the distributed system generally initiates a new round of master node election automatically, which further reduces the working efficiency of the distributed storage system. In view of this, the present application provides a data recovery scheme in a distributed storage system that can effectively solve the above problems.

Referring to fig. 1, an embodiment of the present application discloses a data recovery method in a distributed storage system.

The method is applied to the master storage node in the redundant backup group. Specifically, a placement group service, i.e., a PG service, for maintaining data consistency in the redundant backup group may be deployed on each storage node. The placement group service deployed on the master storage node in the redundant backup group may be used to implement the data recovery method.

The data recovery method in the distributed storage system provided by the embodiment of the application mainly comprises the following steps:

S101: receiving a data acquisition request sent by a slave storage node in the redundant backup group.

Specifically, a redundant backup group includes a master storage node and a plurality of slave storage nodes, each of which stores its own backup copy of the data. When the data backup stored in a slave storage node is damaged, or when a slave storage node comes back online after being offline for a period of time due to a fault, the slave storage node initiates a data acquisition request to the master storage node so as to update its own data and remain consistent with the other storage nodes.

S102: reading locally each piece of data that needs to be sent to the slave storage node.

After receiving a data acquisition request sent by the slave storage node, the master storage node may read the corresponding data stored locally.

S103: judging whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data.

It should be noted that, in the data recovery method provided in the embodiment of the present application, after reading the data locally, the master storage node does not send the data directly to the slave storage node, but first judges whether the read data is abnormal. For example, if what is read consists only of error codes and the original data content is missing, the data can be determined to be abnormal.

Sending abnormal data to the slave storage node serves no purpose and would cause the slave storage node to go down and interrupt its service, which would increase the service load on the other storage nodes in the distributed storage system. Therefore, in the embodiment of the present application, the master storage node suspends the sending of the abnormal data and sends only the non-abnormal data to the slave storage node.
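The abnormality judgment in S103 can be sketched as follows. The application gives only one example of abnormal data (a read that yields error codes instead of content); the checksum comparison, the `ReadResult` type, and all field names below are assumptions added for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadResult:
    """Hypothetical result of a local read on the master storage node."""
    object_id: str
    payload: Optional[bytes]            # None when the content could not be read
    error_code: int = 0                 # non-zero means the read returned an error
    expected_crc: Optional[int] = None  # checksum recorded at write time (assumed)


def is_abnormal(result: ReadResult) -> bool:
    """Return True if locally read data must not be sent to the slave node."""
    if result.error_code != 0 or result.payload is None:
        return True  # only an error code came back; the content is missing
    if result.expected_crc is not None:
        return zlib.crc32(result.payload) != result.expected_crc
    return False


def split_reads(results):
    """Separate sendable data from target data whose sending is suspended."""
    sendable = [r for r in results if not is_abnormal(r)]
    targets = [r for r in results if is_abnormal(r)]
    return sendable, targets


reads = [
    ReadResult("obj-1", b"hello", expected_crc=zlib.crc32(b"hello")),
    ReadResult("obj-2", None, error_code=5),
    ReadResult("obj-3", b"corrupt", expected_crc=zlib.crc32(b"correct")),
]
sendable, targets = split_reads(reads)
```

Here `obj-1` would be sent immediately, while `obj-2` and `obj-3` become target data and are held back until local recovery has been attempted.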

S104: judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; if so, proceeding to S105.

It should be further noted that, in the data recovery method provided in the embodiment of the present application, the master storage node determines locally abnormal data as the target data and attempts to recover it through the other slave storage nodes, so that, once local data recovery is complete, data recovery can be performed for the slave storage node that initiated the data acquisition request.

In particular, since there is typically more than one slave storage node in the redundant backup group, other slave storage nodes may still store a complete backup of the target data even though the target data has been lost on the master storage node. Therefore, by attempting to recover the local data based on the other slave storage nodes, instead of declaring the data "lost forever" or "unrecoverable" as soon as the local data is found to be abnormal, the present application can effectively avoid unnecessary data-loss error reports and re-storage of data.

S105: performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.

The master storage node can determine a slave storage node that stores a complete backup of the target data as the target slave storage node, acquire the complete backup of the target data from it, and perform a Pull operation, that is, a write operation, in which the complete backup of the target data is copied and written into the master storage node, thereby recovering the data of the master storage node.

After the local data recovery is completed, the master storage node may perform a Push operation, i.e., a sending operation, and send the target data after the local recovery to the slave storage node that originally initiated the data acquisition request, so as to implement data recovery for the slave storage node.
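Steps S104 and S105 can be sketched as a single function that performs the Pull and then the Push. This is a minimal in-memory illustration, not the PG service itself: the `Node` class, the use of `None` to represent a damaged copy, and the function name `recover_and_send` are all assumptions.

```python
class Node:
    """Hypothetical in-memory storage node; object id -> bytes (None = damaged)."""

    def __init__(self, name, objects=None):
        self.name = name
        self.objects = dict(objects or {})

    def has_complete_backup(self, object_id):
        return self.objects.get(object_id) is not None


def recover_and_send(master, slaves, requester, object_id):
    """S104 and S105: recover the master's copy via Pull, then Push it on."""
    # S104: look for a target slave node that stores a complete backup.
    target = next(
        (s for s in slaves
         if s is not requester and s.has_complete_backup(object_id)),
        None,
    )
    if target is None:
        return False  # no complete backup exists in the group
    # Pull: copy the complete backup and write it into the master node.
    master.objects[object_id] = target.objects[object_id]
    # Push: send the locally recovered data to the requesting slave node.
    requester.objects[object_id] = master.objects[object_id]
    return True


master = Node("master", {"obj-2": None})          # local copy is abnormal
slave_a = Node("slave-a", {})                     # back online, requests data
slave_b = Node("slave-b", {"obj-2": b"payload"})  # holds a complete backup
ok = recover_and_send(master, [slave_a, slave_b], slave_a, "obj-2")
```

After the call, both the master and the requesting slave hold the recovered object, matching the order described above: the master is repaired first, and only then is the data sent on.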

The data recovery method in the distributed storage system provided by the embodiment of the application is implemented by the placement group service deployed on the master storage node in the redundant backup group, and comprises the following steps: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading locally each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.

It can be seen that, in the data recovery method in the distributed storage system provided by the application, before sending locally read data to the slave storage node, the master storage node first judges whether the data is abnormal and suspends the sending of abnormal data. This effectively prevents the slave storage node from unnecessarily going down and offline after receiving abnormal data, avoids aggravating the service load, and improves the stability and sustainability of the storage service. In addition, the application attempts local data recovery based on the other slave storage nodes, and sends the recovered data to the slave storage node once the local data has been recovered. This avoids reporting an error immediately when only the local data is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.

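As a specific embodiment, on the basis of the foregoing, in the data recovery method in the distributed storage system provided in the embodiment of the present application, reading locally each piece of data that needs to be sent to the slave storage node includes:

generating a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;

and reading the corresponding data locally according to the to-be-sent data information queue.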

Specifically, in this embodiment, after receiving a data acquisition request sent by a slave storage node, a master storage node may determine data information of each data to be sent, and place the data information into a queue to generate a data information queue to be sent, so as to read data according to the data information queue to be sent.

Further, as a specific embodiment, generating the to-be-sent data information queue may specifically include:

comparing the local data change record with the data change record of the slave storage node;

and determining, according to the comparison result, the data information of each piece of data that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.

Specifically, in this embodiment, the placement group service, i.e., the PG service, may log the operations on the storage node on which it is deployed, generating a data change record that is updated and maintained in real time. The master storage node can compare its data change record with that of the slave storage node, determine from the differences the information of each piece of data that needs to be sent to the slave storage node, and generate the to-be-sent data information queue.

For example, suppose a slave storage node goes offline due to some failure and is later successfully brought back online. The slave storage node will request data from the master storage node, which may compare the differences between the data change records and determine the data updated by the storage system while the slave storage node was offline as the data that needs to be sent to the slave storage node.
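The comparison of data change records can be sketched as below. The layout of a change record as an ordered list of `(version, object_id)` entries, and the function name `build_send_queue`, are assumptions made for this example; the application does not define the record format.

```python
from collections import deque


def build_send_queue(master_log, slave_log):
    """Compare change records and queue the data information the slave lacks.

    Each log is a list of (version, object_id) entries in increasing version
    order. This layout is an assumption made for the example.
    """
    slave_last = slave_log[-1][0] if slave_log else 0
    queue = deque()
    seen = set()
    for version, object_id in master_log:
        # Only changes made after the slave's last recorded version matter,
        # and an object is queued once even if it changed several times.
        if version > slave_last and object_id not in seen:
            seen.add(object_id)
            queue.append(object_id)
    return queue


master_log = [(1, "obj-1"), (2, "obj-2"), (3, "obj-3"), (4, "obj-2"), (5, "obj-4")]
slave_log = [(1, "obj-1"), (2, "obj-2")]  # the slave went offline after version 2
to_send = build_send_queue(master_log, slave_log)
```

In this example the queue contains only the objects changed while the slave was offline, so data the slave already holds in its current state is not sent again.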

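As a specific embodiment, on the basis of the foregoing, in the data recovery method in the distributed storage system provided in the embodiment of the present application, determining the abnormal data as target data and suspending the sending of the target data includes:

determining the abnormal data as the target data;

deleting the data information of the target data from the to-be-sent data information queue;

and sending the data that is not abnormal to the slave storage node according to the to-be-sent data information queue, and sending a deferred-sending message for the target data to the slave storage node.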

Specifically, in this embodiment, the master storage node may delete the data information of the abnormal data from the to-be-sent data information queue, and then send the remaining data (all of which is non-abnormal at this point) to the slave storage node according to the queue as it stands after the deletion, so that part of the data on the slave storage node is recovered. Meanwhile, for the abnormal data, i.e., the target data, a deferred-sending message may be sent to the slave storage node.
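The queue handling described above can be sketched as follows. The function name `send_with_deferral`, the `("DATA", ...)` and `("DEFERRED", ...)` message tuples, and the use of `None` for an abnormal local copy are hypothetical choices for illustration only.

```python
from collections import deque


def send_with_deferral(send_queue, local_store, is_abnormal):
    """Drop target data from the queue, send the rest, and defer the targets.

    Returns (messages, targets): the messages that would go to the slave
    storage node, and the object ids whose sending was suspended.
    """
    targets = [oid for oid in send_queue if is_abnormal(local_store.get(oid))]
    # Delete the data information of the target data from the queue.
    remaining = deque(oid for oid in send_queue if oid not in targets)
    messages = []
    while remaining:
        oid = remaining.popleft()
        messages.append(("DATA", oid, local_store[oid]))
    # Tell the slave that these objects will follow after local recovery.
    messages.extend(("DEFERRED", oid, None) for oid in targets)
    return messages, targets


store = {"obj-1": b"a", "obj-2": None, "obj-3": b"c"}  # obj-2 is abnormal
msgs, deferred = send_with_deferral(
    deque(["obj-1", "obj-2", "obj-3"]), store, lambda value: value is None
)
```

With this arrangement the slave storage node recovers `obj-1` and `obj-3` right away and learns that `obj-2` is delayed rather than missing.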

As a specific embodiment, on the basis of the foregoing, the data recovery method in the distributed storage system provided in the embodiment of the present application further includes, after judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group:

if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.

Specifically, if no complete backup of the target data exists anywhere in the redundant backup group, the master storage node determines at this point that the target data is "lost forever" or "unrecoverable". The master storage node may then mark the target data as unrecoverable data and generate prompt information for the user.
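The fallback described above can be sketched as a lookup that either names a target slave storage node or records the data as unrecoverable. The function name, the dictionary layout of `slaves`, and the wording of the prompt are assumptions for illustration.

```python
def resolve_target_data(object_id, slaves, unrecoverable, notices):
    """Return a slave holding a complete backup, or mark the data unrecoverable.

    `slaves` maps a node name to its objects (object id -> bytes, with None
    standing for a damaged copy). This layout is assumed for the example.
    """
    for name, objects in slaves.items():
        if objects.get(object_id) is not None:
            return name  # this node becomes the target slave storage node
    # No complete backup anywhere in the group: record it and prompt the user.
    unrecoverable.add(object_id)
    notices.append(
        f"object {object_id} is unrecoverable: no complete backup in the group"
    )
    return None


slaves = {
    "slave-a": {"obj-2": None},
    "slave-b": {"obj-2": b"ok", "obj-9": None},
}
unrecoverable, notices = set(), []
found = resolve_target_data("obj-2", slaves, unrecoverable, notices)
missing = resolve_target_data("obj-9", slaves, unrecoverable, notices)
```

Only `obj-9`, for which every copy in the group is damaged or absent, ends up marked as unrecoverable; `obj-2` is still recoverable from `slave-b`.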

Referring to Fig. 2, an embodiment of the present application discloses a data recovery apparatus in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the data recovery apparatus is applied to the master storage node in the redundant backup group and comprises:

a request receiving module 201, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;

a data reading module 202, configured to read locally each piece of data that needs to be sent to the slave storage node;

an abnormality judgment module 203, configured to judge whether each piece of data is abnormal, so as to determine the abnormal data as target data and suspend the sending of the target data;

a data query module 204, configured to judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;

and a data recovery module 205, configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.

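As a specific embodiment, on the basis of the foregoing, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, the data reading module 202 specifically includes:

a queue generating unit, configured to generate a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;

and a data reading unit, configured to read the corresponding data locally according to the to-be-sent data information queue.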

As a specific embodiment, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, the queue generating unit is specifically configured to:

comparing the local data change record with the data change record of the slave storage node; and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate a data information queue to be sent.

As a specific embodiment, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing, the abnormality determination module 203 specifically includes:

a determination unit, configured to determine the abnormal data as target data;

a deleting unit, configured to delete the data information of the target data from the to-be-sent data information queue;
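The determination and deletion described above can be sketched as a single pass over the queue. The function name `suspend_abnormal` and the `is_abnormal` predicate are hypothetical; how abnormality is actually detected is left to the caller.

```python
def suspend_abnormal(queue, is_abnormal):
    """Remove abnormal entries from the queue in place and return them.

    `queue` is a list of data information entries; `is_abnormal` is a
    caller-supplied predicate (an assumption of this sketch).
    """
    targets = [info for info in queue if is_abnormal(info)]
    # Slice assignment keeps the same list object, so any other holder of the
    # queue sees that the target data is no longer scheduled for sending.
    queue[:] = [info for info in queue if not is_abnormal(info)]
    return targets
```

Because the entries are removed from the queue itself, a sender that works through the queue afterwards only ever transmits data that passed the check.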

As a specific embodiment, on the basis of the foregoing, the data recovery apparatus in the distributed storage system provided in the embodiment of the present application further includes:

a data error reporting module, configured to, after the data query module 204 determines whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, mark the target data as unrecoverable data and generate corresponding prompt information if no target slave storage node exists.
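A minimal sketch of the error reporting behaviour follows. The status dictionary, the message text, the logger name `pg_recovery`, and the function name `report_unrecoverable` are all assumptions chosen for illustration.

```python
import logging

logger = logging.getLogger("pg_recovery")  # hypothetical logger name


def report_unrecoverable(target, backup_holders, status):
    """Mark `target` as unrecoverable when no node holds a complete backup.

    `backup_holders` lists the slave nodes found to store a complete backup;
    `status` is a caller-owned dict recording the state of each object.
    Returns the prompt message, or None when recovery is still possible.
    """
    if backup_holders:
        return None
    status[target] = "unrecoverable"
    message = f"object {target!r}: no slave storage node holds a complete backup"
    logger.error(message)
    return message
```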

For specific contents of the data recovery apparatus in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.

Therefore, in the data recovery apparatus in the distributed storage system provided by the present application, the main storage node determines whether data read locally is abnormal before sending it to the slave storage node, and suspends sending of the abnormal data. This effectively prevents the slave storage node from unnecessarily going down and offline as a result of receiving abnormal data, keeps the service operation pressure from increasing, and improves the stability and sustainability of the storage service. In addition, the present application attempts local data recovery based on the other slave storage nodes and sends the recovered data to the requesting slave storage node once the local data has been recovered. This avoids reporting an error immediately when only the local data is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.

Referring to fig. 3, an embodiment of the present application discloses another method for recovering data in a distributed storage system.

The method is specifically applied to a slave storage node in the redundant backup group. Specifically, a placement group service, i.e., a PG service, for maintaining data consistency within the redundant backup group may be deployed on each storage node. The placement group service deployed on a slave storage node in the redundant backup group may be used to implement the data recovery method.

S301: When a data update requirement arises, send a data acquisition request to the main storage node in the redundant backup group.

Specifically, a data update requirement arises when the slave storage node comes back online after going down, or when some of its data is found to be damaged.

S302: Receive the non-abnormal data sent by the main storage node after the main storage node has locally read each piece of data that needs to be sent, determined whether each piece is abnormal, and determined the abnormal data as target data.

S303: After the main storage node has determined whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, if the target slave storage node exists, receive the target data that the main storage node sends after performing local data recovery based on the complete backup of the target data.
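Steps S301 to S303 can be sketched from the point of view of the slave storage node. Both classes below are hypothetical: `StubMaster` merely stands in for the main storage node and returns the normal data and the locally recovered data as two separate batches, which is an assumption about how the two transmissions might be modelled.

```python
class StubMaster:
    """Stand-in for the main storage node (illustrative only)."""

    def __init__(self, normal, recovered):
        self.normal = normal        # data that passed the abnormality check
        self.recovered = recovered  # target data repaired from a complete backup

    def handle_request(self, names):
        def pick(data):
            return {n: data[n] for n in names if n in data}
        return pick(self.normal), pick(self.recovered)


class SlaveNode:
    def __init__(self, master):
        self.master = master
        self.store = {}

    def update(self, names):
        # S301: send the data acquisition request to the main storage node.
        normal, recovered = self.master.handle_request(names)
        # S302: receive the data that was found to be normal.
        self.store.update(normal)
        # S303: receive the target data sent after local recovery on the master.
        self.store.update(recovered)
        return sorted(self.store)
```

A name that appears in neither batch (for instance, data the main storage node could not recover) is simply absent from the slave's store after the update.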

For the specific content of the data recovery method applied to the slave storage node in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method applied to the main storage node in the distributed storage system, and details are not repeated here.

Referring to fig. 4, an embodiment of the present application discloses another data recovery apparatus in a distributed storage system, where a plurality of storage nodes in the distributed storage system for performing redundant storage on the same data are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:

a data request module 401, configured to send a data acquisition request to the main storage node in the redundant backup group when a data update requirement arises;

a data receiving module 402, configured to receive the non-abnormal data sent by the main storage node after the main storage node has locally read each piece of data that needs to be sent, determined whether each piece is abnormal, and determined the abnormal data as target data; and, after the main storage node has determined whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, if the target slave storage node exists, to receive the target data that the main storage node sends after performing local data recovery based on the complete backup of the target data.

Therefore, with the data recovery apparatus in the distributed storage system provided by the present application, the main storage node determines whether data read locally is abnormal before sending it to the slave storage node, and suspends sending of the abnormal data. This effectively prevents the slave storage node from unnecessarily going down and offline as a result of receiving abnormal data, keeps the service operation pressure from increasing, and improves the stability and sustainability of the storage service. In addition, the present application attempts local data recovery based on the other slave storage nodes and sends the recovered data to the requesting slave storage node once the local data has been recovered. This avoids reporting an error immediately when only the local data is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.

Furthermore, the present application also discloses a distributed storage system, wherein a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group, and a placement group service for maintaining data consistency within the redundant backup group is deployed on each storage node;

the main storage node is configured to, through the placement group service, locally read each piece of data that needs to be sent to a slave storage node and determine whether it is abnormal, determine the abnormal data as target data, suspend sending of the target data, determine whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if so, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.

Further, the present application also discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the data recovery methods in a distributed storage system described above.

For the details of the distributed storage system and the computer-readable storage medium, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the description of the method.

It is further noted that, throughout this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.

Claims

1. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the main storage node in the redundant backup group and comprises the following steps:

reading each data needing to be sent to the slave storage node from the local;

2. The data recovery method of claim 1, wherein the locally reading each data that needs to be sent to the slave storage node comprises:

3. The data recovery method of claim 2, wherein the generating a queue of data information to be sent comprises:

4. The data recovery method according to claim 2, wherein the determining abnormal data as target data and suspending transmission of the target data includes:

determining abnormal data as the target data;

5. The data recovery method according to any one of claims 1 to 4, further comprising, after the determining whether there is a target slave storage node in the redundant backup group that stores a complete backup of the target data:

6. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the slave storage nodes in the redundant backup group and comprises the following steps:

7. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the main storage node in the redundant backup group and comprises:

8. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:

9. A distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group, and a placement group service for maintaining the data consistency in the redundant backup group is deployed on each storage node;

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method for data recovery in a distributed storage system according to any one of claims 1 to 6.