CN111176900A - Distributed storage system and data recovery method, device and medium thereof - Google Patents

Distributed storage system and data recovery method, device and medium thereof

Info

Publication number
CN111176900A
Authority
CN
China
Prior art keywords
data
storage node
target
sent
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911402944.0A
Other languages
Chinese (zh)
Inventor
丁纯杰
孟祥瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201911402944.0A priority Critical patent/CN111176900A/en
Publication of CN111176900A publication Critical patent/CN111176900A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking

Abstract

The application discloses a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium. A plurality of storage nodes that redundantly store the same data in the distributed storage system form a redundant backup group. The method is applied to the master storage node in the redundant backup group and comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; locally reading each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node. The application optimizes the data recovery mechanism, reduces the occurrence of servers going down, and improves the stability and sustainability of the service.

Description

Distributed storage system and data recovery method, device and medium thereof
Technical Field
The present application relates to the field of distributed storage technologies, and in particular, to a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium.
Background
The distributed storage system is a storage system which divides and scatters data according to a certain rule and stores the data on a plurality of independent general storage servers.
To ensure the safety of user data, the same data is usually copied several times and stored on the storage devices of different storage nodes; the storage nodes that hold backups of the same data form a redundant backup group. For example, under a three-replica backup rule, a piece of data is copied into three replicas stored on three different hard disks, and the redundant backup group comprises those three hard disks. Data maintenance on the storage nodes of each redundant backup group can be implemented by deploying a Placement Group (PG) service.
If a storage node needs to update its data, either because it has been offline for a period of time or because its data has been damaged, the distributed system can restore the data on that node from the data held by the master storage node of the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node does not check the data it reads locally for abnormality but sends it out directly. A storage node that receives abnormal data concludes that it has lost data consistency, and its server then goes down. This interrupts the storage service and increases the load on the other storage nodes in the distributed storage system, especially when the system is busy. Moreover, after a storage node goes down, the distributed system generally initiates a new round of master node election automatically, which further reduces the working efficiency of the distributed storage system.
In view of the above, providing a solution to these technical problems is a pressing concern for those skilled in the art.
Disclosure of Invention
The application aims to provide a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium, so as to optimize the data recovery mechanism, improve the success rate of data recovery, reduce the occurrence of servers going down, and improve the working stability of each storage node and the sustainability of the service.
To solve the above technical problem, in a first aspect the present application discloses a data recovery method in a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group; the method is applied to the master storage node in the redundant backup group and comprises:
receiving a data acquisition request sent by a slave storage node in the redundant backup group;
locally reading each piece of data that needs to be sent to the slave storage node;
judging whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data;
judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
Optionally, locally reading each piece of data that needs to be sent to the slave storage node comprises:
generating a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;
and reading the corresponding data locally according to the to-be-sent data information queue.
Optionally, generating the to-be-sent data information queue comprises:
comparing the local data change record with the data change record of the slave storage node;
and determining, according to the comparison result, the data information of each piece of data that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.
Optionally, determining abnormal data as target data and suspending sending of the target data comprises:
determining the abnormal data as the target data;
deleting the data information of the target data from the to-be-sent data information queue;
and sending the data that is not abnormal to the slave storage node according to the to-be-sent data information queue, and sending a deferred sending message for the target data to the slave storage node.
Optionally, after judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, the method further comprises:
if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
The application also discloses another data recovery method in a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group; the method is applied to a slave storage node in the redundant backup group and comprises:
when a data update requirement arises, sending a data acquisition request to the master storage node in the redundant backup group;
receiving the data that is not abnormal sent by the master storage node, after the master storage node has locally read each piece of data that needs to be sent, judged whether it is abnormal, and determined the abnormal data as target data;
and receiving the target data sent by the master storage node, after the master storage node has judged that a target slave storage node storing a complete backup of the target data exists in the redundant backup group and has performed local data recovery based on that complete backup.
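The slave-side steps above can be sketched as a small message handler. This is only an illustration under stated assumptions: the `SlaveNode` class, the dictionary message format, and the use of the deferred sending message (taken from the optional master-side step) to track outstanding target data are all hypothetical and are not specified by the application.

```python
from typing import Dict, Set


class SlaveNode:
    """Hypothetical slave-side state for one recovery round."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.pending: Set[str] = set()  # target data the master has deferred

    def on_message(self, msg: dict) -> None:
        if msg["type"] == "deferred":
            # The master found this data abnormal locally; wait for the
            # recovered copy instead of treating the round as a failure.
            self.pending.add(msg["name"])
        elif msg["type"] == "data":
            # Either data that was never abnormal, or target data that the
            # master has recovered locally and is now sending.
            self.store[msg["name"]] = msg["payload"]
            self.pending.discard(msg["name"])

    def is_up_to_date(self) -> bool:
        """True once no deferred target data is outstanding."""
        return not self.pending


slave = SlaveNode()
slave.on_message({"type": "data", "name": "obj-1", "payload": b"A"})
slave.on_message({"type": "deferred", "name": "obj-2"})
before_recovery = slave.is_up_to_date()
slave.on_message({"type": "data", "name": "obj-2", "payload": b"B"})
after_recovery = slave.is_up_to_date()
```

In this sketch the slave stays online while it waits, which is the behaviour the method aims for.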
In a second aspect, the present application further discloses a data recovery apparatus in a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group; the data recovery apparatus is applied to the master storage node in the redundant backup group and comprises:
a request receiving module, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;
a data reading module, configured to locally read each piece of data that needs to be sent to the slave storage node;
an abnormality judgment module, configured to judge whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data;
a data query module, configured to judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and a data recovery module, configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.
Optionally, the data reading module specifically comprises:
a queue generating unit, configured to generate a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;
and a data reading unit, configured to read the corresponding data locally according to the to-be-sent data information queue.
Optionally, the queue generating unit is specifically configured to:
compare the local data change record with the data change record of the slave storage node; and determine, according to the comparison result, the data information of each piece of data that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.
Optionally, the abnormality judgment module specifically comprises:
a determination unit, configured to determine abnormal data as the target data;
a deleting unit, configured to delete the data information of the target data from the to-be-sent data information queue;
and a sending unit, configured to send the data that is not abnormal to the slave storage node according to the to-be-sent data information queue and to send a deferred sending message for the target data to the slave storage node.
Optionally, the apparatus further comprises:
a data error reporting module, configured to mark the target data as unrecoverable data and generate corresponding prompt information if, after the data query module has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, no such target slave storage node exists.
The application also discloses another data recovery apparatus in a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group; the data recovery apparatus is applied to a slave storage node in the redundant backup group and comprises:
a data request module, configured to send a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;
and a data receiving module, configured to receive the data that is not abnormal sent by the master storage node, after the master storage node has locally read each piece of data that needs to be sent, judged whether it is abnormal, and determined the abnormal data as target data; and to receive the target data sent by the master storage node, after the master storage node has judged that a target slave storage node storing a complete backup of the target data exists in the redundant backup group and has performed local data recovery based on that complete backup.
In a third aspect, the present application further discloses a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group, and a placement group (PG) service for maintaining data consistency within the redundant backup group is deployed on each storage node;
a slave storage node in the redundant backup group is configured to send, through the placement group service, a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;
the master storage node is configured to, through the placement group service, locally read each piece of data that needs to be sent to the slave storage node and judge whether it is abnormal, determine abnormal data as target data and suspend sending of the target data, judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.
In a fourth aspect, the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the data recovery methods in a distributed storage system described above.
The application provides a data recovery method in a distributed storage system, in which a plurality of storage nodes that redundantly store the same data form a redundant backup group, and a placement group (PG) service for maintaining data consistency within the redundant backup group is deployed on each storage node; the placement group service deployed on the master storage node in the redundant backup group is used to implement the method, which comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; locally reading each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
Thus, in the data recovery method provided by the application, the master storage node judges whether locally read data is abnormal before sending it to the slave storage node, and suspends sending of abnormal data. This prevents the slave storage node from needlessly going down and offline on receiving abnormal data, avoids adding to the service load, and improves the stability and sustainability of the storage service. In addition, the master storage node attempts local data recovery from other slave storage nodes and sends the recovered data to the requesting slave storage node afterwards, instead of reporting an error as soon as only the local data is found to be abnormal; this optimizes the data recovery mechanism and effectively improves the success rate of data recovery. The distributed storage system, the data recovery apparatus and the computer-readable storage medium provided by the application have the same beneficial effects.
Drawings
To illustrate the technical solutions of the prior art and of the embodiments of the present application more clearly, the drawings used in describing them are briefly introduced below. The drawings described below relate to only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort, and such other drawings also fall within the protection scope of the present application.
FIG. 1 is a flowchart of a data recovery method in a distributed storage system according to an embodiment of the present application;
FIG. 2 is a block diagram of a data recovery apparatus in a distributed storage system according to an embodiment of the present application;
FIG. 3 is a flowchart of another data recovery method in a distributed storage system according to an embodiment of the present application;
FIG. 4 is a block diagram of another data recovery apparatus in a distributed storage system according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium, so as to optimize the data recovery mechanism, improve the success rate of data recovery, reduce the occurrence of servers going down, and improve the working stability of each storage node and the sustainability of the service.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The distributed storage system is a storage system which divides and scatters data according to a certain rule and stores the data on a plurality of independent general storage servers.
To ensure the safety of user data, the same data is usually copied several times and stored on the storage devices of different storage nodes; the storage nodes that hold backups of the same data form a redundant backup group. For example, under a three-replica backup rule, a piece of data is copied into three replicas stored on three different hard disks, and the redundant backup group comprises those three hard disks. Data maintenance on the storage nodes of each redundant backup group can be implemented by deploying a Placement Group (PG) service.
If a storage node needs to update its data, either because it has been offline for a period of time or because its data has been damaged, the distributed system can restore the data on that node from the data held by the master storage node of the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node does not check the data it reads locally for abnormality but sends it out directly. A storage node that receives abnormal data concludes that it has lost data consistency, and its server then goes down. This interrupts the service and increases the load on the other storage nodes in the distributed storage system, especially when the system is busy. Moreover, after a storage node goes down, the distributed system generally initiates a new round of master node election automatically, which further reduces the working efficiency of the distributed storage system. In view of this, the present application provides a data recovery scheme in a distributed storage system that effectively solves the above problems.
Referring to FIG. 1, an embodiment of the present application discloses a data recovery method in a distributed storage system.
The method is applied to the master storage node in the redundant backup group. Specifically, a placement group (PG) service for maintaining data consistency within the redundant backup group may be deployed on each storage node, and the PG service deployed on the master storage node in the redundant backup group may be used to implement the data recovery method.
The data recovery method in the distributed storage system provided by the embodiment of the application mainly comprises the following steps:
S101: Receive a data acquisition request sent by a slave storage node in the redundant backup group.
Specifically, a redundant backup group includes one master storage node and a plurality of slave storage nodes, each of which stores its own backup copy of the same data. When the backup stored on a slave storage node is damaged, or when a slave storage node comes back online after being offline for a period of time because of a fault, that slave storage node initiates a data acquisition request to the master storage node in order to update its data and stay consistent with the other storage nodes.
S102: Locally read each piece of data that needs to be sent to the slave storage node.
After receiving the data acquisition request sent by the slave storage node, the master storage node may read the corresponding locally stored data.
S103: Judge whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data.
It should be noted that, in the data recovery method provided in this embodiment, the master storage node does not send the data directly to the slave storage node after reading it locally, but first checks the read data for abnormality. For example, if what is read consists only of error codes and the original data content is missing, the data can be judged abnormal.
Sending abnormal data to the slave storage node serves no purpose and would cause the slave storage node to go down and interrupt its service, increasing the service load on the other storage nodes in the distributed storage system. Therefore, in this embodiment the master storage node suspends sending of the abnormal data and sends only the data that is not abnormal to the slave storage node.
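One way such a judgment could be made is to compare a checksum recorded at write time with a checksum of the bytes just read. The application does not specify how abnormality is detected; the CRC32 comparison, the `StoredObject` type and the `is_abnormal` helper below are assumptions introduced purely for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredObject:
    """Hypothetical record of one locally stored piece of data."""
    name: str
    payload: Optional[bytes]  # None models a read that returned only an error code
    expected_crc: int         # checksum recorded when the data was written


def is_abnormal(obj: StoredObject) -> bool:
    """Return True if the locally read data must not be sent as-is."""
    if obj.payload is None:
        # The read produced no original content at all.
        return True
    # Content is present but does not match what was originally written.
    return zlib.crc32(obj.payload) != obj.expected_crc


good = StoredObject("obj-1", b"hello", zlib.crc32(b"hello"))
corrupt = StoredObject("obj-2", b"hellp", zlib.crc32(b"hello"))
missing = StoredObject("obj-3", None, zlib.crc32(b"hello"))
```

Under these assumptions, only `good` would be sent immediately; `corrupt` and `missing` would become target data.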
S104: Judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; if so, proceed to S105.
It should be further noted that, in the data recovery method provided in this embodiment, the master storage node determines locally abnormal data as target data and attempts to recover it from the other slave storage nodes, so that, once local data recovery is complete, data recovery can be performed for the slave storage node that initiated the data acquisition request.
In particular, since a redundant backup group typically contains more than one slave storage node, other slave storage nodes may still hold a complete backup of the target data even though the target data has been lost on the master storage node. Therefore, by attempting to recover the local data from other slave storage nodes, rather than declaring the data "lost forever" or "unrecoverable" as soon as the local copy is found to be abnormal, the present application effectively avoids unnecessary data-loss error reports and re-storing of data.
S105: Perform local data recovery based on the complete backup of the target data, and send the locally recovered target data to the slave storage node.
The master storage node may determine a slave storage node that stores a complete backup of the target data as the target slave storage node, obtain the complete backup from it, and perform a Pull operation, i.e., a write operation, that copies the complete backup of the target data and writes it to the master storage node, thereby recovering the data on the master storage node.
After local data recovery is complete, the master storage node may perform a Push operation, i.e., a sending operation, to send the locally recovered target data to the slave storage node that originally initiated the data acquisition request, thereby recovering the data on that slave storage node.
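Steps S101 to S105 can be put together in a minimal in-memory sketch. The model is an assumption made only for illustration: each node's store is a dictionary from data name to bytes, `None` stands for locally abnormal data, and `handle_acquisition_request` is a hypothetical function name, not part of any real PG service API.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model: name -> bytes, with None marking locally abnormal data.
Store = Dict[str, Optional[bytes]]


def handle_acquisition_request(
    master: Store,
    other_slaves: List[Store],
    requested: List[str],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Return (data sent to the requesting slave, names with no complete backup)."""
    sent: Dict[str, bytes] = {}
    targets: List[str] = []

    # S102/S103: read locally, send normal data, suspend abnormal data.
    for name in requested:
        data = master.get(name)
        if data is None:
            targets.append(name)
        else:
            sent[name] = data

    no_backup: List[str] = []
    for name in targets:
        # S104: look for another slave that holds a complete backup.
        backup = next(
            (s[name] for s in other_slaves if s.get(name) is not None), None
        )
        if backup is None:
            no_backup.append(name)
            continue
        # S105: Pull (write the backup locally), then Push to the requester.
        master[name] = backup
        sent[name] = backup
    return sent, no_backup


master = {"a": b"A", "b": None, "c": None}
slave2 = {"a": b"A", "b": b"B", "c": None}
sent, no_backup = handle_acquisition_request(master, [slave2], ["a", "b", "c"])
```

Here "b" is recovered on the master from `slave2` before being sent, while "c" has no complete backup anywhere in the group.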
The data recovery method in a distributed storage system provided by this embodiment is implemented by the placement group service deployed on the master storage node in the redundant backup group, and comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; locally reading each piece of data that needs to be sent to the slave storage node; judging whether each piece of data is abnormal, so as to determine abnormal data as target data and suspend sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
Thus, in the data recovery method provided by the application, the master storage node judges whether locally read data is abnormal before sending it to the slave storage node, and suspends sending of abnormal data. This prevents the slave storage node from needlessly going down and offline on receiving abnormal data, avoids adding to the service load, and improves the stability and sustainability of the storage service. In addition, the master storage node attempts local data recovery from other slave storage nodes and sends the recovered data to the requesting slave storage node afterwards, instead of reporting an error as soon as only the local data is found to be abnormal; this optimizes the data recovery mechanism and effectively improves the success rate of data recovery.
As a specific embodiment, in the data recovery method provided in this embodiment, on the basis of the above, locally reading each piece of data that needs to be sent to the slave storage node comprises:
generating a to-be-sent data information queue, which stores the data information of each piece of data to be sent to the slave storage node;
and reading the corresponding data locally according to the to-be-sent data information queue.
Specifically, in this embodiment, after receiving the data acquisition request sent by the slave storage node, the master storage node may determine the data information of each piece of data to be sent and place it in a queue, generating the to-be-sent data information queue, and then read the data according to that queue.
Further, as a specific embodiment, generating the to-be-sent data information queue may specifically comprise:
comparing the local data change record with the data change record of the slave storage node;
and determining, according to the comparison result, the data information of each piece of data that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.
Specifically, in this embodiment, the placement group (PG) service may log the operations on the storage node where it is deployed, generating a data change record that is updated and maintained in real time. The master storage node can compare its data change record with that of the slave storage node, determine from the differences the information of each piece of data that needs to be sent to the slave storage node, and generate the to-be-sent data information queue.
For example, suppose a slave storage node goes offline because of a failure and is later brought back online successfully. The slave storage node then requests data from the master storage node, which compares the data change records and determines the data updated by the storage system while the slave storage node was offline as the data that needs to be sent to it.
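The comparison of data change records can be sketched as follows. The application does not define the record format; here each record is assumed to be a mapping from data name to a monotonically increasing version number, and `build_send_queue` is a hypothetical helper.

```python
from collections import deque
from typing import Deque, Dict


def build_send_queue(
    master_log: Dict[str, int], slave_log: Dict[str, int]
) -> Deque[str]:
    """Queue the names whose latest version the slave has not yet seen."""
    queue: Deque[str] = deque()
    for name, version in sorted(master_log.items()):
        # A missing entry counts as version 0: the slave never saw the data.
        if slave_log.get(name, 0) < version:
            queue.append(name)
    return queue


master_log = {"obj-1": 3, "obj-2": 5, "obj-3": 1}
slave_log = {"obj-1": 3, "obj-2": 4}  # the slave missed two updates while offline
queue = build_send_queue(master_log, slave_log)
```

With these records, only "obj-2" and "obj-3" are queued, since "obj-1" is already up to date on the slave.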
As a specific embodiment, in the data recovery method provided in this embodiment, on the basis of the above, determining abnormal data as target data and suspending sending of the target data comprises:
determining the abnormal data as the target data;
deleting the data information of the target data from the to-be-sent data information queue;
and sending the data that is not abnormal to the slave storage node according to the to-be-sent data information queue, and sending a deferred sending message for the target data to the slave storage node.
Specifically, in this embodiment, the master storage node may delete the data information of the abnormal data from the to-be-sent data information queue and then, according to the queue as it stands after the deletion (which at this point contains no abnormal data), send the remaining data to the slave storage node, so that part of the data on the slave storage node is recovered. Meanwhile, for the abnormal data, i.e., the target data, a deferred sending message may be sent to the slave storage node.
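The queue handling described above might look like the following sketch. The message dictionaries and the `split_queue` helper are hypothetical; the application only states that the target data is removed from the queue, the remaining data is sent, and a deferred sending message is sent for the target data.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def split_queue(queue: Deque[str], abnormal: Set[str]) -> List[Dict[str, str]]:
    """Remove target data from the queue and build the outgoing messages."""
    targets = [name for name in queue if name in abnormal]
    for name in targets:
        queue.remove(name)  # delete the target data's information in place
    # Everything still queued is not abnormal and is sent right away.
    messages = [{"type": "data", "name": name} for name in queue]
    # The slave is told that the target data will follow later.
    messages += [{"type": "deferred", "name": name} for name in targets]
    return messages


queue = deque(["obj-1", "obj-2", "obj-3"])
messages = split_queue(queue, abnormal={"obj-2"})
```

Telling the slave explicitly that "obj-2" is deferred lets it distinguish a pending recovery from missing data.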
As a specific embodiment, the data recovery method in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, after determining whether a target slave storage node storing a complete backup of target data exists in a redundant backup group, further includes:
and if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
Specifically, if no complete backup of the target data exists anywhere in the redundant backup group, the master storage node determines that the target data is permanently lost, i.e., unrecoverable. The master storage node may therefore mark the target data as unrecoverable data and generate prompt information for the user.
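The two outcomes, local recovery from a target slave storage node or marking the data as unrecoverable, can be sketched as below. The in-memory `peer_stores` dictionary stands in for querying the other slave storage nodes, and the digest comparison used to decide that a backup is complete is an assumption; the exception name `UnrecoverableData` is likewise hypothetical.

```python
import hashlib


class UnrecoverableData(Exception):
    """Raised when no node in the redundant backup group holds a complete backup."""


def recover_locally(object_id, expected_sha256, local_store, peer_stores):
    """Repair the master's copy of object_id from a peer holding a complete backup.

    peer_stores maps a slave node name to its {object_id: bytes} store.
    Returns the name of the target slave storage node that was used.
    """
    for node, store in peer_stores.items():
        candidate = store.get(object_id)
        # Assumption: a backup counts as complete when its digest matches.
        if candidate is not None and hashlib.sha256(candidate).hexdigest() == expected_sha256:
            local_store[object_id] = candidate   # local data recovery
            return node
    raise UnrecoverableData(f"{object_id}: no complete backup in the redundant backup group")


payload = b"payload-c"
digest = hashlib.sha256(payload).hexdigest()
master = {"obj-c": b"corrupted"}
peers = {"slave-2": {"obj-c": b"also bad"}, "slave-3": {"obj-c": payload}}
print(recover_locally("obj-c", digest, master, peers))  # slave-3
```

After the call, the master storage node's local copy matches the complete backup and can be sent to the requesting slave storage node; if every peer copy is damaged or missing, the exception corresponds to marking the target data as unrecoverable.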
Referring to fig. 2, an embodiment of the present application discloses a data recovery apparatus in a distributed storage system, where a plurality of storage nodes in the distributed storage system that redundantly store the same data are used as a redundant backup group; the data recovery apparatus is applied to the master storage node in the redundant backup group and comprises:
a request receiving module 201, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;
a data reading module 202, configured to locally read each data that needs to be sent to the slave storage node;
an abnormality determination module 203, configured to determine whether each data is abnormal, so as to determine the abnormal data as target data and suspend sending of the target data;
a data query module 204, configured to determine whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and a data recovery module 205, configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data, and send the locally recovered target data to the slave storage node.
As a specific embodiment, in the data recovery apparatus in a distributed storage system provided in the embodiment of the present application, based on the foregoing, the data reading module 202 specifically includes:
the queue generating unit is used for generating a data information queue to be sent; the data information queue to be sent stores the data information of each data to be sent to the slave storage node;
and the data reading unit is used for reading corresponding data from the local according to the data information queue to be sent.
As a specific embodiment, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, the queue generating unit is specifically configured to:
comparing the local data change record with the data change record of the slave storage node; and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate a data information queue to be sent.
As a specific embodiment, on the basis of the foregoing content, the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, the abnormality determining module 203 specifically includes:
a determination unit configured to determine the abnormal data as target data;
the deleting unit is used for deleting the data information of the target data from the data information queue to be sent;
and the sending unit is used for sending the data without abnormity to the slave storage node according to the data information queue to be sent and sending a deferred sending message aiming at the target data to the slave storage node.
As a specific embodiment, the data recovery apparatus in a distributed storage system provided in the embodiment of the present application, based on the foregoing, further includes:
a data error reporting module, configured to, after the data query module 204 determines whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, mark the target data as unrecoverable data if the target slave storage node does not exist, and generate corresponding prompt information.
For specific contents of the data recovery apparatus in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.
Therefore, with the data recovery apparatus in the distributed storage system provided by the present application, the master storage node first judges whether the data read locally is abnormal before sending it to the slave storage node, and suspends sending of the abnormal data. This effectively prevents the slave storage node from unnecessarily failing and going offline after receiving abnormal data, avoids aggravating the load on running services, and improves the stability and sustainability of the storage service. In addition, the master storage node attempts local data recovery based on the other slave storage nodes and sends the data to the slave storage node once it has been recovered locally. This avoids reporting an error immediately when only the local copy is abnormal, optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Referring to fig. 3, an embodiment of the present application discloses another method for recovering data in a distributed storage system.
The method is specifically applied to a slave storage node in a redundant backup group. Specifically, a homing group service, i.e., a PG service, for maintaining data consistency in the redundant backup group may be deployed on each storage node. The homing group service deployed on a slave storage node in the redundant backup group may be used to implement the data recovery method.
The data recovery method in the distributed storage system provided by the embodiment of the application mainly comprises the following steps:
S301: When a data update requirement occurs, send a data acquisition request to the master storage node in the redundant backup group.
Specifically, a data update requirement occurs when the storage node comes back online after going down, or when some of its data is found to be damaged.
S302: After the master storage node has locally read each data that needs to be sent, judged whether each data is abnormal, and determined the abnormal data as target data, receive the data without abnormality sent by the master storage node.
S303: After the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, receive the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.
For the specific content of the data recovery method applied to the slave storage nodes in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method applied to the master storage nodes in the distributed storage system, and details are not repeated here.
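Steps S301 to S303 can be pictured from the slave storage node's side with the following toy sketch. The class `SlaveNode`, the message kinds `GET_DATA`, `DATA` and `DEFERRED`, and the `pending` set are illustrative assumptions; the application does not define a wire format.

```python
class SlaveNode:
    """Toy slave storage node that applies messages from the master."""

    def __init__(self):
        self.store = {}        # recovered data, keyed by object ID
        self.pending = set()   # target data the master has deferred

    def request_update(self, send_to_master):
        # S301: a data update requirement occurred (e.g. the node came back online).
        send_to_master(("GET_DATA",))

    def on_message(self, kind, object_id, payload=None):
        if kind == "DATA":
            # S302 for healthy data, S303 for target data recovered by the master.
            self.store[object_id] = payload
            self.pending.discard(object_id)
        elif kind == "DEFERRED":
            # The master found this data abnormal and is trying to repair it.
            self.pending.add(object_id)
        else:
            raise ValueError(f"unknown message kind: {kind}")


slave = SlaveNode()
outbox = []
slave.request_update(outbox.append)
slave.on_message("DATA", "obj-a", b"payload-a")
slave.on_message("DEFERRED", "obj-c")
slave.on_message("DATA", "obj-c", b"payload-c")   # arrives after local recovery
print(sorted(slave.store), slave.pending)  # ['obj-a', 'obj-c'] set()
```

Tracking deferred data separately lets the slave storage node keep serving the data it has already recovered while it waits for the master storage node to repair the rest.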
Therefore, in the data recovery method in the distributed storage system provided by the present application, the master storage node first judges whether the data read locally is abnormal before sending it to the slave storage node, and suspends sending of the abnormal data. This effectively prevents the slave storage node from unnecessarily failing and going offline after receiving abnormal data, avoids aggravating the load on running services, and improves the stability and sustainability of the storage service. In addition, the master storage node attempts local data recovery based on the other slave storage nodes and sends the data to the slave storage node once it has been recovered locally. This avoids reporting an error immediately when only the local copy is abnormal, optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Referring to fig. 4, an embodiment of the present application discloses another data recovery apparatus in a distributed storage system, where a plurality of storage nodes in the distributed storage system for performing redundant storage on the same data are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:
a data request module 401, configured to send a data obtaining request to a main storage node in a redundant backup group when a data update requirement occurs;
a data receiving module 402, configured to receive the data without abnormality sent by the master storage node after the master storage node has locally read each data that needs to be sent, judged whether each data is abnormal, and determined the abnormal data as target data; and, after the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, to receive the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.
Therefore, with the data recovery apparatus in the distributed storage system provided by the present application, the master storage node first judges whether the data read locally is abnormal before sending it to the slave storage node, and suspends sending of the abnormal data. This effectively prevents the slave storage node from unnecessarily failing and going offline after receiving abnormal data, avoids aggravating the load on running services, and improves the stability and sustainability of the storage service. In addition, the master storage node attempts local data recovery based on the other slave storage nodes and sends the data to the slave storage node once it has been recovered locally. This avoids reporting an error immediately when only the local copy is abnormal, optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Furthermore, the application also discloses a distributed storage system, wherein a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group, and a homing group service for maintaining the data consistency in the redundant backup group is deployed on each storage node;
the slave storage nodes in the redundant backup group are used for sending data acquisition requests to the master storage nodes in the redundant backup group through the homing group service when the data updating requirement occurs;
the master storage node is used for, through the homing group service, locally reading each data that needs to be sent to the slave storage node and judging whether each data is abnormal, determining the abnormal data as target data and suspending sending of the target data, judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
Further, the present application also discloses a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the steps of any one of the data recovery methods in a distributed storage system described above.
For the details of the distributed storage system and the computer-readable storage medium, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the equipment disclosed by the embodiment, the description is relatively simple because the equipment corresponds to the method disclosed by the embodiment, and the relevant parts can be referred to the method part for description.
It is further noted that, throughout this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.

Claims (10)

1. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the main storage node in the redundant backup group and comprises the following steps:
receiving a data acquisition request sent by a slave storage node in the redundant backup group;
reading each data needing to be sent to the slave storage node from the local;
judging whether each data is abnormal or not so as to determine the abnormal data as target data and suspend the sending of the target data;
judging whether a target slave storage node storing the complete backup of the target data exists in the redundant backup group or not;
and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.
2. The data recovery method of claim 1, wherein the locally reading each data that needs to be sent to the slave storage node comprises:
generating a data information queue to be sent; the data information queue to be sent stores the data information of each data to be sent to the slave storage node;
and reading corresponding data from the local according to the data information queue to be sent.
3. The data recovery method of claim 2, wherein the generating a queue of data information to be sent comprises:
comparing the local data change record with the data change record of the slave storage node;
and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate the data information queue to be sent.
4. The data recovery method according to claim 2, wherein the determining abnormal data as target data and suspending transmission of the target data includes:
determining abnormal data as the target data;
deleting the data information of the target data from the data information queue to be sent;
and sending the data without abnormity to the slave storage node according to the data information queue to be sent, and sending a deferred sending message aiming at the target data to the slave storage node.
5. The data recovery method according to any one of claims 1 to 4, further comprising, after the determining whether there is a target slave storage node in the redundant backup group that stores a complete backup of the target data:
and if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
6. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the slave storage nodes in the redundant backup group and comprises the following steps:
when a data update requirement occurs, sending a data acquisition request to the master storage node in the redundant backup group;
after the master storage node has locally read each data that needs to be sent, judged whether each data is abnormal, and determined the abnormal data as target data, receiving the data without abnormality sent by the master storage node;
after the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, receiving the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.
7. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the main storage node in the redundancy backup group and comprises:
the request receiving module is used for receiving a data acquisition request sent by a slave storage node in the redundant backup group;
the data reading module is used for locally reading each data needing to be sent to the slave storage node;
the abnormality judgment module is used for judging whether each data is abnormal or not so as to determine the abnormal data as target data and suspend the sending of the target data;
the data query module is used for judging whether a target slave storage node storing the complete backup of the target data exists in the redundant backup group or not;
and the data recovery module is used for performing local data recovery based on the complete backup of the target data and sending the target data after local recovery to the slave storage node when the target slave storage node exists.
8. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:
the data request module is used for sending a data acquisition request to the main storage node in the redundant backup group when the data updating requirement occurs;
the data receiving module is used for receiving the data without abnormality sent by the master storage node after the master storage node has locally read each data that needs to be sent, judged whether each data is abnormal, and determined the abnormal data as target data; and, after the master storage node has judged whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, for receiving the target data that the master storage node sends after performing local data recovery based on the complete backup of the target data.
9. A distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group, and a homing group service for maintaining the data consistency in the redundant backup group is deployed on each storage node;
the slave storage nodes in the redundant backup group are used for sending data acquisition requests to the master storage nodes in the redundant backup group through the homing group service when the data updating requirement occurs;
the master storage node is used for, through the homing group service, locally reading each data that needs to be sent to the slave storage node and judging whether each data is abnormal, determining the abnormal data as target data and suspending the sending of the target data, judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method for data recovery in a distributed storage system according to any one of claims 1 to 6.
CN201911402944.0A 2019-12-30 2019-12-30 Distributed storage system and data recovery method, device and medium thereof Pending CN111176900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911402944.0A CN111176900A (en) 2019-12-30 2019-12-30 Distributed storage system and data recovery method, device and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911402944.0A CN111176900A (en) 2019-12-30 2019-12-30 Distributed storage system and data recovery method, device and medium thereof

Publications (1)

Publication Number Publication Date
CN111176900A true CN111176900A (en) 2020-05-19

Family

ID=70654232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911402944.0A Pending CN111176900A (en) 2019-12-30 2019-12-30 Distributed storage system and data recovery method, device and medium thereof

Country Status (1)

Country Link
CN (1) CN111176900A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826040A (en) * 2009-03-03 2010-09-08 闪联信息技术工程中心有限公司 Method and system for automatically detecting and restoring memory equipment
CN102567438A (en) * 2010-09-28 2012-07-11 迈塔斯威士网络有限公司 Method for providing access to data items from a distributed storage system
JP2013009184A (en) * 2011-06-24 2013-01-10 Nippon Telegr & Teleph Corp <Ntt> Time synchronous system
CN102984009A (en) * 2012-12-06 2013-03-20 北京邮电大学 Disaster recovery backup method for VoIP (Voice overInternet Protocol) system based on P2P
US20140245073A1 (en) * 2013-02-22 2014-08-28 International Business Machines Corporation Managing error logs in a distributed network fabric
CN104572339A (en) * 2013-10-17 2015-04-29 捷达世软件(深圳)有限公司 Data backup restoring system and method based on distributed file system
CN107544862A (en) * 2016-06-29 2018-01-05 中兴通讯股份有限公司 A kind of data storage reconstructing method and device, memory node based on correcting and eleting codes
CN106406758A (en) * 2016-09-05 2017-02-15 华为技术有限公司 Data processing method based on distributed storage system, and storage equipment
CN107870829A (en) * 2016-09-24 2018-04-03 华为技术有限公司 A kind of distributed data restoration methods, server, relevant device and system
US20180150521A1 (en) * 2016-11-28 2018-05-31 Sap Se Distributed joins in a distributed database system
WO2018098972A1 (en) * 2016-11-30 2018-06-07 华为技术有限公司 Log recovery method, storage device and storage node
US20190163374A1 (en) * 2017-11-28 2019-05-30 Entit Software Llc Storing data objects using different redundancy schemes
CN109669929A (en) * 2018-12-14 2019-04-23 江苏瑞中数据股份有限公司 Method for storing real-time data and system based on distributed parallel database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
胡至洵: "面向分布式文件存储系统的数据恢复策略", 《能源与环保》 *
赵正文, 电子科技大学出版社 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527561A (en) * 2020-12-09 2021-03-19 广州技象科技有限公司 Data backup method and device based on Internet of things cloud storage
CN112527561B (en) * 2020-12-09 2021-10-01 广州技象科技有限公司 Data backup method and device based on Internet of things cloud storage
CN113791922A (en) * 2021-07-30 2021-12-14 济南浪潮数据技术有限公司 Exception handling method, system and device for distributed storage system
CN113791922B (en) * 2021-07-30 2024-02-20 济南浪潮数据技术有限公司 Exception handling method, system and device for distributed storage system
CN115328880A (en) * 2022-10-13 2022-11-11 浙江智臾科技有限公司 Distributed file online recovery method, system, computer equipment and storage medium
CN115328880B (en) * 2022-10-13 2023-03-24 浙江智臾科技有限公司 Distributed file online recovery method, system, computer equipment and storage medium
CN116662081A (en) * 2023-08-01 2023-08-29 苏州浪潮智能科技有限公司 Distributed storage redundancy method and device, electronic equipment and storage medium
CN116662081B (en) * 2023-08-01 2024-02-27 苏州浪潮智能科技有限公司 Distributed storage redundancy method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200519