CN111176900A - Distributed storage system and data recovery method, device and medium thereof - Google Patents
- Publication number: CN111176900A (application CN201911402944.0A)
- Authority: CN (China)
- Prior art keywords: data, storage node, target, sent, slave
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1479—Generic software techniques for error detection or fault masking
Abstract
The present application discloses a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium. A plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group. The method is applied to the master storage node in the redundant backup group and comprises the following steps: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading, from local storage, each datum that needs to be sent to the slave storage node; judging whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node. The data recovery mechanism is thereby reasonably optimized, server hang-ups are reduced, and the stability and sustainability of the service are improved.
Description
Technical Field
The present application relates to the field of distributed storage technologies, and in particular, to a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium.
Background
A distributed storage system is a storage system that partitions and scatters data according to certain rules and stores the data on a plurality of independent, general-purpose storage servers.
To ensure the security of user data, the same data is usually copied into multiple replicas and stored on the storage devices of different storage nodes, and the storage nodes holding backups of the same data form a redundant backup group. For example, under a three-replica backup rule, one piece of data is copied into three replicas stored on three different hard disks, and the redundant backup group comprises those three hard disks. Data maintenance on each storage node of a redundant backup group can be realized by deploying a Placement Group (PG) service.
If a storage node needs to update its data because it was offline for a period of time or because its data was damaged, the distributed system can restore the data on its storage device based on the data held by the master storage node of the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node performs no anomaly check on the data after reading it locally, but sends it out directly. Once a storage node receives abnormal data, it considers that it has lost data consistency, and its server hangs up, interrupting the storage service and aggravating the working pressure on the other storage nodes in the distributed storage system, especially when the system is busy. In addition, after a storage node hangs up, the distributed system generally initiates another round of master-node election, further affecting the working efficiency of the distributed storage system.
In view of the above, providing a solution to these technical problems is an important need for those skilled in the art.
Disclosure of Invention
The aim of the present application is to provide a distributed storage system, a data recovery method and apparatus thereof, and a computer-readable storage medium, so as to optimize the data recovery mechanism, improve the data recovery success rate, reduce server hang-ups, and improve the working stability of each storage node and the sustainability of the service.
In order to solve the above technical problems, in a first aspect, the present application discloses a data recovery method in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the method is applied to the master storage node in the redundant backup group and comprises the following steps:
receiving a data acquisition request sent by a slave storage node in the redundant backup group;
reading, from local storage, each datum that needs to be sent to the slave storage node;
judging whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data;
judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
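The five claimed steps can be sketched as a toy in-memory model. Everything here is an illustrative assumption rather than the patent's interfaces: `PrimaryNode`, `handle_fetch_request`, and the dict-based stores (with `None` modeling a damaged datum) are invented names.

```python
class PrimaryNode:
    """Toy model of the master storage node. Stores map key -> bytes;
    None models a locally damaged (abnormal) datum."""

    def __init__(self, local_store, peers):
        self.local = local_store   # this node's local storage device
        self.peers = peers         # the slave nodes' stores in the backup group

    def handle_fetch_request(self, keys):
        sent, suspended = {}, []
        for key in keys:                        # step 2: read each datum locally
            datum = self.local.get(key)
            if self.is_abnormal(datum):         # step 3: anomaly check first
                suspended.append(key)           # suspend sending the target data
            else:
                sent[key] = datum
        for key in suspended:                   # step 4: look for a full backup
            backup = self.find_full_backup(key)
            if backup is not None:              # step 5: recover locally, forward
                self.local[key] = backup
                sent[key] = backup
        return sent

    def is_abnormal(self, datum):
        return datum is None                    # placeholder anomaly check

    def find_full_backup(self, key):
        for peer in self.peers:
            if peer.get(key) is not None:
                return peer[key]
        return None
```

If no peer holds a full backup, the key simply stays out of the reply; the patent's unrecoverable-data marking (described later) would hook in at that point.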
Optionally, the reading, from local storage, of each datum that needs to be sent to the slave storage node includes:
generating a to-be-sent data information queue, where the queue stores the data information of each datum to be sent to the slave storage node;
and reading the corresponding data from local storage according to the to-be-sent data information queue.
Optionally, the generating of the to-be-sent data information queue includes:
comparing the local data change record with the data change record of the slave storage node;
and determining, according to the comparison result, the data information of each datum that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.
Optionally, the determining of abnormal data as target data and the suspending of its sending include:
determining the abnormal data as the target data;
deleting the data information of the target data from the to-be-sent data information queue;
and sending the non-abnormal data to the slave storage node according to the to-be-sent data information queue, and sending a deferred-send message for the target data to the slave storage node.
Optionally, after the judging of whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, the method further includes:
if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
The present application also discloses another data recovery method in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the method is applied to a slave storage node in the redundant backup group and comprises:
sending a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;
receiving the non-abnormal data sent by the master storage node after the master storage node reads locally each datum that needs to be sent, judges whether each datum is abnormal, and determines the abnormal data as target data;
and, after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, receiving, if the target slave storage node exists, the target data sent by the master storage node after the master storage node performs local data recovery based on the complete backup of the target data.
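The slave-side flow above can be sketched in the same toy style; `SlaveNode` and its method names are invented for illustration, not interfaces from the patent:

```python
class SlaveNode:
    """Toy model of a slave storage node updating itself from the master."""

    def __init__(self, store):
        self.store = store        # this node's local backup (key -> bytes)
        self.pending = set()      # keys whose sending the master deferred

    def apply_normal_data(self, data):
        # The data the master found non-abnormal arrives first.
        self.store.update(data)

    def note_deferred(self, keys):
        # The master suspended these keys pending its own local recovery.
        self.pending.update(keys)

    def apply_recovered_data(self, data):
        # Target data forwarded once the master recovered it from a peer.
        self.store.update(data)
        self.pending.difference_update(data)
```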
In a second aspect, the present application further discloses a data recovery apparatus in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; the data recovery apparatus is applied to the master storage node in the redundant backup group and comprises:
a request receiving module, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;
a data reading module, configured to read, from local storage, each datum that needs to be sent to the slave storage node;
an abnormality judging module, configured to judge whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data;
a data query module, configured to judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and a data recovery module, configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.
Optionally, the data reading module specifically includes:
a queue generating unit, configured to generate a to-be-sent data information queue, where the queue stores the data information of each datum to be sent to the slave storage node;
and a data reading unit, configured to read the corresponding data from local storage according to the to-be-sent data information queue.
Optionally, the queue generating unit is specifically configured to:
comparing the local data change record with the data change record of the slave storage node; and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate the data information queue to be sent.
Optionally, the abnormality judging module specifically includes:
a determining unit, configured to determine the abnormal data as the target data;
a deleting unit, configured to delete the data information of the target data from the to-be-sent data information queue;
and a sending unit, configured to send the non-abnormal data to the slave storage node according to the to-be-sent data information queue, and to send a deferred-send message for the target data to the slave storage node.
Optionally, the apparatus further comprises:
a data error-reporting module, configured to, after the data query module judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, mark the target data as unrecoverable data and generate corresponding prompt information if the target slave storage node does not exist.
The present application also discloses another data recovery apparatus in a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group; this data recovery apparatus is applied to a slave storage node in the redundant backup group and comprises:
a data request module, configured to send a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;
a data receiving module, configured to receive the non-abnormal data sent by the master storage node after the master storage node reads locally each datum that needs to be sent, judges whether each datum is abnormal, and determines the abnormal data as target data; and further configured to, after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, receive, if the target slave storage node exists, the target data sent by the master storage node after the master storage node performs local data recovery based on the complete backup of the target data.
In a third aspect, the present application further discloses a distributed storage system, where a plurality of storage nodes that redundantly store the same data in the distributed storage system serve as a redundant backup group, and a placement group (PG) service for maintaining data consistency within the redundant backup group is deployed on each storage node;
the slave storage node in the redundant backup group is configured to send, through the placement group service, a data acquisition request to the master storage node in the redundant backup group when a data update requirement arises;
the master storage node is configured to, through the placement group service, read locally each datum that needs to be sent to the slave storage node and judge whether it is abnormal, determine the abnormal data as target data and suspend the sending of the target data, judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and, if the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.
In a fourth aspect, the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data recovery method in a distributed storage system as described in any one of the above.
The present application provides a data recovery method in a distributed storage system, where a plurality of storage nodes that redundantly store the same data serve as a redundant backup group, and a placement group (PG) service for maintaining data consistency within the redundant backup group is deployed on each storage node. The PG service deployed on the master storage node of the redundant backup group is used to implement the method, which comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading, from local storage, each datum that needs to be sent to the slave storage node; judging whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
It can be seen that, in the data recovery method provided by the present application, the master storage node judges whether data is abnormal before sending locally read data to the slave storage node, and suspends the sending of any abnormal data. This effectively prevents the slave storage node from unnecessarily hanging up and going offline after receiving abnormal data, avoids aggravating the service operating pressure, and improves the stability and sustainability of the storage service. Furthermore, the present application attempts local data recovery based on the other slave storage nodes, so that the recovered data can still be sent to the requesting slave storage node once local recovery completes, instead of an error being reported immediately when only the local copy is abnormal; the data recovery mechanism is thereby reasonably optimized, and the data recovery success rate is effectively improved. The distributed storage system, data recovery apparatus, and computer-readable storage medium provided by the present application have the same beneficial effects.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the drawings that are needed to be used in the description of the prior art and the embodiments of the present application will be briefly described below. Of course, the following description of the drawings related to the embodiments of the present application is only a part of the embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any creative effort, and the obtained other drawings also belong to the protection scope of the present application.
Fig. 1 is a flowchart of a data recovery method in a distributed storage system according to an embodiment of the present application;
fig. 2 is a block diagram of a data recovery apparatus in a distributed storage system according to an embodiment of the present application;
fig. 3 is a flowchart of another data recovery method in a distributed storage system according to an embodiment of the present application;
fig. 4 is a block diagram of a data recovery apparatus in another distributed storage system according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a distributed storage system, a data recovery method and device thereof, and a computer-readable storage medium, so as to optimize a data recovery mechanism, improve a data recovery success rate, reduce a server hang-up phenomenon, and improve the working stability of each storage node and the sustainability of services.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A distributed storage system is a storage system that partitions and scatters data according to certain rules and stores the data on a plurality of independent, general-purpose storage servers.
To ensure the security of user data, the same data is usually copied into multiple replicas and stored on the storage devices of different storage nodes, and the storage nodes holding backups of the same data form a redundant backup group. For example, under a three-replica backup rule, one piece of data is copied into three replicas stored on three different hard disks, and the redundant backup group comprises those three hard disks. Data maintenance on each storage node of a redundant backup group can be realized by deploying a Placement Group (PG) service.
If a storage node needs to update its data because it was offline for a period of time or because its data was damaged, the distributed system can restore the data on its storage device based on the data held by the master storage node of the redundant backup group. In the prior-art data recovery mechanism, however, the master storage node performs no anomaly check on the data after reading it locally, but sends it out directly. Once a storage node receives abnormal data, it considers that it has lost data consistency, and its server hangs up, interrupting the service and increasing the working pressure on the other storage nodes in the distributed storage system, especially when the system is busy. In addition, after a storage node hangs up, the distributed system generally initiates another round of master-node election, further affecting the working efficiency of the distributed storage system. In view of this, the present application provides a data recovery scheme in a distributed storage system that can effectively solve the above problems.
Referring to fig. 1, an embodiment of the present application discloses a data recovery method in a distributed storage system.
The method is applied to the master storage node in a redundant backup group. Specifically, a placement group (PG) service for maintaining data consistency in the redundant backup group may be deployed on each storage node, and the PG service deployed on the master storage node of the redundant backup group may be used to implement the data recovery method.
The data recovery method in the distributed storage system provided by the embodiment of the application mainly comprises the following steps:
s101: and receiving a data acquisition request sent from the storage node in the redundant backup group.
Specifically, a redundant backup group includes one master storage node and a plurality of slave storage nodes, each of which stores its own backup of the same data. When the backup stored on a slave storage node is damaged, or when a slave storage node that went offline for a period of time due to a fault comes back online, the slave storage node initiates a data acquisition request to the master storage node in order to update its data and keep it consistent with the other storage nodes.
S102: read, from local storage, each datum that needs to be sent to the slave storage node.
After receiving the data acquisition request sent by the slave storage node, the master storage node can read the corresponding data from its local storage.
S103: judge whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data.
It should be noted that, in the data recovery method provided by this embodiment, after reading the data locally, the master storage node does not send it directly to the slave storage node but first performs an anomaly check on it. For example, if the read data actually contains only error codes and the original content is missing, the data can be judged abnormal.
Since sending abnormal data to the slave storage node is pointless, and would cause the slave storage node to hang up and interrupt its service, aggravating the operating pressure on the other storage nodes in the distributed storage system, in this embodiment the master storage node suspends the sending of abnormal data and sends only the non-abnormal data to the slave storage node.
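The patent does not fix a particular anomaly-detection mechanism. One plausible sketch, offered purely as an assumption, is to compare each datum read from disk against a checksum recorded at write time:

```python
import zlib

def is_abnormal(datum, recorded_crc):
    """Treat a missing datum or a checksum mismatch as abnormal.

    `recorded_crc` is assumed to be the CRC-32 stored alongside the datum
    when it was written; this whole scheme is illustrative, not the patent's.
    """
    if datum is None:
        return True                       # original content is missing entirely
    return zlib.crc32(datum) != recorded_crc
```

A datum that has degraded into error codes would no longer match the checksum taken over the original content, so the master can suspend it before it ever reaches the slave.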
S104: judge whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; if so, proceed to S105.
It should further be noted that, in the data recovery method provided by this embodiment, for locally abnormal data the master storage node determines that data as the target data and attempts to recover it via the other slave storage nodes, so that once its own local recovery is complete, the data can be forwarded to the slave storage node that initiated the data acquisition request.
In particular, since a redundant backup group typically contains more than one slave storage node, even though the target data has been lost on the master storage node, another slave storage node may still hold a complete backup of it. By attempting local data recovery based on the other slave storage nodes, instead of declaring the data lost or unrecoverable as soon as only the local copy is found abnormal, the present application effectively avoids unnecessary data-loss errors and re-storage of data.
S105: perform local data recovery based on the complete backup of the target data, and send the locally recovered target data to the slave storage node.
The master storage node can determine the slave storage node that stores a complete backup of the target data as the target slave storage node, acquire the complete backup of the target data from it, and execute a Pull operation, namely a write operation: the complete backup of the target data is copied and written into the master storage node, thereby realizing data recovery for the master storage node.
After local data recovery is completed, the master storage node can execute a Push operation, namely a send operation, sending the locally recovered target data to the slave storage node that originally initiated the data acquisition request, thereby realizing data recovery for that slave storage node.
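Modeling each node's storage as a dict, the Pull and Push steps described above can be sketched as follows (the function names and dict representation are illustrative assumptions, not the patent's interfaces):

```python
def pull(master_store, target_peer_store, key):
    """Pull: copy the target slave's complete backup of `key` into the
    master's local storage (the write operation described in S105)."""
    master_store[key] = target_peer_store[key]

def push(master_store, requester_store, key):
    """Push: send the locally recovered datum on to the slave storage node
    that originally initiated the data acquisition request."""
    requester_store[key] = master_store[key]
```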
The data recovery method in the distributed storage system provided by this embodiment is implemented based on the placement group service deployed on the master storage node of the redundant backup group, and comprises: receiving a data acquisition request sent by a slave storage node in the redundant backup group; reading, from local storage, each datum that needs to be sent to the slave storage node; judging whether each datum is abnormal, so as to determine abnormal data as target data and suspend the sending of the target data; judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group; and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
It can be seen that, in the data recovery method provided by this embodiment, the master storage node judges whether data is abnormal before sending locally read data to the slave storage node, and suspends the sending of any abnormal data. This effectively prevents the slave storage node from unnecessarily hanging up and going offline after receiving abnormal data, avoids aggravating the service operating pressure, and improves the stability and sustainability of the storage service. Furthermore, local data recovery is attempted based on the other slave storage nodes, so that the recovered data can still be sent to the requesting slave storage node once local recovery completes, instead of an error being reported immediately when only the local copy is abnormal; the data recovery mechanism is thereby reasonably optimized, and the data recovery success rate is effectively improved.
As a specific embodiment, in the data recovery method provided above, the reading, from local storage, of each datum that needs to be sent to the slave storage node includes:
generating a to-be-sent data information queue, where the queue stores the data information of each datum to be sent to the slave storage node;
and reading the corresponding data from local storage according to the to-be-sent data information queue.
Specifically, in this embodiment, after receiving the data acquisition request sent by the slave storage node, the master storage node can determine the data information of each datum to be sent and place that information into a queue, generating the to-be-sent data information queue, so that data can then be read according to it.
Further, as a specific embodiment, the generating of the to-be-sent data information queue may specifically include:
comparing the local data change record with the data change record of the slave storage node;
and determining, according to the comparison result, the data information of each datum that needs to be sent to the slave storage node, so as to generate the to-be-sent data information queue.
Specifically, in this embodiment, the placement group (PG) service on each storage node can log the operations of the node where it is deployed, generating a data change record that is updated and maintained in real time. The master storage node can compare its own data change record with that of the slave storage node, determine from the differences the information of each datum that needs to be sent to the slave storage node, and generate the to-be-sent data information queue.
For example, suppose a slave storage node goes offline because of some failure and is later successfully brought back online. The slave storage node will request data from the master storage node, which can compare the differences between the data change records and determine the data updated by the storage system during the slave node's offline period as the data that needs to be sent to it.
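Representing a data change record as a mapping from data key to version number (an assumption made only for illustration; the patent does not specify the record format), the comparison step can be sketched as:

```python
def build_send_queue(master_log, slave_log):
    """Return the keys whose master-side version is newer than the slave's,
    i.e. everything updated while the slave was offline or stale."""
    return [key for key, version in master_log.items()
            if slave_log.get(key, -1) < version]
```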
As a specific embodiment, the data recovery method in the distributed storage system provided in the embodiment of the present application, based on the above contents, determines abnormal data as target data and suspends sending of the target data, and includes:
determining abnormal data as target data;
deleting the data information of the target data from a data information queue to be sent;
and sending the data without abnormity to the slave storage node according to the data information queue to be sent, and sending a deferred sending message aiming at the target data to the slave storage node.
Specifically, in this embodiment, the master storage node may delete the data information of the abnormal data from the data information queue to be sent, and then send the remaining data (at this point, all data without abnormality) to the slave storage node according to the queue after the deletion, so that the slave storage node recovers this part of the data first. Meanwhile, for the abnormal data, i.e., the target data, a deferred-send message may be sent to the slave storage node.
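A sketch of this dispatch step, under the assumption that an anomalous object is one whose local read fails (returns `None`) and with plain callbacks standing in for the node-to-node transport (both assumptions, not specified by the patent):

```python
def dispatch_pending(pending, read_local, send_data, send_deferred):
    """Send every readable object to the slave; for each anomalous object,
    drop its info from the to-be-sent queue, notify the slave with a
    deferred-send message, and report it as target data."""
    targets = []
    for info in list(pending):
        data = read_local(info["object_id"])
        if data is None:                        # abnormal data detected
            pending.remove(info)                # delete from the queue
            send_deferred(info["object_id"])    # deferred-send message
            targets.append(info["object_id"])
        else:
            send_data(info["object_id"], data)  # recover this part first
    return targets

store = {"obj1": b"a", "obj2": None, "obj3": b"c"}   # obj2 is corrupt
pending = [{"object_id": oid} for oid in store]
sent, deferred = [], []
targets = dispatch_pending(pending, store.get,
                           lambda oid, d: sent.append(oid),
                           deferred.append)
```

The healthy objects reach the slave immediately, while the corrupt one is held back as target data for the recovery path described next.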
As a specific embodiment, the data recovery method in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, after determining whether a target slave storage node storing a complete backup of target data exists in a redundant backup group, further includes:
and if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
Specifically, if no complete backup of the target data exists anywhere in the redundant backup group, the master storage node determines that the target data is permanently lost, i.e., unrecoverable. It may therefore mark the target data as unrecoverable data and generate prompt information to notify the user.
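Assuming the master can query each peer slave for its copy (`replicas` below is a hypothetical node-name to copy mapping, with `None` meaning that node's copy is also missing or damaged), the recover-or-mark step might be sketched as:

```python
def recover_target(object_id, replicas, write_local):
    """Recover target data from the first slave holding a complete backup;
    if no such node exists anywhere in the group, mark the data
    unrecoverable and produce prompt information."""
    for node, backup in replicas.items():
        if backup is not None:               # target slave node exists
            write_local(object_id, backup)   # local data recovery
            return {"status": "recovered", "source": node}
    return {"status": "unrecoverable",
            "prompt": f"object {object_id} has no complete backup"}

local = {}
ok = recover_target("obj2", {"nodeB": None, "nodeC": b"bytes"},
                    local.__setitem__)
lost = recover_target("obj9", {"nodeB": None, "nodeC": None},
                      local.__setitem__)
```

Once the local copy is rebuilt, the master can resume sending the target data to the requesting slave instead of reporting an error immediately.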
Referring to fig. 2, an embodiment of the present application discloses a data recovery apparatus in a distributed storage system, where a plurality of storage nodes in the distributed storage system, which are used for performing redundant storage on the same data, are used as a redundant backup group; the data recovery device is applied to the main storage nodes in the redundant backup group and comprises:
a request receiving module 201, configured to receive a data acquisition request sent by a slave storage node in the redundant backup group;
the data reading module 202 is used for locally reading each data required to be sent to the slave storage node;
an anomaly determination module 203, configured to determine whether each data is anomalous, so as to determine the anomalous data as target data and suspend sending of the target data;
the data query module 204 is configured to determine whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group;
and the data recovery module 205 is configured to, when the target slave storage node exists, perform local data recovery based on the complete backup of the target data and send the locally recovered target data to the slave storage node.
As a specific embodiment, in the data recovery apparatus in a distributed storage system provided in the embodiment of the present application, based on the foregoing, the data reading module 202 specifically includes:
the queue generating unit is used for generating a data information queue to be sent; the data information queue to be sent stores the data information of each data to be sent to the slave storage node;
and the data reading unit is used for reading corresponding data from the local according to the data information queue to be sent.
As a specific embodiment, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, the queue generating unit is specifically configured to:
comparing the local data change record with the data change record of the slave storage node; and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate a data information queue to be sent.
As a specific embodiment, in the data recovery apparatus in the distributed storage system provided in the embodiment of the present application, on the basis of the foregoing content, the abnormality determining module 203 specifically includes:
a determination unit configured to determine the abnormal data as target data;
the deleting unit is used for deleting the data information of the target data from the data information queue to be sent;
and the sending unit is used for sending the data without abnormality to the slave storage node according to the data information queue to be sent, and sending a deferred-send message for the target data to the slave storage node.
As a specific embodiment, the data recovery apparatus in a distributed storage system provided in the embodiment of the present application, based on the foregoing, further includes:
and a data error reporting module, configured to, after the data query module 204 determines whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, mark the target data as unrecoverable data and generate corresponding prompt information if the target slave storage node does not exist.
For specific contents of the data recovery apparatus in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.
It can thus be seen that, with the data recovery apparatus in the distributed storage system described above, the master storage node first judges whether data is abnormal before sending locally read data to the slave storage node, and suspends the sending of abnormal data. This effectively avoids unnecessary hang-ups and offlining of the slave storage node that would otherwise receive the abnormal data, prevents service operation pressure from intensifying, and improves the stability and sustainability of the storage service. In addition, local data recovery is attempted based on the other slave storage nodes, so that the recovered data can be sent to the slave storage node once the local data has been restored. This avoids immediately reporting an error when only the local copy is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Referring to fig. 3, an embodiment of the present application discloses another method for recovering data in a distributed storage system.
The method is specifically applied to a slave storage node in a redundant backup group. Specifically, a placement group service, i.e., a PG service, for maintaining data consistency within the redundant backup group may be deployed on each storage node. The placement group service deployed on a slave storage node in the redundant backup group may be used to implement the data recovery method.
The data recovery method in the distributed storage system provided by the embodiment of the application mainly comprises the following steps:
s301: and when the data updating requirement occurs, sending a data acquisition request to the main storage node in the redundant backup group.
Specifically, a data update requirement arises when the storage node comes back online after a hang-up or failure, or when it finds that some of its data is damaged.
S302: after the master storage node locally reads each piece of data that needs to be sent, judges whether it is abnormal, and determines the abnormal data as target data, receiving the data without abnormality sent by the master storage node.
S303: after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, receiving the target data sent by the master storage node after it performs local data recovery based on the complete backup of the target data.
For the specific content of the data recovery method applied to the slave storage nodes in the distributed storage system, reference may be made to the foregoing detailed description of the data recovery method applied to the master storage nodes in the distributed storage system, and details are not repeated here.
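Condensing steps S301 to S303 into a slave-side sketch (the master's reply format, a pair of delivered objects and deferred object IDs, is an assumption rather than something the patent specifies):

```python
def slave_update_cycle(request_from_master, local_store, local_log):
    """One slave-side recovery round: when a data update requirement
    arises, request data from the master (S301), apply the non-abnormal
    objects it returns (S302), and remember which objects were deferred
    pending the master's own local recovery (later delivered per S303)."""
    ready, deferred = request_from_master(local_log)  # S301
    for object_id, data in ready:                     # S302
        local_store[object_id] = data
    return deferred                                   # awaited for S303

# Hypothetical master stub: delivers obj1 at once, defers corrupt obj2.
fake_master = lambda log: ([("obj1", b"a")], ["obj2"])
store = {}
waiting = slave_update_cycle(fake_master, store, {})
```

The slave thus stays online with a partially recovered data set instead of hanging on an abnormal object, receiving the deferred target data once the master has rebuilt it.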
It can thus be seen that, in the data recovery method in the distributed storage system provided by the present application, the master storage node first judges whether data is abnormal before sending locally read data to the slave storage node, and suspends the sending of abnormal data. This effectively avoids unnecessary hang-ups and offlining of the slave storage node that would otherwise receive the abnormal data, prevents service operation pressure from intensifying, and improves the stability and sustainability of the storage service. In addition, local data recovery is attempted based on the other slave storage nodes, so that the recovered data can be sent to the slave storage node once the local data has been restored. This avoids immediately reporting an error when only the local copy is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Referring to fig. 4, an embodiment of the present application discloses another data recovery apparatus in a distributed storage system, where a plurality of storage nodes in the distributed storage system for performing redundant storage on the same data are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:
a data request module 401, configured to send a data obtaining request to a main storage node in a redundant backup group when a data update requirement occurs;
a data receiving module 402, configured to receive the data without abnormality sent by the master storage node after the master storage node locally reads each piece of data that needs to be sent, judges whether it is abnormal, and determines the abnormal data as target data; and further configured to, after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, receive, if the target slave storage node exists, the target data sent by the master storage node after it performs local data recovery based on the complete backup of the target data.
It can thus be seen that, in the data recovery device in the distributed storage system provided by the present application, the master storage node first judges whether data is abnormal before sending locally read data to the slave storage node, and suspends the sending of abnormal data. This effectively avoids unnecessary hang-ups and offlining of the slave storage node that would otherwise receive the abnormal data, prevents service operation pressure from intensifying, and improves the stability and sustainability of the storage service. In addition, local data recovery is attempted based on the other slave storage nodes, so that the recovered data can be sent to the slave storage node once the local data has been restored. This avoids immediately reporting an error when only the local copy is abnormal, reasonably optimizes the data recovery mechanism, and effectively improves the success rate of data recovery.
Furthermore, the application also discloses a distributed storage system, wherein a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group, and a placement group service for maintaining data consistency within the redundant backup group is deployed on each storage node;
the slave storage node in the redundant backup group is used for sending a data acquisition request to the master storage node in the redundant backup group through the placement group service when a data update requirement arises;
and the master storage node is used for locally reading, through the placement group service, each piece of data that needs to be sent to the slave storage node and judging whether it is abnormal, determining the abnormal data as target data and suspending the sending of the target data, judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
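The whole master-slave interaction can be condensed into one hypothetical end-to-end sketch (a simplification of the placement-group-service flow: stores are plain dicts, an abnormal master copy is modeled as `None`, and peer slaves are queried directly):

```python
def run_recovery(master_store, peer_stores, slave_store):
    """End-to-end sketch of the claimed system: objects the slave lacks
    are read on the master; an anomalous master copy (None) is first
    rebuilt from any peer slave holding a complete backup, then every
    recoverable object is pushed to the requesting slave."""
    unrecoverable = []
    for object_id, data in list(master_store.items()):
        if object_id in slave_store:
            continue                            # slave already up to date
        if data is None:                        # abnormal on the master
            backup = next((p[object_id] for p in peer_stores
                           if p.get(object_id) is not None), None)
            if backup is None:
                unrecoverable.append(object_id)  # nowhere in the group
                continue
            master_store[object_id] = data = backup  # local data recovery
        slave_store[object_id] = data           # send to the slave
    return unrecoverable

master = {"a": b"1", "b": None, "c": None}      # "b", "c" corrupt locally
peers = [{"b": b"2"}, {}]                       # only "b" has a backup
slave = {}
lost = run_recovery(master, peers, slave)
```

Note how the slave still receives everything recoverable, and only an object with no complete backup anywhere in the redundant backup group is reported as unrecoverable.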
Further, the present application also discloses a computer-readable storage medium, in which a computer program is stored, and the computer program is used for implementing the steps of the data recovery method in any one of the distributed storage systems as described above when being executed by a processor.
For the details of the distributed storage system and the computer-readable storage medium, reference may be made to the foregoing detailed description of the data recovery method in the distributed storage system, and details thereof are not repeated here.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, reference may be made to the description of the method.
It is further noted that, throughout this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.
Claims (10)
1. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the main storage node in the redundant backup group and comprises the following steps:
receiving a data acquisition request sent by a slave storage node in the redundant backup group;
locally reading each piece of data that needs to be sent to the slave storage node;
judging whether each data is abnormal or not so as to determine the abnormal data as target data and suspend the sending of the target data;
judging whether a target slave storage node storing the complete backup of the target data exists in the redundant backup group or not;
and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data, and sending the locally recovered target data to the slave storage node.
2. The data recovery method of claim 1, wherein the locally reading each data that needs to be sent to the slave storage node comprises:
generating a data information queue to be sent; the data information queue to be sent stores the data information of each data to be sent to the slave storage node;
and reading corresponding data from the local according to the data information queue to be sent.
3. The data recovery method of claim 2, wherein the generating a queue of data information to be sent comprises:
comparing the local data change record with the data change record of the slave storage node;
and determining data information of each data required to be sent to the slave storage node according to the comparison result so as to generate the data information queue to be sent.
4. The data recovery method according to claim 2, wherein the determining abnormal data as target data and suspending transmission of the target data includes:
determining abnormal data as the target data;
deleting the data information of the target data from the data information queue to be sent;
and sending the data without abnormality to the slave storage node according to the data information queue to be sent, and sending a deferred-send message for the target data to the slave storage node.
5. The data recovery method according to any one of claims 1 to 4, further comprising, after the judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group:
and if the target slave storage node does not exist, marking the target data as unrecoverable data and generating corresponding prompt information.
6. A data recovery method in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the method is applied to the slave storage nodes in the redundant backup group and comprises the following steps:
when a data updating requirement occurs, sending a data acquisition request to a main storage node in a redundancy backup group;
after the master storage node locally reads each piece of data that needs to be sent, judges whether it is abnormal, and determines the abnormal data as target data, receiving the data without abnormality sent by the master storage node;
after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, receiving the target data sent by the master storage node after it performs local data recovery based on the complete backup of the target data.
7. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the main storage node in the redundancy backup group and comprises:
the request receiving module is used for receiving a data acquisition request sent by a slave storage node in the redundant backup group;
the data reading module is used for locally reading each data needing to be sent to the slave storage node;
the abnormality judgment module is used for judging whether each data is abnormal or not so as to determine the abnormal data as target data and suspend the sending of the target data;
the data query module is used for judging whether a target slave storage node storing the complete backup of the target data exists in the redundant backup group or not;
and the data recovery module is used for performing local data recovery based on the complete backup of the target data and sending the target data after local recovery to the slave storage node when the target slave storage node exists.
8. A data recovery device in a distributed storage system is characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group; the data recovery device is applied to the slave storage nodes in the redundant backup group and comprises:
the data request module is used for sending a data acquisition request to the main storage node in the redundant backup group when the data updating requirement occurs;
the data receiving module is used for receiving the data without abnormality sent by the master storage node after the master storage node locally reads each piece of data that needs to be sent, judges whether it is abnormal, and determines the abnormal data as target data; and, after the master storage node judges whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, for receiving, if the target slave storage node exists, the target data sent by the master storage node after it performs local data recovery based on the complete backup of the target data.
9. A distributed storage system, characterized in that a plurality of storage nodes for performing redundant storage on the same data in the distributed storage system are used as a redundant backup group, and a placement group service for maintaining data consistency within the redundant backup group is deployed on each storage node;
the slave storage node in the redundant backup group is used for sending a data acquisition request to the master storage node in the redundant backup group through the placement group service when a data update requirement arises;
and the master storage node is used for locally reading, through the placement group service, each piece of data that needs to be sent to the slave storage node and judging whether it is abnormal, determining the abnormal data as target data and suspending the sending of the target data, judging whether a target slave storage node storing a complete backup of the target data exists in the redundant backup group, and if the target slave storage node exists, performing local data recovery based on the complete backup of the target data and sending the locally recovered target data to the slave storage node.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method for data recovery in a distributed storage system according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911402944.0A CN111176900A (en) | 2019-12-30 | 2019-12-30 | Distributed storage system and data recovery method, device and medium thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111176900A true CN111176900A (en) | 2020-05-19 |
Family
ID=70654232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911402944.0A Pending CN111176900A (en) | 2019-12-30 | 2019-12-30 | Distributed storage system and data recovery method, device and medium thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111176900A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101826040A (en) * | 2009-03-03 | 2010-09-08 | 闪联信息技术工程中心有限公司 | Method and system for automatically detecting and restoring memory equipment |
CN102567438A (en) * | 2010-09-28 | 2012-07-11 | 迈塔斯威士网络有限公司 | Method for providing access to data items from a distributed storage system |
JP2013009184A (en) * | 2011-06-24 | 2013-01-10 | Nippon Telegr & Teleph Corp <Ntt> | Time synchronous system |
CN102984009A (en) * | 2012-12-06 | 2013-03-20 | 北京邮电大学 | Disaster recovery backup method for VoIP (Voice overInternet Protocol) system based on P2P |
US20140245073A1 (en) * | 2013-02-22 | 2014-08-28 | International Business Machines Corporation | Managing error logs in a distributed network fabric |
CN104572339A (en) * | 2013-10-17 | 2015-04-29 | 捷达世软件(深圳)有限公司 | Data backup restoring system and method based on distributed file system |
CN106406758A (en) * | 2016-09-05 | 2017-02-15 | 华为技术有限公司 | Data processing method based on distributed storage system, and storage equipment |
CN107544862A (en) * | 2016-06-29 | 2018-01-05 | 中兴通讯股份有限公司 | A kind of data storage reconstructing method and device, memory node based on correcting and eleting codes |
CN107870829A (en) * | 2016-09-24 | 2018-04-03 | 华为技术有限公司 | A kind of distributed data restoration methods, server, relevant device and system |
US20180150521A1 (en) * | 2016-11-28 | 2018-05-31 | Sap Se | Distributed joins in a distributed database system |
WO2018098972A1 (en) * | 2016-11-30 | 2018-06-07 | 华为技术有限公司 | Log recovery method, storage device and storage node |
CN109669929A (en) * | 2018-12-14 | 2019-04-23 | 江苏瑞中数据股份有限公司 | Method for storing real-time data and system based on distributed parallel database |
US20190163374A1 (en) * | 2017-11-28 | 2019-05-30 | Entit Software Llc | Storing data objects using different redundancy schemes |
Non-Patent Citations (2)
Title |
---|
Hu Zhixun: "Data recovery strategy for distributed file storage systems", Energy and Environmental Protection *
Zhao Zhengwen, University of Electronic Science and Technology of China Press *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112527561A (en) * | 2020-12-09 | 2021-03-19 | 广州技象科技有限公司 | Data backup method and device based on Internet of things cloud storage |
CN112527561B (en) * | 2020-12-09 | 2021-10-01 | 广州技象科技有限公司 | Data backup method and device based on Internet of things cloud storage |
CN113791922A (en) * | 2021-07-30 | 2021-12-14 | 济南浪潮数据技术有限公司 | Exception handling method, system and device for distributed storage system |
CN113791922B (en) * | 2021-07-30 | 2024-02-20 | 济南浪潮数据技术有限公司 | Exception handling method, system and device for distributed storage system |
CN115328880A (en) * | 2022-10-13 | 2022-11-11 | 浙江智臾科技有限公司 | Distributed file online recovery method, system, computer equipment and storage medium |
CN115328880B (en) * | 2022-10-13 | 2023-03-24 | 浙江智臾科技有限公司 | Distributed file online recovery method, system, computer equipment and storage medium |
CN116662081A (en) * | 2023-08-01 | 2023-08-29 | 苏州浪潮智能科技有限公司 | Distributed storage redundancy method and device, electronic equipment and storage medium |
CN116662081B (en) * | 2023-08-01 | 2024-02-27 | 苏州浪潮智能科技有限公司 | Distributed storage redundancy method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111176900A (en) | Distributed storage system and data recovery method, device and medium thereof | |
US7793060B2 (en) | System method and circuit for differential mirroring of data | |
JP4301849B2 (en) | Information processing method and its execution system, its processing program, disaster recovery method and system, storage device for executing the processing, and its control processing method | |
US7509468B1 (en) | Policy-based data protection | |
JP4796854B2 (en) | Measures against data overflow of intermediate volume in differential remote copy | |
JP4744171B2 (en) | Computer system and storage control method | |
CN106951559B (en) | Data recovery method in distributed file system and electronic equipment | |
US11892922B2 (en) | State management methods, methods for switching between master application server and backup application server, and electronic devices | |
CN106776130B (en) | Log recovery method, storage device and storage node | |
US6654771B1 (en) | Method and system for network data replication | |
US20070208918A1 (en) | Method and apparatus for providing virtual machine backup | |
JP2004334574A (en) | Operation managing program and method of storage, and managing computer | |
JP2009015476A (en) | Journal management method in cdp remote configuration | |
CN110597779A (en) | Data reading and writing method in distributed file system and related device | |
KR101605455B1 (en) | Method for Replicationing of Redo Log without Data Loss and System Thereof | |
JP5154843B2 (en) | Cluster system, computer, and failure recovery method | |
JP2008276281A (en) | Data synchronization system, method, and program | |
CN112256201B (en) | Distributed block storage system and volume information management method thereof | |
CN113297134B (en) | Data processing system, data processing method and device, and electronic device | |
JP2004185573A (en) | Data writing method and device | |
JP4721057B2 (en) | Data management system, data management method, and data management program | |
JP2004078437A (en) | Method and system for duplexing file system management information | |
CN111756562B (en) | Cluster takeover method, system and related components | |
CN113076065B (en) | Data output fault tolerance method in high-performance computing system | |
CN117170937A (en) | Data operation request processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |