CN111625402A - Data recovery method and device, electronic equipment and computer readable storage medium


Info

Publication number
CN111625402A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010471706.1A
Other languages
Chinese (zh)
Inventor
甘红星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010471706.1A
Publication of CN111625402A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Abstract

The application provides a data recovery method, a data recovery device, an electronic device, and a computer-readable storage medium in the technical field of data processing. When one or more copies of a piece of data are detected as missing, the method selects the missing copies as copies to be recovered and generates a recovery task command containing the ID of each copy to be recovered and information about the destination node device of each copy. The recovery task command is sent to a target storage device, which recovers every copy to be recovered according to the command and writes each recovered copy to its destination node device according to that information. The method reduces unnecessary resource consumption, shortens data recovery time, and improves data security.

Description

Data recovery method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data recovery method and apparatus, an electronic device, and a computer-readable storage medium.
Background
The two most common data redundancy techniques in distributed storage systems are the multi-copy policy and erasure coding (Erasure Code, EC). Compared with the multi-copy policy, erasure coding offers lower redundancy overhead and higher disk utilization. Erasure coding encodes the original data blocks with an erasure-code algorithm to obtain check blocks, and stores the data blocks and check blocks together to achieve fault tolerance.
In current distributed storage systems, losing a copy triggers the copy recovery mechanism. While the number of copies is below the threshold, the management node recovers the lost copies one at a time: losing one copy triggers one recovery, losing two copies triggers two recoveries in sequence, and losing more copies triggers that many recoveries in turn.
Therefore, in existing EC copy recovery, restoring several copies requires triggering several recovery tasks. This wastes processor time and network bandwidth, and because the recoveries run one after another, the overall recovery takes longer, which lowers data security.
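To make the cost argument concrete, the sketch below counts block reads and decode passes for the two strategies. It assumes, for illustration only, that every recovery task reads M surviving blocks and runs one EC decode; the function names are hypothetical and the patent itself does not give a cost model.

```python
def serial_recovery_cost(m, lost):
    """Prior-art style: one recovery task per lost copy.

    Assumes each task reads m surviving blocks and runs one EC decode.
    Returns (blocks_read, decode_passes).
    """
    return lost * m, lost


def batched_recovery_cost(m, lost):
    """One recovery task that rebuilds every lost copy in a single decode."""
    if lost == 0:
        return 0, 0
    return m, 1


if __name__ == "__main__":
    # 8 data blocks + 4 check blocks, two copies lost
    print(serial_recovery_cost(8, 2))   # (16, 2)
    print(batched_recovery_cost(8, 2))  # (8, 1)
```

Under these assumptions, recovering two lost copies of an 8 + 4 layout one at a time reads 16 blocks in two decodes, while a single batched task reads 8 blocks in one decode.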
Disclosure of Invention
The purpose of the present application is to provide a data recovery method and apparatus, an electronic device, and a computer-readable storage medium, so as to solve the technical problem that conventional distributed storage systems recover multiple EC copies inefficiently.
In a first aspect, an embodiment of the present invention provides a data recovery method, which is applied to a management device in a distributed object storage system, and the method includes:
when it is detected that one or more copies of the data are missing, selecting the missing copy or copies as copies to be recovered;
generating a recovery task command, wherein the recovery task command includes the ID of each copy to be recovered and information about the destination node device of each copy to be recovered;
and sending the recovery task command to a target storage device, so that the target storage device recovers each copy to be recovered according to the recovery task command and writes each recovered copy to the corresponding destination node device according to the destination node information of that copy.
In some embodiments, the target storage device is one of the destination node devices, and
sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command includes:
when it is detected that the data is missing multiple copies, sending the recovery task command to any one of the destination node devices, so that this destination node device recovers all of the copies to be recovered and writes each recovered copy to its corresponding destination node device according to the destination node information of each copy.
In some embodiments, the target storage device includes a storage unit, and
sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command includes:
sending the recovery task command to the storage unit of any one of the target storage devices, so that each copy to be recovered is recovered in that storage unit, and writing each recovered copy to the corresponding destination node device according to the destination node information of each copy.
In some embodiments, after the step of sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command, the method further includes:
receiving a recovery completion message from the target storage device, and updating the state information of each copy to be recovered.
In a second aspect, an embodiment of the present application further provides a data recovery method, which is applied to a storage device in a distributed object storage system, and the method includes:
receiving a recovery task command from a management device, wherein the recovery task command includes the IDs of one or more copies to be recovered and information about the destination node device of each copy to be recovered;
obtaining the data of each copy to be recovered through erasure-code decoding;
and writing the data of each copy to be recovered into the corresponding destination node device according to the information of the destination node device of each copy to be recovered.
In some embodiments, the storage device is one of the destination node devices, and
writing the data of each copy to be recovered into the corresponding destination node device according to the information of the destination node device of each copy to be recovered includes:
forming each recovered copy from the data of that copy to be recovered, and writing it to the corresponding destination node device according to the destination node information of each copy.
In some embodiments, the storage device includes a storage unit,
the obtaining the data of each copy to be recovered through erasure code decoding includes:
and acquiring data of each copy to be recovered through erasure code decoding, and recovering each copy to be recovered in the storage unit.
In some embodiments, after the step of writing the data of each copy to be recovered into the corresponding destination node device, the method further includes:
sending a recovery completion message to the management device, so that the management device updates the state information of each copy to be recovered.
In a third aspect, an embodiment of the present application further provides a data recovery apparatus, which is applied to a management device in a distributed object storage system, and the apparatus includes:
a to-be-recovered copy selection module, configured to select the missing copy or copies as to-be-recovered copies when it is detected that one or more copies of the data are missing;
a recovery task command generating module, configured to generate a recovery task command, where the recovery task command includes an ID of each to-be-recovered copy and information of a destination node device of each to-be-recovered copy;
and a to-be-recovered copy recovery module, configured to send the recovery task command to a target storage device, so that the target storage device recovers each to-be-recovered copy according to the recovery task command and writes each recovered copy to the corresponding destination node device according to the destination node information of each to-be-recovered copy.
In a fourth aspect, an embodiment of the present application further provides a data recovery apparatus, which is applied to a storage device in a distributed object storage system, and the apparatus includes:
a recovery task command receiving module, configured to receive a recovery task command from a management device, where the recovery task command includes IDs of one or more to-be-recovered copies and information of a destination node device of each to-be-recovered copy;
the to-be-recovered copy data acquisition module is used for acquiring the data of each to-be-recovered copy through erasure code decoding;
and a to-be-recovered copy data writing module, configured to write the data of each to-be-recovered copy into the corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In a fifth aspect, an embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the method of the first aspect or the second aspect when executing the computer program.
In a sixth aspect, embodiments of the present application further provide a computer-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the method of the first or second aspect.
The embodiment of the application brings the following beneficial effects:
When the data recovery method is applied to a management device in a distributed object storage system, the missing copies are selected as copies to be recovered when one or more copies of the data are detected as missing; a recovery task command containing the ID of each copy to be recovered and information about its destination node device is generated; and the command is sent to a target storage device, which recovers each copy to be recovered according to the command and writes each recovered copy to the corresponding destination node device. When the method is applied to a storage device in the distributed object storage system, the storage device receives the recovery task command from the management device, obtains the data of each copy to be recovered through erasure-code decoding, and finally writes that data to the corresponding destination node device according to the destination node information. With this method, issuing a single recovery task is enough to recover copies on one or more nodes: all lost copies are rebuilt in one EC decoding pass and written to the specified node or nodes, which reduces unnecessary resource consumption, shortens data recovery time, and improves data security.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a data recovery method applied to a management device in a distributed object storage system according to an embodiment of the present application;
fig. 2 is a flowchart of a data recovery method applied to a storage device in a distributed object storage system according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a data recovery apparatus applied to a management device in a distributed object storage system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data recovery method applied to a storage device in a distributed object storage system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals:
310-to-be-recovered copy selection module; 320-recovery task command generating module; 330-to-be-recovered copy recovery module; 410-recovery task command receiving module; 420-to-be-recovered copy data acquisition module; 430-to-be-recovered copy data writing module; 500-electronic device; 501-memory; 502-processor; 503-bus; 504-communication interface.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as referred to in the embodiments of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
An object storage system can store arbitrary objects persistently, robustly, and with high availability, and users and applications access the stored data through a simple API (Application Programming Interface). An object storage system protects data mainly by storing multiple copies on multiple disks and keeping the data consistent across those copies.
The two most common data redundancy techniques in distributed storage systems are the multi-copy policy and erasure coding, and erasure coding achieves higher disk utilization than the multi-copy policy. Erasure coding encodes the original data blocks with an erasure-code algorithm to obtain check blocks, and stores the data blocks and check blocks together to achieve fault tolerance. The basic idea is that N check blocks are computed from M original data blocks; of the resulting M + N blocks, any N (or fewer) can be lost and the original data can still be recovered with the corresponding algorithm. Generating the check blocks is called encoding (ENCODE), and recovering the missing data blocks is called decoding (DECODE). Compared with the multi-copy approach, erasure coding has lower redundancy overhead and higher disk utilization.
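A minimal sketch of the M + N idea, assuming a systematic Reed-Solomon-style code built by polynomial interpolation over the prime field GF(257) for readability. The patent does not name a specific code, and production systems typically use GF(2^8) arithmetic so that every symbol fits in one byte (here a check symbol can equal 256). Data block i is the value of a degree < M polynomial at x = i and check block j is its value at x = M + j, so any M surviving blocks determine the polynomial and every lost block can be re-evaluated in the same decode pass. Block indices are 0-based, and all function names are illustrative.

```python
P = 257  # prime field modulus; illustrative stand-in for GF(2^8)


def _interpolate_at(xs, ys, x):
    """Evaluate, at point x, the unique polynomial of degree < len(xs)
    passing through the points (xs[i], ys[i]), working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Fermat's little theorem gives the modular inverse of den
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data_blocks, n):
    """Return n check blocks for m equal-length data blocks (systematic code)."""
    m = len(data_blocks)
    xs = list(range(m))
    length = len(data_blocks[0])
    parity = [[0] * length for _ in range(n)]
    for k in range(length):  # each symbol position is encoded independently
        ys = [block[k] for block in data_blocks]
        for j in range(n):
            parity[j][k] = _interpolate_at(xs, ys, m + j)
    return parity


def decode(available, m, missing):
    """Rebuild every block index in `missing` from any m surviving blocks.

    `available` maps a 0-based block index (0..m+n-1) to its contents.
    All missing blocks come out of the same pass over the symbols.
    """
    if len(available) < m:
        raise ValueError("fewer than m blocks survive; data is unrecoverable")
    xs = sorted(available)[:m]
    length = len(available[xs[0]])
    rebuilt = {idx: [0] * length for idx in missing}
    for k in range(length):
        ys = [available[x][k] for x in xs]
        for idx in missing:
            rebuilt[idx][k] = _interpolate_at(xs, ys, idx)
    return rebuilt


if __name__ == "__main__":
    data = [[i * 10 + k for k in range(4)] for i in range(8)]  # 8 data blocks
    blocks = data + encode(data, 4)                            # plus 4 check blocks
    # Lose blocks 1 and 5 (copy IDs 2 and 6 in 1-based numbering)
    survivors = {i: b for i, b in enumerate(blocks) if i not in (1, 5)}
    rebuilt = decode(survivors, 8, [1, 5])
    print(rebuilt == {1: data[1], 5: data[5]})  # True
```

Because both lost blocks are evaluated from the same M surviving symbols, recovering two blocks costs one read of M blocks rather than two, which is the property the method described below relies on.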
When copies are missing, copy recovery is triggered. In prior-art systems, while the number of copies is below the specified number, the management node recovers the missing copies one at a time. For example, with three copies, losing one copy triggers one recovery, losing two copies triggers two recoveries in sequence, and losing three copies triggers three recoveries in sequence.
Therefore, in existing EC copy recovery, restoring several copies requires triggering several recovery tasks. This wastes processor time and network bandwidth, and because the recoveries run one after another, the overall recovery takes longer, which lowers data security.
The embodiments of the present application provide a data recovery method and apparatus, an electronic device, and a computer-readable storage medium, which address the low efficiency of recovering multiple EC copies in conventional distributed storage systems.
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a data recovery method according to an embodiment of the present application. The method is applied to management equipment in a distributed object storage system, and comprises the following steps:
S110, when it is detected that one or more copies of the data are missing, selecting the missing copy or copies as copies to be recovered.
In this step, the distributed storage system uses erasure coding (EC): the original data blocks are encoded with an erasure-code algorithm to obtain check blocks, and the check blocks are stored together with the data blocks to achieve fault tolerance.
For example, the distributed storage system computes N check blocks from the M blocks of original data to be stored. Of the resulting M + N blocks, any N (or fewer) can be lost and the original data can still be restored with the corresponding algorithm.
Assume the management device in the distributed object storage system monitors the EC-coded data blocks. Monitoring can be a periodic polling loop: when it detects that one or more copies of the data have been lost, it selects one or more of the lost copies as copies to be recovered.
The copies selected for recovery may be all of the lost copies or only some of them.
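Step S110 can be sketched as a set difference between the copy IDs a stripe should have and those still present. The `n_check` guard reflects the M + N property described above (more than N losses cannot be decoded), and `limit` models recovering only part of the lost copies; the function and parameter names are hypothetical.

```python
def select_copies_to_recover(expected_ids, present_ids, n_check, limit=None):
    """Pick the missing copies to recover (step S110 sketch).

    expected_ids: every copy ID of the stripe (data + check blocks)
    present_ids: copy IDs that are still reachable
    n_check: number of check blocks N; more than N losses are unrecoverable
    limit: optionally recover only the first `limit` missing copies
    """
    missing = sorted(set(expected_ids) - set(present_ids))
    if len(missing) > n_check:
        raise ValueError(f"{len(missing)} copies lost, at most {n_check} recoverable")
    return missing if limit is None else missing[:limit]
```

For the 8 + 4 layout with copies 2 and 6 gone, `select_copies_to_recover(range(1, 13), [1, 3, 4, 5, 7, 8, 9, 10, 11, 12], n_check=4)` returns `[2, 6]`.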
S120, generating a recovery task command, wherein the recovery task command includes the ID of each copy to be recovered and information about the destination node device of each copy to be recovered.
The recovery task command is generated by the management device in the distributed object storage system, which traverses the copies to be recovered to collect the relevant attributes of each one.
Specifically, the recovery task command includes the ID of each copy to be recovered and the destination node device of each copy to be recovered. The ID (identifier) is a unique value for each copy to be recovered, so the corresponding copy can be located by its ID.
Each destination node is associated with a copy to be recovered through a mapping, which is specified by the management device in the distributed object storage system.
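One possible in-memory shape for the command, assuming the destination information is simply a node name keyed by copy ID. `RecoveryTaskCommand`, `build_recovery_command`, and the `pick_node` placement hook are hypothetical; the patent only requires that the command carry each copy's ID and its destination. Passing the already-chosen nodes to `pick_node` is an assumed design choice so that recovered copies of one stripe can land on distinct nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class RecoveryTaskCommand:
    """Hypothetical shape of the recovery task command built in step S120."""
    # copy ID -> destination node device (name or address)
    destinations: Dict[int, str] = field(default_factory=dict)

    @property
    def copy_ids(self):
        return sorted(self.destinations)


def build_recovery_command(copies_to_recover: Iterable[int],
                           pick_node: Callable[[int, Set[str]], str]) -> RecoveryTaskCommand:
    """Traverse the copies to be recovered and map each to a destination node."""
    destinations: Dict[int, str] = {}
    used: Set[str] = set()
    for copy_id in copies_to_recover:
        node = pick_node(copy_id, used)  # hypothetical placement hook
        used.add(node)
        destinations[copy_id] = node
    return RecoveryTaskCommand(destinations)
```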
S130, sending the recovery task command to the target storage device, so that the target storage device recovers each copy to be recovered according to the recovery task command and writes each recovered copy to the corresponding destination node device according to the destination node information of that copy.
The management device in the distributed object storage system sends the recovery task command to the target storage device. Because the command contains the ID of each copy to be recovered and the information of its destination node device, the target storage device can locate each copy by its ID and write each recovered copy to its destination node device. Since every destination node device is mapped to a copy to be recovered, all copies to be recovered are restored in a single recovery process and delivered to their respective destination node devices.
As this embodiment shows, issuing a single recovery task is enough to recover copies on one or more nodes: all lost copies are rebuilt in one decoding pass and written to the specified node or nodes, which reduces unnecessary resource consumption, shortens data recovery time, and improves data security.
In some embodiments, the target storage device is one of the destination node devices.
When the management node finds that data needs to be recovered and the EC data is missing N copies, it directly selects N qualifying nodes as destination nodes; the target storage device may be one of these destination node devices. In step S130, when the data is found to be missing multiple copies, the management node issues the recovery task command to just one of the destination node devices, and the command carries the mapping between each node and its copy ID. After receiving the command, that destination node requests data from the nodes holding the surviving copies, performs EC decoding to obtain the N missing copy blocks, writes each recovered copy to its destination node device according to the destination node information, and reports back to the management node once the copies are successfully recovered.
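A minimal sketch of this dispatch, assuming a hypothetical `send` transport: the management node issues one command carrying the full copy-ID-to-node mapping to a single destination node, rather than one command per missing copy. Which destination acts as executor is an arbitrary choice in this sketch.

```python
def dispatch_recovery(destinations, send):
    """Issue ONE recovery task command for all missing copies (step S130).

    destinations: copy ID -> destination node, as built in step S120
    send(node, destinations): hypothetical RPC to a storage node
    The chosen destination node doubles as the target storage device: it
    fetches surviving copies, decodes all missing ones, and writes each to
    its own destination. Returns the node that received the command.
    """
    if not destinations:
        raise ValueError("no copies to recover")
    # Any destination node may act as executor; pick the one mapped to the
    # smallest copy ID so the choice is deterministic.
    executor = destinations[min(destinations)]
    send(executor, dict(destinations))  # the full mapping travels in one message
    return executor
```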
In some embodiments, the target storage device includes a storage unit. In step S130, the management node sends the recovery task command to the storage unit of any one of the target storage devices so that each copy to be recovered is recovered in that storage unit, and each recovered copy is then written to the corresponding destination node device according to the destination node information of each copy.
In some embodiments, after step S130 of sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command, the method further includes:
receiving a recovery completion message from the target storage device, and updating the state information of each copy to be recovered.
When the target storage device finishes the recovery, it notifies the management device in the distributed object storage system that the copy data has been recovered. The management device then receives a recovery completion message from the target storage device containing the relevant attributes of each copy to be recovered, such as the ID of each copy and its destination node device. After receiving this message, the management device updates the state information of each corresponding copy, completing the update process.
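A sketch of the management-side update, assuming the completion message carries a `copies` mapping from copy ID to destination node. The `CopyState` values and the message layout are hypothetical; the patent only says the message includes each copy's ID and destination node device.

```python
from enum import Enum


class CopyState(Enum):
    MISSING = "missing"
    RECOVERING = "recovering"
    HEALTHY = "healthy"


def apply_completion(states, locations, message):
    """Update copy state after a recovery completion message.

    Assumes the message carries {"copies": {copy_id: destination_node}}.
    states: copy ID -> CopyState; locations: copy ID -> node holding it.
    Returns the IDs that were updated.
    """
    updated = []
    for copy_id, node in message["copies"].items():
        states[copy_id] = CopyState.HEALTHY
        locations[copy_id] = node
        updated.append(copy_id)
    return sorted(updated)
```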
In some embodiments, before step S110 of selecting the missing copy or copies as copies to be recovered when one or more copies of the data are detected as missing, the data recovery method further includes:
detecting, at a preset period, whether any copy in the distributed object storage system is missing.
In a specific implementation, missing copies can be detected through event monitoring, for example by using a file listener to watch the copies in the storage system.
To monitor copies, the path where the copies are stored is passed as an input parameter to the file listener, which then watches the state of every file and folder under that path so that file changes are captured in real time. The events that can be monitored mainly include: data being read from a file; data in a file being modified; a file's metadata (permissions, owner, timestamps) being changed; a file or directory that was opened for writing being closed; a file or directory that was opened without being modified being closed; a file or directory being opened; a file or subdirectory being moved out of the monitored directory; a file or subdirectory being created in the monitored directory; a file being deleted from the monitored directory; the monitored file or directory itself being deleted, which stops monitoring; and the monitored file or directory itself being moved, after which monitoring continues.
These events can be combined in practice. For example, the listener can watch whether the copy files under the copy path are opened and, if so, whether they are subsequently deleted.
Monitoring can run on a preset period measured in time. For example, with a period of 1 hour, a check runs every hour to determine whether any copy in the distributed object storage system is missing. If no missing file is found in a period, monitoring runs again, and the subsequent flow is executed once a missing file is found.
The preset period can also be measured in iterations, for example 10 traversals of all copy files under the monitored path. If no copy is found missing after 10 traversals, monitoring continues, and the subsequent flow is executed once a missing copy is found.
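On Linux, the events listed above correspond closely to inotify events (such as IN_ACCESS, IN_MODIFY, IN_ATTRIB, and IN_DELETE), but Python's standard library has no inotify binding, so this sketch uses a portable polling loop instead of an event listener. It assumes copies are identified by file names; `list_names` (for example `lambda: os.listdir(copy_dir)`) and the `max_rounds` bound are hypothetical parameters, and the injectable `sleep` exists only so the loop can be exercised without waiting.

```python
import time


def find_missing(list_names, expected_names):
    """One traversal: expected copy files that no longer appear in the listing."""
    return sorted(set(expected_names) - set(list_names()))


def poll_for_missing(list_names, expected_names, period_s=3600.0,
                     max_rounds=None, sleep=time.sleep):
    """Re-run the check every `period_s` seconds until a copy goes missing.

    list_names: zero-argument callable returning the current file names
    max_rounds: hypothetical bound on the number of traversals (None means
        keep polling forever); returns [] if reached with nothing missing.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        missing = find_missing(list_names, expected_names)
        if missing:
            return missing  # hand off to step S110
        rounds += 1
        sleep(period_s)
    return []
```

The default `period_s=3600.0` mirrors the 1-hour example above; a count-based period can be approximated by choosing `max_rounds` and restarting the loop.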
Fig. 2 is a schematic flowchart of a data recovery method according to an embodiment of the present application. The method is applied to storage equipment in a distributed object storage system, and comprises the following steps:
Step S210, receiving a recovery task command from the management device, where the recovery task command includes the IDs of one or more copies to be recovered and information about the destination node device of each copy to be recovered.
After detecting that a copy is missing, the management device in the distributed object storage system sends the recovery task command generated in step S120 to a storage device in the system; the command includes the ID of each copy to be recovered and information about the destination node device of each copy.
Step S220, obtaining data of each copy to be recovered through erasure code decoding.
In this field, generating check blocks is called encoding and recovering lost data blocks is called decoding. Erasure coding encodes the original data blocks with an erasure-code algorithm to obtain check blocks, and stores the data blocks and check blocks together to achieve fault tolerance. N check blocks are computed from M original data blocks, and when any N (or fewer) of the M + N blocks are lost, the original data of the copies to be recovered can be restored with the corresponding algorithm.
In some embodiments, the storage device includes a storage unit. In this step, the storage device may save the recovery task command to the storage unit, obtain the data of each copy to be recovered through erasure-code decoding, and recover each copy in the storage unit. Alternatively, it may skip saving the command, directly obtain the data of each copy through erasure-code decoding, and recover each copy in the storage unit.
Step S230, writing the data of each copy to be restored into the destination node device corresponding to the copy to be restored according to the information of the destination node device of each copy to be restored.
Specifically, in some embodiments, the storage device is one of destination node devices. Then, in step S230, the storage device forms each recovered copy to be recovered according to the data of each copy to be recovered, and writes the recovered copy to be recovered into the destination node device corresponding to the recovered copy according to the information of the destination node device of each copy to be recovered.
In some embodiments, after step S230 of writing the data of each copy to be recovered into the corresponding destination node device, the method further includes:
sending a recovery completion message to the management device, so that the management device updates the status information of each copy to be recovered.
The recovery completion message is triggered only after it has been determined that the data of every copy to be recovered has been written into its corresponding destination node device. In a specific implementation, the message is sent to the management device once all copies to be recovered have been written into their corresponding node devices. This can be done with a callback function: as soon as all copies to be recovered have been written, the callback is triggered and sends the completion message to the management device.
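The callback-based completion message can be sketched as follows, assuming a hypothetical `CompletionTracker` helper that is told about each successful write. The callback fires exactly once, after the last pending copy is written, and duplicate or unknown acknowledgements are ignored; the lock only matters if writes complete on several threads.

```python
import threading
from typing import Callable, List


class CompletionTracker:
    """Hypothetical helper: fires on_all_done once, after every expected copy is written."""

    def __init__(self, copy_ids: List[int], on_all_done: Callable[[List[int]], None]):
        self._pending = set(copy_ids)
        self._done: List[int] = []
        self._fired = False
        self._lock = threading.Lock()
        self._on_all_done = on_all_done

    def mark_written(self, copy_id: int) -> None:
        """Call when one recovered copy has been written to its destination node."""
        with self._lock:
            if copy_id not in self._pending:
                return  # duplicate or unknown acknowledgement
            self._pending.discard(copy_id)
            self._done.append(copy_id)
            fire = not self._pending and not self._fired
            if fire:
                self._fired = True
        if fire:  # invoke the callback outside the lock
            self._on_all_done(sorted(self._done))


# Usage: copies 2 and 6 are being recovered; the message goes out only after both writes.
sent = []
tracker = CompletionTracker([2, 6], lambda ids: sent.append(("recovery_complete", ids)))
tracker.mark_written(2)   # copy 2 written locally; nothing sent yet
tracker.mark_written(6)   # copy 6 written to the other node; callback fires once
assert sent == [("recovery_complete", [2, 6])]
```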
The embodiments above describe the data recovery method from two levels, the management device and the storage device; the following describes the method as a whole. Suppose that, under the existing architecture, the EC scheme uses 8 data blocks whose copy IDs are 1 to 8, plus 4 check blocks whose copy IDs are 9 to 12. For illustration, suppose further that two data-block copies, with IDs 2 and 6, are missing. The conventional EC data recovery logic is described first.
Step 1, the management node periodically checks whether any copy is missing. When copies are missing, it randomly selects one lost copy to trigger a recovery task, allocates node X as the destination node, and issues the recovery task to node X.
Step 2, after receiving the recovery command, node X requests any 8 surviving copies (copies 1, 3, 4, 5, 7, 8, 9 and 10), obtains the data of copy 2 through EC decoding, writes it locally, and then submits a recovery completion message to the management node.
Step 3, the management node updates the information of copy 2, and the recovery task ends.
Step 4, in its next periodic detection, the management node again finds a missing copy; now only copy 6 is missing, so it allocates node Y as the destination node and issues a recovery task to node Y.
Step 5, in the same way that node X recovered copy 2, node Y recovers copy 6 and submits a recovery completion message to the management node.
Step 6, the management node updates the information of copy 6, and the recovery task ends.
Step 7, in its next periodic detection, the management node confirms that no copy is missing and waits for the next detection.
Therefore, in conventional EC copy recovery, recovering multiple copies requires triggering multiple recovery tasks, each of which reads 8 surviving copies and runs its own EC decoding. This wastes device resources and bandwidth and lengthens recovery time, which is detrimental to data security.
With the data recovery method provided by this embodiment, the EC copy recovery logic is as follows:
Step 1, the management node periodically runs a detection task to check whether any copy is missing. On detecting that copies 2 and 6 are lost, it selects node X as the destination node for copy 2 and node Y as the destination node for copy 6, and issues a single recovery task, carrying both mappings, to the destination node of one randomly chosen lost copy.
Step 2, assuming the task was issued to node X (the destination of copy 2), after receiving the recovery command node X requests any 8 surviving copies (copies 1, 3, 4, 5, 7, 8, 9 and 10), obtains the data of both copy 2 and copy 6 through EC decoding, writes the data of copy 2 locally and the data of copy 6 to node Y, and then submits a recovery completion message to the management node.
Step 3, the management node updates the information of copies 2 and 6, and the recovery task ends.
Step 4, the management node continues its periodic detection and, after confirming that no copy is missing, waits for the next detection.
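Step 2 rests on the property that one decode over any 8 surviving copies yields every missing data block at once. The following is a minimal sketch of that property, assuming a systematic code whose check rows form a Cauchy matrix over the prime field GF(257). This is a stand-in chosen because plain modular arithmetic keeps it short; production EC libraries typically use Reed-Solomon over GF(2^8). Function names and the node names X and Y are illustrative only.

```python
# Sketch: systematic (m + n) erasure code over GF(257) built from a Cauchy matrix.
# Assumption: GF(257) replaces the GF(2^8) arithmetic of real EC libraries; check
# symbols may equal 256, so blocks are lists of ints rather than bytes. Needs m + n <= 257.
P = 257


def inv(a: int) -> int:
    return pow(a, P - 2, P)  # Fermat inverse; a must be non-zero mod P


def generator(m: int, n: int) -> list:
    # Identity rows keep data blocks verbatim; Cauchy rows 1/(x - y) with
    # x in [m, m + n) and y in [0, m) make every m-row subset invertible.
    rows = [[int(i == j) for j in range(m)] for i in range(m)]
    rows += [[inv(x - y) for y in range(m)] for x in range(m, m + n)]
    return rows


def encode(data: list, n: int) -> list:
    """Return m data blocks followed by n check blocks."""
    m = len(data)
    return [[sum(g[j] * data[j][k] for j in range(m)) % P for k in range(len(data[0]))]
            for g in generator(m, n)]


def decode(survivors: dict, m: int, n: int) -> list:
    """One Gauss-Jordan pass over any m survivors rebuilds all m data blocks."""
    ids = sorted(survivors)[:m]
    if len(ids) < m:
        raise ValueError("more than n blocks lost")
    g = generator(m, n)
    aug = [g[i][:] + survivors[i][:] for i in ids]
    for c in range(m):
        p = next(r for r in range(c, m) if aug[r][c])
        aug[c], aug[p] = aug[p], aug[c]
        f = inv(aug[c][c])
        aug[c] = [v * f % P for v in aug[c]]
        for r in range(m):
            if r != c and aug[r][c]:
                t = aug[r][c]
                aug[r] = [(v - t * w) % P for v, w in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


# The 8 + 4 example: copy IDs 1-12 map to list indices 0-11.
M, N = 8, 4
data = [[(31 * b + k) % 256 for k in range(16)] for b in range(M)]
stripe = encode(data, N)
lost_ids = [2, 6]
survivors = {i: blk for i, blk in enumerate(stripe) if i + 1 not in lost_ids}
rebuilt = decode(survivors, M, N)   # uses copies 1, 3, 4, 5, 7, 8, 9, 10; decodes once
destinations = {2: "X", 6: "Y"}     # hypothetical destination nodes
writes = {destinations[c]: rebuilt[c - 1] for c in lost_ids}
assert writes["X"] == data[1] and writes["Y"] == data[5]
```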
The management node may also split the recovery work into several tasks, each recovering only some of the missing copies rather than all of them; for example, if four copies are missing, it may issue two recovery tasks that each recover two copies. During EC decoding, the data may be written to the local node and to other nodes once it has been fully restored, or decoding may proceed in a streaming manner, with each portion of data written as soon as that portion has been decoded.
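Task splitting and streaming decoding might look like the sketch below. `split_recovery` and `stream_decode` are hypothetical names, and `decode_chunk` is a placeholder hook for whatever EC decoder is in use; the usage runs it with a toy 2 + 1 XOR stripe so that each decoded piece is written before the rest of the stripe is processed.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def split_recovery(missing: List[int], per_task: int) -> List[List[int]]:
    """Split missing copy IDs into recovery tasks of at most per_task copies each."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [missing[i:i + per_task] for i in range(0, len(missing), per_task)]


def stream_decode(
    survivors: Dict[int, bytes],
    chunk_size: int,
    decode_chunk: Callable[[Dict[int, bytes]], Dict[int, bytes]],
) -> Iterator[Tuple[int, Dict[int, bytes]]]:
    """Decode a stripe piece by piece, yielding (offset, {lost_copy_id: piece}).

    decode_chunk is a hypothetical hook for the real EC decoder: it receives the
    same-offset slices of the surviving copies and returns slices of the lost ones.
    """
    length = len(next(iter(survivors.values())))
    for off in range(0, length, chunk_size):
        pieces = {cid: blk[off:off + chunk_size] for cid, blk in survivors.items()}
        yield off, decode_chunk(pieces)


# Toy stripe: copy 3 = copy 1 XOR copy 2, and copy 2 is lost.
d1, d2 = b"abcdef", b"123456"
alive = {1: d1, 3: bytes(a ^ b for a, b in zip(d1, d2))}


def xor_decoder(p: Dict[int, bytes]) -> Dict[int, bytes]:
    return {2: bytes(a ^ b for a, b in zip(p[1], p[3]))}


out = bytearray()
for off, piece in stream_decode(alive, 4, xor_decoder):
    out[off:off + len(piece[2])] = piece[2]  # write each piece as soon as it is decoded
assert bytes(out) == d2
assert split_recovery([2, 6, 9, 11], 2) == [[2, 6], [9, 11]]
```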
With the data recovery method provided by the embodiments of the present invention, when the management node finds that data needs to be recovered and K copies of an EC stripe are lost, it can directly select K qualifying nodes as destination nodes and issue a single recovery task, carrying the mapping between each node and copy ID, to one of them. After receiving the recovery task, that destination node requests the surviving copies from their nodes, performs EC decoding to obtain the K missing copy blocks, writes each copy block to its corresponding node, and reports back to the management node once recovery succeeds. The management node therefore issues the recovery task only once, which reduces interaction with the data nodes, and the data node performs EC decoding only once, which avoids unnecessary resource consumption and shortens recovery time.
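A rough count of the savings for one stripe can be written down under stated assumptions: every recovery task reads exactly M surviving blocks, the node receiving the single task is itself one of the destinations, and metadata traffic is ignored. These are illustrative counts derived from the two flows above, not measurements; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryCost:
    tasks_issued: int   # recovery tasks sent by the management node
    blocks_read: int    # surviving blocks fetched for decoding
    ec_decodes: int     # EC decode passes on data nodes
    remote_writes: int  # recovered blocks shipped to another destination node


def conventional(lost: int, m: int) -> RecoveryCost:
    # One task per lost copy; each destination reads m survivors, decodes, writes locally.
    return RecoveryCost(lost, lost * m, lost, 0)


def single_task(lost: int, m: int) -> RecoveryCost:
    # One task for all lost copies; one node reads m survivors and decodes once,
    # keeps its own recovered copy and ships the other lost - 1 copies.
    if lost == 0:
        return RecoveryCost(0, 0, 0, 0)
    return RecoveryCost(1, m, 1, lost - 1)


# The 8 + 4 example with copies 2 and 6 lost:
print(conventional(2, 8))  # RecoveryCost(tasks_issued=2, blocks_read=16, ec_decodes=2, remote_writes=0)
print(single_task(2, 8))   # RecoveryCost(tasks_issued=1, blocks_read=8, ec_decodes=1, remote_writes=1)
```

The single-task flow trades one extra block transfer per additional lost copy for M fewer block reads and one fewer decode per additional lost copy.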
The data recovery method applied to the storage device in the distributed object storage system provided by the embodiment of the present application has the same technical features as the data recovery method applied to the management device in the distributed object storage system provided by the above embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Fig. 3 is a schematic structural diagram of a data recovery apparatus applied to the management device in a distributed object storage system. As shown in Fig. 3, the data recovery apparatus includes:
a to-be-recovered copy selection module 310, configured to, when it is detected that one or more copies of data are missing, select the missing one or more copies as to-be-recovered copies;
a recovery task command generating module 320, configured to generate a recovery task command, where the recovery task command includes the ID of each to-be-recovered copy and information of the destination node device of each to-be-recovered copy; and
a to-be-recovered copy recovery module 330, configured to send the recovery task command to a target storage device, so that the target storage device recovers each to-be-recovered copy according to the recovery task command and writes each recovered copy into the corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In some embodiments, the target storage device is one of the destination node devices. The to-be-recovered copy recovery module 330 is specifically configured to, when it is detected that multiple copies of data are missing, send the recovery task command to any one destination node device, so that this destination node device recovers the multiple to-be-recovered copies and writes each recovered copy into its corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In some embodiments, the target storage device includes a storage unit. The to-be-recovered copy recovery module 330 is specifically configured to send the recovery task command to the storage unit of any one target storage device, so that each to-be-recovered copy is recovered in that storage unit and each recovered copy is written into its corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In some embodiments, the data recovery apparatus further includes:
a to-be-recovered copy status updating module, configured to receive a recovery completion message from the target storage device and update the status information of each to-be-recovered copy.
In some embodiments, the data recovery apparatus further includes:
a periodic detection module, configured to detect, at a preset period, whether any copy in the distributed object storage system is missing.
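Modules 310 to 330, the status updating module and the periodic detection module can be sketched as one class. This is a hypothetical sketch: the destination policy (`pick_node`) and transport (`send`) are injected callables, the field and status names are invented for illustration, and the periodic timer loop is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class RecoveryTaskCommand:
    # Copy ID -> destination node: the copy IDs and destination node information
    # carried by the recovery task command. The field name is hypothetical.
    destinations: Dict[int, str]


@dataclass
class ManagementDevice:
    """Hypothetical sketch of modules 310, 320, 330 plus status updating."""
    expected_copies: Set[int]
    pick_node: Callable[[int], str]                   # destination policy (assumed)
    send: Callable[[str, RecoveryTaskCommand], None]  # transport to a node (assumed)
    status: Dict[int, str] = field(default_factory=dict)

    def select_missing(self, present: Set[int]) -> List[int]:            # module 310
        return sorted(self.expected_copies - present)

    def build_command(self, missing: List[int]) -> RecoveryTaskCommand:  # module 320
        return RecoveryTaskCommand({cid: self.pick_node(cid) for cid in missing})

    def detect_once(self, present: Set[int]) -> Optional[RecoveryTaskCommand]:
        """One round of the periodic detection module (timer loop omitted)."""
        missing = self.select_missing(present)
        if not missing:
            return None
        cmd = self.build_command(missing)
        # Module 330: a single command goes to one destination node. The text allows any
        # destination; the first one is used here only to keep the sketch deterministic.
        self.send(cmd.destinations[missing[0]], cmd)
        for cid in missing:
            self.status[cid] = "recovering"
        return cmd

    def on_recovery_complete(self, copy_ids: List[int]) -> None:  # status updating module
        for cid in copy_ids:
            self.status[cid] = "healthy"


# Usage with the 8 + 4 example: copies 2 and 6 are missing.
sent = []
mgr = ManagementDevice(
    expected_copies=set(range(1, 13)),
    pick_node=lambda cid: {2: "X", 6: "Y"}[cid],
    send=lambda node, cmd: sent.append((node, cmd)),
)
mgr.detect_once(present=set(range(1, 13)) - {2, 6})
assert sent == [("X", RecoveryTaskCommand({2: "X", 6: "Y"}))]
mgr.on_recovery_complete([2, 6])
assert mgr.status == {2: "healthy", 6: "healthy"}
```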
The data recovery apparatus applied to the management device in the distributed object storage system provided in the embodiment of the present application has the same technical features as the data recovery method applied to the management device in the distributed object storage system provided in the above embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Fig. 4 is a schematic structural diagram of a data recovery apparatus applied to a storage device in a distributed object storage system. The apparatus includes:
a recovery task command receiving module 410, configured to receive a recovery task command from the management device, where the recovery task command includes IDs of one or more copies to be recovered and information of a destination node device of each copy to be recovered;
a to-be-recovered copy data obtaining module 420, configured to obtain data of each to-be-recovered copy through erasure code decoding;
a to-be-recovered copy data writing module 430, configured to write the data of each to-be-recovered copy into its corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In some embodiments, the storage device is one of the destination node devices. The to-be-recovered copy data writing module 430 is specifically configured to form each recovered copy from the data of the corresponding to-be-recovered copy, and write each recovered copy into its corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
In some embodiments, the storage device includes a storage unit. The to-be-recovered copy data obtaining module 420 is specifically configured to save the recovery task command in the storage unit, obtain the data of each to-be-recovered copy through erasure code decoding, and recover each to-be-recovered copy in the storage unit. Alternatively, the to-be-recovered copy data obtaining module 420 may skip saving the recovery task command to the storage unit, obtain the data of each to-be-recovered copy directly through erasure code decoding, and recover each to-be-recovered copy in the storage unit.
In some embodiments, the data recovery apparatus further includes:
a recovery message sending module, configured to send a recovery completion message to the management device, so that the management device updates the status information of each copy to be recovered.
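Correspondingly, the storage-side modules 410 to 430 and the recovery message sending module might be sketched as below. This is a hypothetical sketch: the survivor reader, EC decoder, remote writer and completion notifier are injected placeholders rather than a real storage API, and the usage plugs in a fake decoder purely to show the routing of recovered copies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class StorageDevice:
    """Hypothetical sketch of modules 410, 420, 430 plus the completion message."""
    node_id: str
    fetch_survivors: Callable[[], Dict[int, bytes]]                      # reads surviving copies (assumed)
    ec_decode: Callable[[Dict[int, bytes], List[int]], Dict[int, bytes]]  # EC decoder (assumed)
    write_remote: Callable[[str, int, bytes], None]                      # network write (assumed)
    notify_manager: Callable[[List[int]], None]                          # completion message (assumed)
    local_store: Dict[int, bytes] = field(default_factory=dict)

    def handle_command(self, destinations: Dict[int, str]) -> None:
        """destinations: copy ID -> destination node, as received by module 410."""
        missing = sorted(destinations)
        # Module 420: a single decode yields every missing copy.
        recovered = self.ec_decode(self.fetch_survivors(), missing)
        # Module 430: keep copies destined for this node, ship the rest.
        for cid in missing:
            if destinations[cid] == self.node_id:
                self.local_store[cid] = recovered[cid]
            else:
                self.write_remote(destinations[cid], cid, recovered[cid])
        # Completion message only after every write call has returned.
        self.notify_manager(missing)


# Usage: node X handles the single task for copies 2 and 6 with a stand-in decoder.
remote: List[Tuple[str, int, bytes]] = []
done: List[List[int]] = []
node_x = StorageDevice(
    node_id="X",
    fetch_survivors=lambda: {},
    ec_decode=lambda survivors, ids: {cid: bytes([cid]) * 4 for cid in ids},  # fake decoder
    write_remote=lambda node, cid, data: remote.append((node, cid, data)),
    notify_manager=done.append,
)
node_x.handle_command({2: "X", 6: "Y"})
assert node_x.local_store == {2: b"\x02\x02\x02\x02"}
assert remote == [("Y", 6, b"\x06\x06\x06\x06")]
assert done == [[2, 6]]
```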
The data recovery apparatus applied to the storage device in the distributed object storage system provided in the embodiment of the present application has the same technical features as the data recovery method applied to the storage device in the distributed object storage system provided in the above embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
As shown in fig. 5, an electronic device 500 includes a memory 501 and a processor 502, where the memory stores a computer program that can run on the processor, and the processor executes the computer program to implement the steps of the method provided in the foregoing embodiment.
Referring to fig. 5, the electronic device further includes: a bus 503 and a communication interface 504, and the processor 502, the communication interface 504 and the memory 501 are connected by the bus 503; the processor 502 is for executing executable modules, e.g. computer programs, stored in the memory 501.
The memory 501 may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 504 (wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like may be used.
Bus 503 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory 501 is configured to store a program, and the processor 502 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the process disclosed in any of the foregoing embodiments of the present application may be applied to, or implemented by, the processor 502.
The processor 502 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 502 or by instructions in the form of software. The processor 502 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 501, and the processor 502 reads the information in the memory 501 and completes the steps of the method in combination with its hardware.
Corresponding to the data recovery method, the embodiments of the present application further provide a computer-readable storage medium storing machine-executable instructions which, when called and executed by a processor, cause the processor to execute the steps of the data recovery method.
The data recovery device provided by the embodiments of the present application may be specific hardware on a device, or software or firmware installed on a device, and so on. The device provided by the embodiments of the present application has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, for anything not mentioned in the device embodiments, reference may be made to the corresponding content in the method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described here again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the scope of the embodiments of the present application and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A data recovery method, applied to a management device in a distributed object storage system, the method comprising:
when it is detected that one or more copies of data are missing, selecting the missing one or more copies as copies to be recovered;
generating a recovery task command, wherein the recovery task command comprises the ID of each copy to be recovered and information of the destination node device of each copy to be recovered; and
sending the recovery task command to a target storage device, so that the target storage device recovers each copy to be recovered according to the recovery task command and writes each recovered copy into the corresponding destination node device according to the information of the destination node device of each copy to be recovered.
2. The method according to claim 1, wherein the target storage device is one of the destination node devices, and
sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command comprises:
when it is detected that multiple copies of the data are missing, sending the recovery task command to any one destination node device, so that this destination node device recovers the multiple copies to be recovered and writes each recovered copy into its corresponding destination node device according to the information of the destination node device of each copy to be recovered.
3. The method according to claim 1 or 2, wherein the target storage device comprises a storage unit, and
sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command comprises:
sending the recovery task command to the storage unit of any one target storage device, so that each copy to be recovered is recovered in the storage unit of the target storage device and each recovered copy is written into the corresponding destination node device according to the information of the destination node device of each copy to be recovered.
4. The method according to claim 1, wherein after the step of sending the recovery task command to the target storage device so that the target storage device recovers each copy to be recovered according to the recovery task command, the method further comprises:
receiving a recovery completion message from the target storage device, and updating the status information of each copy to be recovered.
5. A data recovery method, applied to a storage device in a distributed object storage system, the method comprising:
receiving a recovery task command from a management device, wherein the recovery task command comprises the IDs of one or more copies to be recovered and information of the destination node device of each copy to be recovered;
obtaining data of each copy to be recovered through erasure code decoding; and
writing the data of each copy to be recovered into the corresponding destination node device according to the information of the destination node device of each copy to be recovered.
6. The method according to claim 5, wherein the storage device is one of the destination node devices, and
writing the data of each copy to be recovered into the corresponding destination node device according to the information of the destination node device of each copy to be recovered comprises:
forming each recovered copy from the data of the corresponding copy to be recovered, and writing each recovered copy into the corresponding destination node device according to the information of the destination node device of each copy to be recovered.
7. The method according to claim 5 or 6, wherein the storage device comprises a storage unit, and
obtaining the data of each copy to be recovered through erasure code decoding comprises:
obtaining the data of each copy to be recovered through erasure code decoding, and recovering each copy to be recovered in the storage unit.
8. The method according to claim 5, further comprising, after the step of writing the data of each copy to be recovered into the corresponding destination node device:
sending a recovery completion message to the management device, so that the management device updates the status information of each copy to be recovered.
9. A data recovery apparatus, applied to a management device in a distributed object storage system, the apparatus comprising:
a to-be-recovered copy selection module, configured to select one or more missing copies as to-be-recovered copies when it is detected that one or more copies of data are missing;
a recovery task command generating module, configured to generate a recovery task command, wherein the recovery task command comprises the ID of each to-be-recovered copy and information of the destination node device of each to-be-recovered copy; and
a to-be-recovered copy recovery module, configured to send the recovery task command to a target storage device, so that the target storage device recovers each to-be-recovered copy according to the recovery task command and writes each recovered copy into the corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
10. A data recovery apparatus, applied to a storage device in a distributed object storage system, the apparatus comprising:
a recovery task command receiving module, configured to receive a recovery task command from a management device, wherein the recovery task command comprises the IDs of one or more to-be-recovered copies and information of the destination node device of each to-be-recovered copy;
a to-be-recovered copy data obtaining module, configured to obtain the data of each to-be-recovered copy through erasure code decoding; and
a to-be-recovered copy data writing module, configured to write the data of each to-be-recovered copy into the corresponding destination node device according to the information of the destination node device of each to-be-recovered copy.
11. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 8 when executing the computer program.
12. A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to execute the method of any of claims 1 to 8.
CN202010471706.1A 2020-05-28 2020-05-28 Data recovery method and device, electronic equipment and computer readable storage medium Pending CN111625402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010471706.1A CN111625402A (en) 2020-05-28 2020-05-28 Data recovery method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111625402A true CN111625402A (en) 2020-09-04

Family

ID=72260810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010471706.1A Pending CN111625402A (en) 2020-05-28 2020-05-28 Data recovery method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111625402A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553216A (en) * 2021-06-28 2021-10-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Data recovery method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102270161A (en) * 2011-06-09 2011-12-07 Huazhong University of Science and Technology Methods for storing, reading and recovering erasure code-based multistage fault-tolerant data
CN103209210A (en) * 2013-03-04 2013-07-17 Huazhong University of Science and Technology Method for improving erasure code based storage cluster recovery performance
CN104735107A (en) * 2013-12-20 2015-06-24 China Mobile Communications Group Co., Ltd. Recovery method and device for data copies in distributed storage system
CN105610879A (en) * 2014-10-31 2016-05-25 Shenzhen Huawei Technologies Software Co., Ltd. Data processing method and data processing device
CN106844098A (en) * 2016-12-29 2017-06-13 Institute of Computing Technology, Chinese Academy of Sciences A fast data recovery method and system based on crisscross erasure codes
US20170206135A1 (en) * 2015-12-31 2017-07-20 Huawei Technologies Co., Ltd. Data Reconstruction Method in Distributed Storage System, Apparatus, and System
CN108572793A (en) * 2017-10-18 2018-09-25 Beijing Kingsoft Cloud Network Technology Co., Ltd. Data writing and data reconstruction method, device, electronic equipment and storage medium
CN110825559A (en) * 2018-08-10 2020-02-21 Huawei Technologies Co., Ltd. Data processing method and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CSDN blog of 超级侠哥: "HDFS Erasure Coding" *
Xu Chao: "A QoS-aware data replica recovery algorithm for cloud systems", vol. 45, no. 9 *
Wang Suzhen et al.: "Experimental Course on Big Data Technology Fundamentals", vol. 1, Hebei Science and Technology Press, page 61 *

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN113553216A (en) * 2021-06-28 2021-10-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Data recovery method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102594849B (en) Data backup and recovery method and device, virtual machine snapshot deleting and rollback method and device
CN106776130B (en) Log recovery method, storage device and storage node
CN108363657B (en) Method, equipment and medium for monitoring integrity of embedded data acquisition of APP client
US20150213100A1 (en) Data synchronization method and system
CN112988683A (en) Data processing method and device, electronic equipment and storage medium
CN107506266B (en) Data recovery method and system
US10185631B2 (en) System and method of performing continuous backup of a data file on a computing device
CN109783014B (en) Data storage method and device
EP3474143B1 (en) Method and apparatus for incremental recovery of data
CN111382008B (en) Virtual machine data backup method, device and system
CN112463437B (en) Service recovery method, system and related components of storage cluster system offline node
CN112380057A (en) Data recovery method, device, equipment and storage medium
CN110825562A (en) Data backup method, device, system and storage medium
CN111211993B (en) Incremental persistence method, device and storage medium for stream computation
CN111625402A (en) Data recovery method and device, electronic equipment and computer readable storage medium
CN108133034B (en) Shared storage access method and related device
CN114003439A (en) Data backup method, device, equipment and storage medium
CN112579550B (en) Metadata information synchronization method and system of distributed file system
CN110737716A (en) Data writing method and device
US20210397599A1 (en) Techniques for generating a consistent view of an eventually consistent database
WO2017080362A1 (en) Data managing method and device
CN110825552B (en) Data storage method, data recovery method, node and storage medium
CN111226200B (en) Method, device and distributed system for creating consistent snapshot for distributed application
CN108089942B (en) Data backup and recovery method and device
CN115080538A (en) Block chain version verification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination