CN107346273B - Data recovery method and device and electronic equipment - Google Patents

Data recovery method and device and electronic equipment

Info

Publication number
CN107346273B
Authority
CN
China
Prior art keywords
data block
path
fault
server
locked
Prior art date
Legal status
Active
Application number
CN201710446718.7A
Other languages
Chinese (zh)
Other versions
CN107346273A (en)
Inventor
赵东富
陈永旺
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201710446718.7A
Publication of CN107346273A
Application granted
Publication of CN107346273B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data recovery method, a data recovery device, and electronic equipment. The method comprises: obtaining a target data block of a target object; obtaining the two adjacent data blocks of the target data block and detecting whether a fault data block exists among the two adjacent data blocks; if a fault data block exists, obtaining all data blocks of the target object and determining the fault data blocks and normal data blocks among them; judging whether the path where each fault data block is located can be locked and, if so, locking the path where each fault data block is located in turn according to a preset sequence; restoring each fault data block to its corresponding normal data block by using a preset number of normal data blocks; and sending the normal data block corresponding to each fault data block to the server corresponding to the path where that fault data block is located. Applying the scheme provided by the embodiments of the invention to data recovery saves network resources and computing resources.

Description

Data recovery method and device and electronic equipment
Technical Field
The invention relates to the technical field of cloud storage, in particular to a data recovery method and device and electronic equipment.
Background
In recent years, cloud storage systems have developed rapidly. Common cloud storage systems include Microsoft Azure Storage, the storage systems of Facebook and Google, and OpenStack Swift. The OpenStack Swift cloud storage system is a distributed object storage system that comprises multiple servers. Object storage stores data as objects: for example, picture data, video data, audio data, and document data can each be stored as an object on any server of the OpenStack Swift cloud storage system.
To prevent data loss caused by system failures and to improve the availability and reliability of data, many cloud storage systems adopt a copy strategy. Under the copy strategy, multiple copies of the same data are stored, and each stored copy is called a replica. To guard against collective failures of adjacent servers, different copies of the data are stored on different disks of different servers in different physical areas. Thus, even if one of the copies is lost, the data still exists.
Another common storage strategy is erasure coding. Compared with the copy strategy, erasure codes save a large amount of storage space while still ensuring the availability and reliability of data. The basic principle of an erasure code is to divide the original data into k data blocks and then compute r check blocks from the k data blocks and an auxiliary matrix. Among the resulting k + r data blocks and check blocks, even if any 1 to r of them are lost, the lost blocks can be recovered from the remaining data blocks and check blocks. In practice, r is generally smaller than k, so the storage space, and therefore the cost, is reduced while data redundancy is still guaranteed.
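For illustration only, the recovery principle can be sketched in Python using the simplest case, r = 1 with a single XOR check block; the value k = 4 and the sample block contents below are assumptions of this sketch, and real systems typically use matrix coding (for example, Reed-Solomon over a finite field) rather than plain XOR.

    # Toy (k = 4, r = 1) erasure code: the single check block is the XOR of the
    # k data blocks, so any one lost block can be rebuilt from the survivors.
    def xor_blocks(blocks):
        """XOR equal-length byte blocks together."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data_blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # k original data blocks
    check_block = xor_blocks(data_blocks)                # r = 1 check block

    lost = 2                                             # pretend block 2 is faulty
    survivors = [b for i, b in enumerate(data_blocks) if i != lost]
    recovered = xor_blocks(survivors + [check_block])    # rebuild from k survivors
    assert recovered == data_blocks[lost]

With larger r the single XOR is replaced by a coding matrix, but the recovery idea is the same: any k surviving blocks are enough to rebuild the missing ones.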
At present, many cloud storage systems use erasure codes as their storage strategy. Taking an OpenStack Swift cloud storage system that adopts an erasure-code storage strategy as an example, its data recovery process is as follows: each server in the cloud storage system scans, through a Reconstructor (recovery) process, the adjacent data blocks of every data block stored on that server; the adjacent data blocks of a data block are the data blocks that belong to the same object and are adjacent to it on the hash ring. If an adjacent data block is found to be a fault data block, it is recovered. To recover the fault data block, the server reads k normal data blocks from other servers, performs a matrix calculation with the k normal data blocks to restore the fault data block to a normal data block, and then sends that normal data block to the server storing the fault data block, completing the recovery of the fault data block.
For example, in the distributed system architecture shown in fig. 1, the data blocks adjacent to data block b are data block a and data block c, and the data blocks adjacent to data block c are data block b and data block d. Data blocks a, b, c, and d are stored on servers A, B, C, and D of the distributed system 101, respectively. If data block c is a fault data block, both server B and server D can detect that data block c is a fault data block, and both may then perform data recovery on data block c, which may waste the network resources and computing resources of the cloud storage system.
Disclosure of Invention
The embodiment of the invention aims to provide a data recovery method, a data recovery device and electronic equipment so as to save network resources and computing resources. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a data recovery method, which is applied to servers in a distributed storage system, and for each server, the method includes:
obtaining a target data block of a target object, wherein the target data block is: the data block, among all the data blocks of the target object, that is stored on the server;
acquiring two adjacent data blocks of the target data block, and detecting whether a fault data block exists in the two adjacent data blocks;
if the fault data block exists, all data blocks of the target object are obtained, and the fault data block and the normal data block are determined from all the data blocks of the target object;
judging whether the path of each fault data block can be locked, if so, sequentially locking the path of each fault data block according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
Optionally, if there is no faulty data block, or it is determined that the path where each faulty data block is located cannot be locked, the method further includes:
and obtaining any object except the target object stored on the server as the target object, and returning to execute the step of obtaining the target data block of the target object stored on the server.
Optionally, the method further includes:
and unlocking the path of each fault data block.
Optionally, the step of obtaining two adjacent data blocks of the target data block includes:
obtaining the path of each adjacent data block by using the hash ring stored on the server;
and determining each server for storing each adjacent data block by using the path of each adjacent data block, and acquiring each adjacent data block based on each server.
Optionally, each fault data block has a unique number, and the judging whether the path where each fault data block is located can be locked includes:
checking whether the path where the fault data block with the smallest number or the largest number is located is locked;
if the path is locked, obtaining the difference value between the time at which the path was locked and the current time, and judging whether the difference value is greater than a preset threshold value; if the difference value is greater than the preset threshold value, judging that the path where each fault data block is located can be locked; if the difference value is less than or equal to the preset threshold value, judging that the path where each fault data block is located cannot be locked;
and if the path is not locked, judging that the path where each fault data block is located can be locked.
Optionally, for the case where the fault data block with the smallest number is checked, the sequentially locking the paths where the fault data blocks are located according to a preset sequence includes:
locking the path where each fault data block is located in turn, in ascending order of the numbers;
for the case where the fault data block with the largest number is checked, the sequentially locking the paths where the fault data blocks are located according to a preset sequence includes:
locking the path where each fault data block is located in turn, in descending order of the numbers.
In a second aspect, an embodiment of the present invention provides a data recovery apparatus, which is applied to servers in a distributed storage system, and for each server, the apparatus includes:
a first obtaining module, configured to obtain a target data block of a target object, where the target data block is: the data blocks stored on the server in all the data blocks of the target object;
a second obtaining module, configured to obtain two adjacent data blocks of the target data block, and detect whether a faulty data block exists in the two adjacent data blocks;
a third obtaining module, configured to, if a determination result of the second obtaining module is yes, obtain all data blocks of the target object, and determine a faulty data block and a normal data block from all data blocks of the target object;
the judging module is used for judging whether the path where each fault data block is located can be locked or not, and if the path where each fault data block is located can be locked, the paths where each fault data block is located are sequentially locked according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
Optionally, the apparatus further comprises:
and a fourth obtaining module, configured to, if the determination result of the second obtaining module is negative or the determination result of the determining module is negative, obtain any object other than the target object stored in the server as the target object, and return to the step of obtaining the target data block of the target object stored in the server.
Optionally, the apparatus further comprises:
and the unlocking module is used for unlocking the path where each fault data block is located.
Optionally, the second obtaining module includes:
the first obtaining submodule is used for obtaining the path of each adjacent data block by utilizing the hash ring stored on the server;
and the second obtaining submodule is used for determining each server for storing each adjacent data block by using the path of each adjacent data block, and obtaining each adjacent data block based on each server.
Optionally, each fault data block has a unique number, and the judging module includes:
a detection submodule, configured to check whether the path where the fault data block with the smallest number or the largest number is located is locked;
a first judgment submodule, configured to, when the detection result of the detection submodule is yes, obtain the difference value between the time at which the path was locked and the current time and judge whether the difference value is greater than a preset threshold value; if the difference value is greater than the preset threshold value, judge that the path where each fault data block is located can be locked; if the difference value is less than or equal to the preset threshold value, judge that the path where each fault data block is located cannot be locked;
and a second judgment submodule, configured to judge that the path where each fault data block is located can be locked when the detection result of the detection submodule is negative.
Optionally, for the case where the fault data block with the smallest number is checked, the judging module includes:
a first locking submodule, configured to lock the paths where the fault data blocks are located in turn, in ascending order of the numbers;
for the case where the fault data block with the largest number is checked, the judging module includes:
and a second locking submodule, configured to lock the paths where the fault data blocks are located in turn, in descending order of the numbers.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute any of the above-described data recovery methods.
In yet another aspect of the present invention, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any of the above data recovery methods.
Therefore, by applying the embodiment of the invention, one server locks the paths of all the fault data blocks of one object, and other servers cannot access the fault data blocks because the paths of the fault data blocks are locked, so that the data recovery can be performed on all the fault data blocks of the object only by the server locking the paths of the fault data blocks, thereby avoiding that a plurality of servers simultaneously perform data recovery on the same fault data block, and saving network resources and computing resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a prior art distributed system architecture diagram;
fig. 2 is a schematic flowchart of a data recovery method according to an embodiment of the present invention;
FIG. 3 is a diagram of a distributed system architecture for data recovery using a data recovery method provided by an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a data recovery operation performed by the data recovery method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a data recovery apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another data recovery apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another data recovery apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to solve the problem of network resource and computing resource waste in the prior art, the embodiment of the invention discloses a data recovery method, a data recovery device and electronic equipment.
Specifically, an application scenario of the data recovery method provided in the embodiment of the present invention may be as follows: a recovery process in a server obtains a target data block of a target object stored on the server, further obtains two adjacent data blocks of the target data block, detects whether a fault data block exists in the two adjacent data blocks, obtains all data blocks of the target object if the fault data block exists, and determines a fault data block and a normal data block from all data blocks of the target object; judging whether the path of each fault data block can be locked, if so, sequentially locking the path of each fault data block according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
A data recovery method provided in an embodiment of the present invention is described in detail below.
It should be noted that the data recovery method provided by the embodiment of the present invention is applied to a server in a distributed storage system. In addition, the functional software for implementing the data recovery method provided by the embodiment of the invention may be special data recovery software, or may be a plug-in existing data recovery software or other software with a data recovery function.
Referring to fig. 2, fig. 2 is a schematic flow chart of a data recovery method according to an embodiment of the present invention, where for each server in a distributed storage system, the method includes the following steps:
s201, obtaining a target data block of a target object.
It is understood that the servers in the distributed storage system may store data blocks of various objects; each server may obtain the data blocks stored by itself, and may also send an obtaining request to another server to obtain data blocks stored on that server. Each object has a plurality of data blocks, and the target data block is: the data block, among all the data blocks of the target object, that is stored on the server.
S202, two adjacent data blocks of the target data block are obtained, whether a fault data block exists in the two adjacent data blocks is detected, and if the fault data block exists, S203 is executed.
It is to be understood that the adjacent data block refers to a data block adjacent to the target data block on the hash ring. All data blocks of the target object respectively correspond to one position on the hash ring, and each adjacent data block of the target object can be obtained by obtaining the data block adjacent to the target data block on the hash ring. Each adjacent data block and the target data block are data blocks of a target object, and in practical application, the adjacent data blocks and the target data block may be stored in the same server or may be stored in different servers. The embodiment of the invention does not limit the specific storage path of the adjacent data blocks.
A fault data block is a data block whose content the server cannot read normally because the data has been damaged. Specific methods for detecting whether a data block is a fault data block belong to the prior art and are not described again here; the embodiment of the present invention does not limit the specific detection method. For example, a consistency detection algorithm may be employed to detect whether a fault data block exists among the adjacent data blocks.
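As one possible illustration only (the embodiment does not prescribe a particular detection algorithm), such a consistency check could compare a digest recorded when the block was written against a digest recomputed from the stored content; the function name, the choice of MD5, and the stored-digest argument below are assumptions of this sketch.

    import hashlib
    import os

    def is_fault_block(block_path, expected_digest):
        """Treat a block as a fault data block when its file is missing or its
        current content no longer matches the digest recorded at write time."""
        if not os.path.exists(block_path):
            return True
        with open(block_path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest() != expected_digest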
S203, all data blocks of the target object are obtained, and a fault data block and a normal data block are determined from all the data blocks of the target object.
It can be understood that, for each object, if one data block of the object is a fault data block, the object is likely to contain multiple fault data blocks. The adjacent data blocks are only part of all the data blocks of the target object, so the presence of a fault data block among the adjacent data blocks indicates that data blocks other than the adjacent ones may also be fault data blocks. It is therefore necessary to obtain all the data blocks of the target object and check them one by one, thereby determining the fault data blocks and the normal data blocks among all the data blocks of the target object.
S204, judging whether the path of each fault data block can be locked, and if the path of each fault data block can be locked, sequentially locking the paths of each fault data block according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
It should be noted that, once the path where a fault data block is located is locked by a server, only that server can read the content of the fault data block; no other server can access it. If a server cannot read the content of a fault data block because its path has been locked by another server, that server cannot repair the fault data block. Locking therefore prevents different servers from repairing the fault data blocks of the same object at the same time. Furthermore, locking the paths where the fault data blocks are located sequentially, in a preset order, prevents different servers from simultaneously locking the paths of different fault data blocks of the same object, so that the paths where all the fault data blocks of the target object are located end up locked by a single server.
For example, suppose the data blocks of the target object numbered 0, 3, 6, and 8 have failed. Without an agreed order, server A might lock the paths of data blocks 0 and 6 while server B locks the paths of data blocks 3 and 8, and neither server could then lock the paths of all the fault data blocks of the target object. If the paths are locked according to the preset sequence, this situation does not occur: once server A locks the path of data block 0, server B cannot lock that path, so server B gives up locking the paths of all the fault data blocks of the object, and server A locks the paths of all the fault data blocks of the target object.
After the server locks the path where each fault data block of the target object is located, the server can restore each fault data block to the normal data block corresponding to the fault data block by using the preset number of normal data blocks of the target object, and then send the normal data block corresponding to the fault data block to the server corresponding to the path where each fault data block is located, thereby completing the repair process of the fault data block.
It should be noted that, if the original data of the target object is divided into k data blocks and r check blocks are obtained by operating on the k data blocks with the auxiliary matrix, the target object has k + r data blocks in total. With the embodiment of the present invention, every fault data block can be restored to its corresponding normal data block in a single pass, directly using k normal data blocks. Compared with the prior art, in which k normal data blocks are read each time a single fault data block is restored, this saves the network overhead and time overhead of reading k normal data blocks repeatedly, improves the performance of data recovery, saves the resources of the distributed storage system, and allows the distributed storage system to better serve client accesses.
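The per-server flow of S201 to S204 can be summarized, for illustration only, by the non-authoritative sketch below; the `server` object, its helper methods, and `erasure_decode` are hypothetical placeholders for the operations described above, not actual OpenStack Swift interfaces.

    def recover_object(server, target_object, k):
        """Sketch of steps S201-S204 for one object on one server."""
        target_block = server.local_block(target_object)              # S201
        neighbours = server.adjacent_blocks_on_ring(target_block)     # S202
        if not any(server.is_fault(b) for b in neighbours):
            return                                  # no fault found: next object

        blocks = server.all_blocks(target_object)                     # S203
        fault_blocks = sorted((b for b in blocks if server.is_fault(b)),
                              key=lambda b: b.number)
        normal_blocks = [b for b in blocks if not server.is_fault(b)]

        if not server.can_lock(fault_blocks):                         # S204
            return                      # another server already owns the recovery

        for b in fault_blocks:          # lock paths in the preset (ascending) order
            server.lock_path(b.path)
        try:
            # A single read of k normal blocks rebuilds every fault block at once.
            rebuilt = erasure_decode(normal_blocks[:k],
                                     [b.number for b in fault_blocks])
            for b, content in zip(fault_blocks, rebuilt):
                server.send_to_owner(b.path, content)  # to the path's server
        finally:
            for b in fault_blocks:
                server.unlock_path(b.path)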
Therefore, by applying the embodiment of the invention, one server locks the paths of all the fault data blocks of one object, and other servers cannot access the fault data blocks because the paths of the fault data blocks are locked, so that the data recovery can be performed on all the fault data blocks of the object only by the server locking the paths of the fault data blocks, thereby avoiding that a plurality of servers simultaneously perform data recovery on the same fault data block, and saving network resources and computing resources.
In order to reduce the number of times that the server detects a fault in a data block of a target object, so as to reduce waste of network resources and computing resources of the cloud storage system, in a specific embodiment, if it is detected that no fault data block exists in adjacent data blocks, or it is determined that a path in which each fault data block is located cannot be locked, the method further includes:
any object other than the target object stored on the server is acquired as a target object, and the process returns to S201.
It can be understood that, in the distributed storage system, a plurality of objects are stored on each server, and for each server, when the server detects that no faulty data block exists in the adjacent data blocks, the server considers that no faulty data block exists in all the data blocks of the target object, and thus, it is not necessary to perform fault detection on other data blocks of the target object, and any object stored on the server except for the target object can be directly obtained as the target object. For example, the server stores data blocks of object a and object B, and when the server detects that there is no faulty data block in the data blocks adjacent to the target data block of target object a, the server may return to execution of S201 with object B as the target object, which is any object other than target object a.
Or, when the server determines that the path where each failed data block is located cannot be locked, it indicates that the server cannot perform data recovery on each failed data block of the target object, so the server does not need to perform failure detection on other data blocks of the target object, and can directly obtain any object stored on the server except the target object as the target object.
It can be seen that if no fault data block exists in the adjacent data blocks, the server considers that no fault data block exists in all the data blocks of the target object, and no fault detection needs to be performed on other data blocks of the target object, so that the frequency of performing fault detection on the data blocks of the target object by the server is reduced, and the waste of network resources and computing resources of the cloud storage system is reduced;
or if the path where each fault data block is located cannot be locked, it indicates that the server cannot perform data recovery on each fault data block of the target object, so the server does not need to perform fault detection on other data blocks of the target object, and the frequency of performing fault detection on the data block of the target object by the server is reduced, thereby reducing the waste of network resources and computing resources of the cloud storage system.
In order to avoid unnecessary network resource occupation and improve the utilization rate of the network resources, the method further comprises the following steps:
and unlocking the path of each fault data block.
After the server sends the normal data block corresponding to the fault data block to the server corresponding to the path where the fault data block is located, the server unlocks the path where the fault data block is located, so that the servers except the server can access the recovered content of the fault data block, thereby avoiding the occupation of the server on network resources and improving the utilization rate of the network resources.
In order to increase the speed of obtaining the adjacent data blocks, the step of obtaining two adjacent data blocks of the target data block may include the following two steps:
the first step, utilizing the hash ring stored on the server to obtain the path of each adjacent data block;
and secondly, determining each server for storing each adjacent data block by using the path of each adjacent data block, and acquiring each adjacent data block based on each server.
In practical application, based on each server, obtaining each adjacent data block may be: and sending a request for obtaining each adjacent data block to each server, thereby obtaining each adjacent data block sent by each server for the request.
The server can rapidly acquire the path of each adjacent data block by using the hash ring, so that the speed of acquiring the adjacent data blocks is increased.
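A minimal sketch of this two-step lookup, assuming an MD5-based ring position and a plain sorted list standing in for the hash ring (the real Swift ring maps partitions to devices, so the details differ), might look as follows.

    import hashlib

    def ring_position(block_name):
        """Illustrative placement: hash the block name to a position on the ring."""
        return int(hashlib.md5(block_name.encode()).hexdigest(), 16)

    def adjacent_blocks(all_block_names, target_name):
        """First step: find the two neighbours of the target block on the ring.
        The second step (not shown) maps each neighbour's ring entry to the path,
        and hence the server, that stores it, and fetches the block from there."""
        ring = sorted(all_block_names, key=ring_position)
        i = ring.index(target_name)
        return ring[i - 1], ring[(i + 1) % len(ring)]

    # e.g. the neighbours of fragment 2 among one object's fragments
    fragments = [f"object/fragment-{n}" for n in range(6)]
    left, right = adjacent_blocks(fragments, "object/fragment-2")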
Specifically, in order to prevent different servers from simultaneously locking the paths where different fault data blocks of the same object are located, which would waste computing resources, each fault data block has a unique number, and the step of judging whether the path where each fault data block is located can be locked may include the following steps:
Step A, checking whether the path where the fault data block with the smallest number or the largest number is located is locked;
Step B, if the path is locked, obtaining the difference value between the time at which the path was locked and the current time, and judging whether the difference value is greater than a preset threshold value; if the difference value is greater than the preset threshold value, judging that the path where each fault data block is located can be locked; if the difference value is less than or equal to the preset threshold value, judging that the path where each fault data block is located cannot be locked;
It can be understood that, for each server, if the difference value is greater than the preset threshold value, the length of time for which the path has been locked by another server exceeds the effective duration, so the path may be locked again by this server. Because the paths where the fault data blocks are located are locked sequentially in the preset order, when the lock on the path of the fault data block with the smallest (or largest) number has exceeded its effective duration, the locks on the other locked fault data blocks can also be considered expired, and it can be judged that the path where each fault data block is located can be locked. Conversely, if the difference value is not greater than the preset threshold value, the lock held by the other server is still effective; that path cannot be locked again by this server, and the paths of the other fault data blocks may only be locked by the server holding the lock, so it can be judged that the paths where the fault data blocks are located cannot be locked.
The preset threshold value can be set according to user requirements, and its specific value is not limited by the embodiment of the present invention; for example, the preset threshold value may be 1, 2, 3, 4, 5, or the like.
Step C, if the path is not locked, judging that the path where each fault data block is located can be locked.
It can be seen that, once the path where the fault data block with the smallest (or largest) number of the target object is located has been locked by a server, and the locking duration has not exceeded the preset threshold value, no server other than that one can lock the paths where the fault data blocks are located. Different servers are thus prevented from simultaneously locking the paths of different fault data blocks of the same object, avoiding a waste of computing resources.
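For illustration only, steps A to C can be sketched as follows; `lock_table`, a shared mapping from a path to the time at which it was locked, is a hypothetical stand-in for whatever lock mechanism the storage system actually provides, and the threshold value is likewise an assumption.

    import time

    PRESET_THRESHOLD = 300.0   # assumed lock validity period, in seconds

    def can_lock_paths(lock_table, fault_blocks):
        """Decide whether this server may lock the paths of all fault data blocks
        by inspecting only the lowest-numbered fault block's path (steps A-C)."""
        first = min(fault_blocks, key=lambda b: b["number"])       # step A
        locked_at = lock_table.get(first["path"])
        if locked_at is None:                                      # step C: unlocked
            return True
        # step B: only a lock older than the preset threshold may be taken over
        return (time.time() - locked_at) > PRESET_THRESHOLD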
Specifically, in order to ensure that the paths where all the fault data blocks of the same object are located can be locked by the same server, for the case where the fault data block with the smallest number is checked, the paths where the fault data blocks are located are locked sequentially in ascending order of the numbers;
for the case where the fault data block with the largest number is checked, the paths where the fault data blocks are located are locked sequentially in descending order of the numbers.
It can be understood that, for the case where the fault data block with the smallest number is checked, locking the paths in ascending order of the numbers means that, once a server has locked the path of the fault data block with the smallest number, only that server can go on to lock the remaining paths in ascending order. Similarly, for the case where the fault data block with the largest number is checked, locking the paths in descending order of the numbers means that, once a server has locked the path of the fault data block with the largest number, only that server can go on to lock the remaining paths in descending order. In this way, the paths where all the fault data blocks of the same object are located are guaranteed to be locked by the same server, as illustrated in the sketch below.
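The effect of the fixed locking order can be illustrated with the following sketch, where `lock_table` is again a hypothetical shared mapping used as a simple test-and-set lock service rather than a real distributed lock.

    def lock_all_in_order(server_id, lock_table, fault_blocks):
        """Lock every fault data block's path in ascending order of the numbers.
        Every server starts with the smallest-numbered path, so only the server
        that wins that first path continues; any other server fails immediately
        and abandons the object, and the object's locks are never split."""
        for block in sorted(fault_blocks, key=lambda b: b["number"]):
            owner = lock_table.setdefault(block["path"], server_id)
            if owner != server_id:
                return False     # path already locked by another server: give up
        return True              # this server now holds every fault block's path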
The following describes an embodiment of the present invention by way of a specific example.
The data recovery method provided by the embodiment of the present invention is applied to a server in a distributed storage system. As shown in fig. 3, server B in the distributed system 301 runs a recovery (Reconstructor) process 2, which executes the data recovery method according to the embodiment of the present invention; the specific data recovery flow is shown in fig. 4.
The Reconstructor process 2 obtains a target data block of the target object, namely data block b. It then obtains the two nodes adjacent to the target data block on the hash ring; since the two adjacent nodes correspond to the two data blocks adjacent to the target data block, it thereby obtains the two adjacent data blocks, namely data block a and data block c, and detects whether a fault data block exists among the two adjacent data blocks;
if a fault data block exists, the path where it is located is recorded into a set, all data blocks of the target object are obtained, and the fault data blocks and normal data blocks are determined from all the data blocks of the target object, with the path of each fault data block recorded into the set as it is determined; the Reconstructor process 2 then judges whether the paths, recorded in the set, where the fault data blocks are located can be locked;
for example, if it is determined that the data block a, the data block c, and the data block d are all fault data blocks, and the numbers of the data block a, the data block c, and the data block d are 1, 2, and 4, respectively, a process of specifically determining whether to lock a path where each fault data block is located is as follows:
checking whether the path of the fault data block with the minimum number in the set is locked;
if it is locked, obtaining the difference value between the time at which the path was locked and the current time, this difference value being the locking duration; judging whether the locking duration is greater than a preset threshold timeout; if it is greater, judging that the path where each fault data block is located can be locked, and locking the paths, recorded in the set, where the fault data blocks are located in turn in ascending order of the numbers, that is, locking the paths where data block a, data block c, and data block d are located in sequence; and if the locking duration is less than or equal to the preset threshold timeout, judging that the path where each fault data block is located cannot be locked.
After the paths are locked in sequence, the Reconstructor process 2 reads M normal data blocks and performs a matrix calculation with them, recovering data block a, data block c, and data block d into normal data blocks; the normal data blocks corresponding to data block a, data block c, and data block d are sent to servers A, C, and D, respectively; finally, the paths where data block a, data block c, and data block d are located are unlocked.
If no fault data block exists, any object except the target object stored on the server is obtained and used as the target object, and the step of obtaining the target data block of the target object is returned to be executed.
It can be seen that, by applying the embodiment of the present invention, one server locks the paths of all the fault data blocks of an object; because those paths are locked, other servers cannot access the fault data blocks, so only the server holding the locks performs data recovery on all the fault data blocks of the object. This prevents multiple servers from recovering the same fault data block at the same time and saves network resources and computing resources. Furthermore, the server needs to read the normal data blocks required for recovery only once to restore every fault data block to a normal data block. Compared with the prior art, in which the normal data blocks are read each time a single fault data block is repaired, applying the embodiment of the present invention saves the network overhead and time overhead of reading the normal data blocks repeatedly, thereby improving the performance of data recovery and saving the resources of the distributed storage system.
Corresponding to the above data recovery method, an embodiment of the present invention further provides a data recovery device, which is applied to a server in a distributed storage system.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a data recovery apparatus according to an embodiment of the present invention, where for each server in a distributed storage system, the apparatus includes:
a first obtaining module 501, configured to obtain a target data block of a target object, where the target data block is: the data block stored on the server in all the data blocks of the target object;
a second obtaining module 502, configured to obtain two adjacent data blocks of the target data block, and detect whether a faulty data block exists in the two adjacent data blocks;
a third obtaining module 503, configured to obtain all data blocks of the target object and determine a faulty data block and a normal data block from all data blocks of the target object when the determination result of the second obtaining module 502 is yes;
a determining module 504, configured to determine whether the path where each faulty data block is located can be locked, and if so, sequentially lock the paths where each faulty data block is located according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
It can be seen that, when the data recovery device provided in the embodiment of the present invention is applied to data recovery, one server locks the paths of all the failed data blocks of an object, and other servers cannot access the failed data blocks because the paths of the failed data blocks are locked, so that data recovery can be performed on all the failed data blocks of the object only by the server that locks the path of the failed data block, thereby avoiding that multiple servers perform data recovery on the same failed data block at the same time, and saving network resources and computing resources.
Optionally, the second obtaining module 502 includes:
the first obtaining submodule is used for obtaining the path of each adjacent data block by utilizing the hash ring stored on the server;
and the second obtaining submodule is used for determining each server for storing each adjacent data block by using the path of each adjacent data block, and obtaining each adjacent data block based on each server.
Optionally, each fault data block has a unique number, and the determining module 504 includes:
a detection submodule, configured to check whether the path where the fault data block with the smallest number or the largest number is located is locked;
a first judgment submodule, configured to, when the detection result of the detection submodule is yes, obtain the difference value between the time at which the path was locked and the current time and judge whether the difference value is greater than a preset threshold value; if the difference value is greater than the preset threshold value, judge that the path where each fault data block is located can be locked; if the difference value is less than or equal to the preset threshold value, judge that the path where each fault data block is located cannot be locked;
and a second judgment submodule, configured to judge that the path where each fault data block is located can be locked when the detection result of the detection submodule is negative.
For the case where the fault data block with the smallest number is checked, the determining module 504 includes:
a first locking submodule, configured to lock the paths where the fault data blocks are located in turn, in ascending order of the numbers;
for the case where the fault data block with the largest number is checked, the determining module 504 includes:
and a second locking submodule, configured to lock the paths where the fault data blocks are located in turn, in descending order of the numbers.
Referring to fig. 6, fig. 6 is a schematic structural diagram of another data recovery apparatus according to an embodiment of the present invention; the embodiment of fig. 6 of the present invention is based on the embodiment shown in fig. 5, and adds an unlocking module 505;
and an unlocking module 505, configured to unlock a path where each failed data block is located.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another data recovery apparatus according to an embodiment of the present invention; the embodiment of fig. 7 of the present invention is added with a fourth obtaining module 506 on the basis of the embodiment shown in fig. 5;
a fourth obtaining module 506, configured to, if the determination result of the second obtaining module 502 is negative or the determination result of the determining module 504 is negative, obtain any object other than the target object stored on the server, as the target object, and return to the step of obtaining the target data block of the target object stored on the server.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, which includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete mutual communication through the communication bus 804,
a memory 803 for storing a computer program;
the processor 801 is configured to implement the data recovery method provided in the embodiment of the present invention when executing the program stored in the memory 803, and specifically, the data recovery method includes the following steps:
obtaining a target data block of a target object, wherein the target data block is: the data block stored on the server in all the data blocks of the target object;
acquiring two adjacent data blocks of the target data block, and detecting whether a fault data block exists in the two adjacent data blocks;
if the fault data block exists, all data blocks of the target object are obtained, and the fault data block and the normal data block are determined from all the data blocks of the target object;
judging whether the path of each fault data block can be locked, if so, sequentially locking the path of each fault data block according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
Optionally, if there is no faulty data block, or it is determined that the path where each faulty data block is located cannot be locked, the method further includes:
and obtaining any object except the target object stored on the server as the target object, and returning to execute the step of obtaining the target data block of the target object stored on the server.
Optionally, the method further includes:
and unlocking the path of each fault data block.
Optionally, the step of obtaining two adjacent data blocks of the target data block includes:
obtaining the path of each adjacent data block by using the hash ring stored on the server;
and determining each server for storing each adjacent data block by using the path of each adjacent data block, and acquiring each adjacent data block based on each server.
Optionally, each fault data block has a unique number, and the judging whether the path where each fault data block is located can be locked includes:
checking whether the path where the fault data block with the smallest number or the largest number is located is locked;
if the path is locked, obtaining the difference value between the time at which the path was locked and the current time, and judging whether the difference value is greater than a preset threshold value; if the difference value is greater than the preset threshold value, judging that the path where each fault data block is located can be locked; if the difference value is less than or equal to the preset threshold value, judging that the path where each fault data block is located cannot be locked;
and if the path is not locked, judging that the path where each fault data block is located can be locked.
Optionally, for the case where the fault data block with the smallest number is checked, the sequentially locking the paths where the fault data blocks are located according to a preset sequence includes:
locking the path where each fault data block is located in turn, in ascending order of the numbers;
for the case where the fault data block with the largest number is checked, the sequentially locking the paths where the fault data blocks are located according to a preset sequence includes:
locking the path where each fault data block is located in turn, in descending order of the numbers.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above-mentioned data recovery method.
In yet another aspect of the present invention, the present invention also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the above data recovery method.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A data recovery method applied to servers in a distributed storage system, the method comprising, for each of the servers:
obtaining a target data block of a target object, wherein the target data block is: the data blocks stored on the server in all the data blocks of the target object;
acquiring two adjacent data blocks of the target data block, and detecting whether a fault data block exists in the two adjacent data blocks;
if the fault data block exists, all data blocks of the target object are obtained, and the fault data block and the normal data block are determined from all the data blocks of the target object;
judging whether the path of each fault data block can be locked, if so, sequentially locking the path of each fault data block according to a preset sequence; and restoring each fault data block into a normal data block corresponding to the fault data block by using a preset number of normal data blocks, and sending the normal data block corresponding to the fault data block to a server corresponding to a path where each fault data block is located.
2. The method of claim 1, wherein, if no fault data block exists or it is determined that the path where each fault data block is located cannot be locked, the method further comprises:
obtaining any object stored on the server other than the target object as a new target object, and returning to the step of obtaining the target data block of the target object stored on the server.
3. The method of claim 1, further comprising:
unlocking the path where each fault data block is located.
4. The method of claim 1, wherein the step of acquiring the two data blocks adjacent to the target data block comprises:
obtaining the path of each adjacent data block by using a hash ring stored on the server;
determining, from the path of each adjacent data block, the server that stores that adjacent data block, and acquiring each adjacent data block from that server (a hash-ring lookup sketch follows the claims).
5. The method according to claim 1, wherein each fault data block carries a unique number, and judging whether the path where each fault data block is located can be locked comprises:
checking whether the path where the fault data block with the smallest number, or the fault data block with the largest number, is located is locked;
if that path is locked, obtaining the difference between the time at which the path was locked and the current time, and judging whether the difference is greater than a preset threshold; if the difference is greater than the preset threshold, judging that the path where each fault data block is located can be locked; if the difference is less than or equal to the preset threshold, judging that the path where each fault data block is located cannot be locked;
if that path is not locked, judging that the path where each fault data block is located can be locked (a lock-check sketch follows the claims).
6. The method according to claim 5, wherein, for the case where the path checked is that of the fault data block with the smallest number, sequentially locking the path where each fault data block is located according to the preset order comprises:
locking the path where each fault data block is located in ascending order of the numbers;
and, for the case where the path checked is that of the fault data block with the largest number, sequentially locking the path where each fault data block is located according to the preset order comprises:
locking the path where each fault data block is located in descending order of the numbers.
7. A data recovery apparatus, applied to each server in a distributed storage system, the apparatus comprising:
a first obtaining module, configured to obtain a target data block of a target object, wherein the target data block is the data block, among all data blocks of the target object, that is stored on the server;
a second obtaining module, configured to acquire the two data blocks adjacent to the target data block and detect whether a fault data block exists among the two adjacent data blocks;
a third obtaining module, configured to, when the second obtaining module detects that a fault data block exists, obtain all data blocks of the target object and determine the fault data blocks and the normal data blocks from among all data blocks of the target object;
a judging module, configured to judge whether the path where each fault data block is located can be locked, and, if so, to sequentially lock the path where each fault data block is located according to a preset order, restore each fault data block into the normal data block corresponding to that fault data block by using a preset number of normal data blocks, and send the normal data block corresponding to each fault data block to the server corresponding to the path where that fault data block is located.
8. The apparatus of claim 7, further comprising:
a fourth obtaining module, configured to, when the second obtaining module detects that no fault data block exists, or the judging module judges that the path where each fault data block is located cannot be locked, obtain any object stored on the server other than the target object as a new target object and return to the step of obtaining the target data block of the target object stored on the server.
9. The apparatus of claim 7, further comprising:
an unlocking module, configured to unlock the path where each fault data block is located.
10. The apparatus of claim 7, wherein the second obtaining module comprises:
a first obtaining submodule, configured to obtain the path of each adjacent data block by using a hash ring stored on the server;
a second obtaining submodule, configured to determine, from the path of each adjacent data block, the server that stores that adjacent data block, and to acquire each adjacent data block from that server.
11. The apparatus of claim 7, wherein each fault data block carries a unique number, and the judging module comprises:
a checking submodule, configured to check whether the path where the fault data block with the smallest number, or the fault data block with the largest number, is located is locked;
a first judging submodule, configured to, when the checking submodule finds that the path is locked, obtain the difference between the time at which the path was locked and the current time and judge whether the difference is greater than a preset threshold; if the difference is greater than the preset threshold, judge that the path where each fault data block is located can be locked; if the difference is less than or equal to the preset threshold, judge that the path where each fault data block is located cannot be locked;
a second judging submodule, configured to, when the checking submodule finds that the path is not locked, judge that the path where each fault data block is located can be locked.
12. The apparatus of claim 11, wherein, for the case where the path checked is that of the fault data block with the smallest number, the judging module comprises:
a first locking submodule, configured to sequentially lock the path where each fault data block is located in ascending order of the numbers;
and, for the case where the path checked is that of the fault data block with the largest number, the judging module comprises:
a second locking submodule, configured to sequentially lock the path where each fault data block is located in descending order of the numbers.
13. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
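
The machine-translated claims above are dense, so the following minimal Python sketch illustrates the per-server recovery flow of claim 1 as a reading aid only. It is a sketch under assumptions, not the patent's implementation: the Block type, the XOR stand-in for reconstruction, the preset_count value, and the lock_path/unlock_path/send_block callbacks are hypothetical names introduced here; the patent does not specify a particular redundancy scheme, and the lockability check of claim 5 is sketched separately below.

# Hypothetical sketch of the per-server recovery flow in claim 1.
# Every name here is illustrative; the patent does not prescribe this code.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Block:
    number: int            # unique number of the data block (cf. claim 5)
    path: str              # path identifying where the block is stored
    data: Optional[bytes]  # None models a fault (lost or corrupted) block

def rebuild(normal: List[Block], preset_count: int) -> bytes:
    """Stand-in reconstruction: XOR of a preset number of normal, equal-length blocks.
    A real system would use its own redundancy scheme (e.g. erasure coding)."""
    chosen = normal[:preset_count]
    out = bytearray(len(chosen[0].data))
    for block in chosen:
        for i, byte in enumerate(block.data):
            out[i] ^= byte
    return bytes(out)

def recover_object(all_blocks: List[Block],
                   preset_count: int,
                   lock_path: Callable[[str], None],
                   unlock_path: Callable[[str], None],
                   send_block: Callable[[str, bytes], None]) -> bool:
    """One recovery pass over one object's blocks, following the order of claim 1."""
    fault = [b for b in all_blocks if b.data is None]
    normal = [b for b in all_blocks if b.data is not None]
    if not fault or len(normal) < preset_count:
        return False                                   # nothing to repair, or too few healthy blocks
    ordered = sorted(fault, key=lambda b: b.number)    # preset order (ascending here; cf. claim 6)
    for block in ordered:
        lock_path(block.path)                          # lock the path of every fault block first
    for block in ordered:
        block.data = rebuild(normal, preset_count)     # restore the fault block from normal blocks
        send_block(block.path, block.data)             # send it to the server owning that path
    for block in ordered:
        unlock_path(block.path)                        # cf. claim 3: release the locks afterwards
    return True

A caller would wire lock_path, unlock_path, and send_block to the system's own lock service and transport; those interfaces are not defined in the patent text.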
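
Claims 4 and 10 locate a target block's two neighbours through a hash ring stored on the server. The short sketch below shows one plausible way such a lookup could work; the MD5-based ring position, the path format, and the example paths are assumptions for illustration and are not taken from the specification.

# Hypothetical hash-ring neighbour lookup, in the spirit of claims 4 and 10.
import hashlib
from typing import List, Tuple

def ring_position(path: str) -> int:
    """Map a block path to a position on the hash ring."""
    return int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16)

def adjacent_paths(block_paths: List[str], target_path: str) -> Tuple[str, str]:
    """Return the paths immediately before and after target_path on the ring."""
    ordered = sorted(block_paths, key=ring_position)
    i = ordered.index(target_path)
    return ordered[i - 1], ordered[(i + 1) % len(ordered)]   # negative index wraps around

# The server would then resolve each returned path to the server that stores it
# and fetch the two adjacent blocks from those servers.
if __name__ == "__main__":
    paths = [f"/objects/demo/block-{n}" for n in range(6)]
    print(adjacent_paths(paths, "/objects/demo/block-3"))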
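
Claims 5 and 6 (mirrored by apparatus claims 11 and 12) decide lockability by probing the lock on the path of the smallest- or largest-numbered fault block, treating a lock older than a preset threshold as stale, and then locking in ascending or descending number order. The sketch below illustrates that logic; the lock-table layout, the field names, and the 300-second threshold are assumptions rather than values from the patent.

# Hypothetical lock-check sketch for claims 5/6 (and apparatus claims 11/12).
import time
from typing import Dict, List, Optional

PRESET_THRESHOLD_SECONDS = 300.0   # hypothetical value; a lock older than this is treated as stale

def can_lock(lock_times: Dict[str, float],
             path_by_number: Dict[int, str],
             use_smallest: bool = True) -> bool:
    """Probe only the path of the smallest- (or largest-) numbered fault block.

    Not locked -> lockable; locked but stale -> lockable; locked and fresh -> not lockable
    (another server is presumably still repairing these blocks)."""
    probe_number = min(path_by_number) if use_smallest else max(path_by_number)
    locked_at: Optional[float] = lock_times.get(path_by_number[probe_number])
    if locked_at is None:
        return True
    return (time.time() - locked_at) > PRESET_THRESHOLD_SECONDS

def lock_order(fault_numbers: List[int], use_smallest: bool = True) -> List[int]:
    """Ascending order when the smallest number was probed, descending otherwise (claim 6)."""
    return sorted(fault_numbers, reverse=not use_smallest)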
CN201710446718.7A 2017-06-14 2017-06-14 Data recovery method and device and electronic equipment Active CN107346273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710446718.7A CN107346273B (en) 2017-06-14 2017-06-14 Data recovery method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710446718.7A CN107346273B (en) 2017-06-14 2017-06-14 Data recovery method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN107346273A CN107346273A (en) 2017-11-14
CN107346273B true CN107346273B (en) 2020-09-04

Family

ID=60254494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710446718.7A Active CN107346273B (en) 2017-06-14 2017-06-14 Data recovery method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107346273B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104283B (en) * 2019-11-29 2022-04-22 浪潮电子信息产业股份有限公司 Fault detection method, device, equipment and medium of distributed storage system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276302A (en) * 2007-03-29 2008-10-01 中国科学院计算技术研究所 Magnetic disc fault processing and data restructuring method in magnetic disc array system
CN101561773A (en) * 2009-06-03 2009-10-21 成都市华为赛门铁克科技有限公司 Method for recovering disk data and device thereof
CN102156928A (en) * 2011-04-29 2011-08-17 浪潮通信信息系统有限公司 Method for system concurrency control through business logic lock

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on failure data recovery in the volume management module of NAS devices; Kong Huafeng; Journal of Chinese Computer Systems (小型微型计算机系统); 2004-01-31; Section 3 *

Also Published As

Publication number Publication date
CN107346273A (en) 2017-11-14

Similar Documents

Publication Publication Date Title
EP3202123B1 (en) Semi-automatic failover
US10031816B2 (en) Systems and methods for healing images in deduplication storage
CN106776130B (en) Log recovery method, storage device and storage node
US9778998B2 (en) Data restoration method and system
US20160006461A1 (en) Method and device for implementation data redundancy
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
CN108540315B (en) Distributed storage system, method and device
CN111818124B (en) Data storage method, data storage device, electronic equipment and medium
EP3809708B1 (en) Video data storage method and device in cloud storage system
WO2020048488A1 (en) Data storage method and storage device
CN109726036B (en) Data reconstruction method and device in storage system
CN109254956B (en) Data downloading method and device and electronic equipment
CN113377569A (en) Method, apparatus and computer program product for recovering data
CN107346273B (en) Data recovery method and device and electronic equipment
CN112436962B (en) Block chain consensus network dynamic expansion method, electronic device, system and medium
CN110837521A (en) Data query method and device and server
US11210003B2 (en) Method, device and computer program product for restoring data based on replacing child node identifiers with parent node identifier
CN111857603B (en) Data processing method and related device
US9098446B1 (en) Recovery of corrupted erasure-coded data files
CN111506450B (en) Method, apparatus and computer program product for data processing
CN108133034B (en) Shared storage access method and related device
CN106020975B (en) Data operation method, device and system
CN113485872A (en) Fault processing method and device and distributed storage system
US9471409B2 (en) Processing of PDSE extended sharing violations among sysplexes with a shared DASD
US10489239B2 (en) Multiplexing system, multiplexing method, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant