CN106919470A - Data recovery method and device - Google Patents
Data recovery method and device
- Publication number
- CN106919470A CN201510991047.3A
- Authority
- CN
- China
- Prior art keywords
- data block
- target data
- message
- request
- access time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1461—Backup scheduling policy
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention relates to the field of data processing and discloses a data recovery method and device. When the distributed file system HDFS performs data recovery, the method includes: downloading a metadata list and a data block list; receiving a request to access any target file, and looking up in the metadata, according to the request, the target data block information contained in the target file; if the target data block information is found in the data block list, obtaining the download-complete flag corresponding to the target data block information; if the download-complete flag indicates that the target data block has not finished downloading, looking up the attribute information of the target data block in the data block list; and obtaining the expected access time of the target data block from the attribute information and adjusting the expected access time according to the attribute of the request.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data recovery method and device.
Background
With the arrival of the big data era, data is becoming more and more important, and the problem of data protection is increasingly prominent. Data protection and data disaster recovery are extremely important topics.
According to statistics, the causes of system failures are typically distributed as follows: hardware faults account for 44%, human error for 32%, software faults for 14%, viruses for 7%, and natural disasters for 3%. A data disaster recovery system provides a computer information system with an environment that can cope with various disasters. When a computer system suffers an irresistible natural disaster such as fire, flood, earthquake, or war, or a man-made disaster such as computer crime, computer viruses, power failure, network or communication failure, hardware or software errors, or operator error, the disaster recovery system ensures the security of user data. A more complete disaster recovery system can even provide uninterrupted application service.
The Hadoop Distributed File System (HDFS) is designed as a distributed file system suitable for running on commodity hardware. It has much in common with existing distributed file systems, but it also differs from them significantly. HDFS is highly fault-tolerant and is suitable for deployment on inexpensive machines. HDFS provides high-throughput data access and is well suited to applications with large data sets.
As HDFS becomes more widely used, how to back up the massive data in HDFS to a remote server quickly and accurately, and how to restore it quickly to a specified cluster, has become a prominent problem.
During recovery from a system failure, it is desirable for HDFS to recover data from the backup server as soon as possible and to resume providing service, so as to reduce the service interruption time.
However, an HDFS system often stores a large amount of data. Even when multiple DataNodes transfer data in parallel, recovering from the backup server still takes an undesirably long time, which makes the service interruption too long.
Summary of the invention
The present invention provides a data recovery method and device to solve the prior-art problem that an HDFS system is out of service for too long when recovering data from a backup server.
The present invention discloses a data recovery method which, when the distributed file system HDFS performs data recovery, includes:
downloading a metadata list and a data block list;
receiving a request to access any target file, and looking up in the metadata, according to the request, the target data block information contained in the target file;
if the target data block information is found in the data block list, obtaining the download-complete flag corresponding to the target data block information;
if the download-complete flag indicates that the target data block has not finished downloading, looking up the attribute information of the target data block in the data block list;
obtaining the expected access time of the target data block from the attribute information, and adjusting the expected access time according to the attribute of the request.
Optionally, adjusting the expected access time according to the attribute of the request includes:
when it is determined from the attribute information of the request that the request is to read the file, determining the minimum of the expected access time and the current system time, and adjusting the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
Optionally, after the expected access time is adjusted to the minimum, the method further includes:
finding the target DataNode corresponding to the target data block information in the data block list, and sending adjustment information to the target DataNode by remote procedure call, so that the target DataNode updates its stored expected access time of the target data block according to the adjustment information.
Optionally, before the metadata list and the data block list are downloaded, the method further includes:
setting an operation threshold so that the service of operating on data can be provided externally before the data recovery operation is complete, where the operation threshold is the ratio of the number of recovered data blocks to the number of all data blocks to be recovered.
Optionally, the method further includes:
when it is determined from the attribute information of the request that the request is to delete the file, looking up the target DataNode corresponding to the target data block information in the data block list, and sending a delete message to the target DataNode by remote procedure call, so that after receiving the delete message the target DataNode deletes the information corresponding to the target data block from its data block download queue.
Optionally, after the expected access time is adjusted according to the attribute of the request, the method further includes:
maintaining a priority queue of data blocks to download in each DataNode, from which the DataNode each time takes the block information with the smallest expected access time;
downloading the data block corresponding to that block information from the backup server.
The present invention also provides a data recovery device, including:
a list module, configured to download a metadata list and a data block list when the distributed file system HDFS performs data recovery;
a first lookup module, configured to receive a request to access any target file and to look up in the metadata, according to the request, the target data block information contained in the target file;
an acquisition module, configured to obtain the download-complete flag corresponding to the target data block information if the target data block information is found in the data block list;
a second lookup module, configured to look up the attribute information of the target data block in the data block list if the download-complete flag indicates that the target data block has not finished downloading;
an adjustment module, configured to obtain the expected access time of the target data block from the attribute information and to adjust the expected access time according to the attribute of the request.
Optionally, the adjustment module is specifically configured to: when it is determined from the attribute information of the request that the request is to read the file, determine the minimum of the expected access time and the current system time, and adjust the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
Optionally, the adjustment module is further configured to: after the expected access time is adjusted to the minimum, find the target DataNode corresponding to the target data block information in the data block list, and send adjustment information to the target DataNode by remote procedure call, so that the target DataNode updates its stored expected access time of the target data block according to the adjustment information.
Optionally, the device further includes:
a recovery module, configured to set an operation threshold before the metadata list and the data block list are downloaded, so that the service of operating on data can be provided externally before the data recovery operation is complete, where the operation threshold is the ratio of the number of recovered data blocks to the number of all data blocks to be recovered.
Optionally, the device further includes:
a deletion module, configured to: when it is determined from the attribute information of the request that the request is to delete the file, look up the target DataNode corresponding to the target data block information in the data block list, and send a delete message to the target DataNode by remote procedure call, so that after receiving the delete message the target DataNode deletes the information corresponding to the target data block from its data block download queue.
Optionally, the device further includes:
a download module, configured to maintain a priority queue of data blocks to download in each DataNode, from which the DataNode each time takes the block information with the smallest expected access time, and to download the data block corresponding to that block information from the backup server.
One or two of the above technical solutions have at least the following technical effects:
With the method and device disclosed by the invention, during data block download the metadata is downloaded and loaded first, so that file system service can begin; the list of data blocks to be downloaded is kept in the NameNode and the DataNodes; during recovery, the recovery priority is adjusted according to the client's access demands on data blocks, so that files the client accesses are recovered first; recovery is abandoned for files the client deletes, which saves time. Efficient on-demand recovery of the distributed file system is thereby achieved.
Brief description of the drawings
Fig. 1 is a flow chart of a data recovery method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the system to which the method provided by an embodiment of the present invention is applied;
Fig. 3 is a schematic flow chart of the method provided by an embodiment of the present invention when used in a specific environment;
Fig. 4 is a schematic structural diagram of a data recovery device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.
In HDFS, not all data is accessed at the same moment; the data that must be accessed immediately is usually only a small fraction. To reduce the out-of-service time during data recovery and let HDFS begin serving as early as possible, the method provided by the embodiments of the present invention recovers data on demand: during data recovery, the recovery priority is adjusted in real time according to access patterns, so that data that needs to be accessed soon is recovered first and data that is not needed now is recovered later. This greatly reduces the service interruption time and provides a more flexible way to control the recovery process. The method provided by the embodiments of the present invention is described in detail below with reference to the accompanying drawings:
As shown in Fig. 1, an embodiment of the present invention provides a data recovery method which, when the Hadoop distributed file system performs data recovery, specifically includes:
Step 101: download the metadata list and the data block list.
In the prior art, if data has not been recovered to a certain extent, the HDFS file system is in safe mode, in which only viewing the file system directory tree is supported; modifying the file system and reading files are not. To ensure that the file system can still provide complete service externally even when its data blocks have not all been downloaded, in this embodiment the following may further be included before the metadata list and the data block list are downloaded in step 101:
setting an operation threshold so that the service of operating on data can be provided externally before the data recovery operation is complete, where the operation threshold is the ratio of the number of recovered data blocks to the number of all data blocks to be recovered. In practice, the operation threshold may be set to the minimum value the system accepts, for example 0.
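The threshold rule above can be sketched as a single comparison. This is a minimal illustration, not Hadoop code: the function name `leaves_safe_mode` and its parameters are hypothetical, and the comparison direction (service is allowed once the restored ratio reaches the threshold) is inferred from the description of safe mode in the text.

```python
def leaves_safe_mode(restored_blocks: int, total_blocks: int, threshold: float) -> bool:
    """Return True when enough blocks are restored for the file system to serve data operations."""
    if total_blocks == 0:
        return True  # nothing to restore, so nothing blocks service
    return restored_blocks / total_blocks >= threshold
```

With the threshold set to 0, as the embodiment suggests, the check passes even when no block has been restored, which is what lets the file system serve requests as soon as the metadata is loaded.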
Step 102: receive a request to access any target file, and look up in the metadata, according to the request, the target data block information contained in the target file.
Step 103: if the target data block information is found in the data block list, obtain the download-complete flag corresponding to the target data block information.
Step 104: if the download-complete flag indicates that the target data block has not finished downloading, look up the attribute information of the target data block in the data block list.
Step 105: obtain the expected access time of the target data block from the attribute information, and adjust the expected access time according to the attribute of the request.
In the embodiments of the present invention, adjusting the expected access time according to the attribute of the request can be implemented in several ways; one optimized way is given below:
when it is determined from the attribute information of the request that the request is to read the file, determine the minimum of the expected access time and the current system time, and adjust the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
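The read rule above reduces to taking a minimum. The sketch below is illustrative only: `MAX_LONG`, `adjust_expected_access_time`, and the millisecond timestamps are assumed names and units, not taken from the patent.

```python
MAX_LONG = 2**63 - 1  # default expected access time: the block has never been requested

def adjust_expected_access_time(expected: int, now_ms: int) -> int:
    """Apply the read-request rule: keep the smaller of the stored time and the current time."""
    return min(expected, now_ms)

first = adjust_expected_access_time(MAX_LONG, 1_000)  # first read arrives at t=1000
second = adjust_expected_access_time(first, 2_000)    # a later read of the same block
```

Here `first` is 1000 and `second` stays 1000: a block keeps the timestamp of its earliest request, so blocks end up ordered by when they were first asked for, and unrequested blocks (still at the maximum value) come last.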
Further, after the expected access time is adjusted to the minimum of the system time and the expected access time set before download, the method further includes:
finding the target DataNode corresponding to the target data block information in the data block list, and sending adjustment information to the target DataNode by remote procedure call, so that the target DataNode updates its stored expected access time of the target data block according to the adjustment information.
In addition, during data recovery, recovery may be abandoned for files that the client deletes, which saves recovery time and achieves efficient on-demand recovery of the distributed file system. A specific implementation may be:
when it is determined from the attribute information of the request that the request is to delete the file, look up the target DataNode corresponding to the target data block information in the data block list, and send a delete message to the target DataNode by remote procedure call, so that after receiving the delete message the target DataNode deletes the information corresponding to the target data block from its data block download queue.
Based on the above scheme, after the expected access time of data blocks has been adjusted according to access demand, the data blocks are downloaded as follows:
a priority queue of data blocks to download is maintained in each DataNode, from which the DataNode each time takes the block information with the smallest expected access time;
the data block corresponding to that block information is downloaded from the backup server.
In addition, after a data block has been downloaded, the DataNode calls a function on the NameNode by remote procedure call to inform the NameNode that the download is complete and to pass the data block's information, which may include the block's id, generation stamp, and length; the NameNode sets the download-complete flag of the corresponding data block to true.
When the method provided by the above embodiment is applied in a specific environment, the actual environment is as shown in Fig. 2, and the implementation flow may be as follows:
In this embodiment, HDFS downloads data from the backup server, and the NameNode and DataNodes in HDFS process the downloaded data accordingly. The specific implementation includes:
The NameNode's memory holds the list of all data blocks to be downloaded from the backup server. The number of DataNodes is specified when HDFS starts, and the NameNode distributes all data blocks evenly among all DataNodes; that is, when a DataNode registers with the NameNode, the NameNode sends it the list of data blocks assigned to that DataNode. The NameNode also keeps in memory a mapping from data block information to DataNodes (the data block list: Block-DataNode Map). The information for a data block includes its id, generation stamp, length, download-complete flag (default false), and expected access time (a long integer, defaulting to the maximum long value).
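The per-block record and the even distribution described above might look as follows. This is a sketch under stated assumptions: the class `BlockInfo`, the function `assign_blocks`, and the DataNode names are hypothetical, and round-robin is just one way to realize "evenly distributed"; the patent does not say which scheme is used.

```python
from collections import Counter
from dataclasses import dataclass

MAX_LONG = 2**63 - 1  # maximum value of a signed 64-bit long

@dataclass
class BlockInfo:
    block_id: int
    generation_stamp: int
    length: int
    download_complete: bool = False       # default false, as in the text
    expected_access_time: int = MAX_LONG  # default: maximum long value

def assign_blocks(blocks, datanodes):
    """Build a block-id -> DataNode map by round-robin (an assumed even-distribution scheme)."""
    if not datanodes:
        raise ValueError("at least one DataNode is required")
    return {block.block_id: datanodes[i % len(datanodes)] for i, block in enumerate(blocks)}

blocks = [BlockInfo(block_id=i, generation_stamp=1, length=128) for i in range(5)]
block_to_datanode = assign_blocks(blocks, ["dn1", "dn2"])
per_node = Counter(block_to_datanode.values())
```

With five blocks and two DataNodes, `per_node` shows three blocks on `dn1` and two on `dn2`, so no node receives more than one block above its share.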
In this embodiment, downloading data from the backup server may be implemented as follows:
Each DataNode keeps a priority queue of data blocks to download. Each time, the DataNode takes from the queue the data block with the smallest expected access time (the expected access time is a long integer, and the expected access time of every data block is initialized to the maximum long value) and downloads that data block from the backup server. When the download completes, it calls a function on the NameNode by remote procedure call to inform the NameNode that the download is complete and to pass the data block's information, including its id, generation stamp, and length; the NameNode sets the download-complete flag of the corresponding data block to true.
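The DataNode download loop described above can be sketched with a binary heap. All names here (`DownloadQueue`, `restore_all`, the `download` and `report_complete` callbacks) are illustrative stand-ins; in particular, the callbacks take the place of the backup-server transfer and the remote procedure call to the NameNode, whose real interfaces the patent does not specify.

```python
import heapq

MAX_LONG = 2**63 - 1  # initial expected access time of every block

class DownloadQueue:
    """Priority queue of pending blocks, ordered by expected access time."""

    def __init__(self):
        self._heap = []

    def add(self, block_id, expected_access_time=MAX_LONG):
        heapq.heappush(self._heap, (expected_access_time, block_id))

    def pop_next(self):
        """Return the block id with the smallest expected access time, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]

def restore_all(queue, download, report_complete):
    """Download blocks in priority order and report each completion."""
    while True:
        block_id = queue.pop_next()
        if block_id is None:
            break
        download(block_id)         # stand-in for fetching from the backup server
        report_complete(block_id)  # stand-in for the RPC to the NameNode

queue = DownloadQueue()
queue.add(1)
queue.add(2, expected_access_time=100)  # block 2 was requested by a client
queue.add(3)
order = []
restore_all(queue, download=order.append, report_complete=lambda block_id: None)
```

The resulting `order` is `[2, 1, 3]`: the requested block goes first, and blocks still at the default time fall back to block-id order because the heap compares `(time, block_id)` tuples.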
Start the NameNode and download the metadata.
HDFS is configured through a configuration file, and the recovery option, the name of the file system to recover, and the number of DataNodes are specified through command-line startup options. The configuration file is preferably hdfs-backup.xml, stored at $HADOOP/etc/hadoop/hdfs-backup.xml; the address and port of the backup server must be configured in it. After configuration is complete, the HDFS recovery command is started, specifying the recovery option -recoverFromBackup, the name of the file system to recover, and the number of DataNodes. The NameNode downloads the compressed metadata file from the backup server to local storage and loads it into memory to obtain the file system's metadata, that is, the file system directory tree. At the same time, the NameNode obtains from the backup server the list of all data blocks contained in the file system and places it in memory.
Specifically, by default, when the ratio of the number of data blocks present in the file system to the number of all data blocks it should contain is below a configured threshold, the HDFS file system is in safe mode, in which only viewing the file system directory tree is supported; modifying the file system and reading files are not. So that the file system can still provide complete service externally even when not all of its data blocks have been downloaded, the threshold may be set to 0 in this embodiment.
Start the DataNodes and download the block data.
After starting, each DataNode registers with the NameNode. At registration, the NameNode assigns the DataNode the list of data blocks it needs to download. The DataNode downloads the data in the list from the backup server.
Specifically, the NameNode's memory holds the list of all data blocks to be downloaded from the backup server. The number of DataNodes is specified when HDFS starts, and the NameNode distributes all data blocks evenly among all DataNodes; that is, when a DataNode registers with the NameNode, the NameNode sends it the list of data blocks assigned to that DataNode. The NameNode also keeps in memory a mapping from data block information to DataNodes (the data block list: Block-DataNode Map). The information for a data block includes its id, generation stamp, length, download-complete flag (default false), and expected access time (a long integer, defaulting to the maximum long value).
Specifically, each DataNode keeps a priority queue of data blocks to download. Each time, the DataNode takes from the queue the block with the smallest expected access time (the expected access time is a long integer, and the expected access time of every data block is initialized to the maximum long value) and downloads that data block from the backup server. When the download completes, it calls a function on the NameNode by remote procedure call to inform the NameNode that the download is complete and to pass the data block's information, including its id, generation stamp, and length; the NameNode sets the download-complete flag of the corresponding data block to true.
As shown in Fig. 3, based on the above data download method, if a request from a client to access some data is received before data recovery is complete (the access request may be to read the data or to delete it), the method further includes:
Step 301: the client reads or deletes a file.
Step 302: the data block list corresponding to the file is looked up in the metadata.
Step 303: the data blocks are looked up in the data block list; when a data block is found, its expected access time is updated or the data block is deleted, and an instruction is sent to the corresponding DataNode.
Step 304: the DataNode updates the corresponding expected access time or deletes the corresponding data block according to the NameNode's instruction.
When downloading data, the DataNode follows a preset priority queue, which is ordered by expected access time.
Step 305: the DataNode downloads data blocks one by one in the order of the modified priority queue.
To describe data reading and deletion in more detail, the specific implementation of each is described further below:
(1) Changing priority according to file access.
The client sends an open-file command to the NameNode. The NameNode looks up, in the in-memory metadata, the data blocks to be read that the file contains, and looks these data blocks up in the Block-DataNode Map. If they are not found, the data blocks to be read belong to a file created after startup, and normal read/write operation proceeds; otherwise, the value of each data block's download-complete flag is checked. If the download-complete flags of all the data blocks are true (downloaded), normal read/write operation proceeds. If any data block's download-complete flag is false (not yet downloaded), a "data block downloading" error is returned to the client, asking the client to access the file later. Meanwhile, the data block's information is looked up in the Block-DataNode Map, and the minimum of the current system time and the data block's expected access time is set as the data block's expected access time. If the data block's expected access time changes, the corresponding DataNode is found in the Block-DataNode Map and a message is sent to that DataNode by remote procedure call to update the data block's expected access time.
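The open-file handling in (1) can be sketched as one pass over the file's blocks. This is an assumption-laden illustration: `handle_open`, the dictionary layout of the Block-DataNode Map, and the status strings are hypothetical, and the returned `updates` list stands in for the remote procedure calls the NameNode would send.

```python
MAX_LONG = 2**63 - 1  # default expected access time

def handle_open(file_block_ids, block_map, now_ms):
    """Return (status, updates) for an open-file request.

    block_map maps block id -> {"datanode": str, "done": bool, "expected": int}.
    updates lists (datanode, block_id, new_time) messages to send to DataNodes.
    """
    updates = []
    pending = False
    for block_id in file_block_ids:
        info = block_map.get(block_id)
        if info is None or info["done"]:
            continue  # block created after startup, or already restored
        pending = True
        new_time = min(info["expected"], now_ms)
        if new_time != info["expected"]:  # notify the DataNode only on a change
            info["expected"] = new_time
            updates.append((info["datanode"], block_id, new_time))
    return ("BLOCK_DOWNLOADING" if pending else "OK"), updates

block_map = {
    1: {"datanode": "dn1", "done": True, "expected": MAX_LONG},
    2: {"datanode": "dn2", "done": False, "expected": MAX_LONG},
}
status, updates = handle_open([1, 2, 99], block_map, now_ms=1_000)
retry_status, retry_updates = handle_open([1, 2, 99], block_map, now_ms=2_000)
```

The first call returns the "downloading" status with one update for block 2 on `dn2`; the retry still reports "downloading" but produces no update, because the stored time (1000) is already smaller than the new request time.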
(2) Changing priority according to file deletion.
The client sends a delete-file command to the NameNode. The NameNode looks up, in the in-memory metadata, the data blocks the file contains, and looks these data blocks up in the Block-DataNode Map. If they are not found, the file to be deleted was created after startup, and normal deletion proceeds. Otherwise, the normal metadata deletion is performed, and the value of the download-complete flag of each contained data block is checked: for a data block whose download-complete flag is true, normal deletion proceeds; for a data block whose download-complete flag is false, the corresponding DataNode is looked up in the Block-DataNode Map and a delete message is sent to it by remote procedure call. After receiving the message, the DataNode deletes the data block from its data block download queue if the block is present in it.
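On the DataNode side, removing a block from a heap-based download queue is commonly done lazily. The sketch below uses that technique as an assumption; the patent only says the block is deleted from the queue, not how. The class and method names are hypothetical.

```python
import heapq

class DownloadQueue:
    """Download queue that supports removal by marking entries stale (lazy deletion)."""

    def __init__(self):
        self._heap = []
        self._live = {}  # block id -> expected access time of the valid entry

    def add(self, block_id, expected_access_time):
        self._live[block_id] = expected_access_time
        heapq.heappush(self._heap, (expected_access_time, block_id))

    def remove(self, block_id):
        """Handle a delete message; return True if the block was still pending."""
        return self._live.pop(block_id, None) is not None

    def pop_next(self):
        """Return the pending block with the smallest expected access time, or None."""
        while self._heap:
            expected, block_id = heapq.heappop(self._heap)
            if self._live.get(block_id) == expected:  # skip entries for removed blocks
                del self._live[block_id]
                return block_id
        return None

queue = DownloadQueue()
queue.add(1, 10)
queue.add(2, 5)
was_pending = queue.remove(2)  # the client deleted the file that owns block 2
next_block = queue.pop_next()
```

After the removal, `next_block` is 1 even though block 2 had the smaller expected access time, so no bandwidth is spent restoring a block whose file no longer exists. Calling `add` again for a pending block with a smaller time would reprioritize it the same way, which covers the update message as well.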
Completing the recovery process. While the above recovery steps are performed, the Block-DataNode Map in the NameNode is monitored; when the Block-DataNode Map becomes empty, or the download-complete flags of all data blocks in it are true, the Block-DataNode Map in the NameNode's memory is deleted and the recovery process is complete.
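The completion check can be sketched as follows, assuming the Block-DataNode Map is a dictionary whose values carry a `done` flag; `mark_complete` and `recovery_finished` are hypothetical names, with `mark_complete` standing in for the NameNode function that DataNodes call after each download.

```python
def mark_complete(block_map, block_id):
    """Set the download-complete flag when a DataNode reports a finished block."""
    info = block_map.get(block_id)
    if info is not None:
        info["done"] = True

def recovery_finished(block_map):
    """True when the map is empty or every block's download-complete flag is true."""
    return all(info["done"] for info in block_map.values())

block_map = {1: {"done": False}, 2: {"done": True}}
before = recovery_finished(block_map)
mark_complete(block_map, 1)
after = recovery_finished(block_map)
```

Because `all` over an empty collection is true, the same check covers both conditions named in the text: an empty map and a map in which every flag is true.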
With the method provided by the embodiments of the present invention, during data block download the metadata is downloaded and loaded first, so that file system service can begin; the list of data blocks to be downloaded is kept in the NameNode and the DataNodes; during recovery, the recovery priority is adjusted according to the client's access demands on data blocks, so that files the client accesses are recovered first; recovery is abandoned for files the client deletes, which saves time. Efficient on-demand recovery of the distributed file system is thereby achieved.
Embodiment
As shown in Fig. 4, an embodiment of the present invention also provides a data recovery device, which includes:
a list module 401, configured to download a metadata list and a data block list when the distributed file system HDFS performs data recovery;
a first lookup module 402, configured to receive a request to access any target file and to look up in the metadata, according to the request, the target data block information contained in the target file;
an acquisition module 403, configured to obtain the download-complete flag corresponding to the target data block information if the target data block information is found in the data block list;
a second lookup module 404, configured to look up the attribute information of the target data block in the data block list if the download-complete flag indicates that the target data block has not finished downloading;
an adjustment module 405, configured to obtain the expected access time of the target data block from the attribute information and to adjust the expected access time according to the attribute of the request.
Optionally, the adjustment module 405 is specifically configured to: when it is determined from the attribute information of the request that the request is to read the file, determine the minimum of the expected access time and the current system time, and adjust the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
Optionally, the adjustment module 405 is further configured to: after the expected access time is adjusted to the minimum, find the target DataNode corresponding to the target data block information in the data block list, and send adjustment information to the target DataNode by remote procedure call, so that the target DataNode updates its stored expected access time of the target data block according to the adjustment information.
In addition, the device also includes:
Recovery module, before download metadata list and data block list, sets operational threshold so that
The service operated to data can be externally provided when data recovery operation is not completed;Wherein, the behaviour
The ratio of the quantity of all data blocks to be restored is accounted for as the quantity that threshold value is the data block recovered.
Optionally, in order to save data recovery time, recovery may be abandoned for files that the client wants to delete; in that case the device further includes:
A deletion module, configured to: when it is determined from the attribute information of the request that the request is to delete the target file, look up the target data node corresponding to the target data block information in the data block list, and send a deletion message to the target data node by remote procedure call, so that the target data node, after receiving the deletion message, deletes the information corresponding to the target data block information from its data block download queue.
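The deletion path can be sketched as follows, as an illustration only. The in-memory dictionaries stand in for the data block list and for the download queues held on the data nodes, and the direct dictionary update stands in for the deletion message sent by remote procedure call; all names are hypothetical.

```python
from typing import Dict, Iterable, List


def handle_delete(
    block_ids: Iterable[str],
    block_list: Dict[str, str],
    download_queues: Dict[str, Dict[str, float]],
) -> List[str]:
    """Drop the pending downloads of a deleted file.

    `block_list` maps a block id to the name of its data node.
    `download_queues` maps a data node name to its pending blocks
    (block id -> expected access time).
    """
    dropped = []
    for block_id in block_ids:
        node = block_list.get(block_id)
        if node is None:
            continue  # block is not tracked in the data block list
        # Stand-in for the deletion message handled on the data node.
        queue = download_queues.get(node, {})
        if queue.pop(block_id, None) is not None:
            dropped.append(block_id)
    return dropped
```

Blocks removed this way are never downloaded from the backup side, which is where the time saving comes from.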
Optionally, the device further includes:
A download module, configured to maintain a priority queue for data block downloads in the data node, wherein each time the data node takes from the queue the block information with the smallest expected access time, and downloads the data block corresponding to that block information from the backup server.
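One possible way to realize such a queue is a binary heap keyed on expected access time, sketched below. The class `DownloadQueue` and its method names are hypothetical; updates and deletions are handled by leaving stale heap entries in place and skipping them on removal, which is a design choice of this sketch rather than something the text prescribes. The actual download from the backup server is omitted.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class DownloadQueue:
    """Priority queue of pending blocks, smallest expected access time first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._times: Dict[str, float] = {}  # current time per pending block

    def put(self, block_id: str, expected_access_time: float) -> None:
        """Add a block, or update the expected access time of a pending one."""
        self._times[block_id] = expected_access_time
        heapq.heappush(self._heap, (expected_access_time, block_id))

    def remove(self, block_id: str) -> None:
        """Forget a block, for example after a deletion message."""
        self._times.pop(block_id, None)

    def pop_next(self) -> Optional[str]:
        """Return the pending block with the smallest expected access time."""
        while self._heap:
            t, block_id = heapq.heappop(self._heap)
            # Skip entries that were updated or removed after being pushed.
            if self._times.get(block_id) == t:
                del self._times[block_id]
                return block_id
        return None
```

A data node would call `pop_next` repeatedly and download each returned block, so a block whose expected access time was lowered by a read request moves ahead of the others.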
The method and device provided by the present invention have the following advantages:
With the method and device disclosed by the present invention, during data block download, the metadata is downloaded and loaded first so that file system service can start; the list of data blocks to be downloaded is kept in the name node and the data nodes; during recovery, the recovery priority of data blocks is adjusted according to the client's access demands, so that files the client accesses are recovered first; recovery is abandoned for files the client deletes, which saves time; efficient on-demand recovery of a distributed file system can thereby be achieved.
Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Claims (12)
1. A data recovery method, characterized in that, when a distributed file system HDFS performs data recovery, the method comprises:
Downloading a metadata list and a data block list;
Receiving a request to access any target file, and looking up, in the metadata according to the request, the target data block information contained in the target file;
If the target data block information is found in the data block list, obtaining the download-completion flag corresponding to the target data block information;
When it is determined from the download-completion flag that the target data block has not finished downloading, looking up the attribute information of the target data block information in the data block list;
Obtaining the expected access time of the target data block information from the attribute information, and adjusting the expected access time according to the attribute of the request.
2. The method of claim 1, characterized in that adjusting, by the data node, the expected access time according to the attribute of the request comprises:
When it is determined from the attribute information of the request that the request is to read the target file, determining the minimum of the expected access time and the current system time, and adjusting the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
3. The method of claim 2, characterized in that, after the expected access time is adjusted to the minimum, the method further comprises:
Finding the target data node corresponding to the target data block information in the data block list, and sending adjustment information to the target data node by remote procedure call, so that the target data node updates its stored expected access time of the target data block information according to the adjustment information.
4. The method of any one of claims 1 to 3, characterized in that, before the metadata list and the data block list are downloaded, the method further comprises:
Setting an operation threshold, so that services operating on the data can be provided externally before the data recovery operation has completed; wherein the operation threshold is the ratio of the number of recovered data blocks to the number of all data blocks to be recovered.
5. The method of claim 1, characterized in that the method further comprises:
When it is determined from the attribute information of the request that the request is to delete the target file, looking up the target data node corresponding to the target data block information in the data block list, and sending a deletion message to the target data node by remote procedure call, so that the target data node, after receiving the deletion message, deletes the information corresponding to the target data block information from its data block download queue.
6. The method of any one of claims 1 to 5, characterized in that, after the expected access time is adjusted according to the attribute of the request, the method further comprises:
Maintaining a priority queue for data block downloads in the data node, the data node taking from the queue each time the block information with the smallest expected access time;
Downloading the data block corresponding to the block information from the backup server.
7. A data recovery device, characterized by comprising:
A list module, configured to download a metadata list and a data block list when a distributed file system HDFS performs data recovery;
A first lookup module, configured to receive a request to access any target file, and to look up, in the metadata according to the request, the target data block information contained in the target file;
An acquisition module, configured to obtain the download-completion flag corresponding to the target data block information if the target data block information is found in the data block list;
A second lookup module, configured to look up the attribute information of the target data block information in the data block list when it is determined from the download-completion flag that the target data block has not finished downloading;
An adjusting module, configured to obtain the expected access time of the target data block information from the attribute information, and to adjust the expected access time according to the attribute of the request.
8. The device of claim 6, characterized in that the adjusting module is specifically configured to: when it is determined from the attribute information of the request that the request is to read the target file, determine the minimum of the expected access time and the current system time, and adjust the expected access time to that minimum, so that the system recovers the target data block using the adjusted expected access time.
9. The device of claim 7, characterized in that the adjusting module is further configured to: after the expected access time is adjusted to the minimum, find the target data node corresponding to the target data block information in the data block list, and send adjustment information to the target data node by remote procedure call, so that the target data node updates its stored expected access time of the target data block information according to the adjustment information.
10. The device of any one of claims 6 to 8, characterized in that the device further comprises:
A recovery module, configured to set an operation threshold before the metadata list and the data block list are downloaded, so that services operating on the data can be provided externally before the data recovery operation has completed; wherein the operation threshold is the ratio of the number of recovered data blocks to the number of all data blocks to be recovered.
11. The device of claim 6, characterized in that the device further comprises:
A deletion module, configured to: when it is determined from the attribute information of the request that the request is to delete the target file, look up the target data node corresponding to the target data block information in the data block list, and send a deletion message to the target data node by remote procedure call, so that the target data node, after receiving the deletion message, deletes the information corresponding to the target data block information from its data block download queue.
12. The device of any one of claims 7 to 11, characterized in that the device further comprises:
A download module, configured to maintain a priority queue for data block downloads in the data node, wherein each time the data node takes from the queue the block information with the smallest expected access time, and downloads the data block corresponding to that block information from the backup server.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991047.3A CN106919470A (en) | 2015-12-25 | 2015-12-25 | A kind of data reconstruction method and device |
PCT/CN2016/111762 WO2017107984A1 (en) | 2015-12-25 | 2016-12-23 | Data recovery method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991047.3A CN106919470A (en) | 2015-12-25 | 2015-12-25 | A kind of data reconstruction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106919470A true CN106919470A (en) | 2017-07-04 |
Family ID=59089054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510991047.3A Pending CN106919470A (en) | 2015-12-25 | 2015-12-25 | A kind of data reconstruction method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106919470A (en) |
WO (1) | WO2017107984A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111176901A (en) * | 2019-12-31 | 2020-05-19 | 厦门市美亚柏科信息股份有限公司 | HDFS deleted file recovery method, terminal device and storage medium |
CN112579179A (en) * | 2019-09-30 | 2021-03-30 | 合肥杰发科技有限公司 | Partition mounting method of embedded system |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109542860B (en) * | 2018-10-25 | 2023-07-07 | 平安科技(深圳)有限公司 | Service data management method based on HDFS and terminal equipment |
CN111831625B (en) * | 2020-07-14 | 2024-03-12 | 深圳力维智联技术有限公司 | Data migration method, data migration device, and readable storage medium |
US20220237010A1 (en) * | 2021-01-28 | 2022-07-28 | Red Hat, Inc. | Executing containerized applications using partially downloaded container image files |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101414277A (en) * | 2008-11-06 | 2009-04-22 | 清华大学 | Need-based increment recovery disaster-containing system and method based on virtual machine |
CN103207867A (en) * | 2012-01-16 | 2013-07-17 | 联想(北京)有限公司 | Method for processing data blocks, method for initiating recovery operation and nodes |
US8639974B1 (en) * | 2005-07-20 | 2014-01-28 | Dell Software Inc. | Method and system for virtual on-demand recovery |
CN103617097A (en) * | 2013-11-19 | 2014-03-05 | 华为技术有限公司 | File recovery method and file recovery device |
CN104572357A (en) * | 2014-12-30 | 2015-04-29 | 清华大学 | Backup and recovery method for HDFS (Hadoop distributed filesystem) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8204863B2 (en) * | 2009-12-21 | 2012-06-19 | International Business Machines Corporation | Multi-phase file system restore with selective on-demand data availability |
EP2951694B1 (en) * | 2013-01-30 | 2017-08-16 | Hewlett-Packard Enterprise Development LP | Recovering pages of a database |
CN105007172A (en) * | 2015-05-28 | 2015-10-28 | 杭州健港信息科技有限公司 | Method for realizing HDFS high-availability scheme |
- 2015-12-25: CN application CN201510991047.3A, publication CN106919470A, status: active, Pending
- 2016-12-23: WO application PCT/CN2016/111762, publication WO2017107984A1, status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2017107984A1 (en) | 2017-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11704290B2 (en) | Methods, devices and systems for maintaining consistency of metadata and data across data centers | |
CN103116618B (en) | Based on Telefile mirror method and the system of the lasting buffer memory of client | |
CN104008152B (en) | Support the framework method of the distributed file system of mass data access | |
CN106919470A (en) | A kind of data reconstruction method and device | |
CN102662992B (en) | Method and device for storing and accessing massive small files | |
CN111078121B (en) | Data migration method and system for distributed storage system and related components | |
CN104378423B (en) | Metadata cluster distributed memory system and reading, the method for write-in storage data | |
US10922303B1 (en) | Early detection of corrupt data partition exports | |
US20140081919A1 (en) | Distributed backup system for determining access destination based on multiple performance indexes | |
CN105677251B (en) | Storage system based on Redis cluster | |
CN104281506A (en) | Data maintenance method and system for file system | |
CN109391655A (en) | Service gray scale dissemination method, device, system and storage medium | |
JP2005242403A (en) | Computer system | |
CN111506592B (en) | Database upgrading method and device | |
CN109684282A (en) | A kind of method and device constructing metadata cache | |
CN109710586B (en) | A kind of clustered node configuration file synchronous method and device | |
CN109639773A (en) | A kind of the distributed data cluster control system and its method of dynamic construction | |
CN107180034A (en) | The group system of MySQL database | |
CN104794119A (en) | Middleware message storage and transmission method and system | |
US20160275085A1 (en) | Methods for facilitating a nosql database with integrated management and devices thereof | |
CN109597903A (en) | Image file processing apparatus and method, document storage system and storage medium | |
CN108762982A (en) | A kind of database restoring method, apparatus and system | |
CN108595616B (en) | Unified namespace management method for distributed file system | |
CN105635264B (en) | A kind of file system based on online game application | |
CN105511808B (en) | Data operation method, system and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20170704 |