CN110716690B - Data recovery method and system - Google Patents

Publication number
CN110716690B
Authority
CN
China
Prior art keywords
data recovery
task
data
module
reclamation
Prior art date
Legal status
Active
Application number
CN201810763447.2A
Other languages
Chinese (zh)
Other versions
CN110716690A (en
Inventor
赵立芳
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810763447.2A
Publication of CN110716690A
Application granted
Publication of CN110716690B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention discloses a data recovery method and a data recovery system. The method comprises the following steps: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan indicates a data recovery task to be executed by a data recovery module; constructing the data recovery task according to the data recovery plan; and distributing the data recovery task to the data recovery module, which executes it. The invention solves the technical problem in the prior art that a data recovery task must be executed again after the fragment service module is migrated, which wastes computing resources.

Description

Data recovery method and system
Technical Field
The invention relates to the field of data processing, in particular to a data recovery method and a data recovery system.
Background
At present, the Log-Structured Merge-tree (LSM tree) is commonly used in distributed NoSQL environments. It reads and writes the disk sequentially and therefore achieves better throughput than random reads and writes. However, its disadvantages are also evident: because the required content must be found by scanning backward, a read request takes longer than a write. Especially after users delete or update data, read performance and List request performance drop sharply and storage space is wasted. Every NoSQL product therefore runs a background task, namely data recovery.
Deleting data in NoSQL does not really delete it. Instead, a record carrying a deletion mark for the data to be deleted is appended to a file, so that the data originally written to the file becomes invalid garbage. Data recovery refers to reading the valid data in the files, writing it into a new file, and deleting the old files, thereby cleaning up storage space.
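The read, rewrite, and delete cycle described above can be illustrated with a minimal Python sketch. The in-memory record layout, the use of `None` as the deletion mark, and the function name `recover` are assumptions made only for illustration; they are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Each record is (key, value). A value of None stands for the deletion mark.
# This in-memory layout is an assumption made for illustration only.
Record = Tuple[str, Optional[str]]


def recover(old_files: List[List[Record]]) -> List[Record]:
    """Merge old append-only files into one new file holding only valid data.

    Files are passed oldest first, so a later record overrides an earlier one.
    """
    latest: Dict[str, Optional[str]] = {}
    for records in old_files:
        for key, value in records:
            latest[key] = value  # newer writes and deletion marks win
    # Keep only keys whose most recent record is not a deletion mark.
    return [(k, v) for k, v in sorted(latest.items()) if v is not None]


old_files = [
    [("a", "1"), ("b", "2")],
    [("a", None), ("c", "3")],  # "a" is deleted by appending a mark
    [("b", "20")],              # "b" is updated by appending a new value
]
new_file = recover(old_files)
print(new_file)  # [('b', '20'), ('c', '3')]
```

In this sketch the deletion mark itself is dropped from the output, which is only safe when every older file that may still contain the key is part of the same input set.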
NoSQL products usually need to perform data recovery to reclaim space and improve read performance. In current implementations, however, each fragment (partition) performs data recovery tasks within its own region. That is, every data recovery task is initiated by a fragment service module (partition worker), which selects a group of files according to a corresponding strategy as the input of the task and then executes the task itself. As a result, when a fragment is migrated, the data recovery task currently being executed fails and the resources already spent on it are wasted. In addition, because machines differ in how busy they are, the efficiency of data recovery tasks cannot reach the global optimum.
No effective solution has yet been proposed for the prior-art problem that a data recovery task must be executed again after the fragment service module is migrated, which wastes computing resources.
Disclosure of Invention
The embodiment of the invention provides a data recovery method and a data recovery system, which are used for at least solving the technical problem in the prior art that computing resources are wasted because a data recovery task needs to be executed again after a fragment service module is migrated.
According to an aspect of an embodiment of the present invention, there is provided a data recovery method, including: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; and distributing the data recovery task to a data recovery module, and executing the data recovery task by the data recovery module.
According to another aspect of the embodiments of the present invention, there is also provided a data recovery method, including: receiving a data recovery task distributed by a central node, wherein the central node constructs the data recovery task according to a data recovery plan generated by a fragment service module; and executing the data recovery task.
According to another aspect of the embodiments of the present invention, there is also provided a data recovery method, including: generating a data recovery plan, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; and sending the data recovery plan to a central node, wherein the central node constructs a data recovery task according to the data recovery plan, and the data recovery task is executed by a data recovery module.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to perform the following steps: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; and distributing the data recovery task to a data recovery module, and executing the data recovery task by the data recovery module.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the following steps: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; and distributing the data recovery task to a data recovery module, and executing the data recovery task by the data recovery module.
According to another aspect of the embodiments of the present invention, there is also provided a data recovery system, including: the fragment service module is used for generating a data recovery plan, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by the data recovery module; the central node is used for constructing a data recovery task according to the data recovery plan and distributing the data recovery task to the data recovery module; and the data recovery module is used for executing the data recovery task.
In the embodiment of the invention, under the condition that a plurality of data recovery modules are included, a central node can distribute a plurality of data recovery plans submitted by the same fragmentation service module to different data recovery modules, so that the plurality of data recovery modules execute the plan submitted by one fragmentation service module in parallel, and the data recovery efficiency is improved.
It should be further noted that, according to the solution provided by the present application, even if the fragment service module is migrated, since the data recovery task is executed by the data recovery module, the migration of the fragment service module does not affect the processing of the data recovery task, and thus, the waste of computing resources caused by the migration of the fragment service module is not generated.
Therefore, the embodiment of the application schedules from the perspective of the whole cluster, reduces resource waste to the maximum extent, improves the efficiency and the success rate of data recovery, and allows multiple data recovery tasks of a single fragment service module to be executed concurrently. This solves the technical problem in the prior art that a data recovery task must be executed again after the fragment service module is migrated, which wastes computing resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a data recovery method;
FIG. 2 is a flow chart of a data recovery method according to embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a data recovery method according to embodiment 1 of the present application;
FIG. 4 is a flow chart of a data recovery method according to embodiment 2 of the present application;
FIG. 5 is a flow chart of a data recovery method according to embodiment 3 of the present application;
FIG. 6 is a schematic diagram of a data recovery device according to embodiment 4 of the present application;
FIG. 7 is a schematic diagram of a data recovery device according to embodiment 5 of the present application;
FIG. 8 is a schematic diagram of a data recovery device according to embodiment 6 of the present application;
FIG. 9 is a schematic diagram of a data recovery system according to embodiment 7 of the present application; and
FIG. 10 is a block diagram of a computer terminal according to embodiment 8 of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
A fragment service module: namely the partition worker. In the field of distributed storage, a distributed storage system comprises a plurality of nodes; the nodes are divided by region into a plurality of partitions, and a partition worker can be deployed on a machine of a partition to execute the data processing tasks of that partition.
A central node: namely the compaction master, a module newly added in this scheme, used for distributing data recovery tasks to the data recovery modules.
A data recovery module: namely the compaction worker, a module newly added in this scheme, used for executing data recovery tasks.
Data recovery: deleting data in NoSQL does not really delete it; instead, a record carrying a deletion mark for the data to be deleted is appended to a file, so that the data originally written to the file becomes invalid garbage. Data recovery refers to reading the valid data in the files, writing it into a new file, and deleting the old files, thereby cleaning up storage space.
Example 1
According to an embodiment of the present invention, an embodiment of a data recovery method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one presented here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the data recovery method. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the data recovery method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the data recovery method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in FIG. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In this embodiment, two software modules are added: a central node (compaction master) and a data recovery module (compaction worker). A single central node may run on the machine of any node in the distributed storage system, or multiple central nodes may run in a master-slave structure, in which a backup central node takes over when the main central node fails. There may be one or more data recovery modules; with multiple data recovery modules, data processing efficiency is higher. In an alternative embodiment, in a distributed storage system, the central node is deployed on the machine of one of the nodes, a data recovery module is deployed on each machine on which a fragment service module (partition worker) is deployed, and the data recovery module and the fragment service module may share a process.
In the above operating environment, the present application provides a data recovery method as shown in FIG. 2. FIG. 2 is a flowchart of a data recovery method according to embodiment 1 of the present application. The following steps in this embodiment are performed by the central node:
Step S21, receiving a data recovery plan sent by the fragment service module, where the data recovery plan is used to indicate a data recovery task that needs to be executed by the data recovery module.
Specifically, the fragment service module refers to the partition worker. In the field of distributed storage, a distributed storage system comprises a plurality of nodes; the nodes are divided by region into a plurality of partitions, and a partition worker can be deployed on a machine of a partition to execute the data processing tasks of that partition.
In an optional embodiment, the partition worker may also be deployed on each node of the distributed storage system, or may be deployed according to a preset policy, which is not limited in this embodiment.
Specifically, the data recovery plan includes one or more files that need to be subjected to data recovery. A file to be recovered may be a file that contains data carrying a deletion mark. The data recovery module is a module for executing data recovery tasks. That is, in the present application, the data recovery task is no longer executed by the fragment service module: the fragment service module only determines the data recovery plan, and the data recovery task is executed by the newly added data recovery module.
In an optional embodiment, the fragment service module finds, among the files in the region it is responsible for, the files that need data recovery, builds them into a data recovery plan, and hands the plan to the central node.
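As a rough sketch of how a fragment service module might build such a plan, the Python example below selects files by the share of their bytes that is garbage. The `FileStat` and `RecoveryPlan` types, the `make_plan` function, and the 30% threshold are all hypothetical; the patent only states that files are selected according to a corresponding strategy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStat:
    """Hypothetical per-file statistics kept by a fragment service module."""
    name: str
    total_bytes: int
    garbage_bytes: int  # bytes covered by deletion marks or stale versions


@dataclass
class RecoveryPlan:
    """Hypothetical plan format handed to the central node."""
    partition: str
    files: List[str]
    space_saving: int  # bytes expected to be freed by the task


def make_plan(partition: str, stats: List[FileStat],
              min_garbage_ratio: float = 0.3) -> RecoveryPlan:
    # Assumed strategy: pick files whose garbage share reaches the threshold.
    chosen = [s for s in stats
              if s.total_bytes > 0
              and s.garbage_bytes / s.total_bytes >= min_garbage_ratio]
    return RecoveryPlan(
        partition=partition,
        files=[s.name for s in chosen],
        space_saving=sum(s.garbage_bytes for s in chosen),
    )


stats = [
    FileStat("f1", 100, 50),
    FileStat("f2", 100, 10),
    FileStat("f3", 200, 120),
]
plan = make_plan("partition-7", stats)
print(plan.files, plan.space_saving)  # ['f1', 'f3'] 170
```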
Step S23, construct the data recovery task according to the data recovery plan.
Specifically, after receiving the data recovery plan, the central node generates a corresponding data recovery task according to it. The data recovery plan and the data recovery task are essentially the same; the difference is that the plan is in a data format that the central node can recognize, whereas the task is in a data format that the data recovery module can recognize.
Step S25, distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module.
In step S25, the central node constructs the data recovery task from the data recovery plan and then distributes it to the data recovery module, so that the data recovery module executes the data recovery task.
In an optional embodiment, all the fragment service modules submit a data recovery plan to a common central node, the central node forms corresponding data recovery tasks according to the data recovery plan, distributes the data recovery tasks to the data recovery modules, and the data recovery modules process the data recovery tasks.
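Steps S21, S23, and S25 can be sketched end to end as follows. This is a minimal in-process illustration, not the patented implementation: the class names, the task fields, the output file naming, and the simple modulo-based distribution are assumptions introduced here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Plan:
    partition: str
    files: List[str]


@dataclass
class Task:
    task_id: int
    partition: str
    input_files: List[str]
    output_file: str


@dataclass
class RecoveryModule:
    name: str
    done: List[int] = field(default_factory=list)

    def execute(self, task: Task) -> None:
        # A real module would read the valid data from task.input_files and
        # write it to task.output_file; here we only record the task id.
        self.done.append(task.task_id)


class CentralNode:
    def __init__(self, modules: List[RecoveryModule]) -> None:
        self.modules = modules
        self.plans: Deque[Plan] = deque()
        self.next_id = 0

    def receive_plan(self, plan: Plan) -> None:  # step S21
        self.plans.append(plan)

    def build_task(self, plan: Plan) -> Task:    # step S23
        self.next_id += 1
        out = f"{plan.partition}/recovered-{self.next_id}"
        return Task(self.next_id, plan.partition, list(plan.files), out)

    def dispatch_all(self) -> None:              # step S25
        while self.plans:
            task = self.build_task(self.plans.popleft())
            # Assumed policy: spread tasks over modules by task id.
            module = self.modules[task.task_id % len(self.modules)]
            module.execute(task)


modules = [RecoveryModule("worker-0"), RecoveryModule("worker-1")]
master = CentralNode(modules)
for files in (["f1"], ["f2", "f3"], ["f4"]):
    master.receive_plan(Plan("partition-7", files))
master.dispatch_all()
print(modules[0].done, modules[1].done)  # [2] [1, 3]
```

Because the modules, not the fragment service module, hold the running tasks, nothing in this flow depends on where the fragment service module is located after it has submitted its plans.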
It should be noted that, in the prior art, data recovery tasks are executed by the fragment service modules, and a fragment service module can only execute its data recovery tasks one by one in sequence. In the solution provided in this application, when there are a plurality of data recovery modules, the central node can allocate a plurality of data recovery plans submitted by the same fragment service module to different data recovery modules, so that the data recovery modules execute the plans submitted by one fragment service module in parallel, thereby improving the efficiency of data recovery.
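The parallel execution described above can be mimicked with a thread pool, where each thread stands in for one data recovery module. This is only an analogy under that assumption; the plan names and the `run_task` function are hypothetical, and real modules would run on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_task(plan_id: str) -> str:
    # Placeholder for the actual read/rewrite/delete work of one task.
    return f"{plan_id}:done"


# Three plans submitted by the same fragment service module.
plans: List[str] = ["p1-plan-a", "p1-plan-b", "p1-plan-c"]

# Each worker thread stands in for one data recovery module.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_task, plans))

print(results)  # ['p1-plan-a:done', 'p1-plan-b:done', 'p1-plan-c:done']
```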
It should be further noted that, according to the scheme provided by the present application, even if the fragment service module is migrated, the data recovery task is executed by the data recovery module, so that the migration of the fragment service module does not affect the processing of the data recovery task, and thus the waste of computing resources caused by the migration of the fragment service module is not generated.
Therefore, the embodiment of the application schedules from the perspective of the whole cluster, reduces resource waste to the maximum extent, improves the efficiency and the success rate of data recovery, and allows multiple data recovery tasks of a single fragment service module to be executed concurrently. This solves the technical problem in the prior art that a data recovery task must be executed again after the fragment service module is migrated, which wastes computing resources.
As an alternative embodiment, the data reclamation plan includes at least one file that needs to be reclaimed.
Specifically, the file that needs to be subjected to data recovery may be a file on whose data a deletion operation has been performed. In addition to the files to be recovered, the data recovery plan may include a space saving parameter, that is, how much space can be saved after the data recovery module executes the data recovery task on the files. The central node may use this parameter to determine the priority of the data recovery plan.
When there are a plurality of data recovery modules, the central node needs to determine, when allocating a data recovery task, which data recovery module the task currently to be allocated should go to. As an optional embodiment, distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module, includes:
Step S251, acquiring resource information corresponding to the data recovery module.
Specifically, before the central node sends the data recovery task to the data recovery module, the central node requests the resource information of the data recovery module, and the data recovery module collects the corresponding resource information and sends it to the central node.
The resource information corresponding to the data recovery module indicates the idle state of the data recovery module and the idle state of the machine running it. The resource information may therefore include the completion state of the data recovery tasks being processed by the data recovery module, and may also include the remaining resources of the machine running the data recovery module.
Step S253, allocating the data recovery task to the data recovery module according to the resource information.
In step S253, the central node allocates the data recovery task according to the resource information; for example, it may allocate the task currently to be allocated to the data recovery module that is currently the most idle, so as to achieve load balancing.
In the scheme, the central node distributes the data recovery tasks to the data recovery module according to the resource information, and the purpose is to ensure the balanced matching of the resources and the tasks to the maximum extent and improve the efficiency of data recovery.
As an alternative embodiment, the resource information of the data recovery module includes one or more of the following items: the network state of the machine where the data recovery module is located, the remaining disk capacity, the central processing unit (CPU) utilization, the busy-idle degree of the current thread of the data recovery module, and the state of the currently executing task.
Specifically, the network state of the machine where the data recovery module is located, the remaining disk capacity, and the CPU utilization rate are used to characterize the state of the machine where the data recovery module is located, and can be measured by using corresponding data.
The busy-idle degree of the current thread of the data recovery module and the state of the currently executing task characterize how busy the data recovery module itself is. The busy-idle degree of the current thread can be determined from the number of data recovery tasks waiting to be executed by the data recovery module, and the state of the currently executing task can be determined from the remaining time of the task being executed. That is, both items can be measured by corresponding data.
It should be noted that the above resource information is only an example; it is intended to serve as the basis on which the central node load-balances data recovery tasks. Each item of resource information has a corresponding parameter, and that parameter is one that the data recovery module can acquire.
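One possible shape for the resource information reported by a data recovery module is sketched below. The field names, the normalization to a 0..1 range, and the caps `max_pending` and `max_seconds` are assumptions made for illustration. Every score is oriented so that a larger value means a more idle module or machine, which matches the later statement that a larger resource information parameter indicates a more idle module.

```python
from dataclasses import dataclass
from typing import Dict


def _clamp(x: float) -> float:
    """Limit a score to the range 0..1."""
    return max(0.0, min(1.0, x))


@dataclass
class ResourceInfo:
    """Hypothetical report sent by a data recovery module to the central node."""
    network_free: float       # fraction of bandwidth unused, 0..1
    disk_free: float          # fraction of disk capacity unused, 0..1
    cpu_utilization: float    # fraction of CPU in use, 0..1
    pending_tasks: int        # tasks waiting in this module's queue
    remaining_seconds: float  # estimated time left on the running task

    def idle_scores(self, max_pending: int = 10,
                    max_seconds: float = 600.0) -> Dict[str, float]:
        # Higher score = more idle (assumed convention).
        return {
            "network": _clamp(self.network_free),
            "disk": _clamp(self.disk_free),
            "cpu": _clamp(1.0 - self.cpu_utilization),
            "thread": _clamp(1.0 - self.pending_tasks / max_pending),
            "task": _clamp(1.0 - self.remaining_seconds / max_seconds),
        }


info = ResourceInfo(0.8, 0.5, 0.25, 5, 150.0)
scores = info.idle_scores()
print(scores)
```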
As an alternative embodiment, allocating the data recovery task to the data recovery module according to the resource information includes:
Step S2531, acquiring a weight corresponding to each item of resource information.
Specifically, the central node may set a corresponding weight for each item of resource information, and may adjust the weights according to how tasks are being processed.
In an alternative embodiment, take the case where the resource information of the data recovery module includes the network state of the machine where the data recovery module is located, the remaining disk capacity, and the CPU utilization. The central node receives the parameters corresponding to these items of resource information; for example, the parameter corresponding to the network state of the machine is A, the parameter corresponding to the remaining disk capacity is B, and the parameter corresponding to the CPU utilization is C. If the CPU utilization is considered to have a large influence on the data processing efficiency of the data recovery module, a high weight can be assigned to the CPU utilization and low weights to the network state and the remaining disk capacity. For example, the CPU utilization may be weighted 60%, and the network state and the remaining disk capacity may each be weighted 20%.
Step S2533, determining resource information parameters of the data recovery module according to the weight corresponding to each item of resource information and the parameters corresponding to each item of resource information.
Specifically, the resource information parameter can represent idle states of the data recovery module and a machine running the data recovery module in a data form.
Still in the above alternative embodiment, the resource information parameter X = a "20% + B" 20% + C "60% of the data reclamation module.
Step S2535, allocate the data recovery task to the data recovery module with the largest resource information parameter.
Specifically, the resource information parameter is in a direct proportional relationship with an idle state of the data recovery module and an idle state of a machine running the data recovery module, that is, the larger the resource information parameter of the data recovery module is, the more idle the data recovery module or the machine running the data recovery module is.
Therefore, the central node realizes load balance of the data recovery module in the process of distributing the data recovery tasks, and the data recovery tasks achieve global optimization under the condition of different idle and busy degrees of the machines.
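Steps S2531 to S2535 can be illustrated with a minimal Python sketch. The names ModuleStats, WEIGHTS, resource_parameter and pick_module are hypothetical and do not appear in the application. The sketch also assumes that each parameter has already been normalized to the range 0 to 1 so that a larger value means a more idle machine (for example, C could be the idle share of the CPU), which is what the proportional relationship described above requires.

```python
from dataclasses import dataclass

# Hypothetical weights, taken from the 20% / 20% / 60% example in the text.
WEIGHTS = {"network": 0.2, "disk": 0.2, "cpu": 0.6}


@dataclass
class ModuleStats:
    """Parameters reported by one data recovery module (assumed normalized to 0..1)."""
    name: str
    network: float  # parameter A
    disk: float     # parameter B
    cpu: float      # parameter C


def resource_parameter(stats, weights=WEIGHTS):
    """Step S2533: weighted sum X = A*w_network + B*w_disk + C*w_cpu."""
    return (stats.network * weights["network"]
            + stats.disk * weights["disk"]
            + stats.cpu * weights["cpu"])


def pick_module(modules, weights=WEIGHTS):
    """Step S2535: choose the module with the largest resource information parameter."""
    return max(modules, key=lambda m: resource_parameter(m, weights))


workers = [
    ModuleStats("worker-1", network=0.9, disk=0.5, cpu=0.2),
    ModuleStats("worker-2", network=0.4, disk=0.4, cpu=0.8),
]
chosen = pick_module(workers)
```

In this illustration worker-2 is chosen, because its heavily weighted CPU parameter outweighs the better network parameter of worker-1.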
As an optional embodiment, the data recovery plan further includes priority information of the data recovery task, and allocating the data recovery task to the data recovery module with the largest resource information parameter includes: allocating the data recovery tasks, in order of their priority, to the data recovery module with the largest resource information parameter.
Specifically, the priority information may be any information indicating the priority of the task, and may be, for example, an identifier indicating the priority.
In an optional embodiment, the data recovery plan includes a space saving parameter, that is, how much space can be saved after the data recovery module performs the data recovery task on the files. In this case, after receiving the data recovery plans sent by the fragment service module, the central node reads the space saving parameter of each plan and preferentially processes the data recovery plan with the largest space saving parameter, so as to release more storage space for the system as soon as possible.
In the above scheme, the central node determines the distribution sequence of the data recovery tasks according to the priority information of the data recovery tasks, so that a better space releasing effect is achieved for the global data recovery tasks.
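The priority-based ordering can be sketched as follows, assuming that the priority is the space saving parameter and that a plan is represented as a (plan identifier, space saving) pair. The function name dispatch_by_priority and the choose_module callback are hypothetical; choose_module stands in for whatever selection of the most idle data recovery module the central node uses.

```python
def dispatch_by_priority(plans, choose_module):
    """Assign plans to modules, largest space saving first.

    plans: iterable of (plan_id, space_saving) pairs (hypothetical representation).
    choose_module: callable returning the currently most idle module.
    """
    ordered = sorted(plans, key=lambda plan: plan[1], reverse=True)
    return [(plan_id, choose_module()) for plan_id, _ in ordered]


plans = [("plan-a", 10), ("plan-b", 50), ("plan-c", 30)]
assignments = dispatch_by_priority(plans, choose_module=lambda: "worker-1")
```

Calling choose_module once per plan lets the selection reflect resource information that changes as earlier tasks are handed out.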
As an alternative embodiment, after the data recovery task is allocated to the data recovery module and executed by the data recovery module, the method further includes: receiving a task processing result returned by the data recovery module.
Specifically, the task processing result still includes one or more files, and these files are the new files obtained after data recovery is performed on the files in the data recovery plan. After completing the data recovery task, the data recovery module returns the new files formed by the data recovery to the central node and then goes on to execute the next data recovery task.
As an optional embodiment, after the data recovery task is allocated to the data recovery module and executed by the data recovery module, the method further includes:
Step S27, sending the task processing result to the fragment service module.
In step S27, the central node returns the received task processing result to the fragment service module, where the fragment service module may be the one that sent the data recovery plan to the central node.
Step S29, deleting the task processing result.
After the central node sends the task processing result to the fragment service module, the fragment service module stores the task processing result, and the central node can delete the task processing result.
As an optional embodiment, after the data recovery task is allocated to the data recovery module, in the process of executing the data recovery task by the data recovery module, the method further includes:
Step S211, receiving task node information uploaded by the data recovery module, where the task node information includes one or more of the following items: the identifier, length, and offset of the input/output file.
Specifically, the task node information is information for determining a task execution node, and a task execution node is a node that records the state of task execution. In the above solution, the task node information includes the identifier, length, and offset of the input/output file; this information enables a data recovery module to identify how far the task has been executed and to continue execution on that basis.
In step S211, the data recovery module may upload task node information to the central node according to a predetermined policy (for example, according to a certain period or according to the execution progress of the task) in the process of executing the data recovery task.
In an optional embodiment, the data recovery module records the task execution node information according to a certain period, and reports the task node information to the central node.
Step S213, if the data recovery module fails, after the data recovery module recovers, the data recovery module is instructed to continue to execute the data recovery task from the task node recorded by the task node information.
In an optional embodiment, the data recovery module uploads task node information to the central node according to a certain period, and when the data recovery module fails and cannot continue to execute the data recovery task, the central node stores the last task node information uploaded before the data recovery module fails and waits for the data recovery module to recover. After the data recovery module recovers, the central node sends the task node information to the data recovery module, so that the data recovery module does not need to restart to execute the task, but continues to process from the position recorded by the task node information.
Still in the above embodiment, if the data recovery module does not recover within a preset time, the central node may send the task node information to another data recovery module, which continues processing from the task node recorded in the task node information and does not need to process the task from the beginning.
According to the above scheme, the data recovery task can be resumed from a breakpoint because the data recovery module uploads task node information, which prevents the resource waste that a failure of the data recovery module might otherwise cause.
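Steps S211 and S213 can be sketched in Python as follows. The classes TaskNodeInfo and CentralNode, the function run_task, and the fail_at argument used to simulate a failure are all hypothetical; the sketch also assumes that the task processes a single file in fixed-size chunks and reports after every chunk, which is only one possible reporting policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaskNodeInfo:
    """Task node information: file identifier, offset reached, length of last chunk."""
    file_id: str
    offset: int
    length: int


class CentralNode:
    """Keeps only the most recent task node information for each task."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, TaskNodeInfo] = {}

    def report(self, task_id: str, info: TaskNodeInfo) -> None:
        self._checkpoints[task_id] = info

    def resume_point(self, task_id: str) -> Optional[TaskNodeInfo]:
        return self._checkpoints.get(task_id)


def run_task(task_id, file_id, data, node, start=0, chunk=4, fail_at=None):
    """Process data from offset start, reporting a checkpoint after each chunk.

    Raises RuntimeError when the offset reaches fail_at (simulated failure).
    Returns the final offset when the task completes.
    """
    offset = start
    while offset < len(data):
        if fail_at is not None and offset >= fail_at:
            raise RuntimeError("simulated data recovery module failure")
        step = min(chunk, len(data) - offset)
        offset += step
        node.report(task_id, TaskNodeInfo(file_id, offset, step))
    return offset


node = CentralNode()
data = b"0123456789ab"
try:
    run_task("task-1", "file-1", data, node, fail_at=8)
except RuntimeError:
    pass  # the module failed; the central node still holds the last checkpoint

checkpoint = node.resume_point("task-1")
final_offset = run_task("task-1", "file-1", data, node, start=checkpoint.offset)
```

The second call to run_task could equally be issued to a different data recovery module, since everything needed to continue is held by the central node.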
Fig. 3 is a schematic diagram of a data recovery method according to embodiment 1 of the present application, in which a fragment service module (Partition Worker) generates a data recovery plan (FileSelect CPT Plan), a central node (compact Master) waits to receive data recovery tasks (CPT task) from the fragment service module, and a data recovery module (compact Worker) waits to execute data recovery tasks. The data recovery method of the present application is described in detail below with reference to fig. 3, in which a solid arrow indicates that one side transmits data to the other side, and a dotted arrow indicates the response returned after the other side receives the data.
Step S31, submitting a task (Submit Task).
In this step, the fragment service module sends the generated data recovery plan to the central node, and the central node returns response information to the fragment service module after receiving the data recovery plan.
Step S32, requesting resource information (Query Stats).
In this step, the central node requests resource information from the data recovery module, and the data recovery module returns the resource information of the data recovery module and of the machine where it is located to the central node.
Step S33, load balancing (Dispatch Task).
After receiving the resource information returned by the data recovery modules, the central node allocates the data recovery task to the most idle data recovery module according to the resource information.
Step S34, returning the task processing result (Commit Task).
After executing the data recovery task, the data recovery module returns the task processing result to the central node.
Step S35, querying the task (Query Task).
While the data recovery module executes the data recovery task, the fragment service module asks the central node at a preset period whether the task is completed; if the data recovery task is completed, the central node returns the received task processing result to the fragment service module.
Step S36, saving the task processing result (Commit MetaInfo).
The fragment service module stores the task processing result returned by the central node.
Step S37, deleting the task processing result (Cancel Task).
After sending the task processing result to the fragment service module, the central node deletes the task processing result according to the instruction of the fragment service module.
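The bookkeeping that the central node performs across steps S31 to S37 can be sketched as a small in-memory class. The class CompactMaster and its method names are hypothetical and only mirror the message names shown in fig. 3; a real central node would also exchange these messages over the network and perform the load balancing of step S33, which is omitted here.

```python
class CompactMaster:
    """In-memory sketch of the central node's task bookkeeping (hypothetical)."""

    def __init__(self):
        self.pending = {}  # task_id -> data recovery plan awaiting a result
        self.results = {}  # task_id -> task processing result

    def submit_task(self, task_id, plan):
        """S31: accept a plan from the fragment service module and acknowledge it."""
        self.pending[task_id] = plan
        return "ack"

    def commit_task(self, task_id, result):
        """S34: store the result returned by a data recovery module."""
        self.pending.pop(task_id, None)
        self.results[task_id] = result

    def query_task(self, task_id):
        """S35: return the result if the task is complete, otherwise None."""
        return self.results.get(task_id)

    def cancel_task(self, task_id):
        """S37: delete the result once the fragment service module has saved it."""
        self.results.pop(task_id, None)


master = CompactMaster()
ack = master.submit_task("task-1", ["file-1"])
before = master.query_task("task-1")      # task not finished yet
master.commit_task("task-1", ["file-1.new"])
after = master.query_task("task-1")       # result is now available
master.cancel_task("task-1")
```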
Example 2
According to an embodiment of the present invention, an embodiment of a data recovery method is further provided, in which the following steps are performed by a data recovery module. Fig. 4 is a flowchart of a data recovery method according to embodiment 2 of the present application. As shown in fig. 4, the method includes:
Step S41, receiving a data recovery task allocated by the central node, where the central node forms the data recovery task according to the data recovery plan generated by the fragment service module.
Specifically, the data recovery plan includes one or more files on which data recovery needs to be performed. A file to be recovered may be a file containing data with a deletion marker. The data recovery plan and the data recovery task are essentially the same; the difference is that the data recovery plan is in a data format that the central node can recognize, while the data recovery task is in a data format that the data recovery module can recognize.
In the above step S41, the data recovery module receives the data recovery task and executes it.
Step S43, executing the data recovery task.
It should be noted that, in the prior art, data recovery tasks are executed by the fragment service modules, and a fragment service module can execute only one data recovery task at a time, in sequence. In the solution provided by the present application, when multiple data recovery modules are present, the central node can allocate multiple data recovery plans submitted by the same fragment service module to different data recovery modules, so that the multiple data recovery modules execute the plans submitted by one fragment service module in parallel, thereby improving the efficiency of data recovery.
It should be further noted that, according to the solution provided by the present application, even if the fragment service module is migrated, the data recovery task is executed by the data recovery module, so that the migration of the fragment service module does not affect the processing of the data recovery task, and thus the waste of computing resources caused by the migration of the fragment service module does not occur.
Therefore, the embodiment of the application solves the technical problem that computing resources are wasted due to the fact that the data recovery task needs to be executed again after the migration of the fragment service module in the prior art.
As an alternative embodiment, before executing the data recovery task, the method further includes: sending the resource information corresponding to the data recovery module to the central node.
Specifically, the resource information corresponding to the data recovery module is used to indicate the idle state of the data recovery module and the idle state of the machine running the data recovery module. The resource information may therefore include the completion state of the data recovery task being processed by the data recovery module, and may also include the remaining resources of the machine running the data recovery module.
In an alternative embodiment, the resource information of the data recovery module includes one or more of the following items: the network state of the machine where the data recovery module is located, the residual capacity of a disk, the CPU utilization rate, the busy and idle degree of the current thread of the data recovery module and the state of the current execution task.
It should be noted that the above resource information is given only as an example and is intended to serve as the basis on which the central node load-balances data recovery tasks. Each item of resource information has a corresponding parameter, and that parameter can be acquired by the data recovery module.
As an alternative embodiment, executing the data recovery task includes performing data recovery on the files in the data recovery task, where performing data recovery on the files to be processed includes either of the following:
extracting the valid data in a file to form a new file;
extracting the valid data in multiple files and merging the valid data into one new file.
Specifically, valid data is the data in a file that has no deletion marker. The data recovery task includes files to be processed, and a file to be processed includes data with a deletion marker. When the data recovery module processes the data recovery task, it removes the data with a deletion marker from each file to be processed to form a new file. After multiple new files are obtained, if the data volumes of these new files are all smaller than a preset value, the new files can be merged.
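The two options above can be sketched in Python, assuming that a file is modelled as a list of (value, deleted) pairs and that the data volume of a file is its number of valid records. The function names compact and compact_and_merge and the min_size parameter (the preset value) are hypothetical.

```python
def compact(records):
    """Keep only the valid data, i.e. records without a deletion marker."""
    return [value for value, deleted in records if not deleted]


def compact_and_merge(files, min_size):
    """Compact each file; merge the new files if all are smaller than min_size."""
    new_files = [compact(records) for records in files]
    if len(new_files) > 1 and all(len(f) < min_size for f in new_files):
        merged = [value for f in new_files for value in f]
        return [merged]
    return new_files


files = [
    [("a", False), ("b", True)],  # "b" carries a deletion marker
    [("c", False)],
]
merged_result = compact_and_merge(files, min_size=3)    # both new files are small
separate_result = compact_and_merge(files, min_size=1)  # neither is below the preset value
```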
As an alternative embodiment, after executing the data recovery task, the method further includes: returning a task processing result to the central node.
Specifically, the task processing result still includes one or more files, and these files are the new files obtained after data recovery is performed on the files in the data recovery plan. After executing the data recovery task, the data recovery module returns the new files formed by the data recovery to the central node and then goes on to execute the next data recovery task.
Example 3
According to an embodiment of the present invention, an embodiment of a data recovery method is further provided, in which the following steps are performed by a fragment service module. Fig. 5 is a flowchart of a data recovery method according to embodiment 3 of the present application. As shown in fig. 5, the method includes:
Step S51, generating a data recovery plan, where the data recovery plan indicates the data that needs to be recovered.
Specifically, the data recovery plan includes one or more files that need to be subjected to data recovery. The file to be recycled may include data with a deletion marker. The data recovery module is a module for executing a data recovery task, that is, in the present application, the data recovery task is no longer executed by the fragment service module, the fragment service module only determines a data recovery plan, and the data recovery task is executed by the newly added data recovery module.
Step S53, sending the data recovery plan to the central node, where the central node constructs a data recovery task according to the data recovery plan, and the data recovery module executes the data recovery task.
In an alternative embodiment, the fragment service module finds the files that need data recovery within the area for which it is responsible, compiles these files into a data recovery plan, and delivers the data recovery plan to the central node.
It should be noted that, in the prior art, data recovery tasks are executed by the fragment service modules, and a fragment service module can execute only one data recovery task at a time, in sequence. In the solution provided by the present application, when multiple data recovery modules are present, the central node can allocate multiple data recovery plans submitted by the same fragment service module to different data recovery modules, so that the multiple data recovery modules execute the plans submitted by one fragment service module in parallel, thereby improving the efficiency of data recovery.
It should be further noted that, according to the scheme provided by the present application, even if the fragment service module is migrated, the data recovery task is executed by the data recovery module, so that the migration of the fragment service module does not affect the processing of the data recovery task, and thus the waste of computing resources caused by the migration of the fragment service module is not generated.
Therefore, the embodiment of the application solves the technical problem that in the prior art, after the migration of the fragment service module, the data recovery task needs to be executed again, so that the computing resources are wasted.
As an alternative embodiment, after sending the data recovery plan to the central node, the method further includes:
Step S55, sending query information to the central node, where the query information is used to determine whether the data recovery task is completed.
In step S55, after the fragment service module hands the data recovery plan to the central node, it may ask the central node at a predetermined period whether the data recovery task has been completed.
Step S57, receiving a task processing result returned by the central node.
Specifically, when the data recovery module completes the data recovery task, it returns the task processing result to the central node; if the central node receives the query information after it has received the task processing result, it may send the task processing result to the fragment service module.
Step S59, returning a deletion instruction to the central node, where the deletion instruction instructs the central node to delete the task processing result.
After the central node sends the task processing result to the fragment service module, the fragment service module stores the task processing result, and the central node can delete the task processing result.
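Steps S55 to S59, seen from the fragment service module, amount to a polling loop. The sketch below is illustrative only: poll_until_done and its query, save and cancel callbacks are hypothetical stand-ins for the messages exchanged with the central node, and the max_polls limit is an added assumption so that the loop always terminates.

```python
import time


def poll_until_done(query, save, cancel, period=0.0, max_polls=10):
    """Poll the central node until the task processing result is available.

    query:  returns the task processing result, or None if not finished (S55/S57).
    save:   stores the result in the fragment service module.
    cancel: sends the deletion instruction to the central node (S59).
    Returns True if a result was obtained within max_polls attempts.
    """
    for _ in range(max_polls):
        result = query()
        if result is not None:
            save(result)
            cancel()
            return True
        time.sleep(period)  # wait for the predetermined period before asking again
    return False


# Simulated central node: the task finishes on the third query.
responses = iter([None, None, ["file-1.new"]])
saved, cancelled = [], []
done = poll_until_done(lambda: next(responses), saved.append,
                       lambda: cancelled.append(True))
```

Sending the deletion instruction only after the result has been saved keeps the central node's copy available until the fragment service module holds its own.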
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 4
According to an embodiment of the present invention, a data recovery apparatus for implementing the data recovery method in embodiment 1 is further provided. Fig. 6 is a schematic diagram of a data recovery apparatus according to embodiment 4 of the present application. As shown in fig. 6, the apparatus 600 includes:
the receiving module 602 is configured to receive a data recovery plan sent by the fragment service module, where the data recovery plan is used to indicate a data recovery task that needs to be executed by the data recovery module.
A construction module 604 for constructing the data recovery task according to the data recovery plan.
The allocating module 606 is configured to allocate the data recovery task to the data recovery module, and the data recovery module executes the data recovery task.
It should be noted here that the receiving module 602, the construction module 604, and the allocating module 606 correspond to steps S21 to S25 in embodiment 1; the three modules share the implementation examples and application scenarios of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.
As an alternative embodiment, the data reclamation plan includes at least one file that needs to be reclaimed.
As an alternative embodiment, the allocation module comprises: the acquisition submodule is used for acquiring resource information corresponding to the data recovery module; and the distribution submodule is used for distributing the data recovery task to the data recovery module according to the resource information.
As an alternative embodiment, the resource information of the data recovery module includes one or more of the following items: the network state of the machine where the data recovery module is located, the residual capacity of a disk, the utilization rate of a central processing unit, the busy and idle degree of the current thread of the data recovery module and the state of the current execution task.
As an alternative embodiment, the allocation submodule includes: the acquiring unit is used for acquiring the weight corresponding to each item of resource information; the determining unit is used for determining the resource information parameters of the data recovery module according to the weight corresponding to each item of resource information and the parameters corresponding to each item of resource information; and the distribution unit is used for distributing the data recovery task to the data recovery module with the maximum resource information parameter.
As an optional embodiment, the data recovery plan further includes: the priority information of the data recovery task, the allocation unit includes: and the distribution subunit is used for sequentially distributing the data recovery tasks to the data recovery module with the largest resource information parameter according to the priority of the data recovery tasks.
As an alternative embodiment, the apparatus further comprises: a result receiving module, configured to receive the task processing result returned by the data recovery module after the data recovery task is allocated to the data recovery module and executed by the data recovery module.
As an alternative embodiment, the apparatus further comprises: a sending module, configured to send the task processing result to the fragment service module after the data recovery task is allocated to the data recovery module and executed by the data recovery module; and a deleting module, configured to delete the task processing result.
As an optional embodiment, the apparatus further comprises: a node information receiving module, configured to receive task node information uploaded by the data recovery module while the data recovery module executes the data recovery task after the data recovery task is allocated to it, where the task node information includes one or more of the following items: the identifier, length, and offset of the input/output file; and an indication module, configured to, if the data recovery module fails, instruct the data recovery module after it recovers to continue executing the data recovery task from the task node recorded in the task node information.
Example 5
There is further provided, according to an embodiment of the present invention, a data recovery apparatus for implementing the data recovery method in embodiment 2, and fig. 7 is a schematic diagram of a data recovery apparatus according to embodiment 5 of the present application, as shown in fig. 7, where the apparatus 700 includes:
a receiving module 702, configured to receive a data recovery task allocated by a central node, where the central node forms the data recovery task according to a data recovery plan generated by the fragment service module.
And an execution module 704, configured to execute a data recovery task.
It should be noted here that the receiving module 702 and the executing module 704 correspond to steps S41 to S43 in embodiment 2; the two modules share the implementation examples and application scenarios of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.
As an alternative embodiment, the apparatus further comprises: and the sending module is used for sending the resource information corresponding to the data recovery module to the central node before executing the data recovery task.
As an alternative embodiment, executing the data recovery task includes performing data recovery on the files in the data recovery task, and the execution module includes either of the following: a first extraction module, configured to extract the valid data in a file to form a new file; a second extraction module, configured to extract the valid data in multiple files and merge the valid data into one new file.
As an alternative embodiment, the apparatus further comprises: a returning module, configured to return the task processing result to the central node after the data recovery task is executed.
Example 6
According to an embodiment of the present invention, there is further provided a data recovery apparatus for implementing the data recovery method in embodiment 3, and fig. 8 is a schematic diagram of a data recovery apparatus according to embodiment 6 of the present application, and as shown in fig. 8, the apparatus 800 includes:
a generating module 802, configured to generate a data recovery plan, where the data recovery plan is used to indicate a data recovery task that needs to be executed by the data recovery module.
The sending module 804 is configured to send the data recovery plan to the central node, where the central node constructs a data recovery task according to the data recovery plan, and the data recovery module executes the data recovery task.
It should be noted here that the generating module 802 and the sending module 804 correspond to steps S51 to S53 in embodiment 3, and the two modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure of the first embodiment. It should be noted that the modules described above as part of the apparatus may be run in the computer terminal 10 provided in the first embodiment.
As an optional embodiment, the apparatus further comprises: a sending submodule, configured to send query information to the central node after the data recovery plan is sent to the central node, where the query information is used to determine whether the data recovery task is completed; a receiving submodule, configured to receive the task processing result returned by the central node; and a returning submodule, configured to return a deletion instruction to the central node, where the deletion instruction instructs the central node to delete the task processing result.
Example 7
An embodiment of the present invention further provides a data recovery system. Fig. 9 is a schematic diagram of a data recovery system according to embodiment 7 of the present application. As shown in fig. 9, the system includes:
A fragment service module 90, configured to generate a data recovery plan, where the data recovery plan is used to indicate the data recovery task that needs to be executed by the data recovery module.
And the central node 92 is used for constructing a data recovery task according to the data recovery plan and distributing the data recovery task to the data recovery module.
A data reclamation module 94 for performing data reclamation tasks.
The central node 92 may also perform the other steps in embodiment 1; the data recovery module 94 may also perform the other steps in embodiment 2; and the fragment service module 90 may also perform the other steps in embodiment 3, which are not described again here.
Example 8
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps in the data recovery method: receiving a data recovery plan sent by a fragment service module, where the data recovery plan is used to indicate a data recovery task that needs to be executed by a data recovery module; constructing the data recovery task according to the data recovery plan; and allocating the data recovery task to the data recovery module, where the data recovery module executes the data recovery task.
Optionally, fig. 10 is a block diagram of a computer terminal according to embodiment 8 of the present invention. As shown in fig. 10, the computer terminal A may include: one or more processors 1002 (only one of which is shown), a memory 1004, and a transmission device 1006.
The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the data recovery method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the data recovery method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; and distributing the data recovery task to a data recovery module, and executing the data recovery task by the data recovery module.
Optionally, the data recovery plan includes at least one file to be subjected to data recovery.
Optionally, the processor may further execute the program code of the following steps: acquiring resource information corresponding to the data recovery module; and distributing the data recovery tasks to the data recovery modules according to the resource information.
Optionally, the resource information of the data recovery module includes one or more of the following items: the network state of the machine where the data recovery module is located, the residual capacity of a disk, the utilization rate of a central processing unit, the busy and idle degree of the current thread of the data recovery module and the state of the current execution task.
Optionally, the processor may further execute the program code of the following steps: acquiring the weight corresponding to each item of resource information; determining resource information parameters of a data recovery module according to the weight corresponding to each item of resource information and the parameters corresponding to each item of resource information; and distributing the data recovery task to the data recovery module with the maximum resource information parameter.
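As an illustration of the weighted selection described above, the following sketch computes a resource information parameter as a weighted sum and picks the module with the largest value. The weight values, the parameter names, and the convention that every parameter is normalized to the range 0 to 1 with larger meaning "more available" are assumptions for the example; the embodiment does not fix them.

```python
from typing import Dict


def resource_score(params: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the resource parameters of one data recovery module."""
    return sum(weights[name] * params[name] for name in weights)


def pick_module(modules: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> str:
    """Return the name of the module with the largest weighted score."""
    return max(modules, key=lambda name: resource_score(modules[name], weights))


# Hypothetical weights for the five items of resource information.
WEIGHTS = {
    "network": 0.2,      # network state of the machine
    "disk_free": 0.2,    # remaining disk capacity
    "cpu_idle": 0.3,     # 1 - CPU utilization
    "thread_idle": 0.2,  # how idle the module's threads are
    "task_idle": 0.1,    # 1.0 if no task is currently executing
}

MODULES = {
    "m1": {"network": 0.9, "disk_free": 0.5, "cpu_idle": 0.2,
           "thread_idle": 0.3, "task_idle": 0.0},
    "m2": {"network": 0.8, "disk_free": 0.7, "cpu_idle": 0.6,
           "thread_idle": 0.5, "task_idle": 1.0},
}

best = pick_module(MODULES, WEIGHTS)  # "m2" for these sample values
```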
Optionally, the data recovery plan further includes priority information of the data recovery task, and the processor may further execute the program code of the following step: sequentially distributing the data recovery tasks to the data recovery module with the largest resource information parameter according to the priority of the data recovery tasks.
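Priority-ordered distribution can be sketched with a heap. In this example a lower number means a higher priority, and each assignment lowers the chosen module's score by a fixed `cost` so that later tasks may land elsewhere; both conventions are assumptions introduced for illustration and are not stated in the embodiment.

```python
import heapq
from typing import Dict, List, Tuple


def assign_by_priority(tasks: List[Tuple[int, str]],
                       scores: Dict[str, float],
                       cost: float = 0.25) -> Dict[str, str]:
    """Assign tasks, highest priority first, to the best-scoring module.

    tasks  -- (priority, task_id) pairs; lower number = higher priority
    scores -- module name -> current resource information parameter
    cost   -- assumed score penalty applied to a module per assigned task
    """
    queue = list(tasks)
    heapq.heapify(queue)
    remaining = dict(scores)
    placement = {}
    while queue:
        _, task_id = heapq.heappop(queue)
        best = max(remaining, key=remaining.get)
        placement[task_id] = best
        remaining[best] -= cost
    return placement


placement = assign_by_priority(
    [(2, "t-low"), (0, "t-urgent"), (1, "t-mid")],
    {"m1": 0.9, "m2": 0.8},
)
```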
The processor may further execute the program code for: after the data recovery task is distributed to the data recovery module and the data recovery module executes the data recovery task, a task processing result returned by the data recovery module is received.
The processor may further execute program code for: after the data recovery task is sent to the data recovery module and the data recovery module executes the data recovery task, the task processing result is sent to the fragment service module; and deleting the task processing result.
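The result handling described above (keep the result returned by the data recovery module, pass it to the fragment service module, then delete it) might look like the sketch below. The `ResultStore` class and its method names are hypothetical; the embodiment does not prescribe how the central node stores results.

```python
from typing import Dict, Optional


class ResultStore:
    """Holds task processing results on the central node until collected."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def record(self, task_id: str, result: str) -> None:
        # Called when a data recovery module returns its result.
        self._results[task_id] = result

    def query(self, task_id: str) -> Optional[str]:
        # Returns None while the task has not finished yet.
        return self._results.get(task_id)

    def delete(self, task_id: str) -> None:
        # Called once the fragment service module has received the result.
        self._results.pop(task_id, None)


store = ResultStore()
store.record("task-1", "ok")
delivered = store.query("task-1")  # sent on to the fragment service module
store.delete("task-1")
```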
The processor may further execute program code for: after the data recovery task is distributed to the data recovery module, receiving task node information uploaded by the data recovery module in the process of executing the data recovery task by the data recovery module, wherein the task node information comprises one or more of the following items: identification, length and offset of the input/output file; and if the data recovery module fails, after the data recovery module recovers, the data recovery module is instructed to continue executing the data recovery task from the task node recorded by the task node information.
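The resume-from-task-node behaviour can be illustrated with a small simulation. Here the task node information carries a file identifier, an offset, and a length, and the resume point is taken to be `offset + length` of the last uploaded record; that interpretation, the chunked copy that stands in for real recovery work, and the `fail_after` switch used to simulate a failure are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskNodeInfo:
    file_id: str  # identification of the input/output file
    offset: int   # where the last processed range starts
    length: int   # how many bytes that range covers


class CheckpointTable:
    """Central-node record of the latest task node per task."""

    def __init__(self) -> None:
        self._latest: Dict[str, TaskNodeInfo] = {}

    def upload(self, task_id: str, info: TaskNodeInfo) -> None:
        self._latest[task_id] = info

    def resume_point(self, task_id: str) -> int:
        info = self._latest.get(task_id)
        return 0 if info is None else info.offset + info.length


def run_task(task_id: str, data: bytes, out: bytearray,
             table: CheckpointTable, chunk: int = 4,
             fail_after: Optional[int] = None) -> None:
    """Copy data into out chunk by chunk, uploading a task node each time.

    out plays the role of the persisted output file; fail_after simulates
    a module failure after that many chunks have been processed.
    """
    pos = table.resume_point(task_id)
    done = 0
    while pos < len(data):
        if fail_after is not None and done == fail_after:
            raise RuntimeError("simulated module failure")
        piece = data[pos:pos + chunk]
        out += piece
        table.upload(task_id, TaskNodeInfo("in.dat", pos, len(piece)))
        pos += len(piece)
        done += 1
```

Calling `run_task` a second time with the same `table` and `out` continues from the recorded offset instead of starting over, which is the point of uploading task node information during execution.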
It should be noted that, in the prior art, data recovery tasks are executed by the fragment service modules, and a fragment service module can only execute its data recovery tasks one by one in sequence. In the solution provided in the present application, where a plurality of data recovery modules are present, the central node can allocate a plurality of data recovery plans submitted by the same fragment service module to different data recovery modules, so that the plurality of data recovery modules execute the plans submitted by one fragment service module in parallel, thereby improving the efficiency of data recovery.
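The parallel execution of several plans from one fragment service module can be sketched with a thread pool, where each worker stands in for a separate data recovery module. The `recover` function and the round-robin mapping of plans to modules are placeholders assumed for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def recover(module: str, plan: str) -> str:
    # Stand-in for the real recovery work done by one module.
    return f"{plan} recovered by {module}"


def run_in_parallel(plans: List[str], modules: List[str]) -> Dict[str, str]:
    """Run every plan on some module, with the modules working concurrently."""
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        futures = {
            plan: pool.submit(recover, modules[i % len(modules)], plan)
            for i, plan in enumerate(plans)
        }
        # Collect results once each submitted plan has finished.
        return {plan: future.result() for plan, future in futures.items()}


results = run_in_parallel(["p1", "p2", "p3"], ["m1", "m2"])
```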
It should be further noted that, according to the scheme provided by the present application, even if the fragment service module is migrated, the data recovery task is executed by the data recovery module, so that the migration of the fragment service module does not affect the processing of the data recovery task, and thus the waste of computing resources caused by the migration of the fragment service module is not generated.
Therefore, the embodiment of the present application performs scheduling from the perspective of the whole cluster, reduces resource waste to the greatest extent, improves the efficiency and the success rate of data recovery, and at the same time allows a plurality of data recovery tasks of a single fragment service module to be executed concurrently. The embodiment thereby solves the technical problem in the prior art that, after a fragment service module is migrated, the data recovery task needs to be executed again, which wastes computing resources.
It can be understood by those skilled in the art that the structure shown in Fig. 10 is only illustrative, and the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 10 does not limit the structure of the electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in Fig. 10, or have a different configuration than shown in Fig. 10.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 9
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store the program code for executing the data recovery method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; and distributing the data recovery task to a data recovery module, and executing the data recovery task by the data recovery module.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be considered to fall within the protection scope of the present invention.

Claims (18)

1. A method of data recovery, comprising:
receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module;
constructing a data recovery task according to the data recovery plan;
distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module;
wherein distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module, comprises: distributing a plurality of data recovery tasks to a plurality of different data recovery modules, the plurality of data recovery tasks being executed in parallel by the plurality of data recovery modules.
2. The method of claim 1, wherein the data recovery plan includes at least one file on which data recovery is to be performed.
3. The method of claim 1, wherein distributing the data recovery task to the data recovery module, the data recovery task being executed by the data recovery module, comprises:
acquiring resource information corresponding to the data recovery module;
and distributing the data recovery task to the data recovery module according to the resource information.
4. The method of claim 3, wherein the resource information of the data recovery module includes one or more of: the network state of the machine where the data recovery module is located, the residual capacity of a disk, the utilization rate of a central processing unit, the busy and idle degree of the current thread of the data recovery module, and the state of the current execution task.
5. The method of claim 4, wherein distributing the data recovery task to the data recovery module according to the resource information comprises:
acquiring the weight corresponding to each item of resource information;
determining resource information parameters of the data recovery module according to the weight corresponding to each resource information and the parameters corresponding to each resource information;
and distributing the data recovery task to the data recovery module with the maximum resource information parameter.
6. The method of claim 5, wherein the data recovery plan further comprises priority information of the data recovery task, and distributing the data recovery task to the data recovery module with the maximum resource information parameter comprises:
and sequentially distributing the data recovery tasks to the data recovery module with the maximum resource information parameter according to the priority of the data recovery tasks.
7. The method of claim 1, wherein after assigning the data reclamation task to a data reclamation module, the data reclamation task being performed by the data reclamation module, the method further comprises:
and receiving a task processing result returned by the data recovery module.
8. The method of claim 7, wherein after sending the data reclamation task to a data reclamation module, the data reclamation task being performed by the data reclamation module, the method further comprises:
sending a task processing result to the fragment service module;
and deleting the task processing result.
9. The method of claim 1, wherein after assigning the data reclamation task to a data reclamation module, the method further comprises, during execution of the data reclamation task by the data reclamation module:
receiving task node information uploaded by the data recovery module, wherein the task node information comprises one or more of the following items: identification, length and offset of the input/output file;
and if the data recovery module fails, after the data recovery module recovers, the data recovery module is instructed to continue to execute the data recovery task from the task node recorded by the task node information.
10. A method of data recovery, comprising:
receiving a data recovery task distributed by a central node, wherein the central node forms the data recovery task according to a data recovery plan generated by a fragment service module;
executing the data recovery task;
wherein executing the data recovery task comprises: the data recovery tasks are distributed to different data recovery modules, and the data recovery tasks are executed in parallel by the data recovery modules.
11. The method of claim 10, wherein prior to performing the data reclamation task, the method further comprises: and sending the resource information corresponding to the data recovery module to the central node.
12. The method of claim 10, wherein performing the data reclamation task comprises performing data reclamation on files in the data reclamation task, wherein performing data reclamation on files in the data reclamation task comprises any one of:
extracting effective data in the file to form a new file;
and extracting effective data in the files, and combining the effective data into a new file.
13. The method of claim 10, wherein after performing the data reclamation task, the method further comprises: and returning a task processing result to the central node.
14. A method of data recovery, comprising:
generating a data recovery plan, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module;
sending the data recovery plan to a central node, wherein the central node constructs a data recovery task according to the data recovery plan, and the data recovery task is executed by a data recovery module;
wherein the central node constructing a data recovery task according to the data recovery plan, and the data recovery task being executed by a data recovery module, comprises: the central node distributing a plurality of data recovery tasks to a plurality of different data recovery modules, the plurality of data recovery modules executing the data recovery tasks in parallel.
15. The method of claim 14, wherein after sending the data reclamation plan to the central node, the method further comprises:
sending query information to the central node, wherein the query information is used for determining whether the data recovery task is executed and completed;
receiving a task processing result returned by the central node;
and returning a deleting instruction to the central node, wherein the deleting instruction is used for indicating the central node to delete the task processing result.
16. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus on which the storage medium is located to perform the steps of: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module;
wherein the allocating the data recovery task to the data recovery module, the executing the data recovery task by the data recovery module includes: the data recovery tasks are distributed to the data recovery modules, and the data recovery tasks are executed by the data recovery modules in parallel.
17. A processor for running a program, wherein the program when run performs the steps of: receiving a data recovery plan sent by a fragment service module, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module; constructing a data recovery task according to the data recovery plan; distributing the data recovery task to the data recovery module, and executing the data recovery task by the data recovery module;
wherein the allocating the data recovery task to the data recovery module, the executing the data recovery task by the data recovery module includes: the data recovery tasks are distributed to the data recovery modules, and the data recovery tasks are executed by the data recovery modules in parallel.
18. A data recovery system comprising:
the fragment service module is used for generating a data recovery plan, wherein the data recovery plan is used for indicating a data recovery task which needs to be executed by a data recovery module;
the central node is used for constructing a data recovery task according to the data recovery plan and distributing the data recovery task to the data recovery module;
the data recovery module is used for executing the data recovery task;
wherein the central node allocates the data recovery tasks to the data recovery modules by: a plurality of the data recovery tasks are distributed to a plurality of different data recovery modules;
the data recovery module executes the data recovery task by: executing the plurality of data recovery tasks in parallel by the plurality of data recovery modules.
CN201810763447.2A 2018-07-12 2018-07-12 Data recovery method and system Active CN110716690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810763447.2A CN110716690B (en) 2018-07-12 2018-07-12 Data recovery method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810763447.2A CN110716690B (en) 2018-07-12 2018-07-12 Data recovery method and system

Publications (2)

Publication Number Publication Date
CN110716690A CN110716690A (en) 2020-01-21
CN110716690B true CN110716690B (en) 2023-02-28

Family

ID=69208372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810763447.2A Active CN110716690B (en) 2018-07-12 2018-07-12 Data recovery method and system

Country Status (1)

Country Link
CN (1) CN110716690B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11693743B2 (en) * 2020-08-13 2023-07-04 EMC IP Holding Company LLC Method to optimize restore based on data protection workload prediction
CN116841453A (en) * 2022-03-25 2023-10-03 中移(苏州)软件技术有限公司 Data recovery method, system, device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354720A (en) * 2008-09-04 2009-01-28 中兴通讯股份有限公司 Distributed memory database data system and sharing method thereof
CN104360824A (en) * 2014-11-10 2015-02-18 北京奇虎科技有限公司 Data merging method and device
CN106202138A (en) * 2015-06-01 2016-12-07 三星电子株式会社 Storage device and method for autonomous space compression
CN106844650A (en) * 2017-01-20 2017-06-13 中国科学院计算技术研究所 A kind of daily record merges the merging method and system of tree
CN108021702A (en) * 2017-12-26 2018-05-11 百度在线网络技术(北京)有限公司 Classification storage method, device, OLAP database system and medium based on LSM-tree

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977805B2 (en) * 2009-03-25 2015-03-10 Apple Inc. Host-assisted compaction of memory blocks
WO2015066085A1 (en) * 2013-10-28 2015-05-07 Bawaskar Swapnil Prakash Selecting files for compaction

Also Published As

Publication number Publication date
CN110716690A (en) 2020-01-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant