CN113312147A

CN113312147A - Method and system for migrating object storage across cluster mass data

Info

Publication number: CN113312147A
Application number: CN202110654199.XA
Authority: CN
Inventors: 张致江; 凌震华; 王智国; 王芝斌
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2021-08-27
Anticipated expiration: 2041-06-11
Also published as: CN113312147B

Abstract

The invention discloses an object storage cross-cluster mass data migration method and system, wherein the method comprises the following steps: step S1, receiving a migration task request sent by a user and a request to create a subtask; step S2, generating a configuration file for the migration task according to the information of the created subtask, and storing the information of the created subtask and the configuration file of the migration task into a task queue of a back-end database using OSS object storage; step S3, scanning the task queue stored in the back-end database at a preset interval, and scheduling and executing the migration tasks in the waiting state and the paused state; step S4, running the scheduled migration task in any one of a Docker container, a K8S job and a process; step S5, starting a migration plug-in of the corresponding type according to the type of the migration task to perform the migration operation; and step S6, the started migration plug-in completing the data migration in a multi-level task manner. The method enables flexible cross-cluster migration of mass data.

Description

Method and system for migrating object storage across cluster mass data

Technical Field

The invention relates to the field of cloud storage data migration, and in particular to an object storage cross-cluster mass data migration method and system.

Background

Data migration tools are an essential product for large cloud service vendors; for example, Microsoft Azure, Alibaba Cloud, Huawei Cloud and Tencent Cloud each have their own data migration tool. Various storage clusters and frameworks also have their own data migration and balancing schemes.

At present, the DTS (Data Transmission Service) scheme of Alibaba Cloud focuses on data migration between OSS clusters: a cloud central service provides data migration among the different OSS stores within Alibaba Cloud OSS. It supports neither data migration between heterogeneous clusters nor data migration between non-Alibaba Cloud storage systems.

Huawei Cloud migration performs data migration in a transactional manner, and all operations are rolled back when the migration fails.

Existing cloud storage data migration technologies mostly use a centralized service as the migration center and mostly target migration among specific types of data clusters; if a different requirement arises, a new migration system has to be built. In addition, apart from those of a few public cloud vendors, most migration systems are built temporarily in response to a requirement and then abandoned. Furthermore, the prior art rarely implements data migration among heterogeneous clusters, or implements it only for special requirements.

Disclosure of Invention

In view of the problems in the prior art, the invention aims to provide an object storage cross-cluster mass data migration method and system, which solve the problem that most existing cloud storage data migration systems either do not support data migration among heterogeneous clusters or support it only for special requirements.

The purpose of the invention is realized by the following technical scheme:

the embodiment of the invention provides a method for migrating object storage cross-cluster mass data, which comprises the following steps:

step S1, receiving a migration task request sent by a user and a request to create a subtask sent by a migration task;

step S2, generating a configuration file for the migration task to be created according to the information of the subtask to be created, and storing that information and the configuration file in a back-end database using OSS object storage;

step S3, periodically scanning the task information stored in the back-end database, and scheduling the migration tasks in the waiting state and the paused state for execution;

step S4, running the scheduled migration task in any one of a Docker container, a K8S job and a process;

step S5, starting a task instance with a migration plug-in of the corresponding type according to the type of the task to perform the migration operation;

and step S6, the migration plug-in completing the data migration in a multi-level task manner.

An embodiment of the present invention provides an object storage cross-cluster mass data migration system, which is used for implementing the method of the present invention, and includes:

an entry service unit, a migration task scheduling unit, a migration task execution unit and a back-end database; wherein:

the entry service unit is in communication connection with the back-end database, and can receive a migration task request sent by a user and a subtask creation request sent by a migration task, generate a configuration file for the migration task according to the information of the created subtask, and store the information of the created subtask and the configuration file of the migration task into a task queue of the back-end database using OSS object storage;

the migration task scheduling unit is in communication connection with the back-end database, and can scan the task queue stored in the back-end database at a preset interval and schedule for execution the migration tasks in the waiting state and the paused state, the scheduled migration task running in any one of a Docker container, a K8S job and a process;

the migration task execution unit is in communication connection with the migration task scheduling unit and the back-end database respectively, and can start a migration plug-in of the corresponding type according to the type of the migration task to perform the migration operation, the migration plug-in completing the data migration in a multi-level task manner.

According to the technical scheme provided by the invention, the object storage cross-cluster mass data migration method and system provided by the embodiments of the invention have the following beneficial effects:

a migration plug-in of the corresponding type is started according to the type of the migration task to perform the migration operation, so mass data migration among heterogeneous clusters can be conveniently achieved. Because the heterogeneous data migration plug-ins are based on cloud-native technology, the migration task service components can be deployed and destroyed automatically with one click, and data migration for different storage systems is supported through plug-in access.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

Fig. 1 is a flowchart of an object storage cross-cluster mass data migration method according to an embodiment of the present invention;

FIG. 2 is a block diagram of an object storage cross-cluster mass data migration system provided by an embodiment of the present invention;

FIG. 3 is a block diagram of an entry service unit of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 4 is a block diagram of a migration task scheduling unit of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 5 is a block diagram of a migration task execution unit of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

FIG. 6 is a flowchart illustrating a processing procedure of an entry service unit of the object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 7 is a flowchart illustrating a task scheduling unit processing of the object storage cross-cluster mass data migration system according to an embodiment of the present invention;

FIG. 8 is a flowchart illustrating processing of a task execution unit of the object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 9 is a flowchart of an entry service processing of an object storage cross-cluster mass data migration method according to an embodiment of the present invention;

fig. 10 is a schematic diagram illustrating a specific configuration of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 11 is a flowchart of task generation and scheduling in the method for migrating mass data across clusters in object storage according to the embodiment of the present invention;

fig. 12 is a schematic logical structure diagram of an account-level data migration plug-in of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 13 is a schematic logical structure diagram of a container/bucket level data migration plug-in of an object storage cross-cluster mass data migration system according to an embodiment of the present invention;

fig. 14 is a schematic logical structure diagram of a FileList-level data migration plug-in of the object storage cross-cluster mass data migration system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific contents of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.

Referring to fig. 1, an embodiment of the present invention provides a method for migrating object storage across cluster mass data, including:

In step S6 of the method, the migration plug-in completes data migration in a multi-level task manner as follows:

step S61), scheduling the corresponding migration task image according to the received user migration task request; if the user needs to migrate a whole account, starting an account migration task, which traverses all buckets (i.e., containers or buckets) and creates bucket migration tasks;

step S62), after a created bucket migration task is scheduled, traversing all files in the bucket and creating one file migration task for every n files, where n is a preset number;

step S63), comparing the source file in the source cluster with the target file in the target cluster; if the two are the same, skipping the copy and migration of the file, and if they differ, copying and migrating the file.
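
The account, bucket and file levels of steps S61) and S62) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the source cluster is modelled as an in-memory dict, and the names `account_task`, `bucket_task` and `plan_migration` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for a source cluster: bucket name -> object keys.
SourceCluster = Dict[str, List[str]]


def account_task(source: SourceCluster) -> Iterator[str]:
    """Account level: traverse all buckets and yield one bucket task per bucket."""
    for bucket in sorted(source):
        yield bucket


def bucket_task(source: SourceCluster, bucket: str, n: int) -> Iterator[List[str]]:
    """Bucket level: traverse the files in a bucket and yield one file task per n files."""
    if n <= 0:
        raise ValueError("n must be positive")
    keys = sorted(source[bucket])
    for start in range(0, len(keys), n):
        yield keys[start:start + n]


def plan_migration(source: SourceCluster, n: int) -> List[Tuple[str, List[str]]]:
    """Expand an account migration into (bucket, file batch) file tasks."""
    plan = []
    for bucket in account_task(source):
        for batch in bucket_task(source, bucket, n):
            plan.append((bucket, batch))
    return plan


source = {"logs": ["a", "b", "c"], "images": ["x"]}
plan = plan_migration(source, n=2)
```

In the described system each level would be a separately scheduled task rather than a nested loop; the sketch only shows how one account fans out into bucket tasks and then into file tasks of at most n files.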

The method further comprises the following steps:

step S7, failure recovery: all scheduled tasks periodically insert a checkpoint into the back-end database, and when a task is rescheduled after a failure, it resumes from the last completed checkpoint;

the checkpoint is an internal event that, when activated, triggers the database write process to write the dirty data blocks in the data buffer out to the data files.
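
A minimal sketch of the resume-from-checkpoint behaviour of step S7 follows. It is illustrative only: `CheckpointStore` is a hypothetical stand-in for the back-end database, and the checkpoint is written every `every` items rather than on a timer so that the example is deterministic.

```python
from typing import Callable, Dict, List, Optional


class CheckpointStore:
    """Hypothetical stand-in for the back-end database: task id -> last completed index."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def save(self, task_id: str, index: int) -> None:
        self._rows[task_id] = index

    def load(self, task_id: str) -> int:
        # -1 means that no checkpoint has been written for this task yet.
        return self._rows.get(task_id, -1)


def run_task(task_id: str, items: List[str], store: CheckpointStore,
             handle: Callable[[str], None], every: int = 2,
             fail_at: Optional[int] = None) -> None:
    """Process items, resuming after the last checkpoint recorded for task_id."""
    start = store.load(task_id) + 1
    for i in range(start, len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        handle(items[i])
        # Checkpoint every `every` items and once more at the very end.
        if (i + 1) % every == 0 or i == len(items) - 1:
            store.save(task_id, i)


store = CheckpointStore()
processed: List[str] = []
files = ["f0", "f1", "f2", "f3", "f4"]
try:
    run_task("t1", files, store, processed.append, every=2, fail_at=3)
except RuntimeError:
    pass  # the scheduler would reschedule the task here
run_task("t1", files, store, processed.append, every=2)
```

Work done after the last checkpoint is repeated on resume (here `f2` is handled twice). That is harmless when, as in step S63), a file identical on both sides is skipped.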

Referring to fig. 2, an embodiment of the present invention further provides an object storage cross-cluster mass data migration system, configured to implement the foregoing method, where the object storage cross-cluster mass data migration system includes:

an entry service unit 1, a migration task scheduling unit 2, a migration task execution unit 3 and a back-end database 4; wherein:

Fig. 2 shows the logical architecture of the overall system, which is divided into three main logical units: the entry service unit 1, the migration task scheduling unit 2 and the migration task execution unit 3; wherein:

the entry service unit receives and executes instructions issued by a user or issued during task execution;

the migration task scheduling unit schedules the migration tasks according to a scheduling algorithm so that the tasks execute concurrently;

the migration task execution unit is the actual executor of the migration task; it accesses the source and target clusters to collect information and performs the data copy.

Referring to fig. 3, in the above system, the entry service unit 1 includes:

a task creation request module 11, a task request submodule 12, a configuration information generating module 13, a queue insertion module 14, a database operation module 15, a task scheduling module 16, a container cloud resource adjusting module 17 and a container starting module 18; wherein:

the task creation request module is used for receiving a migration task request sent by a user and creating a migration task according to the migration task request;

the task request submodule is in communication connection with the task creation request module and is used for creating corresponding subtasks according to the migration task;

the configuration information generating module is in communication connection with the task creation request module and the task request submodule respectively, and is used for generating the configuration file of the migration task according to the information of the created subtasks;

the queue insertion module is in communication connection with the task creation request module, the task request submodule and the configuration information generating module respectively, and inserts the information of the created subtasks and of the configuration file of the migration task into a task queue;

the database operation module is in communication connection with the queue insertion module and stores each task of the task queue into the back-end database using OSS object storage;

the task scheduling module is in communication connection with the migration task scheduling unit, the database operation module and the container cloud resource adjusting module respectively; it can send a task scheduling request to the migration task scheduling unit, receive the migration task scheduled by the migration task scheduling unit, and forward that migration task to the database operation module and the container cloud resource adjusting module;

the container cloud resource adjusting module is in communication connection with the task scheduling module and is used for adjusting container cloud resources according to the migration task sent by the task scheduling module;

the container starting module is in communication connection with the container cloud resource adjusting module and is used for starting and running the scheduled migration task in any one of a Docker container, a K8S job and a process.
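
The flow through the first modules above (create the task, generate its configuration file, insert it into the task queue) can be sketched as follows. This is a minimal sketch, not the patented implementation: the class `EntryService`, the field names, the `configs/task-<id>.json` key layout, and the use of a dict and a deque in place of OSS object storage and the back-end database are all assumptions made for illustration.

```python
import json
from collections import deque
from typing import Deque, Dict, Optional


class EntryService:
    """Hypothetical entry service: generates a config file and enqueues a task record."""

    def __init__(self) -> None:
        self.object_store: Dict[str, str] = {}  # stand-in for OSS: key -> config content
        self.task_queue: Deque[dict] = deque()  # stand-in for the database task queue
        self._next_id = 1

    def create_task(self, task_type: str, source: str, target: str,
                    parent_id: Optional[int] = None) -> dict:
        task_id = self._next_id
        self._next_id += 1
        # Generate the configuration file and store it as an object.
        config_key = f"configs/task-{task_id}.json"
        config = {"task_id": task_id, "type": task_type,
                  "source": source, "target": target}
        self.object_store[config_key] = json.dumps(config, sort_keys=True)
        # Insert the task record into the queue in the waiting state.
        record = {"id": task_id, "type": task_type, "parent_id": parent_id,
                  "state": "waiting", "config_key": config_key}
        self.task_queue.append(record)
        return record


service = EntryService()
# A user request creates the top-level task; a running task requests a subtask.
root = service.create_task("account", "cluster-a/acct1", "cluster-b/acct1")
sub = service.create_task("bucket", "cluster-a/acct1/logs",
                          "cluster-b/acct1/logs", parent_id=root["id"])
```

The same `create_task` entry point serves both user requests and subtask requests issued by running tasks, which mirrors the two request sources described for the entry service unit.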

Fig. 6 illustrates the processing flow of the entry service unit.

Referring to fig. 4, in the above system, the migration task scheduling unit 2 includes:

a task scanning and judging module 21 and a migration task determining module 22; wherein:

the task scanning and judging module is in communication connection with the migration task determining module, and can scan the task queue stored in the back-end database at a preset interval and send the migration tasks in the waiting state and the paused state to the migration task determining module;

and the migration task determining module schedules and executes the migration tasks sent by the task scanning and judging module.
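
One scan of the task queue can be sketched as below, assuming the queue is a relational table. The table name `task_queue`, its columns and the state strings are hypothetical, and an in-memory SQLite database stands in for the back-end database.

```python
import sqlite3
from typing import List, Tuple


def open_queue() -> sqlite3.Connection:
    """Create an in-memory task queue table (a stand-in for the back-end database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_queue ("
        "id INTEGER PRIMARY KEY, type TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def scan_once(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Select waiting and paused tasks, mark them running, and return them."""
    rows = conn.execute(
        "SELECT id, type FROM task_queue "
        "WHERE state IN ('waiting', 'paused') ORDER BY id"
    ).fetchall()
    for task_id, _ in rows:
        conn.execute("UPDATE task_queue SET state = 'running' WHERE id = ?", (task_id,))
    conn.commit()
    return rows


conn = open_queue()
conn.executemany(
    "INSERT INTO task_queue (type, state) VALUES (?, ?)",
    [("account", "waiting"), ("bucket", "running"),
     ("filelist", "paused"), ("bucket", "finished")],
)
first_scan = scan_once(conn)
second_scan = scan_once(conn)
```

A scheduler loop would call `scan_once` once per preset interval and hand each returned task to the execution unit; the second scan returns nothing because the eligible tasks are already marked as running.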

Fig. 7 illustrates a processing flow of the migration task scheduling unit described above.

Referring to fig. 5, in the above system, the migration task execution unit 3 includes:

a migration plug-in selection module 31 and a plurality of migration plug-in modules 32, 33, …, 3n; wherein:

the migration plug-in selection module is in communication connection with each migration plug-in module respectively, and can start the migration plug-in of the corresponding type according to the type of the migration task to perform the migration operation;

and each migration plug-in module can complete the data migration in a multi-level task manner after being started by the migration plug-in selection module.
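
Selecting a plug-in by task type can be sketched with a small registry. The decorator `plugin`, the registry `PLUGINS` and the three handler functions are hypothetical names used only for illustration; the handlers return a string instead of performing a migration.

```python
from typing import Callable, Dict

# Registry: task type -> plug-in entry point.
PLUGINS: Dict[str, Callable[[dict], str]] = {}


def plugin(task_type: str):
    """Register a function as the migration plug-in for one task type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        PLUGINS[task_type] = func
        return func
    return register


@plugin("account")
def migrate_account(config: dict) -> str:
    return f"account:{config['account']}"


@plugin("bucket")
def migrate_bucket(config: dict) -> str:
    return f"bucket:{config['bucket']}"


@plugin("filelist")
def migrate_filelist(config: dict) -> str:
    return f"filelist:{len(config['files'])} files"


def start_plugin(task: dict) -> str:
    """Plug-in selection: start the plug-in that matches the task type."""
    try:
        handler = PLUGINS[task["type"]]
    except KeyError:
        raise ValueError(f"no migration plug-in for task type {task['type']!r}") from None
    return handler(task["config"])


result = start_plugin({"type": "bucket", "config": {"bucket": "logs"}})
```

Supporting a new storage system then amounts to registering one more handler, without changing the selection logic.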

In the above system, the migration task execution unit further includes a failure recovery module, which is in communication connection with the back-end database and each migration plug-in respectively. It enables a plug-in module, while migrating the scheduled file migration tasks, to insert a checkpoint into the back-end database at a preset interval; when a file migration task is rescheduled after a failure, the plug-in module resumes from the last completed checkpoint. The checkpoint is an internal event that, when activated, triggers the back-end database write process to write the dirty data blocks in the data buffer out to the data files.

Fig. 8 illustrates a processing flow of the migration task execution unit described above.

The data migration system is a fully elastic, plug-in based cross-cluster data migration system. By providing different types of migration plug-ins that are selected according to the type of the migration task, it can be scheduled elastically: the migration system occupies no resources when idle, is allocated the necessary resources when it needs to work, and releases them once the work is done. Different migration requirements are met through migration plug-ins, so data migration among various heterogeneous clusters is supported. When a task fails, it can continue from the point of interruption.

The embodiments of the present invention are described in further detail below.

Referring to fig. 1, 9, 10 and 11, the object storage cross-cluster mass data migration method of the present invention includes the following steps:

step S1, a service is built on the InterfaceServer to receive migration task requests sent by users; in addition to task creation requests from users, the service also receives subtask creation requests sent by tasks;

step S2, the InterfaceServer writes the information of the task to be created into the back-end database, generates the configuration file of the task to be created, and uploads the configuration file to the back-end unified storage, which in this embodiment is OSS object storage;

step S3, the task scheduler periodically scans the task queue in the back-end database and schedules the tasks in the waiting and paused states for execution;

step S4, the scheduled migration task may run in a Docker container, may be started as a K8S job, or may run as a process spawned on a physical machine;

step S5, the task scheduler schedules different plug-ins to start task instances according to the type of the task; in this embodiment, the image of the corresponding Docker container is downloaded according to the task type, and a job instance is started to execute the migration operation;

step S6, the migration plug-in is implemented in a multi-level task manner, specifically:

step S61), after receiving the user task, scheduling the corresponding migration task image according to the requirement of the user task; if the user needs to migrate a whole account, starting an account migration task, which traverses all containers and creates container migration tasks;

step S62), after a created container migration task is scheduled, traversing the files in the container and creating one file migration task for every n files, where n is the specified number of files;

step S63), after a created file migration task is scheduled, comparing the source file in the source cluster with the target file in the target cluster; if the two are the same, skipping the copy and migration of the source file, and if they differ, copying and migrating the file;

step S7, failure recovery: all scheduled tasks periodically insert a checkpoint into the database, and when a task is rescheduled after a failure, it restarts from the last completed checkpoint (see fig. 9).
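
The compare-then-copy behaviour of step S63) can be sketched as follows. The text does not specify how two files are compared; comparing size and then a SHA-256 digest is an assumption of this sketch, as are the function names and the use of dicts as stand-ins for the source and target clusters.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Content fingerprint used for the comparison (an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def migrate_files(source: Dict[str, bytes], target: Dict[str, bytes],
                  keys: List[str]) -> List[str]:
    """Copy each listed file unless the target already holds identical content."""
    copied = []
    for key in keys:
        src = source[key]
        dst = target.get(key)
        # Same size and same digest: treat the files as identical and skip the copy.
        if dst is not None and len(dst) == len(src) and digest(dst) == digest(src):
            continue
        target[key] = src
        copied.append(key)
    return copied


source = {"a": b"1", "b": b"2", "c": b"3"}
target = {"a": b"1", "b": b"old"}
first_run = migrate_files(source, target, ["a", "b", "c"])
second_run = migrate_files(source, target, ["a", "b", "c"])
```

Because identical files are skipped, running the same file migration task again is safe, which is what allows a task restarted from a checkpoint to repeat part of its work.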

The steps of the method are implemented as follows. Fig. 9 illustrates the execution flow of the InterfaceServer: creating tasks and subtasks for the user, and executing and scheduling operation and maintenance instructions. Fig. 10 illustrates a specific configuration of the object storage cross-cluster mass data migration system;

the InterfaceServer is implemented and deployed; it may receive user requests through a REST API, or in an RPC-based or CMD (command line) mode;

(1) after receiving a task creation request, the InterfaceServer writes the information of the task to be created into a database, or into a publicly readable storage or memory block, and sets the state to waiting;

(2) the scheduler schedules a task process of the corresponding type on a physical machine according to the task information in the database or the publicly readable storage or memory block, passes along the information the task requires, and sets the state to running;

(3) the scheduler can alternatively schedule a corresponding job or pod in K8S and set the state to running;

(4) the scheduler can alternatively schedule the corresponding task in a Docker container and set the state to running;

(5) after the task finishes, it closes its own process/job/pod and sets the state to finished;

(6) implementation of migration plug-ins:

referring to fig. 12, (61) account level data migration plug-in:

(611) writing a checkpoint into a database or a shared storage after the task is started;

(612) listing all container/bucket information under the account from the source cluster according to the task configuration information;

(613) applying to the InterfaceServer to create a container level migration task for each container returned by the listing;

(614) each time a certain number of containers has been scanned, creating a checkpoint record and writing it into the shared storage or database;

referring to fig. 13, (62) container level data migration plug-in:

(621) writing a checkpoint into a database or a shared storage after the task is started;

(622) listing all file information under the container from the source cluster according to the task configuration information;

(623) organizing every 1000 (or another number of) file entries output by the listing into a database table or a serialized file, and applying to the InterfaceServer with this information to create a FileList level migration task;

(624) each time a certain number of FileLists has been scanned, creating a checkpoint record and writing it into the shared storage or database;

referring to FIG. 14, (63) FileList level data migration plug-in:

(631) writing a checkpoint into a database or a shared storage after the task is started;

(632) acquiring a database table or a serialized file corresponding to the task according to the task configuration information;

(633) copying the file from the source end cluster to the target cluster according to the acquired file list;

(634) terminating itself after the copy is finished.
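
The task states named in items (1) to (5) above can be sketched as a small state machine. The transition from running to paused is an assumption of this sketch (the text only says that paused tasks are picked up again by the scheduler), and the names `State`, `ALLOWED` and `transition` are hypothetical.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"


# Allowed transitions; RUNNING -> PAUSED is assumed, for example after a failure.
ALLOWED = {
    State.WAITING: {State.RUNNING},
    State.PAUSED: {State.RUNNING},
    State.RUNNING: {State.PAUSED, State.FINISHED},
    State.FINISHED: set(),
}


def transition(current: State, new: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


state = State.WAITING
state = transition(state, State.RUNNING)   # the scheduler starts the task
state = transition(state, State.PAUSED)    # the task is interrupted
state = transition(state, State.RUNNING)   # the scheduler picks it up again
state = transition(state, State.FINISHED)  # the task closes itself
```

Keeping the allowed transitions in one table makes it explicit that only waiting and paused tasks are eligible for scheduling and that a finished task is never restarted.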

The migration system and the migration method at least have the following advantages:

(1) the invention adopts a mainstream containerization scheme and can easily implement K8S-based task orchestration, thereby achieving fully elastic scheduling; the design matches the characteristics of migration tasks: tasks arrive in relatively concentrated bursts, and no system resources are needed when there are no tasks;

(2) the invention adopts a plug-in extension scheme, which makes it easy to support new requirements and to migrate between clusters of different types, rather than, as in traditional methods, only migrating data within the same kind of cluster;

(3) the invention realizes automatic fault recovery based on checkpoints, with fast recovery, easily traceable recovery points, little information recorded per recovery point, and a simple implementation.

Those of ordinary skill in the art will understand that: all or part of the processes in the method according to the embodiments of the present invention may be implemented by a program that can be stored in a computer-readable storage medium and that, when executed, can include the processes according to the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for migrating object storage across cluster mass data is characterized by comprising the following steps:

step S1, receiving a migration task request sent by a user and a subtask creation request sent by a migration task;

step S2, generating a configuration file for the migration task according to the information of the created subtask, and storing the information of the created subtask and the configuration file of the migration task into a task queue of a back-end database using OSS object storage;

step S3, scanning the task queue stored in the back-end database at a preset interval, and scheduling and executing the migration tasks in the waiting state and the paused state;

step S4, running the scheduled migration task in any one of a Docker container, a K8S job and a process;

step S5, starting a migration plug-in of the corresponding type according to the type of the migration task to perform the migration operation;

and step S6, the started migration plug-in completing the data migration in a multi-level task manner.

2. The method for migrating the object storage across the cluster mass data according to claim 1, wherein in step S6, the migration plug-in completes the data migration in a multi-level task manner as follows:

step S61), scheduling the corresponding migration task image according to the received user migration task request; if the user needs to migrate a whole account, starting an account migration task, which traverses all buckets and creates bucket migration tasks;

step S62), after a created bucket migration task is scheduled, traversing all files in the bucket and creating one file migration task for every n files, wherein n is a preset number;

step S63), comparing the source file in the source cluster with the target file in the target cluster; skipping the copy and migration of the source file if the two are the same, and copying and migrating the source file to the target cluster as the target file if they differ.

3. The method for migrating the object storage across the cluster mass data according to claim 1 or 2, wherein the method further comprises:

step S7, failure recovery: all scheduled file migration tasks insert checkpoints into the back-end database at a preset interval, and when a file migration task is rescheduled after a failure, it restarts from the last completed checkpoint;

the checkpoint is an internal event that, when activated, triggers the back-end database write process to write the dirty data blocks in the data buffer out to the data files.

4. The method for migrating the mass data of the object storage across the clusters according to claim 1 or 2, wherein in step S5, the migration plug-in of the corresponding type comprises:

any of an account level data migration plug-in, a bucket level data migration plug-in, and a file list level data migration plug-in.

5. An object storage cross-cluster mass data migration system for implementing the method of any one of claims 1 to 3, comprising:

6. The system for migrating object storage across cluster mass data according to claim 5, wherein said entry service unit comprises:

a task creation request module, a task request submodule, a configuration information generating module, a queue insertion module, a database operation module, a task scheduling module, a container cloud resource adjusting module and a container starting module; wherein:

7. The system for migrating the object storage across the cluster mass data according to claim 5 or 6, wherein the migration task scheduling unit comprises:

a task scanning and judging module and a migration task determining module; wherein:

8. The system for migrating the mass data of the object storage across the clusters according to claim 5 or 6, wherein the migration task execution unit comprises:

a migration plug-in selection module and a plurality of migration plug-in modules; wherein:

the migration plug-in selection module is in communication connection with each migration plug-in module respectively, and can start the migration plug-in module of the corresponding type according to the type of the migration task to perform the migration operation;

9. The system for migrating object storage across cluster mass data according to claim 8, wherein the migration task execution unit further comprises:

a failure recovery module, which is in communication connection with the back-end database and each migration plug-in respectively, and which enables a plug-in module, while migrating the scheduled file migration tasks, to insert a checkpoint into the back-end database at a preset interval, and, when a file migration task is rescheduled after a failure, to restart from the last completed checkpoint;