CN116414803A

CN116414803A - Data migration method, device and readable storage medium

Info

Publication number: CN116414803A
Application number: CN202310294434.6A
Authority: CN
Inventors: 范文; 李东
Original assignee: Shenzhen Aijieyun Technology Co ltd
Current assignee: Shenzhen Aijieyun Technology Co ltd
Priority date: 2023-03-17
Filing date: 2023-03-17
Publication date: 2023-07-11

Abstract

The application discloses a data migration method, a data migration device and a readable storage medium. After creating a migration task, a migration server sequentially creates a plurality of subtasks according to the migration task. For each subtask, the data corresponding to the data section of the subtask is migrated from the data source end to the destination end by using the migration channel of the subtask. With this scheme, the migration server divides the migration task into a plurality of subtasks, opens a migration channel for each subtask, and uses that channel to migrate the data corresponding to the data section of the subtask. No API matching is required, streaming is supported, the data migration speed is improved and the cost is reduced.

Description

Data migration method, device and readable storage medium

Technical Field

The embodiment of the application relates to the technical field of cloud storage, in particular to a data migration method, data migration equipment and a readable storage medium.

Background

Currently, there are three mainstream storage types: block storage, file storage, and object storage. Object storage combines the high-speed direct disk access of block storage with the distributed sharing of file storage. As a result, object storage plays an increasingly important role in cloud storage services.

With the development of technology, more and more cloud vendors provide object storage services, and customers can flexibly choose which cloud vendor's object storage service to use. Meanwhile, the demand for data migration between different cloud vendors is growing. During migration, the cloud vendor at the data source end and the cloud vendor at the destination end must match each other's application programming interfaces (Application Programming Interface, API) before the data can be migrated.

However, matching the APIs is time-consuming and costly. Even after the APIs have been matched, supporting a new cloud vendor requires the APIs to be matched again, which is tedious and time-consuming.

Disclosure of Invention

The embodiments of the application provide a data migration method, a device and a readable storage medium. A migration server divides a migration task into a plurality of subtasks, opens a migration channel for each subtask, and uses that channel to migrate the data corresponding to the data section of the subtask. No API matching is required, streaming is supported, the data migration speed is improved and the cost is reduced.

In a first aspect, an embodiment of the present application provides a data migration method, applied to a migration server, where the method includes:

Creating a migration task, wherein the migration task indicates data to be migrated, and a data source end and a data destination end of the data to be migrated;

according to the migration task, sequentially creating at least one subtask, wherein different subtasks in the at least one subtask correspond to different data sections, and the data to be migrated is the sum of data corresponding to the data sections of each subtask;

and for each subtask, migrating the data corresponding to the data section of the subtask from the data source end to the destination end by utilizing a migration channel of the subtask.

In a second aspect, an embodiment of the present application provides a data migration apparatus, including:

a creation module, configured to create a migration task, wherein the migration task indicates data to be migrated, and a data source end and a destination end of the data to be migrated;

a processing module, configured to sequentially create at least one subtask according to the migration task, wherein different subtasks in the at least one subtask correspond to different data sections, and the data to be migrated is the sum of the data corresponding to the data sections of the subtasks;

and a migration module, configured to, for each subtask, migrate the data corresponding to the data section of the subtask from the data source end to the destination end by using the migration channel of the subtask.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a computer program stored on the memory and executable on the processor, which processor, when executing the computer program, causes the electronic device to carry out the method as described in the various possible implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer instructions, which when executed by a processor, are adapted to carry out the method according to the various possible implementations of the first aspect above.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the method as described above in the various possible implementations of the first aspect.

According to the data migration method, the data migration device and the readable storage medium, after the migration server creates the migration task, a plurality of subtasks are sequentially created according to the migration task. And for each subtask, migrating the data corresponding to the data section of the subtask from the data source end to the destination end by utilizing a migration channel of the subtask. By adopting the scheme, the migration server divides the migration task into a plurality of subtasks, opens the migration channel of each subtask, utilizes the migration channel of each subtask to migrate the data corresponding to the data section of the subtask, does not need to match an API interface, supports streaming, and simultaneously improves the data migration speed and reduces the cost.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a network architecture to which a data migration method according to an embodiment of the present application is applied;

FIG. 2 is a schematic diagram of another network architecture of a data migration method according to an embodiment of the present disclosure;

FIG. 3 is a flow chart of a data migration method provided in an embodiment of the present application;

FIG. 4 is a flow chart of creating subtasks in a data migration method according to an embodiment of the present application;

FIG. 5 is a flowchart of executing a subtask in a data migration method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a data migration apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Object storage is a distributed storage technology widely used in cloud computing. In theory, object storage can hold an unlimited amount of unstructured data, which is accessed through protocols such as the Simple Storage Service (S3) protocol or the Swift protocol. Basic concepts in object storage include the bucket and the object. A bucket is a storage space used to manage objects, similar to a root directory or a disk partition in file storage. An object may be a file of any type.

The user flexibly selects the object storage service provided by the cloud manufacturer according to the requirement, and the data migration requirement is inevitably generated. For example, initially, a user selects an object storage service provided by cloud vendor a, and data is stored in a storage system of cloud vendor a. After a period of time, the user selects an object storage service provided by cloud vendor b, and at this time, data stored in cloud vendor a before needs to be migrated to the storage system of cloud vendor b.

In general, each cloud vendor has its own migration scheme, which is bound to its own products. Such a scheme supports migrating data from other cloud vendors into the vendor's own storage system, but does not support migrating data out of it.

Moreover, the migration schemes of most cloud vendors are either extremely simple or require separate hardware support, and they do not support multi-machine, multi-task acceleration, streaming, and the like. During migration, the data to be migrated must first be downloaded locally and then uploaded, so the migration period is very long.

In addition, existing migration schemes merely wrap tools such as rsync and lack the support of a business model. The migration process as a whole cannot be monitored, the migrated content cannot be recorded, and exporting a migration report is not supported.

Based on this, the embodiment of the application provides a data migration method, a device and a readable storage medium, wherein a migration server divides a migration task into a plurality of subtasks, a migration channel of each subtask is opened, data corresponding to a data section of the subtask is migrated by using the migration channel of each subtask, an API interface is not required to be matched, and streaming is supported while the data migration speed is improved and the cost is reduced.

Fig. 1 is a schematic diagram of a network architecture to which the data migration method according to the embodiment of the present application is applied. Referring to fig. 1, the network architecture includes a migration server 11, a data source end 12, a destination end 13, and a terminal device 14. Network connections are established between the migration server 11 and the data source end 12, between the migration server 11 and the destination end 13, and between the migration server 11 and the terminal device 14.

The data source end 12 and the destination end 13 are storage systems of different cloud vendors, or different storage systems of the same cloud vendor. When a client requests that data originally stored at the data source end 12 be migrated to the destination end 13, the client sends a migration request to the migration server 11 through the terminal device 14. After receiving the migration request, the migration server 11 creates a migration task according to the parameters carried by the migration request, and sequentially creates a plurality of subtasks according to the migration task, wherein each subtask corresponds to a different data section. The migration server 11 executes the subtasks concurrently to migrate the data corresponding to each data section from the data source end 12 to the destination end 13.

The migration server 11 may be hardware or software. When the migration server 11 is hardware, the migration server 11 is a single server or a distributed server cluster composed of a plurality of servers. When the migration server 11 is software, it may be a plurality of software modules or a single software module, etc., and the embodiment of the present application is not limited.

When the migration server 11 serves a scenario with few cloud vendors and a relatively small amount of data, the migration server 11 is a single server on which a plurality of processes, such as a plurality of gateway processes, proxy processes, migration channel processes, and the like, run. The gateway process is used for receiving the migration request, generating a migration task according to the migration request, and sequentially creating a plurality of subtasks according to the migration task. After each creation of the subtask, determining a target process from a plurality of proxy processes, and sending the subtask to the target process. The target process is used for starting a migration channel process, and data corresponding to the subtasks are migrated from the data source end 12 to the destination end 13 through the migration channel process.

When the migration server 11 serves a plurality of cloud vendors and the data volume is relatively large, the migration server 11 is a distributed server cluster, and the distributed server cluster includes a plurality of servers, for example, a gateway server, a proxy server, and a migration channel server, where the proxy server and the migration channel server are in one-to-one correspondence. The gateway server is used for receiving the migration request, generating a migration task according to the migration request, and sequentially creating a plurality of subtasks according to the migration task. After each creation of the subtask, determining a target server from the plurality of proxy servers, and sending the subtask to the target server. The target server starts a migration channel, and data corresponding to the subtasks is migrated from the data source end 12 to the destination end 13 through the migration channel.

In this embodiment, the data source end 12 and the destination end 13 are opposite, and a storage system of a cloud manufacturer may be the data source end 12 or the destination end 13. For example, when data of cloud vendor a is migrated to a storage system of cloud vendor b, cloud vendor a is a data source end 12, and cloud vendor b is a destination end 13; when the data of the cloud manufacturer b is migrated to the storage system of the cloud manufacturer a, the cloud manufacturer b is a data source end 12, and the cloud manufacturer a is a destination end 13.

The terminal device 14 is a device of a client, through which the client issues migration requests, monitoring requests, requests to obtain a migration report, and the like to the migration server 11. The terminal device 14 may be hardware or software. When the terminal device 14 is hardware, it is, for example, a mobile phone, a tablet computer, an electronic book reader, a laptop, a desktop computer, a server, or the like. When the terminal device 14 is software, it may be installed in the hardware devices listed above; in this case, the terminal device 14 is, for example, a plurality of software modules or a single software module, which is not limited in the embodiments of the present application. Here, the client, also referred to as a user, may be an individual, an enterprise, or any other party that uses the storage services provided by cloud vendors.

It should be understood that the numbers of migration servers 11, data source ends 12, destination ends 13, and terminal devices 14 in fig. 1 are merely illustrative. In practical implementation, any number of migration servers 11, data source ends 12, destination ends 13 and terminal devices 14 may be deployed according to practical requirements.

Fig. 2 is a schematic diagram of another network architecture of the data migration method according to the embodiment of the present application. Referring to fig. 2, in the network architecture, data stored in a data source end includes an object, a file, and the like. The files are, for example, files in a file system (file system), files in a network file system (Network File System, NFS), or the like. The destination end is a storage system, and a plurality of storage buckets (buckets) are deployed. The data migration refers to migrating a file stored in a file storage mode or an object stored in an object storage mode from a data source end to a destination end.

The migration server comprises a plurality of migration gateways, proxy nodes and migration channels (rclone). The client sends a migration request to a migration gateway through the terminal device. The migration gateway creates a migration task according to the migration request, sequentially creates a plurality of subtasks according to the migration task, and sends each subtask to a proxy node with a lighter load. The proxy node opens an rclone migration channel and uses rclone to migrate the data from the data source end to the destination end.

When the migration server is an independent server, each migration gateway is an independently running service process, and the proxy node is also an independently running service process. Rclone is also an independently running process. For clarity, the service process corresponding to the migration gateway is referred to as a gateway process, the service process corresponding to the proxy node is referred to as a proxy process, and the process corresponding to rclone is referred to as an rclone process.

In FIG. 2, the database is used to record the migration list, the section list, and the like, and may be a relational database such as MySQL.

Referring to fig. 2, the migration server includes a plurality of migration gateways, and a client may send a migration request to any one of them. In addition, a master-slave relationship may be set among the migration gateways, that is, one of them serves as the master migration gateway and the rest serve as slave migration gateways. The client sends a migration request to any slave migration gateway. After receiving the migration request, the slave migration gateway forwards it to the master migration gateway, which sequentially creates a plurality of subtasks according to the migration task, issues the subtasks in turn, and so on.

Referring to fig. 2, the migration gateway may send subtasks to the proxy node through REST or the like, and the proxy node starts a migration channel through fork or the like.

The data migration method provided by the embodiment of the application has the following characteristics, so that the data migration requirements of object storage and the like are met:

1) Migration channels such as rclone support the mainstream cloud vendors that provide object storage services and can interface with them seamlessly;

2) Support uploading large files in fragments;

3) Support incremental transfer;

4) Support concurrency and bandwidth control;

5) Support streaming (memory consumption is controllable);

6) Simple and easy to use;

7) Open-source code and an active community.

In the above network architecture, rclone is used as the migration channel by way of example only, and the embodiments of the present application are not limited thereto. The architecture shown in fig. 2 can be used as a service model in which the data source end and the destination end are each the storage system of any cloud vendor, so that the data migration requirements among different cloud vendors are met.

The data migration method provided in the embodiment of the present application is described in detail below based on the network architecture shown in fig. 1 and fig. 2. For example, please refer to fig. 3.

Fig. 3 is a flowchart of a data migration method provided in an embodiment of the present application. The present embodiment is described in terms of a migration server. The embodiment comprises the following steps:

301. Creating a migration task, wherein the migration task indicates data to be migrated, and a data source end and a destination end of the data to be migrated.

In this embodiment of the present application, the data to be migrated is data of a data source, such as a file stored by the data source in a file storage manner, an object (object) stored by the data source in an object storage manner, and so on.

The migration server creates a migration task when triggered. In one mode, the migration server creates a migration task after receiving a migration request from a terminal device. The user sends the migration request to the migration server through a command line, a REST API, or the like. The migration request indicates the migration time, the data source end, the destination end, the data to be migrated, and so on. For example, the migration request instructs the migration server to begin, at 12:00, migrating the data in bucket 1 in the storage system of cloud vendor a to the storage system of cloud vendor b. For another example, the migration request instructs the migration server to begin, at 12:00, migrating the full data in the storage system of cloud vendor a to the storage system of cloud vendor b. For another example, the migration request instructs the migration server to begin, at 12:00, migrating the data under directory A, directory B and directory C in bucket 1 of cloud vendor a to the storage system of cloud vendor b.

In another mode, the migration server creates migration tasks periodically. For example, the migration server automatically creates a migration task at 12:00 to migrate the data in bucket 1 in the storage system of cloud vendor a to the storage system of cloud vendor b.

The migration task created by the migration server indicates which data is to be migrated, which cloud vendor's storage system the data source is, which cloud vendor's storage system the destination is, and so on. In addition, the migration task further indicates migration time, conditions to be satisfied when the migration task is executed, and the like, and the embodiment of the present application is not limited.
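
As an illustration only, the information a migration task is said to indicate could be held in a small record like the one below. The class name, field names and vendor labels are hypothetical and are not taken from the application; this is a minimal sketch, not the applicant's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MigrationTask:
    """Hypothetical record of what a migration task indicates."""
    source: str                        # storage system at the data source end
    destination: str                   # storage system at the destination end
    bucket: Optional[str] = None       # None means the full data of the source
    prefixes: List[str] = field(default_factory=list)  # directories in the bucket
    start_time: Optional[str] = None   # e.g. "12:00"; None means start at once

    def describe(self) -> str:
        """Return a one-line human-readable summary of the task."""
        scope = "all data" if self.bucket is None else f"bucket {self.bucket}"
        if self.prefixes:
            scope += " (" + ", ".join(self.prefixes) + ")"
        when = self.start_time or "now"
        return f"{scope}: {self.source} -> {self.destination} at {when}"


# Third example from the text: directories A, B and C in bucket 1, at 12:00.
task = MigrationTask("vendor-a", "vendor-b", bucket="bucket-1",
                     prefixes=["A/", "B/", "C/"], start_time="12:00")
print(task.describe())
```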

302. And sequentially creating at least one subtask according to the migration task, wherein different subtasks in the at least one subtask correspond to different data sections, and the data to be migrated is the sum of the data corresponding to the data sections of each subtask.

In the embodiment of the present application, the data size of the data to be migrated is often large, and if the data is migrated serially one by one, a lot of time is consumed. In order to avoid the problem, the migration server sequentially creates a plurality of subtasks, each subtask corresponds to different data sections, each subtask only needs to migrate data corresponding to the data section of the subtask, and the subtasks are executed in parallel, so that the migration speed is increased.

For example, there are 1,000,000 pieces of data to be migrated, and the migration server sequentially creates 100 subtasks, each subtask corresponding to 10,000 pieces of data.
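
The division into data sections can be sketched as follows. This is a simplified illustration that assumes the identifiers of all data are already available as a list; the function name and the `object-N` identifiers are hypothetical.

```python
from typing import Iterator, List, Sequence


def split_into_sections(items: Sequence[str], section_size: int) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping data sections of at most section_size items."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    for start in range(0, len(items), section_size):
        yield list(items[start:start + section_size])


# 1,000,000 pieces of data, 10,000 pieces per subtask.
keys = [f"object-{i}" for i in range(1_000_000)]
sections = list(split_into_sections(keys, 10_000))
print(len(sections), len(sections[0]))
```

Because the sections are disjoint and together cover every item, the data to be migrated equals the sum of the data of all sections, as step 302 requires.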

303. And for each subtask, migrating the data corresponding to the data section of the subtask from the data source end to the destination end by utilizing a migration channel of the subtask.

Referring to fig. 2, different proxy nodes correspond to different migration channels. After receiving the subtask, one proxy node starts a migration channel of the proxy node where the subtask is located, and data corresponding to a data section of the subtask is migrated from a data source end to a destination end by using the migration channel.

According to the data migration method provided by the embodiment of the application, after the migration task is created by the migration server, a plurality of subtasks are sequentially created according to the migration task. And for each subtask, migrating the data corresponding to the data section of the subtask from the data source end to the destination end by utilizing a migration channel of the subtask. By adopting the scheme, the migration server divides the migration task into a plurality of subtasks, opens the migration channel of each subtask, utilizes the migration channel of each subtask to migrate the data corresponding to the data section of the subtask, does not need to match an API interface, has the characteristics of high reliability, high concurrency, simplicity, easiness in use and the like, supports streaming, and simultaneously improves the data migration speed and reduces the cost.

In the conventional object storage migration scheme, due to the limitation of the S3 API, when the amount of data to be migrated is relatively large, for example when all the data in a storage bucket holding thousands or even tens of thousands of pieces of data needs to be migrated, a multithreaded producer-consumer pattern is adopted. That is, one thread reads attribute information and writes it into a message queue, and the threads in a thread pool take the attribute information from the message queue and perform the subsequent migration. The pieces of data to be migrated are thus migrated serially, and the migration efficiency is extremely low.

In the embodiment of the application, after a certain amount of attribute information is loaded by the migration gateway of the migration server, one subtask can be created each time, so that a plurality of subtasks are created in sequence until all the attribute information of the data to be migrated is loaded. After creating the subtasks each time, the subtasks are issued to the target process with lighter load, and each subtask is executed by the target process. Because the number of the agent processes is large, the target processes selected after each creation of the subtasks may be different, and the subtasks are executed by a plurality of target processes in parallel, so that the purpose of improving the data migration speed is achieved. The creation process of the subtasks is described in detail below.

Fig. 4 is a flowchart of creating a subtask in the data migration method provided in the embodiment of the present application. The embodiment comprises the following steps:

401. A migration task is created and started.

After receiving the migration request, the migration gateway of the migration server creates a migration task according to the request parameters carried by the migration request. The request parameters include, but are not limited to, user credentials such as the combination of an access key (Access Key, AK) and a secret access key (Secret Access Key, SK), an access address (endpoint), and the like.

402. Judging whether the attribute information of all the data to be migrated is loaded, and if at least one piece of attribute information of the data to be migrated is not loaded, executing step 403; if all the attribute information of the data to be migrated has been loaded, step 405 is performed.

For example, when the amount of data to be migrated is huge, the migration server checks, each time before loading attribute information, whether the attribute information of all the data to be migrated has already been loaded. If the attribute information of one or more pieces of data has not been loaded, step 403 is executed. If the attribute information of all the data to be migrated has been loaded, a subtask is generated regardless of how much attribute information is currently loaded. For example, suppose there are 1,000,008 pieces of data to be migrated, one subtask corresponds to 10,000 pieces of data, and 1,000 pieces are loaded each time, so that 10 loads are needed per subtask. After the 1,000th load, the 100th subtask is generated, and the attribute information of 8 pieces of data to be migrated remains unloaded.

After creating the 100th subtask, the migration gateway finds that the attribute information of 8 pieces of data to be migrated has not been loaded, so it performs the 1,001st load. Because fewer than 1,000 pieces remain, this load fetches only 8 pieces, and the 101st and final subtask corresponds to those 8 pieces of data to be migrated.
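
The counts in this example can be checked with a few lines of integer arithmetic. The variable names are illustrative only.

```python
total_items = 1_000_008   # pieces of data to be migrated
batch_size = 1_000        # pieces whose attribute information is loaded per call
section_size = 10_000     # pieces per subtask

# Number of loads: ceiling division, since the last load fetches a partial batch.
loads = (total_items + batch_size - 1) // batch_size

# Full subtasks plus one final subtask for whatever is left over.
full_subtasks, remainder = divmod(total_items, section_size)
subtasks = full_subtasks + (1 if remainder else 0)

print(loads, full_subtasks, remainder, subtasks)
```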

403. And loading attribute information of each batch of data from the data source end in batches.

In the embodiment of the application, the data to be migrated is located at the data source end. The migration gateway of the migration server loads attribute information of the data to be migrated and creates a subtask once a certain amount of attribute information has been loaded. It then continues to load attribute information and create new subtasks until the attribute information of all the data to be migrated has been loaded. The attribute information is, for example, the identifier or the size of the data to be migrated.

By default, the standard S3 API returns the attribute information of at most 1,000 pieces of data to be migrated per call. The amount of data in the data section of each subtask may exceed 1,000, so the migration gateway of the migration server needs to call the S3 API multiple times to obtain, from the data source end, the attribute information of the data corresponding to one subtask. For example, a total of 1,000,000 pieces of data to be migrated is divided into 100 subtasks, each subtask corresponds to 10,000 pieces of data, and the migration gateway can export the attribute information of only 1,000 pieces of data to be migrated from the data source end per S3 API call. Therefore, one batch is the attribute information of 1,000 pieces of data to be migrated, and the migration gateway calls the S3 API 10 times to obtain the attribute information of 10,000 pieces of data.
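
The batch-by-batch loading can be illustrated with a paged listing loop. In the sketch below, `list_page` is a hypothetical in-memory stand-in for one listing call against the data source end; it is not a real S3 client call, and the integer token merely imitates a continuation marker.

```python
from typing import Dict, Iterator, List, Optional, Tuple

PAGE_LIMIT = 1000  # per-call cap described in the text


def list_page(all_keys: List[str], token: Optional[int]) -> Tuple[List[Dict], Optional[int]]:
    """Hypothetical stand-in for one listing call; returns a page and the next token."""
    start = token or 0
    page = [{"key": k, "size": len(k)} for k in all_keys[start:start + PAGE_LIMIT]]
    next_token = start + PAGE_LIMIT if start + PAGE_LIMIT < len(all_keys) else None
    return page, next_token


def load_attributes(all_keys: List[str]) -> Iterator[List[Dict]]:
    """Yield attribute information one batch at a time until the listing is exhausted."""
    token: Optional[int] = None
    while True:
        page, token = list_page(all_keys, token)
        if page:
            yield page
        if token is None:
            break


# 10,000 pieces of data, i.e. the data section of one subtask in the example.
keys = [f"obj-{i:05d}" for i in range(10_000)]
batches = list(load_attributes(keys))
print(len(batches), len(batches[0]))
```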

404. After each batch of attribute information of data is loaded, judging whether the data quantity indicated by the loaded attribute information is greater than or equal to a preset threshold value. If the data amount indicated by the loaded attribute information is greater than or equal to the preset threshold, executing step 405; if the amount of data indicated by the loaded attribute information is smaller than the preset threshold, step 403 is executed.

Continuing with the above example, each subtask corresponds to 10,000 pieces of data, and the preset threshold is, for example, 10,000 pieces or 8,000 pieces, which is not limited in the embodiment of the present application. The migration gateway of the migration server calls the S3 API to load attribute information of the data to be migrated from the data source end, loading the attribute information of 1,000 pieces of data each time, that is, each batch is the attribute information of 1,000 pieces of data to be migrated. With a threshold of 10,000 pieces, if 10 batches of attribute information have been loaded, the amount of data indicated by the loaded attribute information equals the preset threshold, and the migration gateway of the migration server executes step 405. If only 4 batches have been loaded, the amount of data indicated by the loaded attribute information is smaller than the preset threshold, and the migration gateway of the migration server executes step 403.

405. A subtask is created, after which steps 402 and 406 are performed.

When the data quantity indicated by the loaded attribute information is greater than or equal to a preset threshold value, the migration gateway of the migration server creates a subtask, and writes the loaded attribute information into a migration list in the database.
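
Recording a newly created subtask could look roughly like the following. The text names a relational database such as MySQL; this sketch uses an in-memory SQLite database as a stand-in so that it runs without a server, and the table and column names are assumptions rather than the schema of the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE section_list (
    subtask_id INTEGER PRIMARY KEY,
    status     TEXT NOT NULL DEFAULT 'created'
);
CREATE TABLE migration_list (
    object_key TEXT PRIMARY KEY,
    size       INTEGER NOT NULL,
    subtask_id INTEGER NOT NULL REFERENCES section_list(subtask_id)
);
""")


def record_subtask(conn: sqlite3.Connection, subtask_id: int, attributes: list) -> None:
    """Store one subtask and the attribute information loaded for it."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO section_list (subtask_id) VALUES (?)", (subtask_id,))
        conn.executemany(
            "INSERT INTO migration_list (object_key, size, subtask_id) VALUES (?, ?, ?)",
            [(a["key"], a["size"], subtask_id) for a in attributes],
        )


record_subtask(conn, 1, [{"key": "a.txt", "size": 10}, {"key": "b.txt", "size": 20}])
count, total = conn.execute(
    "SELECT COUNT(*), SUM(size) FROM migration_list WHERE subtask_id = 1"
).fetchone()
print(count, total)
```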

By adopting the scheme, the migration server creates a subtask each time a certain amount of attribute information of the data to be migrated has been loaded, so that a plurality of subtasks are created in sequence. This prepares for concurrent execution of multiple subtasks and thereby improves the data migration speed.
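The batch-and-threshold loop of steps 404 and 405 can be sketched as follows. This is a minimal illustration, not the implementation of the application: `load_batches` is a hypothetical stand-in for the repeated S3 list calls, and emitting a final, smaller subtask for leftover items is an assumption that the text above does not address.

```python
from typing import Dict, Iterable, Iterator, List


def load_batches(total_items: int, page_size: int) -> Iterator[List[Dict]]:
    """Hypothetical stand-in for repeated list calls against the data source end."""
    for start in range(1, total_items + 1, page_size):
        end = min(start + page_size - 1, total_items)
        yield [{"id": i} for i in range(start, end + 1)]


def create_subtasks(batches: Iterable[List[Dict]], threshold: int) -> Iterator[List[Dict]]:
    """Yield one subtask each time the loaded attribute information reaches the threshold."""
    loaded: List[Dict] = []
    for batch in batches:             # load one more batch of attribute information
        loaded.extend(batch)
        if len(loaded) >= threshold:  # step 404: compare with the preset threshold
            yield loaded              # step 405: create a subtask
            loaded = []
    if loaded:
        # Assumption: items left over after the last batch form a final, smaller subtask.
        yield loaded


subtasks = list(create_subtasks(load_batches(25_000, 1000), threshold=10_000))
print([len(s) for s in subtasks])
```

Because `create_subtasks` is a generator, each subtask becomes available as soon as its threshold is reached, which matches the idea of creating subtasks in sequence while loading continues.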

406. After each creation of a new subtask, determining a target process from a plurality of proxy processes, wherein the target process is a process whose load is smaller than a preset load among the plurality of proxy processes.

In the embodiment of the present application, a plurality of proxy processes run on the migration server, as shown by the plurality of proxy nodes in fig. 2. Each time a subtask is created, the migration gateway of the migration server may send the subtask to a randomly selected proxy process among the plurality of proxy processes. Alternatively, the migration gateway selects the proxy process with the minimum load as the target process and sends the subtask to it. Alternatively, the migration gateway selects a proxy process whose load is smaller than a preset load; when at least two proxy processes have a load smaller than the preset load, the migration gateway randomly selects one of them as the target process and issues the subtask to that target process.

407. Triggering the target process to execute the subtask, so as to migrate the data corresponding to the data section of the subtask from the data source end to the destination end by utilizing a migration channel of the subtask.

After receiving the subtask, the target process starts the migration channel of the subtask, which opens a channel between the data source end and the destination end. The data to be migrated corresponding to the subtask, for example 10,000 pieces of data, is then migrated through the migration channel.

By adopting the scheme, each time the migration gateway of the migration server generates a subtask, a lightly loaded process is selected from the plurality of proxy processes as the target process to perform the data migration, so that the load of the proxy processes is balanced and the data migration efficiency is improved.
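The three selection strategies described for step 406 can be sketched as follows. The load values, the agent names, and the representation of load as a number between 0 and 1 are assumptions made for illustration; the application does not specify how load is measured.

```python
import random
from typing import Dict, Optional


def pick_random(loads: Dict[str, float]) -> str:
    """Strategy 1: send the subtask to a randomly selected proxy process."""
    return random.choice(list(loads))


def pick_least_loaded(loads: Dict[str, float]) -> str:
    """Strategy 2: select the proxy process with the minimum load."""
    return min(loads, key=loads.get)


def pick_below_preset(loads: Dict[str, float], preset_load: float) -> Optional[str]:
    """Strategy 3: pick at random among processes whose load is below the preset load."""
    candidates = [name for name, load in loads.items() if load < preset_load]
    # Assumption: return None when no process qualifies; the text does not cover this case.
    return random.choice(candidates) if candidates else None


loads = {"agent-1": 0.9, "agent-2": 0.2, "agent-3": 0.4}  # hypothetical load figures
print(pick_least_loaded(loads))
```

The third strategy trades a little balance for less contention: several gateways choosing at the same moment are less likely to pile onto the single least-loaded process.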

Optionally, in the above embodiment, the migration server further generates a migration list according to the attribute information of each batch of data, and stores the migration list in the database. Each time a subtask is created, the migration server generates a data section of the subtask according to the loaded attribute information corresponding to the subtask, and writes the correspondence between the subtask and the data section into the database.

For example, after the migration gateway loads attribute information of a batch of data to be migrated, the attribute information is written into the migration list. After the attribute information of all the data to be migrated is loaded, the migration list records the attribute information of all the data to be migrated. For example, there are 1,000,000 pieces of data to be migrated, attribute information of 1000 pieces of data to be migrated is loaded each time, and a subtask is created after every 10 loads. Initially, the migration list is empty. After attribute information of 1000 pieces of data to be migrated is loaded for the first time, the migration gateway writes that attribute information into the migration list … … After the 1000th load, the migration gateway writes the attribute information of the last 1000 pieces of data to be migrated into the migration list, and the migration list is stored in the database. Because the migration list is stored in the database, each proxy process can read the attribute information of the data to be migrated of its subtask from the database.

Each time the migration gateway generates a new subtask, it generates a section identifier according to the attribute information of the data to be migrated corresponding to the subtask. For example, there are 1,000,000 pieces of data to be migrated in total, with identifiers 1 to 1,000,000. After the migration gateway loads the attribute information of the data to be migrated with identifiers 10001 to 20000, a second subtask is created. The data section of this subtask is denoted b, and the migration gateway records in the database that the identifiers of the data corresponding to data section b are 10001 to 20000. When sending the subtask to the target process, the migration gateway also sends the identifier of the data section, namely data section b. After receiving the subtask, the target process determines that the subtask is to migrate 10000 pieces of data from the data source end to the destination end, reads the database according to data section b, and determines that the identifiers of the data indicated by data section b are 10001 to 20000. The target process therefore acquires the data to be migrated with identifiers 10001 to 20000 from the data source end, and migrates the data to the destination end through the migration channel.

By adopting the scheme, the migration server writes into the database both the migration list, which indicates all the data to be migrated, and the correspondence between subtasks and data sections, so that each target process can quickly and accurately determine the data to be migrated of its subtask, which improves the data migration rate.
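The correspondence between a subtask and its data section can be sketched as a small table. The application does not name a database product or a schema; the in-memory SQLite database and the table and column names below are stand-ins chosen for illustration only.

```python
import sqlite3
from typing import Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical data_section table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE data_section ("
        "section_id TEXT PRIMARY KEY, subtask_id INTEGER, first_id INTEGER, last_id INTEGER)"
    )
    return db


def record_section(db: sqlite3.Connection, section_id: str, subtask_id: int,
                   first_id: int, last_id: int) -> None:
    """Gateway side: write the correspondence between a subtask and its data section."""
    db.execute("INSERT INTO data_section VALUES (?, ?, ?, ?)",
               (section_id, subtask_id, first_id, last_id))
    db.commit()


def resolve_section(db: sqlite3.Connection, section_id: str) -> Tuple[int, int]:
    """Target process side: look up the identifier range indicated by a data section."""
    row = db.execute("SELECT first_id, last_id FROM data_section WHERE section_id = ?",
                     (section_id,)).fetchone()
    if row is None:
        raise KeyError(section_id)
    return row


db = open_db()
record_section(db, "b", subtask_id=2, first_id=10001, last_id=20000)
print(resolve_section(db, "b"))
```

Passing only the section identifier to the target process keeps the dispatch message small; the identifier range is resolved from the shared database when the subtask runs.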

Next, detailed description will be given of how the target process executes the subtasks. For example, please refer to fig. 5. Fig. 5 is a flowchart of executing a subtask in the data migration method according to the embodiment of the present application. The embodiment comprises the following steps:

501. Reading the migration list in the database.

A plurality of proxy processes run on the migration server. When the migration gateway selects a target process for a subtask and distributes the subtask to the target process, the target process is triggered to execute the subtask. During execution, the target process reads the migration list in the database according to the data section corresponding to the subtask, so as to determine the data it is to migrate. For example, the target process reads the database according to data section b corresponding to the subtask, and determines that the identifiers of the data indicated by data section b are 10001 to 20000.

502. A section list is generated.

The section list is used for indicating the data to be migrated corresponding to the subtask, and the data to be migrated of different subtasks are different. The section list corresponds to a piece of content intercepted from the migration list. For example, the migration list indicates attribute information of the data to be migrated with identifiers 1 to 1,000,000, and one section list indicates attribute information of the data to be migrated with identifiers 10001 to 20000.

503. Reading the data to be migrated of the subtask from the data source end according to the section list.

504. Migrating the data to be migrated of the subtask from the data source end to the destination end by utilizing the migration channel of the subtask.

After creating the section list, the target process determines, according to the section list, which data at the data source end is to be migrated to the destination end. The target process then starts the migration channel and executes a copy command to migrate the data indicated by the section list from the data source end to the destination end, thereby completing the subtask.

In addition, in the process of executing the subtasks, the target process writes the attribute information of the completed data, such as the identifier, the size, the migration time and the like, into the database.

By adopting the scheme, the target process loads the section list from the database, so that the data to be migrated corresponding to the subtasks is accurately determined, and the purpose of improving the data migration accuracy is realized.
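Steps 501 to 504 can be sketched as follows. This is an illustration under stated assumptions: plain dictionaries stand in for the data source end and the destination end, the copy is a simple assignment rather than a real migration channel, and the function and field names are hypothetical.

```python
import time
from typing import Dict, List


def build_section_list(migration_list: List[Dict], first_id: int, last_id: int) -> List[Dict]:
    """Steps 501-502: intercept from the migration list the entries of one data section."""
    return [entry for entry in migration_list if first_id <= entry["id"] <= last_id]


def run_subtask(section_list: List[Dict], source: Dict[int, bytes],
                destination: Dict[int, bytes]) -> List[Dict]:
    """Steps 503-504: read each item from the source and copy it to the destination."""
    completed = []
    for entry in section_list:
        payload = source[entry["id"]]        # read from the data source end
        destination[entry["id"]] = payload   # stand-in for the copy over the migration channel
        # Record attribute information of the completed data: identifier, size, migration time.
        completed.append({"id": entry["id"], "size": len(payload), "migrated_at": time.time()})
    return completed


migration_list = [{"id": i} for i in range(1, 31)]          # toy migration list, ids 1..30
source = {i: b"x" * i for i in range(1, 31)}                # toy data source end
destination: Dict[int, bytes] = {}
done = run_subtask(build_section_list(migration_list, 11, 20), source, destination)
print(len(done), sorted(destination)[0], sorted(destination)[-1])
```

The returned records correspond to the attribute information of completed data that the target process writes into the database during execution.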

Optionally, in the foregoing embodiment, in a process that the migration server triggers the target process to execute the subtask, the target resource allocated to the subtask is determined according to the resource consumption condition of the target process. Thereafter, the subtasks are performed using the target resources.

For example, referring to fig. 2, the target process is, for example, a proxy node, which has resources such as CPU and bandwidth. The proxy node executes the subtask as well as other tasks. To prevent the tasks running on the target process from interfering with one another, the target process allocates certain target resources to the subtask according to the current consumption of CPU, bandwidth, and the like. Based on the allocated target resources, the concurrency and the bandwidth limit used when the subtask starts the migration channel are set. The migration channel is then started to migrate the data.

By adopting the scheme, the aim of improving the data migration quality is fulfilled by distributing the target resources for the subtasks, setting the concurrency quantity and bandwidth limitation of the tasks executed on the target process when the migration channel is started, ensuring that the execution of the subtasks is not interfered and the like.
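One possible way to derive a concurrency and a bandwidth limit from current resource consumption is sketched below. The policy itself (a fixed reserve percentage, a maximum concurrency of 16, integer percentages) is entirely an assumption made for illustration; the application states only that the target resources depend on the consumption of CPU, bandwidth, and the like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetResources:
    concurrency: int           # number of parallel transfers for the migration channel
    bandwidth_limit_mbps: int  # bandwidth cap for the migration channel


def allocate_target_resources(cpu_used_pct: int, bandwidth_used_mbps: int,
                              bandwidth_total_mbps: int, max_concurrency: int = 16,
                              reserve_pct: int = 20) -> TargetResources:
    """Hypothetical policy: give the subtask what is free, minus a reserve for other tasks."""
    free_cpu_pct = max(0, 100 - cpu_used_pct - reserve_pct)
    # Always allow at least one transfer so the subtask can make progress.
    concurrency = max(1, max_concurrency * free_cpu_pct // 100)
    free_mbps = max(0, bandwidth_total_mbps - bandwidth_used_mbps)
    bandwidth_limit = free_mbps * (100 - reserve_pct) // 100
    return TargetResources(concurrency, bandwidth_limit)


print(allocate_target_resources(cpu_used_pct=30, bandwidth_used_mbps=400,
                                bandwidth_total_mbps=1000))
```

Integer percentages are used so that the result does not depend on floating-point rounding.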

Optionally, in the above embodiment, the migration server further receives a monitoring request from the terminal device. And then, generating monitoring information in response to the monitoring request, and feeding back the monitoring information to the terminal equipment. The monitoring information is used for indicating the execution status of the migration task, and the execution status at least comprises the progress of each subtask and the consumption of each subtask on the target resource.

During data migration, the terminal device may send a monitoring request to the migration server through a command line, a REST API, or the like, to check the migration progress, the execution status of each subtask, and the CPU and memory consumed by starting the migration channel, so that the migration status is known promptly and problems are discovered and solved in time. For example, if the monitoring information shows that a subtask is executing slowly, the subtask is reassigned to another proxy process; or other tasks on the target process are suspended so that the subtask is executed preferentially.

By adopting the scheme, the monitoring in the migration process is supported, so that the data migration progress and the like can be known in time, and the purposes of improving the data migration speed and quality are realized.
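The monitoring information returned for a monitoring request might be assembled as in the following sketch. The field names and the dictionary layout are hypothetical; the application requires only that the information indicate the progress of each subtask and its consumption of the target resources.

```python
from typing import Dict, List


def build_monitoring_info(subtasks: List[Dict]) -> Dict:
    """Summarise per-subtask progress and resource consumption for one migration task."""
    total = sum(s["total"] for s in subtasks)
    migrated = sum(s["migrated"] for s in subtasks)
    return {
        "overall_progress": migrated / total if total else 0.0,
        "subtasks": [
            {
                "subtask_id": s["subtask_id"],
                "progress": s["migrated"] / s["total"] if s["total"] else 0.0,
                "cpu_pct": s["cpu_pct"],      # CPU consumed by the migration channel
                "memory_mb": s["memory_mb"],  # memory consumed by the migration channel
            }
            for s in subtasks
        ],
    }


status = [  # hypothetical state of two subtasks
    {"subtask_id": 1, "total": 10_000, "migrated": 10_000, "cpu_pct": 12, "memory_mb": 256},
    {"subtask_id": 2, "total": 10_000, "migrated": 5_000, "cpu_pct": 35, "memory_mb": 512},
]
print(build_monitoring_info(status)["overall_progress"])
```

A slow subtask shows up as a low `progress` value relative to its peers, which is the signal the text uses to reassign it or to suspend competing tasks.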

Optionally, in the above embodiment, after each subtask is executed, the migration server generates a migration report and sends the migration report to the terminal device. The migration report records at least one of the following pieces of information: the identification of the migration task, the start-stop time of the migration task, the data to be migrated of the migration task, and the data source end and the destination end of the data to be migrated.

The migration server monitors each subtask, generates a migration report for the migration task after the subtask is finished, and the migration report records detailed information of the migration task, such as start-stop time, a data source end, a destination end, a storage bucket where data to be migrated are located, and the like. After generating the migration report, the migration server sends the migration report to the terminal equipment for a user to track, and the like, so that the purpose of improving the data migration quality is achieved.

The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Fig. 6 is a schematic diagram of a data migration apparatus according to an embodiment of the present application. The data migration apparatus 600 includes: a creation module 61, a processing module 62, a migration module 63.

The creation module 61 is configured to create a migration task, where the migration task indicates data to be migrated, and a data source end and a destination end of the data to be migrated;

the processing module 62 is configured to sequentially create at least one subtask according to the migration task, where different subtasks in the at least one subtask correspond to different data segments, and the data to be migrated is a sum of data corresponding to the data segments of each subtask;

and the migration module 63 is configured to migrate, for each subtask, data corresponding to a data segment of the subtask from the data source end to the destination end by using a migration channel of the subtask.

In a possible implementation manner, the processing module 62 is configured to load attribute information of each batch of data from the data source end in batches when the data to be migrated includes a plurality of data; after each batch of attribute information of data is loaded, judging whether the data quantity indicated by the loaded attribute information is greater than or equal to a preset threshold value; when the data amount indicated by the loaded attribute information is greater than or equal to a preset threshold value, creating one subtask, and starting the creation of the next subtask.

In a possible implementation manner, the processing module 62 is further configured to generate a migration list according to attribute information of each batch of data and store the migration list in the database; after each creation of a subtask, generating a data section of the subtask according to the loaded attribute information corresponding to the subtask; and writing the corresponding relation between the subtasks and the data sections into the database.

In a possible implementation manner, the migration module 63 is configured to determine, after each creation of a new subtask, a target process from a plurality of proxy processes, where the target process is a process with a load less than a preset load in the plurality of proxy processes; and triggering the target process to execute the subtask so as to utilize a migration channel of the subtask to migrate the data corresponding to the data section of the subtask from the data source end to the destination end.

In a possible implementation manner, the migration module 63 is configured to determine, according to the resource consumption status of the target process, a target resource allocated to the subtask; and executing the subtasks by utilizing the target resources.

In a possible implementation manner, the migration module 63 is configured to trigger the target process to read a migration list in a database according to the data section corresponding to the subtask, so as to establish a section list, where the migration list is used to indicate data to be migrated corresponding to the migration task, and the section list is used to indicate data to be migrated corresponding to the subtask; reading data to be migrated of the subtask from the data source terminal according to the section list; and migrating the data to be migrated of the subtask from the data source end to the destination end by utilizing the migration channel of the subtask.

In a possible implementation, referring to fig. 6 again, optionally, the data migration apparatus 600 further includes:

a transceiver module 64 for receiving a monitoring request from a terminal device;

the processing module 62 is further configured to generate, in response to the monitoring request, monitoring information, where the monitoring information is used to indicate an execution status of the migration task, where the execution status includes at least a progress of each subtask and consumption of a target resource by each subtask;

the transceiver module 64 is further configured to send the monitoring information to the terminal device.

In a possible implementation manner, the processing module 62 is further configured to generate a migration report after each subtask is executed, where the migration report records at least one of the information: the identification of the migration task, the start-stop time of the migration task, the data to be migrated of the migration task, and the data source end and the destination end of the data to be migrated;

the transceiver module 64 is further configured to send the migration report to a terminal device.

In a possible implementation manner, the creating module 61 is configured to receive a creating instruction by using any one gateway process of a plurality of gateway processes, so that the gateway process creates the migration task in response to the creating instruction, where the plurality of gateway processes run on the migration server, and the gateway process is configured to create the migration task and generate at least one subtask according to the migration task; the migration server also runs a plurality of proxy processes for executing the subtasks.

The data migration apparatus provided in the embodiment of the present application may perform the actions of the migration server in the above embodiments. The implementation principle and the technical effects are similar and are not repeated here.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 700 is, for example, the migration server described above, and the electronic device 700 includes:

a processor 71 and a memory 72;

the memory 72 stores computer instructions;

the processor 71 executes the computer instructions stored in the memory 72 such that the processor 71 performs the data migration method as implemented by the migration server above.

The specific implementation process of the processor 71 may be referred to the above method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.

Optionally, the electronic device 700 further comprises a communication component 73. The processor 71, the memory 72, and the communication section 73 may be connected via a bus 74.

Embodiments of the present application also provide a computer readable storage medium having stored therein computer instructions which, when executed by a processor, are configured to implement a data migration method as implemented by a migration server above.

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a data migration method as implemented by a migration server as above.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A data migration method, applied to a migration server, the method comprising:

2. The method according to claim 1, wherein creating at least one sub-task in turn according to the migration task comprises:

when the data to be migrated contains a plurality of data, loading attribute information of each batch of data from the data source end in batches;

after each batch of attribute information of data is loaded, judging whether the data quantity indicated by the loaded attribute information is greater than or equal to a preset threshold value;

when the data amount indicated by the loaded attribute information is greater than or equal to a preset threshold value, creating one subtask, and starting the creation of the next subtask.

3. The method as recited in claim 2, further comprising:

generating a migration list according to attribute information of each batch of data and storing the migration list in a database;

After each creation of a subtask, generating a data section of the subtask according to the loaded attribute information corresponding to the subtask;

and writing the corresponding relation between the subtasks and the data sections into the database.

4. The method according to claim 1, wherein for each subtask, using the migration channel of the subtask, migrating data corresponding to the data segment of the subtask from the data source end to the destination end includes:

after creating a new subtask each time, determining a target process from a plurality of proxy processes, wherein the target process is a process with load smaller than a preset load in the plurality of proxy processes;

and triggering the target process to execute the subtask so as to utilize a migration channel of the subtask to migrate the data corresponding to the data section of the subtask from the data source end to the destination end.

5. The method of claim 4, wherein the triggering the target process to execute the subtask comprises:

determining target resources allocated to the subtasks according to the resource consumption conditions of the target processes;

and executing the subtasks by utilizing the target resources.

6. The method of claim 4, wherein the triggering the target process to execute the subtask comprises:

triggering the target process to read a migration list in a database according to the data section corresponding to the subtask so as to establish a section list, wherein the migration list is used for indicating data to be migrated corresponding to the migration task, and the section list is used for indicating the data to be migrated corresponding to the subtask;

reading data to be migrated of the subtask from the data source terminal according to the section list;

and migrating the data to be migrated of the subtask from the data source end to the destination end by utilizing the migration channel of the subtask.

7. The method of any one of claims 1-6, further comprising:

receiving a monitoring request from a terminal device;

responding to the monitoring request, generating monitoring information, wherein the monitoring information is used for indicating the execution status of the migration task, and the execution status at least comprises the progress of each subtask and the consumption of each subtask on a target resource;

and sending the monitoring information to the terminal equipment.

8. The method of any one of claims 1-6, further comprising:

After all the subtasks are executed, generating a migration report, wherein the migration report records at least one piece of information: the identification of the migration task, the start-stop time of the migration task, the data to be migrated of the migration task, and the data source end and the destination end of the data to be migrated;

and sending the migration report to terminal equipment.

9. The method of any of claims 1-6, wherein creating a migration task comprises:

receiving a creation instruction by using any gateway process in a plurality of gateway processes, so that the gateway process responds to the creation instruction to create the migration task, the plurality of gateway processes run on the migration server, and the gateway process is used for creating the migration task and generating at least one subtask according to the migration task; the migration server also runs a plurality of proxy processes for executing the subtasks.

10. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, causes the electronic device to implement the method of any one of claims 1 to 9.