CN108628874B - Method and device for migrating data, electronic equipment and readable storage medium

Info

Publication number: CN108628874B
Application number: CN201710159838.9A
Authority: CN
Inventors: 温帮; 彭兴勃
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2020-12-22
Anticipated expiration: 2037-03-17
Also published as: CN108628874A

Abstract

The invention provides a method and a device for migrating data, an electronic device and a readable storage medium, which can solve the problem that a business application must stop writing for a long time during data migration, thereby achieving seamless data migration without the business application stopping writes. The method comprises the following steps: configuring an inter-cluster replication service and suspending the replication service; creating a snapshot of the data of the source cluster at the current moment, and exporting the snapshot to a target cluster; updating the data of the target cluster by using the snapshot, and resuming the replication service after the updating is finished; and playing back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and writing the operations into the target cluster.

Description

Method and device for migrating data, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for migrating data, an electronic device, and a readable storage medium.

Background

As data volumes grow and big data becomes widespread, more and more business systems choose database clusters as storage. As the number and size of clusters increase, the amount of inter-cluster data migration involved continues to grow.

In the prior art, data migration between clusters is usually performed by copying files with the DistCp tool (distributed copy, a tool for copying data within and between large-scale clusters) and then loading the data on the target cluster. Taking an HBase cluster (HBase is a distributed, column-oriented, open-source database) as an example, the prior-art process for migrating data is roughly as shown in FIG. 1.

In the process of implementing the invention, the inventors found at least the following problem in the prior art: the table must be disabled before the DistCp tool is run, to ensure that no data is being written. The business application therefore has to stop writing for a long time during the data migration. This affects the online service, results in a poor user experience, and fails to meet the business application's requirement for high availability of the database cluster.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for migrating data, an electronic device, and a readable storage medium, which can solve the problem that a service application stops writing for a long time in a data migration process, thereby implementing seamless data migration without stopping writing for the service application.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of migrating data.

The method for migrating data in the embodiment of the invention comprises the following steps: configuring an inter-cluster replication service and suspending the replication service, wherein the replication service sends the operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster, and wherein suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster; creating a snapshot of the data of the source cluster at the current moment, and exporting the snapshot to the target cluster; updating the data of the target cluster by using the snapshot, and resuming the replication service after the updating is finished; and playing back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and writing the operations into the target cluster.

Optionally, the cluster is an HBase cluster.

Optionally, configuring the inter-cluster replication service includes: configuring a Replication queue, and sending the WAL log of the source cluster to the target cluster through the Replication queue, wherein the WAL log is used for storing the operation of a service application on the source cluster; and playing back the WAL log at the target cluster so as to update the operation of the service application on the source cluster to the target cluster.

Optionally, the method further comprises: the source cluster retains the WAL log during suspension of replication service.

Optionally, updating the data of the target cluster by using the snapshot includes: updating the definition of the table of the target cluster with the snapshot; restoring the Region information of the table; and taking the changed Regions offline and updating the information in the meta table.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an apparatus for migrating data.

The device for migrating data in the embodiment of the invention comprises: a configuration module, configured to configure an inter-cluster replication service and suspend the replication service, wherein the replication service sends the operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster, and wherein suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster; a snapshot module, configured to create a snapshot of the data of the source cluster at the current moment and export the snapshot to the target cluster; an update module, configured to update the data of the target cluster by using the snapshot and resume the replication service after the updating is finished; and a replication module, configured to play back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and write the operations into the target cluster.

Optionally, the cluster is an HBase cluster.

Optionally, the configuration module is further configured to: configuring a Replication queue, and sending the WAL log of the source cluster to the target cluster through the Replication queue, wherein the WAL log is used for storing the operation of a service application on the source cluster; and playing back the WAL log at the target cluster so as to update the operation of the service application on the source cluster to the target cluster.

Optionally, the configuration module is further configured to: cause the source cluster to retain the WAL log during suspension of the replication service.

Optionally, the update module is further configured to: update the definition of the table of the target cluster with the snapshot; restore the Region information of the table; and take the changed Regions offline and update the information in the meta table.

To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an electronic apparatus.

An electronic device of an embodiment of the present invention includes: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method for migrating data of an embodiment of the present invention.

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium.

The computer readable medium of the embodiment of the present invention has a computer program stored thereon, and the program realizes the method of migrating data of the embodiment of the present invention when executed by the processor.

One embodiment of the above invention has the following advantages or benefits: by combining snapshots with replication, the business application does not need to stop writing during data migration, so the seamless migration of the business application's data is completed without affecting the business application; while the Replication service is suspended, the WAL logs are temporarily backlogged in the Replication queue, so WAL logs that have not yet been replicated are not deleted, and after the Replication service is resumed the backlogged data operations continue to be consumed until replication is complete; an online snapshot of the table to be migrated is created on the source cluster, and since creating the snapshot does not affect reads and writes of the source cluster's table and the snapshot is exported at the HDFS (Hadoop Distributed File System) layer, the impact of the snapshot-based migration on cluster performance can be kept to a minimum; after the snapshot data migration is completed, the Replication service of the source cluster is resumed and the WAL logs backlogged in the source cluster's Replication queue during snapshot migration are consumed, completing the migration of the incremental data, so that the data of the source cluster and the data of the target cluster are eventually consistent.

Further effects of the above optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a flow diagram of a method of data migration of the prior art;

FIG. 2 is a schematic diagram of the main steps of a method of migrating data according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart illustrating a method for migrating data according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the main modules of an apparatus for migrating data according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a hardware architecture of an electronic device for migrating data according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In order to solve the prior-art problem that a business application needs to stop writing for a long time during data migration, the embodiment of the invention provides a technical scheme for migrating data in which seamless migration, without the business application stopping writes, is achieved by combining snapshots with replication.

In this embodiment of the present invention, the service application may be installed on a terminal device, and communicate with the source database cluster by using the terminal device, and a network used in communication may include various connection types, such as a wired connection, a wireless communication link, or an optical fiber cable.

The terminal device may be any of various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like. Various communication client applications, such as a web browser application, may be installed on the terminal device.

FIG. 2 is a diagram illustrating the main steps of a method for migrating data according to an embodiment of the present invention.

As shown in fig. 2, a method for migrating data according to an embodiment of the present invention mainly includes the following steps:

Step S21: configure an inter-cluster replication service and suspend the replication service. The replication service sends the operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster. Suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster; after the replication service is resumed, the operations are sent to the target cluster to complete the replication.

The purpose of this step is to replicate data between clusters. When operations such as insertions, modifications and deletions are performed on the source database cluster, the operations are recorded. In an HBase cluster, for example, the RegionServer writes the operations to the WAL log (Write-Ahead Log, an efficient logging mechanism used in databases). The WAL log is sent to the target cluster; after receiving it, the target cluster plays back the WAL log in the manner of a client and writes the operations into the target cluster, completing the replication of the data affected by those operations.

After the inter-cluster copy service is established and suspended at step S21, migration of data is started from step S22.

Step S22: create a snapshot of the data of the source cluster at the current moment, and export the snapshot to the target cluster. All of the source cluster's data as of a given moment is migrated by exporting an online snapshot of the source cluster data.

Step S23: update the data of the target cluster by using the snapshot, and resume the replication service after the updating is completed. After the snapshot has been created and exported in step S22, the data of the target cluster is updated by restoring the snapshot on the target cluster.

Step S24: play back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and write the operations into the target cluster, thereby completing the seamless migration of the data. The purpose of this step is for the target cluster to consume the operations on the source cluster that were backlogged during the snapshot migration, which completes the migration of the incremental data produced on the source cluster in that period.
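Steps S21 to S24 can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions: the `Cluster` and `Replication` classes, the `write` and `migrate` helpers and the tuple format of an operation are hypothetical stand-ins invented for illustration, not HBase APIs, and a table is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a database cluster: one table held as a dict."""
    rows: dict = field(default_factory=dict)

    def apply(self, op):
        kind, key, value = op
        if kind == "put":
            self.rows[key] = value
        elif kind == "delete":
            self.rows.pop(key, None)


@dataclass
class Replication:
    """Queue of logged operations shipped from the source to the target."""
    target: Cluster
    paused: bool = False
    backlog: list = field(default_factory=list)

    def record(self, op):
        self.backlog.append(op)      # the operation is always logged
        if not self.paused:
            self.drain()

    def drain(self):
        while self.backlog:          # replay in the original order
            self.target.apply(self.backlog.pop(0))


def write(source, replication, op):
    """A business write: applied to the source, then logged for replication."""
    source.apply(op)
    replication.record(op)


def migrate(source, target, replication, writes_during_migration):
    replication.paused = True                 # S21: configure and suspend
    snapshot = dict(source.rows)              # S22: snapshot at this moment
    for op in writes_during_migration:        # the application keeps writing
        write(source, replication, op)
    target.rows = dict(snapshot)              # S23: update target from snapshot
    replication.paused = False                #      then resume replication
    replication.drain()                       # S24: replay the backlog


source = Cluster({"a": 1, "b": 2})
target = Cluster()
repl = Replication(target)
migrate(source, target, repl,
        [("put", "c", 3), ("delete", "a", None), ("put", "b", 20)])
print(target.rows == source.rows)  # True
```

The writes issued while the snapshot is in flight never reach the target directly; they reach it only through the backlog replayed in S24, which is why the two clusters end up with the same contents.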

The cluster in the embodiment of the present invention may be, but is not limited to, an HBase cluster, and may also be another type of database cluster. In addition, the aforementioned step of configuring the inter-cluster Replication service may be implemented by using a Replication queue (Replication is a mechanism for real-time synchronization of cluster data). For example, in the HBase cluster, a Replication relationship between the source cluster and the target cluster is established, the WAL log recorded with the operation of the service application on the source cluster is sent to the target cluster by using the Replication queue, and the WAL log is played back in the target cluster, so that the purpose of data synchronization of the source cluster and the target cluster can be achieved.

In the embodiment of the invention, after the replication service is configured, the replication service needs to be suspended, and snapshot migration of the full amount of data of the source cluster is performed first, so that all data of the source cluster at the current moment is exported to the target cluster in a snapshot mode.

During the snapshot migration (that is, from the start of creating the snapshot of the source cluster to the end of updating the data of the target cluster), business applications may still perform insertions, deletions, modifications and other operations on the source cluster. The replication service is therefore suspended during the snapshot migration, and the WAL logs generated while it is suspended are temporarily backlogged and not sent to the target cluster. After the snapshot migration is completed, that is, after the data of the target cluster has been updated by using the snapshot, the replication service is resumed and the WAL logs backlogged during the snapshot migration are sent to the target cluster. This completes the migration of the source cluster's incremental data and thereby achieves seamless migration of data between database clusters.

In addition, in the embodiment of the present invention, updating the data of the target cluster by using the snapshot mainly includes the following processes: updating the definition of the table of the target cluster with the snapshot; restoring the Region information of the table (a Region is the basic unit in which HBase stores and manages data, and one table can contain one or more Regions); and taking the changed Regions offline and updating the information in the meta table.

Fig. 3 is a schematic main flow chart of a method for migrating data according to an embodiment of the present invention. The method for migrating data according to the embodiment of the present invention is described in detail below according to the flowchart of fig. 3.

As can be seen from the above description, the embodiments of the present invention achieve seamless migration of data by combining snapshots with replication. The basic principle is that all data of a database table as of a certain moment is migrated by exporting an online snapshot of the source cluster data, and the migration of incremental data is then completed through the replication mechanism, so that the source cluster data and the target cluster data become eventually consistent and the application does not need to stop writing at any point in the migration.

The method comprises the following specific steps:

First, configure the inter-cluster data replication service and suspend it

The replication service is configured and then suspended. From this point, the WAL logs of the source cluster are temporarily backlogged, and WAL logs that have not been replicated are not deleted. After the snapshot has been exported and the replication service is resumed, the backlogged data continues to be consumed until replication is complete. In this way the data of the two clusters becomes eventually consistent.

The specific operations are: 1. establish a replication relationship between the source cluster and the target cluster, with the target cluster configured as the slave cluster of the source cluster's replication service; 2. execute the command to suspend the replication service.

Principle of the data replication service: taking an HBase cluster as an example, when a business application inserts data into or deletes data from HBase, the RegionServer writes the insert or delete operation to the WAL log in a replayable form (writing the WAL log is independent of whether Replication is configured; it is controlled by a parameter and is enabled by default). Once the replication service is started, the source cluster places the WAL logs recording the operations into the Replication queue and sends them asynchronously to the target cluster. A WAL log remains in the source cluster until its replication is complete.
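The retention rule described above (a WAL entry stays on the source until it has been replicated, including while replication is suspended) can be sketched as follows. The `SourceWal` class, its sequence numbers and its `clean` method are hypothetical names chosen for this illustration; they model the behavior described in the text, not HBase's internal implementation.

```python
class SourceWal:
    """Toy write-ahead log whose entries are kept until they are replicated."""

    def __init__(self):
        self.entries = []            # list of (sequence_id, operation)
        self.next_seq = 1
        self.replicated_up_to = 0    # highest sequence id already shipped
        self.peer_enabled = True     # False while replication is suspended

    def append(self, operation):
        self.entries.append((self.next_seq, operation))
        self.next_seq += 1

    def ship(self, send):
        """Send unreplicated entries in order; does nothing while suspended."""
        if not self.peer_enabled:
            return 0
        shipped = 0
        for seq, op in self.entries:
            if seq > self.replicated_up_to:
                send(op)
                self.replicated_up_to = seq
                shipped += 1
        return shipped

    def clean(self):
        """Delete only the entries that have already been shipped."""
        before = len(self.entries)
        self.entries = [(s, o) for s, o in self.entries
                        if s > self.replicated_up_to]
        return before - len(self.entries)


received = []                        # what the target cluster has been sent
wal = SourceWal()
wal.append(("put", "row1", "v1"))
wal.ship(received.append)

wal.peer_enabled = False             # suspend the replication service
wal.append(("put", "row2", "v2"))
wal.append(("delete", "row1", None))
wal.ship(received.append)            # no effect while suspended
removed_while_paused = wal.clean()   # removes only the entry shipped earlier

wal.peer_enabled = True              # resume: the backlog is shipped in order
wal.ship(received.append)
print(removed_while_paused, len(received))  # 1 3
```

Because cleanup is tied to the replicated position rather than to elapsed time, suspending replication can only delay deletion of log entries; it cannot cause an unreplicated entry to be lost.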

Second, create an online snapshot, export the snapshot, and migrate the data

An online snapshot of the data to be migrated is created on the source cluster. Creating the snapshot does not affect reads and writes of the source cluster's table; the snapshot is in fact a set of metadata, and no data is copied at this point. The snapshot is then exported to the target cluster, and once the export is finished the snapshot is restored in the target cluster to complete the data migration. The specific steps are as follows:

1. Create a snapshot on the source cluster. Creating the snapshot includes: taking the online snapshot, in which a distributed transaction backs up the table's description information, backs up the table's Region information, creates reference files for the HFiles, and so on;

2. Copy the snapshot to the target cluster, that is, export the snapshot. At this point, all HBase table data referenced by the snapshot is copied to the target cluster. The export is implemented by HBase's ExportSnapshot tool, which in essence copies the table snapshot data by running a MapReduce job. In addition, because the data is copied at the HDFS layer (Hadoop Distributed File System, which provides high-throughput access to application data and suits applications with very large data sets), the impact on cluster performance is small;

3. Execute the restore_snapshot command on the target cluster to update the target cluster's data and complete the snapshot data migration. Taking snapshot restoration in an HBase cluster as an example, this mainly includes the following steps:

(1) Update the definition of the table: first determine whether the table exists, and create it if it does not. Then determine whether the table's definition has changed; if it has, overwrite the current table definition with the table definition from the snapshot.

(2) Restore the Regions: compare the Region information in the snapshot with the current Regions one by one and restore accordingly. Delete Regions that exist now but do not exist in the snapshot; restore Regions that have changed relative to the snapshot; create Regions that exist in the snapshot but do not exist now.

(3) Take the changed Regions offline and update the information in the meta table.
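The comparison logic in restore steps (1) and (2) can be sketched as two small planning functions. This is an illustrative model only: the function names are hypothetical, a table definition is represented as a plain dictionary, and each Region is represented by the set of store files it references, which is an assumption made so that "changed" has a concrete meaning.

```python
def plan_table_definition(current_def, snapshot_def):
    """Step (1): decide what to do with the table definition on the target."""
    if current_def is None:
        return "create"                      # the table does not exist yet
    return "overwrite" if current_def != snapshot_def else "keep"


def plan_region_restore(current, snapshot):
    """Step (2): classify Regions by comparing current state with the snapshot.

    Both arguments map a Region name to the set of store files it references.
    """
    to_delete = sorted(r for r in current if r not in snapshot)
    to_create = sorted(r for r in snapshot if r not in current)
    to_restore = sorted(r for r in snapshot
                        if r in current and current[r] != snapshot[r])
    return {"delete": to_delete, "restore": to_restore, "create": to_create}


current_regions = {"r1": {"f1"}, "r2": {"f2", "f9"}, "r4": {"f4"}}
snapshot_regions = {"r1": {"f1"}, "r2": {"f2"}, "r3": {"f3"}}

print(plan_table_definition({"families": ["cf"]}, {"families": ["cf", "cf2"]}))
# overwrite
print(plan_region_restore(current_regions, snapshot_regions))
# {'delete': ['r4'], 'restore': ['r2'], 'create': ['r3']}
```

In this model, the Regions named in any of the three lists are the "changed" Regions that step (3) takes offline before the meta table is updated; the unchanged Region `r1` is left alone.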

Third, resume replication service

After the snapshot data migration is completed, the replication service of the source cluster can be resumed. Once it is resumed, the source cluster continues to send WAL logs to the target cluster, and the target cluster consumes (that is, plays back) the WAL logs backlogged on the source cluster during the snapshot migration and writes the data into the target cluster. This continues until all WAL logs have been sent and the Replication queue holds no backlog. The table data of the source cluster and of the target cluster thus become eventually consistent, the business side's application does not need to stop writing at any point in the migration, and seamless migration of the data is achieved.

It should be noted that in the embodiment of the present invention, a WAL log is retained in the source cluster until it has been sent to the target cluster. Once it has been successfully sent to the target cluster, the source cluster no longer retains that WAL log.
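For orientation, the three steps above roughly correspond to the following sequence of stock HBase commands, generated here by a small helper. This is a sketch under stated assumptions, not the patent's own procedure: the helper `migration_commands`, the table name, snapshot name, peer id, ZooKeeper quorum and HDFS URL are all hypothetical, the exact `add_peer` syntax differs between HBase versions, and details such as enabling replication scope on the column families or disabling an existing target table before `restore_snapshot` are omitted.

```python
def migration_commands(table, snapshot, peer_id, target_zk, target_hdfs, mappers=4):
    """Return (where_to_run, command) pairs in the order of the three steps."""
    return [
        # First: configure replication to the target cluster, then suspend it.
        ("source hbase shell",
         f"add_peer '{peer_id}', CLUSTER_KEY => \"{target_zk}\""),
        ("source hbase shell", f"disable_peer '{peer_id}'"),
        # Second: take an online snapshot, export it (a MapReduce job), restore it.
        ("source hbase shell", f"snapshot '{table}', '{snapshot}'"),
        ("source command line",
         "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot "
         f"-snapshot {snapshot} -copy-to {target_hdfs} -mappers {mappers}"),
        ("target hbase shell", f"restore_snapshot '{snapshot}'"),
        # Third: resume replication so the backlogged WAL entries are shipped.
        ("source hbase shell", f"enable_peer '{peer_id}'"),
    ]


# All argument values below are placeholders for illustration.
steps = migration_commands(
    table="orders",
    snapshot="orders_snap",
    peer_id="1",
    target_zk="zk1,zk2,zk3:2181:/hbase",
    target_hdfs="hdfs://target-nn:8020/hbase",
)
for where, command in steps:
    print(f"[{where}] {command}")
```

The ordering is the point of the sketch: the peer is disabled before the snapshot is taken and enabled only after the restore has finished, so every write made in between is held in the replication backlog rather than lost.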

According to the method for migrating data of the embodiment of the invention, by combining snapshots with replication, the business application does not need to stop writing during data migration, so the seamless migration of the business application's data is completed without affecting the business application; while the Replication service is suspended, the WAL logs are temporarily backlogged in the Replication queue, so WAL logs that have not yet been replicated are not deleted, and after the Replication service is resumed the backlogged data operations continue to be consumed until replication is complete; an online snapshot of the table to be migrated is created on the source cluster, and since creating the snapshot does not affect reads and writes of the source cluster's table and the snapshot is exported at the HDFS (Hadoop Distributed File System) layer, the impact of the snapshot-based migration on cluster performance can be kept to a minimum; after the snapshot data migration is completed, the Replication service of the source cluster is resumed and the WAL logs backlogged in the source cluster's Replication queue during snapshot migration are consumed, completing the migration of the incremental data, so that the data of the source cluster and the data of the target cluster are eventually consistent.

Fig. 4 is a schematic diagram of main modules of an apparatus for migrating data according to an embodiment of the present invention.

As shown in FIG. 4, an apparatus 40 for migrating data according to an embodiment of the present invention mainly includes the following modules: a configuration module 401, a snapshot module 402, an update module 403, and a replication module 404, wherein:

the configuration module 401 is configured to configure an inter-cluster replication service and suspend the replication service, wherein the replication service sends the operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster, and wherein suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster, and are sent to the target cluster to complete the replication after the replication service is resumed; the snapshot module 402 is configured to create a snapshot of the data of the source cluster at the current moment and export the snapshot to the target cluster; the update module 403 is configured to update the data of the target cluster by using the snapshot and resume the replication service after the update is completed; and the replication module 404 is configured to play back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and write the operations into the target cluster, thereby completing the seamless migration of the data.

The cluster in the embodiment of the present invention may be, but is not limited to, an HBase cluster.

In addition, the configuration module 401 may also be configured to: configuring a Replication queue, and sending the WAL log of the source cluster to the target cluster through the Replication queue, wherein the WAL log is used for storing the operation of a service application on the source cluster; and playing back the WAL log at the target cluster so as to update the operation of the service application on the source cluster to the target cluster.

In addition, the configuration module 401 may also be configured to: cause the source cluster to retain the WAL log during suspension of the replication service.

In this embodiment of the present invention, the update module 403 may further be configured to: update the definition of the table of the target cluster with the snapshot; restore the Region information of the table; and take the changed Regions offline and update the information in the meta table.

According to the device for migrating data of the embodiment of the invention, through the cooperation of the modules and by combining snapshots with replication, the business application does not need to stop writing during data migration, so the seamless migration of the business application's data is completed without affecting the business application; while the Replication service is suspended, the WAL logs are temporarily backlogged in the Replication queue, so WAL logs that have not yet been replicated are not deleted, and after the Replication service is resumed the backlogged data operations continue to be consumed until replication is complete; an online snapshot of the table to be migrated is created on the source cluster, and since creating the snapshot does not affect reads and writes of the source cluster's table and the snapshot is exported at the HDFS (Hadoop Distributed File System) layer, the impact of the snapshot-based migration on cluster performance can be kept to a minimum; after the snapshot data migration is completed, the Replication service of the source cluster is resumed and the WAL logs backlogged in the source cluster's Replication queue during snapshot migration are consumed, completing the migration of the incremental data, so that the data of the source cluster and the data of the target cluster are eventually consistent.

The invention also provides an electronic device and a readable storage medium according to the embodiment of the invention.

The electronic device of the embodiment of the invention comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the processor, the instructions being executable by the at least one processor to cause the at least one processor to perform the method of migrating data provided by the present invention.

The non-transitory computer-readable storage medium of an embodiment of the present invention stores computer instructions for causing the computer to execute the method of migrating data provided by the present invention.

Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic apparatus includes: one or more processors 52 and a memory 51, one processor 52 being exemplified in fig. 5. The memory 51 is a non-transitory computer readable storage medium provided by the present invention.

The electronic device for performing the method of migrating data may further include: an input device 53 and an output device 54.

The processor 52, the memory 51, the input device 53 and the output device 54 may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.

The memory 51, as a non-transitory computer-readable storage medium, can be used for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for migrating data in the embodiment of the present invention (for example, the configuration module 401, the snapshot module 402, the update module 403, and the replication module 404 shown in FIG. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 51, the processor 52 executes the various functional applications and data processing of the server, that is, implements the method of migrating data in the above-described method embodiments.

The memory 51 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the apparatus for migrating data, and the like. Further, the memory 51 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 51 may optionally include memory located remotely from the processor 52, and such remote memory may be connected to the apparatus for migrating data via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 53 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the apparatus for migrating data. The output device 54 may include a display device such as a display screen.

The one or more modules are stored in the memory 51 and, when executed by the one or more processors 52, perform a method of migrating data in any of the method embodiments described above.

The product can execute the method provided by the embodiment of the invention, and has the functional modules and beneficial effects corresponding to executing the method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.

Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device or server implementing an embodiment of the invention is shown.

As shown in fig. 6, the computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a configuration module, a snapshot module, an update module, and a replication module. In some cases the names of these modules do not limit the modules themselves; for example, the configuration module may also be described as a "module that configures an inter-cluster replication service and suspends the replication service".
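One way to picture the four modules named above is as four small functions acting on in-memory stand-ins for the clusters. The following Python sketch is illustrative only: the `Cluster` and `ReplicationService` classes, the `write` helper, and the sample rows are hypothetical names introduced here, and a plain dictionary stands in for a real database cluster.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory stand-in for a database cluster (hypothetical, not HBase)."""
    rows: dict = field(default_factory=dict)


@dataclass
class ReplicationService:
    """Queues operations made on the source and replays them on the target."""
    target: Cluster
    paused: bool = False
    backlog: deque = field(default_factory=deque)

    def record(self, key, value):
        # Every write is queued; it is shipped right away only if not paused.
        self.backlog.append((key, value))
        if not self.paused:
            self.drain()

    def drain(self):
        # Replay queued operations on the target in their original order.
        while self.backlog:
            key, value = self.backlog.popleft()
            self.target.rows[key] = value


def configuration_module(target):
    """Configure the replication service and suspend it immediately."""
    return ReplicationService(target=target, paused=True)


def snapshot_module(source):
    """Copy the source data as it stands at the current moment."""
    return dict(source.rows)


def update_module(target, snapshot, service):
    """Load the snapshot into the target, then restart replication."""
    target.rows = dict(snapshot)
    service.paused = False


def replication_module(service):
    """Replay the operations that piled up while replication was suspended."""
    service.drain()


def write(source, service, key, value):
    """A business-application write: applied to the source and queued."""
    source.rows[key] = value
    service.record(key, value)


source, target = Cluster({"a": 1, "b": 2}), Cluster()
service = configuration_module(target)
snap = snapshot_module(source)
write(source, service, "b", 20)  # arrives after the snapshot was taken
write(source, service, "c", 3)   # also arrives during the suspension
update_module(target, snap, service)
replication_module(service)
write(source, service, "d", 4)   # replication is live again
```

In this sketch the business application keeps writing throughout, and the target ends up holding the same rows as the source because the queued writes are replayed on top of the snapshot.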

From the above description, it can be seen that by combining snapshot and replication, the business application does not need to stop writing during data migration, and its data is migrated seamlessly without affecting the application. While replication is suspended, the WAL logs are temporarily backlogged in the Replication queue, so WAL logs that have not yet been replicated are not deleted; after the replication service is restarted, the backlogged data operations continue to be consumed until replication is complete. An online snapshot of the table to be migrated is created on the source cluster; because creating the snapshot does not affect reads and writes of the source cluster's table, and because the snapshot is exported at the HDFS (Hadoop Distributed File System) layer, migrating data by means of the snapshot keeps the impact on cluster performance to a minimum. After the snapshot data migration is completed, the Replication service of the source cluster is restarted, the WAL logs backlogged in the source cluster's Replication queue during the snapshot migration are consumed, and the incremental data migration is completed, so that the data of the source cluster and the data of the target cluster eventually become consistent.
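For an HBase deployment, the sequence summarized above could be driven by standard HBase shell commands and the `ExportSnapshot` tool. The Python sketch below only assembles such a command list in order and executes nothing. The command sequence itself is an assumption made for illustration and is not quoted from this document; the table name, snapshot name, peer id, ZooKeeper cluster key, and HDFS URL are hypothetical placeholders; the `add_peer` argument syntax differs between HBase versions; and the column families are assumed to already have replication enabled.

```python
def build_migration_plan(table, snapshot, peer_id, target_zk, target_hdfs):
    """Return ordered (where, command) pairs; nothing is executed here."""
    return [
        # Configure the inter-cluster replication peer, then suspend it so
        # that WAL entries back up in the source's replication queue.
        ("source hbase shell",
         f"add_peer '{peer_id}', CLUSTER_KEY => '{target_zk}'"),
        ("source hbase shell", f"disable_peer '{peer_id}'"),
        # Take an online snapshot of the table and export it at the HDFS layer.
        ("source hbase shell", f"snapshot '{table}', '{snapshot}'"),
        ("source command line",
         "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot "
         f"-snapshot {snapshot} -copy-to {target_hdfs}"),
        # Restore the snapshot on the target; the table must be disabled first.
        ("target hbase shell", f"disable '{table}'"),
        ("target hbase shell", f"restore_snapshot '{snapshot}'"),
        ("target hbase shell", f"enable '{table}'"),
        # Resume replication so the backlogged WAL entries are consumed.
        ("source hbase shell", f"enable_peer '{peer_id}'"),
    ]


plan = build_migration_plan(
    table="orders",                        # hypothetical table name
    snapshot="orders_snap",                # hypothetical snapshot name
    peer_id="1",                           # hypothetical peer id
    target_zk="zk-target:2181:/hbase",     # hypothetical cluster key
    target_hdfs="hdfs://target-nn:8020/hbase",  # hypothetical HDFS URL
)
for where, command in plan:
    print(f"[{where}] {command}")
```

If the table does not yet exist on the target cluster, `clone_snapshot` would take the place of the disable, `restore_snapshot`, and enable steps in this sketch.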

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of migrating data, comprising:

configuring an inter-cluster replication service and suspending the replication service, wherein the replication service sends operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster, and wherein suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster;

creating a snapshot of the data of the source cluster at the current moment, and exporting the snapshot to the target cluster;

updating the data of the target cluster by using the snapshot, and restarting the replication service after the updating is finished; and

playing back, in the target cluster, the operations performed by the business application on the source cluster during the suspension of the replication service, and writing the operations into the target cluster.

2. The method of claim 1, wherein the cluster is an HBase cluster.

3. The method of claim 1, wherein configuring the inter-cluster replication service comprises:

configuring a Replication queue, and sending the WAL log of the source cluster to the target cluster through the Replication queue, wherein the WAL log is used for storing the operations performed by the business application on the source cluster; and

playing back the WAL log at the target cluster so as to apply the operations performed by the business application on the source cluster to the target cluster.

4. The method of claim 3, further comprising: retaining, by the source cluster, the WAL log during suspension of the replication service.

5. The method of claim 1, wherein updating the data of the target cluster using the snapshot comprises:

updating the definition of the table of the target cluster with the snapshot;

recovering Region information of the table; and

updating the information in the meta table for the Regions changed while offline.

6. An apparatus for migrating data, comprising:

a configuration module, configured to configure an inter-cluster replication service and suspend the replication service, wherein the replication service sends operations performed by a business application on a source cluster to a target cluster, and the target cluster plays back the operations and writes them into the target cluster, and wherein suspending the replication service means that the operations performed by the business application on the source cluster are retained and temporarily not sent to the target cluster;

the snapshot module is used for creating a snapshot of the data of the source cluster at the current moment and exporting the snapshot to a target cluster;

the update module is used for updating the data of the target cluster by using the snapshot and restarting the replication service after the updating is finished;

and the replication module is used for playing back the operation of the business application on the source cluster during the suspension of the replication service in the target cluster and writing the operation into the target cluster.

7. The apparatus of claim 6, wherein the cluster is an HBase cluster.

8. The apparatus of claim 6, wherein the configuration module is further configured to:

9. The apparatus of claim 8, wherein the configuration module is further configured to cause the source cluster to retain the WAL log during suspension of the replication service.

10. The apparatus of claim 6, wherein the update module is further configured to:

updating the definition of the table of the target cluster with the snapshot;

recovering Region information of the table; and

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.