CN112286905A - Data migration method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112286905A
Authority
CN
China
Prior art keywords
data
migration
data source
source
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011103107.0A
Other languages
Chinese (zh)
Inventor
姚智博
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202011103107.0A
Publication of CN112286905A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/21: Design, administration or maintenance of databases
    • G06F 16/214: Database migration support
    • G06F 16/23: Updating
    • G06F 16/2308: Concurrency control
    • G06F 16/2315: Optimistic concurrency control
    • G06F 16/2322: Optimistic concurrency control using timestamps
    • G06F 16/2365: Ensuring data consistency and integrity
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases

Abstract

The present disclosure relates to the field of data processing and provides a data migration method, a data migration apparatus, a computer storage medium, and an electronic device. The data migration method includes: establishing a communication connection with a migration data source and a receiving data source; performing fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments; when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source in a distributed manner through a migration cluster; acquiring incremental data from the migration data source and determining, according to the creation timestamp of the incremental data, whether the incremental data is newly added data; when the incremental data is newly added data, migrating the incremental data to the receiving data source through the migration cluster; and when the incremental data is not newly added data, replacing the target data in the receiving data source with the incremental data through the migration cluster. The method and the apparatus can improve data migration speed and ensure that the data migration process proceeds in an orderly manner without affecting the read-write service of the migration data source.

Description

Data migration method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data migration method, a data migration apparatus, a computer storage medium, and an electronic device.
Background
Elasticsearch (ES for short) is a search engine based on the Lucene library. It provides a distributed, multi-tenant full-text search engine with high query efficiency and good full-text search performance, offers solid distributed support, can be conveniently scaled horizontally in a cluster environment, and can spread massive data across multiple clusters for storage and retrieval. However, as Elasticsearch versions are upgraded, all users face the problem of how to quickly and accurately migrate data from a low-version cluster to a high-version cluster without affecting business reads and writes.
In the related art, data in a low-version cluster can only be migrated to a high-version cluster step by step and cannot skip major versions (for example, migrating directly from ES 2.1 to ES 6.3 is not possible); the cluster must be restarted after migration is completed, and if migration fails midway, it must be restarted from the beginning. The migration time span is therefore large, and data migration efficiency is low.
In view of the above, there is a need in the art to develop a new data migration method and apparatus.
It is to be noted that the information disclosed in the background section above is only used to enhance understanding of the background of the present disclosure.
Disclosure of Invention
The present disclosure is directed to a data migration method, a data migration apparatus, a computer storage medium, and an electronic device, so as to avoid, at least to a certain extent, the defects that the read/write of a service is affected and the migration efficiency is low in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a data migration method, including: establishing communication connection with the migration data source and the receiving data source; carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments; when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source in a distributed mode through a migration cluster; acquiring incremental data from the migration data source, and judging whether the incremental data is newly added data or not according to the creation timestamp of the incremental data; when the incremental data is newly added data, migrating the incremental data to the receiving data source through the migration cluster; when the incremental data is not the newly added data, replacing target data in the receiving data source with the incremental data through the migration cluster; wherein the target data is the same as the creation timestamp of the delta data.
In an exemplary embodiment of the present disclosure, the performing fragmentation processing on the full amount of data in the migration data source to obtain a plurality of data fragments includes: acquiring initial running time of the migration data source, and determining the time span between the current time and the initial running time as a target time period; carrying out fragmentation processing on the target time period according to a preset time interval to obtain a plurality of time fragments; and taking the data stored in the migration data source in each time slice as a data slice.
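The time-based fragmentation described above can be sketched as follows. This is an illustrative Python sketch (the patent publishes no code); the function and parameter names are invented for the example:

```python
from datetime import datetime, timedelta


def slice_time_range(start, end, interval):
    """Split the target time period [start, end) into consecutive time
    slices of length `interval`. The data written to the migration data
    source within each slice forms one data fragment."""
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + interval, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices
```

For example, slicing one day of cluster history at the preset 1-hour interval would yield 24 fragments, each of which can be migrated independently.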
In an exemplary embodiment of the present disclosure, each data slice corresponds to one migration subtask; when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source through a migration cluster, including: when a data migration instruction is received, executing the migration subtasks in a distributed manner through the migration cluster to migrate the plurality of data fragments to the receiving data source.
In an exemplary embodiment of the present disclosure, the method further comprises: when the data volume to be migrated corresponding to the target data fragment is larger than the data volume threshold, splitting the migration subtask corresponding to the target data fragment into a plurality of target subtasks, so that the data volume to be migrated corresponding to each target subtask is smaller than the data volume threshold.
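The splitting rule above can be sketched as follows; this is a hypothetical heuristic that assumes document creation timestamps are roughly uniform within a slice, which the patent does not specify:

```python
import math
from datetime import datetime, timedelta


def split_slice(start, end, doc_count, threshold):
    """If a time slice holds more documents than `threshold`, split its
    window into enough equal sub-slices that each is expected to hold
    at most `threshold` documents (uniform-distribution assumption)."""
    if doc_count <= threshold:
        return [(start, end)]
    parts = math.ceil(doc_count / threshold)
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]
```

Each returned sub-window would then become its own target subtask in the migration cluster's task table.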
In an exemplary embodiment of the present disclosure, the replacing, by the migration cluster, the target data in the receiving data source with the delta data includes: comparing the update time stamps of the incremental data and the target data according to an optimistic locking mechanism; replacing target data in the receiving data source with the delta data when the update timestamp of the delta data is greater than the update timestamp of the target data.
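The optimistic-lock comparison can be sketched as below; the dictionary-backed store and the `update_ts` field name are illustrative stand-ins for the receiving data source and its update timestamp:

```python
def apply_incremental(target_store, doc_id, incoming_doc):
    """Timestamp-based optimistic conflict resolution: the incoming
    incremental document replaces the stored target document only when
    its update timestamp is strictly newer. Returns True if the store
    was modified."""
    current = target_store.get(doc_id)
    if current is None or incoming_doc["update_ts"] > current["update_ts"]:
        target_store[doc_id] = incoming_doc
        return True
    return False
```

Because the comparison is made at write time rather than by holding a lock, concurrent writers never block each other; a stale write is simply discarded.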
In an exemplary embodiment of the disclosure, after replacing target data in the receiving data source with the delta data by the migration cluster, the method further comprises: querying the number of data stored in the migration data source; if the number of the data is larger than that of the data in the receiving data source, acquiring difference data of the migration data source and the receiving data source; updating, by the migration cluster, the difference data into the receiving data source.
In an exemplary embodiment of the present disclosure, the method further comprises: if the number of the data is equal to the number of the data in the received data source, acquiring field values of all data in the migrated data source, and acquiring field values of all data in the received data source; comparing whether the field values of the same data are consistent or not; marking data in the migration data source with inconsistent field values; updating the marked data into the receiving data source.
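The consistency check in the two preceding paragraphs (count comparison, then field-by-field comparison) can be sketched as follows; the id-to-document mapping is an illustrative simplification of querying both clusters:

```python
def diff_documents(source_docs, target_docs):
    """Compare two id -> document maps and return the ids whose field
    values differ, or that are missing from the target entirely: these
    are the documents that must be marked and re-synced."""
    stale = []
    for doc_id, src in source_docs.items():
        if target_docs.get(doc_id) != src:
            stale.append(doc_id)
    return stale
```

Documents flagged by this comparison correspond to the "marked data" that the migration cluster updates into the receiving data source.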
In an exemplary embodiment of the present disclosure, the method further comprises: recording interruption time when data migration is interrupted; and carrying out breakpoint continuous transmission according to the interruption time.
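Breakpoint resumption can be sketched as persisting the interruption time and reading it back on restart; the JSON file format and field name here are invented for illustration (the patent records the checkpoint in a task table):

```python
import json
import os
import tempfile


def save_checkpoint(path, interruption_time):
    """Persist the time at which migration was interrupted so a
    restarted run can resume from it instead of starting over."""
    with open(path, "w") as f:
        json.dump({"resume_from": interruption_time}, f)


def load_checkpoint(path):
    """Return the recorded interruption time, or None for a fresh run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["resume_from"]
```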
In an exemplary embodiment of the present disclosure, the method further comprises: creating a first standby server of the migration data source, and creating a second standby server of the receiving data source; migrating the data in the migration data source to the first standby server through the migration cluster at regular time; and migrating the data in the receiving data source to the second standby server through the migration cluster at regular time.
According to a second aspect of the present disclosure, there is provided a data migration apparatus comprising: the communication module is used for establishing communication connection with the migration data source and the receiving data source; the data fragmentation module is used for carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments; the full data migration module is used for migrating the plurality of data fragments to the receiving data source in a distributed manner through the migration cluster when a data migration instruction is received; the incremental data migration module is used for acquiring incremental data from the migration data source and judging whether the incremental data is newly added data or not according to the creation timestamp of the incremental data; when the incremental data is newly added data, migrating the incremental data to the receiving data source through the migration cluster; when the incremental data is not the newly added data, replacing target data in the receiving data source with the incremental data through the migration cluster; wherein the target data is the same as the creation timestamp of the delta data.
According to a third aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the data migration method of the first aspect described above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the data migration method of the first aspect described above via execution of the executable instructions.
As can be seen from the foregoing technical solutions, the data migration method, the data migration apparatus, the computer storage medium and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
in the technical solutions provided in some embodiments of the present disclosure, on one hand, after establishing communication connection with a migration data source and a received data source, fragmentation processing is performed on full data in the migration data source to obtain a plurality of data fragments, so that concurrent migration processing can be performed on the full data through a cluster, and a data migration scheme that is general, efficient, simple to operate, rich in migration scenario, and accurate in migration is provided. Furthermore, when a data migration instruction is received, the plurality of data fragments are migrated to the received data source in a distributed manner through the migration cluster, so that the technical problems of service write stop, long operation time and complex migration process of the migrated data source in the related technology can be solved, the migration speed of data can be improved on the premise of not influencing the read-write service of the migrated data source, and the ordered proceeding of the data migration process can be ensured. On the other hand, incremental data is obtained from the migration data source, and whether the incremental data is newly added data or not is judged according to the creation timestamp of the incremental data; when the incremental data are newly added data, the incremental data are migrated to the data receiving source through the migration cluster, so that data omission can be avoided, and data integrity is guaranteed; when the incremental data is not the newly added data, the migration cluster replaces the target data in the received data source with the incremental data, so that the data effectiveness can be ensured, and the storage space of the received data source is saved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1A is a schematic diagram showing a data migration method in the related art;
FIG. 1B is a schematic diagram showing another data migration method in the related art;
FIG. 2 illustrates a flow diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 3 is a diagram illustrating a deployment architecture of a data migration package in an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram illustrating a deployment architecture of a data migration package in another exemplary embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a data migration method in another example embodiment of the present disclosure;
FIG. 7A illustrates a sub-diagram of a data migration method in yet another exemplary embodiment of the present disclosure;
FIG. 7B illustrates a sub-diagram of a data migration method in yet another exemplary embodiment of the present disclosure;
FIG. 8 illustrates an overall flow diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 9 illustrates a schematic diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 10 is a sub-flow diagram illustrating a method of data migration in an exemplary embodiment of the present disclosure;
FIG. 11 illustrates a sub-flow diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 12 illustrates a schematic diagram of a data migration method in an exemplary embodiment of the present disclosure;
FIG. 13 is a sub-flow diagram illustrating a method of data migration in an exemplary embodiment of the present disclosure;
FIG. 14 illustrates a schematic diagram of synchronizing task tables in an exemplary embodiment of the present disclosure;
FIG. 15 is a schematic diagram illustrating a structure of a data migration apparatus according to an exemplary embodiment of the present disclosure;
FIG. 16 shows a schematic diagram of a structure of a computer storage medium in an exemplary embodiment of the disclosure;
fig. 17 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/parts/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first" and "second", etc. are used merely as labels, and are not limiting on the number of their objects.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.
In the related art, there are generally several data migration schemes:
First, rolling upgrade (an upgrade mode for multi-copy services) is supported from Elasticsearch 5.0 onward, while versions below 5.0 cannot be upgraded across major versions. For example, referring to fig. 1A, fig. 1A is a schematic diagram illustrating a data migration method in the related art, specifically illustrating that ES clusters above version 5.0 support rolling upgrade. The drawbacks of this solution are: the version can only be raised step by step; the cluster needs to be restarted after each upgrade; the cluster must be migrated from the beginning if a fault interrupts the process; the process becomes too complex for long-running clusters; and the service must stop reading and writing.
Second, the index is rebuilt (reindex) through the official ES cluster, and the data can be migrated after the index creation is completed. The disadvantage of this scheme is that a low-version index cannot be created on a high-version cluster.
Third, data in the low-version ES cluster is written into MySQL, and the data in MySQL is then read directly into the high-version ES cluster. The disadvantage of this scheme is that the full amount of data must be backed up in MySQL, so the scheme is not universal.
Fourth, the low-version and high-version migration programs are deployed separately, i.e., the low-version migration program is deployed on client2 and the high-version migration program on client6, and asynchronously consumed data is written into the high-version ES cluster through jmq; that is, the data migration path is: ES2.1-client2-jmq-client6-ES6. The disadvantages of this scheme are: client2 and client6 are deployed separately, there are many input and output ports, and migration efficiency is low.
Fifth, the migration mode of the big data platform is: JAVA-kafka-JRC (JAVA real-time computing platform: Storm/Spark/Flink)-ES cluster. The disadvantages of this scheme are: it depends on a big data environment and is not universal, the process is not easy to monitor, and neither failure compensation nor monitoring alarms are easy to implement.
Sixth, a Spark script performs data migration through a scheduling task based on Elasticsearch for Apache Hadoop. For example, referring to fig. 1B, fig. 1B shows a schematic diagram of another data migration method in the related art, specifically showing that the Spark script needs to depend on a big data environment. The disadvantages of this scheme are: it depends on a big data environment, and the scheme is incomplete and not universal.
In summary, the solutions in the related art and the drawbacks and features thereof can be referred to the following table 1:
TABLE 1
(Table 1, comparing the above schemes and their drawbacks, is provided as an image in the original publication and is not reproduced here.)
Therefore, in the related art, data cannot be quickly and accurately migrated from the low-version ES cluster to the high-version ES cluster without affecting the service reading and writing.
In the embodiment of the present disclosure, a data migration method is provided first, which overcomes the defects that the read-write of a service is affected and the migration efficiency is low in the related art at least to a certain extent.
Fig. 2 is a flowchart illustrating a data migration method according to an exemplary embodiment of the present disclosure, where an execution subject of the data migration method may be a Tomcat server deploying a data migration tool.
Referring to fig. 2, a data migration method according to one embodiment of the present disclosure includes the steps of:
step S210, establishing communication connection with the migration data source and the receiving data source;
step S220, carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments;
step S230, when a data migration instruction is received, migrating a plurality of data fragments to a receiving data source in a distributed manner through a migration cluster;
step S240, obtaining incremental data from the migration data source, and judging whether the incremental data is newly added data or not according to the creation timestamp of the incremental data;
step S250, when the incremental data is newly added data, migrating the incremental data to a receiving data source through the migration cluster;
step S260, when the incremental data is not the newly added data, replacing the target data in the received data source by the incremental data through the migration cluster; wherein the target data is the same as the creation timestamp of the delta data.
In the technical solution provided in the embodiment shown in fig. 2, on one hand, after the communication connection is established with the migration data source and the received data source, fragmentation processing is performed on the full amount of data in the migration data source to obtain a plurality of data fragments, so that concurrent migration processing can be performed on the full amount of data through a cluster, and a data migration scheme with universality, high efficiency, simple operation, rich migration scenarios and accurate migration is provided. Furthermore, when a data migration instruction is received, the plurality of data fragments are migrated to the received data source in a distributed manner through the migration cluster, so that the technical problems of service write stop, long operation time and complex migration process of the migrated data source in the related technology can be solved, the migration speed of data can be improved on the premise of not influencing the read-write service of the migrated data source, and the ordered proceeding of the data migration process can be ensured. On the other hand, incremental data is obtained from the migration data source, and whether the incremental data is newly added data or not is judged according to the creation timestamp of the incremental data; when the incremental data are newly added data, the incremental data are migrated to the data receiving source through the migration cluster, so that data omission can be avoided, and data integrity is guaranteed; when the incremental data is not the newly added data, the migration cluster replaces the target data in the received data source with the incremental data, so that the data effectiveness can be ensured, and the storage space of the received data source is saved.
The following describes the specific implementation process of each step in fig. 2 in detail:
the migration data source in the present disclosure may be a low-version storage cluster and the receiving data source may be a high-version storage cluster. The storage cluster is deployed with a plurality of distributed storage nodes, and the plurality of storage nodes may use a consistency replication protocol, such as a Raft protocol, to ensure data consistency. It should be noted that the storage cluster in the present disclosure may be an ES cluster, a cluster composed of at least two or more Mysql database servers, a cluster composed of at least two or more Mongo database servers, a cluster composed of at least two or more hbase database servers, or the like, and may be set by itself according to actual situations, which belongs to the protection scope of the present disclosure.
In the following embodiments, an ES cluster with a low version as a migration data source and an ES cluster with a high version as a reception data source are taken as examples for explanation.
For example, a data migration package (hereinafter referred to as an SDK package) may be developed and deployed in a migration cluster (a Tomcat cluster), resulting in a data migration tool. SDK stands for software development kit: a set of tools provided by a third-party service provider for implementing a certain function of a software product, generally presented in the form of an API (Application Programming Interface) together with documents, paradigms, and tools.
By way of example, reference may be made to fig. 3, which is a diagram illustrating the deployment architecture of an SDK package in an exemplary embodiment of the disclosure. Referring to fig. 3, a task monitoring platform may be configured in any configuration manner of Web config or yaml config. Specifically, a migration subtask may be created in the migration data source of the ES2.1 version, and then the migration cluster (Tomcat cluster) may read the migration subtask from the task configuration table, and then the migration task may be started in two configuration manners, Web config or yaml config.
The Web config is an XML (Extensible Markup Language) text file used to store the configuration information of an ASP.NET application. It covers the monitoring page, cluster information, index information, type information, migration time periods, time slice sizes, thread pool numbers, migration button settings, and the like. The yaml/JSON config is a static configuration file that can be used for static configuration, for example: database connection information, user names and passwords, table structures, mapping relationships, processing classes, and other configurations.
By way of example, referring also to fig. 4, fig. 4 shows a deployment architecture diagram of the SDK package in another exemplary embodiment of the disclosure. Referring to fig. 4, each server node in the migration cluster may register with a configuration center (any one of Zookeeper, Spring Cloud Config, Nacos, or Apollo); after registration is completed, the migration cluster executes the migration subtasks in a distributed manner when a data migration instruction is received. Through the task monitoring platform, the configuration center can perform task management (grouping, configuring, starting and stopping tasks), global configuration (registration center, configuration API, dynamic capacity expansion of relevant storage units, and the like), log management (task operation logs, task configuration logs, and the like), and monitoring and statistics (task failure compensation and alarms, migration progress bar display, data consistency comparison), and so on.
With continued reference to FIG. 2, in step S210, a communication connection is established with the migrating data source and the receiving data source.
When data in the current 2.1 version of the ES cluster needs to be migrated into the 6.3 version of the ES cluster, the migration data source is the 2.1 version of the ES cluster, and the receiving data source is the 6.3 version of the ES cluster.
After submitting the above information, the data migration tool may establish a communication connection with the migration data source according to the cluster name, the user name, and the password of the migration data source, and establish a communication connection with the reception data source according to the cluster name, the user name, and the password of the reception data source.
Further, the user may enter and submit relevant configuration parameters in the configuration page, for example, the parameters entered by the user may include: a preset duration interval (e.g., 1 hour), the number of server nodes (e.g., 10), the number of thread pools on each server node (e.g., 16 threads), and the duration of time that each server node performs each time slice (e.g., 5 minutes).
Furthermore, the data migration tool may obtain the initial running time of the migration data source (for example, the time when the migration data source performed its first service read/write, or the time when the first piece of data was written into the migration data source), determine the time span between the current time and the initial running time as the target time period (for example, N years, where N is a rational number greater than 0), and perform fragmentation processing on the target time period according to the preset duration interval (for example, 1 hour) to obtain a plurality of time fragments (for example, M fragments, where M is a positive integer greater than 0).
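The fragmentation of the target time period can be sketched in a few lines of Python (the function name and the toy one-day range are illustrative assumptions, not part of the patent; the 1-hour interval follows the example above):

```python
from datetime import datetime, timedelta

def slice_time_range(start, end, interval):
    """Split the target time period [start, end) into fixed-size time
    fragments; the last fragment is shortened if the span is not an
    exact multiple of the interval."""
    fragments = []
    cursor = start
    while cursor < end:
        fragment_end = min(cursor + interval, end)
        fragments.append((cursor, fragment_end))
        cursor = fragment_end
    return fragments

# one day sliced at the 1-hour preset duration interval -> 24 fragments
day = slice_time_range(datetime(2018, 1, 1), datetime(2018, 1, 2),
                       timedelta(hours=1))
```

Each resulting (start, end) pair then identifies the data stored in the migration data source during that hour, i.e. one data fragment.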
Further, the data migration tool may estimate the migration duration of the migration task according to the number of server nodes (e.g., 10), the number of thread pools on each server node (e.g., 16 threads), and the time each server node takes to execute each time slice (e.g., 5 minutes), and may then add the migration duration to the obtained task start time to generate the task termination time. For example, if the start time is set to January 1, 2019 and the termination time is determined to be January 1, 2020, the execution period of the migration task is from January 1, 2019 to January 1, 2020.
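A minimal Python sketch of this duration estimate, assuming time slices are processed in full concurrent "waves" (the function name and the wave model are illustrative assumptions; the figures of 10 nodes, 16 threads, and 5 minutes per slice follow the example above):

```python
import math
from datetime import datetime, timedelta

def estimate_termination_time(start_time, num_slices, nodes,
                              threads_per_node, minutes_per_slice):
    """Estimate the task termination time: time slices are processed in
    waves of nodes * threads_per_node concurrent workers, and each wave
    costs minutes_per_slice."""
    waves = math.ceil(num_slices / (nodes * threads_per_node))
    return start_time + timedelta(minutes=waves * minutes_per_slice)

# 8760 hourly slices (one year of data), 10 nodes x 16 threads, 5 min each
finish = estimate_termination_time(datetime(2019, 1, 1), 8760, 10, 16, 5)
```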
Further, a migration task A may be created, with an execution time from January 1, 2019 to January 1, 2020. Furthermore, the data migration tool may create two temporary task tables in the migration data source (a main task table A for recording execution information related to the migration task, and a subtask table B for recording execution information related to the plurality of migration subtasks) for storing migration task A and the migration subtasks.
The field names in the main task table A may include cluster (cluster name), index_name (index name), es_type (index type), begin_time (task start time), end_time (task end time), shardtask_num (number of sharding tasks), complete_task_num (number of completed sharding tasks), and progress (migration progress). A specific data structure of the main task table A may refer to the following table 2:

TABLE 2

cluster (cluster name): Coupon
index_name (index name): index_name
es_type (index type): type
begin_time (start time): 2019-01-01
end_time (end time): 2020-01-01
shardtask_num (number of sharding tasks): 100
complete_task_num (number of completed tasks): 20
progress (migration progress): 20%
The field names in subtask table B may include: system_id (system identification), cluster (cluster name), source_index (source index), type (source index type), query_date (query date), query_hour (query hour), query_start_time (query start time), query_end_time (query end time), data_count (data amount), used_time (migration duration), reference_field_name (associated field name), status (migration status), target_cluster (receiving data source), target_index (target index), target_type (target index type), ip (Internet Protocol address), re (number of re-migration attempts), and msg (hint message). The specific data structure of subtask table B may refer to the following table 3:
TABLE 3

(Table 3 appears only as an image in the original publication; its columns correspond to the subtask fields listed above.)
With continued reference to fig. 2, in step S220, a fragmentation process is performed on the full amount of data in the migration data source to obtain a plurality of data fragments.
Referring to the related explanation of step S210, after the time slices are divided, the data stored in the migration data source within each time slice may be taken as one data slice. Taking each time slice as 1 hour as an example, one data slice i may be all the data stored in the migration data source between 0:00 and 1:00 on January 1, 2018. In this way, the data in the migration data source may be divided into M data slices.
It should be noted that, when the data is fragmented, the data amount corresponding to each data fragment (each data fragment is one migration subtask) may also be obtained, and when the data amount to be migrated corresponding to a target data fragment is greater than the data amount threshold, the migration subtask corresponding to the target data fragment may be split into a plurality of target subtasks, so that the data amount to be migrated corresponding to each target subtask is less than the data amount threshold. For example, when the target data fragment (for example, the data fragment corresponding to 23:00-24:00 on November 1, 2018) is greater than the data amount threshold, the migration subtask T corresponding to the target data fragment may be split into a migration subtask T1 corresponding to 23:00-23:30 and a migration subtask T2 corresponding to 23:30-24:00.

Further, it may be judged whether the data amount to be migrated of the data fragment corresponding to migration subtask T1 is less than the data amount threshold, and whether the data amount to be migrated of the data fragment corresponding to migration subtask T2 is less than the data amount threshold. If so, the splitting may stop; illustratively, the server node S1 of the data migration cluster may then execute migration subtask T1, and the server node S2 of the data migration cluster may execute migration subtask T2.

If the data amount to be migrated corresponding to any migration subtask is still greater than the data amount threshold, that migration subtask may be split again. For example, if the data amount to be migrated of migration subtask T2 (23:30-24:00) is still greater than the data amount threshold, migration subtask T2 may be split a second time into a target subtask T21 corresponding to 23:30-23:45 and a target subtask T22 corresponding to 23:45-24:00. Similarly, if the data amount to be migrated of a certain target subtask is still greater than the data amount threshold, that target subtask may continue to be split iteratively until the data amount to be migrated corresponding to each target subtask is less than the data amount threshold.
As can be seen from the above explanation, by splitting a migration subtask whose data amount is greater than the data amount threshold, the migration subtask T can be split into migration subtask T1, target subtask T21, and target subtask T22, so that the work of migration subtask T is shared by three server nodes in the data migration cluster (illustratively, server node S1 of the data migration cluster executes migration subtask T1, server node S2 executes target subtask T21, and server node S3 executes target subtask T22). This avoids the overweight load, overload, and node breakdown that would result from a single server node executing the whole migration subtask T, and ensures the normal operation of each server node in the Tomcat cluster.
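The iterative splitting described above can be sketched as a recursive bisection in Python (the function name, the minute-granularity ranges, the toy volume function, and the 2000-document threshold are all illustrative assumptions):

```python
def split_until_small_enough(task, volume, threshold):
    """Recursively bisect a (start, end) time range until the data volume
    of every resulting sub-range falls below the threshold (mirrors the
    T -> T1, T21, T22 splitting; minute granularity is assumed)."""
    start, end = task
    if volume(start, end) < threshold or end - start <= 1:
        return [task]
    mid = (start + end) // 2
    return (split_until_small_enough((start, mid), volume, threshold)
            + split_until_small_enough((mid, end), volume, threshold))

# toy volume: a uniform 100 documents per minute over the 23:00-24:00 hour
volume = lambda s, e: (e - s) * 100
parts = split_until_small_enough((0, 60), volume, 2000)
```

With these toy numbers the 60-minute fragment is bisected twice, yielding four 15-minute target subtasks, each of which is under the threshold and can be preempted by a different server node.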
In step S230, when the data migration instruction is received, the plurality of data fragments are migrated to the receiving data source in a distributed manner through the migration cluster.
After the initialization of the data migration tool is completed, the user can click the start-migration button on the configuration page (i.e., send a data migration instruction); the data migration tool then receives the data migration instruction and migrates the plurality of data fragments in the migration data source to the receiving data source through the Tomcat cluster, without affecting the normal read-write service of the migration data source. For example, referring to fig. 5, which shows a schematic diagram of a data migration method in an exemplary embodiment of the present disclosure (specifically, keeping the message queue of the migration data source open during data migration), it can be seen that during the migration of the full amount of data, the MQ of the migration data source (ES 2.1) remains open. This solves the technical problem in the related art that the read-write service of the migration data source must be stopped, which affects normal service processing; it ensures normal data reading and writing of the migration data source and optimizes user experience.
Specifically, when the data migration instruction is received, the data migration tool may call the Tomcat cluster, and the plurality of server nodes in the Tomcat cluster may concurrently preempt migration subtasks. After preempting migration subtask T1, a node may execute migration subtask T1 and update the migration status of the corresponding data fragment to "1" (migrating). It may then query whether the data already exists in the receiving data source (e.g., ES 6.3); if the data already exists, the existing copy may be deleted, and the data fragment may then be migrated to the receiving data source. After the migration is completed, the migration status of each piece of data in the data fragment may be updated to "2" (success).
After the data migration is completed, the migration state of the data may be modified. For example, after the server node a migrates the full amount of data a in the low version cluster (e.g., ES2.1) to the high version cluster (e.g., ES6.3), the migration status of the full amount of data a may be modified (e.g., status ═ 2) to determine that the full amount of data a is migrated successfully.
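A toy Python sketch of the preemption-and-status-update flow, using an in-memory queue and dict in place of the Tomcat cluster and the ES clusters (the 0/1/2 status codes follow the description above; all names and data are invented for illustration):

```python
import queue
import threading

def run_migration(shards, sink, node_count):
    """Server nodes concurrently preempt migration subtasks from a shared
    queue; each subtask is marked 1 (migrating), any pre-existing copy of
    a document in the sink is deleted before writing, then the subtask is
    marked 2 (migration success)."""
    tasks = queue.Queue()
    for shard in shards:
        tasks.put(shard)
    status = {}

    def node():
        while True:
            try:
                shard_id, docs = tasks.get_nowait()
            except queue.Empty:
                return                      # no subtasks left to preempt
            status[shard_id] = 1            # migrating
            for doc_id, body in docs:
                sink.pop(doc_id, None)      # delete pre-existing copy
                sink[doc_id] = body
            status[shard_id] = 2            # migration success

    workers = [threading.Thread(target=node) for _ in range(node_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return status

sink = {}
status = run_migration(
    [("s1", [("a", 1), ("b", 2)]), ("s2", [("c", 3), ("d", 4)])], sink, 2)
```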
It should be noted that the data in the migration data source is not deleted. In an actual production environment, online data almost never can be deleted; the migration data source can only be archived periodically, and only data older than several years is allowed to be physically deleted to release space. Therefore, after the data in the migration data source is backed up and migrated to the receiving data source, all of the previous full data is still stored in the migration data source.
With continued reference to fig. 2, in step S240, incremental data is obtained from the migration data source, and whether the incremental data is newly added data is determined according to the creation timestamp of the incremental data.
For example, the creation timestamp and the update timestamp of all data in the migration data source may be obtained, and when the creation timestamp or the update timestamp of a certain piece of data B is after the task start time, the data may be determined to be incremental data.
After the migration of the data of the full amount of data is completed, the incremental data written into the migration data source in the process of migrating the full amount of data can be found, and for example, the data of which the creation time stamp is after the starting time of the migration task can be determined as the incremental data, and the data of which the update time stamp is after the starting time of the task can be determined as the incremental data.
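The timestamp-based identification of incremental data can be sketched as follows (a hedged illustration; the field names create_ts and update_ts and the numeric timestamps are assumptions, not from the patent):

```python
def classify_for_incremental_sync(doc, task_start_time):
    """Classify a record by its timestamps relative to the start of the
    full-data migration task:
      'new'      -> created after the task started (newly added data)
      'modified' -> created before, but updated after, the task started
      None       -> untouched history, already covered by full migration"""
    if doc["create_ts"] >= task_start_time:
        return "new"
    if doc["update_ts"] >= task_start_time:
        return "modified"
    return None

kinds = [classify_for_incremental_sync(d, 100) for d in (
    {"create_ts": 120, "update_ts": 120},   # written during migration
    {"create_ts": 50, "update_ts": 110},    # history modified mid-migration
    {"create_ts": 50, "update_ts": 50},     # untouched history
)]
```

The 'new' and 'modified' cases together form the incremental data; steps S250 and S260 below handle the two cases differently.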
After the incremental data is determined, the data amount of the incremental data may be counted or estimated, and then a migration task B is created (a migration task for incremental data may set only the start time and leave the end time unset).
Furthermore, when the data volume of the incremental data is greater than the first data volume threshold (that is, if the service were stopped to migrate the data, the required write-stop time would be long and would affect normal service operation), the incremental data may be fragmented to obtain a plurality of data fragments, and, while maintaining the normal read-write service of the migration data source, the migration task is executed in a distributed manner by the data migration cluster to migrate the plurality of data fragments corresponding to the incremental data to the receiving data source. In this way, when the data volume is excessively large, an overlong write-stop period that would affect users' normal query service is avoided, and the storage cluster keeps operating normally during data migration.
When the data volume of the incremental data is smaller than the first data volume threshold (that is, if the service is stopped to migrate the data, the required write-stop time is short and the impact on users is negligible), the incremental data does not need to be fragmented. The data read-write service of the migration data source is stopped (refer to fig. 6, which shows a schematic diagram of a data migration method of a database in another exemplary embodiment of the present disclosure, specifically, stopping the data read-write service of the migration data source by closing its message queue during incremental data migration), and the migration task is executed to migrate the incremental data to the receiving data source.
Further, it may be determined whether the incremental data is new data (i.e., data written into the migration data source during the process of migrating the full amount of data), and specifically, if the creation timestamp of the incremental data is after the task start time, the incremental data may be determined to be the new data.
With continued reference to fig. 2, in step S250, when the incremental data is the new data, the incremental data is migrated to the receiving data source through the migration cluster.
When the incremental data are judged to be newly added data, the incremental data can be directly migrated to the data receiving source through the migration cluster, so that omission of the newly added data can be avoided, and the integrity of the data is guaranteed.
In step S260, when the incremental data is not the newly added data, the target data in the received data source is replaced with the incremental data by the migration cluster.
When it is judged that the incremental data is not newly added data (that is, the creation timestamp of the data is before the task start time and the update timestamp is after the task start time), it can be determined that the data is not new data but historical data that was modified during the migration of the full amount of data. Thus, the target data that has already been migrated into the receiving data source and has the same creation timestamp as the incremental data can be replaced by the incremental data. In this way, invalid data that has since changed is removed, ensuring the validity of the data.
Illustratively, in the process of data replacement, the update time stamp of the incremental data and the update time stamp of the target data can be compared according to an optimistic lock mechanism; when the update timestamp of the delta data is greater than the update timestamp of the target data, then the target data in the received data source may be replaced by the delta data. And if the update time stamp of the incremental data is smaller than the update time stamp of the target data, it may be determined that the data replacement has failed.
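A minimal sketch of the optimistic-lock replacement (illustrative only; the patent does not specify the behavior for equal timestamps, so this sketch treats an equal timestamp as a failed replacement):

```python
def optimistic_replace(receiving_store, doc_id, incremental_doc):
    """Replace the target document only when the incremental document's
    update timestamp is strictly newer (optimistic-lock check); return
    False to signal that the replacement failed as stale."""
    target = receiving_store.get(doc_id)
    if target is not None and incremental_doc["update_ts"] <= target["update_ts"]:
        return False                       # stale write: replacement failed
    receiving_store[doc_id] = incremental_doc
    return True

store = {"d1": {"update_ts": 10, "val": "old"}}
ok_new = optimistic_replace(store, "d1", {"update_ts": 20, "val": "fresh"})
ok_stale = optimistic_replace(store, "d1", {"update_ts": 5, "val": "stale"})
```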
It should be noted that, when the data migration process is interrupted (for example, a network outage occurs during the transmission of data Y), the current interruption time may be recorded. After the network connection recovers, data Y can be located within its data fragment according to the interruption time and the update timestamp, and migration can continue from data Y onward, thereby achieving breakpoint resumption.
It should be noted that, during the data migration process, the number of data fragments that have completed migration (status = 2) may also be obtained in real time, and the migration progress may be calculated as the ratio of this number to the total number of data fragments. A migration progress bar may then be displayed on the configuration page (for example, the executed portion and the unexecuted portion may be shown in different colors, such as red for executed and white for unexecuted; the specific colors may be set according to the actual situation and fall within the protection scope of the present disclosure), or the execution progress of the migration task may be displayed on the configuration page (e.g., 20%), so that the relevant user can intuitively track the migration progress in real time, ensuring that the migration process is transparent and monitorable.
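The progress computation is simple enough to state directly (a sketch; the status codes follow the description above, and the sample list is invented):

```python
def migration_progress(shard_statuses):
    """Progress = completed fragments (status 2) / total fragments,
    rendered as the percentage shown on the configuration page."""
    done = sum(1 for s in shard_statuses if s == 2)
    return f"{100 * done // len(shard_statuses)}%"

bar = migration_progress([2, 2, 0, 1, 2])  # 3 of 5 fragments finished
```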
When data migration is interrupted due to system abnormality (such as program self abnormality and network abnormality) and the like, alarm information can be displayed in the configuration page, so that related users can conveniently and quickly perform troubleshooting on related abnormal reasons, the problem troubleshooting time is shortened, and the troubleshooting efficiency and the migration efficiency are improved.
By way of example, referring to fig. 7A and 7B, fig. 7A shows a schematic diagram of a data migration method in still another exemplary embodiment of the present disclosure, specifically the data migration process when the storage cluster is an ES cluster, and fig. 7B shows a schematic diagram of a data migration method in still another exemplary embodiment, specifically the data migration process when the storage cluster is MySQL, ES, MongoDB, or HBase. A specific implementation is explained below with reference to fig. 7A and 7B.
Illustratively, the migration task and the migration subtasks can be created through the Web config, with the execution period of the migration task set to January 1, 2019 through January 1, 2020, so that the data fragment corresponding to each hour corresponds to one migration subtask. Further, after the data migration tool establishes communication connections with the migration data source (ES 2.1) and the receiving data source (ES 6.3), it may communicate with the migration data source through the Transport API and with the receiving data source through the REST API. The historical data (i.e., the full amount of historical data) may be synchronized first, then the newly added data in the incremental data, and finally the non-new (modified) data in the incremental data.
Exemplarily, fig. 8 shows an overall flowchart of a data migration method in an exemplary embodiment of the present disclosure, and a specific implementation is explained below with reference to fig. 8.
Referring to fig. 8, the full amount of data (i.e., the historical data in fig. 8) may be migrated first. Specifically, a new index of the receiving data source (i.e., the new cluster in fig. 8) may be prepared first; then a migration task (whose execution time is from January 2019 to October 2019) and migration subtasks are created and executed by the data migration cluster, thereby migrating the full amount of data to the receiving data source.
After the full amount of data is migrated, the creation timestamp and the update timestamp of each piece of data in the migration data source may be compared to determine the incremental data, which is then migrated to the receiving data source. Specifically, a new index of the receiving data source may be prepared first, and the incremental data is fragmented to create a migration task (the execution start time is November 2019; the termination time is determined by the actual data-writing situation in the migration data source), which is executed by the data migration cluster so that the incremental data is migrated to the receiving data source. After migration is complete, traffic may be switched to the receiving data source (i.e., the MQ of the receiving data source is turned on) by the system switch.
After migrating the incremental data in the migration data source to the receiving data source, the data read-write service of the migration data source may be stopped, and the message queue (Message Queue, MQ) of the receiving data source may be opened (for example, fig. 9 shows a schematic diagram of a data migration method of a database in an exemplary embodiment of the present disclosure, specifically, closing the message queue of the migration data source during incremental data migration and opening the message queue of the receiving data source, thereby starting the data read-write service of the receiving data source).
Fig. 10 is a schematic sub-flow diagram of a data migration method in an exemplary embodiment of the present disclosure, and the specifically shown sub-flow diagram includes steps S1001 to S1003, and a specific implementation is explained below with reference to fig. 10.
In step S1001, the number of pieces of data stored in the migration data source is queried.
Further, while maintaining the data read-write service of the receiving data source but pausing the writing of migrated data (data coming from the migration data source) into the receiving data source, the number of data pieces stored in the migration data source may be queried.
In step S1002, if the number of data pieces is greater than the number of data pieces in the receiving data source, difference data between the migration data source and the receiving data source is obtained.
If the number of data pieces is greater than the current number of data pieces in the received data source, the difference data in the migrated data source can be determined according to the data update time stamp. For example, referring to the related explanation of the step S240, when the creation timestamp or the update timestamp of a certain piece of data C is after the start time of the migration task of the incremental data, the data may be determined to be difference data.
In step S1003, the difference data is updated into the receiving data source by the migration cluster.
After determining the difference data, the difference data may be updated to the receiving data source by the migration cluster. Specifically, whether the difference data is newly added data or not can be judged, and when the difference data is newly added data, the difference data can be directly migrated to the receiving data source, and the update timestamp of the data can be changed. And when the difference data is not the newly added data, the difference data may be substituted for the data in the reception data source that is the same as the creation time stamp of the difference data.
Exemplarily, referring to fig. 11, fig. 11 is a sub-flowchart diagram illustrating a data migration method in an exemplary embodiment of the present disclosure, and in particular, a sub-flowchart diagram illustrating comparing field values of data and updating data with inconsistent field values when the number of data pieces in a migrated data source is equal to the number of data pieces in a received data source, including steps S1101-S1104, and the following explains a specific implementation manner with reference to fig. 11.
In step S1101, if the number of pieces of data is equal to the number of pieces of data in the received data source, the field values of all data in the migrated data source are acquired, and the field values of all data in the received data source are acquired.
If the number of the data in the migration data source is equal to the number of the data in the receiving data source, the field value of each data in the migration data source and the field value corresponding to the field name of each data in the receiving data source can be obtained.
In step S1102, it is checked whether or not the field values of the same data match.
In this step, it can be compared whether the field values of the same data are the same. For example: and comparing whether the field values of the data M in the migration data source and the data M' in the receiving data source are the same.
In step S1103, data in the migration data source whose field values do not coincide is marked.
Further, when the field values of the data M and the data M' are not consistent, the data M may be marked, for example: the migration status of the data M is updated to migration failure, i.e., status is 3. Therefore, the marked data can be conveniently and quickly found out subsequently, and relevant updating operation is carried out on the data.
In step S1104, the marked data is updated into the receiving data source.
Further, the marked data may be updated into the receiving data source, and specifically, the corresponding data (the same data as the creation timestamp of the marked data) in the receiving data source may be replaced according to the data in the marked migration data source.
Illustratively, statistics of the number of the marked data can be obtained and displayed, so as to monitor the data migration effect.
Fig. 12 is a schematic diagram illustrating a data migration method of a database in an exemplary embodiment of the present disclosure, and specifically, an overall flow diagram illustrating data comparison and updating of difference data after incremental data migration is completed, where a specific implementation is explained below with reference to fig. 12.
Referring to fig. 12, during data comparison, the normal read-write service of the receiving data source (ES 6.3) may be maintained (MQ kept on), but writes to the data being compared must be paused, because data that is in the middle of being changed cannot be compared reliably.
Further, the data in the migration data source (including the creation timestamp and the update timestamp) exists in the form of data fragments, and each data fragment corresponds to one migration subtask. Furthermore, the Compare task may read a migration subtask, read data a included in the migration subtask from the migration data source, read data b included in the migration subtask from the receiving data source, Compare whether field values of the data a and the data b are completely the same through equals (a, b), and if not, mark the data a in the migration data source (update the migration status to migration failure, i.e., status is 3), and update the data a to the receiving data source.
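The Compare task can be sketched as a field-by-field dict comparison (illustrative only; real documents would be read from the two ES clusters rather than in-memory dicts, and the sample data is invented):

```python
def compare_and_repair(source_docs, target_docs):
    """Field-by-field comparison of the two data sets: documents whose
    field values differ are marked (status 3, migration failed, in the
    migration source) and re-written from the source to the target."""
    marked = []
    for doc_id, src in source_docs.items():
        if target_docs.get(doc_id) != src:
            marked.append(doc_id)            # mark in the migration source
            target_docs[doc_id] = dict(src)  # update the receiving source
    return marked

source = {"a": {"f": 1}, "b": {"f": 2}}
target = {"a": {"f": 1}, "b": {"f": 99}}     # "b" drifted after migration
bad = compare_and_repair(source, target)
```

The count of marked documents can then be displayed, as described below, to monitor the migration effect.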
Exemplarily, fig. 13 shows a sub-flow diagram of a data migration method of a database in an exemplary embodiment of the present disclosure, and specifically shows a sub-flow diagram of creating a disaster recovery cluster, which includes steps S1301 to S1303, and the following explains a specific implementation manner with reference to fig. 13.
In step S1301, a first standby server is created that migrates the data source, and a second standby server is created that receives the data source.
In an exemplary embodiment of the present disclosure, a first standby server to migrate a data source may also be created, and a second standby server to receive the data source may also be created, to enhance the fault tolerance of the system.
In step S1302, the data in the migration data source is periodically migrated to the first standby server by the migration cluster.
Data in the migration data source may be migrated to the first standby server periodically by the migration cluster. Specifically, a synchronization task table may be created; for example, referring to fig. 14, which shows a schematic diagram of the synchronization task table created in an exemplary embodiment of the present disclosure, the synchronization task table may include: the index of the data in the migration data source (_index), the type of the data (_type), the identification number of the data (_id), the relevance of the data to the query condition (_score), the identification of the first standby server (dest_client), the last synchronization field of the data (last_backup_time), the index of the data in the first standby server (dest_index), and the index type of the data in the first standby server (dest_index_type). Furthermore, the migration progress of the data can be queried in real time through the index of the data or other self-defined query conditions.
Each task has a last_backup_time field (last synchronization time field). The periodically scheduled task synchronizes the incremental data of that part to the first standby server at intervals, then writes the start time of that synchronization round into the last_backup_time field, and so on, updating the field again in the next round. This avoids the problem that data cannot be repaired or recovered when the migration data source goes down, thereby guaranteeing data security.
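One watermark-driven synchronization round can be sketched as follows (illustrative; integers stand in for timestamps, and the record layout is an assumption):

```python
def backup_round(primary, standby, last_backup_time, now):
    """One scheduled synchronization round: copy every record written at
    or after the last_backup_time watermark to the standby server, then
    advance the watermark to this round's start time."""
    for rec in primary:
        if rec["written_at"] >= last_backup_time:
            standby[rec["id"]] = dict(rec)
    return now  # new last_backup_time for the next round

standby = {}
primary = [{"id": 1, "written_at": 5}, {"id": 2, "written_at": 15}]
watermark = backup_round(primary, standby, last_backup_time=10, now=20)
```

Record 1 predates the watermark (already backed up in an earlier round), so only record 2 is copied; the returned watermark is stored back into last_backup_time for the next scheduled round.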
In step S1303, the data in the receiving data source is periodically migrated to the second standby server by the migration cluster.
Data in the receiving data source may be migrated to the second standby server at regular times by the migration cluster. Specifically, only one synchronization task table is needed, and reference may be continued to fig. 14, where fig. 14 illustrates a schematic diagram of a synchronization task table created in an exemplary embodiment of the present disclosure, and referring to fig. 14, the synchronization task table may include: the index of the data in the receiving data source (index), the type of the data (type), the identification number of the data (id), the correlation degree of the data and the query condition (score), the identification of the second standby server (dest _ client), the last synchronization field of the data (last _ backup _ time), the index of the data in the second standby server (dest _ index), and the index type of the data in the second standby server (dest _ index _ type). Furthermore, the migration progress of the data can be queried in real time through the index of the data or other self-set query conditions.
Each task has a last_backup_time field (last synchronization time field). The scheduled task synchronizes the incremental data to the second standby server at intervals, then writes the start time of that synchronization round into the last_backup_time field, and so on, updating the field again in the next round. This avoids the problem that data cannot be repaired or recovered when the receiving data source goes down, thereby guaranteeing data security.
Based on the technical scheme, on one hand, the full data can be concurrently migrated through the cluster, and a data migration scheme which is general, efficient, simple to operate, rich in migration scene and accurate in migration is provided. Furthermore, the technical problems of service write stop, long operation time and complex migration process of the migrated data source in the related technology can be solved, the migration speed of the data can be improved on the premise of not influencing the read-write service of the migrated data source, and the ordered proceeding of the data migration process can be ensured. Furthermore, data omission can be avoided, and data integrity and effectiveness are guaranteed. On the other hand, the problem that data cannot be repaired and acquired when the received data source is down can be avoided, and data safety is guaranteed.
The present disclosure also provides a data migration apparatus, and fig. 15 shows a schematic structural diagram of the data migration apparatus in an exemplary embodiment of the present disclosure; as shown in fig. 15, the data migration apparatus 1500 may include a communication module 1501, a data fragmentation module 1502, a full data migration module 1503, and an incremental data migration module 1504. Wherein:
The communication module 1501 is configured to establish a communication connection with the migration data source and the receiving data source.
The data fragmentation module 1502 is configured to perform fragmentation processing on the full amount of data in the migration data source to obtain a plurality of data fragments.
In an exemplary embodiment of the present disclosure, the data fragmentation module is configured to obtain the initial running time of the migration data source and determine the time span between the current time and the initial running time as a target time period; perform fragmentation processing on the target time period according to a preset time interval to obtain a plurality of time slices; and take the data stored in the migration data source within each time slice as one data fragment.
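The time-based fragmentation described above, slicing the span from the initial running time to the current time at a preset interval, could look roughly like this; the function name and the tuple representation of a time slice are illustrative assumptions:

```python
import datetime

def make_time_slices(initial_run_time, current_time, interval):
    """Split [initial_run_time, current_time] into fixed-width time slices;
    each slice later becomes one data fragment / migration subtask."""
    slices = []
    start = initial_run_time
    while start < current_time:
        end = min(start + interval, current_time)  # last slice may be shorter
        slices.append((start, end))
        start = end
    return slices
```

Each (start, end) pair then selects the records whose storage time falls inside it, yielding one data fragment per slice.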
The full data migration module 1503 is configured to, when a data migration instruction is received, migrate the plurality of data fragments to the receiving data source in a distributed manner through the migration cluster.
In an exemplary embodiment of the present disclosure, each data fragment corresponds to one migration subtask; and the full data migration module is configured to, when the data migration instruction is received, execute the migration subtasks in a distributed manner through the migration cluster so as to migrate the plurality of data fragments to the receiving data source.
In an exemplary embodiment of the disclosure, the full data migration module is configured to, when the amount of data to be migrated corresponding to a target data fragment is greater than a data amount threshold, split the migration subtask corresponding to the target data fragment into a plurality of target subtasks, so that the amount of data to be migrated corresponding to each target subtask is less than the data amount threshold.
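A minimal sketch of this splitting rule, where any fragment whose data volume reaches the threshold is split so that every resulting target subtask stays strictly below it, might be as follows (all names are hypothetical):

```python
import math

def split_subtask(doc_ids, threshold):
    """Split an oversized migration subtask into target subtasks whose
    data volume is strictly below the given threshold."""
    if len(doc_ids) < threshold:
        return [doc_ids]                     # already small enough
    # Enough parts so that each chunk holds at most threshold - 1 items.
    parts = math.ceil(len(doc_ids) / (threshold - 1))
    size = math.ceil(len(doc_ids) / parts)
    return [doc_ids[i:i + size] for i in range(0, len(doc_ids), size)]
```

Splitting keeps the per-worker load bounded, so no single node of the migration cluster stalls on a disproportionately large fragment.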
The incremental data migration module 1504 is configured to obtain incremental data from the migration data source and determine whether the incremental data is newly added data according to the creation timestamp of the incremental data; migrate the incremental data to the receiving data source through the migration cluster when the incremental data is newly added data; and replace the target data in the receiving data source with the incremental data through the migration cluster when the incremental data is not newly added data.
In an exemplary embodiment of the present disclosure, the incremental data migration module is configured to compare the update timestamps of the incremental data and the target data according to an optimistic lock mechanism, and to replace the target data in the receiving data source with the incremental data when the update timestamp of the incremental data is greater than the update timestamp of the target data.
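Combining the creation-timestamp check with the optimistic-lock comparison, the incremental routing could be modeled as below; keying records by id and the field names created/updated are illustrative assumptions, not the patent's data model:

```python
def apply_incremental(record, dest):
    """Route one incremental record: insert it as newly added data when no
    target with the same creation timestamp exists; otherwise replace the
    target only if the incoming update timestamp is newer (optimistic lock)."""
    target = dest.get(record["id"])
    if target is None or target["created"] != record["created"]:
        dest[record["id"]] = record            # newly added data: plain insert
        return "inserted"
    if record["updated"] > target["updated"]:  # optimistic-lock version check
        dest[record["id"]] = record
        return "replaced"
    return "skipped"                           # stale write loses the race
```

The optimistic comparison means no lock is held on the receiving data source; a concurrent newer write simply wins by timestamp.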
In an exemplary embodiment of the present disclosure, the incremental data migration module is configured to query the number of data pieces stored in the migration data source; if this number is greater than the number of data pieces in the receiving data source, obtain the difference data between the migration data source and the receiving data source; and update the difference data into the receiving data source through the migration cluster.
In an exemplary embodiment of the present disclosure, the incremental data migration module is configured to, if the number of data pieces is equal to the number of data pieces in the receiving data source, obtain the field values of all data in the migration data source and the field values of all data in the receiving data source; compare whether the field values of the same data are consistent; mark the data in the migration data source whose field values are inconsistent; and update the marked data into the receiving data source.
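The two-stage verification, comparing piece counts first and then diffing field values when the counts match, admits a compact sketch; representing each data source as an id-to-fields mapping is an assumption made for illustration:

```python
def verify_migration(source, dest):
    """Post-migration check: if the source holds more pieces, return the
    missing ids (difference data); if counts match, return the ids whose
    field values disagree and therefore need to be re-synchronized."""
    if len(source) > len(dest):
        return sorted(set(source) - set(dest))   # missing pieces to resend
    # Counts match: mark records whose field values are inconsistent.
    return sorted(k for k, v in source.items() if dest.get(k) != v)
```

The returned id list plays the role of the "marked data" above: only those records are pushed to the receiving data source again.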
In an exemplary embodiment of the present disclosure, the incremental data migration module is configured to record the interruption time when the data migration is interrupted, and to perform breakpoint resume (resuming transmission from the breakpoint) according to the interruption time.
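Breakpoint resume by interruption time might be modeled as follows: each run skips records at or before the recorded time, and an interruption leaves the checkpoint pointing at the last record that completed. The checkpoint dictionary and the fail_at parameter (which simulates an interruption) are illustrative assumptions:

```python
def migrate_with_resume(records, checkpoint, fail_at=None):
    """Migrate (timestamp, payload) records in order, skipping anything at or
    before the recorded interruption time; on interruption the checkpoint
    already holds the last completed timestamp, so a rerun resumes there."""
    migrated = []
    for ts, payload in sorted(records):
        if ts <= checkpoint["interrupted_at"]:
            continue                          # covered by a previous run
        if fail_at is not None and ts >= fail_at:
            return migrated                   # interruption: stop mid-stream
        migrated.append(payload)
        checkpoint["interrupted_at"] = ts     # advance the breakpoint
    return migrated
```

A rerun after a crash therefore re-transmits nothing that already arrived, instead of restarting the whole migration.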
In an exemplary embodiment of the present disclosure, the incremental data migration module is configured to create a first standby server for the migration data source and a second standby server for the receiving data source; migrate the data in the migration data source to the first standby server at regular intervals through the migration cluster; and migrate the data in the receiving data source to the second standby server at regular intervals through the migration cluster.
The specific details of each module in the data migration apparatus have been described in detail in the corresponding data migration method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into and embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer storage medium capable of implementing the above method is also provided, on which a program product capable of implementing the above-described method of the present specification is stored. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 16, a program product 1600 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1700 according to this embodiment of the present disclosure is described below with reference to fig. 17. The electronic device 1700 shown in fig. 17 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 17, electronic device 1700 is in the form of a general purpose computing device. Components of electronic device 1700 may include, but are not limited to: the at least one processing unit 1710, the at least one memory unit 1720, a bus 1730 connecting various system components including the memory unit 1720 and the processing unit 1710, and a display unit 1740.
Wherein the storage unit stores program code that is executable by the processing unit 1710 to cause the processing unit 1710 to perform steps according to various exemplary embodiments of the present disclosure described in the above section "exemplary method" of this specification. For example, processing unit 1710 may perform the following as shown in fig. 2: step S210, establishing a communication connection with the migration data source and the receiving data source; step S220, carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments; step S230, when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source in a distributed manner through the migration cluster; step S240, obtaining incremental data from the migration data source, and judging whether the incremental data is newly added data according to the creation timestamp of the incremental data; step S250, when the incremental data is newly added data, migrating the incremental data to the receiving data source through the migration cluster; step S260, when the incremental data is not newly added data, replacing the target data in the receiving data source with the incremental data through the migration cluster; wherein the creation timestamp of the target data is the same as that of the incremental data.
The storage unit 1720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 17201 and/or a cache memory unit 17202, and may further include a read-only memory unit (ROM) 17203.
Storage unit 1720 may also include a program/utility 17204 having a set (at least one) of program modules 17205, such program modules 17205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1730 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1700 can also communicate with one or more external devices 1800 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1700 to communicate with one or more other computing devices. Such communication can occur via an input/output (I/O) interface 1750. Also, the electronic device 1700 can communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1760. As shown, the network adapter 1760 communicates with the other modules of the electronic device 1700 over the bus 1730. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with electronic device 1700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A method of data migration, comprising:
establishing communication connection with the migration data source and the receiving data source;
carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments;
when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source in a distributed mode through a migration cluster;
acquiring incremental data from the migration data source, and judging whether the incremental data is newly added data or not according to the creation timestamp of the incremental data;
when the incremental data is newly added data, migrating the incremental data to the receiving data source through the migration cluster;
when the incremental data is not the newly added data, replacing target data in the receiving data source with the incremental data through the migration cluster; wherein the creation timestamp of the target data is the same as that of the incremental data.
2. The method according to claim 1, wherein the fragmenting the full amount of data in the migration data source to obtain a plurality of data fragments comprises:
acquiring initial running time of the migration data source, and determining the time span between the current time and the initial running time as a target time period;
carrying out fragmentation processing on the target time period according to a preset time interval to obtain a plurality of time fragments;
and taking the data stored in the migration data source within each time slice as one data fragment.
3. The method of claim 1, wherein each data fragment corresponds to one migration subtask;
when a data migration instruction is received, migrating the plurality of data fragments to the receiving data source through a migration cluster, including:
when a data migration instruction is received, executing the migration subtasks in a distributed manner through the migration cluster to migrate the plurality of data fragments to the receiving data source.
4. The method of claim 3, further comprising:
when the data volume to be migrated corresponding to the target data fragment is larger than the data volume threshold, splitting the migration subtask corresponding to the target data fragment into a plurality of target subtasks, so that the data volume to be migrated corresponding to each target subtask is smaller than the data volume threshold.
5. The method of claim 1, wherein the replacing, by the migration cluster, the target data in the receiving data source with the delta data comprises:
comparing the update time stamps of the incremental data and the target data according to an optimistic locking mechanism;
replacing the target data in the receiving data source with the incremental data when the update timestamp of the incremental data is greater than the update timestamp of the target data.
6. The method of any of claims 1-5, wherein after replacing target data in the receiving data source with the incremental data by the migration cluster, the method further comprises:
querying the number of data pieces stored in the migration data source;
if the number of data pieces is greater than the number of data pieces in the receiving data source, acquiring difference data between the migration data source and the receiving data source;
updating, by the migration cluster, the difference data into the receiving data source.
7. The method of claim 6, further comprising:
if the number of data pieces is equal to the number of data pieces in the receiving data source, acquiring field values of all data in the migration data source, and acquiring field values of all data in the receiving data source;
comparing whether the field values of the same data are consistent or not;
marking data in the migration data source with inconsistent field values;
updating the marked data into the receiving data source.
8. The method of claim 1, further comprising:
recording interruption time when data migration is interrupted;
and performing breakpoint resume according to the interruption time.
9. The method of claim 1, further comprising:
creating a first standby server of the migration data source, and creating a second standby server of the receiving data source;
migrating the data in the migration data source to the first standby server through the migration cluster at regular intervals; and
migrating the data in the receiving data source to the second standby server through the migration cluster at regular intervals.
10. A data migration apparatus, comprising:
the communication module is used for establishing communication connection with the migration data source and the receiving data source;
the data fragmentation module is used for carrying out fragmentation processing on the full data in the migration data source to obtain a plurality of data fragments;
the full data migration module is used for migrating the plurality of data fragments to the receiving data source in a distributed manner through the migration cluster when a data migration instruction is received;
the incremental data migration module is used for acquiring incremental data from the migration data source and judging whether the incremental data is newly added data according to the creation timestamp of the incremental data; migrating the incremental data to the receiving data source through the migration cluster when the incremental data is newly added data; and replacing target data in the receiving data source with the incremental data through the migration cluster when the incremental data is not the newly added data; wherein the creation timestamp of the target data is the same as that of the incremental data.
11. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data migration method of any of claims 1-9.
12. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data migration method of any one of claims 1-9 via execution of the executable instructions.
CN202011103107.0A 2020-10-15 2020-10-15 Data migration method and device, storage medium and electronic equipment Pending CN112286905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011103107.0A CN112286905A (en) 2020-10-15 2020-10-15 Data migration method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN112286905A true CN112286905A (en) 2021-01-29

Family

ID=74497146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011103107.0A Pending CN112286905A (en) 2020-10-15 2020-10-15 Data migration method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112286905A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111111A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Multi-data source database access method
CN113126924A (en) * 2021-04-21 2021-07-16 山东英信计算机技术有限公司 Data migration method, device and equipment and computer readable storage medium
CN113438275A (en) * 2021-05-27 2021-09-24 众安在线财产保险股份有限公司 Data migration method and device, storage medium and data migration equipment
CN113468143A (en) * 2021-07-22 2021-10-01 咪咕数字传媒有限公司 Data migration method, system, computing device and storage medium
CN113468140A (en) * 2021-06-30 2021-10-01 上海掌门科技有限公司 Data migration processing method, electronic device and computer-readable storage medium
CN113835958A (en) * 2021-09-22 2021-12-24 深圳乐信软件技术有限公司 Method, device, medium and equipment for monitoring execution progress of distributed tasks
CN114116842A (en) * 2021-11-25 2022-03-01 上海柯林布瑞信息技术有限公司 Multi-dimensional medical data real-time acquisition method and device, electronic equipment and storage medium
CN114546995A (en) * 2022-04-25 2022-05-27 孔智科技(徐州)有限公司 Dynamic data migration method and system based on graph database
CN114816583A (en) * 2022-05-31 2022-07-29 以萨技术股份有限公司 Flink-based data automatic processing method and device and electronic equipment
CN114969203A (en) * 2022-05-11 2022-08-30 深圳无一科技有限公司 Data real-time synchronization method, device, equipment and medium
CN115237892A (en) * 2022-09-20 2022-10-25 联通智网科技股份有限公司 Data migration method and device
CN116702169A (en) * 2023-05-19 2023-09-05 国网物资有限公司 Data encryption migration method, electronic device and computer readable medium


Similar Documents

Publication Publication Date Title
CN112286905A (en) Data migration method and device, storage medium and electronic equipment
US10073747B2 (en) Reducing recovery time in disaster recovery/replication setup with multitier backend storage
US11016944B2 (en) Transferring objects between different storage devices based on timestamps
US11663085B2 (en) Application backup and management
WO2019154394A1 (en) Distributed database cluster system, data synchronization method and storage medium
US8127174B1 (en) Method and apparatus for performing transparent in-memory checkpointing
US8359491B1 (en) Disaster recovery rehearsal using copy on write
EP3759604B1 (en) Systems and methods for performing a database backup for repairless restore
US20150213100A1 (en) Data synchronization method and system
US10726042B2 (en) Replication control using eventually consistent meta-data
CN113111129A (en) Data synchronization method, device, equipment and storage medium
CN112131237A (en) Data synchronization method, device, equipment and computer readable medium
EP3147797B1 (en) Data management method, node and system for database cluster
CN108810150B (en) Data replication method of application-level disaster recovery backup system of cooperative office system
US20140215258A1 (en) Cluster management in a shared nothing cluster
CN111597197B (en) Data reconciliation method and device between databases, storage medium and electronic equipment
US20190266016A1 (en) Intelligent scheduling of backups
WO2018033062A1 (en) System disk management method and device
WO2016082594A1 (en) Data update processing method and apparatus
WO2017014814A1 (en) Replicating memory volumes
EP3696658A1 (en) Log management method, server and database system
CN110647425A (en) Database recovery method and device
CN114218193A (en) Data migration method and device, computer equipment and readable storage medium
WO2023240995A1 (en) Data recovery method and apparatus for dual-machine hot standby system, and medium
WO2022227719A1 (en) Data backup method and system, and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination