CN108664496B - Data migration method and device - Google Patents


Info

Publication number
CN108664496B
Authority
CN
China
Prior art keywords
data
relation
chain
task
relationship
Legal status
Active
Application number
CN201710197702.7A
Other languages
Chinese (zh)
Other versions
CN108664496A
Inventor
刘军
方锦亮
赵重庆
温伟飞
李良必
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710197702.7A
Priority to PCT/CN2018/078398 (WO2018177107A1)
Publication of CN108664496A
Application granted
Publication of CN108664496B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data migration method and device, belonging to the technical field of networks. The method comprises: acquiring a plurality of relationship chains according to a computing task log of an original service cluster, wherein the computing task log records the association relationships between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data; and migrating the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster sequentially, one relationship chain at a time. While any one relationship chain is being migrated, the computing tasks indicated by the relationship chains that have not yet been migrated continue to run normally. Because associated business data and computing tasks are represented by a single relationship chain, the computing tasks of unmigrated relationship chains keep running during data migration, so the services they support are not affected.

Description

Data migration method and device
Technical Field
The present invention relates to the field of network technologies, and in particular, to a data migration method and apparatus.
Background
With the development of network technology, the volume of business data for various services keeps growing rapidly and can reach the PB (petabyte) level or beyond, bringing the Internet and information industry into the big data era. In this era, business data storage, business processing, and business management are generally handled by a service cluster composed of a large number of servers. In practice, a service cluster is usually deployed in a single IDC (Internet Data Center) room. However, as business data grows, the service cluster must grow too, while the capacity of an IDC room is limited and may be unable to accommodate all servers of the service cluster, which in turn limits the size of the service cluster.
In the prior art, when a service cluster performs service processing, a computing task is created for a service and computing resources are allocated to it, and the service data is processed by running the computing task. Because services are usually associated with one another, migrating the service data of one service could affect other associated services; to avoid this, the data of a service cluster is generally migrated as a whole. During such a migration, all computing tasks must be stopped (that is, all services stop being provided); all service data is then migrated to a new service cluster, the computing tasks are reconfigured and allocated computing resources in the new service cluster, and the reconfigured computing tasks are started so that all services are provided again, completing the data migration.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
because the volume of service data in a service cluster is huge, the migration process usually takes days, months, or even longer, and if all services are stopped during the migration, none of them can be used for that entire period.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a data migration method and apparatus. The technical scheme is as follows:
in one aspect, a data migration method is provided, and the method includes:
acquiring a plurality of relation chains according to a calculation task log of an original service cluster, wherein the calculation task log is used for recording the association relationship between calculation tasks and business data in the original service cluster, and each relation chain is used for indicating a group of calculation tasks and business data having the association relationship;
sequentially migrating the business data and the calculation tasks indicated by the plurality of relation chains to a target service cluster by taking the relation chains as units;
while migration is performed for any one relationship chain, the computing tasks indicated by those relationship chains in the plurality of relationship chains that have not yet been migrated continue to run normally.
In another aspect, an apparatus for data migration is provided, the apparatus comprising:
a first obtaining unit, configured to obtain a plurality of relation chains according to a calculation task log of an original service cluster, wherein the calculation task log is used for recording the relation between calculation tasks and business data in the original service cluster, and each relation chain is used for indicating a group of calculation tasks and business data with the relation;
the migration unit is used for sequentially migrating the business data and the calculation tasks indicated by the plurality of relation chains to the target service cluster by taking the relation chains as units;
while migration is performed for any one relationship chain, the computing tasks indicated by those relationship chains in the plurality of relationship chains that have not yet been migrated continue to run normally.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
according to the computing task log in the original service cluster, the business data and the computing tasks with the association relation are represented by one relation chain, so that in the process of data migration by taking the relation chain as a unit, the relation chain which is being migrated does not influence other relation chains, the computing tasks indicated by the relation chains which are not migrated can still be normally operated, and the normal use of the business indicated by the relation chains which are not migrated is not influenced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of an implementation scenario provided in an embodiment of the present invention;
FIG. 1B is an architecture diagram of a migration platform according to an embodiment of the present invention;
FIG. 2A is a flowchart of a data migration method according to an embodiment of the present invention;
FIG. 2B is a schematic diagram of a relationship chain according to an embodiment of the present invention;
FIG. 2C is a schematic diagram of a split relationship chain according to an embodiment of the present invention;
FIG. 2D is a schematic diagram of a split relationship chain according to an embodiment of the present invention;
FIG. 2E is a schematic diagram of a split relationship chain according to an embodiment of the present invention;
FIG. 2F is a schematic diagram of a relationship chain, obtained through splitting, that accesses key service data according to an embodiment of the present invention;
FIG. 2G is a schematic diagram of a process related to the dual-table writing mechanism according to an embodiment of the present invention;
FIG. 2H is a schematic diagram of migration states during relationship chain migration according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1A is a schematic diagram of an implementation scenario of data migration provided in an embodiment of the present invention, and referring to fig. 1A, the implementation scenario includes an original service cluster, a target service cluster, and a migration platform.
The original service cluster may include a plurality of storage clusters and a plurality of computing clusters, where the storage clusters are used to store business data, and the computing clusters are used to run computing tasks and store related data of the computing tasks, such as the size of computing resources of the computing tasks and the locations of the computing resources. The storage cluster and the computing cluster may be respectively deployed on different servers, or may be deployed on the same server, which is not limited in this embodiment.
It should be noted that, when the service cluster performs service processing, a computing task is created for a service and computing resources are allocated to it, and one or more service processing procedures are executed by running the computing task; for example, the task reads some service data from the service cluster, processes it, and writes the resulting output service data back into the service cluster. Each computing task runs periodically, with a period of several hours, days, weeks, months, or the like; for example, a computing task with a period of 1 hour runs once every hour. Different computing tasks may have the same or different periods, and the type of a computing task is related to the processing speed of the service data, which is not limited in this embodiment.
In addition, the service cluster maintains a data path mapping table, which records the correspondence between service data identifiers and the storage paths of the service data. A computing task determines the storage path of the service data it reads or writes through this table and then completes the read or write at that path. Service data read by one computing task may be written by other computing tasks, and service data written by one computing task may be read by others, so input/output relationships exist between computing tasks and service data.
The migration platform is configured to migrate data of the service cluster and manage a data migration process, and may be deployed in the original service cluster, or in the target service cluster, or in another server that is different from the original service cluster and the target service cluster and is capable of communicating with the original service cluster and the target service cluster. In this embodiment, the migration platform needs to migrate data in the original service cluster to the target service cluster, where the migrated data relates to the service data and the calculation task in the original service cluster.
In particular, the migration platform may include a plurality of modules, each of which may serve a different role during the data migration process. Referring to fig. 1B, an architecture diagram of a migration platform including a plurality of functional modules is shown, and the functions of the functional modules are described below:
the analysis module is used for executing a process of acquiring a plurality of relation chains according to the computation task log, which is indicated by the following steps 201 to 203; the split module is used for executing a relationship chain splitting process indicated by the following step 204; the check module is used to perform the process of consistency checking the migration subtask and the relationship chain in step 206 described below.
The migration module is configured to execute a process involving service data migration and computation task migration in steps 205 to 208, where after the service data indicated by the relationship chain is migrated, the migration module executes a storage path switching process of the data path mapping table, and the process corresponds to step 207. The migration process of the computing task refers to a switching process of the configuration information of the computing task, the configuration information of the computing task may be obtained from the configuration library, and the process corresponds to step 208. If the migrated relationship chain is the relationship chain obtained through splitting, the key service data needs to be synchronized, and the process corresponds to step a in step 206.
The step of synchronizing the data path mapping table refers to adding the target storage path of the service data migrated to the target service cluster to the path mapping table.
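As a rough illustration of the data path mapping table and of the synchronize-then-switch steps above, the sketch below models an entry that keeps its original storage path, gains a target storage path when the table is synchronized, and only redirects computing tasks once it is switched. The class and method names (`DataPathMappingTable`, `register`, `synchronize`, `switch`, `resolve`) and the path strings are hypothetical; the patent does not specify the table's layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PathEntry:
    original_path: str                 # storage path in the original service cluster
    target_path: Optional[str] = None  # added when the table is synchronized
    switched: bool = False             # True once tasks should use target_path


class DataPathMappingTable:
    """Hypothetical mapping from service data identifier to storage path."""

    def __init__(self) -> None:
        self._entries: Dict[str, PathEntry] = {}

    def register(self, data_id: str, original_path: str) -> None:
        self._entries[data_id] = PathEntry(original_path)

    def synchronize(self, data_id: str, target_path: str) -> None:
        # Add the target storage path without changing what tasks see yet.
        self._entries[data_id].target_path = target_path

    def switch(self, data_id: str) -> None:
        # Switch the storage path once the data has been copied to the target.
        entry = self._entries[data_id]
        if entry.target_path is None:
            raise ValueError(f"{data_id!r} has no target path to switch to")
        entry.switched = True

    def resolve(self, data_id: str) -> str:
        # Computing tasks call this to find where to read or write the data.
        entry = self._entries[data_id]
        if entry.switched and entry.target_path is not None:
            return entry.target_path
        return entry.original_path
```

Keeping synchronization separate from switching in this sketch means the target path can be recorded ahead of time while running tasks keep using the original path until the switch.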
The migration platform foreground may be configured to manage a migration process of the relationship chain, for example, may display various information of the relationship chain, such as a connection relationship of each node in the relationship chain, a migration state of the relationship chain in the migration process, node information in the relationship chain, and running state information of the computation task, where the node information of the relationship chain includes a storage path indicated by all data nodes in the relationship chain and a computation task identifier indicated by a task node, and related explanations of the data nodes and the task node refer to contents shown in step 203. The user can start or pause the migration process of the relationship chain through the migration platform foreground.
The configuration library is used for storing configuration information of the computing resources of computing tasks, such as the size and location of the computing resources, and may also store the original storage path of business data in the original service cluster and its target storage path in the target service cluster. The task relationship chain library is used for storing the plurality of relationship chains generated by the analysis module. The migration task library is used for storing information about migration subtasks, such as the number of migration subtasks, the service data each indicates, the original and target storage paths of that service data, and its data size.
In one embodiment, the original service cluster may be placed entirely in the IDC1 room and the target service cluster entirely in the IDC2 room, with IDC1 and IDC2 at different geographical locations. The target service cluster is larger than the original service cluster, and accordingly the IDC2 room can accommodate more servers than the IDC1 room. Of course, in another embodiment, the original service cluster or the target service cluster may be placed across different IDC rooms, which is not limited in this embodiment.
Fig. 2A is a flowchart of a data migration method according to an embodiment of the present invention, and referring to fig. 2A, a method flow according to an embodiment of the present invention includes:
201. Acquire a computing task log of the original service cluster.
In the running process of the computing task of the original service cluster, a computing task log is generated and used for recording the association relationship between the computing task and the service data in the original service cluster. For example, the computation task log includes task identifiers of computation tasks, storage paths of business data, input and output relationships between the computation tasks and the business data, and other information, which may include business information to which the computation tasks belong, such as business identifiers, user information to which the business belongs, and the like. The storage path of the service data can be used to indicate the service data, the same storage path is used to indicate the same service data, and the computing task accesses the service data through the storage path of the service data.
The migration platform may obtain the computing task log from the original service cluster and extract a plurality of input/output records from it. Each input/output record indicates the task identifier of a computing task, the storage path of the service data, and the input/output relationship between the computing task and the service data; Table 1 shows an example input/output record table.
TABLE 1
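[Table 1 is provided as an image in the original publication and is not reproduced here. Each row is an input/output record giving a task identifier, an input/output relationship, and a storage path, for example "task 2, IN, storage path 1".]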
In this embodiment, the migration platform may analyze the multiple input/output records extracted from the computation task log to obtain multiple relationship chains indicating the association relationship between the computation task and the service data, where the process of obtaining the multiple relationship chains is described in the following steps 202 to 204.
202. According to the plurality of input/output records in the computing task log, add the same relationship chain identifier to input/output records that have an association relationship, and different relationship chain identifiers to input/output records that do not.
In this embodiment, the migration platform may add relationship chain identifiers as follows: traverse each of the input/output records; for the currently traversed first input/output record, if the already-traversed input/output records include a second input/output record that has an association relationship with the first input/output record, add to the first input/output record the same relationship chain identifier as that of the second input/output record; otherwise, add to the first input/output record a relationship chain identifier different from those of all already-traversed input/output records.
The first and second input/output records have an association relationship when the computing task indicated by the first record has an input/output relationship with the service data indicated by the second record, or when the service data indicated by the first record has an input/output relationship with the computing task indicated by the second record.
For example, the input/output records of Table 1 are traversed in order. When the first input/output record is traversed, relationship chain identifier 1001 is added to it. When the currently traversed record is the second record, "task 2, IN, storage path 1", task 2 has an input relationship with "storage path 1" of the already-traversed first record, so the already-traversed records include a record associated with the current one, and the same relationship chain identifier 1001 is added to the current record. When the currently traversed record is the last record in Table 1, "task 5, IN, storage path 5", "task 5" has no input/output relationship with the service data indicated by any already-traversed record, and "storage path 5" has no input/output relationship with the computing tasks indicated by any already-traversed record; the already-traversed records therefore contain no associated record, and a relationship chain identifier different from those of the already-traversed records, for example 1002, is added to the last record. After relationship chain identifiers have been added to all input/output records, the relationship chain table shown in Table 2 is obtained.
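The identifier assignment in step 202 effectively groups records into connected components, where each record links one computing task to one storage path. The sketch below assumes records are `(task_id, direction, storage_path)` tuples and swaps in a union-find structure for the literal single backward-looking pass: if a later record connects two chains that already received different identifiers, a single pass would leave them split, whereas union-find merges them. The function name `assign_chain_ids` is hypothetical, and the starting identifier 1001 is borrowed from the worked example.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (task_id, "IN" or "OUT", storage_path)


def assign_chain_ids(records: List[Record], first_id: int = 1001) -> List[int]:
    """Give associated input/output records the same relationship chain identifier."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # A record links its task node and its data node, so records that share a
    # task or a storage path (directly or through other records) end up together.
    for task, _direction, path in records:
        root_task, root_data = find("task:" + task), find("data:" + path)
        if root_task != root_data:
            parent[root_data] = root_task

    # Number the resulting chains in order of first appearance.
    chain_of_root: Dict[str, int] = {}
    ids: List[int] = []
    for task, _direction, _path in records:
        root = find("task:" + task)
        chain_of_root.setdefault(root, first_id + len(chain_of_root))
        ids.append(chain_of_root[root])
    return ids
```

For made-up records `[("task1", "OUT", "path1"), ("task2", "IN", "path1"), ("task2", "OUT", "path2"), ("task5", "IN", "path5")]` the function returns `[1001, 1001, 1001, 1002]`.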
TABLE 2
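[Table 2 is provided as an image in the original publication and is not reproduced here. It lists the input/output records with their added relationship chain identifiers, such as 1001 and 1002.]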
203. Generate a plurality of first relationship chains according to the relationships between computing tasks and business data indicated by the input/output records that share the same relationship chain identifier.
In this embodiment, the migration platform may abstract the input/output records that share a relationship chain identifier into a first relationship chain, which includes task nodes indicating computing tasks, data nodes indicating business data, and the input/output relationships between the task nodes and the data nodes. Each task node in the first relationship chain contains the task identifier of a computing task, and each data node contains the storage path of the service data.
Taking Table 2 as an example, the first relationship chain shown in FIG. 2B is generated from the input/output records with relationship chain identifier 1001. FIG. 2B shows the association relationships between the service data and the computing tasks indicated by those records: the first relationship chain includes task nodes 1 to 4 corresponding to tasks 1 to 4, data nodes 1 to 4 corresponding to storage paths 1 to 4, and the input/output relationships between the task nodes and the data nodes. The connection line from task node 1 to data node 1 indicates that computing task 1 writes service data to storage path 1, and the connection line from data node 1 to task node 2 indicates that computing task 2 reads service data from storage path 1.
In this embodiment, the service data and computing tasks indicated by one first relationship chain have no association with the computing tasks or service data indicated by any other first relationship chain. Therefore, the service data and computing tasks in the original service cluster can be migrated with the first relationship chain as the unit, and migrating the service data and computing tasks indicated by one relationship chain does not affect the normal operation of the computing tasks indicated by the other first relationship chains.
Data migration time is constrained by both the volume of data to migrate and the network bandwidth, and network bandwidth is usually limited. To ensure that the data indicated by one relationship chain can be migrated in a short time, further reducing the impact of migration on normal use of the service, this embodiment may further split first relationship chains with a large data volume; see step 204 for details.
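One way to picture step 203 is as building a directed graph per relationship chain identifier: an OUT record becomes an edge from a task node to a data node (the task writes the data), and an IN record becomes an edge from a data node to a task node (the task reads it). The sketch below is a minimal illustration under that assumption; the `"task:"`/`"data:"` node prefixes and the function name `build_chains` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (task_id, "IN" or "OUT", storage_path)
Edge = Tuple[str, str]         # (source node, destination node)


def build_chains(records: List[Record], chain_ids: List[int]) -> Dict[int, Set[Edge]]:
    """Group records by relationship chain identifier into directed edge sets."""
    chains: Dict[int, Set[Edge]] = defaultdict(set)
    for (task, direction, path), cid in zip(records, chain_ids):
        task_node, data_node = "task:" + task, "data:" + path
        if direction == "OUT":
            chains[cid].add((task_node, data_node))  # task writes the data
        elif direction == "IN":
            chains[cid].add((data_node, task_node))  # task reads the data
        else:
            raise ValueError(f"unknown direction {direction!r}")
    return dict(chains)
```

A set of edges is enough to recover both the task nodes and the data nodes of a chain, which keeps the structure small; a real platform would likely also attach node metadata such as data volumes.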
204. If the plurality of first relationship chains include a second relationship chain, split the second relationship chain into a plurality of third relationship chains, where a second relationship chain is a first relationship chain whose indicated business data exceeds a first threshold in data volume.
The first threshold may be set by the migration platform according to the network bandwidth and a preset migration time for a relationship chain. For example, if the network bandwidth is 2 GB/s (gigabytes per second) and the preset migration time is 2 minutes (120 s), the first threshold is at most 2 GB/s × 120 s = 240 GB; the first threshold may also be set below 240 GB to allow for reduced bandwidth caused by an unstable network environment. The preset migration time may be preset by the migration platform, set according to a user's service requirements, or the like, which is not limited in this embodiment.
For each of the plurality of first relationship chains, the migration platform may obtain the data volume of the service data indicated by the chain according to the storage paths indicated by its data nodes. If that data volume exceeds the first threshold, the first relationship chain is determined to be a second relationship chain that needs to be split. The splitting process may include the following steps 204a to 204c:
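The threshold arithmetic is simply bandwidth × preset migration time, optionally scaled down for an unstable network; with 2 GB/s and 120 s the upper bound is 240 GB. The sketch below shows that calculation and the resulting test for which first relationship chains must be split. The `headroom` factor and both function names are hypothetical illustrations, not parameters named in the patent.

```python
from typing import Dict, List


def first_threshold_gb(bandwidth_gb_per_s: float, migration_time_s: float,
                       headroom: float = 1.0) -> float:
    """Largest data volume (GB) that can migrate within the preset time.

    headroom (hypothetical) scales the bound down to leave margin for an
    unstable network; 1.0 means no margin.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return bandwidth_gb_per_s * migration_time_s * headroom


def chains_to_split(volumes_gb: Dict[int, float], threshold_gb: float) -> List[int]:
    # A first relationship chain whose data volume exceeds the threshold is a
    # "second relationship chain" and must be split (step 204).
    return sorted(cid for cid, vol in volumes_gb.items() if vol > threshold_gb)
```

For example, `first_threshold_gb(2, 120)` evaluates to `240`, and `chains_to_split({1001: 500, 1002: 3}, 240)` returns `[1001]`.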
204a. Acquire the weights of the plurality of data nodes in the second relationship chain.
The weight of each data node indicates the degree of association of the data node within the second relationship chain; the higher the weight, the higher the degree of association.
The process of obtaining the weights of the plurality of data nodes may be: for each data node in the plurality of data nodes, determining the product of the number of task nodes associated with the data node and the data quantity of the service data indicated by the data node as the weight of the data node.
Taking the first relationship chain shown in FIG. 2B as an example, data node 1 is associated with task nodes 1 to 4, so the number of associated task nodes is 4; assuming the service data indicated by data node 1 is 100 GB, its weight is 4 × 100 = 400.
It should be noted that the purpose of splitting is to turn a relationship chain with a large data volume into relationship chains with smaller data volumes. For any data node in a relationship chain, the more task nodes are associated with it, the more relationship chains splitting at that node can produce, which spreads the service data more evenly across the resulting chains and keeps any single chain from becoming too large. Therefore, both the number of task nodes associated with a data node and the data volume of the service data it indicates are considered when determining its weight.
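The weight rule from step 204a, (number of associated task nodes) × (data volume), can be sketched as below. It assumes the chain is given as directed edges between `"task:..."` and `"data:..."` nodes and that volumes are known per data node; counting each task at most once per data node (even if it both reads and writes that data) is an assumption of this sketch, as is the name `data_node_weights`.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge between a "task:..." node and a "data:..." node


def data_node_weights(edges: Iterable[Edge], volume_gb: Dict[str, float]) -> Dict[str, float]:
    """Weight of a data node = (number of associated task nodes) x (data volume)."""
    tasks_of: Dict[str, Set[str]] = {}
    for src, dst in edges:
        # Edges run task->data (write) or data->task (read); pick out the data end.
        data, task = (src, dst) if src.startswith("data:") else (dst, src)
        tasks_of.setdefault(data, set()).add(task)
    return {data: len(tasks) * volume_gb[data] for data, tasks in tasks_of.items()}
```

With data node 1 linked to task nodes 1 to 4 and holding 100 GB, this gives a weight of 400, matching the worked example.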
204b. Acquire a key data node from the plurality of data nodes according to the descending order of weights and the positions of the data nodes in the second relationship chain, where the key data node is the first data node in that order that can split the second relationship chain into at least two third relationship chains.
In this embodiment, to improve the efficiency and success rate of splitting, the migration platform analyzes the data nodes in descending order of weight. For each data node, the migration platform pre-splits the second relationship chain at that node and determines how many third relationship chains would result. If the number is less than 2, the pre-splitting process is performed on the next data node in the order; if the number is not less than 2, the data node is determined to be the key data node and the second relationship chain is split based on it. Once the key data node has been obtained, the migration platform does not pre-split the data nodes that come after it in the order.
When the second relationship chain is pre-split at a data node, the number of resulting third relationship chains may be determined as follows: disconnect the association relationships between the data node and the task nodes associated with it, then determine whether the remaining nodes (task nodes and data nodes) of the second relationship chain are connected. If they are connected, the number of resulting third relationship chains is 1 (that is, less than 2); otherwise, it is not less than 2.
The connectivity of the remaining nodes may be determined by a traversal: for example, choose any one of them as the starting point and traverse; if every remaining node is reached, the remaining nodes are connected, otherwise they are not.
It should be noted that pre-splitting does not actually split the second relationship chain; it is an analysis in which the migration platform determines how many third relationship chains the second relationship chain would be split into at the corresponding data node.
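The pre-splitting loop can be sketched as a cut-vertex test: visit data nodes from highest to lowest weight, remove each one in turn, and stop at the first whose removal leaves at least two connected parts. Counting parts with breadth-first search gives the same answer as the "start anywhere and check that every node is reached" test described above. Edges are assumed to be `(src, dst)` pairs between `"task:..."` and `"data:..."` nodes, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def adjacency(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    # Direction does not matter for connectivity, so store both directions.
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def components_without(adj: Dict[str, Set[str]], removed: str) -> List[Set[str]]:
    """Connected parts of the chain after disconnecting `removed` (a pre-split)."""
    seen: Set[str] = {removed}
    parts: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    part.add(nxt)
                    queue.append(nxt)
        parts.append(part)
    return parts


def find_key_data_node(edges: Iterable[Edge], weights: Dict[str, float]) -> Optional[str]:
    """Pre-split at each data node, highest weight first; return the first
    node whose removal leaves at least two parts."""
    adj = adjacency(edges)
    for node in sorted(weights, key=weights.__getitem__, reverse=True):
        if len(components_without(adj, node)) >= 2:
            return node
    return None  # no single data node can split this chain
```

In a chain where the heaviest data node is a leaf (so removing it leaves everything connected), the loop moves on to the next-heaviest node, which mirrors the "try the next data node" rule in the text.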
204c. Split the plurality of task nodes associated with the key data node in the second relationship chain into a plurality of third relationship chains based on the key data node.
In this embodiment, the migration platform may split the second relationship chain into a plurality of third relationship chains based on the key data node in any of the following three cases:
In the first case, for each of the plurality of task nodes directly associated with the key data node, that task node, together with the nodes that remain associated with it once the key data node is disconnected from it, is determined as one third relationship chain.
In this case, each third relationship chain includes the key data node. Still assuming that the key data node of the relationship chain shown in Fig. 2B is data node 1, Fig. 2C is a schematic diagram of the plurality of third relationship chains obtained in this case by splitting the relationship chain shown in Fig. 2B at data node 1.
In the second case, the key data node alone is determined as one third relationship chain, and for each of the plurality of task nodes associated with the key data node, that task node and the nodes other than the key data node that are associated with it are determined as a third relationship chain.
In this case, the key data node by itself forms a third relationship chain. For example, the key data node is first split off from the second relationship chain as a third relationship chain. Then, among the remaining nodes, for each task node associated with the key data node, a traversal is performed starting from that task node, and all nodes reached are determined as nodes associated with that task node. Assuming that the key data node of the relationship chain shown in Fig. 2B is data node 1, Fig. 2D is a schematic diagram of the plurality of third relationship chains obtained in this case by splitting the relationship chain shown in Fig. 2B at data node 1. It should be noted that Fig. 2D is only an example and does not represent an actual splitting result; for example, in an actual split, each third relationship chain other than the one formed by the key data node would include multiple nodes rather than a single task node.
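A rough sketch of the second case, using the same assumed adjacency-map representation (the function names are hypothetical). Skipping a task node that an earlier traversal already reached is an added assumption, so that two task nodes still linked through other nodes end up in one third relationship chain rather than two duplicate ones.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from task-data association edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def split_second_case(graph: Graph, key: str) -> List[Set[str]]:
    """The key data node forms its own chain; each task node associated with
    it seeds a traversal over the other nodes, and the nodes reached form one
    third relationship chain."""
    chains: List[Set[str]] = [{key}]
    assigned: Set[str] = set()
    for task in sorted(graph[key]):
        if task in assigned:
            continue  # already reached from an earlier task node
        seen = {task}
        stack = [task]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour != key and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        assigned |= seen
        chains.append(seen)
    return chains
```

With D1 as the key data node linking T1, T2 and T3, and D2 attached to T2, the result is `[{"D1"}, {"T1"}, {"T2", "D2"}, {"T3"}]`.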
In the third case, the key data node, at least one task node directly associated with it, and the nodes associated with that at least one task node are split off as one third relationship chain, and the task nodes and data nodes outside this third relationship chain are split into at least one further third relationship chain.
A task node directly associated with the key data node is a task node that is a child node or a parent node of the key data node. In this case, the key data node and at least one task node directly associated with it are split into the same third relationship chain. The process of splitting the remaining task nodes and data nodes into at least one third relationship chain is the same as the process, in the first case, of taking the nodes other than the key data node that are associated with a task node as one third relationship chain, and is not repeated here. For example, still assuming that the key data node of the relationship chain shown in Fig. 2B is data node 1, Fig. 2E is a schematic diagram of the plurality of third relationship chains obtained in this case by splitting the relationship chain shown in Fig. 2B at data node 1.
In the first case, when a third relationship chain is migrated and it is detected that the chain includes the key data node, the target storage path of the key data node may be written into the data path mapping table. In the second and third cases, the key service data may be copied to the target service cluster after splitting.
It should be noted that, while splitting the second relationship chain into a plurality of third relationship chains, the migration platform may assign a different relationship chain identifier to each third relationship chain. In all three cases, the second relationship chain is formally split into a plurality of third relationship chains. To distinguish the third relationship chains obtained by splitting from the first relationship chains that are not split, a split identifier may be added to each third relationship chain. The split identifier may be embedded in the relationship chain identifier, for example as its first two digits. For example, the relationship chain identifier may have the format xx_yyyy, where xx is the split identifier (for example, 00 indicates a first relationship chain that is not split, and 01 indicates a third relationship chain obtained by splitting), and yyyy is the number of the relationship chain.
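The xx_yyyy format can be illustrated with two small helpers. This sketch assumes that yyyy is a four-digit, zero-padded decimal number and that 00 and 01 are the only split identifiers in use; the function names are hypothetical.

```python
from typing import Tuple

UNSPLIT_FLAG = "00"  # first relationship chain that has not been split
SPLIT_FLAG = "01"    # third relationship chain obtained by splitting


def make_chain_id(number: int, split: bool) -> str:
    """Compose an xx_yyyy identifier, zero-padding the chain number."""
    flag = SPLIT_FLAG if split else UNSPLIT_FLAG
    return f"{flag}_{number:04d}"


def parse_chain_id(chain_id: str) -> Tuple[bool, int]:
    """Return (is_split, number) for an xx_yyyy identifier."""
    flag, sep, number = chain_id.partition("_")
    if not sep or flag not in (UNSPLIT_FLAG, SPLIT_FLAG):
        raise ValueError(f"unrecognised relationship chain identifier: {chain_id!r}")
    return flag == SPLIT_FLAG, int(number)
```

During migration, `parse_chain_id(chain_id)[0]` is then enough to tell a split third relationship chain from an unsplit first one.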
In this embodiment, the computing tasks may keep running while data is migrated by relationship chain, and they may generate new service data while running. Because network bandwidth is limited, when a relationship chain is too large, new service data is likely to be generated faster than the service data can be migrated, so the migration of that relationship chain might never complete. Splitting a large relationship chain into smaller relationship chains therefore allows the data indicated by the relationship chain to be migrated while the computing tasks continue to run normally.
Steps 203 and 204 above generate a plurality of relationship chains according to the association relationships between computing tasks and service data indicated by the input/output records identified as belonging to the same relationship chain. Each relationship chain includes task nodes indicating computing tasks, data nodes indicating service data, and the association relationships between the task nodes and the data nodes.
Steps 202 to 204 above obtain a plurality of relationship chains from the computing task log of the original service cluster, where each relationship chain indicates a group of associated computing tasks and service data.
In this embodiment, the migration platform may migrate the service data and computing tasks indicated by the plurality of relationship chains to the target service cluster one relationship chain at a time. Sequential migration here means that data migration may be performed on only one relationship chain at a time, although several relationship chains may also be migrated in parallel. While migration is performed for any one relationship chain, the computing tasks indicated by the relationship chains not currently being migrated continue to run normally. The migration of one relationship chain includes the following steps 205 to 208.
205. For each of the plurality of relationship chains, generate a plurality of migration subtasks according to the plurality of pieces of service data indicated by the relationship chain.
In this embodiment, when data migration is performed on a relationship chain, a plurality of migration subtasks may be generated according to the plurality of pieces of service data indicated by the relationship chain, as follows. For each piece of service data indicated by the relationship chain: determine whether its data volume is smaller than a second threshold; if it is, generate one migration subtask for that service data; if it is not, divide the service data, according to the second threshold and in the chronological order in which the data was generated, into a plurality of pieces of sub-service data, and generate one migration subtask for each piece of sub-service data. The data volume of each piece of sub-service data is smaller than the second threshold. The second threshold may be preset or changed by the migration platform, which is not limited in this embodiment. When service data is stored in a service cluster, the service cluster records its storage time, from which the migration platform can determine the generation time of the service data. The migration platform may add configuration information to each migration subtask; the configuration information may include the original storage path and the target storage path of the corresponding service data.
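The subtask generation above might be sketched as a greedy, oldest-first packing of the files under one storage path. The `StoredFile` and `MigrationSubtask` types and the function name are hypothetical; the sketch assumes each individual file is smaller than the second threshold (a larger file would simply end up alone in an oversized subtask).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoredFile:
    name: str
    stored_at: float  # storage time recorded by the service cluster
    size: int         # bytes


@dataclass
class MigrationSubtask:
    original_path: str  # configuration information: original storage path
    target_path: str    # configuration information: target storage path
    files: List[StoredFile] = field(default_factory=list)
    data_size: int = 0


def generate_subtasks(original_path: str, target_path: str,
                      files: List[StoredFile], threshold: int) -> List[MigrationSubtask]:
    """One subtask if the data under a storage path is below the second
    threshold; otherwise chunks below it, in order of generation time."""
    total = sum(f.size for f in files)
    if total < threshold:
        return [MigrationSubtask(original_path, target_path, list(files), total)]
    subtasks: List[MigrationSubtask] = []
    current = MigrationSubtask(original_path, target_path)
    for f in sorted(files, key=lambda item: item.stored_at):  # oldest first
        if current.files and current.data_size + f.size >= threshold:
            subtasks.append(current)  # adding f would reach the threshold
            current = MigrationSubtask(original_path, target_path)
        current.files.append(f)
        current.data_size += f.size
    if current.files:
        subtasks.append(current)
    return subtasks
```

Sorting by storage time also puts historical service data into the earliest subtasks.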
It should be noted that a piece of service data in this embodiment refers to the service data stored under one storage path. When the data volume of that service data is smaller than the second threshold, the migration subtask generated for it migrates the service data stored under that storage path.
206. Migrate the service data indicated by the relationship chain to the target service cluster according to the plurality of migration subtasks.
The migration platform may migrate the service data to the target service cluster according to the original storage path and the target storage path. The plurality of migration subtasks corresponding to one relationship chain may be executed sequentially or in parallel, which is not limited in this embodiment.
Migrating the service data indicated by the relationship chain through separate migration subtasks reduces the granularity of data migration, and because the migration subtasks can run in parallel, the migration efficiency of the service data indicated by the relationship chain is improved.
In addition, this embodiment provides a data verification mechanism for migration subtasks, which may proceed as follows: for each of the plurality of migration subtasks, after all the service data corresponding to the migration subtask has been migrated to the target service cluster, a consistency check is performed between the corresponding service data in the target service cluster and in the original service cluster. If the consistency check succeeds, the service data corresponding to the migration subtask is determined to have been migrated successfully; if it fails, the migration is determined to have failed and the migration subtask is re-executed. It should be noted that the configuration information of each migration subtask may further include the data size of the corresponding service data; while the migration subtask is executed, if the migration platform detects that the size of the service data migrated to the target service cluster has reached the size indicated by the migration subtask, it determines that all the service data corresponding to the migration subtask has been migrated to the target service cluster.
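The subtask-level verification loop might look like the sketch below. `copy` and `verify` are hypothetical callbacks standing in for the real transfer and consistency check, `copy` is assumed to return the number of bytes present at the target after the transfer, and the retry limit `max_attempts` is an added assumption that the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    original_path: str
    target_path: str
    data_size: int  # expected size, from the subtask's configuration information


def execute_with_verification(subtask: Subtask,
                              copy: Callable[[Subtask], int],
                              verify: Callable[[Subtask], bool],
                              max_attempts: int = 3) -> int:
    """Run a subtask until its data is complete and passes the consistency
    check; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        migrated = copy(subtask)
        # Complete once the migrated size reaches the configured size; only
        # then is the consistency check run. Otherwise the subtask is re-run.
        if migrated >= subtask.data_size and verify(subtask):
            return attempt
    raise RuntimeError(
        f"subtask {subtask.original_path} failed after {max_attempts} attempts")
```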
It should be noted that the computing tasks indicated by the relationship chain may continue to run while the service data indicated by the relationship chain is migrated, so the service data stored under the storage paths of the relationship chain may be updated. For the service data stored under one storage path, this embodiment refers to the service data stored before the relationship chain is generated as historical service data, and to the service data updated after the relationship chain is generated as new service data. Because a user is less likely to modify historical service data than new service data, each migration subtask may migrate the service data under a storage path in order of generation time from earliest to latest, that is, historical service data is migrated first. This avoids the loss of migration efficiency caused by having to retransmit service data that the user modifies.
The consistency check on the service data corresponding to a migration subtask covers the data volume of the service data, the number of files it contains, and its data content. The migration platform may perform the consistency check using a preset algorithm, for example a cyclic redundancy check (CRC) algorithm. When the data volume, the number of files and the data content of the service data are identical in the original service cluster and the target service cluster, the consistency check of the service data is determined to be successful.
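The three-part check can be illustrated with Python's standard `zlib.crc32`. This sketch assumes the files under a storage path can be read into memory as a name-to-bytes map; a production check would instead stream each file and compute the checksum incrementally, for example by passing the running value as the second argument of `zlib.crc32`.

```python
import zlib
from typing import Dict, Tuple

Files = Dict[str, bytes]  # file name -> file content under one storage path


def fingerprint(files: Files) -> Tuple[int, int, Dict[str, int]]:
    """Data volume, file count and per-file CRC-32 of the service data."""
    volume = sum(len(content) for content in files.values())
    crcs = {name: zlib.crc32(content) for name, content in files.items()}
    return volume, len(files), crcs


def is_consistent(original: Files, target: Files) -> bool:
    """The check succeeds only if volume, file count and content all match."""
    return fingerprint(original) == fingerprint(target)
```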
A failed migration subtask may be re-executed immediately after the consistency check on its service data fails, after a preset time period has elapsed since the failure, or after the other migration subtasks of the relationship chain have been completed, which is not limited in this embodiment.
Verifying data per migration subtask provides fine-grained verification of the service data, so that when an error occurs while migrating service data, the data can be re-migrated at the granularity of a single migration subtask.
To minimize the impact of data migration on services in normal use, in this embodiment the related computing tasks of a relationship chain being migrated are not stopped for the whole duration of its migration. Instead, after the service data has been migrated up to a certain progress, the computing tasks are migrated during a period in which they are stopped, which minimizes the time for which the computing tasks are stopped. While the service data indicated by the relationship chain is migrated, the following steps 206a to 206d may also be performed.
Step 206a: while the service data indicated by the relationship chain is migrated, obtain the migration progress of that service data.
The migration platform may obtain the migration progress from the total data volume of the service data indicated by the relationship chain and the volume already migrated. For example, if the total data volume is 100 GB and 60 GB has been migrated, the migration progress is 60%.
Step 206b: when the migration progress of the service data exceeds a preset progress, determine, for each computing task indicated by the relationship chain, whether the computing task is stopped. If it is stopped, step 206c is performed; if it is running, step 206d is performed.
The preset progress may be preset or modified by the migration platform, and may also be adjusted dynamically by the migration platform according to the network bandwidth. For example, when the migration platform detects that the network bandwidth has decreased, it may increase the preset progress appropriately, so as to minimize the time spent migrating the computing tasks.
Step 206c: if the computing task is stopped, keep it stopped until the migration of the relationship chain is complete.
Step 206d: if the computing task is running, keep it stopped, once it stops running, until the migration of the relationship chain is complete.
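Steps 206a to 206d reduce to a progress ratio and a per-task decision, sketched below with the 100 GB / 60 GB example. The task state strings, the action labels and the function names are illustrative assumptions; the sketch only plans the freeze and does not stop or notify any real task.

```python
from typing import Dict

KEEP_STOPPED = "keep stopped until the chain is migrated"               # step 206c
FREEZE_AFTER_STOP = "stop after the current run, then keep stopped"    # step 206d


def migration_progress(total_bytes: int, migrated_bytes: int) -> float:
    """Step 206a: migrated volume as a fraction of the chain's total volume."""
    return migrated_bytes / total_bytes if total_bytes else 1.0


def plan_freeze(task_states: Dict[str, str], progress: float,
                preset_progress: float) -> Dict[str, str]:
    """Step 206b: once progress exceeds the preset progress, choose 206c or
    206d for every computing task of the chain; before that, plan nothing."""
    if progress <= preset_progress:
        return {}
    return {task: KEEP_STOPPED if state == "stopped" else FREEZE_AFTER_STOP
            for task, state in task_states.items()}
```

With `migration_progress(100, 60)` equal to 0.6, a preset progress of 0.5 triggers the freeze plan, while a preset progress of 0.8 (for example after raising it because bandwidth dropped) does not yet.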
It should be noted that keeping a computing task stopped in steps 206c and 206d may be referred to as freezing the computing task. To avoid affecting the business of the enterprise user, the migration platform may notify the enterprise user before a computing task is frozen, and freeze the computing task only after the enterprise user confirms.
During data migration, because the computing tasks are still running, the service data corresponding to a migration subtask may change (for example, be modified or deleted) after that subtask has completed. Therefore, to ensure the integrity of the service data, after the service data indicated by the relationship chain has been migrated, the migration platform may further perform a consistency check on it in units of relationship chains, as follows: a consistency check is performed between the service data indicated by the relationship chain in the target service cluster and in the original service cluster; if the check succeeds, the following steps 207 and 208 are performed; if the check fails, the service data of the relationship chain that failed to migrate is determined from the check result and migrated again. The consistency check may be performed migration subtask by migration subtask, or storage path by storage path in the relationship chain; for any migration subtask or storage path that fails the check, the corresponding service data is determined to be service data that failed to migrate. The migration platform may reuse the original migration subtask or create a new one to migrate the failed service data again; the migration process is the same as the subtask-based migration described above and is not repeated here.
It should be noted that steps 205 and 206 above describe the migration of the service data indicated by a relationship chain, taking one relationship chain as an example. When data is migrated by relationship chain, the migration platform can determine from the relationship chain identifier whether the chain being migrated is an unsplit first relationship chain or a third relationship chain obtained by splitting. Alternatively, in the splitting case of step 204c in which each third relationship chain obtained by splitting includes the key data node, the migration platform may add a specified identifier to the key data node in each third relationship chain and use that identifier to recognize whether the chain being migrated includes the key data node, and thus whether it is a third relationship chain.
It should be noted that the plurality of third relationship chains obtained by splitting still share the service data indicated by the key data node. To keep this key service data synchronized while data is migrated by third relationship chain, this embodiment uses a double-write table mechanism in which two storage paths of the key service data are stored in the data path mapping table: the target storage path in the target service cluster and the original storage path in the original service cluster. The process may be: obtain the target storage path of the key service data in the target service cluster, where the key service data is the service data indicated by the key data node; then add the target storage path to the data path mapping table while retaining the original storage path of the key service data in the original service cluster. The target storage path may be added to the data path mapping table after the relationship chain is split or before the plurality of third relationship chains are migrated, which is not limited in this embodiment.
When data is migrated in units of relationship chains and the chain being migrated is a third relationship chain obtained by splitting, the migration of that third relationship chain, based on the double-write table mechanism, further includes the following steps a to c:
Step a: synchronize the key service data between the target service cluster and the original service cluster according to the target storage path and the original storage path.
When the migration platform detects that the key service data in the original service cluster or the target service cluster has been updated, it synchronizes the key service data between the two clusters according to the target storage path and the original storage path.
Step b: if the service data and computing tasks indicated by the third relationship chain have all been migrated to the target service cluster, the computing tasks indicated by the third relationship chain access the key service data, when they run, through the target storage path recorded in the data path mapping table.
Step c: if the service data and computing tasks indicated by the third relationship chain have not all been migrated to the target service cluster, the computing tasks indicated by the third relationship chain access the key service data, when they run, through the original storage path recorded in the data path mapping table.
In this embodiment, the storage path of the key service data in the corresponding service cluster may be obtained from the data path mapping table according to the identifier of the service cluster in which the data indicated by the third relationship chain is located. For example, if the data indicated by the third relationship chain is in the original service cluster, that is, it has not yet been successfully migrated to the target service cluster, then when a computing task indicated by the third relationship chain runs, the original storage path of the key service data is obtained from the data path mapping table and the key service data is accessed through it. If the data indicated by the third relationship chain is in the target service cluster, that is, it has been successfully migrated, then when a computing task indicated by the third relationship chain runs, the target storage path of the key service data is obtained from the data path mapping table and the key service data is accessed through it. Fig. 2F is a schematic diagram of accessing the key service data during the migration of the third relationship chains obtained by splitting the relationship chain shown in Fig. 2B. Data node 1 corresponds to the key service data; the third relationship chain containing task node 1 has been migrated to the target service cluster, and the third relationship chain containing task nodes 2 to 4 has not. The computing task indicated by task node 1 accesses the key service data through its target storage path, and the computing tasks indicated by task nodes 2 to 4 access it through its original storage path.
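The double-write table and the routing of steps a to c can be sketched as a small in-memory class. Everything here is an assumption for illustration: the class and method names, the `old:`/`new:` path prefixes, and the use of a single dictionary as a stand-in for the storage of both clusters.

```python
from typing import Dict, Optional, Tuple


class DataPathMappingTable:
    """Hypothetical double-write table: original and target path of key data."""

    def __init__(self) -> None:
        self._paths: Dict[str, Dict[str, Optional[str]]] = {}

    def register(self, data_id: str, original_path: str) -> None:
        self._paths[data_id] = {"original": original_path, "target": None}

    def add_target(self, data_id: str, target_path: str) -> None:
        # Double-write: the original path is retained next to the target path.
        self._paths[data_id]["target"] = target_path

    def paths(self, data_id: str) -> Tuple[Optional[str], Optional[str]]:
        entry = self._paths[data_id]
        return entry["original"], entry["target"]

    def sync(self, data_id: str, storage: Dict[str, bytes], updated_path: str) -> None:
        """Step a: copy the updated copy of the key data to its other path."""
        original, target = self.paths(data_id)
        if original is None or target is None:
            return  # only one copy exists, nothing to synchronize
        other = target if updated_path == original else original
        storage[other] = storage[updated_path]

    def resolve(self, data_id: str, chain_migrated: bool) -> Optional[str]:
        """Steps b and c: route a running task to the right storage path."""
        original, target = self.paths(data_id)
        if chain_migrated and target is not None:
            return target
        return original
```

In the Fig. 2F situation, the task of task node 1 would call `resolve(..., chain_migrated=True)` and get the target path, while task nodes 2 to 4 would call it with `chain_migrated=False` and keep using the original path.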
With reference to the above migration of third relationship chains using the double-write table mechanism, the flow of the double-write table mechanism is described below. Referring to Fig. 2G, it includes the following processes (1) to (4):
(1) Obtain the key data node.
This process corresponds to obtaining the key data node in the second relationship chain.
(2) Synchronize the key service data.
The key service data is synchronized between the original service cluster and the target service cluster, based on the original storage path and the target storage path of the key service data stored in the data path mapping table.
(3) Intelligently route the storage path of the key service data.
The storage path of the key service data in the corresponding service cluster is obtained from the data path mapping table according to the service cluster in which the third relationship chain is located. This corresponds to steps b and c above.
(4) Gradually release the dependency of the third relationship chains on the original storage path of the key service data.
After the data indicated by a third relationship chain has been migrated to the target service cluster, the computing tasks indicated by that chain access the key service data in the target service cluster; that is, the dependency of that third relationship chain on the original storage path of the key service data is released.
In this embodiment, migrating the data indicated by a relationship chain further includes migrating the source data of the service. The source data includes data entered by the user at a user terminal, generated by the user terminal in real time and not yet synchronized to the original service cluster. In practice, this source data is typically used by the computing tasks. Specifically, for a relationship chain, the source data may be obtained from the real-time data processing server through a specified interface and migrated to the target service cluster together with the service data indicated by the relationship chain, so that the normal operation of the computing tasks is not affected.
207. In the data path mapping table, switch the original storage path, in the original service cluster, of the service data indicated by the relationship chain to its target storage path in the target service cluster.
In this embodiment, during the migration of the relationship chain, the migration platform may record the target storage path of each piece of service data indicated by the relationship chain. After all the service data indicated by the relationship chain has been migrated to the target service cluster, the migration platform replaces, in the data path mapping table, the original storage path of each piece of service data with its target storage path.
It should be noted that, for key service data, once the migration platform determines that the service data of all third relationship chains related to the key service data has been migrated to the target service cluster, it deletes the original storage path of the key service data from the data path mapping table and retains its target storage path.
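Step 207 and the special handling of key service data can be sketched on a plain dictionary keyed by a hypothetical data identifier. Representing a deleted path as `None` and tracking the dependent third relationship chains as a set of xx_yyyy identifiers are assumptions of this sketch, as are the function names.

```python
from typing import Dict, Iterable, Optional, Set

Entry = Dict[str, Optional[str]]  # {"original": ..., "target": ...}


def switch_paths(table: Dict[str, Entry], data_ids: Iterable[str],
                 targets: Dict[str, str]) -> None:
    """Step 207 for ordinary service data: the target path replaces the original."""
    for data_id in data_ids:
        table[data_id] = {"original": None, "target": targets[data_id]}


def release_key_data(table: Dict[str, Entry], key_id: str,
                     dependent_chains: Set[str], migrated_chains: Set[str]) -> bool:
    """Delete the original path of key service data only after every third
    relationship chain sharing it has been migrated; keep the target path."""
    if dependent_chains <= migrated_chains:
        table[key_id]["original"] = None
        return True
    return False
```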
208. Migrate the computing tasks indicated by the relationship chain to the target service cluster.
In this embodiment, the computing tasks indicated by the relationship chain may be migrated to the target service cluster as follows: obtain the first computing resource information and the second computing resource information of each computing task, and replace the first computing resource information of the computing task with the second computing resource information. The first computing resource information is the computing resource information configured for the computing task in the original service cluster, and the second computing resource information is the computing resource information configured for the computing task in the target service cluster.
It should be noted that after migrating the computing tasks indicated by the relationship chain to the target service cluster, the migration platform starts all the computing tasks indicated by the relationship chain, which completes the migration of the relationship chain.
In addition, this embodiment may also implement incremental migration during data migration, at the following two levels:
At the first level, data newly added during the migration is migrated.
While data is migrated in units of relationship chains, a large number of computing tasks in the original service cluster are still running. Therefore, after the plurality of relationship chains have been generated for the original service cluster, a large amount of new service data may be generated, or new computing tasks may be added to the original service cluster; such newly added data is reflected by newly added input/output records in the computing task log. After obtaining the plurality of relationship chains from the computing task log, the migration platform may record the timestamp of the most recently generated input/output record in the log, and later obtain from the computing task log of the original service cluster the newly added input/output records generated after that timestamp.
The migration platform may update the relationship chains that have not been migrated according to the newly added input/output records, as follows: for any newly added input/output record, if the relationship chains not yet migrated include a fourth relationship chain associated with that record, the fourth relationship chain is updated according to the record; if they do not, a new relationship chain is generated according to the association relationships between that record and the other newly added input/output records, in the same way as the plurality of relationship chains were generated, which is not repeated here. A fourth relationship chain is associated with a newly added input/output record if the service data indicated by the fourth relationship chain is associated with the computing task indicated by the record, or the computing task indicated by the fourth relationship chain is associated with the service data indicated by the record.
It should be noted that the migration platform may update the relationship chains not yet migrated according to the newly added input/output records either during the migration of a relationship chain or after the migration of a relationship chain is completed, which is not limited in this embodiment. The migration platform may obtain newly added input/output records from the computing task log periodically, so as to update the relationship chains not yet migrated periodically.
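The timestamp bookkeeping and the update of not-yet-migrated chains can be sketched as follows. The record and chain types are simplified assumptions (one task and one storage path per record, chains reduced to sets of task names and storage paths), the function name is hypothetical, and the rule of updating the first matching chain when a record matches several pending chains is an assumption the text does not settle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class IORecord:
    task: str         # computing task that read or wrote the data
    data: str         # storage path of the service data
    timestamp: float  # generation time of the record in the task log


@dataclass
class Chain:
    tasks: Set[str] = field(default_factory=set)
    data: Set[str] = field(default_factory=set)

    def touches(self, rec: IORecord) -> bool:
        return rec.task in self.tasks or rec.data in self.data

    def add(self, rec: IORecord) -> None:
        self.tasks.add(rec.task)
        self.data.add(rec.data)


def apply_new_records(pending: Dict[str, Chain], records: List[IORecord],
                      last_ts: float) -> Tuple[List[Chain], float]:
    """Fold records newer than `last_ts` into not-yet-migrated chains; return
    the newly generated chains and the new latest timestamp."""
    new_chains: List[Chain] = []
    latest = last_ts
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.timestamp <= last_ts:
            continue  # already reflected in the existing chains
        latest = max(latest, rec.timestamp)
        fourth = next((c for c in pending.values() if c.touches(rec)), None)
        if fourth is not None:
            fourth.add(rec)  # update the associated fourth relationship chain
            continue
        # Otherwise group the record with the new records it is linked to,
        # merging any new chains that this record connects.
        merged = Chain()
        for chain in [c for c in new_chains if c.touches(rec)]:
            merged.tasks |= chain.tasks
            merged.data |= chain.data
        new_chains = [c for c in new_chains if not c.touches(rec)]
        merged.add(rec)
        new_chains.append(merged)
    return new_chains, latest
```

Storing the returned timestamp and calling the function again on each periodic poll of the log gives the periodic update described above.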
In this embodiment, while the data indicated by a relationship chain is being migrated, a new computing task generated in the original service cluster may be associated with business data indicated by a relationship chain that has already been migrated. In that case, once the data indicated by that relationship chain has been migrated to the target service cluster, the new computing task has to read and write the associated business data from the target service cluster. Because the target service cluster and the original service cluster are not in the same IDC machine room, such reads and writes occupy considerable network bandwidth. Therefore, in the first aspect, the migration platform updates the unmigrated relationship chains in time according to the computing task log, so that the relationship chains comprehensively reflect the latest business data and computing tasks in the original service cluster. This minimizes cases in which a computing task of the original service cluster reads and writes business data of the target service cluster, thereby improving service processing efficiency and network resource utilization. In addition, to further avoid such cases, the migration platform may monitor the network bandwidth occupied by every computing task in the original service cluster and, for any computing task whose bandwidth occupation is higher than a preset bandwidth, preferentially migrate the relationship chain containing that computing task to the target service cluster.
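The bandwidth-based prioritization can be illustrated with a short Python sketch. The function name `migration_order`, the dictionary inputs and the rule "a chain is prioritized if any of its tasks exceeds the preset bandwidth" are assumptions made for illustration; the text does not specify how several prioritized chains are ordered among themselves, so this sketch simply keeps their original order.

```python
from typing import Dict, List, Set


def migration_order(chains: Dict[str, Set[str]],
                    bandwidth: Dict[str, float],
                    preset_bandwidth: float) -> List[str]:
    """Return chain ids with bandwidth-heavy chains first.

    chains maps a chain id to the computing tasks it indicates; bandwidth maps
    a task to its measured network usage (units depend on the monitoring system)."""
    def is_heavy(chain_id: str) -> bool:
        return any(bandwidth.get(t, 0.0) > preset_bandwidth for t in chains[chain_id])

    # False sorts before True and sorted() is stable, so heavy chains move to
    # the front while all other chains keep their original relative order.
    return sorted(chains, key=lambda cid: not is_heavy(cid))
```

For example, with chains `{"A": {"t1"}, "B": {"t2", "t3"}, "C": {"t4"}}`, usage `{"t1": 10, "t2": 5, "t3": 300, "t4": 200}` and a preset of 100, chains B and C would be migrated before A.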
In the other aspect, when migration is interrupted, resumable (breakpoint) migration is performed based on the data migration state at the time of the interruption.
The resumable migration process may be as follows: during migration based on any relationship chain, when a migration interruption operation on the relationship chain is detected, the migration subtasks whose migration has not been completed are recorded, and the migration of the relationship chain is stopped; when a migration resumption operation on the relationship chain is detected, the business data and computing tasks indicated by the relationship chain are migrated to the target service cluster according to the recorded incomplete migration subtasks.
It should be noted that, while the data of a relationship chain is being migrated, an emergency may interrupt the migration, for example a network failure, or business data with a higher priority that must be migrated immediately. When migrating a relationship chain through a plurality of migration subtasks, the migration platform may number the subtasks and execute them in order of their numbers. The migration platform may also record the state of each migration subtask, for example not migrated, migrating, or migration completed. When the migration platform detects a migration interruption operation on a relationship chain, it may record the numbers of the migration subtasks whose migration has not been completed. When it detects a migration resumption operation on that relationship chain, it executes only those incomplete migration subtasks, so that the business data and computing tasks that had not been migrated when the relationship chain was interrupted are migrated to the target service cluster.
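The numbered-subtask bookkeeping above can be sketched in Python. This is a simplified single-threaded illustration, not the platform's actual code: `ChainMigration`, `SubtaskState` and `copy_fn` are hypothetical names, an interruption is assumed to take effect before the next subtask starts, and a subtask interrupted mid-copy stays in the "migrating" state and is therefore re-run on resumption.

```python
from enum import Enum
from typing import Callable, List, Tuple


class SubtaskState(Enum):
    NOT_MIGRATED = "not migrated"
    MIGRATING = "migrating"
    COMPLETED = "migration completed"


class ChainMigration:
    """Numbered migration subtasks for one relationship chain, with resume support."""

    def __init__(self, subtasks: List[Tuple[str, str]],
                 copy_fn: Callable[[str, str], None]):
        self.subtasks = subtasks          # list index = subtask number; (src, dst) paths
        self.copy_fn = copy_fn
        self.state = {n: SubtaskState.NOT_MIGRATED for n in range(len(subtasks))}
        self.interrupted = False
        self.pending: List[int] = []      # numbers recorded at interruption

    def interrupt(self) -> None:
        # e.g. network failure or a higher-priority chain; checked before each subtask
        self.interrupted = True

    def run(self) -> bool:
        """Run unfinished subtasks in number order; return False if interrupted."""
        for n in range(len(self.subtasks)):
            if self.state[n] is SubtaskState.COMPLETED:
                continue                  # already migrated before an interruption
            if self.interrupted:
                self.pending = [m for m in range(len(self.subtasks))
                                if self.state[m] is not SubtaskState.COMPLETED]
                return False
            self.state[n] = SubtaskState.MIGRATING
            self.copy_fn(*self.subtasks[n])
            self.state[n] = SubtaskState.COMPLETED
        self.pending = []
        return True

    def resume(self) -> bool:
        """Resumption: only subtasks not yet completed are executed."""
        self.interrupted = False
        return self.run()
```

Keeping the per-subtask state, rather than a single progress counter, is what lets the resumed run skip completed subtasks even if they were not contiguous.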
In this embodiment, during data migration in units of relationship chains, the migration platform may further use different migration states to control the migration process. Fig. 2H is a schematic diagram of the migration states involved in migrating a relationship chain. Each migration state is described below, taking the migration of one relationship chain as an example:
Migration started: migration of the data indicated by the relationship chain begins.
Acquiring source data: this state is entered once it is determined that migration of the relationship chain has started; in this state, the migration platform acquires the source data from the real-time data processing server through the designated interface.
Waiting for user confirmation: when the business data migration progress reaches the preset progress and before the computing tasks are frozen, a computing task freeze confirmation interface is displayed to the user; after the user confirms on this interface, the migration transitions to the computing task freezing state. It should be noted that, if the relationship chain being migrated is a third relationship chain obtained by splitting, the migration platform enters the waiting-for-user-confirmation state when it determines that both the original storage path and the target storage path of the key business data indicated by the third relationship chain are included in the data path mapping table.
Freezing computing tasks: in this state, the migration platform executes the above steps 206c and 206c, and enters the next migration state once all computing tasks have stopped running.
Waiting for business data consistency: after the computing tasks are frozen, the relationship chain remains in this state until all the business data it indicates has been migrated to the target service cluster.
Checking business data consistency: this state is entered after all the business data indicated by the relationship chain has been migrated to the target service cluster.
Switching business data storage paths: after the consistency check of the relationship chain succeeds, this state is entered and the business data storage paths are switched.
Migrating computing tasks: when all business data storage paths have been switched from the original service cluster to the target service cluster, this state is entered and the computing tasks are migrated.
Unfreezing computing tasks: after the computing tasks have been migrated, all of them are started; once they are all running normally, the migration completed state is entered, completing the migration of the business data and computing tasks indicated by the relationship chain.
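The sequence of migration states above can be modeled as a small state machine. The Python sketch below is illustrative only: the state names are paraphrases of the states in Fig. 2H, `next_state` and its boolean `condition_met` (standing for "the exit condition of the current state holds") are hypothetical, and the transition from a failed consistency check back to waiting for consistency is an assumption based on the re-migration behavior described for the second checking unit.

```python
from enum import Enum, auto


class MigrationState(Enum):
    # Declared in the order a relationship chain passes through them.
    STARTED = auto()
    ACQUIRING_SOURCE_DATA = auto()
    WAITING_USER_CONFIRMATION = auto()   # entered when progress reaches the preset value
    FREEZING_TASKS = auto()              # left once all tasks have stopped running
    WAITING_DATA_CONSISTENT = auto()     # left once all business data is migrated
    CHECKING_CONSISTENCY = auto()
    SWITCHING_STORAGE_PATHS = auto()
    MIGRATING_TASKS = auto()             # entered once all paths point to the target
    UNFREEZING_TASKS = auto()            # left once all tasks run normally
    COMPLETED = auto()


ORDER = list(MigrationState)  # Enum iteration follows declaration order


def next_state(state: MigrationState, condition_met: bool,
               check_failed: bool = False) -> MigrationState:
    """Advance one step if the exit condition of `state` is met."""
    if state is MigrationState.COMPLETED:
        return state
    if state is MigrationState.CHECKING_CONSISTENCY and check_failed:
        # Assumption: failed data is re-migrated, then consistency is awaited again.
        return MigrationState.WAITING_DATA_CONSISTENT
    if not condition_met:
        return state
    return ORDER[ORDER.index(state) + 1]
```

Encoding the states as an ordered enumeration keeps the happy path a simple "advance by one", so only the exceptional transition needs special handling.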
With the method provided by this embodiment, business data and computing tasks having an association relationship are represented by one relationship chain according to the computing task log of the original service cluster. Therefore, during data migration in units of relationship chains, the relationship chain being migrated does not affect the other relationship chains, and the computing tasks indicated by relationship chains that have not been migrated continue to run normally, so the services indicated by those relationship chains are not affected.
In addition, key data nodes are acquired from a large relationship chain with a large data volume, and the key business data corresponding to the key data nodes is made accessible in both the original service cluster and the target service cluster. After the large relationship chain is split into a plurality of small relationship chains, each small relationship chain can access the key business data flexibly, regardless of whether it belongs to the original service cluster or the target service cluster. This decouples interrelated services and allows complex services to be migrated gradually through the plurality of small relationship chains.
In addition, when the data indicated by a relationship chain is migrated, the business data is migrated first; when its migration progress reaches the preset progress, the computing tasks can be migrated during the interval in which they have stopped running, which greatly reduces the impact of data migration on normal service use. Once the preset progress is reached, the remaining business data can be migrated within a short time, which may be shorter than the running period of the computing tasks; normal service use is therefore not affected during data migration, and the migration is imperceptible to users.
In addition, the business data indicated by a relationship chain is migrated through different migration subtasks, which reduces the granularity of data migration, and the migration subtasks can run in parallel, which improves the migration efficiency of the business data indicated by the relationship chain. Moreover, when a migration error occurs, only the affected migration subtask needs to be re-executed, rather than re-migrating the business data of the entire service cluster, which reduces the cost of data errors during migration and improves migration efficiency.
In addition, the business data and computing tasks in the original service cluster are migrated gradually to the target service cluster in units of relationship chains, breaking the whole into parts. As migration proceeds, the business data and computing tasks remaining in the original service cluster keep decreasing, so idle servers in the original service cluster can be dismantled and moved to the target IDC machine room. Server equipment can thus be reused, reducing the cost of data migration.
Fig. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention. Referring to fig. 3, the apparatus includes a first obtaining unit 301 and a migration unit 302.
The first obtaining unit 301 is connected to the migration unit 302 and is configured to obtain a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the association relationships between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of computing tasks and business data having an association relationship. The migration unit 302 is configured to migrate, in units of relationship chains, the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence. When migration is performed based on any relationship chain, the computing tasks indicated by the relationship chains of the plurality that have not yet been migrated run normally.
In one possible implementation, the first obtaining unit is configured to: according to the plurality of input/output records in the computing task log, add the same relationship chain identifier to input/output records having an association relationship and different relationship chain identifiers to input/output records having no association relationship; and generate the plurality of relationship chains according to the association relationships between the computing tasks and business data indicated by the input/output records having the same relationship chain identifier, where each relationship chain includes task nodes indicating computing tasks, data nodes indicating business data, and the association relationships between the task nodes and the data nodes.
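One way to assign relationship chain identifiers is a union-find (disjoint-set) pass over the log, treating two records as associated when they share a computing task or a piece of business data, directly or transitively. The Python sketch below assumes that interpretation; it is not stated to be the method used in the patent. Records are reduced to hypothetical (task, data) pairs, the input/output direction is ignored, and `assign_chain_ids` and `build_chains` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_chain_ids(records: List[Tuple[str, str]]) -> List[int]:
    """records: (computing task, business data) pairs from the log.

    Records connected through a shared task or shared data get the same
    identifier (union-find over task and data nodes)."""
    parent: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for task, data in records:
        parent[find(("task", task))] = find(("data", data))   # union

    ids, root_to_id = [], {}
    for task, _ in records:
        root = find(("task", task))
        ids.append(root_to_id.setdefault(root, len(root_to_id)))
    return ids


def build_chains(records, ids):
    """Group records by identifier into task nodes, data nodes and edges."""
    chains = defaultdict(lambda: {"task_nodes": set(), "data_nodes": set(), "edges": set()})
    for (task, data), cid in zip(records, ids):
        chain = chains[cid]
        chain["task_nodes"].add(task)
        chain["data_nodes"].add(data)
        chain["edges"].add((task, data))
    return dict(chains)
```

Task and data nodes are tagged ("task", name) and ("data", name) so a task and a table that happen to share a name are not merged by accident. Identifiers are numbered in order of first appearance in the log.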
In one possible implementation manner, the first obtaining unit includes:
a generating subunit, configured to generate a plurality of first relationship chains according to the association relationships between the computing tasks and business data indicated by the input/output records having the same relationship chain identifier;
and a splitting subunit, configured to split a second relationship chain into a plurality of third relationship chains if the plurality of first relationship chains include the second relationship chain, where the second relationship chain is a first relationship chain whose indicated business data has a data volume exceeding a first threshold.
In one possible implementation, the splitting subunit is configured to: acquire the weights of a plurality of data nodes in the second relationship chain, where the weight of each data node indicates the degree of association of the data node in the second relationship chain, a higher weight indicating a higher degree of association; acquire a key data node from the plurality of data nodes according to the descending order of the weights and the positions of the data nodes on the second relationship chain, where the key data node is the first data node in that order that can split the second relationship chain into at least two third relationship chains; and split, based on the key data node, the plurality of task nodes associated with the key data node in the second relationship chain into a plurality of third relationship chains.
In one possible implementation, the splitting subunit is configured to:
for each of the plurality of task nodes directly associated with the key data node, determine the key data node, the task node, and the nodes that are associated with the task node when the key data node is disconnected from the task node as one third relationship chain; or
determine the key data node as one third relationship chain and, for each of the plurality of task nodes directly associated with the key data node, determine the nodes associated with the task node, excluding the key data node, as one third relationship chain; or
split the key data node, at least one task node directly associated with the key data node, and the nodes associated with the at least one task node into one third relationship chain, and split the remaining task nodes and data nodes outside that third relationship chain into at least one further third relationship chain.
In one possible implementation, the splitting subunit is configured to determine, for each of the plurality of data nodes, the product of the number of task nodes associated with the data node and the data volume of the business data indicated by the data node as the weight of the data node.
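The weighting, key-node selection and first splitting option can be sketched together in Python. Assumptions, all hypothetical: a relationship chain is an undirected task-data graph given as an edge list; "can split the second relationship chain into at least two third relationship chains" is read as "removing the node leaves at least two connected components"; and "nodes associated with the task node when the key data node is disconnected" is read as the component containing the task node after the key data node is removed, so task nodes that stay connected to each other share one third chain. Each resulting third chain keeps the key data node as its shared reference.

```python
from typing import Dict, List, Optional, Set, Tuple

Adjacency = Dict[str, Set[str]]


def adjacency(edges: List[Tuple[str, str]]) -> Adjacency:
    """Undirected task-data graph of one relationship chain."""
    adj: Adjacency = {}
    for task, data in edges:
        adj.setdefault(task, set()).add(data)
        adj.setdefault(data, set()).add(task)
    return adj


def components(adj: Adjacency, removed: Set[str]) -> List[Set[str]]:
    """Connected components of the graph with the `removed` nodes deleted."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:                      # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in removed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def key_data_node(adj: Adjacency, volume: Dict[str, int]) -> Optional[str]:
    """First data node, by descending weight, whose removal splits the chain."""
    # weight = number of associated task nodes x data volume of the node
    weight = {d: len(adj[d]) * v for d, v in volume.items()}
    for d in sorted(weight, key=weight.get, reverse=True):
        if len(components(adj, {d})) >= 2:
            return d
    return None                           # no single data node splits this chain


def split_on_key_node(adj: Adjacency, key: str) -> List[Set[str]]:
    """One third chain per group left connected once `key` is removed,
    each keeping the shared key data node."""
    return [comp | {key} for comp in components(adj, {key})]
```

In a test chain where `d1` has the highest weight but is a leaf, the sketch skips it and picks the hub `d0`, illustrating why the weight order alone is not enough and the node's position on the chain also matters.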
In one possible implementation, the apparatus further includes:
a second obtaining unit, configured to obtain a target storage path of key business data in the target service cluster, where the key business data is the business data indicated by the key data node;
and an adding unit, configured to add the target storage path to a data path mapping table while retaining the original storage path of the key business data in the original service cluster.
In one possible implementation, the migration unit is configured to:
during migration of the plurality of third relationship chains, synchronize the key business data between the target service cluster and the original service cluster according to the target storage path and the original storage path;
for any of the plurality of third relationship chains, perform the following process:
if all the business data and computing tasks indicated by the third relationship chain have been migrated to the target service cluster, access the key business data according to the target storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain;
and if the business data and computing tasks indicated by the third relationship chain have not all been migrated to the target service cluster, access the key business data according to the original storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain.
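A data path mapping table that keeps both paths for key business data, and resolves which one a third relationship chain should use, might look like the Python sketch below. `DataPathMappingTable`, its methods and the example paths are hypothetical; the synchronization of the key business data between the two clusters is outside the sketch.

```python
from typing import Dict, Optional


class DataPathMappingTable:
    """data id -> original storage path and (optional) target storage path."""

    def __init__(self) -> None:
        self._original: Dict[str, str] = {}
        self._target: Dict[str, Optional[str]] = {}

    def register(self, data_id: str, original_path: str) -> None:
        self._original[data_id] = original_path
        self._target[data_id] = None

    def add_target(self, data_id: str, target_path: str) -> None:
        # The original path is kept, so the key business data stays
        # reachable from both clusters while the third chains migrate.
        self._target[data_id] = target_path

    def resolve(self, data_id: str, chain_fully_migrated: bool) -> str:
        """Path a computing task of a third relationship chain should use."""
        target = self._target[data_id]
        if chain_fully_migrated and target is not None:
            return target
        return self._original[data_id]
```

For example, after `register("shared_dim", "/origin/warehouse/shared_dim")` and `add_target("shared_dim", "/target/warehouse/shared_dim")`, a fully migrated third chain resolves to the target path while a chain still in the original cluster resolves to the original path.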
In one possible implementation, the migration unit includes:
a generating subunit, configured to generate, for each of the plurality of relationship chains, a plurality of migration subtasks according to the plurality of pieces of business data indicated by the relationship chain, where each migration subtask indicates the original storage path and the target storage path of the corresponding business data;
a first migration subunit, configured to migrate the business data indicated by the relationship chain to the target service cluster according to the plurality of migration subtasks;
a second migration subunit, configured to migrate the computing tasks indicated by the relationship chain to the target service cluster;
where, while the computing tasks indicated by the relationship chain are being migrated, they are in a stopped state.
In one possible implementation, the first migration subunit is configured to:
for each piece of business data indicated by the relationship chain, perform the following process:
determine whether the data volume of the business data is smaller than a second threshold;
if the data volume of the business data is smaller than the second threshold, generate one migration subtask corresponding to the business data;
if the data volume of the business data is not smaller than the second threshold, divide the business data, in the chronological order in which the data was generated and according to the second threshold, into a plurality of pieces of sub-business data, each with a data volume smaller than the second threshold, and generate a migration subtask for each piece of sub-business data.
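The threshold-based subtask generation can be sketched in Python, assuming (hypothetically) that each piece of business data is stored as time-stamped partitions that are the smallest unit that can be moved. Partitions are packed greedily in generation order so every group stays below the second threshold; data smaller than the threshold in total naturally yields one subtask. `MigrationSubtask`, `make_subtasks` and the path layout are illustrative names, and a single partition at or above the threshold is rejected because this model cannot split it further.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MigrationSubtask:
    number: int
    source_path: str
    target_path: str
    partitions: List[str]


def make_subtasks(data_id: str, partitions: List[Tuple[str, int, int]],
                  threshold: int, src_root: str, dst_root: str,
                  first_number: int = 0) -> List[MigrationSubtask]:
    """partitions: (name, generated_at, size) units of one piece of business data.

    Data below `threshold` in total yields a single subtask; larger data is cut,
    in order of generation time, into groups each smaller than `threshold`."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, _, size in sorted(partitions, key=lambda p: p[1]):
        if size >= threshold:
            raise ValueError(f"partition {name} alone reaches the threshold")
        if current and current_size + size >= threshold:
            groups.append(current)        # close the group before it reaches the threshold
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return [MigrationSubtask(first_number + i, f"{src_root}/{data_id}",
                             f"{dst_root}/{data_id}", group)
            for i, group in enumerate(groups)]
```

The subtask numbers produced here are the kind of ordering the resumable migration described earlier relies on.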
In one possible implementation, the first migration subunit is further configured to:
acquire the migration progress of the business data indicated by the relationship chain while that business data is being migrated;
when the migration progress of the business data exceeds a preset progress, perform the following process for each computing task indicated by the relationship chain:
determine whether the computing task is in a stopped state;
if the computing task is in a stopped state, keep it stopped until the relationship chain finishes migration;
and if the computing task is running, wait until it stops running, and then keep it stopped until the relationship chain finishes migration.
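The freeze logic can be sketched as a small controller in Python. This is an illustrative model, not the platform's scheduler: `ComputeTask`, `FreezeController` and the callbacks `on_progress`, `on_task_finished` and `try_start` are hypothetical hooks, and progress is assumed to be a fraction between 0 and 1. Running tasks are never killed; they are simply not allowed to start again once their current run ends.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeTask:
    name: str
    running: bool = False
    frozen: bool = False


class FreezeController:
    def __init__(self, tasks: List[ComputeTask], preset_progress: float):
        self.tasks = {t.name: t for t in tasks}
        self.preset = preset_progress
        self.freezing = False

    def on_progress(self, progress: float) -> None:
        if progress > self.preset and not self.freezing:
            self.freezing = True
            for t in self.tasks.values():
                if not t.running:
                    t.frozen = True       # already stopped: keep it stopped

    def on_task_finished(self, name: str) -> None:
        t = self.tasks[name]
        t.running = False
        if self.freezing:
            t.frozen = True               # was running: freeze once its run ends

    def try_start(self, name: str) -> bool:
        """Called by the scheduler; frozen tasks are not started."""
        t = self.tasks[name]
        if t.frozen:
            return False
        t.running = True
        return True

    def all_frozen(self) -> bool:
        # computing tasks may be migrated once this is True
        return all(t.frozen for t in self.tasks.values())

    def unfreeze_all(self) -> None:
        # after the relationship chain finishes migration
        self.freezing = False
        for t in self.tasks.values():
            t.frozen = False
```

Waiting for running tasks to finish, instead of interrupting them, is what allows the computing tasks to be migrated in the gap between two runs.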
In one possible implementation, the apparatus further includes:
a first checking unit, configured to, for each of the plurality of migration subtasks, perform a consistency check on the business data corresponding to the migration subtask between the target service cluster and the original service cluster after all of that business data has been migrated to the target service cluster; if the consistency check succeeds, determine that the business data corresponding to the migration subtask has been migrated successfully; and if the consistency check fails, determine that the migration of that business data has failed and re-execute the migration subtask.
In one possible implementation, the apparatus further includes:
a second checking unit, configured to check the consistency of the business data indicated by the relationship chain between the target service cluster and the original service cluster; if the consistency check succeeds, perform the step of migrating the computing tasks indicated by the relationship chain to the target service cluster; and if the consistency check fails, determine the business data whose migration failed according to the check result and migrate that business data again.
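Both consistency checks can be sketched in Python with per-file SHA-256 checksums from the standard `hashlib` module. The choice of checksum, the in-memory `Store` standing in for a cluster's storage, and the names `failed_paths` and `run_subtask` are assumptions for illustration; the text does not say how consistency is verified.

```python
import hashlib
from typing import Callable, Dict, List, Optional

Store = Dict[str, bytes]  # path -> content, standing in for a cluster's storage


def checksum(store: Store, path: str) -> Optional[str]:
    data = store.get(path)
    return None if data is None else hashlib.sha256(data).hexdigest()


def failed_paths(src: Store, dst: Store, paths: List[str]) -> List[str]:
    """Paths whose content differs between the two clusters."""
    return [p for p in paths if checksum(src, p) != checksum(dst, p)]


def run_subtask(src: Store, dst: Store, paths: List[str],
                copy: Callable[[Store, Store, str], None],
                max_retries: int = 3) -> bool:
    """Copy a subtask's data, verify it, and re-execute the whole subtask on failure."""
    for _ in range(1 + max_retries):
        for p in paths:
            copy(src, dst, p)
        if not failed_paths(src, dst, paths):
            return True   # this subtask's business data migrated successfully
    return False
```

At the relationship chain level, calling `failed_paths` over all of the chain's business data identifies exactly which data must be migrated again before the computing tasks are moved. The retry limit `max_retries` is a hypothetical safeguard not mentioned in the text.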
In one possible implementation, the second migration subunit is configured to: obtain first computing resource information and second computing resource information of the computing task, where the first computing resource information is the computing resource information configured for the computing task in the original service cluster, and the second computing resource information is the computing resource information configured for the computing task in the target service cluster; and replace the first computing resource information of the computing task with the second computing resource information.
In one possible implementation, the apparatus further includes:
a switching unit, configured to switch, in the data path mapping table, the original storage path of the business data in the original service cluster to the target storage path in the target service cluster.
In one possible implementation, the migration unit is further configured to: during migration based on any relationship chain, when a migration interruption operation on the relationship chain is detected, record the migration subtasks whose migration has not been completed and stop migrating the relationship chain; and when a migration resumption operation on the relationship chain is detected, migrate the business data and computing tasks indicated by the relationship chain to the target service cluster according to the recorded incomplete migration subtasks.
In one possible implementation, the apparatus further includes:
a relationship chain updating unit, configured to acquire an updated computing task log and update the relationship chains that have not been migrated according to the updated computing task log.
With the apparatus provided by this embodiment, business data and computing tasks having an association relationship are represented by one relationship chain according to the computing task log of the original service cluster. Therefore, during data migration in units of relationship chains, the relationship chain being migrated does not affect the other relationship chains, and the computing tasks indicated by relationship chains that have not been migrated continue to run normally, so the services indicated by those relationship chains are not affected.
It should be noted that, when the data migration apparatus provided in the foregoing embodiment migrates data, the division into the functional modules described above is merely an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the data migration apparatus and the data migration method provided in the above embodiments belong to the same concept; for their specific implementation, refer to the method embodiments, which are not repeated here.
Fig. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention. For example, the apparatus 400 may be provided as a server. Referring to fig. 4, the apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources, represented by a memory 432, for storing instructions executable by the processing component 422, such as application programs. The application programs stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 422 is configured to execute the instructions to perform the method performed by the server in the above data migration method embodiments.
The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, for example a memory including instructions, is also provided; the instructions can be executed by a processor in a server to perform the data migration method in the above embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention is intended to be included within its scope.

Claims (15)

1. A data migration method, the method comprising:
acquiring a plurality of relationship chains according to a computing task log of an original service cluster, wherein the computing task log records association relationships between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of computing tasks and business data having an association relationship;
migrating, in units of relationship chains, the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence;
wherein, when migration is performed based on any relationship chain, the computing tasks indicated by the relationship chains of the plurality that have not been migrated run normally.
2. The method of claim 1, wherein acquiring a plurality of relationship chains according to the computing task log of the original service cluster comprises:
adding, according to a plurality of input/output records in the computing task log, the same relationship chain identifier to input/output records having an association relationship and different relationship chain identifiers to input/output records having no association relationship;
and generating the plurality of relationship chains according to the association relationships between the computing tasks and business data indicated by the input/output records having the same relationship chain identifier, wherein each relationship chain comprises task nodes indicating computing tasks, data nodes indicating business data, and the association relationships between the task nodes and the data nodes.
3. The method of claim 2, wherein generating the plurality of relationship chains according to the association relationships between the computing tasks and business data indicated by the input/output records having the same relationship chain identifier comprises:
generating a plurality of first relationship chains according to the association relationships between the computing tasks and business data indicated by the input/output records having the same relationship chain identifier;
and if the plurality of first relationship chains comprise a second relationship chain, splitting the second relationship chain into a plurality of third relationship chains, wherein the second relationship chain is a first relationship chain whose indicated business data has a data volume exceeding a first threshold.
4. The method of claim 3, wherein splitting the second relationship chain into a plurality of third relationship chains comprises:
acquiring weights of a plurality of data nodes in the second relationship chain, wherein the weight of each data node indicates the degree of association of the data node in the second relationship chain, and a higher weight indicates a higher degree of association;
acquiring a key data node from the plurality of data nodes according to the descending order of the weights and the positions of the plurality of data nodes on the second relationship chain, wherein the key data node is the first data node in that order that can split the second relationship chain into at least two third relationship chains;
and splitting, based on the key data node, a plurality of task nodes associated with the key data node in the second relationship chain into a plurality of third relationship chains.
5. The method of claim 4, wherein splitting, based on the key data node, the plurality of task nodes associated with the key data node in the second relationship chain into a plurality of third relationship chains comprises:
for each of the plurality of task nodes directly associated with the key data node, determining the key data node, the task node, and the nodes that are associated with the task node when the key data node is disconnected from the task node as one third relationship chain; or
determining the key data node as one third relationship chain, and, for each of the plurality of task nodes directly associated with the key data node, determining the nodes associated with the task node, excluding the key data node, as one third relationship chain; or
splitting the key data node, at least one task node directly associated with the key data node, and the nodes associated with the at least one task node into one third relationship chain, and splitting the task nodes and data nodes outside that third relationship chain into at least one third relationship chain.
6. The method of claim 4, wherein after splitting the second relationship chain into a plurality of third relationship chains, the method further comprises:
acquiring a target storage path of key business data in the target service cluster, wherein the key business data is the business data indicated by the key data node;
and adding the target storage path to a data path mapping table while retaining an original storage path of the key business data in the original service cluster.
7. The method of claim 1, wherein migrating, in units of relationship chains, the business data and computing tasks indicated by the plurality of relationship chains to the target service cluster in sequence comprises:
for each of the plurality of relationship chains, generating a plurality of migration subtasks according to the plurality of pieces of business data indicated by the relationship chain, wherein each migration subtask indicates an original storage path and a target storage path of the corresponding business data;
migrating the business data indicated by the relationship chain to the target service cluster according to the plurality of migration subtasks;
migrating the computing tasks indicated by the relationship chain to the target service cluster;
wherein, while the computing tasks indicated by the relationship chain are being migrated, they are in a stopped state.
8. The method of claim 7, wherein migrating the business data indicated by the relationship chain to the target service cluster comprises:
acquiring the migration progress of the business data indicated by the relationship chain while that business data is being migrated; and
when the migration progress of the business data exceeds a preset progress, performing the following for each computing task indicated by the relationship chain:
determining whether the computing task is in a stopped state;
if the computing task is in the stopped state, keeping it stopped until migration of the relationship chain is complete; and
if the computing task is running, waiting for it to stop running and then keeping it stopped until migration of the relationship chain is complete.
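The progress-gated stop of claim 8 can be sketched as a polling loop. This is an illustration under assumptions: progress is taken to be a fraction in [0, 1], the 0.9 default for the preset progress is arbitrary, and `get_progress`, `is_running` and `hold` are hypothetical hooks into a copy tool and a task scheduler.

```python
import time
from typing import Callable, Iterable, List


def hold_tasks_for_chain(get_progress: Callable[[], float],
                         tasks: Iterable[str],
                         is_running: Callable[[str], bool],
                         hold: Callable[[str], None],
                         preset_progress: float = 0.9,
                         poll_seconds: float = 1.0) -> List[str]:
    """Keep a chain's computing tasks stopped once data migration is nearly done."""
    # Tasks may keep running until the data copy exceeds the preset progress.
    while get_progress() <= preset_progress:
        time.sleep(poll_seconds)
    held: List[str] = []
    for task in tasks:
        # A running task is not killed; wait for its current run to finish.
        while is_running(task):
            time.sleep(poll_seconds)
        # The task is now stopped: block restarts until the chain finishes migrating.
        hold(task)
        held.append(task)
    return held
```

Waiting until the copy is nearly complete keeps the downtime of the chain's tasks short. In a real scheduler the "is it stopped" check and the hold would need to be atomic, otherwise a task could restart between the two calls.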
9. The method of claim 7, wherein migrating the computing task indicated by the relationship chain to the target service cluster comprises:
acquiring first computing resource information and second computing resource information of the computing task, wherein the first computing resource information is computing resource information configured for the computing task in the original service cluster, and the second computing resource information is computing resource information configured for the computing task in the target service cluster;
replacing the first computing resource information of the computing task with the second computing resource information.
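Claim 9 amounts to swapping one resource configuration for another. The sketch below assumes resource information consists of a queue name, a core count and a memory size; these fields and the per-task lookup table are hypothetical, since the claim does not say what the computing resource information contains.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ComputeResources:
    """Hypothetical computing resource information attached to a task."""
    queue: str
    cores: int
    memory_mb: int


@dataclass(frozen=True)
class TaskConfig:
    name: str
    resources: ComputeResources


def switch_resources(task: TaskConfig,
                     target_resources: Dict[str, ComputeResources]) -> TaskConfig:
    """Return a copy of the task with its first (original-cluster) resource
    information replaced by the second (target-cluster) resource information."""
    first = task.resources                      # configured in the original cluster
    second = target_resources.get(task.name)    # configured in the target cluster
    if second is None:
        raise KeyError(f"no target-cluster resources configured for {task.name}")
    if second == first:
        return task
    return replace(task, resources=second)
```

Returning a new immutable object leaves the original configuration untouched, which makes it easy to roll a task back if migration of its chain fails.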
10. The method of claim 7, wherein after migrating the business data indicated by the relationship chain to the target service cluster, the method further comprises:
switching, in a data path mapping table, the original storage path of the business data in the original service cluster to the target storage path in the target service cluster.
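Claims 6 and 10 both operate on a data path mapping table. A minimal sketch, assuming the table maps a logical data name to its known storage paths plus the path readers currently resolve to (the class and method names are hypothetical):

```python
from typing import Dict, List


class DataPathMapping:
    """Hypothetical data path mapping table: logical data name -> storage paths."""

    def __init__(self) -> None:
        self._paths: Dict[str, List[str]] = {}
        self._active: Dict[str, str] = {}

    def register(self, name: str, original_path: str) -> None:
        self._paths[name] = [original_path]
        self._active[name] = original_path

    def add_target(self, name: str, target_path: str) -> None:
        # Claim 6: record the target path but keep the original path, so chains
        # still running in the original cluster can keep reading the key data.
        if target_path not in self._paths[name]:
            self._paths[name].append(target_path)

    def switch_to_target(self, name: str, target_path: str) -> None:
        # Claim 10: once the data is migrated, readers resolve to the target path.
        self.add_target(name, target_path)
        self._active[name] = target_path

    def resolve(self, name: str) -> str:
        return self._active[name]

    def paths(self, name: str) -> List[str]:
        return list(self._paths[name])
```

Keeping both paths for key business data matters because, after a split, several third relationship chains share the same key data node and are migrated at different times.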
11. An apparatus for data migration, the apparatus comprising:
a first obtaining unit, configured to obtain a plurality of relationship chains according to a computing task log of an original service cluster, wherein the computing task log records the association relationships between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data; and
a migration unit, configured to migrate the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster sequentially, one relationship chain at a time;
wherein, while migration is performed for any one relationship chain, the computing tasks indicated by the relationship chains of the plurality that are not undergoing migration continue to run normally.
12. The apparatus according to claim 11, wherein the first obtaining unit is configured to: according to a plurality of input/output records recorded in the computing task log, add the same relationship chain identifier to input/output records that have an association relationship and different relationship chain identifiers to input/output records that do not; and generate a plurality of relationship chains according to the association relationships between computing tasks and business data indicated by input/output records having the same relationship chain identifier, wherein each relationship chain comprises task nodes indicating computing tasks, data nodes indicating business data, and the association relationships between the task nodes and the data nodes.
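Assigning chain identifiers to input/output records (claim 12) can be done with a union-find structure. This sketch assumes a record is a `(task, data, direction)` tuple and that two records are associated when they share a task or a data item, directly or transitively; the real log format is not given in the claims.

```python
from typing import Dict, List, Set, Tuple


def build_chains(records: List[Tuple[str, str, str]]) -> Dict[int, dict]:
    """Group (task, data, direction) records into relationship chains.

    direction is 'in' (task reads data) or 'out' (task writes data); this
    record layout is a hypothetical stand-in for the computing task log.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Prefixes keep a task and a data item with the same name distinct.
    for task, data, _ in records:
        union("task:" + task, "data:" + data)

    # One relationship chain identifier per connected group, in order of first appearance.
    chain_ids: Dict[str, int] = {}
    chains: Dict[int, dict] = {}
    for task, data, direction in records:
        cid = chain_ids.setdefault(find("task:" + task), len(chain_ids))
        chain = chains.setdefault(cid, {"tasks": set(), "data": set(), "edges": set()})
        chain["tasks"].add(task)
        chain["data"].add(data)
        # Input record: data -> task edge; output record: task -> data edge.
        chain["edges"].add((data, task) if direction == "in" else (task, data))
    return chains
```

Union-find keeps the grouping close to linear in the number of records, which matters when the log covers a whole cluster.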
13. The apparatus of claim 12, wherein the first obtaining unit comprises:
a generating subunit, configured to generate a plurality of first relationship chains according to the association relationships between computing tasks and business data indicated by input/output records having the same relationship chain identifier; and
a splitting subunit, configured to split a second relationship chain into a plurality of third relationship chains if the plurality of first relationship chains includes the second relationship chain, the second relationship chain being a first relationship chain whose indicated business data exceeds a first threshold in data volume.
14. The apparatus according to claim 13, wherein the splitting subunit is configured to: obtain weights of a plurality of data nodes in the second relationship chain, the weight of each data node indicating its degree of association within the second relationship chain, a higher weight indicating a higher degree of association; select a key data node from the plurality of data nodes according to the descending order of weights and the positions of the data nodes on the second relationship chain, the key data node being the first data node in that order that can divide the second relationship chain into at least two third relationship chains; and split, based on the key data node, the plurality of task nodes associated with the key data node in the second relationship chain into a plurality of third relationship chains.
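Claims 13 and 14 can be illustrated as follows. Two interpretations are assumptions, not the patent's definitions: the weight of a data node is taken to be its number of directly associated task nodes, and a node "can divide" the chain when removing it leaves at least two connected pieces (a cut vertex).

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def _adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        # Direction does not matter for connectivity, so store both ways.
        adj[a].add(b)
        adj[b].add(a)
    return adj


def _components_without(adj: Dict[str, Set[str]], removed: str) -> int:
    """Count connected components left after deleting one node."""
    seen = {removed}
    count = 0
    for start in list(adj):
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def find_key_data_node(edges: Iterable[Tuple[str, str]], data_volume: Dict[str, int],
                       first_threshold: int) -> Optional[str]:
    """Return the key data node of an oversized chain, or None if no split applies.

    data_volume maps every data node of the chain to its size (hypothetical units).
    """
    # Claim 13: only a chain whose business data exceeds the first threshold is split.
    if sum(data_volume.values()) <= first_threshold:
        return None
    adj = _adjacency(edges)
    # Assumed weight: number of directly associated task nodes (node degree).
    ranked = sorted(data_volume, key=lambda d: (-len(adj.get(d, ())), d))
    for node in ranked:
        # The first node in weight order whose removal leaves >= 2 pieces is the key node.
        if node in adj and _components_without(adj, node) >= 2:
            return node
    return None
```

The connectivity check is what turns "position on the chain" into something testable: a highly connected data node that sits inside a cycle cannot split the chain, so the search moves on to the next node in weight order.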
15. The apparatus of claim 14, wherein the splitting subunit is configured to:
for each task node among the plurality of task nodes directly associated with the key data node, determine the key data node, the task node, and the nodes that remain associated with the task node once the key data node is disconnected from it as a third relationship chain; or
determine the key data node as one third relationship chain and, for each task node among the plurality of task nodes directly associated with the key data node, determine the nodes other than the key data node that have an association relationship with that task node as a third relationship chain; or
split the key data node, at least one task node directly associated with the key data node, and the nodes having an association relationship with the at least one task node into one third relationship chain, and split the remaining task nodes and data nodes outside that third relationship chain into at least one further third relationship chain.
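The first splitting alternative (cut the key data node away from each directly associated task node and keep whatever stays connected) can be sketched as a graph traversal. The edge-list input is an assumption, and task nodes that remain connected to each other through other paths are deliberately merged into a single third relationship chain.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def split_around_key(edges: Iterable[Tuple[str, str]], key: str) -> List[FrozenSet[str]]:
    """Split a relationship chain into third relationship chains around a key data node."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    chains: List[FrozenSet[str]] = []
    assigned: Set[str] = set()
    for task in sorted(adj.get(key, ())):
        if task in assigned:
            continue  # already reached from an earlier task node
        # Collect everything still connected to this task node once the key node is cut off.
        part = {task}
        stack = [task]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt != key and nxt not in part:
                    part.add(nxt)
                    stack.append(nxt)
        assigned |= part
        # The key data node is shared by every resulting third relationship chain.
        chains.append(frozenset(part | {key}))
    return chains
```

Because the key data node appears in every resulting chain, its storage path has to stay resolvable in both clusters until the last of those chains has been migrated, which is the situation claim 6 addresses.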
CN201710197702.7A 2017-03-29 2017-03-29 Data migration method and device Active CN108664496B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710197702.7A CN108664496B (en) 2017-03-29 2017-03-29 Data migration method and device
PCT/CN2018/078398 WO2018177107A1 (en) 2017-03-29 2018-03-08 Data migration method, migration server, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710197702.7A CN108664496B (en) 2017-03-29 2017-03-29 Data migration method and device

Publications (2)

Publication Number Publication Date
CN108664496A (en) 2018-10-16
CN108664496B (en) 2022-03-25

Family

ID=63674187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710197702.7A Active CN108664496B (en) 2017-03-29 2017-03-29 Data migration method and device

Country Status (2)

Country Link
CN (1) CN108664496B (en)
WO (1) WO2018177107A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399356B (en) * 2019-06-14 2023-02-24 阿里巴巴集团控股有限公司 Online data migration method and device, computing equipment and storage medium
CN110503199A (en) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 Operation node splitting method and device, electronic equipment and storage medium
CN110490322A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 Operation node splitting method and device, electronic equipment and storage medium
CN110597609A (en) * 2019-09-17 2019-12-20 深圳市及响科技有限公司 Cluster migration and automatic recovery method and system
CN110989929A (en) * 2019-11-22 2020-04-10 浪潮电子信息产业股份有限公司 MON service migration method, device, equipment and readable storage medium
CN113051245A (en) * 2019-12-26 2021-06-29 云丁网络技术(北京)有限公司 Method, device and system for migrating data
CN111258985A (en) * 2020-01-17 2020-06-09 中国工商银行股份有限公司 Data cluster migration method and device
CN113438267B (en) * 2020-03-23 2023-02-28 华为技术有限公司 Method and equipment for analyzing stream data
CN111274230B (en) * 2020-03-26 2024-03-08 北京奇艺世纪科技有限公司 Data migration management method, device, equipment and storage medium
CN111459411B (en) * 2020-03-30 2023-07-21 北京奇艺世纪科技有限公司 Data migration method, device, equipment and storage medium
CN111708755A (en) * 2020-05-20 2020-09-25 北京奇艺世纪科技有限公司 Data migration method, device, system, electronic equipment and readable storage medium
CN111708763B (en) * 2020-06-18 2023-12-01 北京金山云网络技术有限公司 Data migration method and device of sliced cluster and sliced cluster system
CN114024956B (en) * 2020-07-17 2024-03-12 北京达佳互联信息技术有限公司 Data migration method, device, server and storage medium
CN112506606A (en) * 2020-11-23 2021-03-16 北京达佳互联信息技术有限公司 Migration method, device, equipment and medium for containers in cluster
CN112653539B (en) * 2020-12-29 2023-06-20 杭州趣链科技有限公司 Storage method, device and equipment for data to be stored
CN113535087B (en) * 2021-07-13 2023-10-17 咪咕互动娱乐有限公司 Data processing method, server and storage system in data migration process
CN114785796A (en) * 2022-04-22 2022-07-22 中国农业银行股份有限公司 Data equalization method and device
CN116049096B (en) * 2022-05-05 2024-04-16 荣耀终端有限公司 Data migration method, electronic equipment and storage medium
CN116954870B (en) * 2023-09-19 2024-02-02 苏州元脑智能科技有限公司 Migration method, recovery method and device of cross-cluster application and cluster system

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
US6047323A (en) * 1995-10-19 2000-04-04 Hewlett-Packard Company Creation and migration of distributed streams in clusters of networked computers
US20040153451A1 (en) * 2002-11-15 2004-08-05 John Phillips Methods and systems for sharing data
JP4856932B2 (en) * 2005-11-18 2012-01-18 株式会社日立製作所 Storage system and data movement method
CN102999537B (en) * 2011-09-19 2017-01-18 阿里巴巴集团控股有限公司 System and method for data migration
CN103164261B (en) * 2011-12-15 2016-04-27 中国移动通信集团公司 Multi-center data task processing method, apparatus and system
CN102855299A (en) * 2012-08-16 2013-01-02 上海引跑信息科技有限公司 Method for realizing iterative migration of distributed database without interrupting service
CN102982085B (en) * 2012-10-31 2017-05-31 北京奇虎科技有限公司 Data mover system and method
CN103324466B (en) * 2013-05-24 2017-05-03 浪潮电子信息产业股份有限公司 Data dependency serialization IO parallel processing method
CN103647849B (en) * 2013-12-24 2017-02-08 华为技术有限公司 Method and device for migrating businesses and disaster recovery system
CN104935618B (en) * 2014-03-19 2018-01-19 福建福昕软件开发股份有限公司 Cluster deployment method
CN103955491B (en) * 2014-04-15 2017-04-19 南威软件股份有限公司 Method for synchronizing timing data increment
CN103970879B (en) * 2014-05-16 2017-05-24 中国人民解放军国防科学技术大学 Method and system for regulating storage positions of data blocks
CN104184813B (en) * 2014-08-20 2018-03-09 杭州华为数字技术有限公司 The load-balancing method and relevant device and group system of virtual machine
CN105404474A (en) * 2015-12-07 2016-03-16 上海爱数信息技术股份有限公司 Data migration method of heterogeneous distributed memory system
CN106055670A (en) * 2016-06-06 2016-10-26 中国工商银行股份有限公司 Inter-system data migration method and device
CN106202212A (en) * 2016-06-28 2016-12-07 微梦创科网络科技(中国)有限公司 Method and system for data splitting based on a data server cluster

Also Published As

Publication number Publication date
CN108664496A (en) 2018-10-16
WO2018177107A1 (en) 2018-10-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant