CN115185886A - Partition-based data migration method and device - Google Patents

Publication number
CN115185886A
Authority
CN
China
Prior art keywords
data
migration
partition
migrated
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210817948.0A
Other languages
Chinese (zh)
Inventor
李丹峰
赵吉昆
王梦竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210817948.0A
Publication of CN115185886A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support

Abstract

The disclosure provides a data migration method based on partitions, which can be applied to the technical field of finance. The method is applied to a multi-cluster distributed system and comprises the following steps: determining data information to be migrated and a target migration partition according to the data migration configuration file; carrying out validity check on the target migration partition; when the target migration partition is determined to be effective, acquiring a data migration direction, wherein the data migration direction represents a data migration direction among multiple clusters; executing a migration data acquisition script according to the data migration direction to acquire data to be migrated; and executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform partition-granularity data migration. The present disclosure also provides a partition-based data migration apparatus, device, storage medium, and program product.

Description

Partition-based data migration method and device
Technical Field
The present disclosure relates to the field of big data technologies, in particular to automated data migration, and more specifically to a partition-based data migration method, apparatus, device, storage medium, and program product.
Background
In big data technology, distributed ecosystems represented by Hadoop are widely used. When a big data system is tested, considerations such as test effectiveness and cost mean that a test environment is not given cluster resources as ample as those of the actual production environment, so the data stored in a test environment is limited. The test cycles of the test environments differ, and so does the data each of them stores. Moreover, the Hive data warehouse architecture does not support modifying data at single-field granularity.
In the related art, an entire Hive table is migrated. Data migrated in this way contains a large amount of invalid data, which reduces data migration efficiency.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a partition-based data migration method, apparatus, device, medium, and program product that improve data migration efficiency.
According to a first aspect of the present disclosure, there is provided a partition-based data migration method applied to a multi-cluster distributed system, the data migration method including:
determining data information to be migrated and a target migration partition according to the data migration configuration file;
carrying out validity check on the target migration partition;
when the target migration partition is determined to be valid, acquiring a data migration direction, wherein the data migration direction indicates the direction in which data is migrated among the multiple clusters;
executing a migration data acquisition script according to the data migration direction to acquire data to be migrated; and
executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform partition-granularity data migration.
According to an embodiment of the present disclosure, executing the migration data acquisition script according to the data migration direction to acquire the data to be migrated includes:
generating a migration data acquisition script according to the data information to be migrated;
authenticating to the cluster where the partition to be migrated is located according to the data migration direction; and
saving the data to be migrated locally at partition granularity.
According to an embodiment of the present disclosure, generating the migration data acquisition script according to the data information to be migrated includes:
traversing the data information to be migrated to determine a target input partition;
generating a first target file according to the target input partition; and
generating a migration data acquisition script according to the partition range recorded in the first target file.
According to an embodiment of the present disclosure, the data information to be migrated includes a name of a database to be migrated, a name of a data table to be migrated, and a name of a partition to be migrated, and traversing the data information to be migrated to determine a target input partition includes:
traversing the names of the partitions to be migrated to determine the available range of partitions to be migrated; and
determining a target input partition according to the available range of partitions to be migrated.
According to an embodiment of the present disclosure, executing the data migration transmission script according to the data to be migrated, the target migration partition, and the data migration direction to perform data migration at partition granularity includes:
generating a data migration transmission script according to the data to be migrated and the target migration partition; and
executing the data migration transmission script according to the data migration direction to perform data migration at partition granularity.
According to an embodiment of the present disclosure, generating the data migration transmission script according to the data to be migrated and the target migration partition includes:
traversing the directory of acquired data according to the data to be migrated to generate a second target file; and
generating a data migration transmission script according to the data of the second target file.
According to an embodiment of the present disclosure, before the data migration transmission script is executed, the method further includes:
generating a corresponding Hive partition script according to the second target file; and
executing the Hive partition script to establish a Hive partition.
According to an embodiment of the present disclosure, performing the validity check on the target migration partition includes:
determining a target cluster according to the target migration partition;
when the target cluster is determined to have the target migration partition, determining that the target migration partition is invalid; and
when it is determined that the target cluster does not have the target migration partition, determining that the target migration partition is valid.
A second aspect of the present disclosure provides a partition-based data migration apparatus, including:
the first determining module is used for determining the data information to be migrated and the target migration partition according to the data migration configuration file;
the verification module is used for verifying the validity of the target migration partition;
the first obtaining module is used for obtaining a data migration direction when the target migration partition is determined to be valid, wherein the data migration direction indicates the direction in which data is migrated among the multiple clusters;
the second obtaining module is used for executing a migration data acquisition script according to the data migration direction so as to acquire data to be migrated; and
the transmission module is used for executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform data migration at partition granularity.
According to an embodiment of the present disclosure, the second obtaining module includes:
the first generation submodule is used for generating a migration data acquisition script according to the information of the data to be migrated;
the authentication submodule is used for authenticating the cluster where the partition to be migrated is located according to the data migration direction;
and the to-be-migrated data storage submodule is used for storing the to-be-migrated data to the local according to the partition granularity.
According to an embodiment of the present disclosure, the first generation submodule includes:
the traversing unit is used for traversing the data information to be migrated to determine a target input partition;
the first generation unit is used for generating a first target file according to the target input partition; and
the second generation unit is used for generating a migration data acquisition script according to the partition range recorded in the first target file.
According to an embodiment of the present disclosure, a traversal unit includes:
the traversal subunit is configured to traverse the names of the partitions to be migrated, and determine an available partition range to be migrated;
and the target input partition determining subunit is used for determining the target input partition according to the available partition range to be migrated.
According to an embodiment of the present disclosure, a transmission module includes:
the second generation submodule is used for generating a data migration transmission script according to the data to be migrated and the target migration partition;
and the migration submodule is used for executing the data migration transmission script according to the data migration direction to perform data migration at partition granularity.
According to an embodiment of the present disclosure, the second generation submodule includes:
the third generation unit is used for traversing the acquired data directory according to the migration data to generate a second target file;
and the fourth generating unit is used for generating a data migration transmission script according to the data of the second target file.
According to an embodiment of the present disclosure, the apparatus further includes:
the generating module is used for generating a corresponding Hive partition script according to the second target file;
and the execution module is used for executing the Hive partition script to establish a Hive partition.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the partition-based data migration method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the partition-based data migration method described above.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the partition-based data migration method described above.
According to the partition-based data migration method provided by the embodiments of the present disclosure, the data information to be migrated and the target migration partition are determined according to the data migration configuration file; the validity of the target migration partition is checked; when the target migration partition is determined to be valid, the data migration direction is acquired; and a migration data acquisition script and a data migration transmission script are generated and executed to complete data migration at partition granularity.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, taken in conjunction with the accompanying drawings of which:
FIG. 1 schematically illustrates an application scenario diagram of a partition-based data migration method, apparatus, device, medium, and program product according to embodiments of the disclosure;
FIG. 2 schematically illustrates a flow diagram of a partition-based data migration method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flowchart of a method for acquiring data to be migrated according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram of a migration data acquisition script generation method in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a method of data migration according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of generating a data migration transfer script according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a hive partition establishment method according to an embodiment of the disclosure;
FIG. 8 is a block diagram that schematically illustrates an architecture of a partition-based data migration apparatus, in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of an electronic device suitable for implementing a partition-based data migration method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
The terms appearing in the present disclosure are explained first:
Hadoop: a distributed system infrastructure, that is, a software framework capable of distributed processing of large amounts of data in a reliable, efficient, and scalable manner. HDFS sits at the bottom of Hadoop and stores files across all storage nodes in a Hadoop cluster. The layer above HDFS is the MapReduce engine.
HDFS: the Hadoop Distributed File System. It is highly fault-tolerant, can be deployed on low-cost hardware, and is widely used for storing large data files.
Hive: Hadoop-based data warehouse software that can query and manage petabyte-scale distributed data. A Hive partition corresponds to an HDFS directory, and a mapping exists between the two. Therefore, in the embodiments of the present disclosure, business personnel can query the migrated data only after a new Hive partition corresponding to the HDFS directory has been established. In addition, Hive does not support modifying a single field, so migration is required.
In big data technology, distributed ecosystems represented by Hadoop are widely used. When a big data system is tested, considerations such as test effectiveness and cost mean that a test environment is not given cluster resources as ample as those of the actual production environment, so the data stored in a test environment is limited. The test cycles of the test environments differ, and so does the data each of them stores. Meanwhile, the test dates of the test environment and the production environment differ, so during big data loading based on Kafka data transmission the test date may not match the natural (calendar) date. Together, these factors greatly reduce testing efficiency and data utilization.
In view of the above technical problem, an embodiment of the present disclosure provides a partition-based data migration method, which is applied to a multi-cluster distributed system, and the method includes: determining data information to be migrated and a target migration partition according to the data migration configuration file; performing a validity check on the target migration partition; when the target migration partition is determined to be valid, acquiring a data migration direction, wherein the data migration direction indicates the direction in which data is migrated among the multiple clusters; executing a migration data acquisition script according to the data migration direction to acquire data to be migrated; and executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform data migration at partition granularity.
FIG. 1 schematically illustrates an application scenario diagram of a partition-based data migration method, apparatus, device, medium and program product according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a distributed data migration scenario. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The tester may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a data migration server that checks the data to be migrated and the target migration partition specified in the data migration configuration file input by the tester. For example, in response to a data migration instruction sent by the tester through the terminal devices 101, 102, 103, the data migration server may generate a migration data acquisition script and a data migration transmission script from the received data migration configuration file and execute the scripts to perform data migration at partition granularity.
It should be noted that the partition-based data migration method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the partition-based data migration apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The partition-based data migration method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the partition-based data migration apparatus provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
It should be noted that the partition-based data migration method and apparatus determined in the embodiments of the present disclosure may be used in the field of financial technology, and may also be used in any field other than the financial field.
The partition-based data migration method according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 7 based on the scenario described in fig. 1.
FIG. 2 schematically illustrates a flow diagram of a partition-based data migration method according to an embodiment of the present disclosure. As shown in FIG. 2, the partition-based data migration method of this embodiment includes operations S210-S250, which may be performed by a server or other computing device. The embodiment of the disclosure performs partition-granularity data migration at the HDFS level.
In operation S210, data information to be migrated and a target migration partition are determined according to the data migration configuration file.
According to the embodiment of the disclosure, the data information to be migrated includes a name of a database to be migrated, a name of a data table to be migrated, and a name of a partition to be migrated.
In one example, the user generates a data migration configuration file according to the user's own test requirements, and the data information to be migrated and the target migration partition are determined from that file. The data information to be migrated includes the name of the database to be migrated, the name of the data table to be migrated, and the name of the partition to be migrated.
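As a minimal sketch of operation S210, the following Python code parses a configuration file into the items named above. The source does not specify the configuration format; the JSON layout, the key names, and the MigrationRequest type are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationRequest:
    """Hypothetical container for the contents of a migration configuration."""
    database: str          # name of the database to be migrated
    table: str             # name of the data table to be migrated
    partitions: list       # names of the partitions to be migrated
    target_partition: str  # target migration partition


REQUIRED_KEYS = ("database", "table", "partitions", "target_partition")


def load_migration_request(config_text: str) -> MigrationRequest:
    """Parse an (assumed) JSON configuration and reject incomplete input."""
    raw = json.loads(config_text)
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    return MigrationRequest(
        database=raw["database"],
        table=raw["table"],
        partitions=list(raw["partitions"]),
        target_partition=raw["target_partition"],
    )


# Example configuration; all values are illustrative only.
example = (
    '{"database": "ecs", "table": "dcm_ecs_tb1010_s", '
    '"partitions": ["2022-04-27"], "target_partition": "2022-04-27"}'
)
request = load_migration_request(example)
```

Rejecting an incomplete configuration at this point keeps later operations from running against a partially specified migration.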
In operation S220, validity checking is performed on the target migration partition.
In one example, to improve data migration efficiency, a validity check is first performed on the target migration partition to verify that it is available. Specifically, the target cluster is determined according to the target migration partition, and it is judged whether the corresponding partition in the target cluster already holds data. When the target cluster is determined to already have the target migration partition, the target migration partition is determined to be invalid; if data exists in the corresponding partition of the target cluster, the whole process is interrupted. (For example, all partitions under the corresponding table are obtained through hadoop fs -ls /user/hive/ecs/tb_1010 and compared with the migration data range submitted by the user.)
In addition, the data to be migrated needs to be checked, mainly to judge whether the user has permission to migrate the table whose name was input. Permission is judged from the prefix of the table name: for example, a table name beginning with DCM_ or FCM_ is deemed to have permission. Specifically, a corresponding partition size query statement (e.g., hadoop fs -du -s -h /user/hive/ecs/tb_1010) is generated according to the environment selected by the user, the input database name, and the table name. Authentication is then performed against the corresponding Hadoop cluster, and the partition size query statement is executed. The query result is fed back to the foreground: if the query succeeds, the table name input by the user is judged to be correct; if the query fails, the foreground is informed that the table name is wrong.
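The validity rule described above (a target partition is valid only if the target cluster does not already contain it) can be sketched as follows. The sketch only builds the listing command and parses listing text that is passed in; it does not contact a cluster. The function names and the sample listing are hypothetical, and it is assumed that each partition appears as a directory whose name is the last path component.

```python
def list_partitions_command(table_dir: str) -> str:
    """Build the HDFS listing command for a table directory."""
    return f"hadoop fs -ls {table_dir}"


def partitions_from_listing(listing: str) -> set:
    """Extract partition directory names from `hadoop fs -ls` style output.

    Each entry line ends with a full path; the last path component is taken
    as the partition name. Lines that do not end with a path (such as the
    "Found N items" header) are skipped.
    """
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields or not fields[-1].startswith("/"):
            continue
        names.add(fields[-1].rstrip("/").rsplit("/", 1)[-1])
    return names


def is_target_partition_valid(target_partition: str, existing: set) -> bool:
    """Valid only if the target cluster does not already hold the partition."""
    return target_partition not in existing


# Illustrative listing text; a real run would capture the command's output.
sample_listing = (
    "Found 2 items\n"
    "drwxr-xr-x   - hive hive  0 2022-04-27 10:00 /user/hive/ecs/tb_1010/2022-04-26\n"
    "drwxr-xr-x   - hive hive  0 2022-04-28 10:00 /user/hive/ecs/tb_1010/2022-04-27\n"
)
existing = partitions_from_listing(sample_listing)
```

With this sample listing, a request targeting 2022-04-27 would be rejected and the process interrupted, while 2022-04-28 would pass the check.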
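The table-name check can be illustrated with a short sketch. The prefixes DCM_ and FCM_ come from the text; treating the comparison as case-insensitive, the warehouse_root parameter, and the function names are assumptions, not details confirmed by the source.

```python
ALLOWED_PREFIXES = ("DCM_", "FCM_")  # prefixes named in the text


def has_migration_permission(table_name: str) -> bool:
    """Judge permission from the table-name prefix.

    The comparison is case-insensitive here, which is an assumption.
    """
    return table_name.upper().startswith(ALLOWED_PREFIXES)


def partition_size_query(warehouse_root: str, database: str, table: str) -> str:
    """Build the size query statement for a table directory.

    `warehouse_root` is a hypothetical parameter standing in for the path
    chosen from the environment the user selected.
    """
    return f"hadoop fs -du -s -h {warehouse_root}/{database}/{table}"


query = partition_size_query("/user/hive", "ecs", "tb_1010")
```

In the flow described above, a successful run of the generated statement on the authenticated cluster would indicate that the table name is correct, and a failure would be reported back to the foreground.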
In operation S230, when it is determined that the target migration partition is valid, a data migration direction is acquired.
According to an embodiment of the present disclosure, the data migration direction characterizes a data migration direction between multiple clusters.
In one example, the direction of migration needs to be determined when data migration is performed. Taking two clusters A and B as an example, it needs to be determined whether data is migrated from A to B or from B to A. In the embodiment of the present disclosure, data migration involves three clusters and therefore 6 migration directions; the background determines the corresponding migration direction from the identifier associated with the button selected by the user in the foreground. Hive supports retrieval by different types of fields, such as date and time, any of which can serve as a partition field, so an actual test environment may contain multiple partition types. The migration type therefore also needs to be determined. The embodiment of the present disclosure mainly covers four partition types, with corresponding migration types, exemplified by 2022-04-27, 2022-04-27-ar, 2022-04-27-051, and 2022-04-27-01.
In operation S240, a migration data acquisition script is executed according to the data migration direction to acquire data to be migrated.
In one example, a migration data acquisition script is first generated according to the data migration direction; for the specific generation process, refer to operations S2411 to S2413 shown in fig. 4. The generated migration data acquisition script is then executed to fetch the data to be migrated from its partition to the local machine; for the specific process, refer to operations S241 to S243 shown in fig. 3.
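The count of 6 directions follows from choosing an ordered (source, target) pair out of three clusters: 3 × 2 = 6. The sketch below enumerates those pairs and classifies the four example partition names by their shape. The cluster names and the type labels are hypothetical; the source gives only the four example names, not names for the types.

```python
import re
from itertools import permutations

CLUSTERS = ("A", "B", "C")  # hypothetical cluster names

# Every ordered (source, target) pair of distinct clusters.
DIRECTIONS = list(permutations(CLUSTERS, 2))

DATE = r"\d{4}-\d{2}-\d{2}"

# Labels are illustrative; patterns mirror the four examples in the text.
PARTITION_PATTERNS = [
    ("date", re.compile(DATE)),                        # 2022-04-27
    ("date_alpha", re.compile(DATE + r"-[a-z]+")),     # 2022-04-27-ar
    ("date_3digit", re.compile(DATE + r"-\d{3}")),     # 2022-04-27-051
    ("date_2digit", re.compile(DATE + r"-\d{2}")),     # 2022-04-27-01
]


def classify_partition(name: str) -> str:
    """Return the label of the first pattern that matches the whole name."""
    for label, pattern in PARTITION_PATTERNS:
        if pattern.fullmatch(name):
            return label
    raise ValueError(f"unrecognized partition name: {name}")
```

Because fullmatch requires the entire name to match, the four patterns are mutually exclusive and their order does not affect the result.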
In operation S250, a data migration transmission script is executed according to the data to be migrated, the target migration partition, and the data migration direction, so as to perform data migration with partition granularity.
In one example, a data migration transmission script is first generated; for the specific process, refer to operation S2511 and operation S2512 shown in fig. 6. Authentication is then performed against the target cluster corresponding to the target migration partition, and the corresponding data migration transmission script is executed to complete the data migration at partition granularity.
According to the partition-based data migration method provided by the embodiments of the present disclosure, the data information to be migrated and the target migration partition are determined according to the data migration configuration file; the validity of the target migration partition is checked; when the target migration partition is determined to be valid, the data migration direction is acquired; and a migration data acquisition script and a data migration transmission script are generated and executed to complete data migration at partition granularity.
Fig. 3 schematically shows a flowchart of a method for acquiring data to be migrated according to an embodiment of the present disclosure. FIG. 4 schematically shows a flow diagram of a migration data acquisition script generation method according to an embodiment of the present disclosure.
As shown in fig. 3, operation S240 includes operations S241 to S243.
In operation S241, a migration data acquisition script is generated according to the to-be-migrated data information.
In operation S242, the cluster where the partition to be migrated is located is authenticated according to the data migration direction.
In operation S243, the data to be migrated is saved locally according to the partition granularity.
In one example, a migration data acquisition script is generated according to the information of the data to be migrated to perform automatic acquisition of the source data, which is described in detail in operations S2411 to S2413 shown in fig. 4.
In one example, after the migration data acquisition script is generated, authentication is performed against the source cluster where the partition to be migrated is located, according to the data migration direction, and the data to be migrated is saved locally at partition granularity.
As shown in fig. 4, operation S241 includes operations S2411 to S2413.
In operation S2411, the data information to be migrated is traversed to determine a target input partition.
According to the embodiment of the disclosure, traversing the names of the partitions to be migrated, and determining the range of the available partitions to be migrated; and determining a target input partition according to the available partition range to be migrated.
In operation S2412, a first target file is generated according to the target input partition.
In operation S2413, a migration data acquisition script is generated according to the partition range recorded in the first target file.
In one example, to let a user migrate data in batches, the user may specify a range of partitions in the data migration configuration file so that multiple partitions are migrated at once. During the actual migration, the partition range input by the user is traversed, the partitions that match the user's input are obtained as the target input partitions, and these partitions are recorded in the first target file. The corresponding migration data acquisition script is then generated by splicing commands according to the partition range recorded in the first target file. For example, hadoop fs -get /user/hive/ecs.db/dcm_ecs_tb1010_s/.
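The splicing step described above can be sketched as follows. This is an illustration under stated assumptions rather than the script generator of the disclosure: partitions are assumed to be daily values in YYYYMMDD form stored under directories named `pt_dt=<value>`, and the function names and the local directory `./staging` are hypothetical. Only the HDFS table path comes from the example above.

```python
from datetime import datetime, timedelta
from typing import List


def expand_partition_range(start: str, end: str) -> List[str]:
    """Return every daily partition value (YYYYMMDD) from start to end, inclusive."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    if last < first:
        raise ValueError("end of range precedes start")
    days = (last - first).days
    return [(first + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def build_get_script(table_dir: str, partitions: List[str], local_dir: str,
                     partition_key: str = "pt_dt") -> str:
    """Splice one `hadoop fs -get` line per partition into a shell script."""
    lines = ["#!/bin/sh"]
    for value in partitions:
        # Assumption: each partition lives in <table_dir>/<key>=<value>.
        src = f"{table_dir.rstrip('/')}/{partition_key}={value}"
        lines.append(f"hadoop fs -get {src} {local_dir}")
    return "\n".join(lines) + "\n"


partitions = expand_partition_range("20220301", "20220303")
# Contents of the (hypothetically named) first target file: one partition per line.
first_target_file = "\n".join(partitions) + "\n"
script = build_get_script("/user/hive/ecs.db/dcm_ecs_tb1010_s/", partitions, "./staging")
print(script)
```

Generating the script as text first, instead of calling Hadoop directly, keeps the plan inspectable before anything is copied.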
After the data to be migrated is acquired, data migration needs to be performed on the data to be migrated. FIG. 5 schematically shows a flow chart of a method of data migration according to an embodiment of the present disclosure. Fig. 6 schematically shows a flowchart of a method for generating a data migration transmission script according to an embodiment of the present disclosure. As shown in fig. 5, operation S250 includes operations S251 to S252.
In operation S251, a data migration transmission script is generated according to the data to be migrated and the target migration partition.
In operation S252, data migration of partition granularity is performed by executing the data migration transmission script according to the data migration direction.
In one example, similarly, a data migration transmission script is first generated according to the data to be migrated and the target migration partition, as described in detail in operations S2511 and S2512 shown in fig. 6. The data migration transmission script is then executed according to the data migration direction to perform data migration at partition granularity.
As shown in fig. 6, operation S251 includes operation S2511 and operation S2512.
In operation S2511, the acquired data directory is traversed according to the data to be migrated to generate a second target file. In operation S2512, a data migration transmission script is generated according to the data of the second target file.
In one example, a user submits data for a certain time range, but some partitions in that range contain no data. For example, the user submits the range 0301-0331, while partition 0304 in the original cluster may have no data. To avoid migrating such invalid partitions and thereby improve migration efficiency, the data transmission script is generated in practice according to the data actually acquired from the original cluster. First, the acquired data directory is traversed and its contents are recorded in the data.txt file, which serves as the second target file. For example, data.txt can be obtained by traversing the directory of the data to be migrated that has been saved locally. The corresponding migration data transmission script is then generated by splicing commands according to the data recorded in data.txt. For example, hadoop fs -put /user/hive/ecs.db/dcm_ecs_tb1010s/.
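The directory traversal and splicing described above can be sketched as follows. The sketch assumes that each acquired partition is a sub-directory of a local table directory and that a partition counts as having data when it contains at least one file; the function names and the demo directory layout are hypothetical, while the data.txt file name and the HDFS target path come from the example above.

```python
import os
import tempfile
from typing import List


def list_nonempty_partitions(local_table_dir: str) -> List[str]:
    """Return names of partition sub-directories that contain at least one file."""
    found = []
    for name in sorted(os.listdir(local_table_dir)):
        path = os.path.join(local_table_dir, name)
        if os.path.isdir(path) and any(
            os.path.isfile(os.path.join(path, entry)) for entry in os.listdir(path)
        ):
            found.append(name)
    return found


def build_put_script(local_table_dir: str, partitions: List[str],
                     target_table_dir: str) -> str:
    """Splice one `hadoop fs -put` line per recorded partition directory."""
    lines = ["#!/bin/sh"]
    for name in partitions:
        lines.append(f"hadoop fs -put {os.path.join(local_table_dir, name)} {target_table_dir}")
    return "\n".join(lines) + "\n"


# Demo on a throwaway directory: 20220304 was acquired but holds no data.
with tempfile.TemporaryDirectory() as root:
    table_dir = os.path.join(root, "dcm_ecs_tb1010s")
    for name, has_data in [("pt_dt=20220303", True),
                           ("pt_dt=20220304", False),
                           ("pt_dt=20220305", True)]:
        os.makedirs(os.path.join(table_dir, name))
        if has_data:
            with open(os.path.join(table_dir, name, "part-00000"), "w") as fh:
                fh.write("row\n")
    kept = list_nonempty_partitions(table_dir)
    # Record the traversal result in data.txt, the second target file.
    with open(os.path.join(root, "data.txt"), "w") as fh:
        fh.write("\n".join(kept) + "\n")
    put_script = build_put_script(table_dir, kept, "/user/hive/ecs.db/dcm_ecs_tb1010s/")

print(kept)
```

Because the put script is driven by what was actually saved locally, a partition with no data never produces a transfer command.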
Fig. 7 schematically shows a flowchart of a hive partition establishing method according to an embodiment of the present disclosure. Before the data migration is completed by the data migration transmission script, in order to enable normal use of business personnel, a hive partition corresponding to hdfs needs to be established, which includes operations S310 and S320.
In operation S310, a corresponding hive partition script is generated according to the second target file.
In operation S320, the hive partition script is executed to establish a hive partition.
In one example, a corresponding hive partition script is generated from the data.txt file. For example, alter table ecs.tab_10010s add partition (pt_dt='AAA'). The hive partition script is then run to establish the hive partition.
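Generating the hive partition script from the second target file can be sketched as follows. The sketch assumes that each line of data.txt holds one partition directory name of the form `key=value`; the function names are hypothetical, and the table name follows the example above as best it can be read. Partition values are inserted without escaping, so the sketch is only suitable for simple values such as dates.

```python
from typing import Iterable, Tuple


def parse_partition_dir(name: str) -> Tuple[str, str]:
    """Split a directory name such as 'pt_dt=20220303' into (key, value)."""
    key, sep, value = name.partition("=")
    if not sep or not key or not value:
        raise ValueError(f"not a partition directory name: {name!r}")
    return key, value


def build_hive_partition_script(table: str, partition_dirs: Iterable[str]) -> str:
    """Emit one ALTER TABLE ... ADD PARTITION statement per recorded partition."""
    statements = []
    for name in partition_dirs:
        name = name.strip()
        if not name:
            continue  # ignore blank lines in data.txt
        key, value = parse_partition_dir(name)
        statements.append(f"ALTER TABLE {table} ADD PARTITION ({key}='{value}');")
    return "\n".join(statements) + "\n"


# Stand-in for the contents of data.txt.
data_txt = "pt_dt=20220303\npt_dt=20220305\n"
hive_script = build_hive_partition_script("ecs.tab_10010s", data_txt.splitlines())
print(hive_script)
```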
The embodiments of the present disclosure provide a partition-based data migration method that helps a user perform cross-cluster, same-cluster, and same-table data migration more quickly and efficiently. The method also supports migration by conventional date, in multiple batches, by special partition field, and by hive partition, which improves the utilization rate of data, reduces the time business personnel spend preparing data, and improves test efficiency.
Based on the partition-based data migration method, the disclosure also provides a partition-based data migration device. The apparatus will be described in detail below with reference to fig. 8.
FIG. 8 is a block diagram schematically illustrating a structure of a partition-based data migration apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the partition-based data migration apparatus 800 of this embodiment includes a first determining module 810, a verifying module 820, a first obtaining module 830, a second obtaining module 840, and a transmitting module 850.
The first determining module 810 is configured to determine, according to the data migration configuration file, data information to be migrated and a target migration partition. In an embodiment, the first determining module 810 may be configured to perform the operation S210 described above, which is not described herein again.
The verifying module 820 is configured to perform a validity check on the target migration partition. In an embodiment, the verifying module 820 may be configured to perform the operation S220 described above, which is not described herein again.
The first obtaining module 830 is configured to, when it is determined that the target migration partition is valid, obtain a data migration direction, where the data migration direction represents a data migration direction between multiple clusters. In an embodiment, the first obtaining module 830 may be configured to perform the operation S230 described above, and is not described herein again.
The second obtaining module 840 is configured to execute a migration data acquisition script according to the data migration direction, so as to obtain data to be migrated. In an embodiment, the second obtaining module 840 may be configured to perform the operation S240 described above, which is not described herein again.
The transmission module 850 is configured to execute a data migration transmission script according to the data to be migrated, the target migration partition, and the data migration direction, so as to perform partition-granular data migration. In an embodiment, the transmission module 850 may be configured to perform the operation S250 described above, which is not described herein again.
According to an embodiment of the present disclosure, the second obtaining module 840 includes a first generation submodule, an authentication submodule, and a to-be-migrated data saving submodule.
The first generation submodule is configured to generate a migration data acquisition script according to the data information to be migrated. In an embodiment, the first generation submodule may be configured to perform the operation S241 described above, which is not described herein again.
The authentication submodule is configured to authenticate the cluster where the partition to be migrated is located according to the data migration direction. In an embodiment, the authentication submodule may be configured to perform the operation S242 described above, which is not described herein again.
The to-be-migrated data saving submodule is configured to save the data to be migrated locally at partition granularity. In an embodiment, the to-be-migrated data saving submodule may be configured to perform the operation S243 described above, which is not described herein again.
According to an embodiment of the present disclosure, the first generation submodule includes a traversing unit, a first generating unit, and a second generating unit.
The traversing unit is configured to traverse the data information to be migrated so as to determine a target input partition. In an embodiment, the traversing unit may be configured to perform the operation S2411 described above, which is not described herein again.
The first generating unit is configured to generate a first target file according to the target input partition. In an embodiment, the first generating unit may be configured to perform the operation S2412 described above, which is not described herein again.
The second generating unit is configured to generate a migration data acquisition script according to the partition range recorded in the first target file. In an embodiment, the second generating unit may be configured to perform the operation S2413 described above, which is not described herein again.
According to an embodiment of the present disclosure, the traversing unit includes a traversing subunit and a target input partition determining subunit.
The traversing subunit is configured to traverse the names of the partitions to be migrated and determine the range of available partitions to be migrated. In an embodiment, the traversing subunit may be configured to perform the operation S2411 described above, which is not described herein again.
The target input partition determining subunit is configured to determine a target input partition according to the range of available partitions to be migrated. In an embodiment, the target input partition determining subunit may be configured to perform the operation S2411 described above, which is not described herein again.
According to an embodiment of the present disclosure, the transmission module 850 includes a second generation submodule and a migration submodule.
The second generation submodule is configured to generate a data migration transmission script according to the data to be migrated and the target migration partition. In an embodiment, the second generation submodule may be configured to perform the operation S251 described above, which is not described herein again.
The migration submodule is configured to execute the data migration transmission script according to the data migration direction, so as to perform data migration at partition granularity. In an embodiment, the migration submodule may be configured to perform the operation S252 described above, which is not described herein again.
According to an embodiment of the present disclosure, the second generation submodule includes a third generating unit and a fourth generating unit.
The third generating unit is configured to traverse the acquired data directory according to the data to be migrated, so as to generate a second target file. In an embodiment, the third generating unit may be configured to perform the operation S2511 described above, which is not described herein again.
The fourth generating unit is configured to generate a data migration transmission script according to the data of the second target file. In an embodiment, the fourth generating unit may be configured to perform the operation S2512 described above, which is not described herein again.
According to an embodiment of the present disclosure, the apparatus further includes a generating module and an executing module.
The generating module is configured to generate a corresponding hive partition script according to the second target file. In an embodiment, the generating module may be configured to perform the operation S310 described above, which is not described herein again.
The executing module is configured to execute the hive partition script to establish a hive partition. In an embodiment, the executing module may be configured to perform the operation S320 described above, which is not described herein again.
According to the embodiment of the present disclosure, any plurality of the first determining module 810, the verifying module 820, the first obtaining module 830, the second obtaining module 840, and the transmitting module 850 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first determining module 810, the verifying module 820, the first obtaining module 830, the second obtaining module 840 and the transmitting module 850 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware and firmware, or an appropriate combination of any several of them. Alternatively, at least one of the first determining module 810, the verifying module 820, the first obtaining module 830, the second obtaining module 840 and the transmitting module 850 may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
FIG. 9 schematically illustrates a block diagram of an electronic device suitable for implementing a partition-based data migration method according to an embodiment of the present disclosure.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 can include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipset(s) and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), as well as a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage portion 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a partition-based data migration method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or RAM 903 described above and/or one or more memories other than the ROM 902 and RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the partition-based data migration method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 909 and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the disclosure. In particular, such combinations and/or incorporations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the disclosure, and these alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (12)

1. A partition-based data migration method, applied to a multi-cluster distributed system, characterized in that the data migration method comprises:
determining data information to be migrated and a target migration partition according to the data migration configuration file;
carrying out validity check on the target migration partition;
when the target migration partition is determined to be valid, acquiring a data migration direction, wherein the data migration direction represents a data migration direction among multiple clusters;
executing a migration data acquisition script according to the data migration direction to acquire data to be migrated; and
executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform partition-granularity data migration.
2. The data migration method according to claim 1, wherein executing a migration data acquisition script according to the data migration direction to acquire data to be migrated comprises:
generating a migration data acquisition script according to the information of the data to be migrated;
authenticating the cluster of the partition to be migrated according to the data migration direction;
and storing the data to be migrated to the local according to the partition granularity.
3. The data migration method according to claim 2, wherein the generating a migration data acquisition script according to the to-be-migrated data information includes:
traversing the data information to be migrated to determine a target input partition;
generating a first target file according to the target input partition; and
generating a migration data acquisition script according to the partition range recorded by the first target file.
4. The data migration method according to claim 3, wherein the data information to be migrated includes a name of a database to be migrated, a name of a data table to be migrated, and a name of a partition to be migrated, and the traversing the data information to be migrated to determine the target input partition includes:
traversing the names of the partitions to be migrated, and determining the range of the available partitions to be migrated;
and determining a target input partition according to the available partition range to be migrated.
5. The data migration method according to claim 1, wherein the executing data migration transmission script according to the data to be migrated, the target migration partition, and the data migration direction to perform partition-granular data migration comprises:
generating a data migration transmission script according to the data to be migrated and the target migration partition;
and executing the data migration transmission script to perform data migration with partition granularity according to the data migration direction.
6. The data migration method according to claim 5, wherein the generating a data migration transmission script according to the data to be migrated and the target migration partition comprises:
traversing the acquired data directory according to the data to be migrated to generate a second target file;
and generating a data migration transmission script according to the data of the second target file.
7. The data migration method according to claim 6, further comprising, before executing the data migration transmission script:
generating a corresponding hive partition script according to the second target file;
and executing the hive partition script to establish a hive partition.
8. The data migration method according to any one of claims 1 to 7, wherein the performing validity check on the target migration partition includes:
determining a target cluster according to the target migration partition;
when the target cluster is determined to have the target migration partition, determining that the target migration partition is invalid; and
when it is determined that the target cluster does not have the target migration partition, determining that the target migration partition is valid.
9. A partition-based data migration apparatus, the apparatus comprising:
the first determining module is used for determining the data information to be migrated and the target migration partition according to the data migration configuration file;
the verification module is used for verifying the validity of the target migration partition;
a first obtaining module, configured to obtain a data migration direction when it is determined that the target migration partition is valid, where the data migration direction represents a data migration direction between multiple clusters;
the second acquisition module is used for executing a migration data acquisition script according to the data migration direction so as to acquire data to be migrated;
and the transmission module is used for executing a data migration transmission script according to the data to be migrated, the target migration partition and the data migration direction so as to perform data migration with partition granularity.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the data migration method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform a method of data migration according to any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements a method of data migration according to any one of claims 1 to 8.
CN202210817948.0A 2022-07-12 2022-07-12 Partition-based data migration method and device Pending CN115185886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210817948.0A CN115185886A (en) 2022-07-12 2022-07-12 Partition-based data migration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210817948.0A CN115185886A (en) 2022-07-12 2022-07-12 Partition-based data migration method and device

Publications (1)

Publication Number Publication Date
CN115185886A true CN115185886A (en) 2022-10-14

Family

ID=83517791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210817948.0A Pending CN115185886A (en) 2022-07-12 2022-07-12 Partition-based data migration method and device

Country Status (1)

Country Link
CN (1) CN115185886A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination