CN111258985A - Data cluster migration method and device

Info

Publication number: CN111258985A
Application number: CN202010052449.8A
Authority: CN
Inventors: 陈开�; 匡蕴娟; 周凯; 卢祥光
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-06-09

Abstract

The invention provides a data cluster migration method and device. The data cluster migration method comprises the following steps: generating a parameter table according to a physical table of a data cluster to be migrated, wherein the parameters in the parameter table comprise: constraints, a priority, a corresponding library and a corresponding table of a target data cluster, a job state, a job start time, and a job end time; generating a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table; generating export data which can be migrated to the target data cluster according to the data cluster to be migrated and the migration instruction; and importing the export data into the target data cluster according to the import instruction. The data cluster migration method and device provided by the invention are suitable for table-level data migration between two clusters and are general-purpose.

Description

Data cluster migration method and device

Technical Field

The invention relates to the technical field of databases, in particular to the technical field of big data in the financial industry, and particularly relates to a data cluster migration method and device.

Background

Nowadays, with the explosive growth of big data, stored data increases at the TB level every day, and as regulatory requirements on data become stricter, the historical data required by new business types such as AI covers ever longer periods. This leads to a contradiction between growing data storage requirements and the current overall cluster capacity. To expand capacity economically, a distributed big data platform is introduced, and the task of migrating data from the current cluster to the target cluster becomes pressing.

The prior art has not had to handle two coexisting data clusters, so a flexible and fast method for relocating data across clusters is urgently needed.

Disclosure of Invention

Aiming at the problems in the prior art, the data cluster migration method and device provided by the invention are suitable for table-level data migration between two clusters and are general-purpose; they are suitable for tables split in various flexible ways and for large-batch, highly concurrent, fully automatic data synchronization; the development period and cost are therefore greatly reduced, the workload is reduced, and online deployment can be achieved rapidly.

In order to solve the technical problems, the invention provides the following technical scheme:

In a first aspect, the present invention provides a data cluster migration method, including:

generating a parameter table according to a physical table of a data cluster to be migrated; the parameters in the parameter table comprise: constraints, a priority, a corresponding library and a corresponding table of a target data cluster, a job state, a job start time, and a job end time;

generating a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table;

generating export data which can be migrated to the target data cluster according to the data cluster to be migrated and the migration instruction;

and importing the export data into the target data cluster according to the import instruction.

In one embodiment, the export and import instructions are sql instructions.

In an embodiment, the export data is a set of data text files on disk corresponding to a logical table of the data cluster to be migrated.

In one embodiment, the data cluster migration method further includes: updating the job state in the parameter table to export-complete, and updating the job start time and job end time.

In a second aspect, the present invention provides a data cluster migration apparatus, including:

the parameter table generating unit is used for generating a parameter table according to the physical table of the data cluster to be migrated; the parameters in the parameter table comprise: constraints, a priority, a corresponding library and a corresponding table of a target data cluster, a job state, a job start time, and a job end time;

the instruction generating unit is used for generating a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table;

the export data generating unit is used for generating export data which can be migrated to the target data cluster according to the data cluster to be migrated and the migration instruction;

and the export data import unit is used for importing the export data into the target data cluster according to the import instruction.

In one embodiment, the export and import instructions are sql instructions.

In one embodiment, the data cluster migration apparatus further includes:

the parameter table updating unit is used for updating the job state in the parameter table to export-complete, and updating the job start time and job end time.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the data cluster migration method when executing the program.

In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data cluster migration method.

As can be seen from the above description, the data cluster migration method and apparatus provided in the embodiments of the present invention export and import text automatically by means of a parameter table, thereby completing the business processing. Specifically, data of a logical table is first exported from the cluster to be migrated to a text file; the text is then loaded into a temporary staging table in the target cluster M; and after the staging table has been processed logically, the data is imported into the final target table. Compared with existing data migration methods, this method has the following advantages:

(1) the method can be suitable for the table-level data migration between two clusters, and has universality;

(2) the method can be suitable for various flexibly split tables and large-batch, multi-concurrent and full-automatic data synchronization;

(3) the development period and the cost are greatly reduced, the workload is reduced, and the online deployment can be rapidly realized.

In summary, the data cluster migration method and device provided by the invention can meet the need for rapid cross-cluster data migration under various complex conditions, and can thereby support large-batch, large-volume cross-platform data replication.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a first flowchart illustrating a data cluster migration method according to an embodiment of the present invention;

FIG. 2 is a second flowchart illustrating a data cluster migration method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a data cluster migration method in a specific application example of the present invention;

FIG. 4 is a first schematic structural diagram of a data cluster migration apparatus in an embodiment of the present invention;

FIG. 5 is a second schematic structural diagram of a data cluster migration apparatus in an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In view of the fact that the prior art does not handle two coexisting data clusters and lacks a flexible and fast method for relocating data across clusters, an embodiment of the present invention provides a specific implementation of a data cluster migration method. Referring to FIG. 1, the method specifically includes the following:

Step 100: generating a parameter table according to the physical table of the data cluster to be migrated.

It is understood that the parameters in the parameter table in step 100 include: constraints, a priority, a corresponding library and a corresponding table of the target data cluster, a job status, a job start time, and a job end time.

In addition, the parameter table contains field information and a sub-process parameter file 202. The parameter table records the state of the synchronization job, the encrypted user name/password, the export character encoding, the synchronization start and end times, the export where condition, the condition for processing the export temporary table, the job priority, the library name and table name corresponding to the current cluster, whether to export and import automatically, whether to trim exported fields, and whether exported fields need bit expansion. Multiple sub-processes are forked according to the sub-process parameter file 202.
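As an illustration only, one row of such a parameter table could be modelled as follows. This is a minimal sketch: every field name, the default values, the status code 0 for a pending job, and the rule that a larger priority value runs first are assumptions made for the example and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MigrationJob:
    """One row of the parameter table (all field names are hypothetical)."""
    source_db: str                 # library name on the current cluster
    source_table: str              # table name on the current cluster
    target_db: str                 # corresponding library on the target cluster
    target_table: str              # corresponding table on the target cluster
    where_condition: str = ""      # export where condition (the constraint)
    priority: int = 0              # assumed: larger value runs first
    status: int = 0                # assumed: 0 means pending
    encrypted_credentials: str = ""
    export_charset: str = "UTF8"
    auto_export_import: bool = True
    trim_fields: bool = False
    expand_fields: bool = False
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def pending_by_priority(jobs):
    """Return the pending jobs, highest priority first."""
    return sorted((j for j in jobs if j.status == 0),
                  key=lambda j: j.priority, reverse=True)
```

Keeping the job state and both timestamps on the same record as the routing information is what lets a scheduler pick up, rerun, or skip a table by reading a single row.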

Step 200: generating a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table.

Specifically, field information is queried from the metadata of the current cluster by the library name and table name registered in the parameter table; the result has one row per field. The corresponding query sql is generated according to the field type, combined with the settings registered in the parameter table for whether to add coalesce, whether to remove nulls, and whether to apply bit expansion. An executable select statement is generated, including the select list and the where condition.
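The statement generation described above can be sketched roughly as follows. This is a hedged illustration, not the patented implementation: the function names, the set of text types, and the default values substituted by COALESCE are assumptions, and bit expansion is left out because the description does not define it precisely.

```python
def column_expr(name, col_type, use_coalesce=False, trim=False):
    """Build the select-list expression for one metadata row (one field)."""
    expr = name
    is_text = col_type.lower() in ("char", "varchar")
    if trim and is_text:
        expr = f"TRIM({expr})"
    if use_coalesce:
        default = "''" if is_text else "0"   # assumed replacement for NULL
        expr = f"COALESCE({expr}, {default})"
    return expr


def build_select(db, table, columns, where="", use_coalesce=False, trim=False):
    """columns: list of (name, type) pairs read from the cluster metadata."""
    select_list = ", ".join(
        column_expr(n, t, use_coalesce, trim) for n, t in columns)
    sql = f"SELECT {select_list} FROM {db}.{table}"
    if where:
        sql += f" WHERE {where}"
    return sql
```

The identifiers are interpolated directly because in this sketch they come from trusted metadata and the parameter table, not from user input.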

Step 300: generating export data which can be migrated to the target data cluster according to the data cluster to be migrated and the migration instruction.

Specifically, the synchronization job parameters received in the parameter table and the table field information in the metadata of the current cluster are analyzed. First, the login information of the export script and the path of the exported text file are generated; then, for the table field information in the metadata of the current cluster, combined with the information passed from the parameter table, each field is processed according to its type to generate the export statement.

Step 400: importing the export data into the target data cluster according to the import instruction.

As can be seen from the above description, the data cluster migration method provided in the embodiment of the present invention exports and imports text automatically by means of a parameter table, thereby completing the business processing. Specifically, data of a logical table is first exported from the cluster to be migrated to a text file; the text is then loaded into a temporary staging table in the target cluster M; and after the staging table has been processed logically, the data is imported into the final target table. Compared with existing data migration methods, this method has the advantages (1) to (3) listed in the Disclosure of Invention.

In conclusion, the data cluster migration method provided by the invention can meet the need for rapid cross-cluster data migration under various complex conditions, and can thereby support large-batch, large-volume cross-platform data replication.

In one embodiment, the export and import instructions are sql instructions.

In an embodiment, referring to fig. 2, after step 300, the data cluster migration method further includes:

Step 500: updating the job state in the parameter table to export-complete, and updating the job start time and job end time.
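A minimal sketch of such a state update is shown below, using an in-memory SQLite database as a stand-in for the parameter table. The table name param_table, its columns, and the helper mark_export_done are hypothetical; the status value 1 follows the application example, which records state 1 after the export.

```python
import sqlite3
from datetime import datetime

EXPORT_DONE = 1  # status value used for a finished export in the example


def mark_export_done(conn, job_id, started, ended):
    """Record the new job state and both timestamps in one statement."""
    conn.execute(
        "UPDATE param_table SET status = ?, start_time = ?, end_time = ? "
        "WHERE job_id = ?",
        (EXPORT_DONE, started.isoformat(), ended.isoformat(), job_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE param_table (job_id INTEGER PRIMARY KEY, "
             "status INTEGER, start_time TEXT, end_time TEXT)")
conn.execute("INSERT INTO param_table (job_id, status) VALUES (1, 0)")
mark_export_done(conn, 1,
                 datetime(2020, 1, 17, 9, 0), datetime(2020, 1, 17, 9, 5))
row = conn.execute(
    "SELECT status, start_time, end_time FROM param_table").fetchone()
```

Writing the state and the timestamps in a single UPDATE avoids a window in which a job looks finished but has no end time.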

To further illustrate the present solution, the present invention provides a specific application example of the data cluster migration method, which includes the following; see FIG. 3.

In this specific application example, the current cluster is referred to as a T cluster for short, and the target cluster is referred to as an M cluster for short.

S0: a physical table A in the cluster T to be synchronized is configured as a job A1.

Job A1 stores the parameters of physical table A (constraints, priority, the corresponding library and table in cluster M, job status, job start and end times, etc.) in a parameter table.

Job A1 includes field information 201 and a child process parameter file 202. Multiple child processes are forked according to the child process parameter file 202 to complete the subsequent steps. The parameter table is responsible for receiving the jobs that need to be synchronized; it records the state of the synchronization job, the encrypted user name/password, the export character encoding, the synchronization start and end times, the export where condition, the condition for processing the export temporary table, the job priority, the library name and table name of the corresponding cluster M, whether to export and import automatically, whether to trim exported fields, and whether to apply bit expansion to exported fields.

Next, the synchronization job parameters received in the parameter table and the table field information in the cluster T metadata are analyzed. First, the login information of the export script and the path of the exported text file are generated; then, for the table field information in the cluster T metadata, combined with the data passed from the parameter table, each field is processed according to its type to generate the export statement.

S1: the cluster T metadata and the job A1 parameters are read, and the sql command for the corresponding export file is generated.

Field information is queried from the cluster T metadata by the library name and table name registered in the parameter table; the result has one row per field. The corresponding query sql is generated according to the field type, combined with the settings registered in the parameter table for whether to add coalesce, whether to remove nulls, and whether to apply bit expansion. Query statement generation 301 produces an executable select statement, including the select list and the where condition.

The encrypted user name/password is obtained from the parameter table, and the decoding module is called to decode it. The select statement is assembled by concatenation from the query statement. The decoding module is called to decode the login user name/password and to generate the export script for that login user. In addition, step S1 defines an error table that stores the data rows for which an error is reported.

S2: the sql command generated at step S1 is executed to generate the corresponding export file A2, and the job status of job A1 in the parameter table is updated to export-complete, together with the job start and end times.

The export script is called, a log is generated, and the parameter table is updated with state 1, the start time, and the end time. The generated log is parsed to obtain the number of exported records and the return value. If the return value is not 0, the export statement call 401 is executed again to redo the export, and the parameter table is again updated with state 1, the start time, and the end time.
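The rerun-on-failure behaviour can be sketched as follows. This is an illustration under stated assumptions: the helper name run_with_retry is hypothetical, and the cap on the number of attempts is an addition of this sketch, since the description does not say how many times the export is retried.

```python
def run_with_retry(run_export, max_attempts=3):
    """Call run_export() until it returns 0 or the attempts are used up.

    run_export is any callable returning an integer return code
    (0 means success), for example a wrapper around
    subprocess.run(...).returncode.
    Returns (return_code, attempts_used).
    """
    code = 1
    for attempt in range(1, max_attempts + 1):
        code = run_export()
        if code == 0:
            return code, attempt
    return code, max_attempts
```

Passing the export as a callable keeps the retry logic independent of how the export script is actually launched.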

S3: the cluster T metadata and the job parameters are read to generate the sql command for the import file.

An import foreign table statement is generated: field information is queried from the cluster T metadata by the library name and table name registered in the parameter table, the result has one row per field, and the corresponding create-foreign-table sql is generated. A statement that inserts into the target table is also generated; the export where condition is looked up by the library name and table name registered in the parameter table and is used as the condition for deleting from the target table, so that reruns are supported. Then it is determined whether the gds process has been started on the server; if not, the gds process is started.
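Using the export where condition as the delete condition is what makes the import safe to rerun. The sketch below demonstrates the idea with an in-memory SQLite database standing in for cluster M; the table names staging and target, the column names, and the helper load_target are hypothetical, and an ordinary table stands in for the foreign table.

```python
import sqlite3


def load_target(conn, staging, target, where):
    """Delete the rows matching the export condition, then re-insert them.

    Table names and the where clause come from the trusted parameter table
    in this sketch, so they are interpolated directly.
    """
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(f"DELETE FROM {target} WHERE {where}")
        conn.execute(
            f"INSERT INTO {target} SELECT * FROM {staging} WHERE {where}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, dt TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, dt TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, "2020-01-17"), (2, "2020-01-17")])
conn.execute("INSERT INTO target VALUES (9, '2019-12-31')")  # older data

cond = "dt = '2020-01-17'"
load_target(conn, "staging", "target", cond)
load_target(conn, "staging", "target", cond)  # rerun: no duplicate rows
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

After both calls the target holds the one older row plus the two migrated rows, because the second run first removes what the first run inserted.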

S4: the import sql command generated in step S3 is executed, and the file A2 generated in step S2 is imported into the table A corresponding to cluster M.

It is to be understood that step S4 further includes updating the job status of job A1 in the parameter table to import-complete, together with the job start and end times.

Specifically, a connection is made to cluster M, the create-foreign-table statement is run, and an execution log is generated. The log produced by running the create-foreign-table statement is parsed for the keyword 'ERROR:'; if the keyword is present, the step fails and a value of 1 is returned. The log produced by running the insert-into-target-table statement is parsed in the same way; if the keyword 'ERROR:' is present, the step fails and a value of 1 is returned. If both are correct, a value of 0 is returned.
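The log check described above amounts to a keyword search. A minimal sketch, with hypothetical function names, could look like this:

```python
def check_log(log_text):
    """Return 1 if the log contains the keyword 'ERROR:', else 0."""
    return 1 if "ERROR:" in log_text else 0


def import_return_code(create_log, insert_log):
    """Combine both checks: a failure in either log fails the import."""
    if check_log(create_log) or check_log(insert_log):
        return 1
    return 0
```

A keyword search is simple but coarse: it cannot tell a fatal error from data that happens to contain the same string, which is an inherent limit of this style of check.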

Next, it is determined whether the import of the text succeeded. The return value is obtained; if it is not 0, the create-foreign-table statement and the insert-into-target-table statement are run again to redo the import. The parameter table is then updated with state 3, the start time, and the end time. It is to be understood that when there are many tables, steps S1 through S4 are performed concurrently until all jobs are in the completed state.

The following example synchronizes the table dwdatat.t15_MF1_pthrsopron_RESV from cluster T to the table dwdatat.t15_PRS_pthrsopron_RESV in cluster M. A parameter table is created in cluster T by executing the parameter table sql. The metadata is stored by column on cluster T; see Table 1, which takes dwpdata.T15_MF1_pthrsopron_RESV as an example.

Table 1 cluster T metadata table

The processing of the contents of Table 1 is then encapsulated in the script trans_schedule_mpp_07a.pl, which contains the functions get_exportsql, fexp_script, gen_foreign_sql, gen_temp_to_data_sql, and final_import.

The specific execution program process comprises the following steps:

1) Execute the parameter table sql on cluster T: fxp < parameter table sql &

2) Store para_config and trans_schedule_mpp_07a.pl on cluster T and start the program:

perl trans_schedule_mpp_07a.pl &

3) To synchronize a new table, add one record to the parameter table established in cluster T. If para_config is configured as 10, trans_schedule_mpp_07a.pl starts 10 child processes, which scan the table data_trans_mpp_04 for records whose status is 0. Each child process is responsible for synchronizing one table and executes the functions in sequence: get_exportsql, fexp_script, gen_foreign_sql, gen_temp_to_data_sql, final_import. After all functions have executed successfully, the data of the cluster T table, for example dwdpata.T15_MF1_PTHRSOPN_RESV, has been imported into the cluster M table, for example dwdpdata.T15_PRS_PTHRSOPN_RESV.
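The scheduling pattern in step 3) can be sketched as follows. This is an illustration only: it uses a thread pool rather than forked child processes, the step functions are passed in as a single hypothetical callable run_step(step, job) that returns 0 on success, and stopping at the first failed step is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Function names as listed in the description, executed in this order.
STEPS = ["get_exportsql", "fexp_script", "gen_foreign_sql",
         "gen_temp_to_data_sql", "final_import"]


def sync_table(job, run_step):
    """Run the five functions in order; stop at the first failure."""
    for step in STEPS:
        if run_step(step, job) != 0:
            return job, step      # name of the step that failed
    return job, None              # all steps succeeded


def run_jobs(jobs, run_step, workers=10):
    """Synchronize the pending jobs with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda j: sync_table(j, run_step), jobs))
```

With workers=10 this mirrors a para_config value of 10: at most ten tables are synchronized at a time, and each worker handles one table from start to finish.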

Based on the same inventive concept, the embodiment of the present application further provides a data cluster migration apparatus, which can be used to implement the methods described in the foregoing embodiments, as described in the following embodiments. Because the principle by which the data cluster migration apparatus solves the problem is similar to that of the data cluster migration method, the implementation of the apparatus can refer to the implementation of the method, and the common details are not repeated here. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or a combination of software and hardware, are also possible and contemplated.

An embodiment of the present invention provides a specific implementation manner of a data cluster migration apparatus capable of implementing a data cluster migration method, and referring to fig. 4, the data cluster migration apparatus specifically includes the following contents:

the parameter table generating unit 10 is configured to generate a parameter table according to a physical table of a data cluster to be migrated; the parameters in the parameter table comprise: constraints, a priority, a corresponding library and a corresponding table of a target data cluster, a job state, a job start time, and a job end time;

the instruction generating unit 20 is configured to generate a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table;

an export data generating unit 30, configured to generate export data that can be migrated to the target data cluster according to the to-be-migrated data cluster and the migration instruction;

and the export data import unit 40 is configured to import the export data into the target data cluster according to the import instruction.

Preferably, the export instruction and the import instruction are sql instructions.

Preferably, the export data is a set of data text files on disk corresponding to a logical table of the data cluster to be migrated.

Referring to fig. 5, preferably, the data cluster migrating apparatus further includes:

a parameter table updating unit 50, configured to update the job state in the parameter table to export-complete, and to update the job start time and job end time.

As can be seen from the above description, the data cluster migration apparatus provided in the embodiment of the present invention exports and imports text automatically by means of a parameter table, thereby completing the business processing. Specifically, data of a logical table is first exported from the cluster to be migrated to a text file; the text is then loaded into a temporary staging table in the target cluster M; and after the staging table has been processed logically, the data is imported into the final target table. Compared with existing data migration methods, this apparatus has the advantages (1) to (3) listed in the Disclosure of Invention.

In summary, the data cluster migration device provided by the invention can meet the need for rapid cross-cluster data migration under various complex conditions, and can thereby support large-batch, large-volume cross-platform data replication.

An embodiment of the present application further provides a specific implementation manner of an electronic device, which is capable of implementing all steps in the data cluster migration method in the foregoing embodiment, and with reference to fig. 6, the electronic device specifically includes the following contents:

a processor (processor)1201, a memory (memory)1202, a communication interface 1203, and a bus 1204;

the processor 1201, the memory 1202 and the communication interface 1203 complete communication with each other through the bus 1204; the communication interface 1203 is configured to implement information transmission between related devices, such as a server-side device, a storage device, and a client device.

The processor 1201 is configured to call the computer program in the memory 1202, and the processor executes the computer program to implement all the steps in the data cluster migration method in the above embodiments, for example, the processor executes the computer program to implement the following steps:

Step 100: generating a parameter table according to a physical table of a data cluster to be migrated; the parameters in the parameter table comprise: constraints, a priority, a corresponding library and a corresponding table of a target data cluster, a job state, a job start time, and a job end time;

Step 200: generating a data export instruction and an import instruction according to the metadata of the data cluster to be migrated and the parameter table;

Step 300: generating export data which can be migrated to the target data cluster according to the data cluster to be migrated and the migration instruction;

Step 400: importing the export data into the target data cluster according to the import instruction.

As can be seen from the above description, the electronic device in the embodiment of the present application exports and imports text automatically by means of a parameter table, thereby completing the business processing. Specifically, data of a logical table is first exported from the cluster to be migrated to a text file; the text is then loaded into a temporary staging table in the target cluster M; and after the staging table has been processed logically, the data is imported into the final target table. Compared with existing data migration methods, this approach has the advantages (1) to (3) listed in the Disclosure of Invention.

In summary, the electronic device in the embodiment of the present application can meet the need for rapid cross-cluster data migration under various complex conditions, and can thereby support large-batch, large-volume cross-platform data replication.

Embodiments of the present application further provide a computer-readable storage medium capable of implementing all steps of the data cluster migration method in the foregoing embodiments. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of the data cluster migration method in the foregoing embodiments, for example steps 100 to 400 described above.

As can be seen from the above description, the computer-readable storage medium in the embodiment of the present application exports and imports text automatically by means of a parameter table, thereby completing the business processing. Specifically, data of a logical table is first exported from the cluster to be migrated to a text file; the text is then loaded into a temporary staging table in the target cluster M; and after the staging table has been processed logically, the data is imported into the final target table. Compared with existing data migration methods, this approach has the advantages (1) to (3) listed in the Disclosure of Invention.

In summary, the computer-readable storage medium in the embodiment of the present application can meet the need for rapid cross-cluster data migration under various complex conditions, and can thereby support large-batch, large-volume cross-platform data replication.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the partial description of the method embodiment.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Although the present application provides method steps as in an embodiment or a flowchart, more or fewer steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A data cluster migration method is characterized by comprising the following steps:

2. The data cluster migration method according to claim 1, wherein the export instruction and the import instruction are sql instructions.

3. The data cluster migration method according to claim 1, wherein the export data is a set of data text files on disk corresponding to a logical table of the data cluster to be migrated.

4. The data cluster migration method according to claim 1, further comprising: updating the job state in the parameter table to export-complete, and updating the job start time and job end time.

5. A data cluster migration apparatus, comprising:

6. The data cluster migration apparatus according to claim 5, wherein the export instruction and the import instruction are sql instructions.

7. The data cluster migration apparatus according to claim 5, wherein the export data is a set of data text files on disk corresponding to a logical table of the data cluster to be migrated.

8. The data cluster migration apparatus according to claim 5, further comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the data cluster migration method according to any one of claims 1 to 4 when executing the program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the data cluster migration method according to any one of claims 1 to 4.