CN113961625A - Task migration method for heterogeneous big data management platform

Info

Publication number: CN113961625A
Application number: CN202111256715.XA
Authority: CN
Inventors: 于洋; 高经郡; 李城军
Original assignee: Beijing Kejie Technology Co ltd
Current assignee: Beijing Kejie Technology Co ltd
Priority date: 2021-10-27
Filing date: 2021-10-27
Publication date: 2022-01-21
Anticipated expiration: 2041-10-27
Also published as: CN113961625B

Abstract

The invention discloses a task migration method for a heterogeneous big data management platform. Metadata information is exported from a development environment and imported into a production environment through a configured algorithm, which determines which data in the production environment must be overwritten and which must be left unchanged. Task migration is thereby performed automatically, and the problem of repeated tasks is avoided.

Description

Task migration method for heterogeneous big data management platform

Technical Field

The invention relates to the technical field of data management, in particular to a task migration method for a heterogeneous big data management platform.

Background

At present, schedules, tasks, and task-related tables are migrated between the environments of a data platform in one of two ways. In the first, the changes are applied by hand: after a task has been tested and passed in the development environment, it is manually migrated to the production environment for use, but manual operation cannot guarantee the consistency of the tasks. In the second, the data of the development environment is migrated to the production environment in full. A problem that has already been handled in the production environment, but not yet in the development environment, may then be copied back into production, which ultimately causes tasks to be repeated.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention aims to provide a task migration method for a heterogeneous big data management platform.

In order to achieve the purpose, the invention adopts the following technical scheme:

A task migration method for a heterogeneous big data management platform comprises the following specific process:

metadata information of a development environment is exported and imported into a production environment; the data is verified as it is imported, and the corresponding operation is executed according to the verification result; wherein:

for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not, prompts the user to request that the database be created on the cluster;

for Hive table data, HiveTableMerger is used to check the data: it first checks whether the Hive table exists in the production environment and inserts the table if it does not; if the table exists, it further checks whether the table to be imported has been modified relative to the original table; if not, no operation is performed and the data is left as is; if so, it further checks whether the table has partitions or buckets and, if it does, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is updated;

for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment and inserts the field if it does not; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is left as is;

for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment and inserts the task if it does not; if the task exists, it checks whether the information of the scheduling task has changed; if so, the scheduling task is overwritten and its state is updated; if not, no operation is performed and the data is left as is.

Further, JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is left as is.

Further, MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is left as is. For an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is left as is.
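The Hive table check described above can be read as a small decision tree. The following Python sketch is one possible reading and not the patented implementation: the `HiveTable` record, the `Action` enum, and the function `merge_hive_table` are hypothetical names introduced here for illustration, and only the order of the branches is taken from the text. It assumes that a change to partition or bucket keys on either side counts as changed partition or bucket information.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    INSERT = "insert"                    # table is missing in production
    KEEP = "keep"                        # nothing changed, leave data as is
    PROMPT_USER = "prompt_user"          # partition or bucket layout changed
    UPDATE_METADATA = "update_metadata"  # update the stored metadata record


@dataclass(frozen=True)
class HiveTable:
    # Hypothetical metadata record; the field names are assumptions.
    name: str
    columns: Tuple[str, ...]
    partition_keys: Tuple[str, ...] = ()
    bucket_keys: Tuple[str, ...] = ()


def merge_hive_table(incoming: HiveTable, existing: Optional[HiveTable]) -> Action:
    """Choose the action for one table, following the branch order in the text."""
    if existing is None:
        return Action.INSERT
    if incoming == existing:
        return Action.KEEP
    # A table with no partitions or buckets on either side cannot differ here,
    # so it falls through to the metadata update.
    layout_changed = (
        incoming.partition_keys != existing.partition_keys
        or incoming.bucket_keys != existing.bucket_keys
    )
    if layout_changed:
        return Action.PROMPT_USER
    return Action.UPDATE_METADATA
```

Returning an action instead of performing it keeps the decision logic separate from whatever store holds the production metadata, which makes each branch easy to check in isolation.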

The beneficial effects of the invention are as follows: the method exports the metadata information of the development environment, imports it into the production environment through the configured algorithm, and determines which data in the production environment must be overwritten and which must be left unchanged, so that tasks are migrated automatically without the problem of repeated tasks.
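The reason a keyed merge avoids repeated tasks can be illustrated with the scheduling-task check. The sketch below is a minimal illustration under stated assumptions, not the method's actual code: the function `merge_scheduling_tasks`, the dictionary layout, and the task names are all hypothetical, and each task is assumed to carry a unique key. Tasks are inserted when missing, overwritten when changed, and skipped when identical, so running the same import twice changes nothing the second time.

```python
from typing import Dict, List


def merge_scheduling_tasks(
    exported: Dict[str, dict], production: Dict[str, dict]
) -> List[str]:
    """Merge exported tasks into production in place and return an action log.

    Both dictionaries map a hypothetical unique task key to a task definition.
    """
    log = []
    for key, task in exported.items():
        if key not in production:
            production[key] = dict(task)
            log.append(f"insert {key}")
        elif production[key] != task:
            production[key] = dict(task)
            log.append(f"overwrite {key}")
        # Identical tasks are skipped, so production data is left as is.
    return log


# Illustrative data only; the task names and cron strings are made up.
dev = {"daily_load": {"cron": "0 2 * * *"}, "cleanup": {"cron": "0 4 * * 0"}}
prod = {"daily_load": {"cron": "0 1 * * *"}, "legacy_report": {"cron": "0 6 * * *"}}

first_run = merge_scheduling_tasks(dev, prod)
second_run = merge_scheduling_tasks(dev, prod)
print(first_run)   # ['overwrite daily_load', 'insert cleanup']
print(second_run)  # []
```

In this sketch the production-only task `legacy_report` is never touched, which contrasts with the full-copy approach criticized in the Background section.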

Drawings

FIG. 1 is a flow chart of a method in an embodiment of the invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawing. It should be noted that this embodiment is based on the above technical solution and provides a detailed implementation and a specific operating process, but the protection scope of the invention is not limited to this embodiment.

This embodiment provides a task migration method for a heterogeneous big data management platform. As shown in FIG. 1, the specific process is as follows:

for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment; if it does not, the user is prompted to request that the database be created on the cluster; if it does, no operation is performed;

further, JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is left as is; MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is left as is; and for an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is left as is.
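The database check and the overwrite-if-modified rule shared by JobBaseMerger, MainTableMerger, and relatedMerger can be sketched as two small helpers. This is an illustrative sketch only: the function names `check_database` and `overwrite_if_modified`, the in-memory dictionaries standing in for production storage, and the wording of the prompt are assumptions. The text does not say what happens when a record is absent from production, so this sketch assumes an absent record is treated as modified and is written.

```python
from typing import Dict, Optional, Set


def check_database(name: str, production_databases: Set[str]) -> Optional[str]:
    """Return a prompt for the user when the database is missing, else None."""
    if name not in production_databases:
        return f"Database '{name}' does not exist; request its creation on the cluster."
    return None  # database exists, so no operation is performed


def overwrite_if_modified(store: Dict[str, object], key: str, incoming: object) -> bool:
    """Overwrite store[key] only when the incoming record differs.

    Returns True when production was changed and False when it was left as is.
    Assumption: a key that is absent from the store counts as modified.
    """
    if key in store and store[key] == incoming:
        return False
    store[key] = incoming
    return True
```

For an association table, the same helper can be applied with the main table's unique key as `key` and the full set of associated rows as `incoming`, so the rows for one main record are replaced together.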

Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims

1. A task migration method for a heterogeneous big data management platform is characterized by comprising the following specific processes:

for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment and inserts the task if it does not; if the task exists, it checks whether the information of the scheduling task has changed; if so, the scheduling task is overwritten and its state is updated; if not, no operation is performed and the data is left as is.

2. The method of claim 1, wherein JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is left as is.

3. The method of claim 1, wherein MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is left as is; and for an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is left as is.