CN113961625A - Task migration method for heterogeneous big data management platform - Google Patents

Publication number
CN113961625A
CN113961625A CN202111256715.XA CN202111256715A CN113961625A CN 113961625 A CN113961625 A CN 113961625A CN 202111256715 A CN202111256715 A CN 202111256715A CN 113961625 A CN113961625 A CN 113961625A
Authority
CN
China
Prior art keywords
data
hive
production environment
checking whether
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111256715.XA
Other languages
Chinese (zh)
Other versions
CN113961625B (en)
Inventor
于洋
高经郡
李城军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kejie Technology Co ltd
Original Assignee
Beijing Kejie Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kejie Technology Co ltd
Priority to CN202111256715.XA
Publication of CN113961625A
Application granted
Publication of CN113961625B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a task migration method for a heterogeneous big data management platform. Metadata information of the development environment is exported and imported into the production environment through a configured algorithm, which determines which data in the production environment must be overwritten and which data is kept unchanged, so that tasks are migrated automatically and the problem of repeated tasks is avoided.

Description

Task migration method for heterogeneous big data management platform
Technical Field
The invention relates to the technical field of data management, in particular to a task migration method for a heterogeneous big data management platform.
Background
At present, schedules, tasks and task-related tables are migrated among the multiple environments of a data platform by manual modification: after a task has been tested and has passed in the development environment, it is manually migrated to the production environment for use. Manual operation, however, cannot guarantee the consistency of the tasks. The other approach is to migrate the data of the development environment to the production environment in full. In that case, a problem that has already been handled in the production environment but not yet in the development environment is overwritten back into the production environment by the development copy, which ultimately causes tasks to be repeated.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a task migration method for a heterogeneous big data management platform.
In order to achieve the purpose, the invention adopts the following technical scheme:
A task migration method for a heterogeneous big data management platform comprises the following specific process:
metadata information of the development environment is exported and imported into the production environment; the data is verified as it is imported, and the corresponding operation is executed according to the verification result; wherein:
for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not exist, prompts the user to apply to the cluster for the database to be created;
for Hive table data, HiveTableMeerger is used to check the data: it checks whether the Hive table exists in the production environment and, if it does not exist, inserts the Hive table; if the Hive table exists, it further checks whether the Hive table to be imported has been modified relative to the original Hive table; if not, no operation is performed and the data is kept as it is; if so, it further checks whether the Hive table has partitions or buckets and, if it has, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is modified;
for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment and, if it does not exist, inserts the field; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is kept as it is;
for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment and, if it does not exist, inserts the scheduling task; if the scheduling task exists, it checks whether the information of the scheduling task has changed; if so, the scheduling task is overwritten and its state is modified; if not, no operation is performed and the data is kept as it is.
Further, JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is kept as it is.
Further, MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is kept as it is. For an association table that holds the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is kept as it is.
The invention has the following beneficial effects: the method exports the metadata information of the development environment, imports it into the production environment through the configured algorithm, and determines which data in the production environment needs to be overwritten and which data is kept unchanged, so that tasks are migrated automatically without the problem of repeated tasks.
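The decision between inserting, overwriting and keeping a record can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: `Action`, `merge_record` and `migrate` are hypothetical names introduced here, metadata records are modeled as plain dictionaries, and a simple equality comparison stands in for whatever change check the mergers actually perform.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    INSERT = "insert"        # record is missing in production
    OVERWRITE = "overwrite"  # record exists but differs from the imported one
    KEEP = "keep"            # record exists and is unchanged


def merge_record(imported: dict, existing: Optional[dict]) -> Action:
    """Decide what to do with one imported metadata record.

    `existing` is the production record with the same key, or None.
    Equality comparison is an assumed stand-in for the change check.
    """
    if existing is None:
        return Action.INSERT
    if imported != existing:
        return Action.OVERWRITE
    return Action.KEEP


def migrate(dev: dict, prod: dict) -> dict:
    """Apply merge_record to every development record, updating prod in place.

    Records that exist only in production are never touched.
    Returns the action taken for each development key.
    """
    report = {}
    for key, record in dev.items():
        action = merge_record(record, prod.get(key))
        if action is not Action.KEEP:
            prod[key] = dict(record)  # copy so later edits to dev do not leak
        report[key] = action
    return report
```

In this sketch, running `migrate` a second time with unchanged development data reports `Action.KEEP` for every key, so nothing is inserted twice.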
Drawings
FIG. 1 is a flow chart of a method in an embodiment of the invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawing. It should be noted that this embodiment is based on the above technical solution and gives a detailed implementation and a specific operating process, but the protection scope of the present invention is not limited to this embodiment.
This embodiment provides a task migration method for a heterogeneous big data management platform. As shown in FIG. 1, the specific process is as follows:
metadata information of the development environment is exported and imported into the production environment; the data is verified as it is imported, and the corresponding operation is executed according to the verification result; wherein:
for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not exist, prompts the user to apply to the cluster for the database to be created; if it exists, no operation is performed;
for Hive table data, HiveTableMeerger is used to check the data: it checks whether the Hive table exists in the production environment and, if it does not exist, inserts the Hive table; if the Hive table exists, it further checks whether the Hive table to be imported has been modified relative to the original Hive table; if not, no operation is performed and the data is kept as it is; if so, it further checks whether the Hive table has partitions or buckets and, if it has, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is modified;
for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment and, if it does not exist, inserts the field; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is kept as it is;
for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment and, if it does not exist, inserts the scheduling task; if the scheduling task exists, it checks whether the information of the scheduling task has changed; if so, the scheduling task is overwritten and its state is modified; if not, no operation is performed and the data is kept as it is.
Further, JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is kept as it is. MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is kept as it is. For an association table that holds the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is kept as it is.
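Of the checks in this embodiment, the Hive table branch has the most steps, so a small Python sketch of its decision tree may help. It is illustrative only: the source names just the component HiveTableMeerger, while `HiveTable`, `TableAction` and `merge_hive_table` are hypothetical names introduced here. Two further assumptions are made: "modified" is detected by comparing the whole table description, and the "otherwise" branch (update the information stored in MySQL) is taken both when the table has no partitions or buckets and when that information is unchanged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class TableAction(Enum):
    INSERT = "insert table"
    KEEP = "no operation"
    PROMPT_USER = "prompt user to modify manually"
    UPDATE_METADATA = "update information stored in MySQL"


@dataclass(frozen=True)
class HiveTable:
    """Hypothetical, simplified description of a Hive table's metadata."""
    name: str
    columns: Tuple[str, ...]
    partition_keys: Tuple[str, ...] = ()
    bucket_keys: Tuple[str, ...] = ()


def merge_hive_table(imported: HiveTable,
                     existing: Optional[HiveTable]) -> TableAction:
    """Follow the table-level checks in the order the description gives them."""
    # Step 1: does the table exist in production?
    if existing is None:
        return TableAction.INSERT
    # Step 2: has the imported table been modified relative to the original?
    if imported == existing:
        return TableAction.KEEP
    # Step 3: does either version of the table use partitions or buckets?
    has_layout = bool(imported.partition_keys or imported.bucket_keys
                      or existing.partition_keys or existing.bucket_keys)
    # Step 4: if so, has the partition or bucket information changed?
    layout_changed = (imported.partition_keys != existing.partition_keys
                      or imported.bucket_keys != existing.bucket_keys)
    if has_layout and layout_changed:
        # Layout changes are not applied automatically; the user is prompted.
        return TableAction.PROMPT_USER
    # Otherwise only the stored metadata needs to be updated.
    return TableAction.UPDATE_METADATA
```

Under these assumptions, adding a column to a partitioned table yields `TableAction.UPDATE_METADATA`, whereas adding a partition key yields `TableAction.PROMPT_USER`.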
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (3)

1. A task migration method for a heterogeneous big data management platform, characterized by comprising the following specific process:
metadata information of the development environment is exported and imported into the production environment; the data is verified as it is imported, and the corresponding operation is executed according to the verification result; wherein:
for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not exist, prompts the user to apply to the cluster for the database to be created;
for Hive table data, HiveTableMeerger is used to check the data: it checks whether the Hive table exists in the production environment and, if it does not exist, inserts the Hive table; if the Hive table exists, it further checks whether the Hive table to be imported has been modified relative to the original Hive table; if not, no operation is performed and the data is kept as it is; if so, it further checks whether the Hive table has partitions or buckets and, if it has, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is modified;
for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment and, if it does not exist, inserts the field; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is kept as it is;
for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment and, if it does not exist, inserts the scheduling task; if the scheduling task exists, it checks whether the information of the scheduling task has changed; if so, the scheduling task is overwritten and its state is modified; if not, no operation is performed and the data is kept as it is.
2. The method of claim 1, wherein JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is kept as it is.
3. The method of claim 1, wherein MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is kept as it is; and for an association table that holds the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is kept as it is.
CN202111256715.XA 2021-10-27 2021-10-27 Task migration method for heterogeneous big data management platform Active CN113961625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111256715.XA CN113961625B (en) 2021-10-27 2021-10-27 Task migration method for heterogeneous big data management platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111256715.XA CN113961625B (en) 2021-10-27 2021-10-27 Task migration method for heterogeneous big data management platform

Publications (2)

Publication Number Publication Date
CN113961625A (en) 2022-01-21
CN113961625B (en) 2022-06-07

Family

ID=79467596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111256715.XA Active CN113961625B (en) 2021-10-27 2021-10-27 Task migration method for heterogeneous big data management platform

Country Status (1)

Country Link
CN (1) CN113961625B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176988A (en) * 2011-12-21 2013-06-26 上海博腾信息科技有限公司 Data migration system based on software-as-a-service (SaaS)
US20130205028A1 (en) * 2012-02-07 2013-08-08 Rackspace Us, Inc. Elastic, Massively Parallel Processing Data Warehouse
CN103605663A (en) * 2013-10-22 2014-02-26 芜湖大学科技园发展有限公司 General database checking and metadata loading method
CN104573100A (en) * 2015-01-29 2015-04-29 无锡江南计算技术研究所 Step-by-step database synchronization method with autoincrement identifications
CN105740411A (en) * 2016-01-30 2016-07-06 武汉大学 SOA (Service-Oriented Architecture) and WebService based data migration method
CN108241632A (en) * 2016-12-23 2018-07-03 航天星图科技(北京)有限公司 A kind of data verification method of data base-oriented Data Migration
CN108959470A (en) * 2018-06-20 2018-12-07 郑州云海信息技术有限公司 A kind of database data cross-platform migration method and device
CN109508355A (en) * 2018-10-19 2019-03-22 平安科技(深圳)有限公司 A kind of data pick-up method, system and terminal device
CN109829009A (en) * 2018-12-28 2019-05-31 北京邮电大学 Configurable isomeric data real-time synchronization and visual system and method
CN109997125A (en) * 2016-09-15 2019-07-09 英国天然气控股有限公司 System for importing data to data storage bank
CN110069335A (en) * 2019-05-07 2019-07-30 江苏满运软件科技有限公司 Task processing system, method, computer equipment and storage medium
CN110505228A (en) * 2019-08-23 2019-11-26 上海宽带技术及应用工程研究中心 Big data processing method, system, medium and device based on edge cloud framework
CN111259006A (en) * 2019-11-19 2020-06-09 中国科学院计算机网络信息中心 Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system
CN111930850A (en) * 2020-09-24 2020-11-13 腾讯科技(深圳)有限公司 Data verification method and device, computer equipment and storage medium
CN112035444A (en) * 2020-09-03 2020-12-04 中国银行股份有限公司 Method and device for transferring image data between heterogeneous systems without stopping
US10909120B1 (en) * 2016-03-30 2021-02-02 Groupon, Inc. Configurable and incremental database migration framework for heterogeneous databases
CN112328539A (en) * 2020-10-27 2021-02-05 深圳市赛宇景观设计工程有限公司 Data migration method based on big data
CN113434482A (en) * 2021-06-28 2021-09-24 平安国际智慧城市科技股份有限公司 Data migration method and device, computer equipment and storage medium
CN113468143A (en) * 2021-07-22 2021-10-01 咪咕数字传媒有限公司 Data migration method, system, computing device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
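扎心了,老铁: "Hive Learning Path (1): A First Look at Hive", 《HTTPS://WWW.CNBLOGS.COM/QINGYUNZONG/P/8707885.HTML》 *
胡静: "Design and Implementation of a Service Scheduling Management System Based on the Hadoop Platform", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》 *
阿里云开发者社区 (Alibaba Cloud Developer Community): "Several Hive Parameters Related to Merge", 《HTTPS://DEVELOPER.ALIYUN.COM/ARTICLE/476804》 *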

Also Published As

Publication number Publication date
CN113961625B (en) 2022-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant