CN113961625A - Task migration method for heterogeneous big data management platform - Google Patents
Task migration method for heterogeneous big data management platform
- Publication number
- CN113961625A
- Authority
- CN
- China
- Prior art keywords
- data
- hive
- production environment
- checking whether
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a task migration method for a heterogeneous big data management platform. Metadata information of a development environment is exported and imported into a production environment through a configured algorithm, which determines which production-environment data is to be overwritten and which is to be kept unchanged, so that tasks are migrated automatically and the problem of duplicate tasks is avoided.
Description
Technical Field
The invention relates to the technical field of data management, in particular to a task migration method for a heterogeneous big data management platform.
Background
At present, scheduling, tasks, and task-related tables are migrated between the multiple environments of a data platform in one of two ways. The first is manual: after a task passes testing in the development environment, it is manually migrated to the production environment for use, but manual operation cannot guarantee the consistency of the task across environments. The second is to migrate the data of the development environment to the production environment in full. In that case, an issue that has already been handled in the production environment but not yet in the development environment is overwritten by the development environment's version, which ultimately produces duplicate tasks.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a task migration method for a heterogeneous big data management platform.
To achieve this purpose, the invention adopts the following technical solution:
A task migration method for a heterogeneous big data management platform comprises the following specific process:
metadata information of a development environment is exported and imported into a production environment; the data is verified during import, and the corresponding operation is executed according to the verification result; wherein:
for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not exist, prompts the user to request that the cluster create the database;
for Hive table data, HiveTableMerger is used to check the data: it checks whether the Hive table exists in the production environment and, if it does not exist, inserts the Hive table; if the Hive table exists, it further checks whether the Hive table to be imported has been modified relative to the original Hive table; if not, no operation is performed and the data is kept as it is; if so, it further checks whether the Hive table has partitions or buckets and, if it does, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is modified;
for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment; if the field does not exist, it is inserted; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is kept as it is;
for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment; if not, the scheduling task is inserted; if so, it checks whether the information of the scheduling task has changed; if it has changed, the scheduling task is overwritten and its state is modified; if not, no operation is performed and the data is kept as it is.
Further, JobBaseMerger is used to check whether the task information has been modified; if it has, the original task information in the production environment is overwritten; if not, the data is kept as it is.
Further, MainTableMerger is used to check whether the main table has been modified; if it has, the original main table in the production environment is overwritten; if not, the data is kept as it is. For an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if it has, the original association table in the production environment is overwritten; if not, the data is kept as it is.
The beneficial effects of the invention are as follows: the method exports the metadata information of the development environment, imports the data of the development environment into the production environment through the configured algorithm, and determines which production-environment data needs to be overwritten and which is kept unchanged, so that tasks are migrated automatically without the problem of duplicate tasks.
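The insert, overwrite, or keep decision described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patented implementation: the `Action` enum, the `decide` and `merge` functions, and the dictionary-based record layout are hypothetical names introduced here, and plain equality comparison stands in for whatever change check the configured algorithm actually performs.

```python
from enum import Enum


class Action(Enum):
    INSERT = "insert"        # record is missing in production
    OVERWRITE = "overwrite"  # record exists in production but differs
    KEEP = "keep"            # record exists and is unchanged


def decide(dev_record, prod_records, key):
    """Pick the action for one record exported from the development environment."""
    existing = prod_records.get(key)
    if existing is None:
        return Action.INSERT
    if existing != dev_record:
        return Action.OVERWRITE
    return Action.KEEP


def merge(dev_records, prod_records):
    """Apply development records to production in place and report each action.

    Records that exist only in production are never touched, which is the
    point of checking each record instead of copying the environment wholesale.
    """
    actions = {}
    for key, record in dev_records.items():
        action = decide(record, prod_records, key)
        if action is not Action.KEEP:
            prod_records[key] = dict(record)  # copy so the two environments stay independent
        actions[key] = action
    return actions
```

Because unchanged records are skipped and production-only records are left alone, running the same import twice yields only `KEEP` actions the second time, which is how a per-record check avoids duplicate tasks.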
Drawings
FIG. 1 is a flow chart of a method in an embodiment of the invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawing. It should be noted that this embodiment is based on the above technical solution and provides a detailed implementation and a specific operating process, but the protection scope of the present invention is not limited to this embodiment.
This embodiment provides a task migration method for a heterogeneous big data management platform. As shown in FIG. 1, the specific process is as follows:
metadata information of a development environment is exported and imported into a production environment; the data is verified during import, and the corresponding operation is executed according to the verification result; wherein:
for database data, HiveDataBaseMerger is used to check the data: it checks whether the Hive database exists in the production environment and, if it does not exist, prompts the user to request that the cluster create the database; if it exists, no operation is performed;
for Hive table data, HiveTableMerger is used to check the data: it checks whether the Hive table exists in the production environment and, if it does not exist, inserts the Hive table; if the Hive table exists, it further checks whether the Hive table to be imported has been modified relative to the original Hive table; if not, no operation is performed and the data is kept as it is; if so, it further checks whether the Hive table has partitions or buckets and, if it does, whether the partition or bucket information has changed; if that information has changed, the user is prompted to make the modification; otherwise, the information stored in MySQL is modified;
for Hive table field data, HiveColumnMerger is used to check the data: it first checks whether the Hive table field exists in the production environment; if the field does not exist, it is inserted; if the field exists, it further checks whether the field has changed; if so, the corresponding field is changed; otherwise, no operation is performed and the data is kept as it is;
for a scheduling task, SchedulerMerger is used to check the data: it checks whether the scheduling task exists in the production environment; if not, the scheduling task is inserted; if so, it checks whether the information of the scheduling task has changed; if it has changed, the scheduling task is overwritten and its state is modified; if not, no operation is performed and the data is kept as it is.
Further, JobBaseMerger is used to check whether the task information has been modified; if it has, the original task information in the production environment is overwritten; if not, the data is kept as it is. MainTableMerger is used to check whether the main table has been modified; if it has, the original main table in the production environment is overwritten; if not, the data is kept as it is. For an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if it has, the original association table in the production environment is overwritten; if not, the data is kept as it is.
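The Hive table check is the branch with the most cases, so a small sketch may help. The following Python is an illustration under stated assumptions only: `HiveTableMeta`, `check_hive_table`, and the returned action strings are hypothetical names, the comparison is plain equality on exported metadata, and a table without partitions or buckets is assumed to fall through to the MySQL update, since the text only says "otherwise".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HiveTableMeta:
    """Hypothetical shape of the exported metadata for one Hive table."""
    name: str
    columns: Tuple[str, ...]
    partition_keys: Tuple[str, ...] = ()
    bucket_keys: Tuple[str, ...] = ()


def check_hive_table(incoming: HiveTableMeta, existing: Optional[HiveTableMeta]) -> str:
    """Return the action a HiveTableMerger-style check would take for one table."""
    if existing is None:
        return "insert"        # table is missing in production
    if incoming == existing:
        return "keep"          # unmodified: leave production data as it is
    old_layout = (existing.partition_keys, existing.bucket_keys)
    new_layout = (incoming.partition_keys, incoming.bucket_keys)
    has_layout = any(old_layout) or any(new_layout)
    if has_layout and old_layout != new_layout:
        return "prompt_user"   # partition or bucket change is left to the user
    return "update_mysql"      # assumption: every other modification updates stored metadata
```

Returning an action instead of performing it keeps the decision separate from the side effects, so the same function could drive a dry run that only reports what an import would change.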
Those skilled in the art can make various corresponding changes and modifications based on the above technical solutions and concepts, and all such changes and modifications shall fall within the protection scope of the present invention.
Claims (3)
1. A task migration method for a heterogeneous big data management platform, characterized by comprising the following specific process:
exporting metadata information of a development environment and importing it into a production environment, verifying the data during import, and executing the corresponding operation according to the verification result; wherein:
for database data, using HiveDataBaseMerger to check the data, checking whether the Hive database exists in the production environment, and, if the Hive database does not exist, prompting a user to request that the cluster create the database;
for Hive table data, using HiveTableMerger to check the data, checking whether the Hive table exists in the production environment, and, if the Hive table does not exist, inserting the Hive table; if the Hive table exists, further checking whether the Hive table to be imported has been modified relative to the original Hive table; if not, performing no operation and keeping the data as it is; if so, further checking whether the Hive table has partitions or buckets and, if it does, checking whether the partition or bucket information has changed; if that information has changed, prompting the user to make the modification; otherwise, modifying the information stored in MySQL;
for Hive table field data, using HiveColumnMerger to check the data, first checking whether the Hive table field exists in the production environment; if the field does not exist, inserting the field; if the field exists, further checking whether the field has changed; if so, changing the corresponding field; otherwise, performing no operation and keeping the data as it is;
for a scheduling task, using SchedulerMerger to check the data, checking whether the scheduling task exists in the production environment; if the scheduling task does not exist, inserting the scheduling task; if the scheduling task exists, checking whether the information of the scheduling task has changed; if it has changed, overwriting the scheduling task and modifying its state; if it has not changed, performing no operation and keeping the data as it is.
2. The method of claim 1, wherein JobBaseMerger is used to check whether the task information has been modified; if so, the original task information in the production environment is overwritten; if not, the data is kept as it is.
3. The method of claim 1, wherein MainTableMerger is used to check whether the main table has been modified; if so, the original main table in the production environment is overwritten; if not, the data is kept as it is; and for an association table that carries the unique key of the main table, relatedMerger is used to check whether the data has been modified; if so, the original association table in the production environment is overwritten; if not, the data is kept as it is.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111256715.XA CN113961625B (en) | 2021-10-27 | 2021-10-27 | Task migration method for heterogeneous big data management platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111256715.XA CN113961625B (en) | 2021-10-27 | 2021-10-27 | Task migration method for heterogeneous big data management platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113961625A (en) | 2022-01-21 |
CN113961625B (en) | 2022-06-07 |
Family
ID=79467596
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111256715.XA Active CN113961625B (en) | 2021-10-27 | 2021-10-27 | Task migration method for heterogeneous big data management platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113961625B (en) |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103176988A (en) * | 2011-12-21 | 2013-06-26 | 上海博腾信息科技有限公司 | Data migration system based on software-as-a-service (SaaS) |
US20130205028A1 (en) * | 2012-02-07 | 2013-08-08 | Rackspace Us, Inc. | Elastic, Massively Parallel Processing Data Warehouse |
CN103605663A (en) * | 2013-10-22 | 2014-02-26 | 芜湖大学科技园发展有限公司 | General database checking and metadata loading method |
CN104573100A (en) * | 2015-01-29 | 2015-04-29 | 无锡江南计算技术研究所 | Step-by-step database synchronization method with autoincrement identifications |
CN105740411A (en) * | 2016-01-30 | 2016-07-06 | 武汉大学 | SOA (Service-Oriented Architecture) and WebService based data migration method |
CN108241632A (en) * | 2016-12-23 | 2018-07-03 | 航天星图科技(北京)有限公司 | A kind of data verification method of data base-oriented Data Migration |
CN108959470A (en) * | 2018-06-20 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of database data cross-platform migration method and device |
CN109508355A (en) * | 2018-10-19 | 2019-03-22 | 平安科技(深圳)有限公司 | A kind of data pick-up method, system and terminal device |
CN109829009A (en) * | 2018-12-28 | 2019-05-31 | 北京邮电大学 | Configurable isomeric data real-time synchronization and visual system and method |
CN109997125A (en) * | 2016-09-15 | 2019-07-09 | 英国天然气控股有限公司 | System for importing data to data storage bank |
CN110069335A (en) * | 2019-05-07 | 2019-07-30 | 江苏满运软件科技有限公司 | Task processing system, method, computer equipment and storage medium |
CN110505228A (en) * | 2019-08-23 | 2019-11-26 | 上海宽带技术及应用工程研究中心 | Big data processing method, system, medium and device based on edge cloud framework |
CN111259006A (en) * | 2019-11-19 | 2020-06-09 | 中国科学院计算机网络信息中心 | Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system |
CN111930850A (en) * | 2020-09-24 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Data verification method and device, computer equipment and storage medium |
CN112035444A (en) * | 2020-09-03 | 2020-12-04 | 中国银行股份有限公司 | Method and device for transferring image data between heterogeneous systems without stopping |
US10909120B1 (en) * | 2016-03-30 | 2021-02-02 | Groupon, Inc. | Configurable and incremental database migration framework for heterogeneous databases |
CN112328539A (en) * | 2020-10-27 | 2021-02-05 | 深圳市赛宇景观设计工程有限公司 | Data migration method based on big data |
CN113434482A (en) * | 2021-06-28 | 2021-09-24 | 平安国际智慧城市科技股份有限公司 | Data migration method and device, computer equipment and storage medium |
CN113468143A (en) * | 2021-07-22 | 2021-10-01 | 咪咕数字传媒有限公司 | Data migration method, system, computing device and storage medium |
Non-Patent Citations (3)
Title |
---|
扎心了,老铁: "Hive学习之路(一)Hive初识", 《HTTPS://WWW.CNBLOGS.COM/QINGYUNZONG/P/8707885.HTML》 * |
胡静: "基于Hadoop平台的服务调度管理系统设计与实现", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
阿里云开发者社区: "Hive关于merge的几个参数", 《HTTPS://DEVELOPER.ALIYUN.COM/ARTICLE/476804》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113961625B (en) | 2022-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110069572B (en) | HIVE task scheduling method, device, equipment and storage medium based on big data platform | |
CN110209650B (en) | Data normalization and migration method and device, computer equipment and storage medium | |
CN103793424B (en) | database data migration method and system | |
CN106603264A (en) | Method and equipment for positioning fault root | |
CN106021445B (en) | It is a kind of to load data cached method and device | |
CN106777101B (en) | Data processing engine | |
CN114399227A (en) | Production scheduling method and device based on digital twins and computer equipment | |
CN116483586B (en) | Data efficient processing method based on dynamic array | |
CN107168758A (en) | The code compilation inspection method and device of many code libraries | |
CN109005198A (en) | A kind of controller attack protection security strategy generation method and system | |
CN105528381A (en) | Database data migration method and system | |
CN104298761A (en) | Implementation method for master data matching between heterogeneous software systems | |
CN112181477A (en) | Complex event processing method and device and terminal equipment | |
CN110134646B (en) | Knowledge platform service data storage and integration method and system | |
CN110705969A (en) | Transformer substation monitoring system, main station and method for automatically associating measuring point ID | |
CN106708902A (en) | Database data migration method and system | |
CN113961625B (en) | Task migration method for heterogeneous big data management platform | |
CN105630778A (en) | DB data migration method and system | |
CN116540638B (en) | Method, device and storage medium for post-processing CAM numerical control machining program | |
CN111625330A (en) | Cross-thread task processing method and device, server and storage medium | |
CN116627609A (en) | Hive batch processing-based scheduling method and device | |
US20150212799A1 (en) | Migration between model elements of different types in a modeling environment | |
CN109165325A (en) | Method, apparatus, equipment and computer readable storage medium for cutting diagram data | |
CN111538715B (en) | Method and device for migrating wind control scheme and electronic equipment | |
CN114116503A (en) | Test method, test device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||