CN112597242B - Extraction method based on application system data slices related to batch tasks - Google Patents

Info

Publication number
CN112597242B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011485803.2A
Other languages
Chinese (zh)
Other versions
CN112597242A (en)
Inventor
张妍洁
唐振华
朱小容
杨斌
廖雪强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd
Priority to CN202011485803.2A
Publication of CN112597242A
Application granted
Publication of CN112597242B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for extracting data slices from an application system that runs batch tasks, and belongs to the technical field of data processing. It addresses two problems: when big data slice extraction and the application system's batch tasks run serially, the total batch time is lengthened; and when the extraction has not finished within the specified time, the batch tasks that resume pollute the data being extracted. The method comprises the following steps: preprocessing the database; interrupting master-slave synchronization; running the big data extraction in parallel with the application system's subsequent batch tasks; and restoring master-slave synchronization. The aims of the invention are to shorten the end-of-day batch time and to avoid polluting the data slice to be extracted. The invention is suited to banks and other financial institutions that run many batch tasks and need a big data platform to extract data slices.

Description

Extraction method based on application system data slices related to batch tasks
Technical Field
The invention relates to the technical field of data processing, in particular to an extraction method based on application system data slices related to batch tasks.
Background
The database cluster of the application system uses a master-slave architecture. When a write is performed on an instance of the master library and succeeds, an automatic synchronization mechanism is triggered that replicates all operations performed on the master library in the current period to the slave library and the standby library.
In normal use, the master library serves the application system's add, delete, and query operations, the slave library serves some of the application's queries, and the standby library is used for big data extraction.
Convention: for big data extraction, the application system's data is synchronized from the master library to the slave library and then from the slave library to the standby library; the application's own use of the slave library is not relevant here. To stay consistent with industry terminology, the synchronization that actually ends at the standby library is referred to below as "master-slave synchronization", and the standby library is referred to as the "slave library".
In the prior art, the big data platform may need to extract a data slice as of a certain point in time within the application system's sequence of batch tasks. During extraction, to keep later batch tasks from changing the data being extracted and thereby polluting it, the batch run is paused for a period and resumes after waiting for the big data extraction to finish; the length of the pause is configurable.
The application systems referred to in this scheme are batch-task application systems in the financial industry, such as a credit core application system and a line core application system. Divided by function, they include systems for batch processing of credit business, batch processing of partner reconciliation (account-checking) files, and batch processing of general ledger transaction records.
In summary, the conventional extraction method has two problems:
1. When the big data extraction runs abnormally and has not finished by the time the configured wait expires, the application system resumes its batch tasks and pollutes the data slice being extracted.
2. The application system's batch tasks must pause midway to wait for the big data extraction, which lengthens the overall execution time.
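The two problems can be made concrete with a small timing model. This is only an illustrative sketch: the function name `conventional_run` and all durations are hypothetical and do not come from the patent.

```python
def conventional_run(pre_batch, pause, extraction, post_batch):
    """Model the pause-and-wait approach; all arguments are minutes."""
    # Batch tasks resume when the configured pause elapses, whether or not
    # the extraction has finished by then.
    resume_at = pre_batch + pause
    extraction_done_at = pre_batch + extraction
    polluted = extraction_done_at > resume_at           # problem 1
    total_batch_time = pre_batch + pause + post_batch   # problem 2
    return polluted, total_batch_time


# Extraction finishes inside the pause: clean slice, but 30 idle minutes.
print(conventional_run(pre_batch=60, pause=30, extraction=20, post_batch=90))
# Extraction overruns the pause: resumed batch tasks change data mid-extraction.
print(conventional_run(pre_batch=60, pause=30, extraction=45, post_batch=90))
```

In both calls the batch run pays for the full pause; in the second it also pollutes the slice, because the pause is a guess about how long the extraction will take.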
Disclosure of Invention
The prior art has two problems: when the big data extraction runs abnormally and has not finished by the time the configured wait expires, the resumed batch tasks pollute the data slice being extracted; and the batch tasks must pause midway to wait for the extraction, which lengthens their total execution time. To address these problems, the invention provides a method for extracting data slices from an application system that runs batch tasks. Its aims are to let big data extraction and batch task execution proceed at the same time, to avoid polluting the data slice to be extracted, and to shorten the total execution time of the batch tasks.
To achieve the above aims, the invention adopts the following technical scheme:
An extraction method based on application system data slices related to batch tasks comprises the following steps:
Step A, preprocess the database: add one database to the same instance of each of the application system's master library and slave library, create a table X in the added database of the master library, and create a table X' in the added database of the slave library, where table X and table X' have the same name;
Step B: when the application system's batch run reaches the big data extraction node, interrupt the master-slave synchronization task, producing an interrupt node;
Step C: the big data platform extracts the application system's data from the slave library while the application system continues to execute its subsequent batch tasks;
Step D: resume master-slave data synchronization from the interrupt node.
In the invention, the table in the slave library is renamed and a record is then inserted into the corresponding table in the master library. When that record is about to be synchronized to the slave library, the target table cannot be found, so master-slave synchronization breaks off and an interrupt node is produced. From that point on, no modification to any table in the master library can be synchronized to the slave library, and the big data platform starts extracting the data slice. Once the big data platform has finished extracting from the slave library and has sent an extraction-complete notification, the application system renames the slave library table so that its name again matches the master library table, and master-slave synchronization resumes. Big data extraction from the slave library and the application system's subsequent batch tasks can therefore run at the same time; the application system no longer pauses its batch run to wait, which shortens the total execution time of the batch tasks.
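The interrupt-and-resume behaviour described here can be sketched with a toy model of an ordered change log. This is a simplified illustration, not the replication mechanism of any particular database product: the class `ToyReplication`, the business table `loan`, and the temporary name `X_renamed` are all hypothetical.

```python
class ToyReplication:
    """Toy model of an ordered change log replayed from master to slave."""

    def __init__(self, slave_tables):
        self.slave = {name: [] for name in slave_tables}  # table name -> rows
        self.log = []      # ordered (table, row) changes written on the master
        self.applied = 0   # index of the next log entry to apply on the slave

    def write(self, table, row):
        """Record a master-side change, then try to replicate."""
        self.log.append((table, row))
        self.sync()

    def rename_slave_table(self, old, new):
        self.slave[new] = self.slave.pop(old)

    def sync(self):
        """Apply pending changes in order; stop at the first one that fails."""
        while self.applied < len(self.log):
            table, row = self.log[self.applied]
            if table not in self.slave:
                return False      # interrupt node: nothing later is applied
            self.slave[table].append(row)
            self.applied += 1
        return True


repl = ToyReplication(["loan", "X"])
repl.write("loan", "pre-extraction change")    # replicated normally

repl.rename_slave_table("X", "X_renamed")      # slave no longer has a table X
repl.write("X", "marker")                      # cannot be applied: sync halts
repl.write("loan", "post-extraction change")   # queued behind the marker
print(repl.slave["loan"])                      # only the pre-extraction change

repl.rename_slave_table("X_renamed", "X")      # extraction has finished
repl.sync()                                    # backlog is replayed in order
print(repl.slave["loan"])
```

The key property is that the log is applied strictly in order, so one change that cannot be applied holds back every later change until the table name is restored.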
Further, step A specifically comprises: adding one database to the same instance of each of the application system's master library and slave library, creating a table X in the added database of the master library, and creating a table X' in the added database of the slave library, where table X and table X' have the same name. The name of the new tables may be chosen freely according to habit and need, so that it is easy to understand and manage.
Further, step B specifically comprises: when the application system's batch run reaches the big data extraction node, writing a "reached the extraction node" flag into a business table in the master library, renaming table X' in the slave library, and adding an arbitrary record to table X in the master library.
The invention is implemented in the application system's daily batch business logic. After the extraction node is reached and before the subsequent batch tasks run, the system first writes the "reached the extraction node" flag into the master library business table, then renames table X' in the slave library, and finally adds an arbitrary record to table X in the master library; the record needs no business meaning. When the new record in table X is synchronized toward table X', no table with the same name as table X can be found, so the synchronization of that record fails. This failure in turn blocks all subsequent master library changes from being synchronized to the slave library, so master-slave synchronization is deliberately interrupted.
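The order of the three operations in step B can be sketched in code. The patent does not name a database product, so the example below uses two in-memory SQLite databases purely as stand-ins; SQLite does not replicate, so only the statements and their order are shown. The table names `x`, `x_off`, and `batch_flag`, the function name, and the sample date are hypothetical.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the master and the slave.
master = sqlite3.connect(":memory:")
slave = sqlite3.connect(":memory:")

# Step A (preprocessing): a table with the same name on both sides.
master.execute("CREATE TABLE x (biz_date TEXT PRIMARY KEY)")
slave.execute("CREATE TABLE x (biz_date TEXT PRIMARY KEY)")
# Hypothetical business table that carries the extraction flag.
master.execute("CREATE TABLE batch_flag (biz_date TEXT PRIMARY KEY, status TEXT)")


def reach_extraction_node(biz_date):
    """Run the three step B operations in the order the description gives."""
    # 1. Signal that the extraction node has been reached.
    master.execute("INSERT INTO batch_flag VALUES (?, 'ready')", (biz_date,))
    # 2. Rename the slave-side table so the next change to x cannot be applied.
    slave.execute("ALTER TABLE x RENAME TO x_off")
    # 3. Write a record with no business meaning into the master-side table.
    master.execute("INSERT INTO x VALUES (?)", (biz_date,))
    master.commit()
    slave.commit()


reach_extraction_node("2024-01-31")

slave_tables = [row[0] for row in slave.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(slave_tables)
```

After the call, the slave side no longer has a table named `x`, which is the condition that would make a real replication stream fail on the record inserted in the third operation.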
Further, step C specifically comprises: the big data platform starts its polling task and begins extracting from the slave library once it polls the "reached the extraction node" flag.
Further, in step C, while the big data platform is extracting, the application system continues to execute its subsequent batch tasks; changes to the master library's data do not trigger synchronization to the slave library.
Because master-slave synchronization is interrupted, data changes made by the continuing batch tasks are not synchronized to the slave library during extraction, so the extracted data is not polluted.
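The polling in step C might look roughly like the following. This is a minimal sketch under stated assumptions: the helper `wait_for_flag`, its `interval` and `timeout` parameters, and the `"ready"` value are illustrative, and the two dictionaries merely stand in for a frozen slave library and a master library that keeps changing.

```python
import time


def wait_for_flag(read_flag, interval=0.01, timeout=1.0):
    """Poll read_flag() until it returns 'ready' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_flag() == "ready":
            return True
        time.sleep(interval)
    return False


# Stand-in for querying the flag: not set twice, then "ready".
answers = iter([None, None, "ready"])
slave_data = {"loan": [100, 200]}    # slave contents, frozen at the interrupt
master_data = {"loan": [100, 200]}   # master keeps changing during extraction

extracted = None
if wait_for_flag(lambda: next(answers)):
    master_data["loan"].append(300)          # a later batch task changes the master
    extracted = list(slave_data["loan"])     # extraction reads the slave only
print(extracted)
```

Because the extraction reads only the slave side, the change made on the master after the flag was seen does not appear in the extracted slice.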
Further, step D specifically comprises: after the extraction is complete, the big data platform sends an extraction-complete notification to the application system; on receiving it, the application system renames table X' in the slave library back to its original name, and all data that has not been synchronized to the slave library since the interrupt node is synchronized in order, restoring master-slave data synchronization.
After the big data extraction is complete, master-slave synchronization is restored by renaming slave table X', and all data that failed to synchronize after the interrupt node is synchronized in order until the master and slave libraries are consistent.
In summary, the technical scheme above gives the invention the following beneficial effects:
1. The method controls the disconnection and recovery of master-slave synchronization by renaming the newly added table in the slave library, which prevents subsequent batch tasks from polluting the data slice to be extracted.
2. Big data extraction from the slave library and the application system's batch tasks can run at the same time, which shortens the total time spent on batch processing.
3. The master-slave synchronization switch is held entirely in the application system's business logic code, which makes it convenient, flexible, and controllable.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific implementation of an embodiment of the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
The invention will be further described with reference to the drawings and detailed description.
FIG. 1 shows an embodiment of the extraction method based on application system data slices related to batch tasks, which comprises the following steps:
Step A: preprocess the database.
Specifically: add one database to the same instance of each of the application system's master library and slave library, create a table X in the added database of the master library, and create a table X' in the added database of the slave library, where table X and table X' have the same name and there is no naming requirement.
Step B: when the big data extraction node is reached, interrupt the master-slave synchronization task, producing an interrupt node.
Specifically: when the application system's batch run reaches the big data extraction node, write a "reached the extraction node" flag into a business table in the master library, rename table X' in the slave library, and add an arbitrary record to table X in the master library.
Step C: the big data platform starts its polling task and, once it polls the "reached the extraction node" flag, begins extracting data from the slave library. Meanwhile, the application system continues to execute its subsequent batch tasks; changes to the master library's data do not trigger synchronization to the slave library.
Step D: resume master-slave data synchronization from the interrupt node.
Specifically: after the extraction is complete, the big data platform sends an extraction-complete notification to the application system; on receiving it, the application system renames table X' in the slave library back to its original name, and all data that has not been synchronized to the slave library since the interrupt node is synchronized in order, restoring master-slave data synchronization.
Application system embodiment:
A new database A is created in the application system's master library and a new database B in its slave library, and a new table named "switch_on" is created in both database A and database B. Each day, when the application system's batch run reaches the big data extraction node, it first writes a "ready" flag into the master library; it then renames the table in the slave library from "switch_on" to "switch_off"; and finally it writes a record R, whose primary key is the current business date, into the "switch_on" table in the master library. Meanwhile, the big data platform's polling task begins extracting from the slave library once it polls the "ready" flag.
When record R is synchronized from "switch_on" in the master library to "switch_on" in the slave library, the "switch_on" table is found not to exist in the slave library, so the record cannot be synchronized and the synchronization thread is interrupted; it stays at that node and keeps polling to check whether the data can be synchronized. In this state, no modification to any table in the master library can be synchronized to the slave library. The application system can go on with its subsequent batch tasks without concern that the resulting data changes will be synchronized to the slave library and pollute the data slice being extracted. The application system's subsequent batch tasks and the big data extraction thus run in parallel without affecting each other.
After the big data extraction is complete, an MQ notification of completion is sent. After the application subscribes to and consumes this notification, it renames the "switch_off" table in the slave library back to "switch_on". The table names in the master and slave libraries now match, master-slave synchronization restarts from the interrupt node, and all data that failed to synchronize is synchronized in order until the master and slave libraries are fully consistent.
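The notification hand-off at the end of the embodiment can be sketched as a producer and a consumer. The embodiment does not say which message queue is used, so Python's in-process `queue.Queue` stands in for it here; the message text `"extraction_done"`, the function names, and the set that models the slave library's table names are assumptions made for illustration.

```python
import queue
import threading

mq = queue.Queue()                # stand-in for the message queue
slave_tables = {"switch_off"}     # table name while replication is halted
restored = threading.Event()


def application_consumer():
    """Consume the completion notice and rename the slave table back."""
    message = mq.get()            # blocks until a notice arrives
    if message == "extraction_done":
        slave_tables.remove("switch_off")
        slave_tables.add("switch_on")   # names match again; sync can resume
        restored.set()


def big_data_extractor():
    """Pretend to extract the slice, then publish the completion notice."""
    mq.put("extraction_done")


consumer = threading.Thread(target=application_consumer)
consumer.start()
big_data_extractor()
consumer.join(timeout=5)
print(sorted(slave_tables), restored.is_set())
```

The rename back to "switch_on" is triggered only by the completion notice, so the length of the interruption follows the actual extraction time rather than a preconfigured pause.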
The above are merely representative examples of the many specific applications of the present invention and should not be construed as limiting its scope of protection in any way. All technical schemes formed by transformation or equivalent substitution fall within the scope of protection of the invention.

Claims (4)

1. An extraction method based on application system data slices related to batch tasks, characterized by comprising the following steps:
Step A, preprocessing a database, specifically: adding one database to the same instance of each of the master library and the slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library;
Step B, interrupting master-slave synchronization, specifically: when the batch tasks executed by the application system reach the big data extraction node, notifying the big data platform that extraction can be performed, renaming table X' in the slave library, and adding an arbitrary record to table X in the master library;
Step C, the big data platform extracting the application system's data from the slave library while the application system continues to execute the subsequent batch tasks;
Step D, restoring master-slave synchronization, specifically: after the big data extraction is complete, sending an extraction-complete notification to the application system; after the application system receives the notification, renaming table X' in the slave library back to its original name; and synchronizing, in order, all data not synchronized to the slave library since the interrupt node, thereby restoring master-slave data synchronization.
2. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that step A specifically comprises:
the new table X in the master library and the new table X' in the slave library have the same table name, with no naming requirement.
3. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that step C specifically comprises:
after the big data platform receives the extraction notification, it starts extracting data from the slave library.
4. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that, in step C, while the big data platform is extracting, the application system continues to execute the subsequent batch tasks, and changes to the master library's data do not trigger synchronization to the slave library.
CN202011485803.2A 2020-12-16 2020-12-16 Extraction method based on application system data slices related to batch tasks Active CN112597242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011485803.2A CN112597242B (en) 2020-12-16 2020-12-16 Extraction method based on application system data slices related to batch tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011485803.2A CN112597242B (en) 2020-12-16 2020-12-16 Extraction method based on application system data slices related to batch tasks

Publications (2)

Publication Number Publication Date
CN112597242A (en) 2021-04-02
CN112597242B (en) 2023-06-06

Family

ID=75196384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011485803.2A Active CN112597242B (en) 2020-12-16 2020-12-16 Extraction method based on application system data slices related to batch tasks

Country Status (1)

Country Link
CN (1) CN112597242B (en)

Also Published As

Publication number Publication date
CN112597242A (en) 2021-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant