CN112597242B - Extraction method based on application system data slices related to batch tasks - Google Patents
- Publication number: CN112597242B
- Application number: CN202011485803.2A
- Authority: CN (China)
- Prior art keywords: data, slave, library, application system, master
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/275—Synchronous replication
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an extraction method based on application system data slices related to batch tasks, belonging to the technical field of data processing. It addresses two problems: running big data slice extraction serially with the application system's batch tasks lengthens the batch time, and if extraction does not finish within the specified time, the batch tasks that resume pollute the data being extracted. The method comprises the following steps: preprocessing the database; interrupting master-slave synchronization; running big data extraction in parallel with the application system's subsequent batch tasks; and restoring master-slave synchronization. The purpose of the invention is to shorten the end-of-day batch time and to avoid polluting the data slice to be extracted. The invention is suitable for banks or financial institutions that run many batch tasks and need big data to extract data slices.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an extraction method based on application system data slices related to batch tasks.
Background
The database cluster of the application system adopts a master-slave architecture. When a write operation is performed in the same instance of the master library and the data is written successfully, an automatic synchronization mechanism is triggered that synchronizes all operations in the current time period of the master library to the slave library and the standby library.
In use, the master library serves the application system's add, delete, and query operations, the slave library serves part of the application's query operations, and the standby library serves big data extraction.
Convention: for big data extraction, the application system synchronizes data from the master library to the slave library and then from the slave library to the standby library; this application does not concern the other uses of the slave library. To stay consistent with industry terminology, the actual master-to-standby synchronization is referred to below as "master-slave synchronization", and every standby library is referred to as a "slave library".
In the prior art, big data may need to extract a data slice as of a certain point in time within the application system's sequence of batch tasks. During extraction, to prevent subsequent batch tasks from changing the data to be extracted and thereby polluting it, the batch tasks are suspended for a period and resume after big data extraction ends; the duration of the pause is configurable.
The application system in this scheme refers to a financial-industry application system that runs batch tasks, such as a credit core application system or an in-bank core application system. Divided by function, it may be an application system for batch processing of credit business, batch processing of lending partners' reconciliation files, batch processing of general-ledger transaction records, and so on.
In summary, the conventional extraction method has two problems:
1. When big data extraction runs abnormally and has not finished by the time the configured wait expires, the application system resumes its batch tasks, which pollutes the data slice to be extracted.
2. The application system's batch tasks must pause midway to wait for big data extraction, which prolongs the overall execution time.
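The first problem can be illustrated with a toy model. The sketch below is not from the patent: it assumes extraction copies one row per time tick and that the batch resumes as soon as a fixed pause (`pause_ticks`, a hypothetical parameter) expires, whether or not extraction has finished.

```python
def extract_with_fixed_pause(rows, pause_ticks):
    """Copy one row per tick; the batch resumes once pause_ticks have elapsed."""
    table = list(rows)
    data_slice = []
    for tick in range(len(table)):
        if tick >= pause_ticks:
            # The pause has expired: the resumed batch modifies rows
            # that extraction has not copied yet.
            table[tick] = table[tick] + " (modified)"
        data_slice.append(table[tick])
    return data_slice


# Pause longer than the extraction: the slice is clean.
clean = extract_with_fixed_pause(["a", "b", "c"], pause_ticks=5)
# Pause shorter than the extraction: the last row is polluted.
polluted = extract_with_fixed_pause(["a", "b", "c"], pause_ticks=2)
```

Making the pause longer avoids the pollution but worsens the second problem, because the batch then waits even when extraction finishes early.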
Disclosure of Invention
The prior art has two problems: when big data extraction runs abnormally and has not finished when the wait expires, the application system's resumed batch tasks pollute the data slice to be extracted; and the batch tasks must pause midway to wait for big data extraction, which prolongs their total execution time. To address these problems, the invention provides an extraction method based on application system data slices related to batch tasks. Its aim is to let big data extraction and batch task execution proceed at the same time, to avoid polluting the data slice to be extracted, and to shorten the total execution time of the batch tasks.
In order to achieve the above purpose, the invention adopts the following technical scheme:
An extraction method based on application system data slices related to batch tasks comprises the following steps:
Step A: preprocessing the database, specifically: adding a database in the same instance of each of the master library and the slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library, wherein table X and table X' have the same name;
Step B: when the application system's batch execution reaches the big data extraction node, interrupting the master-slave synchronization task and generating an interrupt node;
Step C: big data extracts the application system's data from the slave library while the application system continues to execute the subsequent batch tasks;
Step D: restoring master-slave data synchronization from the master-slave synchronization interrupt node.
In the invention, the table in the slave library is renamed and a record is inserted into the corresponding table in the master library. When that record is about to be synchronized to the slave library, the corresponding table cannot be found, so master-slave synchronization breaks and an interrupt node is generated. From this point on, no modification to any table in the master library can be synchronized to the slave library, and big data starts extracting the data slice. After big data has finished extracting from the slave library and has sent the extraction-complete notification, the application system renames the slave-library table so that its name again matches the table in the master library, and master-slave synchronization continues. Big data extraction from the slave library and the application system's subsequent batch tasks can therefore run at the same time; the application system no longer pauses its batch to wait, and the total execution time of the batch tasks is shortened.
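The blocking effect described above can be sketched with a toy replication model. This is an illustration only, under the assumption that the slave applies the master's changes strictly in order and stops at the first change it cannot apply; the table names `loan`, `ledger`, and `X_paused` are hypothetical.

```python
def apply_pending(slave_tables, log, applied):
    """Apply log entries in order; stop at the first entry whose table is missing."""
    while applied < len(log):
        table, row = log[applied]
        if table not in slave_tables:
            break  # synchronization stays stuck at this entry
        slave_tables[table].append(row)
        applied += 1
    return applied


slave_tables = {"X": [], "loan": [], "ledger": []}
log = [("loan", "l1"), ("ledger", "g1")]
applied = apply_pending(slave_tables, log, 0)      # normal synchronization

slave_tables["X_paused"] = slave_tables.pop("X")   # rename X' on the slave
log.append(("X", "marker"))                        # record inserted into X on the master
log.append(("loan", "l2"))                         # later batch changes, on any table
log.append(("ledger", "g2"))
applied = apply_pending(slave_tables, log, applied)

pending = len(log) - applied                       # changes queued behind the marker
```

In this model the single unappliable record for X holds back the later changes to `loan` and `ledger` as well, so the slave's business tables stay frozen at their pre-interrupt contents.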
Further, step A specifically includes: adding a database in the same instance of each of the master library and the slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library, wherein table X and table X' have the same name. The name of the new tables in the master library and the slave library may be chosen freely according to habit and need, for ease of understanding and management.
Further, step B specifically includes: when the batch task executed by the application system reaches the big data extraction node, writing an "extraction node reached" mark into a business table of the master library, modifying the name of table X' in the slave library, and adding any record to table X in the master library.
The invention is implemented in the daily batch business logic of the application system. After the extraction node is reached and before the subsequent batch is executed, the application system first writes an "extraction node reached" mark into a business table of the master library, then modifies the name of table X' in the slave library, and finally adds any record to table X in the master library; the record need not have any business meaning. When the record is added to table X, master-slave synchronization toward table X' is initiated, but no table with the same name as table X can be found, so synchronization of that record fails. This failure in turn blocks all subsequent master-library data of the application system from synchronizing to the slave library, so master-slave synchronization is deliberately interrupted.
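The order of the three operations matters. The following sketch is an assumption-laden toy model, not the patent's implementation: the slave tries to catch up after every operation, and a change is applicable only if its table exists on the slave. It compares the order given above with a swapped order; the step names and the `biz` and `X_paused` table names are hypothetical.

```python
def run(steps):
    """Run the Step B operations in the given order; return True if sync ends up blocked."""
    slave_tables = {"X", "biz"}
    log = []
    state = {"applied": 0}

    def replicate():
        # Apply pending changes in order, stopping at the first missing table.
        while state["applied"] < len(log) and log[state["applied"]] in slave_tables:
            state["applied"] += 1

    for step in steps:
        if step == "flag":
            log.append("biz")            # mark written to a master business table
        elif step == "rename":
            slave_tables.discard("X")    # X' renamed on the slave
            slave_tables.add("X_paused")
        elif step == "insert":
            log.append("X")              # arbitrary record added to X on the master
        replicate()

    log.append("biz")                    # a later batch change on the master
    replicate()
    return state["applied"] < len(log)


blocked_in_stated_order = run(["flag", "rename", "insert"])
blocked_if_swapped = run(["flag", "insert", "rename"])
```

In this model, inserting before renaming lets the record replicate normally, so the later rename blocks nothing. Writing the mark first also means the mark itself is synchronized before the interruption takes effect.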
Further, step C specifically includes: big data starts a data-check task and, once it polls the "extraction node reached" mark, begins extracting data from the slave library.
Further, in step C, while big data is extracting, the application system continues to execute the subsequent batch tasks; the resulting changes to master-library data do not trigger synchronization to the slave library.
Because master-slave synchronization is interrupted, data changes caused by the continuing batch tasks are not immediately synchronized to the slave library during big data extraction, so the data being extracted is not polluted.
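A minimal sketch of the polling side of step C follows. The patent only says that big data polls for the mark; the polling interval, the timeout, and the `read_flag` callback are assumptions introduced for illustration, and the interleaved master writes merely stand in for a batch that keeps running.

```python
import time


def poll_until(read_flag, interval_s=0.01, timeout_s=1.0):
    """Poll read_flag() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_flag():
            return True
        time.sleep(interval_s)
    return False


polls = {"count": 0}


def read_flag():
    # Hypothetical check: the mark becomes visible on the third poll.
    polls["count"] += 1
    return polls["count"] >= 3


slave_rows = ["r1", "r2"]    # frozen, because synchronization is interrupted
master_rows = ["r1", "r2"]

extracted = []
if poll_until(read_flag):
    for row in slave_rows:
        extracted.append(row)
        master_rows.append(row + "-batch")   # the batch keeps changing the master
```

The extracted slice contains only the rows present at the interrupt node, even though the master has grown in the meantime.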
Further, step D specifically includes: after big data extraction is complete, big data sends a data-extraction-complete notification to the application system; after receiving the notification, the application system changes the name of table X' in the slave library back to its original name, and all data not synchronized to the slave library since the interrupt node is then synchronized in order, restoring master-slave data synchronization.
After big data extraction is complete, master-slave synchronization is restored by renaming slave-library table X', and all data whose synchronization failed after the interrupt node is synchronized in order until the data in the master and slave libraries agree.
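The recovery in step D can be sketched as renaming the table back and then replaying the backlog oldest-first. This is a toy model under the assumption that the unsynchronized changes are kept in an ordered queue; `resume_sync`, `X_paused`, and `ledger` are hypothetical names.

```python
from collections import deque


def resume_sync(slave_tables, backlog, paused_name, original_name):
    """Rename the table back, then replay the backlog in its original order."""
    slave_tables[original_name] = slave_tables.pop(paused_name)
    while backlog:
        table, row = backlog.popleft()   # oldest unsynchronized change first
        slave_tables[table].append(row)
    return slave_tables


master = {"X": ["marker"], "ledger": ["t1", "t2", "t3"]}
slave = {"X_paused": [], "ledger": ["t1"]}
backlog = deque([("X", "marker"), ("ledger", "t2"), ("ledger", "t3")])

resume_sync(slave, backlog, "X_paused", "X")
consistent = slave == master
```

Replaying in the original order is what lets the slave converge to the same contents as the master rather than merely the same set of rows.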
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
1. The method controls the disconnection and recovery of master-slave synchronization by renaming the newly added table in the slave library, which prevents subsequent batch tasks from polluting the data slice to be extracted.
2. Big data extraction from the slave library and the application system's batch tasks can run at the same time, which shortens the total time of batch task processing.
3. The master-slave synchronization switch is controlled entirely within the application system's business logic code, which makes it convenient, flexible, and controllable.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific implementation of an embodiment of the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
The invention will be further described with reference to the drawings and detailed description.
FIG. 1 shows an embodiment of the extraction method based on application system data slices related to batch tasks, comprising the following steps:
Step A: preprocessing the database.
Specifically: a database is added in the same instance of each of the master library and the slave library of the application system, a table X is created in the database added to the master library, and a table X' is created in the database added to the slave library; table X and table X' have the same name, and there is no naming requirement.
Step B: when the big data extraction node is reached, the master-slave synchronization task is interrupted and an interrupt node is generated.
Specifically: when the batch task executed by the application system reaches the big data extraction node, an "extraction node reached" mark is written into a business table of the master library, the name of table X' in the slave library is modified, and any record is added to table X in the master library.
Step C: big data extracts from the slave library while the application system continues its batch.
Specifically: big data starts a data-check task and, once it polls the "extraction node reached" mark, begins extracting data from the slave library. Meanwhile, the application system continues to execute the subsequent batch tasks; the resulting changes to master-library data do not trigger synchronization to the slave library.
Step D: restoring master-slave data synchronization from the master-slave synchronization interrupt node.
Specifically: after big data extraction is complete, big data sends a data-extraction-complete notification to the application system; after receiving the notification, the application system changes the name of table X' in the slave library back to its original name, and all data not synchronized to the slave library since the interrupt node is then synchronized in order, restoring master-slave data synchronization.
Application system embodiment:
A new database A is created in the master library of the application system and a new database B in the slave library, and a table named "switch_on" is created in each. When the application system's daily batch tasks reach the big data extraction node, the application system first writes a "ready" identifier into the master library, then renames the table in the slave library from "switch_on" to "switch_off", and finally writes a record R, whose primary key is the current business date, into the "switch_on" table in the master library. Meanwhile, the big data polling task starts extracting from the slave library once it polls the "ready" identifier.
When record R is synchronized from "switch_on" in the master library to "switch_on" in the slave library, the slave library finds that no "switch_on" table exists, so the record cannot be synchronized; the synchronization thread is interrupted and keeps polling at that node to check whether the data can be synchronized. In this state, no modification to any table in the master library can be synchronized to the slave library. The application system can continue with the subsequent batch tasks without concern that the resulting data changes will immediately synchronize to the slave library and pollute the data slice to be extracted. The application system's subsequent batch tasks and big data extraction thus run in parallel without affecting each other.
After big data extraction is complete, big data sends an MQ notification that extraction has finished. After the application system subscribes to and consumes this notification, it renames the slave-library table "switch_off" back to "switch_on". The table names in the master library and the slave library are then the same again, master-slave synchronization restarts from the interrupt node, and all data whose synchronization failed is synchronized in order until the data in the master and slave libraries is completely consistent.
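The embodiment can be walked through end to end with a small simulation. The sketch below is not the patent's implementation: it uses two in-memory SQLite databases as stand-ins for the master and slave libraries, a Python list as a stand-in for the master's ordered change log, and `queue.Queue` as a stand-in for the MQ. The `ledger` table, the business date value, and the helper names are hypothetical.

```python
import queue
import sqlite3

# isolation_level=None puts both connections in autocommit mode.
master = sqlite3.connect(":memory:", isolation_level=None)
slave = sqlite3.connect(":memory:", isolation_level=None)
for db in (master, slave):
    db.execute("CREATE TABLE switch_on (biz_date TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER)")

change_log = []            # ordered statements executed on the master
state = {"position": 0}    # index of the next statement the slave must apply


def write_master(sql, params=()):
    master.execute(sql, params)
    change_log.append((sql, params))


def sync():
    """Replay pending statements on the slave; stop at the first failure."""
    while state["position"] < len(change_log):
        sql, params = change_log[state["position"]]
        try:
            slave.execute(sql, params)
        except sqlite3.OperationalError:   # e.g. "no such table: switch_on"
            return False
        state["position"] += 1
    return True


write_master("INSERT INTO ledger VALUES (1, 100)")
in_sync_before = sync()

# Extraction node reached: rename on the slave, then write record R on the master.
slave.execute("ALTER TABLE switch_on RENAME TO switch_off")
write_master("INSERT INTO switch_on VALUES (?)", ("2020-12-16",))
write_master("UPDATE ledger SET amount = 999 WHERE id = 1")   # later batch task
interrupted = not sync()

# Big data extracts from the slave, then publishes a completion message.
data_slice = slave.execute("SELECT id, amount FROM ledger").fetchall()
mq = queue.Queue()
mq.put("extraction complete")

# The application consumes the message and restores the table name.
if mq.get() == "extraction complete":
    slave.execute("ALTER TABLE switch_off RENAME TO switch_on")
restored = sync()
final_rows = slave.execute("SELECT id, amount FROM ledger").fetchall()
```

In this run the extracted slice still shows the pre-interrupt amount, while the slave reflects the later batch update only after the table name is restored. A production replication channel has its own error handling and retry behavior, so whether it halts and resumes in this way depends on the database product and its configuration.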
The above are merely representative examples of the many specific applications of the present invention and should not be construed as limiting its scope in any way. All technical schemes formed by transformation or equivalent substitution fall within the protection scope of the invention.
Claims (4)
1. An extraction method based on application system data slices related to batch tasks, characterized by comprising the following steps:
step A, preprocessing a database, specifically: adding a database in the same instance of each of a master library and a slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library;
step B, interrupting master-slave synchronization, specifically: when the batch task executed by the application system reaches a big data extraction node, notifying big data that extraction can be performed, modifying the name of table X' in the slave library, and adding any record to table X in the master library;
step C, big data extracting the application system's data from the slave library while the application system continues to execute the subsequent batch tasks;
step D, restoring master-slave synchronization, specifically: after big data extraction is complete, sending a data-extraction-complete notification to the application system; after the application system receives the notification, changing the name of table X' in the slave library back to its original name; and synchronizing to the slave library, in order, all data not synchronized since the interrupt node, thereby restoring master-slave data synchronization.
2. The extraction method based on application system data slices related to batch tasks according to claim 1, wherein step A specifically comprises:
the new table X in the master library and the new table X' in the slave library have the same table name, and there is no naming requirement.
3. The extraction method based on application system data slices related to batch tasks according to claim 1, wherein step C specifically comprises:
after big data receives the extraction notification, it starts to extract data from the slave library.
4. The extraction method based on application system data slices related to batch tasks according to claim 1, wherein in step C, while big data is extracting, the application system continues to execute the subsequent batch tasks, and changes to master-library data do not trigger synchronization to the slave library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011485803.2A CN112597242B (en) | 2020-12-16 | 2020-12-16 | Extraction method based on application system data slices related to batch tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011485803.2A CN112597242B (en) | 2020-12-16 | 2020-12-16 | Extraction method based on application system data slices related to batch tasks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112597242A (en) | 2021-04-02 |
CN112597242B (en) | 2023-06-06 |
Family
ID=75196384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011485803.2A Active CN112597242B (en) | 2020-12-16 | 2020-12-16 | Extraction method based on application system data slices related to batch tasks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112597242B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114840393B (en) * | 2022-06-29 | 2022-09-30 | 杭州比智科技有限公司 | Multi-data-source data synchronous monitoring method and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477572A (en) * | 2009-01-12 | 2009-07-08 | 深圳市里王智通软件有限公司 | Method and system of dynamic data base based on TDS transition data storage technology |
CN102752372A (en) * | 2012-06-18 | 2012-10-24 | 天津神舟通用数据技术有限公司 | File based database synchronization method |
CN104331435A (en) * | 2014-10-22 | 2015-02-04 | 国家电网公司 | Low-influence high-efficiency mass data extraction method based on Hadoop big data platform |
CN105069142A (en) * | 2015-08-18 | 2015-11-18 | 山大地纬软件股份有限公司 | System and method for extraction, transformation and distribution of data increments |
CN106354865A (en) * | 2016-09-09 | 2017-01-25 | 北京奇虎科技有限公司 | Method, device and system for synchronizing master database and secondary database |
CN107766132A (en) * | 2017-06-25 | 2018-03-06 | 平安科技(深圳)有限公司 | Multi-task scheduling method, application server and computer-readable recording medium |
CN109241175A (en) * | 2018-06-28 | 2019-01-18 | 东软集团股份有限公司 | Method of data synchronization, device, storage medium and electronic equipment |
CN110019445A (en) * | 2017-09-08 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method of data synchronization and device calculate equipment and storage medium |
CN110032594A (en) * | 2019-03-21 | 2019-07-19 | 厦门市美亚柏科信息股份有限公司 | The data pick-up method, apparatus and storage medium of the Various database of customizable |
CN110175213A (en) * | 2019-05-27 | 2019-08-27 | 浪潮软件集团有限公司 | A kind of oracle database synchronization system and method based on SCN mode |
CN110196885A (en) * | 2019-06-13 | 2019-09-03 | 东方电子股份有限公司 | A kind of cloud distributed real-time database system |
CN110502583A (en) * | 2019-08-27 | 2019-11-26 | 深圳前海微众银行股份有限公司 | Distributed Data Synchronization method, apparatus, equipment and readable storage medium storing program for executing |
CN110647548A (en) * | 2019-09-23 | 2020-01-03 | 浪潮软件股份有限公司 | Method and system for converting streaming data into batch based on NiFi and state value thereof |
CN111881210A (en) * | 2020-06-29 | 2020-11-03 | 平安国际智慧城市科技股份有限公司 | Data synchronization method, device, intranet server and medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9060046B2 (en) * | 2008-02-18 | 2015-06-16 | Google Technology Holdings LLC | Method and apparatus for transferring media data between devices |
US10318355B2 (en) * | 2017-01-24 | 2019-06-11 | Oracle International Corporation | Distributed graph processing system featuring interactive remote control mechanism including task cancellation |
Non-Patent Citations (2)
Title |
---|
parallelism extraction algorithm from stream-based processing flow applying spanning tree;Guyue Wang 等;《2014 IEEE International parallel & Distributed Processing Symposium Workshops》;20141204;632-641 * |
复杂信息系统的数据提取、建模及其应用;黄鹏飞;《中国优秀硕士学位论文全文数据库 社会科学Ⅰ辑》;20190615(第06期);G113-42 * |
Also Published As
Publication number | Publication date |
---|---|
CN112597242A (en) | 2021-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3754514A1 (en) | Distributed database cluster system, data synchronization method and storage medium | |
US10565071B2 (en) | Smart data replication recoverer | |
WO2010028529A1 (en) | A method and device for maintaining a changelog in data synchronization | |
CN112597242B (en) | Extraction method based on application system data slices related to batch tasks | |
CN109901948B (en) | Remote double-active disaster recovery system of shared-nothing database cluster | |
CN111147560B (en) | Data synchronization method based on HTTP (hyper text transport protocol) and breakpoint continuous transmission | |
CN115438122A (en) | Data heterogeneous synchronization system | |
CN105574127A (en) | Quasi real-time disaster recovery method of distributed database system | |
CN110333973B (en) | Multi-machine hot standby method and system | |
CN111488243B (en) | Backup and recovery method and device for MongoDB database, electronic equipment and storage medium | |
CN113438111A (en) | Method for restoring RabbitMQ network partition based on Raft distribution and application | |
CN116185697B (en) | Container cluster management method, device and system, electronic equipment and storage medium | |
CN112800060A (en) | Data processing method and device, computer readable storage medium and electronic equipment | |
CN112068994A (en) | Method, apparatus, device and medium for data persistence during storage cluster runtime | |
CN116383161A (en) | File synchronization method, device and medium | |
CN108984660A (en) | A kind of MySQL database master-slave synchronisation data duplicate removal method | |
CN113297134B (en) | Data processing system, data processing method and device, and electronic device | |
CN111444281B (en) | Database parameter synchronization method and system | |
KR20100061983A (en) | Method and system for operating management of real-time replicated database | |
CN111680040A (en) | Data table processing method and device | |
CN114281607A (en) | Task synchronization method, device, main and standby system and storage medium | |
CN114510539B (en) | Method for generating and applying consistency check point of distributed database | |
CN106446031A (en) | Quick node replacement method for large-scale cluster database | |
CN109710690B (en) | Service driving calculation method and system | |
CN115794950A (en) | Snapshot library processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||