CN112597242B

CN112597242B - Extraction method based on application system data slices related to batch tasks

Info

Publication number: CN112597242B
Application number: CN202011485803.2A
Authority: CN
Inventors: 张妍洁; 唐振华; 朱小容; 杨斌; 廖雪强
Original assignee: Sichuan XW Bank Co Ltd
Current assignee: Sichuan XW Bank Co Ltd
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2023-06-06
Anticipated expiration: 2040-12-16
Also published as: CN112597242A

Abstract

The invention discloses a method for extracting application system data slices in connection with batch tasks, belonging to the technical field of data processing. It addresses two problems: running the big data slice extraction serially with the application system's batch tasks lengthens the batch run, and when the extraction does not finish within the specified time, the batch tasks that resume executing pollute the data being extracted. The method comprises the following steps: preprocessing the database; interrupting master-slave synchronization; running the big data extraction in parallel with the application system's subsequent batch tasks; and restoring master-slave synchronization. The purpose of the invention is to shorten the end-of-day batch run and to avoid polluting the data slice to be extracted. The invention is suitable for banks or financial institutions that run many batch tasks and need big data extraction of data slices.

Description

Extraction method based on application system data slices related to batch tasks

Technical Field

The invention relates to the technical field of data processing, in particular to an extraction method based on application system data slices related to batch tasks.

Background

The database cluster of the application system adopts a master-slave architecture. When a write operation is performed on an instance of the master library and the data is written successfully, an automatic synchronization mechanism is triggered that synchronizes all operations performed on the master library in the current time period to the slave library and the standby library.

In use, the master library serves the application system's add, delete, and query operations. The slave library serves some of the application's query operations. The standby library is used for big data extraction.

Convention: for big data extraction, the application system synchronizes data from the master library to the slave library and then from the slave library to the standby library, and the application's own use of the slave library is not involved. To stay consistent with industry terminology, the actual master-standby synchronization is referred to in the following description as "master-slave synchronization", and every standby library is referred to as a "slave library".

In the prior art, the big data platform may need to extract a data slice as of a certain point in time among the application system's batch tasks. During extraction, to prevent subsequent batch tasks from changing the data to be extracted and thereby polluting it, the batch tasks can be suspended for a period of time while waiting for the big data extraction to finish, and then resumed; the length of the pause is configurable.

The application system referred to in this scheme is an application system in the financial industry that involves batch tasks, such as a credit core application system or a line core application system. Divided by function, such application systems may handle batch processing of credit business, batch processing of reconciliation files from disbursing partners, batch processing of general ledger transaction records, and the like.

In summary, in the conventional extraction method, there are two problems:

1. When the big data extraction is abnormal and has not finished by the time the configured wait expires, the application system's batch tasks resume and pollute the data slice being extracted.

2. The application system's batch tasks must be suspended midway to wait for the big data extraction, which lengthens the overall execution time.
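The first problem can be illustrated with a minimal Python sketch. It assumes a hypothetical prior-art scheduler that pauses the batch for a fixed window and then resumes unconditionally; the function name and the minute values are illustrative assumptions and do not come from the patent.

```python
def prior_art_outcome(pause_minutes: float, extraction_minutes: float) -> dict:
    """Model the prior-art scheme: the batch pauses for a fixed window, then resumes."""
    # The batch resumes when the configured pause expires, whether or not
    # the extraction has actually finished.
    polluted = extraction_minutes > pause_minutes
    # The batch run is always lengthened by the full pause window.
    return {"slice_polluted": polluted, "batch_delay_minutes": pause_minutes}


# Hypothetical durations: a normal night and a night with a slow extraction.
normal = prior_art_outcome(pause_minutes=30, extraction_minutes=20)
abnormal = prior_art_outcome(pause_minutes=30, extraction_minutes=45)
```

In this model the batch pays the full pause on every run, and the slice is still polluted whenever the extraction overruns the window.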

Disclosure of Invention

The prior art has two problems: when the big data extraction is abnormal and has not finished by the time the wait expires, the application system's resumed batch tasks pollute the data slice to be extracted; and the batch tasks must be suspended midway to wait for the big data extraction, which lengthens the total batch execution time. To address these problems, the invention provides a method for extracting application system data slices in connection with batch tasks. Its aim is that the big data extraction and the batch tasks run at the same time, the data slice to be extracted is not polluted, and the total batch execution time is shortened.

In order to achieve the above purpose, the invention adopts the following technical scheme:

An extraction method based on application system data slices related to batch tasks comprises the following steps:

Step A: preprocess the database. Specifically, add a database to the same instance in each of the master library and the slave library of the application system, create a table X in the database added to the master library, and create a table X' in the database added to the slave library, where table X and table X' have the same name;

Step B: when the application system's batch run reaches the big data extraction node, interrupt the master-slave synchronization task, generating an interrupt node;

Step C: the big data platform extracts the application system's slave library data while the application system continues to execute the subsequent batch tasks;

Step D: restore master-slave data synchronization starting from the master-slave synchronization interrupt node.

In the invention, the table in the slave library is renamed and a record is inserted into the corresponding table in the master library. When that record is about to be synchronized to the slave library, the target table cannot be found, so master-slave synchronization breaks and an interrupt node is generated. From then on, no modification to any table of the master library is synchronized to the slave library. The big data platform then begins extracting the data slice. Once it has finished extracting from the slave library and sent the extraction-complete notification, the application system renames the slave library table so that its name again matches the master library table, and master-slave synchronization resumes. The extraction of slave library data and the application system's subsequent batch tasks thus run at the same time; the application system no longer pauses the batch to wait, and the total batch execution time is shortened.
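The mechanism described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the `Slave` class, the `loans` table, and the `x_off` name are hypothetical, and the strictly ordered "stop at the first event that cannot be applied" behavior is a simplification of the synchronization behavior the text relies on, not a model of any specific database product.

```python
from collections import deque


class Slave:
    """Toy slave library: tables are lists of rows, events apply strictly in order."""

    def __init__(self, tables):
        self.tables = {name: [] for name in tables}
        self.pending = deque()  # events received from the master, not yet applied

    def rename(self, old, new):
        self.tables[new] = self.tables.pop(old)

    def apply_pending(self):
        # Apply events in order; stop at the first one whose table is missing.
        while self.pending:
            table, row = self.pending[0]
            if table not in self.tables:
                return False  # synchronization is interrupted at this event
            self.tables[table].append(row)
            self.pending.popleft()
        return True


slave = Slave(["x", "loans"])
slave.rename("x", "x_off")                  # rename table X' on the slave
slave.pending.append(("x", {"id": 1}))      # marker record written to master table X
slave.pending.append(("loans", {"id": 7}))  # a later batch write on the master
blocked = not slave.apply_pending()         # the marker cannot be applied
slice_rows = list(slave.tables["loans"])    # the slice contains no later batch writes
slave.rename("x_off", "x")                  # restore the original name
caught_up = slave.apply_pending()           # the backlog is replayed in order
```

The point of the sketch is that the blocked marker record holds back every later event, so the slave stays frozen at the interrupt node until the name is restored.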

Further, step A specifically includes: adding a database to the same instance in each of the master library and the slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library, where table X and table X' have the same name. The name of the newly created tables can be chosen freely according to habit and need, for ease of understanding and management.

Further, step B specifically includes: when the application system's batch run reaches the big data extraction node, writing an "extraction node reached" mark into a business table of the master library, modifying the name of table X' in the slave library, and adding an arbitrary record to table X in the master library.

The invention is implemented in the application system's daily batch business logic. After the extraction node is reached and before the subsequent batch tasks are executed: first, an "extraction node reached" mark is written into a business table of the master library; second, the name of table X' in the slave library is modified; finally, an arbitrary record, which need not carry any business meaning, is added to table X in the master library. When the record is added to table X, master-slave synchronization toward table X' is initiated; because no table with the same name as table X can be found in the slave library, synchronization of that record fails, which in turn causes synchronization of all subsequent master library data of the application system to the slave library to fail. Master-slave synchronization is thereby interrupted deliberately.
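The ordering of the three actions can be sketched in Python. This is a minimal sketch under assumptions: two in-memory SQLite databases merely stand in for the master and slave libraries (SQLite does not replicate, so only the statement order is shown), and the names `batch_flags`, `table_x`, `table_x_off`, and `reach_extraction_node` are hypothetical.

```python
import sqlite3


def reach_extraction_node(master, slave, biz_date):
    """Run the three actions in the order the text gives."""
    # 1. Write the "extraction node reached" mark to a master business table.
    master.execute(
        "INSERT INTO batch_flags (biz_date, flag) VALUES (?, 'ready')", (biz_date,)
    )
    # 2. Rename table X' on the slave so the next event for X has no target.
    slave.execute("ALTER TABLE table_x RENAME TO table_x_off")
    # 3. Insert an arbitrary record into master table X (no business meaning).
    master.execute("INSERT INTO table_x (biz_date) VALUES (?)", (biz_date,))
    master.commit()
    slave.commit()


# Stand-ins for the master and slave libraries (assumption: no real replication).
master = sqlite3.connect(":memory:")
slave = sqlite3.connect(":memory:")
master.execute("CREATE TABLE batch_flags (biz_date TEXT, flag TEXT)")
for conn in (master, slave):
    conn.execute("CREATE TABLE table_x (biz_date TEXT PRIMARY KEY)")

reach_extraction_node(master, slave, "2020-12-16")
slave_tables = [
    r[0] for r in slave.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
```

Writing the mark first means it is recorded before the record that breaks synchronization, which matches the order stated in the text.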

Further, step C specifically includes: the big data platform starts a polling task and, once it polls the "extraction node reached" mark, begins extracting data from the slave library.

Further, in step C, while the big data platform is extracting data, the application system continues to execute the subsequent batch tasks; changes to the master library data do not trigger synchronization to the slave library.

Because master-slave synchronization is interrupted, data changes caused by the continuing batch tasks are not immediately synchronized to the slave library during the extraction, so they cannot affect the data being extracted and pollution is avoided.
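The polling side of step C might look like the following sketch. The function `wait_for_mark`, its parameters, and the fake reader are hypothetical; in a real deployment `read_mark` would be a query against wherever the mark is readable, which the text does not specify in detail.

```python
import time


def wait_for_mark(read_mark, interval_seconds=1.0, max_polls=60):
    """Poll until the 'extraction node reached' mark is visible, then return True."""
    for _ in range(max_polls):
        if read_mark():
            return True  # safe to start extracting from the slave library
        time.sleep(interval_seconds)
    return False  # the mark never appeared within the polling budget


# Stand-in for a real query: the mark becomes visible on the third poll.
answers = iter([False, False, True])
polls = []


def fake_read_mark():
    polls.append(1)
    return next(answers)


started = wait_for_mark(fake_read_mark, interval_seconds=0.0)
```

A bounded poll count is a design choice of this sketch, so that a batch run that never reaches the node does not leave the poller spinning indefinitely.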

Further, step D specifically includes: after the big data extraction is complete, a data-extraction-complete notification is sent to the application system; upon receiving it, the application system changes the name of table X' in the slave library back to its original name, and all data not synchronized to the slave library since the interrupt node is synchronized in order, restoring master-slave data synchronization.

After the big data extraction is complete, master-slave synchronization is restored by changing the name of slave library table X' back, and all data whose synchronization failed after the interrupt node is synchronized in order until the master library and slave library data agree.

In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:

1. The method effectively controls the interruption and restoration of master-slave synchronization by modifying the name of the newly added table in the slave library, and prevents subsequent batch tasks from polluting the data slice to be extracted.

2. The extraction of slave library data and the application system's batch tasks run at the same time, which shortens the total time consumed by batch processing.

3. The master-slave synchronization switch is controlled entirely in the application system's business logic code, which makes it convenient, flexible, and controllable.
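The time saving in the second effect can be made concrete with a small calculation. This sketch assumes that in the serial scheme the batch waits exactly as long as the extraction takes; the function name and the minute values are hypothetical and are not figures from the patent.

```python
def total_minutes(before: float, extraction: float, after: float, parallel: bool) -> float:
    """Elapsed time from batch start until both the batch and the extraction finish."""
    if parallel:
        # Proposed scheme: the remaining batch tasks and the extraction overlap.
        return before + max(extraction, after)
    # Serial scheme: the batch waits for the extraction before continuing.
    return before + extraction + after


# Hypothetical run: 60 min of batch before the node, 40 min extraction, 90 min after.
serial = total_minutes(60, 40, 90, parallel=False)
overlapped = total_minutes(60, 40, 90, parallel=True)
```

With these assumed durations the serial run takes 190 minutes and the overlapped run takes 150 minutes; the saving equals the shorter of the extraction time and the remaining batch time.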

Drawings

FIG. 1 is a schematic diagram of an embodiment of the present invention;

FIG. 2 is a schematic diagram of a specific implementation of an embodiment of the present invention.

Detailed Description

All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.

The invention will be further described with reference to the drawings and detailed description.

FIG. 1 shows an embodiment of the extraction method based on application system data slices related to batch tasks, which comprises the following steps:

Step A: preprocess the database.

Specifically: add a database to the same instance in each of the master library and the slave library of the application system, create a table X in the database added to the master library, and create a table X' in the database added to the slave library; table X and table X' have the same name, and there is no requirement on what that name is.

Step B: when the big data extraction node is reached, interrupt the master-slave synchronization task, generating an interrupt node.

Specifically: when the application system's batch run reaches the big data extraction node, write an "extraction node reached" mark into a business table of the master library, modify the name of table X' in the slave library, and add an arbitrary record to table X in the master library.

Step C:

The big data platform starts a polling task and, once it polls the "extraction node reached" mark, begins extracting data from the slave library. Meanwhile, the application system continues to execute the subsequent batch tasks; changes to the master library data do not trigger synchronization to the slave library.

Step D: after the big data extraction is complete, a data-extraction-complete notification is sent to the application system; upon receiving the notification from the big data platform, the application system changes the name of table X' in the slave library back to its original name, and all data not synchronized to the slave library since the interrupt node is synchronized in order, restoring master-slave data synchronization.

Application system embodiment:

A new database A and a new database B are created in the master library and the slave library of the application system respectively, and a table named "switch_on" is created in both database A and database B. When the application system's daily batch run reaches the big data extraction node: first, a "ready" identifier is written into the master library; then, the table in the slave library is renamed from "switch_on" to "switch_off"; finally, a record R whose primary key is the current business date is written into the "switch_on" table in the master library. Meanwhile, the big data polling task begins extracting from the slave library once it polls the "ready" flag.

When record R is to be synchronized from "switch_on" in the master library to "switch_on" in the slave library, the slave library finds that no "switch_on" table exists, so the record cannot be synchronized; the synchronization thread is interrupted, and the node keeps polling to check whether the data can be synchronized. In this state, no modification to any table data in the master library can be synchronized to the slave library. The application system can continue with the subsequent batch tasks without concern that the resulting data changes will immediately synchronize to the slave library and pollute the data slice to be extracted. The application system's subsequent batch tasks and the big data extraction therefore run in parallel without affecting each other.

After the big data extraction is complete, an MQ notification of extraction completion is sent. After the application system subscribes to and consumes this notification, it renames the table "switch_off" in the slave library back to "switch_on". The table names in the master library and the slave library are then the same again, master-slave synchronization restarts from the interrupt node, and all data whose synchronization failed is synchronized in order until the master library and slave library data are completely consistent.
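The notification handling in this embodiment can be sketched as follows. The sketch is illustrative only: Python's `queue.Queue` stands in for the real message queue, an in-memory SQLite database stands in for the slave library, and the message shape and the function name `handle_notifications` are assumptions. Only the table names "switch_on" and "switch_off" come from the embodiment.

```python
import queue
import sqlite3


def handle_notifications(mq, slave):
    """Consume queued messages; restore the table name on an extraction-complete notice."""
    restored = False
    while not mq.empty():
        message = mq.get()
        if message.get("event") == "extraction_complete" and not restored:
            # Renaming the table back lets the blocked record R be applied again.
            slave.execute("ALTER TABLE switch_off RENAME TO switch_on")
            slave.commit()
            restored = True
    return restored


slave = sqlite3.connect(":memory:")  # stand-in for the slave library
slave.execute("CREATE TABLE switch_off (biz_date TEXT PRIMARY KEY)")
mq = queue.Queue()                   # stand-in for the real message queue
mq.put({"event": "extraction_complete", "biz_date": "2020-12-16"})

restored = handle_notifications(mq, slave)
tables = [
    r[0] for r in slave.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
```

The `restored` guard keeps a duplicate notification from attempting a second rename, which would fail because "switch_off" no longer exists.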

The above are merely representative examples of the many specific applications of the present invention and should not be construed as limiting its scope in any way. All technical schemes formed by transformation or equivalent substitution fall within the protection scope of the invention.

Claims

1. An extraction method based on application system data slices related to batch tasks, characterized by comprising the following steps:

Step A: preprocessing a database, specifically: adding a database to the same instance in each of the master library and the slave library of the application system, creating a table X in the database added to the master library, and creating a table X' in the database added to the slave library;

Step B: interrupting master-slave synchronization; step B specifically comprises: when the batch tasks executed by the application system reach the big data extraction node, notifying the big data platform that extraction can be performed, modifying the name of table X' in the slave library, and adding an arbitrary record to table X in the master library;

Step D: restoring master-slave synchronization; step D specifically comprises: after the big data extraction is complete, sending a data-extraction-complete notification to the application system, and after the application system receives the notification, changing the name of table X' in the slave library back to its original name; all data not synchronized to the slave library since the interrupt node then begins to be synchronized in order, restoring master-slave data synchronization.

2. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that step A specifically comprises:

the newly created table X in the master library and table X' in the slave library have the same name, and there is no requirement on what that name is.

3. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that step C specifically comprises:

after the big data platform receives the extraction notification, it begins extracting data from the slave library.

4. The extraction method based on application system data slices related to batch tasks according to claim 1, characterized in that, in step C, while the big data platform is extracting data, the application system continues to execute the subsequent batch tasks, and changes to the master library data do not trigger synchronization to the slave library.