CN115062060A - Method for improving spring-batch framework batch processing execution efficiency - Google Patents

Info

Publication number
CN115062060A
Authority
CN
China
Prior art keywords: file, batch, interface, spring, cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210706496.9A
Other languages
Chinese (zh)
Inventor
王钰博
徐俊霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-09-16
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202210706496.9A
Publication of CN115062060A
Legal status: Pending

Classifications

    • G06F16/24552: Database cache management
    • G06F16/24532: Query optimisation of parallel queries
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F9/5027: Allocation of resources (e.g. of the CPU) to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F2209/5017: Indexing scheme relating to G06F9/50; task decomposition
    • G06F2209/5018: Indexing scheme relating to G06F9/50; thread allocation
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method for improving the batch execution efficiency of the spring-batch framework. The method creates thread fragment files based on a file thread database cache interface, loads and writes a configured number of records per page, and splits an individual large file or large result set into small files or small result sets according to rules for subsequent processing. Using a business reader interface, a developer writes query/read logic according to business requirements, loads the data to be processed and passes it to the processor layer. A file cache entity interface configures the number of records cached per interaction, reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities and writes the entities into a redis cache. A file reading interface configures the basic information of the file to be read, and a spring-batch framework public class is created through the Reader interface of the Spring framework. Compared with the prior art, the method greatly improves batch execution efficiency and reduces the business risk associated with large data volumes and high data timeliness requirements.

Description

Method for improving spring-batch framework batch processing execution efficiency
Technical Field
The invention relates to the technical field of computers, in particular to a method for improving spring-batch framework batch processing execution efficiency.
Background
In modern enterprise applications, complex services and massive data are handled not only through elaborate human-computer interaction interfaces but also through batch processing. Batch processing requires no manual intervention: large batches of data are read periodically, the corresponding business processing is completed and the results are archived. As an indispensable data processing method in modern enterprise applications, batch processing raises the question of how to execute batch tasks efficiently.
At present, even when a distributed system uses multiple threads, the existing batch processing architecture reads all of the data to be processed through the reader layer and generates fragment thread files, fetches each record individually and passes it to the processor layer for business logic, and then outputs the results through the writer until all data has been processed. The main drawbacks of this approach are long batch execution times and heavy resource consumption under large data volumes, which poses a certain risk for services with high timeliness requirements.
Disclosure of Invention
The present invention aims to overcome the above-mentioned defects of the prior art by providing a method for improving the batch execution efficiency of the spring-batch framework, which greatly improves batch execution efficiency, reduces the business risk associated with large data volumes and high data timeliness requirements, carries more business functions within a fixed time window, and improves productivity and output.
The purpose of the invention can be realized by the following technical scheme:
a method for improving spring-batch framework batch processing execution efficiency comprises the following specific contents:
creating thread fragment files based on a file thread database cache interface, loading and writing a configured number of records per page, and splitting an individual large file or large result set into small files or small result sets according to rules for subsequent processing;
using a business reader interface, a developer writes query/read logic according to business requirements, loads the data to be processed through the file thread database cache interface, and passes it to the processor layer;
configuring the number of records cached per interaction through a file cache entity interface, reading the corresponding line data from the thread fragment file in a single pass, converting it into line-data entities, and writing the entities into a redis cache;
configuring the basic information of the file to be read through a file reading interface;
and creating a spring-batch framework public class through the Reader interface of the Spring framework.
Further, the basic information of the file to be read includes, but is not limited to, the delimiter, character encoding, line feed character, and number of lines to skip.
Further, the file thread database cache interface creates the thread fragment files using a Spring task executor.
Further, the file thread database cache interface controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files or small result sets according to rules.
Further, a large file is a file whose size exceeds a preset size, and a small file is a file whose size is smaller than or equal to the preset size; likewise, a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and one below the threshold is a small result set.
Further, the file cache entity interface provides a cacheLineNum parameter to configure the number of records cached per interaction.
Further, the file cache entity interface reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
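To make the layering above concrete, the following is a minimal Java sketch of how such reader interfaces could be stacked on top of Spring Batch's ItemReader contract. The class names, fields and default values are illustrative assumptions based on the description above, not the actual implementation of the invention.

```java
import org.springframework.batch.item.ItemReader;

// Illustrative skeleton only: each layer adds one concern on top of the
// Spring Batch ItemReader contract (the framework "public class" layer).

// File reading interface: basic information of the file to be read.
abstract class AbsFileReader<T> implements ItemReader<T> {
    protected String delimiter = "|";        // field delimiter
    protected String encoding = "UTF-8";     // character encoding
    protected String lineSeparator = "\n";   // line feed character
    protected int linesToSkip = 0;           // number of lines to skip
}

// File cache entity interface: caches cacheLineNum lines per interaction
// as line-data entities in redis.
abstract class AbsFileCacheEntityReader<T> extends AbsFileReader<T> {
    protected int cacheLineNum = 1000;           // records cached per interaction
    protected abstract T transform(String line); // raw line -> line-data entity
}

// File thread database cache interface: creates thread fragment files and
// splits large inputs by page size.
abstract class FileThreadDBCacheReader<T> extends AbsFileCacheEntityReader<T> {
    protected int batchKeyLoadPageSize = 10000;  // records loaded/written per page
}

// A business reader only supplies the query/read logic.
class SampleBusinessReader extends FileThreadDBCacheReader<String> {
    @Override
    protected String transform(String line) {
        return line;
    }

    @Override
    public String read() {
        // query/read logic written per business requirement; returning null ends the step
        return null;
    }
}
```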
Compared with the prior art, the method for improving the spring-batch framework batch execution efficiency provided by the invention at least has the following beneficial effects:
1) The invention pre-caches the batch data to be processed so that multiple records can be handled in a single interaction. This reduces the number of data scans and database interactions as well as the total number of fragments and threads, which shortens the overall batch execution time of the system, lowers resource consumption, largely avoids timeliness problems affecting the business, and improves productivity and output.
2) The method can be applied at both the database level and the file level, covers most batch processing scenarios, and unifies batch processing modes and flows.
Drawings
FIG. 1 is a schematic diagram illustrating the relationship between elements of the method for improving the batch execution efficiency of the spring-batch framework in the embodiment;
FIG. 2 is a logic flow diagram illustrating a method for improving the batch execution efficiency of the spring-batch framework in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Embodiments
The invention relates to a method for improving the batch execution efficiency of the spring-batch framework. The underlying concepts and principles are as follows: spring-batch is an open-source framework for parallel processing of large data volumes. It allows lightweight, robust parallel processing applications to be built, supports transactions, concurrency, flow control and monitoring, and provides unified interface management and task management.
A key problem encountered when using Spring Batch for data migration is how to keep memory under control when the volume of migrated data is large. When using Spring Batch, three components must be configured: the reader, the processor and the writer. The reader reads data from the database; when the data volume is small, the reader logic puts little pressure on memory, but when the volume to be read is very large, memory usage and execution time have to be considered. The writer writes data from the Spring Batch application to a particular destination. The processor is a class containing the processing code applied to the data read into Spring Batch: if an application reads n records, the processor code is executed for each record.
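For orientation, the sketch below shows a minimal chunk-oriented Spring Batch configuration wiring a reader, a processor and a writer together. It assumes the Spring Batch 4.x builder-factory style; the job name, step name and sample data are illustrative.

```java
import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class DemoJobConfig {

    @Bean
    public Step demoStep(StepBuilderFactory steps) {
        ItemReader<String> reader = new ListItemReader<>(Arrays.asList("a", "b", "c"));
        ItemProcessor<String, String> processor = String::toUpperCase;            // business logic per record
        ItemWriter<String> writer = items -> items.forEach(System.out::println);  // output of each chunk

        // Chunk-oriented step: read items, process each one, write them out in chunks of 100.
        return steps.get("demoStep")
                .<String, String>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public Job demoJob(JobBuilderFactory jobs, Step demoStep) {
        return jobs.get("demoJob").start(demoStep).build();
    }
}
```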
The element relationship of the method for improving the spring-batch framework batch execution efficiency is shown in fig. 1, wherein:
JOB: a core concept of the spring-batch framework; a JOB is a batch task that comprises all operations of the batch processing;
STEP: each JOB is composed of one or more STEPs, the task steps within a batch task; each STEP connects to the reader, processor and writer interfaces;
reader: the data-source reading interface;
processor: the business-logic processing interface;
writer: the interface that outputs the processed data.
Based on the element relationship in fig. 1, the present invention creates a unified and standardized batch processing common class for independent business readers to inherit. The logic of the method makes use of a cache, and the actual process is shown in fig. 2, where:
business reader: developers only need to write query/read logic according to business requirements; the data to be processed is loaded by the readers described below and passed to the processor layer.
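In the patent's design the business reader inherits the common cache readers described below; as a self-contained stand-in that illustrates "writing only the query/read logic", the following sketch uses Spring Batch's standard JdbcCursorItemReader, where the developer supplies just the SQL and the row mapping. The table, columns and AccountRecord class are hypothetical.

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

public class AccountReaderConfig {

    /** Hypothetical record type the query results are mapped onto. */
    public static class AccountRecord {
        private Long id;
        private String status;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    /** The developer supplies only the query and the row mapping; the rest is framework boilerplate. */
    public static JdbcCursorItemReader<AccountRecord> accountReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<AccountRecord>()
                .name("accountReader")
                .dataSource(dataSource)
                .sql("SELECT id, status FROM account WHERE status = 'PENDING'")
                .rowMapper(new BeanPropertyRowMapper<>(AccountRecord.class))
                .build();
    }
}
```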
FileThreadDBCacheReader: the file thread database cache interface creates thread fragment files using a Spring task executor, controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files or small result sets according to rules for subsequent processing. Here, a large file is a file larger than a preset size and a small file is one smaller than or equal to that size; a result set greater than or equal to a preset result-set size threshold is a large result set, and one below the threshold is a small result set.
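A minimal sketch of this splitting step is given below. It assumes the source is a plain-text file and that batchKeyLoadPageSize counts lines per fragment; apart from Spring's TaskExecutor, the class and method names are illustrative, not the patent's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.springframework.core.task.TaskExecutor;

public class FragmentFileSplitter {

    private final TaskExecutor taskExecutor;
    private final int batchKeyLoadPageSize; // lines per thread fragment file

    public FragmentFileSplitter(TaskExecutor taskExecutor, int batchKeyLoadPageSize) {
        this.taskExecutor = taskExecutor;
        this.batchKeyLoadPageSize = batchKeyLoadPageSize;
    }

    /** Splits one large file into fragment files and hands each fragment to a worker thread. */
    public List<Path> split(Path largeFile, Path workDir) throws IOException {
        List<Path> fragments = new ArrayList<>();
        List<String> buffer = new ArrayList<>(batchKeyLoadPageSize);
        int fragmentIndex = 0;
        try (BufferedReader in = Files.newBufferedReader(largeFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == batchKeyLoadPageSize) {
                    fragments.add(writeFragment(workDir, fragmentIndex++, buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                fragments.add(writeFragment(workDir, fragmentIndex, buffer));
            }
        }
        // Each thread fragment file is then processed on its own thread.
        for (Path fragment : fragments) {
            taskExecutor.execute(() -> processFragment(fragment));
        }
        return fragments;
    }

    private Path writeFragment(Path workDir, int index, List<String> lines) throws IOException {
        return Files.write(workDir.resolve("fragment-" + index + ".txt"), lines, StandardCharsets.UTF_8);
    }

    private void processFragment(Path fragment) {
        // placeholder: downstream reader/processor/writer logic runs per fragment
    }
}
```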
AbsFileCacheEntityReader: the file cache entity interface provides the cacheLineNum parameter to configure the number of records cached per interaction, reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
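The sketch below illustrates this caching idea with Spring Data Redis's RedisTemplate: up to cacheLineNum lines are read per pass, each line is transformed into an entity, and the whole block is pushed to redis in one round trip. The class name and the transform contract are assumptions based on the description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.springframework.data.redis.core.RedisTemplate;

public abstract class FileCacheEntityLoader<T> {

    private final RedisTemplate<String, T> redisTemplate;
    private final int cacheLineNum; // number of lines cached per interaction

    protected FileCacheEntityLoader(RedisTemplate<String, T> redisTemplate, int cacheLineNum) {
        this.redisTemplate = redisTemplate;
        this.cacheLineNum = cacheLineNum;
    }

    /** Converts one raw line of the fragment file into a line-data entity. */
    protected abstract T transform(String line);

    /** Reads the fragment file in blocks of cacheLineNum lines and caches each block in a redis list. */
    public void cacheFragment(Path fragmentFile, String cacheKey) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(fragmentFile, StandardCharsets.UTF_8)) {
            List<T> block = new ArrayList<>(cacheLineNum);
            String line;
            while ((line = reader.readLine()) != null) {
                block.add(transform(line));
                if (block.size() == cacheLineNum) {
                    redisTemplate.opsForList().rightPushAll(cacheKey, block); // one redis round trip per block
                    block.clear();
                }
            }
            if (!block.isEmpty()) {
                redisTemplate.opsForList().rightPushAll(cacheKey, block);
            }
        }
    }
}
```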
AbsFileReader: the file reading interface configures the basic information of the file to be read, such as the delimiter, character encoding, line feed character and number of lines to skip.
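Spring Batch's own FlatFileItemReaderBuilder already exposes this kind of basic file information; the following sketch (real Spring Batch API, but with a hypothetical file path, column names and LineData entity) shows the delimiter, encoding and skip-line settings being configured:

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.core.io.FileSystemResource;

public class FragmentFileReaderFactory {

    /** Hypothetical line-data entity the fragment lines are mapped onto. */
    public static class LineData {
        private String accountNo;
        private String amount;
        public String getAccountNo() { return accountNo; }
        public void setAccountNo(String accountNo) { this.accountNo = accountNo; }
        public String getAmount() { return amount; }
        public void setAmount(String amount) { this.amount = amount; }
    }

    /** Builds a reader carrying the "basic information of the file to be read". */
    public static FlatFileItemReader<LineData> build(String path) {
        BeanWrapperFieldSetMapper<LineData> mapper = new BeanWrapperFieldSetMapper<>();
        mapper.setTargetType(LineData.class);

        return new FlatFileItemReaderBuilder<LineData>()
                .name("fragmentFileReader")
                .resource(new FileSystemResource(path))
                .encoding("UTF-8")               // character encoding
                .linesToSkip(1)                  // number of lines to skip, e.g. a header
                .delimited()
                .delimiter("|")                  // field delimiter
                .names("accountNo", "amount")    // columns mapped onto LineData
                .fieldSetMapper(mapper)
                .build();
    }
}
```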
XXXReader of the Spring framework: the spring-batch framework public class.
In summary, the invention pre-caches the data to be processed so that multiple records can be handled in a single interaction, reducing the number of data scans and database interactions and therefore the total number of fragments and threads. The approach can be applied at both the database level and the file level, covers most batch processing scenarios, unifies batch processing modes and flows, and shortens the overall system batch execution time. In practical use, the method shortened an accounting batch of the original system (taken only as an example) from 3 hours to 1 hour; this batch processes hundreds of millions of records and involves database operations on several tables of that scale. The batch processing time is thus greatly reduced, resource consumption is lowered, timeliness problems affecting the business are largely avoided, and capacity and output are improved.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for improving the batch execution efficiency of a spring-batch framework is characterized by comprising the following steps:
creating a thread fragment file based on a file thread database cache interface, loading and writing a configured number of records per page, and splitting an individual large file or large result set into small files or small result sets according to rules for subsequent processing;
writing query/read logic according to business requirements by using a business reader interface, loading the data to be processed through the file thread database cache interface, and passing it to the processor layer;
configuring the number of records cached per interaction by using a file cache entity interface, reading the corresponding line data from the thread fragment file in a single pass, converting it into line-data entities, and writing the entities into a redis cache;
configuring basic information of the file to be read by using a file reading interface;
and creating a spring-batch framework public class through the Reader interface of the Spring framework.
2. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file thread database cache interface uses a Spring task executor to create the thread fragment file.
3. The method for improving the batch execution efficiency of the spring-batch framework of claim 2, wherein the file thread database cache interface controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files according to rules.
4. The method of claim 3, wherein the large file is a file with a size larger than a preset size, and the small file is a file with a size smaller than or equal to the preset size.
5. The method of claim 3, wherein a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and a result set smaller than the threshold is a small result set.
6. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file cache entity interface provides a cacheLineNum parameter to configure the number of records cached per interaction.
7. The method for improving the batch execution efficiency of the spring-batch framework of claim 6, wherein the file cache entity interface reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
8. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the basic information of the file to be read includes, but is not limited to, the delimiter, character encoding, line feed character and number of lines to skip.
CN202210706496.9A 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency Pending CN115062060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210706496.9A CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210706496.9A CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Publications (1)

Publication Number Publication Date
CN115062060A 2022-09-16

Family

ID=83202202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210706496.9A Pending CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Country Status (1)

Country Link
CN (1) CN115062060A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination