CN115062060A - Method for improving spring-batch framework batch processing execution efficiency - Google Patents

Info

Publication number
CN115062060A
Authority
CN
China
Prior art keywords: file, batch, interface, spring, cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210706496.9A
Other languages
Chinese (zh)
Inventor
王钰博
徐俊霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-09-16
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202210706496.9A
Publication of CN115062060A
Legal status: Pending

Classifications

    • G06F16/24552: Database cache management
    • G06F16/24532: Query optimisation of parallel queries
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F9/5027: Allocation of resources (e.g. of the CPU) to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F2209/5017: Indexing scheme relating to G06F9/50; task decomposition
    • G06F2209/5018: Indexing scheme relating to G06F9/50; thread allocation
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method for improving the batch execution efficiency of the spring-batch framework. The method creates thread fragment files based on a file thread database cache interface, loads and writes a configured number of records per page, and splits an individual large file or large result set into small files or small result sets according to rules for subsequent processing. Using a business reader interface, a developer writes query/read logic according to business requirements, loads the data to be processed and passes it to the processor layer. A file cache entity interface configures the number of records cached per interaction, reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities and writes the entities into a redis cache. A file reading interface configures the basic information of the file to be read, and a spring-batch framework public class is created through the Reader interface of the Spring framework. Compared with the prior art, the method greatly improves batch execution efficiency and reduces the business risk associated with large data volumes and high data timeliness requirements.

Description

Method for improving spring-batch framework batch processing execution efficiency
Technical Field
The invention relates to the technical field of computers, in particular to a method for improving spring-batch framework batch processing execution efficiency.
Background
In modern enterprise applications, complex services and massive data are handled not only through elaborate human-computer interaction interfaces but also through batch processing. Batch processing requires no manual intervention: large batches of data are read periodically, the corresponding business processing is completed and the results are archived. As an indispensable data processing method in modern enterprise applications, batch processing raises the question of how to execute batch tasks efficiently.
At present, even when a distributed system uses multiple threads, the existing batch processing architecture reads all of the data to be processed through the reader layer and generates fragment thread files, fetches each record individually and passes it to the processor layer for business logic, and then outputs the results through the writer until all data has been processed. The main drawbacks of this approach are long batch execution times and heavy resource consumption under large data volumes, which poses a certain risk for services with high timeliness requirements.
Disclosure of Invention
The present invention aims to overcome the above-mentioned defects of the prior art by providing a method for improving the batch execution efficiency of the spring-batch framework, which greatly improves batch execution efficiency, reduces the business risk associated with large data volumes and high data timeliness requirements, carries more business functions within a fixed time window, and improves productivity and output.
The purpose of the invention can be realized by the following technical scheme:
a method for improving spring-batch framework batch processing execution efficiency comprises the following specific contents:
creating thread fragment files based on a file thread database cache interface, loading and writing a configured number of records per page, and splitting an individual large file or large result set into small files or small result sets according to rules for subsequent processing;
using a business reader interface, a developer writes query/read logic according to business requirements, loads the data to be processed through the file thread database cache interface, and passes it to the processor layer;
configuring the number of records cached per interaction through a file cache entity interface, reading the corresponding line data from the thread fragment file in a single pass, converting it into line-data entities, and writing the entities into a redis cache;
configuring the basic information of the file to be read through a file reading interface;
and creating a spring-batch framework public class through the Reader interface of the Spring framework.
Further, the basic information of the file to be read includes, but is not limited to, the delimiter, character encoding, line feed character, and number of lines to skip.
Further, the file thread database cache interface creates the thread fragment files using a Spring task executor.
Further, the file thread database cache interface controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files or small result sets according to rules.
Further, a large file is a file whose size exceeds a preset size, and a small file is a file whose size is smaller than or equal to the preset size; likewise, a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and one below the threshold is a small result set.
Further, the file cache entity interface provides a cacheLineNum parameter to configure the number of records cached per interaction.
Further, the file cache entity interface reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
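To make the layering above concrete, the following is a minimal Java sketch of how such reader interfaces could be stacked on top of Spring Batch's ItemReader contract. The class names, fields and default values are illustrative assumptions based on the description above, not the actual implementation of the invention.

```java
import org.springframework.batch.item.ItemReader;

// Illustrative skeleton only: each layer adds one concern on top of the
// Spring Batch ItemReader contract (the framework "public class" layer).

// File reading interface: basic information of the file to be read.
abstract class AbsFileReader<T> implements ItemReader<T> {
    protected String delimiter = "|";        // field delimiter
    protected String encoding = "UTF-8";     // character encoding
    protected String lineSeparator = "\n";   // line feed character
    protected int linesToSkip = 0;           // number of lines to skip
}

// File cache entity interface: caches cacheLineNum lines per interaction
// as line-data entities in redis.
abstract class AbsFileCacheEntityReader<T> extends AbsFileReader<T> {
    protected int cacheLineNum = 1000;           // records cached per interaction
    protected abstract T transform(String line); // raw line -> line-data entity
}

// File thread database cache interface: creates thread fragment files and
// splits large inputs by page size.
abstract class FileThreadDBCacheReader<T> extends AbsFileCacheEntityReader<T> {
    protected int batchKeyLoadPageSize = 10000;  // records loaded/written per page
}

// A business reader only supplies the query/read logic.
class SampleBusinessReader extends FileThreadDBCacheReader<String> {
    @Override
    protected String transform(String line) {
        return line;
    }

    @Override
    public String read() {
        // query/read logic written per business requirement; returning null ends the step
        return null;
    }
}
```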
Compared with the prior art, the method for improving the spring-batch framework batch execution efficiency provided by the invention at least has the following beneficial effects:
1) The invention pre-caches the batch data to be processed so that multiple records can be handled in a single interaction. This reduces the number of data scans and database interactions as well as the total number of fragments and threads, which shortens the overall batch execution time of the system, lowers resource consumption, largely avoids timeliness problems affecting the business, and improves productivity and output.
2) The method can be applied at both the database level and the file level, covers most batch processing scenarios, and unifies batch processing modes and flows.
Drawings
FIG. 1 is a schematic diagram illustrating the relationship between elements of the method for improving the batch execution efficiency of the spring-batch framework in the embodiment;
FIG. 2 is a logic flow diagram illustrating a method for improving the batch execution efficiency of the spring-batch framework in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Embodiments
The invention relates to a method for improving the batch execution efficiency of the spring-batch framework. The underlying concepts and principles are as follows: spring-batch is an open-source framework for parallel processing of large data volumes. It allows lightweight, robust parallel processing applications to be built, supports transactions, concurrency, flow control and monitoring, and provides unified interface management and task management.
A key problem encountered when using Spring Batch for data migration is how to keep memory under control when the volume of migrated data is large. When using Spring Batch, three components must be configured: the reader, the processor and the writer. The reader reads data from the database; when the data volume is small, the reader logic puts little pressure on memory, but when the volume to be read is very large, memory usage and execution time have to be considered. The writer writes data from the Spring Batch application to a particular destination. The processor is a class containing the processing code applied to the data read into Spring Batch: if an application reads n records, the processor code is executed for each record.
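For orientation, the sketch below shows a minimal chunk-oriented Spring Batch configuration wiring a reader, a processor and a writer together. It assumes the Spring Batch 4.x builder-factory style; the job name, step name and sample data are illustrative.

```java
import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class DemoJobConfig {

    @Bean
    public Step demoStep(StepBuilderFactory steps) {
        ItemReader<String> reader = new ListItemReader<>(Arrays.asList("a", "b", "c"));
        ItemProcessor<String, String> processor = String::toUpperCase;            // business logic per record
        ItemWriter<String> writer = items -> items.forEach(System.out::println);  // output of each chunk

        // Chunk-oriented step: read items, process each one, write them out in chunks of 100.
        return steps.get("demoStep")
                .<String, String>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public Job demoJob(JobBuilderFactory jobs, Step demoStep) {
        return jobs.get("demoJob").start(demoStep).build();
    }
}
```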
The element relationship of the method for improving the spring-batch framework batch execution efficiency is shown in fig. 1, wherein:
JOB: a core concept of the spring-batch framework; a JOB is a batch task that comprises all operations of the batch processing;
STEP: each JOB is composed of one or more STEPs, the task steps within a batch task; each STEP connects to the reader, processor and writer interfaces;
reader: the data-source reading interface;
processor: the business-logic processing interface;
writer: the interface that outputs the processed data.
Based on the element relationship in fig. 1, the present invention creates a unified and standardized batch processing common class for independent business readers to inherit. The logic of the method makes use of a cache, and the actual process is shown in fig. 2, where:
business reader: developers only need to write query/read logic according to business requirements; the data to be processed is loaded by the readers described below and passed to the processor layer.
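In the patent's design the business reader inherits the common cache readers described below; as a self-contained stand-in that illustrates "writing only the query/read logic", the following sketch uses Spring Batch's standard JdbcCursorItemReader, where the developer supplies just the SQL and the row mapping. The table, columns and AccountRecord class are hypothetical.

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

public class AccountReaderConfig {

    /** Hypothetical record type the query results are mapped onto. */
    public static class AccountRecord {
        private Long id;
        private String status;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    /** The developer supplies only the query and the row mapping; the rest is framework boilerplate. */
    public static JdbcCursorItemReader<AccountRecord> accountReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<AccountRecord>()
                .name("accountReader")
                .dataSource(dataSource)
                .sql("SELECT id, status FROM account WHERE status = 'PENDING'")
                .rowMapper(new BeanPropertyRowMapper<>(AccountRecord.class))
                .build();
    }
}
```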
FileThreadDBCacheReader: the file thread database cache interface creates thread fragment files using a Spring task executor, controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files or small result sets according to rules for subsequent processing. Here, a large file is a file larger than a preset size and a small file is one smaller than or equal to that size; a result set greater than or equal to a preset result-set size threshold is a large result set, and one below the threshold is a small result set.
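A minimal sketch of this splitting step is given below. It assumes the source is a plain-text file and that batchKeyLoadPageSize counts lines per fragment; apart from Spring's TaskExecutor, the class and method names are illustrative, not the patent's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.springframework.core.task.TaskExecutor;

public class FragmentFileSplitter {

    private final TaskExecutor taskExecutor;
    private final int batchKeyLoadPageSize; // lines per thread fragment file

    public FragmentFileSplitter(TaskExecutor taskExecutor, int batchKeyLoadPageSize) {
        this.taskExecutor = taskExecutor;
        this.batchKeyLoadPageSize = batchKeyLoadPageSize;
    }

    /** Splits one large file into fragment files and hands each fragment to a worker thread. */
    public List<Path> split(Path largeFile, Path workDir) throws IOException {
        List<Path> fragments = new ArrayList<>();
        List<String> buffer = new ArrayList<>(batchKeyLoadPageSize);
        int fragmentIndex = 0;
        try (BufferedReader in = Files.newBufferedReader(largeFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == batchKeyLoadPageSize) {
                    fragments.add(writeFragment(workDir, fragmentIndex++, buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                fragments.add(writeFragment(workDir, fragmentIndex, buffer));
            }
        }
        // Each thread fragment file is then processed on its own thread.
        for (Path fragment : fragments) {
            taskExecutor.execute(() -> processFragment(fragment));
        }
        return fragments;
    }

    private Path writeFragment(Path workDir, int index, List<String> lines) throws IOException {
        return Files.write(workDir.resolve("fragment-" + index + ".txt"), lines, StandardCharsets.UTF_8);
    }

    private void processFragment(Path fragment) {
        // placeholder: downstream reader/processor/writer logic runs per fragment
    }
}
```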
AbsFileCacheEntityReader: the file cache entity interface provides the cacheLineNum parameter to configure the number of records cached per interaction, reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
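The sketch below illustrates this caching idea with Spring Data Redis's RedisTemplate: up to cacheLineNum lines are read per pass, each line is transformed into an entity, and the whole block is pushed to redis in one round trip. The class name and the transform contract are assumptions based on the description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.springframework.data.redis.core.RedisTemplate;

public abstract class FileCacheEntityLoader<T> {

    private final RedisTemplate<String, T> redisTemplate;
    private final int cacheLineNum; // number of lines cached per interaction

    protected FileCacheEntityLoader(RedisTemplate<String, T> redisTemplate, int cacheLineNum) {
        this.redisTemplate = redisTemplate;
        this.cacheLineNum = cacheLineNum;
    }

    /** Converts one raw line of the fragment file into a line-data entity. */
    protected abstract T transform(String line);

    /** Reads the fragment file in blocks of cacheLineNum lines and caches each block in a redis list. */
    public void cacheFragment(Path fragmentFile, String cacheKey) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(fragmentFile, StandardCharsets.UTF_8)) {
            List<T> block = new ArrayList<>(cacheLineNum);
            String line;
            while ((line = reader.readLine()) != null) {
                block.add(transform(line));
                if (block.size() == cacheLineNum) {
                    redisTemplate.opsForList().rightPushAll(cacheKey, block); // one redis round trip per block
                    block.clear();
                }
            }
            if (!block.isEmpty()) {
                redisTemplate.opsForList().rightPushAll(cacheKey, block);
            }
        }
    }
}
```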
AbsFileReader: the file reading interface configures the basic information of the file to be read, such as the delimiter, character encoding, line feed character and number of lines to skip.
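Spring Batch's own FlatFileItemReaderBuilder already exposes this kind of basic file information; the following sketch (real Spring Batch API, but with a hypothetical file path, column names and LineData entity) shows the delimiter, encoding and skip-line settings being configured:

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.core.io.FileSystemResource;

public class FragmentFileReaderFactory {

    /** Hypothetical line-data entity the fragment lines are mapped onto. */
    public static class LineData {
        private String accountNo;
        private String amount;
        public String getAccountNo() { return accountNo; }
        public void setAccountNo(String accountNo) { this.accountNo = accountNo; }
        public String getAmount() { return amount; }
        public void setAmount(String amount) { this.amount = amount; }
    }

    /** Builds a reader carrying the "basic information of the file to be read". */
    public static FlatFileItemReader<LineData> build(String path) {
        BeanWrapperFieldSetMapper<LineData> mapper = new BeanWrapperFieldSetMapper<>();
        mapper.setTargetType(LineData.class);

        return new FlatFileItemReaderBuilder<LineData>()
                .name("fragmentFileReader")
                .resource(new FileSystemResource(path))
                .encoding("UTF-8")               // character encoding
                .linesToSkip(1)                  // number of lines to skip, e.g. a header
                .delimited()
                .delimiter("|")                  // field delimiter
                .names("accountNo", "amount")    // columns mapped onto LineData
                .fieldSetMapper(mapper)
                .build();
    }
}
```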
XXXReader of the Spring framework: the spring-batch framework public class.
In summary, the invention pre-caches the data to be processed so that multiple records can be handled in a single interaction, reducing the number of data scans and database interactions and therefore the total number of fragments and threads. The approach can be applied at both the database level and the file level, covers most batch processing scenarios, unifies batch processing modes and flows, and shortens the overall system batch execution time. In practical use, the method shortened an accounting batch of the original system (taken only as an example) from 3 hours to 1 hour; this batch processes hundreds of millions of records and involves database operations on several tables of that scale. The batch processing time is thus greatly reduced, resource consumption is lowered, timeliness problems affecting the business are largely avoided, and capacity and output are improved.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for improving the batch execution efficiency of a spring-batch framework is characterized by comprising the following steps:
creating a thread fragment file based on a file thread database cache interface, loading and writing a configured number of records per page, and splitting an individual large file or large result set into small files or small result sets according to rules for subsequent processing;
writing query/read logic according to business requirements by using a business reader interface, loading the data to be processed through the file thread database cache interface, and passing it to the processor layer;
configuring the number of records cached per interaction by using a file cache entity interface, reading the corresponding line data from the thread fragment file in a single pass, converting it into line-data entities, and writing the entities into a redis cache;
configuring basic information of the file to be read by using a file reading interface;
and creating a spring-batch framework public class through the Reader interface of the Spring framework.
2. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file thread database cache interface uses a Spring task executor to create the thread fragment file.
3. The method for improving the batch execution efficiency of the spring-batch framework of claim 2, wherein the file thread database cache interface controls the number of records loaded and written per page through the batchKeyLoadPageSize parameter, and splits an individual large file or large result set into small files according to rules.
4. The method of claim 3, wherein the large file is a file with a size larger than a preset size, and the small file is a file with a size smaller than or equal to the preset size.
5. The method of claim 3, wherein a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and a result set smaller than the threshold is a small result set.
6. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file cache entity interface provides a cacheLineNum parameter to configure the number of records cached per interaction.
7. The method for improving the batch execution efficiency of the spring-batch framework of claim 6, wherein the file cache entity interface reads the corresponding line data from the thread fragment file in a single pass, converts it into line-data entities through a transform method, and writes the entities into a redis cache.
8. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the basic information of the file to be read includes, but is not limited to, the delimiter, character encoding, line feed character and number of lines to skip.
CN202210706496.9A 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency Pending CN115062060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210706496.9A CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210706496.9A CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Publications (1)

Publication Number Publication Date
CN115062060A 2022-09-16

Family

ID=83202202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210706496.9A Pending CN115062060A (en) 2022-06-21 2022-06-21 Method for improving spring-batch framework batch processing execution efficiency

Country Status (1)

Country Link
CN (1) CN115062060A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination