CN115062060A - Method for improving spring-batch framework batch processing execution efficiency - Google Patents
- Publication number
- CN115062060A (application number CN202210706496.9A)
- Authority
- CN
- China
- Prior art keywords
- file
- batch
- interface
- spring
- cache
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a method for improving the batch execution efficiency of the spring-batch framework. The method creates a thread fragment file based on a file thread database cache interface, loads and writes data by a configured page size, and splits a single large file or large result set into small files or small result sets according to rules for subsequent processing. A developer writes query/read logic against business requirements through a business reader interface, which loads the data to be processed and hands it to the processor layer. A file cache entity interface configures the number of lines cached per read, reads the corresponding lines from the thread fragment file in one pass, converts them into line-data entities, and writes the entities to a redis cache. A file reading interface configures the basic information of the file being read, and a spring-batch framework public class is created through the Reader interface of the Spring framework. Compared with the prior art, the method greatly improves batch execution efficiency and reduces the business risk posed by large data volumes and strict data timeliness requirements.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method for improving spring-batch framework batch processing execution efficiency.
Background
In modern enterprise applications, complex services and massive data are handled not only through interactive human-computer interfaces but also through batch processing. Batch processing requires no manual intervention: large batches of data are read periodically, the corresponding business processing is completed, and the results are archived. As an indispensable data processing method in modern enterprise applications, batch processing raises the question of how to execute batch tasks efficiently.
At present, when a distributed system uses multiple threads, the existing batch processing architecture reads all the data to be processed through a reader layer and generates fragment thread files; each piece of data is fetched individually and passed to a processor layer for business logic, and the result is output through a writer until all data has been processed. The drawbacks of this approach are long batch execution times under large data volumes, heavy resource consumption, and a measurable risk for services with strict timeliness requirements.
Disclosure of Invention
The present invention aims to overcome the above defects in the prior art and provides a method for improving the batch execution efficiency of the spring-batch framework. The method can greatly improve batch execution efficiency, reduce the business risk posed by large data volumes and strict data timeliness requirements, carry more business functions within a fixed time window, and improve productivity and output.
The purpose of the invention can be realized by the following technical scheme:
a method for improving spring-batch framework batch processing execution efficiency comprises the following specific contents:
creating a thread fragment file based on a file thread database cache interface, loading and writing data by a configured page size, and splitting a single large file or large result set into small files or small result sets according to rules for subsequent processing;
a developer writes query/read logic according to business requirements using a business reader interface, which loads the data to be processed through the file thread database cache interface and hands it to the processor layer;
configuring the number of lines cached per read using a file cache entity interface, reading the corresponding lines from the thread fragment file in one pass, converting them into line-data entities, and writing the entities to a redis cache;
configuring the basic information of the file being read using a file reading interface;
and creating a spring-batch framework public class through the Reader interface of the Spring framework.
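The steps above can be sketched as a plain-Java pipeline. Every name below (`BatchFlowSketch`, `split`, `run`) is a hypothetical stand-in for the interfaces the method describes, and an in-memory map stands in for the redis cache:

```java
import java.util.*;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the claimed flow: split a large result set into
 *  fragments, pre-cache each fragment in one interaction, then process rows. */
public class BatchFlowSketch {

    /** Splits rows into fragments of at most pageSize rows (the "small result sets"). */
    public static List<List<String>> split(List<String> rows, int pageSize) {
        List<List<String>> fragments = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += pageSize) {
            fragments.add(new ArrayList<>(rows.subList(i, Math.min(i + pageSize, rows.size()))));
        }
        return fragments;
    }

    /** Runs the whole flow; the in-memory map stands in for the redis cache. */
    public static List<String> run(List<String> rows, int pageSize, UnaryOperator<String> processor) {
        Map<Integer, List<String>> cache = new LinkedHashMap<>();    // stand-in for redis
        List<List<String>> fragments = split(rows, pageSize);
        for (int f = 0; f < fragments.size(); f++) {
            cache.put(f, fragments.get(f));                          // pre-cache a whole fragment at once
        }
        List<String> out = new ArrayList<>();
        for (List<String> fragment : cache.values()) {               // processor layer consumes from the cache
            for (String row : fragment) out.add(processor.apply(row));
        }
        return out;
    }
}
```

The point of the sketch is only the shape of the data flow: one cache write per fragment rather than one interaction per record.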
Further, the basic information of the file being read includes, but is not limited to, the delimiter, character encoding, line separator, and number of lines to skip.
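A minimal sketch of such a configuration holder follows. The field names (`delimiter`, `charset`, `lineSeparator`, `linesToSkip`) are assumptions standing in for whatever the actual file reading interface exposes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical sketch of the file reading interface's basic configuration:
 *  delimiter, character encoding, line separator, and leading lines to skip. */
public class FileReadConfig {
    String delimiter = ",";
    Charset charset = StandardCharsets.UTF_8;
    String lineSeparator = "\n";
    int linesToSkip = 0;

    /** Applies the configuration to raw file content: skip header lines, split fields. */
    public List<String[]> parse(String content) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = content.split(lineSeparator);
        for (int i = linesToSkip; i < lines.length; i++) {
            // quote() so delimiters like "|" are treated literally, not as regex
            rows.add(lines[i].split(java.util.regex.Pattern.quote(delimiter)));
        }
        return rows;
    }
}
```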
Further, the file thread database cache interface creates the thread fragment file using Spring's task execution facility.
Further, the file thread database cache interface loads and writes data in pages whose size is set by the batchKeyLoadPageSize parameter, and splits a single large file or large result set into small files according to rules.
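The splitting step can be sketched as below. The class name `FragmentSplitter` and the fragment naming scheme are assumptions; the patent only specifies that a file beyond a preset size is cut into page-sized pieces:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch of the splitting step: a file larger than a preset
 *  threshold is cut into fragment files of at most pageSize lines each. */
public class FragmentSplitter {

    /** Returns true when the file exceeds the preset size threshold ("large file"). */
    public static boolean isLargeFile(Path file, long thresholdBytes) throws IOException {
        return Files.size(file) > thresholdBytes;
    }

    /** Writes fragment files of at most pageSize lines next to the source file. */
    public static List<Path> splitByPageSize(Path source, int pageSize) throws IOException {
        List<String> lines = Files.readAllLines(source);
        List<Path> fragments = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += pageSize) {
            Path fragment = source.resolveSibling(source.getFileName() + ".part" + (i / pageSize));
            Files.write(fragment, lines.subList(i, Math.min(i + pageSize, lines.size())));
            fragments.add(fragment);
        }
        return fragments;
    }
}
```

Each fragment file can then be handed to its own thread, which is what makes the per-fragment caching in the later steps worthwhile.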
Further, a large file is a file whose size exceeds a preset threshold, and a small file is one whose size is at most that threshold. Likewise, a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and one smaller than the threshold is a small result set.
Further, the file cache entity interface provides a cacheLineNum parameter to configure the number of lines cached per read.
Further, the file cache entity interface reads the corresponding lines from the thread fragment file in one pass, converts them into line-data entities through a transform method, and writes the entities to a redis cache.
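The caching step can be sketched as follows. The entity shape and batch keying are assumptions, and a `LinkedHashMap` stands in for the redis cache so the sketch stays self-contained:

```java
import java.util.*;

/** Hypothetical sketch of the file cache entity step: read cacheLineNum lines
 *  at a time, transform each line into an entity, and cache the whole batch.
 *  A map keyed by batch index stands in for redis. */
public class CacheEntitySketch {

    /** Minimal stand-in for the line-data entity produced by the transform step. */
    public record LineEntity(int lineNo, String payload) {}

    public static Map<Integer, List<LineEntity>> cacheInBatches(List<String> lines, int cacheLineNum) {
        Map<Integer, List<LineEntity>> cache = new LinkedHashMap<>();   // stand-in for redis
        for (int i = 0; i < lines.size(); i += cacheLineNum) {
            List<LineEntity> batch = new ArrayList<>();
            for (int j = i; j < Math.min(i + cacheLineNum, lines.size()); j++) {
                batch.add(new LineEntity(j, lines.get(j)));             // the "transform" step
            }
            cache.put(i / cacheLineNum, batch);                         // one cache write per batch
        }
        return cache;
    }
}
```

Raising `cacheLineNum` trades memory for fewer cache interactions, which is the lever the patent uses to cut database round-trips.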
Compared with the prior art, the method for improving the spring-batch framework batch execution efficiency provided by the invention at least has the following beneficial effects:
1) The invention pre-caches the batch data to be processed, so that multiple records can be handled in a single interaction. This reduces the number of data scans and database interactions, as well as the total number of fragments and threads to process, shortening overall batch execution time; it also reduces resource consumption, minimizes the impact of timeliness constraints on the business, and improves productivity.
2) The method can be applied at both the database level and the file level, covering most batch processing scenarios and unifying batch processing modes and flows.
Drawings
FIG. 1 is a schematic diagram illustrating the relationship between elements of the method for improving the batch execution efficiency of the spring-batch framework in the embodiment;
FIG. 2 is a logic flow diagram illustrating a method for improving the batch execution efficiency of the spring-batch framework in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
The method of the invention for improving the batch execution efficiency of the spring-batch framework rests on the following concepts and principles: spring-batch is an open-source framework for parallel processing of large data volumes. With spring-batch, a lightweight, robust parallel processing application can be built; the framework supports transactions, concurrency, flows, and monitoring, and provides unified interface management and task management.
A key problem when using spring-batch for data migration is how to keep memory usage under control when the migration volume is large. Three components must be configured when using spring-batch: a reader, a processor, and a writer. The reader reads data from the database; when the data volume is small, the reader's logic puts little pressure on memory, but when a very large volume must be read, memory usage and execution time have to be considered. The writer writes data from the Spring Batch application to a particular destination. The processor is a class containing the code that processes the data read into spring-batch: if an application reads n records, the processor code executes once for each record.
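The reader/processor/writer contract described above can be restated as a minimal chunk loop. The interfaces below are simplified local stand-ins, not the real org.springframework.batch API, but they follow the same convention that a reader returns null when input is exhausted:

```java
import java.util.*;

/** Minimal re-statement of the reader/processor/writer contract.
 *  Simplified local interfaces, not the real Spring Batch API. */
public class ChunkSketch {
    interface Reader<T>        { T read(); }          // returns null when input is exhausted
    interface Processor<I, O>  { O process(I item); }
    interface Writer<T>        { void write(List<T> chunk); }

    /** Reads items one by one, processes each, and writes them in chunks. */
    public static <I, O> void runChunk(Reader<I> reader, Processor<I, O> processor,
                                       Writer<O> writer, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {          // flush a full chunk to the writer
                writer.write(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) writer.write(chunk);    // flush the remaining tail
    }
}
```

Chunking is why the writer side scales: the destination sees one write per `chunkSize` records instead of one per record.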
The element relationship of the method for improving the spring-batch framework batch execution efficiency is shown in fig. 1, wherein:
JOB: the method is a core concept of the spring-batch framework, comprises all operations of batch processing, and is a batch task;
STEP: each JOB is composed of one or more STEPs, which are task STEPs in a batch of tasks; the STEP is respectively connected with the reader interface, the processor interface and the writer interface in an abutting mode.
Reader: a data source reading interface;
Processor: a business logic processing interface;
Writer: an interface that outputs the processed data.
Based on the element relationship of fig. 1, the present invention creates a unified, standardized batch processing common class for independent business readers to inherit. The logic of the method makes use of caching, and the actual process is shown in fig. 2, wherein:
Service reader: developers only need to write query/read logic according to business requirements; the data to be processed is loaded by the underlying loader and handed to the processor layer.
FileThreadDBCacheReader: the file thread database cache interface creates a thread fragment file using Spring's task execution facility, loads and writes data in pages whose size is set by the batchKeyLoadPageSize parameter, and splits a single large file or large result set into small files or small result sets according to rules for subsequent processing. Here a large file is a file whose size exceeds a preset threshold, and a small file is one whose size is at most that threshold; a result set whose size is greater than or equal to a preset result-set size threshold is a large result set, and one smaller than the threshold is a small result set.
AbsFileCacheEntityReader: the file cache entity interface provides a cacheLineNum parameter to configure the number of lines cached per read; it reads the corresponding lines from the thread fragment file in one pass, converts them into line-data entities through a transform method, and writes the entities to a redis cache.
AbsFileReader: the file reading interface configures the basic information of the file being read, such as the delimiter, character encoding, line separator, and number of lines to skip.
XXXReader of the Spring framework: the spring-batch framework public class.
In summary, the invention pre-caches the data to be processed, handles multiple records in a single interaction, and reduces the number of data scans and database interactions, thereby reducing the total number of fragments and threads to process. It can be applied at both the database level and the file level, covers most batch processing scenarios, unifies batch processing modes and flows, and shortens overall batch execution time. In practice the method shortened an accounting batch of the original system (taken only as an example) from 3 hours to 1 hour; that batch processes data at the hundred-million-record scale and touches database operations on several hundred-million-row tables. Batch processing time is thus greatly reduced, resource consumption falls, the impact of timeliness constraints on the business is minimized, and productivity and output improve.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. A method for improving the batch execution efficiency of a spring-batch framework is characterized by comprising the following steps:
creating a thread fragment file based on a file thread database cache interface, then loading the number of pages and writing, and splitting an independent large file or a large result set into small files or small result sets according to rules for subsequent processing;
compiling query/read logic according to business requirements by using a business reader interface, loading data to be processed through a file thread database cache interface, and transferring the data to be processed to a processor layer;
configuring single-time cache number by using a file cache entity interface, reading corresponding line data from the thread fragment file at one time, converting the line data into a line data entity, and writing the line data entity into a redis cache;
configuring basic information of a read file by using a file reading interface;
and creating a spring-batch framework public class through a Reader interface of the springframe.
2. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file thread database cache interface utilizes task execution of spring to create a thread fragment file.
3. The method for improving the batch execution efficiency of the spring-batch framework of claim 2, wherein the file thread database cache interface loads and writes data in pages through a batchKeyLoadPageSize parameter, and splits an individual large file or a large result set into small files according to rules.
4. The method of claim 3, wherein the large file is a file with a size larger than a preset size, and the small file is a file with a size smaller than or equal to the preset size.
5. The method of claim 3, wherein if the large result set is a result set greater than or equal to a predetermined result set size threshold, the result set is a large result set, and the small result set is a result set smaller than the result set size threshold.
6. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the file cache entity interface provides a cacheLineNum parameter to implement configuration of the number of single caches.
7. The method for improving the batch execution efficiency of the spring-batch framework according to claim 6, wherein the file cache entity interface reads the corresponding line data from the thread fragment file at one time, converts the line data into the line data entity by a transform method, and writes the line data entity into a redis cache.
8. The method for improving the batch execution efficiency of the spring-batch framework of claim 1, wherein the basic information of the read file includes but is not limited to a delimiter, a character code, a line feed character and a skip line number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210706496.9A CN115062060A (en) | 2022-06-21 | 2022-06-21 | Method for improving spring-batch framework batch processing execution efficiency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115062060A true CN115062060A (en) | 2022-09-16 |
Family
ID=83202202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210706496.9A Pending CN115062060A (en) | 2022-06-21 | 2022-06-21 | Method for improving spring-batch framework batch processing execution efficiency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115062060A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||