CN111858653A - Distributed batch processing method based on database segmentation - Google Patents
- Publication number
- CN111858653A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- segment
- batch
- sql
- segmented
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/24569—Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/508—Monitor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of information technology and provides a distributed batch processing method based on database segmentation. It aims to solve the problem that, when a running batch unit is interrupted for any reason, the unit must be re-run from the beginning, which wastes a great deal of batch processing time. The main scheme comprises: defining segmentation parameters and judging whether a batch unit needs to be segmented, and performing segmentation if so; segmenting batch unit A into N small SQL tasks; and monitoring the running state of each server through the ZK (ZooKeeper) service invoked by remote procedure call (RPC), and distributing the segmented SQL tasks to relatively idle servers for execution.
Description
Technical Field
The invention relates to the field of information technology and provides a distributed segmentation method for night-time batch processing.
Background
The prior art is as follows: the night-time batch of the accounting system must process a large volume of accounting information and, at the same time, supply data to upstream and downstream systems or partner channels, so the night-time batch must be extremely efficient. The current in-house night-time batch mechanism mixes serial and parallel execution; the batch units depend closely on one another, and each batch unit is the smallest unit of batch processing and cannot be split.
The technical defects are as follows:
1. Under the current in-house batch mechanism, a batch unit cannot be subdivided and can run on only one server, and the running times of batch units differ, so some servers sit idle while others run at full load. Moreover, once the night-time batch running time reaches its threshold, it can be reduced only by adding servers, which leaves even more machines idle during parts of the batch run, so resources cannot be fully utilized.
2. When a batch unit is interrupted during operation for any reason and is re-run, it must execute again from its initial state, because its processing logic and data form a single whole. This wastes a great deal of batch processing time, and if the batch unit is long-running, it can even delay the supply of data to downstream systems.
Disclosure of Invention
The invention aims to solve the problem that, when a batch unit is interrupted during operation for any reason and is re-run, it must execute again from its initial state because its processing logic and data form a single whole, which wastes a great deal of batch processing time.
In order to solve the technical problems, the invention adopts the following technical scheme:
A distributed batch processing method based on database segmentation comprises the following steps:
step 1, defining segmentation parameters and judging whether a batch unit needs to be segmented, and performing segmentation if so;
step 2, segmenting the batch unit A into N small SQL tasks;
step 3, monitoring the running state of each server through the ZK (ZooKeeper) service invoked by remote procedure call (RPC), and distributing the segmented SQL tasks to relatively idle servers for execution.
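The distribution in step 3 might be sketched as follows. This is only an illustration in Python: it assumes that each server's load is a single number already collected by the monitoring service (the ZooKeeper and RPC calls themselves are not shown), and the names `assign_segments` and `server_load` are hypothetical. The patent does not specify the load metric or the assignment policy; "least loaded first" is one plausible reading of "relatively idle".

```python
import heapq


def assign_segments(segments, server_load):
    """Assign each segment to the server with the lowest current load.

    segments:    list of segment identifiers
    server_load: dict mapping server name -> number of running tasks
                 (assumed to be reported by the monitoring service)
    """
    # Min-heap of (load, server name); the least loaded server is on top.
    heap = [(load, name) for name, load in server_load.items()]
    heapq.heapify(heap)

    assignment = {}
    for seg in segments:
        load, name = heapq.heappop(heap)
        assignment[seg] = name
        # The chosen server now carries one more task.
        heapq.heappush(heap, (load + 1, name))
    return assignment
```

With two servers where `"b"` already runs two tasks, the first segments go to `"a"` until the loads even out, after which the remaining segments are spread across both.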
In the above technical solution, judging in step 1 whether the batch unit needs to be segmented specifically comprises the following: each run unit of the night-time batch has a unique JOB_ID. A new batch segmentation unit definition table is added, with JOB_ID as its primary key and with the fields 'whether segmentation is needed', segmentation execution class, segmentation execution SQL method, and data volume of each segment (for example, a value of 20000 means that each segment holds 20000 rows). When a night-time batch unit runs, the 'whether segmentation is needed' field is looked up in the definition table by JOB_ID; if it is 'Y', the segmentation execution class field and the segmentation execution SQL method field are read, and the specific SQL to execute is obtained from the defined XML according to the execution SQL method and loaded for execution.
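The lookup described above could look roughly like the following Python sketch, which uses an in-memory SQLite database purely for illustration. The table name `batch_segment_def`, its column names, and the sample class and method names are assumptions; the patent names only the fields' meanings, not their identifiers.

```python
import sqlite3


def load_segment_config(conn, job_id):
    """Return the segmentation settings for job_id, or None if the job
    is unknown or is not flagged 'Y' for segmentation."""
    row = conn.execute(
        "SELECT need_segment, exec_class, exec_sql_method, segment_size "
        "FROM batch_segment_def WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None or row[0] != "Y":
        return None
    return {
        "exec_class": row[1],
        "exec_sql_method": row[2],
        "segment_size": row[3],
    }


# Hypothetical definition table with JOB_ID as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE batch_segment_def (
           job_id          TEXT PRIMARY KEY,
           need_segment    TEXT NOT NULL,   -- 'Y' or 'N'
           exec_class      TEXT,
           exec_sql_method TEXT,
           segment_size    INTEGER          -- rows per segment
       )"""
)
conn.executemany(
    "INSERT INTO batch_segment_def VALUES (?, ?, ?, ?, ?)",
    [
        ("JOB001", "Y", "AcctSegmentRunner", "selectAcctBySegment", 20000),
        ("JOB002", "N", None, None, None),
    ],
)
```

A job that returns `None` here would simply run unsegmented, as it does today.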
In the above technical solution, step 2 specifically includes the following steps:
step 2.1, querying the total number of rows in the table to be segmented, aliased as row_count;
step 2.2, obtaining the data volume of each small SQL task configured in the XML, and computing the number of segments as total data volume / data volume of each small SQL task;
step 2.3, obtaining the start field and end field of each segment, and determining the specific start and end values of each segment from the number of segments and the total data volume obtained in the preceding steps.
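The arithmetic of steps 2.1 to 2.3 can be sketched in Python as below. The function name `segment_ranges` is hypothetical, and rounding the segment count up when the total is not an exact multiple of the segment size is an assumption; the text only states "total data volume / data volume of each segment".

```python
def segment_ranges(row_count, segment_size):
    """Return (start_row, end_row) pairs, 1-based and inclusive."""
    if row_count < 0 or segment_size <= 0:
        raise ValueError("row_count must be >= 0 and segment_size > 0")

    # Ceiling division, so a final partial segment is not dropped.
    n_segments = -(-row_count // segment_size)

    return [
        (i * segment_size + 1, min((i + 1) * segment_size, row_count))
        for i in range(n_segments)
    ]
```

For 1000000 rows and 10000 rows per segment this yields 100 ranges, the first being (1, 10000) and the last (990001, 1000000).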
In the above technical solution, the XML configuration file is loaded and the segmented SQL in it is read; this SQL contains the start segmentation field. The start field name and end field name of each segment are configured in the XML, and the start and end fields may be numeric or character values. The start and end values of each segment are obtained by querying the database, using the start and end field names, according to the row count / the data volume of each segment.
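One way to picture this configuration step is the Python sketch below. The XML layout (element `segment-sql` and its attributes) is entirely hypothetical, since the patent does not give the file format, and reading all keys into memory is done here only to keep the example short. It shows why boundary values are read from the database rather than computed: the keys need not be contiguous.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; not taken from the patent.
CONFIG_XML = """
<segments>
  <segment-sql id="selectAcctBySegment" table="mb_acct"
               key-column="internalkey"
               start-field="start_key" end-field="end_key"
               segment-size="3">
    SELECT * FROM mb_acct
    WHERE internalkey BETWEEN :start_key AND :end_key
  </segment-sql>
</segments>
"""


def load_config(xml_text, sql_id):
    """Read one segmented-SQL entry from the XML configuration."""
    root = ET.fromstring(xml_text)
    for node in root.findall("segment-sql"):
        if node.get("id") == sql_id:
            return {
                "table": node.get("table"),
                "key_column": node.get("key-column"),
                "start_field": node.get("start-field"),
                "end_field": node.get("end-field"),
                "segment_size": int(node.get("segment-size")),
                "sql": " ".join(node.text.split()),
            }
    raise KeyError(sql_id)


def segment_bounds(conn, cfg):
    """Return one {start_field: value, end_field: value} dict per segment."""
    # Identifiers come from trusted configuration, not from user input.
    query = (
        f"SELECT {cfg['key_column']} FROM {cfg['table']} "
        f"ORDER BY {cfg['key_column']}"
    )
    keys = [row[0] for row in conn.execute(query)]
    size = cfg["segment_size"]
    chunks = (keys[i:i + size] for i in range(0, len(keys), size))
    return [
        {cfg["start_field"]: chunk[0], cfg["end_field"]: chunk[-1]}
        for chunk in chunks
    ]


# Small demonstration table with gaps in the key sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mb_acct (internalkey INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO mb_acct VALUES (?)", [(k,) for k in (1, 2, 4, 5, 7, 8, 9)]
)
```

Each returned dict can be passed directly as the named parameters of the configured SQL.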
Because the invention adopts the above technical scheme, it has the following beneficial effects:
1. Batch units that need to run in segments and in parallel can be set up through configuration alone, which is simple and easy to maintain.
2. After segmentation by key, the segments are distributed evenly to the servers for execution through the ZooKeeper distributed coordination service. When a segment's task processes data, it handles only the data of that segment, which avoids full-table scans, greatly reduces the server memory used, and improves execution efficiency.
3. The scheme can be extended to any scenario that must process large volumes of data, particularly those that operate on the database frequently; data segmentation avoids full-table scans and the rollback of large amounts of data.
Drawings
FIG. 1 is a schematic diagram of a batch process run.
FIG. 2 is a schematic diagram of a batch run executed in segments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The detailed description of the embodiments of the present invention is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The segmented SQL principle:
The SQL segmentation principle is analyzed using the following scenario as an example: suppose the 1000000 rows in the mb_acct table need to be segmented, as in the following table:
internalkey | posted | rate | date | …… |
---|---|---|---|---|
1 | …… | …… | …… | …… |
2 | …… | …… | …… | …… |
…… | …… | …… | …… | …… |
1000000 | …… | …… | …… | …… |
If we now segment by the key internalkey with 10000 rows per segment, the table is divided into 1000000 / 10000 = 100 segments, each identified by a start_key and an end_key.
After segmentation, when the data needs to be processed, start_key and end_key are appended to the corresponding SQL as query conditions, as follows:
SELECT * FROM mb_acct WHERE xxx AND internalkey BETWEEN #{start_key} AND #{end_key}
After segmentation is completed, the tasks can be executed concurrently in batches.
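The following Python sketch illustrates why segmentation helps with the interruption problem described in the Background. It is an assumption-laden illustration, not the patented implementation: it uses an in-memory SQLite table of 100 rows, runs the segments sequentially instead of distributing them to servers, omits the placeholder condition `xxx`, and keeps the set of completed segments in a hypothetical in-memory `done` set. After a simulated interruption, only the segments that did not finish are executed again.

```python
import sqlite3

# Segment query with named parameters in place of #{start_key}/#{end_key}.
SEGMENT_SQL = (
    "SELECT internalkey FROM mb_acct "
    "WHERE internalkey BETWEEN :start_key AND :end_key"
)


def run_segments(conn, segments, done, process):
    """Run every segment not yet in `done`; record each one on success."""
    for start_key, end_key in segments:
        if (start_key, end_key) in done:
            continue  # already processed in an earlier attempt
        rows = conn.execute(
            SEGMENT_SQL, {"start_key": start_key, "end_key": end_key}
        ).fetchall()
        process(rows)
        done.add((start_key, end_key))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mb_acct (internalkey INTEGER PRIMARY KEY, posted REAL)"
)
conn.executemany(
    "INSERT INTO mb_acct VALUES (?, ?)", [(k, 0.0) for k in range(1, 101)]
)
# Ten segments of ten rows each: (1, 10), (11, 20), ..., (91, 100).
segments = [(s, s + 9) for s in range(1, 101, 10)]

seen, done, calls = [], set(), {"n": 0}


def flaky_process(rows):
    """Stand-in for the real batch logic; fails on its fourth call."""
    calls["n"] += 1
    if calls["n"] == 4:
        raise RuntimeError("simulated interruption")
    seen.extend(r[0] for r in rows)


try:
    run_segments(conn, segments, done, flaky_process)
except RuntimeError:
    pass  # the batch unit was interrupted part-way through

completed_before_retry = len(done)

# Re-run: completed segments are skipped, the rest are processed.
run_segments(conn, segments, done, flaky_process)
```

In a real deployment the record of completed segments would have to survive a process restart, for example in a database table; that persistence is left out here.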
Claims (4)
1. A distributed batch processing method based on database segmentation, characterized by comprising the following steps:
step 1, defining segmentation parameters and judging whether a batch unit needs to be segmented, and performing segmentation if so;
step 2, segmenting the batch unit A into N small SQL tasks;
step 3, monitoring the running state of each server through the ZK (ZooKeeper) service invoked by remote procedure call (RPC), and distributing the segmented SQL tasks to relatively idle servers for execution.
2. The distributed batch processing method based on database segmentation according to claim 1, wherein judging in step 1 whether the batch unit needs to be segmented specifically comprises the following:
Each run unit of the night-time batch has a unique JOB_ID. A new batch segmentation unit definition table is added, with JOB_ID as its primary key and with the fields 'whether segmentation is needed', segmentation execution class, segmentation execution SQL method, and data volume of each segment. When a night-time batch unit runs, the 'whether segmentation is needed' field is looked up in the definition table by JOB_ID; if it is 'Y', the segmentation execution class field and the segmentation execution SQL method field are read, and the specific SQL to execute is obtained from the defined XML according to the execution SQL method and loaded for execution.
3. The distributed batch processing method based on database segmentation according to claim 1, wherein the step 2 specifically comprises the steps of:
step 2.1, querying the total number of rows in the table to be segmented, aliased as row_count;
step 2.2, obtaining the data volume of each small SQL task configured in the XML, and computing the number of segments as total data volume / data volume of each small SQL task;
step 2.3, obtaining the start field and end field of each segment, and determining the specific start and end values of each segment from the number of segments and the total data volume obtained in the preceding steps.
4. The distributed batch processing method based on database segmentation according to claim 3, characterized in that the XML configuration file is loaded and the segmented SQL in it is read, the SQL containing the start segmentation field; the start field name and end field name of each segment are configured in the XML, the start and end fields being numeric or character values; and the start and end values of each segment are obtained by querying the database, using the start and end field names, according to the row count / the data volume of each segment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010679652.8A CN111858653A (en) | 2020-07-15 | 2020-07-15 | Distributed batch processing method based on database segmentation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111858653A (en) | 2020-10-30 |
Family
ID=72983479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010679652.8A (Pending, CN111858653A) | Distributed batch processing method based on database segmentation | 2020-07-15 | 2020-07-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111858653A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101359295A (en) * | 2007-08-01 | 2009-02-04 | 阿里巴巴集团控股有限公司 | Batch task scheduling and allocating method and system |
CN106708620A (en) * | 2015-11-13 | 2017-05-24 | 苏宁云商集团股份有限公司 | Data processing method and system |
CN106980678A (en) * | 2017-03-30 | 2017-07-25 | 温馨港网络信息科技(苏州)有限公司 | Data analysing method and system based on zookeeper technologies |
CN110308980A (en) * | 2019-06-27 | 2019-10-08 | 深圳前海微众银行股份有限公司 | Batch processing method, device, equipment and the storage medium of data |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113807710A (en) * | 2021-09-22 | 2021-12-17 | 四川新网银行股份有限公司 | Method for sectionally paralleling and dynamically scheduling system batch tasks and storage medium |
CN113807710B (en) * | 2021-09-22 | 2023-06-20 | 四川新网银行股份有限公司 | System batch task segmentation parallel and dynamic scheduling method and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413634B (en) | Data query method, system, device and computer readable storage medium | |
CN111506559B (en) | Data storage method, device, electronic equipment and storage medium | |
TWI738721B (en) | Task scheduling method and device | |
CN112115200B (en) | Data synchronization method, device, electronic equipment and readable storage medium | |
CN113704243A (en) | Data analysis method, data analysis device, computer device, and storage medium | |
CN111858653A (en) | Distributed batch processing method based on database segmentation | |
CN112416972A (en) | Real-time data stream processing method, device, equipment and readable storage medium | |
CN110704699A (en) | Data image construction method and device, computer equipment and storage medium | |
CN112559514B (en) | Information processing method and system | |
CN112579633A (en) | Data retrieval method, device, equipment and storage medium | |
CN105718550B (en) | Media information publishing method and system | |
CN110941536B (en) | Monitoring method and system, and first server cluster | |
CN114036048A (en) | Case activity detection method, device, equipment and storage medium | |
CN113627862A (en) | First supply material overall process management method and device based on account book | |
CN108763498B (en) | User identity identification method and device, electronic equipment and readable storage medium | |
CN113312434A (en) | Pre-polymerization treatment method for massive structured data | |
CN113672618A (en) | Metadata table-based multi-tenant data processing method and device | |
CN109086279B (en) | Report caching method and device | |
CN113220726A (en) | Data quality detection method and system | |
CN112835932A (en) | Batch processing method and device of service table and nonvolatile storage medium | |
WO2019169696A1 (en) | Platform client data backflow method, electronic apparatus, device, and storage medium | |
CN111352674B (en) | List circulation method, server and computer readable storage medium | |
CN111679899A (en) | Task scheduling method, device, platform equipment and storage medium | |
CN110515923B (en) | Data migration method and system between distributed databases | |
CN111666324A (en) | ETL scheduling method and device between relational databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||