CN111858653A - Distributed batch processing method based on database segmentation - Google Patents

Distributed batch processing method based on database segmentation

Info

Publication number
CN111858653A
Authority
CN
China
Prior art keywords
segmentation
segment
batch
sql
segmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010679652.8A
Other languages
Chinese (zh)
Inventor
王智聪
李耀
彭磊
薛伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhongbang Bank Co Ltd
Original Assignee
Wuhan Zhongbang Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhongbang Bank Co Ltd filed Critical Wuhan Zhongbang Bank Co Ltd
Priority to CN202010679652.8A priority Critical patent/CN111858653A/en
Publication of CN111858653A publication Critical patent/CN111858653A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/24569Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/547Remote procedure calls [RPC]; Web services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of information technology and provides a distributed batch processing method based on database segmentation. It aims to solve the problem that, when a batch unit is interrupted for any reason while running, it must be re-run from the beginning, which wastes a great deal of batch processing time. The main scheme comprises: defining segmentation parameters and judging whether a batch unit needs to be segmented, segmentation being performed only if it is needed; segmenting batch unit A into N small SQL task segments; and monitoring the running state of each server by calling the ZooKeeper (ZK) service via remote procedure call (RPC), and distributing the segmented SQL tasks to relatively idle servers to run.

Description

Distributed batch processing method based on database segmentation
Technical Field
The invention relates to the field of information technology and provides a distributed segmentation method for nightly batch processing.
Background
Prior art: the nightly batch of the accounting system must process a large amount of accounting information and must also supply data to upstream and downstream systems or partner channels, so the nightly batch has to be extremely efficient. The bank's current nightly batch mechanism mixes serial and parallel execution; the batch units depend closely on one another, and each batch unit is the smallest unit of batch processing and cannot be split.
The technical defects are as follows:
1. Under the current in-bank batch mechanism, a batch unit cannot be subdivided and can only run on one server, and the running times of the batch units differ. As a result, some servers sit idle while others run at full load. Moreover, when the nightly batch running time reaches its threshold, it can only be reduced by adding servers, which leaves even more machines idle during parts of the batch run, so resources are not fully utilized.
2. When a batch unit is interrupted for any reason while running, it must be re-run. Because a batch unit bundles its processing logic and data into a single whole, the re-run starts from the initial state, which wastes a great deal of batch processing time; if the batch unit is long-running, this can even affect the supply of data to downstream systems.
Disclosure of Invention
The invention aims to solve the problem that, when a batch unit is interrupted for any reason while running, it must be re-run from its initial state because its processing logic and data form a single whole, which wastes a great deal of batch processing time.
In order to solve the technical problems, the invention adopts the following technical scheme:
A distributed batch processing method based on database segmentation comprises the following steps:
step 1, defining segmentation parameters and judging whether a batch unit needs to be segmented, segmentation being performed only if it is needed;
step 2, segmenting batch unit A into N small SQL task segments;
step 3, monitoring the running state of each server by calling the ZK service via RPC (remote procedure call), and distributing the segmented SQL tasks to relatively idle servers to run.
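Step 3 can be pictured as a least-loaded assignment. The following is a minimal sketch in Python, not the patented implementation: the server status that the method obtains from the ZK service over RPC is replaced here by a plain dictionary, and the names `assign_segments` and `server_load`, the server names, and the rule that each task adds one unit of load are all assumptions made for this illustration.

```python
import heapq


def assign_segments(segment_ids, server_load):
    """Assign each segment to the currently least-loaded server.

    server_load maps a server name to its current load. It is a
    hypothetical stand-in for the status read from the ZK service.
    Assumption: every assigned segment adds one unit of load.
    """
    # Min-heap of (load, server) so the idlest server is popped first.
    heap = [(load, name) for name, load in server_load.items()]
    heapq.heapify(heap)

    assignment = {}
    for seg in segment_ids:
        load, name = heapq.heappop(heap)
        assignment[seg] = name
        # Put the server back with its load increased by one task.
        heapq.heappush(heap, (load + 1, name))
    return assignment


if __name__ == "__main__":
    plan = assign_segments(range(1, 7), {"srv-a": 0, "srv-b": 2, "srv-c": 1})
    for seg, server in plan.items():
        print(seg, server)
```

In this sketch a server that starts out busier simply receives fewer segments, so the loads converge as tasks are handed out.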
In the above technical solution, judging in step 1 whether the batch unit needs to be segmented specifically comprises the following: each running unit of the nightly batch has a unique JOB_ID. A new batch segmentation unit definition table is added, with JOB_ID as its primary key and with fields for whether segmentation is needed, the segmentation execution class, the segmentation execution SQL method, and the data volume of each segment (for example, a value of 20000 means each segment holds 20000 rows). When a nightly batch unit runs, the "whether segmentation is needed" field is looked up in the definition table by JOB_ID; if it is "Y", the segmentation execution class field and the segmentation execution SQL field are read, and the specific SQL to execute is obtained from the defined XML according to the execution SQL method and loaded for execution.
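The lookup described for step 1 might look as follows. This is a hedged sketch using Python's built-in `sqlite3` module with an in-memory database; the table name `batch_segment_def`, its column names, and the sample job rows are hypothetical, since the patent names the fields only in prose.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory definition table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE batch_segment_def ("
        " job_id TEXT PRIMARY KEY,"
        " need_segment TEXT,"
        " segment_class TEXT,"
        " segment_sql_method TEXT,"
        " segment_size INTEGER)"
    )
    conn.executemany(
        "INSERT INTO batch_segment_def VALUES (?, ?, ?, ?, ?)",
        [
            ("JOB001", "Y", "AcctSegmentJob", "selectAcctBySegment", 20000),
            ("JOB002", "N", None, None, None),
        ],
    )
    return conn


def load_segment_config(conn, job_id):
    """Return the segmentation settings for job_id, or None when the
    batch unit is unknown or not flagged 'Y' for segmentation."""
    row = conn.execute(
        "SELECT need_segment, segment_class, segment_sql_method, segment_size "
        "FROM batch_segment_def WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None or row[0] != "Y":
        return None
    return {
        "segment_class": row[1],
        "segment_sql_method": row[2],
        "segment_size": row[3],
    }


if __name__ == "__main__":
    conn = make_demo_db()
    print(load_segment_config(conn, "JOB001"))
    print(load_segment_config(conn, "JOB002"))
```

A unit that returns `None` here would simply run unsegmented, as it does today.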
In the above technical solution, step 2 specifically includes the following steps:
step 2.1, querying the amount of data in the table to be segmented, and defining the alias row_count for it;
step 2.2, obtaining the data volume of each small SQL task segment configured in the XML, and obtaining the number of segments as total data volume / data volume per segment;
step 2.3, obtaining the start field and end field of each segment, and determining the specific start and end values of each segment from the number of segments and the total data volume obtained in the preceding steps.
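The arithmetic of steps 2.1 to 2.3 can be sketched as below. Two assumptions go beyond the text: the key is taken to be a contiguous integer range starting at `first_key` (as in the worked example later in the description), and the division is rounded up so that a final partial segment is not lost. The function name `split_into_segments` is hypothetical.

```python
def split_into_segments(row_count, segment_size, first_key=1):
    """Return a list of (start_key, end_key) pairs covering row_count rows.

    Assumes keys are contiguous integers beginning at first_key.
    """
    if row_count < 0 or segment_size <= 0:
        raise ValueError("row_count must be >= 0 and segment_size > 0")

    # Number of segments = total data volume / data volume per segment,
    # rounded up (assumption) so a remainder forms one last short segment.
    n_segments = -(-row_count // segment_size)

    last_key = first_key + row_count - 1
    segments = []
    for i in range(n_segments):
        start = first_key + i * segment_size
        end = min(start + segment_size - 1, last_key)
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    segs = split_into_segments(1000000, 10000)
    print(len(segs), segs[0], segs[-1])
```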
In the above technical solution, an XML configuration file is loaded and the segmented SQL in it is read, the SQL containing the start segmentation field. The start field name and end field name of each segment are configured in the XML, and the start and end fields may be numbers or characters. The start and end values of each segment are obtained by querying the database with the start and end field names, according to the row count / data volume per segment.
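The patent does not give the XML layout, so the following sketch invents one purely for illustration: the element `segment-sql` and its attributes are assumptions, as are the function names. It reads the configured field names and segment size, then derives each segment's start and end values from the keys actually present in the table, which also works when the key is a character field rather than a number.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; the real XML format is not specified.
CONFIG_XML = """
<segments>
  <segment-sql id="selectAcctBySegment" table="mb_acct"
               key-field="internalkey" segment-size="3"
               start-field="start_key" end-field="end_key"/>
</segments>
"""


def read_segment_config(xml_text, sql_id):
    """Find the segment-sql entry with the given id and return its settings."""
    root = ET.fromstring(xml_text.strip())
    for node in root.findall("segment-sql"):
        if node.get("id") == sql_id:
            return {
                "table": node.get("table"),
                "key_field": node.get("key-field"),
                "segment_size": int(node.get("segment-size")),
                "start_field": node.get("start-field"),
                "end_field": node.get("end-field"),
            }
    raise KeyError(sql_id)


def segment_bounds(conn, cfg):
    """Return one {start_field: value, end_field: value} dict per segment.

    Table and column names come from trusted configuration, not user input.
    For brevity this loads every key into memory; a production system
    would let the database compute the boundaries instead.
    """
    sql = "SELECT {k} FROM {t} ORDER BY {k}".format(
        k=cfg["key_field"], t=cfg["table"]
    )
    keys = [row[0] for row in conn.execute(sql)]
    size = cfg["segment_size"]
    bounds = []
    for i in range(0, len(keys), size):
        chunk = keys[i:i + size]
        bounds.append({cfg["start_field"]: chunk[0], cfg["end_field"]: chunk[-1]})
    return bounds
```

Each returned dictionary can then be bound to the start and end parameters of the segmented SQL.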
Because the invention adopts the technical scheme, the invention has the following beneficial effects:
1. Batch units that need to be executed in segments and in parallel can be set up purely through configuration, which is simple and easy to maintain.
2. After segmentation by key, the segments are distributed evenly to the servers for execution through the ZooKeeper distributed coordination service. When each segment's task processes data, it only needs to handle the data of that segment, which avoids full-table scans, greatly reduces server memory usage, and improves execution efficiency.
3. The scheme can be extended to any scenario that must process large amounts of data, especially one that operates on the database frequently; data segmentation avoids full-table scans and the rollback of large amounts of data.
Drawings
FIG. 1 is a schematic diagram of a batch process run.
FIG. 2 is a schematic diagram of a batch run executed in segments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The detailed description of the embodiments of the present invention is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The segmented SQL principle:
The SQL segmentation principle is analyzed using the following scenario as an example: suppose the 1000000 rows in the mb_acct table need to be segmented, as in the following table:
internalkey   posted   rate   date   ……
1             ……       ……     ……     ……
2             ……       ……     ……     ……
……            ……       ……     ……     ……
1000000       ……       ……     ……     ……
If we now segment by the key internalkey, with 10000 rows per segment, the segmentation result is as follows:
[Figure: table of segmentation results (original image not reproduced in the text)]
After segmentation is complete, when the data needs to be processed, start_key and end_key are appended to the corresponding SQL as query conditions, as follows:
SELECT * FROM mb_acct WHERE xxx AND internalkey BETWEEN #{start_key} AND #{end_key}
Once segmentation is complete, the tasks can be executed concurrently in batches.
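As an illustration of running the segmented query concurrently, the sketch below uses a temporary SQLite file and a thread pool in place of the servers. Several things are assumptions for the demo: the `#{start_key}` and `#{end_key}` placeholders are written in `sqlite3`'s named-parameter form, the unspecified `xxx` condition is omitted, the query aggregates instead of selecting all columns, and the `posted` values are made-up test data.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Segment query with :start_key / :end_key standing in for #{...}.
SEGMENT_SQL = (
    "SELECT COUNT(*), SUM(posted) FROM mb_acct "
    "WHERE internalkey BETWEEN :start_key AND :end_key"
)


def build_demo_db(path, rows):
    """Create mb_acct with keys 1..rows and synthetic posted values."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE mb_acct (internalkey INTEGER PRIMARY KEY, posted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mb_acct VALUES (?, ?)",
        ((k, k % 7) for k in range(1, rows + 1)),
    )
    conn.commit()
    conn.close()


def run_segment(path, start_key, end_key):
    """Process one segment on its own connection (one per worker)."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            SEGMENT_SQL, {"start_key": start_key, "end_key": end_key}
        ).fetchone()
    finally:
        conn.close()


def run_all(path, rows, segment_size, workers=4):
    """Split keys 1..rows into ranges and process them concurrently."""
    ranges = [
        (start, min(start + segment_size - 1, rows))
        for start in range(1, rows + 1, segment_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: run_segment(path, *r), ranges))
    return ranges, results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path, 1000)
        ranges, results = run_all(db_path, 1000, 100)
        print(len(ranges), sum(count for count, _ in results))
```

Because each range query touches only its own keys through the primary key, the per-segment results can be combined afterwards, and a failed segment could in principle be re-run on its own.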

Claims (4)

1. A distributed batch processing method based on database segmentation, characterized by comprising the following steps:
step 1, defining segmentation parameters and judging whether a batch unit needs to be segmented, segmentation being performed only if it is needed;
step 2, segmenting batch unit A into N small SQL task segments;
step 3, monitoring the running state of each server by calling the ZK service via RPC (remote procedure call), and distributing the segmented SQL tasks to relatively idle servers to run.
2. The distributed batch processing method based on database segmentation according to claim 1, wherein in step 1, the step of judging whether the batch unit needs to be segmented specifically comprises the steps of:
Each running unit of the nightly batch has a unique JOB_ID; a new batch segmentation unit definition table is added, with JOB_ID as its primary key and with fields for whether segmentation is needed, the segmentation execution class, the segmentation execution SQL method, and the data volume of each segment; when a nightly batch unit runs, the "whether segmentation is needed" field is looked up in the definition table by JOB_ID; if it is "Y", the segmentation execution class field and the segmentation execution SQL field are read, and the specific SQL to execute is obtained from the defined XML according to the execution SQL method and loaded for execution.
3. The distributed batch processing method based on database segmentation according to claim 1, wherein the step 2 specifically comprises the steps of:
step 2.1, querying the amount of data in the table to be segmented, and defining the alias row_count for it;
step 2.2, obtaining the data volume of each small SQL task segment configured in the XML, and obtaining the number of segments as total data volume / data volume per segment;
step 2.3, obtaining the start field and end field of each segment, and determining the specific start and end values of each segment from the number of segments and the total data volume obtained in the preceding steps.
4. The distributed batch processing method based on database segmentation according to claim 3, characterized in that an XML configuration file is loaded and the segmented SQL in it is read, the SQL containing the start segmentation field; the start field name and end field name of each segment are configured in the XML, and the start and end fields are numbers or characters; the start and end values of each segment are obtained by querying the database with the start and end field names, according to the row count / data volume per segment.
CN202010679652.8A 2020-07-15 2020-07-15 Distributed batch processing method based on database segmentation Pending CN111858653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010679652.8A CN111858653A (en) 2020-07-15 2020-07-15 Distributed batch processing method based on database segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010679652.8A CN111858653A (en) 2020-07-15 2020-07-15 Distributed batch processing method based on database segmentation

Publications (1)

Publication Number Publication Date
CN111858653A true CN111858653A (en) 2020-10-30

Family

ID=72983479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010679652.8A Pending CN111858653A (en) 2020-07-15 2020-07-15 Distributed batch processing method based on database segmentation

Country Status (1)

Country Link
CN (1) CN111858653A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359295A (en) * 2007-08-01 2009-02-04 阿里巴巴集团控股有限公司 Batch task scheduling and allocating method and system
CN106708620A (en) * 2015-11-13 2017-05-24 苏宁云商集团股份有限公司 Data processing method and system
CN106980678A (en) * 2017-03-30 2017-07-25 温馨港网络信息科技(苏州)有限公司 Data analysing method and system based on zookeeper technologies
CN110308980A (en) * 2019-06-27 2019-10-08 深圳前海微众银行股份有限公司 Batch processing method, device, equipment and the storage medium of data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807710A (en) * 2021-09-22 2021-12-17 四川新网银行股份有限公司 Method for sectionally paralleling and dynamically scheduling system batch tasks and storage medium
CN113807710B (en) * 2021-09-22 2023-06-20 四川新网银行股份有限公司 System batch task segmentation parallel and dynamic scheduling method and storage medium

Similar Documents

Publication Publication Date Title
CN110413634B (en) Data query method, system, device and computer readable storage medium
CN111506559B (en) Data storage method, device, electronic equipment and storage medium
TWI738721B (en) Task scheduling method and device
CN112115200B (en) Data synchronization method, device, electronic equipment and readable storage medium
CN113704243A (en) Data analysis method, data analysis device, computer device, and storage medium
CN111858653A (en) Distributed batch processing method based on database segmentation
CN112416972A (en) Real-time data stream processing method, device, equipment and readable storage medium
CN110704699A (en) Data image construction method and device, computer equipment and storage medium
CN112559514B (en) Information processing method and system
CN112579633A (en) Data retrieval method, device, equipment and storage medium
CN105718550B (en) Media information publishing method and system
CN110941536B (en) Monitoring method and system, and first server cluster
CN114036048A (en) Case activity detection method, device, equipment and storage medium
CN113627862A (en) First supply material overall process management method and device based on account book
CN108763498B (en) User identity identification method and device, electronic equipment and readable storage medium
CN113312434A (en) Pre-polymerization treatment method for massive structured data
CN113672618A (en) Metadata table-based multi-tenant data processing method and device
CN109086279B (en) Report caching method and device
CN113220726A (en) Data quality detection method and system
CN112835932A (en) Batch processing method and device of service table and nonvolatile storage medium
WO2019169696A1 (en) Platform client data backflow method, electronic apparatus, device, and storage medium
CN111352674B (en) List circulation method, server and computer readable storage medium
CN111679899A (en) Task scheduling method, device, platform equipment and storage medium
CN110515923B (en) Data migration method and system between distributed databases
CN111666324A (en) ETL scheduling method and device between relational databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination