CN114257582A - Batch job processing method, distributed system and batch job processing architecture - Google Patents

Batch job processing method, distributed system and batch job processing architecture

Info

Publication number
CN114257582A
CN114257582A (application CN202111550208.7A)
Authority
CN
China
Prior art keywords
processing
preprocessing
distributed
distributed system
job
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111550208.7A
Other languages
Chinese (zh)
Inventor
袁怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202111550208.7A
Publication of CN114257582A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02Banking, e.g. interest calculation or account maintenance
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Computer And Data Communications (AREA)

Abstract

The embodiments of the invention disclose a batch job processing method, a distributed system and a batch job processing architecture. The method comprises the following steps: preprocessing acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results; storing the preprocessing results in a distributed publish-subscribe message system according to the topics they belong to; and, when a consumption process is triggered by an online processing system, retrieving the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consuming them to obtain consumption results. By preprocessing high-volume batch jobs across multiple electronic devices in the distributed system and then passing the preprocessing results to the online processing system for consumption through the distributed publish-subscribe message system, the method can effectively improve the processing efficiency of batch jobs.

Description

Batch job processing method, distributed system and batch job processing architecture
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a batch job processing method, a distributed system and a batch job processing architecture.
Background
In large enterprises in finance, telecommunications, and the internet, batch job processing is widely used across business logic and business processes. A job is data or a file submitted to a server for processing in job form in order to complete a business function; it has a degree of independence and a clear definition. In scenarios with a large number of highly concurrent jobs, processing thousands of jobs with a conventional serial mechanism yields very poor results for batch jobs, with problems such as untimely transmission, high load, and loss of job information.
Therefore, how to improve the processing efficiency of batch jobs is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the invention provides a batch job processing method, a distributed system and a batch job processing architecture, which are used for improving the processing efficiency of batch jobs.
In a first aspect, an embodiment of the present invention provides a batch job processing method, including:
preprocessing acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results;
storing the preprocessing results in a distributed publish-subscribe message system according to the topics they belong to;
and, when a consumption process is triggered by an online processing system, retrieving the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consuming them to obtain consumption results.
Optionally, preprocessing the acquired batch jobs through the plurality of electronic devices in the distributed system to obtain corresponding preprocessing results includes:
transmitting the acquired batch jobs, through the distributed system, to corresponding electronic devices in the distributed system for processing to obtain the corresponding preprocessing results.
Optionally, transmitting the acquired batch jobs through the distributed system to the corresponding electronic devices in the distributed system for processing includes:
transmitting the acquired batch jobs, through a load balancing device in the distributed system, to the corresponding electronic devices in the distributed system for processing according to a set rule.
Optionally, for each electronic device assigned jobs, the processing of the assigned jobs by that device includes:
generating, on the electronic device, a job execution plan for the corresponding jobs according to a set plan template;
and processing the corresponding jobs based on the job execution plan to obtain the corresponding preprocessing results.
Optionally, the job execution plan is updated periodically based on a timer.
Optionally, after obtaining the corresponding preprocessing results, the method further includes:
classifying the preprocessing results into topics according to the usage type of the batch jobs to obtain a plurality of topics corresponding to the batch jobs;
and sending the preprocessing results corresponding to the different topics to the distributed publish-subscribe message system.
In a second aspect, an embodiment of the present invention further provides a distributed system comprising a plurality of electronic devices, wherein the distributed system executes the part of the above method that corresponds to the distributed system.
In a third aspect, an embodiment of the present invention further provides a batch job processing architecture, including:
a distributed system, configured to preprocess acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results;
a distributed publish-subscribe message system, configured to store the preprocessing results according to the topics they belong to;
and an online processing system, configured to, when a consumption process is triggered, retrieve the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consume them to obtain consumption results.
Optionally, the distributed system is further configured to transmit the acquired batch jobs to the corresponding electronic devices in the distributed system for processing, so as to obtain the corresponding preprocessing results.
Optionally, the distributed system is further configured to transmit, through a load balancing device in the distributed system, the acquired batch jobs to the corresponding electronic devices in the distributed system for processing according to a set rule.
The embodiments of the invention provide a batch job processing method, a distributed system and a batch job processing architecture. First, a plurality of electronic devices in the distributed system preprocess the acquired batch jobs to obtain corresponding preprocessing results; then a distributed publish-subscribe message system stores the preprocessing results according to the topics they belong to; finally, when a consumption process is triggered by the online processing system, the preprocessing results corresponding to that process are retrieved from the distributed publish-subscribe message system and consumed to obtain consumption results. By preprocessing high-volume batch jobs across multiple electronic devices in the distributed system and then passing the preprocessing results to the online processing system for consumption through the distributed publish-subscribe message system, the method effectively improves the processing efficiency of batch jobs.
Drawings
Fig. 1 is a flowchart of a batch job processing method according to Embodiment One of the present invention;
Fig. 2 is a flowchart of a batch job processing method according to Embodiment Two of the present invention;
Fig. 3 is a schematic diagram of an implementation of the batch job processing method according to Embodiment Two of the present invention;
Fig. 4 is a schematic structural diagram of a distributed system according to Embodiment Three of the present invention;
Fig. 5 is a schematic diagram of a batch job processing architecture according to Embodiment Four of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
It should be noted that the concepts of "first", "second", etc. mentioned in the present invention are only used for distinguishing corresponding contents, and are not used for limiting the order or interdependence relationship.
It should be noted that references to "a", "an", and "the" in the present invention are illustrative rather than limiting, and those skilled in the art will understand them as "one or more" unless the context clearly dictates otherwise.
Embodiment One
Fig. 1 is a flowchart of a batch job processing method according to Embodiment One of the present invention. The method is applicable to processing batch jobs and can be executed by a batch job processing architecture.
As shown in Fig. 1, the batch job processing method provided by Embodiment One of the present invention includes the following steps:
and S110, preprocessing the acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results.
In this embodiment, a job may refer to a file composed of related data for a certain service. For example, when users of a bank apply for credit cards, bank staff enter the information required for the application (the application information) into a corresponding database; over a period of time (such as half a day or a day) the bank therefore accumulates a large amount of application information data in that database. A file can then be generated in the database from all application data in a period of time, for example one file from all application information data added in a day. This file may be incremental or full: an incremental file is generated from only the application information data newly added during the day, whereas a full file is generated from the newly added data together with all application information data already in the database. Such a file can be understood as a job. The specific form and content of a job are not limited here and may be set according to actual needs. Depending on the types of business handled by the bank, multiple files may be generated within a period of time; these files can be understood as multiple jobs, and the multiple jobs can be regarded as a batch job. The number of jobs in a batch job is not limited in this embodiment; for example, it may be at least two, or greater than a set number whose value is determined by the actual situation.
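For illustration only, the job and batch-job notions described above could be modelled as simple data records; the field names (job_id, usage_type, mode, records) are hypothetical and not taken from the patent. A minimal Python sketch:

```python
# Illustrative sketch only: the patent does not define a concrete job format.
# All field names here are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Job:
    job_id: str                 # e.g. the name of the file generated for the day
    usage_type: str             # e.g. "card_application" or "opening_message"
    mode: str                   # "incremental" (only new records) or "full"
    records: List[Dict] = field(default_factory=list)  # the business data rows

@dataclass
class BatchJob:
    jobs: List[Job] = field(default_factory=list)      # the set of jobs submitted together
```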
A distributed system may refer to a system built on a network that contains a set of independent electronic devices but is presented to the user as a unified whole. An electronic device may be a computer or a server. A distributed system can be understood as a system comprising a plurality of independent electronic devices that do not need to interact with each other, i.e., each electronic device can operate independently and process its own data using its own processor, storage, and so on. The number of electronic devices in the distributed system is not specifically limited and can be set flexibly according to actual requirements. The data processed by an electronic device include the jobs assigned to it.
Preprocessing is understood as processing the files corresponding to the acquired batch jobs and the data they contain; for example, it may include decompressing files, extracting the data contained in the file content, determining the execution order of the files, and storing the related data. The preprocessing result refers to the data obtained after preprocessing the acquired batch jobs.
In one embodiment, how the acquired batch jobs are allocated to the electronic devices in the distributed system may be determined by the number of jobs the batch contains. For example, if the batch contains a very large number of jobs, so that assigning them all to a single electronic device would increase the computational load beyond what a single computer can process and store in a short time, the batch may be divided into a number of shares, each containing a certain number of jobs (neither the number of shares nor the number of jobs per share is limited here; both may be set according to actual requirements), and the shares may then be allocated one by one to electronic devices in the distributed system, each device preprocessing the jobs it receives to obtain the corresponding preprocessing results. Conversely, if the batch contains only a few jobs, for example fewer than a set threshold (which can be set flexibly according to actual requirements), the whole batch may be allocated to a single electronic device for preprocessing.
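A rough sketch of this splitting step; the share size and the single-device threshold are assumed values, since the patent leaves both to actual requirements:

```python
# Split a batch into shares before distributing them to electronic devices.
# share_size and single_device_threshold are assumptions, not values from the patent.
def split_batch(jobs, share_size=100, single_device_threshold=200):
    """Return a list of shares, each share being a sub-list of jobs."""
    if len(jobs) <= single_device_threshold:
        return [jobs]                     # small batch: a single device can preprocess it
    return [jobs[i:i + share_size] for i in range(0, len(jobs), share_size)]
```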
In one embodiment, when allocating the acquired batch jobs to electronic devices in the distributed system, the device best suited to receive and process them can be determined from the current internal resource usage of each device. For example, if a device is currently processing other jobs and therefore has little free internal resource space, it is currently unsuitable to be assigned further batch jobs; if a device is not currently processing other jobs and has ample free resource space, it is suitable to be assigned batch jobs.
It should be noted that, after the acquired batch jobs have been preprocessed by the plurality of electronic devices in the distributed system to obtain the corresponding preprocessing results, the results can be further classified, with each class corresponding to one topic and each result assigned to the topic of its class; that is, after classification every preprocessing result belongs to a topic. In this embodiment the classification may follow the usage type of each job, for example card application services, account opening message services, and so on. The classification scheme is not limited and can be set flexibly according to actual requirements.
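A minimal sketch of classifying preprocessing results by usage type; the topic names and the dictionary-based mapping are assumptions made for illustration:

```python
# Map a preprocessing result to the topic it belongs to, based on the job's usage type.
# Topic names are hypothetical examples.
USAGE_TYPE_TO_TOPIC = {
    "card_application": "topic_card_application",
    "opening_message": "topic_opening_message",
}

def classify(pre_result):
    """Return (topic, result) so the result can later be published under its topic."""
    topic = USAGE_TYPE_TO_TOPIC.get(pre_result["usage_type"], "topic_default")
    return topic, pre_result
```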
S120: storing, through a distributed publish-subscribe message system, the preprocessing results according to the topics they belong to.
In this embodiment, the distributed publish-subscribe message system may be high-throughput message middleware for caching and transmitting data; it enables asynchronous communication between applications, which only need to hand over data without caring how the data are transferred between them. For example, the distributed publish-subscribe message system may be a Kafka message queue. As message middleware, Kafka can build a reliable pipeline for transmitting real-time and cached data between systems or applications, process data asynchronously, and sustain high throughput; it can receive the data sent by the upstream (i.e., the distributed system) and guarantee the integrity of that data.
The Kafka message queue may contain a plurality of topics, preset according to the usage types of the jobs, and each topic may contain a plurality of partitions, each with a certain amount of storage space for the corresponding data. The configuration of topics and partitions in the Kafka message queue, such as their number and the partition storage space, is not limited and can be set flexibly according to actual requirements. When the topics are set up, it can be ensured that every preprocessing result output by the distributed system, together with its topic, can be matched to a corresponding topic in the Kafka message queue for storage.
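If Kafka is used, the topics and their partition counts could be pre-created with an admin client. This sketch assumes the kafka-python library; the topic names, partition counts, and broker address are placeholders:

```python
# Pre-create the topics (and their partitions) that will hold preprocessing results.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")   # placeholder broker address
admin.create_topics(new_topics=[
    NewTopic(name="topic_card_application", num_partitions=4, replication_factor=1),
    NewTopic(name="topic_opening_message", num_partitions=4, replication_factor=1),
])
```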
The specific process of storing the preprocessing results by topic in the distributed publish-subscribe message system may be as follows: first, the system receives the preprocessing results output by the distributed system, each carrying the topic it belongs to; then each result is routed to the corresponding topic contained in the publish-subscribe system; finally, the partition that will store the result may be chosen according to the remaining storage capacity of each partition in that topic, for example a partition with sufficient free space, which is not limited here.
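On the producer side, the distributed system would publish each preprocessing result under its topic. A sketch, again assuming the kafka-python client (the patent names Kafka but no particular client library):

```python
# Publish preprocessing results to the Kafka message queue, one topic per usage type.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                 # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish(classified_results):
    """classified_results: iterable of (topic, result) pairs from the classification step."""
    for topic, result in classified_results:
        producer.send(topic, value=result)              # the broker assigns the partition
    producer.flush()                                    # wait until all records are acknowledged
```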
S130: when a consumption process is triggered through the online processing system, retrieving the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consuming them to obtain a consumption result.
In this embodiment, the online processing system can be regarded as the system that consumes the preprocessing results. A consumption process is a process that consumes the corresponding preprocessing results; consumption means processing the specific task content contained in the jobs of the retrieved preprocessing results, and the outcome of that processing is the consumption result. For example, if the task content of the preprocessing result for a certain consumption process is a credit card application and the related application information, the consumption process handles the credit card transaction according to that application and information, and the transaction outcome is the corresponding consumption result.
When a consumption process is triggered by the online processing system, the preprocessing results corresponding to that process are retrieved from the matching topic of the distributed publish-subscribe message system, according to the topic those results belong to, and are then consumed to obtain the consumption result.
In an embodiment, a consumption process may itself correspond to one topic; in that case, when the process is triggered by the online processing system, all preprocessing results under that topic are retrieved from the distributed publish-subscribe message system and consumed to obtain the consumption result.
In one embodiment, the online processing system may include one or more consumers, where a consumer is an electronic device (such as a computer or server) that consumes preprocessing results. When the online processing system includes multiple consumers, the load within the system can be shared and consumption accelerated by controlling the number of consumers. For example, during each consumption process, the preprocessing results retrieved from the distributed publish-subscribe message system may be distributed to different consumers as needed, reducing each consumer's workload and raising the consumption rate. How the retrieved results are distributed among consumers is not limited here; a sketch of one consumer appears below.
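Running several consumers under one consumer group lets Kafka spread a topic's partitions across them, which matches the load-sharing just described. The sketch below assumes the kafka-python client, a hypothetical topic name, and a hypothetical business handler:

```python
# One consumer in the online processing system pulling preprocessing results from its topic.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "topic_card_application",                           # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="online-processing",                       # consumers in one group share partitions
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

def handle_card_application(pre_result):
    """Hypothetical business handler, e.g. completing the credit-card transaction."""
    ...

def consume():
    for message in consumer:                            # blocks, yielding stored preprocessing results
        handle_card_application(message.value)          # consumption processing -> consumption result
```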
In the batch job processing method provided by this embodiment of the invention, first, a plurality of electronic devices in the distributed system preprocess the acquired batch jobs to obtain corresponding preprocessing results; then a distributed publish-subscribe message system stores the preprocessing results according to the topics they belong to; finally, when a consumption process is triggered by the online processing system, the preprocessing results corresponding to that process are retrieved from the distributed publish-subscribe message system and consumed to obtain consumption results. By preprocessing high-volume batch jobs across multiple electronic devices in the distributed system and then passing the preprocessing results to the online processing system for consumption through the distributed publish-subscribe message system, the method effectively improves the processing efficiency of batch jobs.
Embodiment Two
Fig. 2 is a flowchart of a batch job processing method according to Embodiment Two of the present invention, which further details the above embodiment. This embodiment specifically describes the process of preprocessing the acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results. Technical details not described here can be found in any of the above embodiments.
As shown in Fig. 2, the batch job processing method provided by Embodiment Two of the present invention includes the following steps:
and S210, transmitting the acquired batch jobs to corresponding electronic equipment in the distributed system through the distributed system.
In this embodiment, the batch jobs are received by the distributed system and distributed to the corresponding electronic devices in it; for example, the batch may be assigned to one electronic device, or to several, with the number of jobs assigned to each device set according to actual requirements.
Optionally, transmitting the acquired batch jobs through the distributed system to the corresponding electronic devices in the distributed system for processing includes: transmitting the acquired batch jobs, through a load balancing device in the distributed system, to the corresponding electronic devices for processing according to a set rule.
The load balancing device can be understood as an electronic device in the distributed system that manages the distribution of batch jobs so as to keep the load balanced across the other electronic devices. Load balancing means spreading the pressure of data processing evenly over the electronic devices in the distributed system (excluding the load balancing device itself), so that no device is overloaded while others sit idle. With load balancing, each device receives a load suited to its processing capacity. For example, if the resource space of some devices is already largely occupied while other devices still have plenty to spare, the load balancing device may allocate the acquired batch jobs to the devices with more spare resource space.
The set rule can be understood as a pre-defined rule that determines, from the free resource space of each electronic device, which devices jobs can be allocated to; it may further determine how many jobs each device can be allocated, again based on its free resource space. This is not specifically limited here.
Specifically, the load balancing device in the distributed system receives the batch jobs and, according to the set rule configured in it, transmits them to the corresponding electronic devices in the distributed system for processing. One possible rule is sketched below.
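The patent does not fix the rule, so the greedy choice below — assigning each share of jobs to the electronic device with the most free resource space, with a crude one-unit-per-job cost model — is only an assumption:

```python
# Greedy assignment of job shares to electronic devices by remaining resource space.
def assign_shares(shares, free_space):
    """free_space: {device_id: free resource units}; returns {device_id: [shares, ...]}."""
    assignment = {device: [] for device in free_space}
    for share in shares:
        target = max(free_space, key=free_space.get)    # device with the most spare space
        assignment[target].append(share)
        free_space[target] -= len(share)                # assume one unit of space per job
    return assignment
```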
S220: for each electronic device assigned jobs, generating on that device a job execution plan for the corresponding jobs according to a set plan template.
In this embodiment, the plan template can be understood as a pre-prepared template for generating job execution plans. It may encode the processing logic for a job, such as decompressing the file, extracting its data, and then storing the corresponding data, as well as the execution order for multiple jobs. Functions may be added to or modified in the set plan template according to actual requirements, which is not limited here.
On each electronic device assigned jobs, a job execution plan for one or more jobs may be generated from the set plan template; that is, for the jobs distributed to the device, one execution plan may be generated per job, or one execution plan may cover several jobs. A job execution plan is the processing plan for specific jobs; for a plan covering several jobs it may include their execution order, the start and end time of each job, the maximum allowed execution time of each job, the extraction of each job's specific content, and so on. The job execution plan may also be updated periodically.
Optionally, the job execution plan is updated periodically based on a timer.
The timer is a preset module that controls the periodic update of the job execution plan. A timer may be provided on each electronic device in the distributed system so that the device refreshes its job execution plan on schedule. For example, the timer may be set to update the plan once a day or once every two hours; this is not limited here and can be set flexibly according to actual requirements. A sketch combining the plan template and the timer-driven refresh follows.
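In this sketch the template fields, the ordering by job identifier, and the once-a-day interval are all assumptions, since the patent leaves them configurable:

```python
# Build a job execution plan from a plan template and refresh it periodically.
import threading

PLAN_TEMPLATE = {
    "steps": ["decompress", "extract_data", "store_data"],   # per-job processing logic
    "max_runtime_seconds": 3600,                              # maximum executable time per job
}

def build_execution_plan(jobs, template=PLAN_TEMPLATE):
    """Fix the execution order of the jobs and attach the template's processing steps."""
    ordered = sorted(jobs, key=lambda job: job.job_id)        # illustrative ordering only
    return {"order": [job.job_id for job in ordered], "template": template}

def schedule_plan_refresh(jobs, interval_seconds=24 * 3600):
    """Rebuild the plan on a timer, mimicking the periodic update described above."""
    def refresh():
        build_execution_plan(jobs)
        threading.Timer(interval_seconds, refresh).start()    # re-arm the timer
    threading.Timer(interval_seconds, refresh).start()
```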
S230: processing the corresponding jobs based on the job execution plan to obtain the corresponding preprocessing results.
In this embodiment, on each electronic device assigned jobs, the corresponding jobs may be processed according to the generated job execution plan to obtain the corresponding preprocessing results.
S240: classifying the preprocessing results into topics according to the usage type of the batch jobs to obtain a plurality of topics corresponding to the batch jobs.
In this embodiment, the usage type refers to the purpose of the specific task a job serves, such as a card application service, a message service, or a card cancellation service. The preprocessing results can be classified into topics by the usage types of the batch jobs, giving a plurality of topics for the batch, where each topic covers the jobs that belong to it.
S250: sending the preprocessing results corresponding to the different topics to the distributed publish-subscribe message system.
In this embodiment, once the topic classification of the preprocessing results is complete, the results corresponding to the different topics can be sent to the distributed publish-subscribe message system.
S260: storing, through the distributed publish-subscribe message system, the preprocessing results according to the topics they belong to.
In this embodiment, the distributed publish-subscribe message system contains a plurality of preset topics. After receiving the preprocessing results sent from the distributed system, it stores each result under the topic it belongs to; within that topic, the result may then be routed to a suitable partition for storage according to actual requirements.
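As a simplified model of the partition choice described above (standard Kafka clients normally pick partitions by record key or round-robin, so routing by remaining capacity is only an illustration of the patent's wording):

```python
# Choose the partition of a topic with the most remaining storage capacity.
def choose_partition(partition_free_space):
    """partition_free_space: {partition_index: remaining capacity}; returns a partition index."""
    return max(partition_free_space, key=partition_free_space.get)
```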
S270: when a consumption process is triggered by the online processing system, retrieving the preprocessing results corresponding to that process from the distributed publish-subscribe message system and consuming them to obtain the consumption result.
In this embodiment, when the online processing system triggers a consumption process, the preprocessing results under the corresponding topic may be retrieved from the distributed publish-subscribe message system, according to the topic those results belong to, and consumed to obtain the consumption result.
The batch job processing method provided by Embodiment Two elaborates the process by which the plurality of electronic devices in the distributed system preprocess the acquired batch jobs to obtain corresponding preprocessing results. By generating job execution plans for the batch from a set plan template, jobs can be preprocessed according to the processing logic in the plan; combined with the distributed system, this avoids problems such as untimely processing, high load, and loss of job information when facing batch jobs, improving both the speed and the accuracy of batch processing. In addition, updating the job execution plan periodically via the timer improves the flexibility and extensibility of batch job execution.
Fig. 3 illustrates an implementation of the batch job processing method according to Embodiment Two of the present invention. As shown in Fig. 3, it comprises a file preprocessing module (i.e., the distributed system), a Kafka message queue module (i.e., the distributed publish-subscribe message system), and an online processing module (i.e., the online processing system). Files to be processed arrive at the file preprocessing module as batch jobs; after preprocessing, the data in the files enter the Kafka message queue, and the online processing module, acting as the consumer, pulls the data from the queue for subsequent processing, thereby completing the batch jobs. A concrete realization may run as follows. First, because the data volume of batch jobs is usually so large that a single computer cannot complete the computation and storage in a short time, the file preprocessing module may be a distributed system of several computers that processes the large number of files arriving within a short period and extracts their data; it may also generate job execution plans to control the execution logic of the corresponding batch jobs. Next, the file preprocessing module, acting as the producer, classifies the processed data by topic and outputs them; the Kafka message queue module receives the data and distributes them to different topics for storage, where each topic may be split into several partitions, each able to hold a large amount of data, from which consumers later extract the data for subsequent processing. Finally, the online processing module, acting as the consumer, subscribes to the required topics in the Kafka message queue and receives the required messages from them. The online processing module may comprise several consumers (each consumer can be regarded as a server), each receiving the messages of part of the partitions of the subscribed topics; by controlling the number of consumers, the load within the module is shared, data consumption is accelerated, and online invocation is realized.
Embodiment Three
Fig. 4 is a schematic structural diagram of a distributed system according to Embodiment Three of the present invention. As shown in Fig. 4, the distributed system comprises a plurality of electronic devices and can be used to execute the part of the method provided by the embodiments of the invention that corresponds to the distributed system.
In this embodiment, first, a plurality of electronic devices in the distributed system preprocess the acquired batch jobs to obtain corresponding preprocessing results; then a distributed publish-subscribe message system stores the preprocessing results according to the topics they belong to; finally, when a consumption process is triggered by the online processing system, the preprocessing results corresponding to that process are retrieved from the distributed publish-subscribe message system and consumed to obtain consumption results. By preprocessing high-volume batch jobs across multiple electronic devices in the distributed system and then passing the preprocessing results to the online processing system for consumption through the distributed publish-subscribe message system, this embodiment effectively improves the processing efficiency of batch jobs.
Embodiment Four
Fig. 5 is a schematic diagram of a batch job processing architecture according to Embodiment Four of the present invention. As shown in Fig. 5, the batch job processing architecture comprises: a distributed system 510, a distributed publish-subscribe message system 520, and an online processing system 530;
the distributed system 510 is configured to preprocess the acquired batch jobs through a plurality of electronic devices in the distributed system 510 to obtain corresponding preprocessing results;
the distributed publish-subscribe message system 520 is configured to store the preprocessing results according to the topics they belong to;
and the online processing system 530 is configured to, when a consumption process is triggered, retrieve the preprocessing results corresponding to that process from the distributed publish-subscribe message system 520 and consume them to obtain a consumption result.
The distributed system 510 may be a system composed of a plurality of electronic devices. The distributed publish-subscribe message system 520 may be a Kafka message queue. The online processing system 530 may be a system composed of a plurality of consumers (one consumer per electronic device).
In the batch job processing architecture provided in Embodiment Four, first, a plurality of electronic devices in the distributed system 510 preprocess the acquired batch jobs to obtain corresponding preprocessing results; then the distributed publish-subscribe message system 520 stores the preprocessing results according to the topics they belong to; finally, when a consumption process is triggered by the online processing system 530, the preprocessing results corresponding to that process are retrieved from the distributed publish-subscribe message system 520 and consumed to obtain consumption results. By preprocessing high-volume batch jobs across the electronic devices in the distributed system 510 and then passing the preprocessing results through the distributed publish-subscribe message system 520 to the online processing system 530 for consumption, the architecture effectively improves the processing efficiency of batch jobs.
On the basis of the foregoing embodiment, the distributed system 510 is further configured to transmit the acquired batch jobs to corresponding electronic devices in the distributed system 510 for processing, so as to obtain corresponding preprocessing results.
Optionally, the distributed system 510 is further configured to transmit, through a load balancing device in the distributed system 510, the acquired batch jobs to corresponding electronic devices in the distributed system 510 for processing according to a set rule.
Optionally, the load balancing device in the distributed system 510 transmits the acquired batch jobs to the corresponding electronic devices in the distributed system 510 for processing according to the set rule configured in it.
Optionally, for each electronic device assigned jobs, the processing of the assigned jobs by that device includes:
generating, on the electronic device, a job execution plan for the corresponding jobs according to a set plan template;
and processing the corresponding jobs based on the job execution plan to obtain the corresponding preprocessing results.
Optionally, the job execution plan is updated periodically based on a timer.
Optionally, the distributed system 510 is further configured to, after obtaining the corresponding preprocessing results, classify them into topics according to the usage type of the batch jobs to obtain a plurality of topics corresponding to the batch jobs;
and to send the preprocessing results corresponding to the different topics to the distributed publish-subscribe message system 520.
The batch job processing architecture provided by Embodiment Four can be used to execute the batch job processing method provided by any of the above embodiments, with the corresponding functions and beneficial effects.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A batch job processing method, comprising:
preprocessing acquired batch jobs through a plurality of electronic devices in a distributed system to obtain corresponding preprocessing results;
storing, through a distributed publish-subscribe message system, the preprocessing results according to the topics they belong to;
and, when a consumption process is triggered by an online processing system, retrieving the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consuming them to obtain consumption results.
2. The method of claim 1, wherein preprocessing the acquired batch jobs through the plurality of electronic devices in the distributed system to obtain corresponding preprocessing results comprises:
transmitting the acquired batch jobs, through the distributed system, to corresponding electronic devices in the distributed system for processing to obtain the corresponding preprocessing results.
3. The method of claim 2, wherein transmitting the acquired batch jobs through the distributed system to the corresponding electronic devices in the distributed system for processing comprises:
transmitting the acquired batch jobs, through a load balancing device in the distributed system, to the corresponding electronic devices in the distributed system for processing according to a set rule.
4. The method of claim 2, wherein, for each electronic device assigned jobs, the processing of the assigned jobs by that device comprises:
generating, on the electronic device, a job execution plan for the corresponding jobs according to a set plan template;
and processing the corresponding jobs based on the job execution plan to obtain the corresponding preprocessing results.
5. The method of claim 4, wherein the job execution plan is updated periodically based on a timer.
6. The method of claim 1, further comprising, after obtaining the corresponding preprocessing results:
classifying the preprocessing results into topics according to the usage type of the batch jobs to obtain a plurality of topics corresponding to the batch jobs;
and sending the preprocessing results corresponding to the different topics to the distributed publish-subscribe message system.
7. A distributed system comprising a plurality of electronic devices, wherein the distributed system performs the steps corresponding to the distributed system in the method of any one of claims 1 to 6.
8. A batch job processing architecture, comprising:
a distributed system, configured to preprocess acquired batch jobs through a plurality of electronic devices in the distributed system to obtain corresponding preprocessing results;
a distributed publish-subscribe message system, configured to store the preprocessing results according to the topics they belong to;
and an online processing system, configured to, when a consumption process is triggered, retrieve the preprocessing results corresponding to that consumption process from the distributed publish-subscribe message system and consume them to obtain consumption results.
9. The architecture of claim 8, wherein the distributed system is further configured to transmit the acquired batch jobs to corresponding electronic devices in the distributed system for processing, so as to obtain the corresponding preprocessing results.
10. The architecture of claim 8, wherein the distributed system is further configured to transmit, through a load balancing device in the distributed system, the acquired batch jobs to corresponding electronic devices in the distributed system for processing according to a set rule.
CN202111550208.7A 2021-12-17 2021-12-17 Batch job processing method, distributed system and batch job processing architecture Pending CN114257582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111550208.7A CN114257582A (en) 2021-12-17 2021-12-17 Batch job processing method, distributed system and batch job processing architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111550208.7A CN114257582A (en) 2021-12-17 2021-12-17 Batch job processing method, distributed system and batch job processing architecture

Publications (1)

Publication Number Publication Date
CN114257582A (en) 2022-03-29

Family

ID=80795538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111550208.7A Pending CN114257582A (en) 2021-12-17 2021-12-17 Batch job processing method, distributed system and batch job processing architecture

Country Status (1)

Country Link
CN (1) CN114257582A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090891A (en) * 2013-12-12 2014-10-08 深圳市腾讯计算机系统有限公司 Method and device for data processing and server and system for data processing
CN105100271A (en) * 2015-08-31 2015-11-25 南京势行软件开发有限公司 System for publishing distributed flexible extension information and control method thereof
CN108536532A (en) * 2018-04-23 2018-09-14 中国农业银行股份有限公司 A kind of batch tasks processing method and system
CN109451072A (en) * 2018-12-29 2019-03-08 广东电网有限责任公司 A kind of message caching system and method based on Kafka
CN110913000A (en) * 2019-11-27 2020-03-24 浙江华诺康科技有限公司 Method, system and computer readable storage medium for processing service information
CN111367953A (en) * 2020-03-30 2020-07-03 中国建设银行股份有限公司 Streaming processing method and device for information data
CN113641509A (en) * 2020-05-11 2021-11-12 长城汽车股份有限公司 Internet of things data processing method and device
CN111866189A (en) * 2020-09-03 2020-10-30 中国银行股份有限公司 Batch file transmission method and device
CN112615773A (en) * 2020-12-02 2021-04-06 海南车智易通信息技术有限公司 Message processing method and system
CN112506960A (en) * 2020-12-17 2021-03-16 青岛以萨数据技术有限公司 Multi-model data storage method and system based on ArangoDB engine
CN112541816A (en) * 2020-12-21 2021-03-23 四川新网银行股份有限公司 Distributed stream computing processing engine for internet financial consumption credit batch business
CN112634021A (en) * 2020-12-24 2021-04-09 中国建设银行股份有限公司 Client data processing method and device
CN113157449A (en) * 2021-04-16 2021-07-23 上海寰果信息科技有限公司 Real-time stream data analysis processing method based on MQTT

Similar Documents

Publication Publication Date Title
CN109327509A (en) A kind of distributive type Computational frame of the lower coupling of master/slave framework
CN109451072A (en) A kind of message caching system and method based on Kafka
CN110288255A (en) A kind of logistics method and device of distributed transaction
US20130304826A1 (en) Scheduled messages in a scalable messaging system
CN113422842B (en) Distributed power utilization information data acquisition system considering network load
CN111506430B (en) Method and device for processing data under multitasking and electronic equipment
CN112035238A (en) Task scheduling processing method and device, cluster system and readable storage medium
CN113946431A (en) Resource scheduling method, system, medium and computing device
CN102904961A (en) Method and system for scheduling cloud computing resources
CN114610474A (en) Multi-strategy job scheduling method and system in heterogeneous supercomputing environment
CN114548426A (en) Asynchronous federal learning method, business service prediction method, device and system
CN108304267A (en) The multi-source data of highly reliable low-resource expense draws the method for connecing
Aytekin et al. Harnessing the power of serverless runtimes for large-scale optimization
CN115858667A (en) Method, apparatus, device and storage medium for synchronizing data
US8180823B2 (en) Method of routing messages to multiple consumers
CN114710571A (en) Data packet processing system
CN114153635A (en) Message processing method, device, storage medium and computer equipment
CN113971098A (en) RabbitMQ consumption management method and system
CN112114968B (en) Recommendation method, recommendation device, electronic equipment and storage medium
CN115361382B (en) Data processing method, device, equipment and storage medium based on data group
CN116932147A (en) Streaming job processing method and device, electronic equipment and medium
CN114257582A (en) Batch job processing method, distributed system and batch job processing architecture
CN111049846A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112965796B (en) Task scheduling system, method and device
CN114237858A (en) Task scheduling method and system based on multi-cluster network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination