CN115169949A - Batch job processing method and device under distributed system - Google Patents

Info

Publication number
CN115169949A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210885410.3A
Other languages
Chinese (zh)
Inventor
谢伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210885410.3A priority Critical patent/CN115169949A/en
Publication of CN115169949A publication Critical patent/CN115169949A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Abstract

A batch job processing method and device under a distributed system, relating to the technical field of distributed processing and usable in the financial field. The method comprises: dividing the batch jobs into one or more groups; executing the jobs in each group in sequence, according to the execution plan, on the corresponding executors in the default campus; and, if the state of the executor of the current job in the current group is abnormal, backtracking through the jobs of the current group to locate the entry job of that group, selecting an executor of that entry job that is in a normal state in a nearby campus to execute the entry job, and completing the jobs of each subsequent group in sequence in that nearby campus. By grouping the batch jobs of a distributed system and completing all jobs of a group in the same campus, the method reduces cross-domain scheduling and safeguards batch execution efficiency. It also provides group-based automatic restart and switchover scheduling of jobs, which improves emergency recovery efficiency.

Description

Batch job processing method and device under distributed system
Technical Field
The invention relates to the technical field of distributed processing, can be used in the field of finance, and particularly relates to a batch job processing method and device under a distributed system.
Background
Banking services are complex and change quickly, and they involve a large volume of batch processing. As core business moves down to distributed platforms, more and more core batch business is processed on distributed batch systems.
In the move from a centralized system to a distributed system, the data volume, the number of jobs, and the complexity of scheduling between applications all increase greatly for batch processing. How to further improve the high availability, gray-release (grayscale) capability, automatic scheduling, and emergency response capability of a distributed batch system, while reducing operation and maintenance complexity, has become a problem faced across the industry.
Disclosure of Invention
In view of the problems in the prior art, embodiments of the present invention mainly aim to provide a method and an apparatus for processing batch jobs in a distributed system, so as to improve the efficiency of processing batch jobs and reduce the complexity of job scheduling.
To achieve the above object, an embodiment of the present invention provides a batch job processing method in a distributed system, the method including: dividing the batch jobs into one or more groups; executing the jobs in each group in sequence, according to the execution plan, on the corresponding executors in the default campus; and, if the state of the executor of the current job in the current group is abnormal, backtracking through the jobs of the current group to locate the entry job of the current group, selecting an executor of that entry job that is in a normal state in a nearby campus to execute the entry job, and completing the jobs of each subsequent group in sequence in the nearby campus.
An embodiment of the present invention also provides a batch job processing device in a distributed system, the device including: a grouping unit for dividing the batch jobs into one or more groups; an execution unit for executing the jobs in each group in sequence, according to the execution plan, on the corresponding executors in the default campus; and a path selection unit for, when the state of the executor of the current job in the current group is abnormal, backtracking through the jobs of the current group to locate the entry job of the current group, selecting an executor of that entry job that is in a normal state in a nearby campus to execute the entry job, and completing the jobs of each subsequent group in sequence in the nearby campus.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method is implemented.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program for executing the above method is stored.
Embodiments of the present invention further provide a computer program product, which includes a computer program/instruction, and the computer program/instruction implements the steps of the above method when executed by a processor.
The invention groups the batch jobs of a distributed system and completes all jobs of a group in the same campus, which reduces cross-domain scheduling and safeguards the execution efficiency of batch jobs. It also provides group-based automatic restart and switchover scheduling of jobs, which improves emergency recovery efficiency and avoids errors caused by human intervention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of the prior art;
FIG. 2 is a flowchart of a batch job processing method in a distributed system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a grouping of batch jobs according to an embodiment of the present application;
FIG. 4 is a flowchart of a batch job processing method in a distributed system according to another embodiment of the present application;
FIG. 5 is a schematic diagram of designating an executor state and scheduling jobs according to an embodiment of the present application;
FIG. 6 is a structural diagram of a batch job processing apparatus in a distributed system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method and a device for processing batch jobs under a distributed system, which can be used in the financial field and other fields.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the prior art is as follows. Batch executors are deployed active-active across two campuses, and jobs are scheduled either randomly or according to load. To keep batch files highly available, the executor program is generally configured with two addresses, a primary and a standby. When a regional failure occurs, either the upstream application must re-issue the files to the standby address and the downstream application switches over manually, or the upstream application issues the files to both the primary and standby addresses and the downstream application automatically downloads from the standby address after a download from the primary address fails. When an intermediate link in a batch job chain fails during application processing, the fault point must be located manually, and the link from which the redo should start must be determined from the business logic. The approach has no gray-release capability: if a new version fails to execute, the executor program version must be rolled back and batch scheduling re-run from the console.
The prior art has the following problems. The system as a whole is difficult to deploy active-active across multiple sites, because doing so would cause a large amount of cross-city data synchronization. Batch systems commonly implement heartbeat checks on executor nodes and job scheduling strategies based on executor load, but they lack health checks on the dependencies of downstream jobs and proactive scheduling control based on such checks. This easily leads to false liveness of an executor: the job is dispatched successfully, but the conditions for execution are actually missing, so the job is interrupted and manual intervention is triggered. Regional failures (such as the failure of a server or file system that stores batch files) generally require manual emergency handling. Alternatively, sending files to both the primary and standby addresses improves availability to some extent, but it permanently increases network bandwidth pressure, and it requires the downstream receiver to implement its own logic in the batch program for choosing which of the two upstream nodes to download from. High availability thus intrudes on the batch business logic and raises development and testing costs. When an intermediate link of a job chain fails, the restart position must be determined manually, so emergency response is slow and there is a risk of human misjudgment. Gray release of batch jobs is difficult to implement because it involves coordinated scheduling across the full upstream and downstream chain.
FIG. 2 is a flowchart of a batch job processing method in a distributed architecture according to an embodiment of the present invention. The method may be executed by, among others, a computer of a batch job control platform; the computer may be deployed in a distributed manner or may be a centrally deployed server, and this application places no limitation on this. The method includes the following steps:
Step S201: Divide the batch jobs into one or more groups.
In this embodiment, an application may consist of a plurality of batch jobs, and the batch jobs of the application are first divided into one or more groups according to a preset rule. For example, as shown in FIG. 3, the batch jobs of an application are divided into three groups: group 1, group 2, and group 3.
Step S202: Execute the jobs in each group in sequence, according to the execution plan, on the corresponding executors in the default campus.
In this embodiment, each batch job corresponds to one executor, which executes that job. A campus is a campus in the geographic sense. The batch jobs of an application can be executed in any campus by the executors of that campus, and the application's batch jobs are grouped in the same way in every campus. FIG. 3 shows two campuses, A and B, where campus A is the default campus: when the application's batch jobs run, job 1 is executed first in campus A, followed by jobs 2 through 9 in sequence.
The execution order may be configured on the batch job control platform in advance by configuration staff, and the platform controls the executors so that the jobs run in that order. For example, the downstream dependencies of each job may be configured in advance in the batch job schedule; after job 1 completes, the platform locates job 2 from job 1's downstream dependency in the schedule, directs the executor to run job 2, and continues in the same way for the subsequent jobs.
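The schedule lookup described above can be sketched as follows. This is a minimal illustration, not the platform's actual data model: the dictionary layout, the job identifiers, and the function name `execution_order` are all assumptions, and the dependency shape simply mirrors group 1 of FIG. 3.

```python
from collections import deque

# Hypothetical batch job schedule: job id -> ids of its downstream jobs.
# The shape follows group 1 of FIG. 3; the layout itself is an assumption.
SCHEDULE = {
    "job1": ["job2"],
    "job2": ["job3.1", "job3.2"],
    "job3.1": ["job4"],
    "job3.2": ["job4"],
    "job4": [],
}

def execution_order(schedule, entry):
    """Return the jobs so that each job appears after all of its upstream jobs."""
    # Count how many upstream jobs each job is still waiting for.
    pending = {job: 0 for job in schedule}
    for downstream in schedule.values():
        for job in downstream:
            pending[job] += 1

    ready = deque([entry])
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Completing a job releases the downstream jobs that depend on it.
        for nxt in schedule[job]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    return order
```

Calling `execution_order(SCHEDULE, "job1")` walks the downstream dependencies from the entry job, so job 4 is only released after both sub-jobs 3.1 and 3.2 have completed.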
Step S203: If the state of the executor of the current job in the current group is abnormal, backtrack through the jobs of the current group to locate the entry job of the current group, select an executor of that entry job that is in a normal state in a nearby campus to execute the entry job, and complete the jobs of each subsequent group in sequence in the nearby campus.
In this embodiment, the executor may check its state at startup and send the result to the batch job control platform; it may check its state periodically throughout batch processing and send the results to the platform; or the batch job console may periodically send a state query to the executor and obtain the state information it returns. This application places no limitation on this.
Taking FIG. 3 as an example, suppose that job 5 has completed and job 6 is due to run, and the batch job control platform finds that the state of the executor of job 6 is abnormal, or the executor cannot continue running job 6 because of an abnormal state. The platform backtracks to locate the entry job of group 2 (job 5), selects the nearest campus, campus B, and checks the state of the executor for job 5 in campus B. If that state is normal, the platform schedules job 5 and each subsequent job (jobs 6 to 9) to run in campus B. FIG. 3 shows only two campuses, but a person skilled in the art can readily derive the following from this disclosure: if the executor for job 5 in campus B is abnormal, and campus C is the nearest of the remaining campuses with a normal executor for job 5, then job 5 and the subsequent jobs are executed in campus C. In addition, once campus B has been selected to continue the run, if an executor becomes abnormal during a later job, job backtracking and nearby-campus selection are performed again, starting from campus B.
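The backtracking and campus selection in this example can be sketched as below. The split of jobs 6 to 9 between groups 2 and 3 is not stated in the text and is assumed here purely for illustration, as are the function names and the `executor_ok` callback, which stands in for whatever state lookup the control platform uses.

```python
# Hypothetical group layout: each list is in execution order and its first
# element is the group's entry job. Only "job 5 is the entry job of group 2"
# comes from the text; the rest of the split is an assumption.
GROUPS = {
    "group1": ["job1", "job2", "job3", "job4"],
    "group2": ["job5", "job6"],
    "group3": ["job7", "job8", "job9"],
}

def entry_job_of(job, groups):
    """Backtrack from a job to the entry job of the group that contains it."""
    for jobs in groups.values():
        if job in jobs:
            return jobs[0]
    raise KeyError(job)

def select_campus(entry_job, campuses_by_distance, executor_ok):
    """Return the nearest campus whose executor for entry_job is normal, else None."""
    for campus in campuses_by_distance:
        if executor_ok(campus, entry_job):
            return campus
    return None

def fail_over(failed_job, groups, campuses_by_distance, executor_ok):
    """Return (entry job to restart from, campus to restart in)."""
    entry = entry_job_of(failed_job, groups)
    return entry, select_campus(entry, campuses_by_distance, executor_ok)
```

With campuses listed nearest first, a failure at job 6 restarts group 2 from job 5 in the first campus that reports a normal executor for job 5; a result of `None` for the campus means no campus qualified.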
As can be seen from the above description, the present embodiment groups batch jobs in a distributed system, and completes jobs in the same group in the same campus, thereby reducing cross-domain scheduling and ensuring execution efficiency of batch jobs.
Fig. 4 is a flowchart of a batch job processing method under a distributed architecture according to another embodiment of the present application, where the method includes the following steps:
Step S401: According to the processing flow of the batch job files, divide the jobs of different layers that handle the same service into one or more groups.
The processing flow of a batch job generally consists of importing, parsing, and processing an upstream file and then exporting a downstream file. Jobs at different layers that handle the same service can therefore be packaged into one group according to this file-processing flow. Taking FIG. 3 as an example, group 1 contains four layers of jobs: job 1, job 2, (job 3.1, job 3.2), and job 4. Jobs 3.1 and 3.2 are two sub-jobs split from one job; they can run together, and each sends its result to job 4. Jobs 1 to 4 together cover the whole flow of importing, parsing, processing, and exporting, so they are placed in one group.
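One way to picture this grouping rule is the sketch below, which groups jobs by the service they handle and orders them by layer. The tuple layout, the service labels, and the assignment of individual jobs to the import, parse, process, and export layers are assumptions made for illustration; the text only states that jobs 1 to 4 together cover those four stages.

```python
from collections import defaultdict

# The four stages of the file-processing flow named in the text, in order.
LAYERS = ["import", "parse", "process", "export"]

def group_jobs(jobs):
    """Group (job_id, service, layer) tuples by service, ordered by layer."""
    groups = defaultdict(list)
    for job_id, service, layer in jobs:
        # Store the layer index so that sorting yields the processing order.
        groups[service].append((LAYERS.index(layer), job_id))
    return {svc: [job for _, job in sorted(items)] for svc, items in groups.items()}

# Hypothetical input: which layer each job belongs to is assumed.
JOBS = [
    ("job2", "serviceA", "parse"),
    ("job1", "serviceA", "import"),
    ("job4", "serviceA", "export"),
    ("job3.1", "serviceA", "process"),
    ("job3.2", "serviceA", "process"),
    ("job5", "serviceB", "import"),
]
```

Here `group_jobs(JOBS)` places jobs 1 to 4 in one group for `serviceA`, with the import-layer job first, so the first element of each list can serve as the group's entry job.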
Step S402: Check the executor states of all batch jobs of the first group in the default campus. If any executor state is abnormal, go to step S403; otherwise, go to step S404.
Step S403: Select, as the default campus, a nearby campus in which all executors of the first group are in a normal state, and go to step S404.
Step S404: Execute the jobs in each group in sequence, according to the execution plan, on the corresponding executors in the default campus.
Step S405: If the state of the executor of the current job in the current group is abnormal, backtrack through the jobs of the current group to locate the entry job of the current group, and check the states of all executors of the current group in the nearby campus. If all of them are normal, go to step S406; otherwise, go to step S407.
Step S406: Select the nearby campus to execute the entry job of the current group, and complete the jobs of each subsequent group in sequence in that campus.
Step S407: Continue checking the other campuses, select the nearest one in which all executors of the current group are in a normal state to execute the entry job of the current group, and complete the jobs of each subsequent group in sequence in that campus.
As can be seen from the above, this embodiment checks the executor states of all jobs in a group before executing the group's entry job. If any executor state is abnormal, the group is instead run in a nearby campus whose executors for that group are in a normal state: the entry job is executed there, and the jobs of the subsequent groups are completed there in sequence. This further improves batch execution efficiency and avoids redundant steps. Taking FIG. 3 as an example, if the executor of job 6 is abnormal and the states of all executors of group 2 are checked before job 5 runs, the platform avoids executing job 5 first and then backtracking when job 6 is reached.
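The group-level pre-check can be sketched as a single function that is called before a group's entry job is dispatched. The function name and the `executor_ok` callback are hypothetical; the candidate campuses are assumed to be supplied nearest first.

```python
def campus_for_group(jobs_in_group, current_campus, other_campuses_by_distance, executor_ok):
    """Return the campus in which the whole group should run, or None.

    The current campus is tried first; a campus qualifies only if the
    executor of every job in the group reports a normal state there.
    """
    for campus in [current_campus, *other_campuses_by_distance]:
        if all(executor_ok(campus, job) for job in jobs_in_group):
            return campus
    return None
```

Because the check covers every job in the group, an abnormal executor for job 6 in campus A moves the whole of group 2 to another campus before job 5 has run there, which is the redundant execution the text says is avoided.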
Preferably, in the embodiments corresponding to FIG. 2 and FIG. 4, an abnormal executor state may include a failure of the executor itself and an abnormal external-dependency state of the current job. External dependencies are the database, file system, cache system, and so on that the job relies on during execution. When an executor starts, it first sends its state attributes to the batch job console and at the same time starts a background check thread, which sends the results of the peripheral dependency checks to the batch job console; both the state attributes and the check results may be sent periodically. The batch job console writes the check results returned by the executor into a dependency check table, which contains at least the following fields: application name, check tag, check status, and update time.
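A dependency check table with the four fields listed above might look like the following sketch. The class and method names, the status values "normal" and "abnormal", and the staleness rule in `is_normal` are assumptions added for illustration; the text specifies only the field list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DependencyCheck:
    application: str      # application name
    check_tag: str        # check tag
    status: str           # check status; "normal"/"abnormal" are assumed values
    updated_at: datetime  # update time

class DependencyCheckTable:
    """In-memory stand-in for the console's dependency check table."""

    def __init__(self):
        self._rows = {}

    def report(self, application, check_tag, status, now=None):
        """Record (or overwrite) the latest check result sent by an executor."""
        now = now or datetime.now(timezone.utc)
        self._rows[(application, check_tag)] = DependencyCheck(
            application, check_tag, status, now
        )

    def is_normal(self, application, check_tag, max_age_seconds, now=None):
        """True only if a recent row exists and its status is normal.

        Treating a stale row as not normal is an assumption, not from the text.
        """
        now = now or datetime.now(timezone.utc)
        row = self._rows.get((application, check_tag))
        if row is None:
            return False
        fresh = (now - row.updated_at).total_seconds() <= max_age_seconds
        return fresh and row.status == "normal"
```

Keying the rows by application name and check tag means each periodic report overwrites the previous one, so the update time always reflects the most recent check.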
The executor's background check thread can be implemented through simple policy configuration. The policy configuration specifies which peripheral storage devices, databases, file entities, or file systems that the executor depends on are to be checked, and sets the check tags and campus identifiers. The executor's background program runs the checks periodically and returns the results to the batch job console, and the batch job console can add the check tags of downstream dependent jobs to the batch job schedule. The policy configuration format may be as shown in Table 1, and the supplementary fields of the batch job schedule may be as shown in Table 2:
TABLE 1
[Table 1 is provided as an image in the original publication: Figure BDA0003765655710000061]
TABLE 2
[Table 2 is provided as an image in the original publication: Figure BDA0003765655710000071]
Because the jobs in a group may share the same external dependencies, external dependencies may be defined per group (as in the configuration description of Table 1), so that each group corresponds to one set of peripheral dependencies. They may also be defined per job, so that each job corresponds to one set of peripheral dependencies; this application places no limitation on this.
In this embodiment, when the abnormal executor state is a failure of the executor itself, the platform backtracks through the jobs of the current group to locate the entry job of the current group, selects a standby executor for the failed executor in the local campus to execute the entry job, and completes the jobs of each subsequent group in sequence in the local campus. Within one campus, at least two executors can be configured for every job, so that a standby executor can take over when an executor fails, which further reduces cross-domain scheduling.
When the abnormal executor state is an abnormal external-dependency state of the current job, replacing the executor does not help. The only option is to backtrack through the jobs of the current group to locate the entry job of the current group, select an executor of that entry job that is in a normal state in a nearby campus to execute it, and complete the jobs of each subsequent group in sequence in the nearby campus.
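The two recovery paths can be sketched as one dispatch function. The enum, the function name, the role labels "standby" and "primary", and the two health callbacks are hypothetical names introduced for this illustration.

```python
from enum import Enum

class Fault(Enum):
    EXECUTOR = "executor_failure"        # the executor itself has failed
    DEPENDENCY = "external_dependency"   # database, file system, cache, etc.

def reschedule(fault, entry_job, local_campus, nearby_campuses, standby_ok, executor_ok):
    """Return (campus, executor_role) for re-running entry_job, or None.

    None means no replacement was found and batch processing is interrupted.
    """
    if fault is Fault.EXECUTOR:
        # An executor failure is handled inside the local campus by the standby.
        if standby_ok(local_campus, entry_job):
            return local_campus, "standby"
        return None
    # A dependency fault cannot be fixed by swapping executors, so the
    # entry job moves to the nearest campus with a normal executor.
    for campus in nearby_campuses:
        if executor_ok(campus, entry_job):
            return campus, "primary"
    return None
```

Keeping the executor-failure path inside the local campus is what lets the scheme avoid cross-campus scheduling whenever a standby executor is available.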
Further preferably, in this embodiment, when the abnormal executor state is a failure of the executor itself, after the standby executor has been switched in to execute the entry job, the batch job console may also send a monitoring message to the monitoring center to raise a low-level alarm for the executor failure.
Further preferably, in this embodiment, when the abnormal executor state is an abnormal external-dependency state of the current job, after an executor in a nearby campus has been selected to execute the entry job, the batch job console may also send a monitoring message to the monitoring center to raise a mid-level alarm for cross-campus scheduling.
Preferably, when the abnormal executor state is an abnormal external-dependency state of the current job and the executors of the current group's entry job are found to be abnormal in all campuses, batch job processing is interrupted and a monitoring message is sent to the monitoring center to raise a high-level alarm for job interruption. Likewise, when the abnormal executor state is a failure of the executor itself and the standby executors are also unavailable because of failure, the batch job console interrupts batch job processing and sends a monitoring message to the monitoring center to raise a high-level alarm for job interruption.
Further preferably, after batch job processing is interrupted, the batch job console may wait for a preset time and then retry scheduling the entry job of the current group. This allows batch job processing to resume promptly once the fault has been cleared manually.
With graded alarms, back-office staff can follow the working state of batch job processing in real time and then decide which measure to take, which avoids the loss of execution efficiency caused by stopping jobs too often for manual intervention.
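The three alarm grades described above reduce to a small decision rule, sketched here. The enum, the function name, and the string labels for the fault type are assumptions for illustration.

```python
from enum import Enum

class Alarm(Enum):
    LOW = "executor failure, standby took over"
    MID = "cross-campus scheduling"
    HIGH = "job interrupted"

def alarm_for(fault, recovered):
    """Map a fault and its recovery outcome to an alarm grade.

    fault: "executor" (executor itself failed) or "dependency"
           (external dependency abnormal); both labels are assumed names.
    recovered: True if a replacement executor was found and the entry
               job was rescheduled, False if processing was interrupted.
    """
    if not recovered:
        return Alarm.HIGH
    return Alarm.LOW if fault == "executor" else Alarm.MID
```

Only the HIGH grade corresponds to an interrupted batch run; the LOW and MID grades report a recovery that has already happened automatically.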
Further preferably, this embodiment may control the scheduling path of batch jobs by designating the working state of the job executors in a group. The working states of an executor include a normal working state, a fault state, a maintenance state, and so on. The working state may be designated by setting the state of the executor's check tag from the batch job console. Thus, when infrastructure is upgraded or the upstream and downstream batch jobs change significantly, the state of the check tags of the relevant job groups can be designated from the batch job console to control the job scheduling path. For example, as shown in FIG. 5:
If the executor physical resource domain of groups 1 and 2 in campus A needs an operating system upgrade and network equipment replacement, the batch job console can set the check status of the check tags of groups 1 and 2 in campus A to maintenance. According to these tag states, the batch job console schedules the jobs of groups 1 and 2 to run in campus B, and under the campus-priority policy the subsequent jobs of group 3 also run in campus B.
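The FIG. 5 scenario can be sketched with a tag-state table keyed by campus and group. The table layout, the state strings, and both function names are assumptions; the campus-priority policy is modeled here as "stay in the campus where the previous group ran if it is normal", which is one plausible reading of the text rather than a confirmed definition.

```python
def designate(tag_state, campus, groups, state):
    """Set the check-tag state of the given groups in one campus (console action)."""
    for group in groups:
        tag_state[(campus, group)] = state

def pick_campus(tag_state, group, campuses, previous=None):
    """Pick a campus for a group, preferring the campus of the previous group."""
    order = ([previous] if previous else []) + [c for c in campuses if c != previous]
    for campus in order:
        if tag_state.get((campus, group)) == "normal":
            return campus
    return None

def plan(tag_state, groups, campuses):
    """Return the campus chosen for each group, in execution order."""
    chosen, previous = [], None
    for group in groups:
        previous = pick_campus(tag_state, group, campuses, previous)
        chosen.append(previous)
    return chosen
```

For example, starting from a table in which every group is normal in campuses A and B, calling `designate(state, "A", ["group1", "group2"], "maintenance")` makes `plan` route groups 1 and 2 to campus B, and group 3 then stays in campus B even though its tag in campus A is still normal.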
Further preferably, in this embodiment, the downstream dependencies of the jobs in a group may include gray (grayscale) dependencies and general dependencies: jobs on a gray dependency complete gray batch jobs, and jobs on a general dependency complete general batch jobs. Specifically, the batch job console may add two sets of scheduling definitions to a given job in the batch schedule, one set of dependency check tags for the downstream gray nodes and one set for the downstream general nodes. In this way the non-gray batch job chain and the gray batch job chain can be completely isolated. An upstream gray job selects its downstream scheduling target according to the gray check tags, and the downstream job runs only on gray nodes; an upstream non-gray job selects its downstream scheduling target according to the non-gray check tags, and the downstream job runs only on non-gray nodes. A full-chain gray scheduling mechanism for batch jobs can thus be established.
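The two sets of scheduling definitions can be sketched as two tag lists per job. The schedule layout, the tag names, and the node names below are all hypothetical and serve only to show how the gray and general chains stay separate.

```python
# Hypothetical schedule entry: each job carries two sets of downstream
# dependency check tags, one for gray nodes and one for general nodes.
SCHEDULE = {
    "job1": {
        "gray": ["job2_gray"],
        "general": ["job2_general"],
    },
}

# Hypothetical mapping from check tag to the nodes that registered it.
NODES_BY_TAG = {
    "job2_gray": ["node-g1"],
    "job2_general": ["node-n1", "node-n2"],
}

def downstream_targets(schedule, nodes_by_tag, job, is_gray):
    """Return the candidate nodes for the job's downstream work.

    A gray upstream job only ever sees gray tags, and a general upstream
    job only sees general tags, so the two chains never mix.
    """
    key = "gray" if is_gray else "general"
    return [node for tag in schedule[job][key] for node in nodes_by_tag.get(tag, [])]
```

Because the selection key is fixed by whether the upstream job is gray, a failed gray release affects only the gray nodes and leaves the general chain untouched.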
In this method, batch jobs are grouped and scheduled according to business logic, which gives batch services technical support and an opportunity to transform toward business-oriented operation and maintenance. All jobs of a group are completed in the same campus, which reduces cross-domain scheduling and safeguards the execution efficiency of batch jobs. Group-based automatic restart and switchover scheduling of jobs improves emergency recovery efficiency and avoids errors caused by human intervention. In addition, this embodiment provides graded alarms, so that back-office staff can follow the working state of batch job processing in real time and then decide which measure to take, which avoids the loss of execution efficiency caused by stopping jobs too often for manual intervention.
Fig. 6 is a schematic structural diagram of a batch job processing apparatus under a distributed architecture according to an embodiment of the present application, where the apparatus includes: grouping unit 610, execution unit 620 and path selection unit 630, wherein execution unit 620 is connected to grouping unit 610 and path selection unit 630 respectively.
Grouping unit 610 is used to divide the batch job into one or more groups.
The execution unit 620 is configured to execute the jobs in each group in turn by the corresponding executor within the default campus according to the execution plan.
The path selection unit 630 is configured to, when the state of the executor of the current job in the current group is abnormal, backtrack through the jobs of the current group to locate the entry job of the current group, select an executor of that entry job that is in a normal state in a nearby campus to execute the entry job, and complete the jobs of each subsequent group in sequence in the nearby campus.
Preferably, the apparatus further includes a group checking unit, configured to check the executor states of all jobs in the group to be executed before the entry job of each group is executed. If any executor state is abnormal, the path selection unit 630 selects an executor of the entry job of the group to be executed that is in a normal state in a nearby campus to execute the entry job, and completes the jobs of each subsequent group in sequence in the nearby campus.
Preferably, the grouping unit 610 is specifically configured to divide, according to the processing flow of the batch job file, the jobs at different layers that process the same service in the batch job into one or more groups.
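One way to picture this grouping is to key each job by the service it handles and order the members by layer. The sketch below is illustrative only; the `Job` fields `service` and `layer` are hypothetical names, since the source does not specify a job record format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    service: str  # hypothetical: the business service the job processes
    layer: int    # hypothetical: position in the file processing flow


def group_by_service(jobs):
    """Place jobs of different layers that process the same service in one group."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.service].append(job)
    # Within a group, lower layers run first; the first job is the entry job.
    return {svc: sorted(members, key=lambda j: j.layer)
            for svc, members in groups.items()}
```

For example, a transfer job, a load job and a report job that all belong to the same service end up in one group, with the lowest-layer job acting as the entry job.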
Preferably, the executor state abnormality includes a fault of the executor itself and an abnormal external dependency state of the current job, and the path selection unit 630 is further specifically configured to:
when the executor state abnormality is a fault of the executor itself, backtrack through the jobs in the current group to locate the entry job of the current group, select a standby executor of the faulty executor in the current park to execute the entry job, and complete the jobs of the subsequent groups in sequence in the current park;
and when the executor state abnormality is an abnormal external dependency state of the current job, backtrack through the jobs in the current group to locate the entry job of the current group, select an executor of the current group's entry job in a nearby park to execute the entry job, and complete the jobs of the subsequent groups in sequence in that nearby park.
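The two abnormality types lead to different recovery targets, which the following sketch makes explicit. The enum, the function name `recovery_target` and the `entry_executor_ok` callback are assumptions introduced for illustration.

```python
from enum import Enum


class Anomaly(Enum):
    EXECUTOR_FAULT = 1       # the executor itself has failed
    EXTERNAL_DEPENDENCY = 2  # something the current job depends on is abnormal


def recovery_target(anomaly, current_park, nearby_parks, entry_executor_ok):
    """Return (park, role) for rerunning the group's entry job, or None.

    nearby_parks      -- other parks, assumed ordered nearest first
    entry_executor_ok -- hypothetical callback park -> bool
    """
    if anomaly is Anomaly.EXECUTOR_FAULT:
        # Stay in the current park and use the standby of the faulty executor.
        return (current_park, "standby")
    # External dependency problem: move the group to a nearby park.
    for park in nearby_parks:
        if entry_executor_ok(park):
            return (park, "primary")
    return None  # no park is usable; processing would be interrupted
```

Keeping a self-fault inside the current park is what limits cross-park scheduling to cases where the park's external dependencies are the problem.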
Preferably, the apparatus of this embodiment further includes a low-level alarm unit, configured to, when the executor state abnormality is a fault of the executor itself, send a monitoring message to the monitoring center after switching to the standby executor to execute the entry job, so as to raise a low-level alarm for the executor fault.
Preferably, the apparatus of this embodiment further includes a medium-level alarm unit, configured to, when the executor state abnormality is an abnormal external dependency state of the current job, send a monitoring message to the monitoring center after an executor in a nearby park has been selected to execute the entry job, so as to raise a medium-level alarm for cross-park scheduling.
Preferably, the apparatus of this embodiment further includes a high-level alarm unit, configured to, when the executor state abnormality is an abnormal external dependency state of the current job and the executors of the current group's entry job are found to be abnormal in all parks, send a monitoring message to the monitoring center after batch job processing is interrupted, so as to raise a high-level alarm for job interruption.
Preferably, the apparatus of this embodiment further includes a rescheduling unit, configured to, after batch job processing is interrupted, wait for a preset time and then reschedule execution of the current group's entry job.
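The three alarm levels and the wait-then-reschedule step can be combined in one loop, sketched below. The outcome strings, the `try_schedule` and `send_alarm` callbacks, and the `max_attempts` limit are assumptions; the source only states that scheduling is retried after a preset wait and does not mention a retry limit.

```python
import time
from enum import IntEnum


class AlarmLevel(IntEnum):
    LOW = 1     # executor fault, handled by a standby executor
    MEDIUM = 2  # cross-park scheduling took place
    HIGH = 3    # batch job processing was interrupted


def schedule_entry_with_alarms(try_schedule, send_alarm, wait_seconds,
                               max_attempts=3, sleep=time.sleep):
    """Schedule a group's entry job, alarming and retrying as needed.

    try_schedule -- hypothetical callback returning "default", "standby",
                    "nearby", or None when no park can run the entry job
    send_alarm   -- hypothetical callback (level, message) to the monitoring center
    """
    for attempt in range(max_attempts):
        outcome = try_schedule()
        if outcome == "standby":
            send_alarm(AlarmLevel.LOW, "executor fault, switched to standby executor")
        elif outcome == "nearby":
            send_alarm(AlarmLevel.MEDIUM, "cross-park scheduling")
        if outcome is not None:
            return outcome
        # Entry-job executors are abnormal in all parks: interrupt and alarm.
        send_alarm(AlarmLevel.HIGH, "batch job processing interrupted")
        if attempt < max_attempts - 1:
            sleep(wait_seconds)  # preset wait before rescheduling
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a production scheduler would more likely use a timer or a delayed queue.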
Preferably, the apparatus of this embodiment further includes a state specifying unit, configured to control the scheduling path of the batch job by specifying the operating state of the job executors within a group, where the operating state of an executor is one of: a normal operating state, a fault state and a maintenance state.
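Because the scheduler only routes jobs to executors in the normal state, an operator can steer the scheduling path simply by marking an executor as faulty or under maintenance. A minimal sketch, with a hypothetical `states` dictionary keyed by `(park, job)`:

```python
from enum import Enum


class ExecutorState(Enum):
    NORMAL = "normal"
    FAULT = "fault"
    MAINTENANCE = "maintenance"


def eligible_parks(job, parks, states):
    """Return the parks whose executor for `job` may receive work.

    states -- hypothetical mapping (park, job) -> ExecutorState;
              executors not listed are assumed to be NORMAL
    """
    return [
        park for park in parks
        if states.get((park, job), ExecutorState.NORMAL) is ExecutorState.NORMAL
    ]
```

Marking the executor of a job in one park as `MAINTENANCE` removes that park from the candidates, so the group is scheduled elsewhere without any change to the execution plan.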
Preferably, the execution unit 620 in this embodiment is specifically configured to execute the jobs in each group in sequence on the corresponding executors according to the downstream dependencies of the jobs in the group.
Preferably, in this embodiment, the downstream dependencies of the jobs in a group include a grayscale dependency and a general dependency; the jobs along the grayscale dependency complete the grayscale batch job, and the jobs along the general dependency complete the general batch job.
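Ordering jobs by downstream dependency amounts to a topological sort starting at the entry job. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) and assumes each dependency kind is supplied as its own mapping from a job to its downstream jobs; that representation and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(entry, downstream):
    """Order the jobs reachable from `entry` so upstream jobs run first.

    downstream -- mapping job -> list of jobs that must run after it,
                  for one dependency kind (grayscale or general)
    """
    sorter = TopologicalSorter()
    sorter.add(entry)
    seen, stack = {entry}, [entry]
    while stack:
        job = stack.pop()
        for nxt in downstream.get(job, ()):
            sorter.add(nxt, job)  # nxt depends on job, so job runs first
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return list(sorter.static_order())  # raises CycleError on circular dependencies


# Hypothetical group with separate grayscale and general dependency chains.
GENERAL = {"recv": ["load"], "load": ["report"]}
GRAYSCALE = {"recv": ["load_gray"], "load_gray": ["report_gray"]}
```

Calling `execution_order("recv", GRAYSCALE)` walks only the grayscale chain, so a grayscale batch run never touches the jobs that belong to the general chain.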
For detailed description of each unit of the above apparatus, reference may be made to the description of the foregoing method embodiment, and details are not repeated herein.
According to the method and the device, batch jobs are grouped and scheduled according to business logic, which gives the batch services technical support oriented toward business operation and maintenance and an opportunity for transformation. All jobs in the same group are completed in the same park, which reduces cross-domain scheduling and safeguards the execution efficiency of the batch jobs. In addition, group-based job self-restart and switchover scheduling are implemented, which improves emergency recovery efficiency and avoids errors caused by manual intervention. This embodiment also provides hierarchical alarms, so that back-office staff can monitor the working state of batch job processing in real time and then choose which measure to take, avoiding the loss of execution efficiency caused by stopping jobs too often for manual intervention.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.
The present invention also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above method.
The present invention also provides a computer-readable storage medium having stored thereon a computer program for executing the above method.
As shown in fig. 7, the electronic device 600 may further include: communication module 110, input unit 120, audio processor 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in fig. 7; furthermore, the electronic device 600 may also comprise components not shown in fig. 7, which may be referred to in the prior art.
As shown in fig. 7, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, removable media, a volatile memory, a non-volatile memory, or another suitable device. It may store failure-related information as well as a program for processing that information, and the central processor 100 may execute the program stored in the memory 140 to implement information storage, processing, and the like.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used for displaying objects such as images and text. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid-state memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, which is used to store application programs and function programs, or the flow by which the central processor 100 executes the operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement general telecommunication functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, an audio processor 130 is also coupled to the central processor 100, enabling recording locally through a microphone 132, and enabling locally stored sound to be played through a speaker 131.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (14)

1. A batch job processing method under a distributed system, characterized by comprising:
dividing the batch job into one or more groups;
executing the jobs in each group in sequence, according to an execution plan, on the corresponding executors in a default park;
if the state of the executor of the current job in the current group is abnormal, backtracking through the jobs in the current group to locate the entry job of the current group, selecting an executor of that entry job that is in a normal state in a nearby park to execute the entry job, and completing the jobs of the subsequent groups in sequence in the nearby park.
2. The method of claim 1, further comprising: before executing the entry job of each group, checking the states of the executors of all jobs in the group to be executed, and if the state of any executor is abnormal, selecting an executor of the entry job that is in a normal state in a nearby park to execute the entry job, and completing the jobs of the subsequent groups in sequence in the nearby park.
3. The method of claim 1, wherein dividing the batch job into one or more groups comprises: dividing, according to the processing flow of the batch job file, the jobs at different layers that process the same service in the batch job into one or more groups.
4. The method of claim 1, wherein the executor state abnormality comprises: a fault of the executor itself and an abnormal external dependency state of the current job;
when the executor state abnormality is a fault of the executor itself, backtracking through the jobs in the current group to locate the entry job of the current group, selecting a standby executor of the faulty executor in the current park to execute the entry job, and completing the jobs of the subsequent groups in sequence in the current park;
and when the executor state abnormality is an abnormal external dependency state of the current job, backtracking through the jobs in the current group to locate the entry job of the current group, selecting an executor of the current group's entry job in a nearby park to execute the entry job, and completing the jobs of the subsequent groups in sequence in the nearby park.
5. The method according to claim 4, wherein when the executor state abnormality is a fault of the executor itself, after switching to the standby executor to execute the entry job, the method further comprises: sending a monitoring message to the monitoring center to raise a low-level alarm for the executor fault.
6. The method of claim 4, wherein when the executor state abnormality is an abnormal external dependency state of the current job, after selecting an executor in a nearby park to execute the entry job, the method further comprises: sending a monitoring message to the monitoring center to raise a medium-level alarm for cross-park scheduling.
7. The method according to claim 4, wherein when the executor state abnormality is an abnormal external dependency state of the current job and the executors of the current group's entry job are found to be abnormal in all parks, the batch job processing is interrupted, and a monitoring message is sent to the monitoring center to raise a high-level alarm for job interruption.
8. The method of claim 7, wherein after the batch job processing is interrupted, the method further comprises: after waiting for a preset time, rescheduling execution of the current group's entry job.
9. The method of claim 1, further comprising: controlling the scheduling path of the batch job by specifying the operating state of the job executors within a group, the operating state of an executor being one of: a normal operating state, a fault state and a maintenance state.
10. The method of claim 1, wherein executing the jobs in each group in sequence on the corresponding executors comprises: executing the jobs in each group in sequence on the corresponding executors according to the downstream dependencies of the jobs in the group.
11. The method of claim 10, wherein the downstream dependencies of the jobs in a group include a grayscale dependency, whose jobs complete a grayscale batch job, and a general dependency, whose jobs complete a general batch job.
12. An apparatus for processing a batch job under a distributed system, the apparatus comprising:
a grouping unit, configured to divide the batch job into one or more groups;
an execution unit, configured to execute the jobs in each group in sequence, according to an execution plan, on the corresponding executors in a default park;
and a path selection unit, configured to, when the state of the executor of the current job in the current group is abnormal, backtrack through the jobs in the current group to locate the entry job of the current group, select an executor of that entry job that is in a normal state in a nearby park to execute the entry job, and complete the jobs of the subsequent groups in sequence in the nearby park.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 11.
CN202210885410.3A 2022-07-26 2022-07-26 Batch job processing method and device under distributed system Pending CN115169949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210885410.3A CN115169949A (en) 2022-07-26 2022-07-26 Batch job processing method and device under distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210885410.3A CN115169949A (en) 2022-07-26 2022-07-26 Batch job processing method and device under distributed system

Publications (1)

Publication Number Publication Date
CN115169949A true CN115169949A (en) 2022-10-11

Family

ID=83496239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210885410.3A Pending CN115169949A (en) 2022-07-26 2022-07-26 Batch job processing method and device under distributed system

Country Status (1)

Country Link
CN (1) CN115169949A (en)

Similar Documents

Publication Publication Date Title
KR100575497B1 (en) Fault tolerant computer system
CN106301876B (en) Physical machine upgrade method, business migration method and device
US20060218545A1 (en) Server system and online software update method
CN102882704B (en) Link protection method in the soft reboot escalation process of a kind of ISSU and equipment
CN103581225A (en) Distributed system node processing task method
CN107479862A (en) The gray scale dissemination method and system of a kind of software upgrading
CN110033095A (en) A kind of fault-tolerance approach and system of high-available distributed machine learning Computational frame
US20060282831A1 (en) Method and hardware node for customized upgrade control
CN111858050B (en) Server cluster hybrid deployment method, cluster management node and related system
CN111464352A (en) Call link data processing method and device
CN106385330A (en) Network function virtualization composer realization method and device
CN115169949A (en) Batch job processing method and device under distributed system
CN102231684B (en) Interface board state detection method, multi-core central processing unit, interface board and router
CN115412610A (en) Flow scheduling method and device under fault scene
CN110347525A (en) A kind of fault handling method and device
CN110035496A (en) A kind of cloud platform switching method, system and electronic equipment
CN113419829B (en) Job scheduling method, device, scheduling platform and storage medium
CN109936462A (en) Disaster recovery method and device
CN112445574A (en) Application container multi-cluster migration method and device
JPH06119182A (en) Information communication network system with down-load control function
CN102917388A (en) Self-repairing method for base stations, cut-through management and control device, cluster head base station and communication system
CN113050974B (en) Online upgrading method and device for cloud computing infrastructure
CN116800604B (en) Configurable laser communication equipment control method, device, equipment and medium
CN115378809B (en) Software version upgrading method and device
CN113762821B (en) Cargo information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination