CN112035233A - Big data batch job task scheduling method and device - Google Patents

Big data batch job task scheduling method and device

Info

Publication number
CN112035233A
Authority
CN
China
Prior art keywords
batch
task
job
scheduling
information
Legal status
Pending
Application number
CN202010906310.5A
Other languages
Chinese (zh)
Inventor
杨晓晓
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202010906310.5A
Publication of CN112035233A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for scheduling the tasks of big data batch jobs. The method comprises: defining a batch job that performs batch processing on big data, the batch job comprising a plurality of tasks; configuring batch parameter information for the batch job; executing each task in the batch job according to the configured batch parameter information; monitoring the job execution state of the batch job and the task execution state of each task in it; and scheduling each task of the batch job according to the monitoring result. The invention provides a big data batch job task scheduling mechanism developed with Java technology for scheduling the batch processing tasks of an MPP database.

Description

Big data batch job task scheduling method and device
Technical Field
The invention relates to the field of big data batch processing, in particular to a big data batch job task scheduling method and device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
An MPP (massively parallel processing) architecture distributes a task across multiple nodes that compute in parallel; after each node finishes its computation, the partial results are gathered to produce the final result. A database built on the MPP architecture is called an MPP database.
MPP databases offer strong parallel computing and massive data storage capacity, which makes them well suited to big data analysis and computation. Major banks now widely use them as big data management platforms to handle the explosive growth of data in banking systems.
Big data analysis depends on task scheduling, which plays an important role in a big data platform architecture. At present, the big data management platforms of major banks lack a Java-based big data batch job task scheduling mechanism responsible for scheduling and controlling the batch processing tasks of the MPP database.
Disclosure of Invention
An embodiment of the invention provides a big data batch job task scheduling method that addresses the lack, in the prior art, of a Java-based big data batch job task scheduling mechanism. The method comprises: defining a batch job that performs batch processing on big data, the batch job comprising a plurality of tasks; configuring batch parameter information for the batch job; executing each task in the batch job according to the configured batch parameter information; monitoring the job execution state of the batch job and the task execution state of each task in it; and scheduling each task of the batch job according to the monitoring result.
An embodiment of the invention also provides a big data batch job task scheduling device that addresses the same problem. The device comprises: a batch job definition module, configured to define a batch job that performs batch processing on big data, the batch job comprising a plurality of tasks; a batch parameter configuration module, configured to configure batch parameter information for the batch job; a batch task execution module, configured to execute each task in the batch job according to the configured batch parameter information; an execution state monitoring module, configured to monitor the job execution state of the batch job and the task execution state of each task in it; and a task scheduling module, configured to schedule each task of the batch job according to the monitoring result.
An embodiment of the invention also provides a computer device that addresses the same problem. The computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the big data batch job task scheduling method is implemented.
An embodiment of the invention also provides a computer-readable storage medium that addresses the same problem; it stores a computer program for executing the big data batch job task scheduling method.
In the embodiments of the invention, a batch job that performs batch processing on big data and comprises a plurality of tasks is defined; after its batch parameter information is configured, each task is executed according to that information; the job execution state of the batch job and the task execution state of each task are monitored; and each task is then scheduled according to the monitoring result. This yields a big data batch job task scheduling mechanism developed with Java technology that schedules the batch processing tasks of an MPP database.
Drawings
To illustrate the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in the description are briefly introduced below. The drawings show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flowchart of a big data batch job task scheduling method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an optional big data batch job task scheduling method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an optional big data batch job task scheduling method according to an embodiment of the present invention;
FIG. 4 is a flowchart of an optional big data batch job task scheduling method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the task relationships of a batch job according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a monitoring screen of a batch scheduling application according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a big data batch job task scheduling apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an optional big data batch job task scheduling apparatus according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions explain the present invention and do not limit it.
An embodiment of the invention provides a big data batch job task scheduling method. FIG. 1 is a flowchart of this method; as shown in FIG. 1, the method comprises the following steps:
S101: Define a batch job that performs batch processing on big data; the batch job comprises a plurality of tasks.
In a specific implementation of S101, one or more batch jobs may be defined according to the batch processing to be performed on the big data, and the batch jobs execute independently without interfering with one another. Each batch job may contain three types of tasks: start tasks, process tasks, and end tasks. A start task has only successor tasks in the batch job, a process task has both predecessor and successor tasks, and an end task has only predecessor tasks.
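The three task roles can be told apart purely from a task's predecessor and successor sets. The Java sketch below illustrates that rule (Java is used because the described mechanism is Java-based); `TaskType` and `TaskClassifier` are hypothetical names, and rejecting a task with neither predecessors nor successors is an assumption, since the text does not cover that case.

```java
import java.util.Set;

/** Task roles described in S101 (names are illustrative, not from the patent). */
enum TaskType { START, PROCESS, END }

final class TaskClassifier {
    private TaskClassifier() {}

    /**
     * Start task: successors only. Process task: both. End task: predecessors only.
     * A task with neither is rejected here; the patent does not describe that case.
     */
    static TaskType classify(Set<String> predecessors, Set<String> successors) {
        boolean hasPred = !predecessors.isEmpty();
        boolean hasSucc = !successors.isEmpty();
        if (hasPred && hasSucc) return TaskType.PROCESS;
        if (hasSucc) return TaskType.START;
        if (hasPred) return TaskType.END;
        throw new IllegalArgumentException("task has neither predecessors nor successors");
    }

    public static void main(String[] args) {
        // Relationships taken from the START, D and END rows of Table 1.
        System.out.println(classify(Set.of(), Set.of("A", "B", "C"))); // START
        System.out.println(classify(Set.of("A", "B"), Set.of("G")));   // PROCESS
        System.out.println(classify(Set.of("G"), Set.of()));           // END
    }
}
```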
Optionally, the big data batch job task scheduling method provided in this embodiment may be applied to, but is not limited to, scheduling the batch processing tasks of a massively parallel processing (MPP) database.
S102: Configure batch parameter information for the batch job.
Note that the batch parameter information configured in S102 includes both parameters that control the execution of the batch job as a whole and parameters that control the execution of each task in it.
Note also that the batch parameter information to configure differs for different batch processing of different data. When the batch job defined in this embodiment performs batch processing on the bank's MPP database GBASE 8a, the configured batch parameter information may include, but is not limited to: working day, restricted day, provincial institution number, line (on/off), and flow control.
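As an illustration only, the GBASE 8a parameters listed above might be held in a simple Java object such as the one below. The class and field names, the province number `0001`, and the `mayRun()` rule (a job or task runs only when its line is switched on and the working day is not a restricted day) are assumptions, not the patent's actual data model.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical holder for the GBASE 8a batch parameters named in the text.
 * Field names, types, and the mayRun() rule are illustrative assumptions.
 */
final class BatchParameters {
    final String provinceOrgNo;             // provincial institution number
    final LocalDate workingDay;             // this province's current working day
    final Set<LocalDate> restrictedDays;    // days on which the batch must not run
    final boolean lineOn;                   // line on/off switch
    final Map<String, Integer> flowLimits;  // max concurrency per transaction class

    BatchParameters(String provinceOrgNo, LocalDate workingDay, Set<LocalDate> restrictedDays,
                    boolean lineOn, Map<String, Integer> flowLimits) {
        this.provinceOrgNo = provinceOrgNo;
        this.workingDay = workingDay;
        this.restrictedDays = Set.copyOf(restrictedDays);
        this.lineOn = lineOn;
        this.flowLimits = Map.copyOf(flowLimits);
    }

    /** Run check: only if the line is on and the working day is not a restricted day. */
    boolean mayRun() {
        return lineOn && !restrictedDays.contains(workingDay);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 10, 1);
        BatchParameters p = new BatchParameters("0001", day, Set.of(day), true, Map.of("SHORT", 50));
        System.out.println(p.mayRun()); // false: the working day is a restricted day
    }
}
```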
S103: Execute each task in the batch job according to the configured batch parameter information.
In a specific implementation, the batch job is started according to its job-level execution parameters; once the batch job has started, the execution of each task is controlled according to that task's execution parameters.
S104: Monitor the job execution state of the batch job and the task execution state of each task in the batch job.
In a specific implementation, while the tasks are executed under the job-level and task-level parameters, the execution state of each task is monitored, and the state of the batch job is derived from the task states.
Note that, in this embodiment, after the batch job is defined, the possible job execution states of the batch job and the possible task execution states of each of its tasks must also be defined in advance.
For example, each task is defined to have 6 states: 0-idle state, 1-executing state, 2-successful completion state, 3-warning state, 4-failure state, 5-ready state; each batch job is defined to have 5 states: 0-idle state, 1-executing state, 2-successful completion state, 3-warning state, 4-failed state.
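The state codes above map naturally onto Java enums. The sketch below also shows one possible way to derive a job state from its task states, as S104 implies; that derivation rule, including treating the warning state as finished, is an assumption rather than something the text specifies, and `StateDemo` is a hypothetical name.

```java
import java.util.Collection;
import java.util.List;

/** Task states 0-5 as defined in the text. */
enum TaskState {
    IDLE(0), EXECUTING(1), SUCCEEDED(2), WARNING(3), FAILED(4), READY(5);

    final int code;

    TaskState(int code) { this.code = code; }

    static TaskState fromCode(int code) {
        for (TaskState s : values()) {
            if (s.code == code) return s;
        }
        throw new IllegalArgumentException("unknown task state code: " + code);
    }

    /** Assumption: "warning" (errors that do not affect the result) counts as finished. */
    boolean isFinished() { return this == SUCCEEDED || this == WARNING; }
}

/** Job states 0-4 as defined in the text. */
enum JobState {
    IDLE(0), EXECUTING(1), SUCCEEDED(2), WARNING(3), FAILED(4);

    final int code;

    JobState(int code) { this.code = code; }

    /** Hypothetical rule for deriving the job state from its task states. */
    static JobState fromTasks(Collection<TaskState> tasks) {
        if (tasks.contains(TaskState.FAILED)) return FAILED;
        if (tasks.stream().allMatch(t -> t == TaskState.IDLE)) return IDLE;
        if (!tasks.stream().allMatch(TaskState::isFinished)) return EXECUTING;
        return tasks.contains(TaskState.WARNING) ? WARNING : SUCCEEDED;
    }
}

final class StateDemo {
    public static void main(String[] args) {
        System.out.println(JobState.fromTasks(List.of(TaskState.SUCCEEDED, TaskState.WARNING))); // WARNING
        System.out.println(JobState.fromTasks(List.of(TaskState.SUCCEEDED, TaskState.READY)));   // EXECUTING
    }
}
```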
S105: Schedule each task of the batch job according to the monitoring result.
In a specific implementation, S105 may restart the batch job, and/or skip a task in the batch job and continue executing the rest of the batch job.
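A minimal sketch of the two scheduling actions, assuming task states are kept in memory as the integer codes defined earlier. `JobControl` is a hypothetical name, and recording a skipped task as warning (3) is an assumption; the text only says that the job continues after the skip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical in-memory control for one batch job. State codes follow the text:
 * 0 idle, 1 executing, 2 succeeded, 3 warning, 4 failed, 5 ready.
 */
final class JobControl {
    static final int WARNING = 3;
    static final int READY = 5;

    final Map<String, Integer> taskStates = new LinkedHashMap<>();

    /** Forced execution: reset every task to "ready" so the whole job runs again. */
    void restart() {
        taskStates.replaceAll((task, state) -> READY);
    }

    /**
     * Skipped execution: mark the task as finished so its successors can proceed.
     * Recording it as "warning" (3) is an assumption; the patent does not name a skip state.
     */
    void skip(String task) {
        if (!taskStates.containsKey(task)) {
            throw new IllegalArgumentException("unknown task: " + task);
        }
        taskStates.put(task, WARNING);
    }

    public static void main(String[] args) {
        JobControl job = new JobControl();
        job.taskStates.put("C", 4); // failed: its input data did not arrive
        job.taskStates.put("F", 5); // waiting on C
        job.skip("C");
        System.out.println(job.taskStates); // {C=3, F=5}
        job.restart();
        System.out.println(job.taskStates); // {C=5, F=5}
    }
}
```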
In one embodiment, as shown in FIG. 2, after each task in the batch job is executed according to the configured batch parameter information, the method may further record log information through the following steps:
S201: Record the log information of the batch job during execution;
S202: Classify and store the recorded log information.
Further, after the log information of the batch job is recorded, the method may also: record the error code of each error log; extract the log information of an error log according to its error code; and output that log information. Recording error codes and extracting the log information associated with them lets big data operation and maintenance staff quickly locate the logs that contain errors.
In this embodiment, recording log information and storing it by category enables fast retrieval and location of batch job logs, and associating each error log with its error code makes all of its log information quick to find, which is a great convenience for big data operation and maintenance staff.
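The Java sketch below illustrates classified log storage, error-code extraction, and a query by type and time window (the query function is described later in this embodiment). It keeps everything in memory for brevity, whereas a real system would persist logs; `LogStore`, `LogEntry`, `LogType`, and the error code `E1001` are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

enum LogType { MONITOR, ERROR }

/** Hypothetical log record; errorCode is null for monitoring entries. */
final class LogEntry {
    final LocalDateTime time;
    final LogType type;
    final String errorCode;
    final String message;

    LogEntry(LocalDateTime time, LogType type, String errorCode, String message) {
        this.time = time;
        this.type = type;
        this.errorCode = errorCode;
        this.message = message;
    }
}

/** In-memory sketch of classified log storage. */
final class LogStore {
    private final Map<LogType, List<LogEntry>> byType = new EnumMap<>(LogType.class);

    /** S201/S202: record an entry and file it under its type. */
    void record(LogEntry e) {
        byType.computeIfAbsent(e.type, t -> new ArrayList<>()).add(e);
    }

    /** Extract all error-log entries that carry the given error code. */
    List<LogEntry> byErrorCode(String code) {
        return byType.getOrDefault(LogType.ERROR, List.of()).stream()
                .filter(e -> code.equals(e.errorCode))
                .collect(Collectors.toList());
    }

    /** Query by type and time window [from, to). */
    List<LogEntry> query(LogType type, LocalDateTime from, LocalDateTime to) {
        return byType.getOrDefault(type, List.of()).stream()
                .filter(e -> !e.time.isBefore(from) && e.time.isBefore(to))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LogStore store = new LogStore();
        LocalDateTime t = LocalDateTime.of(2020, 9, 1, 2, 0);
        store.record(new LogEntry(t, LogType.ERROR, "E1001", "task D failed"));
        store.record(new LogEntry(t, LogType.MONITOR, null, "task A ran 12 s"));
        for (LogEntry e : store.byErrorCode("E1001")) System.out.println(e.message); // task D failed
    }
}
```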
In one embodiment, as shown in FIG. 3, after each task in the batch job is executed according to the configured batch parameter information, the method may further monitor system resource usage through the following steps:
S301: Collect the system resource usage information of the batch job during execution;
S302: Output the collected system resource usage information.
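One way to collect figures for S301 and S302 in Java is through the standard `java.lang.management` API, sketched below. The text does not say which resources are tracked or on which machine; this sketch assumes processor count, system load average, and JVM heap usage on the scheduler's own host, not on the MPP database nodes. `ResourceSampler` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of S301/S302 using only standard JDK management APIs. */
final class ResourceSampler {
    static Map<String, Number> sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();
        Map<String, Number> m = new LinkedHashMap<>();
        m.put("availableProcessors", os.getAvailableProcessors());
        // Negative when the platform cannot report a load average.
        m.put("systemLoadAverage", os.getSystemLoadAverage());
        m.put("heapUsedBytes", rt.totalMemory() - rt.freeMemory());
        m.put("heapMaxBytes", rt.maxMemory());
        return m;
    }

    public static void main(String[] args) {
        // S302: output the collected figures; a real system might log or chart them.
        sample().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```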
In one embodiment, as shown in FIG. 4, after the job execution state of the batch job and the task execution state of each task are monitored, the method may further raise alarms for abnormal batch jobs through the following steps:
S401: Generate alarm information according to the monitoring result;
S402: Send the alarm information by e-mail or SMS.
In a specific implementation, the alarm information can take different forms depending on the monitoring terminal used by the monitoring staff. When the terminal is a mobile phone, the alarm can be sent to the phone as an SMS so that the staff are notified promptly; when the terminal is a computer, the alarm is sent as an e-mail, which can carry richer information than an SMS and so gives the staff a more complete picture of the alarm.
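The channel choice can be sketched as a formatting step that produces either a short SMS text or a fuller e-mail body. `Terminal` and `AlarmFormatter` are hypothetical names, the 70-character cap assumes a single UCS-2 encoded SMS segment, and actually sending the message (through an SMS gateway or a mail server) is not shown.

```java
/** Hypothetical terminal types used by on-duty monitoring staff. */
enum Terminal { MOBILE_PHONE, COMPUTER }

/** Sketch of S401: build the alarm text for the chosen channel. */
final class AlarmFormatter {
    // Assumed cap: one UCS-2 encoded SMS segment holds 70 characters.
    static final int SMS_MAX_CHARS = 70;

    static String format(Terminal terminal, String job, String task, String state, String detail) {
        String brief = "[BATCH] " + job + "/" + task + " " + state;
        if (terminal == Terminal.MOBILE_PHONE) {
            // SMS: short summary only, truncated to one segment.
            return brief.length() <= SMS_MAX_CHARS ? brief : brief.substring(0, SMS_MAX_CHARS);
        }
        // E-mail: summary plus full detail, since mail can carry richer content.
        return brief + "\n\n" + detail;
    }

    public static void main(String[] args) {
        // prints: [BATCH] JOB_01/D FAILED
        System.out.println(format(Terminal.MOBILE_PHONE, "JOB_01", "D", "FAILED", "ignored for SMS"));
    }
}
```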
Based on the big data batch job task scheduling method provided by this embodiment, a big data batch program scheduling system for the batch processing tasks of the MPP database GBASE 8a can be developed with Java technology. The system may use a browser/server (B/S) architecture: the server runs on the AIX operating system with WebSphere ND 6.1.0.15, the database is the MPP database GBASE 8a, and the client uses Internet Explorer (IE).
Optionally, in a specific implementation, the Java source code is developed with the Eclipse development tool, and the stored procedures are developed with the GBase system development tool.
The following describes an embodiment of the present invention in detail, taking batch task scheduling for the MPP database GBASE 8a of a banking system as an example. Operation and maintenance monitoring of GBASE 8a is a set of services responsible for scheduling and controlling the batch processing tasks of each layer, together with other system master-control functions. These mainly include job maintenance (create, modify, delete, and manually start), real-time job information queries, setting restricted days for task runs, and other monitoring functions such as the 930 service, the SMS service, and the parameter settings for batch backup of historical data.
Batch job design principles
Optimize performance, keep the number of batch jobs under control, set up jobs separately for each province, and avoid dependencies between jobs.
Stored procedures do not decide whether they should run; that decision is made by the batch scheduling framework.
FIG. 5 shows the task relationships of a batch job. As shown in FIG. 5, a batch job (JOB) starts with a virtual task (TASK) START and ends with a virtual task END. Tasks A, B, and C execute in parallel. Once A and B have both completed successfully, D and E start and execute in parallel; once C has completed successfully, F starts. When D, E, and F have all completed successfully, G starts, and when G completes successfully, the whole JOB is finished.
In a batch job, a task can start only after the task before it has completed successfully; the earlier task is a predecessor of the later one, and the later task is a successor of the earlier one. For example, A, B, and C are successors of START; D and E are successors of both A and B; F is a successor of C; and G is a successor of D, E, and F, which are therefore the predecessors of G.
The START task has no predecessor tasks and the END task has no successor tasks. The task relationships of the batch job shown in FIG. 5 are listed in Table 1.
TABLE 1 Task relationships of a batch job
Current task   Predecessor tasks   Successor tasks
START          None                A, B, C
A              START               D, E
B              START               D, E
C              START               F
D              A, B                G
E              A, B                G
F              C                   G
G              D, E, F             END
END            G                   None
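Table 1 can be treated as a dependency graph: at any moment, a task may start if it has not yet been started and all of its predecessors have succeeded. The Java sketch below encodes Table 1 and computes that ready set; `JobGraph`, `TABLE_1`, and `readyTasks` are hypothetical names. Its `main` method assumes every started task succeeds immediately and prints the successive ready sets `[START]`, `[A, B, C]`, `[D, E, F]`, `[G]`, `[END]`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class JobGraph {
    /** Predecessors of each task, copied from Table 1. */
    static final Map<String, List<String>> TABLE_1 = Map.of(
            "START", List.of(),
            "A", List.of("START"), "B", List.of("START"), "C", List.of("START"),
            "D", List.of("A", "B"), "E", List.of("A", "B"), "F", List.of("C"),
            "G", List.of("D", "E", "F"),
            "END", List.of("G"));

    /** Tasks not yet started whose predecessors have all succeeded; sorted for stable output. */
    static List<String> readyTasks(Map<String, List<String>> preds,
                                   Set<String> succeeded, Set<String> started) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : preds.entrySet()) {
            if (!started.contains(e.getKey()) && succeeded.containsAll(e.getValue())) {
                ready.add(e.getKey());
            }
        }
        Collections.sort(ready);
        return ready;
    }

    public static void main(String[] args) {
        Set<String> started = new HashSet<>();
        Set<String> succeeded = new HashSet<>();
        List<String> ready;
        // Simulation assumption: every started task succeeds before the next scan.
        while (!(ready = readyTasks(TABLE_1, succeeded, started)).isEmpty()) {
            System.out.println(ready); // tasks that may run in parallel
            started.addAll(ready);
            succeeded.addAll(ready);
        }
    }
}
```

In a live scheduler the ready set would be recomputed after each task finishes, so F can start as soon as C succeeds without waiting for A and B; the level-by-level output comes only from the instant-success simulation.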
Each task shown in table 1 has 6 states: 0-idle state, 1-executing state, 2-successful completion state, 3-warning state, 4-failure state, 5-ready state; the batch job shown in table 1 has 5 states: 0-idle state, 1-executing state, 2-successful completion state, 3-warning state, 4-failed state.
In a specific implementation, each JOB can be given a start time, accurate to the second. The batch scheduling framework scans the start time of each JOB and starts the JOB when its start time arrives. A JOB's initial state is 0; it becomes 1 once started, 2 after completing successfully, 3 if it completes with errors that do not affect the result, and 4 if it fails. A TASK's initial state is 0; when its JOB starts, all of the JOB's TASKs are initialized to 5; a TASK becomes 1 once started, 2 after completing successfully, 3 if it completes with errors that do not affect the result, and 4 if it exits with a failure.
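A minimal sketch of one scan of the start-time check and the state initialization described above, assuming each JOB has a daily start time held as a `LocalTime`. `StartTimeScanner` and `JobDef` are hypothetical names, and resetting a finished JOB to idle for the next day is not covered.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StartTimeScanner {
    /** Hypothetical JOB definition: daily start time (to the second) plus task states. */
    static final class JobDef {
        final String name;
        final LocalTime startTime;
        int state = 0;                                          // JOB initial state 0 (idle)
        final Map<String, Integer> taskStates = new LinkedHashMap<>();

        JobDef(String name, LocalTime startTime, List<String> tasks) {
            this.name = name;
            this.startTime = startTime;
            for (String t : tasks) taskStates.put(t, 0);       // TASK initial state 0
        }
    }

    /** One scan: start every idle JOB whose start time has been reached. */
    static List<String> startDue(List<JobDef> jobs, LocalTime now) {
        List<String> started = new ArrayList<>();
        for (JobDef job : jobs) {
            if (job.state == 0 && !now.isBefore(job.startTime)) {
                job.state = 1;                                  // JOB executing
                job.taskStates.replaceAll((t, s) -> 5);         // all TASKs ready
                started.add(job.name);
            }
        }
        return started;
    }

    public static void main(String[] args) {
        JobDef job = new JobDef("JOB_01", LocalTime.of(1, 30, 0), List.of("A", "B"));
        System.out.println(startDue(List.of(job), LocalTime.of(1, 29, 59))); // []
        System.out.println(startDue(List.of(job), LocalTime.of(1, 30, 0)));  // [JOB_01]
        System.out.println(job.taskStates);                                  // {A=5, B=5}
    }
}
```

A scheduler could call `startDue` once per second, for example from a task registered with `ScheduledExecutorService.scheduleAtFixedRate`.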
In the design, atomic TASKs can be stored in one table that has no state field. Each JOB has two tables: one stores the JOB's definition and the other stores the definitions of the TASKs it contains; both tables can include state fields.
The big data batch job task scheduling provided by this embodiment can implement, but is not limited to, the following functions:
Parameter settings:
Supported parameters include, but are not limited to, working day, restricted day, provincial institution number, line, and flow control.
Working day: after it is first set at go-live, the working day of a province normally advances to the next day automatically once all of that province's batches have finished. If the batch runs once every n days, the working day must be advanced by n days to the next execution date. Working days are maintained per province; that is, each province has its own working day.
Restricted day: statutory holidays and weekends.
For example, suppose September 29 and 30 fall on a weekend and October 1 through 7 is a holiday period. Working days are swapped to make this possible: September 29 and 30 are worked as normal working days, and October 4 and 5 are taken as days off. In this case, September 29 and 30 should not be restricted days.
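The restricted-day rule and the swapped working days can be expressed as a small calendar check. In the sketch below, a day is restricted if it is a statutory holiday, or a weekend day that has not been swapped into a working day; `WorkingCalendar` is a hypothetical name, and having `advance` also skip restricted days is an assumption. The example uses 2018, a year in which September 29 and 30 fell on a Saturday and Sunday.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-province calendar for restricted days and working-day advance. */
final class WorkingCalendar {
    private final Set<LocalDate> holidays;        // statutory holidays (including swapped days off)
    private final Set<LocalDate> makeUpWorkdays;  // weekend days swapped into working days

    WorkingCalendar(Set<LocalDate> holidays, Set<LocalDate> makeUpWorkdays) {
        this.holidays = Set.copyOf(holidays);
        this.makeUpWorkdays = Set.copyOf(makeUpWorkdays);
    }

    /** Restricted: a statutory holiday, or a Saturday/Sunday that was not swapped. */
    boolean isRestricted(LocalDate day) {
        if (makeUpWorkdays.contains(day)) return false;
        DayOfWeek dow = day.getDayOfWeek();
        return holidays.contains(day) || dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY;
    }

    /** Advance the working day by n days, then (assumption) past any restricted days. */
    LocalDate advance(LocalDate workingDay, int n) {
        LocalDate next = workingDay.plusDays(n);
        while (isRestricted(next)) next = next.plusDays(1);
        return next;
    }

    public static void main(String[] args) {
        Set<LocalDate> holidays = new HashSet<>();
        for (int d = 1; d <= 7; d++) holidays.add(LocalDate.of(2018, 10, d));
        WorkingCalendar cal = new WorkingCalendar(holidays,
                Set.of(LocalDate.of(2018, 9, 29), LocalDate.of(2018, 9, 30)));
        System.out.println(cal.isRestricted(LocalDate.of(2018, 9, 29))); // false
        System.out.println(cal.advance(LocalDate.of(2018, 9, 30), 1));   // 2018-10-08
    }
}
```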
Provincial institution number: can be added, deleted, modified, and queried.
Line: a given line can be switched on or off.
Flow control: the maximum concurrency of each transaction class can be set. The transaction classes are: default transactions, whose flow is not controlled; short transactions, which are simple, respond quickly, consume few resources, and need flow control; medium transactions, which are more complex than short transactions and need flow control; and long transactions, the most complex class, with longer response times and higher resource consumption. The flow control threshold decreases in the order default, short, medium, long.
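Per-class concurrency limits map directly onto `java.util.concurrent.Semaphore`, one per controlled class. In this sketch, `TxClass` and `FlowControl` are hypothetical names, the limits are invented examples, and the default class is simply never throttled.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Transaction classes from the text; DEFAULT is not flow-controlled. */
enum TxClass { DEFAULT, SHORT, MEDIUM, LONG }

final class FlowControl {
    private final Map<TxClass, Semaphore> permits = new EnumMap<>(TxClass.class);

    /** Limits must not increase from SHORT to MEDIUM to LONG, matching the text's ordering. */
    FlowControl(int shortMax, int mediumMax, int longMax) {
        if (!(shortMax >= mediumMax && mediumMax >= longMax && longMax > 0)) {
            throw new IllegalArgumentException("expected shortMax >= mediumMax >= longMax > 0");
        }
        permits.put(TxClass.SHORT, new Semaphore(shortMax));
        permits.put(TxClass.MEDIUM, new Semaphore(mediumMax));
        permits.put(TxClass.LONG, new Semaphore(longMax));
    }

    /** Non-blocking: returns false when the class is already at its concurrency limit. */
    boolean tryEnter(TxClass c) {
        Semaphore s = permits.get(c);
        return s == null || s.tryAcquire(); // no semaphore means uncontrolled (DEFAULT)
    }

    /** Release a permit; call only after a successful tryEnter. */
    void exit(TxClass c) {
        Semaphore s = permits.get(c);
        if (s != null) s.release();
    }

    public static void main(String[] args) {
        FlowControl fc = new FlowControl(3, 2, 1);      // hypothetical limits
        System.out.println(fc.tryEnter(TxClass.LONG));  // true
        System.out.println(fc.tryEnter(TxClass.LONG));  // false: limit of 1 reached
        fc.exit(TxClass.LONG);
        System.out.println(fc.tryEnter(TxClass.LONG));  // true
    }
}
```

Callers should pair each successful `tryEnter` with `exit`, typically in a `finally` block; calling `exit` without a matching successful `tryEnter` would let the effective limit drift upward.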
Monitoring function:
Supported monitoring functions include, but are not limited to: showing whether a job is disabled or enabled, whether its last run succeeded, and the start and end times of its run; and showing the current state and execution progress of each task. Users can view a job's operation log and error log.
FIG. 6 is a schematic diagram of a monitoring screen of the batch scheduling application provided in an embodiment of the present invention; it helps operation and maintenance staff monitor the running state. Two cases are supported: forced execution and skipped execution. Forced execution is needed when a rerun is required; skipped execution is needed when, under exceptional conditions, some data has not arrived and its processing can be skipped.
Optionally, on the main screen, tasks in different execution states can be marked with different colors or other indicators; for example, green for tasks that succeeded, yellow for tasks that completed with errors that do not affect the result, and red for tasks that failed.
Each JOB name is a link that opens a dialog box showing the run status of all TASKs of the JOB, together with a task relationship table like Table 1.
Other monitoring services may also be provided, such as 930 start/stop and status display, CISS status display, third information status display, and SMS status display.
Maintenance function:
Supported maintenance functions include defining, modifying, and deleting batch jobs, and adding, defining, and deleting the tasks of a batch job. When defining a batch job, batch parameters such as the provincial institution number, working day, and job execution frequency can be set. When adding a task, its restricted days and execution frequency (e.g., certain days of each month) can be set. The job passes the working day and provincial institution number as parameters to the stored procedures. Tasks are not defined per province, and there is no limit on the number of tasks that can be added. Tasks can be added in a form that is easy for operators to understand, such as the layout of Table 1.
Log analysis function:
Recorded log information includes, but is not limited to, error logs and monitoring logs. Monitoring logs mainly contain state information such as execution duration and running state. Error logs are recorded when an error occurs during the execution of a batch job or task.
Data backup function:
Automatic backup: following the backup policy, the batch framework automatically starts the backup scripts every day to back up the database layer by layer.
Manual backup: log in to the scheduling system and manually select the database to back up.
If a backup fails, the batch framework prompts for manual intervention to restart it.
In this embodiment, log information can be queried and deleted, and queries can filter by conditions such as time and type. Log collection: because logs are spread across the database layers, they are collected in one place. Log types: logs are divided into error logs and monitoring logs. Monitoring logs mainly contain state information such as execution time and running state; error logs are recorded when an error occurs in a TASK or JOB.
Based on the same inventive concept, an embodiment of the present invention further provides a big data batch job task scheduling apparatus, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the big data batch job task scheduling method, its implementation can refer to the implementation of the method, and repeated details are omitted.
FIG. 7 is a schematic diagram of a big data batch job task scheduling apparatus provided in an embodiment of the present invention. As shown in FIG. 7, the apparatus includes: a batch job definition module 71, a batch parameter configuration module 72, a batch task execution module 73, an execution state monitoring module 74, and a task scheduling module 75.
The batch job definition module 71 is configured to define a batch job that performs batch processing on big data, the batch job comprising a plurality of tasks; the batch parameter configuration module 72 is configured to configure batch parameter information for the batch job; the batch task execution module 73 is configured to execute each task in the batch job according to the configured batch parameter information; the execution state monitoring module 74 is configured to monitor the job execution state of the batch job and the task execution state of each task in it; and the task scheduling module 75 is configured to schedule each task of the batch job according to the monitoring result.
Optionally, the big data batch job task scheduling apparatus provided in this embodiment may be applied to, but is not limited to, scheduling the batch processing tasks of a massively parallel processing (MPP) database.
Note that the task types in the batch jobs defined by the batch job definition module 71 include start tasks, process tasks, and end tasks: a start task has only successor tasks in the batch job, a process task has both predecessor and successor tasks, and an end task has only predecessor tasks.
In one embodiment, the task scheduling module 75 may be configured to restart a batch job, and/or to skip a task in the batch job and continue executing the batch job.
In one embodiment, as shown in FIG. 8, the big data batch job task scheduling apparatus may further include an execution state definition module 76, configured to define the job execution states of the batch job and the task execution states of each task in it.
In one embodiment, as shown in FIG. 8, the apparatus may further include a log recording module 77, configured to record the log information of the batch job during execution and store the recorded log information by category.
Further, the log recording module 77 may be configured to: record the error code of each error log; extract the log information of an error log according to its error code; and output that log information.
In one embodiment, as shown in FIG. 8, the apparatus may further include a system resource usage monitoring module 78, configured to collect the system resource usage information of the batch job during execution and output the collected information.
In one embodiment, as shown in FIG. 8, the apparatus may further include an alarm module 79, configured to generate alarm information according to the monitoring result and send it by e-mail or SMS.
Based on the same inventive concept, an embodiment of the present invention further provides a computer device, addressing the lack of a Java-based big data batch job task scheduling mechanism in the prior art. The computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the big data batch job task scheduling method described above is implemented.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, addressing the same problem, which stores a computer program for executing the big data batch job task scheduling method.
In summary, embodiments of the present invention provide a big data batch job task scheduling method and apparatus, a computer device, and a computer-readable storage medium. A batch job comprising multiple tasks for batch processing of big data is defined; after the batch parameter information of the batch job is configured, each task in the batch job is executed according to that configuration; the job execution state of the batch job and the task execution state of each of its tasks are monitored; and each task of the batch job is then scheduled according to the monitoring result. In this way, a Java-based task scheduling mechanism for big data batch jobs is implemented, enabling scheduling of the batch processing tasks of an MPP database.
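The flow summarized above (define, configure, execute, monitor, schedule) can be illustrated with a small dependency-driven scheduler. This is a hypothetical Java sketch, not the patented implementation: the task graph is given as a map from task ID to predecessor IDs, task bodies are passed in as a `Predicate` that returns `true` on success, and a rerun resets failed tasks while skipping tasks that already succeeded, mirroring the restart behaviour described in claim 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Minimal dependency-driven scheduler sketch (hypothetical names and input format).
final class BatchScheduler {
    enum State { WAITING, SUCCEEDED, FAILED }

    private final Map<String, List<String>> predecessors;             // taskId -> predecessor taskIds
    private final Map<String, State> states = new LinkedHashMap<>();  // monitored task states
    private final List<String> executed = new ArrayList<>();          // every execution attempt, in order

    BatchScheduler(Map<String, List<String>> predecessors) {
        this.predecessors = new LinkedHashMap<>(predecessors);
        this.predecessors.keySet().forEach(t -> states.put(t, State.WAITING));
    }

    // Runs or restarts the job; returns true when every task has succeeded.
    boolean run(Predicate<String> taskBody) {
        // Restart semantics: failed tasks become runnable again, succeeded tasks are skipped.
        states.replaceAll((task, s) -> s == State.FAILED ? State.WAITING : s);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Map.Entry<String, List<String>> e : predecessors.entrySet()) {
                String task = e.getKey();
                if (states.get(task) != State.WAITING) continue;
                // A task is ready only when all of its predecessors have succeeded.
                boolean ready = e.getValue().stream().allMatch(p -> states.get(p) == State.SUCCEEDED);
                if (!ready) continue;
                executed.add(task);
                states.put(task, taskBody.test(task) ? State.SUCCEEDED : State.FAILED);
                progress = true;
            }
        }
        return states.values().stream().allMatch(s -> s == State.SUCCEEDED);
    }

    State state(String task) { return states.get(task); }

    List<String> executed() { return List.copyOf(executed); }
}
```

Because the state map survives between calls, calling `run` again after fixing a failure continues the job from the failed task instead of re-executing tasks that already finished; a task whose predecessor failed simply stays WAITING.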
With the big data batch job task scheduling method provided by embodiments of the present invention, operation and maintenance personnel can easily monitor the state of batch jobs, which improves the efficiency of operation and maintenance work; and developers can implement task scheduling for big data batch jobs merely by configuring simple parameters, maintaining field mappings for file imports, and maintaining predecessor and successor relationships for stored procedures. Embodiments of the present invention can achieve, among others, the following technical effects:
Firstly, a big data batch program scheduling mechanism developed with Java technology is implemented. It is responsible for scheduling and controlling the batch processing tasks of the MPP database GBASE 8a and for other system master-control functions, including but not limited to: job maintenance (create, modify, delete, manual start), querying real-time job information, setting restricted run dates for tasks, and other monitoring functions (e.g., the 930 service, the SMS service, and parameter settings for batch backup of historical data). Applying the big data batch job task scheduling method provided by embodiments of the present invention to the MPP database greatly reduces its maintenance cost; scheduling is simple, and the scheduling software is easy to install, maintain, use, and operate.
Secondly, high customizability: the scheduling interface developed with Java technology is highly customizable and can meet the individual requirements of operation and maintenance personnel.
Thirdly, real-time process monitoring: the performance of the CPU, memory, virtual memory, disk I/O, and so on is monitored in real time, so that server performance problems can be analyzed and resolved quickly.
Fourthly, low development cost: the MPP data management system makes file loading, stored procedure configuration, file generation, and other operations on the MPP database configurable, and the scheduling order only requires jobs and tasks to be configured.
Fifthly, simple deployment and easy operation, with timely and effective alarm notification: operation and maintenance personnel are notified by email, SMS, and the like.
Sixthly, precise error handling: when an error is reported, the application writes a specific error code and log entry to the log file, so problems can be located accurately by checking the error codes of each module. Errors can also be handled in the standard Struts way: the Module layer returns an error code to the Controller layer, which extracts the corresponding error message from the properties resource file according to that code and returns it to the page. Detailed error information is written to the log through log4j. Error warning messages, operation prompts, and page element text are stored in a Properties file. Users and operation and maintenance personnel do not need to know the error codes, because the scheduling system displays the error warning message corresponding to each code on the front-end page.
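The Module-to-Controller error flow described in the sixth effect can be shown in plain Java, assuming hypothetical class names and message keys. This illustrates the lookup pattern only; it does not use the Struts or log4j APIs. The Module layer throws an exception that carries only an error code, and the Controller layer resolves that code to a display message from a properties resource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

// Module-layer failure that carries an error code instead of a display message (hypothetical).
class ModuleException extends Exception {
    private final String errorCode;
    private final Object[] args;

    ModuleException(String errorCode, Object... args) {
        super(errorCode);
        this.errorCode = errorCode;
        this.args = args.clone();
    }

    String errorCode() { return errorCode; }

    Object[] args() { return args.clone(); }
}

// Controller-layer lookup of error codes in a properties resource (hypothetical).
final class ErrorMessages {
    private final Properties messages = new Properties();

    ErrorMessages(String propertiesText) throws IOException {
        messages.load(new StringReader(propertiesText));
    }

    // Resolves an error code to a page message, filling {0}, {1}, ... placeholders.
    String resolve(String errorCode, Object... args) {
        String pattern = messages.getProperty(errorCode);
        if (pattern == null) return "Unknown error (" + errorCode + ")";
        return MessageFormat.format(pattern, args);
    }

    // What the Controller would return to the page for a Module-layer failure.
    String handle(ModuleException e) {
        return resolve(e.errorCode(), e.args());
    }
}
```

Keeping the text in the properties resource means wording can change without touching Module code, and users only ever see the resolved message, never the raw code.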
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (16)

1. A big data batch job task scheduling method, characterized by comprising the following steps:
defining a batch job for executing batch processing on big data, wherein the batch job comprises: a plurality of tasks;
configuring batch parameter information of the batch job;
executing each task in the batch job according to the configured batch parameter information;
monitoring the job execution state of the batch job and the task execution state of each task in the batch job;
and scheduling each task of the batch job according to the monitoring result.
2. The method of claim 1, wherein the task types included in the batch job comprise a start task, a process task, and an end task, wherein the start task is a task that has only successor tasks in the batch job, the process task is a task that has both predecessor and successor tasks in the batch job, and the end task is a task that has only predecessor tasks in the batch job.
3. The method of claim 1, wherein the method further comprises:
defining the job execution state of the batch job and the task execution state of each task in the batch job.
4. The method of claim 1, wherein scheduling each task of the batch job according to the monitoring result comprises:
restarting the batch job;
and skipping the already-executed tasks in the batch job and continuing to execute the batch job.
5. The method of claim 1, wherein after executing each task in the batch job according to the configured batch parameter information, the method further comprises:
recording log information of the batch job during execution;
and storing the recorded log information by category.
6. The method of claim 5, wherein after recording log information of the batch job during execution, the method further comprises:
recording an error code of an error log;
extracting log information of the error log according to the error code of the error log;
and outputting the log information of the error log.
7. The method of claim 1, wherein after executing each task in the batch job according to the configured batch parameter information, the method further comprises:
collecting system resource usage information of the batch job during execution;
and outputting the collected system resource usage information.
8. The method of claim 1, wherein after monitoring the job execution status of the batch job and the task execution status of each task in the batch job, the method further comprises:
generating alarm information according to the monitoring result;
and sending the alarm information by email or SMS.
9. The method of any one of claims 1 to 8, wherein the big data is data stored in a massively parallel processing (MPP) database.
10. A big data batch job task scheduling device, characterized by comprising:
a batch job definition module, configured to define a batch job for executing batch processing on big data, wherein the batch job comprises a plurality of tasks;
a batch parameter configuration module, configured to configure batch parameter information of the batch job;
a batch task execution module, configured to execute each task in the batch job according to the configured batch parameter information;
an execution state monitoring module, configured to monitor the job execution state of the batch job and the task execution state of each task in the batch job;
and a task scheduling module, configured to schedule each task of the batch job according to the monitoring result.
11. The apparatus of claim 10, wherein the apparatus further comprises:
an execution state definition module, configured to define the job execution state of the batch job and the task execution state of each task in the batch job.
12. The apparatus of claim 10, wherein the apparatus further comprises:
a log recording module, configured to record log information of the batch job during execution and store the recorded log information by category.
13. The apparatus of claim 10, wherein the apparatus further comprises:
a system resource usage information monitoring module, configured to collect system resource usage information of the batch job during execution and output the collected system resource usage information.
14. The apparatus of claim 10, wherein the apparatus further comprises:
an alarm module, configured to generate alarm information according to the monitoring result and send the alarm information by email or SMS.
15. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the big data batch job task scheduling method according to any one of claims 1 to 9 when executing the computer program.
16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing the big data batch job task scheduling method according to any one of claims 1 to 9.
CN202010906310.5A 2020-09-01 2020-09-01 Big data batch job task scheduling method and device Pending CN112035233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010906310.5A CN112035233A (en) 2020-09-01 2020-09-01 Big data batch job task scheduling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010906310.5A CN112035233A (en) 2020-09-01 2020-09-01 Big data batch job task scheduling method and device

Publications (1)

Publication Number Publication Date
CN112035233A (en) 2020-12-04

Family

ID=73592179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010906310.5A Pending CN112035233A (en) 2020-09-01 2020-09-01 Big data batch job task scheduling method and device

Country Status (1)

Country Link
CN (1) CN112035233A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112882767A (en) * 2021-02-08 2021-06-01 建信金融科技有限责任公司 Method and system for maintaining spring batch operation web pages
CN113010232A (en) * 2021-03-31 2021-06-22 建信金融科技有限责任公司 Configuration-driven lightweight batch data processing method and device
CN113127175A (en) * 2021-05-18 2021-07-16 中国银行股份有限公司 Host job scheduling operation method and device
CN113434360A (en) * 2021-06-23 2021-09-24 中国建设银行股份有限公司 Method and system for monitoring operation
CN113516545A (en) * 2021-04-22 2021-10-19 建信金融科技有限责任公司 Internal control compliance service management method and device
CN114528079A (en) * 2022-01-28 2022-05-24 中银金融科技有限公司 Method and system for scheduling and processing batch files
CN114564296A (en) * 2022-03-04 2022-05-31 中信银行股份有限公司 Batch processing task scheduling method and device and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109379400A (en) * 2018-09-04 2019-02-22 中国建设银行股份有限公司 Batch jobs dispatch deal system, method, apparatus and storage medium
CN111125444A (en) * 2019-12-10 2020-05-08 中国平安财产保险股份有限公司 Big data task scheduling management method, device, equipment and storage medium
CN111400011A (en) * 2020-03-19 2020-07-10 中国建设银行股份有限公司 Real-time task scheduling method, system, equipment and readable storage medium

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN112882767A (en) * 2021-02-08 2021-06-01 建信金融科技有限责任公司 Method and system for maintaining spring batch operation web pages
CN113010232A (en) * 2021-03-31 2021-06-22 建信金融科技有限责任公司 Configuration-driven lightweight batch data processing method and device
CN113516545A (en) * 2021-04-22 2021-10-19 建信金融科技有限责任公司 Internal control compliance service management method and device
CN113127175A (en) * 2021-05-18 2021-07-16 中国银行股份有限公司 Host job scheduling operation method and device
CN113434360A (en) * 2021-06-23 2021-09-24 中国建设银行股份有限公司 Method and system for monitoring operation
CN113434360B (en) * 2021-06-23 2024-04-19 中国建设银行股份有限公司 Method and system for monitoring operation
CN114528079A (en) * 2022-01-28 2022-05-24 中银金融科技有限公司 Method and system for scheduling and processing batch files
CN114564296A (en) * 2022-03-04 2022-05-31 中信银行股份有限公司 Batch processing task scheduling method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN112035233A (en) Big data batch job task scheduling method and device
CN111736969B (en) Distributed job scheduling method and device
CN107678907B (en) Database service logic monitoring method, system and storage medium
CN101477543B (en) System and method for automating ETL application
US8126760B2 (en) Work item tracking system for projects
CN106406993A (en) Timed task management method and system
CN103034554A (en) ETL (Extraction-Transformation-Loading) dispatching system and method for error-correction restarting and automatic-judgment starting
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
CN111984390A (en) Task scheduling method, device, equipment and storage medium
CN114356750A (en) Test method, test device, computer equipment and storage medium
JP5989194B1 (en) Test management system and program
CN112561370A (en) Software version management method and device, computer equipment and storage medium
CN115907683A (en) Realization system and method of workflow engine based on financial product management
CN111861418A (en) Task generation method and device and electronic equipment
EP2913757A1 (en) Method, system, and computer software product for test automation
CN110825507B (en) Scheduling method supporting multi-task re-running
WO2019223171A1 (en) Workflow management method and system, computer device and storage medium
CN115185825A (en) Interface test scheduling method and device
CN110928884B (en) Data re-brushing method, device and system
CN117573327B (en) Method, equipment and storage medium for intelligent scheduling and trend monitoring
WO2021205529A1 (en) Information processing device, information processing method, and program
CN111078666B (en) Automatic unloading and supplying method based on multi-database crossing center
CN115061801A (en) Task management method and device, storage medium and electronic device
CN117632359A (en) Task scheduling management method based on kubernetes, electronic equipment and storage medium
CN116339806A (en) Production processing method and device for TWS transition version

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination