CN114924858A - Task scheduling method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114924858A
Authority
CN
China
Prior art keywords
task
data integration
scheduling
determining
integration task
Legal status
Pending
Application number
CN202210588291.5A
Other languages
Chinese (zh)
Inventor
金悦
刘冰琳
陈倩文
汪兰叶
齐佳敏
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202210588291.5A
Publication of CN114924858A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The application discloses a task scheduling method and device, a storage medium and electronic equipment, which can be applied to the financial field or other fields. The method comprises: obtaining a task queue containing a plurality of data integration tasks; for each data integration task, judging whether it meets a discontinuous scheduling condition; if it does, judging whether it meets an execution condition; and if it does not meet the execution condition, removing it from the task queue to update the task queue. After the update is completed, the priority of each data integration task in the task queue is determined, and the execution sequence of the data integration tasks is determined according to their priorities, thereby achieving task scheduling. With the method provided by the embodiment of the invention, a single data integration task can be scheduled discontinuously within a continuous time period without creating a plurality of data integration tasks, which reduces both workload and resource consumption.

Description

Task scheduling method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data integration technologies, and in particular, to a task scheduling method and apparatus, a storage medium, and an electronic device.
Background
In the informatization construction of enterprises and other organizations, a data warehouse is a common tool. A data warehouse is a data storage environment that provides various types of data support, and the data it stores is integrated from data in various data sources.
When a data warehouse is used, data from each business system often needs to undergo ETL (Extract-Transform-Load): the data is extracted (Extract), cleansed and converted (Transform), and then loaded (Load) into the data warehouse. This process may also be referred to as data integration. In existing data integration processes, each data integration task generally needs to be scheduled. At present, a data integration task that must be executed repeatedly is usually scheduled according to a preset continuous time period, and it is executed each time an execution time point within that period is reached.
The inventors found that in actual business scenarios, some data integration tasks do not need to be executed continuously. Existing task scheduling methods, however, can only schedule tasks over continuous time periods. If certain times need to be skipped, the time period must be split and a separate data integration task created for each sub-period. This process is cumbersome and labor-intensive, and it leaves several tasks with identical integration content to be managed, which consumes considerable system resources.
Disclosure of Invention
In view of this, embodiments of the present invention provide a task scheduling method. It solves the problems that arise when a task must be executed across separate time periods: multiple data integration tasks must be created, the workload is large, and resource consumption is high.
An embodiment of the invention also provides a task scheduling device for implementing and applying the method in practice.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
A task scheduling method comprises the following steps:
obtaining a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
judging whether each data integration task meets a preset discontinuous scheduling condition or not, and taking the data integration task meeting the discontinuous scheduling condition as a first data integration task;
for each first data integration task, determining a discontinuous time file corresponding to the first data integration task, judging whether the first data integration task meets a preset execution condition according to the discontinuous time file, and if the first data integration task does not meet the execution condition, determining the first data integration task as a second data integration task;
removing each second data integration task from the task queue to be scheduled to obtain an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
determining the priority corresponding to each target data integration task according to a preset priority strategy;
and determining an execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task so as to schedule each target data integration task.
Optionally, the method described above includes a process of generating the configuration file, where the process includes:
when a task creating instruction sent by a user through a front end is received, determining task configuration content corresponding to a data integration task to be created, wherein the task configuration content comprises task nodes, workflows, a coordinator and batch scheduling workflows created by the user through the front end;
calling a preset extensible markup language parsing interface, parsing the task configuration content, and generating an extensible markup language file corresponding to the task configuration content;
and taking the extensible markup language file as a configuration file corresponding to the data integration task to be created.
Optionally, in the method described above, judging whether each data integration task meets a preset discontinuous scheduling condition includes:
determining task information corresponding to each data integration task;
acquiring a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and for each data integration task, if the scheduling identifier corresponding to the data integration task is the scheduling identifier of the discontinuous scheduling, determining that the data integration task meets the discontinuous scheduling condition.
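To make the check concrete, here is a minimal Java sketch of the scheduling-identifier test. The patent specifies neither the identifier's value nor the structure of the task information. The `TaskInfo` record, its `scheduleFlag` field and the `DISCONTINUOUS` value are therefore illustrative assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical task information: an ID plus the scheduling identifier set by the user. */
record TaskInfo(String taskId, String scheduleFlag) {}

final class DiscontinuousFilter {
    /** Assumed value of the scheduling identifier that marks discontinuous scheduling. */
    static final String DISCONTINUOUS_FLAG = "DISCONTINUOUS";

    private DiscontinuousFilter() {}

    /** True when the task's scheduling identifier is the discontinuous-scheduling identifier. */
    static boolean meetsDiscontinuousCondition(TaskInfo info) {
        // Calling equals on the constant also handles a missing (null) identifier.
        return DISCONTINUOUS_FLAG.equals(info.scheduleFlag());
    }

    /** Selects the "first data integration tasks" from the queue, preserving queue order. */
    static List<TaskInfo> firstTasks(List<TaskInfo> queue) {
        return queue.stream()
                .filter(DiscontinuousFilter::meetsDiscontinuousCondition)
                .collect(Collectors.toList());
    }
}
```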
Optionally, in the method described above, judging whether the first data integration task meets the preset execution condition according to the discontinuous time file includes:
determining a date constraint condition corresponding to the discontinuous time file;
determining a current date;
judging whether the current date matches the date constraint condition, and if it does, determining that the first data integration task does not meet the execution condition;
and if the current date does not match the date constraint condition, determining that the first data integration task meets the execution condition.
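The date check above can be sketched as follows. The sketch assumes the date constraint condition has already been read from the discontinuous time file into a list of closed date ranges. The `SkipRange` record and the method names are hypothetical. The only logic taken from the text is that a match with the current date means the task does not meet the execution condition.

```java
import java.time.LocalDate;
import java.util.List;

/** One date constraint from a discontinuous time file: the closed range [start, end]. */
record SkipRange(LocalDate start, LocalDate end) {
    /** True when the given date falls inside the range, both ends included. */
    boolean contains(LocalDate date) {
        return !date.isBefore(start) && !date.isAfter(end);
    }
}

final class ExecutionCondition {
    private ExecutionCondition() {}

    /**
     * A first data integration task meets the execution condition only if the current date
     * matches none of its date constraints; any match means the run must be skipped.
     */
    static boolean meetsExecutionCondition(List<SkipRange> constraints, LocalDate today) {
        return constraints.stream().noneMatch(range -> range.contains(today));
    }
}
```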
Optionally, in the method described above, determining the execution sequence corresponding to each target data integration task according to its priority includes:
determining the execution sequence of the target data integration tasks by ordering them from the highest priority to the lowest, so that the highest-priority task is executed first and the lowest-priority task last.
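Here is a sketch of this ordering. It assumes priorities are integers, where a larger value means a higher priority. The text does not say how ties are handled. This sketch relies on `List.sort` being stable, so tasks with equal priority keep their queue order. `PrioritizedTask` is a hypothetical type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical target task with an already computed priority (larger = more urgent). */
record PrioritizedTask(String taskId, int priority) {}

final class ExecutionOrder {
    private ExecutionOrder() {}

    /** Orders tasks from highest to lowest priority; List.sort is stable, so ties keep queue order. */
    static List<PrioritizedTask> order(List<PrioritizedTask> targets) {
        List<PrioritizedTask> sorted = new ArrayList<>(targets);
        sorted.sort(Comparator.comparingInt(PrioritizedTask::priority).reversed());
        return sorted;
    }
}
```

A stable sort is one reasonable reading of "from first to last". A scheduler could equally break ties by enqueue time.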
A task scheduling apparatus, comprising:
the acquisition unit is used for acquiring a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
the first judging unit is used for judging whether each data integration task meets a preset discontinuous scheduling condition or not and taking the data integration task meeting the discontinuous scheduling condition as a first data integration task;
the second judging unit is used for determining, for each first data integration task, a discontinuous time file corresponding to it, judging whether the first data integration task meets a preset execution condition according to the discontinuous time file, and determining the first data integration task as a second data integration task if it does not meet the execution condition;
the updating unit is used for removing each second data integration task from the task queue to be scheduled, obtaining an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
the first determining unit is used for determining the priority corresponding to each target data integration task according to a preset priority strategy;
and the second determining unit is used for determining the execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task so as to schedule each target data integration task.
The above apparatus, optionally, further comprises:
the third determining unit is used for determining task configuration content corresponding to a data integration task to be created when a task creating instruction sent by a user through a front end is received, wherein the task configuration content comprises task nodes, workflows, coordinators and batch scheduling workflows created by the user through the front end;
the generating unit is used for calling a preset extensible markup language parsing interface, parsing the task configuration content, generating an extensible markup language file corresponding to the task configuration content, and taking the extensible markup language file as the configuration file corresponding to the data integration task to be created.
Optionally, in the apparatus described above, the first judging unit includes:
the first determining subunit is used for determining task information corresponding to each data integration task;
the acquiring subunit is configured to acquire a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and the second determining subunit is configured to determine, for each data integration task, that the data integration task meets the discontinuous scheduling condition if its scheduling identifier is the scheduling identifier for discontinuous scheduling.
A storage medium comprising stored instructions, wherein, when the instructions are executed, the device on which the storage medium is located is controlled to perform the task scheduling method described above.
An electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the task scheduling method as described above.
The task scheduling method provided by the embodiment of the invention comprises the following steps:
1. Acquiring a task queue comprising a plurality of data integration tasks, judging whether each data integration task meets a preset discontinuous scheduling condition, and taking each data integration task that meets the condition as a first data integration task.
2. For each first data integration task, determining its corresponding discontinuous time file and judging, according to that file, whether the task meets a preset execution condition. If it does not, the task is determined as a second data integration task.
3. Removing each second data integration task from the task queue to update the queue, and after the update, determining the priority of each data integration task remaining in the queue according to a preset priority strategy.
4. Determining the execution sequence of the data integration tasks according to their priorities to achieve task scheduling.

With this method, the discontinuous time file can be configured according to actual requirements. Some data integration tasks need discontinuous scheduling, that is, certain times within a continuous time period are skipped. Each time such a task is due to be scheduled, the discontinuous time file is used to judge whether it currently meets the execution condition. If it does not, the task is removed from the task queue and is not executed. A single data integration task can therefore be scheduled over a continuous time period with gaps, without creating multiple data integration tasks. This reduces workload and saves system resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for task scheduling according to an embodiment of the present invention;
fig. 2 is a flowchart of another task scheduling method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a task scheduling device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a task scheduling apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As described in the background, conventional scheduling of data integration tasks can only schedule a repeatedly executed task continuously over a preset continuous time period. For example, if the preset period runs from date A to date B, the task is scheduled and executed every day in that period, and no date or time slot within the period can be skipped. Discontinuous scheduling means skipping certain dates or time slots. To achieve it, multiple data integration tasks have to be created, each covering one uninterrupted sub-period, so that the integration operation is not performed at the excluded times. Creating several tasks for a single data integration operation means a large workload and higher resource consumption. Secondly, in actual application scenarios, data integration tasks differ in how urgently they need to be processed. Yet in existing task scheduling, the execution order of the data integration tasks is generally arbitrary. This makes it hard to meet the response requirements of the tasks and can adversely affect data integration work.
Therefore, an embodiment of the invention provides a task scheduling method that judges a discontinuous scheduling condition. This enables a single data integration task to be scheduled discontinuously within a continuous time period, so multiple tasks no longer need to be created to meet the discontinuous scheduling requirement. The workload is reduced and system resources are saved.
An embodiment of the invention provides a task scheduling method that can be applied to a task scheduling system for data integration tasks. The method may be executed by a processor of the system. Its flowchart is shown in fig. 1, and it comprises the following steps:
S101: obtaining a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
In the method provided by the embodiment of the invention, the task scheduling system can schedule, in real time, each data integration task that currently enters the task queue. The system monitors every data integration task established in it in real time, and adds a task to the task queue when one of its execution time points is reached. A data integration task may be executed repeatedly over a continuous period. For example, if its execution time is set to 8:00 a.m. each day from date A to date B, the task is added to the task queue at 8:00 a.m. every day during that period.
In the method provided by the embodiment of the invention, each data integration task in the system is created by parsing its corresponding configuration file. That is, the system parses each pre-generated configuration file and establishes the corresponding data integration task. A configuration file describes the workflow of its corresponding data integration task.
S102: judging whether each data integration task meets a preset discontinuous scheduling condition, and taking each data integration task that meets the discontinuous scheduling condition as a first data integration task;
In the method provided by the embodiment of the present invention, each data integration task in the task queue is checked against the preset discontinuous scheduling condition. The check determines whether the task requires discontinuous scheduling, in which specified execution time points are skipped. Each data integration task that meets the condition, i.e. has a discontinuous scheduling requirement, is taken as a first data integration task. Whether a data integration task needs discontinuous scheduling can be determined by the user's settings for that task.
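The example of a task due at 8:00 a.m. every day from date A to date B can be sketched with `java.time` as a list of execution time points. The monitor described above would enqueue the task whenever the clock reaches one of them. The class and method names are illustrative, not taken from the patent.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

final class ExecutionTimePoints {
    private ExecutionTimePoints() {}

    /** Daily execution time points from start to end (both inclusive) at the given time of day. */
    static List<LocalDateTime> daily(LocalDate start, LocalDate end, LocalTime timeOfDay) {
        List<LocalDateTime> points = new ArrayList<>();
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            points.add(day.atTime(timeOfDay));
        }
        return points;
    }
}
```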
S103: for each first data integration task, determining a discontinuous time file corresponding to it, judging according to the discontinuous time file whether the first data integration task meets a preset execution condition, and if it does not meet the execution condition, determining it as a second data integration task;
In the method provided by the embodiment of the invention, when a data integration task requires discontinuous scheduling, the user configures a corresponding discontinuous time file for it. The file describes the times or dates that the task needs to skip.
In the method provided by the embodiment of the invention, the system obtains the discontinuous time file corresponding to each first data integration task. It uses that file to judge whether the task meets the execution condition, that is, whether the task should run normally or skip the current execution time point. A first data integration task that does not meet the execution condition must skip the current execution time point and is not executed now, so it is regarded as a second data integration task. A second data integration task can thus be understood as a task that is executed repeatedly over a continuous time period but must skip the current execution time point.
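The patent does not define the format of the discontinuous time file. Purely as an assumption, the following sketch reads a plain-text file with one ISO date or one date range per line. It turns those lines into date ranges that the execution-condition check can match against. The file syntax, the `DateRange` record and the parser are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** A closed date range; a single skipped date is a range whose start equals its end. */
record DateRange(LocalDate start, LocalDate end) {}

final class DiscontinuousTimeFile {
    private DiscontinuousTimeFile() {}

    /**
     * Parses an assumed plain-text format: one ISO date (2022-10-01) or one range
     * (2022-10-01..2022-10-07) per line; blank lines and lines starting with '#' are ignored.
     */
    static List<DateRange> parse(List<String> lines) {
        List<DateRange> ranges = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf("..");
            if (sep < 0) {
                LocalDate day = LocalDate.parse(line);
                ranges.add(new DateRange(day, day));
            } else {
                LocalDate start = LocalDate.parse(line.substring(0, sep).strip());
                LocalDate end = LocalDate.parse(line.substring(sep + 2).strip());
                if (end.isBefore(start)) {
                    throw new IllegalArgumentException("range ends before it starts: " + line);
                }
                ranges.add(new DateRange(start, end));
            }
        }
        return ranges;
    }
}
```

In a real system the lines could come from `java.nio.file.Files.readAllLines(path)`.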
S104: removing each second data integration task from the task queue to be scheduled to obtain an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
In the method provided by the embodiment of the invention, each second data integration task is removed from the task queue to update it. After the update, every data integration task remaining in the task queue is taken as a target data integration task, that is, a data integration task that needs to be executed now.
S105: determining the priority corresponding to each target data integration task according to a preset priority strategy;
In the method provided by the embodiment of the invention, the priority strategy can be set according to actual requirements to determine the priority of each target data integration task. For example, the priority may be determined by how long a task has been waiting: the longer the waiting time, the higher the priority. Alternatively, the priority may be determined by the type of data the task integrates. In that case, the importance of each data type is configured in advance, and the more important the data integrated by a task, the higher its priority.
It should be noted that, in a specific implementation process, the priority policy may be set according to actual requirements, and the priority of the data integration task may be determined in different manners according to the requirements, without affecting the implementation function of the method provided in the embodiment of the present invention.
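The two example strategies can be expressed as interchangeable implementations of one interface, which also matches the note that the strategy is configurable. This Java sketch is illustrative only. The `PriorityStrategy` interface, the use of waiting seconds as the priority value, and the importance table keyed by data type are assumptions, not details from the patent.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical view of a target task with the fields the example strategies need. */
record PendingTask(String taskId, Instant enqueuedAt, String dataType) {}

/** A pluggable priority strategy: a larger value means a higher priority. */
interface PriorityStrategy {
    long priority(PendingTask task, Instant now);
}

/** Longer waiting time gives a higher priority; here the priority is the wait in seconds. */
final class WaitingTimeStrategy implements PriorityStrategy {
    @Override
    public long priority(PendingTask task, Instant now) {
        return Duration.between(task.enqueuedAt(), now).getSeconds();
    }
}

/** Priority looked up in a preconfigured importance table keyed by data type; unknown types get 0. */
final class DataImportanceStrategy implements PriorityStrategy {
    private final Map<String, Integer> importance;

    DataImportanceStrategy(Map<String, Integer> importance) {
        this.importance = new HashMap<>(importance); // defensive copy of the configuration
    }

    @Override
    public long priority(PendingTask task, Instant now) {
        return importance.getOrDefault(task.dataType(), 0);
    }
}
```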
S106: determining an execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task, so as to schedule each target data integration task.
In the method provided by the embodiment of the invention, the execution sequence of the target data integration tasks is determined according to their priorities. The tasks are then executed one after another in that sequence, thereby scheduling each target data integration task.
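Putting steps S102 to S106 together, a minimal end-to-end sketch might look like the following. It assumes the discontinuous time file has already been reduced to a set of skip dates per task. It also assumes the priority strategy is supplied as a function. `IntegrationTask` and `TaskScheduler` are hypothetical names, and execution of the ordered tasks is left to the caller.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

/** Hypothetical queue entry; skipDates stands in for the parsed discontinuous time file. */
record IntegrationTask(String id, boolean discontinuous, Set<LocalDate> skipDates, int importance) {}

final class TaskScheduler {
    private TaskScheduler() {}

    /** Applies steps S102 to S106 to a snapshot of the queue and returns the execution order. */
    static List<IntegrationTask> schedule(List<IntegrationTask> queue, LocalDate today,
                                          ToIntFunction<IntegrationTask> priorityStrategy) {
        // S102 + S103: a "second" task is a discontinuous task whose time file matches today.
        // S104: every other task stays in the updated queue as a target task.
        List<IntegrationTask> targets = queue.stream()
                .filter(t -> !(t.discontinuous() && t.skipDates().contains(today)))
                .collect(Collectors.toCollection(ArrayList::new));
        // S105 + S106: order by descending priority; the stable sort keeps queue order on ties.
        targets.sort(Comparator.comparingInt(priorityStrategy).reversed());
        return targets;
    }
}
```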
With the method provided by the embodiment of the invention, scheduling proceeds as follows when the data integration tasks in the task queue need to be scheduled:
1. Each task is checked against the preset discontinuous scheduling condition.
2. If a task meets the condition, its discontinuous time file is used to judge whether it meets the preset execution condition; if it does not, the task is removed from the task queue.
3. The priority of each data integration task still in the queue is determined according to the preset priority strategy.
4. The execution sequence is determined from those priorities to achieve task scheduling.

The discontinuous time file can be configured according to actual requirements, for example to specify which dates or time points a task should skip within a continuously executed period. Each time a discontinuously scheduled task is due, the discontinuous time file is used to judge whether it currently meets the execution condition. If it does not, the task is removed from the task queue and is not executed. A single data integration task can therefore be scheduled over a continuous time period with gaps, without creating multiple data integration tasks. This reduces workload and saves system resources.
In the process of task scheduling, the priority of the data integration tasks can be determined, the execution sequence of the data integration tasks is determined according to the priority sequence, response requirements of different data integration tasks are met, and smooth operation of the data integration tasks is guaranteed.
To better explain the method, on the basis of the method shown in fig. 1, an embodiment of the present invention provides another task scheduling method. As shown in the flowchart of fig. 2, the process of generating the configuration file mentioned in step S101 includes:
S201: when a task creation instruction sent by a user through a front end is received, determining task configuration content corresponding to a data integration task to be created, wherein the task configuration content comprises task nodes, workflows, coordinators and batch scheduling workflows created by the user through the front end;
In the method provided by the embodiment of the invention, a data integration task is established based on a workflow, and its configuration file is the configuration file of that workflow.
In the method provided by the embodiment of the invention, a front-end interface for building workflows is provided. Specifically, it is a visual interface developed on the workflow engine Oozie. Oozie is an existing workflow engine and is not described in detail here.
In the method provided by the embodiment of the invention, a user who needs to create a new data integration task can build its elements by drag and drop in the visual front-end interface. These elements are task nodes (action), workflows (workflow), coordinators (coordinator) and batch scheduling workflows (bundle). The user first defines the operation of each subtask and creates the corresponding task node. The user then sets the association relationship between the actions, that is, their order, by dragging, thereby building a workflow. A workflow can be understood as a combination of task nodes and control nodes. A bundle is an abstraction over multiple coordinators that groups a set of coordinators together. Action, workflow, coordinator and bundle are all existing elements of Oozie and are not described further here.
It should be noted that, in the specific implementation process, the action, workflow, coordinator and other contents established by the user may be established according to actual task requirements, may not include all the above-mentioned elements, and the specific node number of each type of element may also be different, without affecting the implementation function of the method provided in the embodiment of the present invention.
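The relationship among these Oozie elements can be modelled as nested data, which is roughly what the front end has to capture before any XML is generated. This is a simplified, hypothetical model. Real Oozie workflows also contain control nodes such as start, end, kill, decision and fork/join. Here, by contrast, each action simply names the next action, and `null` marks the end.

```java
import java.util.ArrayList;
import java.util.List;

/** A task node (action) created on the canvas; 'next' is the node it hands over to, null = end. */
record ActionNode(String name, String next) {}

/** A workflow: its task nodes plus the name of the first node to run. */
record WorkflowDef(String name, String start, List<ActionNode> actions) {
    /** Follows the 'next' links from the start node, i.e. the order the user set by dragging. */
    List<String> executionPath() {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null) {
            String nodeName = current;
            ActionNode node = actions.stream()
                    .filter(a -> a.name().equals(nodeName))
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("unknown node " + nodeName));
            if (path.contains(nodeName)) {
                throw new IllegalStateException("cycle at " + nodeName);
            }
            path.add(nodeName);
            current = node.next();
        }
        return path;
    }
}

/** A coordinator schedules one workflow; a bundle groups several coordinators. */
record CoordinatorDef(String name, String frequency, WorkflowDef workflow) {}

record BundleDef(String name, List<CoordinatorDef> coordinators) {}
```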
In the method provided by the embodiment of the invention, after the user establishes various elements through the front end, the task configuration contents can be sent to the system so as to send the task creation instruction.
S202: calling a preset extensible markup language parsing interface, parsing the task configuration content, and generating an extensible markup language file corresponding to the task configuration content;
In the method provided by the embodiment of the invention, an Extensible Markup Language (XML) parsing interface, dom4j, can be called to parse the task configuration content and automatically generate the corresponding XML file in a form that Oozie can recognize. dom4j is an existing open-source Java XML API and is not described in detail here.
S203: taking the Extensible Markup Language file as the configuration file corresponding to the data integration task to be created.
In the method provided by the embodiment of the invention, the automatically generated XML file can be used as the configuration file corresponding to the data integration task which needs to be created by the user.
Based on the method provided by the embodiment of the invention, the user can create the elements of the corresponding workflow by drag and drop in the visual front-end interface, and the system automatically generates the corresponding XML file from these elements. The user does not need to write and configure the XML file manually, which reduces workload and improves working efficiency.
On the basis of the method shown in fig. 1, in the method provided in the embodiment of the present invention, step S102 of judging whether each data integration task meets the preset discontinuous scheduling condition includes:
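A minimal sketch of the XML generation in S202 and S203 follows. It assumes a hypothetical `ActionNode` model for the front-end task nodes and chains the nodes in the order the user connected them. To stay dependency-free it uses the JDK's built-in DOM and `Transformer` APIs in place of dom4j, and it emits every node as an Oozie `java` action; real task nodes could use other action types, and coordinators and bundles are not covered.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical front-end model of one drag-and-drop task node that runs a Java main class. */
record ActionNode(String name, String mainClass) {}

class WorkflowXmlGenerator {
    private static final String NS = "uri:oozie:workflow:0.5";

    /** Chains nodes in the order the user connected them: start, n1, n2, ..., end. */
    static String generate(String appName, List<ActionNode> nodes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element app = child(doc, doc, "workflow-app");
            app.setAttribute("name", appName);
            child(doc, app, "start").setAttribute("to", nodes.isEmpty() ? "end" : nodes.get(0).name());
            for (int i = 0; i < nodes.size(); i++) {
                Element action = child(doc, app, "action");
                action.setAttribute("name", nodes.get(i).name());
                Element javaAction = child(doc, action, "java");
                child(doc, javaAction, "job-tracker").setTextContent("${jobTracker}");
                child(doc, javaAction, "name-node").setTextContent("${nameNode}");
                child(doc, javaAction, "main-class").setTextContent(nodes.get(i).mainClass());
                // On success go to the next node the user connected, or to "end" after the last one.
                String next = i + 1 < nodes.size() ? nodes.get(i + 1).name() : "end";
                child(doc, action, "ok").setAttribute("to", next);
                child(doc, action, "error").setAttribute("to", "fail");
            }
            Element kill = child(doc, app, "kill");
            kill.setAttribute("name", "fail");
            child(doc, kill, "message").setTextContent("Task failed");
            child(doc, app, "end").setAttribute("name", "end");

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build workflow XML", e);
        }
    }

    /** Creates an element in the Oozie workflow namespace and appends it to parent. */
    private static Element child(Document doc, Node parent, String tag) {
        Element e = doc.createElementNS(NS, tag);
        parent.appendChild(e);
        return e;
    }
}
```

The returned text would then serve as the task's configuration file (S203) and, in the bank scenario described later, be uploaded to HDFS.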
determining task information corresponding to each data integration task;
acquiring a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and for each data integration task, if the scheduling identifier corresponding to the data integration task is the scheduling identifier of the discontinuous scheduling, determining that the data integration task meets the discontinuous scheduling condition.
In the method provided by the embodiment of the invention, when creating the task configuration content of a data integration task, the user can set its discontinuous scheduling requirement, that is, specify whether or not the task needs to be scheduled discontinuously. This setting is stored in the task information of the data integration task.
In the method provided by the embodiment of the invention, the scheduling identifier of the data integration task, which indicates whether the task needs discontinuous scheduling and reflects the user's setting, is read from the task information. If the scheduling identifier is the discontinuous scheduling identifier, that is, the task needs discontinuous scheduling, the data integration task is determined to meet the discontinuous scheduling condition. If it is the continuous scheduling identifier, that is, the task does not need discontinuous scheduling, the data integration task is determined not to meet the preset discontinuous scheduling condition.
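The identifier check in S102 can be sketched as a filter over the queue. The `SchedulingFlag` enum and `TaskInfo` record below are hypothetical stand-ins for the stored task information, and treating a task with no stored information as continuously scheduled is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical values of the scheduling identifier stored in a task's information. */
enum SchedulingFlag { CONTINUOUS, DISCONTINUOUS }

/** Hypothetical task-information record; only the field needed here is modelled. */
record TaskInfo(String taskId, SchedulingFlag schedulingFlag) {}

class DiscontinuousFilter {
    /** Returns the "first data integration tasks": those flagged for discontinuous scheduling. */
    static List<String> firstTasks(List<String> queue, Map<String, TaskInfo> taskInfoById) {
        return queue.stream()
                .filter(id -> {
                    TaskInfo info = taskInfoById.get(id);
                    // Assumption: missing task information means continuous scheduling.
                    return info != null && info.schedulingFlag() == SchedulingFlag.DISCONTINUOUS;
                })
                .collect(Collectors.toList());
    }
}
```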
Further, on the basis of the method shown in fig. 1, in the method provided in the embodiment of the present invention, the step S103 of determining whether the first data integration task meets the preset execution condition according to the break time file includes:
determining the date constraint condition corresponding to the break time file;
In the method provided by the embodiment of the present invention, when the user sets a task to require discontinuous scheduling, the user also specifies the times at which the task is to be interrupted, that is, the times to be skipped, during which the data integration task is not executed. Examples include weekend breaks, holiday breaks, or breaks on particular dates. This break-time information is recorded in the break time file. By reading the content of the break time file, the date constraint condition, that is, the conditions satisfied by the break times the user has set, can be determined. A constraint may be a specific date, a day of the week such as Monday, Tuesday or Wednesday, or a holiday; in other words, it can cover weekends plus a number of specified holiday dates.
Determining a current date;
in the method provided by the embodiment of the invention, the date of the current day, namely the current date, can be determined by reading the system time.
Judging whether the current date matches the date constraint condition, and if it does, determining that the first data integration task does not meet the execution condition;
In the method provided by the embodiment of the present invention, the current date is matched against the date constraint condition, that is, it is judged whether the current date falls within the break times set by the user. For example, if the constraint is a specific date, it is judged whether the current date is that date; if the constraint is a day-of-week expression such as Monday, it is judged whether the current date is a Monday. If the current date matches the date constraint condition, it is a break time set by the user, so the current data integration task is determined not to meet the execution condition and is not executed.
And if the current date does not match the date constraint condition, determining that the first data integration task meets the execution condition.
In the method provided by the embodiment of the invention, if the current date does not match the date constraint condition, the current date is not a break time and the data integration task should be executed, so the task is determined to meet the execution condition.
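A sketch of the S103 check follows. The source does not specify the format of the break time file, so this example assumes a hypothetical plain-text format in which each line is either an ISO date (`2022-10-01`) or an English weekday name (`SATURDAY`), with holidays listed as explicit dates.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Date constraints read from a break time file (hypothetical one-entry-per-line format). */
class BreakTimeConstraint {
    private final Set<LocalDate> dates = new HashSet<>();
    private final Set<DayOfWeek> weekdays = EnumSet.noneOf(DayOfWeek.class);

    static BreakTimeConstraint parse(List<String> lines) {
        BreakTimeConstraint c = new BreakTimeConstraint();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            try {
                c.dates.add(LocalDate.parse(line)); // specific date, e.g. a holiday
            } catch (DateTimeParseException notADate) {
                // Weekday name; throws IllegalArgumentException if the line is neither.
                c.weekdays.add(DayOfWeek.valueOf(line.toUpperCase(Locale.ROOT)));
            }
        }
        return c;
    }

    /** True if the date matches the constraint, i.e. the task must be skipped on that date. */
    boolean matches(LocalDate date) {
        return dates.contains(date) || weekdays.contains(date.getDayOfWeek());
    }

    /** Execution condition: the task may run only when the current date does not match. */
    boolean meetsExecutionCondition(LocalDate today) {
        return !matches(today);
    }
}
```

In practice the lines would come from the break time file (for example via `Files.readAllLines`) and the date from `LocalDate.now()`, mirroring the system-time lookup described above.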
Based on the method shown in fig. 1, in the method provided in the embodiment of the present invention, the step S106 of determining the execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task includes:
and determining the execution order of the target data integration tasks by ranking them from first to last in descending order of their priorities.
In the method provided by the embodiment of the present invention, the execution order follows the priorities of the target data integration tasks from high to low: the higher a task's priority, the earlier it is placed in the execution order, so higher-priority data integration tasks are executed first.
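Step S106 amounts to a sort in descending order of priority. In the sketch below, `TargetTask` is a hypothetical record holding the priority already assigned by the preset priority policy; the source does not say how equal priorities are ordered, so keeping the original queue order for ties is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical target task carrying the priority assigned by the preset priority policy. */
record TargetTask(String taskId, int priority) {}

class PriorityOrdering {
    /**
     * Returns the execution order, highest priority first. List.sort is guaranteed to be
     * stable, so tasks with equal priority keep their original queue order.
     */
    static List<TargetTask> executionOrder(List<TargetTask> targets) {
        List<TargetTask> order = new ArrayList<>(targets);
        order.sort(Comparator.comparingInt(TargetTask::priority).reversed());
        return order;
    }
}
```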
To better illustrate the method provided by the embodiment of the present invention, another task scheduling method is provided in combination with an actual application scenario: scheduling the ETL (extract, transform, load) tasks of a bank data warehouse.
The method provided by the embodiment of the invention can be applied to a scheduling system for the ETL tasks of a bank data warehouse; this system is an instantiation of the method shown in fig. 1 and mainly comprises the following modules:
A workflow module: provides a drag-and-drop interface for the user to create workflows and to start, stop and otherwise operate tasks through a graphical interface. A task can be triggered by time or by the arrival of a file on a monitored HDFS path, and an integrated task-status monitoring page is also provided.
A holiday scheduling module: decides whether to create a new task instance by determining whether the current day is a workday.
An XML file generation module: automatically generates an XML file from the workflow the user builds by drag and drop, and uploads it to the corresponding HDFS path.
A priority scheduling module: uses a designed algorithm to plan the execution order according to waiting time and priority.
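The priority scheduling module is described as weighing both waiting time and priority, but its algorithm is not disclosed. The sketch below substitutes a common technique, priority aging, under an assumed formula: effective priority = static priority + minutes waited / `agingStepMinutes`. The formula and all names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical pending task: static priority plus the time it entered the queue. */
record PendingTask(String taskId, int priority, Instant enqueuedAt) {}

/** Priority with aging: a task gains one priority level per agingStepMinutes of waiting. */
class AgingScheduler {
    private final long agingStepMinutes;

    AgingScheduler(long agingStepMinutes) {
        if (agingStepMinutes <= 0) throw new IllegalArgumentException("agingStepMinutes must be positive");
        this.agingStepMinutes = agingStepMinutes;
    }

    long effectivePriority(PendingTask t, Instant now) {
        long waited = Duration.between(t.enqueuedAt(), now).toMinutes();
        return t.priority() + Math.max(0, waited) / agingStepMinutes;
    }

    /** Execution order at instant now: highest effective priority first, longest wait breaks ties. */
    List<PendingTask> order(List<PendingTask> pending, Instant now) {
        List<PendingTask> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator
                .comparingLong((PendingTask t) -> effectivePriority(t, now)).reversed()
                .thenComparing(PendingTask::enqueuedAt));
        return sorted;
    }
}
```

Aging of this kind keeps a steady stream of high-priority tasks from starving low-priority ones; the step size trades responsiveness to priority against fairness to waiting time.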
The task scheduling process provided by the embodiment of the invention mainly comprises the following steps:
The user builds a workflow with the workflow module and sets the task priority, the task triggering mode, whether holiday scheduling applies, and so on.
The XML file generation module converts the workflow into an XML file that Oozie can recognize and uploads it to HDFS.
If holiday scheduling applies, an Excel file is uploaded through the holiday scheduling module, which automatically adds a Java task node whose Boolean output is passed to a Decision control node to implement holiday scheduling. In the scheduling process, the Java task node can be created in advance to make the holiday judgment and pass the result as an output parameter to the Decision control node, which then uses this parameter as its criterion. The Java node's judgment logic treats Saturday and Sunday as holidays by default (result false) and Monday to Friday as workdays (result true), but it also reads the uploaded Excel file and updates the result for the special dates listed in it.
The tasks in the workflow are started with the workflow module.
The priority scheduling module computes the execution order of the tasks in all started workflows with its algorithm and dispatches them to the Hadoop environment for execution.
In the method provided by the embodiment of the invention, a graphical operation page and an integrated task monitoring page are developed; a holiday scheduling judgment is added so that no new task instance is created on a holiday; the XML files needed for Oozie scheduling are generated automatically; and a priority queue with a designed algorithm plans the execution order according to waiting time and priority.
The embodiment of the invention provides an Oozie-based scheduling system for the ETL tasks of a bank data warehouse, in which workflows can be built by drag and drop and which adds holiday scheduling and a priority queue. The system can help banks plan and execute ETL tasks efficiently and build data warehouses on them, reducing the burden on data warehouse developers and laying a foundation for the bank to provide services faster and better.
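The holiday-judgment Java node from the steps above might look roughly like the sketch below. Weekends default to holidays (false) and Monday to Friday to workdays (true), with special dates overriding the default. Parsing the uploaded Excel file (for example with Apache POI) is omitted, so the overrides are assumed to arrive as a map. `publish` relies on Oozie's `capture-output` mechanism, in which a Java action writes a properties file to the path given by the `oozie.action.output.properties` system property; the key name `isWorkday` is a hypothetical choice.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.Map;
import java.util.Properties;

class HolidayCheck {
    /** Weekends are holidays by default; an entry in specialDates (from the Excel file) wins. */
    static boolean isWorkday(LocalDate date, Map<LocalDate, Boolean> specialDates) {
        Boolean override = specialDates.get(date);
        if (override != null) return override;
        DayOfWeek d = date.getDayOfWeek();
        return d != DayOfWeek.SATURDAY && d != DayOfWeek.SUNDAY;
    }

    /** Writes the result where Oozie's capture-output expects it (hypothetical key "isWorkday"). */
    static void publish(boolean workday) throws IOException {
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null) { // not running inside an Oozie Java action: just print the result
            System.out.println("isWorkday=" + workday);
            return;
        }
        Properties props = new Properties();
        props.setProperty("isWorkday", Boolean.toString(workday));
        try (OutputStream out = new FileOutputStream(path)) {
            props.store(out, "holiday scheduling judgment");
        }
    }
}
```

A Decision node could then branch on an expression such as `${wf:actionData('holiday-check')['isWorkday'] eq 'true'}`, where `holiday-check` is the hypothetical name of the Java action node.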
Corresponding to the task scheduling method shown in fig. 1, an embodiment of the present invention further provides a task scheduling apparatus, which is used for specifically implementing the method shown in fig. 1, and a schematic structural diagram of the task scheduling apparatus is shown in fig. 3, and includes:
an obtaining unit 301, configured to obtain a task queue to be scheduled, where the task queue to be scheduled includes multiple data integration tasks, and the multiple data integration tasks correspond to multiple pre-generated configuration files one to one;
a first judging unit 302, configured to judge whether each of the data integration tasks meets a preset discontinuous scheduling condition, and use each data integration task meeting the discontinuous scheduling condition as a first data integration task;
a second judging unit 303, configured to determine, for each first data integration task, the break time file corresponding to the first data integration task, judge according to the break time file whether the first data integration task meets a preset execution condition, and, if it does not, determine the first data integration task as a second data integration task;
an updating unit 304, configured to remove each second data integration task from the task queue to be scheduled, obtain an updated task queue, and determine each data integration task in the updated task queue as a target data integration task;
a first determining unit 305, configured to determine, according to a preset priority policy, a priority corresponding to each target data integration task;
a second determining unit 306, configured to determine, according to the priority corresponding to each target data integration task, an execution order corresponding to each target data integration task, so as to schedule each target data integration task.
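Taken together, units 301 to 306 form a short pipeline. The sketch below is a compact illustration under stated assumptions: `QueuedTask` is a hypothetical record whose `breakDates` predicate stands for the parsed break time file (true on dates to skip), and the preset priority policy is passed in as a function because the source leaves it open.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

/** Hypothetical queued task: discontinuous flag plus a predicate built from its break time file. */
record QueuedTask(String taskId, boolean discontinuous, Predicate<LocalDate> breakDates) {}

class TaskSchedulingPipeline {
    /** Mirrors units 301 to 306: filter, remove skipped tasks, then order by priority. */
    static List<QueuedTask> schedule(List<QueuedTask> queue, LocalDate today,
                                     ToIntFunction<QueuedTask> priorityPolicy) {
        List<QueuedTask> updated = new ArrayList<>(queue); // 301: queue to be scheduled
        // 302 + 303: a "second" task is discontinuous and today matches its break time file.
        // 304: removing the second tasks leaves the target tasks.
        updated.removeIf(t -> t.discontinuous() && t.breakDates().test(today));
        // 305 + 306: rank by the preset priority policy, highest first (stable for ties).
        updated.sort(Comparator.comparingInt(priorityPolicy).reversed());
        return updated;
    }
}
```

Continuously scheduled tasks bypass the break time check entirely, so a single task record can carry its own skip dates instead of being split into several tasks.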
With the apparatus provided by the embodiment of the invention, when the data integration tasks in the task queue are to be scheduled, it is judged whether each meets the preset discontinuous scheduling condition. If a task does, its break time file is determined and used to judge whether the task meets the preset execution condition; if it does not, the task is removed from the task queue. The priority of each task remaining in the queue is then determined by the preset priority policy, and the execution order of the tasks is determined from these priorities to carry out task scheduling. The break time file can be configured according to actual requirements, for example to specify which dates or time points a data integration task should skip within an otherwise continuous execution period. During scheduling, each time a discontinuously scheduled data integration task comes up for execution, the break time file is used to judge whether the task currently meets the execution condition; if not, the task is removed from the task queue and is not executed. A single data integration task can therefore be scheduled continuously with time breaks, without creating several data integration tasks, which reduces workload and saves system resources.
In addition, determining task priorities during scheduling and ordering execution by priority helps meet the different response requirements of the data integration tasks and keeps data integration work running smoothly.
On the basis of the apparatus shown in fig. 3, an embodiment of the present invention provides another task scheduling apparatus, a schematic structural diagram of which is shown in fig. 4; the apparatus further includes:
a third determining unit 307, configured to determine task configuration content corresponding to a data integration task to be created when a task creation instruction sent by a user through a front end is received, where the task configuration content includes a task node, a workflow, a coordinator, and a batch scheduling workflow created by the user through the front end;
a generating unit 308, configured to invoke a preset Extensible Markup Language (XML) parsing interface, parse the task configuration content, generate an XML file corresponding to the task configuration content, and take the XML file as the configuration file corresponding to the data integration task to be created.
On the basis of the apparatus provided in the foregoing embodiment, in the apparatus provided in the embodiment of the present invention, the first judging unit 302 includes:
the first determining subunit is used for determining task information corresponding to each data integration task;
the acquiring subunit is configured to acquire a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and the second determining subunit is configured to determine, for each data integration task, that the data integration task meets the discontinuous scheduling condition if the scheduling identifier corresponding to the data integration task is the scheduling identifier of discontinuous scheduling.
On the basis of the apparatus provided in the foregoing embodiment, in the apparatus provided in the embodiment of the present invention, the second judging unit 303 includes:
a third determining subunit, configured to determine the date constraint condition corresponding to the break time file and to determine the current date;
a judging subunit, configured to judge whether the current date matches the date constraint condition, and if the current date matches the date constraint condition, determine that the first data integration task does not meet the execution condition;
and the fourth determining subunit is configured to determine that the first data integration task meets the execution condition if the current date does not match the date constraint condition.
On the basis of the apparatus provided in the foregoing embodiment, in the apparatus provided in the embodiment of the present invention, the second determining unit 306 includes:
a fifth determining subunit, configured to determine the execution order of the target data integration tasks by ranking them from first to last in descending order of their priorities.
The embodiment of the present invention further provides a storage medium comprising stored instructions; when the instructions are executed, the device on which the storage medium is located is controlled to execute the task scheduling method.
An embodiment of the present invention provides an electronic device, whose structure is shown schematically in fig. 5. It comprises a memory 401 and one or more instructions 402, where the one or more instructions 402 are stored in the memory 401 and configured to be executed by one or more processors 403 to perform the following operations:
obtaining a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
judging whether each data integration task meets a preset discontinuous scheduling condition or not, and taking the data integration task meeting the discontinuous scheduling condition as a first data integration task;
for each first data integration task, determining a break time file corresponding to the first data integration task, judging whether the first data integration task meets a preset execution condition according to the break time file, and if the first data integration task does not meet the execution condition, determining the first data integration task as a second data integration task;
removing each second data integration task from the task queue to be scheduled to obtain an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
determining the priority corresponding to each target data integration task according to a preset priority strategy;
and determining an execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task so as to schedule each target data integration task.
It should be noted that the task scheduling method and apparatus, the storage medium, and the electronic device provided by the present invention may be used in the financial field or other fields, for example, in a bank data warehouse application scenario in the financial field. The other fields are arbitrary fields other than the financial field, for example, the field of communication services. The foregoing is merely an example, and does not limit the application fields of the task scheduling method and apparatus, the storage medium, and the electronic device provided by the present invention.
The embodiments in this specification are described progressively; for identical or similar parts the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system and apparatus embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant points, refer to the description of the method embodiments. The system and apparatus embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for task scheduling, comprising:
obtaining a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
judging whether each data integration task meets a preset discontinuous scheduling condition, and taking each data integration task that meets the discontinuous scheduling condition as a first data integration task;
for each first data integration task, determining a break time file corresponding to the first data integration task, judging whether the first data integration task meets a preset execution condition according to the break time file, and if the first data integration task does not meet the execution condition, determining the first data integration task as a second data integration task;
removing each second data integration task from the task queue to be scheduled to obtain an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
determining the priority corresponding to each target data integration task according to a preset priority strategy;
and determining an execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task so as to schedule each target data integration task.
2. The method of claim 1, wherein the generating of the configuration file comprises:
when a task creating instruction sent by a user through a front end is received, determining task configuration content corresponding to a data integration task to be created, wherein the task configuration content comprises task nodes, workflows, coordinators and batch scheduling workflows created by the user through the front end;
calling a preset extensible markup language parsing interface, parsing the task configuration content, and generating an extensible markup language file corresponding to the task configuration content;
and taking the extensible markup language file as a configuration file corresponding to the data integration task to be created.
3. The method according to claim 1, wherein said determining whether each of the data integration tasks meets a preset discontinuous scheduling condition comprises:
determining task information corresponding to each data integration task;
acquiring a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and for each data integration task, if the scheduling identifier corresponding to the data integration task is the scheduling identifier of the discontinuous scheduling, determining that the data integration task meets the discontinuous scheduling condition.
4. The method of claim 1, wherein the judging whether the first data integration task meets a preset execution condition according to the break time file comprises:
determining a date constraint condition corresponding to the break time file;
determining a current date;
judging whether the current date matches the date constraint condition, and if the current date matches the date constraint condition, determining that the first data integration task does not meet the execution condition;
and if the current date does not match the date constraint condition, determining that the first data integration task meets the execution condition.
5. The method according to claim 1, wherein the determining an execution order corresponding to each of the target data integration tasks according to the priority corresponding to each of the target data integration tasks comprises:
and determining the execution order of the target data integration tasks by ranking them from first to last in descending order of their priorities.
6. A task scheduling apparatus, comprising:
an acquisition unit, configured to acquire a task queue to be scheduled, wherein the task queue to be scheduled comprises a plurality of data integration tasks, and the data integration tasks are in one-to-one correspondence with a plurality of pre-generated configuration files;
the first judging unit is used for judging whether each data integration task meets a preset discontinuous scheduling condition or not and taking the data integration task meeting the discontinuous scheduling condition as a first data integration task;
the second judging unit is used for determining a break time file corresponding to each first data integration task, judging whether the first data integration task meets a preset execution condition or not according to the break time file, and determining the first data integration task as a second data integration task if the first data integration task does not meet the execution condition;
the updating unit is used for removing each second data integration task from the task queue to be scheduled, obtaining an updated task queue, and determining each data integration task in the updated task queue as a target data integration task;
the first determining unit is used for determining the priority corresponding to each target data integration task according to a preset priority strategy;
and the second determining unit is used for determining the execution sequence corresponding to each target data integration task according to the priority corresponding to each target data integration task so as to schedule each target data integration task.
7. The apparatus of claim 6, further comprising:
a third determining unit, configured to determine task configuration content corresponding to a data integration task to be created when a task creation instruction sent by a user through a front end is received, where the task configuration content includes task nodes, workflows, a coordinator, and a batch scheduling workflow created by the user through the front end;
the generating unit is used for calling a preset extensible markup language parsing interface, parsing the task configuration content and generating an extensible markup language file corresponding to the task configuration content; and taking the extensible markup language file as a configuration file corresponding to the data integration task to be created.
8. The apparatus according to claim 6, wherein the first judging unit comprises:
the first determining subunit is used for determining task information corresponding to each data integration task;
the acquiring subunit is configured to acquire a scheduling identifier corresponding to each data integration task from task information corresponding to each data integration task;
and the second determining subunit is configured to determine, for each data integration task, that the data integration task meets the discontinuous scheduling condition if the scheduling identifier corresponding to the data integration task is the scheduling identifier of discontinuous scheduling.
9. A storage medium comprising stored instructions, wherein, when the instructions are executed, a device on which the storage medium is located is controlled to execute the task scheduling method according to any one of claims 1 to 5.
10. An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the task scheduling method of any one of claims 1-5.
CN202210588291.5A 2022-05-27 2022-05-27 Task scheduling method and device, storage medium and electronic equipment Pending CN114924858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210588291.5A CN114924858A (en) 2022-05-27 2022-05-27 Task scheduling method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210588291.5A CN114924858A (en) 2022-05-27 2022-05-27 Task scheduling method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114924858A (en) 2022-08-19

Family

ID=82810144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210588291.5A Pending CN114924858A (en) 2022-05-27 2022-05-27 Task scheduling method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114924858A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700913A (en) * 2022-09-13 2023-09-05 Honor Device Co., Ltd. Scheduling method, equipment and storage medium of embedded file system
WO2024055708A1 (en) * 2022-09-13 2024-03-21 Shanghai Cambricon Information Technology Co., Ltd. Task scheduling method and apparatus, and device and medium

Similar Documents

Publication Publication Date Title
Wang et al. Dynamic multiple-period reconfiguration of real-time scheduling based on timed DES supervisory control
CN106126332A (en) Distributed timing task scheduling system and method
Alt et al. A grid workflow language using high-level petri nets
CN114924858A (en) Task scheduling method and device, storage medium and electronic equipment
CN112148455B (en) Task processing method, device and medium
CN108984284A (en) DAG method for scheduling task and device based on off-line calculation platform
CN103679392B (en) Task scheduling processing method and system
Zhang et al. Simulation-based optimization for dynamic resource allocation
CN103065221A (en) Multidisciplinary collaborative optimization flow modeling and scheduling method and system based on business process execution language (BPEL)
JP2002189841A (en) Workflow management method and system, and recording medium storing its processing program
CN103825964A (en) SLS (Service Level Specification) scheduling device and SLS scheduling method based on cloud computing PaaS (platform-as-a-service) platform
Imai et al. Accurate resource prediction for hybrid IaaS clouds using workload-tailored elastic compute units
CN104536819A (en) Task scheduling method based on WEB service
CN111861235A (en) Task flow arrangement method and device and electronic equipment
CN112181621A (en) Task scheduling system, method, equipment and storage medium
CN110569113A (en) Method and system for scheduling distributed tasks and computer readable storage medium
CN102420709A (en) Method and equipment for managing scheduling task based on task frame
CN111897799A (en) Hydrological model service system based on process engine
Kolovos et al. Crossflow: a framework for distributed mining of software repositories
Frantz et al. An efficient orchestration engine for the cloud
JP2009230581A (en) Batch job control system, management node, and batch job control method
CN112559156A (en) Multi-dependency task grouping management method, device, equipment and storage medium
CN116841758A (en) Workflow task processing method, device, computer equipment and storage medium
CN113434268A (en) Workflow distributed scheduling management system and method
CN111338775B (en) Method and equipment for executing timing task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination