CN112801546A - Task scheduling method, device and storage medium

Info

Publication number: CN112801546A
Application number: CN202110290109.3A
Authority: CN
Inventors: 吴成杰; 沈梦婷; 彭金胜; 孙丽娜
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2021-05-14

Abstract

The embodiments of this specification provide a task scheduling method, a task scheduling device and a storage medium, which can be applied in the technical field of big data processing. The method comprises: acquiring a plurality of custom tasks having dependency relationships, where a custom task is a task deployed by a user and includes script content for executing the task; parsing the script content to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; processing the upstream tasks on which each custom task depends so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks; and, if the processing succeeds, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend, thereby improving task scheduling efficiency.

Description

Task scheduling method, device and storage medium

Technical Field

The embodiment of the specification relates to the technical field of big data processing, in particular to a task scheduling method and device and a storage medium.

Background

With the rapid development of society, more and more data are available, and the data utilization is more and more difficult. The big data platform processes the heterogeneous scattered data through an ETL scheduling system on the basis of a data warehouse, a data lake or other data sources to form useful valuable data or knowledge, and provides support for analysis and decision making of users or operation management personnel.

To help users or operation management personnel use the big data platform for rapid analysis and decision making, the big data platform supports analysts in deploying custom tasks, and a task parsing module automatically determines the upstream tasks on which they depend (the original processing tasks of the data warehouse, data lake or other data sources). The ETL scheduling system triggers a custom task deployed by an analyst immediately after the corresponding batch of the corresponding upstream task has been processed, enabling rapid analysis and decision making. The big data platform also lets analysts configure execution-order dependencies among custom tasks, thereby supporting complex processing flows.

The ETL scheduling can be divided into simple timing scheduling and workflow scheduling according to the functional complexity:

1. Timing scheduling runs ETL tasks repeatedly at a fixed time in each period, for example summarizing the previous month's data at 3 a.m. on the first day of each month. Timing scheduling cannot handle situations where there are dependencies between multiple ETL tasks.

2. Workflow scheduling establishes dependency relationships among ETL tasks and executes the ETL tasks one by one, strictly according to those dependency relationships. A complex data processing flow can therefore be divided into a plurality of ETL tasks that are completed step by step. Workflow scheduling is better suited to large data volumes and complex processing logic.
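The trigger check that workflow scheduling performs can be sketched as follows. This is a minimal illustration rather than the scheduler described in this specification; the function names and the task names in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def ready_tasks(deps: Dict[str, Set[str]], done: Set[str]) -> Set[str]:
    """Return the tasks that have not run yet and whose dependencies are all done."""
    return {task for task, needed in deps.items()
            if task not in done and needed <= done}


def run_workflow(deps: Dict[str, Set[str]]) -> List[str]:
    """Execute tasks strictly in dependency order and return the execution order."""
    done: Set[str] = set()
    order: List[str] = []
    while len(done) < len(deps):
        ready = ready_tasks(deps, done)  # one check item per declared dependency
        if not ready:
            raise ValueError("no runnable task: dependencies cannot be satisfied")
        for task in sorted(ready):       # sorted only to make the order deterministic
            order.append(task)
            done.add(task)
    return order
```

For a hypothetical chain `{"extract": set(), "transform": {"extract"}, "load": {"transform"}}`, `run_workflow` returns the tasks in the order extract, transform, load. Every declared dependency is inspected on each readiness check, which is why the number of check items grows with the number of dependencies.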

Timing scheduling cannot meet the requirements of complex processing flows, so current ETL scheduling systems generally adopt workflow scheduling. However, workflow scheduling must check whether the tasks depended on have finished running before it triggers a task for scheduled execution. With the popularization of big data platforms, the custom tasks deployed by analysts are becoming more and more complex: they depend on more and more upstream tasks, and the dependency relationships among the custom tasks themselves are increasingly intricate. Workflow scheduling therefore has to check more and more task dependencies. This challenges the processing performance of the ETL scheduling system and the timeliness of scheduling triggers, gradually causing problems such as high load on the ETL scheduling system and delayed scheduling triggers, which hinders analysts from using the big data platform for rapid analysis and decision making.

Disclosure of Invention

An object of the embodiments of the present specification is to provide a task scheduling method, a task scheduling device, and a storage medium, so as to solve the problems of high load, untimely scheduling trigger, and the like of an ETL scheduling system in the prior art, and improve task scheduling efficiency.

To solve the above problem, an embodiment of the present specification provides a task scheduling method, where the method includes: acquiring a plurality of custom tasks having dependency relationships, where a custom task is a task deployed by a user and includes script content for executing the task; parsing the script content to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; processing the upstream tasks on which each custom task depends so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, where the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and, if the processing succeeds, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

In order to solve the above problem, an embodiment of the present specification further provides a task scheduling apparatus, where the apparatus includes: an acquisition module, configured to acquire a plurality of custom tasks having dependency relationships, where a custom task is a task deployed by a user and includes script content for executing the task; a parsing module, configured to parse the script content to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; a processing module, configured to process the upstream tasks on which each custom task depends so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, where the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and a scheduling module, configured to schedule, if the processing succeeds, the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

In order to solve the above problem, an embodiment of the present specification further provides an electronic device, including: a memory for storing a computer program; and a processor for executing the computer program to implement: acquiring a plurality of custom tasks having dependency relationships, where a custom task is a task deployed by a user and includes script content for executing the task; parsing the script content to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; processing the upstream tasks on which each custom task depends so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, where the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and, if the processing succeeds, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

To solve the above problem, embodiments of the present specification further provide a computer-readable storage medium having computer instructions stored thereon which, when executed, implement: acquiring a plurality of custom tasks having dependency relationships, where a custom task is a task deployed by a user and includes script content for executing the task; parsing the script content to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; processing the upstream tasks on which each custom task depends so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, where the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and, if the processing succeeds, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

According to the technical solution provided by the embodiments of the present specification, a plurality of custom tasks having dependency relationships can be acquired, where a custom task is a task deployed by a user and includes script content for executing the task; the script content is parsed to obtain the upstream tasks on which each custom task depends, where an upstream task is a system task; the upstream tasks on which each custom task depends are processed so as to remove the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, where the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and, if the processing succeeds, the plurality of custom tasks are scheduled according to the upstream tasks on which the processed custom tasks depend. The method provided by the embodiments of the present specification removes the repeated upstream task dependencies of the custom tasks, so that the upstream task dependencies of the custom tasks become simpler and clearer. The ETL scheduling system can then simplify the evaluation of trigger conditions during scheduling, which reduces the load on the ETL scheduling system, reduces the storage space the scheduling system requires, and improves task scheduling efficiency.

Drawings

In order to illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some of the embodiments described in the specification, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a system architecture diagram of a big data platform for rapid analysis and decision-making by a user in one example scenario of the present specification;

FIG. 2 is a schematic diagram of an ETL scheduling system according to an exemplary scenario of the present disclosure;

FIG. 3 is a flowchart illustrating a task scheduling method according to an embodiment of the present disclosure;

FIG. 4 is a flowchart illustrating the operation of a task parsing module according to an exemplary scenario of the present disclosure;

FIG. 5 is a flowchart illustrating the operation of a processing module in an exemplary scenario of the present specification;

FIG. 6 is a schematic diagram illustrating the effect of the processing according to the embodiment of the present disclosure;

FIG. 7 is a schematic flow chart illustrating a refinement of the processing performed by embodiments of the present disclosure;

FIG. 8 is a functional structure diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 9 is a functional structure diagram of a task scheduling device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort shall fall within the protection scope of the present specification.

In some complex scenarios, data processing is often divided into several task steps that together complete a data processing flow. A strong dependency relationship often exists among the task units: a downstream task can be executed only after its upstream task has been executed successfully. For example, the upstream task produces result A when it finishes, and the downstream task needs result A to produce result B, so the downstream task must start only after the upstream task has run successfully and produced its result. To ensure the accuracy of the data processing results, the tasks must be executed in an orderly and efficient manner according to the upstream and downstream dependency relationships. The ETL (Extract-Transform-Load) scheduling system is the key system that organizes the dependency relationships between tasks and ensures that the tasks are executed in order.

Current ETL scheduling systems generally adopt workflow scheduling. However, workflow scheduling must check whether the tasks depended on have finished running before it triggers a task for scheduled execution. With the popularization of big data platforms, the custom tasks deployed by analysts are becoming more and more complex: they depend on more and more upstream tasks, and the dependency relationships among the custom tasks themselves are increasingly intricate. Workflow scheduling therefore has to check more and more task dependencies. This challenges the processing performance of the ETL scheduling system and the timeliness of scheduling triggers, gradually causing problems such as high load on the ETL scheduling system and delayed scheduling triggers, which hinders analysts from using the big data platform for rapid analysis and decision making.

If the dependency relationships of a plurality of custom tasks are considered together and the repeated dependency relationships are removed, the upstream task dependencies of the custom tasks become simpler and clearer. This is expected to solve the problems of high load and delayed scheduling triggers of ETL scheduling systems in the prior art, and to improve task scheduling efficiency. Based on this, the embodiments of the present specification provide a task scheduling method, device and storage medium.

Referring to FIG. 1, a scenario example of the present specification is presented. FIG. 1 is a system structure diagram of a big data platform on which a user performs rapid analysis and decision making in this scenario example. The platform includes: the ETL scheduling system 1, the data warehouse 2, the data mart 3 and the instant BI system 4.

In this scenario example, the ETL scheduling system 1 is the execution subject of the task scheduling method. It supports custom tasks in extracting, converting and loading data from a source end to a destination end, so as to provide the processed data for users to analyze and make decisions.

In this scenario example, the data warehouse 2 stores the subject-oriented, integrated, time-variant and non-volatile data sets used in enterprise management and decision making. It is the source end from which the ETL scheduling system extracts, converts and loads data. In addition to a data warehouse, the ETL scheduling system may also use a data lake or other data sources as the source end for extracting, converting and loading data.

In this scenario example, the data mart 3 stores data cubes oriented to decision analysis needs, so as to meet the needs of a particular department or user. It is the destination end to which the ETL scheduling system extracts, converts and loads data; it stores the processed data for users to analyze and make decisions. Besides a data mart, the ETL scheduling system may also use other storage systems as the destination end for extracting, converting and loading data.

In this scenario example, the instant BI system 4 is the entry point through which users use the big data platform. Its functions include providing an operation interface for deploying custom tasks, viewing the running state of custom tasks, querying the processed data, and displaying, analyzing and processing the data using charts. In addition to the instant BI system, other systems or devices may be used as entry points to the ETL scheduling system.

As shown in FIG. 1, in this scenario example, the steps for the user to perform a quick analysis and decision are as follows.

Step 1: the user deploys the custom task through an interface provided by the instant BI system 4, and the instant BI system 4 transmits the deployed custom task and the dependency relationship between the custom tasks to the ETL scheduling system 1.

A custom task is a data analysis and processing task deployed by a user. It includes processes such as extraction, conversion and loading, and is intended to perform data analysis or data statistics (such tasks are often temporary or are changed flexibly by the user as required). The term "custom task" distinguishes these tasks from system tasks, which are deployed through releases by developers or technicians. Custom tasks mainly perform statistical analysis, including data statistics for daily, weekly and monthly reports, and data analysis for data quality and anomaly detection. A custom task mainly takes the form of an SQL script. A simple custom task (denoted custom task A) is given here:

INSERT INTO usr.order_price_day_stat
SELECT '${process_date}', cast(SUM(order_price) as decimal(30,2))
FROM org.emall_b2c_order_info
WHERE to_date(order_time) = '${process_date}';

Custom task A contains a single SQL statement that computes the total order amount for a given day. Here, ${process_date} is the task parameter (a date); usr.order_price_day_stat is a table in the data mart 3 that stores the statistical data (result data); org.emall_b2c_order_info is a table in the data warehouse 2 that holds the order details (source data); to_date is a function that converts a time to a date; SUM performs summation; and cast performs format conversion.
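How the ${process_date} parameter might be filled in before the script is executed can be sketched as follows. The specification does not state how substitution is performed; the use of Python's string.Template, the function name render_task_sql and the YYYY-MM-DD date format are assumptions made for illustration.

```python
from datetime import datetime
from string import Template

# Script content of custom task A, with ${process_date} left as a placeholder.
SQL_TEMPLATE = """INSERT INTO usr.order_price_day_stat
SELECT '${process_date}', cast(SUM(order_price) as decimal(30,2))
FROM org.emall_b2c_order_info
WHERE to_date(order_time) = '${process_date}';"""


def render_task_sql(template: str, process_date: str) -> str:
    """Substitute the batch date into the script (hypothetical helper)."""
    # Assumed format YYYY-MM-DD; strptime raises ValueError on anything else,
    # which also keeps arbitrary text out of the generated SQL.
    datetime.strptime(process_date, "%Y-%m-%d")
    return Template(template).substitute(process_date=process_date)
```

Calling `render_task_sql(SQL_TEMPLATE, "2021-03-18")` yields the statement with both placeholders replaced by the same date, so each scheduled batch of the task processes exactly one day of orders.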

Step 2: the ETL scheduling system 1 analyzes the upstream task dependency relationship of the custom task, optimizes the dependent upstream task, and regularly acquires the running state of the dependent upstream task.

A custom task processes the data in the data warehouse 2 (the source end) and stores the processed data in the data mart 3. The source-end data is produced by an original data processing task (usually a system task), which is responsible for storing the original data generated by the business system in the data warehouse 2 and may also perform some data filtering or data cleaning. This original data processing task is the upstream task on which the custom task depends. The upstream task dependency is obtained by first parsing the SQL script to find the source tables it uses, and then querying the correspondence between tables and original data processing tasks. Taking the above custom task A as an example, custom task A uses the table org.emall_b2c_order_info in the data warehouse 2, which stores the order data generated by the electronic mall system. Upstream task B (a system task, developed and deployed by developers or technicians) is responsible for storing the order data generated by the business system into the table org.emall_b2c_order_info on a daily basis. Upstream task B is therefore an upstream task of custom task A: custom task A can be scheduled to compute the total order amount for a given day only after upstream task B has finished loading the order data for that day.

Step 3: the ETL scheduling system 1 schedules and executes the scripts of the custom tasks in sequence in the data warehouse 2 according to the running state of the upstream tasks.

As shown in FIG. 2, the ETL scheduling system 1 may include a deployment custom task module 11, a task parsing module 12, a processing module 13, an upstream task monitoring module 14, an ETL scheduling module 15, and a scheduling monitoring module 16. Through the data processing of these modules, the upstream task dependencies of the custom tasks are parsed, the dependent upstream tasks are optimized, the running state of the dependent upstream tasks is obtained periodically, and the scripts of the custom tasks are scheduled and executed in sequence in the data warehouse 2 according to the running state of the upstream tasks.

Step 4: the data processed by the custom tasks is transmitted to the data mart 3.

Step 5: the user queries the running state of the custom tasks through the instant BI system 4.

The ETL scheduling system 1 may monitor the running state of the custom tasks and provide the instant BI system 4 with an interface for querying that state. The instant BI system 4 is generally a B/S architecture system: a user may access it through a browser and, in the browser interface, deploy custom tasks, query the running state of custom tasks, display the result data produced by custom tasks in chart form, and so on. Taking the above custom task A as an example, the user accesses the instant BI system 4 through the browser, uses the function for querying the running state of custom tasks to obtain the daily running status and the latest processing date of custom task A, and may display the total order amount for each day in the form of a table or a line graph.

Step 6: the user acquires the processed data through the instant BI system 4, and analyzes and displays the processed data.

Please refer to FIG. 3. The embodiments of the present specification provide a task scheduling method. In the embodiments of the present specification, the execution subject of the task scheduling method may be an electronic device having a logical operation function, and the electronic device may be a server. The server may be an electronic device with a certain arithmetic processing capability, and may have a network communication unit, a processor, a memory and the like. Of course, the server is not limited to a physical electronic device and may also be software running in an electronic device. The server may also be a distributed server, that is, a system in which multiple processors, memories, network communication modules and the like operate in coordination. Alternatively, the server may be a server cluster formed by several servers. The method may include the following steps.

S310: acquiring a plurality of custom tasks having dependency relationships; a custom task is a task deployed by a user and includes script content for executing the task.

The process of obtaining a plurality of custom tasks with dependency relationships is described by taking the above scenario example as an example. The user may deploy the custom task through an interface provided by the instant BI system 4, and the instant BI system 4 transmits the deployed custom task and the dependency between the custom tasks to the ETL scheduling system 1.

The deployment custom task module 11 of the ETL scheduling system 1 may receive a custom task deployed by a user, and store the script content and the dependency relationship of the custom task in a database, so as to obtain a plurality of custom tasks with dependency relationships from the database.

S320: parsing the script content to obtain the upstream tasks on which each custom task depends; an upstream task is a system task.

In some embodiments, parsing the script content to obtain the upstream tasks on which each custom task depends includes: parsing the table names used by the SQL statements in the script content of the custom task; and obtaining the upstream tasks corresponding to the custom task according to the table names. Specifically, the upstream tasks on which each custom task depends are obtained by parsing the SQL script to find the source tables it uses, and then querying the correspondence between tables and original data processing tasks. Taking the above custom task A as an example, custom task A uses the table org.emall_b2c_order_info in the data warehouse 2, which stores the order data generated by the electronic mall system. Upstream task B (a system task, developed and deployed by developers or technicians) is responsible for storing the order data generated by the business system into the table org.emall_b2c_order_info on a daily basis. Upstream task B is therefore an upstream task of custom task A: custom task A can be scheduled to compute the total order amount for a given day only after upstream task B has finished loading the order data for that day. In this way, the upstream tasks on which the custom tasks depend can be obtained accurately, and the efficiency of obtaining them is improved.

The process of obtaining the upstream tasks on which each custom task depends is described by taking the above scenario example as an example. The script content is parsed by the task parsing module 12 of the ETL scheduling system 1 to obtain the upstream tasks on which each custom task depends. As shown in FIG. 4, the following steps may be included.

Step 1201: the SQL script of a custom task is received as an input parameter.

Specifically, the script content of a newly added custom task received by the deployment custom task module 11 may be accepted as input.

Step 1202: the tables used in the SQL statements are parsed.

Specifically, the table names used by all SQL statements in the script may be parsed.

Step 1203: the corresponding upstream tasks are obtained according to the table names.

Specifically, the corresponding upstream task may be obtained according to the parsed table name.

Step 1204: the custom task and its upstream task dependencies are saved.

Specifically, the parsed custom task and its upstream task dependency relationship may be placed in a database store.
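Steps 1201 to 1204 can be sketched as follows. This is a simplified illustration under stated assumptions: a regular expression stands in for a real SQL parser (it only recognises table names that directly follow FROM or JOIN), and the table-to-task mapping, the in-memory store and all function names are hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical correspondence between source tables and the system tasks that load them.
TABLE_TO_UPSTREAM_TASK = {"org.emall_b2c_order_info": "upstream_task_B"}

# Simplification: capture the identifier that follows FROM or JOIN.
_SOURCE_TABLE = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE)


def source_tables(sql: str) -> Set[str]:
    """Step 1202: collect the source table names used by the SQL statements."""
    return {name.lower() for name in _SOURCE_TABLE.findall(sql)}


def upstream_tasks(sql: str, table_to_task: Dict[str, str]) -> Set[str]:
    """Step 1203: look up the upstream task responsible for each source table."""
    return {table_to_task[t] for t in source_tables(sql) if t in table_to_task}


def parse_and_save(task: str, sql: str, table_to_task: Dict[str, str],
                   store: Dict[str, Set[str]]) -> None:
    """Steps 1201 and 1204: take the script as input and save the dependencies."""
    store[task] = upstream_tasks(sql, table_to_task)  # a dict stands in for the database
```

Applied to the script of custom task A, `source_tables` finds only `org.emall_b2c_order_info` (the target table after INSERT INTO is not a source), and the lookup resolves it to upstream task B. A production implementation would need a proper SQL parser to handle subqueries, common table expressions and quoted identifiers.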

S330: processing the upstream tasks on which each custom task depends, and removing the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend.

In some embodiments, the upstream custom task and the downstream custom task have a direct or indirect dependency relationship; that is, the upstream custom task is either a custom task on which the downstream custom task depends directly, or a custom task on which the downstream custom task depends indirectly through one or more intermediate custom tasks.

In some embodiments, processing the upstream tasks on which each custom task depends and removing the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks may include: obtaining the upstream tasks on which each custom task depends; comparing, for each pair of upstream and downstream custom tasks, the upstream tasks on which the two depend; if both depend on the same upstream task, determining that repeated upstream task to be a target upstream task; and removing the dependency relationship between the downstream custom task and the target upstream task.
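The comparison described above can be sketched as follows. The sketch assumes the dependencies among custom tasks are acyclic, and the data layout (two dictionaries keyed by task name) and the function names are hypothetical rather than taken from the specification.

```python
from typing import Dict, Set


def ancestors(task: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Custom tasks that `task` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(parents.get(task, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def remove_repeated(parents: Dict[str, Set[str]],
                    upstream: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop from each downstream custom task the upstream tasks that an
    upstream custom task already depends on (the target upstream tasks)."""
    result: Dict[str, Set[str]] = {}
    for task, own in upstream.items():
        covered: Set[str] = set()
        for ancestor in ancestors(task, parents):
            # Compare against the original sets, not the already reduced ones.
            covered |= upstream.get(ancestor, set())
        result[task] = set(own) - covered
    return result
```

Because every custom task is compared with each of its direct and indirect upstream custom tasks, the cost of this pairwise approach grows with the number of such pairs.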

In some embodiments, the processing may also be implemented using graph theory. Specifically, processing the upstream tasks on which each custom task depends and removing the dependency relationship between a downstream custom task and a target upstream task may include the following. A directed graph is established from the plurality of custom tasks, in which a vertex represents a custom task and an edge represents a dependency relationship between custom tasks. Every custom task is marked as unprocessed, and the implicit upstream dependency set of every unprocessed custom task is initialized to empty; the implicit upstream dependency set consists of implicit upstream tasks, that is, upstream tasks depended on by the parent custom tasks on which the custom task depends. An unprocessed custom task with an in-degree of 0 is taken as the target custom task, and the implicit upstream dependency sets of the at least one parent custom task of the target custom task are merged to obtain the implicit upstream dependency set of the target custom task. Any task that appears both among the upstream tasks on which the target custom task depends and in its implicit upstream dependency set is determined to be a target upstream task, and the dependency relationship between the target custom task and that target upstream task is removed. The upstream tasks on which the target custom task depends are then added to its implicit upstream dependency set, the target custom task is marked as processed, the edges associated with it are deleted from the directed graph, and the in-degree of each unprocessed custom task is recalculated to determine a new target custom task. These steps repeat until all custom tasks have been processed.

The in-degree is a basic concept in graph theory: the in-degree of a vertex in a directed graph is the number of edges in the graph that end at that vertex.
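The graph-based processing can be sketched as follows. This is one possible reading of the steps rather than a definitive implementation: edges are assumed to point from a parent custom task to the custom task that depends on it, and the function name, argument layout and task names are hypothetical.

```python
from typing import Dict, Set


def prune_upstream(parents: Dict[str, Set[str]],
                   upstream: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per custom task, the upstream tasks left after removing repeats."""
    tasks = set(parents) | set(upstream) | {p for ps in parents.values() for p in ps}
    # In-degree of a task = number of its parent custom tasks not yet processed.
    indegree = {t: len(parents.get(t, ())) for t in tasks}
    children: Dict[str, Set[str]] = {t: set() for t in tasks}
    for task, ps in parents.items():
        for parent in ps:
            children[parent].add(task)

    implicit: Dict[str, Set[str]] = {t: set() for t in tasks}  # all start empty
    pruned: Dict[str, Set[str]] = {}
    unprocessed = set(tasks)
    while unprocessed:
        ready = sorted(t for t in unprocessed if indegree[t] == 0)
        if not ready:
            raise ValueError("the directed graph contains a cycle")
        target = ready[0]
        for parent in parents.get(target, ()):       # merge the parents' implicit sets
            implicit[target] |= implicit[parent]
        own = set(upstream.get(target, ()))
        pruned[target] = own - implicit[target]      # drop the target upstream tasks
        implicit[target] |= own                      # pass own dependencies downstream
        unprocessed.discard(target)                  # mark as processed
        for child in children[target]:               # delete the target's outgoing edges
            indegree[child] -= 1
    return pruned
```

For a hypothetical chain in which custom_2 depends on custom_1 and custom_3 depends on custom_2, with upstream sets {up_1, up_2}, {up_2, up_3} and {up_1, up_3, up_4} respectively, the function keeps {up_1, up_2} for custom_1, {up_3} for custom_2 and {up_4} for custom_3. Each custom task is visited once and inherits the accumulated set of its parents, so indirect repeats are removed without comparing every pair of tasks.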

With this processing, the repeated upstream task dependencies of the custom tasks can be removed, so that the upstream task dependencies of the custom tasks become simpler and clearer. The evaluation of trigger conditions can then be simplified when tasks are scheduled, which reduces the load on the ETL scheduling system and also reduces the storage space the scheduling system requires.

In some embodiments, if during the processing there is no unprocessed custom task with an in-degree of 0 while unprocessed custom tasks remain, the directed graph is determined to contain a cycle and the processing is determined to have failed. Specifically, when no unprocessed custom task has an in-degree of 0, every remaining custom task depends on some other custom task and the dependency relationships among them form a closed loop. The execution order of these custom tasks therefore cannot be determined, and their execution would be abnormal. With this processing, a cyclic dependency can be found in time, which improves the efficiency of exception handling in task scheduling.
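The in-degree test for a cycle can be sketched on its own as follows. The function name and the choice to return the set of tasks that can never reach an in-degree of 0 (so that they can be reported to the user) are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Set


def blocked_tasks(parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the custom tasks that never reach in-degree 0.

    An empty result means the dependency graph is acyclic; a non-empty result
    contains the tasks on a cycle and the tasks that depend on them.
    """
    tasks = set(parents) | {p for ps in parents.values() for p in ps}
    indegree = {t: len(parents.get(t, ())) for t in tasks}
    children = {t: [] for t in tasks}
    for task, ps in parents.items():
        for parent in ps:
            children[parent].append(task)

    queue = deque(t for t in tasks if indegree[t] == 0)
    processed: Set[str] = set()
    while queue:
        task = queue.popleft()
        processed.add(task)
        for child in children[task]:   # removing the task's edges lowers in-degrees
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return tasks - processed
```

For example, if hypothetical task b depends on a and c while c depends on b, task a is processed normally and the function returns b and c as blocked.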

The processing procedure is described below using the above scenario example. The processing module 13 of the ETL scheduling system 1 processes the upstream tasks on which each custom task depends and removes the dependency relationship between a downstream custom task and a target upstream task among the custom tasks. As shown in fig. 5, the following steps may be included.

Step 1301: Receive a custom task as an input parameter.

Specifically, a custom task processed by the task parsing module 12 may be accepted as input.

Step 1302: Acquire all custom tasks related to this task, their dependency relationships, and the upstream tasks on which they depend.

Specifically, all custom tasks that have a dependency relationship with the custom task, together with the upstream task dependencies of those tasks, may be obtained.

Step 1303: Process the set of custom tasks having dependency relationships.

Step 1304: Determine whether the processing in step 1303 was successful. If so, perform step 1305; if not, perform step 1306.

Step 1305: Save the processed upstream task dependencies.

Specifically, the processed upstream task dependencies may be stored in a database.

Step 1306: Deactivate the set of custom tasks having dependency relationships and notify the user of the dependency exception.
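The control flow of steps 1301 to 1306 can be sketched as follows. This is a minimal illustration under stated assumptions: the function `handle_custom_task` and its four callables (`load_group`, `prune`, `save`, `disable_and_notify`) are hypothetical names standing in for the database access, the processing of step 1303, and the notification mechanism, none of which are specified at this level of detail in the specification. Failure of step 1303 is assumed to be signalled by a `ValueError`.

```python
def handle_custom_task(task, load_group, prune, save, disable_and_notify):
    """Run steps 1301-1306 for one custom task; return True on success."""
    # Step 1301: `task` is received as the input parameter.
    group = load_group(task)                  # step 1302: related tasks and dependencies
    try:
        pruned = prune(group)                 # step 1303: process the group
    except ValueError as err:                 # step 1304: processing failed
        disable_and_notify(group, str(err))   # step 1306: deactivate and notify
        return False
    save(pruned)                              # step 1305: persist the result
    return True


# Usage with stand-in callables (all hypothetical).
saved, alerts = [], []
ok = handle_custom_task(
    "custom task 1",
    load_group=lambda t: [t],
    prune=lambda g: {name: set() for name in g},
    save=saved.append,
    disable_and_notify=lambda g, msg: alerts.append((g, msg)),
)
```

Passing the collaborators in as callables keeps the sketch independent of any particular storage or notification backend.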

Fig. 6 is a schematic diagram of the effect after processing. As shown in fig. 6, custom task 1, custom task 2, and custom task 3 each depend on a plurality of upstream tasks. Because custom task 1, custom task 2, and custom task 3 are related tasks, there is a high probability that they depend on the same upstream tasks; for example, custom task 1 and custom task 2 both depend on upstream task 2. Because custom task 2 must be scheduled and executed after custom task 1, the upstream tasks of custom task 1 must already have been executed when custom task 2 is scheduled. The dependency of custom task 2 on upstream task 2 can therefore be removed without affecting the scheduling and execution order.

Fig. 7 is a detailed flowchart of the processing in step 1303. The processing is based on graph theory and can simultaneously detect the abnormal condition in which the graph contains a cycle. In this scenario example, the graph is checked for cycles while the dependency optimization is performed. The detailed steps of step 1303 are as follows.

Step 130301: Establish a directed graph G from the custom tasks and their dependency relationships, in which tasks are represented by vertices (V) and dependency relationships by edges (E).

Step 130302: Calculate the in-degree of every vertex (that is, count the number of parent custom tasks on which each custom task depends).

Step 130303: Mark all custom tasks as unprocessed, and set the implicit upstream dependency set of each task to empty.

Step 130304: Determine whether all custom tasks have been processed. If so, step 1303 ends normally and the processing is successful. If not, perform step 130305.

Step 130305: Determine whether there is an unprocessed task with an in-degree of 0. If so, perform step 130306. If not, the graph contains a cycle, which is an abnormal condition, and the processing fails.

Step 130306: Take an unprocessed custom task with an in-degree of 0, referred to as "task J" in the following steps.

Step 130307: Take the union of the implicit upstream dependency sets of the parent custom tasks of task J, and set it as the implicit upstream dependency set of task J.

Step 130308: From the upstream tasks on which task J depends, remove those that are in the implicit upstream dependency set of task J.

Step 130309: Add the upstream tasks on which task J depends to the implicit upstream dependency set of task J.

Step 130310: Mark task J as processed.

Step 130311: Remove the edges associated with task J from G, recalculate the in-degrees of its child tasks, and return to step 130304.
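Steps 130301 to 130311 can be sketched in Python as follows. This is one possible reading of the steps, not a definitive implementation: the function name `prune_dependencies`, the dictionary-based inputs, and the use of `ValueError` to signal a cycle are assumptions made for this example. Every parent named in `parents` is assumed to be a key of `upstream`.

```python
def prune_dependencies(upstream, parents):
    """Remove upstream dependencies already implied by parent custom tasks.

    upstream: dict mapping each custom task to the set of upstream tasks it
              depends on.
    parents:  dict mapping a custom task to the set of parent custom tasks
              it depends on (tasks without parents may be omitted).
    Returns a dict mapping each custom task to its pruned upstream set.
    Raises ValueError if the custom-task dependencies contain a cycle.
    """
    tasks = list(upstream)
    # Step 130301: directed graph with one edge per (parent, child) dependency.
    edges = {(p, c) for c in tasks for p in parents.get(c, ())}
    # Step 130302: in-degree of every vertex.
    in_degree = {t: 0 for t in tasks}
    for _p, c in edges:
        in_degree[c] += 1
    # Step 130303: all tasks unprocessed, implicit sets empty.
    unprocessed = set(tasks)
    implicit = {t: set() for t in tasks}
    pruned = {}
    # Step 130304: loop until every task has been processed.
    while unprocessed:
        # Step 130305: look for an unprocessed task with in-degree 0.
        ready = [t for t in tasks if t in unprocessed and in_degree[t] == 0]
        if not ready:
            raise ValueError("dependency cycle among custom tasks")
        j = ready[0]                                  # step 130306: task J
        for p in parents.get(j, ()):                  # step 130307: merge parents
            implicit[j] |= implicit[p]
        pruned[j] = set(upstream[j]) - implicit[j]    # step 130308: remove repeats
        implicit[j] |= set(upstream[j])               # step 130309: add own upstream
        unprocessed.discard(j)                        # step 130310: mark processed
        for p, c in list(edges):                      # step 130311: drop J's edges
            if p == j:
                edges.discard((p, c))
                in_degree[c] -= 1
    return pruned


# Hypothetical input: c2 depends on c1, and both depend on upstream task u2.
result = prune_dependencies(
    {"c1": {"u1", "u2"}, "c2": {"u2", "u3"}},
    {"c2": {"c1"}},
)
```

With this hypothetical input, the repeated dependency of c2 on u2 is removed, while c1 keeps both of its upstream tasks.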

The implicit upstream dependency set consists of implicit upstream tasks; an implicit upstream task is an upstream task depended on by a parent custom task on which the custom task depends. Specifically, in addition to the upstream tasks on which it depends directly, a custom task may depend on other custom tasks, which in turn have upstream tasks of their own; all upstream tasks on which those other custom tasks depend are referred to as implicit upstream tasks.

The implicit upstream dependency set in the detailed steps of step 1303 is an auxiliary data structure used during the processing of step 1303. It is initially empty (step 130303). Then, for each task, the implicit upstream dependency sets of its parent custom tasks are merged (step 130307), the repeated dependencies are removed (step 130308), and the task's own upstream tasks are added to its implicit upstream dependency set so that its child custom tasks can be processed (step 130309), in a breadth-first traversal manner. After step 1303 completes, the implicit upstream dependency sets are no longer needed. For convenience of illustration, taking fig. 5 as an example, the implicit upstream dependency sets of custom task 1 and custom task 2 are both empty at first. Because neither task has a parent custom task, the merge in step 130307 leaves their sets empty and step 130308 performs no optimization; their own upstream tasks are then added to their implicit upstream dependency sets in step 130309. When custom task 3 is processed, the implicit upstream dependency sets of custom task 1 and custom task 2 are merged to obtain the implicit upstream dependency set of custom task 3. The changes to the implicit upstream dependency sets during step 1303 are given below:

Let the implicit upstream dependency sets of custom task 1, custom task 2, and custom task 3 be denoted S1, S2, and S3 respectively; S1, S2, and S3 are all initially empty (step 130303).

Processing custom task 1:

step 130307: S1 remains empty, since there is no parent custom task;

step 130308: since S1 is empty, this step is a no-op;

step 130309: S1 = {upstream task 1, upstream task 2}.

Processing custom task 2:

step 130307: S2 remains empty, since there is no parent custom task;

step 130308: since S2 is empty, this step is a no-op;

step 130309: S2 = {upstream task 4, upstream task 5, upstream task 6}.

Processing custom task 3:

step 130307: take the union of S1 and S2 of the parent custom tasks, giving S3 = {upstream tasks 1-2, upstream tasks 4-6};

step 130308: remove upstream task 2 and upstream tasks 4-6 from the upstream tasks on which custom task 3 depends, leaving only upstream task 3;

step 130309: S3 = {upstream tasks 1 to 6}.
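The walkthrough above can be traced with plain Python sets. Upstream tasks are represented by their numbers. The direct dependencies of custom task 3 (`deps3`) are not listed explicitly in the text; the set used here is inferred from the statement that removing upstream task 2 and upstream tasks 4-6 leaves only upstream task 3, and should be read as an assumption.

```python
# Direct upstream dependencies of each custom task (upstream task numbers).
deps1 = {1, 2}
deps2 = {4, 5, 6}
deps3 = {2, 3, 4, 5, 6}   # inferred from the walkthrough, not stated directly

# Step 130303: all implicit upstream dependency sets start empty.
S1, S2, S3 = set(), set(), set()

# Custom task 1: no parent, so nothing is merged or removed.
kept1 = deps1 - S1        # step 130308 (no-op)
S1 = S1 | deps1           # step 130309

# Custom task 2: no parent, so nothing is merged or removed.
kept2 = deps2 - S2        # step 130308 (no-op)
S2 = S2 | deps2           # step 130309

# Custom task 3: parents are custom tasks 1 and 2.
S3 = S1 | S2              # step 130307
kept3 = deps3 - S3        # step 130308
S3 = S3 | deps3           # step 130309
```

After the trace, `kept3` contains only upstream task 3 and `S3` contains upstream tasks 1 to 6, matching the values given in the walkthrough.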

In some embodiments, the method may further comprise: in the case of a processing failure, deactivating the plurality of custom tasks and returning a notification of a dependency exception. Specifically, when no unprocessed custom task with an in-degree of 0 exists, the directed graph is determined to contain a cycle and the processing is determined to have failed. In that case, the plurality of custom tasks are deactivated and a notification of the dependency exception is returned. The memory of the ETL scheduling system can thus be released promptly, the load on the ETL scheduling system is reduced, and the abnormal scheduling condition is fed back in time so that the user can investigate it.

S340: When the processing is successful, schedule the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

In some embodiments, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend comprises: periodically acquiring the running state of the upstream tasks on which each processed custom task depends; and triggering and executing the custom tasks in sequence according to the running state of each upstream task and the dependency relationships among the plurality of custom tasks. In this way, the processing simplifies the trigger conditions for scheduling a child custom task, so that the child custom task can be triggered promptly after its parent custom task finishes running. This addresses the problem of untimely scheduling triggers and improves the timeliness of task scheduling.

In some embodiments, the method may further comprise: monitoring the running state of the custom tasks so as to feed back their running state. Monitoring the running state of the custom tasks allows it to be fed back, so that task scheduling proceeds smoothly and the user is informed of the execution status of each custom task.

The above scenario example is taken as an example to describe the process of scheduling the multiple custom tasks and monitoring the running states of the custom tasks.

The scheduling of the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend is implemented by the upstream task monitoring module 14 and the ETL scheduling module 15 of the ETL scheduling system 1. Specifically, the upstream task monitoring module 14 is responsible for periodically acquiring the running state of the upstream tasks, so that the ETL scheduling module 15 can trigger the running of the custom tasks. The ETL scheduling module 15 triggers and executes the custom tasks in sequence according to the running state of the upstream tasks provided by the upstream task monitoring module 14, the dependency relationships among the custom tasks, and the dependencies of the custom tasks on the upstream tasks.
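The trigger condition evaluated by the scheduler can be sketched as a pure function over the state collected at one polling round. The function name `runnable_tasks` and its parameters are assumptions made for this illustration; how the upstream task monitoring module actually obtains running states is not described at this level of detail in the specification.

```python
def runnable_tasks(pruned_upstream, parents, upstream_done, custom_done):
    """Return the custom tasks whose trigger condition is currently met.

    pruned_upstream: dict of custom task -> set of upstream tasks remaining
                     after the redundant dependencies were removed.
    parents:         dict of custom task -> set of parent custom tasks.
    upstream_done:   set of upstream tasks observed as finished.
    custom_done:     set of custom tasks that have already finished.
    """
    ready = []
    for task, ups in pruned_upstream.items():
        if task in custom_done:
            continue
        # A task may run once its remaining upstream tasks and all of its
        # parent custom tasks have finished.
        if ups <= upstream_done and parents.get(task, set()) <= custom_done:
            ready.append(task)
    return sorted(ready)


# Hypothetical state: c2 depends on c1 and, after pruning, only on u3.
pruned = {"c1": {"u1", "u2"}, "c2": {"u3"}}
deps = {"c2": {"c1"}}
first_round = runnable_tasks(pruned, deps, {"u1", "u2", "u3"}, set())
second_round = runnable_tasks(pruned, deps, {"u1", "u2", "u3"}, {"c1"})
```

Because the pruned sets are smaller than the original dependency sets, fewer upstream states need to be compared on each polling round.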

The running state of the custom tasks is monitored by the scheduling monitoring module 16 of the ETL scheduling system 1, so that it can be fed back. Specifically, the scheduling monitoring module 16 is responsible for monitoring the running state of the custom tasks triggered by the ETL scheduling module 15 and for providing that running state to other systems.

The method provided by the embodiments of this specification acquires a plurality of custom tasks with dependency relationships, wherein each custom task is a task deployed by a user and comprises script content for executing the task; parses the script content to obtain the upstream tasks on which each custom task depends, wherein each upstream task is a system task; processes the upstream tasks on which each custom task depends and removes the dependency relationship between a downstream custom task and a target upstream task among the plurality of custom tasks, wherein the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task among the plurality of custom tasks depend; and, when the processing is successful, schedules the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend. The method removes the repeated upstream task dependencies of the custom tasks, so that the upstream task dependencies of each custom task are simpler and clearer. The ETL scheduling system can therefore evaluate simpler trigger conditions during scheduling, which reduces its load, reduces the storage space it requires, and improves task scheduling efficiency.

Fig. 8 is a functional structure diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device may include a memory and a processor.

In some embodiments, the memory may be used to store the computer programs and/or modules, and the processor may implement the various functions of the task scheduling method by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the user terminal. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The processor may execute the computer instructions to perform the steps of: acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task; parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task; processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend; and when the processing is successful, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

In the embodiments of the present description, the functions and effects specifically realized by the electronic device may be explained in comparison with other embodiments, and are not described herein again.

Fig. 9 is a functional structure diagram of a task scheduling device according to an embodiment of the present disclosure, where the task scheduling device may specifically include the following structural modules.

An obtaining module 910, configured to obtain a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task;

a parsing module 920, configured to parse the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task;

a processing module 930, configured to process the upstream task on which each custom task depends, and remove the dependency relationship between a downstream custom task and a target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend;

and a scheduling module 940, configured to schedule, when the processing is successful, the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

The embodiments of this specification further provide a computer-readable storage medium for the task scheduling method, where the computer-readable storage medium stores computer program instructions which, when executed, implement: acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task; parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task; processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend; and when the processing is successful, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

In the embodiments of the present specification, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used for storing the computer programs and/or modules, and the memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the user terminal, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory. In the embodiments of the present description, the functions and effects specifically realized by the program instructions stored in the computer-readable storage medium may be explained in contrast to other embodiments, and are not described herein again.

It should be noted that the task scheduling method, device and storage medium provided in the embodiments of the present specification may be applied to the technical field of big data processing. Of course, the method and the device for task scheduling may also be applied to the financial field or any field other than the financial field, and the application fields of the method, the device and the storage medium for task scheduling are not limited in the embodiments of the present specification.

It should be noted that the embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus embodiment and the device embodiment are substantially similar to the method embodiment, their description is relatively brief, and reference may be made to the corresponding description of the method embodiment for the relevant points.

After reading this specification, persons skilled in the art will appreciate that any combination of some or all of the embodiments set forth herein, without inventive faculty, is within the scope of the disclosure and protection of this specification.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims

1. A method for task scheduling, the method comprising:

acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task;

parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task;

processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend;

and when the processing is successful, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

2. The method of claim 1, wherein parsing the script content to obtain the upstream task on which each custom task depends comprises:

parsing out the table names used by the SQL statements in the script content of the custom task;

and acquiring an upstream task corresponding to the custom task according to the table name.

3. The method of claim 1, wherein processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks, comprises:

establishing a directed graph according to the plurality of custom tasks; wherein a vertex in the directed graph represents a custom task, and an edge represents a dependency relationship;

marking the plurality of custom tasks as unprocessed, and setting the implicit upstream dependency set of each unprocessed custom task to empty; the implicit upstream dependency set consists of implicit upstream tasks; an implicit upstream task is an upstream task depended on by a parent custom task on which the custom task depends;

merging the implicit upstream dependency sets of at least one parent custom task of a target custom task to obtain the implicit upstream dependency set of the target custom task; the target custom task is an unprocessed custom task with an in-degree of 0;

determining a task that appears both among the upstream tasks depended on by the target custom task and in the implicit upstream dependency set of the target custom task as a target upstream task, and removing the dependency relationship between the target custom task and the target upstream task;

and adding the upstream tasks on which the target custom task depends to the implicit upstream dependency set of the target custom task, marking the target custom task as processed, deleting the edges associated with the target custom task from the directed graph, and recalculating the in-degree of each unprocessed custom task so as to determine a new target custom task, until all custom tasks have been processed.

4. The method of claim 3, wherein, when no unprocessed custom task with an in-degree of 0 exists, the directed graph is determined to contain a cycle and the processing is determined to have failed.

5. The method of claim 1, wherein in the event of a processing failure, the plurality of custom tasks are deactivated, and a notification of a dependency exception is returned.

6. The method of claim 1, wherein scheduling the plurality of custom tasks according to the processed upstream tasks on which the respective custom tasks depend comprises:

periodically acquiring the running state of the upstream tasks on which each processed custom task depends;

and triggering and executing the custom tasks in sequence according to the running state of each upstream task and the dependency relationships among the plurality of custom tasks.

7. The method of claim 1, further comprising: monitoring the running state of the custom tasks so as to feed back the running state of the custom tasks.

8. A task scheduling apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task;

the parsing module is used for parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task;

the processing module is used for processing the upstream task on which each custom task depends and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend;

and the scheduling module is used for scheduling, when the processing is successful, the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement: acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task; parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task; processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend; and when the processing is successful, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.

10. A computer-readable storage medium having computer instructions stored thereon that, when executed, perform: acquiring a plurality of custom tasks with dependency relationships; the custom task is a task deployed by a user; the custom task comprises script content for executing the task; parsing the script content to obtain the upstream task on which each custom task depends; the upstream task is a system task; processing the upstream task on which each custom task depends, and removing the dependency relationship between the downstream custom task and the target upstream task in the plurality of custom tasks; the target upstream task is an upstream task on which both an upstream custom task and a downstream custom task in the plurality of custom tasks depend; and when the processing is successful, scheduling the plurality of custom tasks according to the upstream tasks on which the processed custom tasks depend.