WO2023213118A1

WO2023213118A1 - Task scheduling method and apparatus, and device

Info

Publication number: WO2023213118A1
Application number: PCT/CN2023/078004
Authority: WO
Inventors: 武浩瑞; 张韬
Original assignee: 北京快乐茄信息技术有限公司
Priority date: 2022-05-06
Filing date: 2023-02-23
Publication date: 2023-11-09
Also published as: CN115098232A

Abstract

A task scheduling method and apparatus, and a device, which can be applied to the technical field of big data analysis. The task scheduling method comprises: acquiring a first task (S101); when the execution of the first task is completed, determining N second tasks, wherein each second task is a downstream task of the first task (S102); and when the execution of an upstream task of an ith second task among the N second tasks is completed, executing the ith second task (S103). The execution of a downstream task can be triggered once it is determined that the execution of an upstream task is completed without the need to wait for a fixed execution time, thereby effectively shortening a delay between the upstream task and the downstream task.

Description

Task scheduling methods, devices and equipment

Cross-references to related applications

This disclosure claims priority to Chinese patent application No. 202210488671.1, titled "Method, Device and Equipment for Task Scheduling", filed with the State Intellectual Property Office of China on May 6, 2022, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of big data analysis, and in particular to a task scheduling method, device and equipment.

Background

At present, with the rapid development of the Internet, the types of tasks that require the help of the network are becoming increasingly diverse. In the field of big data analysis technology, there are complex dependencies between multiple tasks, which requires the use of task scheduling to manage tasks.

Traditional task scheduling mainly uses periodic scheduled execution and treats tasks as upstream dependencies. Because users can hardly give the most appropriate scheduled execution time when configuring tasks, traditional task scheduling methods suffer from high latency.

Summary

Embodiments of the present disclosure provide a task scheduling method, apparatus and equipment to achieve effects such as reducing delays between upstream and downstream tasks.

In a first aspect, embodiments of the present disclosure provide a task scheduling method, including: obtaining a first task; when the execution of the first task is completed, determining N second tasks, where the second tasks are downstream tasks of the first task and N is a positive integer; and when the execution of the upstream task of the i-th second task among the N second tasks is completed, executing the i-th second task.

In some possible implementations, after obtaining the first task, the method may further include: executing the first task according to the first dependency data table and the first execution parameter of the first task, where the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.

In some possible implementations, determining N second tasks when the execution of the first task is completed includes: when the execution of the first task is completed, obtaining a first output data table, where the first output data table is the data table produced when the execution of the first task is completed; and determining, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.

In some possible implementations, the first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset; where the execution granularity represents the execution cycle of the first task; the dependency granularity represents the cycle of the first dependency data table on which the first task relies during execution; the dependency time offset represents the offset value between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset value between the execution time of the first task and the output time of the output data table of the first task.

In some possible implementations, the execution period may include at least one of the following: one month, one week, one day, and one hour.

In some possible implementations, before executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed, the method further includes: polling the N second tasks; and determining the upstream task of each second task according to the second dependency data table of each second task, where the second dependency data table indicates the data table produced by the upstream task on which each second task depends during execution.

In some possible implementations, after determining the upstream task of each second task according to the second dependency data table of each second task, the method further includes: detecting whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determining that the execution of the upstream task of the i-th second task is completed.

In some possible implementations, after determining the N second tasks when the execution of the first task is completed, the method may further include: registering an execution trigger for each second task according to the second execution parameter of each of the N second tasks, where the execution trigger is configured to trigger the execution of the corresponding second task when the execution time of that second task arrives. In this case, executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.

In some possible implementations, executing the first task according to the first dependency data table and the first execution parameter of the first task includes: determining whether the output data table of the first task has success information at the data output time corresponding to the current system time; and if there is no success information, selecting the first task and triggering its execution for the execution time corresponding to the current system time.

In some possible implementations, when determining the downstream tasks of the first task, the configuration information stored in the database is retrieved, and the downstream tasks of the first task are determined based on the dependency information in the configuration information.

In some possible implementations, after the execution of the first task is completed, information indicating the successful execution of the first task is stored in a task status table, where the information in the task status table includes the task name, data output date, and status.

In a second aspect, embodiments of the present disclosure provide a device for task scheduling. The device may be a chip or a system-on-chip in an electronic device, or may be a functional module configured in an electronic device to implement the method of the first aspect and any of its possible implementations. The task scheduling device can realize the functions performed by the electronic device in the first aspect and any of its possible implementations, and the functions can be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. The device for task scheduling includes: an acquisition module configured to acquire a first task; a determination module configured to determine N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task and N is a positive integer; and an execution module configured to execute the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed.

In some possible implementations, the acquisition module is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameter of the first task, where the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.

In some possible implementations, the determination module is further configured to: when the execution of the first task is completed, obtain a first output data table, where the first output data table is the data table produced when the execution of the first task is completed; and determine, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.

In some possible implementations, the first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset; where the execution granularity represents the execution cycle of the first task; the dependency granularity represents the cycle of the first dependency data table on which the first task relies during execution; the dependency time offset represents the offset value between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset value between the execution time of the first task and the output time of the output data table of the first task.

In some possible implementations, the execution period includes at least one of the following: one month, one week, one day, and one hour.

In some possible implementations, the execution module is further configured to: before executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and determine the upstream task of each second task according to the second dependency data table of each second task, where the second dependency data table indicates the data table produced by the upstream task on which each second task depends during execution.

In some possible implementations, the execution module is further configured to: after determining the upstream task of each second task according to the second dependency data table of each second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the execution of the upstream task of the i-th second task is completed.

In some possible implementations, the execution module is further configured to: after the N second tasks are determined when the execution of the first task is completed, register an execution trigger for each second task according to the second execution parameter of each of the N second tasks, where the execution trigger is configured to trigger the execution of the corresponding second task when the execution time of that second task arrives. In this case, executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.

In a third aspect, embodiments of the present disclosure provide an electronic device, which may include: a memory configured to store processor-executable instructions; and a processor; where the processor is configured to implement, when executing the executable instructions, the method described in the first aspect and any possible implementation thereof.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium that stores computer-executable instructions. When executed by a processor, the computer-executable instructions can implement the method described in the first aspect and any possible implementation thereof.

Compared with the existing technology, the technical solution provided by the embodiments of the present disclosure can at least achieve the following beneficial effects:

In the present disclosure, by acquiring the first task, when the execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task. When the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed. It can be seen that the downstream tasks in the present disclosure can trigger execution when the execution of the upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the scope of the present disclosure.

Description of the drawings

Figure 1 is a schematic flowchart of an implementation of a task scheduling method in an embodiment of the present disclosure;

Figure 2 is a schematic flowchart of another implementation of a task scheduling method in an embodiment of the present disclosure;

Figure 3 is a schematic diagram of the structure of task configuration in an embodiment of the present disclosure;

Figure 4 is a schematic diagram of the structure of a system table in an embodiment of the present disclosure;

Figure 5 is another schematic diagram of the structure of a system table in an embodiment of the present disclosure;

Figure 6 is a schematic flowchart of another implementation of a task scheduling method in an embodiment of the present disclosure;

Figure 7 is a schematic structural diagram of a task scheduling device in an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.

Detailed description

In the following description, specific details such as specific system structures and technologies are provided for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to illustrate the technical solutions described in the present disclosure, specific examples will be described below.

At present, with the rapid development of the Internet, the types of tasks that require the help of the network are becoming increasingly diverse. Especially in the field of big data, more and more companies are beginning to focus on data. Enterprises usually use the powerful computing power of server clusters to obtain various data reports, so that they can intuitively understand the related businesses through these reports.

As the amount of network data increases, so does the need for data analysis. For example, when a business requires big data analysis in various dimensions every day, a large number of data analysis tasks will be generated. Because the execution time of each data processing task is affected by factors such as changing computing cluster resources and the amount of data processed, it is impossible to estimate how much time a task will actually take. When users configure tasks, they cannot give the most appropriate scheduled execution time. As a result, when the upstream task completes execution, the downstream task has to wait because its scheduled execution time has not been reached, resulting in a high delay between the upstream and downstream tasks.

For example, assume that task A is executed regularly every day and produces data a. However, the structured query language (SQL) actually executed by task A generates T-2 data (T-2 represents the data of two periods before the current cycle); that is, when task A executes the task for January 3rd, the calculated output is actually the data for January 1st. When users do not understand the specific execution content of task A, they will assume that the daily execution of task A produces the data of that day (T-0). Therefore, when configuring downstream task B, they will configure a dependent time range associated with the execution time of task A, causing task B to run on empty data every day, because data a of T-0 is always generated 2 days later.

Therefore, existing task scheduling methods have the problem of high task delay due to user configuration.

In order to solve the above problems, embodiments of the present disclosure provide a task scheduling method, which is applied in the field of big data analysis technology. The execution subject of each step of the method may be an electronic device with computing and processing capabilities. In one embodiment, the electronic device can be a terminal, such as a mobile phone, a tablet computer, or a smart wearable device; in another embodiment, the electronic device can be a server, and the server can be a single server, a server cluster composed of multiple servers, or a cloud server, which is not limited in the embodiments of the present disclosure.

FIG. 1 is a schematic flowchart of an implementation of a task scheduling method in an embodiment of the present disclosure. Referring to FIG. 1 , the task scheduling method may include S101 to S103.

S101, the electronic device acquires the first task.

It should be understood that the first task can be any task whose configuration has been completed; or, the first task can be multiple tasks whose configuration has been completed. The process of configuring the task can be completed by the user entering information on a dedicated configuration page, or by the electronic device itself. The process of configuring tasks can also be completed in other ways, and this disclosure does not specifically limit this.

It should be noted that the process of configuring a task may include: configuring the dependencies of the task and configuring the execution parameters of the task. Dependencies can be used to represent execution dependencies between tasks. For example, if the execution of task A requires the execution results of task B, then the dependency between task A and task B can be that the upstream task of task A is task B. Execution parameters can be used to represent the rules that the current task follows when executing. For example, if task A follows the rule of executing once a week, then executing once a week is the execution parameter of task A.

It should be understood that after the task configuration is completed, the electronic device will store the task and its configuration information (dependencies and execution parameters) for subsequent retrieval.

In some possible implementations, S201 may also be included after S101; S201 may be executed after S101 and before S102. Figure 2 is a schematic flowchart of another implementation of a task scheduling method in an embodiment of the present disclosure.

S201. The electronic device can execute the first task according to the first dependency data table and the first execution parameter of the first task.

The first dependency data table may indicate a data table produced by an upstream task on which the first task depends during execution.

It should be understood that in the big data scenario, the upstream data required for task execution and the output data of the task execution can be obtained from the task configuration information. The upstream data refers to the data tables required to execute the current task, and the output data refers to the data table into which the data is finally written after the current task is completed. Multiple upstream data are allowed, but only one output data is allowed. Each task can produce data and obtain an output data table. The output data table can be used as the upstream data of downstream tasks (that is, as their dependency data table).

It can be seen from the above that the electronic device can obtain the first task through S101, and then obtain the configuration information of the first task. Execute S201 to obtain the first dependency data table and first execution parameters of the first task through the configuration information of the first task. Execute the first task according to the first dependency data table and the first execution parameters.

In some possible implementations, the first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. The execution granularity can be used to represent the execution cycle of the first task; the dependency granularity can be used to represent the cycle of the first dependency data table on which the first task relies during execution; the dependency time offset can be used to represent the offset value between the execution time of the first task and the output time of the first dependency data table; and the output time offset can be used to represent the offset value between the execution time of the first task and the output time of the first output data table.

It should be understood that in big data scenarios, data tables always need to be partitioned to store data. For example, partitions can include monthly partitions, weekly partitions, etc. In the monthly partition, the data will generate calculation results once a month and store them in the partition belonging to that month. Therefore, in the embodiment of the present disclosure, the execution cycle of the task and the dependency cycle of the task can be set, and the execution granularity and dependency granularity can be used to uniformly manage the cycle. Preferably, the execution period may include one month, one week, one day and one hour.

Furthermore, the execution time of a task can represent the periodic running time of the task. For example, if a task runs in an hourly cycle, the execution times of two consecutive cycles can be 2022/03/01 00:00:00 for the first and 2022/03/01 01:00:00 for the second. The output time of a task can represent the specific time of the data that the task produces each time it is executed. For example, if a task runs in a daily cycle, the output times of two consecutive cycles can be 2022/03/01 00:00:00 for the first and 2022/03/02 00:00:00 for the second.

Furthermore, a time offset can be introduced based on the execution time and the output time. The time offset can represent the offset value between the execution time and the output time, that is, execution time + offset = output time. In the embodiment of the present disclosure, since tasks are set to different granularities, it is not easy to characterize tasks with different granularities using a single offset. Therefore, the offset is divided into two indicators, granularity and offset, to adapt to tasks of different granularities. For example, the dependency time offset can represent the offset value between the execution time of the task and the output time of the upstream task, expressed as: output time of the upstream task - dependency time offset * dependency granularity = execution time of the downstream task. The output time offset can represent the offset value between the execution time of the task and the output time of the task, expressed as: task execution time + output time offset * execution granularity = data output time.
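The two relations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `shift`, `data_output_time` and `downstream_execution_time`, the granularity strings, and the convention that a T-1 offset is stored as the integer -1 are all introduced here for the example.

```python
from datetime import datetime, timedelta

# Fixed-length granularities; a month is handled separately because its
# length varies.
_FIXED = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def shift(moment: datetime, units: int, granularity: str) -> datetime:
    """Move `moment` by `units` periods of the given granularity."""
    if granularity in _FIXED:
        return moment + units * _FIXED[granularity]
    if granularity == "month":
        # Assumes the day of month exists in the target month, which holds
        # for period-aligned times such as the first day of a month.
        index = moment.year * 12 + (moment.month - 1) + units
        return moment.replace(year=index // 12, month=index % 12 + 1)
    raise ValueError(f"unsupported granularity: {granularity}")


def data_output_time(execution_time: datetime, output_offset: int,
                     execution_granularity: str) -> datetime:
    """task execution time + output time offset * execution granularity."""
    return shift(execution_time, output_offset, execution_granularity)


def downstream_execution_time(upstream_output_time: datetime,
                              dependency_offset: int,
                              dependency_granularity: str) -> datetime:
    """upstream output time - dependency time offset * dependency granularity."""
    return shift(upstream_output_time, -dependency_offset, dependency_granularity)


if __name__ == "__main__":
    # An hourly task producing T-1 data, executed at 2022/03/01 00:00:00.
    print(data_output_time(datetime(2022, 3, 1, 0, 0, 0), -1, "hour"))
    # prints 2022-02-28 23:00:00
```

Keeping the offset as a signed count of periods, separate from the granularity, mirrors the split into two indicators described above.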

In one embodiment, FIG. 3 is a schematic diagram of the structure of a task configuration in an embodiment of the present disclosure. Referring to FIG. 3 , six tasks are shown. Among them, three tasks (Task A, Task B, and Task C) are not configured with dependency data tables, so they are upstream tasks; three tasks (Task D, Task E, and Task F) are configured with dependency data tables, so they are downstream tasks.

Upstream task A produces data table a. The granularity of table a is hour level, and task A produces T-1 data, where T-1 means that the output time of task A is one hour earlier than its execution time. Upstream task B produces data table b. The granularity of table b can be day level, and task B produces T-1 data, where T-1 means that the output time of task B is one day earlier than its execution time. Upstream task C produces data table c. The granularity of table c can be day level, and task C produces T-3 data, where T-3 means that the output time of task C is three days earlier than its execution time. Downstream task D produces data table d, whose granularity can be day level. Task D depends on data table a and uses the T-1 data of table a at daily granularity, where T-1 means that the output time of upstream task A is one day earlier than the execution time of downstream task D. Downstream task E produces data table e, whose granularity can be day level. Task E depends on data tables a and b: it uses the T-2 data of table a at hourly granularity and the T-1 data of table b at daily granularity, where T-2 means that the output time of upstream task A is two hours earlier than the execution time of downstream task E, and T-1 means that the output time of upstream task B is one day earlier than the execution time of downstream task E. Downstream task F produces data table f, whose granularity can be month level. Task F depends on data table c and uses the T-1 data of table c at monthly granularity, where T-1 means that the output time of upstream task C is one month earlier than the execution time of downstream task F.
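As an illustration only, the configuration of Figure 3 could be held in a structure like the one below. The class names `Dependency` and `TaskConfig`, the field names, and the encoding of T-n as the integer -n are assumptions made for this sketch. The output offsets of tasks D, E and F are not stated in the description, so they are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dependency:
    table: str        # dependency data table
    granularity: str  # dependency granularity
    offset: int       # dependency time offset, T-n stored as -n


@dataclass(frozen=True)
class TaskConfig:
    name: str
    output_table: str
    execution_granularity: str
    output_offset: Optional[int] = None  # None: not stated in the description
    dependencies: Tuple[Dependency, ...] = ()


# The six tasks of Figure 3 as described in the text.
FIGURE_3 = [
    TaskConfig("A", "a", "hour", output_offset=-1),
    TaskConfig("B", "b", "day", output_offset=-1),
    TaskConfig("C", "c", "day", output_offset=-3),
    TaskConfig("D", "d", "day", dependencies=(Dependency("a", "day", -1),)),
    TaskConfig("E", "e", "day", dependencies=(
        Dependency("a", "hour", -2),
        Dependency("b", "day", -1),
    )),
    TaskConfig("F", "f", "month", dependencies=(Dependency("c", "month", -1),)),
]


def tasks_without_dependencies(configs):
    """Tasks with no dependency data table, i.e. the upstream tasks of Figure 3."""
    return [config.name for config in configs if not config.dependencies]


if __name__ == "__main__":
    print(tasks_without_dependencies(FIGURE_3))  # prints ['A', 'B', 'C']
```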

As an illustration, taking the task configuration in Figure 3 as an example, executing the first task in S201 according to the first dependency data table and the first execution parameter is described as follows. The electronic device determines whether output data table a of task A (abbreviated as table a) has success information at the data output time (2022/02/28 23:00:00) corresponding to the current system time (2022/03/01 00:05:36). If there is no success information, task A can be selected and its execution for the execution time corresponding to the current system time (2022/03/01 00:00:00) can be triggered.

Specifically, the conversion logic between the system time and the data output time in the above example is as follows: the unit values in the system time that are finer than the granularity of the current task's output data are erased and set to 0, and the result is the task execution date. The data output date is then obtained by applying the time period offset between the task execution date and the output data.
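The conversion described above can be sketched as follows. This is a hedged example, not the disclosed implementation: the names `truncate`, `execution_and_output_time` and `pending_execution_time` are hypothetical, only hour and day granularities are covered for brevity, and the set `succeeded` stands in for the stored success information.

```python
from datetime import datetime, timedelta
from typing import Optional, Set, Tuple

# Length of one period for the granularities covered by this sketch.
_STEP = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def truncate(system_time: datetime, granularity: str) -> datetime:
    """Zero every unit of the system time finer than the granularity."""
    if granularity == "hour":
        return system_time.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return system_time.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def execution_and_output_time(system_time: datetime, granularity: str,
                              output_offset: int) -> Tuple[datetime, datetime]:
    """Return (task execution time, data output time) for a system time."""
    execution_time = truncate(system_time, granularity)
    return execution_time, execution_time + output_offset * _STEP[granularity]


def pending_execution_time(system_time: datetime, granularity: str,
                           output_offset: int,
                           succeeded: Set[datetime]) -> Optional[datetime]:
    """Return the execution time to trigger, or None if the output already succeeded."""
    execution_time, output_time = execution_and_output_time(
        system_time, granularity, output_offset)
    return None if output_time in succeeded else execution_time


if __name__ == "__main__":
    now = datetime(2022, 3, 1, 0, 5, 36)
    print(execution_and_output_time(now, "hour", -1))
    # prints (datetime.datetime(2022, 3, 1, 0, 0), datetime.datetime(2022, 2, 28, 23, 0))
```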

S102. When the execution of the first task is completed, the electronic device determines N second tasks, and the second tasks are downstream tasks of the first task.

It should be understood that when the execution of the first task is completed, the electronic device can determine the downstream tasks of the first task according to the configuration information of the task. Due to the complex dependencies between tasks, the first task can have one downstream task or multiple downstream tasks.

In some possible implementations, when it is necessary to determine the downstream tasks of the first task, the electronic device can retrieve the configuration information stored in the database and determine the downstream tasks of the first task based on the dependency information in the configuration information. It should be understood that the configuration information can be stored in the form of a data table.

In one embodiment, Figure 4 is a schematic diagram of the structure of a system table in an embodiment of the present disclosure. Referring to Figure 4, the configuration information of a task is written into the relation table (RELATION) shown in Figure 4 for storage. The information recorded in the RELATION table can include the task name, output data table, dependency data table, output granularity, and time offset. The electronic device can obtain the configuration information of a task by querying the RELATION table. After the first task is completed, the electronic device can also store information indicating that the task was executed successfully. As shown in Figure 4, this information can be written into the task status table (DATASET_STATUE) for storage. The information recorded in the DATASET_STATUE table can include the task name, data output date, and status (used to determine whether the execution succeeded). The electronic device can find out whether a task was executed successfully by querying the DATASET_STATUE table.

For example, taking the task configuration in Figure 3 as an example, the output data granularity of task A can be hour level, so the electronic device triggers the execution of task A once every hour. Assume that task A is executed successfully at the execution time of 2022/03/01 00:00:00. Since the execution parameters of task A are configured to produce T-1 data, a record is eventually written to the DATASET_STATUE table indicating that the status of table a at the data output time (2022/02/28 23:00:00) is successful.
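A minimal sketch of the two system tables, using Python's built-in `sqlite3` module with an in-memory database, is given below. The table names RELATION and DATASET_STATUE come from the description; the column names, column types, the status value `'SUCCESS'`, and the helper names `open_system_tables`, `record_success` and `is_success` are assumptions made for this example.

```python
import sqlite3


def open_system_tables() -> sqlite3.Connection:
    """Create the two system tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RELATION ("
        " task_name TEXT, output_table TEXT, dependency_table TEXT,"
        " output_granularity TEXT, time_offset INTEGER)"
    )
    conn.execute(
        "CREATE TABLE DATASET_STATUE ("
        " task_name TEXT, output_date TEXT, status TEXT,"
        " PRIMARY KEY (task_name, output_date))"
    )
    return conn


def record_success(conn: sqlite3.Connection, task_name: str, output_date: str) -> None:
    """Store the success information for one data output date of a task."""
    conn.execute(
        "INSERT OR REPLACE INTO DATASET_STATUE VALUES (?, ?, 'SUCCESS')",
        (task_name, output_date),
    )


def is_success(conn: sqlite3.Connection, task_name: str, output_date: str) -> bool:
    """Check whether a success status is stored for the given output date."""
    row = conn.execute(
        "SELECT status FROM DATASET_STATUE WHERE task_name = ? AND output_date = ?",
        (task_name, output_date),
    ).fetchone()
    return row is not None and row[0] == "SUCCESS"


if __name__ == "__main__":
    conn = open_system_tables()
    # Task A ran at 2022/03/01 00:00:00 and produced T-1 (hourly) data.
    record_success(conn, "A", "2022/02/28 23:00:00")
    print(is_success(conn, "A", "2022/02/28 23:00:00"))  # prints True
```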

In some possible implementations, S102 may also include obtaining a first output data table when the execution of the first task is completed. The first output data table may be a data table generated when the execution of the first task is completed. Among the downstream tasks of the first task, N downstream tasks that depend on the first output data table can be determined as N second tasks.

It should be understood that after the execution of the first task is completed, the electronic device can write the output data of the first task into a table to obtain the first output data table of the first task. By querying the task configuration information stored after the task configuration is completed, the electronic device can obtain the N tasks that depend on the first output data table (that is, tasks whose dependency data tables include the first output data table) and determine these N tasks as the second tasks.

Taking the task configuration shown in Figure 3 as an example, the output data table of task A is a, and both task D and task E are configured with dependency data table a. Therefore, task D and task E are determined to be the downstream tasks of task A.
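The lookup described above can be sketched as follows. The dictionaries `OUTPUT_TABLE` and `DEPENDENCY_TABLES` and the function `second_tasks` are hypothetical stand-ins for the stored configuration information; the task and table names follow Figure 3.

```python
from typing import Dict, List

# Output data table of each task in Figure 3.
OUTPUT_TABLE: Dict[str, str] = {
    "A": "a", "B": "b", "C": "c", "D": "d", "E": "e", "F": "f",
}

# Dependency data tables configured for each task in Figure 3.
DEPENDENCY_TABLES: Dict[str, List[str]] = {
    "A": [], "B": [], "C": [],
    "D": ["a"], "E": ["a", "b"], "F": ["c"],
}


def second_tasks(first_task: str) -> List[str]:
    """Return the tasks whose dependency data tables include the first task's output table."""
    produced = OUTPUT_TABLE[first_task]
    return [
        task for task, tables in DEPENDENCY_TABLES.items()
        if produced in tables
    ]


if __name__ == "__main__":
    print(second_tasks("A"))  # prints ['D', 'E']
```

Matching on the output table rather than on the task name reflects the point that the dependency is carried by the data table.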

S103: When the upstream task of the i-th second task among the N second tasks is completed, the electronic device executes the i-th second task.

It should be understood that after the execution of a task is completed, the electronic device stores information on the completion of the task. Based on the stored information, the electronic device queries whether all upstream tasks of the i-th second task have been completed. When they have all been completed, the execution of the i-th second task can be triggered. Due to the complex dependencies between tasks, the i-th second task can have multiple upstream tasks.

Whether all the upstream tasks of the i-th second task are completed is determined by whether the upstream tasks have produced the dependency data tables required by the second task. Therefore, the dependency between the second task and its upstream tasks is established through the data tables and is independent of the tasks themselves.

For example, taking the task configuration in Figure 3, task E depends on task A. Assume that task A produces data dated 2022/02/28 23:00:00 and writes it to output data table a, and that task E uses the T-2 data of table a, which is produced at hourly granularity. According to the relation: output time of the upstream task - dependency time offset * dependency granularity = execution time of the downstream task (i.e., 2022/02/28 23:00:00 - (-2) * day = 2022/03/02 00:00:00), it can be determined that the execution time of task E is 2022/03/02 00:00:00.
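The relation used in this example can be sketched as follows. The function name is hypothetical. Note one assumption: the worked example yields 00:00:00 rather than 23:00:00, which this sketch reproduces by truncating the upstream output time to the start of its day before applying the offset; the disclosure does not state that truncation explicitly.

```python
from datetime import datetime, timedelta


def downstream_execution_time(upstream_output_time: datetime,
                              dependency_time_offset: int) -> datetime:
    """Execution time of a downstream task with day-level dependency granularity.

    Implements: output time - dependency time offset * dependency granularity.
    Assumption: the output time is first truncated to the start of the day.
    """
    day_start = upstream_output_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start - dependency_time_offset * timedelta(days=1)


# Task E depends on T-2 data of table a, so the dependency time offset is -2.
print(downstream_execution_time(datetime(2022, 2, 28, 23, 0, 0), -2))  # 2022-03-02 00:00:00
```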

When the electronic device obtains task E, which needs to be executed at 2022/03/02 00:00:00, it checks whether table a has a success status for every hour in the period from 2022/02/28 00:00:00 to 2022/02/28 23:00:00, and whether data table b has a success status for the date 2022/03/01 00:00:00. If both conditions are met, task E is executed immediately.
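The readiness check for task E can be sketched as a set comparison. The names hourly_slots and is_ready, and the representation of a success record as a (table, data time) pair, are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def hourly_slots(day: datetime) -> list:
    """The 24 hourly data times of one day, from 00:00:00 to 23:00:00."""
    return [day + timedelta(hours=h) for h in range(24)]


def is_ready(required: set, succeeded: set) -> bool:
    """A task is ready when every required (table, data time) pair has succeeded."""
    return required <= succeeded


# Task E needs every hour of table a on 2022/02/28 plus table b at 2022/03/01 00:00:00.
required = {("a", t) for t in hourly_slots(datetime(2022, 2, 28))}
required.add(("b", datetime(2022, 3, 1)))

succeeded = set(required)
print(is_ready(required, succeeded))  # True

# If a single hour of table a is missing, task E must keep waiting.
succeeded.discard(("a", datetime(2022, 2, 28, 13)))
print(is_ready(required, succeeded))  # False
```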

In some possible implementations, as shown in Figure 2, S202 may be included before S103, and S202 may be executed after S102 and before S103.

S202. The electronic device polls N second tasks; according to the second dependency data table of each second task, the upstream task of each second task can be determined. The second dependency data table may be used to indicate the data table produced by the upstream task on which each second task depends during execution.

It should be understood that when there are multiple second tasks, each second task needs to be polled. According to the configuration information stored after the task is configured, the second dependent data table of the current second task can be determined, and the task that produces the second dependent data table of the second task can be determined to be the upstream task of the current second task. For details on how to determine the upstream task based on the task configuration information, please refer to S102.

In some possible implementations, the electronic device detects whether the upstream task of each second task has produced a data table; when it is detected that the upstream task of the i-th second task has produced a data table, the electronic device determines that the execution of the upstream task of the i-th second task is completed.

It should be understood that after a task is executed, the electronic device stores the completion information. When the electronic device needs to query the information of the upstream tasks required by the i-th second task, the stored information can be retrieved. When it is determined that the dependency data table has been produced by the upstream task as of the current system time, it can be determined that the execution of the upstream task is completed.

For example, the table DATASET_STATUE in Figure 4 can be used to store information about successful execution. The electronic device can query the information about whether the task is successfully executed by querying the table DATASET_STATUE.
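A minimal sketch of such a status table, using Python's built-in sqlite3 module, is shown below. The column names (dataset_name, data_time, status), the 'SUCCESS' value, and the helper functions are assumptions; the disclosure only states that the table records whether execution succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; only the table name comes from the text.
conn.execute(
    "CREATE TABLE DATASET_STATUE ("
    " dataset_name TEXT, data_time TEXT, status TEXT,"
    " PRIMARY KEY (dataset_name, data_time))"
)


def mark_success(conn, dataset_name, data_time):
    """Record that a data table was produced successfully for a data time."""
    conn.execute(
        "INSERT OR REPLACE INTO DATASET_STATUE VALUES (?, ?, 'SUCCESS')",
        (dataset_name, data_time),
    )


def has_success(conn, dataset_name, data_time):
    """Return True if a success record exists for the table and data time."""
    row = conn.execute(
        "SELECT 1 FROM DATASET_STATUE"
        " WHERE dataset_name = ? AND data_time = ? AND status = 'SUCCESS'",
        (dataset_name, data_time),
    ).fetchone()
    return row is not None


mark_success(conn, "a", "2022-02-28 23:00:00")
print(has_success(conn, "a", "2022-02-28 23:00:00"))  # True
print(has_success(conn, "a", "2022-02-28 22:00:00"))  # False
```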

In some possible implementations, as shown by the dotted box and dotted arrow in Figure 2, S203 may also be included after S102, and S203 may be executed after S102 and before S103. Alternatively, S203 may be included after S102, and S202 may be executed after S203.

S203. According to the second execution parameter of each second task among the N second tasks, the electronic device can register an execution trigger for each second task, and the execution trigger is configured to trigger the execution of the second task when the execution time of the corresponding second task arrives.

It should be understood that the second execution parameter can be used to determine the execution time of the second task. After determining the execution time, the electronic device registers an execution trigger with that execution time for each second task. When the execution time is reached, the electronic device can trigger the execution of the second task.

In some possible implementations, the execution time of the second task may be determined by the time at which the upstream task produces the dependency data table of the second task. Therefore, the second execution parameters may include dependency granularity and dependency time offset.

For example, take the task configuration in Figure 3 and refer to the calculation process of S103. Assume that the execution date of task A is 2022/03/01 00:00:00. When task A is executed successfully, since the downstream task E depends on the T-2 data of table a at daily granularity, a trigger for task E with an execution date of 2022/03/02 00:00:00 will be registered.

In one embodiment, Figure 5 is another schematic diagram of the structure of a system table in an embodiment of the present disclosure. Referring to Figure 5, the information of the execution trigger of each second task can be recorded in the trigger table (TRIGGER). The information recorded in the table TRIGGER includes the task name, the task execution date and the status (used to determine whether the execution is successful). The electronic device can obtain the information of the second task corresponding to the current trigger by querying the TRIGGER table.
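Trigger registration can be sketched with a small in-memory structure holding the three recorded fields. The class Trigger, the status value "PENDING", and the choice to make registration idempotent (so that repeated hourly successes of task A do not create duplicate triggers for task E) are assumptions of this sketch, not details given in the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Trigger:
    task_name: str
    execution_date: datetime
    status: str = "PENDING"  # hypothetical label for the to-be-triggered state


def register_trigger(trigger_table: list, task_name: str, execution_date: datetime) -> Trigger:
    """Register a trigger unless one already exists for the same task and date."""
    for trigger in trigger_table:
        if trigger.task_name == task_name and trigger.execution_date == execution_date:
            return trigger
    trigger = Trigger(task_name, execution_date)
    trigger_table.append(trigger)
    return trigger


trigger_table = []
first = register_trigger(trigger_table, "E", datetime(2022, 3, 2))
second = register_trigger(trigger_table, "E", datetime(2022, 3, 2))
print(len(trigger_table), first is second)  # 1 True
```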

In some possible embodiments, as shown in the dotted box and dotted arrow in Figure 2, S204 may also be included after S203, and S204 may be executed after S203 and before S103. Alternatively, S202 may be included before S204, S202 may be executed after S203, and S204 may be executed after S202.

S204. When the execution of the upstream task of the i-th second task is completed, the electronic device can trigger the execution trigger corresponding to the i-th second task.

It should be understood that by retrieving stored data, the electronic device can determine whether all the upstream tasks of the second task in the trigger have a success status. If all the upstream tasks of the second task have been successfully executed, the execution of the current task is triggered immediately.

Among them, the fact that all the upstream tasks of the second task have been executed successfully means that the dependent data tables of the second task have been produced at the current moment.

For example, the electronic device obtains all triggers in the to-be-triggered state at intervals of 5 seconds, and then checks the corresponding tasks to be executed in those triggers. For each trigger, the electronic device obtains the upstream tasks of the corresponding task and queries their status. If the upstream tasks have produced the dependency data tables, the corresponding task in the trigger is immediately triggered to execute.
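One polling pass of this kind can be sketched as follows. The dictionary layout of a trigger, the status labels, and the callables upstream_ready and run_task are hypothetical; only the 5-second interval and the "check upstream, then execute immediately" behaviour come from the text.

```python
import time


def poll_once(triggers, upstream_ready, run_task):
    """Fire every pending trigger whose upstream dependency tables are all produced."""
    fired = []
    for trigger in triggers:
        if trigger["status"] != "PENDING":
            continue
        if upstream_ready(trigger["task"], trigger["execution_date"]):
            run_task(trigger["task"], trigger["execution_date"])
            trigger["status"] = "SUCCESS"
            fired.append(trigger["task"])
    return fired


def poll_until_done(triggers, upstream_ready, run_task, interval_seconds=5.0):
    """Repeat the polling pass at a fixed interval while any trigger is pending."""
    while any(t["status"] == "PENDING" for t in triggers):
        poll_once(triggers, upstream_ready, run_task)
        time.sleep(interval_seconds)


# Usage: task E is ready, task D is still waiting for its upstream table.
triggers = [
    {"task": "E", "execution_date": "2022-03-02", "status": "PENDING"},
    {"task": "D", "execution_date": "2022-03-02", "status": "PENDING"},
]
executed = []
fired = poll_once(
    triggers,
    upstream_ready=lambda task, date: task == "E",
    run_task=lambda task, date: executed.append(task),
)
print(fired)  # ['E']
```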

In this embodiment, through S101 to S103, the first task can be obtained, and when the execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task. When the upstream task of the i-th second task among the N second tasks is completed, the i-th second task can be executed. It can be seen that the downstream tasks in the present disclosure can be executed when the execution of the upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.

In this embodiment, it can be seen from S101 to S103 and S201 to S204 that the first task is obtained, and when the execution of the first task is completed, N second tasks (downstream tasks) can be determined and execution triggers can be registered for the N second tasks. When it is found that the upstream tasks of the i-th second task are completed, the execution trigger can be triggered to execute the i-th second task. The electronic device can obtain the tasks in the triggers and execute them at very short intervals. During the entire process, the delay in scheduling upstream and downstream tasks is maintained at the second level, which can effectively reduce the delay between upstream and downstream tasks. At the same time, the output data table is directly used as the upstream, and the granularity and dependency time offset are configured according to the situation of the current task. This avoids the problems of difficult configuration and configuration errors caused by inconsistent execution time and output time when tasks are used as the upstream.

The following uses examples to illustrate the task scheduling process in the embodiment of the present disclosure.

FIG. 6 is a schematic flowchart of another implementation of the task scheduling method in an embodiment of the present disclosure. As shown in Figure 6, it includes:

S601, the electronic device parses the task, configures the dependency information and execution parameters of the task, and enters S602;

First, the electronic device can perform syntax analysis on the user's SQL task through a syntax parsing tool, obtain from the parsing result the dependency data tables and the output data table of the task, and supplement the configuration information of the task; second, the electronic device can configure the execution parameters of the task according to how the user's SQL is used.

S602, the electronic device can store the dependency information and execution parameters of the task and enter S603;

Among them, the electronic device can enter the dependency information and execution parameters of the task into the table RELATION.

S603, the electronic device can obtain all upstream tasks and determine the task execution status, and enter S604;

Among them, the electronic device can obtain all tasks without upstream data at 5s intervals, and determine whether the task's output data table has success information under the data output date corresponding to the current system time.

S604, the electronic device can obtain all unexecuted upstream tasks and trigger the execution of the tasks, entering S605;

Among them, the electronic device can filter out the data tables for which no successful information is found, obtain the corresponding tasks through the RELATION table, and trigger the execution of tasks corresponding to the current system time.

S605, the electronic device can register an execution trigger for the downstream task after the execution of the upstream task is completed, and enter S606;

Among them, after the task execution is completed, the electronic device can enter the corresponding information on the success of the output data into the table DATASET_STATUE. The electronic device can find the downstream tasks of the task through the RELATION table, register execution triggers for the downstream tasks, and store the execution trigger information in the TRIGGER table.

S606, the electronic device queries the execution trigger, determines that all corresponding upstream tasks are successful, and enters S607;

Among them, the electronic device can obtain all execution triggers in the to-be-triggered status from the table TRIGGER at intervals of 5 seconds, and check whether the upstream tasks of the corresponding tasks in the execution triggers have a success status in the table DATASET_STATUE.

S607, execute downstream tasks.

Among them, when the query of the table DATASET_STATUE shows that the upstream dependency tables have a success status, the electronic device immediately executes the current task and changes the status of the corresponding trigger in the table TRIGGER to successful. The electronic device can execute all tasks and implement task scheduling by repeating S604 to S607.
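The table extraction mentioned in S601 can be illustrated with a deliberately naive sketch. The disclosure refers to a syntax parsing tool without naming one; the regular expressions below are a stand-in for such a tool, handle only simple INSERT ... SELECT statements, and the function name parse_tables is hypothetical.

```python
import re


def parse_tables(sql: str) -> dict:
    """Extract the output table and the dependency tables from a simple SQL task.

    Naive regex stand-in for a real SQL parser: it ignores subqueries,
    CTEs, comments and quoted identifiers.
    """
    outputs = re.findall(
        r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", sql, flags=re.I
    )
    inputs = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.I)
    return {
        "output": outputs[0] if outputs else None,
        "depends_on": sorted(set(inputs) - set(outputs)),
    }


sql = "INSERT INTO e SELECT a.id, b.v FROM a JOIN b ON a.id = b.id"
print(parse_tables(sql))  # {'output': 'e', 'depends_on': ['a', 'b']}
```

The resulting dictionary corresponds to the dependency information that S602 stores in the table RELATION.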

In the embodiment of the present disclosure, it can be known from S601 to S607 that the electronic device can obtain the downstream tasks through the upstream task and register an execution trigger for each downstream task. The electronic device continuously polls the execution triggers and queries whether the upstream tasks of the task corresponding to each execution trigger are completed. If all upstream tasks are completed, the downstream task is executed. It can be seen that after each task in this disclosure is executed, its downstream tasks can be quickly found through the data lineage maintained in the database, and execution triggers are registered for them. The electronic device obtains the tasks in the execution triggers and executes them at extremely short intervals. During the entire process, the delay in scheduling upstream and downstream tasks is maintained at the second level, achieving low latency in task scheduling. Furthermore, the output data table is directly used as the upstream, and the granularity and dependency time offset are configured according to the situation of the current task. This avoids the problems of difficult configuration and configuration errors caused by inconsistent execution time and output time when tasks are used as the upstream.
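The flow of S603 to S607 can be condensed into a small in-memory simulation. This is a sketch under simplifying assumptions: the time dimension (execution dates, granularity, offsets) is ignored, the three system tables are replaced by Python structures, and the lineage in RELATION (tasks A and B with no upstream, task E depending on tables a and b) is hypothetical.

```python
# Hypothetical lineage: task name -> (output table, dependency tables).
RELATION = {
    "A": ("a", []),
    "B": ("b", []),
    "E": ("e", ["a", "b"]),
}


def run_schedule(relation):
    dataset_status = set()  # tables with a success status (stand-in for DATASET_STATUE)
    triggers = {}           # task name -> status (stand-in for TRIGGER)
    order = []              # execution order, for inspection

    def execute(task):
        output_table = relation[task][0]
        order.append(task)
        dataset_status.add(output_table)            # S605: record output success
        for name, (_, deps) in relation.items():    # S605: register downstream triggers
            if output_table in deps:
                triggers.setdefault(name, "PENDING")

    # S603/S604: execute tasks without upstream data whose output is not yet successful.
    for task, (output_table, deps) in relation.items():
        if not deps and output_table not in dataset_status:
            execute(task)

    # S606/S607: poll pending triggers until no further trigger can fire.
    progressed = True
    while progressed:
        progressed = False
        for task, status in list(triggers.items()):
            deps = relation[task][1]
            if status == "PENDING" and all(d in dataset_status for d in deps):
                execute(task)
                triggers[task] = "SUCCESS"
                progressed = True
    return order, triggers


print(run_schedule(RELATION))  # (['A', 'B', 'E'], {'E': 'SUCCESS'})
```

In this sketch task E runs as soon as both of its dependency tables are marked successful, without waiting for a fixed clock time, which is the behaviour the summary above attributes to the method.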

Based on the same inventive concept, embodiments of the present disclosure also provide a task scheduling device. The task scheduling device may be a chip or a system-on-chip in an electronic device, or may be a functional module in an electronic device for implementing the methods of the above embodiments. The task scheduling device can realize the functions performed by the electronic device in the above embodiments, and these functions can be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

Figure 7 is a schematic structural diagram of a task scheduling device in an embodiment of the present disclosure. Referring to Figure 7, the task scheduling device 700 may include: an acquisition module 701 configured to acquire the first task; a determination module 702 configured to determine N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task and N is a positive integer; and an execution module 703 configured to execute the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed.

In some possible implementations, the acquisition module 701 is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameters of the first task; the first dependency data table indicates the data table produced by the upstream task that the first task relies on when executing.

In some possible implementations, the determination module 702 is further configured to: when the execution of the first task is completed, obtain the first output data table, where the first output data table is the data table generated when the execution of the first task is completed; and among the downstream tasks of the first task, determine N downstream tasks that depend on the first output data table as the N second tasks.

In some possible implementations, the first execution parameter includes at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset; wherein the execution granularity is used to represent the execution period of the first task; the dependency granularity is used to represent the period of the first dependency data table that the first task relies on when executing; the dependency time offset is used to represent the offset value between the execution time of the first task and the output time of the first dependency data table; and the output time offset is used to represent the offset value between the execution time of the first task and the output time of the first output data table.

In some possible implementations, the execution module 703 is further configured to: before executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and according to the second dependency data table of each second task, determine the upstream task of each second task, where the second dependency data table is used to indicate the data table produced by the upstream task that each second task depends on when executing.

In some possible implementations, the execution module 703 is further configured to: after determining the upstream task of each second task according to the second dependency data table of each second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the execution of the upstream task of the i-th second task is completed.

In some possible implementations, the execution module 703 is further configured to: after determining the N second tasks when the execution of the first task is completed, register an execution trigger for each second task according to the second execution parameter of each second task among the N second tasks, where the execution trigger is configured to trigger the execution of the second task when the execution time of the corresponding second task arrives; and executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.

It should be noted that for the specific implementation process of the acquisition module 701, the determination module 702 and the execution module 703, reference can be made to the detailed description of the embodiments in Figures 1 to 6. For the sake of simplicity of the description, they will not be described again here.

Based on the same inventive concept, embodiments of the present disclosure provide an electronic device, which may be the electronic device described in one or more of the above embodiments. FIG. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 800 can use general computer hardware, including a processor 801 and a memory 802.

In some possible implementations, the at least one processor may constitute any physical device having circuitry that performs logical operations on one or more inputs. For example, at least one processor may include one or more integrated circuits (ICs), including application specific integrated circuits (ASICs), microchips, microcontrollers, microprocessors, all or part of a central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), field programmable gate array (FPGA), or other circuitry suitable for executing instructions or performing logical operations. Instructions for execution by the at least one processor may, for example, be preloaded into memory integrated with or embedded in the controller, or may be stored in separate memory. Memory may include random access memory (RAM), read only memory (ROM), hard disk, optical disk, magnetic media, flash memory, other permanent, fixed or volatile memory, or any other mechanism capable of storing instructions. In some embodiments, at least one processor may include more than one processor. Each processor may have a similar structure, or the processors may have different configurations that are electrically connected or disconnected from each other. For example, the processors may be separate circuits or integrated in a single circuit. When more than one processor is used, the processors may be configured to operate independently or cooperatively. Processors may be coupled electrically, magnetically, optically, acoustically, mechanically, or by other means that allow them to interact.

According to one embodiment of the present invention, the present invention also provides a computer-readable storage medium on which computer instructions are stored, and the instructions are used by a processor to execute the steps of the method for scheduling tasks. Memory 802 may include computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory and/or random access memory. Memory 802 may store operating systems, application programs, other program modules, executable code, program data, user data, etc.

In addition, the above-mentioned memory 802 stores computer-executable instructions for implementing the functions of the acquisition module 701, the determination module 702 and the execution module 703 in Figure 7. The functions and implementation processes of the acquisition module 701, the determination module 702 and the execution module 703 in Figure 7 can all be implemented by the processor 801 in Figure 8 calling the computer-executable instructions stored in the memory 802. For the specific implementation processes and functions, please refer to the above-mentioned related embodiments.

Those skilled in the art can understand that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present invention.

The above-described embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the above embodiments, or equivalently replace some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of each embodiment of the present invention, and should be included within the protection scope of the present invention.

Industrial applicability

Embodiments of the present disclosure provide a task scheduling method, apparatus and equipment. The technical solution provided by the embodiments of the present disclosure obtains the first task and determines N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task. When the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed. It can be seen that the downstream tasks in the present disclosure can trigger execution when the execution of the upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.

In addition, it can be understood that the task scheduling methods, devices and equipment provided by the embodiments of the present disclosure are reproducible and can be used in a variety of industrial applications. For example, the task scheduling method, apparatus and equipment provided by the embodiments of the present disclosure can be used in the field of big data analysis technology, such as the field of task scheduling in big data analysis.

Claims

A method of task scheduling, wherein the method includes:

Get the first task;

When the execution of the first task is completed, N second tasks are determined. The second tasks are downstream tasks of the first task, and N is a positive integer;

When the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed.
The method according to claim 1, wherein after obtaining the first task, the method further includes:

Execute the first task according to the first dependency data table and first execution parameters of the first task; the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.
The method according to claim 1 or 2, wherein when the execution of the first task is completed, determining N second tasks includes:

When the execution of the first task is completed, a first output data table is obtained, and the first output data table is the data table generated when the execution of the first task is completed;

Among the downstream tasks of the first task, N downstream tasks that depend on the first output data table are determined as the N second tasks.
The method of claim 2, wherein the first execution parameter includes at least one of the following: execution granularity, dependency granularity, dependency time offset and output time offset;

Wherein, the execution granularity is used to represent the execution cycle of executing the first task;

The dependency granularity is used to represent the cycle of the first dependency data table that the first task relies on when executing;

The dependency time offset is used to represent the offset value between the execution time of the first task and the output time of the first dependency data table;

The output time offset is used to represent an offset value between the execution time of executing the first task and the output time of the first task output data table.
The method of claim 4, wherein the execution period includes at least one of the following: one month, one week, one day, and one hour.
The method according to any one of claims 1 to 5, wherein before the i-th second task is executed when the upstream task of the i-th second task among the N second tasks is completed, the method further includes:

Poll the N second tasks;

Determine the upstream tasks of each second task according to the second dependency data table of each second task, wherein the second dependency data table is used to indicate the data table produced by the upstream task on which each second task depends during execution.
The method according to claim 6, wherein after determining the upstream task of each second task according to the second dependency data table of each second task, the method further includes:

Detect whether the upstream task of each second task produces a data table;

When it is detected that the upstream task of the i-th second task has produced a data table, it is determined that the execution of the upstream task of the i-th second task is completed.
The method according to any one of claims 1 to 7, wherein, after determining N second tasks when the execution of the first task is completed, the method further includes:

According to the second execution parameter of each second task among the N second tasks, an execution trigger is registered for each second task, and the execution trigger is configured to trigger the execution of the second task when the execution time of the corresponding second task arrives;

When the upstream task of the i-th second task among the N second tasks is completed, executing the i-th second task includes:

When the execution of the upstream task of the i-th second task is completed, the execution trigger corresponding to the i-th second task is triggered.
The method of claim 3, wherein executing the first task according to the first dependency data table and the first execution parameter of the first task includes: determining whether the output data table of the first task has success information at the data output time corresponding to the current system time; and if there is no success information, selecting the first task and triggering execution of the first task for the time corresponding to the current system time.
The method according to any one of claims 1 to 9, wherein when determining the downstream tasks of the first task, the configuration information stored in the database is retrieved, and the downstream tasks of the first task are determined based on the dependency information in the configuration information.
The method according to any one of claims 1 to 10, wherein after the execution of the first task is completed, information indicating the successful execution of the first task is stored in a task status table, wherein the information in the task status table includes task name, data output date, and status.
A device for task scheduling, wherein the device includes:

An acquisition module, the acquisition module is configured to acquire the first task;

Determining module, the determining module is configured to determine N second tasks when the execution of the first task is completed, the second tasks are downstream tasks of the first task, and N is a positive integer; and

An execution module, the execution module is configured to execute the i-th second task when the upstream task of the i-th second task among the N second tasks is completed.
The device according to claim 12, wherein the acquisition module is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameter of the first task; the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.
The device according to claim 12 or 13, wherein the determination module is further configured to: when the execution of the first task is completed, obtain a first output data table, the first output data table being the data table generated when the execution of the first task is completed; and among the downstream tasks of the first task, determine N downstream tasks that depend on the first output data table as the N second tasks.
The apparatus according to claim 13, wherein the first execution parameter includes at least one of the following: execution granularity, dependency granularity, dependency time offset and output time offset;

Wherein, the execution granularity is used to represent the execution cycle of executing the first task; the dependency granularity is used to represent the cycle of the first dependency data table that the first task relies on when executing; the dependency time offset is used to represent the offset value between the execution time of the first task and the output time of the first dependency data table; the output time offset is used to represent the offset value between the execution time of executing the first task and the output time of the first task output data table.
The device according to claim 15, wherein the execution period includes at least one of the following: one month, one week, one day, and one hour.
The apparatus according to any one of claims 12 to 16, wherein the execution module is further configured to: before executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and according to the second dependency data table of each second task, determine the upstream task of each second task, wherein the second dependency data table is used to indicate a data table produced by an upstream task on which each second task depends during execution.
The apparatus according to any one of claims 12 to 17, wherein the execution module is further configured to: after determining the upstream task of each second task according to the second dependency data table of each second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the execution of the upstream task of the i-th second task is completed.
The apparatus according to any one of claims 12 to 18, wherein the execution module is further configured to: after determining N second tasks when the execution of the first task is completed, register an execution trigger for each second task according to the second execution parameter of each second task among the N second tasks, the execution trigger being configured to trigger the execution of the second task when the execution time of the corresponding second task arrives; and wherein executing the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
An electronic device, wherein the electronic device includes:

memory configured to store instructions executable by the processor; and

A processor; wherein the processor is configured to implement the method according to any one of claims 1 to 11 when executing the executable instructions.
A computer-readable storage medium, wherein the readable storage medium stores an executable program, wherein the executable program implements the method according to any one of claims 1 to 11 when executed by a processor.