WO2023213118A1 - Task scheduling method and apparatus, and device - Google Patents

Task scheduling method and apparatus, and device

Info

Publication number
WO2023213118A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
execution
tasks
data table
upstream
Prior art date
Application number
PCT/CN2023/078004
Other languages
French (fr)
Chinese (zh)
Inventor
武浩瑞
张韬
Original Assignee
北京快乐茄信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京快乐茄信息技术有限公司
Publication of WO2023213118A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Definitions

  • The present disclosure relates to the technical field of big data analysis, and in particular to a task scheduling method, apparatus, and device.
  • Traditional task scheduling mainly relies on periodic scheduled execution and uses tasks as upstream dependencies. Because users can rarely specify the most appropriate scheduled execution time when configuring tasks, traditional task scheduling methods suffer from high latency.
  • Embodiments of the present disclosure provide a task scheduling method, apparatus, and device to achieve effects such as reducing delays between upstream and downstream tasks.
  • Embodiments of the present disclosure provide a task scheduling method, including: obtaining a first task; when the execution of the first task is completed, determining N second tasks, where the second tasks are downstream tasks of the first task and N is a positive integer; and when the upstream task of the i-th second task among the N second tasks is completed, executing the i-th second task.
  • The method may further include: executing the first task according to the first dependency data table and the first execution parameter of the first task, where the first dependency data table indicates the data table produced by the upstream tasks that the first task depends on during execution.
  • Determining N second tasks includes: when the execution of the first task is completed, obtaining the first output data table, which is the data table produced when the execution of the first task is completed; and, among the downstream tasks of the first task, determining the N downstream tasks that depend on the first output data table as the N second tasks.
  • The first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. Execution granularity represents the execution cycle of the first task; dependency granularity represents the cycle of the first dependency data table that the first task relies on when executing; dependency time offset represents the offset value between the execution time of the first task and the output time of the first dependency data table; and output time offset represents the offset value between the execution time of the first task and the output time of the first output data table.
  • the execution period may include at least one of the following: one month, one week, one day, and one hour.
  • Before executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed, the method further includes: polling the N second tasks; and determining the upstream task of each second task according to the second dependency data table of each second task, where the second dependency data table indicates the data table output by the upstream task on which each second task depends during execution.
  • The method further includes: detecting whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determining that the upstream task of the i-th second task has been executed.
  • The method may further include: registering an execution trigger for each second task according to the second execution parameters of each of the N second tasks, where the execution trigger is configured to trigger the execution of the corresponding second task when its execution time arrives. In this case, executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
  • Executing the first task includes: determining whether the output data table of the first task has success information at the data output time corresponding to the current system time; if there is no success information, the first task is filtered out and its execution is triggered for the execution time corresponding to the current system time.
  • the configuration information stored in the database is retrieved, and the downstream tasks of the first task are determined based on the dependency information in the configuration information.
  • Information indicating the successful execution of the first task is stored in a task status table, where the information in the task status table includes task name, data output date, and status.
  • Embodiments of the present disclosure provide a device for task scheduling.
  • The device may be a chip or a system-on-chip in an electronic device, or a functional module in an electronic device configured to implement the method of the first aspect and any of its possible implementations.
  • The task scheduling device can realize the functions performed by the electronic device in the first aspect and any of its possible implementations, and the functions can be realized by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions.
  • The device for task scheduling includes: an acquisition module configured to acquire a first task; a determination module configured to determine N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task and N is a positive integer; and an execution module configured to execute the i-th second task when the upstream task of the i-th second task among the N second tasks is completed.
  • The acquisition module is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameter of the first task, where the first dependency data table indicates the data table produced by the upstream tasks that the first task depends on during execution.
  • The determination module is further configured to: when the execution of the first task is completed, obtain the first output data table, which is the data table generated when the execution of the first task is completed; and, among the downstream tasks of the first task, determine the N downstream tasks that depend on the first output data table as the N second tasks.
  • The first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. Execution granularity represents the execution cycle of the first task; dependency granularity represents the cycle of the first dependency data table that the first task relies on when executing; dependency time offset represents the offset value between the execution time of the first task and the output time of the first dependency data table; and output time offset represents the offset value between the execution time of the first task and the output time of the first output data table.
  • the execution period includes at least one of the following: one month, one week, one day, and one hour.
  • The execution module is further configured to: before executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and determine the upstream tasks of each second task according to the second dependency data table of each second task, where the second dependency data table indicates the data table output by the upstream tasks on which each second task depends during execution.
  • The execution module is further configured to: after determining the upstream task of each second task according to the second dependency data table of each second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the upstream task of the i-th second task has been executed.
  • The execution module is further configured to: after the N second tasks are determined when the execution of the first task is completed, register an execution trigger for each second task according to the second execution parameter of each of the N second tasks, where the execution trigger is configured to trigger the execution of the corresponding second task when its execution time arrives. In this case, executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed includes: when the execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
  • Embodiments of the present disclosure provide an electronic device, which may include: a memory configured to store processor-executable instructions; and a processor, where the processor is configured to implement, when executing the executable instructions, the method described in the first aspect and any possible implementation thereof.
  • Embodiments of the present disclosure provide a computer-readable storage medium that stores computer-executable instructions which, when executed by a processor, implement the method of the first aspect and any of its possible implementations.
  • In the present disclosure, a first task is obtained and, when the execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task.
  • When the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed.
  • In this way, the downstream tasks in the present disclosure can be triggered as soon as the execution of their upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.
  • Figure 1 is a schematic flowchart of an implementation of a task scheduling method in an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of another implementation of a task scheduling method in an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of the structure of task configuration in an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of the structure of a system table in an embodiment of the present disclosure.
  • Figure 5 is another schematic diagram of the structure of a system table in an embodiment of the present disclosure.
  • Figure 6 is a schematic flowchart of another implementation of a task scheduling method in an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of a task scheduling device in an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • T-2 means that the data calculated in the current cycle is the data from two periods earlier.
  • The calculated output is actually the data for January 1st.
  • If users do not understand the specific execution content of task A, they will assume that each daily execution of task A produces the data of that day (T-0). When configuring downstream task B, they will therefore configure a dependency time range tied to the execution time of task A, causing task B to run on empty data every day, because data a of T-0 is always generated 2 days later.
  • the execution subject of each step of the method may be an electronic device with computing and processing capabilities.
  • The electronic device can be a terminal, such as a mobile phone, a tablet computer, or a smart wearable device; in another embodiment, the electronic device can be a server, and the server can be a single server, a server cluster composed of multiple servers, or a cloud server, which is not limited in the embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of an implementation of a task scheduling method in an embodiment of the present disclosure.
  • the task scheduling method may include S101 to S103.
  • the electronic device acquires the first task.
  • the first task can be any task that completes the configuration; or, the first task can be multiple tasks that complete the configuration.
  • the process of configuring the task can be completed by the user inputting on a special information configuration page; or the process of configuring the task can be completed by the electronic device setting itself.
  • the process of configuring tasks can also be completed in other ways, and this disclosure does not specifically limit this.
  • the process of configuring a task may include: configuring dependencies of the task and configuring execution parameters of the task.
  • Dependency relationships can be used to represent execution dependencies between tasks. For example, if the execution of task A requires the execution results of task B, then the dependency relationship between task A and task B is that task B is the upstream task of task A.
  • Execution parameters can be used to represent the rules that the current task follows when executing. For example, when task A is executed, it follows the rule of executing once a week, so that executing once a week is the execution parameter of task A.
  • the electronic device will store the task and its configuration information (dependencies and execution parameters) for subsequent retrieval.
  • S201 may also be included after S101; that is, S201 may be executed after S101 and before S102.
  • Figure 2 is a schematic flowchart of an implementation of a task scheduling method in an embodiment of the present disclosure.
  • the electronic device can execute the first task according to the first dependency data table and the first execution parameter of the first task.
  • the first dependency data table may indicate a data table produced by an upstream task on which the first task depends during execution.
  • the upstream data required for task execution and the output data of the task execution can be obtained.
  • The upstream data refers to the data tables required to execute the current task, and the output data refers to the data table into which data is finally written after the current task is completed.
  • A task may have multiple upstream data tables but only one output data table.
  • Each task can produce data and thereby obtain an output data table.
  • The output data table can be used as the upstream data of downstream tasks (that is, as a dependency data table of those tasks).
  • the electronic device can obtain the first task through S101, and then obtain the configuration information of the first task.
  • The electronic device then executes S201: it obtains the first dependency data table and the first execution parameters of the first task from the configuration information of the first task, and executes the first task according to them.
  • the first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset.
  • Execution granularity can be used to represent the execution cycle of the first task.
  • Dependency granularity can be used to represent the cycle of the first dependency data table that the first task relies on when executing.
  • Dependency time offset can be used to represent the offset value between the execution time of the first task and the output time of the first dependency data table.
  • Output time offset can be used to represent the offset value between the execution time of the first task and the output time of the first output data table.
  • partitions can include monthly partitions, weekly partitions, etc.
  • the execution cycle of the task and the dependency cycle of the task can be set, and the execution granularity and dependency granularity can be used to uniformly manage the cycle.
  • the execution period may include one month, one week, one day and one hour.
  • the execution time of a task can represent the periodic running time of the task.
  • For example, if a task runs in an hourly cycle, the execution times of two consecutive runs can be: first run at 2022/03/01 00:00:00, second run at 2022/03/01 01:00:00.
  • The output time of a task can represent the specific time for which the task produces data each time it is executed. For example, if a task runs in a daily cycle, the output times of two consecutive runs can be: first run at 2022/03/01 00:00:00, second run at 2022/03/02 00:00:00.
  • a time offset can be introduced based on the execution time and output time.
  • The offset is divided into two indicators, granularity and offset, to adapt to tasks of different granularities.
  • The dependency time offset can represent the offset value between the execution time of the task and the output time of the upstream task, expressed as: output time of the upstream task = execution time of the downstream task - dependency time offset * dependency granularity.
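  • As a rough illustration of the offset arithmetic described above, the following Python sketch computes the upstream output time that a downstream run depends on. The function name `upstream_output_time` and the `GRANULARITY` table are hypothetical, and the sketch assumes that a T-k dependency is stored as the positive integer k, so the upstream output time lies k periods before the downstream execution time. Monthly granularity is omitted because a month has no fixed length.

```python
from datetime import datetime, timedelta

# Fixed-length granularities only; month is omitted because its length varies.
GRANULARITY = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def upstream_output_time(downstream_execution_time: datetime,
                         dependency_granularity: str,
                         dependency_time_offset: int) -> datetime:
    """Return the upstream output time that a downstream run depends on.

    Assumption: a T-k dependency is stored as the positive integer k.
    """
    step = GRANULARITY[dependency_granularity]
    return downstream_execution_time - dependency_time_offset * step
```

  • Under these assumptions, a downstream task executing at 2022/03/02 00:00:00 with a daily T-1 dependency would look for upstream data with output time 2022/03/01 00:00:00.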
  • FIG. 3 is a schematic diagram of the structure of a task configuration in an embodiment of the present disclosure.
  • six tasks are shown. Among them, three tasks (Task A, Task B, and Task C) are not configured with dependency data tables, so they are upstream tasks; three tasks (Task D, Task E, and Task F) are configured with dependency data tables, so they are downstream tasks.
  • upstream task A outputs data table a.
  • the granularity of table a is hourly level and outputs data of T-1.
  • T-1 can mean that the output time of task A is one hour earlier than the execution time.
  • Upstream task B outputs data table b.
  • the granularity of table b can be day level, and outputs data of T-1; where T-1 means that the output time of task B is one day earlier than the execution time.
  • the upstream task C outputs data table c.
  • the granularity of table c can be day level and produces T-3 data; where T-3 means that the output time of task C is three days earlier than the execution time.
  • Downstream task D outputs data table d, and the granularity of table d can be day level.
  • Task D depends on data table a and uses the T-1 data of table a at daily granularity, where T-1 means that the output time of upstream task A is one day earlier than the execution time of downstream task D.
  • Downstream task E produces data table e.
  • the granularity of table e can be day-level.
  • Task E depends on data table a and data table b: it uses the T-2 data of table a at hourly granularity and the T-1 data of table b at daily granularity. Here, T-2 means that the output time of upstream task A is two hours earlier than the execution time of downstream task E, and T-1 means that the output time of upstream task B is one day earlier than the execution time of downstream task E.
  • Downstream task F outputs data table f, and the granularity of table f can be monthly level.
  • T-1 means that the output time of upstream task C is two months earlier than the execution time of downstream task F.
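  • The task configuration described for Figure 3 could be held in memory roughly as follows. This is a minimal sketch, not the disclosed implementation: the names `Granularity`, `Dependency`, and `TaskConfig` and all field names are hypothetical, only tasks B and D are shown, and the output time offset of task D is not stated in the text, so the value 0 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Granularity(Enum):
    """Cycle lengths named in the text: hour, day, week, month."""
    HOUR = "hour"
    DAY = "day"
    WEEK = "week"
    MONTH = "month"


@dataclass(frozen=True)
class Dependency:
    """One dependency data table plus how it is consumed (hypothetical shape)."""
    table: str
    granularity: Granularity  # dependency granularity
    time_offset: int          # dependency time offset, e.g. 1 for T-1


@dataclass(frozen=True)
class TaskConfig:
    """One task: a single output table and any number of dependency tables."""
    name: str
    output_table: str
    execution_granularity: Granularity
    output_time_offset: int  # e.g. 1 for T-1 output
    dependencies: Tuple[Dependency, ...] = ()

    @property
    def has_no_upstream(self) -> bool:
        # A task without dependency data tables has no upstream tasks.
        return not self.dependencies


# Task B: day level, produces T-1 data, no dependency data tables.
task_b = TaskConfig("B", "b", Granularity.DAY, 1)
# Task D: day level, depends on table a at daily granularity with T-1.
# The output offset of task D is not given in the text; 0 is an assumption.
task_d = TaskConfig("D", "d", Granularity.DAY, 0,
                    (Dependency("a", Granularity.DAY, 1),))
```

  • Keeping the dependencies as a tuple while the output table is a single field mirrors the rule that multiple upstream data tables are allowed but only one output data table.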
  • The electronic device determines whether output data table a (abbreviated as table a) of task A has success information at the data output time (2022/02/28 23:00:00) corresponding to the current system time (2022/03/01 00:05:36). If there is no success information, task A is filtered out and its execution is triggered for the execution time corresponding to the current system time (2022/03/01 00:00:00).
  • The conversion logic between system time and data output time in the above example is as follows: every unit in the system time that is finer than the granularity of the current task's output data is reset to 0, which yields the task execution date; the data output date is then obtained by applying the period offset between the task execution date and the output data.
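  • The conversion from system time to execution time and data output time can be sketched in Python as follows. The function names `execution_time` and `data_output_time` are hypothetical, only hour and day granularities are handled, and a T-k output is assumed to be stored as the positive integer k.

```python
from datetime import datetime, timedelta

# Fixed-length cycles only; week and month handling is left out of this sketch.
STEP = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def execution_time(system_time: datetime, granularity: str) -> datetime:
    """Zero every unit finer than the task's output granularity."""
    if granularity == "hour":
        return system_time.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return system_time.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def data_output_time(system_time: datetime, granularity: str,
                     output_time_offset: int) -> datetime:
    """Shift the execution time back by the output time offset (k for T-k)."""
    return execution_time(system_time, granularity) - output_time_offset * STEP[granularity]


now = datetime(2022, 3, 1, 0, 5, 36)
print(execution_time(now, "hour"))       # 2022-03-01 00:00:00
print(data_output_time(now, "hour", 1))  # 2022-02-28 23:00:00
```

  • With the values from the example (system time 2022/03/01 00:05:36, hourly granularity, T-1 output), the sketch reproduces the execution time 2022/03/01 00:00:00 and the data output time 2022/02/28 23:00:00.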
  • the electronic device determines N second tasks, and the second tasks are downstream tasks of the first task.
  • the electronic device can determine the downstream tasks of the first task according to the configuration information of the task. Due to the complex dependencies between tasks, the first task can have one downstream task or multiple downstream tasks.
  • the electronic device when it is necessary to determine the downstream tasks of the first task, can retrieve the configuration information stored in the database and determine the downstream tasks of the first task based on the dependency information in the configuration information.
  • the configuration information can be stored in the form of a data table.
  • Figure 4 is a schematic diagram of the structure of the system table in the embodiment of the present disclosure.
  • The configuration information of the task is written into the relationship table (RELATION) shown in Figure 4 for storage.
  • the information recorded in the RELATION table can include task name, output data table, dependency data table, output granularity, and time offset.
  • The electronic device can obtain the configuration information of the task by querying the RELATION table. After the first task is completed, the electronic device can also store information indicating that the task was successfully executed.
  • information about successful task execution can be written to the task status table (DATASET_STATUE) as shown in Figure 4 for storage.
  • the information recorded in the table DATASET_STATUE can include the task name, data output date, and status (to determine whether it is successful).
  • the electronic device can obtain information about whether the task is successfully executed by querying the DATASET_STATUE table.
  • The output data granularity of task A can be at the hour level, so the electronic device triggers the execution of task A once every hour. Assume that task A executes successfully at the execution time 2022/03/01 00:00:00. Since the execution parameters of task A are configured to produce T-1 data, a record is eventually entered in the DATASET_STATUE table marking the status of table a at the data output time (2022/02/28 23:00:00) as successful.
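  • One way to picture the two system tables is the following in-memory SQLite sketch. The table names RELATION and DATASET_STATUE and the listed kinds of fields come from the text; the column names, column types, and the helper functions `mark_success` and `has_success` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RELATION (
    task_name          TEXT,
    output_table       TEXT,
    dependency_table   TEXT,    -- NULL for tasks without upstream data
    output_granularity TEXT,
    time_offset        INTEGER
);
CREATE TABLE DATASET_STATUE (
    task_name        TEXT,
    data_output_date TEXT,
    status           TEXT
);
""")


def mark_success(task_name: str, data_output_date: str) -> None:
    """Record that a task produced its output for the given data output date."""
    conn.execute(
        "INSERT INTO DATASET_STATUE VALUES (?, ?, 'success')",
        (task_name, data_output_date),
    )


def has_success(task_name: str, data_output_date: str) -> bool:
    """Check whether a success record exists for the task and output date."""
    row = conn.execute(
        "SELECT 1 FROM DATASET_STATUE "
        "WHERE task_name = ? AND data_output_date = ? AND status = 'success'",
        (task_name, data_output_date),
    ).fetchone()
    return row is not None


# Task A ran at 2022/03/01 00:00:00 and produces T-1 (hourly) data.
mark_success("A", "2022-02-28 23:00:00")
```

  • A scheduler built this way would call `has_success` before deciding whether a run for a given data output date still needs to be triggered.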
  • S102 may also include obtaining a first output data table when the execution of the first task is completed.
  • the first output data table may be a data table generated when the execution of the first task is completed.
  • N downstream tasks that depend on the first output data table can be determined as N second tasks.
  • The electronic device can write the output data of the first task into a table to obtain the first output data table of the first task.
  • The electronic device can obtain the N tasks that depend on the first output data table (that is, the N tasks whose dependency data table is the first output data table) and determine these N tasks as the second tasks.
  • For example, the output data table of task A is a, and both task D and task E are configured with dependency data table a. Therefore, task D and task E are determined to be the downstream tasks of task A.
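  • The lookup of downstream tasks through the output data table can be sketched as follows. The row layout and the function `downstream_tasks` are hypothetical; the rows cover only tasks A, B, D, and E, whose output and dependency tables are stated explicitly for Figure 3.

```python
from typing import List, Tuple

# (task name, output table, dependency tables) for tasks A, B, D and E.
RELATION_ROWS: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("A", "a", ()),
    ("B", "b", ()),
    ("D", "d", ("a",)),
    ("E", "e", ("a", "b")),
]


def downstream_tasks(finished_task: str) -> List[str]:
    """Return the tasks whose dependency tables include the finished task's output."""
    output_table = next(
        out for name, out, _ in RELATION_ROWS if name == finished_task
    )
    return [name for name, _, deps in RELATION_ROWS if output_table in deps]


print(downstream_tasks("A"))  # ['D', 'E']
```

  • Because the match is made on the output data table and not on the task name, the relationship between tasks is carried entirely by the data tables.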
  • the electronic device will store the information on the completion of the task execution. Based on the stored information, the electronic device queries whether all upstream tasks of the i-th second task have been completed. When the execution is completed, the execution of the i-th second task can be triggered. Due to the complex dependencies between tasks, the i-th second task can have multiple upstream tasks.
  • Whether all the upstream tasks of the i-th second task are completed is determined by whether the upstream tasks have produced the dependency data tables required by the second task. Therefore, the dependency relationship between the second task and its upstream tasks is established through the data tables and is independent of the tasks themselves.
  • For example, when the electronic device obtains task E, which needs to execute at 2022/03/02 00:00:00, it checks whether table a has a success status for every hour in the period 2022/02/28 00:00:00 to 2022/02/28 23:00:00, and whether table b has a success status for the date 2022/03/01 00:00:00. If both conditions are met, task E is executed immediately.
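  • The readiness check for task E in this example can be sketched as follows. The names `hourly_partitions`, `is_ready`, and `required_for_e` are hypothetical, and the set of succeeded partitions stands in for a lookup in the task status table.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple

Partition = Tuple[str, datetime]  # (table name, data output time)


def hourly_partitions(table: str, day: datetime) -> List[Partition]:
    """All 24 hourly output times of one table on the given day."""
    return [(table, day + timedelta(hours=h)) for h in range(24)]


def is_ready(required: Iterable[Partition], succeeded: Set[Partition]) -> bool:
    """A task may run only when every required partition has succeeded."""
    return all(p in succeeded for p in required)


# Task E executing at 2022/03/02 00:00:00 needs every hour of table a on
# 2022/02/28 and the daily partition of table b for 2022/03/01.
required_for_e = hourly_partitions("a", datetime(2022, 2, 28)) + [
    ("b", datetime(2022, 3, 1))
]
```

  • If any single hourly partition of table a is missing, `is_ready` returns False and task E keeps waiting.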
  • S202 may be included before S103, and S202 may be executed after S102 and before S103.
  • the electronic device polls N second tasks; according to the second dependency data table of each second task, the upstream task of each second task can be determined.
  • the second dependency data table may be used to indicate the data table produced by the upstream task on which each second task depends during execution.
  • The second dependency data table of the current second task can be determined, and the task that produces that table can be determined to be the upstream task of the current second task; that is, the upstream task is determined based on the task configuration information.
  • The electronic device detects whether the upstream task of each second task has produced a data table; when it is detected that the upstream task of the i-th second task has produced a data table, it determines that the execution of the upstream task of the i-th second task is completed.
  • After a task is executed, the electronic device stores the completion information.
  • When the electronic device needs to query the information of the upstream task required for the i-th second task, the stored information can be retrieved to determine whether the execution of the upstream task is completed.
  • the table DATASET_STATUE in Figure 4 can be used to store information about successful execution.
  • the electronic device can query the information about whether the task is successfully executed by querying the table DATASET_STATUE.
  • S203 may also be included after S102, and S203 may be executed after S102 and before S103.
  • S203 may be included after S102, and S202 may be executed after S203.
  • The electronic device can register an execution trigger for each second task, and the execution trigger is configured to trigger the execution of the corresponding second task when the execution time of that second task arrives.
  • The second execution parameter can be used to determine the execution time of the second task. After determining the execution time, the electronic device registers an execution trigger with the execution time for each second task. When the execution time is reached, the electronic device can trigger the execution of the second task.
  • the execution time of the second task may be determined by the time it takes for the upstream task to generate the dependency data table of the second task. Therefore, the second execution parameters may include dependency granularity and dependency time offset.
  • Figure 5 is another schematic diagram of the structure of a system table in an embodiment of the present disclosure.
  • the information of each second task execution trigger can be recorded in the trigger table (TRIGGER).
  • the information recorded in table TRIGGER includes task name, task execution date and status (to determine whether it is successful).
  • the electronic device can obtain the information of the second task corresponding to the current trigger by querying the TRIGGER table.
  • S204 may also be included after S203, and S204 may be executed after S203 and before S103.
  • S202 may be included before S204, S202 may be executed after S203, and S204 may be executed after S202.
  • the electronic device can trigger the execution trigger corresponding to the i-th second task.
  • the electronic device can obtain whether all the upstream tasks of the second task in the trigger have a success status. If the upstream tasks of the second task have been successfully executed, the execution of the current task will be triggered immediately.
  • the fact that all the upstream tasks of the second task have been executed successfully means that the dependent data tables of the second task have been produced at the current moment.
  • the electronic device obtains all triggers in the to-be-triggered state at intervals of 5 seconds, and then checks the corresponding tasks that need to be performed in the triggers.
  • the trigger obtains the upstream task of the corresponding task and queries the status of the upstream task of the corresponding task. If the upstream tasks have produced dependent data tables, the corresponding task in the trigger is immediately triggered to execute.
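  • The polling of pending triggers can be sketched as follows. The functions `poll_once` and `poll_forever` are hypothetical, and the sketch simplifies the check to table level (a dependency counts as satisfied once its table name is in `produced`) instead of checking individual data output times. The 5-second default interval follows the text.

```python
import time
from typing import Callable, Dict, List, Set


def poll_once(pending: Dict[str, List[str]], produced: Set[str],
              run: Callable[[str], None]) -> List[str]:
    """Fire every pending trigger whose dependency tables have all been produced."""
    fired = [task for task, deps in pending.items()
             if all(d in produced for d in deps)]
    for task in fired:
        del pending[task]  # the trigger leaves the to-be-triggered state
        run(task)
    return fired


def poll_forever(pending: Dict[str, List[str]], produced: Set[str],
                 run: Callable[[str], None],
                 interval_seconds: float = 5.0) -> None:
    """Repeat the check at a fixed interval until no triggers remain."""
    while pending:
        poll_once(pending, produced, run)
        time.sleep(interval_seconds)
```

  • For instance, with pending triggers for task D (depending on table a) and task E (depending on tables a and b), one call to `poll_once` after only table a has been produced would fire D and leave E pending.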
  • the first task can be obtained, and when the execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task.
  • the upstream task of the i-th second task among the N second tasks is completed, the i-th second task can be executed.
  • the downstream tasks in the present disclosure can be executed when the execution of the upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.
  • After the N second tasks (downstream tasks) are determined, execution triggers are registered for the N second tasks.
  • When the upstream task of the i-th second task is completed, the execution trigger can be triggered to execute the i-th second task.
  • The electronic device can pick up the tasks in the triggers and execute them at very short intervals.
  • The delay in scheduling upstream and downstream tasks is maintained at the second level, which can effectively reduce the delay between upstream and downstream tasks.
  • The output data table is directly used as the upstream, and the granularity and dependency time offset are configured according to the current task situation. This avoids the problems of difficult configuration and configuration errors caused by inconsistent execution time and output time when tasks are used as upstream.
  • FIG. 6 is a schematic flowchart of another implementation of the task scheduling method in an embodiment of the present disclosure. As shown in Figure 6, it includes:
  • the electronic device parses the task, configures the dependency information and execution parameters of the task, and enters S602;
  • first, the electronic device can parse the user's SQL task with a syntax parsing tool, obtain from the parsing result the data tables the task depends on and the data table it outputs, and supplement the configuration information of the task accordingly; second, the electronic device can configure the execution parameters of the task based on how the user's SQL is used.
  • the electronic device can store the dependency information and execution parameters of the task and enter S603;
  • the electronic device can enter the dependency information and execution parameters of the task into the table RELATION.
  • the electronic device can obtain all upstream tasks and determine the task execution status, and enter S604;
  • the electronic device can obtain all tasks without upstream data at 5s intervals, and determine whether the task's output data table has success information under the data output date corresponding to the current system time.
  • the electronic device can obtain all unexecuted upstream tasks and trigger the execution of the tasks, entering S605;
  • the electronic device can filter out the data tables for which no successful information is found, obtain the corresponding tasks through the RELATION table, and trigger the execution of tasks corresponding to the current system time.
  • the electronic device can register an execution trigger for the downstream task after the execution of the upstream task is completed, and enter S606;
  • the electronic device can enter the corresponding information on the success of the output data into the table DATASET_STATUE.
  • the electronic device can find the downstream tasks of the task through the RELATION table, register execution triggers for the downstream tasks, and store the execution trigger information in the TRIGGER table.
  • the electronic device queries the execution trigger, determines that all corresponding upstream tasks are successful, and enters S607;
  • the electronic device can obtain all execution triggers in the to-be-triggered status of the table TRIGGER at intervals of 5 seconds, and check that the upstream tasks of the corresponding tasks in the execution triggers have successful status in the table DATASET_STATUE.
  • when the query of the table DATASET_STATUE shows that the upstream dependency tables have succeeded, the electronic device immediately executes the current task and changes the status of the corresponding execution trigger in the table TRIGGER to successful.
  • the electronic device can perform all tasks and implement task scheduling by repeating S604 to S607.
  • the electronic device can obtain the downstream tasks through the upstream task and register an execution trigger for each downstream task. It continuously polls the execution triggers and queries whether the upstream tasks of the task corresponding to each execution trigger are completed. If all upstream tasks are completed, the downstream task is executed. It can be seen that after each task in this disclosure is executed, its downstream tasks can be quickly found through the data lineage maintained in the database, and execution triggers are registered for them. The electronic device obtains the tasks in the execution triggers and executes them at extremely short intervals. Throughout the process, the scheduling delay between upstream and downstream tasks stays at the level of seconds, achieving low-latency task scheduling. Furthermore, the output data table is used directly as the upstream, and the granularity and dependency time offset are configured according to the current task. This avoids the configuration difficulty and configuration errors caused by inconsistent execution time and output time when tasks are used as the upstream.
  • embodiments of the present disclosure also provide a task scheduling device.
  • the task scheduling device may be a chip or a system-on-chip in an electronic device, or may be used in an electronic device to implement the above-mentioned embodiments.
  • the task scheduling device can realize the functions performed by the electronic devices in the above embodiments, and these functions can be realized by hardware executing corresponding software. These hardware or software include one or more modules corresponding to the above functions.
  • Figure 7 is a schematic structural diagram of a task scheduling device in an embodiment of the present disclosure.
  • the task scheduling device 700 may include: an acquisition module 701 configured to acquire the first task; a determination module 702 configured to determine N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task and N is a positive integer; and an execution module 703 configured to execute the i-th second task when the upstream task of the i-th second task among the N second tasks is completed.
  • the acquisition module 701 is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameters of the first task; the first dependency data table indicates the data table produced by the upstream task that the first task relies on when executing.
  • the determination module 702 is further configured to: when the execution of the first task is completed, obtain the first output data table, which is the data table produced when the execution of the first task is completed; and, among the downstream tasks of the first task, determine the N downstream tasks that depend on the first output data table as the N second tasks.
  • the first execution parameter includes at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset; wherein the execution granularity represents the execution period of the first task; the dependency granularity represents the period of the first dependency data table that the first task relies on when executing; the dependency time offset represents the offset between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset between the execution time of the first task and the output time of the first output data table.
  • the execution period includes at least one of the following: one month, one week, one day, and one hour.
  • the execution module 703 is further configured to: before executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and determine the upstream task of each second task according to the second dependency data table of that second task, where the second dependency data table indicates the data table produced by the upstream task that each second task depends on when executing.
  • the execution module 703 is further configured to: after determining the upstream task of each second task according to the second dependency data table of each second task, detect whether the upstream task of each second task has produced a data table; when it is detected that the upstream task of the i-th second task has produced a data table, determine that the upstream task of the i-th second task has completed execution.
  • the execution module 703 is further configured to: after the N second tasks are determined upon completion of the first task, register an execution trigger for each second task according to the second execution parameters of each of the N second tasks, where the execution trigger is configured to trigger execution of the corresponding second task when that task's execution time arrives; executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed includes: when the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
  • FIG. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. As shown in FIG. 8 , the electronic device 800 can use general computer hardware, including a processor 801 and a memory 802 .
  • the at least one processor may constitute any physical device having circuitry that performs logical operations on one or more inputs.
  • at least one processor may include one or more integrated circuits (ICs), including application specific integrated circuits (ASICs), microchips, microcontrollers, microprocessors, all or part of a central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), field programmable gate array (FPGA), or other circuitry suitable for executing instructions or performing logical operations.
  • Instructions for execution by the at least one processor may, for example, be preloaded into memory integrated with or embedded in the controller, or may be stored in separate memory.
  • Memory may include random access memory (RAM), read only memory (ROM), hard disk, optical disk, magnetic media, flash memory, other permanent, fixed or volatile memory, or any other mechanism capable of storing instructions.
  • at least one processor may include more than one processor. Each processor may have a similar structure, or the processors may have different configurations that are electrically connected or disconnected from each other. For example, the processor may be a separate circuit or integrated in a single circuit. When more than one processor is used, the processors may be configured to operate independently or cooperatively. Processors may be coupled electrically, magnetically, optically, acoustically, mechanically, or by other means that allow them to interact.
  • the present invention also provides a computer-readable storage medium on which computer instructions are stored, and the instructions are used by a processor to execute the steps of the method for scheduling tasks.
  • Memory 802 may include computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory and/or random access memory. Memory 802 may store operating systems, application programs, other program modules, executable code, program data, user data, and so on.
  • the above-mentioned memory 802 stores computer execution instructions for implementing the functions of the acquisition module 701, the determination module 702 and the execution module 703 in Figure 7.
  • the functions/implementation processes of the acquisition module 701, the determination module 702 and the execution module 703 in Figure 7 can all be implemented by the processor 801 in Figure 8 calling the computer execution instructions stored in the memory 802.
  • Embodiments of the present disclosure provide a task scheduling method, apparatus and equipment.
  • the technical solution provided by the embodiments of the present disclosure obtains the first task and determines N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task.
  • when the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed.
  • the downstream tasks in the present disclosure can trigger execution when the execution of the upstream tasks is determined to be completed, without waiting for a fixed execution time, which effectively reduces the delay between the upstream and downstream tasks.
  • the task scheduling methods, devices and equipment provided by the embodiments of the present disclosure are reproducible and can be used in a variety of industrial applications.
  • the task scheduling method, apparatus and equipment provided by the embodiments of the present disclosure can be used in the field of big data analysis technology, such as the field of task scheduling in big data analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A task scheduling method and apparatus, and a device, which can be applied to the technical field of big data analysis. The task scheduling method comprises: acquiring a first task (S101); when the execution of the first task is completed, determining N second tasks, wherein each second task is a downstream task of the first task (S102); and when the execution of an upstream task of an ith second task among the N second tasks is completed, executing the ith second task (S103). The execution of a downstream task can be triggered once it is determined that the execution of an upstream task is completed without the need to wait for a fixed execution time, thereby effectively shortening a delay between the upstream task and the downstream task.

Description

Task Scheduling Method, Apparatus, and Device
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 202210488671.1, titled "Task Scheduling Method, Apparatus, and Device" and filed with the China National Intellectual Property Administration on May 6, 2022, the entire content of which is incorporated into this disclosure by reference.
Technical Field
The present disclosure relates to the technical field of big data analysis, and in particular to a task scheduling method, apparatus, and device.
Background
At present, with the rapid development of the Internet, the types of tasks that run over networks are increasingly numerous. In the field of big data analysis, complex dependencies exist among multiple tasks, so task scheduling must be used to manage them.
Traditional task scheduling mainly relies on periodic, timed execution with tasks serving as upstream dependencies. Because users can rarely specify the most suitable scheduled execution time when configuring a task, traditional task scheduling methods suffer from high latency.
Summary
Embodiments of the present disclosure provide a task scheduling method, apparatus, and device that achieve, for example, a reduction in the delay between upstream and downstream tasks.
In a first aspect, embodiments of the present disclosure provide a task scheduling method, including: acquiring a first task; when the execution of the first task is completed, determining N second tasks, where the second tasks are downstream tasks of the first task and N is a positive integer; and when the execution of the upstream task of the i-th second task among the N second tasks is completed, executing the i-th second task.
In some possible implementations, after the first task is acquired, the method may further include: executing the first task according to the first dependency data table and the first execution parameters of the first task, where the first dependency data table indicates the data table produced by the upstream task that the first task depends on when executing.
In some possible implementations, determining N second tasks when the execution of the first task is completed includes: when the execution of the first task is completed, obtaining a first output data table, which is the data table produced by the first task when its execution is completed; and, among the downstream tasks of the first task, determining the N downstream tasks that depend on the first output data table as the N second tasks.
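The lookup of second tasks from the first output data table can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `LINEAGE` rows, the task and table names, and both helper functions are hypothetical, and the lineage is held in memory rather than in a database table.

```python
from collections import defaultdict

# Hypothetical lineage rows: (task name, tables it depends on, table it outputs).
LINEAGE = [
    ("task_a", [], "table_a"),
    ("task_b", ["table_a"], "table_b"),
    ("task_c", ["table_a", "table_b"], "table_c"),
    ("task_d", ["table_x"], "table_d"),
]


def build_consumer_index(lineage):
    """Map each data table to the tasks that list it as a dependency."""
    consumers = defaultdict(list)
    for task, depends_on, _output in lineage:
        for table in depends_on:
            consumers[table].append(task)
    return consumers


def second_tasks(first_task, lineage):
    """Return the downstream tasks that depend on first_task's output table."""
    outputs = {task: output for task, _deps, output in lineage}
    consumers = build_consumer_index(lineage)
    return consumers.get(outputs[first_task], [])


print(second_tasks("task_a", LINEAGE))  # ['task_b', 'task_c']
```

Keying the index by data table rather than by task mirrors the idea that a downstream task depends on the output table, not on the upstream task itself.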
In some possible implementations, the first execution parameters may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. The execution granularity represents the execution period of the first task; the dependency granularity represents the period of the first dependency data table that the first task depends on when executing; the dependency time offset represents the offset between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset between the execution time of the first task and the output time of the data table produced by the first task.
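One way to picture these four parameters is as a small record with helpers that turn an execution time into the data times it refers to. The sketch below rests on assumptions that the text does not state: offsets are whole numbers of periods, they point into the past, the dependency offset is counted in dependency-granularity periods, and the output offset in execution-granularity periods. The `ExecutionParams` class and `PERIOD` table are hypothetical, and the monthly period is left out because it has no fixed length.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Fixed-length periods only; "month" is omitted in this sketch.
PERIOD = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


@dataclass(frozen=True)
class ExecutionParams:
    execution_granularity: str   # how often the task runs
    dependency_granularity: str  # period of the table the task depends on
    dependency_offset: int       # periods back to the dependency table's data time
    output_offset: int           # periods back to the output table's data time

    def dependency_time(self, execution_time):
        """Data time of the dependency table needed for this execution (assumed rule)."""
        return execution_time - self.dependency_offset * PERIOD[self.dependency_granularity]

    def output_time(self, execution_time):
        """Data time of the table produced by this execution (assumed rule)."""
        return execution_time - self.output_offset * PERIOD[self.execution_granularity]


# An hourly task that reads the previous day's table and writes data for the current hour.
params = ExecutionParams("hour", "day", dependency_offset=1, output_offset=0)
run_at = datetime(2023, 1, 3, 10)
print(params.dependency_time(run_at))  # 2023-01-02 10:00:00
print(params.output_time(run_at))      # 2023-01-03 10:00:00
```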
In some possible implementations, the execution period may include at least one of the following: one month, one week, one day, and one hour.
In some possible implementations, before the i-th second task is executed upon completion of the upstream task of the i-th second task among the N second tasks, the method further includes: polling the N second tasks; and determining the upstream task of each second task according to the second dependency data table of that second task, where the second dependency data table indicates the data table produced by the upstream task that each second task depends on when executing.
In some possible implementations, after the upstream task of each second task is determined according to the second dependency data table of that second task, the method further includes: detecting whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determining that the upstream task of the i-th second task has completed execution.
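The completion check described here treats "the upstream task has produced its data table" as the signal that the upstream task is done. A minimal sketch follows, assuming each produced table is recorded together with its data date. The `dependencies` mapping, the `produced` set, and both function names are hypothetical.

```python
from datetime import date


def upstream_done(task, dependencies, produced, data_date):
    """True when every table the task depends on has been produced for data_date."""
    return all((table, data_date) in produced for table in dependencies[task])


def ready_tasks(candidates, dependencies, produced, data_date):
    """Filter the polled second tasks down to those whose upstream is complete."""
    return [t for t in candidates if upstream_done(t, dependencies, produced, data_date)]


# Hypothetical second tasks and the tables they depend on.
dependencies = {
    "task_b": ["table_a"],
    "task_c": ["table_a", "table_b"],
}
day = date(2023, 1, 1)
produced = {("table_a", day)}  # only table_a has been produced so far

print(ready_tasks(["task_b", "task_c"], dependencies, produced, day))  # ['task_b']
```

Because the check looks at tables rather than at task run records, a second task with several upstream tables becomes ready only when the last of them has been produced.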
In some possible implementations, after the N second tasks are determined upon completion of the first task, the method may further include: registering an execution trigger for each second task according to the second execution parameters of each of the N second tasks, where the execution trigger is configured to trigger execution of the corresponding second task when that task's execution time arrives. Executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed includes: when the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
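Registering execution triggers and firing them once the upstream is complete can be sketched with an in-memory store. This is a simplified illustration under stated assumptions: the `Trigger` and `Scheduler` classes, the status strings, and the `run_task` callback are all hypothetical, and the sketch ignores the execution-time condition and fires a trigger as soon as all of its upstream tables exist.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    task: str
    depends_on: list
    status: str = "pending"  # hypothetical status values: "pending" or "success"


class Scheduler:
    def __init__(self, run_task):
        self.triggers = []
        self.produced = set()     # names of data tables produced so far
        self.run_task = run_task  # callback that actually executes a task

    def register(self, task, depends_on):
        """Register an execution trigger for a downstream task."""
        self.triggers.append(Trigger(task, list(depends_on)))

    def mark_produced(self, table):
        """Record that an upstream task has produced its output table."""
        self.produced.add(table)

    def poll_once(self):
        """Fire every pending trigger whose upstream tables are all produced."""
        fired = []
        for trig in self.triggers:
            if trig.status == "pending" and all(t in self.produced for t in trig.depends_on):
                self.run_task(trig.task)
                trig.status = "success"
                fired.append(trig.task)
        return fired


executed = []
scheduler = Scheduler(executed.append)
scheduler.mark_produced("table_a")
scheduler.register("task_b", ["table_a"])
scheduler.register("task_c", ["table_a", "table_b"])

first_round = scheduler.poll_once()   # only task_b is ready
scheduler.mark_produced("table_b")
second_round = scheduler.poll_once()  # task_c becomes ready
print(first_round, second_round)      # ['task_b'] ['task_c']
```

In a long-running service, `poll_once` would be called in a loop with a short sleep between rounds, which is what keeps the gap between upstream completion and downstream start small.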
In some possible implementations, executing the first task according to the first dependency data table and the first execution parameters of the first task includes: checking whether the output data table of the first task has success information for the data output time corresponding to the current system time; if it does not, the first task can be filtered out and its execution for the time corresponding to the current system time is triggered.
In some possible implementations, when the downstream tasks of the first task are determined, the configuration information stored in the database is retrieved, and the downstream tasks of the first task are determined based on the dependency information in the configuration information.
In some possible implementations, after the execution of the first task is completed, information indicating that the first task executed successfully is stored in a task status table, where the information in the task status table includes the task name, the data output date, and the status.
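A task status table with these three fields can be sketched with Python's built-in `sqlite3` module. The table name `task_status`, the column names, the `'success'` status value, and both helper functions are assumptions made for illustration; the disclosure does not specify a schema or a database engine.

```python
import sqlite3

# In-memory database for the sketch; a real deployment would use a persistent store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_status ("
    " task_name TEXT NOT NULL,"
    " data_date TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " PRIMARY KEY (task_name, data_date))"
)


def record_success(conn, task_name, data_date):
    """Store (or overwrite) a success row for the task and data output date."""
    conn.execute(
        "INSERT OR REPLACE INTO task_status (task_name, data_date, status)"
        " VALUES (?, ?, 'success')",
        (task_name, data_date),
    )
    conn.commit()


def has_succeeded(conn, task_name, data_date):
    """Check whether a success row exists for the task and data output date."""
    row = conn.execute(
        "SELECT 1 FROM task_status"
        " WHERE task_name = ? AND data_date = ? AND status = 'success'",
        (task_name, data_date),
    ).fetchone()
    return row is not None


record_success(conn, "task_a", "2023-01-01")
print(has_succeeded(conn, "task_a", "2023-01-01"))  # True
print(has_succeeded(conn, "task_a", "2023-01-02"))  # False
```

Using the task name plus the data output date as the key lets the same task be recorded once per period, so a scheduler can ask whether a specific period has already succeeded.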
In a second aspect, embodiments of the present disclosure provide a task scheduling apparatus. The apparatus may be a chip or a system-on-chip in an electronic device, or a functional module in an electronic device configured to implement the method of the first aspect and any of its possible implementations. The task scheduling apparatus can realize the functions performed by the electronic device in the first aspect and any of its possible implementations, and these functions can be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. The task scheduling apparatus includes: an acquisition module configured to acquire a first task; a determination module configured to determine N second tasks when the execution of the first task is completed, where the second tasks are downstream tasks of the first task and N is a positive integer; and an execution module configured to execute the i-th second task when the execution of the upstream task of the i-th second task among the N second tasks is completed.
In some possible implementations, the acquisition module is further configured to: after acquiring the first task, execute the first task according to the first dependency data table and the first execution parameters of the first task, where the first dependency data table indicates the data table produced by the upstream task that the first task depends on when executing.
In some possible implementations, the determination module is further configured to: when the execution of the first task is completed, obtain a first output data table, which is the data table produced by the first task when its execution is completed; and, among the downstream tasks of the first task, determine the N downstream tasks that depend on the first output data table as the N second tasks.
In some possible implementations, the first execution parameters may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. The execution granularity represents the execution period of the first task; the dependency granularity represents the period of the first dependency data table that the first task depends on when executing; the dependency time offset represents the offset between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset between the execution time of the first task and the output time of the data table produced by the first task.
In some possible implementations, the execution period includes at least one of the following: one month, one week, one day, and one hour.
In some possible implementations, the execution module is further configured to: before executing the i-th second task upon completion of the upstream task of the i-th second task among the N second tasks, poll the N second tasks; and determine the upstream task of each second task according to the second dependency data table of that second task, where the second dependency data table indicates the data table produced by the upstream task that each second task depends on when executing.
In some possible implementations, the execution module is further configured to: after determining the upstream task of each second task according to the second dependency data table of that second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the upstream task of the i-th second task has completed execution.
In some possible implementations, the execution module is further configured to: after the N second tasks are determined upon completion of the first task, register an execution trigger for each second task according to the second execution parameters of each of the N second tasks, where the execution trigger is configured to trigger execution of the corresponding second task when that task's execution time arrives. Executing the i-th second task when the upstream task of the i-th second task among the N second tasks is completed includes: when the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
In a third aspect, embodiments of the present disclosure provide an electronic device, which may include: a memory configured to store processor-executable instructions; and a processor, where the processor is configured to execute the executable instructions to implement the method described in the first aspect and any of its possible implementations.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium that stores computer-executable instructions which, when executed by a processor, implement the method described in the first aspect and any of its possible implementations.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure can achieve at least the following beneficial effects:
In the present disclosure, a first task is acquired, and when the execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task. When the execution of the upstream task of the i-th second task among the N second tasks is completed, the i-th second task is executed. It can be seen that a downstream task in the present disclosure can be triggered as soon as the upstream task is determined to have completed, without waiting for a fixed execution time, which effectively reduces the delay between upstream and downstream tasks.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the scope of protection of the present disclosure.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of one implementation of the task scheduling method in an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of another implementation of the task scheduling method in an embodiment of the present disclosure;
Figure 3 is a schematic diagram of the structure of a task configuration in an embodiment of the present disclosure;
Figure 4 is a schematic diagram of the structure of the tables used by the system in an embodiment of the present disclosure;
Figure 5 is another schematic diagram of the structure of the tables used by the system in an embodiment of the present disclosure;
Figure 6 is a schematic flowchart of yet another implementation of the task scheduling method in an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of a task scheduling apparatus in an embodiment of the present disclosure;
Figure 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
具体实施方式Detailed ways
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、技术之类的具体细节,以便透彻理解本发明实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本发明。在其它情况中省略对众所周知的系统、装置、电路以及方法的详细说明,以免不必要的细节妨碍本发明的描述。In the following description, specific details such as specific system structures and technologies are provided for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
为了说明本公开所述的技术方案,下面通过具体实施例来进行说明。In order to illustrate the technical solutions described in the present disclosure, specific examples will be described below.
At present, with the rapid development of the Internet, the types of tasks that run over a network are increasingly varied. In the field of big data in particular, more and more enterprises are making data a focus of attention. Enterprises typically use the computing power of server clusters to produce a wide variety of data reports, and use these reports to gain an intuitive understanding of the related business.
As the volume of network data grows, so does the demand for data analysis. For example, when a business requires big data analysis along multiple dimensions every day, a large number of data analysis tasks are generated. Because the execution time of each data processing task depends on factors such as fluctuating compute cluster resources and the amount of data processed, it is impossible to predict how long a task will actually take. When configuring a task, the user therefore cannot specify the most suitable scheduled execution time. As a result, an upstream task may have finished while the downstream task must still wait because its scheduled execution time has not arrived, which introduces a high delay between upstream and downstream tasks.
For example, suppose task A is executed once a day on a schedule and produces data a. However, the structured query language (SQL) statement that task A actually executes produces T-2 data (T-2 means that the current cycle computes the data of two cycles earlier); that is, when task A runs its January 3 instance, the output is actually the data for January 1. A user who does not know what task A actually executes will assume that a daily run of task A produces the data of the current day (T-0). When configuring downstream task B, the user therefore configures a dependency time range tied to the execution time of task A. As a result, task B runs on empty data every day, because the T-0 data a is always produced two days late.
Existing task scheduling methods therefore suffer from high task delay caused by user configuration.
To solve the above problem, embodiments of the present disclosure provide a task scheduling method applied in the technical field of big data analysis. Each step of the method may be executed by an electronic device with computing and processing capabilities. In one embodiment, the electronic device may be a terminal, for example, a mobile phone, a tablet computer, or a smart wearable device; in another embodiment, the electronic device may be a server, which may be a single server, a server cluster composed of multiple servers, or a cloud server. The embodiments of the present disclosure do not limit this.
Figure 1 is a schematic flowchart of one implementation of the task scheduling method in an embodiment of the present disclosure. Referring to Figure 1, the task scheduling method may include S101 to S103.
S101, the electronic device acquires a first task.
It should be understood that the first task may be any one configured task, or may be multiple configured tasks. A task may be configured by a user entering information on a dedicated configuration page, or by the electronic device itself. A task may also be configured in other ways, which the present disclosure does not specifically limit.
It should be noted that configuring a task may include configuring the task's dependencies and configuring the task's execution parameters. A dependency expresses an execution dependency between tasks. For example, if executing task A requires the execution result of task B, then the dependency between task A and task B is that task B is an upstream task of task A. An execution parameter expresses a rule that the task follows when it executes. For example, if task A follows the rule of executing once a week, then executing once a week is an execution parameter of task A.
It should be understood that after a task is configured, the electronic device stores the task together with its configuration information (dependencies and execution parameters) for later retrieval.
In some possible implementations, S201 may be included after S101; S201 may be executed after S101 and before S102. Figure 2 is a schematic flowchart of an implementation of the task scheduling method in an embodiment of the present disclosure.
S201, the electronic device may execute the first task according to a first dependency data table and a first execution parameter of the first task.
The first dependency data table indicates the data table, produced by an upstream task, on which the first task depends when it executes.
It should be understood that, in a big data scenario, the upstream data required to execute a task and the output data of the task can be obtained from the task configuration information. Upstream data refers to the data tables that the current task needs to query, and output data refers to the data table into which the data is finally written after the current task completes. A task may have multiple upstream data tables but only one output data table. Every task produces data and thereby yields an output data table. That output data table can serve as the upstream data (that is, the dependency data table) of a downstream task.
As described above, the electronic device acquires the first task in S101 and thereby obtains the configuration information of the first task. In S201, it obtains the first dependency data table and the first execution parameter of the first task from that configuration information, and executes the first task accordingly.
In some possible implementations, the first execution parameter may include at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. The execution granularity represents the execution period of the first task; the dependency granularity represents the period of the first dependency data table on which the first task depends when it executes; the dependency time offset represents the offset between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset between the execution time of the first task and the output time of the first output data table.
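As a rough illustration, the four execution parameters could be grouped into a small configuration record. This is only a sketch under stated assumptions: the class names `Dependency` and `TaskConfig`, the field names, the granularity strings, and the example values are hypothetical and do not come from the disclosure. It further assumes that the dependency granularity and dependency time offset are kept per dependency data table, and that an offset of -1 stands for T-1.

```python
from dataclasses import dataclass

# Assumed granularity names; the disclosure does not fix a representation.
GRANULARITIES = ("hour", "day", "week", "month")


@dataclass(frozen=True)
class Dependency:
    """One dependency data table with its own granularity and offset."""
    table: str
    dependency_granularity: str
    dependency_offset: int  # assumed convention: -1 stands for T-1

    def __post_init__(self):
        if self.dependency_granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.dependency_granularity}")


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical configuration record for a single task."""
    name: str
    output_table: str          # a task has exactly one output data table
    execution_granularity: str
    output_offset: int         # assumed convention: -1 stands for T-1
    dependencies: tuple = ()   # zero or more Dependency entries

    def __post_init__(self):
        if self.execution_granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.execution_granularity}")


# Illustrative values only; they are not taken from the disclosure.
report = TaskConfig(
    name="daily_report",
    output_table="report",
    execution_granularity="day",
    output_offset=-1,
    dependencies=(Dependency("events", "hour", -2),),
)
```

Keeping the granularity and offset on each dependency entry, rather than on the task, lets one task consume several upstream tables at different periods.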
It should be understood that, in a big data scenario, data tables always store data in partitions. For example, partitions may be monthly, weekly, and so on. With monthly partitions, results are computed once a month and stored in the partition belonging to that month. Accordingly, in the embodiments of the present disclosure, an execution period and a dependency period can be set for a task, and the periods are managed uniformly through the execution granularity and the dependency granularity. Preferably, the execution period may be one month, one week, one day, or one hour.
Further, the execution time of a task represents the time at which the task runs in each period. For example, for a task running on an hourly period, the execution times of two consecutive runs may be 2022/03/01 00:00:00 and 2022/03/01 01:00:00. The output time of a task represents the specific time of the data produced by each execution. For example, for a task running on a daily period, the output times of two consecutive runs may be 2022/03/01 00:00:00 and 2022/03/02 00:00:00.
Further, a time offset can be introduced based on the execution time and the output time. The time offset represents the difference between the execution time and the output time; that is, execution time + offset = output time. In the embodiments of the present disclosure, because tasks are configured with different granularities, a single offset cannot easily characterize tasks of different granularities. The offset is therefore split into two indicators, a granularity and an offset, to accommodate tasks of different granularities. For example, the dependency time offset represents the offset between the execution time of a task and the output time of its upstream task, expressed as: upstream task output time - dependency time offset * dependency granularity = downstream task execution time. The output time offset represents the offset between the execution time of a task and its output time, expressed as: task execution time + output time offset * execution granularity = data output time.
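The two expressions above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation of the disclosure: the function names and the `GRANULARITY` mapping are hypothetical, offsets are assumed to be negative integers for past data (-1 for T-1), and month granularity is left out because a month has no fixed length.

```python
from datetime import datetime, timedelta

# Assumed mapping from granularity names to fixed-length periods.
GRANULARITY = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def output_time(execution_time, output_offset, execution_granularity):
    """task execution time + output time offset * execution granularity"""
    return execution_time + output_offset * GRANULARITY[execution_granularity]


def downstream_execution_time(upstream_output_time, dependency_offset,
                              dependency_granularity):
    """upstream output time - dependency time offset * dependency granularity"""
    return upstream_output_time - dependency_offset * GRANULARITY[dependency_granularity]


# An hourly task producing T-1 data, executed at 2022/03/01 00:00:00.
print(output_time(datetime(2022, 3, 1), -1, "hour"))  # 2022-02-28 23:00:00

# A downstream task consuming T-2 data at daily granularity.
print(downstream_execution_time(datetime(2022, 2, 28), -2, "day"))  # 2022-03-02 00:00:00
```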
In one embodiment, Figure 3 is a schematic diagram of a task configuration structure in an embodiment of the present disclosure. Referring to Figure 3, six tasks are shown. Three tasks (task A, task B, and task C) have no dependency data table configured and are therefore upstream tasks; three tasks (task D, task E, and task F) have dependency data tables configured and are therefore downstream tasks.
Upstream task A produces data table a. Table a has hourly granularity and holds T-1 data, where T-1 means that the output time of task A is one hour earlier than its execution time.
Upstream task B produces data table b. Table b may have daily granularity and holds T-1 data, where T-1 means that the output time of task B is one day earlier than its execution time.
Upstream task C produces data table c. Table c may have daily granularity and holds T-3 data, where T-3 means that the output time of task C is three days earlier than its execution time.
Downstream task D produces data table d, which may have daily granularity. It depends on data table a and uses the T-1 data of table a at daily granularity, where T-1 means that the output time of upstream task A is one day earlier than the execution time of downstream task D.
Downstream task E produces data table e, which may have daily granularity. It depends on data table a and data table b, using the T-2 data of table a at hourly granularity and the T-1 data of table b at daily granularity, where T-2 means that the output time of upstream task A is two hours earlier than the execution time of downstream task E, and T-1 means that the output time of upstream task B is one day earlier than the execution time of downstream task E.
Downstream task F produces data table f, which may have monthly granularity. It depends on data table c and uses the T-1 data of table c at monthly granularity, where T-1 means that the output time of upstream task C is one month earlier than the execution time of downstream task F.
For example, taking the task configuration of Figure 3, executing the first task according to the first dependency data table and the first execution parameter in S201 is described as follows. The electronic device determines whether output data table a of task A (abbreviated as table a) has a success record at the data output time (2022/02/28 23:00:00) corresponding to the current system time (2022/03/01 00:05:36). If not, task A is selected and the task instance for the execution time corresponding to the current system time (2022/03/01 00:00:00) is triggered.
Specifically, the conversion from system time to data output time in the above example works as follows: in the system time, every unit finer than the output data granularity of the current task is cleared to 0, which yields the task execution date; the data output date is then obtained from the task execution date and the time period offset of the output data.
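The conversion just described can be sketched as follows. The function names and granularity strings are hypothetical, only hour and day granularity are handled, and the offset is assumed to be -1 for T-1; the timestamps are those of the task A example.

```python
from datetime import datetime, timedelta

# Assumed fixed-length periods for the two granularities handled here.
PERIOD = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def execution_time(system_time, granularity):
    """Clear every unit finer than the given granularity to zero."""
    if granularity == "hour":
        return system_time.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return system_time.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def data_output_time(system_time, granularity, output_offset):
    """Execution time shifted by the output time offset."""
    return execution_time(system_time, granularity) + output_offset * PERIOD[granularity]


now = datetime(2022, 3, 1, 0, 5, 36)
print(execution_time(now, "hour"))        # 2022-03-01 00:00:00
print(data_output_time(now, "hour", -1))  # 2022-02-28 23:00:00
```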
S102, when execution of the first task is completed, the electronic device determines N second tasks, where each second task is a downstream task of the first task.
It should be understood that when execution of the first task is completed, the electronic device can determine the downstream tasks of the first task from the task configuration information. Because dependencies between tasks are complex, the first task may have one or more downstream tasks.
In some possible implementations, when the downstream tasks of the first task need to be determined, the electronic device may retrieve the configuration information stored in a database and determine the downstream tasks of the first task from the dependency information in that configuration information. It should be understood that the configuration information may be stored in the form of a data table.
In one embodiment, Figure 4 is a schematic diagram of the structure of a system table in an embodiment of the present disclosure. Referring to Figure 4, the configuration information of a task is written to the relation table (RELATION) shown in Figure 4. The information recorded in table RELATION may include the task name, output data table, dependency data table, output granularity, and time offset. The electronic device can obtain the configuration information of a task by querying table RELATION. After the first task completes, the electronic device may also store the information that the task executed successfully. As shown in Figure 4, this information may be written to the task status table (DATASET_STATUE). The information recorded in table DATASET_STATUE may include the task name, data output date, and status (indicating whether the task succeeded). The electronic device can determine whether a task executed successfully by querying table DATASET_STATUE.
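To make the two tables concrete, the following sketch creates them in an in-memory SQLite database. Only the table names and the listed fields come from the description; the column names, column types, the `'success'` status value, the primary key, and the helper functions are assumptions made for illustration, and a real deployment would likely use a different database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RELATION (
    task_name          TEXT NOT NULL,
    output_table       TEXT NOT NULL,
    dependency_table   TEXT,            -- NULL for a task without upstream data
    output_granularity TEXT NOT NULL,
    time_offset        INTEGER NOT NULL
);
CREATE TABLE DATASET_STATUE (
    task_name   TEXT NOT NULL,
    output_date TEXT NOT NULL,
    status      TEXT NOT NULL,
    PRIMARY KEY (task_name, output_date)
);
""")


def record_success(conn, task_name, output_date):
    """Store a success record for one data output date of a task."""
    conn.execute(
        "INSERT OR REPLACE INTO DATASET_STATUE VALUES (?, ?, 'success')",
        (task_name, output_date),
    )


def has_succeeded(conn, task_name, output_date):
    """Return True if a success record exists for the given output date."""
    row = conn.execute(
        "SELECT 1 FROM DATASET_STATUE "
        "WHERE task_name = ? AND output_date = ? AND status = 'success'",
        (task_name, output_date),
    ).fetchone()
    return row is not None


# Task A of Figure 3: produces table a, hourly granularity, T-1 data.
conn.execute("INSERT INTO RELATION VALUES ('A', 'a', NULL, 'hour', -1)")
record_success(conn, "A", "2022-02-28 23:00:00")
```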
For example, taking the task configuration of Figure 3, the output data granularity of task A may be hourly, so the electronic device triggers task A once every hour. Suppose the instance of task A with execution time 2022/03/01 00:00:00 succeeds. Because the execution parameters of task A are configured to produce T-1 data, table DATASET_STATUE finally records that the status of output data table a at time 2022/02/28 23:00:00 is successful.
In some possible implementations, S102 may further include: when execution of the first task is completed, obtaining a first output data table, which is the data table produced by the first task upon completion; and determining, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.
It should be understood that after the first task completes, the electronic device may write the output data of the first task into a table (the output data table), thereby obtaining the first output data table of the first task. By querying the task configuration information stored when the tasks were configured, the electronic device can obtain the N tasks that depend on the first output data table (that is, the N tasks whose dependency data table is the first output data table) and determine these N tasks as the second tasks.
Taking the task configuration shown in Figure 3 as an example, the output data table of task A is a, and task D and task E are both configured with dependency data table a. Task D and task E are therefore determined to be the downstream tasks of task A.
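The lookup of downstream tasks through the output data table can be sketched as below. The dictionaries transcribe the Figure 3 configuration as described in the text; the variable and function names are hypothetical, and an actual system would read this information from the stored configuration rather than from in-memory dictionaries.

```python
# Output data table of each task in the Figure 3 example.
OUTPUT_TABLE = {"A": "a", "B": "b", "C": "c", "D": "d", "E": "e", "F": "f"}

# Dependency data tables of each downstream task in the Figure 3 example.
DEPENDENCY_TABLES = {"D": ["a"], "E": ["a", "b"], "F": ["c"]}


def downstream_tasks(finished_task):
    """Return the tasks whose dependency tables include the finished task's output."""
    produced = OUTPUT_TABLE[finished_task]
    return sorted(
        task for task, tables in DEPENDENCY_TABLES.items() if produced in tables
    )


print(downstream_tasks("A"))  # ['D', 'E']
```

Because the match is made on the data table rather than on the task name, replacing the task that produces table a would not require changing the configuration of tasks D and E.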
S103, when the upstream tasks of the i-th second task among the N second tasks have completed, the electronic device executes the i-th second task.
It should be understood that after a task completes, the electronic device stores the information that the task has completed. Based on the stored information, the electronic device checks whether all upstream tasks of the i-th second task have completed; when they have, execution of the i-th second task can be triggered. Because dependencies between tasks are complex, the i-th second task may have multiple upstream tasks.
Whether all upstream tasks of the i-th second task have completed is determined by whether the upstream tasks have produced the dependency data tables required by the second task. The dependency between the second task and its upstream tasks is therefore established through data tables and is independent of the tasks themselves.
For example, taking the task configuration of Figure 3, task E depends on task A. Suppose task A produces the data dated 2022/02/28 23:00:00 and writes it to output data table a. Task E uses the T-2 data of table a at hourly granularity. According to upstream task output time - dependency time offset * dependency granularity = downstream task execution time (that is, 2022/02/28 23:00:00 - (-2) * day = 2022/03/02 00:00:00, with the result taken at the start of the day because task E runs at daily granularity), the execution time of task E is determined to be 2022/03/02 00:00:00.
After determining that task E is to execute at 2022/03/02 00:00:00, the electronic device checks whether table a has a success status for every hour from 2022/02/28 00:00:00 to 2022/02/28 23:00:00, and whether data table b has a success status for the date 2022/03/01 00:00:00. If these conditions are met, task E is executed immediately.
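The readiness check for task E can be sketched as a set comparison between the partitions task E requires and the partitions that already have a success record. The function names and the representation of a partition as a `(table, timestamp)` pair are assumptions made for illustration; the dates are those of the example above.

```python
from datetime import datetime, timedelta


def hourly_partitions(day):
    """All 24 hourly timestamps of the given day."""
    return [day + timedelta(hours=h) for h in range(24)]


def required_partitions_for_task_e():
    """Partitions task E needs before its 2022/03/02 00:00:00 run."""
    required = {("a", ts) for ts in hourly_partitions(datetime(2022, 2, 28))}
    required.add(("b", datetime(2022, 3, 1)))
    return required


def is_ready(required, succeeded):
    """True when every required partition has a success record."""
    return required <= succeeded


required = required_partitions_for_task_e()
succeeded = set(required)
print(is_ready(required, succeeded))  # True

# Removing a single hourly partition of table a blocks the run.
succeeded.discard(("a", datetime(2022, 2, 28, 13)))
print(is_ready(required, succeeded))  # False
```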
In some possible implementations, as shown in Figure 2, S202 may be included before S103; S202 may be executed after S102 and before S103.
S202, the electronic device polls the N second tasks and, according to the second dependency data table of each second task, determines the upstream tasks of each second task. The second dependency data table indicates the data table, produced by an upstream task, on which each second task depends when it executes.
It should be understood that when there are multiple second tasks, each second task needs to be polled. From the configuration information stored when the tasks were configured, the second dependency data table of the current second task can be determined, and the task that produces that second dependency data table is determined to be an upstream task of the current second task. For details on determining upstream tasks from the task configuration information, refer to S102.
In some possible implementations, the electronic device detects whether the upstream tasks of each second task have produced their data tables; when it detects that the upstream tasks of the i-th second task have produced their data tables, it determines that the upstream tasks of the i-th second task have completed.
It should be understood that after a task completes, the electronic device stores the completion information. When the electronic device needs to query the upstream tasks required by the i-th second task, it can retrieve the stored information; when it determines that, at the current system time, the dependency data table has been produced by the upstream task, it can determine that the upstream task has completed.
For example, table DATASET_STATUE in Figure 4 can be used to store information about successful executions, and the electronic device can determine whether a task executed successfully by querying table DATASET_STATUE.
In some possible implementations, as shown by the dashed boxes and dashed arrows in Figure 2, S203 may be included after S102; S203 may be executed after S102 and before S103. Alternatively, S203 may be included after S102, and S202 may be executed after S203.
S203, according to the second execution parameter of each of the N second tasks, the electronic device may register an execution trigger for each second task, where the execution trigger is configured to trigger execution of the corresponding second task when the execution time of that second task arrives.
It should be understood that the second execution parameter can be used to determine the execution time of the second task. After the execution time is determined, the electronic device registers, for each second task, an execution trigger carrying that execution time. When the execution time arrives, the electronic device can trigger execution of the second task.
In some possible implementations, the execution time of the second task may be determined by the time at which the upstream task produces the dependency data table required by the second task. The second execution parameter may therefore include the dependency granularity and the dependency time offset.
For example, taking the task configuration of Figure 3 and referring to the calculation in S103, suppose the execution date of task A is 2022/03/01 00:00:00. After task A succeeds, because downstream task E depends on the T-2 data a at daily granularity, a trigger for task E with execution date 2022/03/02 00:00:00 is registered.
In one embodiment, Figure 5 is another schematic diagram of the structure of a system table in an embodiment of the present disclosure. Referring to Figure 5, the information of the execution trigger of each second task may be recorded in the trigger table (TRIGGER). The information recorded in table TRIGGER includes the task name, task execution date, and status (indicating whether the task succeeded). By querying table TRIGGER, the electronic device can obtain the information of the second task corresponding to the current trigger.
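Trigger registration could look roughly like the sketch below, again using an in-memory SQLite database. The table name and its three fields follow the description; the column names, the `'pending'` status value, and the choice to ignore a second registration of the same task and execution date are assumptions made for this sketch. `TRIGGER` is a reserved word in SQLite, so the table name is quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE "TRIGGER" (
    task_name      TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'pending',
    PRIMARY KEY (task_name, execution_date)
)
""")


def register_trigger(conn, task_name, execution_date):
    """Register a trigger; a duplicate registration is silently ignored."""
    conn.execute(
        'INSERT OR IGNORE INTO "TRIGGER" (task_name, execution_date) VALUES (?, ?)',
        (task_name, execution_date),
    )


def pending_triggers(conn):
    """Return (task_name, execution_date) for every trigger still pending."""
    return conn.execute(
        'SELECT task_name, execution_date FROM "TRIGGER" '
        "WHERE status = 'pending' ORDER BY execution_date, task_name"
    ).fetchall()


# Task A succeeded, so a trigger for downstream task E is registered.
register_trigger(conn, "E", "2022-03-02 00:00:00")
# A second upstream task finishing later registers the same trigger again.
register_trigger(conn, "E", "2022-03-02 00:00:00")
print(pending_triggers(conn))  # [('E', '2022-03-02 00:00:00')]
```

Ignoring duplicates matters when a task has several upstream tables, since each upstream completion would otherwise register its own copy of the trigger.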
In some possible embodiments, as shown by the dashed boxes and dashed arrows in Figure 2, S204 may be included after S203; S204 may be executed after S203 and before S103. Alternatively, S202 may be included before S204: S202 may be executed after S203, and S204 may be executed after S202.
S204, when the upstream tasks of the i-th second task have completed, the electronic device may fire the execution trigger corresponding to the i-th second task.
It should be understood that, by retrieving the stored data, the electronic device can determine whether all upstream tasks of the second task in a trigger have a success status. If all upstream tasks of the second task have succeeded, execution of the current task is triggered immediately.
Here, all upstream tasks of the second task having succeeded means that all dependency data tables of the second task have been produced at the current time.
For example, the electronic device fetches all triggers in the pending state at 5 s intervals and checks the task that each trigger is to execute. For each trigger, it obtains the upstream tasks of the corresponding task and queries their status; if all upstream tasks have produced the dependency data tables, the task in the trigger is executed immediately.
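A minimal sketch of this polling loop is shown below. The trigger representation (a dictionary with `task`, `execution_time`, and `status` keys), the status strings, and the two callbacks are hypothetical; only the 5 s interval and the rule of firing a pending trigger as soon as its upstream data is available come from the description.

```python
import time


def poll_once(triggers, upstream_ready, run_task):
    """Fire every pending trigger whose upstream data tables are available."""
    fired = []
    for trigger in triggers:
        if trigger["status"] != "pending":
            continue
        if upstream_ready(trigger["task"], trigger["execution_time"]):
            run_task(trigger["task"], trigger["execution_time"])
            trigger["status"] = "triggered"
            fired.append(trigger["task"])
    return fired


def poll_forever(triggers, upstream_ready, run_task, interval_seconds=5.0):
    """Repeat the check at a fixed interval; runs until interrupted."""
    while True:
        poll_once(triggers, upstream_ready, run_task)
        time.sleep(interval_seconds)


# Illustrative data: task E has all upstream tables, task F does not.
triggers = [
    {"task": "E", "execution_time": "2022-03-02 00:00:00", "status": "pending"},
    {"task": "F", "execution_time": "2022-04-01 00:00:00", "status": "pending"},
]
executed = []
fired = poll_once(
    triggers,
    upstream_ready=lambda task, when: task == "E",
    run_task=lambda task, when: executed.append((task, when)),
)
print(fired)  # ['E']
```

With this structure the worst-case delay between an upstream task finishing and its downstream task starting is roughly one polling interval.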
In this embodiment, through S101 to S103, a first task is acquired; when execution of the first task is completed, N second tasks are determined, where the second tasks are downstream tasks of the first task; and when the upstream tasks of the i-th second task among the N second tasks have completed, the i-th second task is executed. A downstream task in the present disclosure can thus execute as soon as its upstream tasks are determined to have completed, without waiting for a fixed execution time, which effectively reduces the delay between upstream and downstream tasks.
In this embodiment, through S101 to S103 and S201 to S204, a first task is acquired and, when execution of the first task is completed, N second tasks (downstream tasks) are determined. Execution triggers are registered for the N second tasks; when the upstream tasks of the i-th second task are found to have completed, the execution trigger is fired to execute the i-th second task. The electronic device can fetch and execute the tasks in the triggers at very short intervals. Throughout the process, the delay in scheduling upstream and downstream tasks stays at the level of seconds, which effectively reduces the delay between upstream and downstream tasks. In addition, the output data table is used directly as the upstream, and the dependency granularity and dependency time offset are configured according to the current task. This avoids the configuration difficulty and configuration errors that arise, when a task is used as the upstream, from the execution time and the output time being inconsistent.
The task scheduling process in the embodiments of the present disclosure is described below by way of example.
FIG. 6 is a schematic flowchart of yet another implementation of the task scheduling method in an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
S601, the electronic device parses the task, configures the task's dependency information and execution parameters, and proceeds to S602;
First, the electronic device can parse the user's SQL task with a syntax parsing tool, obtain from the parse result the data tables the task depends on and the data table it produces, and add them to the task's configuration information. Second, the electronic device can configure the task's execution parameters according to how the user's SQL is used.
S602, the electronic device can store the task's dependency information and execution parameters, and proceeds to S603;
Here, the electronic device can record the task's dependency information and execution parameters in the table RELATION.
S603, the electronic device can obtain all upstream tasks and check their execution status, and proceeds to S604;
Here, the electronic device can, at 5-second intervals, obtain all tasks that have no upstream data and determine whether each task's output data table has a success record for the data output date corresponding to the current system time.
S604, the electronic device can obtain all upstream tasks that have not yet been executed and trigger their execution, and proceeds to S605;
Here, the electronic device can filter out the data tables for which no success record was found, look up the corresponding tasks through the table RELATION, and trigger execution of those tasks for the current system time.
S605, after an upstream task finishes executing, the electronic device can register execution triggers for its downstream tasks, and proceeds to S606;
Here, after a task finishes executing, the electronic device can record the success of the corresponding output data in the table DATASET_STATUE. The electronic device can then find the task's downstream tasks through the table RELATION and register an execution trigger for each of them; the execution trigger information is stored in the table TRIGGER.
S606, the electronic device queries the execution triggers, determines that all corresponding upstream tasks have succeeded, and proceeds to S607;
Here, the electronic device can, at 5-second intervals, fetch all execution triggers in the to-be-triggered state from the table TRIGGER and check whether every upstream task of the task associated with each trigger already has a success status in the table DATASET_STATUE.
S607, the downstream task is executed.
Here, once the electronic device finds in the table DATASET_STATUE that the upstream dependency tables have succeeded, it immediately executes the current task and marks the corresponding entry in the table TRIGGER as succeeded. By repeating S604 to S607, the electronic device can execute all tasks and thereby implement task scheduling.
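The flow of S601 to S607 can be illustrated with a minimal in-memory sketch. It is not the implementation of the disclosure: the three tables RELATION, DATASET_STATUE and TRIGGER are modeled as plain Python containers, the class name Scheduler and its method names are hypothetical, the state strings "pending" and "succeeded" are assumed, and the 5-second polling loop is replaced by explicit calls so that the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    # RELATION (assumed layout): task name -> (tables it depends on, table it produces)
    relation: dict = field(default_factory=dict)
    # DATASET_STATUE (assumed layout): (table, data date) pairs produced successfully
    dataset_statue: set = field(default_factory=set)
    # TRIGGER (assumed layout): (task name, data date) -> "pending" or "succeeded"
    trigger: dict = field(default_factory=dict)
    # Order in which tasks ran; stands in for actually running the SQL
    executed: list = field(default_factory=list)

    def add_task(self, name, depends_on, produces):
        """S601-S602: store dependency information for a task."""
        self.relation[name] = (tuple(depends_on), produces)

    def run_task(self, name, date):
        """Run a task, record its output, register downstream triggers (S605)."""
        _, produces = self.relation[name]
        self.executed.append(name)
        self.dataset_statue.add((produces, date))
        for other, (deps, _) in self.relation.items():
            if produces in deps:
                # Do not overwrite a trigger that already exists for this date.
                self.trigger.setdefault((other, date), "pending")

    def poll_roots(self, date):
        """S603-S604: run tasks without upstream data whose output is missing."""
        for name, (deps, produces) in list(self.relation.items()):
            if not deps and (produces, date) not in self.dataset_statue:
                self.run_task(name, date)

    def poll_triggers(self, date):
        """S606-S607: fire pending triggers whose upstream tables all succeeded."""
        for (name, d), state in list(self.trigger.items()):
            if d != date or state != "pending":
                continue
            deps, _ = self.relation[name]
            if all((table, d) in self.dataset_statue for table in deps):
                self.run_task(name, d)
                self.trigger[(name, d)] = "succeeded"


s = Scheduler()
s.add_task("load_orders", [], "orders")
s.add_task("load_users", [], "users")
s.add_task("join", ["orders", "users"], "orders_users")
s.add_task("report", ["orders_users"], "daily_report")
s.poll_roots("2023-02-23")     # runs the two tasks that have no upstream data
s.poll_triggers("2023-02-23")  # runs "join" and registers a trigger for "report"
s.poll_triggers("2023-02-23")  # runs "report"
```

In a deployment along the lines described above, the two poll methods would presumably be called from a loop that sleeps for the polling interval, and the three containers would be database tables; both are left out here.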
In the embodiment of the present disclosure, it follows from S601 to S607 that the electronic device can obtain downstream tasks from an upstream task and register an execution trigger for each downstream task. It polls the execution triggers continuously, checking whether the upstream tasks of the task associated with each trigger have finished executing. If all upstream tasks have finished, the downstream task is executed. Thus, after each task in the present disclosure finishes, its downstream tasks can be found quickly through the data lineage maintained in the database, and execution triggers can be registered for them. The electronic device fetches the tasks held in execution triggers and executes them at very short intervals, so the delay in scheduling upstream and downstream tasks stays at the level of seconds, achieving low-latency task scheduling. Furthermore, the output data table is used directly as the upstream dependency, and the dependency granularity and dependency time offset are configured according to the current task. This avoids the configuration difficulty and configuration errors that arise, when tasks are used as upstream dependencies, from a mismatch between execution time and output time.
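The data lineage mentioned above starts from parsing each SQL task for the tables it reads and the table it writes. The sketch below shows the idea with two regular expressions; it is only an illustration under strong assumptions. The disclosure refers to a syntax parsing tool without naming one, the function name extract_lineage is hypothetical, and the patterns cover only simple INSERT ... SELECT statements (no subqueries, CTEs, comments or quoted identifiers).

```python
import re

# Table written by the statement: INSERT INTO t / INSERT OVERWRITE TABLE t
_OUTPUT = re.compile(r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.I)
# Tables read by the statement: FROM t / JOIN t
_INPUT = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.I)


def extract_lineage(sql: str) -> dict:
    """Return the tables a simple SQL task depends on and the tables it produces."""
    produces = set(_OUTPUT.findall(sql))
    depends_on = set(_INPUT.findall(sql)) - produces
    return {"depends_on": sorted(depends_on), "produces": sorted(produces)}


lineage = extract_lineage(
    "INSERT OVERWRITE TABLE dw.orders_users "
    "SELECT * FROM ods.orders o JOIN ods.users u ON o.uid = u.id"
)
```

The resulting dictionary is the kind of dependency information that S601 adds to the task configuration and S602 stores; a production system would more plausibly rely on a full SQL parser for this step.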
Based on the same inventive concept, embodiments of the present disclosure also provide a task scheduling apparatus. The apparatus may be a chip or a system-on-chip in an electronic device, or a functional module in an electronic device for implementing the methods described in the above embodiments. The apparatus can implement the functions performed by the electronic device in the above embodiments, and these functions can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
FIG. 7 is a schematic structural diagram of a task scheduling apparatus in an embodiment of the present disclosure. Referring to FIG. 7, the task scheduling apparatus 700 may include: an acquisition module 701 configured to acquire a first task; a determination module 702 configured to determine N second tasks when execution of the first task completes, the second tasks being downstream tasks of the first task and N being a positive integer; and an execution module 703 configured to execute the i-th second task among the N second tasks when the upstream task of that i-th second task has finished executing.
In some possible implementations, the acquisition module 701 is further configured to: after acquiring the first task, execute the first task according to a first dependency data table and a first execution parameter of the first task, where the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.
In some possible implementations, the determination module 702 is further configured to: when execution of the first task completes, obtain a first output data table, the first output data table being the data table produced by the first task upon completion; and determine, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.
In some possible implementations, the first execution parameter includes at least one of the following: execution granularity, dependency granularity, dependency time offset, and output time offset. The execution granularity represents the execution period of the first task; the dependency granularity represents the period of the first dependency data table on which the first task depends during execution; the dependency time offset represents the offset between the execution time of the first task and the output time of the first dependency data table; and the output time offset represents the offset between the execution time of the first task and the output time of the first output data table.
In some possible implementations, the execution period includes at least one of the following: one month, one week, one day, and one hour.
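One way to picture the four execution parameters is as a small record from which the data period of the dependency table and of the output table is derived for a given execution time. The sketch below is an interpretation, not the disclosed implementation: the names ExecutionParams and truncate are hypothetical, the offsets are modeled as timedelta values, and the sign convention (the offset is subtracted from the execution time before truncating to the granularity) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def truncate(t: datetime, granularity: str) -> datetime:
    """Round a time down to the start of its hour, day, week (Monday) or month."""
    if granularity == "hour":
        return t.replace(minute=0, second=0, microsecond=0)
    day = t.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown granularity: {granularity}")


@dataclass(frozen=True)
class ExecutionParams:
    execution_granularity: str    # how often the task runs
    dependency_granularity: str   # period of the table the task depends on
    dependency_offset: timedelta  # execution time minus dependency output time (assumed)
    output_offset: timedelta      # execution time minus own output time (assumed)

    def dependency_period(self, run_at: datetime) -> datetime:
        """Period of the dependency table that a run at run_at waits for."""
        return truncate(run_at - self.dependency_offset, self.dependency_granularity)

    def output_period(self, run_at: datetime) -> datetime:
        """Period of the output table that a run at run_at produces."""
        return truncate(run_at - self.output_offset, self.execution_granularity)


# A daily task that reads an hourly table and writes the previous day's data.
params = ExecutionParams("day", "hour", timedelta(hours=1), timedelta(days=1))
run_at = datetime(2023, 2, 23, 0, 5)
dep = params.dependency_period(run_at)   # hourly period starting 2023-02-22 23:00
out = params.output_period(run_at)       # daily period starting 2023-02-22 00:00
```

Under these assumptions, the scheduler would look for a success record of the dependency table for the period dep before running the task, and record a success for the period out afterwards.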
In some possible implementations, the execution module 703 is further configured to: before executing the i-th second task when the upstream task of the i-th second task among the N second tasks has finished executing, poll the N second tasks; and determine the upstream task of each second task according to a second dependency data table of that second task, where the second dependency data table indicates the data table produced by the upstream task on which each second task depends during execution.
In some possible implementations, the execution module 703 is further configured to: after determining the upstream task of each second task according to the second dependency data table of that second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that the upstream task of the i-th second task has finished executing.
In some possible implementations, the execution module 703 is further configured to: after the N second tasks are determined when execution of the first task completes, register an execution trigger for each second task according to a second execution parameter of each of the N second tasks, the execution trigger being configured to trigger execution of the corresponding second task when the execution time of that second task arrives. Executing the i-th second task when the upstream task of the i-th second task among the N second tasks has finished executing includes: when the upstream task of the i-th second task has finished executing, firing the execution trigger corresponding to the i-th second task.
It should be noted that, for the specific implementation of the acquisition module 701, the determination module 702 and the execution module 703, reference may be made to the detailed description of the embodiments of FIG. 1 to FIG. 6; for brevity, it is not repeated here.
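The division of work among the three modules can be sketched as three small classes. This is a rough illustration only: the class and method names are hypothetical, tasks are reduced to the data tables they read and write, and "executing" a task just records its name and marks its output table as produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    depends_on: tuple  # data tables the task reads
    produces: str      # data table the task writes


class AcquisitionModule:
    """Plays the role of module 701: acquires tasks by name."""

    def __init__(self, tasks):
        self._tasks = {t.name: t for t in tasks}

    def acquire(self, name):
        return self._tasks[name]

    def all_tasks(self):
        return list(self._tasks.values())


class DeterminationModule:
    """Plays the role of module 702: finds tasks that read a finished task's output."""

    def downstream(self, finished, candidates):
        return [t for t in candidates if finished.produces in t.depends_on]


class ExecutionModule:
    """Plays the role of module 703: runs a task once its upstream tables exist."""

    def __init__(self):
        self.produced = set()
        self.log = []

    def upstream_done(self, task):
        return all(table in self.produced for table in task.depends_on)

    def execute(self, task):
        self.log.append(task.name)
        self.produced.add(task.produces)


def schedule(first_name, tasks):
    """Run the first task, then every downstream task whose upstream is complete."""
    acquisition = AcquisitionModule(tasks)
    determination = DeterminationModule()
    execution = ExecutionModule()
    first = acquisition.acquire(first_name)
    execution.execute(first)
    for task in determination.downstream(first, acquisition.all_tasks()):
        if execution.upstream_done(task):
            execution.execute(task)
    return execution.log


demo_tasks = [
    Task("a", (), "t_a"),
    Task("b", ("t_a",), "t_b"),
    Task("c", ("t_a", "t_x"), "t_c"),  # also needs t_x, which nothing here produces
]
demo_log = schedule("a", demo_tasks)
```

Here task c is determined as a downstream task of a but is not executed, because one of its dependency tables has not been produced; this mirrors the check performed by the execution module before a second task runs.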
Based on the same inventive concept, embodiments of the present disclosure provide an electronic device, which may be the electronic device described in one or more of the above embodiments. FIG. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. Referring to FIG. 8, the electronic device 800 may use general-purpose computer hardware, including a processor 801 and a memory 802.
In some possible implementations, the at least one processor may constitute any physical device having circuitry that performs logical operations on one or more inputs. For example, the at least one processor may include one or more integrated circuits (ICs), including an application-specific integrated circuit (ASIC), a microchip, a microcontroller, a microprocessor, all or part of a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or other circuitry suitable for executing instructions or performing logical operations. Instructions executed by the at least one processor may, for example, be preloaded into a memory integrated with or embedded in the controller, or may be stored in a separate memory. The memory may include random access memory (RAM), read-only memory (ROM), a hard disk, an optical disc, magnetic media, flash memory, other permanent, fixed or volatile memory, or any other mechanism capable of storing instructions. In some embodiments, the at least one processor may include more than one processor. The processors may have similar structures, or may have different structures that are electrically connected to or disconnected from each other. For example, the processors may be separate circuits or may be integrated in a single circuit. When more than one processor is used, the processors may be configured to operate independently or cooperatively. The processors may be coupled electrically, magnetically, optically, acoustically, mechanically, or by other means that allow them to interact.
According to an embodiment of the present invention, a computer-readable storage medium is also provided, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the task scheduling method described above. The memory 802 may include computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory and/or random access memory. The memory 802 may store an operating system, application programs, other program modules, executable code, program data, user data, and the like.
In addition, the memory 802 stores computer-executable instructions for implementing the functions of the acquisition module 701, the determination module 702 and the execution module 703 in FIG. 7. The functions and implementation processes of the acquisition module 701, the determination module 702 and the execution module 703 in FIG. 7 can all be implemented by the processor 801 in FIG. 8 invoking the computer-executable instructions stored in the memory 802; for the specific implementation processes and functions, refer to the related embodiments above.
Those skilled in the art will understand that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention.
Industrial applicability
Embodiments of the present disclosure provide a task scheduling method, apparatus and device. In the technical solution provided by the embodiments of the present disclosure, a first task is obtained and, when execution of the first task completes, N second tasks are determined, where each second task is a downstream task of the first task. When the upstream task of the i-th second task among the N second tasks has finished executing, the i-th second task is executed. A downstream task in the present disclosure can therefore be triggered as soon as its upstream task is determined to be complete, without waiting for a fixed scheduled time, which effectively reduces the delay between upstream and downstream tasks.
In addition, it can be understood that the task scheduling method, apparatus and device provided by the embodiments of the present disclosure are reproducible and can be used in a variety of industrial applications. For example, they can be used in the technical field of big data analysis, such as task scheduling in big data analysis.

Claims (21)

  1. A task scheduling method, wherein the method comprises:
    acquiring a first task;
    when execution of the first task is completed, determining N second tasks, the second tasks being downstream tasks of the first task, and N being a positive integer;
    when execution of the upstream task of the i-th second task among the N second tasks is completed, executing the i-th second task.
  2. The method according to claim 1, wherein after the acquiring of the first task, the method further comprises:
    executing the first task according to a first dependency data table and a first execution parameter of the first task; the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.
  3. The method according to claim 1 or 2, wherein the determining of N second tasks when execution of the first task is completed comprises:
    when execution of the first task is completed, obtaining a first output data table, the first output data table being the data table produced by the first task when its execution is completed;
    determining, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.
  4. The method according to claim 2, wherein the first execution parameter comprises at least one of the following: execution granularity, dependency granularity, dependency time offset and output time offset;
    wherein the execution granularity is used to represent the execution period of the first task;
    the dependency granularity is used to represent the period of the first dependency data table on which the first task depends during execution;
    the dependency time offset is used to represent the offset between the execution time of the first task and the output time of the first dependency data table;
    the output time offset is used to represent the offset between the execution time of the first task and the output time of the output data table of the first task.
  5. The method according to claim 4, wherein the execution period comprises at least one of the following: one month, one week, one day and one hour.
  6. The method according to any one of claims 1 to 5, wherein before the executing of the i-th second task when execution of the upstream task of the i-th second task among the N second tasks is completed, the method further comprises:
    polling the N second tasks;
    determining the upstream task of each second task according to a second dependency data table of that second task, wherein the second dependency data table is used to indicate the data table produced by the upstream task on which each second task depends during execution.
  7. The method according to claim 6, wherein after the determining of the upstream task of each second task according to the second dependency data table of that second task, the method further comprises:
    detecting whether the upstream task of each second task has produced a data table;
    when it is detected that the upstream task of the i-th second task has produced a data table, determining that execution of the upstream task of the i-th second task is completed.
  8. The method according to any one of claims 1 to 7, wherein after the determining of N second tasks when execution of the first task is completed, the method further comprises:
    registering an execution trigger for each second task according to a second execution parameter of each of the N second tasks, the execution trigger being configured to trigger execution of the corresponding second task when the execution time of that second task arrives;
    the executing of the i-th second task when execution of the upstream task of the i-th second task among the N second tasks is completed comprises:
    when execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
  9. The method according to claim 3, wherein executing the first task according to the first dependency data table and the first execution parameter of the first task comprises: determining whether the output data table of the first task has success information at the data output time corresponding to the current system time; and, if not, filtering out the first task and triggering execution of the task for the time corresponding to the current system time.
  10. The method according to any one of claims 1 to 9, wherein, when the downstream tasks of the first task are determined, configuration information stored in a database is retrieved, and the downstream tasks of the first task are determined based on dependency information in the configuration information.
  11. The method according to any one of claims 1 to 10, wherein after execution of the first task is completed, information indicating that the first task executed successfully is stored in a task status table, wherein the information in the task status table comprises a task name, a data output date and a status.
  12. A task scheduling apparatus, wherein the apparatus comprises:
    an acquisition module configured to acquire a first task;
    a determination module configured to determine N second tasks when execution of the first task is completed, the second tasks being downstream tasks of the first task, and N being a positive integer; and
    an execution module configured to execute the i-th second task when execution of the upstream task of the i-th second task among the N second tasks is completed.
  13. The apparatus according to claim 12, wherein the acquisition module is further configured to: after the acquiring of the first task, execute the first task according to a first dependency data table and a first execution parameter of the first task; the first dependency data table indicates the data table produced by the upstream task on which the first task depends during execution.
  14. The apparatus according to claim 12 or 13, wherein the determination module is further configured to: when execution of the first task is completed, obtain a first output data table, the first output data table being the data table produced by the first task when its execution is completed; and determine, among the downstream tasks of the first task, the N downstream tasks that depend on the first output data table as the N second tasks.
  15. The apparatus according to claim 13, wherein the first execution parameter comprises at least one of the following: execution granularity, dependency granularity, dependency time offset and output time offset;
    wherein the execution granularity is used to represent the execution period of the first task; the dependency granularity is used to represent the period of the first dependency data table on which the first task depends during execution; the dependency time offset is used to represent the offset between the execution time of the first task and the output time of the first dependency data table; the output time offset is used to represent the offset between the execution time of the first task and the output time of the output data table of the first task.
  16. The apparatus according to claim 15, wherein the execution period comprises at least one of the following: one month, one week, one day and one hour.
  17. The apparatus according to any one of claims 12 to 16, wherein the execution module is further configured to: before the executing of the i-th second task when execution of the upstream task of the i-th second task among the N second tasks is completed, poll the N second tasks; and determine the upstream task of each second task according to a second dependency data table of that second task, wherein the second dependency data table is used to indicate the data table produced by the upstream task on which each second task depends during execution.
  18. The apparatus according to any one of claims 12 to 17, wherein the execution module is further configured to: after the determining of the upstream task of each second task according to the second dependency data table of that second task, detect whether the upstream task of each second task has produced a data table; and when it is detected that the upstream task of the i-th second task has produced a data table, determine that execution of the upstream task of the i-th second task is completed.
  19. The apparatus according to any one of claims 12 to 18, wherein the execution module is further configured to: after the determining of N second tasks when execution of the first task is completed, register an execution trigger for each second task according to a second execution parameter of each of the N second tasks, the execution trigger being configured to trigger execution of the corresponding second task when the execution time of that second task arrives; the executing of the i-th second task when execution of the upstream task of the i-th second task among the N second tasks is completed comprises: when execution of the upstream task of the i-th second task is completed, triggering the execution trigger corresponding to the i-th second task.
  20. An electronic device, wherein the electronic device comprises:
    a memory configured to store processor-executable instructions; and
    a processor; wherein the processor is configured to implement the method according to any one of claims 1 to 11 when executing the executable instructions.
  21. A computer-readable storage medium, wherein the readable storage medium stores an executable program, and the executable program, when executed by a processor, implements the method according to any one of claims 1 to 11.
PCT/CN2023/078004 2022-05-06 2023-02-23 Task scheduling method and apparatus, and device WO2023213118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210488671.1 2022-05-06
CN202210488671.1A CN115098232A (en) 2022-05-06 2022-05-06 Task scheduling method, device and equipment

Publications (1)

Publication Number Publication Date
WO2023213118A1 (en) 2023-11-09

Family

ID=83287137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078004 WO2023213118A1 (en) 2022-05-06 2023-02-23 Task scheduling method and apparatus, and device

Country Status (2)

Country Link
CN (1) CN115098232A (en)
WO (1) WO2023213118A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098232A (en) * 2022-05-06 2022-09-23 北京快乐茄信息技术有限公司 Task scheduling method, device and equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180341516A1 (en) * 2017-05-25 2018-11-29 International Business Machines Corporation Processing jobs using task dependencies
US20210064425A1 (en) * 2018-05-15 2021-03-04 Huawei Technologies Co., Ltd. Task Processing Method, Processing Apparatus, and Computer System
CN112559143A (en) * 2020-12-04 2021-03-26 海南车智易通信息技术有限公司 Task scheduling method and system and computing device
CN112801546A (en) * 2021-03-18 2021-05-14 中国工商银行股份有限公司 Task scheduling method, device and storage medium
CN113535364A (en) * 2021-07-29 2021-10-22 维沃移动通信(杭州)有限公司 Task scheduling method and device
CN113806038A (en) * 2021-08-04 2021-12-17 北京房江湖科技有限公司 Task scheduling method, device, electronic equipment, storage medium and program product
CN113918288A (en) * 2020-07-07 2022-01-11 北京达佳互联信息技术有限公司 Task processing method, device, server and storage medium
CN115098232A (en) * 2022-05-06 2022-09-23 北京快乐茄信息技术有限公司 Task scheduling method, device and equipment

Also Published As

Publication number Publication date
CN115098232A (en) 2022-09-23

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23799108

Country of ref document: EP

Kind code of ref document: A1