CN110554909A - task scheduling processing method and device and computer equipment - Google Patents


Info

Publication number
CN110554909A
CN110554909A (application CN201910844301.5A)
Authority
CN
China
Prior art keywords
task
tasks
acyclic graph
directed acyclic
scheduling queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910844301.5A
Other languages
Chinese (zh)
Inventor
王自昊
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910844301.5A
Publication of CN110554909A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038 - Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The method constructs a directed acyclic graph based on the dependency relationships among tasks, constructs a task scheduling queue by performing a depth traversal of the directed acyclic graph, and finally controls the scheduling and execution of the tasks based on the task scheduling queue and the dependency relationships. Different tasks with dependency relationships are executed serially in dependency order, and at least some of the different tasks without dependency relationships are executed in parallel. Because the directed acyclic graph reflects the task dependency relationships, executing at least some of the independent tasks in parallel improves the utilization of computing resources to a certain extent and at the same time improves the computing efficiency of the tasks. In addition, dependent tasks are executed serially according to the dependency relationships among them, which avoids repeatedly executing the predecessor tasks they depend on and further improves task execution efficiency.

Description

task scheduling processing method and device and computer equipment
Technical Field
The present application belongs to the technical field of task scheduling, and in particular relates to a task scheduling processing method and apparatus, and a computer device.
Background
In the feature analysis task of a recommendation model, for example the feature analysis task of the Tiantian Kuaibao recommendation model, a series of different feature indexes (for example, a dozen or more) generally needs to be calculated.
In the serial mode, different computing tasks must queue and execute in sequence. Queuing and waiting easily leave computing resources idle, which correspondingly causes low utilization of computing resources and low computing efficiency of the tasks.
Disclosure of Invention
In view of this, an object of the present application is to provide a task scheduling processing method, apparatus, and computer device that construct a directed acyclic graph of tasks according to the dependency relationships among them, and that control the scheduling and execution of the tasks in a parallel manner based on the parallel, dependency-aware computing framework provided by the directed acyclic graph combined with the dependency relationships, so as to correspondingly improve the utilization of computing resources and the computing efficiency of the tasks.
In order to achieve the above object, in one aspect, the present application provides a task scheduling processing method, the method including:
determining the dependency relationships among tasks;
constructing a directed acyclic graph of the tasks based on the dependency relationships among the tasks;
constructing a task scheduling queue by performing a depth traversal of the directed acyclic graph; and
controlling the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks, wherein different tasks with dependency relationships are executed serially in dependency order, and at least some different tasks without dependency relationships are executed in parallel.
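For concreteness, the four steps of the method can be sketched in code. This is an illustrative reading of the claim, not the patent's implementation; the graph encoding (an adjacency map from each task to its successor tasks) and the use of a reversed post-order depth-first traversal to derive the queue are assumptions.

```python
from collections import defaultdict

def build_dag(dependencies):
    """Build the directed acyclic graph as an adjacency map.
    `dependencies` maps each task to the tasks it depends on;
    edges point from predecessor task to successor task."""
    graph = defaultdict(set)
    for task, preds in dependencies.items():
        graph[task]  # ensure every task appears as a node
        for pred in preds:
            graph[pred].add(task)
    return dict(graph)

def schedule_queue(graph):
    """Build a task scheduling queue by depth traversal: reversed
    post-order DFS yields an order in which every task appears
    after all of the tasks it depends on."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for succ in sorted(graph.get(node, ())):
            visit(succ)
        order.append(node)
    for node in sorted(graph):
        visit(node)
    return list(reversed(order))
```

Running `schedule_queue(build_dag({'task2': ['task1'], 'task3': ['task1'], 'task4': ['task2', 'task3']}))` places `task1` before `task2` and `task3`, and both before `task4`, so a task earlier in the queue never depends on a later one.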
In another aspect, the present application further provides a task scheduling processing apparatus, including:
a determining unit, configured to determine the dependency relationships among tasks;
a first construction unit, configured to construct a directed acyclic graph of the tasks based on the dependency relationships among the tasks;
a second construction unit, configured to construct a task scheduling queue by performing a depth traversal of the directed acyclic graph; and
a control unit, configured to control the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks, wherein different tasks with dependency relationships are executed serially in dependency order, and at least some different tasks without dependency relationships are executed in parallel.
In yet another aspect, the present application further provides a computer device, including:
a memory for storing computer-executable instructions; and
a processor for loading and executing the computer-executable instructions which, when loaded and executed, at least carry out the method described above.
According to the above scheme, the task scheduling processing method, apparatus, and computer device construct a directed acyclic graph based on the dependency relationships among tasks, construct a task scheduling queue by performing a depth traversal of the constructed graph, and finally control the scheduling and execution of the tasks based on the task scheduling queue and the dependency relationships; different tasks with dependency relationships are executed serially in dependency order, and at least some different tasks without dependency relationships are executed in parallel. Because the directed acyclic graph reflects the task dependency relationships, executing at least some of the independent tasks in parallel improves the utilization of computing resources to a certain extent and at the same time improves the computing efficiency of the tasks; in addition, dependent tasks are executed serially according to the dependency relationships among them, which avoids repeatedly executing the predecessor tasks they depend on and further improves task execution efficiency.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the following drawings depict only embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of dependencies between different tasks in an alternative embodiment of the present application;
FIG. 2 is an architecture diagram of a scheduling and execution processing scenario for the feature index calculation tasks of a server cluster in an alternative embodiment of the present application;
FIG. 3 is a schematic diagram of an alternate embodiment of a computer device;
FIG. 4 is a flowchart illustrating a task scheduling method according to an alternative embodiment of the present application;
FIG. 5 is a diagram illustrating an example graph structure of a directed acyclic graph in an alternative embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a task scheduling method according to an alternative embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a task scheduling method according to an alternative embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating a task scheduling method according to an alternative embodiment of the present application;
FIG. 9 is a schematic flow chart diagram illustrating a task scheduling method according to an alternative embodiment of the present application;
FIG. 10 is a logical representation of a scheduling process of feature metric calculation tasks in an alternative embodiment of the present application;
FIG. 11 is a schematic structural diagram of a task scheduling processing apparatus in an alternative embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The inventor finds that the feature index calculation task has the following three characteristics: 1) the amount of feature data to be calculated is huge, with tens of terabytes accumulating within 24 hours; 2) the calculation of an individual feature index is complex and involves multiple formulas; 3) the feature index calculations have dependency relationships, and the analysis of some features needs the analysis results of other features. As a result, feature calculation is severely time-consuming, and the calculation time grows as more indexes are added, which causes two serious consequences: on the one hand, development and debugging efficiency is low, because feature index calculation occupies most of the development and debugging time; on the other hand, the feature index calculation tasks occupy online computing resources for a long time, so that other computing tasks cannot be scheduled to run.
Aiming at these characteristics of the feature index calculation task, the inventor found through analysis and research that the current serial solution for executing such tasks has at least the following problems:
1) Low utilization rate of computing resources
In the conventional serial solution, the M feature index calculation tasks are executed one after another, without considering that most of the tasks do not actually need all of the available computing resources; serial execution under the current resources therefore does not fully utilize the system's computing resources. During serial execution, computing tasks unrelated to the currently executing task must also queue, and most computing resources stay idle while tasks wait, so the system's resource utilization is low.
2) Low computing efficiency
The existing serial solution does not consider the interdependence between computing tasks, so the depended-upon tasks among computing tasks with dependency relationships are executed repeatedly. Assuming there are M computing tasks, of which N are depended on by other tasks, at least (M + N) task executions are needed in the serial manner. As shown in FIG. 1, task 10 depends on the computation results of tasks 9 and 2, and task 11 depends on the computation result of task 2. Under the current serial solution, task 2 is executed 3 times and task 9 is executed 2 times, so the whole calculation process requires 7 executions; ideally, only 4 executions are needed to satisfy the computing requirements of the tasks.
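The arithmetic in this example can be checked mechanically. The sketch below (an illustration, not part of the patent) contrasts the serial scheme, where each run of a task re-runs its whole dependency chain, with the DAG scheme, where every needed task runs exactly once; the dependency structure is the one the text attributes to FIG. 1.

```python
def serial_executions(targets, deps):
    """Serial scheme with no result sharing: each target task
    re-runs its entire dependency chain."""
    def cost(task):
        return 1 + sum(cost(p) for p in deps.get(task, ()))
    return sum(cost(t) for t in targets)

def dag_executions(targets, deps):
    """DAG scheme: each task, and each transitive dependency, runs once."""
    needed = set()
    def collect(task):
        if task not in needed:
            needed.add(task)
            for p in deps.get(task, ()):
                collect(p)
    for t in targets:
        collect(t)
    return len(needed)

# Task 10 depends on tasks 9 and 2; task 11 depends on task 2.
deps = {10: [9, 2], 11: [2]}
print(serial_executions([2, 9, 10, 11], deps))  # 7 executions serially
print(dag_executions([2, 9, 10, 11], deps))     # 4 executions with the DAG
```

Task 2 is counted 3 times and task 9 twice in the serial total, matching the figures quoted in the text.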
In view of the above defects of current serial solutions, the present application provides a task scheduling processing method, apparatus, and computer device that construct a directed acyclic graph of tasks according to the dependency relationships among them and, based on the parallel, dependency-aware computing framework provided by the directed acyclic graph, control the scheduling and execution of the tasks in a parallel manner combined with the dependency relationships, thereby correspondingly improving the utilization of computing resources and the computing efficiency of the tasks.
As one aspect of the embodiments of the present application, a task scheduling processing method is first provided. The method may be applied to task scheduling and execution scenarios on computer devices such as a PC (Personal Computer) or a server, and more specifically, for example but not limited to, the feature index calculation tasks of a server cluster (e.g., the feature index calculation tasks in the feature analysis of a recommendation model). When applied to a PC, the method can be implemented as, but not limited to, an APP or a local device function; when applied to a server, it can be implemented as, but not limited to, a platform function.
Next, taking the scheduling and execution of the feature index calculation tasks of a server cluster as an example, the application scenario architecture of the method is introduced. In this scenario, the method is applied to one server of the cluster; the server acting as the execution subject of the method may or may not itself participate in the task computation.
As shown in FIG. 2, the architecture of the application scenario includes a server cluster formed by a plurality of servers. Optionally, one of the servers in the cluster is selected as the execution subject of the method of the present application; it provides the task scheduling processing function, distributes tasks to the corresponding computing units in each server of the cluster, and coordinates the task execution among the servers' computing units based on the task dependency relationships. Besides providing the task scheduling function, this server may also participate in the task computation with its own computing resources, or it may not, which this embodiment does not limit. The other servers provide a corresponding number of computing units with associated computing and storage resources for the task computation they are responsible for; specifically, each computing unit has at least a CPU (with one or more cores) and an associated hard disk and/or memory as its computing and storage resources.
Of course, the above application scenario is only a typical scenario provided as an example. In practice, the method of the present application is not limited to it; for example, the method may also be applied to task scheduling and execution inside a user's personal PC, among different PCs in a local area network, or inside a single server, and so on.
Referring to FIG. 3, a schematic structural diagram of a computer device to which the method of the present application is applied in the above scenario is shown. This is also the computer device disclosed as another aspect of the present application. As shown in FIG. 3, the computer device may include a processor 301 and a memory 302, and may also include a communication interface 303, an input unit 304, a display 305, and a communication bus 306.
The processor 301, the memory 302, the communication interface 303, the input unit 304, and the display 305 all communicate with each other via a communication bus 306.
In the embodiment of the present application, the processor 301 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor 301 may call a program stored in the memory 302.
The memory 302 is used for storing one or more programs, which may include program code comprising computer operation instructions (computer-executable instructions). In this embodiment, the memory 302 stores at least a program for realizing the following functions:
determining the dependency relationships among tasks;
constructing a directed acyclic graph of the tasks based on the dependency relationships among the tasks;
constructing a task scheduling queue by performing a depth traversal of the directed acyclic graph; and
controlling the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks, wherein different tasks with dependency relationships are executed serially in dependency order, and at least some different tasks without dependency relationships are executed in parallel.
In one possible implementation, the memory 302 may include a program storage area and a data storage area. The program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created during use of the computer, such as user data, user access data, and audio data.
Further, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The communication interface 303 may be an interface of a communication module, such as an interface of a GSM module.
The input unit 304 may be a touch sensing unit, a keyboard, or the like. The display 305 may include a display panel, such as a touch display panel.
Of course, the computer device structure shown in FIG. 3 does not constitute a limitation on the computer device in the embodiments of the present application; in practical applications, the computer device may include more or fewer components than those shown in FIG. 3, or some components may be combined.
The embodiments of the present application will be described in further detail below based on the common aspects described above. As shown in FIG. 4, a schematic flowchart of a task scheduling processing method according to an alternative embodiment of the present application is provided. The method includes:
Step S401: determine the dependency relationships among the tasks.
The task may be, but is not limited to, a feature index calculation task to be performed, such as a feature index calculation task in the feature analysis of a recommendation model; for example, a feature coverage calculation task, a feature number calculation task, a feature information entropy/information gain calculation task, or a feature mutual information calculation task.
There may be dependency relationships between different tasks. If the execution of one task needs the execution result of one or more other tasks, the task is considered to depend on them; this embodiment calls the depended-upon tasks the predecessor tasks of that task, and, correspondingly, the task is a successor task of the tasks it depends on. In addition, the present application calls different tasks without any dependency relationship parallel tasks.
In step S401, the dependency relationship between the tasks may be determined based on the actual dependency status between the tasks.
Step S402: construct a directed acyclic graph of the tasks based on the dependency relationships among the tasks.
After the dependency relationships among the tasks are determined, a topological structure of the tasks is constructed based on them; that is, the constructed topological structure can reflect the dependency relationships among the different tasks, and it is then further abstracted into a directed acyclic graph.
Referring to FIG. 5, in the exemplary directed acyclic graph structure provided, the nodes of the graph correspond one-to-one to the tasks to be executed; that is, each node represents one task to be executed, and the directed edges between nodes indicate the dependency relationships between the tasks. Of course, in other embodiments of the present application, the node a directed edge leaves may be designed to represent the depended-upon predecessor task (predecessor node), and the node the edge points to then correspondingly represents the successor task (successor node) of that predecessor.
The start of a node's task must satisfy the condition that the tasks of the nodes it depends on have completed. One task (node) may depend on several tasks (nodes) or on only one, and there is no cyclic dependency relationship in the directed acyclic graph.
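The start condition just stated, that a node's task may begin only once every task it depends on has completed, can be sketched as a simple predicate. The dependency map below is an assumption reconstructed from the FIG. 5 paths quoted later in the text (7 → 6 → 2 → 1 and 9 → 5 → 2 → 1), not the actual figure.

```python
def ready_tasks(deps, completed):
    """Return the tasks whose start condition is met: every
    predecessor task has completed, and the task itself has not run.
    `deps` maps each task to the tasks it depends on."""
    return {
        task for task, preds in deps.items()
        if task not in completed and all(p in completed for p in preds)
    }

# Dependency map assumed from the FIG. 5 paths: task -> predecessor tasks.
deps = {1: [], 2: [1], 5: [2], 6: [2], 7: [6], 9: [5]}
print(ready_tasks(deps, set()))     # only the root task 1 may start
print(ready_tasks(deps, {1, 2}))    # tasks 5 and 6 become startable together
```

Note how, once tasks 1 and 2 complete, tasks 5 and 6 are simultaneously ready: they have no mutual dependency and are candidates for parallel execution.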
Step S403: construct a task scheduling queue by performing a depth traversal of the directed acyclic graph.
The direction of the directed edges of the graph reflects the dependency relationships between tasks. Based on this property, this embodiment constructs the task scheduling queue by performing a depth traversal of the directed acyclic graph. A task scheduling queue constructed in this way clearly follows the actual dependency relationships among the tasks: a task earlier in the queue never depends on a later one (an earlier task is either a predecessor task or a parallel task of a later task).
In this step, the specific implementation of constructing the task scheduling queue by depth traversal of the directed acyclic graph will be described in detail later.
Step S404: control the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks.
After the task scheduling queue is constructed for the tasks to be executed, the scheduling and execution of each task is further controlled based on it. The start of a node's task must satisfy the condition that the tasks of the nodes it depends on have completed, so different tasks with dependency relationships are executed serially in dependency order. To improve the computing efficiency of the tasks and their utilization of system resources, different tasks without dependency relationships can be executed in parallel. Optionally, according to the idle status of resources and the dependency status of the tasks, either all currently executable tasks without dependency relationships (those whose predecessor tasks have completed) are executed in parallel, or only a part of them.
This embodiment constructs a directed acyclic graph of the tasks according to the dependency relationships among them, and controls the scheduling and execution of the tasks in a parallel manner, combining the dependency relationships, based on the parallel, dependency-aware computing framework provided by the graph. By executing in parallel at least some of the tasks without dependency relationships, based on a directed acyclic graph that reflects the task dependencies, the utilization of computing resources is improved to a certain extent while the computing efficiency of the tasks is improved; in addition, dependent tasks are executed serially according to the dependency relationships among them, which avoids repeatedly executing the depended-upon tasks and further improves task execution efficiency.
The implementation of the method of the present application is described in detail below. In an optional implementation of the embodiments of the present application, referring to FIG. 6, which is a schematic flow diagram of the task scheduling processing method, the method may specifically be realized by the following steps:
Step S601: determine the dependency relationships among the tasks.
Step S602: construct a directed acyclic graph of the tasks based on the dependency relationships among the tasks.
Steps S601 to S602 are the same as steps S401 to S402 of the above embodiment; refer to the related description of steps S401 to S402, which is not repeated here.
Step S603: according to at least one designated target task, perform a depth traversal of the directed acyclic graph to obtain the dependency path corresponding to each target task, where the dependency path of a target task comprises the tasks from the target task to the root task it depends on in the directed acyclic graph.
The at least one designated target task may be, but is not limited to, a task designated by system personnel, or automatically by the system based on task priority or importance, as needing preferential execution. That is, the designated target tasks are the tasks currently of interest that need to be executed first, while the other tasks in the directed acyclic graph are not of current interest; they may be executed after the designated target tasks complete, or may use idle system resources without preempting the resources of the target tasks, which this embodiment does not limit.
When at least one target task is designated, this embodiment performs a depth traversal of the directed acyclic graph according to the designated target tasks to obtain the dependency path corresponding to each.
As shown in FIG. 5, assuming the system designates tasks 7 and 9, the dependency path of task 7 obtained by depth traversal is 7 → 6 → 2 → 1, and the dependency path of task 9 is 9 → 5 → 2 → 1, where task 1 is the root task of the designated target tasks 7 and 9.
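For chains like those in this example, where each task on the path depends on a single predecessor, the depth traversal that extracts a dependency path can be sketched as follows. This is an illustrative simplification: in general a task may depend on several tasks and the traversal branches.

```python
def dependency_path(task, deps):
    """Walk from a target task down its predecessor links to the
    root task, assuming a single predecessor per step as in the
    FIG. 5 example. `deps` maps each task to the tasks it depends on."""
    path = [task]
    while deps.get(task):          # stop at the root task (no predecessors)
        task = deps[task][0]
        path.append(task)
    return path

# Predecessor map assumed from the FIG. 5 example.
deps = {7: [6], 6: [2], 2: [1], 1: [], 9: [5], 5: [2]}
print(dependency_path(7, deps))  # [7, 6, 2, 1]
print(dependency_path(9, deps))  # [9, 5, 2, 1]
```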
Step S604: starting from the root task of each dependency path, perform a level-order traversal of the tasks on the dependency paths to obtain the task layers.
Still taking tasks 7 and 9 of FIG. 5 as the designated target tasks: the root task of both the dependency path 7 → 6 → 2 → 1 of task 7 and the dependency path 9 → 5 → 2 → 1 of task 9 is task 1. This step starts from task 1 and performs a level-order traversal over only the task nodes on these two dependency paths of the directed acyclic graph, ignoring and not traversing the other nodes outside the two paths. The level-order traversal of the tasks on the two paths yields the following task layers:
Task layer 11: task 1;
Task layer 12: task 2;
Task layer 13: task 5 and task 6;
Task layer 14: task 7 and task 9.
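The level-order traversal that produces these task layers can be sketched as a breadth-first walk from the root task, restricted to the nodes on the two dependency paths. The successor map below is an assumption reconstructed from the paths 7 → 6 → 2 → 1 and 9 → 5 → 2 → 1, not the actual FIG. 5.

```python
def task_layers(root, successors, on_paths):
    """Level-order (breadth-first) traversal from the root task,
    visiting only tasks on the dependency paths (`on_paths`);
    each frontier becomes one task layer."""
    layers, frontier, visited = [], [root], {root}
    while frontier:
        layers.append(sorted(frontier))
        nxt = []
        for task in frontier:
            for succ in successors.get(task, ()):
                if succ in on_paths and succ not in visited:
                    visited.add(succ)
                    nxt.append(succ)
        frontier = nxt
    return layers

# Successor map assumed from the two dependency paths.
successors = {1: [2], 2: [5, 6], 5: [9], 6: [7]}
print(task_layers(1, successors, {1, 2, 5, 6, 7, 9}))
# [[1], [2], [5, 6], [7, 9]], i.e. task layers 11 to 14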
Step S605: build a task scheduling queue for each task layer, and allocate the tasks in the same task layer to the same corresponding task scheduling queue.
As in the above example, for the 4 task layers 11 to 14, 4 task scheduling queues are constructed, task scheduling queue 11 to task scheduling queue 14: task 1 is allocated to task scheduling queue 11, task 2 to task scheduling queue 12, tasks 5 and 6 to task scheduling queue 13, and tasks 7 and 9 to task scheduling queue 14.
Step S606, determining, in layer order, the first scheduling queue that currently needs to be scheduled.
After the tasks on the dependency paths of the specified target tasks have been distributed to the corresponding task scheduling queues through depth-first traversal combined with level-order traversal, those tasks can be further scheduled and executed on the basis of the task scheduling queues, which provides the basis for starting and executing the specified target tasks.
In view of this, the present embodiment determines, in layer order, the first scheduling queue that currently needs to be scheduled.
In the above example with respect to fig. 5, in the initial state of task scheduling, task 1 in task scheduling queue 11 is obviously scheduled first according to the layer order, so task scheduling queue 11 is the first scheduling queue to be scheduled in the initial state.
Step S607, allocating the tasks in the first scheduling queue to a corresponding number of computing units for parallel execution. When a task is executed, if it has no dependent pre-task, it is executed directly; if a dependent pre-task exists, the execution result of that pre-task is obtained and the task is executed based on it.
After the first scheduling queue to be scheduled is determined, the tasks in it are scheduled and executed. If the first scheduling queue contains only one task, and that task either has no dependent pre-task or its pre-task has completed, the task can be allocated directly to a computing unit for processing. If the first scheduling queue contains more than one task, then, because the level-order traversal places tasks of the same level of the directed acyclic graph in the same queue, the tasks in the first scheduling queue have no dependency relationships with one another and need not queue and wait for one another during scheduling and execution. In this case, to improve task computation efficiency and resource utilization, each task in the first scheduling queue is preferably allocated to a corresponding number of different computing units for parallel execution.
In the example of fig. 5, in the initial state of task scheduling, task scheduling queue 11 is first determined as the first scheduling queue to be scheduled, and task 1 is allocated to a corresponding computing unit for execution, for example a computing unit with suitable computing capability in a certain server of the cluster, selected according to the data volume, computational complexity, and so on of task 1. After task 1 is completed, task scheduling queue 12 is determined as the new first scheduling queue, and task 2 in it is allocated to a corresponding computing unit, which executes task 2 based on the task result of task 1. Next, task scheduling queue 13 is determined as the new first scheduling queue, and task 6 and task 5 are allocated to two different computing units for parallel execution; the two computing units each call the task result of task 2 and execute task 6 and task 5 based on it. Finally, task scheduling queue 14 is determined as the new first scheduling queue, and task 7 and task 9 are allocated to two different computing units for parallel execution; similarly, the two computing units call the task results of task 6 and task 5 respectively, executing task 7 based on the result of task 6 and task 9 based on the result of task 5.
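Step S607 can be sketched as follows, assuming the layer queues built above and a stand-in `run_task` computation (a placeholder, not the patent's actual feature-index workload); tasks within one queue are submitted to a thread pool in parallel, and one layer finishes before the next starts:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(task, pred_results):
    # Stand-in for a real computation: consumes the pre-task results
    # (an empty dict for a root task such as task 1).
    return sum(pred_results.values()) + task

def schedule(queues, preds):
    """Execute the task-layer queues in order; tasks within one queue
    have no mutual dependencies and run in parallel."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for queue in queues:
            futures = {
                t: pool.submit(run_task, t, {p: results[p] for p in preds[t]})
                for t in queue
            }
            for t, f in futures.items():
                results[t] = f.result()  # the layer completes before the next starts
    return results

# Queues from the fig. 5 example (dependency paths of target tasks 7 and 9).
preds = {1: [], 2: [1], 5: [2], 6: [2], 7: [6], 9: [5]}
queues = [[1], [2], [5, 6], [7, 9]]
results = schedule(queues, preds)
```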
Step S608, persisting the execution result of each task in the first scheduling queue so that it can be called by the tasks in the next task scheduling queue.
During scheduling and execution, to make it convenient for a post-task to call the task result of its pre-task and to avoid re-executing a pre-task on which a post-task depends, the present embodiment persists the task result of each task in the first scheduling queue. At a minimum, the results of tasks that serve as pre-tasks of other tasks must be persisted to corresponding storage locations, with the persistent location information recorded, for example by persisting a result to a hard-disk storage space associated with the computing unit and recording that location. Thus, when a task depends on the results of one or more pre-tasks, those results can be read from the corresponding locations according to the recorded persistent location information, and the current task can be started and executed based on the retrieved result information.
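The persistence step can be sketched as below, assuming pickle files on local disk as the storage locations; the `ResultStore` class, the file layout, and the sample result are all illustrative, not the patent's implementation:

```python
import os
import pickle
import tempfile

class ResultStore:
    """Minimal sketch of task-result persistence: each result is written
    to its own file and the persistent location is recorded so that
    post-tasks can call it later."""

    def __init__(self, root):
        self.root = root
        self.locations = {}  # task id -> recorded persistent location

    def persist(self, task, result):
        path = os.path.join(self.root, "task_%s.pkl" % task)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        self.locations[task] = path

    def load(self, task):
        # A post-task reads its pre-task's result from the recorded location.
        with open(self.locations[task], "rb") as f:
            return pickle.load(f)

store = ResultStore(tempfile.mkdtemp())
store.persist(2, {"feature_index": 0.42})  # hypothetical result of task 2
```

Tasks 6 and 5 would then call `store.load(2)` instead of re-executing task 2.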
In this embodiment, for the case that target tasks are specified, depth-first traversal is combined with level-order traversal so that only the dependency paths of the specified target tasks are traversed and scheduled, thereby achieving priority execution of the specified target tasks. The tasks in the directed acyclic graph other than those on the dependency paths of the target tasks may be executed after the specified target tasks are completed, or may be executed using idle system resources without preempting the resources of the target tasks; this embodiment imposes no limitation on the choice. For example, in fig. 5, suppose the target tasks are still task 7 and task 9 and the system currently has two available computing units. While task 1 has completed and task 2 is executing, task 3 or task 10 may be executed on the currently idle computing unit, on the premise that the computing time of task 3 or task 10 does not exceed the execution time of task 2, so as to ensure that the start of the subsequent tasks 6 and 5 is not affected.
In an optional implementation manner of the embodiment of the present application, referring to the flowchart of fig. 7, the method for scheduling a task according to the present application may further be implemented by the following processing procedures:
And step S701, determining the dependency relationship among the tasks.
And S702, constructing a directed acyclic graph of the tasks based on the dependency relationship among the tasks.
Steps S701 to S702 are the same as steps S401 to S402 in the above embodiment; specific reference may be made to the related description of steps S401 to S402 above, which is not repeated here.
And step S703, randomly selecting at least one target task in the directed acyclic graph.
When no task requiring priority execution is specified, the system can randomly select one or more of the tasks to be executed from the directed acyclic graph as target tasks, which serve as the starting tasks of the depth-first traversal.
step S704, according to the at least one target task, performing a deep traversal on the directed acyclic graph to obtain at least one root task corresponding to the at least one target task.
Taking the randomly selected target tasks to be task 7 and task 9 in fig. 5 as an example, the directed acyclic graph is traversed depth-first starting with tasks 7 and 9, which yields their common root task: task 1.
Step S705, starting with the at least one root task, performing a hierarchical traversal on the directed acyclic graph to obtain each task layer.
Different from the previous embodiment, in this embodiment, for the case where no task requiring priority execution is specified, after the at least one root task corresponding to the at least one randomly selected target task has been determined by depth-first traversal, a level-order traversal is performed over all task nodes in the directed acyclic graph starting with that root task, which yields task layers covering all task nodes (rather than only the task nodes on the dependency paths of specified target tasks).
For example, in fig. 5, starting with task 1 and performing a level-order traversal over all task nodes of the directed acyclic graph yields the following task layers:
Task layer 21: task 1;
Task layer 22: task 2, task 3, task 10;
Task layer 23: task 6, task 5, task 4;
Task layer 24: task 7, task 8, task 9.
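The full-graph level-order traversal can be sketched as follows, again using a hypothetical successor list reconstructed from the fig. 5 description (the patent does not enumerate the edges):

```python
# Hypothetical successor lists for fig. 5: succs[t] are the tasks that
# directly depend on task t.
succs = {1: [2, 3, 10], 2: [6, 5], 3: [4], 10: [],
         6: [7], 5: [8, 9], 4: [], 7: [], 8: [], 9: []}

def level_order(root):
    """Level-order (breadth-first) layering of the whole DAG from the root."""
    layers, frontier, seen = [], [root], {root}
    while frontier:
        layers.append(sorted(frontier))
        nxt = []
        for t in frontier:
            for s in succs[t]:
                if s not in seen:
                    seen.add(s)
                    nxt.append(s)
        frontier = nxt
    return layers
```

Starting with task 1, `level_order(1)` reproduces task layers 21 to 24.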
Step S706, a task scheduling queue is built for each task layer, and tasks on the same task layer are distributed to the same corresponding task scheduling queue.
similar to the previous embodiment, for the 4 task layers, i.e. task layer 21-task layer 24, 4 task scheduling queues are constructed: task scheduling queue 21-task scheduling queue 24, wherein task 1 is allocated to task scheduling queue 21, tasks 2, 3 and 10 are allocated to task scheduling queue 22, task 6, task 5 and task 4 are allocated to task scheduling queue 23, and task 7, task 8 and task 9 are allocated to task scheduling queue 24.
And step S707, determining a first scheduling queue needing scheduling currently according to the sequence.
Step S708, allocating each task in the first scheduling queue to a corresponding number of computing units for parallel execution; when executing a task, if the task does not have a dependent pre-task, the task is directly executed, and if the dependent pre-task exists, the execution result of the dependent pre-task is obtained, and the task is executed based on the execution result of the pre-task.
Step S709, perform persistence processing on the execution result of each task in the first scheduling queue, so as to allow the task in the next task scheduling queue to call.
The execution of steps S707 to S709 is similar to that of steps S606 to S608 in the previous embodiment. The difference is that in the present embodiment each task layer covers all task nodes in the directed acyclic graph, whereas in the previous embodiment each task layer covers only the task nodes on the dependency paths of the specified target tasks. Correspondingly, in steps S707 to S709, when the first scheduling queue is determined, its tasks are scheduled, and their task results are persisted, the objects of processing are all task nodes of the corresponding layer of the directed acyclic graph, while in steps S606 to S608 the objects of processing are only the task nodes contained in the dependency paths at the corresponding layer. The rest of the processing is the same, so for steps S707 to S709 reference may be made to the related description of steps S606 to S608 in the previous embodiment, with the processing objects understood as stated here; details are not repeated.
In this embodiment, for the case that no task requiring priority execution is specified, depth-first traversal is combined with level-order traversal to traverse and schedule all task nodes of the directed acyclic graph, so that task scheduling and execution are controlled in parallel and in accordance with the dependency relationships among the tasks, correspondingly improving computing-resource utilization and task computation efficiency.
In an optional implementation manner of the embodiment of the present application, referring to a flowchart of a task scheduling processing method shown in fig. 8, the method may further be implemented by the following processing procedures:
And step S801, determining the dependency relationship among the tasks.
And S802, constructing a directed acyclic graph of the tasks based on the dependency relationship among the tasks.
And S803, randomly selecting at least one target task in the directed acyclic graph.
Step S804, according to the at least one target task, performing a deep traversal on the directed acyclic graph to obtain at least one root task corresponding to the at least one target task.
Steps S801 to S804 are the same as steps S701 to S704 in the above embodiment; specific reference may be made to the related description of steps S701 to S704 above, which is not repeated here.
and step S805, starting from the at least one root task, classifying each task according to the dependency relationship among the tasks in the directed acyclic graph and the task property of each task to obtain each task class.
The task property may be, but is not limited to, a priority and/or an importance level of the task.
The present embodiment provides a solution for traversing and scheduling tasks for the case that the tasks to be executed have different or identical priorities and/or importance levels.
Specifically, starting from the at least one root task, the tasks in the directed acyclic graph can be classified according to the priority and/or importance of each task together with the dependency relationships among the tasks, so that, as far as possible, tasks with the same task property are grouped into one class together with the pre-tasks on their dependency paths.
For example, referring to fig. 5, assume that the priority of task nodes 2, 3, and 7 is the highest, that of task nodes 8 and 4 is the next highest, and that of task nodes 9 and 10 is the lowest, while the remaining nodes have no priority of their own and are grouped with the tasks that depend on them. Then, for this priority situation, combined with the dependency relationships among the tasks in the directed acyclic graph, the task nodes can be divided into the following classes:
Task class 1: task 1, task 2, task 3, task 6, task 7;
Task class 2: task 5, task 8, task 4;
Task class 3: task 9, task 10.
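The classification rule of step S805 can be sketched as below, under the assumed priorities of this example and the same hypothetical edge set used earlier: each prioritized task is grouped with its not-yet-assigned pre-tasks, working from the highest priority level downward.

```python
# Hypothetical priorities for the example (larger = higher priority);
# pre-tasks without a priority of their own are grouped with the tasks
# that depend on them.
priority = {2: 3, 3: 3, 7: 3, 8: 2, 4: 2, 9: 1, 10: 1}
preds = {1: [], 2: [1], 3: [1], 10: [1], 6: [2], 5: [2], 4: [3],
         7: [6], 8: [5], 9: [5]}

def ancestors(task):
    """All pre-tasks on the dependency paths of `task`."""
    out, stack = set(), list(preds[task])
    while stack:
        p = stack.pop()
        if p not in out:
            out.add(p)
            stack.extend(preds[p])
    return out

def classify():
    """Group each prioritized task with its not-yet-assigned pre-tasks,
    from the highest priority level downward."""
    classes, assigned = [], set()
    for level in sorted(set(priority.values()), reverse=True):
        cls = set()
        for t, p in priority.items():
            if p == level and t not in assigned:
                cls |= {t} | (ancestors(t) - assigned)
        assigned |= cls
        classes.append(sorted(cls))
    return classes
```

On this data, `classify()` reproduces task classes 1 to 3 of the example.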
step S806, a task scheduling queue is constructed for each task class, and the tasks in the same task class are allocated to the same corresponding task scheduling queue.
As in the above example, for the 3 task classes, task class 1 to task class 3, 3 task scheduling queues are constructed: task scheduling queue 31 to task scheduling queue 33, wherein task 1, task 2, task 3, task 6, and task 7 are allocated to task scheduling queue 31, task 5, task 8, and task 4 are allocated to task scheduling queue 32, and task 9 and task 10 are allocated to task scheduling queue 33.
Step S807, determining a second scheduling queue to be currently scheduled based on the task property.
After the task scheduling queues are constructed, the present embodiment determines the second scheduling queue that currently needs to be scheduled based on the task property, such as the priority and/or importance of the tasks.
For the above example, in the initial state of task scheduling, it can be determined from the priority status that task nodes 2, 3, and 7 have the highest priority, so task scheduling queue 31 is first used as the second scheduling queue to be scheduled. After each task in this second scheduling queue has been executed (or when the system has idle resources), task scheduling queue 32 may, according to task priority, be taken as the next second scheduling queue, and the process continues in this way until task scheduling queue 33 is finally used as the second scheduling queue to be scheduled.
step S808, if the current task to be scheduled in the second scheduling queue does not have a dependent pre-task, allocating the current task to be scheduled to a corresponding computing unit for execution.
Since the task classes are divided based on both task priorities and the dependencies among tasks, different tasks in the same class, that is, in the same task scheduling queue, may or may not have dependency relationships with one another. In task scheduling queue 31, for example, task 3 has no dependency relationship with tasks 2, 6, and 7, whereas tasks 7, 6, 2, and 1 depend on one another in that order, and task 1 has no dependent pre-task.
For the current task to be scheduled in the second scheduling queue, if the task does not have a dependent pre-task, such as the task 1, the task is directly allocated to the corresponding computing unit for execution.
And step S809, if the current task to be scheduled has a dependent pre-task, allocating the current task to be scheduled to a corresponding computing unit for execution when the pre-task is finished.
If the current task to be scheduled has a dependent pre-task, it can only be started once the pre-task it depends on has completed. The current task to be scheduled is therefore allocated to the corresponding computing unit for execution when its pre-task has finished, and when executing it the computing unit needs to call the task result of the pre-task from the corresponding persistent location and execute the task based on that result.
step S810, if there are other tasks in the second scheduling queue that do not have a dependency relationship with the current task to be scheduled, allocating the current task to be scheduled and the other tasks to different computing units for parallel execution.
When the current task in the second scheduling queue is scheduled and allocated to the corresponding computing unit, if there are other tasks in the second scheduling queue that have no dependency relationship with it, that is, tasks parallel to the current task, those parallel tasks need not queue and wait; they can be allocated directly to other computing units and executed in parallel with the current task.
For example, task 3 in task scheduling queue 31 is such a parallel task: when task 2 is scheduled and allocated to a certain computing unit, task 3 does not need to wait in the queue but is directly allocated to another computing unit, so that the two computing units execute the mutually independent tasks 2 and 3 in parallel.
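Steps S807 to S810 for one class queue can be sketched as follows: in each round, every task whose pre-tasks have all completed is launched in parallel, so independent tasks such as 2 and 3 run together while dependent ones wait. The `compute` function is a stand-in for the real workload:

```python
from concurrent.futures import ThreadPoolExecutor

def run_class_queue(queue, preds, results, compute):
    """Run one class queue: in each round, launch in parallel every task
    whose dependent pre-tasks (if any) have already completed."""
    pending = list(queue)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [t for t in pending
                     if all(p in results for p in preds[t])]
            if not ready:
                raise RuntimeError("unmet dependency outside this queue")
            futures = {t: pool.submit(compute, t,
                                      [results[p] for p in preds[t]])
                       for t in ready}
            for t, f in futures.items():
                results[t] = f.result()
            pending = [t for t in pending if t not in ready]
    return results

# Hypothetical dependencies for task scheduling queue 31 of the example.
preds = {1: [], 2: [1], 3: [1], 6: [2], 7: [6]}
compute = lambda t, pre: sum(pre) + t  # stand-in computation
results = run_class_queue([1, 2, 3, 6, 7], preds, {}, compute)
```

Here tasks 2 and 3 are launched in the same round on different workers, while 6 and 7 each wait for their pre-task.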
Step S811, persisting the execution result of each task in the second scheduling queue so that it can be called by the tasks in the next task scheduling queue.
During scheduling and execution, to make it convenient for a post-task to call the task result of its pre-task and to avoid re-executing a pre-task on which a post-task depends, the present embodiment persists the execution result of each task in the second scheduling queue. At a minimum, the results of tasks that serve as pre-tasks of other tasks must be persisted to corresponding storage locations, with the persistent location information recorded, for example by persisting a result to a hard-disk storage space associated with the computing unit and recording that location. Thus, when a task depends on the results of one or more pre-tasks, those results can be read from the corresponding locations according to the recorded persistent location information, and the current task can be started and executed based on the retrieved result information.
In this embodiment, for the case that the tasks to be executed have different or identical priorities and/or importance levels, the tasks are divided into task classes according to their priorities and/or importance combined with the dependencies among them, and scheduling then proceeds class by class according to the different priorities and/or importance levels. The tasks are thus executed in accordance with both their priority and/or importance and their dependency relationships, effectively meeting the priority requirements of the tasks. At the same time, the embodiment takes the dependencies among tasks into account and, based on the parallel-and-dependency computation framework provided by the directed acyclic graph, controls task scheduling and execution in parallel and in accordance with those dependencies, thereby improving computing-resource utilization and task computation efficiency. In addition, the embodiment can greatly reduce the search space for selecting computing units during scheduling, and therefore greatly reduce the running time of the whole scheduling process.
in an optional implementation manner of the embodiment of the present application, referring to a flowchart of a scheduling processing method of a task in the present application shown in fig. 9, the method may further include the following processing procedures:
Step S901, based on a greedy policy, allocating the currently executable tasks in the directed acyclic graph to corresponding computing units for execution, until no computing unit is available or no currently executable task remains in the directed acyclic graph.
A currently executable task is a task whose dependent pre-tasks have finished executing and which therefore meets the basic conditions for being started.
To further improve computing-resource utilization and task execution efficiency, a greedy policy may be adopted in a specific implementation so that, whenever computing resources are idle, as many tasks as possible are executed concurrently. That is, when computing resources are idle, the currently executable tasks in the directed acyclic graph are allocated to the corresponding computing units for execution until no computing unit is available or no currently executable task remains in the directed acyclic graph.
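The greedy step can be sketched as below, assuming the same hypothetical predecessor map as before; each call selects currently executable tasks up to the number of free computing units:

```python
# Hypothetical predecessor map for fig. 5 (same assumption as earlier).
preds = {1: [], 2: [1], 3: [1], 10: [1], 6: [2], 5: [2], 4: [3],
         7: [6], 8: [5], 9: [5]}

def greedy_dispatch(done, free_units):
    """One greedy step: select currently executable tasks (all dependent
    pre-tasks finished, task itself not yet done) until the available
    computing units are used up or nothing else is executable."""
    runnable = [t for t in preds
                if t not in done and all(p in done for p in preds[t])]
    return runnable[:free_units]
```

For instance, once task 1 is done, tasks 2, 3, and 10 all become executable; with two free units, two of them are dispatched at once instead of leaving a unit idle.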
The following provides a specific application example of the method, which is specifically directed to a scheduling and execution processing scenario of a feature index calculation task in feature analysis of a recommendation model.
As shown in fig. 10, in this example, the scheduling processing of the tasks can be completed by the following procedure:
1) Configuring the feature analysis calculation tasks to be executed;
For example, for the recommendation model of a daily news product, more than ten feature index calculation tasks can be configured.
2) Constructing a directed acyclic graph of the computing tasks according to the dependency relationship among the computing tasks;
3) Generating a task scheduling queue based on scheduling requirements;
For the scheduling requirement in which target tasks to be executed preferentially are specified, the requirement in which no priority target task is specified and no priority/importance relation exists among the tasks, or the requirement in which the tasks to be executed have different or identical priorities and/or importance levels, corresponding task scheduling queues can be constructed for the calculation tasks in the directed acyclic graph according to the processing procedures of the corresponding embodiments described above.
4) distributing the computing tasks to corresponding computing units to be executed based on the task scheduling queue;
5) Persisting the task result of each depended-on computing task and updating the task state, so that the task result of a depended-on pre-task can be reused by its post-tasks.
On the other hand, the application also provides a task scheduling processing device.
referring to fig. 11, a schematic structural diagram of a task scheduling apparatus in the present application is shown, where the apparatus is specifically applicable to a computer device such as a PC or a server, and the constituent structure of the computer device to which the apparatus is applicable may refer to the related description above, and is not described herein again.
As shown in fig. 11, the task scheduling processing apparatus according to the embodiment of the present application may include:
A determining unit 1101 configured to determine a dependency relationship between the tasks;
A first building unit 1102, configured to build a directed acyclic graph of tasks based on a dependency relationship between the tasks;
A second constructing unit 1103, configured to construct a task scheduling queue by performing deep traversal on the directed acyclic graph;
A control unit 1104, configured to control scheduling and execution of each task based on the task scheduling queue and a dependency relationship between each task; different tasks with dependency relations are executed in series according to the dependency relations, and at least part of different tasks without dependency relations are executed in parallel.
in an optional implementation manner of the embodiment of the present application, each node of the directed acyclic graph corresponds to each task one to one, and a directed edge between different nodes of the directed acyclic graph indicates a dependency relationship between different tasks.
In an optional implementation manner of the embodiment of the present application, the second constructing unit 1103 is specifically configured to:
According to at least one appointed target task, performing depth traversal on the directed acyclic graph to obtain a dependent path corresponding to each target task; the dependency path of the target task comprises various tasks from the target task to a root task which the target task depends on in the directed acyclic graph;
starting with the root task on each dependent path, and performing layer sequence traversal on the tasks on each dependent path to obtain each task layer;
and constructing a task scheduling queue for each task layer, and distributing the tasks on the same task layer to the corresponding same task scheduling queue.
in an optional implementation manner of the embodiment of the present application, the second constructing unit 1103 is specifically configured to:
Randomly selecting at least one target task in the directed acyclic graph;
According to the at least one target task, performing depth traversal on the directed acyclic graph to obtain at least one root task corresponding to the at least one target task;
Starting with the at least one root task, and performing layer sequence traversal on the directed acyclic graph to obtain each task layer;
and constructing a task scheduling queue for each task layer, and distributing the tasks on the same task layer to the corresponding same task scheduling queue.
In an optional implementation manner of the embodiment of the present application, the control unit 1104 is specifically configured to:
Determining a first scheduling queue needing scheduling currently according to a sequence;
Distributing each task in the first scheduling queue to each computing unit with corresponding quantity to execute in parallel; when executing a task, if the task does not have a dependent pre-task, directly executing the task, if the dependent pre-task exists, acquiring a task result of the dependent pre-task, and executing the task based on the task result of the pre-task;
and carrying out persistence processing on the task results of each task in the first scheduling queue so as to be called by the corresponding post task.
In an optional implementation manner of the embodiment of the present application, the second constructing unit 1103 is specifically configured to:
Randomly selecting at least one target task in the directed acyclic graph;
According to the at least one target task, performing depth traversal on the directed acyclic graph to obtain at least one root task corresponding to the at least one target task;
Classifying each task by taking the at least one root task as a start according to the dependency relationship among each task in the directed acyclic graph and the task property of each task to obtain each task class;
And constructing a task scheduling queue for each task class, and distributing the tasks in the same task class to the corresponding same task scheduling queue.
In an optional implementation manner of the embodiment of the present application, the control unit 1104 is specifically configured to:
Determining a second scheduling queue needing to be scheduled currently based on task properties;
If the current task to be scheduled in the second scheduling queue does not have a dependent pre-task, distributing the current task to be scheduled to a corresponding computing unit for execution;
If the current task to be scheduled has a dependent pre-task, the current task to be scheduled is allocated to a corresponding computing unit for execution when the pre-task is finished;
if other tasks which do not have dependency relationship with the current task to be scheduled exist in the second scheduling queue, distributing the current task to be scheduled and the other tasks to different computing units for parallel execution;
And performing persistence processing on the task result of each task in the second scheduling queue so as to be called by the corresponding post task.
in an optional implementation manner of the embodiment of the present application, the control unit 1104 may further be configured to: and distributing the tasks which can be executed currently in the directed acyclic graph to corresponding computing units for execution based on a greedy strategy until no available computing units are available or no tasks which can be executed currently in the directed acyclic graph are available.
in still another aspect, the present application further provides a storage medium, where a computer program is stored, and when the computer program is loaded and executed by a processor, the method for scheduling and processing tasks as described in any one of the above embodiments is implemented.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
for convenience of description, the above system or apparatus is described as being divided into various modules or units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it should be further noted that, herein, relational terms such as first, second, third, and fourth may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications should also fall within the protection scope of the present application.

Claims (10)

1. A task scheduling processing method, characterized by comprising the following steps:
determining dependency relationships among tasks;
constructing a directed acyclic graph of the tasks based on the dependency relationships among the tasks;
constructing a task scheduling queue by performing a depth-first traversal of the directed acyclic graph; and
controlling the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks, wherein different tasks having dependency relationships are executed serially according to those dependency relationships, and at least some of the different tasks having no dependency relationship are executed in parallel.
2. The method according to claim 1, wherein the nodes of the directed acyclic graph correspond one-to-one with the tasks, and directed edges between different nodes of the directed acyclic graph indicate dependency relationships between different tasks.
3. The method according to claim 2, wherein constructing a task scheduling queue by performing a depth-first traversal of the directed acyclic graph comprises:
performing a depth-first traversal of the directed acyclic graph according to at least one specified target task, to obtain a dependency path corresponding to each target task, wherein the dependency path of a target task comprises each task from the target task to the root task on which the target task depends in the directed acyclic graph;
performing a level-order traversal of the tasks on each dependency path, starting from the root task on that path, to obtain task layers; and
constructing a task scheduling queue for each task layer, and assigning the tasks on the same task layer to the same corresponding task scheduling queue.
4. The method according to claim 2, wherein constructing a task scheduling queue by performing a depth-first traversal of the directed acyclic graph comprises:
randomly selecting at least one target task in the directed acyclic graph;
performing a depth-first traversal of the directed acyclic graph according to the at least one target task, to obtain at least one root task corresponding to the at least one target task;
performing a level-order traversal of the directed acyclic graph, starting from the at least one root task, to obtain task layers; and
constructing a task scheduling queue for each task layer, and assigning the tasks on the same task layer to the same corresponding task scheduling queue.
5. The method according to claim 3 or 4, wherein controlling the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks comprises:
determining, in sequence, a first scheduling queue that currently needs to be scheduled;
distributing the tasks in the first scheduling queue to a corresponding number of computing units for parallel execution, wherein, when a task is executed, the task is executed directly if it has no predecessor task on which it depends; if such a predecessor task exists, the task result of the predecessor task is acquired and the task is executed based on that task result; and
persisting the task result of each task in the first scheduling queue so that it can be called by the corresponding subsequent task.
6. The method according to claim 2, wherein constructing a task scheduling queue by performing a depth-first traversal of the directed acyclic graph comprises:
randomly selecting at least one target task in the directed acyclic graph;
performing a depth-first traversal of the directed acyclic graph according to the at least one target task, to obtain at least one root task corresponding to the at least one target task;
classifying the tasks, starting from the at least one root task, according to the dependency relationships among the tasks in the directed acyclic graph and the task property of each task, to obtain task classes; and
constructing a task scheduling queue for each task class, and assigning the tasks in the same task class to the same corresponding task scheduling queue.
7. The method according to claim 6, wherein controlling the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks comprises:
determining, based on task properties, a second scheduling queue that currently needs to be scheduled;
if the task currently to be scheduled in the second scheduling queue has no predecessor task on which it depends, distributing it to a corresponding computing unit for execution;
if the task currently to be scheduled has a predecessor task on which it depends, distributing it to a corresponding computing unit for execution once the predecessor task has finished;
if the second scheduling queue contains other tasks having no dependency relationship with the task currently to be scheduled, distributing that task and the other tasks to different computing units for parallel execution; and
persisting the task result of each task in the second scheduling queue so that it can be called by the corresponding subsequent task.
8. The method according to any one of claims 1-7, wherein, when controlling the scheduling and execution of the tasks, the method further comprises:
distributing the currently executable tasks in the directed acyclic graph to corresponding computing units for execution based on a greedy strategy, until no computing unit is available or no currently executable task remains in the directed acyclic graph.
9. A task scheduling processing apparatus, characterized by comprising:
a determining unit, configured to determine dependency relationships among tasks;
a first construction unit, configured to construct a directed acyclic graph of the tasks based on the dependency relationships among the tasks;
a second construction unit, configured to construct a task scheduling queue by performing a depth-first traversal of the directed acyclic graph; and
a control unit, configured to control the scheduling and execution of each task based on the task scheduling queue and the dependency relationships among the tasks, wherein different tasks having dependency relationships are executed serially according to those dependency relationships, and at least some of the different tasks having no dependency relationship are executed in parallel.
10. A computer device, comprising:
a memory for storing computer-executable instructions; and
a processor for loading and executing the computer-executable instructions, which, when loaded and executed, implement the method according to any one of claims 1 to 8.
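Reading claims 1 through 5 together, the claimed flow can be sketched roughly as follows. This is an illustrative reconstruction only, not the patented implementation: the dictionary-based dependency encoding, the in-process thread pool standing in for the "computing units", and the in-memory `results` dictionary standing in for persistence are all assumptions introduced for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def build_layers(dependencies):
    """Group tasks into task layers via a level-order traversal of the
    dependency DAG, as in claims 3 and 4.

    `dependencies` maps a task to the predecessor tasks it depends on.
    Layer 0 contains the root tasks (no predecessors); tasks within one
    layer have no mutual dependencies and may execute in parallel.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    tasks = set(dependencies)
    for task, preds in dependencies.items():
        for pred in preds:
            tasks.add(pred)
            successors[pred].append(task)
            indegree[task] += 1
    layer = sorted(t for t in tasks if indegree[t] == 0)  # root tasks
    layers = []
    while layer:
        layers.append(layer)
        nxt = []
        for t in layer:
            for succ in successors[t]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    nxt.append(succ)
        layer = sorted(nxt)
    return layers

def schedule(dependencies, run):
    """Execute the task layers in order: dependent tasks are thereby
    serialized across layers, while independent tasks within the same
    layer run in parallel. Each task result is kept so that subsequent
    tasks can call it, standing in for the persistence of claim 5.
    """
    results = {}
    for layer in build_layers(dependencies):
        with ThreadPoolExecutor() as pool:
            futures = {
                t: pool.submit(run, t, {p: results[p] for p in dependencies.get(t, [])})
                for t in layer
            }
        for t, fut in futures.items():
            results[t] = fut.result()
    return results
```

For a diamond-shaped graph where tasks "b" and "c" both depend on "a" and task "d" depends on both, the layers come out as `[["a"], ["b", "c"], ["d"]]`, so "b" and "c" run in parallel while "a" and "d" bracket them serially.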
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910844301.5A CN110554909A (en) 2019-09-06 2019-09-06 task scheduling processing method and device and computer equipment


Publications (1)

Publication Number Publication Date
CN110554909A true CN110554909A (en) 2019-12-10

Family

ID=68739519



Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258740A (en) * 2020-02-03 2020-06-09 北京无限光场科技有限公司 Method and device for starting application program and electronic equipment
CN111367642A (en) * 2020-03-09 2020-07-03 中国铁塔股份有限公司 Task scheduling execution method and device
CN111400017A (en) * 2020-03-26 2020-07-10 华泰证券股份有限公司 Distributed complex task scheduling method
CN111597028A (en) * 2020-05-19 2020-08-28 北京百度网讯科技有限公司 Method and device for task scheduling
CN111694647A (en) * 2020-06-08 2020-09-22 北京百度网讯科技有限公司 Task scheduling method, device and storage medium for automatic driving vehicle
CN111880913A (en) * 2020-07-06 2020-11-03 北京三快在线科技有限公司 Task optimization method and device
CN112000682A (en) * 2020-08-25 2020-11-27 北京达佳互联信息技术有限公司 Data synchronization task scheduling method, device, server and storage medium
CN112085471A (en) * 2020-09-10 2020-12-15 北京百度网讯科技有限公司 Task distribution method and device, electronic equipment and storage medium
CN112101565A (en) * 2020-09-08 2020-12-18 支付宝(杭州)信息技术有限公司 Model iteration realization method and device based on acceleration chip
CN112114956A (en) * 2020-09-29 2020-12-22 中国银行股份有限公司 Task scheduling method, device and system
CN112328382A (en) * 2020-11-18 2021-02-05 中国平安财产保险股份有限公司 Task scheduling method, device, server and storage medium
CN112434061A (en) * 2020-08-25 2021-03-02 上海幻电信息科技有限公司 Task scheduling method and system supporting circular dependence
CN112506991A (en) * 2020-12-03 2021-03-16 杭州小电科技股份有限公司 Method, system, electronic device and storage medium for parallel processing
CN112507171A (en) * 2020-12-03 2021-03-16 深圳市易平方网络科技有限公司 Task scheduling method, intelligent terminal and storage medium
CN112506636A (en) * 2020-12-16 2021-03-16 北京中天孔明科技股份有限公司 Distributed task scheduling method and device based on directed acyclic graph and storage medium
CN112597442A (en) * 2020-12-30 2021-04-02 南方电网数字电网研究院有限公司 Distributed-based electric power settlement calculation method and system
CN112612615A (en) * 2020-12-28 2021-04-06 中孚安全技术有限公司 Data processing method and system based on multithreading memory allocation and context scheduling
CN112732979A (en) * 2020-12-29 2021-04-30 五八有限公司 Information writing method, information writing device, electronic equipment and computer readable medium
CN112783568A (en) * 2021-01-12 2021-05-11 网易(杭州)网络有限公司 Initialization method, device, equipment and storage medium of application program
CN112988362A (en) * 2021-05-14 2021-06-18 南京蓝洋智能科技有限公司 Task processing method and device, electronic equipment and storage medium
CN113220542A (en) * 2021-04-01 2021-08-06 深圳市云网万店科技有限公司 Early warning method and device for computing task, computer equipment and storage medium
CN113282382A (en) * 2020-02-19 2021-08-20 中科寒武纪科技股份有限公司 Task processing method and device, computer equipment and storage medium
CN113282402A (en) * 2021-07-22 2021-08-20 航天中认软件测评科技(北京)有限责任公司 Test task scheduling method oriented to complex resource constraint
CN113342500A (en) * 2021-06-29 2021-09-03 北京三快在线科技有限公司 Task execution method, device, equipment and storage medium
CN113534750A (en) * 2020-04-15 2021-10-22 北京旷视机器人技术有限公司 Job scheduling method, device, system, equipment and medium under intensive storage
CN113688916A (en) * 2021-08-30 2021-11-23 北京三快在线科技有限公司 Feature data processing method and device
CN113742036A (en) * 2020-05-28 2021-12-03 阿里巴巴集团控股有限公司 Index processing method and device and electronic equipment
CN113760488A (en) * 2020-08-28 2021-12-07 北京沃东天骏信息技术有限公司 Method, device, equipment and computer readable medium for scheduling task
CN113886053A (en) * 2021-12-01 2022-01-04 华控清交信息科技(北京)有限公司 Task scheduling method and device for task scheduling
CN114168314A (en) * 2021-10-27 2022-03-11 厦门国际银行股份有限公司 Multithreading concurrent data index batch processing method and device and storage medium
CN114611995A (en) * 2022-03-30 2022-06-10 精英数智科技股份有限公司 Process scheduling method and system
CN114625507A (en) * 2022-03-14 2022-06-14 广州经传多赢投资咨询有限公司 Task scheduling method, system, equipment and storage medium based on directed acyclic graph
CN114697398A (en) * 2022-03-23 2022-07-01 北京百度网讯科技有限公司 Data processing method and device, electronic equipment, storage medium and product
CN114860412A (en) * 2022-05-19 2022-08-05 北京百度网讯科技有限公司 Task processing method and device, electronic equipment and medium
CN116225669A (en) * 2023-05-08 2023-06-06 之江实验室 Task execution method and device, storage medium and electronic equipment
WO2023142905A1 (en) * 2022-01-25 2023-08-03 惠州Tcl移动通信有限公司 Task scheduling method and apparatus, and terminal device and storage medium
WO2023165622A1 (en) * 2022-03-04 2023-09-07 上海联影医疗科技股份有限公司 Image recontruction task scheduling method and device
CN116841739A (en) * 2023-06-30 2023-10-03 沐曦集成电路(杭州)有限公司 Data packet reuse system for heterogeneous computing platforms
CN117076095A (en) * 2023-10-16 2023-11-17 华芯巨数(杭州)微电子有限公司 Task scheduling method, system, electronic equipment and storage medium based on DAG
WO2024125341A1 (en) * 2022-12-15 2024-06-20 华为技术有限公司 Task scheduling method, apparatus and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140101673A1 (en) * 2012-10-05 2014-04-10 Microsoft Corporation Dynamic dependency evaluation for computing task execution
CN103984595A (en) * 2014-05-16 2014-08-13 哈尔滨工程大学 Isomerous CMP (Chip Multi-Processor) static state task scheduling method
CN109814986A (en) * 2017-11-20 2019-05-28 上海寒武纪信息科技有限公司 Task method for parallel processing, storage medium, computer equipment, device and system
CN109918182A (en) * 2019-01-23 2019-06-21 中国人民解放军战略支援部队信息工程大学 More GPU task dispatching methods under virtualization technology
CN110096345A (en) * 2019-03-16 2019-08-06 平安科技(深圳)有限公司 Intelligent task dispatching method, device, equipment and storage medium




Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40019343)
RJ01: Rejection of invention patent application after publication (application publication date: 20191210)