CN113608852A - Task scheduling method, scheduling module, inference node and collaborative operation system

Info

Publication number
CN113608852A
Authority
CN
China
Prior art keywords: inference, nodes, node, priority, task
Prior art date
Legal status
Pending
Application number
CN202110888396.8A
Other languages
Chinese (zh)
Inventor
张海俊
朱亚平
姚文军
李华清
Current Assignee
University of Science and Technology of China USTC
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN202110888396.8A
Publication of CN113608852A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Abstract

The invention provides a task scheduling method, a scheduling module, an inference node and a collaborative operation system. The method is applied to the scheduling module and comprises the following steps: acquiring the to-be-processed information of each inference node, where the inference nodes operate cooperatively and the to-be-processed information comprises the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task; selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount; and sending a task processing instruction to the target nodes to trigger them to perform task processing. The method meets both the overall throughput requirement and the task response time requirement, solves the problem that a scheduling and inference scheme for a single neural network cannot schedule inference tasks in which multiple neural networks work cooperatively, and realizes scheduling and inference in complex scenarios.

Description

Task scheduling method, scheduling module, inference node and collaborative operation system
Technical Field
The invention relates to the technical field of computers, in particular to a task scheduling method, a scheduling module, an inference node and a collaborative operation system.
Background
With the broad development of cloud multi-path neural network application scenarios, such applications have extended from initial offline recognition to real-time online processing, and from single-network applications to multi-network cooperative work. In this process, not only the throughput requirement but also the task response time requirement must be met.
At present, for the application scenario of a single neural network, device utilization is generally improved, and the overall task throughput requirement is met, by increasing the amount of task data spliced per batch. However, this approach limits the complexity of the tasks, is effective only for tasks of a single neural network, and cannot schedule inference for tasks in which multiple neural networks work cooperatively.
Disclosure of Invention
The invention provides a task scheduling method, a scheduling module, an inference node and a collaborative operation system, which are used for overcoming the defect in the prior art that tasks in which multiple neural networks work cooperatively cannot be scheduled for inference.
The invention provides a task scheduling method, which is applied to a scheduling module and comprises the following steps:
acquiring to-be-processed information of each inference node, wherein the to-be-processed information comprises the number of to-be-processed tasks and/or the task type of each to-be-processed task under the corresponding inference node;
selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, wherein the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount;
and sending a task processing instruction to the target node to trigger the target node to perform task processing.
According to the task scheduling method provided by the invention, the step of selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation comprises:
determining the number of priority tasks under each inference node based on the to-be-processed information of each inference node, wherein a priority task is a to-be-processed task whose task type is the priority type;
and selecting target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation.
According to the task scheduling method provided by the invention, the step of selecting target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation comprises:
if priority inference nodes exist, determining the inference priority of each priority inference node based on the number of to-be-processed tasks and the collaborative operation weight of each priority inference node;
and determining target nodes based on the inference priority of each priority inference node;
wherein a priority inference node is an inference node whose number of priority tasks is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation.
According to the task scheduling method provided by the invention, the step of determining the target node based on the inference priority of each priority inference node comprises the following steps:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priorities as target nodes;
otherwise, taking all priority inference nodes as target nodes, and additionally selecting the second preset number of non-priority inference nodes with the highest inference priorities as target nodes;
wherein the first preset number is a target node number threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose number of priority tasks is 0.
According to the task scheduling method provided by the invention, the step of determining the target node based on the inference priority of each priority inference node comprises the following steps:
determining the priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next priority inference node to be determined as a target node and by all current target nodes would be greater than the rated computation amount;
if the sum of the computing resources required by all priority inference nodes for task processing is less than the rated computation amount, further determining the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next non-priority inference node to be determined as a target node and by all current target nodes would be greater than the rated computation amount;
wherein a non-priority inference node is an inference node whose number of priority tasks is 0.
The invention also provides a task scheduling method, which is applied to an inference node and comprises:
sending the to-be-processed information of the local end to a scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes;
and if the task processing instruction is received, performing task processing;
wherein the local end operates cooperatively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local end and/or the task type of each to-be-processed task.
The present invention also provides a scheduling module, comprising:
the information acquisition unit is used for acquiring the to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information comprises the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task;
the target selection unit is used for selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, wherein the sum of the computing resources required by all target nodes for task processing is less than or equal to the rated computation amount;
and the instruction sending unit is used for sending a task processing instruction to the target node so as to trigger the target node to perform task processing.
The present invention also provides an inference node, comprising:
the sending unit is used for sending the to-be-processed information of the local end to the scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes; wherein the local end operates cooperatively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local end and/or the task type of each to-be-processed task;
and the task processing unit is used for processing the task if the task processing instruction is received.
The invention also provides a collaborative operation system, which comprises the above scheduling module and a plurality of the above inference nodes.
The present invention also provides an electronic device, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the task scheduling method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the task scheduling method as described in any of the above.
According to the task scheduling method, the scheduling module, the inference node and the collaborative operation system provided by the invention, when target nodes are selected from the inference nodes, both the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation are considered, so that the selected target nodes can meet both the overall throughput requirement and the task response time requirement when processing tasks. This solves the problem that a scheduling and inference scheme for a single neural network cannot schedule inference tasks in which multiple neural networks work cooperatively, and realizes scheduling and inference in complex scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a task scheduling method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating step 120 of a task scheduling method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating step 122 of a task scheduling method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating step 122-2 of the task scheduling method according to an embodiment of the present invention;
FIG. 5 is a second flowchart illustrating a task scheduling method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a task scheduling method applied to a multi-task scenario according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a scheduling module provided in the present invention;
FIG. 8 is a schematic structural diagram of an inference node provided by the present invention;
FIG. 9 is a schematic diagram of a cooperative work system provided by the present invention;
fig. 10 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, in the application scenario of a cloud multi-path neural network, scheduling inference for a single neural network generally proceeds as follows: tasks from multiple paths are analyzed outside the neural network, data splicing is performed according to a certain rule, and the spliced data is input to a neural network inference engine for inference calculation to obtain the result output by the neural network.
However, the above scheme limits the complexity of the tasks and is effective only for tasks of a single neural network. When a task involves multiple neural networks and the inference task must be completed through their cooperation, this scheme can hardly meet the requirements.
In view of the above situation, in order to solve the problem that a scheduling and inference scheme for a single neural network cannot schedule inference tasks in which multiple neural networks work cooperatively, the present invention provides a task scheduling method. Fig. 1 is one of the flow diagrams of the task scheduling method provided in the embodiment of the present invention. As shown in fig. 1, the method is applied to a scheduling module and includes:
Step 110, acquiring the to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information comprises the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task.
Specifically, a collaborative operation scenario may include multiple inference nodes. Here, an inference node is an inference engine node that executes either an independent task or part of a cooperative task in the collaborative operation scenario; an inference node may contain a single neural network or several sequentially executed neural networks.
For example, in a human-computer interaction scenario, collaborative operation can be performed through 3 inference nodes, where inference node A is a speech recognition node used for transcribing speech input by the user, inference node B is a character recognition node used for recognizing characters contained in an image input by the user to form a text, and inference node C is a question recommendation node used for recommending questions according to the text obtained by speech transcription and/or the text obtained by character recognition of the image. Inference node A may include two sequentially executed neural networks: during execution, the speech is first denoised by neural network A1, and the denoised speech is then transcribed by neural network A2. Inference nodes B and C may each be implemented by a single neural network.
In a collaborative operation scenario with multiple inference nodes, only execution-logic connections exist between the inference nodes; for example, in the human-computer interaction scenario above, the outputs of inference node A and inference node B are the inputs of inference node C. The inference nodes do not, however, transmit their pending states to one another. Therefore, in the collaborative operation scenario, the embodiment of the invention additionally deploys a scheduling module that can communicate with every inference node, thereby realizing task scheduling for the collaborative operation according to the to-be-processed information of each inference node.
Before scheduling the inference nodes to perform task processing, the scheduling module needs to acquire the to-be-processed information of each inference node. The to-be-processed information represents the information about the tasks that the inference node needs to process; it may include, for example, the number of to-be-processed tasks, the task type of each to-be-processed task, or both, which is not specifically limited in the embodiment of the present invention.
Regarding the number of to-be-processed tasks: when an inference node is scheduled for a single round of task processing, the more to-be-processed tasks it holds, the higher the utilization of the computing device executing that node and the less computing resource is wasted; inference nodes with more to-be-processed tasks can therefore be scheduled preferentially.
Regarding the task type of a to-be-processed task: task types may be divided according to the importance of the task, for example, tasks of higher importance need the corresponding inference node scheduled preferentially for processing; they may be divided according to the task's demand on response speed, for example, tasks with stricter response-speed requirements need the corresponding inference node scheduled preferentially; or they may be divided according to a user-defined processing mode, for example, if a user predefines that certain types of tasks must be processed first, the inference nodes corresponding to those tasks need to be scheduled preferentially.
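For illustration only, the to-be-processed information a node reports to the scheduling module might be modeled as in the following minimal sketch; the class and field names are assumptions for this example, not identifiers defined by the patent:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskType(Enum):
    PRIORITY = "priority"          # e.g. a task whose input push has finished
    NON_PRIORITY = "non_priority"  # a task that can tolerate more latency


@dataclass
class PendingInfo:
    """To-be-processed information one inference node reports to the scheduler."""
    node_id: str
    task_types: List[TaskType] = field(default_factory=list)

    @property
    def pending_count(self) -> int:
        return len(self.task_types)

    @property
    def priority_count(self) -> int:
        return sum(1 for t in self.task_types if t is TaskType.PRIORITY)
```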
Step 120, selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, wherein the sum of the computing resources required by all target nodes for task processing is less than or equal to the rated computation amount.
Specifically, in a collaborative operation scenario, each inference node takes on part of the cooperative processing and has its own importance in the collaborative operation. Because different inference nodes differ in their influence on the response time of the collaborative operation, in their acceleration efficiency as the batch dimension grows, and in how heavily they occupy the computing device and the I/O devices, the importance of each inference node in the collaborative operation differs.
After the scheduling module acquires the to-be-processed information of each inference node, it can select target nodes from the inference nodes according to that information and the importance of each inference node in the collaborative operation. Here, a target node is an inference node that needs to be scheduled preferentially. For example, the scheduling priority of each inference node can be determined by combining its to-be-processed information with its importance in the collaborative operation, and target nodes are then selected according to that priority. As another example, target nodes may be selected according to the to-be-processed information alone; in this process, inference nodes with similar to-be-processed information can be ranked by their importance in the collaborative operation to decide the target nodes.
This selection mode, which combines the to-be-processed information of each inference node with its importance in the collaborative operation, ensures that the selected target nodes take both dimensions into account, so that when the scheduling module schedules the selected target nodes for task processing it can meet the throughput requirement while guaranteeing the task response time requirement.
It should be noted that, because the computing resources of the device on which the inference nodes run are limited, when selecting target nodes the sum of the computing resources required by all selected target nodes during task processing must be less than or equal to the rated computation amount, so as to avoid the preemption overhead caused by too many inference nodes computing at the same time.
Step 130, sending a task processing instruction to the target nodes to trigger them to perform task processing.
Specifically, after target nodes are selected from the inference nodes, the scheduling module may send a task processing instruction to each selected target node. The task processing instruction is the instruction by which the scheduling module controls a target node to perform task processing; after receiving it, the target node performs task processing, which includes data splicing and inference calculation on the to-be-processed tasks.
It should be noted that inference nodes that are not performing task processing may continue to receive to-be-processed tasks, increasing their pending count and thus their likelihood of being selected as target nodes. This avoids the problem of a too-small batch size caused by invoking a particular inference node too frequently, while also preventing a node from going unscheduled for a long time, thereby balancing throughput and response.
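Steps 110 to 130 together form one scheduling round, which might be sketched as follows. This is illustrative only: the node methods and the injected selection policy are assumptions, and a concrete selection policy is sketched under step 122-2 further below.

```python
def scheduling_round(nodes, select_targets, rated_amount):
    """One scheduling round over cooperating inference nodes.

    `nodes` are assumed objects exposing report_pending_info() and
    send_process_instruction(); `select_targets` is the selection policy.
    """
    # Step 110: acquire each inference node's to-be-processed information.
    infos = {node: node.report_pending_info() for node in nodes}

    # Step 120: select target nodes so that the computing resources they
    # need for task processing stay within the rated computation amount.
    targets = select_targets(infos, rated_amount)

    # Step 130: trigger task processing on the targets. Unselected nodes
    # keep accumulating tasks, which raises their batch size and their
    # chance of being selected in a later round.
    for target in targets:
        target.send_process_instruction()
```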
According to the task scheduling method provided by the embodiment of the invention, when target nodes are selected from the inference nodes, both the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation are considered, so that the selected target nodes can meet both the overall throughput requirement and the task response time requirement when processing tasks. This solves the problem that a scheduling and inference scheme for a single neural network cannot schedule inference tasks in which multiple neural networks work cooperatively, and realizes scheduling and inference in complex scenarios.
Based on the foregoing embodiment, fig. 2 is a flowchart of step 120 in the task scheduling method provided by the embodiment of the present invention. As shown in fig. 2, step 120 includes:
Step 121, determining the number of priority tasks under each inference node based on the to-be-processed information of each inference node, wherein a priority task is a to-be-processed task whose task type is the priority type;
Step 122, selecting target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation.
Specifically, the task types of the to-be-processed tasks under each inference node are the priority type and the non-priority type. A task whose type is the priority type is a priority task, i.e., a task whose pushing to the inference node has finished; for example, a priority-type task may be one whose user input has been entered and completed. Considering that in actual task processing users care about the response time of each inference node to priority tasks, priority tasks should rank above non-priority tasks during task processing. Therefore, in step 121, after the to-be-processed information of each inference node is obtained, the task type of each to-be-processed task under each inference node must be further determined, yielding the number of tasks whose task type is the priority type, that is, the number of priority tasks.
Further, after the number of priority tasks under each inference node is determined, target nodes can be selected based on those numbers and the importance of each inference node in the collaborative operation, so that the scheduling module can schedule the selected target nodes for task processing. For example, the scheduling priority of each inference node can be determined by combining its number of priority tasks with its importance in the collaborative operation, and target nodes are then selected according to that priority. As another example, target nodes may be selected according to the number of priority tasks alone; in this process, inference nodes with the same number of priority tasks can be ranked by their importance in the collaborative operation to decide the target nodes.
Based on the foregoing embodiment, fig. 3 is a flowchart of step 122 in the task scheduling method provided by the embodiment of the present invention. As shown in fig. 3, step 122 includes:
Step 122-1, if priority inference nodes exist, determining the inference priority of each priority inference node based on the number of to-be-processed tasks and the collaborative operation weight of each priority inference node; wherein a priority inference node is an inference node whose number of priority tasks is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation;
Step 122-2, determining target nodes based on the inference priority of each priority inference node.
Specifically, after the number of priority tasks under each inference node is determined in step 121, the inference nodes may be classified: an inference node whose number of priority tasks is greater than 0 is a priority inference node, and one whose number of priority tasks equals 0 is a non-priority inference node. A priority inference node is one the scheduling module needs to schedule preferentially for task processing.
After the node type of each inference node is determined, step 122-1 checks whether any priority inference node exists, that is, whether any inference node has more than 0 priority tasks. If so, the inference priority of each priority inference node must be determined; the inference priority represents the order in which the scheduling module schedules the priority inference nodes for task processing.
The inference priority of each priority inference node may be determined from its number of to-be-processed tasks and its collaborative operation weight: for example, from the product of the two; or the number of to-be-processed tasks and the collaborative operation weight may each be weighted, with the inference priority determined from the weighted result.
The collaborative operation weight of each inference node can be determined according to its importance in the collaborative operation: the more important the inference node is in the collaborative operation, the larger its collaborative operation weight, and conversely the smaller.
In step 122-2, after the inference priority of each priority inference node is determined, target nodes can be selected accordingly. For example, target nodes can be chosen in descending order of inference priority, with the sum of the computing resources the selected target nodes require for task processing not exceeding the rated computation amount.
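Taking the product form named above (one of the options the text allows), the inference priorities of the priority inference nodes could be computed and ranked as in this sketch; it reuses the PendingInfo class from the earlier sketch, and the weight table is an assumed input:

```python
from typing import Dict, List, Tuple


def rank_priority_nodes(
    infos: List["PendingInfo"],       # as in the earlier sketch
    coop_weights: Dict[str, float],   # collaborative operation weights
) -> List[Tuple[str, float]]:
    """Rank priority inference nodes (priority_count > 0) by inference
    priority = number of to-be-processed tasks * collaborative weight."""
    ranked = [
        (info.node_id, info.pending_count * coop_weights[info.node_id])
        for info in infos
        if info.priority_count > 0
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)  # highest first
    return ranked
```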
The task scheduling method provided by the embodiment of the invention selects target nodes according to whether priority inference nodes exist, thereby ensuring that inference nodes holding priority tasks are scheduled first and greatly shortening the device's response time to priority tasks.
Based on the above embodiment, step 122-2 includes:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priorities as target nodes;
otherwise, taking all priority inference nodes as target nodes, and additionally selecting the second preset number of non-priority inference nodes with the highest inference priorities as target nodes;
wherein the first preset number is a target node number threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose number of priority tasks is 0.
Specifically, since the sum of the computing resources required for task processing by the selected target nodes cannot exceed the rated computation amount, a target node number threshold may be preset to control how many inference nodes operate in the device at the same time.
After it is determined that priority inference nodes exist, their number can be counted. If the number of priority inference nodes is greater than or equal to the target node number threshold, the sum of the computing resources required by all priority inference nodes for task processing would exceed the rated computation amount. The target node number threshold is the first preset number; it can be determined from the computing resources each inference node needs for task processing and the rated computation amount, and it limits how many inference nodes may run on the device simultaneously.
If the number of priority inference nodes is greater than or equal to the first preset number, the selection range of target nodes is limited to the priority inference nodes: the first preset number of priority inference nodes are selected as target nodes in descending order of inference priority, so that the sum of the computing resources occupied by all selected target nodes processing tasks simultaneously does not exceed the rated computation amount and the device operates normally.
Correspondingly, if the number of priority inference nodes is smaller than the first preset number, the sum of the computing resources required by all priority inference nodes for task processing does not exceed the rated computation amount. All priority inference nodes are then taken as target nodes, and a second preset number of non-priority inference nodes are additionally selected as target nodes, so the selected target nodes include both priority and non-priority inference nodes. It should be noted that the sum of the second preset number and the number of priority inference nodes equals the first preset number.
The second preset number of non-priority inference nodes are selected as target nodes by taking the non-priority inference nodes in descending order of their inference priority.
For example, suppose the scheduling module senses that tasks of the priority type exist at the first, second and third inference nodes but not at the fourth and fifth inference nodes; that is, the first, second and third inference nodes are priority inference nodes, and the fourth and fifth are non-priority inference nodes. The first inference node has 20 to-be-processed tasks, the second has 15, the third has 10, the fourth has 20, and the fifth has 10.
Compared with the other inference nodes, the second inference node's influence on the task response time of the collaborative operation is more critical, so its collaborative operation weight is 1.5, while the collaborative operation weight of the first and third inference nodes is 1 and that of the fourth and fifth inference nodes is 0.8. The inference priority parameters of the five inference nodes at the current moment are therefore 20, 22.5, 10, 16 and 8 respectively. Among the priority inference nodes, the second inference node's inference priority is higher than the first's, and the first's is higher than the third's; among the non-priority inference nodes, the fourth inference node's inference priority is higher than the fifth's. If the first preset number is 1, the second inference node, which has the highest inference priority among the priority inference nodes, is taken as the target node.
If the first preset number is 4, all the priority inference nodes, namely the first, second and third inference nodes, are determined as target nodes, and the fourth inference node, which has the highest inference priority among the non-priority inference nodes, is also taken as a target node.
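The example's arithmetic can be replayed directly. Note that the stated priority parameter of 22.5 for the second inference node presupposes 15 to-be-processed tasks under its weight of 1.5; the snippet below simply reproduces the stated figures:

```python
pending = {"node1": 20, "node2": 15, "node3": 10, "node4": 20, "node5": 10}
weights = {"node1": 1.0, "node2": 1.5, "node3": 1.0, "node4": 0.8, "node5": 0.8}

# Inference priority parameter = pending task count * collaborative weight.
priorities = {n: pending[n] * weights[n] for n in pending}
# -> {'node1': 20.0, 'node2': 22.5, 'node3': 10.0, 'node4': 16.0, 'node5': 8.0}

priority_nodes = ["node1", "node2", "node3"]  # these hold priority tasks
non_priority_nodes = ["node4", "node5"]       # these hold none

order = (sorted(priority_nodes, key=priorities.get, reverse=True)
         + sorted(non_priority_nodes, key=priorities.get, reverse=True))
# -> ['node2', 'node1', 'node3', 'node4', 'node5']

print(order[:1])  # first preset number 1 -> ['node2']
print(order[:4])  # first preset number 4 -> ['node2', 'node1', 'node3', 'node4']
```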
The task scheduling method provided by the embodiment of the invention selects target nodes by taking into account the computing resources each inference node requires for task processing and the rated computation amount, ensuring that the sum of the computing resources required by the selected target nodes does not exceed the rated computation amount. This meets the high-throughput requirement and improves device utilization, while avoiding the preemption overhead caused by too many inference nodes processing tasks simultaneously, thereby ensuring normal operation of the device.
Based on the foregoing embodiment, fig. 4 is a flowchart of step 122-2 in the task scheduling method provided by the embodiment of the present invention. As shown in fig. 4, step 122-2 includes:
Step 122-21, determining the priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next priority inference node to be determined as a target node and by all current target nodes would be greater than the rated computation amount;
Step 122-22, if the sum of the computing resources required by all priority inference nodes for task processing is less than the rated computation amount, determining the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next non-priority inference node to be determined as a target node and by all current target nodes would be greater than the rated computation amount; wherein a non-priority inference node is an inference node whose number of priority tasks is 0.
Specifically, since the sum of the computing resources required for task processing by the selected target nodes must not exceed the rated computation amount, whether that sum would exceed the rated computation amount can be checked during the selection process itself.
After the inference priority of each priority inference node is determined, if the sum of the computing resources required by all priority inference nodes for task processing exceeds the rated computation amount, the priority inference nodes are determined as target nodes one by one in descending order of inference priority. For example, suppose the i-th priority inference node (in descending priority order) has just been determined as a target node, the first i target nodes together require computing resources m for task processing, with m less than the rated computation amount M, and the (i+1)-th priority inference node requires computing resources n. If m + n ≥ M, adding the (i+1)-th node would exceed the rated computation amount, so only the first i priority inference nodes are determined as target nodes.
Correspondingly, if m + n < M, the (i+1)-th priority inference node can also be determined as a target node, and the same check is applied to the (i+2)-th priority inference node, and so on, until the computing resources required by the target nodes already determined, plus those of the next priority inference node to be determined as a target node, would exceed the rated computation amount.
If the sum of the computing resources required by all priority inference nodes for task processing does not exceed the rated computation amount, all priority inference nodes are taken as target nodes. For example, if there are k priority inference nodes and the sum a of the computing resources they require for task processing is less than the rated computation amount M, all k priority inference nodes become target nodes. The non-priority inference nodes are then considered one by one in descending order of inference priority: if the 1st non-priority inference node requires computing resources b and b + a ≤ M, it is also taken as a target node, and the same check is applied to the 2nd non-priority inference node, and so on, until the computing resources required by the target nodes already determined, plus those of the next non-priority inference node to be determined as a target node, would exceed the rated computation amount.
If b + a > M, the computing resources remaining within the rated computation amount after the k priority inference nodes are served are insufficient to support even the 1st non-priority inference node, the one with the highest inference priority, and only the k priority inference nodes are determined as target nodes.
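A minimal sketch of this greedy, budget-bounded selection, assuming each candidate carries its inference priority and the computing resources it would need for task processing (the tuple layout and names are illustrative, not the patent's interface):

```python
from typing import List, Tuple


def select_targets(
    priority_nodes: List[Tuple[str, float, float]],      # (id, priority, demand)
    non_priority_nodes: List[Tuple[str, float, float]],
    rated_amount: float,
) -> Tuple[List[str], float]:
    """Greedy selection per steps 122-21 and 122-22: admit nodes in
    descending inference priority and stop as soon as the next candidate
    would push the total resource demand past the rated computation
    amount. Non-priority nodes are considered only if every priority
    node fits."""
    targets: List[str] = []
    used = 0.0
    for pool in (priority_nodes, non_priority_nodes):
        for node_id, _prio, demand in sorted(pool, key=lambda t: t[1], reverse=True):
            if used + demand > rated_amount:
                return targets, used  # next candidate would exceed the budget
            targets.append(node_id)
            used += demand
    return targets, used
```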
The task scheduling method provided by the embodiment of the invention selects target nodes by taking into account the computing resources each inference node requires for task processing and the rated computation amount, ensuring that the sum of the computing resources required by the selected target nodes does not exceed the rated computation amount. This meets the high-throughput requirement and improves device utilization, while avoiding the preemption overhead caused by too many inference nodes processing tasks simultaneously, thereby ensuring normal operation of the device.
Fig. 5 is a second flowchart of a task scheduling method according to an embodiment of the present invention. As shown in fig. 5, this method is applied to an inference node and includes:
Step 510, sending the to-be-processed information of the local end to the scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes; wherein the local end operates cooperatively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local end and/or the task type of each to-be-processed task.
Specifically, in a collaborative operation scenario with multiple inference nodes, only execution-logic connections exist between the inference nodes; for example, in the human-computer interaction scenario above, the outputs of inference node A and inference node B are the inputs of inference node C. The inference nodes do not, however, transmit their pending states to one another. Therefore, in the collaborative operation scenario, the embodiment of the invention additionally deploys a scheduling module that can communicate with every inference node, thereby realizing task scheduling for the collaborative operation according to the to-be-processed information of each inference node.
Before the inference nodes are scheduled for task processing, each inference node needs to send the to-be-processed information of its local end to the scheduling module. The to-be-processed information represents the information about the tasks the inference node needs to process; it may include, for example, the number of to-be-processed tasks, the task type of each to-be-processed task, or both.
Regarding the number of to-be-processed tasks: when an inference node is scheduled for a single round of task processing, the more to-be-processed tasks it holds, the higher the utilization of the computing device executing that node and the less computing resource is wasted; inference nodes with more to-be-processed tasks can therefore be scheduled preferentially.
Regarding the task type of a to-be-processed task: task types may be divided according to the importance of the task, for example, tasks of higher importance need the corresponding inference node scheduled preferentially for processing; they may be divided according to the task's demand on response speed, for example, tasks with stricter response-speed requirements need the corresponding inference node scheduled preferentially; or they may be divided according to a user-defined processing mode, for example, if a user predefines that certain types of tasks must be processed first, the inference nodes corresponding to those tasks need to be scheduled preferentially.
After receiving the to-be-processed information of each inference node, the scheduling module can select target nodes from the inference nodes according to the received information and the importance of each inference node in the collaborative operation. Here, a target node is an inference node that needs to be scheduled preferentially.
In the collaborative operation scenario, each inference node takes on part of the cooperative processing and has its own importance in the collaborative operation. Because different inference nodes differ in their influence on the response time of the collaborative operation, in their acceleration efficiency as the batch dimension grows, and in how heavily they occupy the computing device and the I/O devices, the importance of each inference node in the collaborative operation differs.
For example, the scheduling priority of each inference node can be determined by combining its to-be-processed information with its importance in the collaborative operation, and target nodes are then selected according to that priority. As another example, target nodes may be selected according to the to-be-processed information alone; in this process, inference nodes with similar to-be-processed information can be ranked by their importance in the collaborative operation to decide the target nodes.
This selection mode, which combines the to-be-processed information of each inference node with its importance in the collaborative operation, ensures that the selected target nodes take both dimensions into account, so that when the scheduling module schedules the selected target nodes for task processing it can meet the throughput requirement while guaranteeing the task response time requirement.
It should be noted that, because the computing resources of the device on which the inference nodes run are limited, when selecting target nodes the sum of the computing resources required by all selected target nodes during task processing must be less than or equal to the rated computation amount, so as to avoid the preemption overhead caused by too many inference nodes computing at the same time.
After the target nodes are determined, the scheduling module sends a task processing instruction to each target node; this is the instruction by which the scheduling module controls a target node to perform task processing.
Step 520, if the task processing instruction is received, performing task processing.
Specifically, after receiving the task processing instruction sent by the scheduling module, the target node performs task processing, which includes data splicing and inference calculation on the to-be-processed tasks.
It should be noted that inference nodes that are not performing task processing may continue to receive to-be-processed tasks, increasing their pending count and thus their likelihood of being selected as target nodes. This avoids the problem of a too-small batch size caused by invoking a particular inference node too frequently, while also preventing a node from going unscheduled for a long time, thereby balancing throughput and response.
According to the task scheduling method provided by the embodiment of the invention, each inference node sends the to-be-processed information of its local end to the scheduling module, and the scheduling module selects target nodes by combining that information with the importance of each inference node in the collaborative operation, so that the selected target nodes can meet both the overall throughput requirement and the task response time requirement when processing tasks. This solves the problem that a scheduling and inference scheme for a single neural network cannot schedule inference tasks in which multiple neural networks work cooperatively, and realizes scheduling and inference in complex scenarios.
Based on the above embodiments, fig. 6 is a schematic diagram of the task scheduling method provided by the embodiment of the present invention applied to a multi-task scenario. In fig. 6, T and R represent to-be-processed tasks outside the corresponding inference nodes, and each T or R corresponds to one session. Batch1, Exe1 and Exe2 form the first inference node; Batch2, Exe3 and Exe4 form the second inference node; Batch3 and Exe5 form the third inference node. Batch denotes the data splicing node within an inference node, Exe denotes the inference engine node that executes inference, and Schedule is the scheduling module. Solid arrows indicate data transmission and its direction, and dashed arrows indicate the scheduling process of the scheduling module. Each Batch performs the following functions (see the sketch after this list):
(1) Data storage and management. When processing a task, the inference engine requires input data of a certain size; the externally input data may not meet this size requirement, and the inference engine itself cannot store or manage data. Batch therefore realizes this function: each data splicing node stores the to-be-processed tasks at its local end;
(2) Assisting the scheduling module in dispatching the overall task. Each Batch analyzes the to-be-processed tasks managed at its local end, determines the local end's to-be-processed information, and sends that information to the scheduling module for scheduling management.
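The two Batch duties could be sketched as below; the class shape, the scheduler interface and the splice method are assumptions for illustration, not the patent's API:

```python
class BatchNode:
    """Data splicing node: stores local to-be-processed tasks (duty 1)
    and reports the local pending state to the scheduler (duty 2)."""

    def __init__(self, node_id, scheduler):
        self.node_id = node_id
        self.scheduler = scheduler
        self.pending = []  # to-be-processed tasks stored at the local end

    def receive_task(self, task):
        self.pending.append(task)
        # Report the local end's to-be-processed information so the
        # scheduling module can manage scheduling.
        self.scheduler.report(self.node_id, len(self.pending),
                              [t.task_type for t in self.pending])

    def splice(self, batch_size):
        """Concatenate up to batch_size tasks into one inference-engine input."""
        batch, self.pending = self.pending[:batch_size], self.pending[batch_size:]
        return batch
```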
In the application scenario provided in the embodiment of the present invention, the execution flow for a one-way task includes the following steps (a dataflow sketch follows the list):
Step S1, the user inputs several pieces of data (Data0, Data1, …, DataN) into Exe1 for inference calculation;
Step S2, Exe1 outputs its calculation result, which is input to Exe2 for inference calculation; meanwhile, the write-back data in the result output by Exe1 is fed back to Batch1;
Step S3, the user inputs several pieces of data (Data0, Data1, …, DataN) into Exe3 for inference calculation;
Step S4, Exe3 outputs its calculation result, which is input to Exe4 for inference calculation; meanwhile, the write-back data in the output result is fed back to Batch2;
Step S5, after the inference calculations of Exe2 and Exe4 are completed, the results output by Exe2 and Exe4 are input to Exe5 for the final inference calculation, yielding the calculation result output by Exe5.
Each one-way task is executed in the order Exe1 → Exe2, Exe3 → Exe4, and (Exe2 + Exe4) → Exe5.
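Written as dataflow, steps S1 to S5 and the execution order above amount to the following sketch; the Exe callables, the result fields and the write-back hooks are placeholders for the engines of fig. 6, not a defined interface:

```python
def run_one_way(data_a, data_b, exe1, exe2, exe3, exe4, exe5, batch1, batch2):
    # S1-S2: first inference node, order Exe1 -> Exe2 (data_a is user input).
    r1 = exe1(data_a)
    batch1.write_back(r1.write_back)  # write-back data returns to Batch1
    r2 = exe2(r1.output)

    # S3-S4: second inference node, order Exe3 -> Exe4 (data_b is user input).
    r3 = exe3(data_b)
    batch2.write_back(r3.write_back)  # write-back data returns to Batch2
    r4 = exe4(r3.output)

    # S5: third inference node fuses both branches, (Exe2 + Exe4) -> Exe5.
    return exe5(r2, r4)
```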
As shown in fig. 6, the execution flow for the multi-path task includes:
(1) The scheduling module uniformly manages all Batches, namely Batch1, Batch2, and Batch3 in the figure, and logically treats consecutive inference engine nodes as a single node, e.g. treating Exe1 and Exe2 as one node and Exe3 and Exe4 as one node;
(2) The scheduling module acquires the pending information of each inference node, i.e. of the first, second, and third inference nodes; the acquired pending information includes the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task. Whenever an inference node has to-be-processed tasks, it feeds the task information back to the scheduling module;
(3) When the scheduling module senses that the inference nodes it manages have to-be-processed tasks, it selects target nodes from the inference nodes according to the pending information of each inference node and the importance of each inference node in the collaborative operation, schedules the target nodes to perform data splicing, and inputs the spliced data to the inference engine nodes for inference calculation.
A reasonable scheduling strategy combined with an effective engine arrangement in this multi-path task scenario maximizes task throughput and keeps device utilization high.
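For illustration, one iteration of such a scheduler-side loop might look like the sketch below; select_targets and run_inference are injected placeholders (our assumption, not an API from the patent), and the batches are instances of the hypothetical BatchNode sketched earlier:

```python
def scheduling_step(batches, select_targets, run_inference, rated_amount):
    """One scheduler iteration (a sketch under the assumptions above)."""
    # (2) Gather the pending information of every managed inference node.
    pending = {b.node_id: b.pending_info() for b in batches}
    # (3) If any node has pending tasks, pick target nodes under the rated
    #     computation budget, then trigger splicing followed by inference.
    if any(info.task_count > 0 for info in pending.values()):
        for batch in select_targets(batches, pending, rated_amount):
            run_inference(batch.splice())
```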
The scheduling module provided by the present invention is described below; the scheduling module described below and the task scheduling method described above may be mutually referenced.
Fig. 7 is a schematic structural diagram of a scheduling module provided in the present invention. As shown in fig. 7, the module includes:
an information obtaining unit 710, configured to obtain the pending information of each inference node, where the inference nodes operate cooperatively and the pending information includes the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task;
a target selecting unit 720, configured to select target nodes from the inference nodes based on the pending information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all the target nodes for task processing is less than or equal to the rated computation amount;
and an instruction sending unit 730, configured to send a task processing instruction to the target nodes to trigger the target nodes to perform task processing.
When selecting target nodes from the inference nodes, the scheduling module provided by the present invention considers both the pending information of each inference node and the importance of each inference node in the collaborative operation, so that the selected target nodes, when performing task processing, meet both the overall-throughput requirement and the task-response-time requirement; this solves the problem that a scheduling-and-inference scheme designed for a single neural network cannot schedule inference tasks for multiple cooperating neural networks, and enables scheduling and inference in complex scenarios.
Based on the above embodiment, the target selecting unit 720 is configured to:
determining the number of priority tasks under each inference node based on the pending information of each inference node, where a priority task is a to-be-processed task whose task type is the priority type;
and selecting target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation.
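That counting step can be read as a one-liner (a sketch reusing the hypothetical PendingInfo from the earlier BatchNode sketch; priority_types is an assumed set of task types configured as priority):

```python
def priority_task_count(pending_info, priority_types):
    # Count the to-be-processed tasks whose task type is a priority type.
    return sum(1 for t in pending_info.task_types if t in priority_types)
```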
Based on the above embodiment, the target selecting unit 720 is configured to:
if priority inference nodes exist, determining the inference priority of each priority inference node based on the number of its to-be-processed tasks and its collaborative operation weight;
determining target nodes based on the inference priority of each priority inference node;
where a priority inference node is an inference node whose priority-task count is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation.
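The patent does not give the formula combining the task count and the weight; one plausible reading, offered only as an assumption, is a monotone product:

```python
def inference_priority(pending_count, coop_weight):
    # Assumed combination of pending-task count and collaborative-operation
    # weight; a weighted sum or other monotone combination would also fit.
    return pending_count * coop_weight
```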
Based on the above embodiment, the target selecting unit 720 is configured to:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as target nodes;
otherwise, taking all the priority inference nodes as target nodes, and selecting a second preset number of non-priority inference nodes with the highest inference priority as additional target nodes;
where the first preset number is a target-node count threshold determined from the computing resources each inference node requires for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose priority-task count is 0.
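A sketch of this count-based selection rule; the nodes are assumed to carry a hypothetical priority attribute, and first_preset is the target-node count threshold described above:

```python
def select_by_count(priority_nodes, non_priority_nodes, first_preset):
    ranked = sorted(priority_nodes, key=lambda n: n.priority, reverse=True)
    if len(ranked) >= first_preset:
        # Enough priority inference nodes: take the top first_preset of them.
        return ranked[:first_preset]
    # Otherwise take all priority nodes, then fill the remaining slots (the
    # second preset number) with the highest-priority non-priority nodes.
    second_preset = first_preset - len(ranked)
    filler = sorted(non_priority_nodes, key=lambda n: n.priority, reverse=True)
    return ranked + filler[:second_preset]
```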
Based on the above embodiment, the target selecting unit 720 is configured to:
determining the priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next candidate priority inference node, together with those of all already-selected target nodes, would exceed the rated computation amount;
if the sum of the computing resources required by all the priority inference nodes for task processing is less than the rated computation amount, determining the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next candidate non-priority inference node, together with those of all already-selected target nodes, would exceed the rated computation amount;
where a non-priority inference node is an inference node whose priority-task count is 0.
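A sketch of this budget-driven variant; .priority and .cost are hypothetical attributes standing for a node's inference priority and the computing resources it needs for task processing:

```python
def select_by_budget(priority_nodes, non_priority_nodes, rated_amount):
    targets, used = [], 0
    for group in (priority_nodes, non_priority_nodes):
        for node in sorted(group, key=lambda n: n.priority, reverse=True):
            if used + node.cost > rated_amount:
                # The next candidate would push the total past the rated
                # computation amount, so selection stops here.
                return targets
            targets.append(node)
            used += node.cost
    return targets
```

Note that non-priority nodes are only reached when every priority node fits within the budget, matching the condition stated above.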
Fig. 8 is a schematic structural diagram of an inference node provided by the present invention. As shown in fig. 8, the node includes:
a sending unit 810, configured to send the local pending information to the scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the pending information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes; the local node operates cooperatively with the other inference nodes, and the pending information includes the number of the local node's to-be-processed tasks and/or the task type of each to-be-processed task;
and a task processing unit 820, configured to perform task processing upon receiving the task processing instruction.
With the inference node provided by the present invention, each inference node sends its local pending information to the scheduling module, and the scheduling module selects target nodes from the inference nodes by combining the pending information of each inference node with the importance of each inference node in the collaborative operation, so that when the selected target nodes perform task processing, both the overall-throughput requirement and the task-response-time requirement are met; this solves the problem that a scheduling-and-inference scheme designed for a single neural network cannot schedule inference tasks for multiple cooperating neural networks, and enables scheduling and inference in complex scenarios.
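Putting the two units together, a minimal sketch of an inference node's scheduler-facing behaviour (all names hypothetical; the scheduler and engine are injected rather than specified by the patent):

```python
class InferenceNodeAgent:
    """Sending unit + task processing unit of an inference node (sketch)."""

    def __init__(self, node_id, scheduler, batch, engine):
        self.node_id = node_id
        self.scheduler = scheduler   # object exposing an assumed submit(...) hook
        self.batch = batch           # a BatchNode-like local task buffer
        self.engine = engine         # callable that runs the actual inference

    def report_pending(self):
        # Sending unit 810: push the local pending information upstream.
        self.scheduler.submit(self.node_id, self.batch.pending_info())

    def on_task_instruction(self):
        # Task processing unit 820: on instruction, splice and then infer.
        return self.engine(self.batch.splice())
```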
Fig. 9 is a schematic structural diagram of a collaborative operation system provided by the present invention; as shown in fig. 9, the system includes a scheduling module 700 and a plurality of inference nodes 800.
Fig. 10 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 10, the electronic device may include: a processor (processor) 1010, a communication interface (Communications Interface) 1020, a memory (memory) 1030, and a communication bus 1040, where the processor 1010, the communication interface 1020, and the memory 1030 communicate with each other via the communication bus 1040. The processor 1010 may call logic instructions in the memory 1030 to perform a task scheduling method applied to a scheduling module, the method including: acquiring the pending information of each inference node, where the pending information includes the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task; selecting target nodes from the inference nodes based on the pending information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all the target nodes for task processing is less than or equal to the rated computation amount; and sending a task processing instruction to the target nodes to trigger the target nodes to perform task processing.
Furthermore, the logic instructions in the memory 1030 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to execute the task scheduling method provided above. The method is applied to a scheduling module and includes: acquiring the pending information of each inference node, where the pending information includes the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task; selecting target nodes from the inference nodes based on the pending information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all the target nodes for task processing is less than or equal to the rated computation amount; and sending a task processing instruction to the target nodes to trigger the target nodes to perform task processing.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the task scheduling method provided above. The method is applied to a scheduling module and includes: acquiring the pending information of each inference node, where the pending information includes the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task; selecting target nodes from the inference nodes based on the pending information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all the target nodes for task processing is less than or equal to the rated computation amount; and sending a task processing instruction to the target nodes to trigger the target nodes to perform task processing.
The above-described apparatus embodiments are merely illustrative; units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A task scheduling method is applied to a scheduling module and comprises the following steps:
acquiring to-be-processed information of each inference node, wherein the to-be-processed information comprises the number of to-be-processed tasks and/or the task type of each to-be-processed task under the corresponding inference node;
selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, wherein the sum of the computing resources required by all the target nodes for task processing is less than or equal to a rated computation amount;
and sending a task processing instruction to the target node to trigger the target node to perform task processing.
2. The task scheduling method according to claim 1, wherein the selecting of target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation comprises:
determining the number of priority tasks under each inference node based on the to-be-processed information of each inference node, wherein a priority task is a to-be-processed task whose task type is the priority type;
and selecting target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation.
3. The task scheduling method according to claim 2, wherein the selecting of target nodes from the inference nodes based on the number of priority tasks under each inference node and the importance of each inference node in the collaborative operation comprises:
if priority inference nodes exist, determining the inference priority of each priority inference node based on the number of to-be-processed tasks of each priority inference node and the collaborative operation weight;
determining target nodes based on the inference priority of each priority inference node;
wherein a priority inference node is an inference node whose priority-task count is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation.
4. The task scheduling method according to claim 3, wherein the determining of target nodes based on the inference priority of each priority inference node comprises:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as target nodes;
otherwise, taking all the priority inference nodes as target nodes, and selecting a second preset number of non-priority inference nodes with the highest inference priority as additional target nodes;
wherein the first preset number is a target-node count threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose priority-task count is 0.
5. The task scheduling method according to claim 3, wherein the determining of target nodes based on the inference priority of each priority inference node comprises:
determining the priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next candidate priority inference node, together with those of all already-selected target nodes, would exceed the rated computation amount;
if the sum of the computing resources required by all the priority inference nodes for task processing is less than the rated computation amount, determining the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next candidate non-priority inference node, together with those of all already-selected target nodes, would exceed the rated computation amount;
wherein a non-priority inference node is an inference node whose priority-task count is 0.
6. A task scheduling method is applied to an inference node and comprises the following steps:
sending the to-be-processed information of the local end to a scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes;
if the task processing instruction is received, performing task processing;
wherein the local end operates cooperatively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks of the local end and/or the task type of each to-be-processed task.
7. A scheduling module, comprising:
an information acquisition unit, configured to acquire the to-be-processed information of each inference node, wherein the inference nodes operate cooperatively and the to-be-processed information comprises the number of to-be-processed tasks under the corresponding inference node and/or the task type of each to-be-processed task;
a target selection unit, configured to select target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, wherein the sum of the computing resources required by all the target nodes for task processing is less than or equal to the rated computation amount;
and an instruction sending unit, configured to send a task processing instruction to the target nodes to trigger the target nodes to perform task processing.
8. An inference node, comprising:
a sending unit, configured to send the to-be-processed information of the local end to the scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and sends a task processing instruction to the target nodes; wherein the local end operates cooperatively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks of the local end and/or the task type of each to-be-processed task;
and a task processing unit, configured to perform task processing upon receiving the task processing instruction.
9. A collaborative work system comprising the scheduling module of claim 7, and a plurality of inference nodes of claim 8.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the task scheduling method according to any of claims 1 to 6 are implemented when the processor executes the program.
11. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the task scheduling method according to any one of claims 1 to 6.
CN202110888396.8A 2021-08-03 2021-08-03 Task scheduling method, scheduling module, inference node and collaborative operation system Pending CN113608852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888396.8A CN113608852A (en) 2021-08-03 2021-08-03 Task scheduling method, scheduling module, inference node and collaborative operation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110888396.8A CN113608852A (en) 2021-08-03 2021-08-03 Task scheduling method, scheduling module, inference node and collaborative operation system

Publications (1)

Publication Number Publication Date
CN113608852A true CN113608852A (en) 2021-11-05

Family

ID=78306673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888396.8A Pending CN113608852A (en) 2021-08-03 2021-08-03 Task scheduling method, scheduling module, inference node and collaborative operation system

Country Status (1)

Country Link
CN (1) CN113608852A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3799390A1 (en) * 2018-06-25 2021-03-31 Transwarp Technology (Shanghai) Co., Ltd. Preemptive scheduling based resource sharing use method, system and
CN109729155A (en) * 2018-12-13 2019-05-07 平安医疗健康管理股份有限公司 A kind of distribution method and relevant apparatus of service request
CN110018893A (en) * 2019-03-12 2019-07-16 平安普惠企业管理有限公司 A kind of method for scheduling task and relevant device based on data processing
CN113037800A (en) * 2019-12-09 2021-06-25 华为技术有限公司 Job scheduling method and job scheduling device
CN111736965A (en) * 2019-12-11 2020-10-02 西安宇视信息科技有限公司 Task scheduling method and device, scheduling server and machine-readable storage medium
CN112559179A (en) * 2020-12-15 2021-03-26 建信金融科技有限责任公司 Job processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DU Li; LI Haitao: "Research on adaptive preemption algorithms in DS-TE networks", Journal of Northeastern University (Natural Science), no. 02, pages 43-46 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707954A (en) * 2022-03-29 2022-07-05 城信科技股份有限公司 Information management method and system of enterprise intelligent platform
CN114782445A (en) * 2022-06-22 2022-07-22 深圳思谋信息科技有限公司 Object defect detection method and device, computer equipment and storage medium
CN114782445B (en) * 2022-06-22 2022-10-11 深圳思谋信息科技有限公司 Object defect detection method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230518
Address after: 230026 No. 96, Jinzhai Road, Hefei, Anhui
Applicant after: University of Science and Technology of China
Applicant after: IFLYTEK Co., Ltd.
Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui
Applicant before: IFLYTEK Co., Ltd.