CN113608852B - Task scheduling method, scheduling module, inference node and collaborative operation system - Google Patents


Info

Publication number: CN113608852B
Authority: CN (China)
Prior art keywords: node, inference, priority
Legal status: Active (granted)
Application number: CN202110888396.8A
Other languages: Chinese (zh)
Other versions: CN113608852A
Inventors: 张海俊, 朱亚平, 姚文军, 李华清
Assignees: University of Science and Technology of China (USTC); iFlytek Co., Ltd.
Application filed by University of Science and Technology of China (USTC) and iFlytek Co., Ltd.; priority to CN202110888396.8A; published as CN113608852A (application) and CN113608852B (grant).

Classifications

    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues (G PHYSICS > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F 9/00 Arrangements for program control, e.g. control units > G06F 9/46 Multiprogramming arrangements > G06F 9/48 Program initiating; program switching, e.g. by interrupt > G06F 9/4806 Task transfer initiation or dispatching > G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system)
    • G06N 3/045 Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 5/04 Inference or reasoning models (G06N 5/00 Computing arrangements using knowledge-based models)
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y02 TECHNOLOGIES FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE > Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN ICT)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a task scheduling method, a scheduling module, an inference node, and a collaborative operation system. The method is applied to the scheduling module and comprises the following steps: acquiring the to-be-processed information of each inference node, where the inference nodes operate collaboratively and the to-be-processed information comprises the number of to-be-processed tasks at the corresponding inference node and/or the task type of each to-be-processed task; selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount; and sending a task processing instruction to the target nodes to trigger the target nodes to process their tasks. The method thereby satisfies both the overall throughput requirement and the task response time requirement, solves the problem that a scheduling scheme designed for a single neural network cannot schedule inference for tasks on which multiple neural networks work cooperatively, and realizes scheduling of inference in complex scenarios.

Description

Task scheduling method, scheduling module, inference node and collaborative operation system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a task scheduling method, a scheduling module, an inference node, and a collaborative operation system.
Background
As the application scenarios of cloud-side multi-path neural networks have developed, these applications have extended from initial offline recognition to real-time online services, and from single-network applications to multi-network cooperative work. In this process, not only the throughput requirement but also the task response time requirement must be satisfied.
At present, in application scenarios with a single neural network, device utilization is generally raised, and the overall task throughput requirement met, by splicing the data of more tasks into each batch. However, this approach limits the complexity of the tasks, is effective only for tasks of a single neural network, and cannot schedule inference for tasks on which multiple neural networks work cooperatively.
Disclosure of Invention
The invention provides a task scheduling method, a scheduling module, an inference node, and a collaborative operation system, to overcome the defect in the prior art that inference cannot be scheduled for tasks on which multiple neural networks work cooperatively.
The invention provides a task scheduling method, applied to a scheduling module, comprising:
acquiring the to-be-processed information of each inference node, where the inference nodes operate collaboratively, and the to-be-processed information comprises the number of to-be-processed tasks at the corresponding inference node and/or the task type of each to-be-processed task;
selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount;
and sending a task processing instruction to the target nodes to trigger the target nodes to process their tasks.
According to the task scheduling method provided by the invention, selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation comprises:
determining, based on the to-be-processed information of each inference node, the number of priority tasks at each inference node, where a priority task is a to-be-processed task whose task type is a priority type;
and selecting target nodes from the inference nodes based on the number of priority tasks at each inference node and the importance of each inference node in the collaborative operation.
According to the task scheduling method provided by the invention, selecting target nodes from the inference nodes based on the number of priority tasks at each inference node and the importance of each inference node in the collaborative operation comprises:
if priority inference nodes exist, determining the inference priority of each priority inference node based on the number of to-be-processed tasks and the collaborative operation weight of each priority inference node;
and determining the target nodes based on the inference priority of each priority inference node;
where a priority inference node is an inference node whose number of priority tasks is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation.
According to the task scheduling method provided by the invention, determining the target nodes based on the inference priority of each priority inference node comprises:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as the target nodes;
otherwise, taking all priority inference nodes as target nodes, and additionally selecting the second preset number of non-priority inference nodes with the highest inference priority as target nodes;
where the first preset number is a target node count threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose number of priority tasks is 0.
According to the task scheduling method provided by the invention, determining the target nodes based on the inference priority of each priority inference node comprises:
determining the priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next priority inference node to be so determined and by all current target nodes exceeds the rated computation amount;
if the sum of the computing resources required by all priority inference nodes for task processing is smaller than the rated computation amount, determining the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next non-priority inference node to be so determined and by all current target nodes exceeds the rated computation amount;
where a non-priority inference node is an inference node whose number of priority tasks is 0.
The invention also provides a task scheduling method, applied to an inference node, comprising:
sending the to-be-processed information of the local node to a scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and issues a task processing instruction to the target nodes;
and performing task processing if the task processing instruction is received;
where the local node operates collaboratively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local node and/or the task type of each to-be-processed task.
The invention also provides a scheduling module, comprising:
an information acquisition unit, configured to acquire the to-be-processed information of each inference node, where the inference nodes operate collaboratively, and the to-be-processed information comprises the number of to-be-processed tasks at the corresponding inference node and/or the task type of each to-be-processed task;
a target selection unit, configured to select target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount;
and an instruction sending unit, configured to send a task processing instruction to the target nodes to trigger the target nodes to process their tasks.
The invention also provides an inference node, comprising:
a sending unit, configured to send the to-be-processed information of the local node to a scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and issues a task processing instruction to the target nodes, where the local node operates collaboratively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local node and/or the task type of each to-be-processed task;
and a task processing unit, configured to perform task processing if the task processing instruction is received.
The invention also provides a collaborative operation system, comprising the above scheduling module and a plurality of the above inference nodes.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the task scheduling methods described above.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the task scheduling methods described above.
With the task scheduling method, scheduling module, inference node, and collaborative operation system provided by the invention, both the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation are considered when selecting target nodes, so that the selected target nodes satisfy both the overall throughput requirement and the task response time requirement during task processing. This solves the problem that a scheduling scheme designed for a single neural network cannot schedule inference for tasks on which multiple neural networks work cooperatively, and realizes scheduling of inference in complex scenarios.
Drawings
To describe the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously some embodiments of the present invention; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is the first schematic flowchart of a task scheduling method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of step 120 of the task scheduling method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of step 122 of the task scheduling method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of step 122-2 of the task scheduling method according to an embodiment of the present invention;
FIG. 5 is the second schematic flowchart of a task scheduling method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the task scheduling method applied to a multi-path task scenario according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the scheduling module provided by the present invention;
FIG. 8 is a schematic structural diagram of the inference node provided by the present invention;
FIG. 9 is a schematic structural diagram of the collaborative operation system provided by the present invention;
FIG. 10 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the protection scope of the present invention.
At present, in cloud multi-path neural network application scenarios, scheduling of inference for a single neural network generally works as follows: tasks arriving from multiple paths are parsed outside the neural network, their data is spliced according to certain rules, and the spliced data is input into a neural network inference engine for inference computation to obtain the result output by the network.
However, this scheme limits the complexity of the tasks and is effective only for tasks of a single neural network. When a task involves multiple neural networks that must cooperate to complete the inference, the scheme can hardly meet the requirement.
In view of this, to solve the problem that a scheduling scheme for a single neural network cannot schedule inference for tasks on which multiple neural networks work cooperatively, the present invention provides a task scheduling method. FIG. 1 is the first schematic flowchart of the task scheduling method provided by an embodiment of the present invention. As shown in FIG. 1, the method is applied to a scheduling module and comprises:
Step 110: acquire the to-be-processed information of each inference node, where the inference nodes operate collaboratively, and the to-be-processed information comprises the number of to-be-processed tasks at the corresponding inference node and/or the task type of each to-be-processed task.
Specifically, a collaborative operation scenario may contain multiple inference nodes. Here, an inference node is an inference engine node that executes an independent task, or part of a task, in the collaborative task scenario; an inference node may contain a single neural network or several neural networks executed in sequence.
For example, in a human-computer interaction scenario, the collaborative operation may be carried out by three inference nodes: inference node A is a speech recognition node that transcribes the speech input by the user; inference node B is a character recognition node that recognizes the characters contained in an image input by the user to form a text; and inference node C is a test question recommendation node that recommends test questions according to the text obtained by speech transcription and/or the text obtained by character recognition on the image. Inference node A may contain two sequentially executed neural networks: during execution, neural network A1 first performs speech denoising, and neural network A2 then transcribes the denoised speech. Inference nodes B and C may each be implemented by a single neural network.
Note that in a collaborative operation scenario with multiple inference nodes, only the communication links of the logical execution flow exist between the inference nodes; in the human-computer interaction scenario above, for example, the outputs of inference node A and inference node B are the inputs of inference node C. The pending states of the inference nodes, however, cannot be communicated between them. Therefore, in the embodiment of the present invention, a scheduling module is added to the collaborative operation scenario. The scheduling module can communicate with every inference node, so that task scheduling for the collaborative operation can be realized according to the to-be-processed information of each inference node.
Before the scheduling module schedules the inference nodes to process tasks, it needs to acquire the to-be-processed information of each inference node. The to-be-processed information represents information about the tasks waiting to be processed at a node: it may include the number of to-be-processed tasks, or the task type of each to-be-processed task, or both, which is not specifically limited in the embodiment of the present invention.
Regarding the number of to-be-processed tasks: when a single inference node is scheduled to process tasks, the larger the number of to-be-processed tasks, the higher the utilization of the computing device executing that node and the less the waste of computing resources; inference nodes with more to-be-processed tasks may therefore be scheduled preferentially.
Regarding the task type of a to-be-processed task: task types may be divided by the importance of the tasks, where a more important task more strongly requires its inference node to be scheduled preferentially; by the response speed the tasks require, where a task demanding faster response more strongly requires its inference node to be scheduled preferentially; or by a user-defined processing policy, where, if the user specifies that certain types of tasks must be processed first, the inference nodes holding those tasks need to be scheduled preferentially.
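For illustration only, the to-be-processed information a node reports might be modeled as follows; PendingInfo, TaskType, and the field names are hypothetical, since the patent does not prescribe a concrete data layout:

```python
# A minimal sketch, in Python, of the to-be-processed information; the
# names are invented for illustration.
from dataclasses import dataclass
from enum import Enum
from typing import List

class TaskType(Enum):
    PRIORITY = "priority"            # e.g. tasks whose input is complete
    NON_PRIORITY = "non-priority"    # e.g. tasks still receiving input

@dataclass
class PendingInfo:
    node_id: str                 # which inference node reported this
    pending_count: int           # number of to-be-processed tasks
    task_types: List[TaskType]   # task type of each to-be-processed task
```

Per the "and/or" above, a node may report only the count, only the types, or both.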
Step 120: select target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, where the sum of the computing resources required by all target nodes for task processing is less than or equal to a rated computation amount.
Specifically, in a collaborative operation scenario, each inference node carries part of the collaboratively processed task and therefore has a certain importance in the collaborative operation. Because different inference nodes differ in how strongly they affect the response time of the collaborative operation, in their acceleration efficiency as the batch dimension grows, in their occupancy of the computing device, and in their occupancy of I/O devices, their importance in the collaborative operation also differs.
After obtaining the to-be-processed information of each inference node, the scheduling module can select target nodes from the inference nodes according to the obtained information and the importance of each inference node in the collaborative operation. Here, a target node is an inference node that needs to be scheduled preferentially. For example, a scheduling priority may be determined for each inference node by combining its to-be-processed information with its importance in the collaborative operation, and target nodes selected in order of that priority. Alternatively, target nodes may be selected according to the to-be-processed information, and during this process inference nodes with similar to-be-processed information may be ranked by their importance in the collaborative operation to determine the target nodes.
Selecting target nodes by combining the to-be-processed information of each inference node with its importance in the collaborative operation ensures that the selection accounts for both dimensions, so that when the scheduling module dispatches the selected target nodes for task processing, the throughput requirement is met while the task response time requirement is also guaranteed.
Note that since the computing resources of the device on which the inference nodes run are limited, when selecting target nodes from the inference nodes, the sum of the computing resources required by all selected target nodes must be less than or equal to the rated computation amount, to avoid the preemption overhead caused by many inference nodes computing simultaneously.
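The selection of step 120 can be sketched as a single ranked, budget-constrained pass; the ranking rule below (pending count times collaborative operation weight) is only one of the combinations the text permits, and all names are assumed:

```python
# One possible realization of step 120: rank nodes by pending load
# weighted by importance, then keep nodes greedily while the summed
# compute stays within the rated computation amount.
from dataclasses import dataclass
from typing import List

@dataclass
class NodeState:
    node_id: str
    pending_count: int    # from the node's to-be-processed information
    job_weight: float     # importance in the collaborative operation
    compute_cost: float   # compute the node needs for one processing round

def select_targets(nodes: List[NodeState], rated: float) -> List[NodeState]:
    ranked = sorted(nodes, key=lambda n: n.pending_count * n.job_weight,
                    reverse=True)
    targets, used = [], 0.0
    for node in ranked:
        # Budget constraint: the summed required compute never exceeds
        # the rated computation amount.
        if node.pending_count > 0 and used + node.compute_cost <= rated:
            targets.append(node)
            used += node.compute_cost
    return targets
```

Calling select_targets with, say, rated=100.0 returns the highest-ranked nodes whose summed compute cost stays within that budget.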
Step 130: send a task processing instruction to the target nodes to trigger the target nodes to process their tasks.
Specifically, after the target nodes are selected from the inference nodes, the scheduling module sends a task processing instruction to each selected target node. The task processing instruction is the instruction issued by the scheduling module to make a target node process its tasks; on receiving it, the target node performs task processing, which includes splicing the data of the to-be-processed tasks and running the inference computation.
Note that an inference node that is not processing tasks can continue to receive to-be-processed tasks, so its number of to-be-processed tasks grows and its chance of being selected as a target node increases. This avoids the too-small batch size that results when one inference node is called too frequently, prevents any node from going unscheduled for a long time, and keeps throughput and response balanced.
In the task scheduling method provided by the embodiment of the present invention, both the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation are considered when selecting target nodes, so that the selected target nodes satisfy both the overall throughput requirement and the task response time requirement during task processing. This solves the problem that a scheduling scheme for a single neural network cannot schedule inference for tasks on which multiple neural networks work cooperatively, and realizes scheduling of inference in complex scenarios.
Based on the above embodiment, FIG. 2 is a schematic flowchart of step 120 of the task scheduling method according to an embodiment of the present invention. As shown in FIG. 2, step 120 comprises:
Step 121: determine, based on the to-be-processed information of each inference node, the number of priority tasks at each inference node, where a priority task is a to-be-processed task whose task type is a priority type;
Step 122: select target nodes from the inference nodes based on the number of priority tasks at each inference node and the importance of each inference node in the collaborative operation.
Specifically, the task types of the to-be-processed tasks at each inference node are divided into a priority type and a non-priority type, and a task whose type is the priority type is a priority task. Priority tasks are the tasks at each inference node whose pushing has finished, for example tasks for which the user has completed the input. Since in actual task processing the user cares about each inference node's response time on priority tasks, priority tasks should be processed before non-priority tasks. Therefore, in step 121, on the basis of the acquired to-be-processed information of each inference node, the task type of each to-be-processed task at each inference node is further examined, and the number of tasks whose task type is the priority type, i.e. the priority task count, is obtained.
Further, after the priority task count at each inference node is determined, target nodes are selected from the inference nodes based on these counts and the importance of each inference node in the collaborative operation, so that the scheduling module can dispatch the selected target nodes for task processing. For example, a scheduling priority may be determined for each inference node by combining its priority task count with its importance in the collaborative operation, and target nodes selected in order of that priority. Alternatively, target nodes may be selected by priority task count, and during this process inference nodes with the same priority task count may be ranked by their importance in the collaborative operation to determine the target nodes.
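Step 121 then reduces to a count over the reported task types; a minimal sketch, with task types modeled as plain strings:

```python
# Counting priority tasks at one inference node (step 121), with task
# types modeled as plain strings.
from typing import List

def priority_task_count(task_types: List[str]) -> int:
    return sum(1 for t in task_types if t == "priority")

# A node with two priority tasks and one non-priority task:
assert priority_task_count(["priority", "priority", "non-priority"]) == 2
```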
Based on the above embodiments, FIG. 3 is a schematic flowchart of step 122 of the task scheduling method according to an embodiment of the present invention. As shown in FIG. 3, step 122 comprises:
Step 122-1: if priority inference nodes exist, determine the inference priority of each priority inference node based on the number of to-be-processed tasks and the collaborative operation weight of each priority inference node, where a priority inference node is an inference node whose priority task count is greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in the collaborative operation;
Step 122-2: determine the target nodes based on the inference priority of each priority inference node.
Specifically, after the priority task count at each inference node is determined in step 121, the inference nodes can be classified: an inference node whose priority task count is greater than 0 is a priority inference node, and an inference node whose priority task count equals 0 is a non-priority inference node. Priority inference nodes are the inference nodes the scheduling module needs to schedule preferentially for task processing.
After the type of each inference node is determined, step 122-1 checks whether priority inference nodes exist, i.e. whether any inference node has a priority task count greater than 0. If so, the inference priority of each priority inference node must be determined; the inference priority indicates how preferentially the scheduling module schedules an inference node for task processing.
The inference priority of each priority inference node may be determined from its number of to-be-processed tasks and its collaborative operation weight. For example, it may be determined from the product of the two; alternatively, the number of to-be-processed tasks and the collaborative operation weight may be combined in a weighted manner, and the inference priority determined from the weighted result.
The collaborative operation weight of an inference node is determined by its importance in the collaborative operation: the more important the corresponding inference node is in the collaborative operation, the larger the weight; conversely, the smaller the weight.
In step 122-2, once the inference priority of each priority inference node is determined, the target nodes can be selected from the inference nodes according to those priorities. For example, target nodes may be selected in descending order of inference priority, with the sum of the computing resources the selected target nodes require for task processing not exceeding the rated computation amount.
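Below is a sketch of the product-based variant of the inference priority; the weighted combination mentioned above would replace the multiplication. Names and numbers are illustrative:

```python
# Inference priority as the product of the number of to-be-processed
# tasks and the collaborative operation weight (one of the two
# combinations described above).
def inference_priority(pending_count: int, job_weight: float) -> float:
    return pending_count * job_weight

# Priority inference nodes are then considered in descending order:
nodes = {"A": (8, 1.0), "B": (4, 1.5), "C": (5, 2.0)}
ranked = sorted(nodes, key=lambda k: inference_priority(*nodes[k]),
                reverse=True)
assert ranked == ["C", "A", "B"]   # priorities 10.0 > 8.0 > 6.0
```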
In the task scheduling method provided by the embodiment of the present invention, target nodes are selected according to whether priority inference nodes exist, so that inference nodes holding priority tasks can be scheduled first, which greatly improves the device's response time on priority tasks.
Based on the above embodiment, step 122-2 includes:
if the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as the target nodes;
otherwise, taking all priority inference nodes as target nodes, and additionally selecting the second preset number of non-priority inference nodes with the highest inference priority as target nodes;
where the first preset number is a target node count threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and a non-priority inference node is an inference node whose priority task count is 0.
Specifically, since the sum of the computing resources required by the selected target nodes for task processing must not exceed the rated computation amount, a target node count threshold can be preset to control the number of inference nodes running on the device at the same time.
Once priority inference nodes are found to exist, their number can be determined. If the number of priority inference nodes is greater than or equal to the target node count threshold, the sum of the computing resources required by all priority inference nodes for task processing would exceed the rated computation amount. The target node count threshold is the first preset number, i.e. the number of inference nodes that can run on the device at the same time, which can be determined from the computing resources each inference node requires for task processing and the rated computation amount.
If the number of priority inference nodes is greater than or equal to the first preset number, the selection range of target nodes is restricted to the priority inference nodes, and the first preset number of priority inference nodes can be selected as target nodes in descending order of inference priority. This ensures that the sum of the computing resources occupied when all selected target nodes process tasks simultaneously does not exceed the rated computation amount, guaranteeing normal operation of the device.
Correspondingly, if the number of priority inference nodes is smaller than the first preset number, the sum of the computing resources required by all priority inference nodes for task processing does not exceed the rated computation amount. In this case, all priority inference nodes are taken as target nodes, and a second preset number of non-priority inference nodes are additionally selected as target nodes, so that the selected target nodes include both priority and non-priority inference nodes. Note that the second preset number plus the number of priority inference nodes equals the first preset number.
The second preset number of non-priority inference nodes are selected as target nodes by taking non-priority inference nodes one by one in descending order of their inference priority.
For example, suppose the scheduling module perceives that tasks of the priority type exist at the first, second, and third inference nodes but not at the fourth and fifth; that is, the first, second, and third inference nodes are priority inference nodes, and the fourth and fifth are non-priority inference nodes. The first inference node has 20 to-be-processed tasks, the second 15, the third 10, the fourth 20, and the fifth 10.
Compared with the other inference nodes, the second inference node is more critical to the task response time of the collaborative operation, so its collaborative operation weight is 1.5, while the first and third inference nodes have a collaborative operation weight of 1 and the fourth and fifth have 0.8. The inference priority parameters of the five inference nodes at the current moment are therefore 20, 22.5, 10, 16, and 8, respectively. Among the priority inference nodes, the second node's inference priority is higher than the first's, and the first's higher than the third's; among the non-priority inference nodes, the fourth's inference priority is higher than the fifth's. If the first preset number is 1, the second inference node, whose inference priority is highest among the priority inference nodes, is taken as the target node.
If the first preset number is 4, all priority inference nodes, i.e. the first, second, and third inference nodes, are determined as target nodes, and the fourth inference node, whose inference priority is highest among the non-priority inference nodes, is also taken as a target node.
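The numbers in this example can be checked mechanically; the snippet below merely restates them under the product-based priority assumed earlier:

```python
# Reproducing the five-node example: (pending tasks, weight) per node.
nodes = {
    "n1": (20, 1.0), "n2": (15, 1.5), "n3": (10, 1.0),  # priority nodes
    "n4": (20, 0.8), "n5": (10, 0.8),                   # non-priority
}
prio = {k: c * w for k, (c, w) in nodes.items()}
assert prio == {"n1": 20.0, "n2": 22.5, "n3": 10.0, "n4": 16.0, "n5": 8.0}

priority_nodes = ["n1", "n2", "n3"]
non_priority_nodes = ["n4", "n5"]

def pick(first_preset: int):
    ranked_p = sorted(priority_nodes, key=prio.get, reverse=True)
    if len(priority_nodes) >= first_preset:
        return ranked_p[:first_preset]
    second_preset = first_preset - len(priority_nodes)
    ranked_np = sorted(non_priority_nodes, key=prio.get, reverse=True)
    return ranked_p + ranked_np[:second_preset]

assert pick(1) == ["n2"]                    # highest inference priority
assert pick(4) == ["n2", "n1", "n3", "n4"]  # all priority nodes plus n4
```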
In the task scheduling method provided by the embodiment of the present invention, target nodes are selected with reference to the computing resources each inference node requires for task processing and the rated computation amount, so that the sum of the computing resources required by the selected target nodes for task processing does not exceed the rated computation amount. This meets the high throughput requirement and improves device utilization, while avoiding the preemption overhead caused by too many inference nodes processing tasks simultaneously, thereby ensuring normal operation of the device.
Based on the above embodiments, FIG. 4 is a schematic flowchart of step 122-2 of the task scheduling method according to an embodiment of the present invention. As shown in FIG. 4, step 122-2 comprises:
Step 122-21: determine the priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next priority inference node to be so determined, added to those of all current target nodes, exceed the rated computation amount;
Step 122-22: if the sum of the computing resources required by all priority inference nodes for task processing is smaller than the rated computation amount, determine the non-priority inference nodes as target nodes one by one in descending order of inference priority, until the computing resources required for task processing by the next non-priority inference node to be so determined, added to those of all current target nodes, exceed the rated computation amount; a non-priority inference node is an inference node whose priority task count is 0.
Specifically, since the sum of the computing resources required by the selected target nodes for task processing must not exceed the rated computation amount, whether that sum would exceed the rated computation amount can be checked synchronously during the selection process.
After the inference priorities of the priority inference nodes are determined, if the sum of the computing resources required by all priority inference nodes for task processing exceeds the rated computation amount, the priority inference nodes are determined as target nodes one by one in descending order of inference priority. For example, suppose the first i priority inference nodes, taken in descending order of inference priority, have been determined as target nodes; the sum of the computing resources they require for task processing is m, where m is smaller than the rated computation amount M, and the (i+1)-th priority inference node requires computing resources n for task processing. If m + n ≥ M, the computing resources of the i target nodes plus those of the (i+1)-th priority inference node, which would be the next target node, exceed the rated computation amount, so the selection stops with the i-th priority inference node as the last target node.
Correspondingly, if m + n < M, the sum of the computing resources required for task processing by the i priority inference nodes already determined as target nodes and by the (i+1)-th priority inference node does not exceed the rated computation amount. In this case the (i+1)-th priority inference node is also determined as a target node, and whether the computing resources required by the (i+2)-th priority inference node, added to those of the nodes already selected, exceed the rated computation amount is checked next, with the result deciding whether the (i+2)-th node becomes a target node. This process repeats until the sum of the computing resources required by the determined target nodes and by the next priority inference node to be determined exceeds the rated computation amount.
If the sum of the computing resources required by all priority inference nodes for task processing does not exceed the rated computation amount, all priority inference nodes are taken as target nodes. For example, if there are k priority inference nodes and the sum a of the computing resources they require for task processing is smaller than the rated computation amount M, all k priority inference nodes become target nodes. The non-priority inference nodes are then determined as target nodes one by one in descending order of their inference priority: first, it is checked whether b + a ≤ M, where b is the computing resources required for task processing by the 1st non-priority inference node, the one with the highest inference priority. If b + a ≤ M, the 1st non-priority inference node becomes a target node, and the check continues with the 2nd non-priority inference node against the computing resources of the k priority inference nodes and the 1st non-priority inference node already determined as target nodes, and so on, until the sum of the computing resources required by the determined target nodes and by the next non-priority inference node to be determined exceeds the rated computation amount.
If b + a > M, then within the rated computation amount, the computing resources remaining after the k priority inference nodes process their tasks are insufficient to support task processing even by the 1st non-priority inference node, the one with the highest inference priority; in this case, the k priority inference nodes are determined as the target nodes.
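Steps 122-21 and 122-22 amount to one greedy pass over the nodes, priority inference nodes first; the sketch below assumes both lists arrive already sorted by inference priority, and is an illustration rather than the patent's literal procedure:

```python
# Greedy selection under the rated computation amount M: take priority
# inference nodes first, then non-priority ones, each list already in
# descending order of inference priority; stop as soon as the next
# candidate would push the committed compute past M.
from typing import List, Tuple

Node = Tuple[str, float]  # (node id, compute needed for task processing)

def greedy_targets(priority: List[Node], non_priority: List[Node],
                   M: float) -> List[str]:
    targets, used = [], 0.0
    for node_id, cost in priority + non_priority:
        if used + cost > M:        # the next node would exceed the budget
            break
        targets.append(node_id)
        used += cost
    return targets

# k = 2 priority nodes fit (a = 6.0 < M = 10.0); the 1st non-priority
# node costs b = 3.0 and b + a <= M, so it is selected as well.
picked = greedy_targets([("p1", 2.0), ("p2", 4.0)],
                        [("np1", 3.0), ("np2", 5.0)], M=10.0)
assert picked == ["p1", "p2", "np1"]
```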
Here too, by selecting target nodes with reference to the computing resources each inference node requires for task processing and the rated computation amount, the method keeps the sum of the computing resources required by the selected target nodes within the rated computation amount, meeting the high throughput requirement and improving device utilization while avoiding the preemption overhead of too many inference nodes processing tasks simultaneously, thereby ensuring normal operation of the device.
FIG. 5 is the second schematic flowchart of a task scheduling method according to an embodiment of the present invention. As shown in FIG. 5, this method is applied to an inference node and comprises:
Step 510: send the to-be-processed information of the local node to the scheduling module, so that the scheduling module selects target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative operation, and issues a task processing instruction to the target nodes; the local node operates collaboratively with the other inference nodes, and the to-be-processed information comprises the number of to-be-processed tasks at the local node and/or the task type of each to-be-processed task.
Specifically, in a collaborative operation scenario with multiple inference nodes, only the communication links of the logical execution flow exist between the inference nodes; in the human-computer interaction scenario above, for example, the outputs of inference node A and inference node B are the inputs of inference node C. Since the pending states of the inference nodes cannot be communicated between them, the embodiment of the present invention adds a scheduling module to the collaborative operation scenario; the scheduling module can communicate with every inference node, so that task scheduling for the collaborative operation can be realized according to the to-be-processed information of each inference node.
Before the scheduling module schedules the inference nodes to process tasks, each inference node needs to send the to-be-processed information of the local node to the scheduling module. The to-be-processed information represents information about the tasks waiting to be processed at the node: it may include the number of to-be-processed tasks, or the task type of each to-be-processed task, or both, which is not specifically limited in the embodiment of the present invention.
Regarding the number of to-be-processed tasks: when a single inference node is scheduled to process tasks, the larger the number of to-be-processed tasks, the higher the utilization of the computing device executing that node and the less the waste of computing resources; inference nodes with more to-be-processed tasks may therefore be scheduled preferentially.
Regarding the task type of a to-be-processed task: task types may be divided by the importance of the tasks, where a more important task more strongly requires its inference node to be scheduled preferentially; by the response speed the tasks require, where a task demanding faster response more strongly requires its inference node to be scheduled preferentially; or by a user-defined processing policy, where, if the user specifies that certain types of tasks must be processed first, the inference nodes holding those tasks need to be scheduled preferentially.
After receiving the to-be-processed information of each inference node, the scheduling module can select target nodes from the inference nodes according to the received information and the importance of each inference node in the collaborative operation. Here, a target node is an inference node that needs to be scheduled preferentially.
In the collaborative operation scenario, each inference node carries part of the collaboratively processed task and has a certain importance in the collaborative operation. Because different inference nodes differ in how strongly they affect the response time of the collaborative operation, in their acceleration efficiency as the batch dimension grows, in their occupancy of the computing device, and in their occupancy of I/O devices, their importance in the collaborative operation also differs.
For example, a scheduling priority may be determined for each inference node by combining its to-be-processed information with its importance in the collaborative operation, and target nodes selected in order of that priority. Alternatively, target nodes may be selected according to the to-be-processed information, and during this process inference nodes with similar to-be-processed information may be ranked by their importance in the collaborative operation to determine the target nodes.
Selecting target nodes by combining the to-be-processed information of each inference node with its importance in the collaborative operation ensures that the selection accounts for both dimensions, so that when the scheduling module dispatches the selected target nodes for task processing, the throughput requirement is met while the task response time requirement is also guaranteed.
Note that since the computing resources of the device on which the inference nodes run are limited, when selecting target nodes from the inference nodes, the sum of the computing resources required by all selected target nodes must be less than or equal to the rated computation amount, to avoid the preemption overhead caused by many inference nodes computing simultaneously.
After the target nodes are determined, the scheduling module also needs to send a task processing instruction to each target node; the task processing instruction is the instruction issued by the scheduling module to make a target node process its tasks.
Step 520: perform task processing if the task processing instruction is received.
Specifically, after receiving the task processing instruction sent by the scheduling module, the target node performs task processing, which includes splicing the data of the to-be-processed tasks and running the inference computation.
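Seen from the inference node, the method is a report-then-wait loop. The sketch below uses an invented in-process queue as the transport; none of these names come from the patent:

```python
# A sketch of the inference-node end: report local to-be-processed
# information (step 510), and process tasks only when a task processing
# instruction arrives (step 520). Transport and names are assumptions.
import queue
from typing import Any, Dict, List

class InferenceNodeEnd:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.pending: List[Dict[str, Any]] = []  # local to-be-processed tasks
        self.instructions: "queue.Queue[str]" = queue.Queue()

    def report_pending(self) -> Dict[str, Any]:
        # Step 510: the to-be-processed information sent to the scheduler.
        return {"node": self.node_id,
                "count": len(self.pending),
                "types": [t["type"] for t in self.pending]}

    def run_once(self) -> None:
        # Step 520: act only if a task processing instruction arrived.
        try:
            self.instructions.get_nowait()
        except queue.Empty:
            return          # not selected this round; keep accumulating
        batch, self.pending = self.pending, []   # data splicing
        print(f"{self.node_id}: inference computation on {len(batch)} tasks")
```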
Note that an inference node that is not processing tasks can continue to receive to-be-processed tasks, so its number of to-be-processed tasks grows and its chance of being selected as a target node increases. This avoids the too-small batch size that results when one inference node is called too frequently, prevents any node from going unscheduled for a long time, and keeps throughput and response balanced.
In the task scheduling method provided by the embodiment of the present invention, each inference node sends the to-be-processed information of its local node to the scheduling module, and the scheduling module selects target nodes from the inference nodes by combining the to-be-processed information of each inference node with the importance of each inference node in the collaborative operation. The selected target nodes therefore satisfy both the overall throughput requirement and the task response time requirement during task processing, which solves the problem that a scheduling scheme for a single neural network cannot schedule inference for tasks on which multiple neural networks work cooperatively, and realizes scheduling of inference in complex scenarios.
Based on the above embodiment, fig. 6 is a schematic diagram of a task scheduling method applied to a multi-path task scenario, where T and R in fig. 6 represent tasks to be processed outside corresponding inference nodes, each T or each R corresponds to one path of session, batch1, exe1, and Exe2 in fig. 6 correspond to a first inference node, batch2, exe3, and Exe4 correspond to a second inference node, and Batch3 and Exe5 correspond to a third inference node. Wherein, the Batch represents a data splicing node in the reasoning nodes, the Exe represents a reasoning engine node for executing reasoning in the reasoning nodes, and the Schedule is a scheduling module. Solid arrows indicate data transmission and the direction of data transmission, and dashed arrows indicate the scheduling process of the scheduling module. Each Batch performs the following functions:
(1) When task processing is carried out, the inference engine has certain requirements on the size of processing data, the data input from the outside possibly cannot meet the size requirements, and the inference engine does not have the capability of storing data and managing the data, so that the function is realized by the Batch, namely each data splicing node stores the task to be processed of the local end;
(2) Assisting the scheduling module in scheduling and dispatching the overall task: each Batch analyzes the tasks to be processed that it manages, keeps the local end's to-be-processed information up to date, and sends that information to the scheduling module to assist its scheduling and management (a sketch of such a node follows this list).
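A minimal sketch of such a data splicing node, assuming hypothetical task fields (data, task_type) and method names; it only illustrates the two functions above, not the embodiment's actual interface:

    from collections import deque

    class BatchNode:
        def __init__(self, name, batch_size):
            self.name = name
            self.batch_size = batch_size  # data size the inference engine requires
            self.pending = deque()        # tasks to be processed at the local end

        def receive(self, task):
            # function (1): store tasks, since the engine cannot buffer data itself
            self.pending.append(task)

        def pending_info(self):
            # function (2): report the local end's to-be-processed information
            # (task count and task types) to the scheduling module
            return {"node": self.name,
                    "count": len(self.pending),
                    "types": [t["task_type"] for t in self.pending]}

        def splice(self):
            # concatenate up to batch_size pending tasks into one engine input
            n = min(self.batch_size, len(self.pending))
            return [self.pending.popleft()["data"] for _ in range(n)]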
In the application scenario provided by the embodiment of the invention, the execution flow for a single-path task includes the following steps:
Step S1, a user inputs a plurality of data items (Data0, Data1, …, DataN) to Exe1 for inference calculation;
Step S2, Exe1 outputs a calculation result, which is input to Exe2 for inference calculation; at the same time, the write-back data in Exe1's output is returned to Batch1;
Step S3, the user inputs a plurality of data items (Data0, Data1, …, DataN) to Exe3 for inference calculation;
Step S4, Exe3 outputs a calculation result, which is input to Exe4 for inference calculation; at the same time, the write-back data in Exe3's output is returned to Batch2;
Step S5, after the inference calculations of Exe2 and Exe4 have finished, their outputs are input to Exe5 for the final inference calculation, yielding the calculation result output by Exe5.
Each single-path task is executed in the timing of Exe1 to Exe2, Exe3 to Exe4, and Exe2+Exe4 to Exe5.
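This timing can be pictured as a small pipeline. The sketch below is illustrative only; the exe callables and the dictionary keys (result, write_back) are assumptions standing in for real inference engines:

    def run_single_path(data, exe1, exe2, exe3, exe4, exe5, batch1, batch2):
        out1 = exe1(data)                  # S1: Exe1 infers on the user's input data
        batch1.extend(out1["write_back"])  # S2: write-back data returns to Batch1
        out2 = exe2(out1["result"])        # S2: Exe1's result feeds Exe2

        out3 = exe3(data)                  # S3: Exe3 infers on the user's input data
        batch2.extend(out3["write_back"])  # S4: write-back data returns to Batch2
        out4 = exe4(out3["result"])        # S4: Exe3's result feeds Exe4

        return exe5((out2, out4))          # S5: Exe2 and Exe4 jointly feed Exe5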
As shown in fig. 6, the execution flow for the multi-path task includes:
(1) The scheduling module uniformly manages all the Batch nodes, namely Batch1, Batch2 and Batch3 in the figure, and logically treats consecutive inference engine nodes as a single node, for example treating Exe1 and Exe2 as one node and Exe3 and Exe4 as another;
(2) The scheduling module acquires the to-be-processed information of each inference node, namely of the first, second and third inference nodes; the acquired information includes the number of tasks to be processed and/or the task types of those tasks under the corresponding inference node, and whenever an inference node holds tasks to be processed, it feeds this information back to the scheduling module;
(3) When the scheduling module senses that the inference nodes it currently manages hold tasks to be processed, it selects target nodes from the inference nodes according to their to-be-processed information and their importance in the collaborative job, schedules the target nodes to splice data, and inputs the spliced data to the inference engine nodes for inference calculation.
A reasonable scheduling strategy combined with effective engine composition in the multi-path task scenario can guarantee maximum task throughput and high device utilization.
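Putting steps (1) to (3) together, one scheduling round can be sketched as below, reusing the hypothetical BatchNode above. Here select_targets stands for the priority-based selection described further below, and run_inference is a placeholder for feeding a spliced batch to the corresponding Exe node; both names are assumptions:

    def schedule_round(batch_nodes, select_targets, rated_compute):
        # (2) gather each inference node's to-be-processed information
        infos = {b.name: b.pending_info() for b in batch_nodes}
        # (3) only nodes that actually hold pending tasks are candidates
        candidates = [b for b in batch_nodes if infos[b.name]["count"] > 0]
        for target in select_targets(candidates, rated_compute):
            spliced = target.splice()       # schedule the target node to splice data
            run_inference(target, spliced)  # spliced data goes to the inference engine

    def run_inference(node, batch):
        # placeholder assumption: hand the spliced batch to the node's engine (Exe)
        pass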
The scheduling module provided by the invention is described below, and the scheduling module described below and the task scheduling method described above can be referred to correspondingly.
Fig. 7 is a schematic structural diagram of a scheduling module provided by the present invention. As shown in fig. 7, the module includes:
The information obtaining unit 710 is configured to obtain information to be processed of each inference node, where the information to be processed includes the number of tasks to be processed and/or task types of each task to be processed under the corresponding inference node;
The target selecting unit 720 is configured to select a target node from the inference nodes based on the information to be processed of the inference nodes and the importance of the inference nodes in the collaborative operation, where the sum of computing resources required by all the target nodes for performing task processing is less than or equal to a rated computation amount;
And the instruction sending unit 730 is configured to send a task processing instruction to the target node, so as to trigger the target node to perform task processing.
According to the scheduling module provided by the invention, both the to-be-processed information of each inference node and the importance of each inference node in the collaborative job are considered when selecting target nodes from the inference nodes, so that the selected target nodes can meet both the overall throughput requirement and the task response time requirement during task processing. This solves the problem that a scheduling inference scheme for a single neural network cannot schedule the tasks of multiple neural networks working collaboratively, and realizes scheduling inference in complex scenarios.
Based on the above embodiment, the target selection unit 720 is configured to:
based on the to-be-processed information of each inference node, determining the number of priority tasks under each inference node, wherein the priority tasks are to-be-processed tasks with the task types being priority types;
And selecting a target node from the inference nodes based on the number of priority tasks under the inference nodes and the importance of the inference nodes in collaborative operation.
Based on the above embodiment, the target selection unit 720 is configured to:
If the priority reasoning nodes exist, determining the reasoning priority of each priority reasoning node based on the number of tasks to be processed and the collaborative operation weight of each priority reasoning node;
Determining a target node based on the reasoning priority of each priority reasoning node;
the priority inference node is an inference node with the number of priority tasks being greater than 0, and the collaborative operation weight is determined based on the importance of the corresponding inference node in collaborative operation.
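The embodiment does not fix a formula for the inference priority; one plausible reading, used in the sketch below purely for illustration, is to weight a priority inference node's pending-task count by its collaborative-operation weight:

    def inference_priority(pending_count, coop_weight):
        # assumed formulation: more pending tasks and a heavier collaborative
        # weight both raise a node's inference priority
        return coop_weight * pending_count

    def rank_priority_nodes(nodes):
        # nodes: iterable of (name, pending_count, coop_weight) tuples for
        # inference nodes whose number of priority tasks is greater than 0
        return sorted(nodes, key=lambda n: inference_priority(n[1], n[2]), reverse=True)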
Based on the above embodiment, the target selection unit 720 is configured to:
If the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as target nodes;
Otherwise, taking all the priority inference nodes as target nodes, and additionally selecting a second preset number of non-priority inference nodes with the highest inference priority as target nodes;
the first preset number is a target-node count threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and the non-priority inference nodes are inference nodes whose number of priority tasks is 0.
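A sketch of this threshold rule follows; both input lists are assumed already sorted by inference priority in descending order, and first_preset is the target-node count threshold described above:

    def pick_targets_by_count(priority_nodes, non_priority_nodes, first_preset):
        if len(priority_nodes) >= first_preset:
            # enough priority inference nodes: take the top first_preset of them
            return priority_nodes[:first_preset]
        # otherwise take all priority nodes, then fill the remaining quota
        # (the second preset number) with the top non-priority nodes
        second_preset = first_preset - len(priority_nodes)
        return priority_nodes + non_priority_nodes[:second_preset]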
Based on the above embodiment, the target selection unit 720 is configured to:
Determining each priority inference node as a target node one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next priority inference node to be designated and by all current target nodes exceeds the rated computation amount;
If the sum of the computing resources required by all priority inference nodes for task processing is smaller than the rated computation amount, determining each non-priority inference node as a target node one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next non-priority inference node to be designated and by all current target nodes exceeds the rated computation amount;
The non-priority reasoning nodes are reasoning nodes with the number of priority tasks being 0.
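This greedy variant can be sketched as follows; each node is a hypothetical (name, inference_priority, required_compute) tuple, and the loop stops as soon as the next node would push the total past the rated computation amount:

    def pick_targets_by_compute(priority_nodes, non_priority_nodes, rated_compute):
        targets, used = [], 0.0

        def take_from(pool):
            nonlocal used
            for name, _priority, need in sorted(pool, key=lambda n: n[1], reverse=True):
                if used + need > rated_compute:
                    return False  # the next node would exceed the rated amount
                targets.append(name)
                used += need
            return True           # the whole pool fit within the rated amount

        # priority inference nodes first, in descending inference priority
        if take_from(priority_nodes):
            # all priority nodes fit with capacity to spare: continue with
            # non-priority inference nodes (priority-task count of 0)
            take_from(non_priority_nodes)
        return targets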
Fig. 8 is a schematic diagram of the structure of an inference node provided by the present invention. As shown in fig. 8, the node includes:
the sending unit 810 is configured to send the to-be-processed information of the local end to a scheduling module, so that the scheduling module selects a target node from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative job, and issues a task processing instruction to the target node; the local end operates cooperatively with other inference nodes, and the to-be-processed information includes the number of tasks to be processed at the local end and/or the task types of those tasks;
and the task processing unit 820 is configured to perform task processing if the task processing instruction is received.
According to the inference node provided by the invention, each inference node sends the to-be-processed information of its local end to the scheduling module, and the scheduling module selects target nodes from the inference nodes by combining that information with each node's importance in the collaborative job. The selected target nodes can thus meet both the overall throughput requirement and the task response time requirement during task processing, which solves the problem that a scheduling inference scheme for a single neural network cannot schedule the tasks of multiple neural networks operating collaboratively, and realizes scheduling inference in complex scenarios.
Fig. 9 is a schematic structural diagram of a collaborative operation system provided in the present invention, and as shown in fig. 9, the system includes a scheduling module 700, and a plurality of inference nodes 800.
Fig. 10 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 10, the electronic device may include: a processor 1010, a communication interface 1020, a memory 1030, and a communication bus 1040, where the processor 1010, the communication interface 1020, and the memory 1030 communicate with each other via the communication bus 1040. The processor 1010 may invoke logic instructions in the memory 1030 to perform a task scheduling method applied to a scheduling module, the method comprising: acquiring to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information includes the number of tasks to be processed under the corresponding inference node and/or the task types of each task to be processed; selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative job, wherein the sum of computing resources required by all target nodes for task processing is less than or equal to a rated computation amount; and sending a task processing instruction to the target node to trigger the target node to process the task.
Further, the logic instructions in the memory 1030 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the task scheduling method provided above, the method being applied to a scheduling module and comprising: acquiring to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information includes the number of tasks to be processed under the corresponding inference node and/or the task types of each task to be processed; selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative job, wherein the sum of computing resources required by all target nodes for task processing is less than or equal to a rated computation amount; and sending a task processing instruction to the target node to trigger the target node to process the task.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the task scheduling method provided above, the method being applied to a scheduling module and comprising: acquiring to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information includes the number of tasks to be processed under the corresponding inference node and/or the task types of each task to be processed; selecting target nodes from the inference nodes based on the to-be-processed information of each inference node and the importance of each inference node in the collaborative job, wherein the sum of computing resources required by all target nodes for task processing is less than or equal to a rated computation amount; and sending a task processing instruction to the target node to trigger the target node to process the task.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A task scheduling method, which is applied to a scheduling module, comprising:
Acquiring to-be-processed information of each inference node, wherein the inference nodes operate cooperatively, and the to-be-processed information comprises the number of tasks to be processed under the corresponding inference node and/or the task types of each task to be processed;
Determining the reasoning priority of each reasoning node based on the information to be processed of each reasoning node and the collaborative operation weight of each reasoning node, and selecting a target node from the reasoning nodes based on the reasoning priority of each reasoning node, wherein the target node is a reasoning node with higher reasoning priority; the sum of computing resources required by all target nodes for task processing is smaller than or equal to rated computing quantity, and the collaborative operation weight is determined based on the importance of the corresponding reasoning node in collaborative operation;
and sending a task processing instruction to the target node to trigger the target node to process the task.
2. The task scheduling method according to claim 1, wherein the determining the inference priority of each inference node based on the information to be processed of each inference node and the importance of each inference node in the collaborative job includes:
based on the to-be-processed information of each inference node, determining the number of priority tasks under each inference node, wherein the priority tasks are to-be-processed tasks with the task types being priority types;
and determining the reasoning priority of each reasoning node based on the number of priority tasks under each reasoning node and the importance of each reasoning node in collaborative operation.
3. The task scheduling method according to claim 2, wherein determining the inference priority of each inference node based on the number of priority tasks under each inference node and the importance of each inference node in collaborative operation, and selecting a target node from the inference nodes based on the inference priority of each inference node, comprises:
If the priority reasoning nodes exist, determining the reasoning priority of each priority reasoning node based on the number of tasks to be processed and the collaborative operation weight of each priority reasoning node;
Determining a target node based on the reasoning priority of each priority reasoning node;
The priority reasoning nodes are reasoning nodes with the number of priority tasks being more than 0.
4. A task scheduling method according to claim 3, wherein the determining the target node based on the inference priorities of the priority inference nodes comprises:
If the number of priority inference nodes is greater than or equal to a first preset number, selecting the first preset number of priority inference nodes with the highest inference priority as target nodes;
Otherwise, taking all the priority inference nodes as target nodes, and additionally selecting a second preset number of non-priority inference nodes with the highest inference priority as target nodes;
the first preset number is a target-node count threshold determined based on the computing resources required by each inference node for task processing and the rated computation amount, the second preset number is the difference between the first preset number and the number of priority inference nodes, and the non-priority inference nodes are inference nodes whose number of priority tasks is 0.
5. A task scheduling method according to claim 3, wherein the determining the target node based on the inference priorities of the priority inference nodes comprises:
Determining each priority inference node as a target node one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next priority inference node to be designated and by all current target nodes exceeds the rated computation amount;
If the sum of the computing resources required by all priority inference nodes for task processing is smaller than the rated computation amount, determining each non-priority inference node as a target node one by one in descending order of inference priority, until the sum of the computing resources required for task processing by the next non-priority inference node to be designated and by all current target nodes exceeds the rated computation amount;
The non-priority reasoning nodes are reasoning nodes with the number of priority tasks being 0.
6. A task scheduling method, applied to an inference node, comprising:
sending to-be-processed information of a local end to a scheduling module, so that the scheduling module determines the inference priority of each inference node based on the to-be-processed information of each inference node and the collaborative operation weight of each inference node, selects a target node from the inference nodes based on the inference priority of each inference node, and sends a task processing instruction to the target node; the target node is an inference node with a higher inference priority, and the collaborative operation weight is determined based on the importance of the corresponding inference node in collaborative operation;
If the task processing instruction is received, performing task processing;
wherein the local end operates cooperatively with other inference nodes, and the to-be-processed information comprises the number of tasks to be processed at the local end and/or the task types of the tasks to be processed.
7. A scheduling module, comprising:
The information acquisition unit is used for acquiring the information to be processed of each inference node, the inference nodes work cooperatively, and the information to be processed comprises the number of tasks to be processed and/or the task types of the tasks to be processed under the corresponding inference nodes;
The target selection unit is used for determining the reasoning priority of each reasoning node based on the information to be processed of each reasoning node and the collaborative operation weight of each reasoning node, and selecting a target node from the reasoning nodes based on the reasoning priority of each reasoning node, wherein the target node is a reasoning node with higher reasoning priority; the sum of computing resources required by all target nodes for task processing is smaller than or equal to rated computing quantity, and the collaborative operation weight is determined based on the importance of the corresponding reasoning node in collaborative operation;
And the instruction sending unit is used for sending a task processing instruction to the target node so as to trigger the target node to process the task.
8. An inference node, comprising:
a sending unit, configured to send the to-be-processed information of the local end to the scheduling module, so that the scheduling module determines the inference priority of each inference node based on the to-be-processed information of each inference node and the collaborative operation weight of each inference node, selects a target node from the inference nodes based on the inference priority of each inference node, and issues a task processing instruction to the target node; the local end operates cooperatively with other inference nodes, and the to-be-processed information comprises the number of tasks to be processed at the local end and/or the task types of the tasks to be processed; the target node is an inference node with a higher inference priority, and the collaborative operation weight is determined based on the importance of the corresponding inference node in collaborative operation;
And the task processing unit is used for performing task processing if the task processing instruction is received.
9. A collaborative operation system comprising the scheduling module of claim 7 and a plurality of inference nodes of claim 8.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the task scheduling method according to any one of claims 1 to 6 when the program is executed by the processor.
11. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the task scheduling method according to any one of claims 1 to 6.
CN202110888396.8A 2021-08-03 2021-08-03 Task scheduling method, scheduling module, reasoning node and collaborative operation system Active CN113608852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888396.8A CN113608852B (en) 2021-08-03 2021-08-03 Task scheduling method, scheduling module, reasoning node and collaborative operation system


Publications (2)

Publication Number Publication Date
CN113608852A CN113608852A (en) 2021-11-05
CN113608852B true CN113608852B (en) 2024-07-16

Family

ID=78306673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888396.8A Active CN113608852B (en) 2021-08-03 2021-08-03 Task scheduling method, scheduling module, reasoning node and collaborative operation system

Country Status (1)

Country Link
CN (1) CN113608852B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707954B * 2022-03-29 2023-03-28 Chengxin Technology Co., Ltd. Information management method and system of enterprise intelligent platform
CN114782445B * 2022-06-22 2022-10-11 Shenzhen SmartMore Information Technology Co., Ltd. Object defect detection method and device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109729155A * 2018-12-13 2019-05-07 Ping An Medical and Healthcare Management Co., Ltd. A kind of distribution method and relevant apparatus of service request

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201715A * 2016-06-30 2016-12-07 Beijing QIYI Century Science and Technology Co., Ltd. A kind of method for scheduling task and device
CN108769254B * 2018-06-25 2019-09-20 Transwarp Information Technology (Shanghai) Co., Ltd. Resource-sharing application method, system and equipment based on preemption scheduling
CN110018893B * 2019-03-12 2024-08-16 Hebei Hexi Network Technology Co., Ltd. Task scheduling method based on data processing and related equipment
CN113037800B * 2019-12-09 2024-03-05 Huawei Cloud Computing Technologies Co., Ltd. Job scheduling method and job scheduling device
CN111736965A * 2019-12-11 2020-10-02 Xi'an Uniview Information Technology Co., Ltd. Task scheduling method and device, scheduling server and machine-readable storage medium
CN112559179A * 2020-12-15 2021-03-26 CCB Fintech Co., Ltd. Job processing method and device


Also Published As

Publication number Publication date
CN113608852A (en) 2021-11-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230518

Address after: 230026 No. 96, Jinzhai Road, Hefei, Anhui

Applicant after: University of Science and Technology of China

Applicant after: IFLYTEK Co.,Ltd.

Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Applicant before: IFLYTEK Co.,Ltd.

GR01 Patent grant