CN114546609A - DNN inference task batch scheduling method facing heterogeneous cluster - Google Patents

DNN inference task batch scheduling method facing heterogeneous cluster

Info

Publication number
CN114546609A
CN114546609A (application CN202210035043.8A)
Authority
CN
China
Prior art keywords
dnn, task, scheduling, reasoning, inference
Prior art date
Legal status
Pending
Application number
CN202210035043.8A
Other languages
Chinese (zh)
Inventor
Wang Chao (王超)
Zhang Renyu (张仁宇)
Zhu Zongwei (朱宗卫)
Zhou Xuehai (周学海)
Li Xi (李曦)
Current Assignee
Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Original Assignee
Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Priority date
Filing date
Publication date
Application filed by Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Priority to CN202210035043.8A
Publication of CN114546609A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DNN inference task batch scheduling method for heterogeneous clusters. The method comprises three components: a DNN inference task characterization module, a heterogeneous-cluster dynamic hardware processing capability extraction module, and a DNN inference task batch scheduling module. The DNN inference task characterization module classifies DNN inference tasks by task type to construct a DNN inference task partition set; the heterogeneous-cluster dynamic hardware processing capability extraction module divides the computing nodes and constructs a heterogeneous computing node set from the division result; and the DNN inference task batch scheduling module schedules the DNN inference tasks according to the DNN inference task partition set and the heterogeneous computing node set through a target search algorithm. With these modules, the DNN inference tasks are effectively classified, the computing power of the different nodes of a heterogeneous cluster is measured, and the DNN inference tasks are optimally scheduled, thereby optimizing batch scheduling of DNN inference tasks for heterogeneous clusters.

Description

DNN inference task batch scheduling method facing heterogeneous cluster
Technical Field
Embodiments of the invention relate to DNN inference task batch scheduling technology, and in particular to a DNN inference task batch scheduling method for heterogeneous clusters.
Background
In recent years, the rapid development of deep neural networks (DNNs) has driven artificial intelligence (AI) applications, with both the range of application categories and the number of applications growing significantly. With their massively parallel computing power, general-purpose graphics processing units (GPGPUs) have become the mainstream accelerators for deep learning networks.
DNN acceleration on GPUs usually comprises two stages. First, a large training data set is used to train the selected DNN model; this training process is time-consuming and is typically placed on a computing cluster with high computing power. After training, the network model is deployed on a large-scale cluster to perform the actual inference-stage work. In turn, high-accuracy DNN networks have spawned a large number of artificial intelligence applications, which serve large user populations with different usage goals. Typically, these applications send data collected at the terminal back to a cloud data center, where inference is performed on high-performance GPUs.
However, because GPU hardware resources differ, most existing clusters are heterogeneous, mainly in computing power: a cluster is usually divided into multiple computing nodes of different computing power for running DNN inference tasks. Task scheduling in a cluster typically must attend to the degree of load balancing among the different computing nodes and the average completion time of all tasks.
According to application requirements, DNN inference can be divided into three classes: real-time tasks, interactive tasks, and background tasks. Real-time tasks require that inference complete within a small delay; application scenarios include real-time monitoring by security cameras and autonomous driving. Background tasks are the opposite: they place almost no requirement on inference delay but are sensitive to inference accuracy. Interactive tasks are more complex: they can usually tolerate a certain inference delay and do not always require the highest inference accuracy, so inference accuracy and inference rate can be reduced within reason. In summary, the application requirements of the three task classes are known, and DNN inference tasks of different classes have different characteristics: some require a high inference rate and fast response, while others do not require fast response but demand high inference accuracy. That is, inference tasks fall into many categories, each with its own model, inference rate, and inference accuracy. For such tasks, an important measure of the quality of a scheduling result is therefore the timeout amount of the inference tasks.
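For concreteness, the attributes just described (task class, model, allowed response time, and the timeout measure) could be captured as in the following minimal Python sketch; all class and field names are illustrative assumptions rather than structures defined by the patent.

```python
# Minimal sketch (illustrative names, not from the patent) of the three
# DNN task classes and the per-task attributes discussed above.
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    REAL_TIME = "real-time"      # strict latency bound, e.g. autonomous driving
    INTERACTIVE = "interactive"  # tolerates some delay; accuracy may be relaxed
    BACKGROUND = "background"    # latency-insensitive but accuracy-sensitive


@dataclass
class InferenceTask:
    task_type: int        # index into the DNN inference task partition set
    task_class: TaskClass
    model_name: str       # DNN model this task must run, e.g. "resnet50"
    deadline_ms: float    # allowed response time for this task type
    arrival_ms: float     # arrival timestamp within the task stream

    def timeout(self, completion_ms: float) -> float:
        """Timeout amount: how far past its deadline the task finished."""
        return max(0.0, completion_ms - (self.arrival_ms + self.deadline_ms))
```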
At the task-flow level, users generate a huge number of DNN inference requests every day, and these requests arrive at the heterogeneous cluster at different points in time. At scale, the task flow generated by a large user population exhibits clear tidal patterns: for example, the density of task arrivals over the hours of a day is typically high during the daytime and low late at night. Conventional task scheduling research pays little attention to the distribution characteristics of tasks within a task flow.
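As a toy illustration of such a tidal arrival pattern (the distribution and its parameters below are invented purely for demonstration):

```python
# Sample 8000 arrival hours from a daytime-heavy distribution; counts per
# hour come out large during the day and small late at night.
import random

def tidal_arrivals(n_tasks=8000):
    # Peak around 14:00; values spilling past midnight wrap around.
    return sorted(random.gauss(14.0, 4.0) % 24.0 for _ in range(n_tasks))

arrivals = tidal_arrivals()
per_hour = [sum(1 for h in arrivals if int(h) == k) for k in range(24)]
print(per_hour)  # high daytime buckets, sparse late-night buckets
```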
Disclosure of Invention
The invention provides a DNN inference task batch scheduling method for heterogeneous clusters, which optimizes batch scheduling of DNN inference tasks on such clusters.
The technical scheme mainly solves the following three problems:
(a) Modeling heterogeneous-cluster dynamic hardware processing capability
A cluster is mostly composed of computing boards of various types and computing powers. To measure the computing power of the different nodes of a heterogeneous cluster, the invention provides a modeling method for dynamic hardware processing capability.
(b) Modeling DNN inference tasks through analysis of DNN inference tasks
Different classes of DNN inference tasks use different DNN models and have different inference-time and inference-accuracy constraints. The invention provides a DNN inference task modeling method that effectively classifies the tasks in a DNN inference task set.
(c) Scheduling the DNN inference tasks so that the scheduling result is optimal under an evaluation function composed of the individual evaluation indices. The invention provides a dimension-reduction method based on static policies and dynamic parameters, constructs a low-dimensional parameter solution space representing scheduling results, and then introduces a meta-heuristic search algorithm to find the optimal scheduling solution.
The technical scheme of the invention is as follows:
The embodiment provides a DNN inference task batch scheduling method for heterogeneous clusters, which comprises:
a DNN inference task characterization module, a heterogeneous-cluster dynamic hardware processing capability extraction module, and a DNN inference task batch scheduling module;
the DNN inference task characterization module is used for classifying DNN inference tasks by task type to obtain a DNN inference task partition set;
the heterogeneous-cluster dynamic hardware processing capability extraction module is used for dividing the computing nodes and constructing a heterogeneous computing node set from the division result;
and the DNN inference task batch scheduling module is used for scheduling the DNN inference tasks according to the DNN inference task partition set and the heterogeneous computing node set through a target search algorithm.
Optionally, the DNN inference task characterization module includes:
a cluster deployment model statistics unit, used for counting the DNN models that can be inferred on each node in the cluster and determining a DNN model list;
a cluster-external DNN application statistics unit, used for counting the set of external DNN applications corresponding to all inferable DNN models in the cluster;
and a DNN inference task divider, used for determining the DNN inference task partition set from the DNN model list and the external DNN application set.
Optionally, the DNN inference task partition set includes: all DNN inference task types, and, for each task type, the corresponding allowed response time, the model used, and the set of target nodes.
Optionally, the heterogeneous-cluster dynamic hardware processing capability extraction module includes:
a cluster computing node divider, used for dynamically dividing the boards in the cluster by computing power to obtain the heterogeneous computing node set;
and a cluster inference task processing timer, used for executing, according to the DNN inference task partition set and the heterogeneous computing node set, each type of DNN inference task in advance on every heterogeneous computing node able to process it, counting the elapsed time of all runs, generating a dynamic hardware processing capability matrix from the timing results, and sending the matrix to the DNN inference task batch scheduling module.
Optionally, a dynamic parameter matrix representing the scheduling result is constructed using the dimension-reduction method based on static policies and dynamic parameters;
correspondingly, the DNN inference task batch scheduling module comprises:
a task scheduler, used for segmenting the task stream input drawn from the DNN inference task partition set and generating the corresponding scheduling result stream;
a scheduling result evaluator, used for evaluating each scheduling result in the scheduling result stream according to the evaluation function;
and a searcher, used for searching the dynamic parameter matrix with a meta-heuristic algorithm and taking the solution after convergence as the optimal scheduling solution.
Optionally, the evaluation function is:

$\mathrm{score}(\xi) = \lambda_1\sigma_1(\xi) + \lambda_2\sigma_2(\xi) + \lambda_3\sigma_3(\xi)$

where σ1 is the average completion time of all tasks, σ2 is the average timeout of all tasks, and σ3 is the load-balance degree; ξ is a given scheduling result, and λ1, λ2 and λ3 are three different weights.
The DNN inference task characterization module effectively classifies the DNN inference tasks; the heterogeneous-cluster dynamic hardware processing capability extraction module solves the measurement of the computing power of the different nodes of the heterogeneous cluster; and the DNN inference task batch scheduling module schedules the DNN inference tasks so that the scheduling result is optimal under the evaluation function score(ξ) composed of the individual evaluation indices, thereby optimizing batch scheduling of DNN inference tasks for heterogeneous clusters.
Drawings
FIG. 1 is a framework design diagram of the DNN inference task batch scheduling method for heterogeneous clusters according to the present invention;
FIG. 2 is a schematic diagram of the batch scheduling implementation method for DNN inference tasks in the present invention;
FIG. 3 is an example algorithm flow of the batch scheduling implementation in the present invention;
FIG. 4 compares scheduling results for different types of task flows in the present invention;
FIG. 5 compares scheduling results under various heterogeneous-cluster node configurations in the present invention;
FIG. 6 compares scheduling results under various optimization objectives in the present invention;
FIG. 7 plots the convergence rates of various optimization algorithms in the present invention;
FIG. 8 is an example of scheduling with the DNN inference task batch scheduling framework for heterogeneous clusters in the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
To realize DNN inference task batch scheduling optimization for heterogeneous clusters, the invention designs a scheduling optimization method based on a hybrid static/dynamic dimension-reduction mechanism. FIG. 1 is the overall design framework diagram of the DNN-BS scheduling optimization method, comprising:
the DNN reasoning task scheduling system comprises a DNN reasoning task characterization module, a heterogeneous cluster dynamic hardware processing capacity extraction module and a DNN reasoning task batch scheduling module;
the DNN reasoning task characterization module is used for classifying the DNN reasoning tasks according to task types to obtain a DNN reasoning task division set; wherein these different DNN inference tasks are characterized by task feature vectors.
The heterogeneous cluster dynamic hardware processing capacity extraction module is used for dividing the computing nodes and constructing a heterogeneous computing node set according to the dividing result.
And the DNN inference task batch scheduling module is used for scheduling the DNN inference tasks according to the DNN inference task partition set and the heterogeneous computing node set through a target search algorithm.
The specific composition and execution steps of the above three modules are detailed below.
First, the DNN inference task characterization module
(a) The cluster deployment model statistics unit counts the DNN models that can be inferred on each node in the cluster and determines a DNN model list; see Table 1, which lists the set of models inferable on each node.

Computing node 1 | Computing node 2 | …… | Computing node n
Model set 1 | Model set 2 | …… | Model set n

TABLE 1
(b) The cluster-external DNN application statistics unit counts the set of external DNN applications corresponding to all inferable DNN models in the cluster. Each application has a task classification: real-time, interactive, or background. From this unit, the allowed response time of the inference request tasks generated by each DNN application can be derived; the model used by each application is also recorded, and together these form an application information matrix, as shown in Table 2.

DNN application 1 | DNN application 2 | …… | DNN application n
Allowed response time 1 | Allowed response time 2 | …… | Allowed response time n
Use model 1 | Use model 2 | …… | Use model n

TABLE 2
(c) The DNN inference task divider determines the DNN inference task partition set from the DNN model list and the external DNN application set. Specifically, the divider combines the model statistics list from the cluster deployment model statistics unit with the application information matrix from the cluster-external DNN application statistics unit to obtain the DNN inference task partition set shown in Table 3. This set gives all DNN inference task types on the cluster; each type corresponds to an allowed response time, the model used, and the set of nodes that can process the task. A minimal sketch of the divider's join of Table 1 and Table 2 follows the table.

DNN inference task type 1 | DNN inference task type 2 | …… | DNN inference task type n
Allowed response time 1 | Allowed response time 2 | …… | Allowed response time n
Use model 1 | Use model 2 | …… | Use model n
Processable node set 1 | Processable node set 2 | …… | Processable node set n

TABLE 3
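The following Python sketch illustrates that join, under assumed data shapes and names (everything here is illustrative, not taken from the patent):

```python
# Hedged sketch of the DNN inference task divider: it joins the per-node
# model list (Table 1) with the application information matrix (Table 2)
# to produce the task partition set of Table 3.

def build_task_partition_set(node_models, applications):
    """node_models: {node_id: set of model names deployable on that node}
    applications: list of {"app": str, "model": str, "response_ms": float}
    Returns one entry per DNN inference task type (cf. Table 3)."""
    partitions = []
    for app in applications:
        processable = {n for n, models in node_models.items()
                       if app["model"] in models}
        partitions.append({
            "task_type": app["app"],
            "allowed_response_ms": app["response_ms"],
            "model": app["model"],
            "processable_nodes": processable,
        })
    return partitions


# Example: two nodes and two applications sharing the cluster.
nodes = {"node1": {"resnet50", "vgg16"}, "node2": {"resnet50"}}
apps = [{"app": "detection", "model": "vgg16", "response_ms": 200.0},
        {"app": "classification", "model": "resnet50", "response_ms": 50.0}]
print(build_task_partition_set(nodes, apps))
```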
Second, the heterogeneous-cluster dynamic hardware processing capability extraction module
(a) The cluster computing node divider dynamically divides the boards in the cluster by computing power to obtain the heterogeneous computing node set.
Specifically, a large cluster usually contains many boards of different computing power, and the divider dynamically groups them into computing nodes of different computing power. After division, a heterogeneous computing node set is obtained, giving all heterogeneous computing nodes in the current cluster that can process DNN inference tasks.
(b) The cluster inference task processing timer, according to the DNN inference task partition set and the heterogeneous computing node set, executes each type of DNN inference task in advance on every heterogeneous computing node able to process it and records the total processing time, which comprises the actual inference time and the communication time. From these timing results it generates a dynamic hardware processing capability matrix and sends the matrix to the DNN inference task batch scheduling module for generating the scheduling result stream.
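The timing step could look roughly like the sketch below; `run_on_node` is a hypothetical stand-in for the actual remote inference call (which would include data transfer), and the matrix layout and millisecond unit are assumptions.

```python
# Build the dynamic hardware processing capability matrix T by executing
# each task type once on every node able to process it; T[i][j] is the
# measured time (inference plus communication) of task type i on node j.
import time

def build_capability_matrix(partitions, node_ids, run_on_node):
    INF = float("inf")  # marks nodes that cannot process a task type
    matrix = [[INF] * len(node_ids) for _ in partitions]
    for i, part in enumerate(partitions):
        for j, node in enumerate(node_ids):
            if node in part["processable_nodes"]:
                start = time.perf_counter()
                run_on_node(node, part["model"])  # real inference + transfer
                matrix[i][j] = (time.perf_counter() - start) * 1000.0  # ms
    return matrix
```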
Third, the DNN inference task batch scheduling module
(a) Batch scheduling implementation method
In this embodiment, a low-dimensional parameter solution space representing scheduling results is constructed using the dimension-reduction method based on static policies and dynamic parameters, and a meta-heuristic search algorithm is then introduced to find the optimal scheduling solution, thereby optimizing DNN inference task batch scheduling. Referring to FIG. 2, this scheduling implementation first constructs a scheduling method template consisting of a set of static policies together with the set of all parameters in those policies. The template can be instantiated: the static policies are combined, and all their parameters form a dynamic parameter vector. When a task flow arrives, it is dynamically segmented, exploiting the short-term nature of task scheduling, and each segment is actually scheduled by a different scheduling method instance. The dynamic parameter vectors of all scheduling method instances over the whole task flow form a dynamic parameter matrix, which is low-dimensional. Scheduling the whole task flow with all the scheduling method instances yields one scheduling result, with the dynamic parameter vectors of all current instances taking their current values; equivalently, one complete assignment of values to the matrix corresponds to one scheduling result for the task flow. Next, a searcher explores values of the dynamic parameter matrix K in combination with the optimization objective and the dynamic hardware processing capability; each value corresponds to one scheduling result. When the search converges, the optimal K is obtained. Applying the optimal parameter matrix K to the scheduling method instance of each segment of the task stream one last time produces the optimal scheduling result; a flow chart of the implementation is shown in FIG. 3.
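Under assumed interfaces, the mechanism can be compressed into the following sketch; `weighted_greedy_policy` is a toy stand-in for the patent's static policy set, and the task and parameter data shapes are assumptions.

```python
# Sketch of the static/dynamic hybrid dimension reduction: the task stream
# is cut into segments, and segment s is scheduled by one instantiated
# scheduling method whose dynamic parameter vector is row K[s]. The whole
# schedule is thus determined by the low-dimensional matrix K.

def schedule_stream(task_stream, segment_len, policy, K):
    results = []
    for s in range(0, len(task_stream), segment_len):
        segment = task_stream[s:s + segment_len]
        params = K[s // segment_len]             # parameters for this segment
        results.extend(policy(segment, params))  # one scheduling instance
    return results

def weighted_greedy_policy(segment, params):
    """Toy static policy: score (task, node) pairs with the parameter
    vector and greedily pick the cheapest node for each task."""
    w_time, w_load = params
    load, plan = {}, []
    for task in segment:
        node = min(task["candidates"],
                   key=lambda n: w_time * task["cost"][n]
                                 + w_load * load.get(n, 0.0))
        load[node] = load.get(node, 0.0) + task["cost"][node]
        plan.append((task["id"], node))
    return plan
```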
(b) Task scheduler
The task scheduler segments the incoming task stream, and each segment performs its task scheduling operations with one scheduling method instance. Once the segmentation, all scheduling method instances, and the dynamic parameter matrix are determined, a scheduling result stream is generated for a given task stream input. The result stream gives, for each DNN inference task, the time at which it is scheduled to a particular computing node for inference.
(c) The scheduling result evaluator evaluates each scheduling result in the scheduling result stream according to the evaluation function.
In this scenario there are three evaluation indices: the average completion time of all tasks σ1, the average timeout of all tasks σ2, and the load-balance degree σ3. The evaluation function is:

$\mathrm{score}(\xi) = \lambda_1\sigma_1(\xi) + \lambda_2\sigma_2(\xi) + \lambda_3\sigma_3(\xi)$

For a given scheduling result ξ, the three indices are combined linearly with three different weights λ1, λ2 and λ3 to form the score evaluation function. Once a scheduling result has been generated, its grade is obtained from this score. The weights are all set manually according to actual needs.
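A worked toy computation of this score is shown below; using the population standard deviation of node loads for the load-balance degree σ3 is one plausible reading, since the text does not fix its exact formula.

```python
# Toy evaluation of score(ξ) = λ1·σ1 + λ2·σ2 + λ3·σ3 for one schedule.
import statistics

completion = [40.0, 55.0, 120.0]   # per-task completion times (ms)
timeouts = [0.0, 5.0, 20.0]        # per-task timeout amounts (ms)
node_load = [110.0, 105.0]         # total work placed on each node

sigma1 = sum(completion) / len(completion)  # average completion time
sigma2 = sum(timeouts) / len(timeouts)      # average timeout of all tasks
sigma3 = statistics.pstdev(node_load)       # load-balance degree (assumed)

lam1, lam2, lam3 = 0.5, 0.3, 0.2            # weights set per deployment
score = lam1 * sigma1 + lam2 * sigma2 + lam3 * sigma3
print(score)  # lower is better under these three indices
```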
(d) Searcher
Searching the dynamic parameter matrix K with a meta-heuristic algorithm (e.g., particle swarm optimization) yields the optimal scheduling solution after convergence; the final scheduling result stream is then obtained from this scheduling solution.
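A minimal particle swarm sketch over the flattened parameter matrix K might look as follows; `evaluate` is assumed to map a candidate K to the score of the schedule it induces, and the hyperparameters are illustrative.

```python
# Standard PSO over a dim-dimensional flattened K, minimizing evaluate(K).
import random

def pso_search(evaluate, dim, iters=100, swarm=20, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_val = [evaluate(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = evaluate(pos[i])
            if val < pbest_val[i]:          # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:         # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```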
The experimental verification of the scheme is as follows:
the experiment of the invention is realized on a heterogeneous reasoning cluster formed by heterogeneous equipment based on NVIDIA A100, NVIDIA TITAN Xp and the like. In the above equipment, the experiment is uniformly installed with some necessary operating environments, such as Ubuntu 16.04, CUDA 10.2, CUDNN 7.5, pytorreh 1.7, etc., and the DNN inference task is executed based on the pytorreh framework. The task set comprises a plurality of classification tasks and target detection tasks, wherein the backbone network adopts nine types of mainstream neural networks, such as DenseNet121, DenseNet169, DenseNet201, ResNet50, ResNet101, ResNet152, VGG16, VGG19 and Inception V3. This guarantees the heterogeneous environment of the platform and the diversity of the DNN inference task set. Five experiments were designed in this experiment to evaluate the excellence of DNN-BS over the existing algorithms from different aspects.
In the scheduling experiments, the proposed DNN inference task batch scheduling system DNN-BS was evaluated on the heterogeneous inference cluster. As baselines, the experiments used Minimum Completion Time (MCT) scheduling, Min-Min, Max-Min, modified Min-Min, modified Max-Min, the genetic algorithm (GA), and particle swarm optimization (PSO). Comparative experiments covered multiple aspects: 1) score comparisons under the evaluation function for task flows of different scales, different types of task flows, various heterogeneous-cluster node configurations, and various optimization objectives, using all seven baseline methods; 2) comparisons of convergence rate and scheduling result across task-flow scales relative to the number of computing nodes; here only GA and PSO are compared, because the other five methods have no notion of convergence rate. The experiments used the average completion time of all tasks σ1, the average timeout of all tasks σ2, and the load-balance degree σ3 as observed quantities to evaluate scheduling performance. The default settings are 20 computing nodes and 8000 inference tasks.
1. Comparison of scheduling results under different scale task streams
Three task-flow scales (3000, 5000, and 8000 tasks) were tested. For each scale, the average completion time of all tasks σ1, the average timeout of all tasks σ2, the load-balance degree σ3, and the final combined score under the evaluation function are reported in Table 4. DNN-BS achieves the best value on every index at every task-flow scale; and since GA and PSO are global search algorithms, their results are also better than those of the other five methods. See Table 4 for the specific results:
TABLE 4 (detailed per-index results; rendered as an image in the original publication)
2. Comparison of scheduling results under different types of task streams
Tasks in a task stream differ in their spatio-temporal characteristics, such as the number of inference tasks arriving in different time periods and the distribution of task classes within the stream. Five task flows with different task distributions were selected for this experiment. As shown in FIG. 4, each category shows, from left to right, the scheduling results of MCT, Min-Min, modified Min-Min, Max-Min, modified Max-Min, GA, PSO, and the proposed DNN-BS; the results show that DNN-BS schedules all the different types of task flows well.
3. Comparison of scheduling results under configuration of various heterogeneous cluster nodes
In this experiment, different distributions of node computing power within the cluster were configured. The same task streams were used, and under each node computing-power configuration the mean scores over ten different task streams were recorded. As shown in FIG. 5, each category shows, from left to right, the scheduling results of MCT, Min-Min, modified Min-Min, Max-Min, modified Max-Min, GA, PSO, and the proposed DNN-BS; DNN-BS achieves the best scheduling effect.
4. Comparison of scheduling results under multiple optimization objectives
In the evaluation function composed of the average completion time of all tasks σ1, the average timeout of all tasks σ2, and the load-balance degree σ3, the three weights are usually adjusted in practice according to which index matters more for scheduling. In this experiment, the weight parameters λ1, λ2 and λ3 were adjusted in turn to form different scheduling optimization objectives. As shown in FIG. 6, each category shows, from left to right, the scheduling results of MCT, Min-Min, modified Min-Min, Max-Min, modified Max-Min, GA, PSO, and DNN-BS; DNN-BS remains superior to the scheduling results of the other seven methods.
5. Comparison of convergence rates
Because DNN-BS contains a global search stage, its convergence rate must be evaluated. The commonly used PSO and GA were chosen as comparison methods, and experiments were run on task flows of different scales: 1000, 3000, 5000, 8000, 10000, and 20000 tasks. As shown in FIG. 7, each category compares the convergence rates of GA, PSO, and the proposed DNN-BS. DNN-BS converges very quickly in all cases, and its final converged results also greatly exceed those of the PSO and GA algorithms.
Usage examples:
FIG. 8 shows an example of using the DNN-BS scheduling optimization method based on the hybrid static/dynamic dimension-reduction mechanism.
Example 1:
Large data centers typically operate very large heterogeneous computing clusters with hundreds or thousands of computing nodes, on which a very large number of well-trained neural network models are deployed. A wide user population outside the cluster uses various DNN applications to send DNN inference requests to the data center. These tasks correspond to different inference models and allowed inference delays. DNN inference scheduling in this scenario comprises three processes: cluster DNN inference request collection, DNN inference task initialization, and compute-node DNN inference. The DNN-BS scheduling optimization method mainly schedules each task to computing nodes of different computing power after the DNN inference tasks reach the cluster. DNN-BS analyzes the dynamic hardware processing capability for the current division of heterogeneous computing nodes and finds the most suitable task/computing-node matching by running its scheduling algorithm.
Example 2:
With the development of the Internet of Things, artificial intelligence, and intelligent embedded devices, a small number of large data centers can no longer be expected to process all DNN inference tasks. Instead, multiple small processing clusters are deployed where network edge devices are concentrated. In this case, locally scoped DNN inference task scheduling must be done on the small processing clusters at the network edge.
For such a scenario, the DNN-BS scheduling method can analyze all local DNN applications at the network edge and divide the DNN inference tasks into classes in advance, while also dividing all computing nodes of the small edge cluster and obtaining the dynamic hardware processing capability matrix. When local DNN inference tasks reach the small processing cluster, the weight parameters in the optimization objective function can be adjusted according to the scheduling objective, and the full DNN inference task scheduling process then proceeds.
Example 3:
Smart homes face the same problem. A smart home usually has one processing center that handles the data of all sensors; the data become DNN inference tasks on arrival at the processing center, and the processing center often has more than one hardware processing unit. All types of in-home data analysis fall into a limited number of DNN inference task categories and can be inferred on the processing center's several hardware processing units.
In this scenario, the DNN-BS scheduling optimization method can be deployed on the scheduling node of the small home processing center; the dynamic hardware processing capability matrix is obtained by measuring the inference time of every DNN inference task category on each hardware processing unit. All DNN inference tasks can then be dispatched to the appropriate hardware processing units for inference through the remaining steps of the DNN-BS scheduling algorithm.
It is to be noted that the foregoing describes only preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from its concept, and its scope is determined by the scope of the appended claims.

Claims (6)

1. A DNN inference task batch scheduling method for heterogeneous clusters, characterized by comprising:
a DNN inference task characterization module, a heterogeneous-cluster dynamic hardware processing capability extraction module, and a DNN inference task batch scheduling module;
the DNN inference task characterization module is used for classifying DNN inference tasks by task type to obtain a DNN inference task partition set;
the heterogeneous-cluster dynamic hardware processing capability extraction module is used for dividing the computing nodes and constructing a heterogeneous computing node set from the division result;
and the DNN inference task batch scheduling module is used for scheduling the DNN inference tasks according to the DNN inference task partition set and the heterogeneous computing node set through a target search algorithm.
2. The method of claim 1, wherein the DNN inference task characterization module comprises:
a cluster deployment model statistics unit, used for counting the DNN models that can be inferred on each node in the cluster and determining a DNN model list;
a cluster-external DNN application statistics unit, used for counting the set of external DNN applications corresponding to all inferable DNN models in the cluster;
and a DNN inference task divider, used for determining the DNN inference task partition set from the DNN model list and the external DNN application set.
3. The method of claim 2, wherein the DNN inference task partition set comprises: all DNN inference task types, and, for each task type, the corresponding allowed response time, the model used, and the set of target nodes.
4. The method of claim 1, wherein the heterogeneous-cluster dynamic hardware processing capability extraction module comprises:
a cluster computing node divider, used for dynamically dividing the boards in the cluster by computing power to obtain the heterogeneous computing node set;
and a cluster inference task processing timer, used for executing, according to the DNN inference task partition set and the heterogeneous computing node set, each type of DNN inference task in advance on every heterogeneous computing node able to process it, counting the elapsed time of all runs, generating a dynamic hardware processing capability matrix from the timing results, and sending the matrix to the DNN inference task batch scheduling module.
5. The method of claim 1, wherein a dynamic parameter matrix representing the scheduling result is constructed using the dimension-reduction method based on static policies and dynamic parameters;
correspondingly, the DNN inference task batch scheduling module comprises:
a task scheduler, used for segmenting the task stream input drawn from the DNN inference task partition set and generating the corresponding scheduling result stream;
a scheduling result evaluator, used for evaluating each scheduling result in the scheduling result stream according to the evaluation function;
and a searcher, used for searching the dynamic parameter matrix with a meta-heuristic algorithm and taking the solution after convergence as the optimal scheduling solution.
6. The method of claim 5, wherein the evaluation function is:

$\mathrm{score}(\xi) = \lambda_1\sigma_1(\xi) + \lambda_2\sigma_2(\xi) + \lambda_3\sigma_3(\xi)$

where σ1 is the average completion time of all tasks, σ2 is the average timeout of all tasks, and σ3 is the load-balance degree; ξ is a given scheduling result, and λ1, λ2 and λ3 are three different weights.
CN202210035043.8A 2022-01-13 2022-01-13 DNN inference task batch scheduling method facing heterogeneous cluster Pending CN114546609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035043.8A CN114546609A (en) 2022-01-13 2022-01-13 DNN inference task batch scheduling method facing heterogeneous cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210035043.8A CN114546609A (en) 2022-01-13 2022-01-13 DNN inference task batch scheduling method facing heterogeneous cluster

Publications (1)

Publication Number Publication Date
CN114546609A 2022-05-27

Family

ID=81672458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210035043.8A Pending CN114546609A (en) 2022-01-13 2022-01-13 DNN inference task batch scheduling method facing heterogeneous cluster

Country Status (1)

Country Link
CN (1) CN114546609A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971502A (en) * 2024-03-29 2024-05-03 南京认知物联网研究院有限公司 Method and device for carrying out online optimization scheduling on AI reasoning cluster



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination