CN113032113A - Task scheduling method and related product - Google Patents
- Publication number
- CN113032113A (application No. CN201911359640.0A)
- Authority
- CN
- China
- Prior art keywords
- computing platform
- computing
- task
- bandwidth
- platform
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
Abstract
The application relates to a task scheduling method and related products, which distribute the operation tasks contained in a machine learning model to different computing platforms according to the computation amount and memory access amount of each operation task. By taking the computation intensity as the main division basis, reasonable scheduling of the operation tasks is achieved, the computing resources of the different computing platforms are fully utilized for cooperative processing, and computing efficiency can be greatly improved.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a task scheduling method and a related product.
Background
In recent years, machine learning has made major breakthroughs. For example, neural network models trained with deep learning algorithms simulate the neural connection structure of the human brain and, when processing signals such as images, sounds, and texts, describe data features hierarchically through multiple transformation stages, achieving remarkable results in application fields such as image recognition, speech processing, and intelligent robots.
As the complexity of machine learning algorithms keeps increasing, the operation tasks of machine learning models also keep growing. At present, in order to remain general-purpose, the processing of a machine learning model's operation tasks usually hands the whole task to a computing platform with strong computing capability (such as a server).
However, this approach wastes the computing power of other computing platforms (such as terminals), so the processing efficiency of the operation tasks is poor.
Disclosure of Invention
Therefore, it is necessary to provide a task scheduling method and related products for solving the technical problem of poor processing efficiency of the operation task.
A method of task scheduling, the method comprising:
acquiring computing power and bandwidth corresponding to a plurality of computing platforms;
determining a performance evaluation strategy corresponding to each computing platform according to the computing power and the bandwidth;
acquiring the computation amount and the memory access amount of the operation task contained in the machine learning model;
determining a performance parameter corresponding to the operation task when the operation task runs in each computing platform according to the calculated amount and the access amount and based on a performance evaluation strategy corresponding to each computing platform;
generating a task scheduling strategy according to the performance parameters;
and scheduling the operation tasks in the plurality of computing platforms based on the task scheduling strategy.
In one embodiment, determining the performance evaluation policy corresponding to each computing platform according to the computing power and the bandwidth includes:
determining a performance evaluation function corresponding to each computing platform according to the computing power and the bandwidth; or determining a performance evaluation function curve corresponding to each computing platform according to the computing power and the bandwidth.
In one embodiment, determining the performance evaluation policy corresponding to each computing platform according to the computing power and the bandwidth includes:
determining the maximum computation intensity that each computing platform can provide according to the computing power and the bandwidth;
and determining a performance evaluation strategy corresponding to each computing platform according to the maximum computation intensity that each computing platform can provide, the computing power and the bandwidth.
In one embodiment, determining, according to the calculated amount and the access amount and based on the performance evaluation policy corresponding to each computing platform, a performance parameter corresponding to the operation task when the operation task runs in each computing platform includes:
determining the calculation intensity corresponding to the calculation task according to the calculated amount and the access amount;
and determining the performance parameters corresponding to the operation task when the operation task runs in each computing platform according to the computing intensity corresponding to the operation task and the performance evaluation strategy corresponding to each computing platform.
In one embodiment, the performance parameters include the number of operations per second that can be achieved when the operation task runs in the computing platform.
In one embodiment, the computing platforms comprise a first computing platform and a second computing platform, wherein the first computing platform and the second computing platform have the same bandwidth, and the computing power of the first computing platform is greater than that of the second computing platform;
generating a task scheduling strategy according to the performance parameters, wherein the task scheduling strategy comprises the following steps:
if the number of times of operation per second which can be reached when the operation task runs in the first computing platform is larger than the number of times of operation per second which can be reached when the operation task runs in the second computing platform, the operation task is distributed to the first computing platform;
and if the operation times per second which can be reached when the operation task runs in the first computing platform are the same as the operation times per second which can be reached when the operation task runs in the second computing platform, distributing the operation task to the second computing platform.
In one embodiment, the computing platforms comprise a first computing platform and a second computing platform, wherein the bandwidth of the first computing platform is greater than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than the computing power of the second computing platform;
generating a task scheduling strategy according to the performance parameters, wherein the task scheduling strategy comprises the following steps:
if the computation intensity corresponding to the computation task is less than or equal to the maximum computation intensity which can be provided by the second computing platform, distributing the computation task to the second computing platform;
and if the computation intensity corresponding to the computation task is greater than the maximum computation intensity which can be provided by the second computation platform, distributing the computation task to the first computation platform.
In one embodiment, the computing platforms comprise a first computing platform and a second computing platform, wherein the bandwidth of the first computing platform is less than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than that of the second computing platform;
generating a task scheduling strategy according to the performance parameters, wherein the task scheduling strategy comprises the following steps:
if the number of operations per second that can be achieved when the operation task runs in the first computing platform is less than or equal to the number of operations per second that can be achieved when the operation task runs in the second computing platform, distributing the operation task to the second computing platform;
and if the number of operations per second that can be achieved when the operation task runs in the first computing platform is greater than the number of operations per second that can be achieved when the operation task runs in the second computing platform, distributing the operation task to the first computing platform.
In one embodiment, the operation task includes an operation task corresponding to a machine learning model, an operation task corresponding to each layer of a network in the machine learning model, or an operation task corresponding to a neural network operator.
In one embodiment, the first computing platform is a server and the second computing platform is a terminal.
In one embodiment, the method further comprises: acquiring the number of terminals;
generating a task scheduling strategy according to the performance parameters, wherein the task scheduling strategy comprises the following steps:
and generating a task scheduling strategy according to the performance parameters and the number of the terminals.
In one embodiment, the method further comprises: acquiring load information corresponding to the terminal and load information corresponding to the server;
generating a task scheduling strategy according to the performance parameters, wherein the task scheduling strategy comprises the following steps:
and generating a task scheduling strategy according to the performance parameters, the load information corresponding to the terminal and the load information corresponding to the server.
A task scheduler comprises an instruction generation circuit and a communication circuit, wherein the instruction generation circuit is used for executing the task scheduling method above, generating a control instruction according to the task scheduling strategy, and controlling the communication circuit to schedule and distribute the operation tasks among the plurality of computing platforms according to the control instruction.
A task processing system comprises a terminal and a server, wherein the terminal comprises the task scheduler, the control instruction comprises a terminal control instruction and a server control instruction, the terminal is used for calculating the operation task distributed to the terminal according to the terminal control instruction, and the server is used for calculating the operation task distributed to the server according to the server control instruction.
A board card, the board card comprising: an artificial intelligence processor for performing any of the methods described above.
A main board, the main board comprising: a general-purpose processor and the above board card.
An electronic device, comprising the above main board.
The task scheduling method and related products use the attributes of the computing platforms, namely computing power and bandwidth, together with the attributes of the operation tasks of the machine learning model, namely computation amount and memory access amount, to determine the performance of each operation task when run on each of the plurality of computing platforms, and then determine from that performance which computing platform each operation task is allocated to. Multiple operation tasks can thus be allocated to different computing platforms for cooperative processing, so that the attributes of the operation tasks better match the attributes of the computing platforms. With this scheduling approach, the computing power of the different computing platforms is utilized effectively and computing efficiency is greatly improved.
Drawings
FIG. 1 is a schematic diagram of an application environment in one embodiment;
FIG. 2 is a flowchart illustrating a task scheduling method according to an embodiment;
FIG. 3 is a diagram illustrating a performance evaluation function curve in one embodiment;
FIG. 4 is a flowchart illustrating a supplementary scheme for generating a task scheduling policy according to the performance parameters in one embodiment;
FIG. 5 is a diagram illustrating performance evaluation function curves of different computing platforms in one embodiment;
FIG. 6 is a flowchart illustrating a supplementary scheme for generating a task scheduling policy according to the performance parameters in another embodiment;
FIG. 7 is a diagram illustrating performance evaluation function curves of different computing platforms in another embodiment;
FIG. 8 is a flowchart illustrating a supplementary scheme for generating a task scheduling policy according to the performance parameters in yet another embodiment;
FIG. 9 is a diagram illustrating performance evaluation function curves of different computing platforms in yet another embodiment;
FIG. 10 is a schematic diagram of a statistics table of the computation amount and memory access amount of VGG16 in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The task scheduling method provided by the application can be applied to the application environment shown in fig. 1. The first computing platform 10 includes a first controller 110, a first arithmetic unit 120, and a first communication unit 130, where the first controller 110 is connected to the first arithmetic unit 120 and the first communication unit 130, respectively. The second computing platform 20 includes a second controller 210, a second arithmetic unit 220, and a second communication unit 230, where the second controller 210 is connected to the second arithmetic unit 220 and the second communication unit 230, respectively. The first communication unit 130 communicates with the second communication unit 230 through a network. Specifically, a corresponding task scheduling device may be provided in the first controller 110 or the second controller 210. The task scheduling device evaluates the performance of each operation task on the different computing platforms according to the computing power and bandwidth of the computing platforms and the computation amount and memory access amount of the operation tasks, and thereby generates a task scheduling policy to schedule the operation tasks. In addition, the first controller 110 or the second controller 210 is further configured to generate a control instruction when a task is scheduled, so that the other units can perform corresponding operations according to the control instruction. For example, each arithmetic unit processes its assigned operation task according to the control instruction and generates an output result, and each communication unit carries out the interaction between the first computing platform 10 and the second computing platform 20 according to the control instruction.
It should be noted that the application scenarios related to the embodiments of the present application are not limited to the number of computing platforms, and the number of computing platforms may be three or more.
In an embodiment, referring to fig. 2, a task scheduling method is provided. Taking the method being applied to a task scheduling device as an example, it includes the following steps:
s202, computing power and bandwidth corresponding to a plurality of computing platforms are obtained.
A computing platform refers to a hardware device with computing capability; for example, computing platforms include servers and terminals. The computing power, i.e. the peak number of operations per second, is also referred to as the performance upper limit of the computing platform and can be understood as the number of operations a computing platform can complete per second when running at full capacity. The bandwidth, i.e. the peak memory data access per second (peak memory bandwidth), is also referred to as the bandwidth upper limit of the computing platform and can be understood as the amount of memory exchange a computing platform can perform per second when running at full capacity, expressed in bytes/sec.
Specifically, the task scheduling device obtains computing power and bandwidth corresponding to a plurality of computing platforms.
And S204, determining a performance evaluation strategy corresponding to each computing platform according to the computing power and the bandwidth.
In particular, since computing power and bandwidth are inherent attributes of a computing platform, they may characterize the performance of the computing platform. Optionally, for the computation power and the bandwidth, the task scheduling device may perform a series of arithmetic operations to obtain a performance index for characterizing the computing platforms, and according to the performance index, the computation power and the bandwidth, the performance evaluation policy corresponding to each computing platform may be determined. The performance evaluation strategy is used for determining corresponding performance parameters when the operation task runs in the computing platform.
And S206, acquiring the computation amount and the memory access amount of the operation tasks included in the machine learning model.
The machine learning model is a model used for machine learning computation; it may be, for example, a support vector machine or a neural network model. Depending on the object considered, the operation tasks included in the machine learning model comprise the task computed by the machine learning model as a whole, the tasks computed by the individual layers in the machine learning model, the tasks computed by neural network operators, and so on. Any operation task has two corresponding inherent attributes: the computation amount and the memory access amount. For a neural network model such as a CNN, the computation amount may refer to the number of floating point operations that occur when the model performs one complete forward propagation on a single input sample (for example, one image), i.e. the time complexity of the model, measured in FLOPs; it may also refer to the amount of computation of that sample in a certain layer or a certain operator of the network. The memory access amount may refer to the total amount of memory exchange occurring while the model completes one forward propagation on a single input sample, i.e. the spatial complexity of the model, or to the memory access of a certain layer or operator during the propagation.
Specifically, the task scheduling device acquires the computation amount and the memory access amount of the operation tasks included in the machine learning model.
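As a concrete illustration of these two attributes, the sketch below estimates the computation amount and memory access amount of a single convolution layer; the formulas are standard textbook approximations and the layer shape is chosen for illustration, not taken from the patent.

```python
# Hypothetical helpers estimating the computation amount (FLOPs) and the
# memory access amount (bytes) of one convolution layer. The formulas are
# standard approximations; the patent itself does not prescribe them.

def conv2d_flops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    # One multiply-add per kernel element per output position, counted as 2 FLOPs.
    return 2 * h_out * w_out * c_in * c_out * k * k

def conv2d_bytes(h_in: int, w_in: int, c_in: int, h_out: int, w_out: int,
                 c_out: int, k: int, dtype_bytes: int = 4) -> int:
    # Input feature map + weights + output feature map, each touched once.
    weights = c_in * c_out * k * k
    return dtype_bytes * (h_in * w_in * c_in + weights + h_out * w_out * c_out)

# First conv layer of a VGG16-like network: 224x224x3 -> 224x224x64, 3x3 kernel.
print(conv2d_flops(224, 224, 3, 64, 3))            # 173,408,256 FLOPs (~0.17 GFLOPs)
print(conv2d_bytes(224, 224, 3, 224, 224, 64, 3))  # 13,454,080 bytes (~13.5 MB)
```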
And S208, determining the corresponding performance parameters of the operation task when the operation task runs in each computing platform according to the calculated amount and the access amount and based on the performance evaluation strategy corresponding to each computing platform.
Specifically, after obtaining the attributes of the operation task, namely the computation amount and the memory access amount, as well as the performance evaluation strategy corresponding to each computing platform, the task scheduling device determines from this information the performance parameter corresponding to the operation task when it runs on each computing platform. The performance parameter is used to characterize the performance of the operation task when it runs on a computing platform and may be, for example, the operation speed or the operation time.
And S210, generating a task scheduling strategy according to the performance parameters.
Specifically, after acquiring all the performance parameters, the task scheduling device generates a task scheduling policy by comparison or according to preset scheduling conditions. The task scheduling policy distributes each task, based on its performance, to the computing platform that is best adapted to it.
S212, based on the task scheduling strategy, the operation tasks are scheduled in a plurality of computing platforms.
Specifically, machine learning is characterized by a large amount of computation and a large number of contained operation tasks. Based on the above steps, the task scheduling policy corresponding to each operation task can be obtained, so that the operation tasks can be distributed to different computing platforms for cooperative processing and the attributes of the operation tasks better match the attributes of the computing platforms, thereby effectively utilizing the computing power of the different computing platforms and greatly accelerating computation.
The task scheduling method uses the attributes of the computing platforms, namely computing power and bandwidth, together with the attributes of the operation tasks of the machine learning model, namely computation amount and memory access amount, to determine the performance of each operation task when run on each of the plurality of computing platforms, and then determines from that performance which computing platform each operation task is allocated to. Multiple operation tasks can thus be allocated to different computing platforms for cooperative processing, so that the attributes of the operation tasks better match the attributes of the computing platforms. With this scheduling approach, the computing power of the different computing platforms is utilized effectively and computing efficiency is greatly improved.
Optionally, in one embodiment, the peak number of operations per second comprises the peak number of floating point operations per second, in units of FLOP/sec. It should be noted that the following description takes the peak number of floating point operations per second as a specific example.
In one embodiment, a possible implementation of determining the performance evaluation strategy corresponding to each computing platform according to the computing power and the bandwidth is involved. On the basis of the above embodiment, S204 includes the steps of:
S204a, determining a performance evaluation function corresponding to each computing platform according to the computing power and the bandwidth;
or,
S204b, determining a performance evaluation function curve corresponding to each computing platform according to the computing power and the bandwidth.
Specifically, the performance evaluation strategy may be presented in different ways. For example, the task scheduling device may use a function as the performance evaluation strategy, or use the curve (image) of that function as the performance evaluation strategy.
Optionally, as a further refinement, the task scheduling device determines the maximum computation intensity each computing platform can provide according to the computing power and the bandwidth, and then determines the performance evaluation strategy corresponding to each computing platform according to the maximum computation intensity each platform can provide, the computing power, and the bandwidth. The maximum computation intensity may be the ratio of the computing power to the bandwidth.
For an intuitive understanding, refer to fig. 3, which is a schematic diagram of a performance evaluation function curve in one embodiment. Here P denotes the performance parameter corresponding to an operation task when it runs on the computing platform; I denotes the computation intensity of the operation task, i.e. the ratio of its computation amount to its memory access amount; β denotes the bandwidth; π denotes the computing power; and I_max denotes the maximum computation intensity of the computing platform. It can be seen that the computing power determines the height of the horizontal line in the figure, the bandwidth determines the slope of the sloped line, and all the performance achievable by operation tasks running on the computing platform lies below this broken line.
For a more intuitive understanding, the figure is divided into two regions: a compute-bound region (Compute-Bound) and a bandwidth-bound region (Memory-Bound). In the bandwidth-bound region, where the computation intensity of an operation task is less than the maximum computation intensity of the computing platform, the performance of the task is limited by the memory bandwidth; there, the larger the platform's bandwidth or the higher the task's computation intensity, the better the task's performance. In the compute-bound region, where the computation intensity of an operation task is greater than the maximum computation intensity of the computing platform, the performance of the task is limited by the computing power; there, the higher the platform's computing power, the better the task's performance.
In one embodiment, based on the above, the performance evaluation function can be expressed as:

P = min(β × I, π)

that is, P = β × I in the bandwidth-bound region (I < I_max) and P = π in the compute-bound region (I ≥ I_max), where I_max = π / β.
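The evaluation function above matches the well-known roofline model. The following is a minimal sketch of it; the Platform class and the platform numbers (10 TFLOPS, 400 GB/s) are illustrative assumptions, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class Platform:
    pi: float    # computing power: peak operations per second
    beta: float  # bandwidth: peak bytes per second

    @property
    def i_max(self) -> float:
        # Maximum computation intensity the platform can provide (FLOP/Byte).
        return self.pi / self.beta

def attainable_performance(platform: Platform, intensity: float) -> float:
    # Roofline evaluation: bandwidth-bound below i_max, compute-bound above.
    return min(platform.beta * intensity, platform.pi)

server = Platform(pi=10e12, beta=400e9)      # 10 TFLOPS, 400 GB/s (assumed)
print(server.i_max)                          # 25.0 FLOP/Byte
print(attainable_performance(server, 5.0))   # 2e12: bandwidth-bound
print(attainable_performance(server, 50.0))  # 1e13: compute-bound
```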
in one embodiment, the method involves determining a possible implementation process of a performance parameter corresponding to a running operation task in each computing platform according to a calculation amount and an access amount and based on a performance evaluation strategy corresponding to each computing platform. On the basis of the above embodiment, S208 includes the steps of:
s2082, determining the calculation intensity corresponding to the calculation task according to the calculation amount and the access amount;
s2084, determining the corresponding performance parameters of the operation task when the operation task runs in each computing platform according to the computing intensity corresponding to the operation task and the performance evaluation strategy corresponding to each computing platform.
Specifically, the computation intensity corresponding to an operation task may be the ratio of its computation amount to its memory access amount, i.e. how many floating point operations are performed for each byte of memory exchanged, in units of FLOP/Byte. As mentioned above, the task scheduling device may substitute the computation intensity corresponding to the operation task into the performance evaluation function or the performance evaluation function curve to obtain the performance parameter corresponding to the operation task when it runs on each computing platform. Optionally, the performance parameter includes the number of floating point operations per second that can be achieved when the operation task runs on the computing platform.
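A short sketch of this substitution step, reusing the conv-layer figures estimated earlier (illustrative values, not from the patent):

```python
def computation_intensity(flops: float, bytes_moved: float) -> float:
    # Ratio of computation amount to memory access amount (FLOP/Byte).
    return flops / bytes_moved

# The conv layer estimated earlier: ~0.17 GFLOPs over ~13.5 MB of traffic.
i_task = computation_intensity(173_408_256, 13_454_080)
print(i_task)  # ~12.9 FLOP/Byte: compute-bound on any platform with i_max < 12.9
```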
To further explain the present application, in one embodiment the computing platforms comprise a first computing platform and a second computing platform, where the two platforms have the same bandwidth and the first computing platform has greater computing power than the second. Optionally, each of the first computing platform and the second computing platform may be a terminal or a server. Referring to fig. 4, S210 includes the following steps:
s2101, if the number of floating-point operations per second that can be achieved when the operation task operates in the first computing platform is greater than the number of floating-point operations per second that can be achieved when the operation task operates in the second computing platform, allocating the operation task to the first computing platform;
s2102, if the number of floating point operations per second that can be achieved when the operation task runs in the first computing platform is the same as the number of floating point operations per second that can be achieved when the operation task runs in the second computing platform, allocating the operation task to the second computing platform.
Specifically, referring to fig. 5, fig. 5 shows the performance evaluation function curves corresponding to the first computing platform and the second computing platform, respectively. Since the two platforms have the same bandwidth but different computing power, the slopes of their sloped lines are the same and only the heights of their horizontal lines differ. As can be seen from fig. 5, before point m the number of floating point operations per second achievable by an operation task is the same on both platforms; after point m it is greater on the first computing platform than on the second. In this case, the operation tasks located before point m may be allocated to the second computing platform and the operation tasks located after point m to the first computing platform, so that the computing power of both platforms is fully utilized and the overall operation speed is increased.
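A minimal sketch of this decision rule for the equal-bandwidth case; the tie-breaking toward the second platform follows S2101/S2102 above, while the function names and numbers are illustrative assumptions:

```python
def attained(pi: float, beta: float, intensity: float) -> float:
    # Roofline throughput: min(bandwidth roof, compute roof).
    return min(beta * intensity, pi)

def assign_equal_bandwidth(intensity: float, pi_first: float,
                           pi_second: float, beta: float) -> str:
    # Same bandwidth, pi_first > pi_second: before point m both platforms
    # deliver beta * intensity, so the task stays on the second platform;
    # past m only the first platform's higher roof helps.
    p_first = attained(pi_first, beta, intensity)
    p_second = attained(pi_second, beta, intensity)
    return "first platform" if p_first > p_second else "second platform"

print(assign_equal_bandwidth(2.0, pi_first=10e12, pi_second=2e12, beta=400e9))
# beta * I = 0.8 TFLOPS on both -> "second platform"
```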
In another embodiment, the bandwidth of the first computing platform is greater than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than the computing power of the second computing platform. In this regard, referring to fig. 6, S210 includes the following steps:
s2103, if the computation strength corresponding to the computation task is less than or equal to the maximum computation strength that can be provided by the second computing platform, allocating the computation task to the second computing platform;
s2104, if the computation strength corresponding to the computation task is greater than the maximum computation strength that can be provided by the second computing platform, allocating the computation task to the first computing platform.
Specifically, referring to fig. 7, fig. 7 shows the performance evaluation function curves corresponding to the first computing platform and the second computing platform, respectively. Since the bandwidth of the first computing platform is greater than that of the second and its computing power is also greater, the slope of the first platform's sloped line is greater than that of the second, and the height of its horizontal line is also greater. In this situation, the operation tasks before point m (those whose computation intensity is less than or equal to the maximum computation intensity the second computing platform can provide) may be allocated to the second computing platform, making full use of its computing power; the operation tasks after point m (those whose computation intensity is greater than the maximum computation intensity the second computing platform can provide) are allocated to the first computing platform. In this way, the computing power of both platforms is fully utilized and the overall operation speed is increased.
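A minimal sketch of this intensity-threshold rule (S2103/S2104); the names and numbers are illustrative assumptions:

```python
def assign_by_intensity(intensity: float, pi_second: float, beta_second: float) -> str:
    # I_max of the weaker (second) platform is pi_second / beta_second: tasks
    # it can run without hitting its compute roof stay on it, the rest go to
    # the first platform, whose roofline dominates everywhere in this case.
    i_max_second = pi_second / beta_second
    return "second platform" if intensity <= i_max_second else "first platform"

print(assign_by_intensity(5.0, pi_second=2e12, beta_second=200e9))   # I_max = 10 -> second
print(assign_by_intensity(25.0, pi_second=2e12, beta_second=200e9))  # -> first
```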
In yet another embodiment, the bandwidth of the first computing platform is less than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than the computing power of the second computing platform. In this regard, referring to fig. 8, S210 includes the following steps:
s2105, if the number of floating point operations per second which can be achieved when the operation task runs in the first computing platform is less than or equal to the number of floating point operations per second which can be achieved when the operation task runs in the second computing platform, distributing the operation task to the second computing platform;
s2106, if the number of floating point operations per second that can be achieved when the operation task runs in the first computing platform is greater than the number of floating point operations per second that can be achieved when the operation task runs in the second computing platform, allocating the operation task to the first computing platform.
Specifically, referring to fig. 9, fig. 9 shows the performance evaluation function curves corresponding to the first computing platform and the second computing platform, respectively. Since the bandwidth of the first computing platform is smaller than that of the second while its computing power is greater, the slope of the first platform's sloped line is smaller than that of the second, and the height of its horizontal line is greater. In this situation, the operation tasks before point m (those for which the number of floating point operations per second achievable on the first computing platform is less than or equal to that achievable on the second) may be allocated to the second computing platform, making full use of its computing power; the operation tasks after point m (those for which the number of floating point operations per second achievable on the first computing platform is greater than that achievable on the second) are allocated to the first computing platform. In this way, the computing power of both platforms is fully utilized and the overall operation speed is increased.
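A minimal sketch of this crossing-rooflines case (S2105/S2106), comparing the throughput each platform can actually deliver; names and numbers are illustrative assumptions:

```python
def assign_crossing_rooflines(intensity: float, pi_first: float, beta_first: float,
                              pi_second: float, beta_second: float) -> str:
    # beta_first < beta_second but pi_first > pi_second, so the rooflines
    # cross; compare delivered throughput and keep ties on the second platform.
    p_first = min(beta_first * intensity, pi_first)
    p_second = min(beta_second * intensity, pi_second)
    return "first platform" if p_first > p_second else "second platform"

# Low-intensity task: the second platform's larger bandwidth wins.
print(assign_crossing_rooflines(1.0, 10e12, 100e9, 2e12, 400e9))   # second platform
# High-intensity task: the first platform's higher compute roof wins.
print(assign_crossing_rooflines(50.0, 10e12, 100e9, 2e12, 400e9))  # first platform
```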
As an example, the first computing platform is a server and the second computing platform is a terminal. The terminal comprises a first controller, a first arithmetic unit, a first memory and a first communication module, wherein the first controller is respectively connected with the first arithmetic unit, the first memory and the first communication module, and the first memory is respectively connected with the first arithmetic unit and the first communication module. The server comprises a second controller, a second arithmetic unit, a second memory and a second communication module, wherein the second controller is respectively connected with the second arithmetic unit, the second memory and the second communication module, and the second memory is respectively connected with the second arithmetic unit and the second communication module. The first communication module and the second communication module communicate through a network. Alternatively, the task scheduling device may be provided in the first controller.
The operation tasks to be executed by the terminal and the server include the operation task corresponding to the machine learning model, the operation tasks corresponding to the individual network layers in the machine learning model, or the operation tasks corresponding to neural network operators. For a neural network, the operation tasks include the operation task corresponding to the neural network model, the operation tasks corresponding to the individual network layers in the neural network model, or the operation tasks corresponding to neural network operators. An operation task may be as small as a matrix multiplication or as large as the training procedure of a neural network.
Generally, the computing power of a server is necessarily greater than that of a terminal, but its bandwidth is not necessarily so. Therefore, in order to fully utilize the computing resources of the server and the terminal, computation-intensive operation tasks are generally allocated to the server, for example the convolution layers of a model under the caffe framework, while data-intensive operation tasks are allocated to terminals, for example the fully connected layers of a model under the caffe framework.
For example, taking the CNN VGG16 under the caffe framework as an example, refer to fig. 10, which is a schematic diagram of a statistics table of the computation amount and memory access amount of VGG16. As the figure shows, the operation tasks of different layers differ greatly in their demands on computing power and bandwidth. The server has higher computing power, so its I_max is harder to reach and it tends to be limited by bandwidth; the terminal's computing chip has limited performance and tends to be limited by computing power. To schedule the operation tasks of the different layers reasonably, one feasible implementation is: allocate the Conv2D operation tasks to the server, making full use of its high computing power; allocate all MaxPooling2D and Flatten operation tasks to the terminal, exploiting its low-compute, high-bandwidth character; and allocate the final Dense operation tasks to the server or the terminal according to the actual FLOPS results. For VGG16, the operation tasks of the first two Dense layers can be allocated to the server and the operation task of the last Dense layer to the terminal. The terminal and the server thus process the operation tasks cooperatively, divided by layer, and the computing power of both is utilized to the maximum extent.
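The layer-wise division might be sketched as follows; the layer statistics and the terminal's assumed maximum computation intensity are hypothetical placeholders, not the figures from fig. 10:

```python
# Illustrative layer-wise dispatch for a VGG16-like network. The (FLOPs, bytes)
# pairs below are rough hand-made placeholders, not the values from Fig. 10.
layers = [
    ("Conv2D_1",     0.17e9, 13.5e6),  # compute-intensive
    ("MaxPooling2D",  2.4e6, 16.1e6),  # data-intensive
    ("Flatten",         0.0,  0.2e6),
    ("Dense_1",       0.2e9,  411e6),
    ("Dense_3",       4.1e6, 16.4e6),
]

I_MAX_TERMINAL = 0.4  # assumed maximum computation intensity of the terminal (FLOP/Byte)

for name, flops, bytes_moved in layers:
    intensity = flops / bytes_moved if bytes_moved else 0.0
    target = "terminal" if intensity <= I_MAX_TERMINAL else "server"
    print(f"{name:13s} I = {intensity:6.3f} FLOP/Byte -> {target}")
```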
In an embodiment, considering that there is an execution order when the terminal and the server execute different network layers, the operation tasks are distributed in a manner similar to a ping-pong operation, so that the terminal and the server do not block each other and both stay in a task-processing state as much as possible. For example, the server and the terminal alternate between operation task 1 and operation task 2: while the server executes the Conv layer of operation task 1 and the terminal would otherwise have to wait, the terminal first executes the MaxPooling layer of operation task 2, and the server continues with the Conv layer of operation task 2 after finishing the Conv layer of operation task 1, until the result of the MaxPooling layer of task 1 is returned by the terminal. It can be understood that this method keeps the computing components of the server and the terminal running at full computing power as much as possible, further improving computing efficiency.
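A toy sketch of this ping-pong interleaving; the stage names and queue mechanics are illustrative, not an implementation from the patent:

```python
from collections import deque

# Two pipelines: each operation task alternates between a server stage (Conv)
# and a terminal stage (MaxPooling). Interleaving task 1 and task 2 keeps both
# devices busy instead of having the terminal idle while the server computes.
server_stages = deque(["task1.Conv", "task2.Conv"])
terminal_stages = deque(["task2.MaxPooling", "task1.MaxPooling"])

tick = 0
while server_stages or terminal_stages:
    tick += 1
    # Both devices dequeue work in the same scheduling tick, so neither
    # blocks on the other's in-flight layer.
    if server_stages:
        print(f"tick {tick}: server   runs {server_stages.popleft()}")
    if terminal_stages:
        print(f"tick {tick}: terminal runs {terminal_stages.popleft()}")
```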
In one embodiment, S210 includes the steps of: generating a task scheduling policy according to the performance parameters and the number of terminals. Specifically, the task scheduling device acquires the number of terminals, and from this number and the performance parameter of the operation task on each terminal it can determine the stacked performance parameter of the terminals, so that the task scheduling policy is generated from the stacked terminal performance parameter and the performance parameter corresponding to the server. Since the stacked terminals offer higher performance, multiple terminals can be fully utilized to distribute the operation tasks more reasonably, increasing the proportion of tasks processed by the terminals and further improving operation efficiency.
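A small sketch under the assumption, not stated in the patent, that terminal performance parameters stack linearly:

```python
def stacked_terminal_performance(n_terminals: int, p_single: float) -> float:
    # Assumed linear superposition of per-terminal performance parameters.
    return n_terminals * p_single

# Illustrative: four 0.5 TFLOPS terminals against a 1.5 TFLOPS server share.
if stacked_terminal_performance(4, 0.5e12) > 1.5e12:
    print("schedule a larger share of the operation tasks onto the terminals")
```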
In one embodiment, S210 further includes the steps of: generating a task scheduling policy according to the performance parameters, the load information corresponding to the terminal, and the load information corresponding to the server. Specifically, the task scheduling device acquires the performance parameters on one hand and, on the other hand, the computing load information of the terminal and the server while they process operation tasks. If the computing load of the server is greater than a preset value while the terminal still has spare computing capacity, the task scheduling device allocates some of the operation tasks beyond point m not to the server but to the terminal, that is, the terminal cooperates in processing them even though it is in a compute-bound state. It can be understood that, in order to reduce the computing load of the server, some operation tasks that saturate the terminal's computing power may still be allocated to the terminal for processing; through such reasonable scheduling, both portions of computing resources are fully utilized and task scheduling becomes more flexible.
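A minimal sketch of this load-aware override; the threshold value and the function signature are assumptions:

```python
def load_aware_target(roofline_target: str, server_load: float,
                      terminal_load: float, threshold: float = 0.9) -> str:
    # If the server is past the preset load threshold while the terminal still
    # has headroom, spill the task to the terminal even when the roofline
    # comparison favored the server (the terminal then runs compute-bound).
    if roofline_target == "server" and server_load > threshold and terminal_load < threshold:
        return "terminal"
    return roofline_target

print(load_aware_target("server", server_load=0.95, terminal_load=0.4))  # terminal
```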
Further, in one embodiment, multiple terminals are set up as a cluster to cooperatively process the operation tasks allocated to the terminal side. Specifically, after the current terminal has been allocated an operation task, its state is checked; if the current terminal has reached its computing power bottleneck, that is, it cannot provide any further computing capability, the next terminal is selected to receive the allocated operation task, so that multiple terminals process the operation tasks together.
In this embodiment, compared with a single terminal, the terminal cluster can process more operation tasks by distributing them, offering better performance and further improving operation efficiency.
It should be understood that although the steps in the flowcharts of figs. 2-9 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-9 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, a task scheduler is provided that includes an instruction generation circuit and a communication circuit. The instruction generation circuit is configured to execute the task scheduling method of the above embodiments, generate a control instruction according to the task scheduling policy, and then control the communication circuit to distribute the operation tasks according to the control instruction, so as to schedule the operation tasks among the plurality of computing platforms.
The instruction generation circuit comprises the minimal integration of circuits capable of generating computer instructions and may be integrated on the processing chip of a computer device. Similarly, the communication circuit comprises the minimal integration of circuits realizing the communication function and can be packaged in a communication product to form a communication module with independent functions.
Specifically, in the task scheduler, the instruction generation circuit acquires the index parameters of the plurality of computing platforms, such as computing power and bandwidth, and the index parameters of the plurality of operation tasks, such as computation amount and memory access amount; based on the execution steps of the method above, it obtains the corresponding task scheduling policy and then generates a control instruction according to that policy. After the control instruction is parsed, it can control the communication circuit to distribute the operation tasks to the computing platforms, achieving the purpose of scheduling different operation tasks among the plurality of computing platforms.
In one embodiment, a task processing system is also provided; the task processing system comprises a terminal and a server. Optionally, the task scheduler described above may be provided on the terminal side or on the server side. Correspondingly, the control instruction includes a terminal control instruction and a server control instruction, where the terminal control instruction controls the actions of the terminal during operation and the server control instruction controls the actions of the server during operation.
Illustratively, the task scheduler is provided on the terminal side. In the terminal, the instruction generation circuit in the task scheduler generates a data communication instruction, a terminal control instruction, and a server control instruction according to the task scheduling policy. The data communication instruction realizes the data interaction between the server and the terminal through the communication circuit; the server control instruction controls the arithmetic unit and the memory in the server to complete the computation and storage of the operation tasks allocated to the server; and the terminal control instruction controls the arithmetic unit and the memory in the terminal to complete the computation and storage of the operation tasks allocated to the terminal. When all operations are completed, the final operation result is returned to the terminal for display to the user.
In one embodiment, the server may be an independent server, or may be a server cluster composed of a plurality of servers, such as a cloud server.
In an embodiment, a board card is further provided, where the board card includes an artificial intelligence processor, and the artificial intelligence processor includes a task scheduler configured to execute the task scheduling method of any of the foregoing embodiments.
In one embodiment, a main board is also provided, comprising: a general-purpose processor and the above board card.
In one embodiment, an electronic device is also provided, and the electronic device comprises the above main board.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (14)
1. A method for task scheduling, the method comprising:
acquiring computing power and bandwidth corresponding to a plurality of computing platforms;
determining a performance evaluation strategy corresponding to each computing platform according to the computing power and the bandwidth;
acquiring the computation amount and the memory access amount of the operation task contained in the machine learning model;
determining a performance parameter corresponding to the operation task when the operation task runs in each computing platform according to the calculated amount and the access amount and based on a performance evaluation strategy corresponding to each computing platform;
generating a task scheduling strategy according to the performance parameters;
and scheduling the operation tasks in the plurality of computing platforms based on the task scheduling strategy.
2. The method of claim 1, wherein determining the performance assessment policy for each computing platform based on the computing power and the bandwidth comprises:
determining a performance evaluation function corresponding to each computing platform according to the computing power and the bandwidth; or,
and determining a performance evaluation function curve corresponding to each computing platform according to the computing power and the bandwidth.
3. The method of claim 1, wherein determining the performance assessment policy for each computing platform based on the computing power and the bandwidth comprises:
determining the maximum computation intensity that each computing platform can provide according to the computing power and the bandwidth;
and determining a performance evaluation strategy corresponding to each computing platform according to the maximum computation intensity that each computing platform can provide, the computing power and the bandwidth.
4. The method of claim 1, wherein determining, according to the calculation amount and the memory access amount and based on the performance evaluation strategy corresponding to each computing platform, the performance parameter corresponding to the operation task when the operation task runs on each computing platform comprises:
determining the computing intensity corresponding to the operation task according to the calculation amount and the memory access amount;
and determining the performance parameter corresponding to the operation task when the operation task runs on each computing platform according to the computing intensity corresponding to the operation task and the performance evaluation strategy corresponding to each computing platform.
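Putting claim 4 together: the computing intensity is the ratio of calculation amount to memory access amount, and the evaluation strategy maps that intensity to an attainable ops/s figure. A self-contained sketch under the same roofline assumption as above:

```python
def computing_intensity(ops: float, bytes_accessed: float) -> float:
    """Claim 4, step 1: ratio of calculation amount to memory access amount."""
    return ops / bytes_accessed

def performance_parameter(ops: float, bytes_accessed: float,
                          peak_ops: float, bandwidth: float) -> float:
    """Claim 4, step 2: evaluate the intensity against the platform's
    roofline to obtain attainable operations per second."""
    return min(peak_ops, bandwidth * computing_intensity(ops, bytes_accessed))
```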
5. The method of any one of claims 1-4, wherein the performance parameters include the number of operations per second that can be reached when the operation task runs on the computing platform.
6. The method of claim 5, wherein the plurality of computing platforms comprise a first computing platform and a second computing platform, the first computing platform has the same bandwidth as the second computing platform, and the computing power of the first computing platform is greater than that of the second computing platform;
and wherein generating the task scheduling strategy according to the performance parameters comprises:
if the number of operations per second that can be reached when the operation task runs on the first computing platform is greater than the number of operations per second that can be reached when the operation task runs on the second computing platform, allocating the operation task to the first computing platform;
and if the number of operations per second that can be reached when the operation task runs on the first computing platform is the same as the number of operations per second that can be reached when the operation task runs on the second computing platform, allocating the operation task to the second computing platform.
7. The method of claim 5, wherein the plurality of computing platforms comprise a first computing platform and a second computing platform, the bandwidth of the first computing platform is greater than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than that of the second computing platform;
and wherein generating the task scheduling strategy according to the performance parameters comprises:
if the computing intensity corresponding to the operation task is less than or equal to the maximum computing intensity that the second computing platform can provide, allocating the operation task to the second computing platform, wherein the computing intensity corresponding to the operation task represents the ratio of the calculation amount to the memory access amount, and the maximum computing intensity that the second computing platform can provide represents the ratio of the computing power of the second computing platform to its bandwidth;
and if the computing intensity corresponding to the operation task is greater than the maximum computing intensity that the second computing platform can provide, allocating the operation task to the first computing platform.
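A direct transcription of claim 7's allocation rule, assuming operations and bytes as the units (the claim itself does not fix them):

```python
def allocate_claim7(task_ops: float, task_bytes: float,
                    second_peak_ops: float, second_bandwidth: float) -> str:
    """Claim 7: compare the task's computing intensity (ops/byte) with the
    second platform's maximum computing intensity (peak_ops / bandwidth)."""
    intensity = task_ops / task_bytes
    ridge_second = second_peak_ops / second_bandwidth
    return "second platform" if intensity <= ridge_second else "first platform"
```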
8. The method of claim 5, wherein the plurality of computing platforms comprise a first computing platform and a second computing platform, the bandwidth of the first computing platform is less than the bandwidth of the second computing platform, and the computing power of the first computing platform is greater than that of the second computing platform;
and wherein generating the task scheduling strategy according to the performance parameters comprises:
if the number of operations per second that can be reached when the operation task runs on the first computing platform is less than or equal to the number of operations per second that can be reached when the operation task runs on the second computing platform, allocating the operation task to the second computing platform;
and if the number of operations per second that can be reached when the operation task runs on the first computing platform is greater than the number of operations per second that can be reached when the operation task runs on the second computing platform, allocating the operation task to the first computing platform.
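Claims 6 and 8 share one shape: allocate to the first platform only when it reaches strictly more operations per second, and let ties go to the second platform (the terminal, per claim 10). A sketch under the roofline assumption used above:

```python
def allocate_by_ops_per_sec(task_ops: float, task_bytes: float,
                            first_peak: float, first_bw: float,
                            second_peak: float, second_bw: float) -> str:
    """Claims 6 and 8: the first platform wins only with strictly higher
    attainable operations per second; ties go to the second platform."""
    intensity = task_ops / task_bytes
    perf_first = min(first_peak, first_bw * intensity)
    perf_second = min(second_peak, second_bw * intensity)
    return "first platform" if perf_first > perf_second else "second platform"
```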
9. The method of claim 1, wherein the operation task includes an operation task corresponding to a machine learning model, an operation task corresponding to each network layer in a machine learning model, or an operation task corresponding to a neural network operator.
10. The method of any one of claims 6-8, wherein the first computing platform is a server and the second computing platform is a terminal.
11. The method of claim 10, further comprising: acquiring the number of terminals;
wherein generating the task scheduling strategy according to the performance parameters comprises:
generating the task scheduling strategy according to the performance parameters and the number of terminals.
12. The method of claim 10, further comprising: acquiring load information corresponding to the terminal and load information corresponding to the server;
wherein generating the task scheduling strategy according to the performance parameters comprises:
generating the task scheduling strategy according to the performance parameters, the load information corresponding to the terminal, and the load information corresponding to the server.
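Claims 11 and 12 fold two further signals into the strategy: how many terminals share the work, and each side's current load. The claims do not fix a formula, so the load discounting below is purely an assumption for illustration:

```python
def generate_strategy(perf_terminal: float, perf_server: float,
                      num_terminals: int,
                      load_terminal: float, load_server: float) -> str:
    """Hypothetical combination of the performance parameters (claims 11-12):
    scale terminal throughput by the number of terminals and discount both
    sides by their current load (0.0 = idle, 1.0 = saturated)."""
    effective_terminal = perf_terminal * num_terminals * (1.0 - load_terminal)
    effective_server = perf_server * (1.0 - load_server)
    return "terminals" if effective_terminal >= effective_server else "server"
```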
13. A task scheduler, comprising an instruction generation circuit and a communication circuit;
wherein the instruction generation circuit is configured to execute the task scheduling method of any one of claims 1 to 12, generate control instructions according to the task scheduling strategy, and control the communication circuit to distribute the operation tasks according to the control instructions, so as to schedule the operation tasks among the plurality of computing platforms.
14. A task processing system comprising a terminal and a server, wherein the terminal comprises the task scheduler of claim 13, and the control instructions comprise terminal control instructions and server control instructions;
the terminal is configured to compute the operation tasks allocated to it according to the terminal control instructions, and the server is configured to compute the operation tasks allocated to it according to the server control instructions transmitted by the communication circuit in the terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911359640.0A CN113032113B (en) | 2019-12-25 | 2019-12-25 | Task scheduling method and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911359640.0A CN113032113B (en) | 2019-12-25 | 2019-12-25 | Task scheduling method and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113032113A true CN113032113A (en) | 2021-06-25 |
CN113032113B CN113032113B (en) | 2024-06-18 |
Family
ID=76458696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911359640.0A Active CN113032113B (en) | 2019-12-25 | 2019-12-25 | Task scheduling method and related product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113032113B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113500596A (en) * | 2021-07-07 | 2021-10-15 | 上海建工七建集团有限公司 | Fire operation auxiliary robot system and monitoring method thereof |
WO2023159568A1 (en) * | 2022-02-28 | 2023-08-31 | 华为技术有限公司 | Task scheduling method, npu, chip, electronic device and readable medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6463457B1 (en) * | 1999-08-26 | 2002-10-08 | Parabon Computation, Inc. | System and method for the establishment and the utilization of networked idle computational processing power |
US20080022282A1 (en) * | 2005-05-19 | 2008-01-24 | Ludmila Cherkasova | System and method for evaluating performance of a workload manager |
CN102426544A (en) * | 2011-11-04 | 2012-04-25 | 浪潮(北京)电子信息产业有限公司 | Task allocating method and system |
JP2013182502A (en) * | 2012-03-02 | 2013-09-12 | Nec Corp | Resource allocation system, resource allocation method, and resource allocation program |
CN105718479A (en) * | 2014-12-04 | 2016-06-29 | 中国电信股份有限公司 | Execution strategy generation method and device under cross-IDC (Internet Data Center) big data processing architecture |
CN107045456A (en) * | 2016-02-05 | 2017-08-15 | 华为技术有限公司 | A kind of resource allocation methods and explorer |
US20190171487A1 (en) * | 2017-10-27 | 2019-06-06 | EMC IP Holding Company LLC | Method, device, and computer readable medium for managing dedicated processing resources |
US20190384646A1 (en) * | 2018-06-15 | 2019-12-19 | EMC IP Holding Company LLC | Method, apparatus, and computer program product for processing computing task |
Non-Patent Citations (2)
Title |
---|
YI TAN et al.: "Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model", Turkish Journal of Electrical Engineering & Computer Sciences, vol. 26, no. 2, pp. 919-935 *
CAO Qucheng et al.: "Layered strategy for deep neural networks oriented to embedded devices", Journal of Chinese Computer Systems, no. 7, pp. 1455-1461 *
Also Published As
Publication number | Publication date |
---|---|
CN113032113B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109857546B (en) | Multi-server mobile edge computing unloading method and device based on Lyapunov optimization | |
CN113950066B (en) | Single server part calculation unloading method, system and equipment under mobile edge environment | |
Wang et al. | Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning | |
CN109993299B (en) | Data training method and device, storage medium and electronic device | |
CN109829332B (en) | Joint calculation unloading method and device based on energy collection technology | |
CN113268341B (en) | Distribution method, device, equipment and storage medium of power grid edge calculation task | |
CN111966484A (en) | Cluster resource management and task scheduling method and system based on deep reinforcement learning | |
CN111679904B (en) | Task scheduling method and device based on edge computing network | |
CN110662238A (en) | Reinforced learning scheduling method and device for burst request under edge network | |
CN110968366B (en) | Task unloading method, device and equipment based on limited MEC resources | |
KR20200111948A (en) | A method for processing artificial neural network and electronic device therefor | |
CN112200300A (en) | Convolutional neural network operation method and device | |
CN113032113B (en) | Task scheduling method and related product | |
CN112581578A (en) | Cloud rendering system based on software definition | |
CN111859775A (en) | Software and hardware co-design for accelerating deep learning inference | |
CN111506434A (en) | Task processing method and device and computer readable storage medium | |
CN109598250A (en) | Feature extracting method, device, electronic equipment and computer-readable medium | |
CN108920274A (en) | Performance optimization and device for image processing server end | |
CN112329919B (en) | Model training method and device | |
CN110197274B (en) | Integrated circuit chip device and related product | |
CN116737385A (en) | Rendering control method, device and rendering system | |
TWI787430B (en) | Integrated circuit chip apparatus, chip, electronic device, and computing method of neural network | |
CN112153147A (en) | Method for placing chained service entities based on entity sharing in mobile edge environment | |
CN110197265B (en) | Integrated circuit chip device and related product | |
CN111767997B (en) | Integrated circuit chip device and related products |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||