CN117762583A - Task scheduling method, device, electronic equipment and storage medium

Task scheduling method, device, electronic equipment and storage medium

Info

Publication number
CN117762583A
CN117762583A
Authority
CN
China
Prior art keywords
target
task
state
processors
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311684555.8A
Other languages
Chinese (zh)
Inventor
翟东奇
窦则胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311684555.8A
Publication of CN117762583A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to the field of computer technology, and in particular to a task scheduling method, a task scheduling device, an electronic device and a storage medium. The scheme is implemented as follows: before the program corresponding to a plurality of target tasks runs, the target tasks are allocated to the corresponding processors; during program execution, the priority of each target task within each processor is determined according to the running state of each processor; and high-priority target tasks are dynamically scheduled among the processors according to the running state of each processor. The method and device effectively improve the performance and computing efficiency of a multiprocessor system with low scheduling overhead, allocate tasks flexibly, and adapt well to changes in the runtime environment.

Description

Task scheduling method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular to a task scheduling method, a task scheduling device, an electronic device and a storage medium.
Background
In a hardware system with multiple GPUs (Graphics Processing Units), task allocation and scheduling face many challenges. In a multi-GPU environment, tasks must be assigned to individual GPUs; the difficulty lies in distributing tasks reasonably according to the characteristics of both the tasks and the GPUs, and in designing an efficient scheduling strategy so that all tasks are executed efficiently in the expected order.
In the prior art, either static scheduling or dynamic scheduling is used on its own. The static scheduling method allocates tasks at compile time, dividing a task into several subtasks and distributing them across multiple GPUs; used alone, static scheduling has the advantage of low scheduling overhead, but task allocation is inflexible and cannot adapt to changes in the runtime environment. The dynamic scheduling method allocates tasks at run time, distributing them across multiple GPUs in response to changes in the runtime environment; used alone, dynamic scheduling allocates tasks flexibly and adapts to environmental changes, but incurs larger scheduling overhead.
Disclosure of Invention
The disclosure provides a task scheduling method, a task scheduling device, electronic equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a task scheduling method applied to a task processing system including a plurality of processors, the method comprising the following steps:
before running programs corresponding to the target tasks, distributing the target tasks to the corresponding processors;
determining the priority of each target task in each processor according to the running state of each processor in the running process of the program;
and dynamically scheduling the target task with high priority among a plurality of processors according to the running state of each processor.
According to a second aspect of the present disclosure, there is provided a task scheduling device applied to a task processing system including a plurality of processors, the device comprising:
the static allocation module is configured to allocate a plurality of target tasks to the corresponding processors before the programs corresponding to the target tasks run;
a priority determining module configured to determine, during the program running, a priority of each of the target tasks in each of the processors according to the running state of each of the processors;
and the dynamic scheduling module is configured to dynamically schedule the target task with high priority among a plurality of processors according to the running state of each processor.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of the above-mentioned technical solutions.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above-mentioned technical solutions.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above-mentioned technical solutions.
According to a sixth aspect of the present disclosure, there is provided an autopilot system comprising an electronic device as described in the above-mentioned technical solution.
The method and the device improve the performance and computing efficiency of the multiprocessor system with small scheduling overhead, allocate tasks flexibly, and adapt better to changes in the runtime environment.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic step diagram of a task scheduling method in an embodiment of the present disclosure;
FIG. 2 is a block diagram of a state prediction model for predicting the operating state of a GPU in an embodiment of the present disclosure;
FIG. 3 is a functional block diagram of a task scheduling device in an embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device used to implement a task scheduling method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The existing static scheduling method allocates tasks at compile time, dividing a task into several subtasks and distributing them across multiple GPUs; used alone, static scheduling has the advantage of low scheduling overhead, but task allocation is inflexible and cannot adapt to changes in the runtime environment. The existing dynamic scheduling method allocates tasks at run time, distributing them across multiple GPUs in response to changes in the runtime environment; used alone, dynamic scheduling allocates tasks flexibly and adapts to environmental changes, but incurs larger scheduling overhead.
To address the technical problems in the prior art caused by using static scheduling or dynamic scheduling alone, the disclosure provides a task scheduling method applied to a task processing system, where the task processing system includes a plurality of processors; as shown in fig. 1, the method includes:
step S101, before the program corresponding to the target tasks is run, the target tasks are distributed to the corresponding processors;
step S102, determining the priority of each target task in each processor according to the running state of each processor in the running process of the program;
step S103, according to the running state of each processor, dynamically scheduling the target task with high priority among a plurality of processors.
Specifically, the present disclosure adopts a hybrid scheduling method that combines static allocation and dynamic scheduling. Taking a multi-GPU task processing system as an example, in step S101 the target tasks are allocated to the GPUs before program execution, i.e., statically. Static allocation partitions the entire available GPU memory into memory blocks of equal size, each of which serves as a data storage unit holding different types of data while the program runs. Because the memory is allocated in advance at compile time, no additional memory allocation is needed at run time, which improves the running speed of the program; however, memory allocated in this way cannot change size dynamically, so memory may be consumed unreasonably.
In step S102, after static allocation, each GPU may have been assigned multiple tasks, so the tasks within a single GPU are further prioritized according to the system operating conditions; step S102 thus dynamically schedules the tasks within a single GPU. In step S103, after the tasks within each GPU have been prioritized, each target task can be processed according to its priority. During execution, high-priority tasks with strong real-time requirements are scheduled across cards. For example, suppose task A was statically allocated to the N-th GPU; because task A has strong real-time requirements, and deploying it on the (N-1)-th GPU would reduce system latency, task A can be scheduled from the N-th GPU to the (N-1)-th GPU. Step S103 thus dynamically schedules tasks among multiple GPUs. Steps S102 and S103 are performed while the program runs, i.e., dynamically. Dynamic allocation effectively alleviates the problem that static allocation divides GPU memory into fixed sizes, adjusting and reallocating memory in time according to actual usage and thereby improving the program's utilization of the GPU. Dynamic scheduling further improves GPU utilization efficiency and real-time computing capacity. Not all tasks need to be dynamically scheduled: after all tasks are statically allocated, only some high-priority tasks are dynamically scheduled, so task allocation remains flexible and adapts to changes in the runtime environment while scheduling overhead stays small, combining the advantages of static and dynamic allocation.
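The overall flow can be sketched as follows. This is a minimal, illustrative Python sketch and not the implementation of the present disclosure; the Task structure, the round-robin static placement, and the least-loaded migration rule are assumptions standing in for steps S101 to S103.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    high_priority: bool = False
    done: bool = False

def static_allocate(tasks, num_gpus):
    # S101: fixed placement decided before the program runs (round-robin here).
    return {t.name: i % num_gpus for i, t in enumerate(tasks)}

def gpu_load(placement, gpu, tasks):
    return sum(1 for t in tasks if not t.done and placement[t.name] == gpu)

def hybrid_schedule(tasks, num_gpus):
    placement = static_allocate(tasks, num_gpus)
    while not all(t.done for t in tasks):
        # S102: within each GPU, run high-priority tasks before the others.
        for t in sorted(tasks, key=lambda t: (not t.high_priority, t.name)):
            if t.done:
                continue
            # S103: only high-priority tasks migrate across GPUs, here onto
            # the currently least-loaded GPU.
            if t.high_priority:
                placement[t.name] = min(range(num_gpus),
                                        key=lambda g: gpu_load(placement, g, tasks))
            t.done = True  # stand-in for actually launching the task on its GPU
    return placement

print(hybrid_schedule([Task("A", high_priority=True), Task("B"), Task("C")], 2))
```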
As an optional embodiment, step S101, before the program corresponding to the plurality of target tasks is executed, allocating the plurality of target tasks to the corresponding processors includes:
modeling a task processing system, and distributing a plurality of target tasks to corresponding processors by solving an objective function targeting the minimum system delay.
Specifically, in the static allocation process, an objective function targeting the minimum system delay is constructed and solved to obtain the GPU allocation scheme. The minimum system delay may be the minimum total delay for the task processing system to process all target tasks, or the minimum total delay for processing part of the target tasks; for example, the minimum total delay of several higher-priority target tasks can be used as the optimization target when solving the GPU allocation scheme. The optimization target of the objective function can be determined according to system requirements.
As an alternative embodiment, modeling a task processing system, assigning a plurality of target tasks to corresponding processors by solving an objective function that targets a minimum system latency includes:
measuring the processing times of the plurality of target tasks: assuming that M GPU tasks exist in a computing system, measuring their processing times gives

T = [t_1 … t_M]

where T denotes the processing times of the plurality of GPU tasks, t_1 denotes the processing time of the 1st GPU task, and t_M denotes the processing time of the M-th GPU task.
measuring the correlation coefficient matrix among the plurality of target tasks: the measurement gives the correlation coefficient matrix R among the GPU tasks as an M×M matrix whose entry r_ij is the correlation coefficient between the i-th and j-th GPU tasks; for example, r_11 denotes the correlation coefficient between the 1st task and itself, and r_1M denotes the correlation coefficient between the 1st task and the M-th task.
Assume that the allocation matrix X, which characterizes the correspondence between the plurality of target tasks and the processors, consists of assignment variables x_ij, where

x_ij ∈ {0,1} and Σ_i x_ij = 1,

i.e., each target task is assigned to exactly one GPU.
And constructing an objective function that takes the minimum system delay as the optimization target, the delay of each target task as a constraint condition, and the allocation matrix as the decision variable. Specifically, the objective function is determined according to the actual requirements of the system, for example minimizing the overall delay of the system, where w_j denotes the delay of the j-th task and is calculated from the processing time T and the correlation coefficient matrix R:

w_j = Σ_i r_ij t_j
and obtaining the values of the allocation matrix X by solving the objective function, which yields the static GPU allocation scheme. Solving this integer programming problem with an optimization method reduces the system delay as much as possible and improves GPU utilization efficiency.
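The static allocation step can be illustrated as follows. Because the objective-function image is not reproduced above, the sketch below makes an assumption: each task's delay is taken as its own processing time plus correlation-weighted overhead from the tasks placed on the same GPU, and the total delay is minimized by brute force over the allocation matrix X (x_ij ∈ {0,1}, one GPU per task). A real system would use an integer-programming solver instead of enumeration.

```python
from itertools import product

def total_delay(assign, T, R):
    # assign[j] = index of the GPU that task j is placed on
    M = len(T)
    total = 0.0
    for j in range(M):
        w_j = T[j] + sum(R[k][j] * T[k]
                         for k in range(M)
                         if k != j and assign[k] == assign[j])
        total += w_j
    return total

def static_allocate(T, R, num_gpus):
    # Brute-force search over all valid allocations (one GPU per task).
    return min(product(range(num_gpus), repeat=len(T)),
               key=lambda a: total_delay(a, T, R))

T = [3.0, 2.0, 4.0]                       # measured processing times t_1..t_M
R = [[1.0, 0.3, 0.1],                     # measured correlation coefficients r_ij
     [0.3, 1.0, 0.5],
     [0.1, 0.5, 1.0]]
print(static_allocate(T, R, num_gpus=2))  # -> (0, 1, 0): GPU index per task
```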
As an optional implementation manner, step S102, determining the priority of each target task in each processor includes:
determining, for each target task, a relation function in which the priority threshold is positively correlated with time; the priority threshold of each target task changes over time and is calculated through the relation function:

q = a·t + b

where q denotes the priority threshold, a denotes the time coefficient, t denotes time, and b denotes the initial weight. a and b are constants; different target tasks have different time coefficients and initial weights, so the real-time requirements of the target tasks can be met, with a higher initial weight set for target tasks with strong real-time requirements.
Determining the priority of each target task according to its priority threshold, wherein each priority threshold has a corresponding priority; here p denotes the priority of each target task and q denotes the threshold of each priority.
Specifically, given the GPU static allocation scheme determined in step S101, when multiple tasks remain on a single GPU, the priorities of those tasks within the single GPU need to be determined in order to implement dynamic scheduling within the single GPU.
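An illustrative computation of the time-varying priority threshold q = a·t + b for tasks sharing one GPU is given below. The exact threshold-to-priority mapping is shown only as a formula image above, so the sketch simply ranks tasks by q (a larger q meaning a higher priority); the per-task coefficients a and b are assumed values.

```python
def priority_threshold(a, b, t):
    # q = a*t + b : the threshold grows with elapsed time t
    return a * t + b

tasks = {
    # name: (time coefficient a, initial weight b); a larger b is assumed for
    # tasks with stronger real-time requirements, as described above.
    "perception": (0.5, 10.0),
    "logging":    (0.1, 1.0),
    "planning":   (0.4, 8.0),
}

t = 3.0  # time elapsed since the tasks became runnable on this GPU
ranked = sorted(tasks, key=lambda name: priority_threshold(*tasks[name], t),
                reverse=True)
print(ranked)  # ['perception', 'planning', 'logging'] -> descending priority
```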
As an optional implementation manner, step S103, dynamically scheduling the high-priority target task among the multiple processors according to the running states of the processors includes:
inputting the state information corresponding to each processor into a state prediction model, and outputting the state score of each processor at the next moment through the state prediction model; the state prediction model is obtained by training in advance based on a neural network model;
and scheduling the target task with high priority according to the state scores of the processors. For example, a high-priority target task is rescheduled from the processor to which it was previously statically allocated to a processor with a higher state score.
Specifically, inputting the state information corresponding to each processor into a state prediction model, and outputting the state score of each processor at the next moment through the state prediction model includes:
acquiring the state information corresponding to each processor at the current moment; the types of state information include, but are not limited to, one or more of memory capacity usage, operating temperature, operating frequency, and number of tasks, and the state information of the i-th GPU is noted as S_i = [s_1, s_2, …, s_4], where s_1, s_2, …, s_4 denote the different types of state information.
inputting the state information of all the GPUs at moment t into the state prediction model for feature extraction, obtaining the state features of all the GPUs at moment t:

X_t = [S_1^T … S_N^T]^T
The state prediction model outputs the state features of each processor at the next moment according to the input state features of each processor; the input to the state prediction model consists of all the state features from moment t−N to moment t, expressed as

X = [X_{t−N} … X_t]
The state prediction model outputs the state features of each GPU at the next moment according to the input features, and converts them into corresponding state scores according to the mapping relation between state features and state scores:

Q_i = g(S_i)

where S_i denotes the state information of the i-th GPU and Q_i denotes the state score corresponding to the i-th GPU.
According to the state scores Q_i, the GPU scheduling decision is i* = argmax_i Q_i, i.e., the GPU with the largest Q_i is selected as the GPU for processing the high-priority task.
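The multi-GPU dispatch decision i* = argmax_i Q_i can be sketched as follows. The scoring function g is assumed here to be a simple weighted sum over normalized state features; the disclosure only states that a mapping from state features to a state score exists.

```python
def state_score(s):
    # Assumed scoring g(S_i): reward clock frequency, penalize memory usage,
    # temperature and task count (all features pre-normalized to [0, 1]).
    mem_usage, temp, freq, num_tasks = s
    return 0.5 * freq - 1.0 * mem_usage - 0.3 * temp - 0.8 * num_tasks

def pick_gpu(predicted_states):
    # i* = argmax_i Q_i over the predicted next-moment states
    scores = [state_score(s) for s in predicted_states]
    return max(range(len(scores)), key=scores.__getitem__)

# Predicted next-moment state [memory usage, temperature, frequency, task count]
# for each GPU, as produced by the state prediction model.
predicted = [
    [0.9, 0.8, 0.6, 0.8],   # GPU 0: nearly full and hot
    [0.2, 0.4, 0.7, 0.2],   # GPU 1: lightly loaded
]
print(pick_gpu(predicted))  # -> 1: migrate the high-priority task to GPU 1
```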
In the present disclosure, a neural network model can be trained in advance on historical GPU state information to obtain the state prediction model: a one-dimensional convolutional network is set up to predict the GPU state at the next moment, further improving the efficiency of multi-GPU dynamic scheduling. The structure of the state prediction model is shown in fig. 2 and includes a plurality of convolution modules (convolutional blocks), where the convolution module 200 may include a one-dimensional convolution layer 201, a batch normalization layer 202, a linear rectification function (ReLU) 203, and a max-pooling layer 204; the network outputs the states Y = f(X).
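A sketch of the fig. 2 network is given below, written in PyTorch as an assumption (the disclosure does not name a framework): stacked convolution modules, each consisting of a one-dimensional convolution layer, a batch normalization layer, a ReLU and a max-pooling layer, mapping the recent state snapshots X = [X_{t−N} … X_t] to the predicted next-moment state features Y = f(X). The channel sizes, window length, and the linear output head are assumptions.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """One convolution module 200: Conv1d (201) -> BatchNorm (202) -> ReLU (203) -> MaxPool (204)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )

    def forward(self, x):
        return self.net(x)

class StatePredictor(nn.Module):
    """Maps a window of past state features to the predicted next-moment features."""
    def __init__(self, feat_dim, hidden=64, window=8):
        super().__init__()
        self.blocks = nn.Sequential(
            ConvBlock(feat_dim, hidden),
            ConvBlock(hidden, hidden),
        )
        # Two pooling layers shrink the window by 4x before the output head.
        self.head = nn.Linear(hidden * (window // 4), feat_dim)

    def forward(self, x):                  # x: (batch, feat_dim, window)
        h = self.blocks(x).flatten(1)
        return self.head(h)                # predicted next-moment state features

# feat_dim = number of GPUs x state types per GPU; window = N + 1 past snapshots.
model = StatePredictor(feat_dim=4 * 4, window=8)
y = model(torch.randn(2, 16, 8))
print(y.shape)                             # torch.Size([2, 16])
```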
For a real-time multi-GPU system, the static GPU scheduling mode reduces the task scheduling cost among GPUs (scheduling computation, resource application, release and communication costs), but because the state of the system changes constantly, static scheduling alone can only guarantee a relative overall optimum. In view of this, rescheduling a small portion of the GPU tasks among the GPUs effectively balances the load of each GPU, improves the computing efficiency of high-priority tasks, and further improves system performance.
The present disclosure provides a task scheduling device 300 applied to a task processing system, the task processing system including a plurality of processors; as shown in fig. 3, includes:
a static allocation module 301 configured to allocate a plurality of target tasks to corresponding processors before a program corresponding to the plurality of target tasks is run;
a priority determining module 302, configured to determine, during the running process of the program, a priority of each target task in each processor according to the running state of each processor;
the dynamic scheduling module 303 is configured to dynamically schedule the high-priority target task among the multiple processors according to the running states of the processors.
Specifically, the present disclosure adopts a hybrid scheduling method that combines static allocation and dynamic allocation. Taking a multi-GPU task processing system as an example, the static allocation module 301 allocates the target tasks to the GPUs before program execution, i.e., statically. Static allocation partitions the entire available GPU memory into memory blocks of equal size, each of which serves as a data storage unit holding different types of data while the program runs. Because the memory is allocated in advance at compile time, no additional memory allocation is needed at run time, which improves the running speed of the program; however, memory allocated in this way cannot change size dynamically, so memory may be consumed unreasonably.
After static allocation, each GPU may have been assigned multiple tasks, so the priority determining module 302 further prioritizes the tasks within a single GPU according to the system operating conditions; the priority determining module 302 thus dynamically schedules the tasks within a single GPU. After the tasks within each GPU have been prioritized, each target task can be processed according to its priority. During execution, the dynamic scheduling module 303 schedules high-priority tasks with strong real-time requirements across cards. For example, suppose task A was statically allocated to the N-th GPU; because task A has strong real-time requirements, and deploying it on the (N-1)-th GPU would reduce system latency, task A can be scheduled from the N-th GPU to the (N-1)-th GPU. The dynamic scheduling module 303 thus dynamically schedules tasks among multiple GPUs. The priority determining module 302 and the dynamic scheduling module 303 allocate the target tasks dynamically while the program runs. Dynamic allocation effectively alleviates the problem that static allocation divides GPU memory into fixed sizes, adjusting and reallocating memory in time according to actual usage and thereby improving the program's utilization of the GPU. Dynamic scheduling further improves GPU utilization efficiency and real-time computing capacity. Not all tasks need to be dynamically scheduled: after all tasks are statically allocated, only some high-priority tasks are dynamically scheduled, so task allocation remains flexible and adapts to changes in the runtime environment while scheduling overhead stays small, combining the advantages of static and dynamic allocation.
As an alternative embodiment, the static allocation module 301 allocates the plurality of target tasks to the corresponding processors before the program corresponding to the plurality of target tasks runs, including:
modeling a task processing system, and distributing a plurality of target tasks to corresponding processors by solving an objective function targeting the minimum system delay.
Specifically, in the static allocation process, an objective function targeting the minimum system delay is constructed and solved to obtain the GPU allocation scheme. The minimum system delay may be the minimum total delay for the task processing system to process all target tasks, or the minimum total delay for processing part of the target tasks; for example, the minimum total delay of several higher-priority target tasks can be used as the optimization target when solving the GPU allocation scheme. The optimization target of the objective function can be determined according to system requirements.
As an alternative embodiment, the static allocation module 301 models a task processing system, and allocating a plurality of target tasks to corresponding processors by solving an objective function targeting a minimum system latency includes:
measuring the processing times of the plurality of target tasks: assuming that M GPU tasks exist in a computing system, measuring their processing times gives

T = [t_1 … t_M]

where T denotes the processing times of the plurality of GPU tasks, t_1 denotes the processing time of the 1st GPU task, and t_M denotes the processing time of the M-th GPU task.
measuring the correlation coefficient matrix among the plurality of target tasks: the measurement gives the correlation coefficient matrix R among the GPU tasks as an M×M matrix whose entry r_ij is the correlation coefficient between the i-th and j-th GPU tasks; for example, r_11 denotes the correlation coefficient between the 1st task and itself, and r_1M denotes the correlation coefficient between the 1st task and the M-th task.
Assume that the allocation matrix X, which characterizes the correspondence between the plurality of target tasks and the processors, consists of assignment variables x_ij, where

x_ij ∈ {0,1} and Σ_i x_ij = 1,

i.e., each target task is assigned to exactly one GPU.
And constructing an objective function that takes the minimum system delay as the optimization target, the delay of each target task as a constraint condition, and the allocation matrix as the decision variable. Specifically, the objective function is determined according to the actual requirements of the system, for example minimizing the overall delay of the system, where w_j denotes the delay of the j-th task and is calculated from the processing time T and the correlation coefficient matrix R:

w_j = Σ_i r_ij t_j
and obtaining the values of the allocation matrix X by solving the objective function, which yields the static GPU allocation scheme. Solving this integer programming problem with an optimization method reduces the system delay as much as possible and improves GPU utilization efficiency.
As an alternative embodiment, the priority determination module 302 determines the priorities of the target tasks within each processor includes:
determining, for each target task, a relation function in which the priority threshold is positively correlated with time; the priority threshold of each target task changes over time and is calculated through the relation function:

q = a·t + b

where q denotes the priority threshold, a denotes the time coefficient, t denotes time, and b denotes the initial weight. a and b are constants; different target tasks have different time coefficients and initial weights, so the real-time requirements of the target tasks can be met, with a higher initial weight set for target tasks with strong real-time requirements.
Determining the priority of each target task according to its priority threshold, wherein each priority threshold has a corresponding priority; here p denotes the priority of each target task and q denotes the threshold of each priority.
Specifically, given the GPU static allocation scheme determined in step S101, when multiple tasks remain on a single GPU, the priorities of those tasks within the single GPU need to be determined in order to implement dynamic scheduling within the single GPU.
As an alternative embodiment, the dynamic scheduling module 303 includes:
a state prediction unit configured to input state information corresponding to each processor into a state prediction model, and output a state score of each processor at a next moment through the state prediction model; the state prediction model is obtained by training in advance based on a neural network model;
and the scheduling unit is configured to schedule the target task with high priority according to the state scores of the processors. For example, a high-priority target task is rescheduled from the processor to which it was previously statically allocated to a processor with a higher state score.
Specifically, the state prediction unit includes:
the obtaining subunit is configured to obtain the state information corresponding to each processor at the current moment; the types of state information include, but are not limited to, one or more of memory capacity usage, operating temperature, operating frequency, and number of tasks, and the state information of the i-th GPU is noted as S_i = [s_1, s_2, …, s_4], where s_1, s_2, …, s_4 denote the different types of state information.
the feature extraction subunit is configured to input the state information of all the GPUs at moment t into the state prediction model for feature extraction, obtaining the state features of all the GPUs at moment t:

X_t = [S_1^T … S_N^T]^T
The state prediction model outputs the state features of each processor at the next moment according to the input state features of each processor; the input to the state prediction model consists of all the state features from moment t−N to moment t, expressed as

X = [X_{t−N} … X_t]
the output subunit is configured so that the state prediction model outputs the state features of each GPU at the next moment according to the input features and converts them into corresponding state scores according to the mapping relation between state features and state scores:

Q_i = g(S_i)

where S_i denotes the state information of the i-th GPU and Q_i denotes the state score corresponding to the i-th GPU.
According to the state scores Q_i, the GPU scheduling decision is i* = argmax_i Q_i, i.e., the GPU with the largest Q_i is selected as the GPU for processing the high-priority task.
In the present disclosure, a neural network model can be trained in advance on historical GPU state information to obtain the state prediction model: a one-dimensional convolutional network is set up to predict the GPU state at the next moment, further improving the efficiency of multi-GPU dynamic scheduling. The structure of the state prediction model is shown in fig. 2 and includes a plurality of convolution modules (convolutional blocks), where the convolution module 200 may include a one-dimensional convolution layer 201, a batch normalization layer 202, a linear rectification function (ReLU) 203, and a max-pooling layer 204; the network outputs the states Y = f(X).
For a real-time multi-GPU system, the static GPU scheduling mode reduces the task scheduling cost among GPUs (scheduling computation, resource application, release and communication costs), but because the state of the system changes constantly, static scheduling alone can only guarantee a relative overall optimum. In view of this, rescheduling a small portion of the GPU tasks among the GPUs effectively balances the load of each GPU, improves the computing efficiency of high-priority tasks, and further improves system performance.
The disclosure also provides an autopilot system comprising an electronic device for executing the task scheduling method of any one of the above embodiments. The electronic device can execute the hybrid scheduling method of the disclosure, which combines static allocation and dynamic scheduling, improving GPU utilization efficiency and real-time computing capability; task allocation is flexible and adapts to changes in the runtime environment while scheduling overhead remains small, combining the advantages of static and dynamic allocation. This improves the safety and stability of the autopilot system and reduces its scheduling cost.
In the technical solutions of the disclosure, the acquisition, storage and application of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 may also be stored. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, such as the task scheduling method. For example, in some embodiments, the task scheduling method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the task scheduling method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the task scheduling method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (22)

1. A task scheduling method is applied to a task processing system, wherein the task processing system comprises a plurality of processors; comprising the following steps:
before running programs corresponding to the target tasks, distributing the target tasks to the corresponding processors;
determining the priority of each target task in each processor according to the running state of each processor in the running process of the program;
and dynamically scheduling the target task with high priority among a plurality of processors according to the running state of each processor.
2. The method of claim 1, wherein said assigning a plurality of target tasks to corresponding ones of said processors prior to execution of programs corresponding to said plurality of target tasks comprises:
modeling the task processing system, and distributing a plurality of target tasks to the corresponding processors by solving an objective function with the minimum system delay as a target.
3. The method of claim 2, wherein modeling the task processing system, assigning a plurality of the target tasks to the corresponding processors by solving an objective function that targets a system minimum latency comprises:
constructing an objective function which takes the minimum time delay of a system as an optimization target, takes the time delay of each objective task as a constraint condition and takes an allocation matrix as a decision variable; the distribution matrix characterizes the corresponding relation between a plurality of target tasks and a plurality of processors;
and obtaining the numerical value corresponding to the distribution matrix by solving the objective function.
4. A method according to claim 3, wherein calculating the latency of each of the target tasks comprises:
acquiring processing time of a plurality of target tasks;
acquiring a correlation coefficient matrix among a plurality of target tasks;
and calculating the time delay of each target task according to the processing time and the correlation coefficient matrix.
5. A method according to claim 2 or 3, wherein the system minimum latency comprises a minimum total latency of the task processing system processing part of the target task; wherein, part of the target tasks are determined according to the priority of each target task.
6. The method of claim 1, wherein said determining the priority of each of said target tasks within each of said processors comprises:
determining a relation function positively correlated with the time and a priority threshold of each target task;
and determining the priority of each target task according to the priority threshold of each target task.
7. The method of any of claims 1-6, wherein said dynamically scheduling said target task of high priority among a plurality of said processors according to an operating state of each of said processors comprises:
inputting the state information corresponding to each processor into a state prediction model, and outputting the state score of each processor at the next moment through the state prediction model; the state prediction model is obtained by training in advance based on a neural network model;
and scheduling the target task with high priority according to the state scores of the processors.
8. The method of claim 7, wherein said inputting the state information corresponding to each of the processors into a state prediction model, outputting the state score of each of the processors at a next time instant through the state prediction model comprises:
acquiring the state information corresponding to each processor at the current moment;
inputting the state information at the current moment into the state prediction model for feature extraction to obtain state features at the current moment;
the state prediction model outputs the state characteristics of each processor at the next moment according to the state characteristics of each processor at the current moment;
and converting the state characteristic mapping of the next moment into the corresponding state score.
9. The method of claim 7 or 8, wherein the status information comprises at least one of: memory capacity usage; an operating temperature; an operating frequency; number of tasks.
10. A task scheduling device applied to a task processing system, wherein the task processing system comprises a plurality of processors; comprising the following steps:
the static allocation module is configured to allocate a plurality of target tasks to the corresponding processors before the programs corresponding to the target tasks run;
a priority determining module configured to determine, during the program running, a priority of each of the target tasks in each of the processors according to the running state of each of the processors;
and the dynamic scheduling module is used for dynamically scheduling the target task with high priority among a plurality of processors according to the running state of each processor.
11. The apparatus of claim 10, wherein the static allocation module allocates a plurality of target tasks to the corresponding processors before execution of programs corresponding to the plurality of target tasks comprises:
modeling the task processing system, and distributing a plurality of target tasks to the corresponding processors by solving an objective function with the minimum system delay as a target.
12. The apparatus of claim 11, wherein the static allocation module models the task processing system and allocating a plurality of the target tasks to the corresponding processors by solving an objective function that targets a system minimum latency comprises:
constructing an objective function which takes the minimum time delay of a system as an optimization target, takes the time delay of each objective task as a constraint condition and takes an allocation matrix as a decision variable; the distribution matrix characterizes the corresponding relation between a plurality of target tasks and a plurality of processors;
and obtaining the numerical value corresponding to the distribution matrix by solving the objective function.
13. The apparatus of claim 12, wherein the static allocation module calculating a latency for each of the target tasks comprises:
acquiring processing time of a plurality of target tasks;
acquiring a correlation coefficient matrix among a plurality of target tasks;
and calculating the time delay of each target task according to the processing time and the correlation coefficient matrix.
14. The apparatus of claim 11 or 12, wherein the system minimum latency comprises a minimum total latency for the task processing system to process a portion of the target task; wherein, part of the target tasks are determined according to the priority of each target task.
15. The apparatus of claim 10, wherein the priority determination module determining the priority of the target tasks within each of the processors comprises:
determining a relation function positively correlated with the time and a priority threshold of each target task;
and determining the priority of each target task according to the priority threshold of each target task.
16. The apparatus of any of claims 10-15, wherein the dynamic scheduling module comprises:
a state prediction unit configured to input state information corresponding to each processor into a state prediction model, and output a state score of each processor at a next time through the state prediction model; the state prediction model is obtained by training in advance based on a neural network model;
and a scheduling unit configured to schedule the target task with high priority according to the state scores of the processors.
17. The apparatus of claim 16, wherein the state prediction unit comprises:
the obtaining subunit is configured to obtain the state information corresponding to each processor at the current moment;
the feature extraction subunit is configured to input the state information at the current moment into the state prediction model to perform feature extraction so as to obtain the state feature at the current moment;
the output subunit is configured to output the state characteristics of each processor at the next moment according to the state characteristics of the current moment corresponding to each processor by the state prediction model;
and converting the state characteristic mapping of the next moment into the corresponding state score.
18. The apparatus of claim 16 or 17, wherein the status information comprises at least one of: memory capacity usage; an operating temperature; an operating frequency; number of tasks.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
22. An autopilot system comprising the electronic device of claim 19.
CN202311684555.8A 2023-12-08 2023-12-08 Task scheduling method, device, electronic equipment and storage medium Pending CN117762583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311684555.8A CN117762583A (en) 2023-12-08 2023-12-08 Task scheduling method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117762583A 2024-03-26

Family

ID=90317198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311684555.8A Pending CN117762583A (en) 2023-12-08 2023-12-08 Task scheduling method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117762583A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination