CN113032116A - Training method of task time prediction model, task scheduling method and related device - Google Patents


Publication number
CN113032116A
Authority
CN
China
Prior art keywords
task
prediction model
time
time prediction
running
Prior art date
Legal status
Granted
Application number
CN202110247231.2A
Other languages
Chinese (zh)
Other versions
CN113032116B (en)
Inventor
李晓杰
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202110247231.2A priority Critical patent/CN113032116B/en
Publication of CN113032116A publication Critical patent/CN113032116A/en
Application granted granted Critical
Publication of CN113032116B publication Critical patent/CN113032116B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The application discloses a training method for a task time prediction model, a task scheduling method, and a related apparatus. The training method comprises: converting the task information of a historical task into a grayscale image, where the task information comprises the task running instruction, the resources required by the task, and the code that runs the task; inputting the grayscale image into the task time prediction model and outputting the predicted running time corresponding to the historical task; and adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task. The scheme can improve resource utilization.

Description

Training method of task time prediction model, task scheduling method and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a training method for a task time prediction model, a task scheduling method, and a related apparatus.
Background
The businesses of many companies involve a large number of deep learning models, which typically require high-performance devices such as GPUs for training. For this reason, many companies that use deep learning algorithms build dedicated clusters. The resources of each machine in such a cluster are fixed, but different training tasks require different resources and different running times.
Training tasks that need only a few GPUs are generally run on a single machine rather than in a distributed manner, because single-machine training is fastest. Suppose each machine in the cluster has 8 GPUs and the tasks running on each machine occupy 7 of them. If a single-machine task that needs 2 GPUs must be scheduled, then although the total free resources of the cluster are sufficient, no single machine has 2 idle GPUs, so the new task cannot be scheduled. This is the problem caused by resource fragmentation, and it leads to low resource utilization.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a training method for a task time prediction model, a task scheduling method, and a related apparatus that can improve resource utilization.
In order to solve the above problem, a first aspect of the present application provides a training method for a task time prediction model, the method comprising: converting the task information of a historical task into a grayscale image, where the task information comprises the task running instruction, the resources required by the task, and the code that runs the task; inputting the grayscale image into the task time prediction model and outputting the predicted running time corresponding to the historical task; and adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
In order to solve the above problem, a second aspect of the present application provides a task scheduling method, comprising: predicting a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run; selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the minimum; and scheduling the task to be run to the target node. The task time prediction model is trained with the training method of the task time prediction model of the first aspect.
In order to solve the above problem, a third aspect of the present application provides a task scheduling system, including: a plurality of machine nodes for running tasks using system resources; and a task scheduler for predicting a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run, selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, and scheduling the task to be run to the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the minimum. The task time prediction model is trained with the training method of the task time prediction model of the first aspect.
In order to solve the above problem, a fourth aspect of the present application provides a training apparatus for a task time prediction model, including: an information processing module for converting the task information of a historical task into a grayscale image, where the task information comprises the task running instruction, the resources required by the task, and the code that runs the task; a first prediction module for inputting the grayscale image into the task time prediction model and outputting the predicted running time corresponding to the historical task; and a model optimization module for adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
In order to solve the above problem, a fifth aspect of the present application provides a task scheduling apparatus, including: a second prediction module for predicting a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run; a node selection module for selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the minimum; and a task scheduling module for scheduling the task to be run to the target node. The task time prediction model is trained with the training method of the task time prediction model of the first aspect.
In order to solve the above problem, a sixth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the method for training a task time prediction model of the first aspect or the method for task scheduling of the second aspect.
In order to solve the above problem, a seventh aspect of the present application provides a computer-readable storage medium on which program instructions are stored, the program instructions, when executed by a processor, implement the method for training the task time prediction model of the above first aspect, or the method for task scheduling of the above second aspect.
The invention has the following beneficial effects. Unlike the prior art, the training method of the task time prediction model converts the task information of a historical task into a grayscale image, where the task information comprises the task running instruction, the resources required by the task, and the code that runs the task; the grayscale image is input into the task time prediction model, which outputs the predicted running time corresponding to the historical task, so the network parameters of the model can be adjusted based on the difference between the predicted and actual running times. A task time prediction model for predicting the running time of a task can thus be trained, providing technical support for improving resource utilization. With the model, the predicted running time of a task to be run can be obtained, so the task can be scheduled to the machine node on which the remaining time of the target task is close to the time the task to be run will take. The completion times of the task to be run and the target task are then close, the machine node frees its resources as simultaneously as possible for the scheduling of subsequent tasks, the problem of resource fragmentation is alleviated, the cluster resources are fully used, and resource utilization improves.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a method for training a task time prediction model according to the present application;
FIG. 2 is a flowchart illustrating an embodiment of step S11 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment of a task scheduling method according to the present application;
FIG. 5 is a flowchart illustrating an embodiment of step S42 in FIG. 4;
FIG. 6 is a block diagram of an embodiment of a task scheduling system according to the present application;
FIG. 7 is a block diagram of an embodiment of a training apparatus for a task time prediction model according to the present application;
FIG. 8 is a block diagram of an embodiment of a task scheduler of the present application;
FIG. 9 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 10 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a training method for a task time prediction model according to the present application. Specifically, the training method of the task time prediction model in this embodiment may include the following steps:
Step S11: converting the task information of the historical task into a grayscale image. The task information comprises the task running instruction, the resources required by the task, and the code that runs the task.
Artificial intelligence can simulate the information processes of human consciousness and thinking. Although artificial intelligence keeps developing, its essence is unchanged: the data to be learned are arranged into training samples, an algorithm is written to learn from them, and parameters or algorithms are adjusted until the computed result reaches the design target. Training samples are therefore the basis of artificial intelligence, and many kinds of content, such as text, sound, and pictures, can serve as training sample data. For a computer to learn from them, text, sound, and pictures are digitized into numerical matrices that an algorithm can process. It follows that, to predict the running time of a task with the task time prediction model, the task information must first be processed into an input the model can read; that is, the task information of the historical task must be converted into a grayscale image.
In this application, the task information of a task may include the task running instruction, the resources required by the task, and the code that runs the task; of course, it may also include other parameters, such as the task name, the task framework, the duration of the task input, and the number of task iterations. The resources required by the task include at least one of GPU, CPU, memory, and disk.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of step S11 in fig. 1. In an embodiment, step S11 may specifically include:
Step S111: combining the task information into a binary information text.
Specifically, the task information may come in various file formats, such as exe, apk, doc, and txt, all of which must be converted into binary data that the computer can recognize. For example, since English text is composed of 26 letters, even with some punctuation marks and special symbols added, only a small set of numbers is needed to represent the characters; a fixed number of bytes can therefore be assigned in the computer algorithm to represent each letter or symbol, and English text can thereby be converted into a binary file recognizable by the computer. Similarly, other common texts written in natural language can be converted into binary files recognizable by the computer, so all the contents of the task information can be combined into one binary information text.
Step S112: taking each successive 8 bits of the binary information text as one image value, and arranging all the image values into a square array to form an initial image. Each image value ranges from 0 to 255, and the vacant part of the initial image is zero-filled.
After the task information is combined into a binary information text, each successive 8 bits of the text is taken as one image value: the binary stream is read out as a vector of 8-bit non-negative integers, and since 8 bits represent values from 0 to 255, which correspond exactly to the 0-255 pixel values of a grayscale image, every 8 bits of the stream are mapped to one pixel whose pixel value is the image value. The pixel values are then arranged, according to the size of the binary information text, into a square two-dimensional matrix, yielding a picture that serves as the initial image. Understandably, the number of pixels is rarely exactly enough to form a square initial image, so the vacant part of the initial image is filled with zeros.
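The byte-to-pixel mapping and zero-padding described above can be sketched as follows (a minimal illustration, not the patent's actual implementation; the helper name `bytes_to_square_image` is ours):

```python
import math

def bytes_to_square_image(data: bytes) -> list:
    """Map each byte (8 bits, value 0-255) of a binary information
    text to one grayscale pixel, arrange the pixels into the smallest
    square matrix that fits them, and zero-fill the vacant tail."""
    values = list(data)                # each byte -> one pixel value 0-255
    side = math.isqrt(len(values))
    if side * side < len(values):      # not a perfect square: grow the side
        side += 1
    values += [0] * (side * side - len(values))  # zero-fill the vacant part
    # reshape the flat vector into a side x side two-dimensional matrix
    return [values[r * side:(r + 1) * side] for r in range(side)]

# a hypothetical "task running instruction" as the binary information text
img = bytes_to_square_image(b"python train.py --gpus 2")
```

Here the 24 input bytes do not fill a 4 x 4 square, so a 5 x 5 matrix is used and the last pixel is padded with zero, mirroring the zero-filling step above.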
Step S113: and adjusting the initial image to be 255 x 255 by using a bilinear interpolation algorithm to obtain the gray image.
Understandably, for different tasks the binary information texts produced from the task information may differ in size, so the initial images differ in size as well; yet, as inputs of the task time prediction model, the images corresponding to different tasks should have the same size. A standard image size is therefore set, for example 255 x 255. After the initial image is formed, its size is compared with the standard image size, and if they differ, the initial image is scaled so that the resulting grayscale image reaches the standard size. Specifically, during scaling, a bilinear interpolation algorithm is used to compute the pixel value of each pixel in the scaled grayscale image, and the scaled grayscale image is used as the input of the task time prediction model.
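The bilinear scaling to a fixed input size can be sketched in plain Python (an illustrative implementation only; a real system would typically call a library routine such as OpenCV's `cv2.resize` with linear interpolation):

```python
def bilinear_resize(src, out_h, out_w):
    """Resize a 2-D grayscale matrix with bilinear interpolation:
    each output pixel is a weighted average of the 4 nearest
    source pixels."""
    in_h, in_w = len(src), len(src[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # map the output row coordinate back into source coordinates
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y); y1 = min(y0 + 1, in_h - 1); dy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x); x1 = min(x0 + 1, in_w - 1); dx = x - x0
            top = src[y0][x0] * (1 - dx) + src[y0][x1] * dx
            bot = src[y1][x0] * (1 - dx) + src[y1][x1] * dx
            out[i][j] = top * (1 - dy) + bot * dy
    return out

# upscale a 2x2 image to 3x3; in the patent the target would be 255x255
gray = bilinear_resize([[0, 100], [100, 200]], 3, 3)
```

The corner pixels are preserved and the new centre pixel is interpolated from its four neighbours, which is exactly the weighting bilinear interpolation applies at 255 x 255 scale.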
Step S12: and inputting the gray level image into a task time prediction model, and outputting the predicted running time corresponding to the historical task.
The structure of the task time prediction model can be chosen freely; for example, a ResNet-50 structure may be adopted. After the grayscale image corresponding to the historical task is input into the model, the fully connected layer outputs a single value, namely the predicted running time corresponding to the historical task. Understandably, a residual network avoids the degradation problem in which classification accuracy stops improving and then drops as the network deepens, so the prediction model can improve its accuracy simply by stacking residual layers deeper.
Step S13: adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
Understandably, since the historical task serves as a training sample for the task time prediction model, it has a corresponding actual running time, and predicting it with the model yields its predicted running time. The two are expected to be highly consistent, so the predicted running time is compared with the actual running time to obtain their difference, and the network parameters of the task time prediction model are then adjusted according to that difference to update the model.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, step S13 may specifically include:
Step S131: determining the squared loss function of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
Step S132: adjusting the network parameters of the task time prediction model by using the squared loss function of the task time prediction model.
By comparing the predicted running time of the historical task with its actual running time, the difference between the two is obtained, from which the squared loss function of the task time prediction model can be determined; the network parameters are then adjusted to reduce this loss.
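The squared loss and the resulting parameter update can be illustrated with a toy one-parameter model (purely a sketch of the principle; the patent's model is a full neural network, and the feature, learning rate, and times below are invented for illustration):

```python
def squared_loss(predicted, actual):
    """Squared difference between predicted and actual running time."""
    return (predicted - actual) ** 2

# toy "model": predicted_time = w * feature; one gradient-descent step
w, feature, actual_time, lr = 0.5, 10.0, 8.0, 0.001
predicted = w * feature                            # 5.0 hours
loss = squared_loss(predicted, actual_time)        # (5 - 8)^2 = 9
grad_w = 2 * (predicted - actual_time) * feature   # dL/dw = -60
w -= lr * grad_w                                   # w moves from 0.5 to 0.56
```

After the update the new prediction (5.6) is closer to the actual running time (8.0), i.e. the squared loss has decreased, which is the effect the parameter adjustment in step S132 aims for.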
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of a task scheduling method according to the present application. The task scheduling method in this embodiment may include the following steps:
Step S41: predicting the task to be run by using a task time prediction model to obtain the predicted running time of the task to be run. The task time prediction model is trained with any of the training methods of the task time prediction model described above.
Taking a task manager as the execution subject of the task scheduling method as an example, the task manager may receive a run request for a task to be run; that is, a user may submit the task to be run, for example a model training task or an application program, to the task manager. Understandably, after acquiring the task to be run, the task manager analyzes it and extracts its task information, which may include the task running instruction, the resources required by the task, and the code that runs the task. The task information of the task to be run can therefore be converted into a grayscale image, the grayscale image is input into the task time prediction model, and the model predicts the task to obtain its predicted running time.
Step S42: selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the minimum.
Step S43: scheduling the task to be run to the target node.
When task scheduling is needed, the machine nodes satisfying the resources required by the task to be run are first obtained by filtering; then, the states of the tasks running on those machine nodes and of the task to be run are analyzed, and the machine node running the target task is selected as the target node. Specifically, the state of a running task includes its remaining running time, and from the remaining running time of each running task and the predicted running time of the task to be run, the end time point of the running task and the predicted completion time point of the task to be run can be obtained. Understandably, among all running tasks, the one whose end time point differs least from the predicted completion time point of the task to be run is selected as the target task, and the machine node running it is selected as the target node. After the task to be run is scheduled to the target node, its completion time is closest to that of the target task, so the target node frees its resources almost simultaneously for subsequent task scheduling; this alleviates the problem of resource fragmentation, makes full use of the cluster resources, and improves resource utilization.
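The selection rule described above can be sketched as follows (an illustrative sketch; the function name `pick_target_node`, the data structures, and the sample times are our assumptions, not the patent's implementation):

```python
def pick_target_node(nodes, predicted_runtime, now=0.0):
    """Among candidate nodes (free resources already verified), pick the
    node running a task whose end time point is closest to the predicted
    completion time point of the task to be run."""
    predicted_completion = now + predicted_runtime
    best_node, best_diff = None, float("inf")
    for node, remaining_times in nodes.items():
        for remaining in remaining_times:   # running tasks on this node
            diff = abs((now + remaining) - predicted_completion)
            if diff < best_diff:
                best_node, best_diff = node, diff
    return best_node

# node -> remaining running times (hours) of its running tasks
nodes = {"node-A": [1.0, 6.0], "node-B": [3.5], "node-C": [9.0]}
target = pick_target_node(nodes, predicted_runtime=4.0)
```

With a predicted runtime of 4.0 hours, node-B's running task (3.5 hours left) ends closest to the new task's predicted completion, so both tasks would free their resources at nearly the same time.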
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S42 in fig. 4. In an embodiment, step S42 may specifically include:
Step S421: when a run request for the task to be run is received, acquiring the resource requirement information of the task to be run, and selecting from all machine nodes, as candidate nodes, those whose idle resources satisfy the resource requirement information.
Specifically, when the run request for the task to be run is received, the resource requirement information of the task to be run is first acquired, and then the machine nodes whose current idle resources satisfy that requirement are screened out according to the resource requirement information. Current idle resources refer to a machine node's idle memory, CPU, GPU, disk, and so on. For example, suppose the task to be run requires 2 GPU cores. Machine node A has 8 GPU cores in total, of which 7 are currently occupied, so its current idle resource is 1 GPU core; machine node B has 6 GPU cores in total, of which 4 are currently occupied, so its current idle resource is 2 GPU cores. The current idle resources of machine node A do not satisfy the resources required by the task to be run, while those of machine node B do, so machine node B can serve as a candidate node.
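The filtering step, using the GPU figures from the example above, might look like this (a minimal sketch; the function name and node dictionaries are hypothetical):

```python
def filter_candidates(nodes, required_gpus):
    """Keep only machine nodes whose currently idle GPU resources
    satisfy the task's requirement (free = total - occupied)."""
    return [name for name, n in nodes.items()
            if n["total_gpus"] - n["used_gpus"] >= required_gpus]

nodes = {
    "A": {"total_gpus": 8, "used_gpus": 7},  # 1 free -> rejected
    "B": {"total_gpus": 6, "used_gpus": 4},  # 2 free -> candidate
}
candidates = filter_candidates(nodes, required_gpus=2)
```

A real scheduler would filter on memory, CPU, and disk in the same way before the end-time comparison of step S422.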
Step S422: and calculating the difference between the ending time point of each task operated in all the candidate nodes and the predicted completion time point of the task to be operated, and selecting the candidate node with the task with the minimum corresponding difference as the target node.
Specifically, from the remaining running time of a running task on a candidate node and the predicted running time of the task to be run, the end time point of the running task and the predicted completion time point of the task to be run can be obtained. For example, if the end time point of running task C is close to the predicted completion time point of the task to be run, scheduling the task to be run to the machine node running task C makes their completion times close, so that the machine node can, as far as possible, free its resources simultaneously for the scheduling of subsequent tasks, alleviating the problem of resource fragmentation. Understandably, among all running tasks, the one whose end time point differs least from the predicted completion time point of the task to be run is selected as the target task, and the machine node running it is selected as the target node. After the task to be run is scheduled to the target node, its completion time is closest to that of the target task, so the target node frees its resources almost simultaneously for subsequent task scheduling; this alleviates resource fragmentation, greatly improves the overall utilization of the cluster, noticeably shortens the time a user's task waits to start after submission, and makes full use of the cluster resources.
In an embodiment, the target task is the task with the latest end time point on the target node. The required running time of the task to be run is thus predicted from its task information, and the task is then scheduled, according to the predicted running time, to the machine node whose latest-finishing task ends closest to the task's predicted completion time, so that the cluster resources are fully used and the problem of resource fragmentation is alleviated.
In addition, it should be noted that, regarding the scheduling of the task to be run, the task is scheduled to an idle machine node only when no non-idle machine node in the cluster qualifies as the target node. A non-idle machine node is one currently running a task; an idle machine node is one currently running no task.
The execution subject of the task scheduling method of the present application may be hardware or software. As hardware, it may be any of various electronic devices, including but not limited to a smartphone, a tablet computer, an e-book reader, and a vehicle-mounted terminal. As software, it may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules for providing distributed tasks, or as a single piece of software or software module. No specific limitation is imposed here.
According to the training method of the task time prediction model of the present application, the task information of a historical task is converted into a grayscale image, the grayscale image is input into the task time prediction model, and the predicted running time corresponding to the historical task is output, so the network parameters of the model can be adjusted based on the difference between the predicted and actual running times. A task time prediction model for predicting the running time of a task can thus be trained, providing technical support for improving resource utilization. With the model, the predicted running time of a task to be run can be obtained, so the task can be scheduled to the machine node on which the remaining time of the target task is close to the time the task to be run will take; the completion times of the two tasks are then close, the machine node frees its resources as simultaneously as possible for subsequent task scheduling, the problem of resource fragmentation is alleviated, the cluster resources are fully used, and resource utilization improves.
Referring to fig. 6, fig. 6 is a block diagram illustrating an embodiment of a task scheduling system according to the present application. The task scheduling system 60 includes: a plurality of machine nodes 601, where the machine nodes 601 are used for running tasks by using system resources; and a task scheduling device 602 configured to predict a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run, select the machine node 601 running a target task as the target node from all machine nodes 601 that satisfy the resources required by the task to be run, and schedule the task to be run to the target node. The difference between the ending time point of the target task and the predicted completion time point of the task to be run is the minimum. The task time prediction model is trained by any of the training methods of the task time prediction model described above.
In this scheme, the task scheduling device 602 obtains the predicted running time of the task to be run by using the task time prediction model, so that the task to be run can be scheduled to a machine node 601 on which the remaining time of the target task is close to the time the task to be run will consume. The task to be run and the target task then finish at close time points, so the machine node 601 can release resources for the scheduling of subsequent tasks at roughly the same time, which alleviates the problem of resource fragmentation, allows cluster resources to be fully utilized, and improves resource utilization.
Referring to fig. 7, fig. 7 is a block diagram illustrating an embodiment of a training apparatus for a task time prediction model according to the present application. The training apparatus 70 for the task time prediction model includes: an information processing module 700 configured to convert task information of a historical task into a grayscale image, where the task information includes a task running instruction, resources required by the task, and code for running the task; a first prediction module 702 configured to input the grayscale image into a task time prediction model and output the predicted running time corresponding to the historical task; and a model optimization module 704 configured to adjust the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
In some embodiments, the step, performed by the information processing module 700, of converting the task information of the historical task into a grayscale image includes: combining the task information into a binary information text; sequentially taking every 8 bits of the binary information text as one image value, and arranging all the image values in a square array to form an initial image, where each image value lies in the range 0-255 and any vacant positions in the initial image are filled with zeros; and resizing the initial image to 255 x 255 by using a bilinear interpolation algorithm to obtain the grayscale image.
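The conversion described above can be sketched as follows. This is a hedged illustration of the steps named in the patent (8 bits per pixel, square arrangement, zero padding, bilinear resize to 255 x 255); the function names and the NumPy-based bilinear implementation are the author's own assumptions, not from the patent:

```python
import math
import numpy as np

def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D array with bilinear interpolation (pure NumPy sketch)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)          # sample positions in source rows
    xs = np.linspace(0, in_w - 1, out_w)          # sample positions in source cols
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def task_info_to_grayscale(task_info: bytes) -> np.ndarray:
    """Turn combined task information into a 255x255 grayscale image."""
    # Every 8 bits (one byte) becomes one image value in [0, 255].
    values = np.frombuffer(task_info, dtype=np.uint8).astype(np.float64)
    # Arrange the values into the smallest square that holds them,
    # filling the vacant tail positions with zeros.
    side = math.ceil(math.sqrt(len(values)))
    padded = np.zeros(side * side)
    padded[:len(values)] = values
    initial = padded.reshape(side, side)
    # Resize the initial image to 255 x 255 with bilinear interpolation.
    return bilinear_resize(initial, 255, 255)
```

The resulting array can be fed to any image-based network; in the patent's setting the "binary information text" would be the concatenated task running instruction, required resources, and task code.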
In some embodiments, the step, performed by the model optimization module 704, of adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task includes: determining a squared loss function of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task; and adjusting the network parameters of the task time prediction model by using the squared loss function of the task time prediction model.
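The squared loss and the parameter adjustment it drives can be illustrated with a toy one-parameter model. The patent does not specify the network architecture or optimizer, so the linear model and learning rate below are purely illustrative assumptions:

```python
import numpy as np

def squared_loss(predicted, actual):
    """Mean squared difference between predicted and actual running times."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

def train_step(w, features, actual, lr=0.01):
    """One gradient-descent update of a toy model predicted = w * features."""
    predicted = w * features
    grad = np.mean(2.0 * (predicted - actual) * features)  # d(loss)/dw
    return w - lr * grad
```

Repeating `train_step` shrinks the squared loss, which is the same mechanism, at much larger scale, by which the task time prediction model's network parameters are adjusted.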
Referring to fig. 8, fig. 8 is a schematic diagram of a task scheduling device according to an embodiment of the present application. The task scheduling device 80 includes: a second prediction module 800 configured to predict a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run; a node selection module 802 configured to select a machine node running a target task as the target node from all machine nodes that satisfy the resources required by the task to be run, where the difference between the ending time point of the target task and the predicted completion time point of the task to be run is the minimum; and a task scheduling module 804 configured to schedule the task to be run to the target node. The task time prediction model is trained by any of the training methods of the task time prediction model described above.
In some embodiments, the step, performed by the node selection module 802, of selecting a machine node running a target task as the target node from all machine nodes that satisfy the resources required by the task to be run includes: when a running requirement of the task to be run is received, acquiring resource requirement information of the task to be run, and selecting, from all machine nodes, the machine nodes whose idle resources satisfy the resource requirement information as candidate nodes; and calculating the difference between the ending time point of each task running on all the candidate nodes and the predicted completion time point of the task to be run, and selecting the candidate node running the task with the minimum corresponding difference as the target node.
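The two-step selection above, filter by idle resources, then minimize the gap between a running task's end time and the new task's predicted completion time, can be sketched as follows. The `Node` class and its field names are hypothetical illustrations, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free: dict                                       # idle resources, e.g. {"cpu": 4}
    task_ends: list = field(default_factory=list)    # end time points of running tasks

def select_target_node(nodes, required, predicted_completion):
    # Step 1: candidate nodes are those whose idle resources satisfy the demand.
    candidates = [n for n in nodes
                  if all(n.free.get(k, 0) >= v for k, v in required.items())]
    # Step 2: over every task running on every candidate, find the task whose
    # ending time point is closest to the new task's predicted completion time.
    best, best_gap = None, float("inf")
    for node in candidates:
        for end in node.task_ends:
            gap = abs(end - predicted_completion)
            if gap < best_gap:
                best, best_gap = node, gap
    return best   # None if no busy candidate exists (idle-node fallback applies)
```

Here `predicted_completion` would be the current time plus the predicted running time produced by the task time prediction model.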
Referring to fig. 9, fig. 9 is a schematic diagram of a frame of an embodiment of an electronic device according to the present application. The electronic device 90 comprises a memory 91 and a processor 92 coupled to each other, and the processor 92 is configured to execute program instructions stored in the memory 91 to implement the steps of any one of the embodiments of the training method for a task time prediction model described above, or the steps of any one of the embodiments of the task scheduling method described above. In one particular implementation scenario, the electronic device 90 may include, but is not limited to, a microcomputer or a server.
In particular, the processor 92 is configured to control itself and the memory 91 to implement the steps of any of the embodiments of the training method of the task time prediction model described above, or the steps of any of the embodiments of the task scheduling method described above. The processor 92 may also be referred to as a CPU (Central Processing Unit). The processor 92 may be an integrated circuit chip having signal processing capabilities. The processor 92 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 92 may be implemented jointly by a plurality of integrated circuit chips.
Referring to fig. 10, fig. 10 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 100 stores program instructions 1000 executable by a processor, the program instructions 1000 being for implementing the steps of any of the above-described embodiments of the method for training a task time prediction model, or any of the above-described embodiments of the method for task scheduling.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (12)

1. A training method of a task time prediction model is characterized by comprising the following steps:
converting task information of a historical task into a gray image; the task information comprises a task running instruction, resources required by the task and a code for running the task;
inputting the gray level image into a task time prediction model, and outputting the predicted running time corresponding to the historical task;
and adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
2. The method for training the task time prediction model according to claim 1, wherein the converting the task information of the historical task into a gray image comprises:
combining the task information into a binary information text;
sequentially taking each 8 bits of the binary information text as an image numerical value, and arranging all the image numerical values according to a square array to form an initial image; the range of the image numerical value is 0-255, and the vacant part in the initial image is filled with zeros;
and adjusting the initial image to be 255 x 255 by using a bilinear interpolation algorithm to obtain the gray image.
3. The method for training the task time prediction model according to claim 1, wherein the adjusting the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task comprises:
determining a square loss function of the task time prediction model based on a difference between a predicted running time and an actual running time corresponding to the historical task;
and adjusting the network parameters of the task time prediction model by utilizing the square loss function of the task time prediction model.
4. A method for training a task time prediction model according to any one of claims 1 to 3, wherein the resources include at least one of a GPU, a CPU, a memory, and a disk.
5. A task scheduling method is characterized by comprising the following steps:
predicting a task to be run by using a task time prediction model to obtain a predicted running time of the task to be run;
selecting a machine node running a target task as a target node from all machine nodes satisfying the resources required by the task to be run; wherein the difference between the ending time point of the target task and the predicted completion time point of the task to be run is the minimum;
scheduling the task to be run to the target node;
wherein the task time prediction model is obtained by training through the training method of the task time prediction model according to any one of claims 1 to 4.
6. The task scheduling method according to claim 5, wherein the selecting a machine node running a target task from all machine nodes satisfying resources required by the task to be run as the target node comprises:
when receiving the running requirement of the task to be run, acquiring resource requirement information of the task to be run, and selecting a machine node with idle resources meeting the resource requirement information from all machine nodes as a candidate node;
and calculating the difference between the ending time point of each task running on all the candidate nodes and the predicted completion time point of the task to be run, and selecting the candidate node running the task with the minimum corresponding difference as the target node.
7. The task scheduling method according to claim 5, wherein the target task is a task having a latest ending time point in the target node.
8. A task scheduling system, comprising:
a plurality of machine nodes for running tasks using system resources;
the task scheduler is used for predicting a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run; selecting a machine node running a target task as a target node from all machine nodes satisfying the resources required by the task to be run; and scheduling the task to be run to the target node; wherein the difference between the ending time point of the target task and the predicted completion time point of the task to be run is the minimum; and the task time prediction model is obtained by training through the training method of the task time prediction model according to any one of claims 1 to 4.
9. An apparatus for training a task time prediction model, comprising:
the information processing module is used for converting task information of the historical task into a gray image; the task information comprises a task running instruction, resources required by the task and a code for running the task;
the first prediction module is used for inputting the gray level image into a task time prediction model and outputting the predicted running time corresponding to the historical task;
a model optimization module to adjust network parameters of the task time prediction model based on a difference between a predicted run time and an actual run time corresponding to the historical task.
10. A task scheduling apparatus, comprising:
the second prediction module is used for predicting a task to be run by using a task time prediction model to obtain the predicted running time of the task to be run;
the node selection module is used for selecting a machine node running a target task as a target node from all machine nodes satisfying the resources required by the task to be run; wherein the difference between the ending time point of the target task and the predicted completion time point of the task to be run is the minimum;
the task scheduling module is used for scheduling the task to be run to the target node;
wherein the task time prediction model is obtained by training through the training method of the task time prediction model according to any one of claims 1 to 4.
11. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method for training a task time prediction model according to any one of claims 1 to 4 or the method for task scheduling according to any one of claims 5 to 7.
12. A computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the method of training a task time prediction model according to any one of claims 1 to 4, or the method of task scheduling according to any one of claims 5 to 7.
CN202110247231.2A 2021-03-05 2021-03-05 Training method of task time prediction model, task scheduling method and related devices Active CN113032116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110247231.2A CN113032116B (en) 2021-03-05 2021-03-05 Training method of task time prediction model, task scheduling method and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110247231.2A CN113032116B (en) 2021-03-05 2021-03-05 Training method of task time prediction model, task scheduling method and related devices

Publications (2)

Publication Number Publication Date
CN113032116A true CN113032116A (en) 2021-06-25
CN113032116B CN113032116B (en) 2024-03-05

Family

ID=76468478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110247231.2A Active CN113032116B (en) 2021-03-05 2021-03-05 Training method of task time prediction model, task scheduling method and related devices

Country Status (1)

Country Link
CN (1) CN113032116B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687938A (en) * 2021-10-27 2021-11-23 之江实验室 Intelligent scheduling method and system for medical data calculation tasks
CN114780225A (en) * 2022-06-14 2022-07-22 支付宝(杭州)信息技术有限公司 Distributed model training system, method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609399A (en) * 2017-09-09 2018-01-19 北京工业大学 Malicious code mutation detection method based on NIN neutral nets
CN109949825A (en) * 2019-03-06 2019-06-28 河北工业大学 Noise classification method based on the FPGA PCNN algorithm accelerated
CN111475298A (en) * 2020-04-03 2020-07-31 北京字节跳动网络技术有限公司 Task processing method, device, equipment and storage medium
US20200357108A1 (en) * 2018-01-23 2020-11-12 SZ DJI Technology Co., Ltd. Target detection method and apparatus, and movable platform
CN112182577A (en) * 2020-10-14 2021-01-05 哈尔滨工程大学 Android malicious code detection method based on deep learning


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687938A (en) * 2021-10-27 2021-11-23 之江实验室 Intelligent scheduling method and system for medical data calculation tasks
CN113687938B (en) * 2021-10-27 2022-02-22 之江实验室 Intelligent scheduling method and system for medical data calculation tasks
CN114780225A (en) * 2022-06-14 2022-07-22 支付宝(杭州)信息技术有限公司 Distributed model training system, method and device

Also Published As

Publication number Publication date
CN113032116B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
CN111898696B (en) Pseudo tag and tag prediction model generation method, device, medium and equipment
CN114155543B (en) Neural network training method, document image understanding method, device and equipment
CN105447498B (en) Client device, system and server system configured with neural network
EP3540612A1 (en) Cluster processing method and device for questions in automatic question and answering system
CN113159147A (en) Image identification method and device based on neural network and electronic equipment
CN113032116B (en) Training method of task time prediction model, task scheduling method and related devices
CN111783873A (en) Incremental naive Bayes model-based user portrait method and device
CN115080749B (en) Weak supervision text classification method, system and device based on self-supervision training
CN114896067A (en) Automatic generation method and device of task request information, computer equipment and medium
CN114895773A (en) Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium
CN114356540A (en) Parameter updating method and device, electronic equipment and storage medium
CN116126354A (en) Model deployment method, device, electronic equipment and storage medium
CN112464042A (en) Task label generation method according to relation graph convolution network and related device
CN112989843A (en) Intention recognition method and device, computing equipment and storage medium
EP4281968A1 (en) Active learning via a surrogate machine learning model using knowledge distillation
CN115358914B (en) Data processing method and device for visual detection, computer equipment and medium
CN115375453A (en) System resource allocation method and device
CN115470900A (en) Pruning method, device and equipment of neural network model
CN114973377A (en) Face beauty prediction method and device, electronic equipment and storage medium
CN117223005A (en) Accelerator, computer system and method
CN114298329A (en) Model training method, device, equipment and storage medium
CN114120416A (en) Model training method and device, electronic equipment and medium
CN113591881A (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN113688232A (en) Method and device for classifying bidding texts, storage medium and terminal
CN112508116A (en) Classifier generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant