WO2022095815A1 - Video memory management method, apparatus, device and system - Google Patents

Video memory management method, apparatus, device and system

Info

Publication number
WO2022095815A1
Authority
WO
WIPO (PCT)
Prior art keywords
video memory
priority
memory resources
task
tasks
Prior art date
Application number
PCT/CN2021/127856
Other languages
English (en)
French (fr)
Inventor
肖文聪
任仕儒
李永
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority to EP21888512.7A (EP4242843A4)
Publication of WO2022095815A1
Priority to US18/306,636 (US20230297498A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • the present application relates to the technical field of machine learning, and in particular, to a method, apparatus and system for video memory management, a machine learning system, and an electronic device.
  • GPU: Graphics Processing Unit
  • a typical GPU memory resource multiplexing method is to use a unified video memory allocator in the deep learning framework for video memory management.
  • when the allocator receives a video memory resource application for any task, it only needs to check whether the GPU running the task has free video memory resources; if so, it allocates the corresponding video memory space for the task, regardless of the video memory resource requirements of other tasks running on the GPU at the same time. This processing method can speed up the training of small-batch tasks.
  • the above resource multiplexing method does not provide any performance isolation guarantee, which leads to uncontrollable interference among multiple tasks.
  • if a GPU is allocated to a "resource guarantee" task for its sole use, the deep learning system can guarantee that task's training performance. However, due to the lack of a performance isolation mechanism on the GPU, if other tasks execute together on such a GPU, the potential competition for video memory resources may lead to serious performance degradation of the "resource guarantee" task.
  • the present application provides a video memory management method to solve the problem in the prior art that the performance of high-priority tasks cannot be guaranteed.
  • the present application additionally provides a video memory management apparatus and system, a machine learning system, and an electronic device.
  • the present application provides a video memory management method, including:
  • determining the priorities of multiple machine learning tasks run by a graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, releasing at least a part of the video memory resources occupied by a low-priority task; and allocating video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data in the video memory space.
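The following is a minimal sketch of that flow, assuming a toy task record and byte-granular accounting; all names (`Task`, `gpu_free`, `freeable`) are illustrative and not from the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int      # smaller number = higher priority
    allocated: int     # bytes of video memory currently held
    freeable: int      # bytes that could be released (idle or swappable)

def allocate_for_high_priority(task: Task, demand: int,
                               gpu_free: int, others: list[Task]) -> bool:
    """If allocatable video memory is below the high-priority task's demand,
    release memory held by lower-priority tasks, then allocate."""
    if gpu_free < demand:
        # visit victims from lowest priority upward
        for victim in sorted(others, key=lambda t: t.priority, reverse=True):
            if victim.priority <= task.priority or gpu_free >= demand:
                break
            released = min(victim.freeable, demand - gpu_free)
            victim.allocated -= released
            victim.freeable -= released
            gpu_free += released
    if gpu_free < demand:
        return False               # demand still unmet even after releases
    task.allocated += demand       # task now runs on tensors in video memory
    return True
```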
  • Free video memory resources occupied by the multiple machine learning tasks are released.
  • the releasing of the idle video memory resources occupied by the multiple machine learning tasks includes: determining the video memory resource usage status information of the machine learning tasks; and if the information satisfies a video memory resource release condition, releasing the idle video memory resources.
  • the usage status information includes: an upper limit of the video memory resources actually used by the task;
  • the release condition includes: the duration for which the allocated amount of video memory resources of the task is greater than the upper limit value reaches a duration threshold.
  • the releasing at least a part of the video memory resources occupied by the low-priority task includes:
  • if the free video memory resources of the low-priority task are greater than or equal to the demanded amount of video memory resources, the free video memory resources occupied by the low-priority task are released.
  • the releasing at least a part of the video memory resources occupied by the low-priority task includes:
  • if the free video memory resources of the low-priority task are less than the demanded amount of video memory resources, memory resources are allocated for the low-priority task and at least part of the video memory resources used by the low-priority task is released, so as to continue running the low-priority task at least according to the tensor data in the memory space.
  • if the allocatable video memory resources later increase to the video memory resource demand of the low-priority task, video memory resources are allocated to the low-priority task, so as to continue running the low-priority task according to the tensor data in the video memory space.
  • the low-priority tasks include: iterative learning tasks;
  • before releasing at least a part of the video memory resources used by the low-priority task, the method further includes:
  • the method further includes:
  • the machine learning task includes a distributed deep learning task.
  • the present application also provides a video memory management method, including: running a machine learning task through a graphics processing unit; determining the video memory resource usage status information of the machine learning task; and if the information satisfies a video memory resource release condition, releasing the free video memory resources occupied by the task, so as to allocate the free video memory resources to other machine learning tasks run in parallel by the graphics processing unit.
  • the present application also provides a video memory management device, including:
  • a prioritization unit, configured to determine the priorities of multiple machine learning tasks run by the graphics processing unit;
  • a video memory release unit, configured to release at least a part of the video memory resources occupied by a low-priority task if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task;
  • a video memory allocation unit, configured to allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data in the video memory space.
  • the application also provides an electronic device, comprising:
  • a processor and a memory, the memory being used to store a program for implementing the video memory management method; after the device is powered on and runs the program of the method through the processor, the following steps are performed: determining the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory space is to be allocated for a high-priority task and the allocatable video memory space is less than the video memory space requirement of the high-priority task, releasing at least part of the video memory space of a low-priority task; and allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to the tensor data in the video memory.
  • the present application also provides a video memory management device, including:
  • a task execution unit, configured to run machine learning tasks through a graphics processing unit;
  • An information determination unit configured to determine the usage status information of video memory resources of the machine learning task.
  • a video memory release unit configured to release free video memory resources occupied by the task if the information satisfies a video memory resource release condition, so as to allocate the free video memory resources to other machine learning tasks running in parallel by the graphics processing unit.
  • the application also provides an electronic device, comprising:
  • a processor and a memory, the memory being used to store a program for implementing the video memory management method; after the device is powered on and the program of the method is run through the processor, the following steps are performed: running a machine learning task through a graphics processing unit; determining the video memory resource usage status information of the machine learning task; and if the information satisfies the video memory resource release condition, releasing the free video memory resources occupied by the task, so as to allocate the free video memory resources to other machine learning tasks run in parallel by the graphics processing unit.
  • the application also provides a video memory management system, including:
  • a storage resource coordinator, configured to determine the priorities of multiple machine learning tasks run through the graphics processing unit; and, if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, send a video memory resource release instruction to the first storage resource allocator of the low-priority task and send a video memory resource allocation instruction to the second storage resource allocator of the high-priority task;
  • a first storage resource allocator, configured to release at least a part of the video memory resources occupied by the low-priority task according to the release instruction;
  • the second storage resource allocator is configured to allocate video memory resources to the high-priority task according to the allocation instruction, so as to run the high-priority task at least according to the tensor data in the video memory space.
  • the allocator is further configured to send the video memory resource usage status information of the task to the coordinator;
  • the coordinator is further configured to send a video memory resource release instruction to the allocator if the information satisfies the video memory resource release condition.
  • the allocator is specifically configured to send the information to the coordinator according to a preset period.
  • the application also provides a machine learning system, including:
  • the client is used to send the priority information of the machine learning task to the server;
  • the server is configured to determine the priorities of multiple machine learning tasks run through the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by low-priority tasks; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data in the video memory space.
  • the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, which, when executed on a computer, cause the computer to execute the above-mentioned various methods.
  • the present application also provides a computer program product comprising instructions which, when executed on a computer, cause the computer to perform the various methods described above.
  • As can be seen from the above embodiments, the video memory management method determines the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, at least part of the video memory resources occupied by low-priority tasks is released; and video memory resources are allocated to the high-priority task, so as to run the high-priority task at least according to the tensor data of the video memory space. With this processing method, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, thereby realizing dynamic scaling and optimization of the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU. On the premise of ensuring the performance of high-priority tasks, GPU video memory resources can be allocated to other tasks; therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
  • a machine learning task is run by a graphics processing unit; the video memory resource usage status information of the machine learning task is determined; and if the information satisfies the video memory resource release condition, the idle video memory resources occupied by the task are released, so that the idle video memory resources can be allocated to other machine learning tasks run in parallel through the graphics processing unit. This processing method releases the idle video memory resources occupied by tasks in time; therefore, the resource utilization of the overall cluster can be effectively improved.
  • the storage resource coordinator determines the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, it sends a video memory resource release instruction to the first storage resource allocator of the low-priority task and a video memory resource allocation instruction to the second storage resource allocator of the high-priority task; the first storage resource allocator releases at least a part of the video memory resources occupied by the low-priority task according to the release instruction; and the second storage resource allocator allocates video memory resources to the high-priority task according to the allocation instruction, so as to run the high-priority task at least according to the tensor data of the video memory space. This processing method makes it possible to allocate the video memory resources occupied by low-priority tasks to high-priority tasks when the allocatable video memory resources are insufficient.
  • the client sends the priority information of machine learning tasks to the server; the server determines the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, the server releases at least part of the video memory resources occupied by low-priority tasks and allocates video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data of the video memory space. This processing method makes it possible to allocate the video memory resources occupied by low-priority tasks to high-priority tasks when the allocatable video memory resources are insufficient, for multiple tasks running in parallel on one GPU.
  • in this way, the GPU video memory resources occupied by each machine learning task are dynamically scaled and optimized, so that GPU video memory resources can be allocated to other tasks on the premise of ensuring the performance of high-priority tasks; therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
  • FIG. 1 is a schematic flowchart of an embodiment of a video memory management method provided by the present application.
  • FIG. 2 is a schematic diagram of an application scenario of an embodiment of a video memory management method provided by the present application
  • FIG. 3 is a schematic diagram of dynamic scaling of video memory resources according to an embodiment of a video memory management method provided by the present application
  • FIG. 4 is a schematic diagram of changes in video memory resources according to an embodiment of a video memory management method provided by the present application
  • FIG. 5 is a schematic structural diagram of an embodiment of a video memory management apparatus provided by the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a video memory management system provided by the present application.
  • FIG. 1 is a schematic flowchart of an embodiment of a video memory management method of the present application.
  • the method provided by this embodiment may include the following steps:
  • Step S101: Determine the priorities of multiple machine learning tasks run by the graphics processing unit.
  • the video memory management method provided by the present application can be applied to a machine learning system, and is used for allocating and using GPU video memory resources when multiple machine learning tasks share GPU video memory resources.
  • the machine learning system may be a deep learning system constructed based on a deep learning computing framework, such as deep learning computing frameworks such as TensorFlow and PyTorch.
  • the machine learning task, also referred to as a machine learning model training task, learns a machine learning model from training data.
  • the model may be a model based on a deep neural network, and correspondingly, the machine learning task is a deep learning task.
  • for example, the model may be a named entity recognition model, a speech recognition model, a product recommendation model, or the like, learned from the training data.
  • the model may also be a non-neural network machine learning model such as a decision tree.
  • the machine learning system may be a distributed machine learning system, including one or more GPU-based computing nodes, also referred to as GPU devices.
  • each GPU device may include one or more GPUs. Multiple machine learning tasks can be run in parallel on one GPU, and these machine learning tasks share the GPU's video memory resources.
  • the GPU device also includes a central processing unit (CPU) and a memory, wherein the CPU may also be referred to as a host of the GPU.
  • for example, node 1 includes GPU1 and GPU2; task D and task E run concurrently on GPU1, and task A, task B and task C run concurrently on GPU2.
  • the video memory management method provided by this application can dynamically adjust, according to the performance guarantee priorities (referred to as priorities) of multiple tasks running in parallel on a GPU, the video memory resources of those tasks.
  • the priority of a task can be determined according to application requirements; for example, only two priorities are set: high priority and low priority. For example, if the learning task 1 of the named entity recognition model is a "performance assurance task" with a guaranteed service level, and the learning task 2 of the speech recognition model is a "speculative execution task" without a service level guarantee, then task 1 can be set to high priority and task 2 to low priority.
  • Table 1 shows the priority settings of machine learning tasks in one example:
  • Task 1, Named Entity Recognition Model: Level 1, highest priority
  • Task 2, Speech Recognition Model: Level 2, second-highest priority
  • Task 3, Product Recommendation Model: Level 2, second-highest priority
  • Task 4, Language Model: Level 3, lowest priority
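Expressed as a configuration sketch (a hypothetical mapping mirroring Table 1, with level 1 as the highest priority; the task keys are illustrative):

```python
# Hypothetical priority table mirroring Table 1 (level 1 = highest priority).
TASK_PRIORITY = {
    "task1_named_entity_recognition": 1,  # highest priority
    "task2_speech_recognition":       2,  # second-highest priority
    "task3_product_recommendation":   2,  # second-highest priority
    "task4_language_model":           3,  # lowest priority
}
```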
  • Step S103: Release at least a part of the video memory space occupied by the low-priority task if video memory resources are to be allocated for the high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task.
  • in the video memory management method, when allocating video memory resources for a high-priority task, if the allocatable video memory resources on the GPU are less than the video memory resource demand of the high-priority task, part or all of the video memory space occupied by low-priority tasks is released, and the released video memory space is allocated to the high-priority task, so that the tensor data of the high-priority task is stored in the video memory as much as possible, which ensures the running performance of the high-priority task.
  • the required amount of video memory resources may be the video memory resources to be increased by the running deep learning task, or may be the video memory resources initially required by the deep learning task to be run.
  • the required amount of video memory resources can be determined by a deep learning task. Since the method for determining the required amount of video memory resources belongs to a relatively mature prior art, it will not be repeated here.
  • in some embodiments, the video memory resources required by the task are determined first. After the video memory resources required by a task are determined, if the GPU's allocatable video memory resources are less than the task's video memory resource demand, the priority of the task and the priorities of the other tasks running on the GPU are determined. If the priority of the task to be run is higher than the priorities of the other running tasks, then, in order to ensure the performance of the high-priority task, the video memory resources occupied by low-priority tasks need to be released.
  • the amount of video memory resources to be released may be determined according to the amount of video memory resources required by the task to be executed. For example, if a high-priority task requires 1G of video memory resources, 1.2G of video memory resources occupied by low-priority tasks can be released.
  • the video memory resources occupied by one or more low-priority tasks may be released.
  • for example, if a high-priority task requires 5G of video memory resources, releasing the video memory resources occupied by a single low-priority task may not be enough; in that case, the video memory resources occupied by multiple low-priority tasks can be released until more than 5G of video memory resources is freed.
  • the method may further include the step of releasing idle video memory resources occupied by the multiple machine learning tasks.
  • the inventor finds that most deep learning tasks cannot fully utilize all GPU memory resources allocated to it at all times, and usually there are idle memory resources.
  • the inventor found through research that the idle video memory resources of the deep learning task may be caused by the following reasons.
  • Product-oriented deep learning training tasks usually include many computing parts, some of which are not easy to parallelize, so it is difficult to fully occupy GPU video memory resources; examples include graph sampling in neural networks, feature extraction in advertising models, and data augmentation in computer vision.
  • conventionally, the above-mentioned idle video memory resources are not released; this part of the video memory resources is always reserved for the task.
  • the task may use this part of the available video memory later in its run, for example to store other tensor data, or it may never use these free video memory resources; hence the GPU resources often remain in a relatively low utilization state.
  • based on the above usage status of video memory resources, the method provided by this embodiment releases idle video memory resources that tasks are not using, so that they can be allocated to other tasks in time; this optimizes the processing of multiple tasks in a shared-GPU scenario and avoids queuing of other tasks, thereby improving GPU resource utilization and, in turn, the throughput of the shared GPU cluster.
  • in some embodiments, the releasing of the idle video memory resources occupied by the multiple machine learning tasks may include the following sub-steps: 1) determining the video memory resource usage status information of the machine learning tasks; 2) if the information satisfies the video memory resource release condition, releasing the idle video memory resources.
  • the usage status information includes: an upper limit value of the video memory resources actually used by the task; and the release condition includes: the duration for which the allocated amount of video memory resources of the task is greater than the upper limit value reaches a duration threshold. For example, while a task runs, the upper limit value (peak) of the video memory resources actually used is sampled every 10 seconds; if for 30 consecutive seconds (that is, after three consecutive samples) the video memory resources allocated to the task exceed the actually required peak, the idle video memory resources are released.
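As a hedged illustration of that release condition, the sampling loop below uses the 10-second period and 30-second threshold from the example; the `task` and `allocator` methods are hypothetical names, not a real framework API:

```python
import time

SAMPLE_PERIOD_S = 10       # sample the actually-used peak every 10 seconds
DURATION_THRESHOLD_S = 30  # release after the condition holds this long

def idle_memory_monitor(task, allocator):
    """Release a task's idle video memory once its allocated amount has
    exceeded the actually-used peak continuously for the threshold time."""
    over_allocated_for = 0
    while task.is_running():
        peak = task.used_memory_peak()          # upper limit actually used
        if task.allocated_memory() > peak:
            over_allocated_for += SAMPLE_PERIOD_S
        else:
            over_allocated_for = 0              # must hold continuously
        if over_allocated_for >= DURATION_THRESHOLD_S:
            allocator.release(task.allocated_memory() - peak)
            over_allocated_for = 0
        time.sleep(SAMPLE_PERIOD_S)
```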
  • in some embodiments, releasing at least a part of the video memory space occupied by the low-priority task may include the following sub-step: if the free video memory resources of the low-priority task are greater than or equal to the demanded amount of video memory resources, releasing the free video memory resources occupied by the low-priority task.
  • both types of tasks can be run based on the data stored in the video memory space, which can not only ensure the running performance of high-priority tasks, but also avoid affecting the performance of low-priority tasks.
  • releasing at least a part of the video memory space occupied by the low-priority task may further include the following sub-step: if the free video memory resources of the low-priority task are less than the demanded amount of video memory resources, allocating memory resources for the low-priority task, and releasing at least a portion of the video memory resources used by the low-priority task, so as to continue running the low-priority task at least according to the tensor data in the memory space.
  • the video memory resources used by the low-priority tasks do not belong to idle video memory resources, and the tensor data of the low-priority tasks are still stored in this part of the video memory resources, which are required for the low-priority tasks to ensure running performance.
  • in this case, the low-priority task will at least continue to run based on the tensor data in the memory space.
  • for example, if only part of the video memory resources used by the low-priority task is released, the low-priority task may continue to run based on a part of its tensor data in the memory space and the other part in the video memory space at the same time. If all the video memory resources used by the low-priority task are released, the low-priority task can continue to run based on all of its tensor data in the memory space. In either case, the task will not fail.
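The following PyTorch-flavored sketch shows the idea of moving a low-priority task's tensors to host memory; the `tensor_cache` dict is a hypothetical stand-in for the task allocator's tensor pool, not a real framework interface:

```python
import torch

def shrink_low_priority(tensor_cache: dict[str, torch.Tensor],
                        bytes_needed: int) -> int:
    """Move low-priority tensors from video memory to host memory until
    roughly `bytes_needed` bytes of GPU memory are released. The task keeps
    running on the host copies, so it slows down but does not fail."""
    released = 0
    for name, t in list(tensor_cache.items()):
        if released >= bytes_needed:
            break
        if t.is_cuda:
            tensor_cache[name] = t.cpu()   # copy tensor data to host memory
            released += t.element_size() * t.nelement()
    torch.cuda.empty_cache()               # hand freed blocks back to the GPU
    return released
```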
  • with this processing method, if the video memory resource demand of the high-priority task still cannot be satisfied after the idle video memory resources of the low-priority task are released, the video memory resources occupied by the tensor data of the low-priority task are released as well.
  • Sub-figure a shows the video memory resources occupied by a deep learning task in the initial stage, including the data of one tensor; the height of the dotted line represents the water level of the occupied video memory resources.
  • Sub-figure b shows that as the tensor data required by the task increases, the upper limit of video memory usage also increases.
  • The tensor data can be cached in the pool of the task's memory allocator (video memory resource allocator), that is, cached in the video memory space allocated for the task, so that it can be reused by the task's next mini-batch; in the figure, the task's increased video memory resources can accommodate the data of three tensors.
  • Sub-figure c shows that after one tensor is cleared, free video memory resources appear, while the water level of the video memory resources does not change.
  • the method provided by the embodiment of the present application dynamically adjusts the upper limit of the video memory usage according to the video memory resources actually required by the task.
  • the currently used video memory resources are actively detected, and the idle video memory resources are released, thereby adjusting the upper limit of the video memory usage to an appropriate value.
  • Sub-figure d shows the video memory resource water level after the above-mentioned available video memory has not been used for a period of time and this part of the available video memory (idle video memory resources) has been reclaimed (released).
  • to this end, the deep learning system exposes an interface that allows the upper limit of a task's GPU video memory usage to be increased or decreased while the task is running, and even reduced below the task's actual requirement.
  • Sub-figure e shows a situation in which the allocatable video memory is insufficient when a high-priority task arrives: the data of one tensor of the low-priority task is transferred to the host memory space, so that the released video memory resources can be allocated to the high-priority task to ensure its performance. In this way, even without enough video memory resources, the low-priority task can still continue to run and will not fail.
  • the method may further include the following step: if the allocatable video memory resources increase to the video memory resource demand of the low-priority task, allocating video memory resources to the low-priority task, so as to continue running the low-priority task according to the tensor data in the video memory space.
  • in this case, the video memory upper limit of low-priority tasks can be raised, and the tensors of low-priority tasks can be re-placed on the GPU.
  • Sub-figure f shows the situation of re-storing the tensor data of low-priority tasks from memory space to video memory space when GPU video memory resources are no longer in short supply, so as to improve the performance of low-priority tasks as much as possible.
  • in some embodiments, the low-priority tasks include: iterative learning tasks; and the releasing of the video memory resources used by the low-priority tasks may be implemented in the following manner: after the low-priority task completes the current iteration of learning, the used video memory resources are released.
  • in a deep learning scenario, to perform the gradient descent algorithm on the training set, the entire data set is usually divided into several small training sets, and a small subset is trained each time. On the one hand, this avoids the huge amount of computation caused by all the data participating in training at once; on the other hand, the gradient direction of a subset does not differ too much from that of the entire data set, which ensures the correctness of training.
  • This training method is also called mini-batch training method.
  • Such a learning task is called an iterative learning task.
  • various data involved in the training should be stored in the storage space, and the container for storing these data is called a tensor.
  • a tensor is a data unit that stores data in a deep learning framework. As a data container, it can store data during training.
  • a task can include multiple tensors in one training session.
  • the tensor data can be a multi-dimensional array of arbitrary dimensions composed of a set of primitive values, including the training data of one mini-batch, the model parameters generated during training, the data of the various intermediate nodes in the network, and so on.
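For illustration only, in a framework such as PyTorch the tensors of one training step might look like this on a CUDA-capable machine (shapes are arbitrary):

```python
import torch

# A mini-batch of 32 RGB images (training data), a convolution weight
# (model parameters), and an intermediate activation -- each is a tensor
# acting as a data container in GPU video memory.
batch = torch.randn(32, 3, 224, 224, device="cuda")  # training data
weight = torch.randn(64, 3, 7, 7, device="cuda")     # model parameters
activation = torch.nn.functional.conv2d(batch, weight, stride=2)
```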
  • in this way, the utilization rate of GPU video memory resources is improved on the premise of ensuring the running performance of high-priority tasks.
  • the processing method of moving tensor data between video memory and memory makes it possible to use memory resources as video memory resources when video memory resources are insufficient, but this usually introduces huge data copying overhead.
  • the inventor of the present invention found that in deep learning tasks training proceeds in mini-batches: tensor data is created and destroyed within a mini-batch, and the same tensors are created repeatedly across mini-batches. Therefore, the upper limit of a task's video memory usage is adjusted dynamically at the mini-batch boundary, at which point the tensor data has already been released; this avoids explicitly copying data between video memory and host memory, and thus avoids huge copy overhead.
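A minimal sketch of applying cap changes only at mini-batch boundaries; `model.train_step`, `allocator.set_memory_limit`, and `coordinator.requested_cap` are hypothetical names for the interfaces described above:

```python
def training_loop(model, batches, allocator, coordinator):
    for batch in batches:
        loss = model.train_step(batch)  # this batch's tensors live only here
        # At the mini-batch boundary the batch's tensors have been
        # destroyed, so the video memory cap can shrink or grow without
        # copying any live tensor between video memory and host memory.
        new_cap = coordinator.requested_cap()
        if new_cap is not None:
            allocator.set_memory_limit(new_cap)
```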
  • the method may further include the following step: allocating a part of memory resources and a part of video memory resources to the high-priority task, so as to run the high-priority task according to the tensor data in the memory space and the tensor data in the video memory space at the same time.
  • the method may further include the following step: releasing the memory resources of the high-priority task, so that the high-priority task can run entirely based on the tensor data in the video memory space.
  • for example, as shown in FIG. 4, the video memory demand of the high-priority task suddenly surges by a at time T0, and the allocatable video memory is insufficient at that moment, so the task first occupies a in the memory space and temporarily runs based on the tensor data in memory.
  • then the tensor data of the low-priority task is moved to memory, the video memory resource a released by the low-priority task is allocated to the high-priority task, and the memory resource a occupied by the high-priority task is released at the same time; the video memory resources of the high-priority task thereby increase by a, and the high-priority task continues to execute on video memory. In this way, the performance of the high-priority task is reduced only for a very short period of time, the task does not fail, and huge copy overhead is avoided.
  • meanwhile, the video memory resource b used by the low-priority task is released, and b is occupied in the memory space.
  • the low-priority task then runs based on the tensor data in the memory space, losing part of its running performance. Later, after other tasks release video memory resources, the video memory resources of the low-priority task increase by b at time T1, and the low-priority task continues to run based on the tensor data in the video memory space.
  • in some embodiments, not only the video memory space occupied by the low-priority task but also the free video memory space occupied by other high-priority tasks can be released. This processing method releases more video memory resources and effectively improves the performance of low-priority tasks.
  • the method may further include step S105: allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to the tensor data in the video memory.
  • As can be seen from the above embodiments, the video memory management method determines the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory space is to be allocated for a high-priority task and the allocatable video memory space is less than the video memory space requirement of the high-priority task, at least part of the video memory space of low-priority tasks is released; and video memory space is allocated to the high-priority task according to the video memory space released by the low-priority tasks, so as to run the high-priority task at least according to the tensor data in the video memory. With this processing method, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, realizing dynamic scaling and optimization of the GPU video memory resources occupied by machine learning tasks, so that GPU video memory resources can be allocated to other tasks on the premise of ensuring the performance of high-priority tasks; therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
  • a video memory management method is provided, and correspondingly, the present application also provides a video memory management apparatus.
  • the device corresponds to the embodiment of the method described above. Since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and reference may be made to part of the description of the method embodiment for related parts.
  • the apparatus embodiments described below are merely illustrative.
  • FIG. 5 is a schematic structural diagram of an embodiment of a video memory management apparatus of the present application.
  • the present application further provides a video memory management device, including:
  • a priority determining unit 501 configured to determine the priorities of multiple machine learning tasks run by the graphics processing unit;
  • the video memory release unit 502 is configured to release at least a part of the video memory resources occupied by a low-priority task if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task;
  • the video memory allocation unit 503 is configured to allocate video memory resources to the high-priority tasks, so as to run the high-priority tasks at least according to the tensor data in the video memory space.
  • a video memory management method is provided, and correspondingly, the present application also provides an electronic device.
  • the device corresponds to the embodiment of the method described above. Since the device embodiments are basically similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for related parts.
  • the device embodiments described below are merely illustrative.
  • An electronic device in this embodiment includes: a processor and a memory; the memory is used to store a program for implementing the video memory management method. After the device is powered on and runs the program of the method through the processor, the following steps are performed: determining the priorities of multiple machine learning tasks run through the graphics processing unit; if video memory space is to be allocated for a high-priority task and the allocatable video memory space is less than the video memory space requirement of the high-priority task, releasing at least part of the video memory space of the low-priority task; and allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to the tensor data in the video memory.
  • the present application further provides a video memory management method.
  • the parts of this embodiment that have the same contents as those of the first embodiment will not be repeated, and please refer to the corresponding parts in the first embodiment.
  • the method may include the following steps:
  • Step 1: Run the machine learning task through the graphics processing unit.
  • Step 2: Determine the video memory resource usage status information of the machine learning task.
  • Step 3: If the information satisfies the video memory resource release condition, release the free video memory resources occupied by the task, so as to allocate the free video memory resources to other machine learning tasks run in parallel by the graphics processing unit.
  • the machine learning tasks include but are not limited to: deep learning tasks.
  • the idle video memory resources may include at least one of the following: idle video memory resources generated by modules of the deep learning task that cannot be processed in parallel, and idle video memory resources generated while waiting for the multiple resources required by the deep learning task to be satisfied.
  • the machine learning task may be a distributed deep learning task.
  • the free video memory resources may be free video memory resources generated when multiple image processing units corresponding to the distributed deep learning task synchronize data.
  • the idle video memory resources of deep learning tasks may be generated for the following reasons.
  • Idle video memory resources generated when multiple graphics processing units corresponding to a distributed deep learning task synchronize data: the deep learning system faces massive training data, and as the data grows, ultra-large-scale distributed training spends a lot of time in the network data synchronization stage of the model. As shown in FIG. 2, task E runs on node 1 and node n-1 at the same time, and the GPU video memory resources are idle during the data synchronization period.
  • Idle video memory resources generated while waiting for the multiple resources required by the deep learning task to be satisfied.
  • Distributed deep learning training usually uses the synchronous stochastic gradient descent (SGD) method, which requires all the resources needed by the training task to be satisfied at the same time before the task starts. Therefore, from the perspective of the cluster scheduler, when resources are insufficient, the scheduler needs to reserve some available resources for distributed tasks until all the required resources are satisfied; this reservation process also leaves the GPU video memory resources in an idle waiting state.
  • the usage status information includes, but is not limited to: the upper limit of the video memory resources actually used by the task; the release condition includes, but is not limited to: the duration for which the task's allocated video memory resources exceed the upper limit reaches a duration threshold.
  • the duration threshold can be determined according to application requirements, for example, set to 30 seconds.
  • for example, the upper limit value (peak) of the video memory resources actually used is sampled every 10 seconds; if for 30 consecutive seconds (that is, after three consecutive samples) the video memory resources allocated to the task exceed the actually required peak, the idle video memory resources are released.
  • As can be seen from the above embodiments, the video memory management method runs a machine learning task through a graphics processing unit, determines the video memory resource usage status information of the machine learning task, and, if the information satisfies the video memory resource release condition, releases the idle video memory resources occupied by the task so that they can be allocated to other machine learning tasks run in parallel through the graphics processing unit. This processing method releases the idle video memory resources occupied by tasks in time; therefore, the resource utilization of the overall cluster can be effectively improved.
  • a video memory management method is provided, and correspondingly, the present application also provides a video memory management apparatus.
  • the device corresponds to the embodiment of the method described above. Since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and reference may be made to part of the description of the method embodiment for related parts.
  • the apparatus embodiments described below are merely illustrative.
  • the device includes:
  • a task execution unit, configured to run machine learning tasks through a graphics processing unit;
  • An information determination unit configured to determine the usage status information of video memory resources of the machine learning task.
  • a video memory release unit configured to release free video memory resources occupied by the task if the information satisfies a video memory resource release condition, so as to allocate the free video memory resources to other machine learning tasks running in parallel by the graphics processing unit.
  • a video memory management method is provided, and correspondingly, the present application also provides an electronic device.
  • the device corresponds to the embodiment of the method described above. Since the device embodiments are basically similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for related parts.
  • the device embodiments described below are merely illustrative.
  • An electronic device in this embodiment includes: a processor and a memory; the memory is used to store a program for implementing the video memory management method. After the device is powered on and runs the program of the method through the processor, the following steps are performed: running the machine learning task through the graphics processing unit; determining the video memory resource usage status information of the machine learning task; and if the information satisfies the video memory resource release condition, releasing the idle video memory resources occupied by the task, so as to allocate the idle video memory resources to other machine learning tasks run in parallel by the graphics processing unit.
  • the present application further provides a video memory management system.
  • the parts of this embodiment that have the same contents as those of the first embodiment will not be repeated, and please refer to the corresponding parts in the first embodiment.
  • FIG. 6 is a schematic structural diagram of an embodiment of a video memory management system of the present application.
  • a video memory management system provided by the present application includes: a storage resource coordinator, a storage resource allocator for a low-priority task (e.g., task A) (the first storage resource allocator), and a storage resource allocator for a high-priority task (e.g., task B) (the second storage resource allocator).
  • the system provided by this embodiment implements adaptive dynamic scaling and optimization of GPU video memory resources for machine learning tasks by co-designing the storage resource coordinator and the machine learning computing framework.
  • the storage resource coordinator supports adaptive GPU video memory resource adjustment and can be deployed in a GPU computing node to schedule and manage the video memory resources of one or more GPUs in the node, dynamically adjusting the GPU video memory usage of the multiple tasks on a GPU.
  • the machine learning computing framework corresponds to machine learning tasks, and each task is run through the machine learning computing framework.
  • the machine learning computing framework can be an end-to-end machine learning platform; each framework may have its own ecosystem of tools, libraries and other resources that help developers easily build and deploy machine-learning-powered applications.
  • the machine learning computing framework is a deep learning computing framework, including but not limited to TensorFlow, PyTorch, MXNet, Caffe, and the like.
  • the storage resource coordinator is configured to determine the priorities of multiple machine learning tasks run through the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, it sends a video memory resource release instruction to the storage resource allocator of the low-priority task (the first storage resource allocator) and a video memory resource allocation instruction to the storage resource allocator of the high-priority task (the second storage resource allocator). The storage resource allocator of the low-priority task is configured to release at least part of the video memory resources occupied by the low-priority task according to the release instruction; the storage resource allocator of the high-priority task is configured to allocate video memory resources to the high-priority task according to the allocation instruction, so as to run the high-priority task at least according to the tensor data in the video memory space.
  • the allocator is further configured to send the video memory resource usage information of the task to the coordinator.
  • the allocator may send the information to the coordinator according to a preset period.
  • the coordinator is further configured to send a video memory resource release instruction to the allocator if the information satisfies the video memory resource release condition.
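A queue-based sketch of this coordinator/allocator exchange; the class and method names are hypothetical stand-ins for the components described above:

```python
import queue

class StorageResourceAllocator:
    """Per-task allocator: reports usage on a preset period and obeys
    release/allocate instructions from the coordinator."""
    def __init__(self, task, coordinator_inbox: queue.Queue):
        self.task = task
        self.inbox = queue.Queue()
        self.coordinator_inbox = coordinator_inbox

    def report_usage(self):
        # sent to the coordinator periodically (e.g., every 10 seconds)
        self.coordinator_inbox.put(
            ("usage", self.task.name, self.task.used_memory_peak()))

    def handle_instruction(self):
        kind, amount = self.inbox.get()
        if kind == "release":
            self.task.release_video_memory(amount)
        elif kind == "allocate":
            self.task.allocate_video_memory(amount)

class StorageResourceCoordinator:
    """Node-local coordinator for one or more GPUs in the node."""
    def satisfy_high_priority(self, demand, gpu_free,
                              low_alloc, high_alloc):
        # ask the low-priority allocator to release only if memory is short
        if gpu_free < demand:
            low_alloc.inbox.put(("release", demand - gpu_free))
        high_alloc.inbox.put(("allocate", demand))
```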
  • As can be seen from the above embodiments, the storage resource coordinator determines the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, it sends a video memory resource release instruction to the first storage resource allocator of the low-priority task and a video memory resource allocation instruction to the second storage resource allocator of the high-priority task; the first storage resource allocator releases at least a part of the video memory resources occupied by the low-priority task according to the release instruction; and the second storage resource allocator allocates video memory resources to the high-priority task according to the allocation instruction, so as to run the high-priority task at least according to the tensor data of the video memory space.
  • This processing method allows the local storage resource coordinator of a GPU device to allocate the video memory resources occupied by low-priority tasks to high-priority tasks when the allocatable video memory resources are insufficient, so that the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU are dynamically scaled and optimized, and GPU video memory resources can be allocated to other tasks on the premise of ensuring the performance of high-priority tasks; therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
  • a machine learning system provided by this application includes: a client and a server.
  • the client is configured to send the priority information of machine learning tasks to the server; the server is configured to determine the priorities of multiple machine learning tasks run by the graphics processing unit; if video memory resources are to be allocated for a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by low-priority tasks; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data of the video memory space.
  • the client includes, but is not limited to, mobile communication devices, i.e., what are commonly called mobile phones or smart phones, as well as terminal devices such as personal computers and tablets (e.g., an iPad).
  • the server can run machine learning tasks on the GPU cluster.
  • a task service can be provided to the user through the client; through it, the user can determine the machine learning task to be run and set the task priority, such as selecting the "second-level" priority, and different priorities may be subject to different service charges.
  • the user can then submit a task running request to the server through the client.
  • the server can store the priority information of the task, and store the task in the task table.
  • the server can include one or more GPU computing nodes, and each node can run machine learning tasks through a machine learning framework; the storage resource coordinator deployed in a GPU computing node can obtain, from the server, the priority information of the multiple tasks run through that computing node. Table 2 shows the machine learning task information in this embodiment.
  • for example, the learning task 1 of user A's named entity recognition model is a "performance assurance task", the learning task 2 of user B's named entity recognition model is a "speculative execution task", and the priority of the "performance assurance task" is higher than the priority of the "speculative execution task".
  • in this case, the following processing can be performed: if video memory resources are to be allocated for the high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by the low-priority task; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to the tensor data in the video memory space.
  • the server is further configured to determine performance information of a machine learning task; and adjust the priority of the task according to the performance information.
  • for example, if the priority of task A was originally level two but its performance does not meet the "level two" service level that the user requested through the system, the priority of the task can be adjusted to level one, so that its real performance meets the requested "level two" service level.
  • the server can record the change information of video memory resources while a machine learning task runs, and adjust the task's priority information according to the change information. For example, if a high-priority task runs 30% of the time based on tensor data in the memory space and the task's performance information does not meet the service level requirement, the priority of the task can be increased.
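A hedged sketch of that adjustment rule; the `stats` fields are hypothetical, and the 30% figure comes from the example above:

```python
def maybe_boost_priority(task, stats):
    """Raise a task's priority when it spent too much of its runtime on
    host-memory tensors and still misses its service level."""
    host_fraction = stats.time_on_host_memory / stats.total_time
    if host_fraction >= 0.30 and not stats.meets_service_level:
        task.priority = max(1, task.priority - 1)  # level 1 = highest
```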
  • the server can also be used to determine performance information of a machine learning task; and determine a serviceable machine learning task according to the performance information.
  • video memory resource management services can be provided for task A and task B.
  • the machine learning system provided by the embodiments of the present application sends priority information of machine learning tasks to the server through the client; the server determines the priorities of multiple machine learning tasks run by the graphics processing unit; To allocate video memory resources for high-priority tasks, and the allocatable video memory resources are less than the memory resource requirements of high-priority tasks, release at least part of the video memory resources occupied by low-priority tasks; allocate video memory resources for high-priority tasks to at least Run high-priority tasks according to the tensor data in the video memory space; this processing method makes the video memory resources occupied by low-priority tasks allocated to high-priority tasks when the allocatable video memory resources are insufficient.
  • in this way, the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU are dynamically scaled and optimized, and GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed; therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is ensured.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as methods, systems or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.


Abstract

A video memory management method, apparatus, device and system. The method includes: determining the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, releasing at least part of the video memory resources occupied by a low-priority task; and allocating video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space. With this approach, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way, the resource utilization of the overall cluster can be improved while the performance of high-priority tasks is guaranteed.

Description

Video memory management method, apparatus, device and system
This application claims priority to Chinese Patent Application No. 202011219652.6, filed on November 3, 2020 and entitled "Video Memory Management Method, Apparatus, Device and System", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of machine learning technology, and in particular to video memory management methods, apparatuses and systems, machine learning systems, and electronic devices.
Background
With the continuous development of deep learning algorithms and the computing power of graphics processing units (GPUs), deep learning has become a crucial link in enterprise product data flows. To support large-scale deep learning applications, enterprises usually build shared GPU clusters to support the development of products across multiple domains, such as computer vision, natural language processing, speech recognition, recommendation and advertising services.
To improve GPU resource utilization and the throughput of the entire GPU cluster, deep learning systems allow multiple deep learning tasks to run simultaneously on one GPU, so that more deep learning training tasks can be completed with the same amount of resources. At present, a typical way of multiplexing GPU video memory resources is to have a unified video memory allocator inside the deep learning framework perform video memory management: when the allocator receives a video memory resource request from any task, as long as the GPU running that task has free video memory resources, it allocates the corresponding video memory space to that task, without considering the video memory demands of the other tasks running simultaneously on that GPU. This approach can accelerate the mini-batch training speed of tasks.
However, in the course of realizing the present invention, the inventors found that the above technical solutions have at least the following problems. 1) The above resource multiplexing approach provides no performance isolation guarantee, leading to hard-to-control mutual interference among tasks. Specifically, when a GPU is assigned to a "resource-guaranteed" task for exclusive use, the deep learning system can guarantee its training performance; but because the GPU lacks a performance isolation mechanism, if other tasks execute together on such a GPU, potential competition for video memory resources may cause severe performance degradation of the "resource-guaranteed" task. 2) As training proceeds, the GPU video memory demand of a "resource-guaranteed" task may suddenly grow; if the GPU video memory is occupied by other tasks at that moment, the "resource-guaranteed" task will fail, which is even less acceptable. In summary, how to manage the shared video memory resources of a machine learning system so as to improve GPU cluster utilization while ensuring the video memory usage of high-priority tasks has become an urgent problem for those skilled in the art.
Summary
This application provides a video memory management method to solve the problem in the prior art that the performance of high-priority tasks cannot be guaranteed. This application additionally provides a video memory management apparatus and system, a machine learning system, and electronic devices.
This application provides a video memory management method, including:
determining the priorities of multiple machine learning tasks running on a graphics processing unit;
if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, releasing at least part of the video memory resources occupied by a low-priority task;
allocating video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
Optionally, the method further includes:
releasing idle video memory resources occupied by the multiple machine learning tasks.
Optionally, the releasing idle video memory resources occupied by the multiple machine learning tasks includes:
determining video memory resource usage information of the machine learning task;
if the information satisfies a video memory resource release condition, releasing the idle video memory resources.
Optionally, the usage information includes: an upper limit of the video memory resources actually used by the task;
the release condition includes: the duration for which the task's video memory resource allocation exceeds the upper limit reaches a duration threshold.
Optionally, the method further includes:
releasing idle video memory resources of other high-priority tasks.
Optionally, the releasing at least part of the video memory resources occupied by a low-priority task includes:
if the idle video memory resources of the low-priority task are greater than or equal to the video memory resource demand, releasing the idle video memory resources occupied by the low-priority task.
Optionally, the releasing at least part of the video memory resources occupied by a low-priority task includes:
if the idle video memory resources of the low-priority task are less than the video memory resource demand, allocating host memory resources to the low-priority task and releasing at least part of the video memory resources used by the low-priority task, so as to continue running the low-priority task at least according to tensor data in the host memory space.
Optionally, the method further includes:
if the allocatable video memory resources grow to the video memory resource demand of the low-priority task, allocating video memory resources to the low-priority task, so as to continue running the low-priority task according to tensor data in the video memory space.
Optionally, the low-priority task includes: an iterative learning task;
the releasing the video memory resources used by the low-priority task includes:
after the low-priority task completes the current learning iteration, releasing the at least part of the used video memory resources.
Optionally, before the releasing at least part of the video memory resources used by the low-priority task, the method further includes:
allocating host memory resources to the high-priority task, so as to run the high-priority task according to tensor data in the host memory space and tensor data in the video memory space.
Optionally, after the allocating video memory resources to the high-priority task, the method further includes:
releasing the host memory resources of the high-priority task.
Optionally, the machine learning task includes a distributed deep learning task.
This application further provides a video memory management method, including:
running a machine learning task through a graphics processing unit;
determining video memory resource usage information of the machine learning task;
if the information satisfies a video memory resource release condition, releasing the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
This application further provides a video memory management apparatus, including:
a priority determination unit, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit;
a video memory release unit, configured to, if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task;
a video memory allocation unit, configured to allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
This application further provides an electronic device, including:
a processor and a memory;
the memory is configured to store a program implementing the video memory management method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: determining the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory space is to be allocated to a high-priority task and the allocatable video memory space is less than the video memory space demand of the high-priority task, releasing at least part of the video memory space of a low-priority task; and allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to tensor data in the video memory.
This application further provides a video memory management apparatus, including:
a task running unit, configured to run a machine learning task through a graphics processing unit;
an information determination unit, configured to determine video memory resource usage information of the machine learning task;
a video memory release unit, configured to, if the information satisfies a video memory resource release condition, release the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
This application further provides an electronic device, including:
a processor and a memory;
the memory is configured to store a program implementing the video memory management method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: running a machine learning task through a graphics processing unit; determining video memory resource usage information of the machine learning task; and if the information satisfies a video memory resource release condition, releasing the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
This application further provides a video memory management system, including:
a storage resource coordinator, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; and, if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource amount of the high-priority task, send a video memory resource release instruction to a first storage resource allocator of a low-priority task and send a video memory resource allocation instruction to a second storage resource allocator of the high-priority task;
the first storage resource allocator, configured to release, according to the release instruction, at least part of the video memory resources occupied by the low-priority task;
the second storage resource allocator, configured to allocate, according to the allocation instruction, video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
Optionally, the allocator is further configured to send video memory resource usage information of the task to the coordinator;
the coordinator is further configured to, if the information satisfies a video memory resource release condition, send a video memory resource release instruction to the allocator.
Optionally, the allocator is specifically configured to send the information to the coordinator according to a preset period.
This application further provides a machine learning system, including:
a client, configured to send priority information of machine learning tasks to a server;
the server, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
This application further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the various methods described above.
This application further provides a computer program product including instructions which, when run on a computer, cause the computer to execute the various methods described above.
Compared with the prior art, this application has the following advantages:
With the video memory management method provided by the embodiments of this application, the priorities of multiple machine learning tasks running on a graphics processing unit are determined; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, at least part of the video memory resources occupied by a low-priority task are released; and video memory resources are allocated to the high-priority task, so as to run it at least according to tensor data in the video memory space. With this approach, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
With the other video memory management method provided by the embodiments of this application, a machine learning task is run through a graphics processing unit; video memory resource usage information of the machine learning task is determined; if the information satisfies a video memory resource release condition, the idle video memory resources occupied by the task are released, so that they can be allocated to other machine learning tasks running in parallel through the graphics processing unit. This approach releases the idle video memory resources occupied by a task in a timely manner and therefore can effectively improve the resource utilization of the overall cluster.
With the video memory management system provided by the embodiments of this application, the storage resource coordinator determines the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, it sends a video memory resource release instruction to the first storage resource allocator of a low-priority task and a video memory resource allocation instruction to the second storage resource allocator of the high-priority task; the first storage resource allocator releases, according to the release instruction, at least part of the video memory resources occupied by the low-priority task; and the second storage resource allocator allocates, according to the allocation instruction, video memory resources to the high-priority task, so as to run it at least according to tensor data in the video memory space. With this approach, a storage resource coordinator local to the GPU device can allocate the video memory resources occupied by low-priority tasks to high-priority tasks when the allocatable video memory resources are insufficient, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
With the machine learning system provided by the embodiments of this application, the client sends priority information of machine learning tasks to the server; the server determines the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, at least part of the video memory resources occupied by a low-priority task are released; and video memory resources are allocated to the high-priority task, so as to run it at least according to tensor data in the video memory space. With this approach, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an embodiment of a video memory management method provided by this application;
FIG. 2 is a schematic diagram of an application scenario of an embodiment of a video memory management method provided by this application;
FIG. 3 is a schematic diagram of dynamic scaling of video memory resources in an embodiment of a video memory management method provided by this application;
FIG. 4 is a schematic diagram of video memory resource changes in an embodiment of a video memory management method provided by this application;
FIG. 5 is a schematic structural diagram of an embodiment of a video memory management apparatus provided by this application;
FIG. 6 is a schematic structural diagram of an embodiment of a video memory management system provided by this application.
Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of this application. However, this application can be implemented in many other ways different from those described here, and those skilled in the art can make similar generalizations without departing from the essence of this application; therefore, this application is not limited by the specific implementations disclosed below.
This application provides video memory management methods, apparatuses and systems, a machine learning system, and electronic devices. The various solutions are described in detail one by one in the following embodiments.
First Embodiment
Please refer to FIG. 1, which is a schematic flowchart of an embodiment of the video memory management method of this application. The method provided by this embodiment may include the following steps:
Step S101: determining the priorities of multiple machine learning tasks running on a graphics processing unit.
The video memory management method provided by this application can be applied in a machine learning system to allocate and use GPU video memory resources when multiple machine learning tasks share them. The machine learning system may be a deep learning system built on a deep learning computing framework such as TensorFlow or PyTorch. The machine learning task, also called a machine learning model training task, learns a machine learning model from training data. The model may be a model based on a deep neural network; correspondingly, the machine learning task is a deep learning task. For example, the model may be a named entity recognition model, a speech recognition model, or a product recommendation model learned from training data. The model may also be a non-neural-network machine learning model such as a decision tree.
To illustrate the video memory management method provided by this application more intuitively, its application scenario is described first. As shown in FIG. 2, the machine learning system may be a distributed machine learning system including one or more GPU-based compute nodes, also called GPU devices, where each GPU device may include one or more GPUs. Multiple machine learning tasks can run in parallel on one GPU and share that GPU's video memory resources. In addition, a GPU device also includes a central processing unit (CPU) and host memory, where the CPU side is also called the host of the GPU. In FIG. 2, node 1 includes GPU1 and GPU2; task D and task E run simultaneously on GPU1, and task A, task B and task C run simultaneously on GPU2.
The video memory management method provided by this application can dynamically adjust the video memory resources occupied by each task according to the performance guarantee priorities (priorities for short) of the multiple tasks running in parallel on one GPU: the higher a task's priority, the more its performance needs to be guaranteed. A task's priority can be determined according to application requirements; for example, only two priorities may be set: high priority and low priority. For instance, if learning task 1 of a named entity recognition model is a "performance-guaranteed task" with a service level guarantee and learning task 2 of a speech recognition model is a "speculative-execution task" without one, task 1 can be set to high priority and task 2 to low priority.
In specific implementations, multiple priorities may also be set. Table 1 shows the priority setting information of machine learning tasks in one example.
Task ID | Task priority
Task 1 (named entity recognition model) | Level 1 (highest priority)
Task 2 (speech recognition model) | Level 2 (second-highest priority)
Task 3 (product recommendation model) | Level 2 (second-highest priority)
Task 4 (language model) | Level 3 (lowest priority)
Table 1. Machine learning task table
As can be seen from Table 1, three task priorities are set in this embodiment, where level 1 is the highest priority, level 2 the second-highest, and level 3 the lowest. In this case, if the parallel tasks include a level-1 task, the performance of the level-1 task is guaranteed first; if no level-1 task is included, the performance of the level-2 tasks is guaranteed.
Step S103: if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, releasing at least part of the video memory space occupied by a low-priority task.
In the video memory management method provided by this application, when video memory resources are allocated to a high-priority task, if the allocatable video memory resources on the GPU are less than the high-priority task's demand, part or all of the video memory space occupied by low-priority tasks is released and the released space is allocated to the high-priority task, so that the high-priority task's tensor data is stored in video memory as much as possible, which ensures the high-priority task's running performance.
The video memory resource demand may be the additional video memory resources a running deep learning task needs, or the initial video memory resources a deep learning task about to run needs. The demand can be determined by the deep learning task. Since the way of determining the video memory resource demand belongs to relatively mature prior art, it is not described here again.
In this embodiment, if a deep learning task in a GPU's machine learning task table is to run, the video memory resources the task needs are determined first. After that, if the GPU's allocatable video memory resources are less than the task's demand, the priority of this task and the priorities of the other tasks running on the GPU are determined; if the priority of the task to be run is higher than those of the other running tasks, the video memory resources occupied by the low-priority tasks need to be released in order to guarantee the high-priority task's performance.
In specific implementations, the amount of video memory resources to release can be determined according to the demand of the task to be run. For example, if a high-priority task needs 1 GB of video memory resources, 1.2 GB of the video memory resources occupied by low-priority tasks may be released.
In specific implementations, the video memory resources occupied by one or more low-priority tasks may be released. For example, if a high-priority task needs 5 GB of video memory resources and releasing the video memory occupied by a single low-priority task is not enough, the video memory resources occupied by multiple low-priority tasks may be released so that more than 5 GB is freed.
In one example, the method may further include the following step: releasing the idle video memory resources occupied by the multiple machine learning tasks. In the course of realizing the present invention, the inventors found that most deep learning tasks cannot fully utilize all the GPU video memory resources allocated to them at all times, and idle video memory resources usually exist. Through research, the inventors found that the idle video memory resources of deep learning tasks may arise for the following reasons.
1) Product-oriented deep learning training tasks usually contain many computation parts, some of which are not easy to parallelize and therefore can hardly occupy all GPU video memory resources, such as graph sampling in graph neural networks, feature extraction in advertising models, and data augmentation in computer vision.
2) Deep learning systems face massive training data. As data grows, much time in ultra-large-scale distributed training is spent in the model's network data synchronization phase; for example, task E in FIG. 2 runs on node 1 and node n-1 at the same time, and the GPU video memory resources are correspondingly idle during the data synchronization period.
3) Distributed deep learning training usually uses the synchronous stochastic gradient descent (SGD) method, which requires that the resources needed by a training task be satisfied at the same time before the training task starts. Therefore, from the cluster scheduler's perspective, when resources are insufficient the scheduler needs to reserve part of the available resources for a distributed task until all required resources are satisfied; this reservation process also leaves GPU video memory resources in an idle, waiting state.
4) In practical applications, some tensor data is used only in specific deep learning training phases, such as data processing and model evaluation; once such a phase has passed, the tensor data is no longer used and is cleared from video memory, which produces idle video memory resources.
However, existing deep learning frameworks do not release the above idle video memory resources; instead, they always retain them for the task. The overall video memory occupied by the task does not decrease; part of it simply becomes idle. In subsequent execution the task may use this available video memory, for example to store other tensor data, or it may never use it, so GPU resources often remain at a relatively low utilization level.
Based on the above video memory usage patterns, the method provided by this embodiment releases the idle video memory resources a task has not used up, so that they can be allocated to other tasks in time. This optimizes the processing of multiple tasks in shared-GPU scenarios and prevents other tasks from queuing and waiting, thereby improving GPU resource utilization and, in turn, the throughput of the shared GPU cluster.
In specific implementations, the releasing of the idle video memory resources occupied by the multiple machine learning tasks may include the following sub-steps: 1) determining video memory resource usage information of the machine learning task; 2) if the information satisfies a video memory resource release condition, releasing the idle video memory resources. The usage information includes an upper limit of the video memory resources actually used by the task; the release condition includes that the duration for which the task's video memory resource allocation exceeds the upper limit reaches a duration threshold. For example, while a task runs, the upper limit (peak value) of actually used video memory resources is determined every 10 seconds; when for 30 consecutive seconds (after three consecutive determinations of the usage information) the video memory occupied by the task remains larger than the peak it really needs, the idle video memory resources are released.
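For illustration only, the release condition in the example just given (a 10-second sampling period and three consecutive over-peak samples) can be sketched as a pure decision function; the tuple format of the samples is an assumption of this sketch rather than part of the disclosure.

```python
SAMPLE_INTERVAL_S = 10  # the actually-used peak is determined every 10 seconds
RELEASE_AFTER_N = 3     # 3 consecutive over-peak samples (~30 s) trigger a release

def should_release(samples: list) -> bool:
    """Each sample is (allocated_bytes, peak_used_bytes), newest last; idle
    memory is released once the allocation has exceeded the actually-used
    peak for RELEASE_AFTER_N consecutive samples."""
    recent = samples[-RELEASE_AFTER_N:]
    return (len(recent) == RELEASE_AFTER_N
            and all(allocated > peak for allocated, peak in recent))

# Example: 4 GB stays allocated while only about 2.5 GB was ever needed,
# three samples in a row, so the surplus above the peak can be released.
history = [(4096, 3900), (4096, 2560), (4096, 2560), (4096, 2560)]
assert should_release(history)
```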
In one example, the releasing at least part of the video memory space occupied by a low-priority task may include the following sub-step: if the idle video memory resources of the low-priority task are greater than or equal to the video memory resource demand, releasing the idle video memory resources occupied by the low-priority task. In this case both kinds of tasks can still run on data stored in video memory, which not only guarantees the running performance of the high-priority task but also avoids affecting the performance of the low-priority task.
In another example, the releasing at least part of the video memory space occupied by a low-priority task may further include the following sub-step: if the idle video memory resources of the low-priority task are less than the video memory resource demand, allocating host memory resources to the low-priority task and releasing at least part of the video memory resources used by the low-priority task, so as to continue running the low-priority task at least according to tensor data in the host memory space.
The video memory resources used by the low-priority task do not belong to the idle video memory resources: the low-priority task's tensor data is still stored in this part of video memory, and the task needs these resources to maintain its running performance. However, because part or all of the video memory resources used by the low-priority task are released, part or all of its tensor data is temporarily switched from video memory to host memory, so the low-priority task continues to run at least according to the tensor data in host memory. If only part of the used video memory resources is released, the low-priority task can continue running on part of its tensor data in host memory together with the other part in video memory; if all of them are released, the task continues entirely on the tensor data in host memory. In either case, the task does not fail.
With this approach, if the high-priority task's video memory demand still cannot be met after the low-priority task's idle video memory resources are released, the video memory occupied by the low-priority task's tensor data is further released: all or part of that tensor data is moved to the host memory of the GPU's host machine, and the low-priority task runs at least on the tensor data in host memory. This prevents the low-priority task from competing with the high-priority task for video memory and ensures the high-priority task's performance, at the cost of sacrificing part of the low-priority task's performance.
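A minimal sketch of this two-tier reclamation, under the assumed bookkeeping fields below (none of which come from the original disclosure), might look as follows:

```python
from dataclasses import dataclass

@dataclass
class LowPriorityTask:
    idle_bytes: int      # allocated but currently unused video memory
    tensor_bytes: int    # video memory holding live tensor data
    host_bytes: int = 0  # tensor data temporarily moved to host memory

def reclaim_from(task: LowPriorityTask, need: int) -> int:
    """Release idle video memory first; only if that is insufficient, spill
    tensor data to host memory so the task keeps running (more slowly)."""
    freed = min(task.idle_bytes, need)
    task.idle_bytes -= freed
    if freed < need:
        spill = min(task.tensor_bytes, need - freed)
        task.tensor_bytes -= spill
        task.host_bytes += spill  # the task now runs on host-memory tensors
        freed += spill
    return freed
```

Spilling is the second resort precisely because running on host-memory tensors costs the low-priority task performance, whereas giving up idle memory costs it nothing.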
As shown in FIG. 3, sub-figure a shows the video memory occupied by a deep learning task in its initial phase, containing the data of one tensor, where the height of the dashed line indicates the watermark of occupied video memory. Sub-figure b shows that as the tensor data required by the task grows, the video memory usage upper limit also grows; this tensor data can be cached in the pool of the task's memory allocator (video memory resource allocator), i.e., in the video memory space allocated to the task, so that it can be reused for the task's next mini-batch of tensor data; the task's enlarged video memory can hold the data of three tensors. As the task keeps running, some tensor data is used only in specific deep learning training phases; once such a phase has passed, the data is no longer needed and is cleared from video memory, producing idle video memory resources. Sub-figure c shows that after one tensor is cleared, idle video memory appears while the watermark remains unchanged.
The method provided by the embodiments of this application dynamically adjusts the video memory usage upper limit according to the video memory resources a task actually needs. In this embodiment, the video memory currently in use is actively detected and idle video memory resources are released, thereby adjusting the usage upper limit to an appropriate value. Sub-figure d shows the video memory watermark after the above available video memory (the idle video memory resources), having gone unused for a period of time, is reclaimed (released).
In this embodiment, the deep learning system exposes an interface that allows a task's GPU video memory usage upper limit to be raised or reduced at runtime, even below the task's real demand. Sub-figure e may correspond to the situation where a high-priority task arrives and the allocatable video memory is insufficient, so the data of one of the low-priority task's tensors is moved to the host machine's memory; the released video memory can then be allocated to the high-priority task to guarantee its performance. With this approach, even without sufficient video memory resources, the low-priority task can keep running and does not fail.
In one example, the method may further include the following step: if the allocatable video memory resources grow to the video memory resource demand of the low-priority task, allocating video memory resources to the low-priority task so as to continue running it according to tensor data in the video memory space. With this approach, when GPU video memory is no longer scarce, the low-priority task's video memory upper limit can be raised and its tensors can be re-allocated on the GPU. Sub-figure f shows the situation where, once GPU video memory is no longer scarce, the low-priority task's tensor data is moved from host memory back into video memory, to restore the low-priority task's performance as much as possible.
In one example, the low-priority task includes an iterative learning task, and the releasing of the video memory resources used by the low-priority task may be implemented as follows: after the low-priority task completes the current learning iteration, the used video memory resources are released.
In deep learning scenarios, to run the gradient descent algorithm on a training set, the entire dataset is usually divided into a number of small training sets, and each training step processes one small subset. On the one hand, this avoids the huge computation of training on the whole dataset at once; on the other hand, the gradient direction of a subset does not differ too much from that of the whole dataset, which guarantees the correctness of training. This training mode is also called mini-batch training, and such a learning task is called an iterative learning task. In each mini-batch training step, the various data involved in that step is stored in storage space; the container storing this data is called a tensor. A tensor is the data unit in which deep learning frameworks store data; as a data container it can hold the data produced during training, and one task may include multiple tensors in one training step. Tensor data can be a multi-dimensional array of arbitrary dimensionality composed of a set of raw values, including the training data of one mini-batch, the model parameters produced during training, the data of various intermediate nodes in the network, and so on.
In this embodiment, moving tensor data between video memory and host memory improves the utilization of GPU video memory resources while guaranteeing the running performance of high-priority tasks. Moving tensor data between video memory and host memory makes it possible to use host memory as video memory when video memory is scarce, but this usually introduces huge data-copy overhead.
By studying a unique characteristic of deep learning tasks, namely the iterative learning characteristic described above, the inventors of the present invention found that in deep learning tasks training proceeds in mini-batches: tensor data is created and destroyed within one mini-batch, and the same tensors are created repeatedly across mini-batches. Therefore the task's video memory usage upper limit is adjusted dynamically at mini-batch boundaries, when the tensor data has already been released; this avoids explicit data copies between video memory and host memory and thus avoids the huge copy overhead.
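For illustration, this boundary-based adjustment can be sketched as an allocator that caches a requested cap change and applies it only between mini-batches; the class and method names are assumptions of this sketch, not a framework interface disclosed above.

```python
class BoundaryAllocator:
    """Applies video memory cap changes only at mini-batch boundaries,
    where per-batch tensors have already been destroyed."""

    def __init__(self, cap_bytes: int):
        self.cap_bytes = cap_bytes
        self._pending = None

    def request_cap(self, new_cap: int) -> None:
        # Called by the coordinator at any time; takes effect later.
        self._pending = new_cap

    def on_minibatch_boundary(self) -> None:
        # Called by the training loop between mini-batches. No live
        # per-batch tensors exist here, so raising or lowering the cap
        # requires no explicit copies between video memory and host memory.
        if self._pending is not None:
            self.cap_bytes = self._pending
            self._pending = None
```

A training loop would simply call `on_minibatch_boundary()` once per iteration, before creating the next mini-batch's tensors.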
In specific implementations, before the releasing of the video memory resources used by the low-priority task, the method may further include the following step: allocating part host memory resources and part video memory resources to the high-priority task, so as to run the high-priority task according to tensor data in the host memory space and tensor data in the video memory space.
In specific implementations, after the allocating of video memory resources to the high-priority task, the method may further include the following step: releasing the host memory resources of the high-priority task, so that the high-priority task can run entirely on the tensor data in the video memory space.
As shown in a of FIG. 4, the high-priority task's video memory demand surges suddenly at time T0 by an amount a, and the allocatable video memory is insufficient at that moment, so a is first occupied in host memory and the high-priority task temporarily runs on tensor data in host memory. At time T1, after the low-priority task completes one mini-batch of training, the low-priority task's tensor data is moved to host memory, the video memory a released by the low-priority task is allocated to the high-priority task, and the host memory a occupied by the high-priority task is released at the same time, so the high-priority task's video memory grows by a and it continues to execute in video memory. With this approach, the high-priority task's performance is reduced only for a very short time, the task does not fail, and huge copy overhead is avoided.
As shown in b of FIG. 4, during the period from T0 to T1 the video memory b used by the low-priority task is released and b is occupied in host memory, so the low-priority task runs on tensor data in host memory and loses part of its running performance. Afterwards, when other tasks release video memory resources, the low-priority task's video memory grows by b at time T1 and it continues to run on tensor data in video memory.
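The T0/T1 hand-over of FIG. 4 can be sketched, again under assumed bookkeeping only, as two phases: the surge first lands in host memory, and the move back to video memory happens once a low-priority task yields memory at its mini-batch boundary.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    device_bytes: int = 0  # tensors resident in video memory
    host_bytes: int = 0    # tensors temporarily resident in host memory

def at_t0(high: Placement, demand: int, free_device: int) -> int:
    """T0: serve the surge from host memory if the device cannot hold it;
    the task slows down briefly instead of failing."""
    if free_device >= demand:
        high.device_bytes += demand
        return 0
    high.host_bytes += demand
    return demand  # bytes still owed on the device

def at_t1(high: Placement, owed: int) -> None:
    """T1: a low-priority task freed `owed` bytes at its mini-batch
    boundary; the high-priority task returns to video memory and its
    temporary host memory is released."""
    high.host_bytes -= owed
    high.device_bytes += owed
```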
In one example, not only can the video memory space occupied by low-priority tasks be released, but the idle video memory space occupied by other high-priority tasks can also be released. With this approach more video memory resources can be released, effectively improving the performance of low-priority tasks.
In one example, the method may further include step S105: allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to the tensor data in video memory.
As can be seen from the above embodiments, the video memory management method provided by the embodiments of this application determines the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory space is to be allocated to a high-priority task and the allocatable video memory space is less than the video memory space demand of the high-priority task, releases at least part of the video memory space of a low-priority task; and allocates video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to the tensor data in video memory. With this approach, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
Second Embodiment
In the above embodiment a video memory management method is provided; correspondingly, this application further provides a video memory management apparatus. This apparatus corresponds to the embodiment of the above method. Since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant parts, refer to the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
Please refer to FIG. 5, which is a schematic structural diagram of an embodiment of the video memory management apparatus of this application. This application additionally provides a video memory management apparatus, including:
a priority determination unit 501, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit;
a video memory release unit 502, configured to, if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task;
a video memory allocation unit 503, configured to allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
Third Embodiment
In the above embodiment a video memory management method is provided; correspondingly, this application further provides an electronic device. This embodiment corresponds to the embodiment of the above method. Since the device embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant parts, refer to the description of the method embodiment. The device embodiment described below is merely illustrative.
An electronic device of this embodiment includes a processor and a memory; the memory is configured to store a program implementing the video memory management method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: determining the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory space is to be allocated to a high-priority task and the allocatable video memory space is less than the video memory space demand of the high-priority task, releasing at least part of the video memory space of a low-priority task; and allocating video memory space to the high-priority task according to the video memory space released by the low-priority task, so as to run the high-priority task at least according to tensor data in the video memory.
Fourth Embodiment
Corresponding to the above video memory management method, this application further provides another video memory management method. The parts of this embodiment identical to the first embodiment are not described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the method may include the following steps:
Step 1: running a machine learning task through a graphics processing unit.
Step 2: determining video memory resource usage information of the machine learning task.
Step 3: if the information satisfies a video memory resource release condition, releasing the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
The machine learning task includes, but is not limited to, a deep learning task. The idle video memory resources may include at least one of the following: idle video memory resources produced by modules of a deep learning task that cannot be processed in parallel, and idle video memory resources produced while waiting for the multiple resources required by a deep learning task to be satisfied.
The machine learning task may be a distributed deep learning task. The idle video memory resources may be idle video memory resources produced when the multiple graphics processing units corresponding to a distributed deep learning task synchronize data.
Specifically, the idle video memory resources of deep learning tasks may arise for the following reasons.
1) Idle video memory resources produced by modules of a deep learning task that cannot be processed in parallel. Product-oriented deep learning training tasks usually contain many computation parts, some of which are not easy to parallelize and therefore can hardly occupy all GPU video memory resources, such as graph sampling in graph neural networks, feature extraction in advertising models, and data augmentation in computer vision.
2) Idle video memory resources produced when the multiple graphics processing units corresponding to a distributed deep learning task synchronize data. Deep learning systems face massive training data; as data grows, much time in ultra-large-scale distributed training is spent in the model's network data synchronization phase. For example, task E in FIG. 2 runs on node 1 and node n-1 at the same time, and the GPU video memory resources are correspondingly idle during the data synchronization period.
3) Idle video memory resources produced while waiting for the multiple resources required by a deep learning task to be satisfied. Distributed deep learning training usually uses the synchronous stochastic gradient descent (SGD) method, which requires that the resources needed by a training task be satisfied at the same time before the training task starts. Therefore, from the cluster scheduler's perspective, when resources are insufficient the scheduler needs to reserve part of the available resources for a distributed task until all required resources are satisfied; this reservation process also leaves GPU video memory resources in an idle, waiting state.
4) In practical applications, some tensor data is used only in specific deep learning training phases, such as data processing and model evaluation; once such a phase has passed, the tensor data is no longer used and is cleared from video memory, which produces idle video memory resources.
The usage information includes, but is not limited to: an upper limit of the video memory resources actually used by the task. The release condition includes, but is not limited to: the duration for which the task's video memory resource allocation exceeds the upper limit reaches a duration threshold. The duration threshold can be determined according to application requirements, for example set to 30 seconds.
For example, while a task runs, the upper limit (peak value) of actually used video memory resources is determined every 10 seconds; when for 30 consecutive seconds (after three consecutive determinations of the usage information) the video memory occupied by the task remains larger than the peak it really needs, the idle video memory resources are released.
As can be seen from the above embodiments, with the video memory management method provided by the embodiments of this application, a machine learning task is run through a graphics processing unit; video memory resource usage information of the machine learning task is determined; if the information satisfies a video memory resource release condition, the idle video memory resources occupied by the task are released, so that they can be allocated to other machine learning tasks running in parallel through the graphics processing unit. This approach releases the idle video memory resources occupied by a task in a timely manner and therefore can effectively improve the resource utilization of the overall cluster.
Fifth Embodiment
In the above embodiment a video memory management method is provided; correspondingly, this application further provides a video memory management apparatus. This apparatus corresponds to the embodiment of the above method. Since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant parts, refer to the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
In this embodiment, the apparatus includes:
a task running unit, configured to run a machine learning task through a graphics processing unit;
an information determination unit, configured to determine video memory resource usage information of the machine learning task;
a video memory release unit, configured to, if the information satisfies a video memory resource release condition, release the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
Sixth Embodiment
In the above embodiment a video memory management method is provided; correspondingly, this application further provides an electronic device. This embodiment corresponds to the embodiment of the above method. Since the device embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant parts, refer to the description of the method embodiment. The device embodiment described below is merely illustrative.
An electronic device of this embodiment includes a processor and a memory; the memory is configured to store a program implementing the video memory management method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: running a machine learning task through a graphics processing unit; determining video memory resource usage information of the machine learning task; and if the information satisfies a video memory resource release condition, releasing the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
Seventh Embodiment
Corresponding to the above video memory management method, this application further provides a video memory management system. The parts of this embodiment identical to the first embodiment are not described again; please refer to the corresponding parts of the first embodiment.
Please refer to FIG. 6, which is a schematic structural diagram of an embodiment of the video memory management system of this application. The video memory management system provided by this application includes: a storage resource coordinator, a storage resource allocator of a low-priority task (e.g., task A) (the first storage resource allocator), and a storage resource allocator of a high-priority task (e.g., task B) (the second storage resource allocator).
In the system provided by this embodiment, the storage resource coordinator is co-designed with the machine learning computing framework to achieve adaptive dynamic scaling optimization of GPU video memory resources for machine learning tasks. As shown in FIG. 6, the storage resource coordinator supports adaptive GPU video memory resource adjustment and can be deployed in a GPU compute node to schedule and manage the video memory resources of one or more GPUs in that node, dynamically adjusting the video memory usage of the multiple tasks on one GPU. The machine learning computing framework corresponds to the machine learning tasks; each task runs through the machine learning computing framework.
The machine learning computing framework may be an end-to-end machine learning platform. Different machine learning frameworks can have their own ecosystems containing various tools, libraries and other resources, helping developers easily build and deploy applications powered by machine learning. In this embodiment, the machine learning computing framework is a deep learning computing framework, including but not limited to TensorFlow, PyTorch, MXNet and Caffe.
The storage resource coordinator is configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource amount of the high-priority task, it sends a video memory resource release instruction to the storage resource allocator of the low-priority task and a video memory resource allocation instruction to the storage resource allocator of the high-priority task (the second storage resource allocator). The storage resource allocator of the low-priority task is configured to release, according to the release instruction, at least part of the video memory resources occupied by the low-priority task; the storage resource allocator of the high-priority task is configured to allocate, according to the allocation instruction, video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
In one example, the allocator is further configured to send the task's video memory resource usage information to the coordinator. In specific implementations, the allocator may send the information to the coordinator according to a preset period. Correspondingly, the coordinator is further configured to send a video memory resource release instruction to the allocator if the information satisfies the video memory resource release condition.
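As an illustrative sketch of this exchange (the message names, the queue-based transport and the simplification of asking every victim for the full demand are assumptions, not the disclosed protocol), one coordination step might look like this:

```python
import queue

def coordinate_once(reports: queue.Queue, channels: dict, prio: dict) -> None:
    """One step of the node-local coordinator: a task reports a demand its
    own allocator cannot satisfy; every lower-priority allocator is asked
    to release, and the demand is then granted."""
    name, demand = reports.get()  # blocking read: (task name, unmet bytes)
    # A larger prio value means a lower priority. Release instructions take
    # effect at each victim's next mini-batch boundary.
    for victim in (t for t in prio if prio[t] > prio[name]):
        channels[victim].put(("release", demand))
    channels[name].put(("allocate", demand))
```

Here `reports` is the queue on which allocators publish usage information and unmet demands each period, and `channels` maps each task name to the queue its storage resource allocator reads instructions from.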
As can be seen from the above embodiments, in the video memory management system provided by the embodiments of this application, the storage resource coordinator determines the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, it sends a video memory resource release instruction to the first storage resource allocator of a low-priority task and a video memory resource allocation instruction to the second storage resource allocator of the high-priority task; the first storage resource allocator releases, according to the release instruction, at least part of the video memory resources occupied by the low-priority task; and the second storage resource allocator allocates, according to the allocation instruction, video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space. With this approach, a storage resource coordinator local to the GPU device can allocate the video memory resources occupied by low-priority tasks to high-priority tasks when the allocatable video memory resources are insufficient, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
Eighth Embodiment
Corresponding to the above video memory management method, this application further provides a machine learning system. The parts of this embodiment identical to the first embodiment are not described again; please refer to the corresponding parts of the first embodiment. The machine learning system provided by this application includes a client and a server.
The client is configured to send priority information of machine learning tasks to the server. The server is configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
The client includes, but is not limited to, mobile communication devices, i.e., what are commonly called mobile phones or smartphones, as well as terminal devices such as personal computers, PADs and iPads. The server can run machine learning tasks on a GPU cluster.
In this embodiment, a task service apparatus can be provided to users through the client. Through the client's task service apparatus, a user can specify the machine learning task to run and set the task priority, for example selecting the "level-2" priority; different priorities may require different service fees. After specifying the machine learning task to run and setting the task priority, a task running request can be submitted to the server through the client. In response to the request, the server can store the priority information of the task and store the task in a task table.
In this embodiment, the server may include one or more GPU compute nodes; each node can run machine learning tasks through a machine learning framework, and the storage resource coordinator deployed in a GPU compute node can obtain from the server the priority information of the multiple tasks to be run through that compute node. Table 2 shows the machine learning task information in this embodiment.
[Table 2 appears as an image in the original publication (PCTCN2021127856-appb-000001); its contents are summarized in the paragraph below.]
Table 2. Machine learning task table
As can be seen from Table 2, learning task 1 of user A's named entity recognition model is a "performance-guaranteed task", learning task 2 of user B's named entity recognition model is a "speculative-execution task", and the priority of the "performance-guaranteed task" is higher than that of the "speculative-execution task".
After the priorities of the multiple machine learning tasks running on the graphics processing unit are determined, the following processing can be performed: if video memory resources are to be allocated to the high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by the low-priority task; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
In one example, the server can further be configured to determine performance information of a machine learning task and adjust the priority of the task according to the performance information.
For example, if task A's priority was originally level 2 but the system did not bring the task's performance up to the "level-2" service level required by the user, the task's priority can be adjusted to level 1, so that its real performance meets the user's "level-2" service level requirement.
In specific implementations, the server can record the change information of video memory resources while a machine learning task runs and adjust the priority information of the task according to this change information. For example, if a high-priority task runs on tensor data in host memory 30% of the time and the performance information of the task does not meet the service level requirement, the priority of the task can be raised.
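For illustration only, this adjustment rule can be sketched as below; the record format follows the 30% example above, while everything else is an assumption of the sketch.

```python
from dataclasses import dataclass

HOST_TIME_LIMIT = 0.30  # fraction of runtime spent on host-memory tensors

@dataclass
class MemRecord:
    duration: float  # seconds spent with this placement
    placement: str   # "device" or "host"

def adjusted_level(level: int, records: list, meets_slo: bool) -> int:
    """Raise the priority (a smaller level = a higher priority) of a task
    that ran on host memory too often while missing its service level."""
    total = sum(r.duration for r in records)
    on_host = sum(r.duration for r in records if r.placement == "host")
    if total > 0 and on_host / total >= HOST_TIME_LIMIT and not meets_slo:
        return max(1, level - 1)  # e.g. level 2 -> level 1
    return level
```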
In another example, the server can further be configured to determine performance information of machine learning tasks and, according to the performance information, determine which machine learning tasks can be served.
For example, if the system can meet the service level requirements of task A and task B but cannot meet that of task C, video memory resource management services can be provided for task A and task B.
As can be seen from the above embodiments, in the machine learning system provided by the embodiments of this application, the client sends priority information of machine learning tasks to the server; the server determines the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, at least part of the video memory resources occupied by a low-priority task are released; and video memory resources are allocated to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space. With this approach, when the allocatable video memory resources are insufficient, the video memory resources occupied by low-priority tasks are allocated to high-priority tasks, thereby dynamically scaling and optimizing the GPU video memory resources occupied by multiple machine learning tasks running in parallel on one GPU; in this way GPU video memory resources can be allocated to other tasks while the performance of high-priority tasks is guaranteed. Therefore, the resource utilization of the overall cluster can be effectively improved while the performance of high-priority tasks is guaranteed.
Although this application is disclosed above with preferred embodiments, they are not intended to limit this application. Any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of this application; therefore, the scope of protection of this application shall be subject to the scope defined by the claims of this application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include persistent and non-persistent, removable and non-removable media, and can implement information storage by any method or technology. Information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
2. Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, optical memory, etc.) containing computer-usable program code.

Claims (21)

  1. A video memory management method, characterized by comprising:
    determining the priorities of multiple machine learning tasks running on a graphics processing unit;
    if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, releasing at least part of the video memory resources occupied by a low-priority task;
    allocating video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
  2. The method according to claim 1, characterized by further comprising:
    releasing idle video memory resources occupied by the multiple machine learning tasks.
  3. The method according to claim 2, characterized in that the releasing idle video memory resources occupied by the multiple machine learning tasks comprises:
    determining video memory resource usage information of the machine learning task;
    if the information satisfies a video memory resource release condition, releasing the idle video memory resources.
  4. The method according to claim 3, characterized in that
    the usage information comprises: an upper limit of the video memory resources actually used by the task;
    the release condition comprises: the duration for which the task's video memory resource allocation exceeds the upper limit reaches a duration threshold.
  5. The method according to claim 1, characterized by further comprising:
    releasing idle video memory resources of other high-priority tasks.
  6. The method according to claim 1, characterized in that
    the releasing at least part of the video memory resources occupied by a low-priority task comprises:
    if the idle video memory resources of the low-priority task are greater than or equal to the video memory resource demand, releasing the idle video memory resources occupied by the low-priority task.
  7. The method according to claim 1 or 6, characterized in that
    the releasing at least part of the video memory resources occupied by a low-priority task comprises:
    if the idle video memory resources of the low-priority task are less than the video memory resource demand, allocating host memory resources to the low-priority task and releasing at least part of the video memory resources used by the low-priority task, so as to continue running the low-priority task at least according to tensor data in the host memory space.
  8. The method according to claim 7, characterized by further comprising:
    if the allocatable video memory resources grow to the video memory resource demand of the low-priority task, allocating video memory resources to the low-priority task, so as to continue running the low-priority task according to tensor data in the video memory space.
  9. The method according to claim 7, characterized in that
    the low-priority task comprises: an iterative learning task;
    the releasing the video memory resources used by the low-priority task comprises:
    after the low-priority task completes the current learning iteration, releasing the at least part of the used video memory resources.
  10. The method according to claim 9, characterized in that
    before the releasing at least part of the video memory resources used by the low-priority task, the method further comprises:
    allocating host memory resources to the high-priority task, so as to run the high-priority task according to tensor data in the host memory space and tensor data in the video memory space.
  11. The method according to claim 10, characterized in that
    after the allocating video memory resources to the high-priority task, the method further comprises:
    releasing the host memory resources of the high-priority task.
  12. The method according to claim 1, characterized in that
    the machine learning task comprises a distributed deep learning task.
  13. A video memory management method, characterized by comprising:
    running a machine learning task through a graphics processing unit;
    determining video memory resource usage information of the machine learning task;
    if the information satisfies a video memory resource release condition, releasing the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
  14. A video memory management apparatus, characterized by comprising:
    a priority determination unit, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit;
    a video memory release unit, configured to, if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task;
    a video memory allocation unit, configured to allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
  15. An electronic device, characterized by comprising:
    a processor and a memory;
    the memory being configured to store a program implementing the video memory management method according to any one of claims 1 to 12, the device being powered on and running the program of the method through the processor.
  16. A video memory management apparatus, characterized by comprising:
    a task running unit, configured to run a machine learning task through a graphics processing unit;
    an information determination unit, configured to determine video memory resource usage information of the machine learning task;
    a video memory release unit, configured to, if the information satisfies a video memory resource release condition, release the idle video memory resources occupied by the task, so that the idle video memory resources can be allocated to other machine learning tasks running in parallel through the graphics processing unit.
  17. An electronic device, characterized by comprising:
    a processor and a memory;
    the memory being configured to store a program implementing the video memory management method according to claim 13, the device being powered on and running the program of the method through the processor.
  18. A video memory management system, characterized by comprising:
    a storage resource coordinator, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; and, if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource amount of the high-priority task, send a video memory resource release instruction to a first storage resource allocator of a low-priority task and send a video memory resource allocation instruction to a second storage resource allocator of the high-priority task;
    the first storage resource allocator, configured to release, according to the release instruction, at least part of the video memory resources occupied by the low-priority task;
    the second storage resource allocator, configured to allocate, according to the allocation instruction, video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
  19. The system according to claim 18, characterized in that
    the allocator is further configured to send video memory resource usage information of the task to the coordinator;
    the coordinator is further configured to, if the information satisfies a video memory resource release condition, send a video memory resource release instruction to the allocator.
  20. The system according to claim 19, characterized in that
    the allocator is specifically configured to send the information to the coordinator according to a preset period.
  21. A machine learning system, characterized by comprising:
    a client, configured to send priority information of machine learning tasks to a server;
    the server, configured to determine the priorities of multiple machine learning tasks running on a graphics processing unit; if video memory resources are to be allocated to a high-priority task and the allocatable video memory resources are less than the video memory resource demand of the high-priority task, release at least part of the video memory resources occupied by a low-priority task; and allocate video memory resources to the high-priority task, so as to run the high-priority task at least according to tensor data in the video memory space.
PCT/CN2021/127856 2020-11-03 2021-11-01 Video memory management method, apparatus, device and system WO2022095815A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21888512.7A EP4242843A4 (en) 2020-11-03 2021-11-01 GRAPHICS CARD MEMORY MANAGEMENT METHOD AND APPARATUS, DEVICE AND SYSTEM
US18/306,636 US20230297498A1 (en) 2020-11-03 2023-04-25 Video memory management method, apparatus, device and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011219652.6 2020-11-03
CN202011219652.6A CN114443263A (zh) Video memory management method, apparatus, device and system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/306,636 Continuation US20230297498A1 (en) 2020-11-03 2023-04-25 Video memory management method, apparatus, device and system

Publications (1)

Publication Number Publication Date
WO2022095815A1 true WO2022095815A1 (zh) 2022-05-12

Family

ID=81361391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127856 WO2022095815A1 (zh) 2020-11-03 2021-11-01 显存管理方法、装置、设备及系统

Country Status (4)

Country Link
US (1) US20230297498A1 (zh)
EP (1) EP4242843A4 (zh)
CN (1) CN114443263A (zh)
WO (1) WO2022095815A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115061800A * 2022-06-30 2022-09-16 China United Network Communications Group Co., Ltd. Processing method for edge computing tasks, edge server and storage medium
CN116643860A * 2023-04-26 2023-08-25 National Meteorological Information Center (Meteorological Data Center of China Meteorological Administration) Priority scheduling method, system, electronic device and computer program product for running meteorological machine learning algorithms

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325494B * 2018-08-27 2021-09-17 Tencent Technology (Shenzhen) Co., Ltd. Picture processing method, and task data processing method and apparatus
CN114675976B * 2022-05-26 2022-09-16 Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co., Ltd. Kubernetes-based GPU sharing method, apparatus, device and medium
CN115292199B * 2022-09-22 2023-03-24 Honor Device Co., Ltd. Method for handling video memory leaks and related apparatus
CN117435521B * 2023-12-21 2024-03-22 Xi'an Xinyun Semiconductor Technology Co., Ltd. Texture video memory mapping method, apparatus and medium based on GPU rendering
CN118312333B * 2024-06-07 2024-10-18 Alipay (Hangzhou) Information Technology Co., Ltd. Video memory multiplexing method and apparatus based on GPU multi-stream concurrency
CN118427120A * 2024-07-04 2024-08-02 Beijing Biren Technology Development Co., Ltd. Execution method, device and storage medium for a group normalization operator

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109828833A * 2018-11-02 2019-05-31 Shanghai Fanyi Shangxing Technology Co., Ltd. Queuing system and method for neural network training tasks
CN111078395A * 2019-11-12 2020-04-28 Huazhong University of Science and Technology Tensor-based deep learning GPU memory management optimization method and system
CN111400022A * 2019-01-02 2020-07-10 China Mobile Communications Research Institute Resource scheduling method, apparatus and electronic device
CN111768006A * 2020-06-24 2020-10-13 Beijing Kingsoft Cloud Network Technology Co., Ltd. Training method, apparatus, device and storage medium for an artificial intelligence model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766183A * 2018-12-28 2019-05-17 Zhengzhou Yunhai Information Technology Co., Ltd. Method and system for cluster GPU multiplexing and intelligent load
KR102086757B1 * 2019-07-31 2020-03-09 Sogang University Industry-University Cooperation Foundation GPU memory scheduler and GPU memory preemption method using the same


Also Published As

Publication number Publication date
EP4242843A1 (en) 2023-09-13
CN114443263A (zh) 2022-05-06
EP4242843A4 (en) 2023-09-13
US20230297498A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
WO2022095815A1 (zh) Video memory management method, apparatus, device and system
Xiao et al. {AntMan}: Dynamic scaling on {GPU} clusters for deep learning
RU2538920C2 (ru) Способ распределения задач сервером вычислительной системы, машиночитаемый носитель информации и система для реализации способа
US8424007B1 (en) Prioritizing tasks from virtual machines
US20190319895A1 (en) Resource Scheduling Method And Apparatus
US10013264B2 (en) Affinity of virtual processor dispatching
WO2021180092A1 (zh) Task scheduling method and apparatus
WO2024016596A1 (zh) Container cluster scheduling method, apparatus, device and storage medium
CN113342477A (zh) Container group deployment method, apparatus, device and storage medium
CN108509280B (zh) Push-model-based locality scheduling method for distributed computing clusters
EP4177745A1 (en) Resource scheduling method, electronic device, and storage medium
US9069621B2 (en) Submitting operations to a shared resource based on busy-to-success ratios
CN115586961A (zh) AI platform computing resource task scheduling method, apparatus and medium
CN111597035A (zh) Multi-thread-based simulation engine time advancement method and system
US11221971B2 (en) QoS-class based servicing of requests for a shared resource
CN117251275B (zh) Scheduling method and system, device and medium for asynchronous I/O requests of multiple applications
WO2024114483A2 (zh) Dynamic-programming-based resource allocation method, network, storage medium and processor
CN108228323B (zh) Data-locality-based Hadoop task scheduling method and apparatus
CN117234691A (zh) Task scheduling method and apparatus
Du et al. A combined priority scheduling method for distributed machine learning
JP2015148909A (ja) Parallel computer system, control method of parallel computer system, and control program of management node
CN116737370A (zh) Multi-resource scheduling method, system, storage medium and terminal
Chen et al. A real-time scheduling strategy based on processing framework of Hadoop
KR101558807B1 (ko) Processor scheduling method for cooperative processing between a host processor and a cooperating processor, and host processor performing the method
CN114896070A (zh) GPU resource allocation method for deep learning tasks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888512

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021888512

Country of ref document: EP

Effective date: 20230605