WO2021093365A1 - GPU video memory management and control method and related apparatus - Google Patents

GPU video memory management and control method and related apparatus

Info

Publication number
WO2021093365A1
WO2021093365A1 (PCT/CN2020/103741)
Authority
WO
WIPO (PCT)
Prior art keywords
video memory
task
gpu
placeholder
memory usage
Prior art date
Application number
PCT/CN2020/103741
Other languages
English (en)
French (fr)
Inventor
段国栋
Original Assignee
山东英信计算机技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东英信计算机技术有限公司
Publication of WO2021093365A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • This application relates to the field of computer technology, and in particular to a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium.
  • In order to make the use of computers more intelligent, AI deep learning technology has emerged.
  • To increase the speed of deep learning, a GPU (Graphics Processing Unit) is used in the deep learning field to accelerate it.
  • However, the cost of current GPU cards is relatively high, so the performance of the GPU has to be fully utilized.
  • At present, scenarios that require GPU cards include model development scenarios and model training scenarios.
  • However, current GPU video memory is relatively large, and neither the model development scenario nor the model training scenario of the GPU card in use can make 100% use of the GPU's video memory.
  • Moreover, the GPU is generally used only for customer development, and debugging a model does not require a large amount of video memory; clearly, too much video memory is unnecessary in this case.
  • In another situation, the GPU's performance is only exercised while the model is being tested. It can be seen that in the prior art, when a GPU is used to debug models, the GPU video memory is large and each task's use of it is complex, so the use of GPU video memory by tasks cannot be monitored and managed accurately and in a timely manner. This weakens the monitoring of GPU video memory use, easily lets users abuse video memory without controlling its size, and further wastes GPU performance resources.
  • The purpose of this application is to provide a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium, which obtain video memory usage data from the GPU video memory, transcode that data into video memory usage placeholders, and monitor video memory use according to those placeholders, realizing accurate and timely monitoring and management of GPU video memory and improving the efficiency of video memory use.
  • To that end, this application provides a GPU video memory management and control method, including: obtaining video memory usage data of all tasks from GPU video memory according to a preset data structure; transcoding the usage data into placeholders to obtain video memory usage placeholders; monitoring video memory use of the placeholders according to a video memory allocation table to obtain over-allocated tasks; and performing an abort operation on the over-allocated tasks.
  • Optionally, obtaining the video memory usage data of all tasks from the GPU video memory according to the preset data structure includes:
  • when there is a task in the GPU video memory whose state changes, recording the video memory status of the task according to the preset data structure to obtain the video memory usage data.
  • Optionally, transcoding the video memory usage data into placeholders to obtain the video memory usage placeholders includes:
  • performing hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
  • performing placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
  • combining the name code and the placeholder code into the video memory usage placeholder.
  • Optionally, monitoring video memory use of the video memory usage placeholders according to the video memory allocation table to obtain over-allocated tasks includes: performing a placeholder difference operation between the video memory allocation table and the placeholders according to a preset period to obtain a video memory difference; taking a task whose difference is greater than a preset video memory as a peak task; judging whether the peak task has existed for longer than a threshold time; and, if so, taking the peak task as an over-allocated task.
  • Optionally, the method also includes:
  • when locating a task, determining the pod information and GPU card information corresponding to the task according to the task's video memory usage placeholder, so as to implement the task locating operation.
  • This application also provides a GPU video memory management and control device, including:
  • the video memory data acquisition module is used to acquire the video memory usage data of all tasks from the GPU video memory according to the preset data structure
  • a video memory data transcoding module for transcoding the video memory usage data to placeholders to obtain video memory usage placeholders
  • a video memory monitoring module configured to monitor the video memory usage of the video memory usage placeholder according to the video memory allocation table to obtain an over-allocated task
  • the over-limit task processing module is used to perform a suspension operation on the over-allocated task.
  • the video memory data acquisition module is specifically configured to, when there is a task in the GPU video memory whose state changes, record the video memory status of the task according to the preset data structure to obtain the video memory usage data.
  • the video memory data transcoding module includes:
  • the name code conversion unit is used to perform hexadecimal conversion of the task name in the video memory usage data to obtain the name code;
  • a placeholder code conversion unit for converting the GPU card number and the video memory usage size in the video memory usage data into placeholders to obtain a placeholder code
  • the placeholder combination unit is used to combine the name code and the placeholder code into the video memory usage placeholder.
  • the video memory monitoring module includes:
  • a video memory difference calculation unit configured to perform a placeholder difference operation between the video memory allocation table and the video memory usage placeholder according to a preset period to obtain a video memory difference
  • a peak task acquiring unit configured to use a task whose video memory difference is greater than a preset video memory as a peak task
  • the delay judgment unit is used to judge whether the existence time of the peak task is greater than the threshold time
  • the over-allocated task acquiring unit is configured to use the peak task as the over-allocated task when the existence time of the peak task is greater than the threshold time.
  • This application also provides a server, including:
  • a memory, used to store a computer program;
  • a processor, used to implement the steps of the GPU video memory management and control method described above when executing the computer program.
  • the present application also provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the GPU memory management control method described above are realized.
  • the GPU video memory management control method provided by this application includes: obtaining the video memory usage data of all tasks from the GPU video memory according to a preset data structure; transcoding the video memory usage data with placeholders to obtain the video memory Usage placeholder; monitor the video memory usage of the video memory usage placeholder according to the video memory allocation table to obtain an over-allocated task; perform an abort operation on the over-allocated task.
  • By obtaining the video memory usage data of all tasks from the GPU video memory according to a preset data structure, data confusion is avoided; the usage data is then transcoded into video memory usage placeholders, and monitoring is performed through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate.
  • Further, over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted; that is, execution of tasks exceeding their video memory allocation is ended, which avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
  • This application also provides a GPU video memory management and control device, a server, and a computer-readable storage medium, which have the above beneficial effects, and will not be repeated here.
  • FIG. 1 is a flowchart of a method for managing and controlling GPU video memory provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a GPU video memory management and control device provided by an embodiment of the application.
  • The core of this application is to provide a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium, which obtain video memory usage data from the GPU video memory, transcode that data into video memory usage placeholders, and monitor video memory use according to those placeholders, realizing accurate and timely monitoring and management of GPU video memory and improving the efficiency of video memory use.
  • In the prior art, the GPU is generally used only for customer development, and debugging a model does not require a large amount of video memory; too much video memory is clearly unnecessary in this case. In another situation, the GPU's performance is only exercised while the model is being tested. Thus, when a GPU is used to debug models in the prior art, the GPU video memory is large and each task's use of it is complex, so the use of GPU video memory by tasks cannot be monitored and managed accurately and in a timely manner; this weakens the monitoring of GPU video memory use, easily lets users abuse video memory without controlling its size, and further wastes GPU performance resources.
  • Therefore, this application provides a GPU video memory management and control method that obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate.
  • Further, the over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
  • FIG. 1 is a flowchart of a GPU video memory management and control method provided by an embodiment of the application.
  • the method may include:
  • This step mainly obtains, from the GPU video memory, the video memory usage of all tasks executing on the GPU card. The usage is read from the GPU video memory, whose capacity is usually large, and when applied in the deep learning field the GPU generally runs a large number of tasks.
  • The pod allocated to each task is different, and the number of GPU cards allocated to each pod also differs. A pod is a workload-type resource object, the basis of all business types: one container or a combination of several containers. Obtaining the video memory usage data corresponding to a task is therefore quite complex, and each task's usage data changes constantly.
  • In the prior art, the CPU generally executes a corresponding program and fetches each task's usage data from the GPU memory task by task.
  • However, many tasks execute on the GPU, and each corresponds to a complex hardware relationship that also changes constantly. Fetching the tasks' usage data directly from the GPU video memory is therefore not only untimely but also consumes a large amount of hardware resources during acquisition.
  • Accordingly, in this step the video memory usage data of all tasks is obtained from the GPU video memory according to a preset data structure. This mainly means first recording the video memory usage of all tasks in the GPU video memory according to the preset data structure, and then keeping the recorded data up to date in the GPU video memory in real time.
  • The data may be kept up to date by updating it whenever a task's state changes, by updating it on a preset period, or in both of these ways.
  • The control end, i.e. the CPU end, then fetches the recorded data from the video memory into main memory and processes it there.
  • To reduce the amount of data transferred, the acquired video memory usage data can then be transcoded through the following steps into video memory usage placeholders, lowering the data volume and improving data transmission efficiency.
  • this step may include:
  • when there is a task in the GPU video memory whose state changes, the video memory status of the task is recorded according to a preset data structure to obtain the video memory usage data.
  • This optional scheme mainly elaborates on updating the data: the changed tasks can include deleted tasks, newly added tasks, and tasks whose content has changed.
  • S102: Transcode the video memory usage data into placeholders to obtain the video memory usage placeholders.
  • On the basis of S101, this step mainly transcodes the data.
  • The main purpose is to reduce the data volume, increase computation speed during monitoring, and present the video memory usage more simply.
  • this step may include:
  • Step 1: Convert the task name in the video memory usage data to hexadecimal to obtain the name code;
  • Step 2: Convert the GPU card number and the video memory usage size in the video memory usage data to placeholders to obtain the placeholder code;
  • Step 3: Combine the name code and the placeholder code into the video memory usage placeholder.
  • In this optional scheme, the task name is converted to hexadecimal, the GPU card number and video memory usage size are converted to binary placeholders, and the name code and placeholder code are combined to obtain the usage placeholder.
  • the task code is used as the task name to perform corresponding transcoding operations, and the GPU card number and video memory usage size are converted into binary placeholders to obtain the video memory usage corresponding to different tasks.
  • the memory usage of four different tasks can be expressed as FF4010001101; CD010101000; AA0010101000; AF0010101000.
  • S103: Perform video memory usage monitoring on the video memory usage placeholders according to the video memory allocation table to obtain over-allocated tasks.
  • On the basis of S102, this step monitors each task's video memory use according to the transcoded video memory usage placeholders and obtains the over-allocated tasks, that is, the tasks whose video memory use exceeds their allocated video memory; in other words, it determines whether each task's video memory use is greater than the amount specified in the video memory allocation table.
  • To avoid misjudgment, this step can judge whether each task's video memory use is greater than a preset proportion of its allocated size. For example, it can judge whether each task's video memory use is greater than 1.2 times the allocated video memory.
  • To reduce misjudgments further, it can also be judged whether the task has exceeded the specified size for longer than a preset duration, and only then is it judged to be an over-allocated task; this avoids peak jitter and reduces the number of misjudgments.
  • Specifically, the process of avoiding peak jitter can include: performing a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
  • taking a task whose video memory difference is greater than a preset video memory as a peak task; judging whether the peak task has existed for longer than a threshold time; and, if so, taking the peak task as the over-allocated task.
  • S104: Perform an abort operation on the over-allocated task. On the basis of S103, this step aims to end the over-allocated task, that is, to prevent over-allocated tasks from occupying other tasks' resources, make tasks follow their allocated memory size during execution, and improve the efficiency of video memory use.
  • Optionally, this embodiment may also include:
  • when locating a task, determining the pod information and GPU card information corresponding to the task according to the task's video memory usage placeholder, so as to implement the task locating operation.
  • That is, if a task's video memory needs to be located, the transcoded placeholder can be used directly to find the corresponding pod information and GPU information, without collecting the task's video memory information from the video memory again,
  • which improves the speed of task locating and realizes a fast locating operation.
  • In summary, this embodiment obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate.
  • Further, the over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources and makes each task run within its allocated video memory.
  • In another specific embodiment, the method may include:
  • Step 1: On the basis that GPU video memory has already been allocated at fine granularity, fetch the GPU video memory usage into memory and save it in a memory table;
  • Step 2: Perform peak judgment based on the memory table;
  • Step 3: End the tasks that exceed the peak;
  • Step 4: Keep the tasks that do not exceed the peak, and continuously monitor the running status of the tasks in memory.
  • The premise of this embodiment is that GPU video memory is allocated at fine granularity, and every task has been allocated its corresponding resources and can run; the memory usage data should therefore be recorded during the allocation process.
  • To improve subsequent control performance, a memory table is defined in the basic data for storage.
  • For example, the data table structure can be as follows:
  • jobname is the task name
  • podName is the pod name
  • type is the pod type
  • nodename is the corresponding node name
  • gid is the ID of the GPU
  • gsize is the GPU memory size used.
  • This data structure is automatically updated and maintained whenever a task changes.
  • The table data is updated automatically on platform state changes, including creation, running, deletion, and errors; if the server restarts, the platform automatically re-stores the current tasks and pods into memory during the restart process.
  • On top of the basic data structure, each running task is classified and counted; to increase the speed of these running statistics, data-placeholder computation is defined directly in the business layer, which reduces memory usage and increases computation speed.
  • For example, 8 tasks can be defined and divided as follows: FF4010001101; CD010101000; AA0010101000; AF0010101000; FF0010101001; XA0010101001; FF0010101001; DD4010101000.
  • The leading hexadecimal represents the name, and the trailing placeholders define the GPU card and the video memory size. Because the video memory granularity defined within a pod is the same, this calculation method quickly locates the video memory.
  • Through these placeholders, the video memory allocated to each pod's GPU cards can be obtained quickly, together with each GPU card's uuid value, which is a unique mark within each node.
  • The information collected from each pod is governed by a concurrency mechanism with automatic timeout control: on timeout the collection is automatically reclaimed and left to the next collection round, and the timed-out collection is not used as a reference.
  • Collection yields the GPU cards' video memory utilization at that moment; because every program goes through a phase in which its video memory peak fluctuates, an interference-free data handling process is adopted.
  • The implementation is as follows: if usage exceeds the specified video memory by more than 20% within a second-level collection period, it is judged to be a program peak-jitter phase and is not processed; statistics are taken only after the video memory stabilizes. If the specified threshold is exceeded for a long time, for example if the video memory occupancy stays unchanged for the specified 5 s, the application has exceeded its allocated control; after the task's pod number is obtained, the time and video memory value by which the task exceeded its quota are recorded, three warning prompts are given, the task is automatically ended by default, and the event is written to the log record.
  • It can be seen that this embodiment obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate.
  • Further, the over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
  • The GPU video memory management and control device provided by an embodiment of the present application is introduced below.
  • The GPU video memory management and control device described below and the GPU video memory management and control method described above may be referred to in correspondence with each other.
  • FIG. 2 is a schematic structural diagram of a GPU video memory management and control device provided by an embodiment of the application.
  • the device may include:
  • the video memory data acquisition module 100 is configured to acquire the video memory usage data of all tasks from the GPU video memory according to a preset data structure
  • the video memory data transcoding module 200 is used for transcoding the video memory usage data to placeholders to obtain the video memory usage placeholders;
  • the video memory monitoring module 300 is used to monitor the video memory usage of the video memory usage placeholder according to the video memory allocation table to obtain over-allocation tasks;
  • the over-limit task processing module 400 is used to perform a suspension operation on over-allocated tasks.
  • the video memory data acquisition module 100 is specifically configured to record the video memory status of the task according to a preset data structure when there is a task whose state changes in the GPU video memory to obtain the video memory usage data.
  • the video memory data transcoding module 200 may include:
  • the name code conversion unit is used to perform hexadecimal conversion of the task name in the video memory usage data to obtain the name code;
  • the placeholder code conversion unit is used to perform placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain the placeholder code;
  • the placeholder combination unit is used to combine the name code and the placeholder code into a placeholder for video memory usage.
  • the video memory monitoring module 300 may include:
  • the video memory difference calculation unit is used to perform the placeholder difference calculation between the video memory allocation table and the video memory usage placeholder according to the preset cycle to obtain the video memory difference;
  • the peak task acquisition unit is used to take the task whose video memory difference is greater than the preset video memory as the peak task;
  • the delay judgment unit is used to judge whether the existence time of the peak task is greater than the threshold time
  • the over-allocated task acquiring unit is used to treat the peak task as an over-allocated task when the existence time of the peak task is greater than the threshold time.
  • the embodiment of the present application also provides a server, including:
  • Memory used to store computer programs
  • the processor is used to implement the steps of the GPU video memory management control method described in the above embodiments when the computer program is executed.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the GPU video memory management and control method described in the above embodiments.
  • the computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
  • the steps of the method or algorithm described in combination with the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
  • the software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium, the method including: obtaining video memory usage data of all tasks from GPU video memory according to a preset data structure; transcoding the video memory usage data into placeholders to obtain video memory usage placeholders; performing video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks; and performing an abort operation on the over-allocated tasks. Usage data obtained from the GPU video memory is transcoded into video memory usage placeholders, and video memory use is then monitored according to those placeholders, which realizes accurate and timely monitoring and management of GPU video memory and improves the efficiency of video memory use.

Description

GPU video memory management and control method and related apparatus
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 15, 2019, with application number 201911122487.X and the invention title "GPU video memory management and control method and related apparatus", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium.
Background
With the continuous development of information technology, AI deep learning technology has emerged to make the use of computers more intelligent. Using AI deep learning technology in a device makes data processing more intelligent. Further, to increase the speed of deep learning, a GPU (Graphics Processing Unit) is used in the deep learning field to accelerate it. However, the cost of GPU cards is currently high, so the performance of the GPU has to be fully utilized. At present, scenarios that require GPU cards include model development scenarios and model training scenarios. However, current GPU video memory is relatively large, and neither the model development scenario nor the model training scenario of the GPU card in use can make 100% use of the GPU's video memory.
Moreover, in another video memory usage situation, the GPU is generally used only for customer development, and debugging a model does not require a large amount of video memory; clearly, too much video memory is unnecessary in this case. In yet another situation, the GPU's performance is only exercised while the model is being tested. It can be seen that in the prior art, when a GPU is used to debug models, the GPU video memory is large and each task's use of it is complex, so the use of GPU video memory by tasks in the GPU cannot be monitored and managed accurately and in a timely manner. This weakens the monitoring of GPU video memory use, easily lets video memory users abuse it without controlling its size, and further wastes GPU performance resources.
Therefore, how to monitor and manage GPU video memory accurately and in a timely manner is a key concern for those skilled in the art.
Summary of the Invention
The purpose of this application is to provide a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium, which obtain video memory usage data from the GPU video memory, transcode that data into video memory usage placeholders, and monitor video memory use according to those placeholders, realizing accurate and timely monitoring and management of GPU video memory and improving the efficiency of video memory use.
To solve the above technical problem, this application provides a GPU video memory management and control method, including:
obtaining video memory usage data of all tasks from GPU video memory according to a preset data structure;
transcoding the video memory usage data into placeholders to obtain video memory usage placeholders;
performing video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks;
performing an abort operation on the over-allocated tasks.
Optionally, obtaining the video memory usage data of all tasks from the GPU video memory according to the preset data structure includes:
when there is a task in the GPU video memory whose state changes, recording the video memory status of the task according to the preset data structure to obtain the video memory usage data.
Optionally, transcoding the video memory usage data into placeholders to obtain the video memory usage placeholders includes:
performing hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
performing placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
combining the name code and the placeholder code into the video memory usage placeholder.
Optionally, performing video memory usage monitoring on the video memory usage placeholders according to the video memory allocation table to obtain over-allocated tasks includes:
performing a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
taking a task whose video memory difference is greater than a preset video memory as a peak task;
judging whether the peak task has existed for longer than a threshold time;
if so, taking the peak task as the over-allocated task.
Optionally, the method further includes:
when locating a task, determining the pod information and GPU card information corresponding to the task according to the video memory usage placeholder corresponding to the task, so as to implement the task locating operation.
This application also provides a GPU video memory management and control device, including:
a video memory data acquisition module, configured to obtain video memory usage data of all tasks from GPU video memory according to a preset data structure;
a video memory data transcoding module, configured to transcode the video memory usage data into placeholders to obtain video memory usage placeholders;
a video memory monitoring module, configured to perform video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks;
an over-limit task processing module, configured to perform an abort operation on the over-allocated tasks.
Optionally, the video memory data acquisition module is specifically configured to, when there is a task in the GPU video memory whose state changes, record the video memory status of the task according to the preset data structure to obtain the video memory usage data.
Optionally, the video memory data transcoding module includes:
a name code conversion unit, configured to perform hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
a placeholder code conversion unit, configured to perform placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
a placeholder combination unit, configured to combine the name code and the placeholder code into the video memory usage placeholder.
Optionally, the video memory monitoring module includes:
a video memory difference calculation unit, configured to perform a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
a peak task acquisition unit, configured to take a task whose video memory difference is greater than a preset video memory as a peak task;
a delay judgment unit, configured to judge whether the peak task has existed for longer than a threshold time;
an over-allocated task acquisition unit, configured to take the peak task as the over-allocated task when the peak task has existed for longer than the threshold time.
This application also provides a server, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the GPU video memory management and control method described above when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the GPU video memory management and control method described above.
The GPU video memory management and control method provided by this application includes: obtaining video memory usage data of all tasks from GPU video memory according to a preset data structure; transcoding the video memory usage data into placeholders to obtain video memory usage placeholders; performing video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks; and performing an abort operation on the over-allocated tasks.
By obtaining the video memory usage data of all tasks from the GPU video memory according to a preset data structure, data confusion is avoided; the usage data is then transcoded into video memory usage placeholders, and monitoring is performed through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate. Further, over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
This application also provides a GPU video memory management and control device, a server, and a computer-readable storage medium, which have the above beneficial effects and are not repeated here.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flowchart of a GPU video memory management and control method provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a GPU video memory management and control device provided by an embodiment of this application.
Detailed Description
The core of this application is to provide a GPU video memory management and control method, a GPU video memory management and control device, a server, and a computer-readable storage medium, which obtain video memory usage data from the GPU video memory, transcode that data into video memory usage placeholders, and monitor video memory use according to those placeholders, realizing accurate and timely monitoring and management of GPU video memory and improving the efficiency of video memory use.
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
In the prior art, the GPU is generally used only for customer development, and debugging a model does not require a large amount of video memory; clearly, too much video memory is unnecessary in this case. In another situation, the GPU's performance is only exercised while the model is being tested. It can be seen that in the prior art, when a GPU is used to debug models, the GPU video memory is large and each task's use of it is complex, so the use of GPU video memory by tasks in the GPU cannot be monitored and managed accurately and in a timely manner. This weakens the monitoring of GPU video memory use, easily lets video memory users abuse it without controlling its size, and further wastes GPU performance resources.
Therefore, this application provides a GPU video memory management and control method that obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate. Further, over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
Please refer to FIG. 1, which is a flowchart of a GPU video memory management and control method provided by an embodiment of this application.
In this embodiment, the method may include:
S101: Obtain video memory usage data of all tasks from the GPU video memory according to a preset data structure.
This step mainly obtains, from the GPU video memory, the video memory usage of all tasks executing on the GPU card. The usage is read from the GPU video memory, whose capacity is usually large; applied in the deep learning field, a GPU generally runs a large number of tasks, the pod allocated to each task is different, and the number of GPU cards allocated to each pod also differs. A pod is a workload-type resource object, the basis of all business types: one container or a combination of several containers. Obtaining the video memory usage data corresponding to a task is therefore quite complex, and each task's usage data changes constantly.
In the prior art, the CPU generally executes a corresponding program and fetches each task's usage data from the GPU memory task by task. However, many tasks execute on the GPU, each corresponding to a complex hardware relationship that also changes constantly. Fetching the tasks' video memory usage data directly from the GPU video memory is therefore not only untimely but also consumes a large amount of hardware resources during acquisition.
Accordingly, in this step the video memory usage data of all tasks is obtained from the GPU video memory according to a preset data structure: the video memory usage of all tasks is first recorded in the GPU video memory according to the preset data structure, and the recorded data is then kept up to date in the GPU video memory in real time. The data may be kept up to date by updating it whenever a task's state changes, by updating it on a preset period, or in both of these ways. Finally, the control end, i.e. the CPU end, fetches the recorded data from the video memory into main memory and processes it there. Of course, to reduce the amount of data transferred into memory in this step, the acquired video memory usage data can also be transcoded through the following steps into video memory usage placeholders, lowering the data volume and improving data transmission efficiency.
Optionally, this step may include:
when there is a task in the GPU video memory whose state changes, recording the video memory status of the task according to a preset data structure to obtain the video memory usage data.
This optional scheme mainly elaborates on updating the data. The changed tasks can include deleted tasks, newly added tasks, and tasks whose content has changed.
S102: Transcode the video memory usage data into placeholders to obtain video memory usage placeholders.
On the basis of S101, this step mainly transcodes the data. The main purpose is to reduce the data volume, increase computation speed during monitoring, and present the video memory usage more simply.
Specifically, key data in the video memory usage data can be selected, such as the task name, the GPU card number, and the memory size used, and then converted into fixed-length placeholders. The conversion may be to hexadecimal codes, binary codes, or codes in other bases, which is not limited here.
Optionally, this step may include:
Step 1: Perform hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
Step 2: Perform placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
Step 3: Combine the name code and the placeholder code into the video memory usage placeholder.
In this optional scheme, the task name is converted to hexadecimal, the GPU card number and video memory usage size are converted to binary placeholders, and the name code and placeholder code are combined to obtain the usage placeholder.
For example, the task code is used as the task name for the corresponding transcoding operation, and the GPU card number and video memory usage size are converted into binary placeholders, giving the video memory usage corresponding to different tasks. For example, the video memory usage of four different tasks can be expressed as FF4010001101; CD010101000; AA0010101000; AF0010101000.
Clearly, judging each task's video memory size only requires operating on the binary code in its usage placeholder. Since binary numbers are computed on directly, computational efficiency is greatly improved, and high-frequency, large-volume computation can be performed over many tasks in a short time. At the same time, the amount of this data kept in memory is reduced and data redundancy is avoided.
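For illustration only, the transcoding described above can be sketched as follows. The publication gives only sample codes such as FF4010001101, so the field widths chosen here (a UTF-8 hexadecimal name code, 4 bits for the GPU card number, and 8 bits for the memory size in granularity units) are assumptions, not the patented encoding:

```python
# Minimal sketch of the placeholder transcoding described above.
# Field widths are assumptions; the publication only shows sample
# codes such as "FF4010001101".

def encode_placeholder(task_name: str, gpu_card: int, mem_units: int) -> str:
    """Combine a hex name code with binary placeholders for card and size."""
    name_code = task_name.encode("utf-8").hex().upper()  # hexadecimal name code
    card_bits = format(gpu_card, "04b")                  # GPU card number
    size_bits = format(mem_units, "08b")                 # memory size in granules
    return name_code + card_bits + size_bits

def decode_placeholder(code: str, name_hex_len: int) -> tuple[str, int, int]:
    """Recover task name, card number, and size without touching GPU memory."""
    name = bytes.fromhex(code[:name_hex_len]).decode("utf-8")
    card = int(code[name_hex_len:name_hex_len + 4], 2)
    size = int(code[name_hex_len + 4:], 2)
    return name, card, size

code = encode_placeholder("j1", gpu_card=2, mem_units=40)
print(code)                          # 6A31001000101000
print(decode_placeholder(code, 4))   # ('j1', 2, 40)
```

Because the card number and the size occupy a fixed-width binary suffix, the monitoring step can compare usage against allocation with plain integer operations, and the decoding path supports the fast task-locating operation described below without reading GPU video memory again.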
S103: Perform video memory usage monitoring on the video memory usage placeholders according to the video memory allocation table to obtain over-allocated tasks.
On the basis of S102, this step monitors each task's video memory use according to the transcoded video memory usage placeholders and obtains the over-allocated tasks, that is, the tasks whose video memory use exceeds their allocated video memory; in other words, it determines whether each task's video memory use is greater than the amount specified in the video memory allocation table.
To avoid misjudgment and reduce the number of wrong decisions, this step can judge whether each task's video memory use is greater than a preset proportion of its allocated size. For example, it can judge whether each task's video memory use is greater than 1.2 times the allocated video memory.
To reduce misjudgments further, it can also be judged whether the task has exceeded the specified size for longer than a preset duration, and only then is it judged to be an over-allocated task; this avoids peak jitter and reduces the number of misjudgments.
Specifically, the process of avoiding peak jitter can include the following steps (see the sketch after this list):
performing a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
taking a task whose video memory difference is greater than a preset video memory as a peak task;
judging whether the peak task has existed for longer than a threshold time;
if so, taking the peak task as an over-allocated task.
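A minimal sketch of these four steps is given below. It operates on decoded sizes rather than on raw bit strings for readability; the 1.2 times margin and the 5 s hold time follow the examples given elsewhere in this description, and all names are illustrative assumptions rather than the patented implementation:

```python
import time

# Illustrative defaults taken from the examples in this description:
# a task becomes a "peak task" above 1.2 times its allocation, and an
# over-allocated task if it stays there longer than 5 seconds.
MARGIN = 1.2
HOLD_SECONDS = 5.0

peak_since: dict[str, float] = {}  # task name -> moment it became a peak task

def find_over_allocated(alloc_table: dict[str, int],
                        usage: dict[str, int]) -> list[str]:
    """One monitoring cycle: placeholder difference check plus jitter filter."""
    now = time.monotonic()
    over_allocated = []
    for task, allocated in alloc_table.items():
        used = usage.get(task, 0)
        if used > allocated * MARGIN:         # difference exceeds preset memory
            peak_since.setdefault(task, now)  # task enters the peak state
            if now - peak_since[task] > HOLD_SECONDS:
                over_allocated.append(task)   # persisted beyond threshold time
        else:
            peak_since.pop(task, None)        # transient jitter: not processed
    return over_allocated
```

Called once per preset period (for example, every second), the function reports only tasks whose overuse has persisted beyond the threshold time; a brief spike clears itself from peak_since and is never reported, which is the jitter-avoidance behavior described above.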
S104: Perform an abort operation on the over-allocated tasks.
On the basis of S103, this step aims to end the over-allocated task, that is, to prevent an over-allocated task from occupying other tasks' resources, make tasks follow their allocated memory size more closely during execution, and improve the efficiency of video memory use.
The task's execution may be ended in any way provided by the prior art, which is not repeated here.
Optionally, this embodiment may also include:
when locating a task, determining the pod information and GPU card information corresponding to the task according to the video memory usage placeholder corresponding to the task, so as to implement the task locating operation.
That is to say, if the video memory of a task needs to be located in this embodiment, the transcoded video memory placeholder can be used directly to locate the corresponding pod information and GPU information, without collecting the task's video memory information from the video memory again, which improves the speed of task locating and realizes a fast locating operation.
In summary, this embodiment obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate. Further, the over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
The GPU video memory management and control method provided by this application is further illustrated below through another specific embodiment.
In this embodiment, the method may include:
Step 1: On the basis that GPU video memory has already been allocated at fine granularity, fetch the GPU video memory usage into memory and save it in a memory table;
Step 2: Perform peak judgment based on the memory table;
Step 3: End the tasks that exceed the peak;
Step 4: Keep the tasks that do not exceed the peak, and continuously monitor the running status of the tasks in memory.
The premise of this embodiment is that GPU video memory is allocated at fine granularity, and every task has been allocated its corresponding resources and can run; the memory usage data should therefore be recorded during the allocation process.
To improve subsequent control performance, a memory table is defined in the basic data for storage. For example, the data table structure can be as follows:
[Table structure shown in Figure PCTCN2020103741-appb-000001]
where jobname is the task name, podName is the pod name, type is the pod type, nodename is the corresponding node name, gid is the ID of the GPU, and gsize is the GPU memory size used.
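As an illustration, one row of this memory table could be modeled as in the sketch below; the field names follow the table above, while the types and the update logic are assumptions made for the example:

```python
from dataclasses import dataclass

@dataclass
class GpuMemRecord:
    """One row of the memory table; field names follow the table above."""
    jobname: str   # task name
    podName: str   # pod name
    type: str      # pod type
    nodename: str  # corresponding node name
    gid: str       # GPU ID (a uuid, unique within a node)
    gsize: int     # GPU memory size used, in granularity units

# Keyed by jobname so a state change can update a task's row in O(1).
memory_table: dict[str, GpuMemRecord] = {}

def on_task_event(record: GpuMemRecord, event: str) -> None:
    """Keep the table in sync with creation / running / deletion / error events."""
    if event == "delete":
        memory_table.pop(record.jobname, None)
    else:  # create, run, and error events all upsert the row
        memory_table[record.jobname] = record
```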
Because the pod allocated to each task is different, and the number of GPU cards allocated to each pod also differs, this is a dynamic data structure that is convenient for statistics and queries, and adopting a conventional storage method makes later extension easy. The id of every GPU card is recorded, together with the node each card belongs to, in preparation for later queries. This data structure is automatically updated and maintained whenever a task changes; platform state changes, including creation, running, deletion, and errors, all update the table data automatically, and if the server restarts, the platform automatically stores the current tasks and pods back into memory during the restart process.
On top of the basic data structure, each running task is classified and counted; to increase the speed of these running statistics, data-placeholder computation is defined directly in the business layer, which reduces memory usage and increases computation speed. For example, 8 tasks can be defined and divided as follows:
FF4010001101; CD010101000; AA0010101000; AF0010101000;
FF0010101001; XA0010101001; FF0010101001; DD4010101000.
The leading hexadecimal represents the name, and the trailing placeholders define the GPU card and the video memory size. Because the video memory granularity defined within a pod is the same, this calculation method quickly locates the video memory.
Through these placeholders, the video memory allocated to each pod's GPU cards can be obtained quickly, together with each GPU card's uuid value, which is a unique mark within each node. The information collected from every pod is governed by a concurrency mechanism with automatic timeout control: on timeout the collection is automatically reclaimed and left to the next collection round, and the timed-out collection is not used as a reference.
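The concurrent, timeout-guarded collection could be sketched as follows; collect_pod_usage is a hypothetical helper standing in for the actual query of a pod's GPU cards, and the two-second timeout is likewise an assumed value:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

COLLECT_TIMEOUT = 2.0  # seconds; assumed value for the example

def collect_round(pods: list[str], collect_pod_usage) -> dict[str, int]:
    """Collect every pod's usage concurrently; skip pods that time out."""
    results: dict[str, int] = {}
    pool = ThreadPoolExecutor(max_workers=max(len(pods), 1))
    futures = {pod: pool.submit(collect_pod_usage, pod) for pod in pods}
    for pod, future in futures.items():
        try:
            results[pod] = future.result(timeout=COLLECT_TIMEOUT)
        except TimeoutError:
            # A timed-out collection is reclaimed and not used as a reference;
            # the pod is simply picked up again on the next round.
            future.cancel()
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```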
Collection yields the GPU cards' video memory utilization at that moment. Because every program goes through a phase in which its video memory peak fluctuates, an interference-free data handling process is adopted. The implementation is as follows: if usage exceeds the specified video memory by more than 20% within a second-level collection period, it is judged to be a program peak-jitter phase and is not processed; statistics are taken only after the video memory stabilizes. If the specified threshold is exceeded for a long time, for example if the video memory occupancy stays unchanged for the specified 5 s, the application has exceeded its allocated control; therefore, after the task's pod number is obtained, the time and video memory value by which the task exceeded its quota are recorded, three warning prompts are given, the task is automatically ended by default, and the event is written to the log record.
It can be seen that this embodiment obtains the video memory usage data of all tasks from the GPU video memory according to a preset data structure, avoiding data confusion; it then transcodes the usage data into video memory usage placeholders and performs the monitoring operation through those placeholders, so placeholder operations can be carried out directly, which improves monitoring efficiency and keeps monitoring timely and accurate. Further, the over-allocated tasks are obtained through the video memory usage monitoring and are finally aborted, ending the execution of tasks that exceed their video memory allocation; this avoids further occupation of GPU video memory resources, makes each task run within its allocated video memory, and improves the effectiveness of the management and control operations.
The GPU video memory management and control device provided by an embodiment of this application is introduced below; the GPU video memory management and control device described below and the GPU video memory management and control method described above may be referred to in correspondence with each other.
Please refer to FIG. 2, which is a schematic structural diagram of a GPU video memory management and control device provided by an embodiment of this application.
In this embodiment, the device may include:
a video memory data acquisition module 100, configured to obtain video memory usage data of all tasks from GPU video memory according to a preset data structure;
a video memory data transcoding module 200, configured to transcode the video memory usage data into placeholders to obtain video memory usage placeholders;
a video memory monitoring module 300, configured to perform video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks;
an over-limit task processing module 400, configured to perform an abort operation on the over-allocated tasks.
Optionally, the video memory data acquisition module 100 is specifically configured to, when there is a task in the GPU video memory whose state changes, record the video memory status of the task according to the preset data structure to obtain the video memory usage data.
Optionally, the video memory data transcoding module 200 may include:
a name code conversion unit, configured to perform hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
a placeholder code conversion unit, configured to perform placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
a placeholder combination unit, configured to combine the name code and the placeholder code into the video memory usage placeholder.
Optionally, the video memory monitoring module 300 may include:
a video memory difference calculation unit, configured to perform a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
a peak task acquisition unit, configured to take a task whose video memory difference is greater than a preset video memory as a peak task;
a delay judgment unit, configured to judge whether the peak task has existed for longer than a threshold time;
an over-allocated task acquisition unit, configured to take the peak task as the over-allocated task when the peak task has existed for longer than the threshold time.
An embodiment of this application also provides a server, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the GPU video memory management and control method described in the above embodiments when executing the computer program.
An embodiment of this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the GPU video memory management and control method described in the above embodiments.
The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art may further realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above in general terms of their functions. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of this application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The GPU video memory management and control method, GPU video memory management and control device, server, and computer-readable storage medium provided by this application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this application.

Claims (10)

  1. A GPU video memory management and control method, characterized by comprising:
    obtaining video memory usage data of all tasks from GPU video memory according to a preset data structure;
    transcoding the video memory usage data into placeholders to obtain video memory usage placeholders;
    performing video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks;
    performing an abort operation on the over-allocated tasks.
  2. The GPU video memory management and control method according to claim 1, characterized in that obtaining the video memory usage data of all tasks from the GPU video memory according to the preset data structure comprises:
    when there is a task in the GPU video memory whose state changes, recording the video memory status of the task according to the preset data structure to obtain the video memory usage data.
  3. The GPU video memory management and control method according to claim 1, characterized in that transcoding the video memory usage data into placeholders to obtain the video memory usage placeholders comprises:
    performing hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
    performing placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
    combining the name code and the placeholder code into the video memory usage placeholder.
  4. The GPU video memory management and control method according to claim 1, characterized in that performing video memory usage monitoring on the video memory usage placeholders according to the video memory allocation table to obtain over-allocated tasks comprises:
    performing a placeholder difference operation between the video memory allocation table and the video memory usage placeholders according to a preset period to obtain a video memory difference;
    taking a task whose video memory difference is greater than a preset video memory as a peak task;
    judging whether the peak task has existed for longer than a threshold time;
    if so, taking the peak task as the over-allocated task.
  5. The GPU video memory management and control method according to claim 1, characterized by further comprising:
    when locating a task, determining the pod information and GPU card information corresponding to the task according to the video memory usage placeholder corresponding to the task, so as to implement the task locating operation.
  6. A GPU video memory management and control device, characterized by comprising:
    a video memory data acquisition module, configured to obtain video memory usage data of all tasks from GPU video memory according to a preset data structure;
    a video memory data transcoding module, configured to transcode the video memory usage data into placeholders to obtain video memory usage placeholders;
    a video memory monitoring module, configured to perform video memory usage monitoring on the video memory usage placeholders according to a video memory allocation table to obtain over-allocated tasks;
    an over-limit task processing module, configured to perform an abort operation on the over-allocated tasks.
  7. The GPU video memory management and control device according to claim 6, characterized in that the video memory data acquisition module is specifically configured to, when there is a task in the GPU video memory whose state changes, record the video memory status of the task according to the preset data structure to obtain the video memory usage data.
  8. The GPU video memory management and control device according to claim 6, characterized in that the video memory data transcoding module comprises:
    a name code conversion unit, configured to perform hexadecimal conversion on the task name in the video memory usage data to obtain a name code;
    a placeholder code conversion unit, configured to perform placeholder conversion on the GPU card number and the video memory usage size in the video memory usage data to obtain a placeholder code;
    a placeholder combination unit, configured to combine the name code and the placeholder code into the video memory usage placeholder.
  9. A server, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the GPU video memory management and control method according to any one of claims 1 to 5 when executing the computer program.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the GPU video memory management and control method according to any one of claims 1 to 5 are implemented.
PCT/CN2020/103741 2019-11-15 2020-07-23 GPU video memory management and control method and related apparatus WO2021093365A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911122487.X 2019-11-15
CN201911122487.XA CN110930291B (zh) 2019-11-15 GPU video memory management and control method and related apparatus

Publications (1)

Publication Number Publication Date
WO2021093365A1 true WO2021093365A1 (zh) 2021-05-20

Family

ID=69853156

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103741 WO2021093365A1 (zh) 2019-11-15 2020-07-23 GPU video memory management and control method and related apparatus

Country Status (2)

Country Link
CN (1) CN110930291B (zh)
WO (1) WO2021093365A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930291B (zh) * 2019-11-15 2022-06-17 山东英信计算机技术有限公司 GPU video memory management and control method and related apparatus
CN111475303B (zh) * 2020-04-08 2022-11-25 苏州浪潮智能科技有限公司 GPU sharing scheduling and single-machine multi-card method, system, and apparatus
CN112957068B (zh) * 2021-01-29 2023-07-11 青岛海信医疗设备股份有限公司 Ultrasonic signal processing method and terminal device
CN113259680B (zh) * 2021-06-25 2021-10-01 腾讯科技(深圳)有限公司 Video stream decoding method and apparatus, computer device, and storage medium
CN115292199B (zh) * 2022-09-22 2023-03-24 荣耀终端有限公司 Video memory leak handling method and related apparatus


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04103996A (ja) * 1990-08-21 1992-04-06 Mitsubishi Electric Corp Tactical display device
CN102654830B (zh) * 2011-03-03 2015-07-22 福建星网视易信息系统有限公司 Method for optimizing video memory space by texture packing
CN104050008B (zh) * 2013-03-15 2018-01-23 中兴通讯股份有限公司 Memory over-allocation management system and method
CN105094981B (zh) * 2014-05-23 2019-02-12 华为技术有限公司 Data processing method and apparatus
CN104991825B (zh) * 2015-03-27 2019-07-05 北京天云融创软件技术有限公司 Load-aware hypervisor resource over-allocation and dynamic adjustment method and system
CN109992422A (zh) * 2019-04-11 2019-07-09 北京朗镜科技有限责任公司 GPU-resource-oriented task scheduling method, apparatus, and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822205A (en) * 1994-07-29 1998-10-13 Fujitsu Limited Information processing apparatus equipped with a graphical user interface
CN101515247A (zh) * 2009-03-30 2009-08-26 福建星网锐捷网络有限公司 Memory monitoring method and apparatus
CN102662850A (zh) * 2012-03-30 2012-09-12 杭州华三通信技术有限公司 Memory management method and system
CN105824702A (zh) * 2016-03-22 2016-08-03 乐视云计算有限公司 Method and terminal for managing the memory usage of a program
CN106055407A (zh) * 2016-05-25 2016-10-26 努比亚技术有限公司 Process resource adjustment apparatus and method
CN108733531A (zh) * 2017-04-13 2018-11-02 南京维拓科技有限公司 Cloud-computing-based GPU performance monitoring system
CN110187878A (zh) * 2019-05-29 2019-08-30 北京三快在线科技有限公司 Page generation method and apparatus
CN110930291A (zh) * 2019-11-15 2020-03-27 山东英信计算机技术有限公司 GPU video memory management and control method and related apparatus

Also Published As

Publication number Publication date
CN110930291A (zh) 2020-03-27
CN110930291B (zh) 2022-06-17

Similar Documents

Publication Publication Date Title
WO2021093365A1 (zh) GPU video memory management and control method and related apparatus
JP6961844B2 (ja) Storage volume creation method and apparatus, server, and storage medium
US8521986B2 (en) Allocating storage memory based on future file size or use estimates
WO2017028697A1 (zh) Method and device for expanding and shrinking a computer cluster
WO2021258753A1 (zh) Service processing method and apparatus, electronic device, and storage medium
CN108989238A (zh) Method for allocating service bandwidth and related device
WO2022016861A1 (zh) Hotspot data caching method and system, and related apparatus
CN102063338B (zh) Method and apparatus for requesting an exclusive resource
US20200012602A1 (en) Cache allocation method, and apparatus
US20150280981A1 (en) Apparatus and system for configuration management
CN109582649B (zh) Metadata storage method, apparatus and device, and readable storage medium
US20170153909A1 (en) Methods and Devices for Acquiring Data Using Virtual Machine and Host Machine
US20240061712A1 (en) Method, apparatus, and system for creating training task on ai training platform, and medium
CN110147470B (zh) Cross-datacenter data comparison system and method
WO2022016845A1 (zh) Multi-node monitoring method and apparatus, electronic device, and storage medium
CN106997304B (zh) Method and device for processing input/output events
CN114860449A (zh) Data processing method, apparatus, device, and storage medium
CN105659216A (zh) Cache directory processing method and directory controller for a multi-core processor system
CN111090627B (zh) Pooling-based log storage method and apparatus, computer device, and storage medium
CN111158847A (zh) Method and system for scheduling virtual host resources for open-source information collection
CN109344043A (zh) Performance analysis method and related apparatus
TW202027480A (zh) System and method for automatically adjusting a serverless program
CN103176847A (zh) Virtual machine allocation method
CN114385338A (zh) Application node scheduling method and apparatus
CN114207595A (zh) Management of computing devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20887171

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20887171

Country of ref document: EP

Kind code of ref document: A1