WO2023207361A1 - Memory management method, system and device, and computer-readable storage medium - Google Patents

Memory management method, system and device, and computer-readable storage medium

Info

Publication number
WO2023207361A1
WO2023207361A1 (PCT application No. PCT/CN2023/080786)
Authority
WO
WIPO (PCT)
Prior art keywords
memory
target
computing device
execution unit
target computing
Prior art date
Application number
PCT/CN2023/080786
Other languages
English (en)
French (fr)
Inventor
何也 (He Ye)
Original Assignee
山东云海国创云计算装备产业创新中心有限公司 (Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东云海国创云计算装备产业创新中心有限公司
Publication of WO2023207361A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the field of computer technology, and more specifically, to a memory management method, system, device and computer non-volatile readable storage medium.
  • This application aims to provide a memory management method that can, to a certain extent, solve the technical problem of how to accurately manage the memory of a computing device.
  • This application also provides a memory management system, a memory management device and a computer non-volatile readable storage medium.
  • a memory management method including:
  • the target neural network model is divided into sub-functions corresponding to each target computing device;
  • the corresponding sub-function is divided into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity.
  • memory management of the target computing device is performed at execution-unit granularity, including:
  • the memory of the target computing device is managed based on the memory usage information.
  • determining the memory usage information of the execution unit in the target memory includes:
  • the target memory is divided into memory blocks corresponding to each execution unit, including:
  • the target memory is divided into memory blocks corresponding to each execution unit.
  • managing the memory of the target computing device based on memory usage information includes:
  • the memory block corresponding to the execution unit is allowed to be reused
  • the memory block corresponding to the execution unit is prohibited from being reused, and execution returns to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
  • the target memory is divided into memory blocks corresponding to each execution unit, including:
  • the target memory is divided into memory blocks corresponding to each execution unit.
  • the method further includes:
  • a memory management system that includes:
  • the first acquisition module is used to acquire the target neural network model
  • the first segmentation module is used to segment the target neural network model into sub-functions corresponding to each target computing device based on the operation support of the operators in the target neural network model by each target computing device;
  • the first distribution module is used to distribute sub-functions to corresponding target computing devices
  • the second splitting module is used, for each target computing device, to split the corresponding sub-function into execution units corresponding to each computing unit based on the operation information of each computing unit in the target computing device, and to perform memory management of the target computing device at execution-unit granularity.
  • a memory management device that includes:
  • Memory used to store computer programs
  • a processor is used to implement the steps of any of the above memory management methods when executing a computer program.
  • a computer non-volatile readable storage medium, in which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above memory management methods are implemented.
  • This application provides a memory management method: a target neural network model is obtained; based on the operational support of each target computing device for the operators in the target neural network model, the model is divided into sub-functions corresponding to each target computing device; the sub-functions are distributed to the corresponding target computing devices; and for each target computing device, the corresponding sub-function is divided, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity.
  • the target neural network model can be divided into sub-functions corresponding to each target computing device, so that the sub-function each target computing device needs to compute matches its own computing performance; then, for each target computing device, the corresponding sub-function is further divided into execution units corresponding to each computing unit based on the operation information of each computing unit in the target computing device.
  • memory management of the target computing device is performed at execution-unit granularity, achieving accurate management of the memory of the computing device.
  • the memory management system, device and computer non-volatile readable storage medium provided by this application also solve the corresponding technical problems.
  • Figure 1 is a first flow chart of a memory management method provided by an embodiment of the present application
  • Figure 2 is a second flow chart of a memory management method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of memory allocation
  • Figure 4 is a schematic diagram of the sub-function
  • Figure 5 is a schematic structural diagram of a memory management system provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a memory management processing device provided by an embodiment of the present application.
  • FIG. 7 is another schematic structural diagram of a memory management processing device provided by an embodiment of the present application.
  • Figure 1 is a first flow chart of a memory management method provided by an embodiment of the present application.
  • Step S101: Obtain the target neural network model.
  • the target neural network model to be calculated can be obtained first, and the type of the target neural network model can be determined according to actual needs, which is not specifically limited in this application.
  • Step S102: Based on the operational support of each target computing device for the operators in the target neural network model, the target neural network model is divided into sub-functions corresponding to each target computing device.
  • after the target neural network model is obtained, it can be divided into sub-functions corresponding to each target computing device based on each device's operational support for the operators in the model. For example, if a target computing device is well suited to convolution operations, the convolution operators in the target neural network model can be carved out into a corresponding sub-function and distributed to that device; likewise, if a target computing device is well suited to pooling operations, the pooling operators in the model can be carved out into a corresponding sub-function and distributed to that device, and so on.
  • the target computing device refers to a device with computing capabilities.
  • the types of the target computing devices and of the operators in the target neural network model can be determined according to actual needs.
  • the target computing device can be a CPU (central processing unit), a GPU (graphics processing unit), an FPGA (field-programmable gate array), or the like.
  • the operators in the target neural network model can be convolution operators, pooling operators, activation operators, etc.; this application does not make specific limitations here.
  • Step S103: Distribute the sub-functions to the corresponding target computing devices.
  • the obtained sub-functions can be distributed to the corresponding target computing devices, so that each target computing device can process its corresponding sub-function.
  • Step S104: For each target computing device, based on the operation information of each computing unit in the target computing device, the corresponding sub-function is divided into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity.
  • after the sub-functions are distributed, the corresponding sub-function of each target computing device still needs to be divided into execution units corresponding to each computing unit based on the operation information of each computing unit in the target computing device.
  • memory management of the target computing device is then performed at execution-unit granularity; because an execution unit is finer-grained than a sub-function, managing memory at execution-unit granularity allows the memory of the target computing device to be managed in a more detailed and accurate manner.
  • the execution subject of the memory management method provided by this application can be determined according to actual needs.
  • the execution subject can be a deep learning compiler.
  • when the deep learning compiler obtains the target neural network model, it can read the target neural network model generated by a deep learning framework, so as to obtain the model in the IR (intermediate representation) form accepted by the deep learning compiler, which makes it easier for the compiler to process the model.
  • the execution subject can also be a computer device on which the deep learning compiler is deployed, or a computer device that needs to run the target neural network model, etc.; this application does not specifically limit it here.
  • This application provides a memory management method: a target neural network model is obtained; based on the operational support of each target computing device for the operators in the target neural network model, the model is divided into sub-functions corresponding to each target computing device; the sub-functions are distributed to the corresponding target computing devices; and for each target computing device, the corresponding sub-function is divided, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity.
  • the target neural network model can be divided into sub-functions corresponding to each target computing device, so that the sub-function each target computing device needs to compute matches its own computing performance; then, for each target computing device, the corresponding sub-function is further divided into execution units corresponding to each computing unit based on the operation information of each computing unit in the target computing device.
  • memory management of the target computing device is performed at execution-unit granularity, achieving accurate management of the memory of the computing device.
  • Figure 2 is a second flow chart of a memory management method provided by an embodiment of the present application.
  • Step S201: Obtain the target neural network model.
  • Step S202: Based on the operational support of each target computing device for the operators in the target neural network model, the target neural network model is divided into sub-functions corresponding to each target computing device.
  • Step S203: Distribute the sub-functions to the corresponding target computing devices.
  • Step S204: For each target computing device, based on the operation information of each computing unit in the target computing device, the corresponding sub-function is divided into execution units corresponding to each computing unit; the memory of the target computing device is divided into a target memory and a reserved memory; the memory occupancy information of the execution units in the target memory is determined; and the memory of the target computing device is managed based on the memory occupancy information.
  • in the process of performing memory management of the target computing device at execution-unit granularity, the memory of the target computing device can be divided into a target memory and a reserved memory; the memory occupancy information of the execution units in the target memory is determined; and the memory of the target computing device is managed based on the memory occupancy information. That is, the target memory is used first to process the execution units, and the reserved memory serves as a fallback when the target memory is insufficient: for example, when the target memory cannot meet the memory requirements of an execution unit, the reserved memory is applied to the target memory as memory compensation.
  • in the process of determining the memory occupancy information of the execution units in the target memory, the target memory can be divided into memory blocks corresponding to each execution unit, and the correspondence between the execution units and the memory blocks is used as the memory occupancy information. In this way, memory management of the execution units can be performed accurately based on that correspondence.
  • assuming there are four execution units, the memory allocation can be as shown in Figure 3, where memory space 1 represents the memory block corresponding to the first execution unit, memory space 2 the block corresponding to the second execution unit, memory space 3 the block corresponding to the third execution unit, memory space 4 the block corresponding to the fourth execution unit, and the unnumbered memory space represents the reserved memory.
  • in practical applications, in the process of dividing the target memory into memory blocks corresponding to each execution unit, the target memory can be divided into memory blocks corresponding to each execution unit based on the memory reuse principle. It should be noted that the memory reuse principle can mean that different execution units reuse the same memory block, or that the input and output of the same execution unit reuse the same memory block; this application does not make specific limitations here.
  • the same memory block may be used by multiple execution units, which can cause the data stored in the memory block to be overwritten; if the overwritten data is still needed by subsequent operations, those operations cannot be performed and the final calculation result cannot be obtained.
  • in the process of managing the memory of the target computing device based on the memory occupancy information, the number of occurrences of each execution unit in the sub-function can be counted and used as the use count of the memory block corresponding to that execution unit; each time the execution unit appears once in the target computing device, the use count is decremented by 1.
  • if the execution unit does not appear, the use count remains unchanged; for each execution unit, it is determined whether the use count of the corresponding memory block is 0; if the use count is 0, the memory block corresponding to the execution unit is allowed to be reused; if the use count is not 0, the memory block corresponding to the execution unit is prohibited from being reused, and execution returns to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device. In this way, while a memory block's use count is not 0, that is, while the block is still needed, it cannot be reused and its data cannot be overwritten, which guarantees that the data in the block remains available for later use.
  • a depth-first traversal is used to count the number of times each operator appears on the different paths of the computation graph; this count represents the number of times the operator's output needs to be used by subsequent operators.
  • the first operator from top to bottom appears on two paths: its output is used by the third operator and by the second operator, so its use count is 2.
  • after the per-operator counting is complete, a depth-first traversal is performed again starting from the output, recording the memory numbers occupied by each operator's inputs and outputs.
  • the first operator is therefore processed first; its input is the input of the whole sub-function. Assuming the memory number it occupies is 0, the count attached to this memory number is updated to the current use count of its data.
  • the input of the whole function is used only once, so the count of memory number 0 is 1.
  • for the operator's output, the existing memory numbers are traversed first and each is checked for a use count of 0; if a memory number's count is 0 and its block size is greater than or equal to the space the operator's output requires, the output's memory number is set to that number; otherwise a new region is opened in memory and given a new number.
  • while traversing the existing memory numbers, the use count of any memory block that stores an input of the current operator is decremented by 1. In this way, the input and output numbers of every operator can be determined, and memory reuse is achieved.
  • in practical applications, in the process of dividing the target memory into memory blocks corresponding to each execution unit, the target memory can also be divided into memory blocks corresponding to each execution unit based on the fastest-execution principle; this application does not make specific limitations here.
  • FIG. 5 is a schematic structural diagram of a memory management system provided by an embodiment of the present application.
  • the first acquisition module 101 is used to acquire the target neural network model
  • the first segmentation module 102 is used to segment the target neural network model into sub-functions corresponding to each target computing device based on the operation support of the operators in the target neural network model by each target computing device;
  • the first distribution module 103 is used to distribute sub-functions to corresponding target computing devices
  • the second splitting module 104 is used, for each target computing device, to split the corresponding sub-function into execution units corresponding to each computing unit based on the operation information of each computing unit in the target computing device, and to perform memory management of the target computing device at execution-unit granularity.
  • the second segmentation module may include:
  • the first dividing unit is used to divide the memory of the target computing device into target memory and reserved memory;
  • the first determination unit is used to determine the memory occupation information of the execution unit in the target memory
  • the first management unit is used to manage the memory of the target computing device based on the memory occupation information.
  • the first determination unit may be specifically configured to: divide the target memory into memory blocks corresponding to each execution unit; and use the correspondence between the execution units and the memory blocks as memory occupation information.
  • the first determination unit may be specifically configured to: based on the memory reuse principle, divide the target memory into memory blocks corresponding to each execution unit.
  • the first management unit can be specifically used to: count the number of occurrences of each execution unit in the sub-function, use the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrement the use count by 1 each time the execution unit appears once in the target computing device; for each execution unit, determine whether the use count of the corresponding memory block is 0; if the use count is 0, allow the memory block corresponding to the execution unit to be reused; and if the use count is not 0, prohibit the memory block corresponding to the execution unit from being reused and return to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
  • the first determination unit may be specifically configured to: based on the principle of fastest execution speed, divide the target memory into memory blocks corresponding to each execution unit.
  • the first compensation module is used to perform memory compensation on the target memory using reserved memory after the second segmentation module manages the memory of the target computing device based on the memory occupation information.
  • This application also provides a memory management device and a computer non-volatile readable storage medium, both of which have the corresponding effects of a memory management method provided by the embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a memory management processing device provided by an embodiment of the present application.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, the following steps are implemented:
  • the target neural network model is divided into sub-functions corresponding to each target computing device;
  • the corresponding sub-function is divided into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, it implements the following steps: dividing the memory of the target computing device into a target memory and a reserved memory; determining the memory occupancy information of the execution units in the target memory; and managing the memory of the target computing device based on the memory occupancy information.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, it implements the following steps: dividing the target memory into memory blocks corresponding to each execution unit; and using the correspondence between the execution units and the memory blocks as the memory occupancy information.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, the following steps are implemented: based on the memory reuse principle, the target memory is divided into memory blocks corresponding to each execution unit.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, it implements the following steps: counting the number of occurrences of each execution unit in the sub-function, using the number of occurrences as the use count of the corresponding memory block, decrementing the use count by 1 each time the execution unit appears once in the target computing device, and allowing or prohibiting reuse of the memory block according to whether its use count is 0.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, it implements the following steps: based on the fastest-execution principle, the target memory is divided into memory blocks corresponding to each execution unit.
  • a memory management device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • a computer program is stored in the memory 201.
  • when the processor 202 executes the computer program, it implements the following steps: after the memory of the target computing device is managed based on the memory occupancy information, the reserved memory is applied to perform memory compensation on the target memory.
  • another memory management device may also include: an input port 203 connected to the processor 202, used to transmit externally input commands to the processor 202;
  • a display unit 204, used to display the processing results of the processor 202 to the outside world;
  • a communication module 205 connected to the processor 202, used to implement communication between the memory management device and the outside world.
  • the display unit 204 can be a display panel, a laser scanning display, etc.; the communication methods used by the communication module 205 include, but are not limited to, mobile high-definition link (MHL), universal serial bus (USB), high-definition multimedia interface (HDMI), and wireless connections: wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy, and IEEE 802.11s-based communication.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • the computer non-volatile readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the following steps are implemented:
  • the target neural network model is divided into sub-functions corresponding to each target computing device;
  • the corresponding sub-function is divided into execution units corresponding to each computing unit, and the memory management of the target computing device is performed with the execution unit as the granularity.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • a computer program is stored in the computer non-volatile readable storage medium.
  • when the computer program is executed by a processor, the following steps are implemented: dividing the memory of the target computing device into a target memory and a reserved memory; determining the memory occupancy information of the execution units in the target memory; and managing the memory of the target computing device based on the memory occupancy information.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • the computer non-volatile readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit; and using the correspondence between the execution units and the memory blocks as the memory occupancy information.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • the computer non-volatile readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the following steps are implemented: based on the memory reuse principle, the target memory is divided into memory blocks corresponding to each execution unit.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • a computer program is stored in the computer non-volatile readable storage medium.
  • when the computer program is executed by a processor, the following steps are implemented: counting the number of occurrences of each execution unit in the sub-function, using the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrementing the use count by 1 each time the execution unit appears once in the target computing device; for each execution unit, determining whether the use count of the corresponding memory block is 0; if the use count is 0, allowing the memory block corresponding to the execution unit to be reused; and if the use count is not 0, prohibiting the memory block corresponding to the execution unit from being reused and returning to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • the computer non-volatile readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the following steps are implemented: based on the fastest-execution principle, the target memory is divided into memory blocks corresponding to each execution unit.
  • An embodiment of the present application provides a computer non-volatile readable storage medium.
  • a computer program is stored in the computer non-volatile readable storage medium.
  • when the computer program is executed by a processor, the following steps are implemented: after the memory of the target computing device is managed based on the memory occupancy information, the reserved memory is applied to perform memory compensation on the target memory.
  • Computer non-volatile readable storage media involved in this application include random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

Abstract

This application discloses a memory management method, system and device, and a computer-readable storage medium. A target neural network model is obtained; based on the operational support of each target computing device for the operators in the target neural network model, the model is partitioned into sub-functions corresponding to each target computing device; the sub-functions are distributed to the corresponding target computing devices; and for each target computing device, the corresponding sub-function is partitioned, based on the operation information of each computing unit in that device, into execution units corresponding to each computing unit, and memory management of the target computing device is performed at execution-unit granularity. In this way, the sub-function each target computing device needs to compute matches its own computing performance, and managing memory at execution-unit granularity achieves accurate management of the computing device's memory. The memory management system and device and the computer-readable storage medium provided by this application also solve the corresponding technical problems.

Description

Memory management method, system and device, and computer-readable storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202210446431.5, entitled "Memory management method, system, device and computer-readable storage medium" and filed with the Chinese Patent Office on April 26, 2022, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of computer technology, and more specifically to a memory management method, system and device, and a computer non-volatile readable storage medium.
BACKGROUND
When a neural network model is used, a computing device with computing capability is needed to run the corresponding computations in the model. Because the inputs and outputs of the operators in a neural network model occupy a certain amount of storage space while the memory of the computing device is limited, unreasonable memory allocation reduces the operation speed of the neural network model. To increase the operation speed of the neural network model, the memory of the computing device therefore needs to be managed accurately.
SUMMARY
The purpose of this application is to provide a memory management method that can, to a certain extent, solve the technical problem of how to accurately manage the memory of a computing device. This application further provides a memory management system, a memory management device and a computer non-volatile readable storage medium.
To achieve the above purpose, this application provides the following technical solutions:
A memory management method, including:
obtaining a target neural network model;
based on the operational support of each target computing device for the operators in the target neural network model, partitioning the target neural network model into sub-functions corresponding to each target computing device;
distributing the sub-functions to the corresponding target computing devices; and
for each target computing device, partitioning the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and performing memory management of the target computing device at execution-unit granularity.
In some embodiments, performing memory management of the target computing device at execution-unit granularity includes:
dividing the memory of the target computing device into a target memory and a reserved memory;
determining the memory occupancy information of the execution units in the target memory; and
managing the memory of the target computing device based on the memory occupancy information.
In some embodiments, determining the memory occupancy information of the execution units in the target memory includes:
dividing the target memory into memory blocks corresponding to each execution unit; and
using the correspondence between the execution units and the memory blocks as the memory occupancy information.
In some embodiments, dividing the target memory into memory blocks corresponding to each execution unit includes:
dividing the target memory into memory blocks corresponding to each execution unit based on the memory reuse principle.
In some embodiments, managing the memory of the target computing device based on the memory occupancy information includes:
counting the number of occurrences of each execution unit in the sub-function, using the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrementing the use count by 1 each time the execution unit appears once in the target computing device;
for each execution unit, determining whether the use count of the corresponding memory block is 0;
if the use count is 0, allowing the memory block corresponding to the execution unit to be reused; and
if the use count is not 0, prohibiting the memory block corresponding to the execution unit from being reused, and returning to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
In some embodiments, dividing the target memory into memory blocks corresponding to each execution unit includes:
dividing the target memory into memory blocks corresponding to each execution unit based on the fastest-execution principle.
In some embodiments, after managing the memory of the target computing device based on the memory occupancy information, the method further includes:
applying the reserved memory to perform memory compensation on the target memory.
A memory management system, including:
a first acquisition module, used to obtain a target neural network model;
a first partitioning module, used to partition the target neural network model into sub-functions corresponding to each target computing device based on the operational support of each target computing device for the operators in the target neural network model;
a first distribution module, used to distribute the sub-functions to the corresponding target computing devices; and
a second partitioning module, used to, for each target computing device, partition the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and perform memory management of the target computing device at execution-unit granularity.
A memory management device, including:
a memory, used to store a computer program; and
a processor, used to implement the steps of any of the above memory management methods when executing the computer program.
A computer non-volatile readable storage medium, in which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above memory management methods are implemented.
The memory management method provided by this application obtains a target neural network model; partitions the target neural network model into sub-functions corresponding to each target computing device based on the operational support of each target computing device for the operators in the model; distributes the sub-functions to the corresponding target computing devices; and, for each target computing device, partitions the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, performing memory management of the target computing device at execution-unit granularity. In this application, the target neural network model can first be partitioned into sub-functions corresponding to each target computing device based on each device's operational support for the operators in the model, so that the sub-function each target computing device needs to compute matches its own computing performance; afterwards, for each target computing device, the corresponding sub-function is further partitioned into execution units corresponding to each computing unit based on the operation information of those computing units, and memory management of the target computing device is performed at execution-unit granularity, achieving accurate management of the computing device's memory. The memory management system and device and the computer non-volatile readable storage medium provided by this application solve the corresponding technical problems as well.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a first flowchart of a memory management method provided by an embodiment of this application;
Figure 2 is a second flowchart of a memory management method provided by an embodiment of this application;
Figure 3 is a schematic diagram of memory allocation;
Figure 4 is a schematic diagram of a sub-function;
Figure 5 is a schematic structural diagram of a memory management system provided by an embodiment of this application;
Figure 6 is a schematic structural diagram of a memory management device provided by an embodiment of this application;
Figure 7 is another schematic structural diagram of a memory management device provided by an embodiment of this application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
Please refer to Figure 1, which is a first flowchart of a memory management method provided by an embodiment of this application.
A memory management method provided by an embodiment of this application may include the following steps:
Step S101: Obtain a target neural network model.
In practical applications, the target neural network model to be computed may be obtained first. The type of the target neural network model may be determined according to actual needs and is not specifically limited in this application.
Step S102: Based on the operational support of each target computing device for the operators in the target neural network model, partition the target neural network model into sub-functions corresponding to each target computing device.
In practical applications, after the target neural network model is obtained, it may be partitioned into sub-functions corresponding to each target computing device based on each device's operational support for the operators in the model. For example, if a target computing device is well suited to convolution operations, the convolution operators in the target neural network model may be carved out into a corresponding sub-function and distributed to that device; likewise, if a target computing device is well suited to pooling operations, the pooling operators in the model may be carved out into a corresponding sub-function and distributed to that device, and so on. A minimal sketch of this device-aware partitioning step is given below.
It should be noted that a target computing device is a device with computing capability. The types of the target computing devices and of the operators in the target neural network model may be determined according to actual needs. For example, a target computing device may be a CPU (central processing unit), a GPU (graphics processing unit), an FPGA (field-programmable gate array) or the like, and the operators in the target neural network model may be convolution operators, pooling operators, activation operators and the like; this application does not specifically limit them here.
Step S103: Distribute the sub-functions to the corresponding target computing devices.
In practical applications, after the target neural network model has been partitioned into sub-functions corresponding to each target computing device based on each device's operational support for the operators, the resulting sub-functions may be distributed to the corresponding target computing devices so that each device can process its sub-function.
Step S104: For each target computing device, based on the operation information of each computing unit in the target computing device, partition the corresponding sub-function into execution units corresponding to each computing unit, and perform memory management of the target computing device at execution-unit granularity.
In practical applications, after the sub-functions are distributed to the corresponding target computing devices, for each target computing device the corresponding sub-function still needs to be partitioned, based on the operation information of each computing unit in the device, into execution units corresponding to those computing units, and memory management of the device is then performed at execution-unit granularity. Because an execution unit is finer-grained than a sub-function, performing memory management at execution-unit granularity allows the memory of the target computing device to be managed in a more detailed and accurate manner.
It should be noted that the execution subject of the memory management method provided by this application may be determined according to actual needs. For example, the execution subject may be a deep learning compiler; in that case, when obtaining the target neural network model, the deep learning compiler may read the target neural network model generated by a deep learning framework so as to obtain the model in an IR (intermediate representation) form that the compiler accepts, which makes it easier for the compiler to process the model. Of course, the execution subject may also be the computer device on which the deep learning compiler is deployed, or the computer device that needs to run the target neural network model, and so on; this application does not specifically limit it here.
The memory management method provided by this application obtains a target neural network model; partitions the target neural network model into sub-functions corresponding to each target computing device based on the operational support of each target computing device for the operators in the model; distributes the sub-functions to the corresponding target computing devices; and, for each target computing device, partitions the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, performing memory management of the target computing device at execution-unit granularity. In this application, the target neural network model can first be partitioned into sub-functions corresponding to each target computing device based on each device's operational support for the operators in the model, so that the sub-function each target computing device needs to compute matches its own computing performance; afterwards, for each target computing device, the corresponding sub-function is further partitioned into execution units corresponding to each computing unit based on the operation information of those computing units, and memory management of the target computing device is performed at execution-unit granularity, achieving accurate management of the computing device's memory.
Please refer to Figure 2, which is a second flowchart of a memory management method provided by an embodiment of this application.
A memory management method provided by an embodiment of this application may include the following steps:
Step S201: Obtain a target neural network model.
Step S202: Based on the operational support of each target computing device for the operators in the target neural network model, partition the target neural network model into sub-functions corresponding to each target computing device.
Step S203: Distribute the sub-functions to the corresponding target computing devices.
Step S204: For each target computing device, based on the operation information of each computing unit in the target computing device, partition the corresponding sub-function into execution units corresponding to each computing unit; divide the memory of the target computing device into a target memory and a reserved memory; determine the memory occupancy information of the execution units in the target memory; and manage the memory of the target computing device based on the memory occupancy information.
In practical applications, in the process of performing memory management of the target computing device at execution-unit granularity, the memory of the target computing device may be divided into a target memory and a reserved memory; the memory occupancy information of the execution units in the target memory is determined; and the memory of the target computing device is managed based on that information. In other words, the target memory is used first to process the execution units, and the reserved memory serves as a fallback when the target memory is insufficient: for example, when the target memory cannot meet the memory requirements of an execution unit, the reserved memory is applied to the target memory as memory compensation, as in the sketch below.
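As a hedged illustration of this target/reserved split, a toy allocator might behave as follows; the sizes, the class name and the counter-based bookkeeping are assumptions for this sketch, not details from the application:

```python
class DeviceMemory:
    """Toy model of a device whose memory is split into a target region,
    used first, and a reserved region used only as compensation."""

    def __init__(self, total_bytes, reserved_bytes):
        self.target_free = total_bytes - reserved_bytes
        self.reserved_free = reserved_bytes

    def allocate(self, size):
        # Serve the request from the target memory first.
        if size <= self.target_free:
            self.target_free -= size
            return "target"
        # Target memory is insufficient: compensate with reserved memory.
        shortfall = size - self.target_free
        if shortfall <= self.reserved_free:
            self.reserved_free -= shortfall
            self.target_free = 0
            return "target+reserved"
        raise MemoryError("execution unit does not fit even with reserved memory")

mem = DeviceMemory(total_bytes=1024, reserved_bytes=256)
print(mem.allocate(700))   # target (68 bytes of target memory remain)
print(mem.allocate(200))   # target+reserved (68 from target, 132 from reserved)
```

The reserved region is touched only for the shortfall, matching the description above: the target memory is always tried first, and compensation happens only when it cannot satisfy an execution unit on its own.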
In practical applications, in the process of determining the memory occupancy information of the execution units in the target memory, the target memory may be divided into memory blocks corresponding to each execution unit, and the correspondence between the execution units and the memory blocks is used as the memory occupancy information. In this way, memory management of the execution units can be performed accurately based on that correspondence. Assuming there are four execution units, the memory allocation may be as shown in Figure 3, where memory space 1 represents the memory block corresponding to the first execution unit, memory space 2 the block corresponding to the second execution unit, memory space 3 the block corresponding to the third execution unit, memory space 4 the block corresponding to the fourth execution unit, and the unnumbered memory space represents the reserved memory.
In practical applications, in the process of dividing the target memory into memory blocks corresponding to each execution unit, the division may be based on the memory reuse principle. It should be noted that the memory reuse principle may mean that different execution units reuse the same memory block, or that the input and output of the same execution unit reuse the same memory block, and so on; this application does not specifically limit it here.
In specific application scenarios, after the target memory has been divided into memory blocks corresponding to each execution unit based on the memory reuse principle, the same memory block may be used by multiple execution units, which can cause the data stored in the block to be overwritten. If the overwritten data is still needed later, subsequent operations cannot proceed and the final computation result cannot be obtained. To avoid this, in the process of managing the memory of the target computing device based on the memory occupancy information, the number of occurrences of each execution unit in the sub-function may be counted and used as the use count of the memory block corresponding to that execution unit; each time the execution unit appears once in the target computing device, the use count is decremented by 1, and if the execution unit does not appear, the use count is left unchanged. For each execution unit, it is determined whether the use count of the corresponding memory block is 0; if the use count is 0, the memory block corresponding to the execution unit is allowed to be reused; if the use count is not 0, reuse of the memory block corresponding to the execution unit is prohibited, and execution returns to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device. In this way, while a memory block's use count is not 0, that is, while the block is still needed, it cannot be reused and its data cannot be overwritten, which guarantees that the data in the block remains available for later use. A minimal sketch of this use-counting scheme follows.
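A minimal sketch of the use-counting scheme might look as follows; the block names and the explicit event functions are assumptions made for illustration, not the application's implementation:

```python
# Use count per memory block: initialised to the number of times the
# owning execution unit appears in the sub-function.
use_count = {"block_A": 2, "block_B": 1}

def on_execution(block):
    """Called each time the owning execution unit appears (runs) once
    on the target computing device."""
    use_count[block] -= 1

def may_reuse(block):
    """A block may be reused (overwritten) only once its count is 0,
    i.e. no later execution unit still needs the data it holds."""
    return use_count[block] == 0

on_execution("block_A")
print(may_reuse("block_A"))  # False: one use remains, data must be kept
on_execution("block_A")
print(may_reuse("block_A"))  # True: the block can now be reused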
For ease of understanding, suppose the sub-function is of the type shown in Figure 4. Starting from the third and fourth operators of the sub-function, a depth-first traversal is used to count how many times each operator appears on the different paths of the computation graph; this count is the number of times the operator's output needs to be used by subsequent operators. As can be seen from Figure 4, the first operator from top to bottom appears on two paths: its output is used by the third operator and by the second operator, so its use count is 2. After the per-operator counting is completed, a depth-first traversal is performed again, as the first time, starting from the output, this time recording the memory numbers occupied by each operator's inputs and outputs. The first operator is therefore processed first; its input is the input of the whole sub-function. Suppose the memory number it occupies is 0; the count attached to this memory number is updated to the current use count of its data. The input of the whole function is used only once, so the count of memory number 0 is 1. For the operator's output, the existing memory numbers are traversed first and each is checked for a use count of 0; if a memory number's count is 0 and its block size is greater than or equal to the space the operator's output requires, the output's memory number is set to that number; otherwise a new region is carved out of memory and given a new number. While traversing the existing memory numbers, the use count of any block that stores an input of the current operator is decremented by 1. In this way the input and output numbers of every operator can be determined, and memory reuse is achieved; a compact sketch of this numbering pass is given below.
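Putting the two traversals together, one hedged reconstruction of the numbering pass might look as follows; the four-operator graph mirrors the shape described for Figure 4, but the exact structure, tensor sizes and helper names are illustrative assumptions only:

```python
# Each operator is (name, input tensors, output size in bytes); names
# double as output-tensor names and "x" is the sub-function's input.
# The toy graph mirrors the Figure 4 shape: op1's output feeds both
# op2 and op3, so its use count is 2.
ops = [
    ("op1", ["x"], 64),
    ("op2", ["op1"], 64),
    ("op3", ["op1", "op2"], 64),
    ("op4", ["op3"], 64),
]

# First pass: count how many later operators consume each tensor.
uses = {}
for _, inputs, _ in ops:
    for t in inputs:
        uses[t] = uses.get(t, 0) + 1

# Second pass: give every tensor a memory number, reusing any block
# whose remaining use count is 0 and whose size is large enough.
blocks = []       # blocks[i] = size of memory number i
remaining = []    # remaining[i] = outstanding uses of memory number i
number_of = {}    # tensor -> memory number

def assign(tensor, size):
    for i in range(len(blocks)):
        if remaining[i] == 0 and blocks[i] >= size:  # free and big enough
            remaining[i] = uses.get(tensor, 0)
            number_of[tensor] = i
            return
    blocks.append(size)                   # otherwise open a new region
    remaining.append(uses.get(tensor, 0))
    number_of[tensor] = len(blocks) - 1

assign("x", 64)                           # sub-function input gets number 0
for name, inputs, out_size in ops:
    assign(name, out_size)                # number this operator's output
    for t in inputs:                      # each input is consumed once here
        remaining[number_of[t]] -= 1

print(number_of)
# {'x': 0, 'op1': 1, 'op2': 0, 'op3': 2, 'op4': 0}: 3 blocks for 5 tensors
```

Because op2's output lands in the block that previously held the sub-function input, and op4's output reuses it again, five tensors fit in three numbered blocks, which is exactly the memory reuse the description aims at.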
In practical applications, in the process of dividing the target memory into memory blocks corresponding to each execution unit, the division may also be based on the fastest-execution principle, among others; this application does not specifically limit it here.
Please refer to Figure 5, which is a schematic structural diagram of a memory management system provided by an embodiment of this application.
A memory management system provided by an embodiment of this application may include:
a first acquisition module 101, used to obtain a target neural network model;
a first partitioning module 102, used to partition the target neural network model into sub-functions corresponding to each target computing device based on the operational support of each target computing device for the operators in the target neural network model;
a first distribution module 103, used to distribute the sub-functions to the corresponding target computing devices; and
a second partitioning module 104, used to, for each target computing device, partition the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and perform memory management of the target computing device at execution-unit granularity.
In a memory management system provided by an embodiment of this application, the second partitioning module may include:
a first dividing unit, used to divide the memory of the target computing device into a target memory and a reserved memory;
a first determining unit, used to determine the memory occupancy information of the execution units in the target memory; and
a first management unit, used to manage the memory of the target computing device based on the memory occupancy information.
In a memory management system provided by an embodiment of this application, the first determining unit may be specifically used to: divide the target memory into memory blocks corresponding to each execution unit; and use the correspondence between the execution units and the memory blocks as the memory occupancy information.
In a memory management system provided by an embodiment of this application, the first determining unit may be specifically used to: divide the target memory into memory blocks corresponding to each execution unit based on the memory reuse principle.
In a memory management system provided by an embodiment of this application, the first management unit may be specifically used to: count the number of occurrences of each execution unit in the sub-function, use the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrement the use count by 1 each time the execution unit appears once in the target computing device; for each execution unit, determine whether the use count of the corresponding memory block is 0; if the use count is 0, allow the memory block corresponding to the execution unit to be reused; and if the use count is not 0, prohibit the memory block corresponding to the execution unit from being reused and return to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
In a memory management system provided by an embodiment of this application, the first determining unit may be specifically used to: divide the target memory into memory blocks corresponding to each execution unit based on the fastest-execution principle.
A memory management system provided by an embodiment of this application may further include:
a first compensation module, used to apply the reserved memory to perform memory compensation on the target memory after the second partitioning module manages the memory of the target computing device based on the memory occupancy information.
This application further provides a memory management device and a computer non-volatile readable storage medium, both of which have the effects corresponding to the memory management method provided by the embodiments of this application. Please refer to Figure 6, which is a schematic structural diagram of a memory management device provided by an embodiment of this application.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented:
obtaining a target neural network model;
based on the operational support of each target computing device for the operators in the target neural network model, partitioning the target neural network model into sub-functions corresponding to each target computing device;
distributing the sub-functions to the corresponding target computing devices; and
for each target computing device, partitioning the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and performing memory management of the target computing device at execution-unit granularity.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: dividing the memory of the target computing device into a target memory and a reserved memory; determining the memory occupancy information of the execution units in the target memory; and managing the memory of the target computing device based on the memory occupancy information.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit; and using the correspondence between the execution units and the memory blocks as the memory occupancy information.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit based on the memory reuse principle.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: counting the number of occurrences of each execution unit in the sub-function, using the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrementing the use count by 1 each time the execution unit appears once in the target computing device; for each execution unit, determining whether the use count of the corresponding memory block is 0; if the use count is 0, allowing the memory block corresponding to the execution unit to be reused; and if the use count is not 0, prohibiting the memory block corresponding to the execution unit from being reused and returning to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit based on the fastest-execution principle.
A memory management device provided by an embodiment of this application includes a memory 201 and a processor 202. A computer program is stored in the memory 201, and when the processor 202 executes the computer program, the following steps are implemented: after the memory of the target computing device is managed based on the memory occupancy information, applying the reserved memory to perform memory compensation on the target memory.
Please refer to Figure 7. Another memory management device provided by an embodiment of this application may further include: an input port 203 connected to the processor 202, used to transmit externally input commands to the processor 202; a display unit 204 connected to the processor 202, used to display the processing results of the processor 202 to the outside; and a communication module 205 connected to the processor 202, used to implement communication between the memory management device and the outside world. The display unit 204 may be a display panel, a laser scanning display or the like; the communication methods used by the communication module 205 include, but are not limited to, mobile high-definition link (MHL), universal serial bus (USB), high-definition multimedia interface (HDMI), and wireless connections: wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy, and IEEE 802.11s-based communication.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
obtaining a target neural network model;
based on the operational support of each target computing device for the operators in the target neural network model, partitioning the target neural network model into sub-functions corresponding to each target computing device;
distributing the sub-functions to the corresponding target computing devices; and
for each target computing device, partitioning the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each computing unit, and performing memory management of the target computing device at execution-unit granularity.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: dividing the memory of the target computing device into a target memory and a reserved memory; determining the memory occupancy information of the execution units in the target memory; and managing the memory of the target computing device based on the memory occupancy information.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit; and using the correspondence between the execution units and the memory blocks as the memory occupancy information.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit based on the memory reuse principle.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: counting the number of occurrences of each execution unit in the sub-function, using the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrementing the use count by 1 each time the execution unit appears once in the target computing device; for each execution unit, determining whether the use count of the corresponding memory block is 0; if the use count is 0, allowing the memory block corresponding to the execution unit to be reused; and if the use count is not 0, prohibiting the memory block corresponding to the execution unit from being reused and returning to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: dividing the target memory into memory blocks corresponding to each execution unit based on the fastest-execution principle.
A computer non-volatile readable storage medium provided by an embodiment of this application stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: after the memory of the target computing device is managed based on the memory occupancy information, applying the reserved memory to perform memory compensation on the target memory.
The computer non-volatile readable storage media involved in this application include random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
For descriptions of the relevant parts of the memory management system, device and computer non-volatile readable storage medium provided by the embodiments of this application, please refer to the detailed description of the corresponding parts of the memory management method provided by the embodiments of this application; they are not repeated here. In addition, the parts of the above technical solutions provided by the embodiments of this application whose implementation principles are consistent with the corresponding technical solutions in the prior art are not described in detail, to avoid redundancy.
It should also be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

  1. A memory management method, comprising:
    obtaining a target neural network model;
    based on the operational support of each target computing device for the operators in the target neural network model, partitioning the target neural network model into sub-functions corresponding to each of the target computing devices;
    distributing the sub-functions to the corresponding target computing devices; and
    for each of the target computing devices, partitioning the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each of the computing units, and performing memory management of the target computing device at the granularity of the execution units.
  2. The method according to claim 1, wherein performing memory management of the target computing device at the granularity of the execution units comprises:
    dividing the memory of the target computing device into a target memory and a reserved memory;
    determining memory occupancy information of the execution units in the target memory; and
    managing the memory of the target computing device based on the memory occupancy information.
  3. The method according to claim 2, wherein determining the memory occupancy information of the execution units in the target memory comprises:
    dividing the target memory into memory blocks corresponding to each of the execution units; and
    using the correspondence between the execution units and the memory blocks as the memory occupancy information.
  4. The method according to claim 3, wherein dividing the target memory into memory blocks corresponding to each of the execution units comprises:
    dividing the target memory into memory blocks corresponding to each of the execution units based on a memory reuse principle.
  5. The method according to claim 4, wherein managing the memory of the target computing device based on the memory occupancy information comprises:
    counting the number of occurrences of each of the execution units in the sub-function, using the number of occurrences as the use count of the memory block corresponding to the execution unit, and decrementing the use count by 1 each time the execution unit appears once in the target computing device;
    for each of the execution units, determining whether the use count of the corresponding memory block is 0;
    if the use count is 0, allowing the memory block corresponding to the execution unit to be reused; and
    if the use count is not 0, prohibiting the memory block corresponding to the execution unit from being reused, and returning to the step of decrementing the use count by 1 each time the execution unit appears once in the target computing device.
  6. The method according to claim 3, wherein dividing the target memory into memory blocks corresponding to each of the execution units comprises:
    dividing the target memory into memory blocks corresponding to each of the execution units based on a fastest-execution principle.
  7. The method according to any one of claims 2 to 6, further comprising, after managing the memory of the target computing device based on the memory occupancy information:
    applying the reserved memory to perform memory compensation on the target memory.
  8. The method according to claim 1, wherein partitioning the target neural network model into sub-functions corresponding to each of the target computing devices comprises:
    partitioning the target neural network model into sub-functions corresponding to the types of each of the target computing devices.
  9. The method according to claim 8, wherein the types of the target computing devices comprise: a central processing unit, a graphics processing unit, and a field-programmable gate array.
  10. The method according to claim 8, wherein the operators in the target neural network model comprise: convolution operators, pooling operators, and activation operators.
  11. The method according to claim 1, wherein the granularity of the execution units is finer than the granularity of the sub-functions.
  12. The method according to claim 1, wherein the method is applied to a deep learning compiler or to a computer device on which a deep learning compiler is deployed.
  13. The method according to claim 1, wherein the method is applied to a computer device that runs the target neural network model.
  14. The method according to claim 4, wherein the memory reuse principle is that different execution units reuse the same memory block.
  15. The method according to claim 4, wherein the memory reuse principle is that the input and the output of the same execution unit reuse the same memory block.
  16. The method according to claim 5, wherein counting the number of occurrences of each of the execution units in the sub-function comprises:
    counting the number of occurrences of each of the execution units in the sub-function by means of a depth-first traversal.
  17. The method according to claim 5, wherein the number of occurrences represents the number of times the output of the operator needs to be used by subsequent operators.
  18. A memory management system, comprising:
    a first acquisition module, configured to obtain a target neural network model;
    a first partitioning module, configured to partition the target neural network model into sub-functions corresponding to each of the target computing devices based on the operational support of each of the target computing devices for the operators in the target neural network model;
    a first distribution module, configured to distribute the sub-functions to the corresponding target computing devices; and
    a second partitioning module, configured to, for each of the target computing devices, partition the corresponding sub-function, based on the operation information of each computing unit in the target computing device, into execution units corresponding to each of the computing units, and perform memory management of the target computing device at the granularity of the execution units.
  19. A memory management device, comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement the steps of the memory management method according to any one of claims 1 to 17 when executing the computer program.
  20. A computer non-volatile readable storage medium, wherein a computer program is stored in the computer non-volatile readable storage medium, and when the computer program is executed by a processor, the steps of the memory management method according to any one of claims 1 to 17 are implemented.
PCT/CN2023/080786 2022-04-26 2023-03-10 Memory management method, system and device, and computer-readable storage medium WO2023207361A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210446431.5 2022-04-26
CN202210446431.5A CN114816752A (zh) 2022-04-26 2022-07-29 Memory management method, system and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023207361A1 (zh)

Family

ID=82507993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080786 WO2023207361A1 (zh) 2022-04-26 2023-03-10 Memory management method, system and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114816752A (zh)
WO (1) WO2023207361A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114816752A (zh) * 2022-04-26 2022-07-29 山东云海国创云计算装备产业创新中心有限公司 一种内存管理方法、系统、设备及计算机可读存储介质
CN116049029B (zh) * 2023-03-06 2023-07-14 苏州浪潮智能科技有限公司 一种内存共享方法、装置、设备及可读存储介质
CN116775274A (zh) * 2023-03-24 2023-09-19 美的集团(上海)有限公司 内存优化方法、装置、设备、产品、存储介质和芯片
CN117667424A (zh) * 2023-12-21 2024-03-08 摩尔线程智能科技(北京)有限责任公司 内存管理方法、装置和存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860810A (zh) * 2020-06-30 2020-10-30 FPGA-based neural network operation method, apparatus and device
CN112084038A (zh) * 2020-09-23 2020-12-15 Memory allocation method and apparatus for a neural network
US20210158131A1 (en) * 2019-11-27 2021-05-27 Amazon Technologies, Inc. Hierarchical partitioning of operators
CN113127181A (zh) * 2019-12-30 2021-07-16 Memory management method and apparatus, and storage medium
WO2022022670A1 (zh) * 2020-07-31 2022-02-03 Processing method and apparatus for neural network computation graphs, and processing device
CN114356336A (zh) * 2021-11-24 2022-04-15 Neural network model deployment method and apparatus, electronic device and storage medium
CN114816752A (zh) * 2022-04-26 2022-07-29 Memory management method, system and device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN114816752A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2023207361A1 (zh) Memory management method, system and device, and computer-readable storage medium
KR102161448B1 (ko) System including multi-channel memory and operating method thereof
WO2021051914A1 (zh) GPU-resource-based data processing method, electronic device and system
CN101751285B (zh) Centralized device virtualization layer for heterogeneous processing units
JP5510556B2 (ja) Method and system for managing the storage space of virtual machines and physical hosts
US9626285B2 Storage resource allocation to dataflows based on data requirements and attributes
US11030095B2 Virtual space memory bandwidth reduction
CN114185818B (zh) Extended-page-table-based adaptive optimization method and apparatus for GPU memory access
JP5923627B2 (ja) Method and apparatus for coordinating I/O channels on a virtual platform
CN104461735A (zh) Method and apparatus for allocating CPU resources in a virtualization scenario
CN107894922B (zh) RAM resource allocation method
CN117170882B (zh) Resource allocation method and apparatus, electronic device and storage medium
CN107436798A (zh) NUMA-node-based process access method and apparatus
CN107343023A (zh) Resource allocation method and apparatus in a Mesos-managed cluster, and electronic device
CN105373484A (zh) Method for memory allocation, storage and management in a network communication chip
CN109471725A (zh) Resource allocation method, apparatus and server
CN111026500A (zh) Cloud computing simulation platform, and creation method, apparatus and storage medium thereof
CN108615077B (zh) Cache optimization method and apparatus applied to deep learning networks
CN109766179A (zh) Video memory allocation method and apparatus
KR20220025746A (ko) Dynamic allocation of computing resources
US20180067878A1 Method and apparatus for transmitting information
CN106681948A (zh) Logic control method and apparatus for programmable logic devices
CN110990318B (zh) PCIe bus address extension method, apparatus, device and medium
CN109992536A (zh) Data processing method, solid-state drive and computer device
CN104657216A (zh) Resource allocation method and apparatus for a resource pool

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23794797

Country of ref document: EP

Kind code of ref document: A1