CN113485832A - Method and device for allocation management of a physical memory pool, and physical memory pool

Method and device for allocation management of a physical memory pool, and physical memory pool

Info

Publication number
CN113485832A
Authority
CN
China
Prior art keywords
memory
objects
allocated
physical
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110777376.3A
Other languages
Chinese (zh)
Inventor
赵军平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110777376.3A
Publication of CN113485832A
Legal status: Pending

Classifications

    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5022 Allocation of resources: mechanisms to release resources
    • G06F12/0646 Addressing a physical block of locations: configuration or reconfiguration

Abstract

Embodiments of the present specification provide a method and a device for allocation management of a physical memory pool, and the physical memory pool itself. The physical memory pool is composed of GPU video memory and other physical memory in the system. The management structure information of the physical memory pool includes a released memory object set and an allocated memory object set: the released memory object set contains free memory objects in the physical memory pool that have been allocated and then released, and the allocated memory object set contains memory objects in the physical memory pool that are allocated and currently in use. In the method, in response to a request for memory of a specified capacity, the released memory object set is queried for a memory object of the specified capacity; if one exists, the queried memory object of the specified capacity is allocated; the allocated memory object is then deleted from the released memory object set and added to the allocated memory object set.

Description

Method and device for allocation management of a physical memory pool, and physical memory pool
Technical Field
The embodiments of the present specification relate to the field of computer technology, and in particular to a method and a device for allocation management of a physical memory pool, and to the physical memory pool.
Background
The GPU is a widely used microprocessor that can reduce dependence on the CPU by taking over part of the work the CPU would otherwise perform. With its strength in high-performance compute acceleration, it is widely used for AI and deep learning training and online serving. When a GPU application program runs, most functional tasks, especially those that need strong computing power, execute on the GPU.
When an application program runs on the GPU, the GPU allocates the needed GPU video memory for it. Running an application program may involve several functional tasks. For each functional task, the GPU allocates a memory object from the video memory in which to run that task. After the functional task completes, the GPU releases the allocated memory object back to the GPU video memory to increase the available video memory capacity. Subsequent functional tasks repeat this pattern of dynamically allocating memory objects from the video memory and releasing them back to it.
Disclosure of Invention
In view of the foregoing, embodiments of the present specification provide a method and a device for allocation management of a physical memory pool, and the physical memory pool itself. In this allocation management scheme, memory is allocated from a physical memory pool composed of GPU video memory and other physical memory, which provides the memory-requesting object with larger-capacity physical memory and avoids the limitation of insufficient GPU video memory capacity. In addition, the released memory object set stores free memory objects that have been allocated from the physical memory and then released, so that memory objects can be reused. This reduces how often the memory operations of dynamically allocating a memory object from the video memory and then releasing it back are executed, and thereby reduces the performance cost of frequent memory operations.
According to an aspect of the embodiments of the present specification, a method for allocation management of a physical memory pool is provided. The physical memory pool is composed of GPU video memory and other physical memory in the system, and the management structure information of the physical memory pool includes a released memory object set and an allocated memory object set, where the released memory object set contains free memory objects in the physical memory pool that have been allocated and then released, and the allocated memory object set contains memory objects in the physical memory pool that are allocated and currently in use. The method includes: in response to a request for memory of a specified capacity, querying the released memory object set for a memory object of the specified capacity; if one exists, allocating the queried memory object of the specified capacity; and deleting the allocated memory object from the released memory object set and adding it to the allocated memory object set.
According to another aspect of the embodiments of the present specification, there is further provided an apparatus for allocation management of a physical memory pool. The physical memory pool is composed of GPU video memory and other physical memory in the system, and the management structure information of the physical memory pool includes a released memory object set and an allocated memory object set, where the released memory object set contains free memory objects in the physical memory pool that have been allocated and then released, and the allocated memory object set contains memory objects in the physical memory pool that are allocated and currently in use. The apparatus includes: at least one processor, a memory coupled with the at least one processor, and a computer program stored in the memory, where the at least one processor executes the computer program to implement: in response to a request for memory of a specified capacity, querying the released memory object set for a memory object of the specified capacity; if one exists, allocating the queried memory object of the specified capacity; and deleting the allocated memory object from the released memory object set and adding it to the allocated memory object set.
According to another aspect of the embodiments of the present specification, there is further provided a physical memory pool including GPU video memory and other physical memory in a system, where the management structure information of the physical memory pool includes a released memory object set and an allocated memory object set, the released memory object set containing free memory objects in the physical memory pool that have been allocated and then released, and the allocated memory object set containing memory objects in the physical memory pool that are allocated and currently in use.
According to another aspect of embodiments herein, there is also provided an electronic device, including: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method for allocation management of a physical memory pool as described above.
According to another aspect of embodiments herein, there is also provided a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method for allocation management of a physical memory pool as described above.
Drawings
A further understanding of the nature and advantages of the contents of the embodiments of the present specification may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 illustrates an example architecture diagram of one example of a physical memory pool in accordance with embodiments of the present description.
Fig. 2A is a schematic diagram illustrating an example of setting of each memory region in a physical memory pool according to an embodiment of the present disclosure.
Fig. 2B is a schematic diagram illustrating another example of setting of each memory region in a physical memory pool according to an embodiment of the present disclosure.
Fig. 2C is a schematic diagram illustrating another example of setting of each memory region in a physical memory pool according to an embodiment of the present disclosure.
Fig. 2D is a schematic diagram illustrating another example of setting of each memory region in a physical memory pool according to an embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating an example of a method for allocation management of a physical memory pool according to an embodiment of the present specification.
Fig. 4 is a block diagram illustrating an example of an apparatus for allocation management of a physical memory pool according to an embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device for implementing a method for allocation management of a physical memory pool according to an embodiment of the present disclosure.
Detailed Description
The subject matter described herein will be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the embodiments of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
The GPU is a widely used microprocessor that can reduce dependence on the CPU by taking over part of the work the CPU would otherwise perform. With its strength in high-performance compute acceleration, it is widely used for AI and deep learning training and online serving. When a GPU application program runs, most functional tasks, especially those that need strong computing power, execute on the GPU.
When an application program runs on the GPU, the GPU allocates the needed GPU video memory for it. Running an application program may involve several functional tasks. For each functional task, the GPU allocates a memory object from the video memory in which to run that task. After the functional task completes, the GPU releases the allocated memory object back to the GPU video memory to increase the available video memory capacity. Subsequent functional tasks repeat this pattern of dynamically allocating memory objects from the video memory and releasing them back to it.
However, the video memory capacity of a GPU is limited, typically 16GB or 32GB, so many large-scale computing tasks are constrained by the limited video memory. For example, in deep learning training, large high-precision models such as BERT-large and GPT-3 need larger video memory to run on the GPU, otherwise training cannot complete. The GPU's video memory capacity therefore becomes a bottleneck for running large-scale computing tasks on the GPU. In addition, for every functional task in every application program, the GPU must perform the memory operations of dynamically allocating a memory object from the video memory and then releasing it back; these frequent memory operations lead to large performance consumption.
In view of the foregoing, the embodiments of the present specification provide a method and an apparatus for allocation management of a physical memory pool, and the physical memory pool itself. In this allocation management scheme, the physical memory pool is composed of GPU video memory and other physical memory in the system. The management structure information of the physical memory pool includes a released memory object set and an allocated memory object set, where the released memory object set contains free memory objects in the physical memory pool that have been allocated and then released, and the allocated memory object set contains memory objects in the physical memory pool that are allocated and currently in use. In the allocation management method, in response to a request for memory of a specified capacity, the released memory object set is queried for a memory object of the specified capacity; if one exists, the memory object of the specified capacity is allocated; and the allocated memory object is deleted from the released memory object set and added to the allocated memory object set. With this scheme, memory is allocated from a physical memory pool composed of GPU video memory and other physical memory, which provides the memory-requesting object with larger-capacity physical memory and avoids the limitation of insufficient GPU video memory capacity. In addition, the released memory object set stores free memory objects that have been allocated from the physical memory and then released, so that memory objects can be reused; this reduces how often memory objects are dynamically allocated from and released back to the video memory, and thereby reduces the performance cost of frequent memory operations.
A method and an apparatus for performing allocation management on a physical memory pool, and a physical memory pool according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates an example architecture diagram of one example of a physical memory pool in accordance with embodiments of the present description.
The physical memory pool may be composed of GPU video memory and other physical memory in the system, that is, the system to which the GPU video memory and the other physical memory belong; the physical memory pool formed from them belongs to the same system. There may be one or more GPU video memories in the physical memory pool. When the physical memory pool includes multiple GPU video memories, they may be all of the GPU video memories in the system or only some of them. When the physical memory pool includes multiple GPU video memories, an application object in the system may, at run time, designate one GPU video memory as the local GPU video memory and treat the other GPU video memories as remote GPU video memories. In the embodiments of the present specification, application objects may include application programs at the application layer, frameworks at the framework layer, and the like. For example, a CNN application in the application layer may request video memory while running, and the TensorFlow framework in the framework layer may request video memory when called for deep learning. The following description takes an application program as an example.
In one example, other physical memory in the pool of physical memory may include CPU memory and/or non-volatile memory. In addition, other physical memory may include disks, etc. As shown in fig. 1, the physical memory pool includes a GPU video memory, a CPU memory, and a nonvolatile memory.
The physical memory pool may be used as a whole to allocate memory for the application objects in the system, and the physical memories in the pool are communicatively connected; for example, the GPU video memory, CPU memory, and nonvolatile memory in fig. 1 may all be connected with each other. Data in the physical memory pool can be migrated between the physical memories, so that the local GPU video memory designated by an application object can gain a larger available capacity through data migration. For example, an application object in the system designates a GPU video memory as the local GPU video memory; when the application object requests memory, video memory is preferentially allocated from the local GPU video memory, and data that is temporarily unused on the local GPU video memory may be migrated to other physical memory in the pool, thereby expanding the available capacity of the local GPU video memory.
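As a rough sketch of such a migration, the following C++ fragment copies temporarily unused data from the local GPU video memory to a host buffer with the CUDA driver API, after which the vacated device range can be reallocated. The function name is an illustrative assumption; driver initialization, the policy for choosing which data is cold, and migrating the data back are elided.

```cpp
#include <cstdlib>
#include <cuda.h>

// Migrate `bytes` of temporarily unused data at `devSrc` from the local GPU
// video memory to a host-side buffer, freeing the same amount of video memory
// for new allocations. Returns the host buffer, or nullptr on failure.
void* migrateToHost(CUdeviceptr devSrc, std::size_t bytes) {
    void* host = std::malloc(bytes);
    if (host == nullptr) return nullptr;
    if (cuMemcpyDtoH(host, devSrc, bytes) != CUDA_SUCCESS) {
        std::free(host);
        return nullptr;
    }
    // The device range can now be reused by the pool; the pool records the
    // new location so the data can be copied back (cuMemcpyHtoD) on access.
    return host;
}
```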
For the creation of the physical memory pool, all the physical memories in the system may be queried and counted, and then, a part of the physical memories is allocated from all the counted physical memories according to a configuration policy to be used for creating the physical memory pool. The configuration policy may be determined based on the total amount of all physical memory and/or the amount of physical memory required by the application object.
In an example, for the configuration policy according to the total amount of all physical memories, a specified proportion of the total amount of all physical memories may be determined as the physical memory used by the physical memory pool, for example, if the specified proportion is 50%, and the counted total amount of all physical memories is 128G, 64G of the physical memory may be allocated to the physical memory pool for use.
In another example, the physical memory pool is constructed according to a configuration policy based on the physical memory required by the application object it serves. First, the memory required for the application object to run is determined; then that capacity is increased by a specified multiple, and the resulting amount of physical memory is what the physical memory pool uses. For example, the specified multiple may be 4 to 6. Taking 4 as an example, if the application object needs 16G of physical memory to run, the capacity of the physical memory pool is determined to be 64G, and 64G of physical memory is then allocated from all the physical memory of the system for the pool to use.
When allocating memory for the physical memory pool from the system's physical memory, information such as the capacity, memory type, and attributes of the allocated physical memory can be input or configured. Memory types may include GPU video memory, CPU memory, nonvolatile memory, and so on, and physical memories of different types are allocated through different API interfaces; for example, GPU video memory may be allocated by calling the cuMemAlloc interface, while CPU memory used as system memory may be allocated through the malloc/mmap standard interfaces. The corresponding API interface is called according to the information of the physical memory to be allocated to perform the specific allocation operation. After the allocation completes, the allocated physical memories are pooled, that is, merged into the physical memory pool, and the structure of the physical memory pool is initialized, including the initial address, memory type, length information, and so on.
In addition, the physical memory pool can be deleted, for example when the system is upgraded, when the system's service ends, or in response to a memory pool deletion instruction. Specifically, first, each memory object of the physical memory pool is released. Then, structure information such as offset information and capacity information in the physical memory pool is cleared, so that the offset of the physical memory pool becomes 0 and the capacity indicates the total capacity of the pool. Finally, the whole physical memory pool is released: for each type of physical memory in the pool, the corresponding release API interface is called to release that physical memory, and the released physical memory is returned to the system to become physical memory to be allocated in the system. Alternatively, the released physical memory is merged with the portion of the same physical memory in the system that was not allocated to the physical memory pool. For example, the release API interface for GPU video memory is the cuMemFree interface. For the GPU video memory in the physical memory pool, the cuMemFree interface is called to release it; since the released GPU video memory and the other part of the GPU video memory in the system that was not allocated to the pool belong to the same GPU video memory, they are merged into one complete GPU video memory, which becomes GPU video memory to be allocated in the system.
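As a rough illustration of this type-specific dispatch, the following C++ sketch allocates and releases one backing segment per memory type through the interfaces named above (cuMemAlloc/cuMemFree for GPU video memory, malloc/free for CPU memory). The struct and function names are illustrative assumptions; CUDA context setup, nonvolatile memory, and error reporting are elided.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cuda.h>

enum class MemType { GpuVideo, CpuSystem };

struct Segment {
    MemType     type;
    void*       base;
    std::size_t bytes;
};

// Allocate one backing segment for the pool via the type-specific API.
bool allocateSegment(Segment& seg) {
    if (seg.type == MemType::GpuVideo) {
        CUdeviceptr dptr = 0;
        if (cuMemAlloc(&dptr, seg.bytes) != CUDA_SUCCESS) return false;
        seg.base = reinterpret_cast<void*>(dptr);
    } else {
        seg.base = std::malloc(seg.bytes);  // mmap would also fit here
        if (seg.base == nullptr) return false;
    }
    return true;
}

// Return a segment to the system when the pool is deleted.
void releaseSegment(Segment& seg) {
    if (seg.type == MemType::GpuVideo) {
        cuMemFree(reinterpret_cast<CUdeviceptr>(seg.base));
    } else {
        std::free(seg.base);
    }
    seg.base = nullptr;
}
```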
In this specification, the physical memory pool corresponds to management structure information, and the management structure information may include memory state information of the physical memory pool, and may also include other management information related to the physical memory pool, such as an allocation policy.
In one example, a physical memory pool may be configured with a corresponding memory manager, as shown in FIG. 1. The memory manager is used for managing the physical memory in the physical memory pool, for example, allocating memory and allocating policy, releasing memory, migrating data, and the like. When the memory manager manages the physical memory pool, the management structure information of the physical memory pool can be updated, so that the management structure information and the physical memory pool can be kept synchronous. The management structure information of the physical memory pool may be stored in the memory manager.
In this example, the physical memory pool sits below the memory manager, which performs memory management on it. The application layer sits above the memory manager, and the memory manager also handles memory allocation, release, and so on for the application objects in the application layer.
The management structure information of the physical memory pool may include a released memory object set and an allocated memory object set, both of which contain memory objects. In the present specification, a memory object is a part of the memory allocated to an application object from the physical memory pool. Every memory object has been allocated, and a memory object is in one of two states: a free state, in which it has been allocated but is not currently in use, or an in-use state, in which it is allocated and currently in use. The memory objects in the released memory object set can all be reallocated and reused by application objects; they are memory objects that were released after being allocated. Releasing a memory object releases the data in it: the released memory object becomes a free memory object and keeps its original memory capacity unchanged. Memory objects in the released memory object set can be reallocated.
Each memory object in the set of released memory objects may include a pointer and capacity information of the memory object, and in an example, the set of released memory objects may store the pointer and the capacity information of each memory object in a hash structure manner, where for each memory object, the pointer is taken as a key, and the capacity information is taken as a value. This facilitates querying memory objects of a specified capacity from the set of released memory objects.
The memory objects in the allocated memory object set are memory objects that are currently in use and therefore cannot be allocated. They are memory objects that an application object has continued to use since allocation, or memory objects that were released and then allocated again to other application objects or other functional tasks. The current state of the memory objects in the allocated memory object set is occupied and in use, so they cannot be allocated.
Each memory object in the allocated memory object set may include a pointer and capacity information of the memory object, and in an example, the allocated memory object set may store the pointer and the capacity information of each memory object in a hash structure manner, and for each memory object, the pointer is taken as a key, and the capacity information is taken as a value.
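The two sets can be pictured as a small amount of bookkeeping around the pool. The following is a minimal C++ sketch of that bookkeeping, not code from the patent: it assumes std::unordered_map as the hash structure, keyed by the object pointer with the capacity as the value, as described above.

```cpp
#include <cstddef>
#include <unordered_map>

struct PoolBookkeeping {
    // Released set: free objects that were allocated once and then released.
    // Key: object pointer; value: capacity in bytes.
    std::unordered_map<void*, std::size_t> released;
    // Allocated set: objects currently in use. Same key/value convention.
    std::unordered_map<void*, std::size_t> allocated;

    // Move an object from the released set to the allocated set when it is
    // handed back to an application object (steps 330/380 in fig. 3).
    void markAllocated(void* ptr) {
        auto it = released.find(ptr);
        if (it != released.end()) {
            allocated.emplace(it->first, it->second);
            released.erase(it);
        }
    }

    // Move an object back to the released set when the application frees it;
    // the object keeps its capacity and can be reused later.
    void markReleased(void* ptr) {
        auto it = allocated.find(ptr);
        if (it != allocated.end()) {
            released.emplace(it->first, it->second);
            allocated.erase(it);
        }
    }
};
```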
In an example of this specification embodiment, the management structure information may further include memory object capacity information. The memory object capacity information is used to record the memory object capacity existing in the released memory object set and the allocated memory object set.
In one example, the memory object capacity information records the capacity of every memory object in the released memory object set and the allocated memory object set. In another example, the capacities of the memory objects in the two sets may be de-duplicated first, and the de-duplicated capacities recorded in the memory object capacity information, so that each distinct memory object capacity has only one record, which simplifies the memory object capacity information. In another example, the memory object capacity information records the de-duplicated memory object capacities arranged in order of capacity, either descending or ascending. Keeping the capacities ordered makes it convenient to find a memory object capacity of a specified size in the memory object capacity information.
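As an illustration of the ordered, de-duplicated variant, the following C++ sketch (an assumption, not the patent's implementation) keeps one entry per distinct capacity in a std::set, so that finding the smallest recorded capacity not less than a requested size is a single lower_bound lookup.

```cpp
#include <cstddef>
#include <set>

struct CapacityIndex {
    std::set<std::size_t> capacities;  // de-duplicated, ascending order

    void record(std::size_t cap) { capacities.insert(cap); }

    // Returns the smallest recorded capacity >= `requested`, or 0 if none.
    std::size_t bestFit(std::size_t requested) const {
        auto it = capacities.lower_bound(requested);
        return it == capacities.end() ? 0 : *it;
    }
};
```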
The physical memory pool can be applied to various types of application objects, and memory is allocated to the various types of application objects. In one example, the physical memory pool may be applied to deep learning, allocating memory for application objects performing deep learning.
In one example of the embodiments of the present specification, a physical memory pool applied to deep learning may be configured as a static memory region and a dynamic memory region. The static memory area and the dynamic memory area are used for distributing memory objects for different types of data.
The static memory region may be a continuous memory region for allocating a memory object for the first data that is frequently accessed in the deep learning and is smaller than a specified size. In one example, the first data may also be data that remains unchanged in the deep learning, such as weights in the deep learning. After a memory object is allocated to an application object for deep learning for storing first data, the first data is kept unchanged in the memory object, and the first data can be directly read from the memory object when the first data needs to be frequently accessed in the deep learning process.
The static memory region may be configured in the GPU video memory of the physical memory pool. The first data is typically small, so storing it in the GPU video memory does not occupy much space, and the GPU video memory can hold more first data. In addition, since the first data is frequently accessed during deep learning, storing it in the GPU video memory lets the application object performing deep learning read it directly from the GPU video memory; this avoids the data migration that would be needed to move the first data into the GPU video memory if it were stored in another physical memory, and improves the read efficiency of the first data.
The dynamic memory area is used for allocating memory objects for other data except the first data in the deep learning. The dynamic memory area may include at least one type of physical memory among GPU video memory, CPU memory, and non-volatile memory. The other data targeted by the dynamic memory area may be data with a large data volume, data that is used once, data that is used many times and has a change, and the like.
In the physical memory pool, the area sizes of the static memory area and the dynamic memory area may be empirically specified. For example, if the data amount of the first data in the deep learning is small, 20% of the regions in the physical memory pool may be set as static memory regions, and the remaining 80% may be set as dynamic memory regions. In addition, the estimation can be performed during the operation of the application object to determine the ratio between the data amount of the first data and the data amount of the other data, and then the static memory area and the dynamic memory area are divided according to the estimated ratio.
In one example, the dynamic memory region may be divided into two regions: the first dynamic memory area and the second dynamic memory area may be two continuous memory areas, respectively.
The first dynamic memory region may be used to allocate memory objects for data that is used multiple times in deep learning; this multi-use data may be data larger than a specified size. In addition, data stored in a memory object allocated from the first dynamic memory region can be swapped into or out of the memory object, so memory objects in the first dynamic memory region can be multiplexed.
For example, the multi-use data served by the first dynamic memory region may include intermediate results generated by a deep neural network. During training, an intermediate result is generated and then stored in a memory object allocated from the first dynamic memory region for subsequent reading and use. When a later computation uses that intermediate result to produce another intermediate result, the old result can be swapped out of the memory object and the new result swapped in for storage.
The second dynamic memory region may be used to allocate memory objects for one-time data. In the present specification, one-time data is data that is no longer accessed after being used once. In one example, the one-time data is used once in a function: it is only active while the function uses it and is not accessed again after the function call ends. For example, a workspace used in deep learning is one-time data and is not reused after the workspace call completes.
The memory objects allocated from the second dynamic memory region are used to store one-time data. After the one-time data has been used, the data in the memory object can be released, so that the memory object can be reused, which improves memory object utilization.
In the dynamic memory region, the size of the first dynamic memory region and the second dynamic memory region may or may not be fixed. When the sizes of the first dynamic memory area and the second dynamic memory area are fixed, the proportion of the first dynamic memory area in the dynamic memory area and the proportion of the second dynamic memory area in the dynamic memory area can be specified. In the dynamic memory area, the first dynamic memory area is larger than the second dynamic memory area. For example, in the dynamic memory area, 2G memory is allocated as the second dynamic memory area, and all the remaining memory is allocated to the first dynamic memory area. When the sizes of the first dynamic memory area and the second dynamic memory area are not fixed, the sizes of the first dynamic memory area and the second dynamic memory area may be dynamically changed.
In the physical memory pool, all of the physical memory may be configured into the static memory region, the first dynamic memory region, and the second dynamic memory region, or only part of the physical memory may be configured into these three regions.
Configuring the physical memory pool into a static memory region, a first dynamic memory region, and a second dynamic memory region, each allocating memory objects for a different type of data, reduces or avoids memory fragmentation in the physical memory pool.
In the physical memory pool, the static memory area, the first dynamic memory area and the second dynamic memory area may be arranged adjacently or non-adjacently.
In the physical memory pool, the static memory area and the dynamic memory area may be adjacent. In an example, an ending address end of a static memory area may be adjacent to a starting address end of a dynamic memory area, as shown in fig. 2A and 2B, fig. 2A and 2B respectively show schematic diagrams of an example of setting of each memory area in a physical memory pool according to an embodiment of the present specification. In another example, an ending address end of the dynamic memory area may be adjacent to a starting address end of the static memory area, as shown in fig. 2C and 2D, and fig. 2C and 2D respectively show schematic diagrams of another example of setting of each memory area in the physical memory pool according to an embodiment of the present specification.
In the dynamic memory region, in an example, a start address end of the dynamic memory region may be a start address end of the first dynamic memory region, and an end address end of the dynamic memory region may be an end address end of the second dynamic memory region, as shown in fig. 2A and 2C.
In another example, the start address end of the dynamic memory region may be the start address end of the second dynamic memory region, and the end address end of the dynamic memory region may be the end address end of the first dynamic memory region. As shown in fig. 2B and 2D.
The arrangement modes of the memory areas in the physical memory pool are different, so that the allocation directions of the memory areas for allocating the memory objects can be different. And each memory area determines the allocation direction according to the position of the memory area in the physical memory pool and the adjacent relation with other memory areas.
In an example of the embodiments of the present specification, when an ending address end of a static memory region is adjacent to a starting address end of a dynamic memory region, the static memory region may allocate memory objects in a direction from the starting address to the ending address sequentially. At this time, the size of the static memory area is fixed, the starting address is fixed, the memory objects are allocated from the starting address to one direction, and the memory pointer offset in the static memory area also moves to one direction, so that the static memory area is convenient to manage.
Taking fig. 2A and 2B as an example, the ending address end of the static memory area is adjacent to the starting address end of the dynamic memory area, the starting address of the static memory area remains unchanged, the allocation direction 1 indicates the memory allocation direction of the static memory area, the static memory area always allocates the memory objects according to the sequence of the allocation direction 1, and the pointer offset in the static memory area also always moves according to the allocation direction 1.
In this example, the first dynamic memory region and the second dynamic memory region in the dynamic memory regions are different in location, and the corresponding memory allocation directions are also different.
In one example, the start address end of the dynamic memory region may be the start address end of the first dynamic memory region, and the end address end of the dynamic memory region may be the end address end of the second dynamic memory region, as shown in fig. 2A.
In this example, since the ending address end of the static memory area is adjacent to the starting address end of the dynamic memory area, the starting address of the dynamic memory area is fixed, that is, the starting address of the first dynamic memory area is fixed. At this time, the first dynamic memory region may sequentially allocate memory objects in a direction from a fixed start address to an end address. As shown in fig. 2A, the allocation direction 2 is used to indicate the memory allocation direction of the first dynamic memory region. The starting address of the first dynamic memory area is fixed, and memory objects are distributed according to a fixed sequence by taking the starting address as a starting point, so that the pointer offset in the first dynamic memory area is convenient to manage.
In addition, the ending address of the dynamic memory area is fixed when the area size of the dynamic memory area is fixed, and the ending address of the second dynamic memory area is fixed when the ending address end of the dynamic memory area is the ending address end of the second dynamic memory area. At this time, the second dynamic memory area may allocate the memory objects in the direction from the end address to the start address, as shown in the allocation direction 3 in fig. 2A, where the allocation direction 3 is used to indicate the memory allocation direction of the second dynamic memory area. The end address of the second dynamic memory area is fixed, the memory objects are distributed according to a fixed sequence by taking the end address as a starting point, and the pointer offset in the second dynamic memory area is gradually decreased, so that the pointer offset in the second dynamic memory area is conveniently managed.
In another example, the start address end of the dynamic memory region may be the start address end of the second dynamic memory region, and the end address end of the dynamic memory region may be the end address end of the first dynamic memory region, as shown in fig. 2B.
In this example, since the ending address end of the static memory area is adjacent to the starting address end of the dynamic memory area, the starting address of the dynamic memory area is fixed, that is, the starting address of the second dynamic memory area is fixed. At this time, the second dynamic memory region may allocate memory objects in a direction from the fixed starting address to the ending address, as shown in allocation direction 3 in fig. 2B.
In addition, the ending address of the dynamic memory area is fixed when the area size of the dynamic memory area is fixed, and the ending address of the first dynamic memory area is fixed when the ending address end of the dynamic memory area is the ending address end of the first dynamic memory area. At this time, the first dynamic memory region may allocate memory objects in a direction from the fixed end address to the start address, which is an allocation direction 2 shown in fig. 2B.
In an example of the embodiments of the present specification, when an ending address end of a dynamic memory region is adjacent to a starting address end of a static memory region, the static memory region allocates memory objects in a direction from the ending address to the starting address sequentially. At this time, the size of the static memory area is fixed, the end address is also fixed, the memory objects are allocated in one direction from the end address, and the memory pointer offset in the static memory area is also moved in one direction, so that the static memory area is convenient to manage.
Taking fig. 2C and 2D as an example, the start address end of the static memory area is adjacent to the end address end of the dynamic memory area, the end address of the static memory area remains unchanged, the static memory area always allocates the memory objects according to the sequence of the allocation direction 1, and the pointer offset in the static memory area also always moves according to the allocation direction 1.
In one example, the start address end of the dynamic memory region may be the start address end of the first dynamic memory region, and the end address end of the dynamic memory region may be the end address end of the second dynamic memory region, as shown in fig. 2C.
In this example, the area size of the dynamic memory area is fixed, then the start address of the dynamic memory area is fixed, and when the start address of the dynamic memory area is the start address of the first dynamic memory area, the start address of the first dynamic memory area is fixed. At this time, the first dynamic memory region may allocate memory objects in a direction from the start address to the end address, such as allocation direction 2 shown in fig. 2C.
In addition, since the start address end of the static memory area is adjacent to the end address end of the dynamic memory area, the end address of the dynamic memory area is fixed, that is, the end address of the second dynamic memory area is fixed. At this time, the second dynamic memory region may sequentially allocate memory objects in the direction from the fixed end address to the start address, as shown in allocation direction 3 of fig. 2C.
In another example, the start address end of the dynamic memory region may be the start address end of the second dynamic memory region, and the end address end of the dynamic memory region may be the end address end of the first dynamic memory region, as shown in fig. 2D.
In this example, the area size of the dynamic memory area is fixed, then the start address of the dynamic memory area is fixed, and when the start address end of the dynamic memory area is the start address end of the second dynamic memory area, the start address of the second dynamic memory area is fixed. At this time, the second dynamic memory region may allocate memory objects in a direction from the fixed starting address to the ending address, as shown in allocation direction 3 in fig. 2D.
In addition, since the start address end of the static memory area is adjacent to the end address end of the dynamic memory area, the end address of the dynamic memory area is fixed, that is, the end address of the first dynamic memory area is fixed. At this time, the first dynamic memory region may allocate memory objects in a direction from the fixed end address to the start address, which is an allocation direction 2 as shown in fig. 2D.
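The allocation directions above amount to two bump-pointer allocators working toward each other inside the dynamic range. The following C++ sketch, an illustration rather than the patent's implementation, models the layout of fig. 2A with plain offsets (initialize fwd = begin and bwd = end); alignment and the fixed-size variant of the two regions are elided.

```cpp
#include <cstddef>
#include <optional>

// Offsets into the dynamic memory range; begin/end are fixed, fwd/bwd move.
struct DynamicRegion {
    std::size_t begin;  // fixed start of the dynamic range
    std::size_t end;    // fixed end of the dynamic range (exclusive)
    std::size_t fwd;    // next free offset of the first dynamic region
    std::size_t bwd;    // start of the space taken by the second region

    // First dynamic region: bump forward from the fixed start address
    // (allocation direction 2 in fig. 2A).
    std::optional<std::size_t> allocFirst(std::size_t n) {
        if (n > bwd - fwd) return std::nullopt;  // would collide with second
        std::size_t at = fwd;
        fwd += n;
        return at;
    }

    // Second dynamic region: bump backward from the fixed end address
    // (allocation direction 3 in fig. 2A).
    std::optional<std::size_t> allocSecond(std::size_t n) {
        if (n > bwd - fwd) return std::nullopt;  // would collide with first
        bwd -= n;
        return bwd;
    }
};
```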
FIG. 3 is a flowchart illustrating an example 300 of a method for allocation management of a physical memory pool according to an embodiment of the present description.
As shown in FIG. 3, at 310, in response to a request for memory of a specified capacity, the set of released memory objects is queried for a memory object of the specified capacity. If one exists, the operation of 320 is performed; if not, the operation of 340 is performed. In one example, the request being responded to may come from an application object, such as an application program.
At 320, the queried memory object of specified capacity is allocated.
At 330, the set of released memory objects and the set of allocated memory objects are updated based on the allocated memory objects.
Specifically, the allocated memory objects may be deleted from the set of released memory objects and the memory objects may be added to the set of allocated memory objects. By updating the released memory object set and the allocated memory object set, the memory allocation state in the physical memory pool is ensured to be consistent with the released memory object set and the allocated memory object set, so that each memory object is accurately multiplexed.
When there is no memory object of the specified capacity in the set of released memory objects, at 340, it is queried whether the capacity of unallocated memory in the physical memory pool is less than the specified capacity. If not, the operation of 350 is executed; if so, the operations of 370 are performed.
The unallocated memory is memory in the physical memory pool that is in neither the released memory object set nor the allocated memory object set. The unallocated memory may be treated as a whole and may include video memory on a GPU video memory, memory on a CPU memory, memory in nonvolatile memory, and so on. The unallocated memory may be carved into memory objects of different capacities according to the memory requests of application objects. For example, if the total capacity of the physical memory pool is 128G, the capacities of the memory objects in the released memory object set sum to 32G, and the capacities of the memory objects in the allocated memory object set sum to 16G, then the capacity of the unallocated memory in the physical memory pool is 80G.
When the capacity of the unallocated memory in the physical memory pool is not less than the designated capacity, at 350, a memory object of the designated capacity may be allocated to the application object in the unallocated memory of the physical memory pool.
In one example, memory objects may be allocated from the unallocated memory in a priority order: the local GPU video memory designated by the application object first, then remote GPU video memory, CPU memory, nonvolatile memory, and other physical memory. For example, if the unallocated memory includes video memory on the local GPU video memory and its capacity is not less than the specified capacity, the memory object is preferentially allocated to the application object from the local GPU video memory.
In another example, memory objects may be allocated from the local GPU video memory within the unallocated memory. When the available capacity of the local GPU video memory in the unallocated memory is less than the specified capacity, data that is temporarily unused on the local GPU video memory may be migrated to other physical memory, so that more memory capacity becomes available on the local GPU for allocating memory objects.
In another example, when the available capacity of the local GPU video memory in the unallocated memory is less than the specified capacity, a memory object of the specified capacity may be allocated from a remote GPU video memory in the unallocated memory. The application object may then read and write data in the allocated memory object through the interconnection bus between the local GPU and the remote GPU.
In another example, the unallocated memory may include a static memory region, a first dynamic memory region, and a second dynamic memory region. When the application object requests memory, the type of data the request is for is determined, as in the sketch below. When the data is first data that is frequently accessed in deep learning and smaller than the specified size, a memory object may be allocated for the application object from the static memory region of the unallocated memory. When the data is multi-use data among the other data, a memory object may be allocated from the first dynamic memory region of the unallocated memory. When the data is one-time data among the other data, a memory object may be allocated from the second dynamic memory region of the unallocated memory.
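A minimal C++ sketch of this region choice follows; the enum names and the small-size threshold are illustrative assumptions, not identifiers from the patent.

```cpp
#include <cstddef>

enum class DataKind {
    Persistent,  // first data: frequently accessed, small, e.g. weights
    MultiUse,    // reused data, e.g. intermediate results
    OneShot      // one-time data, e.g. a workspace
};

enum class Region { Static, FirstDynamic, SecondDynamic };

// Pick the memory region for a request based on its data type and size.
Region pickRegion(DataKind kind, std::size_t bytes, std::size_t smallLimit) {
    if (kind == DataKind::Persistent && bytes < smallLimit)
        return Region::Static;         // small, hot, unchanging data
    if (kind == DataKind::OneShot)
        return Region::SecondDynamic;  // released right after one use
    return Region::FirstDynamic;       // multi-use and everything else
}
```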
Next, at 360, the allocated memory objects are added to the allocated memory object set.
When the capacity of the unallocated memory in the physical memory pool is less than the specified capacity, at 370, a memory object in the released memory object set whose capacity is greater than the specified capacity is allocated.
In one example, any memory object whose capacity is greater than the specified capacity may be allocated.
In another example, the management structure information may further include memory object capacity information that records the memory object capacities in the released memory object set and the allocated memory object set. The memory object capacity information may be used to query the released memory object set for a memory object whose capacity is greater than the specified capacity with the smallest difference from it. Specifically, the specified capacity may be compared with the capacity of each memory object in the released memory object set to identify the memory object whose capacity is greater than the specified capacity and differs from it the least.
In one example, the memory object capacities recorded in the memory object capacity information may be ordered according to the capacity sizes, and the memory object with the capacity larger than the specified capacity and the smallest capacity difference from the specified capacity may be determined from the ordering of the capacities.
At 380, the allocated memory objects are deleted from the set of released memory objects and added to the set of allocated memory objects. It should be noted that the operations of block 380 and the operations of block 330 may be performed by the same execution unit.
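Putting steps 310 through 380 together, the following C++ sketch traces one request through the three branches of fig. 3, reusing the PoolBookkeeping and CapacityIndex helpers sketched earlier. It is an illustration under those assumptions, not the patent's implementation; in particular, carveFromUnallocated stands in for cutting a fresh object out of the unallocated memory in the priority order described above.

```cpp
#include <cstddef>

void* allocate(PoolBookkeeping& bk, CapacityIndex& idx, std::size_t wanted,
               std::size_t unallocatedLeft,
               void* (*carveFromUnallocated)(std::size_t)) {
    // 310/320/330: reuse an exact-capacity object from the released set.
    void* hit = nullptr;
    for (auto& [ptr, cap] : bk.released)
        if (cap == wanted) { hit = ptr; break; }
    if (hit != nullptr) {
        bk.markAllocated(hit);
        return hit;
    }
    // 340/350/360: enough unallocated memory left, so carve a fresh object.
    if (unallocatedLeft >= wanted) {
        void* fresh = carveFromUnallocated(wanted);
        if (fresh != nullptr) bk.allocated.emplace(fresh, wanted);
        return fresh;
    }
    // 370/380: fall back to the smallest released object larger than the
    // request, found via the ordered capacity information.
    std::size_t fit = idx.bestFit(wanted);
    if (fit == 0) return nullptr;  // pool exhausted
    for (auto& [ptr, cap] : bk.released)
        if (cap == fit) { hit = ptr; break; }
    if (hit != nullptr) bk.markAllocated(hit);
    return hit;
}
```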
In one example of an embodiment of the present specification, the method illustrated in fig. 3 may be applied to deep learning. A deep learning application runs for several iterations, and the number of memory objects required in each iteration and the capacity of each memory object are the same.
For the first iteration of deep learning, memory objects may be allocated for the application object performing deep learning operations by the method shown in fig. 3. Specifically, in the first iteration, in response to the application object requesting memory of a specified capacity, the released memory object set is queried for a memory object of the specified capacity; if one exists, the queried memory object of the specified capacity is allocated, and the allocated memory object is then deleted from the released memory object set and added to the allocated memory object set.
After the first iteration completes, the memory objects used for the first data remain unchanged, while the memory objects used for other data can release their stored data; the released memory objects are added to the released memory object set for reuse in each subsequent iteration.
For each iteration after the first, the memory objects allocated in the first iteration, and/or the data in them, may be multiplexed. For example, the weights stored in the memory objects allocated from the static memory region in the first iteration are also needed in every subsequent iteration, so those stored weights can be reused directly. As another example, a memory object allocated from the first dynamic memory region for the intermediate result of some function in the first iteration can be multiplexed for that function's intermediate result in every subsequent iteration.
In deep learning, because the number of memory objects required in each iteration and the capacity of each memory object are the same across iterations, reusing the memory objects allocated in the first iteration, and the data in those memory objects, in every subsequent iteration avoids frequent memory allocation operations and thereby reduces the performance overhead such operations would incur.
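Because each iteration issues the same sequence of memory requests, this reuse can be sketched, again purely as an illustration built on the hypothetical PoolAllocator from the previous example, by recording the objects allocated in the first iteration and replaying them by request order in later iterations:

```python
class IterationCache:
    """Illustrative sketch of cross-iteration reuse: memory objects are
    allocated through the pool only during the first iteration and are
    then replayed, in request order, in every subsequent iteration."""

    def __init__(self, allocator: PoolAllocator):
        self.allocator = allocator
        self.first_iteration = True
        self.objects = []  # memory objects allocated in the first iteration
        self.cursor = 0    # replay position within the current iteration

    def start_iteration(self) -> None:
        self.cursor = 0

    def request(self, capacity: int) -> MemoryObject:
        if self.first_iteration:
            obj = self.allocator.allocate(capacity)
            self.objects.append(obj)
            return obj
        # Later iterations reuse the object allocated in the first iteration.
        obj = self.objects[self.cursor]
        assert obj.capacity == capacity  # requests repeat identically per iteration
        self.cursor += 1
        return obj

    def end_iteration(self) -> None:
        self.first_iteration = False
```

Replaying by request order depends on the stated property that every iteration requests the same number of memory objects with the same capacities; a workload without that property would still have to fall back to the query path of fig. 3.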
Fig. 4 is a block diagram illustrating an example of an apparatus for allocation management of a physical memory pool (hereinafter, referred to as an allocation management apparatus 400) according to an embodiment of the present specification.
The physical memory pool managed by the allocation management apparatus 400 is composed of GPU video memory and other physical memory in the system. The management structure information of the physical memory pool includes a released memory object set and an allocated memory object set, where the released memory object set contains the free memory objects in the physical memory pool that have been allocated and subsequently released, and the allocated memory object set contains the memory objects in the physical memory pool that are allocated and currently in use.
As shown in fig. 4, the allocation management apparatus 400 includes a memory object query unit 410, a memory object allocation unit 420, and a memory object set update unit 430.
A memory object querying unit 410, configured to query, in response to a request for memory of a specified capacity, whether a memory object of the specified capacity exists in the released memory object set.
A memory object allocation unit 420, configured to allocate the queried memory object of the specified capacity when a memory object of the specified capacity exists in the released memory object set.
A memory object set update unit 430, configured to delete the allocated memory object from the released memory object set and add it to the allocated memory object set.
In one example, the allocation management apparatus 400 may further include a memory object allocation unit 440, configured to allocate a memory object of the specified capacity from the unallocated memory of the physical memory pool when no memory object of the specified capacity exists in the released memory object set. The memory object set update unit 430 is further configured to add the newly allocated memory object to the allocated memory object set.
In one example, the memory object allocation unit 420 may be further configured to allocate, from the released memory object set, a memory object whose capacity is greater than the specified capacity when the capacity of the unallocated memory is less than the specified capacity. The memory object set update unit 430 may also be configured to delete the allocated memory object from the released memory object set and add it to the allocated memory object set.
In one example, the management structure information further includes memory object capacity information, which records the capacity of each memory object in the released memory object set and the allocated memory object set.
The memory object allocation unit 420 may be further configured to, when the capacity of the unallocated memory is less than the specified capacity, use the memory object capacity information to query the released memory object set for a memory object whose capacity is greater than the specified capacity and differs from the specified capacity by the smallest amount, and to allocate the queried released memory object.
In one example, the allocation management apparatus 400 may be applied to deep learning.
In one example, the allocation management apparatus 400 may further include a memory object reuse unit configured to reuse, for each iteration of the deep learning after the first iteration, the memory objects allocated in the first iteration and/or the data in those memory objects.
Embodiments of the method and apparatus for performing allocation management on a physical memory pool according to the embodiments of the present specification are described above with reference to fig. 1 to 4.
The apparatus for performing allocation management on a physical memory pool in the embodiments of the present specification may be implemented in hardware, in software, or in a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical means, is formed by the processor of the device in which it resides reading corresponding computer program instructions from storage into memory and running them. In the embodiments of the present specification, the apparatus may be implemented, for example, by an electronic device.
Fig. 5 is a block diagram of an electronic device 500 for implementing a method for allocation management of a physical memory pool according to an embodiment of the present disclosure.
As shown in fig. 5, the electronic device 500 may include at least one processor 510, a storage (e.g., non-volatile storage) 520, a memory 530, and a communication interface 540, and the at least one processor 510, the storage 520, the memory 530, and the communication interface 540 are connected together via a bus 550. The at least one processor 510 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 510 to: in response to a request for memory of a specified capacity, query whether a memory object of the specified capacity exists in the released memory object set; if so, allocate the queried memory object of the specified capacity; and delete the allocated memory object from the released memory object set and add it to the allocated memory object set.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 510 to perform the various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present description.
According to one embodiment, a program product, such as a machine-readable medium, is provided. A machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present specification.
Specifically, a system or apparatus may be provided that is equipped with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and the computer or processor of the system or apparatus is caused to read out and execute the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of the present invention.
Computer program code required for the operation of various portions of the present specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, and VB.NET, conventional procedural programming languages such as C, Visual Basic 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Not all steps and elements in the above flows and system structure diagrams are necessary, and some steps or elements may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the embodiments are not limited to the specific details described above; various simple modifications may be made to the technical solutions within the technical spirit of the embodiments of the present disclosure, and all such modifications fall within the scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the description is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (16)

1. A method for performing allocation management on a physical memory pool, wherein the physical memory pool is composed of GPU (graphics processing unit) video memory and other physical memory in a system, and management structure information of the physical memory pool comprises a released memory object set and an allocated memory object set, wherein the released memory object set comprises free memory objects in the physical memory pool that have been allocated and are currently released, and the allocated memory object set comprises memory objects in the physical memory pool that are allocated and currently in use,
the method comprising:
in response to a request for memory of a specified capacity, querying whether a memory object of the specified capacity exists in the released memory object set;
if so, allocating the queried memory object of the specified capacity; and
deleting the allocated memory object from the released memory object set and adding it to the allocated memory object set.
2. The method of claim 1, further comprising:
when no memory object of the specified capacity exists in the released memory object set, allocating a memory object of the specified capacity from the unallocated memory of the physical memory pool; and
adding the allocated memory object to the allocated memory object set.
3. The method of claim 2, further comprising:
when the capacity of the unallocated memory is less than the specified capacity, allocating, from the released memory object set, a memory object whose capacity is greater than the specified capacity; and
deleting the allocated memory object from the released memory object set and adding it to the allocated memory object set.
4. The method of claim 3, wherein the management structure information further comprises memory object capacity information for recording the capacity of each memory object in the released memory object set and the allocated memory object set, and
wherein allocating, from the released memory object set when the capacity of the unallocated memory is less than the specified capacity, a memory object whose capacity is greater than the specified capacity comprises:
when the capacity of the unallocated memory is less than the specified capacity, using the memory object capacity information to query the released memory object set for a memory object whose capacity is greater than the specified capacity and differs from the specified capacity by the smallest amount; and
allocating the queried released memory object.
5. The method of claim 1, wherein the method is applied to deep learning, and
wherein querying, in response to a request for memory of a specified capacity, whether a memory object of the specified capacity exists in the released memory object set comprises:
in response to a request for memory of a specified capacity during the first loop iteration of the deep learning, querying whether a memory object of the specified capacity exists in the released memory object set.
6. The method of claim 5, further comprising:
for each iteration of the deep learning after the first iteration, reusing the memory objects allocated in the first iteration and/or the data in those memory objects.
7. The method of claim 5, wherein the physical memory pool is configured into a static memory region and a dynamic memory region, wherein
the static memory region is used to allocate memory objects for first data that is frequently accessed in the deep learning and is smaller than a specified size, the static memory region being configured in the GPU video memory of the physical memory pool; and
the dynamic memory region is used to allocate memory objects for data other than the first data in the deep learning.
8. The method of claim 7, wherein the dynamic memory region comprises a first dynamic memory region for allocating memory objects for data used multiple times and a second dynamic memory region for allocating memory objects for data used only once.
9. The method of claim 8, wherein the static memory region is adjacent to the dynamic memory region in the physical memory pool; and
the start address end of the dynamic memory region is the start address end of the first dynamic memory region and the end address end of the dynamic memory region is the end address end of the second dynamic memory region; or, the start address end of the dynamic memory region is the start address end of the second dynamic memory region and the end address end of the dynamic memory region is the end address end of the first dynamic memory region.
10. The method of claim 9, wherein,
when the end address end of the static memory region is adjacent to the start address end of the dynamic memory region, the static memory region allocates memory objects sequentially in the direction from its start address to its end address;
when the end address end of the dynamic memory region is adjacent to the start address end of the static memory region, the static memory region allocates memory objects sequentially in the direction from its end address to its start address; and
with respect to the dynamic memory region,
in the case where the start address end of the dynamic memory region is the start address end of the first dynamic memory region and the end address end of the dynamic memory region is the end address end of the second dynamic memory region, the first dynamic memory region allocates memory objects sequentially in the direction from the start address to the end address, and the second dynamic memory region allocates memory objects sequentially in the direction from the end address to the start address;
in the case where the start address end of the dynamic memory region is the start address end of the second dynamic memory region and the end address end of the dynamic memory region is the end address end of the first dynamic memory region, the second dynamic memory region allocates memory objects sequentially in the direction from the start address to the end address, and the first dynamic memory region allocates memory objects sequentially in the direction from the end address to the start address.
11. The method of claim 1, wherein the other physical memory comprises CPU memory and/or non-volatile memory.
12. A device for performing allocation management on a physical memory pool, wherein the physical memory pool is composed of GPU (graphics processing unit) video memory and other physical memory in a system, and management structure information of the physical memory pool comprises a released memory object set and an allocated memory object set, wherein the released memory object set comprises free memory objects in the physical memory pool that have been allocated and are currently released, and the allocated memory object set comprises memory objects in the physical memory pool that are allocated and currently in use,
the device comprising:
at least one processor,
a memory coupled to the at least one processor, and
a computer program stored in the memory, the computer program being executable by the at least one processor to:
in response to a request for memory of a specified capacity, query whether a memory object of the specified capacity exists in the released memory object set;
if so, allocate the queried memory object of the specified capacity; and
delete the allocated memory object from the released memory object set and add it to the allocated memory object set.
13. A physical memory pool, comprising GPU video memory and other physical memory in a system, wherein
the management structure information of the physical memory pool comprises a released memory object set and an allocated memory object set, wherein the released memory object set comprises free memory objects in the physical memory pool that have been allocated and are currently released, and the allocated memory object set comprises memory objects in the physical memory pool that are allocated and currently in use.
14. The physical memory pool of claim 13, wherein the physical memory pool is configured into a static memory region and a dynamic memory region, wherein
the static memory region is used to allocate memory objects for first data that is frequently accessed in deep learning and is smaller than a specified size, the static memory region being configured in the GPU video memory of the physical memory pool; and
the dynamic memory region is used to allocate memory objects for data other than the first data in deep learning.
15. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-11.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-11.
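As a purely illustrative sketch of the memory layout recited in claims 8 to 10 (the class and method names are hypothetical; the patent discloses no code), the first and second dynamic memory regions can share a single address span by allocating from opposite ends toward each other:

```python
class DynamicRegion:
    """Sketch of a dynamic memory region split into a first region for data
    used multiple times (growing from the start address toward the end) and
    a second region for data used only once (growing from the end address
    toward the start), with no fixed boundary between the two."""

    def __init__(self, start: int, end: int):
        self.start, self.end = start, end
        self.low = start  # next free offset of the first dynamic region
        self.high = end   # one past the last free offset of the second region

    def alloc_multi_use(self, capacity: int) -> int:
        # First dynamic region: allocate in start-to-end address order.
        if self.low + capacity > self.high:
            raise MemoryError("dynamic memory region exhausted")
        offset, self.low = self.low, self.low + capacity
        return offset

    def alloc_one_time(self, capacity: int) -> int:
        # Second dynamic region: allocate in end-to-start address order.
        if self.high - capacity < self.low:
            raise MemoryError("dynamic memory region exhausted")
        self.high -= capacity
        return self.high
```

Growing the two sub-regions toward each other lets long-lived and short-lived allocations coexist in one span without reserving a fixed boundary, which matches the two address-order variants recited in claim 10.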
CN202110777376.3A 2021-07-09 2021-07-09 Method and device for carrying out allocation management on physical memory pool and physical memory pool Pending CN113485832A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110777376.3A CN113485832A (en) 2021-07-09 2021-07-09 Method and device for carrying out allocation management on physical memory pool and physical memory pool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110777376.3A CN113485832A (en) 2021-07-09 2021-07-09 Method and device for carrying out allocation management on physical memory pool and physical memory pool

Publications (1)

Publication Number Publication Date
CN113485832A true CN113485832A (en) 2021-10-08

Family

ID=77937716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110777376.3A Pending CN113485832A (en) 2021-07-09 2021-07-09 Method and device for carrying out allocation management on physical memory pool and physical memory pool

Country Status (1)

Country Link
CN (1) CN113485832A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08202611A (en) * 1995-01-24 1996-08-09 Brother Ind Ltd Memory management method and device
KR20120067865A (en) * 2010-12-16 2012-06-26 텔코웨어 주식회사 Channel multiplexing method and apparatus in shared memory
US20170300415A1 (en) * 2016-03-14 2017-10-19 Intel Corporation Asymmetrical memory management
CN106250242A (en) * 2016-08-10 2016-12-21 西安诺瓦电子科技有限公司 Internal memory multiplexing method based on operating system and device
CN109582597A (en) * 2018-11-02 2019-04-05 广东工业大学 A kind of internal storage management system based on MIC architecture processor
CN109815162A (en) * 2019-01-28 2019-05-28 Oppo广东移动通信有限公司 EMS memory management process, device, mobile terminal and storage medium
CN111177019A (en) * 2019-08-05 2020-05-19 腾讯科技(深圳)有限公司 Memory allocation management method, device, equipment and storage medium
CN111324461A (en) * 2020-02-20 2020-06-23 西安芯瞳半导体技术有限公司 Memory allocation method and device, computer equipment and storage medium
CN111984425A (en) * 2020-09-30 2020-11-24 杭州未名信科科技有限公司 Memory management method, device and equipment for operating system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398182A (en) * 2022-01-21 2022-04-26 支付宝(杭州)信息技术有限公司 Method and device for optimizing GPU video memory multiplexing scheme
CN115964167A (en) * 2022-12-16 2023-04-14 摩尔线程智能科技(北京)有限责任公司 Resource pooling method, apparatus, device, medium, and product for heterogeneous computing platforms
CN115964167B (en) * 2022-12-16 2023-09-01 摩尔线程智能科技(北京)有限责任公司 Resource pooling method, device, equipment, medium and product of heterogeneous computing platform

Similar Documents

Publication Publication Date Title
US11106579B2 (en) System and method to manage and share managed runtime memory for java virtual machine
US10657101B2 (en) Techniques for implementing hybrid flash/HDD-based virtual disk files
JP3882930B2 (en) Managing virtual machines to use shared resources
US9182927B2 (en) Techniques for implementing hybrid flash/HDD-based virtual disk files
US9298377B2 (en) Techniques for reducing read I/O latency in virtual machines
JP5911997B2 (en) Apparatus, system, and memory management method
CN113485832A (en) Method and device for carrying out allocation management on physical memory pool and physical memory pool
US8862703B2 (en) Address server
CN103577345A (en) Methods and structure for improved flexibility in shared storage caching by multiple systems
US20150006787A1 (en) Techniques for dynamically relocating virtual disk file blocks between flash storage and hdd-based storage
US8495107B2 (en) System and method for use with garbage collected languages for enabling the allocated heap memory to be updated at runtime
KR20200135718A (en) Method, apparatus, device and storage medium for managing access request
JP6748653B2 (en) Efficient performance of insert and point query operations in the column store
CN113377545B (en) Method and device for distributing GPU physical memory
JP5867238B2 (en) Auto scaling method, auto scaling program and computer node
US8887162B2 (en) Persistent local storage for processor resources
CN117435343A (en) Memory management method and device
CN114296945B (en) Method and device for multiplexing GPU video memory
JP2014146366A (en) Multi-core processor system, and control method and control program of multi-core processor system
EP2386957B1 (en) Apparatus and method for managing memory in consideration of user response time
WO2024109366A1 (en) Memory defragmentation method and apparatus, and device and storage medium
WO2023097424A1 (en) Method and apparatus for fusing layers of different models
CN116048412A (en) Method, device and medium for processing disk read-write request
US20170147408A1 (en) Common resource updating apparatus and common resource updating method
KR101891264B1 (en) Method and apparatus for processing memory object on non-volatile memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination