CN116680211A - Virtual memory management method, device, electronic equipment and storage medium - Google Patents

Virtual memory management method, device, electronic equipment and storage medium

Info

Publication number
CN116680211A
CN116680211A
Authority
CN
China
Prior art keywords
virtual address
address space
virtual
memory
accelerator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310680298.4A
Other languages
Chinese (zh)
Inventor
解锋涛
李岳旸
刘田
邢彪
李俊渊
方炳祐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung China Semiconductor Co Ltd, Samsung Electronics Co Ltd filed Critical Samsung China Semiconductor Co Ltd
Priority to CN202310680298.4A priority Critical patent/CN116680211A/en
Publication of CN116680211A publication Critical patent/CN116680211A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 - Address translation
    • G06F12/109 - Address translation for multiple virtual address spaces, e.g. segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 - Providing a specific technical effect
    • G06F2212/1012 - Design facilitation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 - Providing a specific technical effect
    • G06F2212/1016 - Performance improvement
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65 - Details of virtual memory and virtual address translation
    • G06F2212/657 - Virtual address space management
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The disclosure provides a virtual memory management method and apparatus, an electronic device, and a storage medium. The virtual memory management method includes: in response to a memory application from an accelerator, allocating a virtual address within a reserved virtual address space to the accelerator; and in response to a memory application from a host processor, allocating a virtual address outside the reserved virtual address space to the host processor.

Description

Virtual memory management method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a virtual memory management method, apparatus, electronic device, and storage medium.
Background
Because memory management techniques that use physical memory directly in a multi-process system may suffer from drawbacks such as overly complex variable or function management, address conflicts, and insufficient memory, computer virtual memory management techniques have been developed in the related art to manage and protect memory efficiently.
In the related virtual memory management technology, addresses are often allocated in the process virtual address space according to the order in which processes run. In this case, the virtual address space allocated to the Host and the virtual address spaces allocated to the plurality of Accelerators are interleaved within the overall virtual address space. When a system operation fails, the data of the host and the accelerators is typically restored to the state before the failure using Checkpoint/Restore (C/R) technology. However, the interleaved virtual addresses of the different processors often cause the recovery of the C/R technology to fail.
The foregoing information is presented merely as background information to aid in the understanding of the disclosure. No decision or assertion has been made as to whether any of the above is applicable as prior art with respect to the present disclosure.
Disclosure of Invention
To at least solve the above-mentioned drawbacks, embodiments of the present disclosure provide a virtual memory management method, apparatus, electronic device, and storage medium.
According to a first aspect of embodiments of the present disclosure, there is provided a virtual memory management method, including: in response to a memory application from an accelerator, allocating a virtual address within a reserved virtual address space to the accelerator; and in response to a memory application from a host processor, allocating a virtual address outside the reserved virtual address space to the host processor.
Optionally, the step of allocating the virtual address in the reserved virtual address space to the accelerator in response to the memory application of the accelerator includes: dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space applied for in the memory application and the predetermined address space size.
Optionally, the step of dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space applied for in the memory application and the predetermined address space size includes: when the size of the address space applied for in the memory application is larger than or equal to the predetermined address space size, dividing out a minimum number of virtual address blocks corresponding to the memory application in the reserved virtual address space, and allocating virtual addresses corresponding to the memory application in the divided minimum number of virtual address blocks; and when the size of the applied-for address space is smaller than the predetermined address space size, allocating the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space.
Optionally, the step of allocating a virtual address corresponding to the memory application in a virtual address block having a remaining address space in the reserved virtual address space includes: if there is a virtual address block with a size of the remaining address space smaller than the predetermined address space and larger than or equal to the size of the applied address space, an address corresponding to the memory application is allocated in the virtual address block.
Optionally, in response to a memory application of the accelerator, the step of allocating a virtual address in the reserved virtual address space to the accelerator comprises: in response to the memory application, when the size of the remaining virtual address space in the reserved virtual address space is smaller than the size of the address space applied in the memory application, reserving an additional virtual address space in the virtual address space which is not reserved and is not allocated with a virtual address; creating a chain corresponding to an accelerator for associating the reserved virtual address space and the additional virtual address space; and allocating the virtual address for the accelerator applied in the memory application in the additional virtual address space based on the chain corresponding to the accelerator.
Optionally, the virtual memory management method further includes: in response to a memory reclamation request for the additional virtual address space, releasing the allocated virtual address in the additional virtual address space and removing the additional virtual address space from the chain corresponding to the accelerator.
Optionally, the virtual memory management method further includes: in response to a checkpoint/restore (C/R) request, restoring the virtual address assigned to each of one or more accelerators in the respective reserved virtual address space of that accelerator, and/or restoring the virtual address assigned to the host processor in the virtual address space other than the one or more reserved virtual address spaces.
Optionally, the virtual memory management method is performed using a Compute Unified Device Architecture (CUDA).
According to a second aspect of embodiments of the present disclosure, there is provided a virtual memory management apparatus, including: an accelerator allocation unit configured to: in response to a memory application from an accelerator, allocate a virtual address within a reserved virtual address space to the accelerator; and a host processor allocation unit configured to: in response to a memory application from a host processor, allocate a virtual address outside the reserved virtual address space to the host processor.
Optionally, the reserved virtual address space is in units of a virtual address block having a predetermined address space size, and the accelerator allocation unit is configured to allocate the virtual address in the reserved virtual address space to the accelerator, in response to a memory application of the accelerator, by: dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space applied for in the memory application and the predetermined address space size.
Optionally, the accelerator allocation unit is configured to dynamically allocate the virtual address for the accelerator in the reserved virtual address space, based on the size of the address space applied for in the memory application and the predetermined address space size, by: when the size of the address space applied for in the memory application is larger than or equal to the predetermined address space size, dividing out a minimum number of virtual address blocks corresponding to the memory application in the reserved virtual address space, and allocating virtual addresses corresponding to the memory application in the divided minimum number of virtual address blocks; and when the size of the applied-for address space is smaller than the predetermined address space size, allocating the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space.
Optionally, the accelerator allocation unit is configured to allocate a virtual address corresponding to the memory application in a virtual address block having a remaining address space in the reserved virtual address space by: if there is a virtual address block with a size of the remaining address space smaller than the predetermined address space and larger than or equal to the size of the applied address space, an address corresponding to the memory application is allocated in the virtual address block.
Optionally, the accelerator allocation unit is configured to allocate a virtual address in the reserved virtual address space to the accelerator in response to a memory application of the accelerator by: in response to the memory application, when the size of the remaining virtual address space in the reserved virtual address space is smaller than the size of the address space applied in the memory application, reserving an additional virtual address space in the virtual address space which is not reserved and is not allocated with a virtual address; creating a chain corresponding to an accelerator for associating the reserved virtual address space and the additional virtual address space; and allocating the virtual address for the accelerator applied in the memory application in the additional virtual address space based on the chain corresponding to the accelerator.
Optionally, the virtual memory management device further includes: an address space reclamation unit configured to: in response to a memory reclamation request for the additional virtual address space, releasing the allocated virtual address in the additional virtual address space and removing the additional virtual address space from the chain corresponding to the accelerator.
Optionally, the virtual memory management device further includes: a checkpoint recovery unit configured to: in response to a checkpoint/restore (C/R) request, restore the virtual address assigned to each of one or more accelerators in the respective reserved virtual address space of that accelerator, and/or restore the virtual address assigned to the host processor in the virtual address space other than the one or more reserved virtual address spaces.
Optionally, the virtual memory management apparatus performs operations using a Compute Unified Device Architecture (CUDA).
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the virtual memory management method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the virtual memory management method as described above.
According to the virtual memory management method and apparatus, the electronic device, and the storage medium of the present disclosure, a virtual address space is reserved separately for the accelerator so that it is isolated from the virtual address space of the host processor. As a result, memory can be managed efficiently, the problems of address randomness and inconsistent recovery addresses are avoided, communication overhead is reduced, and the number of interface calls and the execution time are reduced.
Effects obtainable from the present disclosure may not be limited by the above-described effects, and other effects not mentioned may be clearly understood from the following description by those of ordinary skill in the art to which the present disclosure pertains.
Drawings
The foregoing and other aspects, features, and advantages of certain embodiments of the disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 is an exemplary diagram illustrating a virtual memory management method;
FIGS. 2 a-2 c illustrate example diagrams of restoring a virtual address space;
FIG. 3 is a flow chart illustrating a virtual memory management method according to an embodiment of the present disclosure;
FIGS. 4a and 4b are exemplary diagrams illustrating a virtual memory management method according to embodiments of the present disclosure;
FIG. 5 is an example diagram illustrating a process of assigning virtual addresses according to an embodiment of the present disclosure;
FIG. 6 is an example diagram illustrating a process of dynamically assigning virtual addresses according to an embodiment of the present disclosure;
FIG. 7 is an example diagram illustrating a process of dynamically expanding a virtual address space according to an embodiment of the present disclosure;
FIG. 8 illustrates an example diagram of virtual address space using dynamic expansion according to an embodiment of the present disclosure;
FIG. 9 is an example diagram illustrating a memory reclamation process according to an embodiment of the present disclosure;
FIG. 10 is an example diagram illustrating a process of releasing virtual addresses according to an embodiment of the present disclosure;
fig. 11 is an exemplary diagram showing a C/R technique in the related art;
FIG. 12 is an example diagram illustrating a C/R technique according to an embodiment of the present disclosure;
FIG. 13 is a block diagram illustrating a configuration of a virtual memory management device according to an embodiment of the present disclosure; and
fig. 14 is a schematic diagram illustrating a system 1000 to which a storage device is applied according to an embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
The embodiments and the terminology used in connection with the embodiments are not intended to limit the technology described herein to the particular embodiments and should be understood to include various modifications, equivalents, and/or alternatives to the embodiments. As used herein, each of the expressions such as "A or B", "at least one of A and B", "A, B or C", "at least one of A, B and C", and "at least one of A, B or C" may include all possible combinations of the items listed in the respective expression. For example, "at least one of the first step and the second step is executed" covers the following three cases: (1) the first step is executed; (2) the second step is executed; (3) both the first step and the second step are executed. As used herein, terms such as "first" and "second" may be used simply to distinguish one item from another and do not limit the items in other respects (e.g., importance or order).
Hereinafter, a Host processor (also referred to herein as a Host (Host)) according to embodiments of the present disclosure may include a processor, such as a central processor (Central Processing Unit, CPU) or an application processor (Application Processor, AP) for general purpose computing, as well as any other type of processor capable of performing similar functions. A host according to embodiments of the present disclosure may run, for example, software (e.g., a program) to control at least one other component (e.g., a hardware component or a software component) connected thereto, and may perform various data processing or computation. A host according to embodiments of the present disclosure may include a host controller and a host memory. The host memory may be used as a buffer memory configured to temporarily store data to be transmitted to or received from the storage device. The storage device may include a storage medium configured to store data in response to a request from a host.
Accelerators according to embodiments of the present disclosure may include auxiliary processors that operate independently of, or in conjunction with, the host, such as graphics processing units (Graphics Processing Unit, GPU) for accelerating computations, data processing units (Data Processing Unit, DPU) for data processing, tensor processing units (Tensor Processing Unit, TPU) for accelerating artificial intelligence (Artificial Intelligence, AI) algorithms, neural network processing units (Neural Network Processing Unit, NPU) for neural network algorithms, and other types of processors. An accelerator according to embodiments of the present disclosure may include dedicated circuitry for high-speed data operations, such as artificial intelligence (AI) data operations. An accelerator according to embodiments of the present disclosure may be implemented as a component that is physically separate from, or combined with, a component of the host.
A virtual address space according to embodiments of the present disclosure may represent a range of virtual memory (e.g., a set of virtual addresses), which may also be referred to herein as an address space, a process virtual address space, a memory Box (Mem Box), etc., which terms may be used interchangeably.
An address allocation operation according to an embodiment of the present disclosure may include an operation of allocating virtual memory/addresses for address mapping with corresponding physical memory/addresses in response to a memory application (e.g., an address allocation request).
An address mapping operation according to an embodiment of the present disclosure may be an operation of converting a virtual address received from a host into a physical address for actually storing data in a nonvolatile memory.
Hereinafter, for clarity of explanation of embodiments of the present disclosure, a host processor may be exemplarily described as a CPU or a host, and an accelerator may be exemplarily described as a GPU, but this is merely an example, and a host and an accelerator according to embodiments of the present disclosure may be the above-described or any other type of elements performing similar functions.
The related art of the present disclosure will be described below with reference first to fig. 1 and 2a to 2 c. Fig. 1 shows an exemplary diagram of a virtual memory management method, and fig. 2a to 2c show an exemplary diagram of restoring a virtual address space.
Fig. 1 shows a case comprising a CPU and two GPUs, wherein the dynamic random access memory (Dynamic Random Access Memory, DRAM) used by the different processors is shown in different colors: green represents the virtual address space occupied by the host (e.g., CPU), blue represents the virtual address space occupied by the first accelerator (GPU 0), and yellow represents the virtual address space occupied by the second accelerator (GPU 1). In this case, the virtual memory is managed using Unified Virtual Memory (UVM). At system run time, as shown in FIG. 1, the processes of the CPU and the two GPUs use the virtual address space in an interleaved manner. For example, the virtual address space may be occupied sequentially according to the process running order.
When a system operation fails, the process virtual address space may be restored to the state prior to the failure by using Checkpoint/Restore (C/R) techniques. Commonly used C/R technologies include kernel-space C/R, application-level C/R, and transparent C/R. Transparent C/R has obvious advantages over kernel-space C/R and application-level C/R: it is independent of the operating system, and a developer does not need to understand or modify the user program, so checkpoints can be taken in a timely manner. However, related C/R techniques often rely on multiple API interactions to take and restore checkpoints, which introduces more communication overhead and is less robust for large, complex user programs. The process of performing a restoration operation using the transparent C/R technique is described below with reference to fig. 2a to 2c.
Fig. 2a to 2c show exemplary diagrams of processes for restoring a virtual address space in different situations.
FIG. 2a is an exemplary diagram of a process of restoring a virtual address space in the case where only one GPU is included, where green represents the virtual address space occupied by the CPU and yellow represents the virtual address space occupied by the GPU. Fig. 2c is an exemplary diagram of a process of restoring a virtual address space in the case where two GPUs are included, where green represents the virtual address space occupied by the CPU, blue represents the virtual address space occupied by the first GPU (GPU 0), and yellow represents the virtual address space occupied by the second GPU (GPU 1). When the system is running, the virtual address space is occupied alternately from top to bottom. In the example of FIG. 2a, when restoring to the checkpointed state using C/R techniques, if the virtual address space for the CPU is restored first and then the virtual address space for the GPU is restored, the previous run-time process virtual address space can be restored successfully. Conversely, if the virtual address space for the GPU is restored first and then the virtual address space for the CPU is restored, the restoration fails. Further, in fig. 2c, four orders of performing the recovery operations are tried, but all four fail to recover. Under the restoration mechanism of the related C/R technology, restoration succeeds only if the virtual address spaces of the host and the one or more GPUs are restored sequentially in one fixed order; accordingly, the complexity of virtual address allocation increases as the number of GPUs increases.
FIG. 2b is an exemplary diagram of a process of restoring a virtual address space in the case where the CPU or GPU releases virtual address space while the system is running, where green represents the virtual address space occupied by the CPU and yellow represents the virtual address space occupied by the GPU. When the system is running, the virtual address space is occupied alternately from top to bottom, and the GPU releases some virtual address space before the checkpoint is taken. In fig. 2b, the hatched portion represents virtual address space that was occupied by the GPU and had been released at the time of the checkpoint; in fig. 2c, the upper hatched portion represents virtual address space that was occupied by GPU1 and had been released at the time of the checkpoint, and the lower hatched portion represents virtual address space that was occupied by GPU0 and had been released at the time of the checkpoint. When restoring to the checkpointed state using C/R techniques, whether with one GPU or multiple GPUs, allocations may be restored into virtual address space that had already been released at the time of the checkpoint, resulting in a restoration failure.
Here, although only two GPUs are shown, the number of GPUs may be greater. As the number of GPUs increases, finding the correct memory recovery order becomes difficult, and the recovery difficulty increases accordingly. In summary, in the related virtual memory management method, when the C/R technology is used, the restored virtual address space may be inconsistent with the virtual address space at the time of the checkpoint, so that the restoration of the application program fails or the calculation result is wrong.
In order to solve at least the above-mentioned problems and other drawbacks of the related art, the present disclosure proposes a virtual memory management method, apparatus, electronic device, and storage medium for an accelerator, which will be described in detail below with reference to fig. 3 to 13.
According to embodiments of the present disclosure, the virtual memory management method may be performed using the Compute Unified Device Architecture (CUDA). In particular, various specific embodiments of the disclosure can be implemented using CUDA. By using CUDA, communication overhead can be saved and the number of API interactions reduced. Hereinafter, this will be described in detail with reference to specific embodiments.
Fig. 3 illustrates a flow chart of a virtual memory management method according to an embodiment of the present disclosure. Fig. 4a shows an example diagram of a virtual memory management method according to an embodiment of the present disclosure. Fig. 4b shows an example diagram of a virtual memory management method according to an embodiment of the present disclosure.
Referring to fig. 3, in step S301, a virtual address in a reserved virtual address space is allocated to an accelerator in response to a memory application of the accelerator.
Specifically, the virtual address space used by the accelerator is isolated from the virtual address space used by the host processor by reserving the virtual address space for the accelerator in the virtual memory in advance, so that when a memory application is received in a subsequent process, a virtual address for the accelerator is allocated in the isolated virtual address space. Step S301 is described in detail below with reference to fig. 4a and 4 b.
Fig. 4a illustrates an example of a virtual memory management method in the case where only one accelerator is included according to an embodiment of the present disclosure. Referring to fig. 4a, green represents virtual addresses assigned to a host (or host processor), and blue represents virtual addresses assigned to an accelerator. In the related art, as shown in the left-hand diagram, virtual address spaces are used in a staggered manner, in other words, virtual address spaces allocated to a host and virtual address spaces allocated to an accelerator are distributed in a staggered manner with each other throughout virtual memory.
According to an embodiment of the present disclosure, the different virtual address spaces for the host and accelerator are isolated by reserving a portion of the virtual address space (e.g., the lower shaded portion in fig. 4 a) in the virtual address space of the virtual memory (the isolated virtual address space is shown in a different style of shaded portion in fig. 4 a).
Fig. 4b illustrates an example of a virtual memory management method in the case of including two accelerators (i.e., a first accelerator and a second accelerator in fig. 4 b) according to an embodiment of the present disclosure. Referring to fig. 4b, green represents a virtual address space allocated to a host, blue represents a virtual address space allocated to a first accelerator, and yellow represents a virtual address space allocated to a second accelerator. In the related art, as shown in the left-hand diagram, virtual address spaces are used in an interleaved manner. In other words, the virtual address space allocated to the host, the virtual address space allocated to the first accelerator, and the virtual address space allocated to the second accelerator are staggered with each other throughout the virtual memory.
According to an embodiment of the present disclosure, the different virtual address spaces for the host and the two accelerators are isolated by reserving in advance two portions of the virtual address space of the virtual memory (e.g., the lower two shaded portions in FIG. 4 b) (in FIG. 4b, the isolated virtual address spaces are shown in different styles of shaded portions). In the case of fig. 4b, a respective partial virtual address space is reserved for both accelerators.
The virtual memory may be isolated by reserving a virtual address space, which may be reserved according to a preset partitioning rule according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the partitioning rule may be to sequentially reserve the virtual address space according to an order in which each of the plurality of accelerators first initiates the memory application. For example, a corresponding virtual address space may be reserved for each accelerator on demand according to the order in which each accelerator initiated the memory application.
According to an embodiment of the present disclosure, the partitioning rule may be to reserve a virtual address space according to a rule of a preset size.
According to an embodiment of the present disclosure, the reserved virtual address space is in units of virtual address blocks having a predetermined address space size. For example, when a virtual address space is reserved, the virtual address space may be reserved in accordance with the size of the virtual address space having an integer number of virtual address blocks.
According to embodiments of the present disclosure, the predetermined address space size may include, but is not limited to, a minimum granularity of virtual memory allocation or any other preset fixed value, e.g., 2M.
According to an embodiment of the present disclosure, the partitioning rule may be to reserve a virtual address space according to a size of a virtual address space applied in a first memory application of the accelerator.
It should be appreciated that the manner of reserving the virtual address space according to embodiments of the present disclosure is not limited to the manners described above.
According to embodiments of the present disclosure, the respective virtual address spaces reserved for different accelerators may have the same/different sizes. Although the present disclosure shows only reserved virtual address spaces having the same size, embodiments according to the present disclosure are not limited thereto.
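As an illustration of how the predetermined address space size and the reserved virtual address space described above can be obtained in practice, the following sketch uses the CUDA driver virtual memory management API to query the minimum allocation granularity and to reserve an isolated virtual address range for one accelerator. The 4 GiB reservation size, the device ordinal, and the variable names are assumptions made for this example and are not values prescribed by the present disclosure.

```cpp
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Describe physical allocations on this accelerator.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    // The "predetermined address space size": the minimum allocation
    // granularity (commonly 2 MiB).
    size_t granularity = 0;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    // Reserve an isolated virtual address range for this accelerator only
    // (assumed size: 4 GiB, an integer number of granularity-sized blocks).
    size_t reservedSize = ((4ull << 30) / granularity) * granularity;
    CUdeviceptr reservedBase = 0;
    cuMemAddressReserve(&reservedBase, reservedSize, 0 /*alignment*/,
                        0 /*fixed addr hint*/, 0 /*flags*/);

    std::printf("granularity = %zu bytes, reserved base = 0x%llx\n",
                granularity, (unsigned long long)reservedBase);

    cuMemAddressFree(reservedBase, reservedSize);
    cuCtxDestroy(ctx);
    return 0;
}
```

Reserving the range as a whole multiple of the queried granularity matches the block-based reservation described above.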
Returning to fig. 3, in step S302, in response to the memory application of the host processor, a virtual address outside the reserved virtual address space is allocated to the host processor.
Specifically, according to an embodiment of the present disclosure, in response to a memory application, a virtual address for an accelerator specified in the memory application is allocated in a reserved virtual address space corresponding to the accelerator, or a virtual address for a host processor is allocated in a virtual address space other than the reserved virtual address space among all virtual address spaces, and the allocated virtual address is mapped with a corresponding physical memory. That is, since the virtual address spaces of the virtual memory have been isolated for use by the host processor or accelerator, the virtual addresses of the corresponding host processor/accelerator may be allocated in the isolated respective virtual address spaces in response to the memory application.
A memory application according to embodiments of the present disclosure may specify a host or a particular accelerator and may include, for example, a virtual address allocation request from the host or accelerator.
Steps S301 and S302 will be described in detail below with reference to fig. 5.
Fig. 5 is an example diagram illustrating a process of allocating virtual addresses according to an embodiment of the present disclosure.
Referring to fig. 5, the reserved virtual address space for the first accelerator (GPU 0) and the virtual address space for the second accelerator (GPU 1) are shown in the left-hand diagram, wherein the different patterns of grey shaded portions represent the different virtual address spaces reserved for the different accelerators.
In step S501, in response to the allocation request of GPU0 (i.e., the memory application of GPU 0), a handle of the corresponding physical memory is created.
As an example, memory management unit (Memory Management Unit, MMU) related configuration items (e.g., the size of the virtual address space to be reserved) are read in advance, and virtual address space allocation is performed according to the configuration items, so as to complete the initialization of the internal address pool and the storage allocation/release of the device with respect to the Upper Half (Upper-Half) of the C/R framework, where the application process is located in the Upper-Half of the C/R framework and the proxy process is located in the Lower Half (Lower-Half) of the C/R framework.
As an example, the allocation of physical memory is performed in response to the Wrapper intercepting the virtual memory application interface cudaMalloc. Specifically, in response to cudaMalloc being intercepted by the Wrapper, an allocate interface is invoked, which in turn calls the virtual memory management interface to allocate memory on GPU0 and obtain a physical memory handle.
Specifically, the Wrapper is configured to intercept CUDA related application programming interfaces (Application Programming Interface, API) invoked during execution of the application program. The C/R framework intercepts the operation related interfaces by using the Wrapper mechanism and logs them as required; for memory related APIs, the Wrapper intercepts the call and then invokes the related API to perform the storage operation.
As an example, interception of the CUDA API is implemented by using the Linux environment variable LD_PRELOAD. Specifically, by setting the CUDA Wrapper library in LD_PRELOAD, the system loads the Wrapper library preferentially during the dynamic linking performed when the application program is loaded, so that when the application program executes an API related to CUDA memory allocation/release, the call can be intercepted by the C/R framework and routed into the allocation/release path of the virtual memory management.
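As a minimal sketch of the interception mechanism described above (assuming a Linux host), an LD_PRELOAD wrapper can be built as follows. The functions vmmAllocate and vmmFree stand for the C/R framework's internal virtual memory manager and are hypothetical names introduced only for illustration.

```cpp
// cuda_wrapper.cpp -- minimal sketch of a Wrapper library loaded via LD_PRELOAD.
// Build: g++ -fPIC -shared -I/usr/local/cuda/include cuda_wrapper.cpp -o libcudawrapper.so
// Run:   LD_PRELOAD=./libcudawrapper.so ./my_cuda_app
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical entry points of the C/R framework's virtual memory manager,
// which allocates/releases addresses inside the reserved virtual address space.
cudaError_t vmmAllocate(void** devPtr, std::size_t size);
cudaError_t vmmFree(void* devPtr);

// Because this library is loaded first, the application's calls resolve here,
// so memory applications are redirected into the reserved space.
extern "C" cudaError_t cudaMalloc(void** devPtr, std::size_t size) {
    return vmmAllocate(devPtr, size);
}

extern "C" cudaError_t cudaFree(void* devPtr) {
    return vmmFree(devPtr);
}
```

Since the wrapper library is resolved before the CUDA runtime during dynamic linking, the application needs no modification, which is the property of transparent C/R emphasized above.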
In step S502, the virtual address space reserved for GPU0 is searched for the available virtual address space.
As an example, a satisfactory virtual address space block is found from a Free List (Free List), and the Free List is updated. According to an embodiment of the present disclosure, the free list is a data structure for storing free virtual address blocks, which in an initial state contains only one virtual address space block in its entirety.
In step S502, a virtual address space block is allocated by mapping the handle of the physical memory created in step S501 to the corresponding virtual address space block found (in fig. 5, the blue part shows the virtual address space block allocated in response to the current memory application).
As an example, the internal resource pool is updated based on the result of the mapping.
In step S503, access is set based on the result of the mapping.
As an example, the newly allocated virtual address space block is added to an Active List (Active List). According to an embodiment of the present disclosure, the active list is a data structure for holding a virtual address space block currently being used (i.e., an allocated and unreleased virtual address space block), which is empty in an initial state.
In addition, a mapping table (Map) is updated. According to an embodiment of the present disclosure, the mapping table is a data structure for storing a mapping relationship between a virtual address space block and a physical memory, which is empty in an initial state.
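The following sketch ties steps S501 to S503 together using the CUDA driver virtual memory management API. The Free List, Active List, and Map are modeled with simple standard containers; the names, the 2 MiB block constant, the first-fit search, and the absence of error handling are illustrative assumptions rather than structures defined by the disclosure.

```cpp
#include <cuda.h>
#include <map>
#include <cstddef>

constexpr size_t kBlock = 2ull << 20;  // assumed block size (minimum granularity)

// Illustrative bookkeeping described above.
static std::map<CUdeviceptr, size_t> freeList;                             // Free List
static std::map<CUdeviceptr, size_t> activeList;                           // Active List
static std::map<CUdeviceptr, CUmemGenericAllocationHandle> mappingTable;   // Map

// Carve a span of whole blocks out of the Free List (first fit).
static CUdeviceptr takeFromFreeList(size_t paddedSize) {
    for (auto it = freeList.begin(); it != freeList.end(); ++it) {
        if (it->second >= paddedSize) {
            CUdeviceptr va = it->first;
            size_t remaining = it->second - paddedSize;
            freeList.erase(it);
            if (remaining > 0) freeList[va + paddedSize] = remaining;
            return va;
        }
    }
    return 0;  // no space left: the reservation would be extended dynamically
}

// Allocate 'size' bytes for 'device' inside its reserved space (steps S501-S503).
CUdeviceptr vmmAllocate(int device, size_t size) {
    size_t padded = ((size + kBlock - 1) / kBlock) * kBlock;  // round up to whole blocks

    // S501: create a handle for physical memory on the accelerator.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = device;
    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, padded, &prop, 0);

    // S502: find an available block in the reserved space and map the handle to it.
    CUdeviceptr va = takeFromFreeList(padded);
    cuMemMap(va, padded, 0 /*offset*/, handle, 0 /*flags*/);

    // S503: set access rights, then update the Active List and the Map.
    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, padded, &access, 1);

    activeList[va] = padded;
    mappingTable[va] = handle;
    return va;
}
```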
The memory allocation process for GPU1 is similar to that described above for GPU0, and will not be repeated here.
According to embodiments of the present disclosure, as a result of the memory applications, as shown in fig. 4a and 4b, the virtual address spaces for the different accelerators can be allocated individually within the reserved virtual address spaces, and the virtual address space for the host processor can be allocated outside the reserved virtual address spaces.
According to embodiments of the present disclosure, when a particular virtual address space is reserved, in response to a memory application, the particular virtual address space is dynamically allocated to map virtual addresses to physical addresses.
According to an embodiment of the present disclosure, the step of allocating a virtual address in the reserved virtual address space to the accelerator in response to a memory application of the accelerator includes: dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space applied for in the memory application and the predetermined address space size. This will be described below with reference to fig. 6. Fig. 6 is an example diagram illustrating a process of dynamically allocating virtual addresses according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the step of dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space applied for in the memory application and the predetermined address space size includes: when the size of the address space applied for in the memory application is larger than or equal to the predetermined address space size, dividing out a minimum number of virtual address blocks corresponding to the memory application in the reserved virtual address space, and allocating virtual addresses corresponding to the memory application in the divided minimum number of virtual address blocks; and when the size of the applied-for address space is smaller than the predetermined address space size, allocating the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space. That is, when the size of the applied-for address space is greater than or equal to the size of a virtual address block, an integer number of unused virtual address blocks are dynamically divided out of the reserved virtual address space for allocating the addresses; and when the size of the applied-for address space is smaller than the size of a virtual address block, the addresses are allocated in a virtual address block that has remaining address space.
According to an embodiment of the present disclosure, the divided minimum number of virtual address blocks corresponding to the memory application are contiguous in the virtual address space. According to an embodiment of the present disclosure, the physical addresses mapped to the divided minimum number of virtual address blocks corresponding to the memory application are contiguous.
According to an embodiment of the present disclosure, the step of allocating a virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space includes: if there is a virtual address block whose remaining address space is smaller than the predetermined address space size and larger than or equal to the size of the applied-for address space, allocating an address corresponding to the memory application in that virtual address block. That is, if part of a virtual address block has already been allocated (i.e., is mapped to physical memory) and the unallocated part of that block is sufficient for the addresses applied for in the memory application, the applied-for addresses are preferentially allocated in that block.
According to an embodiment of the present disclosure, a virtual address space is dynamically allocated based on the size of the address space of the application and the unit size of the address space, which is described in detail below with reference to the example of fig. 6.
In fig. 6, it is assumed that the predetermined address space size of the virtual address block is 2M (i.e., 2048K), that is, the reserved virtual address space is an integer number of 2M sizes. Referring to fig. 6, address space sizes applied in a plurality of memory applications (e.g., address allocation requests) of one accelerator are 100K, 9M, 3M, and 100K, respectively.
In response to a first memory application for which the address space size is 100K, since 100K is smaller than 2M, a virtual address block is divided from the reserved virtual address space as a first virtual address space block, and the address space of 100K is allocated in the divided first virtual address space block with the size of 2M. The virtual address corresponding to the first memory application is mapped to physical memory of size 2M.
In response to a second memory application for which the address space size is 9M, 5 virtual address blocks are divided from the reserved virtual address space as a second virtual address space block of 10M, and 9M address space is allocated in the divided second virtual address space block of 10M, because 9M is greater than 2M. The virtual address block corresponding to the second memory application is mapped to physical memory of size 10M.
In response to a third memory application for which the address space size is 3M, since 3M is greater than 2M, 2 virtual address blocks are divided from the reserved virtual address space as a third virtual address space block of size 4M, and 3M address space is allocated in the divided third virtual address space block of size 4M. The virtual address corresponding to the third memory application is mapped to physical memory of size 4M.
In response to a fourth memory application for which the address space size is 100K, since 100K is smaller than 2M and the remaining address space in the previously divided first virtual address space block is larger than 100K, the 100K address space is preferentially allocated in the first virtual address space block. The virtual address block corresponding to the fourth memory application is mapped to physical memory of size 2M.
By dynamically allocating and mapping the virtual address space according to the size of the applied-for address space and the size of the virtual address block, reasonable use of the virtual memory and the physical memory can be realized, which facilitates subsequent efficient management of the memory.
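The block-rounding rule applied in the example above amounts to a small ceiling calculation; the 2 MiB block size and the helper name below are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cassert>

constexpr std::size_t kBlock = 2ull << 20;   // assumed predetermined block size: 2 MiB

// Minimum number of whole blocks needed when the request is at least one block.
constexpr std::size_t blocksFor(std::size_t request) {
    return (request + kBlock - 1) / kBlock;  // ceil(request / kBlock)
}

int main() {
    assert(blocksFor(9ull << 20) == 5);   // 9M request -> 5 blocks (10M span)
    assert(blocksFor(3ull << 20) == 2);   // 3M request -> 2 blocks (4M span)
    // A 100K request (< kBlock) is instead served from a block that still has
    // enough unallocated remainder, if such a block exists.
    return 0;
}
```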
According to embodiments of the present disclosure, the reserved virtual address space may turn out to be unsuitable for the memory applications of a given accelerator because of the variability of accelerators and/or applications. According to embodiments of the present disclosure, the reserved virtual address space may be dynamically extended when it is insufficient to allocate a virtual address for the accelerator. A reservation operation according to an embodiment of the present disclosure refers to an operation of dividing out a predetermined virtual address space at an initial stage to achieve isolation; however, it should be understood that, in addition to the operation of reserving the virtual address space in advance, it also includes the operation of additionally reserving virtual address space when the reserved virtual address space is insufficient.
According to an embodiment of the present disclosure, the step of allocating a virtual address in the reserved virtual address space to the accelerator in response to a memory application of the accelerator further includes: in response to the memory application, when the size of the remaining virtual address space in the reserved virtual address space is smaller than the size of the address space applied for in the memory application, reserving an additional virtual address space within the virtual address space that is neither reserved nor allocated; creating a chain corresponding to the accelerator for associating the reserved virtual address space and the additional virtual address space; and allocating, based on the chain corresponding to the accelerator, the virtual address applied for by the accelerator in the memory application within the additional virtual address space. Here, the remaining virtual address space in the reserved virtual address space refers to the virtual address space to which no virtual address has been allocated. When the reserved virtual address space is insufficient, a new additional virtual address space is reserved as an extension, and a chain associating the previously reserved virtual address space with the additional virtual address space is established. The process of dynamically expanding the reserved virtual address space will be described in detail below with reference to fig. 7.
Fig. 7 is an example diagram illustrating a process of dynamically expanding a virtual address space according to an embodiment of the present disclosure.
Referring to fig. 7, when the size of the virtual address space applied for by the first accelerator (GPU 0) is larger than the size of the remaining virtual address space in the virtual address space reserved for GPU0 (upper hatched portion in the left side view of fig. 7), an additional virtual address space for GPU0 (lower portion in the left side view of fig. 7) is additionally reserved, a chain is established between the reserved virtual address space and the additional virtual address space, and a virtual address for GPU0 is allocated in the newly reserved additional virtual address space.
Specifically, in step S701, a handle of the physical memory is created in response to a memory application (e.g., an allocation request).
In step S712, it is determined whether there is a virtual address space available.
When it is determined that there is a virtual address space available, in step S713, mapping of the handle of the physical memory created in step S701 to the virtual memory in the reserved virtual address space is performed. Here, the process of address allocation is similar to the address allocation process described above with reference to fig. 5, and a repetitive description is not made.
When it is determined that there is no virtual address space available, that is, when the remaining available virtual address space in the reserved virtual address space is smaller than the applied virtual address space, a new additional virtual address space for GPU0 (e.g., creating a memory box) is reserved in step S723. That is, a new virtual address space for GPU0 is found and reserved.
In step S724, a new additional virtual address space (e.g., box) is added to the chain. That is, a chain of reserved virtual address space and additional virtual address space associated with GPU0 is created.
In step S725, it is determined that there is a virtual address space available in the additional virtual address space.
In step S726, the mapping of the handle of the physical memory created in step S701 and the virtual memory in the additional virtual address space is performed. Here, the process of address allocation is similar to the address allocation process described above with reference to fig. 5, and a repetitive description is not made.
When the mapping is completed, in step S702, access is set based on the result of the mapping. Here, the process of setting access is similar to the process of setting access described above with reference to fig. 5, and a repetitive description is not made.
According to an exemplary embodiment of the present disclosure, the additional virtual address space of one accelerator may be one or more virtual address spaces. For example, the additional virtual address space for GPU0 may be dynamically expanded multiple times, or multiple additional virtual address spaces for GPU0 may be expanded at a time. When the additional virtual address space is a plurality of additional virtual address spaces, the chain for the respective accelerator may associate the reserved virtual address space with all of the additional virtual address spaces.
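One possible shape of the per-accelerator chain described above is sketched below; the structure and field names are illustrative assumptions rather than definitions given by the disclosure.

```cpp
#include <cuda.h>
#include <vector>

// A reserved virtual address range ("memory box") for one accelerator.
struct MemBox {
    CUdeviceptr base;   // start of the reserved range (from cuMemAddressReserve)
    size_t      size;   // size of the range, a multiple of the block size
};

// The chain associates the initially reserved space with every additional
// space that was reserved later for the same accelerator.
struct AcceleratorChain {
    int                 device;  // accelerator (e.g., GPU) ordinal
    std::vector<MemBox> boxes;   // boxes[0]: initial reservation; boxes[1..]: extensions
};
```

Keeping the initial reservation and every extension on one chain lets allocation, reclamation, and C/R restoration walk exactly the address ranges that belong to a given accelerator.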
When the reserved virtual address space is insufficient, a new virtual address space is dynamically extended for the accelerator, which avoids memory management confusion when the addresses to be allocated exceed the range of the reserved virtual address space, and realizes accurate mapping between the virtual memory and the physical memory. FIG. 8 illustrates an example diagram of using a dynamically expanded virtual address space according to an embodiment of the present disclosure. In fig. 8, the different virtual address spaces reserved for the different accelerators (first accelerator, ..., Nth accelerator) are distinguished by grey shaded portions of different patterns. Referring to fig. 8, the leftmost diagram shows the case where only the virtual address space is reserved without using the dynamic expansion process, the middle two diagrams show a dynamically expanded virtual address space and an example in which virtual addresses are allocated (where virtual addresses are reserved and allocated in units of the minimum granularity of address allocation), and the rightmost diagram shows the composition of the virtual memory. In fig. 8, the corresponding chains are represented by curves of different colors. An example of the specific information about the process virtual address space is shown in the rightmost diagram of fig. 8.
According to embodiments of the present disclosure, after dynamically expanding the additional virtual address space, memory reclamation management may be entered when virtual addresses in the expanded additional virtual address space are no longer used (e.g., an application releases the accelerator's memory). The memory reclamation process according to the embodiment of the present disclosure is described in detail below with reference to fig. 9.
Fig. 9 is an exemplary diagram illustrating a memory reclamation process according to an embodiment of the present disclosure.
The virtual memory management method according to the embodiment of the disclosure further includes: in response to a memory reclamation request for the additional virtual address space, releasing the allocated virtual address in the additional virtual address space and removing the additional virtual address space from the chain corresponding to the accelerator.
Referring to fig. 9, in response to a memory reclamation request (e.g., an address release request) of an accelerator, the allocated virtual address in the additional virtual address space is unmapped from the corresponding physical memory in step S901. That is, in response to the release request, the physical address is disassociated from the virtual address. As an example, the virtual memory management interface is invoked to disassociate the virtual address from the physical memory.
In step S902, the allocated virtual address is released. As an example, the freed virtual address space is added to the free list for subsequent use.
In step S903, the handle of the physical memory corresponding to the virtual address released in step S902 is released.
As an example, a release (free) interface is invoked in response to cudaFree being captured by a wrapper. In response to the free request, a handle of the physical memory corresponding to the virtual address to be released is obtained from a Map, and the handle is released by calling the virtual memory management interface.
As an example, in response to releasing the physical memory and the virtual address, a mapping table (Map) is updated for subsequent use.
In step S904, the additional virtual address space is removed from the chain. For example, information related to the additional virtual address space is removed from the chain corresponding to the accelerator.
In step S905, the additional virtual address space is released. That is, the additional virtual address space is restored to an unreserved state. Referring to the rightmost diagram of fig. 9, the lower virtual address space is restored to an unreserved state.
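Steps S901 to S905 map naturally onto the CUDA driver virtual memory management API, as sketched below. The MemBox type and the Free List container reuse the illustrative names assumed in the earlier sketches, and error handling is omitted.

```cpp
#include <cuda.h>
#include <cstddef>
#include <map>
#include <vector>

struct MemBox { CUdeviceptr base; size_t size; };  // as in the chain sketch above

// Reclaim one allocation inside an additional virtual address space and then
// release that space (steps S901-S905; error handling omitted).
void vmmReclaim(std::vector<MemBox>& chainBoxes, std::size_t boxIndex,
                std::map<CUdeviceptr, size_t>& freeList,
                CUdeviceptr va, size_t mappedSize,
                CUmemGenericAllocationHandle handle) {
    cuMemUnmap(va, mappedSize);               // S901: break the VA <-> physical mapping
    freeList[va] = mappedSize;                // S902: return the VA block to the Free List
    cuMemRelease(handle);                     // S903: release the physical memory handle

    MemBox box = chainBoxes[boxIndex];
    chainBoxes.erase(chainBoxes.begin()
                     + static_cast<std::ptrdiff_t>(boxIndex));  // S904: remove the box from the chain
    cuMemAddressFree(box.base, box.size);     // S905: un-reserve the additional address range
}
```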
According to the embodiment of the disclosure, memory reclamation can be realized not only for the extended additional virtual address space, but also for the virtual address space which is originally reserved.
As an example, in response to a memory reclamation request for a reserved virtual address space of an accelerator, the allocated virtual address in the reserved virtual address space is disassociated with a corresponding physical memory, the allocated virtual address in the reserved virtual address space is released, and a chain corresponding to the accelerator is deleted.
Fig. 10 is an example diagram illustrating a process of releasing virtual addresses corresponding to the example of fig. 6, according to an embodiment of the present disclosure. In fig. 10, in response to a plurality of release requests, the allocated virtual addresses may be released in the virtual address space sequentially in the order of the requests, restoring the virtual address space to an unallocated state.
Here, the order of releasing the virtual addresses is not limited to the order shown in fig. 10, and the virtual addresses may be released in any appropriate order (e.g., an address order of the allocated virtual addresses in the virtual memory, a size order of the allocated virtual addresses, etc.).
As previously described, when a system failure occurs, the virtual memory may be restored using the C/R technique.
The virtual memory management method according to the embodiment of the disclosure further includes: in response to a C/R request, restoring the virtual addresses assigned to each accelerator in each of the one or more virtual address spaces reserved for the one or more accelerators, respectively, and/or restoring the virtual addresses assigned to the host processor in a virtual address space other than the one or more reserved virtual address spaces.
Specifically, referring to fig. 4a and 4b, based on the process virtual address space at the time of the checkpoint, the virtual address allocated to the host processor is restored in the virtual address space for the host, the virtual address allocated to the first accelerator is restored in the virtual address space for the first accelerator, and the virtual address allocated to the second accelerator is restored in the virtual address space for the second accelerator.
Although the process of restoring the virtual memory in the order of the first accelerator and the host processor or the first accelerator, the second accelerator, and the host processor is illustrated in fig. 4a and 4b, the restoration order according to the embodiments of the present disclosure is not limited thereto, and the restoration may be performed in any suitable order.
According to an embodiment of the present disclosure, in response to a C/R request of an accelerator, a virtual address allocated to the accelerator is restored in a reserved virtual address space and an additional virtual address space based on the chain corresponding to the accelerator. That is, based on the chain corresponding to the accelerator, a reserved virtual address space and all additional virtual address spaces for the accelerator are determined, and the virtual address of the accelerator is restored in the determined plurality of virtual address spaces.
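As an illustrative, non-limiting sketch of this chain-based restoration, the virtual address spaces recorded in the chain may be re-reserved at their original base addresses and the recorded allocations re-mapped, as shown below. Chain, VaSpace and Allocation are assumed bookkeeping structures; only the cuMem* calls are CUDA driver API, the fixed-address argument to cuMemAddressReserve is a request that the driver may decline, and sizes are assumed to be multiples of the allocation granularity.

```cpp
#include <cuda.h>
#include <vector>

struct Allocation { CUdeviceptr va; size_t size; };
struct VaSpace    { CUdeviceptr base; size_t size; std::vector<Allocation> allocs; };
struct Chain      { std::vector<VaSpace> spaces; };  // reserved + additional spaces

void restore_accelerator(const Chain &chain, int device) {
    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = device;

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

    for (const VaSpace &space : chain.spaces) {
        // Re-reserve the recorded range at its original base address,
        // so restored pointers stay valid without replaying a log.
        CUdeviceptr base = 0;
        cuMemAddressReserve(&base, space.size, 0, space.base, 0);

        for (const Allocation &a : space.allocs) {
            CUmemGenericAllocationHandle handle;
            cuMemCreate(&handle, a.size, &prop, 0);    // new physical memory
            cuMemMap(a.va, a.size, 0, handle, 0);      // map at the original virtual address
            cuMemSetAccess(a.va, a.size, &access, 1);  // enable read/write access
            // The checkpointed contents would then be copied back into a.va.
        }
    }
}
```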
In the C/R technology in the related art, a proxy process is introduced to perform C/R, which introduces the overhead of communication between the proxy process and the application process. As an example, according to an embodiment of the present disclosure, the checkpoint framework of the accelerator puts the proxy process and the application program into the same address space through a split-process technology, so that the proxy process and the application program externally appear as one process and switching between them is realized through context switching, thereby reducing the proxy-process communication overhead of the C/R technology.
By such isolated allocation and restoration of virtual address spaces, the conventional C/R technique may be optimized. Fig. 11 is an example diagram illustrating the C/R technique in the related art, and fig. 12 is an example diagram illustrating the C/R technique according to an embodiment of the present disclosure. Referring to FIG. 11, at runtime, the checkpoint needs to record multiple logs for the multiple processes in execution. When recovery is performed using the C/R technique, the recorded operations must be re-executed one by one in their logged execution order for recovery.
Referring to fig. 12, according to an embodiment of the present disclosure, since the virtual address spaces allocated to the respective accelerators are isolated from each other and the spaces belonging to the same accelerator are associated by a chain, recovery with the C/R technique only needs to process each reserved virtual address space based on its chain, without relying on log information.
According to the embodiment of the disclosure, only the state of the active accelerator memory is maintained, so that logs do not need to be recorded repeatedly during the run stage; in the recovery stage, only the accelerator memory that was active at the checkpoint is re-applied for, which reduces the number of redundant API executions when the accelerator state is restored.
Testing of the memory management method according to the embodiment of the present disclosure shows that, compared with the memory management method in the related art, the execution time is significantly reduced and the memory occupation of the accelerator is also significantly reduced, both when virtual memory is dynamically allocated/released and when it is restored using the C/R technique.
In addition, the memory management method according to the embodiment of the disclosure manages the memory in a fully customized manner, thereby avoiding the address randomization problem in the operating system.
Fig. 13 is a block diagram illustrating a virtual memory management device according to an embodiment of the present disclosure.
The virtual memory management apparatus 100 according to an embodiment of the present disclosure includes an accelerator allocation unit 101 and a host processor allocation unit 102.
According to an embodiment of the present disclosure, the accelerator allocation unit 101 is configured to: in response to a memory application by the accelerator, allocate a virtual address in the reserved virtual address space to the accelerator. That is, the accelerator allocation unit 101 may perform the operation corresponding to step S301 in the virtual memory management method described above.
According to an embodiment of the present disclosure, the host processor allocation unit 102 is configured to: in response to a memory application by the host processor, allocate a virtual address outside the reserved virtual address space to the host processor. That is, the host processor allocation unit 102 may perform the operation corresponding to step S302 in the virtual memory management method described above.
According to an embodiment of the present disclosure, the reserved virtual address space is in units of virtual address blocks having a predetermined address space size.
According to an embodiment of the present disclosure, the accelerator allocation unit 101 is configured to allocate a virtual address in the reserved virtual address space to the accelerator, in response to a memory application by the accelerator, by: dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space requested in the memory application and the predetermined address space size.
According to an embodiment of the present disclosure, the accelerator allocation unit 101 is configured to dynamically allocate a virtual address for the accelerator in the reserved virtual address space, based on the size of the address space requested in the memory application and the predetermined address space size, by: when the size of the address space requested in the memory application is greater than or equal to the predetermined address space size, partitioning the minimum number of virtual address blocks corresponding to the memory application in the reserved virtual address space, and allocating the virtual address corresponding to the memory application in the partitioned minimum number of virtual address blocks; and when the size of the requested address space is smaller than the predetermined address space size, allocating the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space.
According to an embodiment of the present disclosure, the accelerator allocation unit 101 is configured to allocate the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space by: if there is a virtual address block whose remaining address space is smaller than the predetermined address space size and greater than or equal to the size of the requested address space, allocating the address corresponding to the memory application in that virtual address block.
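As an illustrative, non-limiting sketch of this block-granularity allocation rule, the following example uses an assumed 2 MB predetermined address space size and a hypothetical bookkeeping structure BlockPool (not part of any library); an implementation could instead use the accelerator's allocation granularity.

```cpp
#include <cuda.h>
#include <cstddef>
#include <vector>

// Predetermined address space size (virtual address block granularity); 2 MB is an assumption.
constexpr size_t BLOCK_SIZE = 2ull << 20;

struct Block     { CUdeviceptr base; size_t used; };                 // partially used block
struct BlockPool { CUdeviceptr next_base; std::vector<Block> partial; };

// Returns the virtual address assigned to a request of `size` bytes
// inside the reserved virtual address space managed by `pool`.
CUdeviceptr allocate(BlockPool &pool, size_t size) {
    if (size >= BLOCK_SIZE) {
        // Partition the minimum number of whole virtual address blocks
        // covering the request and allocate the address there.
        size_t nblocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
        CUdeviceptr va = pool.next_base;
        pool.next_base += nblocks * BLOCK_SIZE;
        return va;
    }
    // Smaller request: reuse a block whose remaining space (necessarily
    // smaller than BLOCK_SIZE) is still large enough for the request.
    for (Block &b : pool.partial) {
        if (BLOCK_SIZE - b.used >= size) {
            CUdeviceptr va = b.base + b.used;
            b.used += size;
            return va;
        }
    }
    // Otherwise open a new virtual address block for it.
    Block fresh{pool.next_base, size};
    pool.next_base += BLOCK_SIZE;
    pool.partial.push_back(fresh);
    return fresh.base;
}
```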
According to an embodiment of the present disclosure, the accelerator allocation unit 101 is configured to allocate a virtual address in the reserved virtual address space to the accelerator, in response to a memory application by the accelerator, by: in response to the memory application, when the size of the remaining virtual address space in the reserved virtual address space is smaller than the size of the address space requested in the memory application, reserving an additional virtual address space in virtual address space that is not reserved and to which no virtual address has been allocated; creating a chain corresponding to the accelerator for associating the reserved virtual address space and the additional virtual address space; and allocating the virtual address requested in the memory application for the accelerator in the additional virtual address space, based on the chain corresponding to the accelerator. A sketch of this expansion step is shown below.
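In the following non-limiting sketch, Chain, VaSpace and the expansion size are assumptions for illustration; cuMemAddressReserve is used as one possible virtual memory management interface, and passing 0 as the fixed-address hint lets the driver pick an unreserved range.

```cpp
#include <cuda.h>
#include <algorithm>
#include <vector>

struct VaSpace { CUdeviceptr base; size_t size; size_t used; };
struct Chain   { std::vector<VaSpace> spaces; };  // head: reserved space, tail: expansions

CUdeviceptr allocate_with_expansion(Chain &chain, size_t size, size_t expand_size) {
    if (chain.spaces.back().size - chain.spaces.back().used < size) {
        // Remaining reserved space is insufficient: reserve an additional
        // virtual address space from unreserved, unallocated address space
        // and associate it with the chain of this accelerator.
        CUdeviceptr extra = 0;
        size_t extra_size = std::max(size, expand_size);
        cuMemAddressReserve(&extra, extra_size, 0, /*fixed-address hint*/ 0, 0);
        chain.spaces.push_back({extra, extra_size, 0});
    }
    // Allocate the requested virtual address within the newest space in the chain.
    VaSpace &space = chain.spaces.back();
    CUdeviceptr va = space.base + space.used;
    space.used += size;
    return va;
}
```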
According to an embodiment of the disclosure, the virtual memory management device further includes an address space reclaiming unit.
According to an embodiment of the present disclosure, the address space reclamation unit is configured to: in response to a memory reclamation request for the additional virtual address space, release the allocated virtual address in the additional virtual address space and remove the additional virtual address space from the chain corresponding to the accelerator.
According to an embodiment of the disclosure, the virtual memory management device further includes a checkpoint recovery unit.
According to an embodiment of the present disclosure, the checkpoint recovery unit is configured to: in response to a checkpoint/restore (C/R) request, restore the virtual addresses assigned to each accelerator in each of the one or more virtual address spaces reserved for the one or more accelerators, respectively, and/or restore the virtual addresses assigned to the host processor in a virtual address space other than the one or more reserved virtual address spaces.
According to an embodiment of the present disclosure, the virtual memory management apparatus 100 performs operations using the Compute Unified Device Architecture (CUDA).
The specific manner in which the respective units of the virtual memory management apparatus 100 perform their operations in the above embodiment has been described in detail in the embodiments of the related method, and is not repeated here.
Further, it should be understood that various units in the virtual memory management device 100 according to exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. The individual units may be implemented, for example, using a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), depending on the processing performed by the individual units as defined.
Fig. 14 is a schematic diagram illustrating a system 1000 to which a storage device is applied according to an embodiment of the present disclosure. The system 1000 of fig. 14 may be basically a mobile system such as a portable communication terminal (e.g., a mobile phone), a smart phone, a tablet Personal Computer (PC), a wearable device, a healthcare device, or an internet of things (IOT) device. However, the system 1000 of fig. 14 is not necessarily limited to a mobile system, and may be a PC, a laptop, a server, a media player, or an automotive device (e.g., a navigation device).
Referring to fig. 14, a system 1000 may include a main processor 1100, memories (e.g., 1200a and 1200b), and storage devices (e.g., 1300a and 1300b). Also, system 1000 may include at least one of an image capture device 1410, a user input device 1420, a sensor 1430, a communication device 1440, a display 1450, a speaker 1460, a power supply 1470, and a connection interface 1480.
The main processor 1100 may control all operations of the system 1000, and more particularly, may control operations of other components included in the system 1000. The main processor 1100 may be implemented as a general purpose processor, a special purpose processor, an application processor, or the like.
The main processor 1100 may include at least one Central Processing Unit (CPU) core 1110, and may further include a controller 1120 for controlling the memories 1200a and 1200b and/or the storage devices 1300a and 1300b. In some embodiments, the main processor 1100 may further include an accelerator 1130, which is a dedicated circuit for high-speed data operations such as Artificial Intelligence (AI) data operations. The accelerator 1130 may include a Graphics Processing Unit (GPU), a Neural Processing Unit (NPU), and/or a Data Processing Unit (DPU), etc., and may be implemented as a chip physically separate from other components of the main processor 1100.
Memories 1200a and 1200b may be used as the primary storage for system 1000. Although the memories 1200a and 1200b may respectively include volatile memories such as Static Random Access Memory (SRAM) and/or Dynamic Random Access Memory (DRAM), the memories 1200a and 1200b may alternatively include nonvolatile memories such as flash memory, Phase-change Random Access Memory (PRAM), and/or Resistive Random Access Memory (RRAM). The memories 1200a and 1200b may be implemented in the same package as the main processor 1100.
The storage devices 1300a and 1300b may be used as nonvolatile storage devices configured to store data regardless of whether power is supplied, and may have a larger storage capacity than the memories 1200a and 1200b. The storage devices 1300a and 1300b may respectively include storage controllers (STRG CTRL) 1310a and 1310b and nonvolatile memories (NVM) 1320a and 1320b configured to store data under the control of the storage controllers 1310a and 1310b. Although the NVMs 1320a and 1320b may include V-NAND flash memory having a two-dimensional (2D) or three-dimensional (3D) structure, the NVMs 1320a and 1320b may include other types of NVM, such as PRAM and/or RRAM, etc.
Storage devices 1300a and 1300b may be physically separate from the main processor 1100 and included in system 1000, or may be implemented in the same package as the main processor 1100. In addition, the storage devices 1300a and 1300b may be solid state drives (SSDs) or memory cards, and may be removably coupled with other components of the system 1000 through an interface such as the connection interface 1480, which will be described later. The storage devices 1300a and 1300b may be devices to which a standard protocol such as Universal Flash Storage (UFS), embedded multimedia card (eMMC), or NVMe is applied, but are not limited thereto.
The image capturing device 1410 may take a still image or a moving image. Image capture device 1410 may include a camera, a video camcorder, and/or a webcam, among others.
User input devices 1420 may receive various types of data entered by a user of system 1000 and include a touchpad, keypad, keyboard, mouse, microphone, and the like.
The sensor 1430 may detect various types of physical quantities that may be obtained from outside the system 1000 and convert the detected physical quantities into electrical signals. The sensor 1430 may include a temperature sensor, a pressure sensor, an illuminance sensor, a position sensor, an acceleration sensor, a biosensor, and/or a gyro sensor, etc.
Communication device 1440 may transmit and receive signals between other devices external to system 1000 according to various communication protocols. Communication device 1440 may include an antenna, transceiver, modem, or the like.
The display 1450 and the speaker 1460 may be used as output devices configured to output visual and audible information, respectively, to a user of the system 1000.
The power supply 1470 may appropriately convert power supplied from a battery (not shown) embedded in the system 1000 and/or an external power source and supply the converted power to each component of the system 1000.
Connection interface 1480 may provide a connection between system 1000 and an external device that is connected to system 1000 and capable of transmitting data to system 1000 and receiving data from system 1000. The connection interface 1480 may be implemented using various interface schemes such as Advanced Technology Attachment (ATA), Serial ATA (SATA), external SATA (e-SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), Peripheral Component Interconnect (PCI), PCI express (PCIe), NVMe, IEEE 1394, Universal Serial Bus (USB) interface, Secure Digital (SD) card interface, Multimedia Card (MMC) interface, embedded multimedia card (eMMC) interface, UFS interface, embedded UFS (eUFS) interface, Compact Flash (CF) card interface, and the like.
Furthermore, embodiments of the present disclosure provide an electronic device including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the virtual memory management method as described above.
According to embodiments of the present disclosure, the electronic device may be a personal computer, tablet device, personal digital assistant, smart phone, or other device capable of executing the above-described set of instructions. Here, the electronic device is not necessarily a single electronic device, but may be any device or aggregate of circuits capable of executing the above-described instructions (or instruction set) singly or in combination. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, wherein the memory may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integrated with the processor, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. In addition, the memory may include a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, for example, through an I/O port, a network connection, etc., such that the processor is able to read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
Further, embodiments of the present disclosure provide a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the virtual memory management method as described above.
Examples of computer readable storage media according to embodiments of the present disclosure include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, nonvolatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid state drives (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards, or extreme digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage, hard disks, solid state disks, and any other device configured to store computer programs and any associated data, data files, and data structures in a non-transitory manner and to provide the computer programs and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the programs. The computer programs in the computer readable storage media described above can be run in an environment deployed in an electronic device, such as a client, host, proxy device, or server; further, in one example, the computer programs and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the claims.

Claims (10)

1. A virtual memory management method, comprising:
in response to a memory application by an accelerator, allocating a virtual address in a reserved virtual address space to the accelerator; and
in response to a memory application by a host processor, allocating a virtual address outside the reserved virtual address space to the host processor.
2. The virtual memory management method according to claim 1, wherein the reserved virtual address space is in units of virtual address blocks having a predetermined address space size, and
wherein the step of allocating the virtual address in the reserved virtual address space to the accelerator, in response to the memory application by the accelerator, comprises: dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space requested in the memory application and the predetermined address space size.
3. The virtual memory management method according to claim 2, wherein the step of dynamically allocating a virtual address for the accelerator in the reserved virtual address space based on the size of the address space requested in the memory application and the predetermined address space size comprises:
when the size of the address space requested in the memory application is greater than or equal to the predetermined address space size, partitioning the minimum number of virtual address blocks corresponding to the memory application in the reserved virtual address space, and allocating the virtual address corresponding to the memory application in the partitioned minimum number of virtual address blocks; and
when the size of the requested address space is smaller than the predetermined address space size, allocating the virtual address corresponding to the memory application in a virtual address block having remaining address space in the reserved virtual address space.
4. A virtual memory management method according to claim 3, wherein the step of allocating a virtual address corresponding to the memory application in a virtual address block having a remaining address space in the reserved virtual address space comprises:
if there is a virtual address block whose remaining address space is smaller than the predetermined address space size and greater than or equal to the size of the requested address space, allocating the address corresponding to the memory application in that virtual address block.
5. The virtual memory management method of claim 1, wherein the step of allocating a virtual address in the reserved virtual address space to the accelerator in response to the memory application by the accelerator comprises:
in response to the memory application, when the size of the remaining virtual address space in the reserved virtual address space is smaller than the size of the address space requested in the memory application, reserving an additional virtual address space in virtual address space that is not reserved and to which no virtual address has been allocated;
creating a chain corresponding to the accelerator for associating the reserved virtual address space and the additional virtual address space; and
allocating the virtual address requested in the memory application for the accelerator in the additional virtual address space, based on the chain corresponding to the accelerator.
6. The virtual memory management method of claim 5 further comprising:
in response to a memory reclamation request for the additional virtual address space, releasing the allocated virtual address in the additional virtual address space and removing the additional virtual address space from the chain corresponding to the accelerator.
7. The virtual memory management method of claim 1, further comprising: in response to a checkpoint/restore (C/R) request,
restoring the virtual address assigned to each accelerator in each of one or more virtual address spaces reserved for one or more accelerators, respectively, and/or
restoring the virtual address assigned to the host processor in a virtual address space other than the one or more reserved virtual address spaces.
8. The virtual memory management method of claim 1, wherein the virtual memory management method is performed using a Compute Unified Device Architecture (CUDA).
9. A virtual memory management device, comprising:
an accelerator allocation unit configured to: in response to a memory application by an accelerator, allocate a virtual address in a reserved virtual address space to the accelerator; and
a host processor allocation unit configured to: in response to a memory application by a host processor, allocate a virtual address outside the reserved virtual address space to the host processor.
10. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the virtual memory management method of any one of claims 1 to 8.
CN202310680298.4A 2023-06-08 2023-06-08 Virtual memory management method, device, electronic equipment and storage medium Pending CN116680211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310680298.4A CN116680211A (en) 2023-06-08 2023-06-08 Virtual memory management method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310680298.4A CN116680211A (en) 2023-06-08 2023-06-08 Virtual memory management method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116680211A true CN116680211A (en) 2023-09-01

Family

ID=87785144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310680298.4A Pending CN116680211A (en) 2023-06-08 2023-06-08 Virtual memory management method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116680211A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573380A (en) * 2024-01-16 2024-02-20 北京趋动智能科技有限公司 Virtual address allocation method and device
CN117573380B (en) * 2024-01-16 2024-05-28 北京趋动智能科技有限公司 Virtual address allocation method and device

Legal Events

Date Code Title Description
PB01 Publication