CN116681578B - Memory management method, graphics processing unit, storage medium and terminal equipment


Info

Publication number
CN116681578B
CN116681578B (application CN202310968209.6A)
Authority
CN
China
Prior art keywords
page
page table
address
granularity
memory
Prior art date
Legal status
Active
Application number
CN202310968209.6A
Other languages
Chinese (zh)
Other versions
CN116681578A (en)
Inventor
顾德明
施宏彦
Current Assignee
Li Computing Technology Shanghai Co ltd
Nanjing Lisuan Technology Co ltd
Original Assignee
Li Computing Technology Shanghai Co ltd
Nanjing Lisuan Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Li Computing Technology Shanghai Co ltd and Nanjing Lisuan Technology Co ltd
Priority to CN202310968209.6A
Publication of CN116681578A
Application granted
Publication of CN116681578B
Active legal status
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/06: Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646: Configuration or reconfiguration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877: Cache access modes
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a memory management method, a graphics processing unit, a storage medium and a terminal device. The memory management method includes: receiving a read request or a write request, where the read request or the write request includes a virtual address; performing a hit test on the virtual address in a page directory to obtain a hit result, where the page directory includes page table granularity indication information used to indicate the paging granularity; and searching for the physical address corresponding to the virtual address in the corresponding page table according to the hit result, where the page table has the paging granularity indicated by the page table granularity indication information. This technical solution improves the flexibility and efficiency of memory management.

Description

Memory management method, graphics processing unit, storage medium and terminal equipment
Technical Field
The present disclosure relates to the field of graphics processing technologies, and in particular to a memory management method, a graphics processing unit, a storage medium, and a terminal device.
Background
An electronic device may manage memory using virtual memory technology, which decouples the memory addresses used by software from the real (physical) memory addresses. Virtual memory is commonly implemented by paging, which divides memory into pages of a fixed size and associates virtual addresses with physical addresses. A memory management unit (Memory Management Unit, MMU), located inside or outside the processor, translates virtual addresses to physical addresses when the processor accesses memory. The MMU maps virtual addresses to physical addresses in units of pages and maintains the mapping as page tables stored in dedicated memory regions; to learn an address translation, the MMU consults these page tables in memory.
In the prior art, pages come in two sizes, 4KB and 64KB, and conventional graphics processing unit (Graphics Processing Unit, GPU) page table management falls into two classes. The first class of GPUs supports only 4KB page tables and does not support mixing page tables of multiple granularities. The second class of GPUs supports 64KB page tables as well, but only in a single page table mode, in which a page table entry (Page Table Entry, PTE) points to either a 4KB page table or a 64KB page table.
However, with the first class of page table management, when the GPU has a very large memory, managing it at such fine granularity causes the memory traffic generated by page table accesses to increase sharply, degrading the operating efficiency of the GPU. With the second class, switching between page tables is complex, which also reduces the operating efficiency of the GPU.
Disclosure of Invention
The method and the device of the present application can improve the flexibility and efficiency of memory management.
In order to achieve the above purpose, the present application provides the following technical solutions:
in a first aspect, the present application provides a memory management method, including: receiving a read request or a write request, the read request or the write request comprising a virtual address; performing a hit test on the virtual address in a page directory to obtain a hit result, wherein the page directory comprises page table granularity indication information used to indicate the paging granularity; and searching for the physical address corresponding to the virtual address in the corresponding page table according to the hit result, wherein the page table has the paging granularity indicated by the page table granularity indication information.
Optionally, the page table entries of the first-level page table point to zero-level page tables having different page granularities.
Optionally, page table entries in zero-level page tables of different paging granularities that cover the same virtual address range are valid in a time-shared manner.
Optionally, performing the hit test on the virtual address in the page directory includes: performing a hit test on the page directory entries in at least one level of translation lookaside buffer according to the virtual address, wherein the page directory entries in the translation lookaside buffer comprise the page table granularity indication information.
Optionally, searching for the physical address corresponding to the virtual address in the corresponding page table according to the hit result includes: if the hit result is a hit, determining the address of at least one level of page table according to the address information and the page table granularity indication information in the page directory entry of the first-level translation lookaside buffer; and querying the at least one level of page table according to the virtual address to obtain the physical address.
Optionally, a unified cache is used to store each physical address.
Optionally, the paging granularities include 4KB and 64KB, and the physical addresses in multiple page tables that indicate pages of 64KB paging granularity are placed adjacently in the unified cache.
In a second aspect, the present application also discloses a graphics processing unit, the graphics processing unit comprising: an engine for generating a read request or a write request, the read request or the write request comprising a virtual address; and the memory management unit is used for executing the steps of the memory management method.
Optionally, the graphics processing unit includes: a graphics processor cluster, wherein the input end of the memory management unit is coupled with the output end of the graphics processor cluster; and/or a dynamic memory controller, wherein the input end of the memory management unit is coupled with the output end of the dynamic memory controller.
In a third aspect, the present application provides a terminal device, which is characterized by comprising the graphics processing unit according to the second aspect.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the steps of the memory management method.
Compared with the prior art, the technical scheme of the application has the following beneficial effects:
in the technical solution of the present application, a read request or a write request including a virtual address is received; a hit test is performed on the virtual address in a page directory to obtain a hit result, where the page directory includes page table granularity indication information used to indicate the paging granularity; and the physical address corresponding to the virtual address is searched for in the corresponding page table according to the hit result, where the page table has the paging granularity indicated by the page table granularity indication information. By placing page table granularity indication information in the page directory to indicate the paging granularity, page tables of different paging granularities can be supported, and memory can be switched between paging granularities without complex operations, improving the flexibility and efficiency of memory management.
Further, in the technical solution of the present application, page table entries in zero-level page tables of different paging granularities that cover the same virtual address range are valid in a time-shared manner. Making the page table entries in the zero-level page tables valid in a time-shared manner guarantees correct switching between paging granularities and thus the efficiency of memory management.
Drawings
Fig. 1 is a flowchart of a memory management method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-level TLB provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a PDE for a TLB provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a different page table provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a data arrangement manner in a unified cache according to an embodiment of the present application;
FIG. 6 is a block diagram of a graphics processing unit provided in an embodiment of the present application;
fig. 7 is a block diagram of another graphics processing unit provided in an embodiment of the present application.
Detailed Description
As described in the background, for the first class of page table management, when the GPU has a very large video memory, managing it at an excessively fine granularity causes the memory traffic generated by page table accesses to increase sharply, degrading the operating efficiency of the GPU. For the second class, switching between page tables is complex, which also reduces the operating efficiency of the GPU.
Specifically, for the second class of page table management, in single page table mode a PageTablePageSize field is added to the DXGK_PTE structure to indicate to the kernel mode driver the type of page table used (64KB or 4KB pages). The video memory manager chooses to use a 64KB page table for a virtual address range, mapping only 64KB-aligned allocations into that range, when every memory segment of the allocations mapped to the range supports 64KB pages. When a virtual address range is mapped with 64KB pages and this condition no longer holds (for example, an allocation has been committed to a system memory segment), the memory manager must switch the range from the 64KB page table to the 4KB page table by performing the following complex procedure: all contexts of the process are suspended; the existing PTEs are updated to point to 4KB pages, for which the driver receives an UpdatePageTable operation; the level 1 PTE pointing to the page table is updated to reflect the new page size (PageTablePageSize = DXGK_PTE_PAGE_TABLE_PAGE_SIZE_4KB), for which the driver receives another UpdatePageTable operation; finally, all contexts of the process are resumed.
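The switch procedure above can be sketched in Python. This is a minimal software model for illustration only: `VideoMemoryManager` and its method names are hypothetical placeholders, not the actual WDDM driver interface.

```python
# Hypothetical sketch of the 64KB -> 4KB page-table switch procedure.
PAGE_4KB = 4 * 1024
PAGE_64KB = 64 * 1024

class VideoMemoryManager:
    def __init__(self):
        self.contexts_suspended = False
        self.level1_page_size = PAGE_64KB   # current granularity of the range
        self.level0_ptes = {}               # virtual page base -> physical address

    def switch_64kb_range_to_4kb(self, virt_base, phys_base):
        # 1. Suspend all contexts of the process.
        self.contexts_suspended = True
        # 2. Rewrite the zero-level PTEs to point at sixteen 4KB pages
        #    instead of one 64KB page (driver gets an UpdatePageTable).
        for i in range(PAGE_64KB // PAGE_4KB):
            self.level0_ptes[virt_base + i * PAGE_4KB] = phys_base + i * PAGE_4KB
        # 3. Update the level-1 PTE to reflect the new page size
        #    (driver gets a second UpdatePageTable).
        self.level1_page_size = PAGE_4KB
        # 4. Resume all contexts of the process.
        self.contexts_suspended = False
        return len(self.level0_ptes)
```

The point of the sketch is the number of serialized steps: every granularity change suspends all contexts and rewrites two page-table levels, which is exactly the overhead the present application avoids.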
In addition, in single page table mode, converting a page table of 4KB PTEs into a page table of 64KB PTEs requires that each group of 16 consecutive 4KB pages occupy physically contiguous addresses, which is difficult to achieve.
According to the technical scheme, the page granularity is indicated by setting the page table granularity indication information in the page directory, so that the support of page tables with different page granularities can be realized, the switching of memories under different page granularities can be realized without complex operation, and the flexibility and the high efficiency of memory management are improved.
To better understand the embodiments of the present invention, the following describes in detail how the memory management unit (Memory Management Unit, MMU) maps virtual addresses to physical addresses of the memory. First, a brief introduction to the concept of virtual memory. As those skilled in the art will appreciate, programs are loaded into memory for execution; as programs grew larger, memory could no longer hold a complete program, and the concept of virtual memory arose. The basic idea of virtual memory is that the total size of a program's code, data, and stack may exceed the size of physical memory: the operating system keeps the currently used portions in memory and the unused portions on disk. For example, on a computer with only 4MB of memory, the operating system can run a 16MB program by keeping 4MB of the program's content in memory at any time and swapping program segments between memory and disk as needed.
As those skilled in the art will appreciate, in a computer system the virtual address space is the range of virtual addresses a process can access. Its size is typically determined by the computer's instruction set architecture; for example, a 32-bit GPU provides a virtual address space of 0-0xFFFFFFFF (4GB). An address in the virtual address space is called a virtual address. Corresponding to these are the physical address space and physical addresses: the physical address space is the physical address range of the memory, and an address in it is called a physical address. Typically, the physical address space is smaller than the virtual address space and can be mapped into it. For example, a 32-bit x86 host with 256MB of memory has a virtual address space of 0-0xFFFFFFFF (4GB) and a physical address space of 0x00000000-0x0FFFFFFF (256MB).
In the prior art, most machines use virtual memory: a virtual address (or linear address) is not sent directly to the memory address bus but to the memory management unit (MMU), which translates it into a physical address. That is, the MMU implements the mapping from a program's virtual addresses to the physical addresses of memory.
To achieve mapping of virtual addresses of programs to physical addresses of memory, MMUs introduce paging (paging) mechanisms. Specifically, the virtual address space is divided in units of pages (pages), and pages in the virtual address space may be referred to as virtual pages. Accordingly, the physical address space is also divided in units of pages, and pages of the physical address space may be referred to as physical pages (or physical page frames), where virtual pages and physical pages are the same size.
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, the main execution body of the memory management method is an MMU, that is, the MMU executes the following steps. It will of course be appreciated that the memory management method may be performed by any other suitable entity.
Specifically, the memory management method specifically includes the following steps:
step 101: receiving a read request or a write request, wherein the read request or the write request comprises a virtual address;
step 102: performing hit test on the virtual address in a page directory to obtain a hit result, wherein the page directory comprises page table granularity indicating information, and the page table granularity indicating information is used for indicating the granularity of paging;
step 103: and searching a physical address in a memory corresponding to the virtual address in a corresponding page table according to the hit result, wherein the page table has the paging granularity indicated by the page table granularity indication information.
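Steps 101 to 103 can be sketched as a minimal software model. The dictionary-based page directory, the `granularity` flag, and all names here are illustrative assumptions for clarity, not the patent's actual hardware format.

```python
# Minimal sketch of steps 101-103: translate a request's virtual address
# using a page directory whose entries carry granularity indication info.
GRANULARITY = {"4KB": 4 * 1024, "64KB": 64 * 1024}

def translate(virtual_addr, page_directory, page_tables):
    """Return the physical address for a read/write request's virtual address."""
    # Step 102: hit-test the virtual address against the page directory.
    for (start, end), pde in page_directory.items():
        if start <= virtual_addr < end:
            # The PDE's granularity indication selects the paging granularity.
            page_size = GRANULARITY[pde["granularity"]]
            # Step 103: look up the page table that has this paging granularity.
            table = page_tables[pde["table_id"]]
            vpn = (virtual_addr - start) // page_size
            offset = virtual_addr % page_size
            return table[vpn] + offset
    return None  # miss: walk lower-level structures / raise a page fault
```

Switching a range between 4KB and 64KB in this model only changes the PDE's `granularity` field; no second page-table query is needed.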
It will be appreciated that in particular implementations, each of the steps of the method described above may be implemented in a software program running on a processor integrated within a chip or chip module. The method may also be implemented by combining software with hardware, which is not limited in this application.
Unlike existing virtual-to-physical memory management, the embodiment of the present invention can flexibly select the paging granularity through the page table granularity indication information in the page directory entry. Taking paging granularities of 4KB and 64KB as an example, the partitioning unit of the finally translated physical address can be switched flexibly between 4KB and 64KB. Because switching between different page tables is driven by the page table granularity indication information, no second page table query is needed, which avoids added access latency and improves GPU performance.
In addition, when the GPU supports multiple page sizes, the page size currently in use can be confirmed from the page table granularity indication information during cache allocation, so that the correct number of cache lines is allocated, further ensuring the efficiency of memory management.
In the embodiment of the invention, for the same physical address range in memory, multiple page tables of different paging granularities point to that address range. Taking paging granularities of 4KB and 64KB as an example, a 4KB page table and a 64KB page table point to the same address range simultaneously; that is, the same address range is managed by both the 4KB page table and the 64KB page table.
In a specific application scenario, which paging granularity's page table is selected for address mapping can be managed dynamically by the operating system: through dynamic switching, the operating system instructs the MMU, via the page table granularity indication information, to select the corresponding page table for address mapping. Because multiple page tables of different paging granularities point to the same address range, the mapping granularity of an address can be switched quickly, reducing the complexity of switching between page tables, improving page table switching efficiency, and thus the operating efficiency of the GPU.
In an implementation of step 101, the MMU may receive a read request or a write request from a GPU engine (Engine). A read request requests data to be read from memory, and a write request requests data to be written to memory. Either request carries a virtual address, which can be translated into the physical address of the memory used for the read or write.
In an implementation of step 102, the MMU may perform a hit test on the virtual address in the page directory. Specifically, the page directory records a plurality of virtual addresses and the physical address information corresponding to each; performing a hit test on the page directory with the virtual address in the read or write request determines the physical address corresponding to that virtual address.
In one non-limiting embodiment, the MMU implements the virtual-to-physical address mapping with multi-level page tables. Specifically, in the two-level management mode, this mapping involves looking up the page directory and then the page table. The page directory stores the base addresses of multiple page tables, and each page table stores multiple page table entries; a page table entry records the physical base address of the physical page corresponding to the virtual address.
In one non-limiting embodiment, to reduce latency, the MMU uses a translation lookaside buffer (Translation Lookaside Buffer, TLB), a cache of address translations. If the mapping between the virtual page and the physical page is found in the TLB, i.e., on a TLB hit, the MMU can translate the address without accessing the page tables in memory, greatly improving performance.
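The role of the TLB can be illustrated with a toy software model. A real TLB is a hardware structure; the capacity, dictionary representation, and FIFO eviction here are arbitrary assumptions for illustration.

```python
# Toy TLB in front of a page-table walk: on a hit the walk is skipped.
class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}          # virtual page number -> physical page number
        self.hits = self.misses = 0

    def lookup(self, vpn):
        """Return the cached translation, or None on a TLB miss."""
        if vpn in self.entries:
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        return None            # caller must walk the page tables in memory

    def fill(self, vpn, ppn):
        """Install a translation after a page-table walk resolves the miss."""
        if len(self.entries) >= self.capacity:
            # Crude FIFO eviction: drop the oldest entry.
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn
```

On a hit, the translation is returned without any memory access; only on a miss does the MMU fall back to walking the page tables.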
In a specific implementation, the MMU concurrently performs a hit test on the page directory entries (page directory entry, PDE) in at least one level of TLB according to the virtual address, where a page directory entry PDE points to a 2M memory block (BLOCK) or to a page table.
In one embodiment, the TLB may have multiple levels, as shown in FIG. 2, and the MMU may hit-test the page directory entries PDE in the multiple TLB levels simultaneously. If the first-level TLB hits, the virtual address can be used directly to perform the final PTE lookup.
For example, when 4KB pages are used to store the page tables, a 5-level page table can be employed to resolve a 49-bit virtual address, i.e., a four-level TLB, a three-level TLB, a two-level TLB, a one-level TLB, and the PTE.
Referring to the PDE of the first-level TLB shown in FIG. 3, the PDE includes page table granularity indication information indicating the paging granularity, e.g., 4KB or 64KB. Correspondingly, the PDE of the first-level TLB also includes a 4KB address field and a 64KB address field.
The PDE also includes block/page indication information indicating whether a page address or a block address is selected. Other fields record attributes of the current page such as validity, security, target area, and sparseness.
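The PDE fields described above can be modeled as a packed bitfield. The bit positions chosen below are assumptions for illustration only, not the patent's actual PDE encoding.

```python
# Illustrative packing of the first-level-TLB PDE fields: a valid bit,
# a block/page bit, a granularity bit, and an address field.
VALID_BIT       = 1 << 0
BLOCK_BIT       = 1 << 1   # 1 = 2M block address, 0 = page-table address
GRANULARITY_BIT = 1 << 2   # 1 = 64KB paging granularity, 0 = 4KB

def make_pde(address, *, is_block, is_64kb, valid=True):
    pde = address & ~0xFFF             # address assumed 4KB-aligned
    if valid:
        pde |= VALID_BIT
    if is_block:
        pde |= BLOCK_BIT
    if is_64kb:
        pde |= GRANULARITY_BIT
    return pde

def decode_pde(pde):
    return {
        "address": pde & ~0xFFF,
        "valid": bool(pde & VALID_BIT),
        "is_block": bool(pde & BLOCK_BIT),
        "granularity": "64KB" if pde & GRANULARITY_BIT else "4KB",
    }
```

Because the granularity indication travels inside the PDE itself, the MMU learns the paging granularity in the same lookup that produces the page table address.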
Further, the MMU determines the address of the at least one level of page table based on the address information and the page table granularity indication information in the PDE of the first-level TLB. The MMU fetches the page table from memory according to that address, and then queries the at least one level of page table according to the virtual address to obtain the corresponding physical address.
Referring specifically to fig. 4, in the multi-level page table, except for the last-level page table (i.e., the zero-level page table), whose PTEs point directly to pages in memory, the page table entries of the other levels all point to the base address of the next-level page table; the other levels are therefore called paging structures.
In the embodiment of the invention, a PTE of the first-level page table points to a 4KB zero-level page table and a 64KB zero-level page table at the same time. The paging granularity of the physical addresses in the 4KB zero-level page table is 4KB; correspondingly, the paging granularity of the physical addresses in the 64KB zero-level page table is 64KB.
Further, PTEs in zero-level page tables of different paging granularities that cover the same virtual address range are valid in a time-shared manner. Specifically, both pointers in an entry of the first-level page table may have their valid flags set, but the entries in the zero-level page tables covering the same virtual address range cannot be valid at the same time. For example, when an allocation covered by a 4KB PTE is placed in a memory segment supporting a 64KB page size, the 4KB PTE becomes invalid and the corresponding 64KB PTE becomes valid.
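The time-shared validity rule can be sketched as follows; the class and method names are illustrative, and the invariant is simply that exactly one of the two zero-level tables covering a range is valid at any time.

```python
# Sketch of time-shared validity: the 4KB and 64KB zero-level page tables
# covering one virtual address range are never valid simultaneously.
class ZeroLevelPair:
    def __init__(self):
        self.valid_4kb = True      # start with 4KB granularity valid
        self.valid_64kb = False

    def move_to_64kb_segment(self):
        # Allocation moved to a segment supporting 64KB pages:
        # switch validity, no page-table restructuring required.
        self.valid_4kb, self.valid_64kb = False, True

    def move_to_4kb_segment(self):
        # e.g. allocation committed to a system memory segment:
        self.valid_4kb, self.valid_64kb = True, False

    def consistent(self):
        # Time-shared validity: exactly one table valid at any moment.
        return self.valid_4kb != self.valid_64kb
```

Granularity switching reduces to flipping valid flags, which is what keeps the switch cheap compared with the prior-art procedure.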
In one non-limiting embodiment, a unified cache is used to store the physical addresses; this unified design facilitates flexible switching between 64KB and 4KB page tables.
Further, as shown in FIG. 5, the physical addresses of the multiple page tables indicating pages of 64KB page granularity are adjacently placed in the unified cache.
To maximize the cache hit rate, the page table addresses for 64KB pages can be packed contiguously; for the same cache size, this improves coverage by a factor of 16, increasing the effective cache capacity and hit rate and thus further improving the MMU's memory management efficiency.
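A back-of-envelope check of the 16x coverage factor, assuming a 64-byte cache line and 4-byte PTEs (both sizes are illustrative assumptions, not values stated in the patent):

```python
# One cache line holds a fixed number of PTEs; the memory each line
# covers scales with the paging granularity of those PTEs.
LINE_BYTES, PTE_BYTES = 64, 4
ptes_per_line = LINE_BYTES // PTE_BYTES            # 16 PTEs per line

coverage_4kb  = ptes_per_line * 4 * 1024           # bytes covered per line
coverage_64kb = ptes_per_line * 64 * 1024
ratio = coverage_64kb // coverage_4kb              # 64KB/4KB = 16
```

The ratio is the page-size ratio itself, so packing 64KB page-table addresses densely lets the same cache cover sixteen times more memory.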
With continued reference to FIG. 1, in an implementation of step 103, if the lookup in the page directory and, for example, in the PTE hits, the physical address corresponding to the virtual address can be determined. The MMU returns the physical address to the GPU engine, which performs the corresponding read or write in the memory space pointed to by the physical address.
In one embodiment, if the lookup misses in the page directory, for example in the PTE, a unified cache allocation request for the PTE is completed based on the paging granularity specified by the first-level TLB, and the read or write request is placed in a waiting queue.
In one embodiment, if the first-level TLB entry corresponding to the PTE misses, the physical address of the corresponding first-level TLB entry is recovered from the information in the upper-level TLBs (the four-level, three-level, or two-level TLB), and a corresponding miss request is issued. If the information needed to translate the address required by the first-level TLB also misses in the four-level/three-level/two-level TLB, the miss requests must be issued level by level, and the addresses are obtained in the order four-level TLB, three-level TLB, two-level TLB, one-level TLB, PTE.
In a specific application scenario, the virtual address contains the information needed to find the physical address. For example, with a 4-byte (32-bit) virtual address, the virtual address can typically be divided into 3 parts:
Bits 22 to 31: the top 10 bits, the index into the page directory;
Bits 12 to 21: the index into the page table;
Bits 0 to 11: the lower 12 bits, the intra-page offset.
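The three-field split above can be expressed directly as bit operations:

```python
# Split a 32-bit virtual address into directory index, table index, offset.
def split_virtual_address(va):
    directory_index = (va >> 22) & 0x3FF   # bits 22-31: page directory index
    table_index     = (va >> 12) & 0x3FF   # bits 12-21: page table index
    page_offset     = va & 0xFFF           # bits 0-11: intra-page offset
    return directory_index, table_index, page_offset
```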
As those skilled in the art will appreciate, each process has its own dedicated virtual address space and page directory for addressing, while all processes in the system share the kernel's virtual address space and the kernel's page directory, which each process can enter through a system call. The GPU has a register CR3 that holds the page directory base address: when a process is scheduled, CR3 points to the page directory base address of the current process, and on a process switch, CR3 is switched to the new process's page directory base address. To translate a virtual address into a physical address, the physical page holding the page directory is first found from the value in register CR3. Then, the 10-bit value in bits 22 to 31 of the virtual address (the highest 10 bits) indexes the corresponding page directory entry (PDE), which holds the physical address of the page table corresponding to the virtual address. With the physical address of the page table, the 10-bit value in bits 12 to 21 of the virtual address is used as an index to find the corresponding page table entry (Page Table Entry, PTE), which holds the physical address of the physical page corresponding to this virtual address. Finally, the physical address corresponding to the virtual address is obtained by adding the lowest 12 bits of the virtual address, i.e., the intra-page offset, to the physical address of the physical page.
Typically, a page directory has 1024 entries, which the highest 10 bits of the virtual address exactly index (2^10 = 1024). A page table likewise has 1024 entries, which the middle 10 bits of the virtual address exactly index. The lowest 12 bits of the virtual address, the intra-page offset, exactly index 4KB (2^12 = 4096), i.e., every byte in one physical page.
As those skilled in the art will appreciate, a 32-bit pointer addresses the range 0x00000000-0xFFFFFFFF (4GB); that is, a 32-bit pointer can address every byte of the 4GB address space. One page table entry maps 4KB of address space to physical memory, so one page table of 1024 entries maps 1024 x 4KB = 4MB of address space. One page directory entry corresponds to one page table, and one page directory has 1024 entries, i.e., 1024 page tables, each responsible for a 4MB address space; together they map 1024 x 4MB = 4GB of address space. A process has one page directory; the page directory and page tables together guarantee that every page in the 4GB address space can be mapped to physical memory in units of pages.
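The arithmetic above can be checked directly:

```python
# 1024 PTEs of 4KB each per table, 1024 tables per directory.
entries_per_directory = 1 << 10      # 1024 PDEs
entries_per_table     = 1 << 10      # 1024 PTEs
page_size             = 1 << 12      # 4KB

per_table_span = entries_per_table * page_size        # 4MB per page table
total_span = entries_per_directory * per_table_span   # 4GB per page directory
```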
Each process has its own 4 GB address space, from 0x00000000 to 0xFFFFFFFF. The mapping from a process's virtual addresses to physical memory addresses is realized through that process's own set of page directory and page tables. Since each process has its own page directory and page tables, the physical memory mapped by the address spaces of different processes is not identical.
For more details of this embodiment, reference may be made to the foregoing embodiments, which are not described herein.
Referring to fig. 6, the present application also discloses a graphics processing unit 60. The graphic processing unit 60 may include:
an engine 601 for generating a read request or a write request, the read request or the write request comprising a virtual address.
The memory management unit 602 is configured to perform memory management. Specifically, the memory management unit 602 may perform the steps of the memory management method in the foregoing embodiments.
In this embodiment, the engine 601 sends a read request or a write request to the memory management unit 602. The memory management unit 602 translates the virtual address into a physical address and returns it to the engine 601. The engine 601 then reads or writes data using the physical address.
According to this embodiment, indicating the page granularity through page table granularity indication information set in the page directory enables support for page tables of different page granularities, so that memory can be switched between different page granularities without complex operations, improving the flexibility and efficiency of memory management.
In this embodiment, the GPU may include multiple sets of memory management units 602, which may be located in different modules of the GPU. Referring specifically to FIG. 7, a memory management unit 602 may be located at the exit of a graphics processor cluster (Graphic Processor Cluster, GPC) 603 or at the exit of a dynamic memory controller (Dynamic Memory Controller, DMC) 604.
In a particular embodiment, the graphics processing unit 60 may also include a graphics processor cluster 603. An input of the memory management unit 602 is coupled to an output of the graphics processor cluster 603. In other words, the memory management unit 602 may translate virtual addresses from the graphics processor cluster 603 into physical addresses and return the physical addresses to the graphics processor cluster 603. For example, the graphics processor cluster 603 may write its operation results into the memory space indicated by the physical address.
In one embodiment, graphics processing unit 60 may also include a dynamic memory controller 604. An input of the memory management unit 602 is coupled to an output of the dynamic memory controller 604. In other words, the memory management unit 602 may translate virtual addresses from the dynamic memory controller 604 to physical addresses and return the physical addresses to the dynamic memory controller 604. For example, the dynamic memory controller 604 may output the read data from the memory space indicated by the physical address to an external device.
Each of the apparatuses, and each of the modules/units included in the products described in the above embodiments, may be a software module/unit, a hardware module/unit, or a combination of software and hardware modules/units. For a device or product applied to or integrated on a chip, each module/unit it contains may be implemented in hardware such as a circuit; alternatively, at least some modules/units may be implemented as a software program running on a processor integrated inside the chip, with the remaining modules/units (if any) implemented in hardware such as a circuit. For a device or product applied to or integrated in a chip module, each module/unit may likewise be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; alternatively, at least some modules/units may be implemented as a software program running on a processor integrated in the chip module, with the rest (if any) implemented in hardware such as a circuit. For a device or product applied to or integrated with a terminal device, each module/unit may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip or a circuit module) or in different components of the terminal device; alternatively, at least some modules/units may be implemented as a software program running on a processor integrated within the terminal device, with the rest (if any) implemented in hardware such as a circuit.
The embodiment of the application also discloses a storage medium, which is a computer-readable storage medium having a computer program stored thereon; when run, the computer program can perform the steps of the method shown in FIG. 1. The storage medium may include read-only memory (ROM), random access memory (RAM), magnetic or optical disks, and the like. The storage medium may also include non-volatile or non-transitory memory, and the like.
The embodiment of the application also discloses a terminal device, which comprises the graphics processing unit described above; alternatively, the terminal device comprises a memory and a processor, the memory storing a computer program executable on the processor, and the processor performing the steps of the memory management method described above when executing the computer program.
The term "plurality" as used in the embodiments herein refers to two or more.
The descriptions "first", "second", etc. in the embodiments of the present application are used only to illustrate and distinguish the described objects; they imply no ordering and no particular limitation on the number of devices in the embodiments, and should not be construed as limiting the embodiments of the present application in any way.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative; the division of units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be physically included separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform part of the steps of the methods described in the embodiments of the present application.
Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims (8)

1. A memory management method, comprising:
receiving a read request or a write request, the read request or the write request comprising a virtual address;
performing a hit test on the virtual address in a page directory to obtain a hit result, wherein the page directory comprises page table granularity indication information, the page table granularity indication information being used for indicating the granularity of paging, selecting the paging granularity, and switching between different page tables;
and searching, according to the hit result, a corresponding page table for the physical address in memory corresponding to the virtual address, wherein the page table has the page granularity indicated by the page table granularity indication information; for a same physical address range in the memory, a plurality of zero-level page tables with different page granularities point to the address range, page table entries of a first-level page table point to the head addresses of the zero-level page tables with different page granularities, a plurality of pointers in the first-level page table are provided with valid flags, the pointers respectively indicating that page table entries covering a same virtual address range in the zero-level page tables with different page granularities are valid, and a unified cache is adopted to store each physical address.
2. The memory management method according to claim 1, wherein said hit testing said virtual address in the page directory comprises:
and carrying out hit test on the page directory entry in at least one stage of address translation bypass buffer according to the virtual address, wherein the page directory entry in the address translation bypass buffer comprises the page table granularity indication information.
3. The memory management method according to claim 2, wherein the searching the physical address corresponding to the virtual address in the corresponding page table according to the hit result includes:
if the hit result is a hit, determining the address of at least one level of page table according to the page table granularity indication information and the address information in the page directory entry of the at least one stage of address translation bypass buffer;
and querying the at least one stage of page table according to the virtual address to obtain the physical address.
4. The memory management method of claim 1, wherein the page granularity comprises 4KB and 64KB, and physical addresses in a plurality of page tables indicating pages of 64KB page granularity are adjacently disposed in the unified cache.
5. A graphics processing unit, comprising:
an engine for generating a read request or a write request, the read request or the write request comprising a virtual address;
a memory management unit for performing the steps of the memory management method according to any one of claims 1 to 4.
6. The graphics processing unit of claim 5, comprising:
the input end of the memory management unit is coupled with the output end of the graphic processor cluster;
and/or a dynamic memory controller, wherein the input end of the memory management unit is coupled with the output end of the dynamic memory controller.
7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the memory management method according to any of claims 1 to 4.
8. A terminal device comprising a memory and a processor, the memory having a computer program stored thereon, characterized in that the processor performs the steps of the memory management method according to any of claims 1 to 4 when executing the computer program, or the terminal device comprises a graphics processing unit according to claim 5 or 6.
CN202310968209.6A 2023-08-02 2023-08-02 Memory management method, graphic processing unit, storage medium and terminal equipment Active CN116681578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310968209.6A CN116681578B (en) 2023-08-02 2023-08-02 Memory management method, graphic processing unit, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN116681578A CN116681578A (en) 2023-09-01
CN116681578B true CN116681578B (en) 2023-12-19

Family

ID=87785907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310968209.6A Active CN116681578B (en) 2023-08-02 2023-08-02 Memory management method, graphic processing unit, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN116681578B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6671791B1 (en) * 2001-06-15 2003-12-30 Advanced Micro Devices, Inc. Processor including a translation unit for selectively translating virtual addresses of different sizes using a plurality of paging tables and mapping mechanisms
CN103793331A (en) * 2012-10-31 2014-05-14 安凯(广州)微电子技术有限公司 Method and device for managing physical memories
CN105247494A (en) * 2013-05-06 2016-01-13 微软技术许可有限责任公司 Instruction set specific execution isolation
CN111512290A (en) * 2017-12-27 2020-08-07 华为技术有限公司 File page table management techniques
CN113094300A (en) * 2019-12-23 2021-07-09 英特尔公司 Page table mapping mechanism
CN114546898A (en) * 2022-03-04 2022-05-27 重庆大学 TLB management method, device, equipment and storage medium
CN114925001A (en) * 2022-05-18 2022-08-19 上海壁仞智能科技有限公司 Processor, page table prefetching method and electronic equipment
CN115292214A (en) * 2022-08-11 2022-11-04 海光信息技术股份有限公司 Page table prediction method, memory access operation method, electronic device and electronic equipment
CN115658564A (en) * 2022-10-31 2023-01-31 龙芯中科技术股份有限公司 Address translation cache control method, device, equipment and medium
CN115760548A (en) * 2022-11-17 2023-03-07 上海天数智芯半导体有限公司 Memory management device, image processing chip and address conversion method
CN115757260A (en) * 2023-01-09 2023-03-07 摩尔线程智能科技(北京)有限责任公司 Data interaction method, graphics processor and graphics processing system
CN116383102A (en) * 2023-05-30 2023-07-04 北京微核芯科技有限公司 Translation look-aside buffer access method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858198B2 (en) * 2015-06-26 2018-01-02 Intel Corporation 64KB page system that supports 4KB page operations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on hybrid-granularity page management mechanisms for heterogeneous memory; Wang Xiaoyuan; China Doctoral Dissertations Full-text Database, Information Science and Technology Series (No. 01); I137-28 *


Similar Documents

Publication Publication Date Title
US8566563B2 (en) Translation table control
US5119290A (en) Alias address support
US5956756A (en) Virtual address to physical address translation of pages with unknown and variable sizes
US8966219B2 (en) Address translation through an intermediate address space
KR101563659B1 (en) Extended page size using aggregated small pages
KR100372293B1 (en) Cacheable Properties for Virtual Addresses in Virtual and Physical Index Caches
US20090187731A1 (en) Method for Address Translation in Virtual Machines
US20210089470A1 (en) Address translation methods and systems
US10133676B2 (en) Cache memory that supports tagless addressing
US20040117588A1 (en) Access request for a data processing system having no system memory
US11803482B2 (en) Process dedicated in-memory translation lookaside buffers (TLBs) (mTLBs) for augmenting memory management unit (MMU) TLB for translating virtual addresses (VAs) to physical addresses (PAs) in a processor-based system
CN112631961A (en) Memory management unit, address translation method and processor
KR960001945B1 (en) Apparatus for increasing the number of hits in a translation
CN112596913B (en) Method and device for improving performance of transparent large page of memory, user equipment and storage medium
CN113010452A (en) Efficient virtual memory architecture supporting QoS
CN116681578B (en) Memory management method, graphic processing unit, storage medium and terminal equipment
US10817433B2 (en) Page tables for granular allocation of memory pages
US6766435B1 (en) Processor with a general register set that includes address translation registers
CN114595187A (en) System on chip for performing address translation and method of operation thereof
US20100058025A1 (en) Method, apparatus and software product for distributed address-channel calculator for multi-channel memory
US20240104023A1 (en) A/D Bit Storage, Processing, and Modes
WO2023064609A1 (en) Translation tagging for address translation caching
CN114461391A (en) Remappable GPU (graphics processing Unit) main memory access management method and system
WO2023064590A1 (en) Software indirection level for address translation sharing
CN112306392A (en) Method, device and system for addressing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant