CN111552654B - Processor for detecting redundancy of page table walk - Google Patents

Processor for detecting redundancy of page table walk

Info

Publication number
CN111552654B
Authority
CN
China
Prior art keywords
page table
address
walk
index
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911105191.7A
Other languages
Chinese (zh)
Other versions
CN111552654A
Inventor
朴城范
赛义德·穆因
崔周熙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR 10-2019-0022184 (KR20200098354A)
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN111552654A publication Critical patent/CN111552654A/en
Application granted granted Critical
Publication of CN111552654B publication Critical patent/CN111552654B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
                    • G06F 12/02 Addressing or allocation; Relocation
                        • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
                            • G06F 12/10 Address translation
                                • G06F 12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
                                    • G06F 12/1036 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
                                • G06F 12/1009 Address translation using page tables, e.g. page table structures
                            • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
                                • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
                                    • G06F 12/0873 Mapping of cache memory to specific storage devices or parts thereof
                • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
                    • G06F 2212/10 Providing a specific technical effect
                        • G06F 2212/1016 Performance improvement
                            • G06F 2212/1021 Hit rate improvement
                        • G06F 2212/1028 Power efficiency
                    • G06F 2212/15 Use in a specific computing environment
                        • G06F 2212/151 Emulated environment, e.g. virtual machine
                    • G06F 2212/65 Details of virtual memory and virtual address translation
                        • G06F 2212/651 Multi-level translation tables
                        • G06F 2212/654 Look-ahead translation
                    • G06F 2212/68 Details of translation look-aside buffer [TLB]
                        • G06F 2212/681 Multi-level TLB, e.g. microTLB and main TLB
                        • G06F 2212/684 TLB miss handling
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The processor includes a page table walk cache storing address translation information, and a page table walker. The page table walker obtains a first output address indicated by a first index of a first input address by looking up the address translation information and at least a portion of a page table, and compares a level of match between a second index of a second input address and the first index of the first input address with a walk cache hit level obtained by looking up the page table walk cache using the second index.

Description

Processor for detecting redundancy of page table walk
Cross Reference to Related Applications
This patent application claims priority to U.S. provisional patent application No. 62/803,227, filed with the U.S. Patent and Trademark Office on February 8, 2019, and claims priority to Korean patent application No. 10-2019-0022184, filed with the Korean Intellectual Property Office on February 26, 2019, the disclosures of which are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates to processors, and more particularly to processors configured to detect redundancy of page table walks.
Background
A system on chip (hereinafter referred to as "SoC") is an integrated circuit in which components of an electronic system or Intellectual Property (IP) are integrated. The term "intellectual property" and the acronym "IP" both refer to unique circuits and circuit components, each of which may be individually protected by intellectual property rights. As used in the description herein, the term and acronym may be synonymous with similar terms such as "IP block" or "IP circuit". The processor of the SoC may execute a plurality of applications that a user wants, and for this purpose the processor may exchange data with a storage device. However, because the user wants to execute the plurality of applications quickly and simultaneously, the processor must use limited storage resources efficiently. The processor may use a virtual memory space and may manage a page table that includes mapping information between the virtual memory space and the physical memory space of the storage device. The processor may look up the page table and may perform a translation between a virtual address of the virtual memory space and a physical address of the physical memory space.
Disclosure of Invention
Embodiments of the present disclosure provide a processor that detects redundancy of page table walks.
According to an exemplary embodiment, a processor includes a page table walk cache and a page table walker. The page table walk cache stores address translation information. The page table walker obtains a first output address indicated by a first index of a first input address by looking up the address translation information and at least a portion of a page table. The page table walker also compares a match level to a walk cache hit level. The match level is between a second index of a second input address and the first index of the first input address. The walk cache hit level is obtained by looking up the page table walk cache using the second index.
According to another exemplary embodiment, a processor includes a page table walk cache and a page table walker. The page table walk cache stores address translation information. The page table walker obtains a first intermediate address indicated by a first index of a first input address by looking up the address translation information and at least a portion of a first page table of a first stage. The page table walker also obtains a first output address indicated by a second index of each first intermediate address by looking up the address translation information and at least a portion of a second page table of a second stage. The page table walker also compares a match level to a walk cache hit level. The match level is between a fourth index of each second intermediate address indicated by a third index of a second input address and the second index of each first intermediate address. The walk cache hit level is obtained by looking up the page table walk cache using the fourth index.
According to yet another exemplary embodiment, a processor includes a page table walk cache and a page table walker. The page table walk cache stores address translation information. The page table walker obtains a first intermediate address indicated by a first index of a first input address by looking up the address translation information and at least a portion of a first page table of a first stage. The page table walker also obtains a first output address indicated by a second index of each first intermediate address by looking up the address translation information and at least a portion of a second page table of a second stage. The page table walker also compares a first match level to a first walk cache hit level. The first match level is between a third index of a second input address and the first index of the first input address. The first walk cache hit level is obtained by looking up the page table walk cache using the third index. The page table walker also compares a second match level to a second walk cache hit level. The second match level is between a fourth index of each second intermediate address indicated by the third index of the second input address and the second index of each first intermediate address. The second walk cache hit level is obtained by looking up the page table walk cache using the fourth index.
Drawings
Fig. 1 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Fig. 2 shows a block diagram of any one of the first core to the fourth core in the SoC of fig. 1.
Fig. 3 illustrates application programs and an operating system executable by the SoC and the main memory of fig. 1.
Fig. 4 shows a mapping between virtual address space and physical address space of the application of fig. 3.
FIG. 5 illustrates operations of the page table walker of FIG. 2 to perform a page table walk.
Fig. 6 illustrates application programs and operating systems executable by the SoC and the main memory of fig. 1.
Fig. 7 shows a mapping between virtual address space and physical address space of the application of fig. 6.
Fig. 8A and 8B illustrate a flow chart of operations of the page table walker of fig. 2 to perform a page table walk based on a first stage and a second stage.
FIG. 9 illustrates a detailed block diagram and operation of the page table walker of FIG. 2.
FIG. 10 illustrates another detailed block diagram and operation of the page table walker of FIG. 2.
FIG. 11 illustrates another detailed block diagram and operation of the page table walker of FIG. 2.
FIG. 12 illustrates another detailed block diagram and operation of the page table walker of FIG. 2.
FIG. 13 illustrates a flow chart of a page table walk performed by the page table walker of FIG. 2 to translate virtual addresses to physical addresses.
Fig. 14A and 14B illustrate a flow chart of operations performed by the page table walker of fig. 2 to perform a first stage of page table walk for translating virtual addresses to intermediate physical addresses and a second stage of page table walk for translating intermediate physical addresses to physical addresses.
Detailed Description
Fig. 1 shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device 100 may include a SoC 1000 (system on chip) and a main memory 2000. The electronic device 100 may also be referred to as an "electronic system". For example, the electronic device 100 may be a desktop computer, a notebook computer, a workstation, a server, a mobile device, or the like. SoC 1000 may be one chip in which various (multiple different) systems are integrated, such as on a single integrated substrate and/or such as within an integrated housing.
SoC 1000 may function as an Application Processor (AP) to control the overall operation of electronic device 100. The SoC 1000 may include a first core 1100_1 to a fourth core 1100_4 (each core may also be referred to as a "processor" or a "Central Processing Unit (CPU)"), a cache memory 1300, and a bus 1400. Although not shown in the figures, soC 1000 may also include any other Intellectual Property (IP), such as a memory controller. Each of the first core 1100_1 to the fourth core 1100_4 may execute various software such as an application program, an operating system, and/or a device driver. The number of the first core 1100_1 to the fourth core 1100_4 of fig. 1 is merely an example, and the SoC 1000 may include one or more homogeneous or heterogeneous cores.
The first core 1100_1 to the fourth core 1100_4 may include a first MMU 1200_1 (memory management unit) to a fourth MMU 1200_4, respectively. The first to fourth MMUs 1200_1 to 1200_4 may translate a virtual address used by software into a physical address used in a hardware storage device, such as the cache memory 1300 in the SoC 1000, the main memory 2000 external to the SoC 1000, and/or a secondary memory (not shown) external to the SoC 1000. The first to fourth MMUs 1200_1 to 1200_4 may translate virtual addresses to physical addresses when the first to fourth cores 1100_1 to 1100_4 execute first to fourth software, respectively. The first to fourth MMUs 1200_1 to 1200_4 may manage address translation information (e.g., translation tables) between virtual addresses and physical addresses. The first to fourth MMUs 1200_1 to 1200_4 may allow an application to have a private (dedicated) virtual memory space and may allow the first to fourth cores 1100_1 to 1100_4 to perform a plurality of tasks.
The cache memory 1300 may be connected to each of the first core 1100_1 to the fourth core 1100_4 and may be shared by the first core 1100_1 to the fourth core 1100_4. For example, the cache memory 1300 may be implemented using registers, flip-flops, Static Random Access Memory (SRAM), or a combination thereof. For the first core 1100_1 to the fourth core 1100_4, the cache memory 1300 may have a faster access speed than the main memory 2000. The cache memory 1300 may store instructions, data, addresses, address translation information, etc. of, or associated with, the first core 1100_1 through the fourth core 1100_4.
Bus 1400 may connect internal IPs of the SoC 1000, such as the cores 1100_1 to 1100_4, the cache memory 1300, etc., or may provide an access path to the main memory 2000 to an internal IP of the SoC 1000. Bus 1400 may be of an AMBA (advanced microcontroller bus architecture) standard bus protocol type. The bus type of AMBA may be AHB (advanced high-performance bus), APB (advanced peripheral bus), or AXI (advanced extensible interface).
Main memory 2000 may be in communication with the SoC 1000. The main memory 2000 may provide the first core 1100_1 to the fourth core 1100_4 with a larger capacity than the cache memory 1300. Main memory 2000 may store instructions, data, addresses, address translation information, etc. provided from the SoC 1000. For example, main memory 2000 may be a Dynamic Random Access Memory (DRAM). In an embodiment, the electronic device 100 may include any other hardware storage device (not shown) in communication with the SoC 1000, such as a Solid State Drive (SSD), a Hard Disk Drive (HDD), or a memory card, in addition to the main memory 2000.
Fig. 2 shows a block diagram of any one of the first core to the fourth core in the SoC of fig. 1. The core 1100 may be any one of the first core 1100_1 to the fourth core 1100_4 of fig. 1. Core 1100 may include fetch unit 1110, decode unit 1120, register renaming unit 1130, issue/retire unit 1140, ALU 1150 (arithmetic logic unit), FPU 1160 (floating point unit), branch checking unit 1170, load/store unit 1180, L2 cache 1190, and MMU 1200. All components of the core 1100 (including detailed components of the MMU 1200) may be implemented in hardware using analog circuitry, digital circuitry, logic circuitry, clock circuitry, flip-flops, registers, and the like.
Fetch unit 1110 may fetch instructions with reference to memory addresses stored in a program counter (not shown) that tracks the memory addresses of instructions, and may store the fetched instructions in an instruction register (not shown). For example, the instructions may be stored in a memory such as a cache memory (not shown) in the core 1100, the cache memory 1300, or the main memory 2000. The decode unit 1120 may decode the instructions stored in the instruction register and may determine what each instruction is to execute so that the instructions can be executed. Register renaming unit 1130 may map logical registers specified by instructions to physical registers in the core 1100. Register renaming unit 1130 may map logical registers specified by successive instructions to different physical registers and may remove dependencies between the instructions. The issue/retire unit 1140 may control when decoded instructions are issued (or sent) to the pipeline and when returned results are retired.
The ALU 1150 may perform arithmetic operations, logical operations, or shift operations based on issued instructions. The ALU 1150 may be provided with operation codes, operands, etc. required for operation from memory. FPU 1160 may perform floating point operations. The branch checking unit 1170 may check the prediction of the branch direction of the branch instruction to improve the flow of the pipeline. The load/store unit 1180 may execute load and store instructions, may generate virtual addresses for use in load and store operations, and may load data from the L2 cache 1190, cache memory 1300, or main memory 2000, or may store data in the L2 cache 1190, cache memory 1300, or main memory 2000.
The MMU is a component of a core, such as the core 1100. The MMU 1200 may be any one of the first MMU 1200_1 through the fourth MMU 1200_4 of FIG. 1. The MMU 1200 may include a TLB 1210 (translation look-aside buffer), a page table walker 1220, a page table walk cache 1230 (also referred to herein as a page table walk buffer 1230), a TTBR 1241 (translation table base register), and a VTTBR 1242 (virtual translation table base register). The page table walker 1220 is described below and may be implemented as a unit similar to the other units of the core 1100. The page table walker 1220 may be implemented as or include units that perform logical operations, including fetching or initiating fetching by the core 1100, and comparing or initiating comparing by the core 1100. The most recently accessed page translations may be cached in the TLB 1210. For each memory access performed by the core 1100, the MMU 1200 may check whether a translation for a given virtual address is cached in the TLB 1210. Multiple entries may be stored in the TLB 1210, each divided into a tag and data. For example, information of the virtual address may be located in the tag, and information of the physical address may be located in the data. In the case where a translation (mapping information) of a virtual address is cached in the TLB 1210 (in the case of a TLB hit), the translation may be immediately available. In the case where there is no valid translation of the virtual address in the TLB 1210 (in the case of a TLB miss), the translation of the virtual address should be updated in the TLB 1210 by a page table walk, which involves searching the page tables stored in the cache memory 1300 and/or the main memory 2000. A page table may be a data structure that stores a mapping between virtual addresses and physical addresses.
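As a minimal software sketch of the TLB behavior just described (an illustration only, not the claimed hardware; the 4 KB page size, entry format, and absence of a replacement policy are assumptions), the lookup/fill pattern can be modeled in Python as follows:

    # Minimal model of a TLB: entries map a virtual page number (tag) to a
    # physical page number (data). A 4 KB page size is assumed for illustration.
    PAGE_SHIFT = 12
    PAGE_MASK = (1 << PAGE_SHIFT) - 1

    class TLB:
        def __init__(self):
            self.entries = {}  # tag (virtual page number) -> data (physical page number)

        def lookup(self, virtual_address):
            vpn = virtual_address >> PAGE_SHIFT
            if vpn in self.entries:        # TLB hit: translation is immediately available
                return (self.entries[vpn] << PAGE_SHIFT) | (virtual_address & PAGE_MASK)
            return None                    # TLB miss: a page table walk is required

        def fill(self, virtual_address, physical_address):
            # Called after a page table walk has produced the final translation.
            self.entries[virtual_address >> PAGE_SHIFT] = physical_address >> PAGE_SHIFT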
The page table walker 1220 may perform a page table walk for virtual addresses that are not found or looked up from the TLB 1210. Page table walker 1220 may "walk" or look up the page table to translate virtual addresses into physical addresses. The page table walker 1220 may obtain address translation information about the virtual address from the page table stored in the cache memory 1300 or the main memory 2000.
The page table walk buffer 1230 may buffer or store partial or full address translation information for virtual addresses. For example, page tables may be built hierarchically. The page table walker 1220 may access or walk the page table in order (sequentially), may retrieve partial address translation information from the page table, and may store the retrieved information in the page table walk buffer 1230. Also, the page table walker 1220 may skip accesses or lookups to some of the page tables stored in the cache memory 1300 or the main memory 2000 by looking up partial address translation information previously (already) cached in the page table walk buffer 1230, and may speed up page table walks.
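The page table walk cache 1230 can thus be thought of as a small cache of partially completed translations, keyed by the index fields that have already been resolved. A minimal sketch, assuming entries keyed by (level, index-prefix) pairs, is shown below; the keying scheme and interface are illustrative assumptions, not the patented structure:

    # Sketch of the page table walk cache: it stores the output address produced at
    # a given level for a given prefix of index fields, so that a later walk sharing
    # that prefix can skip the corresponding memory accesses.
    class PageTableWalkCache:
        def __init__(self):
            self.entries = {}   # (level, indexes[0..level]) -> output address of that level

        def lookup(self, indexes):
            """Return (hit_level, output_address); hit_level is -1 on a complete miss."""
            hit_level, output = -1, None
            for level in range(len(indexes)):
                key = (level, tuple(indexes[:level + 1]))
                if key not in self.entries:
                    break                   # deeper levels cannot hit either
                hit_level, output = level, self.entries[key]
            return hit_level, output

        def update(self, level, indexes, output_address):
            self.entries[(level, tuple(indexes[:level + 1]))] = output_address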
TTBR 1241 may store a base address indicating a page table. VTTBR 1242 may also store a base address indicating a page table. The values of the base addresses stored in TTBR 1241 and VTTBR 1242 may vary with software executable by core 1100 (e.g., application programs, operating systems, etc.).
Fig. 3 illustrates main memory and the SoC of fig. 1 executable application programs and operating systems. Fig. 4 shows a mapping between virtual address space and physical address space of the application of fig. 3. Fig. 3 and 4 will be described together.
Referring to fig. 3, an operating system may manage hardware including SoC 1000 and main memory 2000, and software including application AP1 and/or application AP2. The operating system may operate to allow application AP1 and/or application AP2 to execute on SoC 1000 and main memory 2000. The number of applications AP1 and AP2 shown in fig. 3 is merely an example. Referring to fig. 4, when executing the first application program AP1, the operating system may map the virtual address space of a process to a physical address space. When executing the second application AP2, the operating system may map the virtual address space of the process to a physical address space. The operating system can efficiently use the limited capacity of the memory installed on the hardware by managing the above-described mapping.
FIG. 5 illustrates operations of the page table walker of FIG. 2 to perform a page table walk. The page table walker 1220 may receive virtual addresses from the load/store unit 1180. A virtual address received by the page table walker 1220 may be an address that was not found in the TLB 1210 (i.e., a TLB miss address). The multi-bit (e.g., K bits, where "K" is a natural number) portion of the virtual address may be divided into an L0 index, an L1 index, an L2 index, an L3 index, and an offset region. The index of the virtual address may be divided according to the levels L0 to L3. Furthermore, the page tables may be partitioned or hierarchically structured according to the levels L0 to L3. Thus, the indexes may reflect fragments of the multi-bit portion, each having a different weight, and the page tables may be arranged in a hierarchy constructed corresponding to the weights of the fragments of the multi-bit portion of the virtual address. In fig. 5, the number of levels, the number of indexes, and the number of page tables are merely examples. The page table walker 1220 may sequentially look up the page tables built according to the L0-to-L3 hierarchy. Regarding the search order, "L0" may be the first level and "L3" may be the last level.
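For example, assuming a 48-bit virtual address with four 9-bit index fields and a 12-bit page offset (these widths are assumptions for illustration; the text above only requires a K-bit address divided into L0-L3 indexes and an offset), the split can be expressed as:

    # Split a virtual address into L0..L3 indexes and a page offset.
    # Field widths (9/9/9/9 bits of index, 12 bits of offset) are assumed, not claimed.
    INDEX_BITS = 9
    OFFSET_BITS = 12
    LEVELS = 4

    def split_virtual_address(va):
        offset = va & ((1 << OFFSET_BITS) - 1)
        indexes = []
        for level in range(LEVELS):          # L0 is the most significant index field
            shift = OFFSET_BITS + (LEVELS - 1 - level) * INDEX_BITS
            indexes.append((va >> shift) & ((1 << INDEX_BITS) - 1))
        return indexes, offset

    # Example: indexes, offset = split_virtual_address(0x00007F5A3C2D1234)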
First, the page table walker 1220 may look up the entry indicated by the L0 index of the virtual address from the entries of the L0 page table indicated by the base address stored in TTBR 1241. The L0 page table may be indexed by an L0 index. The descriptors stored in each entry may include attributes and output addresses (indicated by dark shading). For example, the attributes may include permission bits, access bits, dirty bits, security bits, etc. associated with the output address. The page table walker 1220 may obtain the descriptor included in the entry indicated by the L0 index of the virtual address, and may store or update partial information of the descriptor (i.e., partial address translation information regarding the L0 index of the virtual address) in the page table walk buffer 1230.
The page table walker 1220 may look up the entry indicated by the L1 index of the virtual address among the entries of the L1 page table indicated by the L0 output address of the descriptor obtained from the L0 page table. In other words, the page table walker 1220 may find an entry indicated by the L1 index of the virtual address among entries of the L1 page table indicated based on the L0 output address of the descriptor acquired from the L0 page table. The page table walker 1220 may obtain the descriptor included in the entry indicated by the L1 index of the virtual address, and may store or update partial information of the descriptor (i.e., partial address translation information regarding the L1 index of the virtual address) in the page table walk buffer 1230.
The page table walker 1220 may look up the entry indicated by the L2 index of the virtual address among the entries of the L2 page table indicated by the L1 output address of the descriptor obtained from the L1 page table. In other words, the page table walker 1220 may look up the entry indicated by the L2 index of the virtual address among the entries of the L2 page table indicated based on the L1 output address of the descriptor obtained from the L1 page table. The page table walker 1220 may obtain the descriptor included in the entry indicated by the L2 index of the virtual address, and may store or update partial information of the descriptor (i.e., partial address translation information regarding the L2 index of the virtual address) in the page table walk buffer 1230.
The page table walker 1220 may look up the entry indicated by the L3 index of the virtual address among the entries of the L3 page table indicated by the L2 output address of the descriptor obtained from the L2 page table. In other words, the page table walker 1220 may find the entry indicated by the L3 index of the virtual address among the entries of the L3 page table indicated based on the L2 output address of the descriptor obtained from the L2 page table. The page table walker 1220 may obtain the descriptor included in the entry indicated by the L3 index of the virtual address, and may store or update partial information of the descriptor (i.e., partial address translation information regarding the L3 index of the virtual address) in the page table walk buffer 1230. Moreover, because the level corresponding to the L3 index and the L3 page table is the last level, the page table walker 1220 may also store the descriptor in the TLB 1210.
MMU 1200 may look up the page indicated by the offset of the virtual address among the pages indicated by the L3 output address of the descriptor fetched from the L3 page table and may calculate the final physical address (e.g., final physical address = L3 output address + offset). In the case where a mapping (i.e., a final translation) between the virtual address and the L3 output address in the L3 page table is cached in the TLB 1210, the MMU 1200 may immediately calculate the final physical address by using the offset and the output address cached in the TLB 1210, and may return the final physical address to the load/store unit 1180.
In an embodiment, the page table walker 1220 may perform a page table walk for one virtual address, and then may perform a page table walk for another virtual address. When the page table walk is performed for the one virtual address, partial address translation information may already be stored in the page table walk cache 1230. In the case where partial address translation information for a partial index of the other virtual address is stored in the page table walk buffer 1230, the page table walker 1220 may skip the operation of acquiring a descriptor at a specific level. For example, in the case where the partial address translation information of the L0 index is already stored in the page table walk buffer 1230 (i.e., when a hit occurs in the page table walk buffer), the page table walker 1220 may skip the operation of looking up the L0 page table. As with the L0-level operation described above, the page table walker 1220 may perform the remaining L1-, L2-, and L3-level operations.
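A compact software sketch of the single-stage walk described above, including the level-skipping enabled by the page table walk cache, is given below. The in-memory page tables are modeled as nested dictionaries and the walk cache as a plain dictionary keyed by index prefixes; these representations are assumptions made for illustration only:

    # Sketch of a single-stage, four-level page table walk with walk-cache skipping.
    # `page_tables` maps a table base (output address) to {index: descriptor}, and a
    # descriptor is modeled as just the output address it carries.
    def walk(indexes, offset, ttbr, page_tables, walk_cache):
        # Find the deepest index prefix already resolved by the walk cache.
        table_address, start_level = ttbr, 0
        for level in reversed(range(len(indexes))):
            cached = walk_cache.get(tuple(indexes[:level + 1]))
            if cached is not None:
                table_address, start_level = cached, level + 1
                break
        # Walk the remaining levels, caching each partial translation.
        for level in range(start_level, len(indexes)):
            table_address = page_tables[table_address][indexes[level]]   # memory access
            walk_cache[tuple(indexes[:level + 1])] = table_address
        return table_address + offset                                    # final physical address

For instance, if the prefix (0x12,) for the L0 level is already cached, the L0 table access is skipped and the walk resumes at L1, mirroring the walk-cache hit described above.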
Fig. 6 shows main memory and the SoC executable application programs and operating system of fig. 1. Fig. 7 shows a mapping between virtual address space and physical address space of the application of fig. 6. Fig. 6 and 7 will be described together, and the description will focus on the differences between the embodiments based on fig. 6 and 7 and the embodiments based on fig. 3 and 4.
Referring to fig. 6, a first operating system may manage hardware including the SoC 1000 and the main memory 2000 and software including application AP1 and/or application AP2. A second operating system may manage the same hardware including the SoC 1000 and the main memory 2000 and software including application AP3 and/or application AP4. There may be an additional layer of software, i.e., a hypervisor, between the first operating system, the second operating system, and the hardware. The hypervisor may be used to operate two or more operating systems by using limited hardware resources.
Referring to fig. 7, when executing the first application program AP1, the first operating system may map the virtual address space of the process to an intermediate physical address space. The first operating system may also map the virtual address space of the process to an intermediate physical address space when executing the second application AP2. Similarly, when executing the third application AP3, the second operating system may map the virtual address space of the process to an intermediate physical address space. The second operating system may also map the virtual address space of the process to an intermediate physical address space when executing the fourth application AP4. Each of the first operating system and the second operating system may manage the first stage of address translation between virtual addresses and intermediate physical addresses. The hypervisor may manage the second stage of address translation between intermediate physical addresses and physical addresses. Compared to the case of FIG. 4, the hypervisor used in the computer system provides the second stage of address translation in addition to the other features described above.
Fig. 8A and 8B illustrate a flow chart of operations of the page table walker of fig. 2 to perform a page table walk based on a first stage and a second stage. Fig. 8A and 8B will be described together. In fig. 8A and 8B, "S", "L", and "PT" denote a stage, a level, and a page table, respectively. The page table walker 1220 may receive, from the load/store unit 1180, a virtual address that was not found in the TLB 1210. The index of the virtual address may be divided according to the levels L0 to L3. The page tables may be divided into a first stage S1 and a second stage S2, and the page tables may be divided or hierarchically constructed according to the levels L0 to L3 in each stage. As described with reference to fig. 6 and 7, the hypervisor may be used for virtualization. The page table walker 1220 may calculate an S1L0 Intermediate Physical Address (IPA) (also referred to as an "intermediate address") by adding the L0 index of the virtual address and the base address stored in the TTBR 1241.
The page table walker 1220 may find an entry indicated by the L0 index of the S1L0 intermediate physical address from among entries of the S2L0 page table indicated by the base address stored in the VTTBR 1242, may acquire the descriptor included in the entry, and may store partial information of the descriptor (i.e., partial address translation information regarding the L0 index of the S1L0 intermediate physical address) in the page table walk buffer 1230. The page table walker 1220 may look up an entry indicated by the L1 index of the S1L0 intermediate physical address from among entries of the S2L1 page table indicated by the S2L0 output address, may acquire the descriptor included in the entry, and may store partial information of the descriptor (i.e., partial address translation information regarding the L1 index of the S1L0 intermediate physical address) in the page table walk buffer 1230. As in the operation associated with the S2L1 page table, the page table walker 1220 may perform the operations associated with the S2L2 and S2L3 page tables indicated by the S2L1 and S2L2 output addresses, respectively. The page table walker 1220 may find an entry indicated by the offset of the S1L0 intermediate physical address among entries of the S1L0 page table indicated by the S2L3 output address of the descriptor acquired from the S2L3 page table, may acquire the descriptor included in the entry, and may store partial information of the descriptor (i.e., partial address translation information regarding the offset of the S1L0 intermediate physical address) in the page table walk buffer 1230.
The page table walker 1220 may calculate the S1L1 intermediate physical address by adding the L1 index of the virtual address to the S1L0 output address obtained from the S1L0 page table. As with the second-stage page table walk performed on the S1L0 intermediate physical address, the page table walker 1220 may perform a second-stage page table walk on the S1L1 intermediate physical address. As with the second-stage page table walk performed on the S1L1 intermediate physical address, the page table walker 1220 may perform a second-stage page table walk on the S1L2 intermediate physical address, the S1L3 intermediate physical address, and the final intermediate physical address, respectively. A second-stage page table walk refers to the operation of looking up the S2L0 to S2L3 page tables and obtaining descriptors, and a first-stage page table walk refers to the operation of looking up the S1L0 to S1L3 page tables and obtaining descriptors.
The page table walker 1220 may calculate the S1L0 intermediate physical address by adding the L0 index of the virtual address and the base address stored in the TTBR 1241, and may perform a second-stage page table walk on the S1L0 intermediate physical address. The page table walker 1220 may also calculate the S1L1 intermediate physical address by adding the L1 index of the virtual address to the S1L0 output address, and may perform a second-stage page table walk on the S1L1 intermediate physical address. The page table walker 1220 may also calculate the S1L2 intermediate physical address by adding the L2 index of the virtual address to the S1L1 output address, and may perform a second-stage page table walk on the S1L2 intermediate physical address. The page table walker 1220 may also calculate the S1L3 intermediate physical address by adding the L3 index of the virtual address to the S1L2 output address, and may perform a second-stage page table walk on the S1L3 intermediate physical address. The page table walker 1220 may also calculate the final intermediate physical address by adding the offset of the virtual address to the S1L3 output address, and may perform a second-stage page table walk on the final intermediate physical address. After performing the second-stage page table walk on the final intermediate physical address, the page table walker 1220 may store the last fetched descriptor in the page table walk buffer 1230. Moreover, the page table walker 1220 may also store the last fetched descriptor in the TLB 1210 as the final result. The above-described operation of the page table walker 1220 may be referred to as a "nested walk".
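The nested walk can be sketched in the same illustrative style. Here the stage-2 tables are walked once for every stage-1 table address and once for the final intermediate physical address; split_ipa is an assumed helper that splits an intermediate physical address into index fields and an offset in the same way the virtual address is split, and the memory representations are illustration-only assumptions:

    # Sketch of a nested (two-stage) walk: every stage-1 table address, and the final
    # intermediate physical address, is itself translated by a full stage-2 walk.
    # `s1_memory` maps a physical address to the stage-1 descriptor stored there, and
    # `s2_tables` maps a stage-2 table base to {index: descriptor}.
    def stage2_translate(ipa, vttbr, s2_tables, split_ipa):
        indexes, offset = split_ipa(ipa)
        table_address = vttbr
        for index in indexes:                               # S2L0 .. S2L3 lookups
            table_address = s2_tables[table_address][index] # memory access
        return table_address + offset                       # physical address of the IPA

    def nested_walk(va_indexes, va_offset, ttbr, s1_memory, vttbr, s2_tables, split_ipa):
        base_ipa = ttbr
        for va_index in va_indexes:                         # S1L0 .. S1L3 levels
            entry_ipa = base_ipa + va_index                 # e.g. S1L0 IPA = TTBR + L0 index
            entry_pa = stage2_translate(entry_ipa, vttbr, s2_tables, split_ipa)
            base_ipa = s1_memory[entry_pa]                  # S1Lx output address (an IPA)
        final_ipa = base_ipa + va_offset                    # final intermediate physical address
        return stage2_translate(final_ipa, vttbr, s2_tables, split_ipa)

Counting the accesses in this sketch matches the total discussed below: each of the four stage-1 levels and the final intermediate physical address triggers a full four-lookup stage-2 walk, plus four stage-1 descriptor fetches.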
MMU 1200 may look up a page indicated by an offset of the virtual address among pages indicated by S2L3 output addresses of descriptors fetched from the S2L3 page table, and may obtain a physical address from the looked up page (e.g., final physical address = S2L3 output address + offset). That is, in the case where a mapping (i.e., a final translation) between the virtual address and the S2L3 output address is cached in the TLB 1210, the MMU 1200 may immediately calculate the physical address by using the offset and the output address cached in the TLB 1210, and may return the physical address.
An example in which the number of levels per stage is 4 and the number of stages is 2 is shown in fig. 8A and 8B, but the teachings of the present disclosure are not limited thereto. For example, the number of levels of the first stage may be "m" ("m" being a natural number of 1 or more), and the number of levels of the second stage may be "n" ("n" being a natural number of 1 or more). In the case where the page table walker 1220 performs a page table walk of a virtual address under the conditions of a TLB miss and a page table walk cache miss, the number of times a descriptor is fetched from the page tables may be "(m+1)×(n+1)−1". Of course, during the first-stage and second-stage page table walks performed by the page table walker 1220, the page table walker 1220 may skip operations of fetching descriptors by referring to the partial address translation information stored in the page table walk buffer 1230.
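As a worked example of this count, with the configuration of figs. 8A and 8B (four levels per stage, i.e. m = n = 4), the maximum number of descriptor fetches for a single virtual address under a TLB miss and a page table walk cache miss is

    (m+1)\times(n+1)-1 = (4+1)\times(4+1)-1 = 24,

that is, up to 24 memory accesses for one translation, which is why skipping even a single level through the page table walk cache is worthwhile.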
Fig. 9-11 illustrate detailed block diagrams and operations of the page table walker of fig. 2. Fig. 9 to 11 will be described together. In fig. 9 to 11, it is assumed that the page table walker performs the page table walk described with reference to fig. 3 to 5.
The page table walker 1220 may include a page table walk scheduler 1221, walkers 1223 and 1224, and a redundant walk detector 1225. All of the components of the page table walker 1220 may be implemented in hardware using analog circuitry, digital circuitry, logic circuitry, clock circuitry, flip-flops, registers, and the like. In other words, the page table walker 1220 may physically be regarded as a page table walker circuit, whether implemented as a processor/memory combination (e.g., microprocessor/memory) that stores and executes software instructions, or as logic circuitry such as an application-specific integrated circuit. The page table walk scheduler 1221 may receive one or more input addresses (virtual addresses) that have not been looked up from the TLB 1210. The page table walk scheduler 1221 may manage entries each storing or including the L0 to L3 indexes, a hazard bit, a hazard level bit, and a hazard ID bit of an input address. Information associated with a walk request for an input address may be input to each entry of the page table walk scheduler 1221.
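A sketch of one scheduler entry as described above (field names follow the text; the types and defaults are assumptions for illustration):

    # Sketch of one page table walk scheduler entry, following the fields named above.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class SchedulerEntry:
        indexes: List[int]                  # L0..L3 index fields of the input address
        valid: bool = False                 # "Y" for valid entries in Figs. 9-11
        hazard: bool = False                # marked when a redundant walk is predicted
        hazard_level: Optional[int] = None  # e.g. 1 meaning "L1"
        hazard_id: Optional[int] = None     # which walker owns the overlapping walk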
The hazard/replay controller 1222 may examine or identify the hazard bit, the hazard level bit, and the hazard ID bit of each entry, and may provide the input address stored in each entry, or information associated with a walk request for that input address, to either one of the walkers 1223 and 1224. Each of the walkers 1223 and 1224 may perform a page table walk with respect to the input address provided from the page table walker 1220 and may obtain an output address. The input address may be a virtual address, and each output address obtained by each of the walkers 1223 and 1224 may be a physical address. Unlike the illustration in fig. 9, the number of walkers may be more than 2, and the page table walker 1220 may perform 2 or more page table walks in parallel or simultaneously.
The redundant walk detector 1225 may calculate a match level between an input address for which it has been determined that a page table walk is to be performed by the walkers 1223 and 1224 and an input address for which it has not yet been determined whether to continue performing a page table walk. The match level may indicate the degree to which the index of one input address matches the index of another input address. Because a higher match level means a higher similarity between the input addresses, the results of performing the respective page table walks for the input addresses may be similar to each other and may be repetitive (or redundant). The match level may also be referred to as a "redundant hit level". The match level may be calculated by the redundant walk detector 1225 or by the page table walk scheduler 1221.
The redundant walk detector 1225 may manage entries that store or include the input addresses provided to the walkers 1223 and 1224. For example, the entries of the redundant walk detector 1225 may be provided with the input addresses of the entries input to the page table walk scheduler 1221 without modification. The redundant walk detector 1225 may use the index of the input address stored in each entry to look up the page table walk buffer 1230 so as to obtain and store a walk cache hit level. The walk cache hit level may be used in a comparison by the redundant walk detector 1225 (i.e., against the match level computed from the index of the input address) to detect and predict in advance the redundancy of a page table walk for the input address. As a practical matter, using the walk cache hit level in this way improves efficiency and avoids unnecessary power consumption and unnecessary processing whenever the comparison makes it possible to avoid redundancy. Also, when the walkers 1223 and 1224 described above store, in the page table walk buffer 1230, the output addresses respectively indicated by the indexes of the input addresses, the redundant walk detector 1225 may obtain and store the walk cache hit level updated at that time.
In the case where the descriptor indicated by a given index has already been scheduled to be fetched from the memory in which the page table is stored, or has already been stored in the page table walk buffer 1230, it is not necessary to fetch that descriptor from the memory again. The redundant walk detector 1225 may compare the match level with the walk cache hit level, and may mark the hazard bit based on the comparison result. The redundant walk detector 1225 may thereby detect and predict in advance, based on the comparison result, the redundancy of a page table walk for the input address. Redundancy of a page table walk for an input address means that at least a portion of the operation of looking up a page table, using an index of the input address that matches an index of an input address of another page table walk that has already been determined to be performed, is redundant. The page table walker 1220 may perform page table walks in which redundancy does not exist instead of page table walks in which redundancy exists, thereby improving the performance of the SoC 1000 and reducing the power consumption of the SoC 1000. The redundant walk detector 1225 may also compare the match level with the walk cache hit level and may clear a marked hazard bit based on the comparison result. The method of detecting redundancy of page table walks is described more fully below.
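The decision rule applied by the redundant walk detector 1225 can be sketched as follows; this is a software illustration of the comparison described above, not the claimed circuit:

    # Sketch of the redundancy check: compute the match level between a new input
    # address and an in-flight input address, compare it with the new address's walk
    # cache hit level, and mark a hazard when the match level is higher.
    def match_level(indexes_a, indexes_b):
        """Highest level (0 = L0) up to which the two index sequences agree; -1 if none."""
        level = -1
        for a, b in zip(indexes_a, indexes_b):
            if a != b:
                break
            level += 1
        return level

    def hazard(new_indexes, inflight_indexes, walk_cache_hit_level):
        # True: part of the new walk would repeat a fetch that the in-flight walk is
        # already performing and that the walk cache cannot yet supply.
        return match_level(new_indexes, inflight_indexes) > walk_cache_hit_level

In words, a hazard is marked only when the new walk overlaps an in-flight walk more deeply than the page table walk cache can already serve, because that deeper overlapping portion would otherwise be fetched from memory twice.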
Referring to fig. 9, it is assumed that input addresses are input to entry 0 and entry 1 of the page table walk scheduler 1221, respectively, that the hazard bits, hazard level bits, and hazard ID bits are in a cleared state, and that the result of a previously performed page table walk (i.e., address translation information) is stored in entry 0 of the page table walk buffer 1230. The number of entries is not limited to the examples of fig. 9 to 11. In fig. 9 to 11, the valid bit of a valid entry among the plurality of entries may be denoted by "Y".
The page table walk scheduler 1221 may assign the input address IA0 input to entry 0 to the walker 1223 (which is in a wait state), and the walker 1223 may perform a page table walk for the input address IA0. The walker 1223 may check (or determine) whether the output addresses indicated by the L0, L1, L2, and L3 indexes 0x12, 0x23, 0x34, and 0x78 can be looked up from the page table walk buffer 1230. Referring to FIG. 9, the output address 0x100 indicated by the L0 index 0x12 has already been stored in the page table walk buffer 1230 (an L0-level hit occurs in the page table walk buffer 1230). The redundant walk detector 1225 may use the L0, L1, L2, and L3 indexes 0x12, 0x23, 0x34, and 0x78 of the input address IA0 to look up the page table walk buffer 1230 so as to obtain or calculate a walk cache hit level of "L0" for the input address IA0. Also, when the output address 0x100 indicated by the L0 index 0x12 is stored in the page table walk buffer 1230, the redundant walk detector 1225 may indicate that the walk cache hit level of the input address IA0 is "L0" (Y). Because the output address 0x100 indicated by the L0 index 0x12 is already stored in the page table walk buffer 1230, the operation of fetching the output address 0x100 from memory may be skipped. However, because the output address indicated by the L1 index 0x23 is not stored in the page table walk buffer 1230 (i.e., a miss occurs in the page table walk buffer 1230), the walker 1223 may initiate (or begin) fetching the output address indicated by the L1 index 0x23 from memory.
Referring to fig. 10, the page table walk scheduler 1221 may assign the input address IA1 input to entry 1 to the walker 1224. The walker 1224 may perform a page table walk for the input address IA1. The walker 1224 may check whether the output addresses indicated by the L0, L1, L2, and L3 indexes 0x12, 0x23, 0x9A, and 0xBC can be looked up from the page table walk buffer 1230. Referring to FIG. 10, the output address 0x100 indicated by the L0 index 0x12 has already been stored in the page table walk buffer 1230. The redundant walk detector 1225 may use the L0, L1, L2, and L3 indexes 0x12, 0x23, 0x9A, and 0xBC to look up the page table walk buffer 1230 so as to obtain or calculate a walk cache hit level of "L0" for the input address IA1. Also, when the output address 0x100 indicated by the L0 index 0x12 is stored in the page table walk buffer 1230, the redundant walk detector 1225 may indicate that the walk cache hit level of the input address IA1 is "L0" (Y).
Because the output address 0x100 indicated by the L0 index 0x12 is already stored in the page table walk buffer 1230, the operation of fetching the output address 0x100 from memory may be skipped. Because the output address indicated by L1 index 0x23 is not stored in page table walk buffer 1230, the walker 1224 may initiate a fetch of the output address indicated by L1 index 0x23 from memory.
In the case where both of the walkers 1223 and 1224 fetch the output address indicated by the L1 index 0x23 from the memory, the walkers 1223 and 1224 would fetch the same output address, and thus the operations of the walkers 1223 and 1224 would have redundancy and would be repeated. Because the walker 1223 first starts to fetch the output address indicated by the L1 index 0x23 from the memory, the operation of the walker 1224 to fetch the output address indicated by the L1 index 0x23 from the memory would be redundant and would be repeated. The redundancy of the page table walk for the input address IA1 lies in the operation of fetching the output address indicated by the L1 index 0x23 from the L1 page table stored in the memory. Thus, the redundancy of the page table walk to be performed by the walker 1224 can be predicted and/or detected so as to prevent such redundancy.
To detect the redundancy of the page table walk to be performed by the walker 1224, the redundant walk detector 1225 may compare the input addresses IA0 and IA1 in units of segment-based indexes or levels, where each segment is a different portion of the input virtual address. Increasing index levels reflect finer granularity of the input virtual addresses, and, as described herein, the higher the match between the current input virtual address and existing and/or previous input virtual addresses, the more redundancy in processing can be avoided. The L0 index 0x12 and the L1 index 0x23 of the input address IA1 may match (be equal to) the L0 index 0x12 and the L1 index 0x23 of the input address IA0, respectively. The redundant walk detector 1225 may calculate the match level between the input addresses IA0 and IA1 to be "L1". Also, the redundant walk detector 1225 may calculate the walk cache hit level "L0" of the input address IA0, an input address different from the input address IA1 used to calculate the match level. The redundant walk detector 1225 may compare the match level L1 between the input address IA0 and the input address IA1 with the walk cache hit level L0 of the input address IA1.
Because the match level L1 is greater than (or higher than) the walk cache hit level L0, the redundant walk detector 1225 may mark the hazard bit (Y) of entry 1 of the page table walk scheduler 1221. The marked hazard bit indicates that the match level L1 of the input address IA1 is greater than the walk cache hit level L0 and that there is redundancy in the page table walk for the input address IA1. In the case where the hazard bit is marked, execution of the page table walk for the input address IA1 in the walker 1224 may be canceled. Alternatively, the walker 1224 may perform a page table walk for an input address stored in another entry (e.g., entry 2, 3, or 4) of the page table walk scheduler 1221. The redundant walk detector 1225 may thereby prevent redundant use of the walker 1224. As a practical matter, using the walk cache hit level in this way allows the redundancy to be avoided, improves efficiency, and avoids unnecessary power consumption and unnecessary processing.
In the example above, the page table walk for the input address IA1 is performed by the walker 1224 and is then canceled in the case where the hazard bit is marked. In another embodiment, the page table walk scheduler 1221 may first check whether the hazard bit of the input address IA1 is marked, and may then provide the input address IA1 to the walker 1224. In this case, the page table walk may be performed after the redundancy of the page table walk for the input address IA1 is eliminated (i.e., after the hazard bit is cleared).
The redundant walk detector 1225 may mark the hazard level bit of entry 1 of the page table walk scheduler 1221 as "1". Here, "1" denotes "L1" among the levels used to hierarchically construct the page tables and is only an exemplary value. The hazard level bit may represent the highest level of the matching indexes of the input addresses IA0 and IA1, or may represent another level of the matching indexes of the input addresses IA0 and IA1. The redundant walk detector 1225 may mark the hazard ID of entry 1 of the page table walk scheduler 1221 as "0". The hazard ID may indicate which one of the walkers 1223 and 1224 (the walker 1223 in the example above) performs the page table walk for the input address IA0, whose indexes are partly the same as those of the input address IA1.
Referring to fig. 11, the walker 1223 may complete fetching the output address 0x200 indicated by the L1 index 0x23 of the input address IA0 from the memory, and may store the output address 0x200 in entry 1 of the page table walk buffer 1230 to fill entry 1. Partial address translation information for the L1 index 0x23 of the input address IA0 may thus be stored and updated in the page table walk buffer 1230. When the output address 0x200 is stored in entry 1 of the page table walk buffer 1230, the redundant walk detector 1225 may update the walk cache hit level of the input address IA0 and the input address IA1 to "L1". For example, the walk cache hit level may be calculated as the level to which the index of the input address corresponding to the most recently fetched output address belongs. Thus, the walk cache hit level used to reduce redundancy may be dynamically updated based on the operation of the redundant walk detector 1225.
Because the walk cache hit level L1 of the input address IA0 is updated, the redundant walk detector 1225 may compare the match level L1 of the input address IA1 with the walk cache hit level L1 of the input address IA0. Because the match level L1 is not greater than (i.e., is equal to) the walk cache hit level L1, the redundant walk detector 1225 may clear the hazard bit, the hazard level bit, and the hazard ID of entry 1, which includes the input address IA1, of the page table walk scheduler 1221.
When the hazard bit of entry 1 is cleared, the hazard/replay controller 1222 of the page table walk scheduler 1221 may again provide the input address IA1 to the walker 1224. The walker 1224 may look up the output addresses 0x100 and 0x200 indicated by the L0 index and the L1 index in the page table walk buffer 1230, and may then begin fetching the output address indicated by the L2 index from the memory. The walker 1224 may replay or re-perform the page table walk for the input address IA1.
When a hit occurs in the page table walk buffer 1230 with respect to the L0 index, the lookup of the L0 page table is skipped and the remaining L1 to L3 page table lookups are performed. The walker 1223 obtains the output address indicated by the index of the input address IA0 by looking up the address translation information (the output address 0x100) stored in the page table walk buffer 1230 and at least a portion of the page tables. When the walker 1223 obtains the output address, the redundant walk detector 1225 may compare the match level between the input address IA0 and the input address IA1 with the walk cache hit level of the input address IA1, and may detect the redundancy of the page table walk for the input address IA1. The page table walk scheduler 1221 does not provide the input address IA1 to the walkers 1223 and 1224 until the redundant walk detector 1225 clears the hazard bit of the input address IA1.
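Tying the comparison together with the concrete values of figs. 9 to 11 (the index values are taken from the figures; the rest is an illustrative sketch, with match_level repeated so the example is self-contained):

    # Worked trace of the Figs. 9-11 scenario (match_level as sketched earlier).
    def match_level(indexes_a, indexes_b):
        level = -1
        for a, b in zip(indexes_a, indexes_b):
            if a != b:
                break
            level += 1
        return level

    IA0 = [0x12, 0x23, 0x34, 0x78]            # entry 0, walked by the walker 1223
    IA1 = [0x12, 0x23, 0x9A, 0xBC]            # entry 1, candidate for the walker 1224

    # Figs. 9-10: only the L0 output (0x100) is cached, so the hit level of IA1 is L0 (= 0).
    hit_level = 0
    print(match_level(IA1, IA0))              # 1, i.e. match level L1
    print(match_level(IA1, IA0) > hit_level)  # True  -> hazard bit of entry 1 is marked

    # Fig. 11: the walker 1223 fills the L1 output (0x200), so the hit level becomes L1 (= 1).
    hit_level = 1
    print(match_level(IA1, IA0) > hit_level)  # False -> hazard bit is cleared and IA1 replays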
FIG. 12 illustrates a detailed block diagram of the page table walker of FIG. 2 and entries managed by the page table walker. In fig. 12, it is assumed that the page table walker performs the first-stage page table walk and the second-stage page table walk described with reference to fig. 6 to 8B. The description will focus on the differences between the embodiment based on fig. 12 and the embodiments based on fig. 9 to 11.
The page table walker 1220 may include a redundant walk detector 1225 as the first redundant walk detector and a second redundant walk detector 1226. Redundant walk detector 1225, which is a first redundant walk detector, may be associated with a first stage page table walk for translating virtual addresses to intermediate physical addresses. The second redundant walk detector 1226 may be associated with a second stage page table walk for translating intermediate physical addresses to physical addresses.
As described with reference to figs. 9 to 11, the redundant walk detector 1225, serving as the first redundant walk detector, may detect redundancy of first-stage page table walks. The redundant walk detector 1225, as the first redundant walk detector, may compare a first match level between an input address (virtual address), such as a current input address, and another input address, such as a previous input address, with a first walk cache hit level obtained by looking up the page table walk buffer 1230 using an index of the input address. The redundant walk detector 1225, as the first redundant walk detector, may mark the hazard bit, the first-stage hazard level bit, and the hazard ID bit based on the comparison result.
Similarly to the redundant walk detector 1225 described with reference to figs. 9 to 11, the second redundant walk detector 1226 may detect redundancy of second-stage page table walks. The second redundant walk detector 1226 may compare a second match level between the input address and another input address with a second walk cache hit level obtained by looking up the page table walk buffer 1230 using an index of the input address. For example, the input address may be an intermediate physical address such as a current intermediate physical address, and the other input address may be another intermediate physical address such as a previous intermediate physical address. The second redundant walk detector 1226 may mark the hazard bit, the second-stage hazard level bit, and the hazard ID bit based on the comparison result. The hazard bit marked or cleared by the redundant walk detector 1225, as the first redundant walk detector, may be the same as or different from the hazard bit marked or cleared by the second redundant walk detector 1226.
The traverser 1223 acquires an intermediate physical address, which is an output address acquired from the S1L0 to S1L3 page tables of figs. 8A and 8B and indicated by an index of an input address, such as a current input address. The traverser 1223 obtains the intermediate physical address by looking up the first-stage address translation information and at least a portion of the first-stage page table stored in the page table walk buffer 1230. When the traverser 1223 acquires the intermediate physical address, the redundant walk detector 1225, as the first redundant walk detector, may compare the first match level with the first walk cache hit level, and may detect redundancy of the page table walk for the input address. Further, the traverser 1223 acquires a physical address, which is an output address acquired from the S2L0 to S2L3 page tables of figs. 8A and 8B and indicated by an index of the intermediate physical address. The traverser 1223 obtains the physical address by looking up the second-stage address translation information and at least a portion of the second-stage page table stored in the page table walk buffer 1230. When the traverser 1223 obtains the physical address, the second redundant walk detector 1226 may compare the second match level between the intermediate physical addresses with the second walk cache hit level, and may detect redundancy of the page table walk for the intermediate physical address. Each of the traversers 1223 and 1224 can perform a page table walk with respect to an input address supplied from the page table walker 1220 and can acquire an output address. For example, the input address may be a virtual address and each output address may be an intermediate physical address. As another example, the input address may be an intermediate physical address and each output address may be a physical address.
FIG. 13 shows a flowchart of a page table walk performed by the page table walker of FIG. 2 for translating a virtual address into a physical address, and is described with reference to FIG. 5. In operation S103, the page table walker 1220 may receive a virtual address (i.e., an input address) after a TLB miss. The input address received by the page table walker 1220 is an address that was not found in the TLB 1210. The MMU 1200 may use the input address and a context to look up the TLB 1210. For ease of description, the input address is shown in figs. 5, 8A, and 8B as including an index and an offset, but the input address may also include a context. For example, the context may be information such as an address space ID (ASID), a permission level, a non-secure attribute, a virtual machine ID (VMID), and the like.
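As a concrete illustration of an input address composed of per-level indexes and an offset, the sketch below splits a 48-bit virtual address into four 9-bit indexes (L0 to L3) and a 12-bit page offset; the 4 KB granule and the bit widths are assumed values chosen only for illustration and are not specified by the present disclosure.

```python
# Minimal sketch: split a 48-bit virtual address into per-level indexes and an
# offset. The 4 KB granule and 9-bit indexes are illustrative assumptions only.
LEVELS = 4          # L0..L3
INDEX_BITS = 9      # bits of index per level (assumed)
OFFSET_BITS = 12    # 4 KB page offset (assumed)

def split_virtual_address(va: int):
    offset = va & ((1 << OFFSET_BITS) - 1)
    indexes = []
    for level in range(LEVELS):               # L0 is the most significant index
        shift = OFFSET_BITS + (LEVELS - 1 - level) * INDEX_BITS
        indexes.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indexes, offset

if __name__ == "__main__":
    idx, off = split_virtual_address(0x0000_7FEE_D000_0ABC)
    print("indexes L0..L3:", [hex(i) for i in idx], "offset:", hex(off))
```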
In operation S106, the page table walker 1220 may assign or provide the input address and context to the page table walk scheduler 1221. For example, as described with reference to FIG. 9, the input addresses may be stored separately in entries of page table walker 1220.
In operation S109, the page table walk scheduler 1221 may check whether the hazard bit of the input address is marked. When the hazard bit is marked (S109 = Yes), the page table walk scheduler 1221 may not assign the input address to the traversers 1223 and 1224 until the hazard bit is cleared. That is, a page table walk for the input address may not be performed until the hazard bit is cleared.
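The hazard-bit gating of operation S109 can be pictured with the short sketch below; the dictionary layout and the function name are hypothetical and are introduced only for illustration.

```python
# Minimal sketch (assumed data layout): a request is handed to an idle
# traverser only when its hazard bit is clear; otherwise it waits.
def try_dispatch(entry: dict, idle_traversers: list):
    """Return a traverser id for `entry`, or None if the request must wait."""
    if entry["hazard_bit"]:
        return None                  # hazard bit marked: hold the walk back
    if not idle_traversers:
        return None                  # no idle traverser at the moment
    return idle_traversers.pop()     # hand the input address to a free traverser

pending = [{"addr": 0x1000, "hazard_bit": False},
           {"addr": 0x2000, "hazard_bit": True}]
idle = [0, 1]
print([try_dispatch(e, idle) for e in pending])   # -> [1, None]
```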
In operation S113, when the hazard bit is not marked or is cleared (S109 = No), the page table walk scheduler 1221 may assign the input address to either one of the traversers 1223 and 1224 (e.g., an idle traverser that is not performing a page table walk). Moreover, the page table walk scheduler 1221 may assign the input address to the redundant walk detector 1225.
In operation S116, the traverser may check whether partial or complete address translation information is stored in the page table walk buffer 1230. For example, the traverser may be either one of the traversers 1223 and 1224, the partial or complete address translation information may be descriptors indicated by the indexes of an input address (such as the current input address), and the page table walk buffer 1230 may be the first-stage page table walk buffer S1WC. That is, in operation S116, the traverser to which the input address is allocated may check whether partial or complete address translation information associated with the input address and the context is stored in the page table walk buffer 1230. The traverser may identify the highest level, among the first-stage levels, of the output addresses indicated by the indexes of the input address that are stored in the page table walk buffer 1230. In other words, the traverser may check up to which first-stage level the output addresses indicated by the indexes of the input address are stored in the page table walk buffer 1230. When the traverser looks up the page table walk buffer 1230, the traverser may further reference the context and the index of each of the levels L0 through L3 of the first stage. For example, the traverser may use the partial address translation information of an entry of the page table walk buffer 1230 whose context and index respectively match the requested context and index.
In operation S119, when a hit occurs in the page table walk buffer 1230 (S116 = Yes), the traverser may skip the operation of acquiring the output address stored in the page table walk buffer 1230, that is, the output address indicated by the hit index. The traverser may skip the operation of acquiring the output addresses up to the hit level of the first stage. For example, in the case where the current input address is the input address IA1 of fig. 11, the traverser may skip the operation of acquiring the output addresses indicated by the L0 and L1 indexes, respectively. That is, the traverser may skip the operation of acquiring the corresponding output addresses from the first level (e.g., L0) of the first stage up to the hit level (e.g., L1) found in operation S116. As the hit level of the first stage becomes higher, the number of page tables that the traverser looks up decreases.
In operation S123, the redundant walk detector 1225 may detect, by comparing the match level with the walk cache hit level, whether there is redundancy in the operation in which the traverser acquires the output addresses indicated by the indexes that missed in the page table walk buffer 1230. The redundant walk detector 1225 may calculate a first-stage match level between each input address of a pending page table walk (or any other input address) and the current input address. A match level may indicate the level of the matching indexes of the current input address and another input address (e.g., the match level corresponds to the hazard level bit). As the match level becomes higher, the degree to which the indexes of the current input address and the indexes of the other input address match each other becomes higher. The redundant walk detector 1225 may take the highest (largest) of these match levels as the match level of the current input address. Also, the redundant walk detector 1225 may look up the page table walk buffer 1230 by using the indexes of the current input address, and may obtain the first-stage walk cache hit level.
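The two quantities compared in operation S123 can be illustrated with the following sketch; the index lists, the set-based model of the page table walk buffer, and the function names are assumptions introduced only for illustration and do not reflect an actual hardware implementation.

```python
# Minimal sketch of the two quantities compared in operation S123.
# Assumptions: 4 levels (L0..L3) of indexes per input address, and a walk
# cache modeled as a set of (level, index-prefix) keys.

def match_level(indexes_a, indexes_b):
    """Highest level L such that indexes L0..L of both addresses match,
    or None if even the L0 indexes differ."""
    level = None
    for lvl, (a, b) in enumerate(zip(indexes_a, indexes_b)):
        if a != b:
            break
        level = lvl
    return level

def walk_cache_hit_level(indexes, walk_cache):
    """Highest level whose partial translation is already in the walk cache."""
    level = None
    for lvl in range(len(indexes)):
        if (lvl, tuple(indexes[:lvl + 1])) in walk_cache:
            level = lvl
    return level

# Example mirroring the structure of IA0/IA1 above: the L0 and L1 indexes match,
# but only the L0 partial translation is cached so far (values are illustrative).
ia0 = [0x12, 0x23, 0x45, 0x67]
ia1 = [0x12, 0x23, 0x89, 0xAB]
cache = {(0, (0x12,))}
m = match_level(ia0, ia1)               # -> 1 (indexes match up to L1)
h = walk_cache_hit_level(ia1, cache)    # -> 0 (hit only up to L0)
print("redundancy detected:", m is not None and (h is None or m > h))  # True
```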
In operation S126, when the match level is higher than the walk cache hit level, that is, when redundancy is detected (S123 = Yes), the redundant walk detector 1225 may update the hazard information (e.g., the hazard bit, hazard level bit, and hazard ID bit) in the entry storing or including the input address and the context so that the page table walk including the redundancy is not performed. The redundant walk detector 1225 may mark the first-stage hazard bit of the input address. Moreover, the input address whose hazard bit is marked may be deallocated from the redundant walk detector 1225. As described in operation S109, the input address is not assigned to the traversers 1223 and 1224 and the redundant walk detector 1225 until the marked hazard bit is cleared. The page table walker 1220 does not provide the current input address to the traversers 1223 and 1224, does not perform a page table walk for the current input address, and may cancel or stop a page table walk that is already in progress.
In operation S129, when the match level is not higher than the walk cache hit level (S123 = No), the traverser may check whether the page table walk for the input address is completed. In operation S133, when the page table walk for the input address is not completed (S129 = No), the traverser may acquire an output address that is indicated by an index of the input address and is not found in the page table walk buffer 1230. In operation S136, the traverser may store the acquired output address in the page table walk buffer 1230 (i.e., update the page table walk buffer 1230). The output address acquired by the traverser may also be stored in the redundant walk detector 1225 (i.e., the redundant walk detector 1225 may be updated).
In operation S139, the redundant walk detector 1225 may obtain or calculate the walk cache level, which is updated whenever an output address indicated by an index of the input address is stored in the page table walk buffer 1230. The redundant walk detector 1225 may clear the first-stage hazard bit of any other page table walk based on the result of comparing the walk cache level of the current input address with the match level between the current input address and the other input address. For example, when the walk cache level reaches or equals the match level, the redundant walk detector 1225 may clear the hazard bit of another input address that was input previously. Operations S133 and S136 may be repeatedly performed until it is determined in operation S129 that the page table walk is completed; as operations S133 and S136 are repeated, the walk cache level gradually becomes higher.
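Operations S129 to S139 can be pictured with the loop below, in which the walk cache level rises as each level is stored and the hazard bit of a waiting request is cleared once that level reaches the request's match level; the data structures and function name are illustrative assumptions only.

```python
# Minimal sketch of operations S129-S139 (assumed data structures): as each
# level's output address is stored in the walk cache, the walk cache level of
# the in-flight walk rises; once it reaches a waiting request's match level,
# that request's hazard bit is cleared so it can be replayed.

def walk_and_release(indexes, root, memory_page_tables, walk_cache, waiting):
    output = root                              # base address of the L0 page table
    for lvl in range(len(indexes)):
        key = (lvl, tuple(indexes[:lvl + 1]))
        if key in walk_cache:                  # level already cached: skip the fetch
            output = walk_cache[key]
        else:                                  # fetch the descriptor from memory
            output = memory_page_tables[(lvl, output, indexes[lvl])]
            walk_cache[key] = output           # update the page table walk cache
        for req in waiting:                    # walk cache level is now `lvl`
            if req["hazard_bit"] and req["match_level"] <= lvl:
                req["hazard_bit"] = False      # clear the hazard bit -> replay allowed
    return output

ia0 = [0x12, 0x23, 0x45, 0x67]
memory = {(0, 0x40, 0x12): 0x100, (1, 0x100, 0x23): 0x200,
          (2, 0x200, 0x45): 0x300, (3, 0x300, 0x67): 0x5000}
waiting = [{"hazard_bit": True, "match_level": 1}]   # e.g. IA1 collides up to L1
walk_and_release(ia0, root=0x40, memory_page_tables=memory, walk_cache={}, waiting=waiting)
print(waiting[0]["hazard_bit"])   # False: cleared once the L1 entry was stored
```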
When the page table walk is completed (S129 = Yes), the allocation of the input address may be released from the page table walk scheduler 1221 and the redundant walk detector 1225 in operation S143. In operation S146, the MMU 1200 may refer to the address translation information stored in the TLB 1210 to obtain the physical address corresponding to the virtual address (i.e., the input address).
Figs. 14A and 14B illustrate a flowchart in which the page table walker of fig. 2, described with reference to figs. 8A and 8B, performs a first-stage page table walk that translates a virtual address into an intermediate physical address and a second-stage page table walk that translates the intermediate physical address into a physical address. Figs. 14A and 14B will be described together.
As in operation S103, the page table walker 1220 may receive a virtual address (i.e., an input address) after a TLB miss in operation S203. As in operation S106, the page table walker 1220 may allocate or provide the virtual address and the context to the page table walk scheduler 1221 in operation S206. As in operation S109, in operation S209, the page table walk scheduler 1221 may check whether the first-stage or second-stage hazard bit of the virtual address is marked. As described above, the hazard bits of the first stage and the second stage may be managed together by the redundant walk detector 1225, as the first redundant walk detector, and the second redundant walk detector 1226. Alternatively, the first-stage hazard bit may be managed by the redundant walk detector 1225, as the first redundant walk detector, and the second-stage hazard bit may be managed by the second redundant walk detector 1226.
As in operation S113, when the hazard bit is not marked or is cleared (S209 = No), the page table walk scheduler 1221 may assign the virtual address to either one of the traversers 1223 and 1224 and to the redundant walk detector 1225, as the first redundant walk detector, in operation S213. As in operation S116, the traverser to which the input address is assigned may check, in operation S216, whether partial or complete address translation information associated with the virtual address and the context is stored in the page table walk buffer 1230. For example, the partial or complete address translation information may be descriptors indicated by the indexes of the virtual address, and the page table walk buffer 1230 may be the first-stage page table walk buffer S1WC. As in operation S119, the traverser may, in operation S219, skip the operation of acquiring the output addresses up to the first-stage hit level found in operation S216. As in operation S123, in operation S223, the redundant walk detector 1225, as the first redundant walk detector, may detect, by comparing the first-stage match level with the walk cache hit level, whether there is redundancy in the operation (e.g., page table walk) in which the traverser acquires the output addresses indicated by the indexes that missed in the page table walk buffer 1230. As in operation S126, when the first-stage match level is higher than the walk cache hit level (S223 = Yes), the redundant walk detector 1225, as the first redundant walk detector, may mark the first-stage hazard bit of the virtual address in operation S226. The input address whose hazard bit is marked may be deallocated from the redundant walk detector 1225, as the first redundant walk detector.
In operation S229, when the match level is not greater than the walk cache hit level (S223 = No), the page table walk scheduler 1221 may assign an intermediate physical address of the virtual address to the second redundant walk detector 1226. In operation S233, the traverser (e.g., the same traverser as in operation S216) may determine whether partial or complete address translation information of the intermediate physical address (e.g., descriptors indicated by the indexes of the intermediate physical address) is stored in the page table walk buffer 1230 (e.g., in the second-stage page table walk buffer S2WC). Here, both the first-stage page table walk buffer S1WC and the second-stage page table walk buffer S2WC may be included in the page table walk buffer 1230, or the first-stage page table walk buffer S1WC and the second-stage page table walk buffer S2WC may be implemented separately within the page table walk buffer 1230. In operation S236, the traverser may skip the operation of acquiring the output addresses up to the second-stage hit level found in operation S233.
In operation S239, the second redundant walk detector 1226 may detect, by comparing the second-stage match level with the walk cache hit level, whether there is redundancy in the operation (e.g., page table walk) in which the traverser acquires the output addresses indicated by the indexes that missed in the page table walk buffer 1230. The second redundant walk detector 1226 may calculate a second-stage match level between each intermediate physical address of a pending page table walk and the current intermediate physical address. A match level may indicate the level of the matching indexes of the current intermediate physical address and any other intermediate physical address. The second redundant walk detector 1226 may take the highest (largest) of these match levels as the match level of the current intermediate physical address. Also, the second redundant walk detector 1226 may look up the page table walk buffer 1230 by using the indexes of the current intermediate physical address, and may obtain the second-stage walk cache hit level. In operation S243, when the second-stage match level is higher than the walk cache hit level (S239 = Yes), the second redundant walk detector 1226 may mark the second-stage hazard bit of the intermediate physical address. The intermediate physical address whose hazard bit is marked may be deallocated from the second redundant walk detector 1226.
In operation S246, when the match level is not higher than the walk cache hit level (S239 = No), the traverser may check whether the second-stage page table walk of the intermediate physical address is completed. When the page table walk is not completed (S246 = No), the traverser may, in operation S249, acquire an output address that is indicated by an index of the intermediate physical address and is not found in the page table walk buffer 1230. In operation S253, the traverser may store the acquired output address in the page table walk buffer 1230 (i.e., update the page table walk buffer 1230). The output address acquired by the traverser may also be stored in the second redundant walk detector 1226 (i.e., the second redundant walk detector 1226 may be updated).
In operation S256, the second redundant walk detector 1226 may obtain or calculate the walk cache level, which is updated whenever an output address indicated by an index of the intermediate physical address is stored in the page table walk buffer 1230. The second redundant walk detector 1226 may clear the second-stage hazard bit of any other page table walk based on the result of comparing the walk cache level of the current intermediate physical address with the match level between the current intermediate physical address and the other intermediate physical address. Operations S249 and S253 may be repeatedly performed until it is determined in operation S246 that the second-stage page table walk is completed. As operations S249 and S253 are repeated, the walk cache hit level gradually becomes higher.
When the second-stage page table walk is completed (S246 = Yes), the intermediate physical address may be deallocated from the second redundant walk detector 1226 in operation S259. Thereafter, operations S263 to S273 may be substantially the same as operations S129 to S139 of fig. 13. When the first-stage page table walk is completed (S263 = Yes), the input address may be deallocated from the page table walk scheduler 1221 and from the redundant walk detector 1225, as the first redundant walk detector, in operation S276. In operation S279, the MMU 1200 may obtain the physical address corresponding to the virtual address (i.e., the input address) with reference to the address translation information stored in the TLB 1210.
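A highly simplified sketch of the two-stage arrangement of figs. 14A and 14B follows; the per-stage detection helpers and the example index values are assumptions introduced only to illustrate that first-stage and second-stage hazards are tracked independently, one on virtual-address indexes and one on intermediate-physical-address indexes.

```python
# Minimal sketch (assumed structure): stage-1 redundancy is checked on virtual
# addresses and stage-2 redundancy on intermediate physical addresses, each
# with its own hazard decision, mirroring the split between the first and
# second redundant walk detectors.

def match_level(a, b):
    """Highest level whose indexes match from L0 downward, else None."""
    level = None
    for lvl, (x, y) in enumerate(zip(a, b)):
        if x != y:
            break
        level = lvl
    return level

def detect_redundancy(cur_indexes, inflight_indexes, hit_level):
    m = match_level(cur_indexes, inflight_indexes)
    return m is not None and (hit_level is None or m > hit_level)

# Stage 1: compare virtual-address indexes against the stage-1 walk cache hit level.
s1_hazard = detect_redundancy([0x1, 0x2, 0x3, 0x4], [0x1, 0x2, 0x9, 0x9], hit_level=0)
# Stage 2: compare intermediate-physical-address indexes against the stage-2 hit level.
s2_hazard = detect_redundancy([0x7, 0x8, 0x1, 0x0], [0x7, 0x8, 0x1, 0x5], hit_level=2)
print("stage-1 hazard:", s1_hazard, "stage-2 hazard:", s2_hazard)  # True, False
```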
According to embodiments of the present disclosure, redundancy of page table walks may be predicted and detected by comparing the match level with the walk cache hit level. The processor may instead perform other page table walks that do not include redundancy, thereby improving the performance of the processor and reducing power consumption.
While the inventive concepts described herein have been described with reference to exemplary embodiments thereof, it will be apparent to those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the disclosure as set forth in the following claims.

Claims (19)

1. A processor, comprising:
a page table walk buffer configured to store address translation information; and
a page table walker configured to walk a page table,
wherein the page table walker is configured to:
obtaining a first output address indicated by a first index of a first input address by looking up the address translation information and at least a portion of a page table; and
comparing a match level between a second index of a second input address and the first index of the first input address with a walk cache hit level obtained by looking up the page table walk cache using the second index;
wherein the match level indicates a degree of match of the index of the second input address with the index of the first input address,
the walk cache hit level indicates a level at which a hit occurs in the page table walk cache when the page table walk cache is looked up by using the second index,
wherein the page table walker is further configured to detect in advance redundancy of a page table walk for the second input address based on a result of comparing the match level with the walk cache hit level.
2. The processor of claim 1, wherein each of the first input address and the second input address is a virtual address, and
each of the first output address and a second output address indicated by the second index of the second input address is a physical address.
3. The processor of claim 1, wherein each of the first input address and the second input address is an intermediate address, and
each of the first output address and a second output address indicated by the second index of the second input address is a physical address.
4. The processor of claim 1, wherein each of the first input address and the second input address is a virtual address, and
each of the first output address and a second output address indicated by the second index of the second input address is an intermediate address.
5. The processor of claim 1, wherein when the match level is detected to be higher than the walk cache hit level, the page table walker does not perform fetching of a second output address indicated by the second index of the second input address until the match level is equal to or less than a walk cache level updated when each of the first output addresses is stored in the page table walk cache.
6. The processor of claim 1, wherein when the match level is detected to be higher than the walk cache hit level during the retrieval of the second output address indicated by the second index of the second input address, the page table walker stops retrieving the second output address indicated by the second index until the match level is equal to or less than a walk cache level updated when each of the first output addresses is stored in the page table walk cache.
7. The processor of claim 1, wherein detecting in advance redundancy of the page table walk for the second input address indicates detecting in advance that there is redundancy in an operation of looking up the at least a portion of the page table by using an index, among the second index of the second input address, that matches the first index of the first input address.
8. The processor of claim 1, wherein the match level is a first match level, and
wherein the page table walker is further configured to:
obtaining a third output address indicated by a third index of a third input address by looking up the address translation information and at least a portion of a page table; and
comparing, when a second match level between the second index of the second input address and the third index of the third input address is greater than the first match level during the retrieval of the first output address and the third output address, the second match level with the walk cache hit level.
9. The processor of claim 1, wherein the page table walker comprises:
a page table walk scheduler configured to manage a first entry to which information about a walk request including the first input address is input and a second entry to which information about a walk request including the second input address is input; and
a plurality of traversers configured to obtain the first output address and obtain a second output address indicated by the second index of the second input address.
10. The processor of claim 9, wherein the second entry of the page table walk scheduler includes a hazard bit marked according to a comparison of the match level and the walk cache hit level.
11. The processor of claim 10, wherein, when the hazard bit is marked, the page table walk scheduler does not provide the second input address of the walk request included in the second entry to the plurality of traversers until the hazard bit is cleared.
12. A processor, comprising:
a page table walk buffer configured to store address translation information; and
a page table walker configured to walk a page table,
wherein the page table walker is configured to:
obtaining a first intermediate address indicated by a first index of a first input address by looking up at least part of the address translation information and a first page table of a first stage, and obtaining a first output address indicated by a second index of each of the first intermediate addresses by looking up at least part of the address translation information and a second page table of a second stage; and
comparing a match level between a fourth index of each second intermediate address indicated by a third index of a second input address and the second index of each of the first intermediate addresses with a walk cache hit level obtained by looking up the page table walk cache using the fourth index,
wherein the match level indicates a degree of match of the fourth index of each second intermediate address with the second index of each first intermediate address,
the walk cache hit level indicates a level at which a hit occurs in the page table walk cache when the page table walk cache is looked up by using the fourth index, and wherein the page table walker is further configured to detect in advance redundancy of a page table walk for each of the second intermediate addresses based on a result of comparing the match level with the walk cache hit level.
13. The processor of claim 12, wherein a walk cache level is updated as each of the first output addresses is stored in the page table walk cache, and the page table walker performs fetching of a second output address indicated by the fourth index of each of the second intermediate addresses by looking up at least a portion of the address translation information and the second page table of the second stage when the walk cache level is detected to reach the match level.
14. The processor of claim 12, wherein the page table walker comprises:
a page table walk scheduler configured to manage a first entry to which information about a walk request including the first input address is input and a second entry to which information about a walk request including the second input address is input;
a plurality of traversers configured to obtain the first intermediate addresses associated with the first input addresses and the first output addresses, and obtain the second intermediate addresses associated with the second input addresses and second output addresses indicated by the fourth index of each of the second intermediate addresses; and
a redundant walk detector configured to compare the match level to the walk cache hit level.
15. The processor of claim 14, wherein the second entry of the page table walk scheduler includes a hazard bit marked according to a comparison of the match level and the walk cache hit level.
16. The processor of claim 15, wherein a first one of the plurality of traversers performs a first page table walk to obtain the first intermediate address and the first output address,
wherein a second one of the plurality of traversers performs a second page table walk to obtain the second intermediate address and the second output address, and
wherein the second page table walk performed by the second traverser is canceled when the redundant walk detector marks the hazard bit.
17. The processor of claim 15, wherein the second entry of the page table walk scheduler further comprises a hazard ID bit indicating a number of a traverser, among the plurality of traversers, that performs the obtaining of the first intermediate address and the first output address.
18. A processor, comprising:
a page table walk buffer configured to store address translation information; and
a page table walker configured to walk a page table,
wherein the page table walker is configured to:
obtaining a first intermediate address indicated by a first index of a first input address by looking up at least part of the address translation information and a first page table of a first stage, and obtaining a first output address indicated by a second index of each of the first intermediate addresses by looking up at least part of the address translation information and a second page table of a second stage;
comparing a first match level between a third index of a second input address and the first index of the first input address with a first walk cache hit level obtained by using the third index to look up the page table walk cache, and
comparing a second match level between a fourth index of each second intermediate address indicated by the third index of the second input address and the second index of each first intermediate address with a second walk cache hit level obtained by looking up the page table walk cache using the fourth index,
wherein the first match level indicates a degree of match of the third index of the second input address and the first index of the first input address,
the second match level indicates a degree of match of the fourth index of each second intermediate address and the second index of each first intermediate address,
the first walk cache hit level indicates a level at which a hit occurs in the page table walk cache when the page table walk cache is looked up by using the third index,
the second walk cache hit level indicates a level at which a hit occurs in the page table walk cache when the page table walk cache is looked up by using the fourth index,
wherein the page table walker is further configured to:
detect in advance, based on a result of the comparison of the first match level and the first walk cache hit level, redundancy of a page table walk for the second input address, and
detect in advance, based on a result of the comparison of the second match level and the second walk cache hit level, redundancy of a page table walk for the second intermediate address.
19. The processor of claim 18, wherein the page table walker comprises:
a page table walk scheduler configured to manage a first entry to which information about a walk request including the first input address is input and a second entry to which information about a walk request including the second input address is input;
a plurality of traversers configured to obtain the first intermediate addresses associated with the first input addresses and the first output addresses, and obtain the second intermediate addresses associated with the second input addresses and second output addresses indicated by a fourth index of each of the second intermediate addresses;
a first redundant walk detector configured to compare the first match level to the first walk cache hit level; and
a second redundant walk detector configured to compare the second match level to the second walk cache hit level.
CN201911105191.7A 2019-02-08 2019-11-12 Processor for detecting redundancy of page table walk Active CN111552654B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962803227P 2019-02-08 2019-02-08
US62/803,227 2019-02-08
KR1020190022184A KR20200098354A (en) 2019-02-08 2019-02-26 Processor to detect redundancy of page table walk
KR10-2019-0022184 2019-02-26

Publications (2)

Publication Number Publication Date
CN111552654A CN111552654A (en) 2020-08-18
CN111552654B true CN111552654B (en) 2024-03-19

Family

ID=71738945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911105191.7A Active CN111552654B (en) 2019-02-08 2019-11-12 Processor for detecting redundancy of page table walk

Country Status (3)

Country Link
US (1) US11210232B2 (en)
CN (1) CN111552654B (en)
DE (1) DE102019128465A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11663141B2 (en) * 2019-10-11 2023-05-30 Texas Instruments Incorporated Non-stalling, non-blocking translation lookaside buffer invalidation
US11461237B2 (en) * 2019-12-03 2022-10-04 International Business Machines Corporation Methods and systems for translating virtual addresses in a virtual memory based system
US11163695B2 (en) 2019-12-03 2021-11-02 International Business Machines Corporation Methods and systems for translating virtual addresses in a virtual memory based system
US11422947B2 (en) * 2020-08-12 2022-08-23 International Business Machines Corporation Determining page size via page table cache
WO2023288192A1 (en) * 2021-07-14 2023-01-19 Nuvia, Inc. Level-aware cache replacement
US11586371B2 (en) * 2021-07-23 2023-02-21 Vmware, Inc. Prepopulating page tables for memory of workloads during live migrations
CN114238167B (en) * 2021-12-14 2022-09-09 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment
CN114238176B (en) * 2021-12-14 2023-03-10 海光信息技术股份有限公司 Processor, address translation method for processor and electronic equipment
CN114281720B (en) * 2021-12-14 2022-09-02 海光信息技术股份有限公司 Processor, address translation method for processor and electronic equipment
CN114218132B (en) * 2021-12-14 2023-03-24 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment
US11822472B2 (en) * 2022-01-13 2023-11-21 Ceremorphic, Inc. Memory management unit for multi-threaded architecture

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6549985B1 (en) 2000-03-30 2003-04-15 I P - First, Llc Method and apparatus for resolving additional load misses and page table walks under orthogonal stalls in a single pipeline processor
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7975109B2 (en) * 2007-05-30 2011-07-05 Schooner Information Technology, Inc. System including a fine-grained memory and a less-fine-grained memory
US9092358B2 (en) 2011-03-03 2015-07-28 Qualcomm Incorporated Memory management unit with pre-filling capability
US9684601B2 (en) * 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US20130326143A1 (en) 2012-06-01 2013-12-05 Broadcom Corporation Caching Frequently Used Addresses of a Page Table Walk
US9235529B2 (en) 2012-08-02 2016-01-12 Oracle International Corporation Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with optical interconnect
US9213649B2 (en) 2012-09-24 2015-12-15 Oracle International Corporation Distributed page-table lookups in a shared-memory system
US10380030B2 (en) * 2012-12-05 2019-08-13 Arm Limited Caching of virtual to physical address translations
US8984255B2 (en) 2012-12-21 2015-03-17 Advanced Micro Devices, Inc. Processing device with address translation probing and methods
GB2528842B (en) 2014-07-29 2021-06-02 Advanced Risc Mach Ltd A data processing apparatus, and a method of handling address translation within a data processing apparatus
US20160179662A1 (en) 2014-12-23 2016-06-23 David Pardo Keppel Instruction and logic for page table walk change-bits
US20160378684A1 (en) 2015-06-26 2016-12-29 Intel Corporation Multi-page check hints for selective checking of protected container page versus regular page type indications for pages of convertible memory
US10127627B2 (en) 2015-09-23 2018-11-13 Intel Corporation Mapping graphics resources to linear arrays using a paging system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586283A (en) * 1993-10-07 1996-12-17 Sun Microsystems, Inc. Method and apparatus for the reduction of tablewalk latencies in a translation look aside buffer
US5751990A (en) * 1994-04-26 1998-05-12 International Business Machines Corporation Abridged virtual address cache directory
CN102722452A (en) * 2012-05-29 2012-10-10 南京大学 Memory redundancy eliminating method
CN106030501A (en) * 2014-09-30 2016-10-12 株式会社日立制作所 Distributed storage system
TW201617885A (en) * 2014-11-14 2016-05-16 凱為公司 Caching TLB translations using a unified page table walker cache
CN107636626A (en) * 2015-05-29 2018-01-26 高通股份有限公司 Predictive for the conversion of MMU (MMU) prefetches
CN116246685A (en) * 2021-12-08 2023-06-09 三星电子株式会社 Memory device outputting test result and method of testing the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董卫宇; 刘金鑫; 戚旭衍; 何红旗; 蒋烈辉. Dynamic binary translation optimization based on hot routines. Computer Science (计算机科学), 2016, 27-33+41. *

Also Published As

Publication number Publication date
CN111552654A (en) 2020-08-18
DE102019128465A1 (en) 2020-08-13
DE102019128465A9 (en) 2020-10-08
US11210232B2 (en) 2021-12-28
US20200257635A1 (en) 2020-08-13

Similar Documents

Publication Publication Date Title
CN111552654B (en) Processor for detecting redundancy of page table walk
US9747218B2 (en) CPU security mechanisms employing thread-specific protection domains
US10802987B2 (en) Computer processor employing cache memory storing backless cache lines
CN109074316B (en) Page fault solution
CN107111455B (en) Electronic processor architecture and method of caching data
JP2833062B2 (en) Cache memory control method, processor and information processing apparatus using the cache memory control method
KR101770496B1 (en) Efficient address translation caching in a processor that supports a large number of different address spaces
US8296547B2 (en) Loading entries into a TLB in hardware via indirect TLB entries
US9405702B2 (en) Caching TLB translations using a unified page table walker cache
US8190652B2 (en) Achieving coherence between dynamically optimized code and original code
US10831675B2 (en) Adaptive tablewalk translation storage buffer predictor
US11403222B2 (en) Cache structure using a logical directory
US9928000B2 (en) Memory mapping for object-based storage devices
US11775445B2 (en) Translation support for a virtual cache
US9208082B1 (en) Hardware-supported per-process metadata tags
US10606762B2 (en) Sharing virtual and real translations in a virtual cache
US20190026231A1 (en) System Memory Management Unit Architecture For Consolidated Management Of Virtual Machine Stage 1 Address Translations
US20200174945A1 (en) Managing Translation Lookaside Buffer Entries Based on Associativity and Page Size
TWI805866B (en) Processor to detect redundancy of page table walk
US11119945B1 (en) Context tracking for multiple virtualization layers in a virtually tagged cache

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant