WO2009045884A2 - Address translation caching and i/o cache performance improvement in virtualized environments - Google Patents

Info

Publication number
WO2009045884A2
WO2009045884A2 PCT/US2008/077819 US2008077819W WO2009045884A2 WO 2009045884 A2 WO2009045884 A2 WO 2009045884A2 US 2008077819 W US2008077819 W US 2008077819W WO 2009045884 A2 WO2009045884 A2 WO 2009045884A2
Authority
WO
WIPO (PCT)
Prior art keywords
cache
memory access
access request
hint
logic
Prior art date
Application number
PCT/US2008/077819
Other languages
French (fr)
Other versions
WO2009045884A3 (en)
Inventor
Mahesh Wagh
Jasmin Ajanovic
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to CN200880110445.8A (CN101868786A)
Priority to RU2010104040/08A (RU2483347C2)
Publication of WO2009045884A2
Publication of WO2009045884A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/15 Use in a specific computing environment
    • G06F2212/152 Virtualized environment, e.g. logically partitioned system

Definitions

  • the present disclosure generally relates to the field of electronics. More particularly, an embodiment of the invention relates to improving address translation caching and/or input/output (I/O) cache performance in virtualized environments.
  • I/O virtualization is a technology being developed to ensure that I/O devices function properly in a virtualized environment.
  • a virtualized environment may be an environment in which more than one operating system (OS) may be active at the same time.
  • OS operating system
  • Some implementations of I/O virtualization may utilize hardware structures to improve performance. Such implementations may however require a relatively high gate count to realize, which would in turn be more costly and/or complex to implement.
  • FIGs. 1-3 illustrate block diagrams of embodiments of computing systems, which may be utilized to implement various embodiments discussed herein.
  • Fig. 4 illustrates a flow diagram of a method according to an embodiment.
  • Some of the embodiments discussed herein may improve address translation caching (such as virtualization for directed I/O (VTd) address translation) and/or I/O cache performance in virtualized environments. More specifically, some virtualization services may be implemented in hardware structures that are utilized to translate a guest physical address (GPA) to host physical addresses (HPA). Accordingly, such structures may provide caching support, e.g., in the form of I/O look-aside-buffers (IOTLBs) to cache the GPA to HPA translations. In some embodiments, these caching structures may provide lower latency for requests that target the same address translation. Furthermore, some of the techniques may be utilized in various types of computing environments, such as those discussed with reference to Figs. 1-4.
  • Fig. 1 illustrates a block diagram of a computing system 100, according to an embodiment of the invention.
  • the system 100 may include one or more agents 102-1 through 102-M (collectively referred to herein as "agents 102" or more generally "agent 102").
  • agents 102 may be components of a computing system, such as the computing systems discussed with reference to Figs. 2-4.
  • the agents 102 may communicate via a network fabric 104.
  • the network fabric 104 may include a computer network that allows various agents (such as computing devices) to communicate data.
  • the network fabric 104 may include one or more interconnects (or interconnection networks) that communicate via a serial (e.g., point-to-point) link and/or a shared communication network.
  • some embodiments may facilitate component debug or validation on links that allow communication with fully buffered dual in-line memory modules (FBD), e.g., where the FBD link is a serial link for coupling memory modules to a host controller device (such as a processor or memory hub).
  • Debug information may be transmitted from the FBD channel host such that the debug information may be observed along the channel by channel traffic trace capture tools (such as one or more logic analyzers).
  • the system 100 may support a layered protocol scheme, which may include a physical layer, a link layer, a routing layer, a transport layer, and/or a protocol layer.
  • the fabric 104 may further facilitate transmission of data (e.g., in form of packets) from one protocol (e.g., caching processor or caching aware memory controller) to another protocol for a point-to-point or shared network.
  • the network fabric 104 may provide communication that adheres to one or more cache coherent protocols.
  • Fig. 2 illustrates a block diagram of portions of a computing system 200, according to an embodiment.
  • various components of the system 200 may be implemented within one of the agents 102-1 and/or 102-M discussed with reference to Fig. 1. Further details regarding some of the operations of the computing system 200 will be discussed herein with reference to Fig. 4.
  • the system 200 may include one or more processors 202-1 through 202-N.
  • processors 202 may include various components, such as private or shared cache(s), execution unit(s), one or more cores, etc.
  • each of the processors 202 may have access to a memory 204 (e.g., memories 204-1 through 204-N).
  • the system 200 may include an optional system memory 206 that may be shared by various components of the system 200, including, for example, one or more of the processors 202, components of an uncore or chipset (CS) 208, or components coupled to the uncore 208, etc.
  • CS uncore or chipset
  • One or more of the memories 204 and/or 206 may store one or more operating systems.
  • the system 200 may be capable of executing a plurality of operating systems (e.g., at the same time) in some embodiments.
  • the uncore 208 may include various components such as root complex (RC) cache 210 (e.g., that may be shared amongst various components of a computing system such as the system 200).
  • the RC cache 210 may be present in a memory control hub (MCH) and/or a graphics MCH (GMCH) portion of a chipset or uncore (e.g., CS/uncore 208).
  • MCH memory control hub
  • GMCH graphics MCH
  • the RC cache 210 may communicate with other components via a data path 212 (which may include an optional core interconnect 214, e.g., to facilitate communication between one or more cores of the processors 202 and other components of the system 200).
  • the system 200 may further include a prefetch logic 216, e.g., to prefetch data (including instructions or micro-operations) from various locations (such as one or more of the memories 204, the system memory 206, other storage devices, including for example a volatile or non-volatile memory device, etc.) into an IOTLB 220 (e.g., via virtualization or translation logics 222-1 through 222-P (collectively referred to herein as "logics 222" or more generally "logic 222")).
  • the data path 212 may be coupled to one or more I/O devices. Any type of an I/O device may be utilized.
  • the I/O devices may include one or more devices 224-1 through 224-P (collectively referred to herein as "endpoint devices 224" or more generally "endpoint 224").
  • the endpoint devices 224 may be peripheral component interconnect (PCI) devices in an embodiment.
  • PCI peripheral component interconnect
  • PCI bus PCI Local Bus Specification, Revision 3.0, March 9, 2004, available from the PCI Special Interest Group, Portland, Oregon, U.S.A. (hereinafter referred to as a "PCI bus").
  • PCI-X Specification Rev. 3.0a, April 23, 2003 (hereinafter referred to as a "PCI-X bus")
  • PCIe PCI Express
  • peripherals coupled to the CS/uncore 208 may include, in various embodiments of the invention, integrated drive electronics (IDE) or small computer system interface (SCSI) hard drive(s), universal serial bus (USB) device(s), a keyboard, a mouse, parallel port(s), serial port(s), floppy disk drive(s), digital output support (e.g., digital video interface (DVI)), etc.
  • IDE integrated drive electronics
  • SCSI small computer system interface
  • USB universal serial bus
  • DVI digital video interface
  • the endpoint devices 224 may communicate through root ports 226-1 through 226-P (collectively referred to herein as "ports 226" or more generally "port 226") with other components of system 200 such as the logics 222.
  • the logics 222 may perform address translation operations for virtualized environments, such as translating virtual addresses into physical addresses, e.g., by reference to the IOTLB 220.
  • the physical addresses may correspond to locations (e.g., entries) within a system memory 206.
  • the logic 222 may additionally perform other operations such as those discussed with reference to Figs. 3 and 4 which may involve translation of GPA and HPA of entries in a memory device coupled to the systems 200 and/or 300 (such as the system memory 206).
  • the logic 222 may be a root complex in accordance with the PCIe specification.
  • the processors 202 may be any type of processor such as a general purpose processor, a network processor (which may process data communicated over a computer network 250), etc. (including a reduced instruction set computer (RISC) processor or a complex instruction set computer (CISC)).
  • the processors 202 may have a single or multiple core design.
  • the processors 202 with a multiple core design may integrate different types of processor cores on the same integrated circuit (IC) die.
  • the processors 202 with a multiple core design may be implemented as symmetrical or asymmetrical multiprocessors.
  • at least one or more of the endpoint devices 224 may be coupled to the network 250 in an embodiment.
  • the processors 202 may include one or more caches (not shown), which may be private and/or shared in various embodiments.
  • a cache stores data corresponding to original data stored elsewhere or computed earlier. To reduce memory access latency, once data is stored in a cache, future use may be made by accessing a cached copy rather than refetching or re-computing the original data.
  • the cache(s) discussed herein may be any type of cache, such as a level 1 (L1) cache, a level 2 (L2) cache, a level 3 (L3) cache, a mid-level cache, a last level cache (LLC), combinations thereof, etc. to store electronic data (e.g., including instructions) that is utilized by one or more components of the system 200.
  • the systems 200 and/or 300 may also include other devices such as one or more of: a display device (e.g., coupled to the CS/uncore 208 to display images), an audio device (e.g., coupled to the CS/uncore 208 to process audio signals), etc.
  • endpoint devices 224 which may communicate with the CS/uncore 208 via root ports 226, for example.
  • FIG. 3 illustrates a block diagram of portions of a computing system 300, according to an embodiment.
  • various components of the system 300 may be implemented within one of the agents 102-1 and/or 102-M discussed with reference to Fig. 1. Further details regarding some of the operations of the computing system 300 will be discussed herein with reference to Fig. 4.
  • the system 300 may include one or more of the processors 202, memories 204, system memory 206, RC cache 210, data path 212, optional core interconnect 214, prefetch logic 216, IOTLB 220, logic 222, endpoint devices 224, and root ports 226. Also, as illustrated, the RC cache 210 and IOTLB 220 may be combined into a single cache in one embodiment.
  • Fig. 4 illustrates a flow diagram of a method 400 to update information stored in an I/O cache to improve address translation caching and/or I/O cache performance in virtualized environments, according to an embodiment.
  • various components discussed with reference to Figs. 1-3 and 5 may be utilized to perform one or more of the operations discussed with reference to Fig. 4.
  • the method 400 starts with receiving a memory access request.
  • a memory access request (such as a read or write access) may be generated by one of the endpoints 224 and received by a corresponding virtualization logic 222 through one of the ports 226 at operation 402.
  • the virtualization logic 222 may access the IOTLB 220, the RC cache 210, and/or combinations thereof (such as shown in Fig. 3) at operation 404. If a corresponding entry is absent, the data may be fetched into the cache at operation 406 (e.g., by the virtualization logic 222 and/or the prefetch logic 216).
  • corresponding data may have been pre-fetched into cache by the logic 216 prior to operation 402.
  • the prefetch request is issued by one of the endpoint devices 224 to fetch-ahead and maintain coherent copies of the targeted address location.
  • These prefetch requests would also warm up the IOTLB 220, RC cache 210, and/or combinations thereof; the entries would be allocated and cached until the request is issued by the device.
  • the demand request ACH settings would determine if the entry in the IOTLB 220, RC cache 210, and/or combinations thereof, needs to be maintained or tagged for replacement.
  • the memory access request may be processed at an operation 410, e.g., by translating HPA and GPA addresses and/or physical/virtual addresses by reference to entries within the IOTLB 220, RC cache 210, and/or combinations thereof.
  • address translation caching and/or I/O cache performance in virtualized environments may be improved based on I/O device traffic hints (which may also be referred to herein as access control hints (ACHs)).
  • ACHs access control hints
  • ACHs may be supplied by an I/O device (e.g., one of the endpoints 224) in the memory request (e.g., over PCIe) to indicate if the device would access the same address again.
  • an operation 412 may determine whether the hint indicates future access to the same address. This information may be stored in one or more bits corresponding to a cache entry (e.g., an entry within the IOTLB 220, RC cache 210, and/or combinations thereof) that would be useful in cache line replacement policies, for example, where cached translations without the intended re-use bit set (or cleared depending on the implementation) would be candidates for replacements.
  • the logic 222 may perform operation 412.
  • If the hint does not indicate future access to the same address, the method 400 resumes with operation 410. Otherwise, the corresponding entry information may be updated at operation 414 (e.g., one or more bits for a corresponding entry in the IOTLB 220, RC cache 210, and/or combinations thereof may be updated by the corresponding logic 222). After operation 414, the method 400 resumes at operation 410.
  • consolidating IOTLB 220 and RC cache 210 structures into a combined IOTLB cache and RC cache structure may provide improved performance (e.g., improve latency for I/O transactions) and/or a more effective utilization of silicon real-estate (e.g., reduce the total number of gates).
  • snoops issued by a processor (e.g., one or more of the processors 202) would look up the RC cache 210 (or the combined I/O cache) using the physical address, whereas I/O accesses would look up the address in the RC cache 210 (or the combined I/O cache) based on the GPA.
  • various cache replacement policies may be applied to the RC cache 210, IOTLB 220, and/or the combinations thereof. For example, some replacement policies may implement random replacement policies, whereas others may implement least recently used (LRU) policies.
  • LRU least recently used
  • the address translation latency and/or latency associated with servicing I/O requests may be reduced. Also, consolidation of storage (e.g., address or data) structures used for RC cache 210 and IOTLB 220 (e.g., into a single I/O cache) may yield improved silicon efficiency and better performance or silicon-area (e.g., through a reduction in gate count).
  • the operations discussed herein may be implemented as hardware (e.g., circuitry), software, firmware, microcode, or combinations thereof, which may be provided as a computer program product, e.g., including a machine-readable or computer-readable medium having stored thereon instructions (or software procedures) used to program a computer to perform a process discussed herein.
  • the term "logic" may include, by way of example, software, hardware, or combinations of software and hardware.
  • the machine-readable medium may include a storage device such as those discussed herein.
  • Nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive, a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media capable of storing electronic data (e.g., including instructions).
  • Volatile storage (or memory) may include devices such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), etc.
  • Such computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a bus, a modem, or a network connection).
  • Connected may be used to indicate that two or more elements are in direct physical or electrical contact with each other.
  • Coupled may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
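
The hint-driven replacement described above for method 400 (operations 402-414) can be sketched as follows. This is only an illustrative Python model, not the disclosed hardware: the class `HintedIoCache`, its `access` method, the `backing` dictionary standing in for the translation source, and the choice to record the hint on every access are all assumptions made for the example. Eviction prefers the least recently used entry whose re-use bit is clear and falls back to plain LRU when every entry is marked for re-use.

```python
from collections import OrderedDict


class HintedIoCache:
    """Toy I/O cache whose replacement prefers entries without a re-use hint."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # hypothetical source of translations
        self.entries = OrderedDict()  # key -> [value, reuse_bit], oldest first

    def _evict(self):
        # Pick the least recently used entry whose re-use bit is clear;
        # if every entry is marked for re-use, fall back to plain LRU.
        victim = next(
            (key for key, (_, reuse) in self.entries.items() if not reuse),
            None,
        )
        if victim is None:
            victim = next(iter(self.entries))
        del self.entries[victim]
        return victim

    def access(self, key, reuse_hint):
        """Serve one request carrying a hint; return (value, evicted_key)."""
        evicted = None
        if key not in self.entries:                          # lookup (404) misses
            if len(self.entries) >= self.capacity:
                evicted = self._evict()
            self.entries[key] = [self.backing[key], False]   # fill (406)
        self.entries.move_to_end(key)                        # now most recent
        self.entries[key][1] = reuse_hint                    # record hint (412/414)
        return self.entries[key][0], evicted                 # process (410)


# Guest page numbers mapped to host page numbers (made-up values).
cache = HintedIoCache(capacity=2, backing={0x10: 0x80, 0x11: 0x93, 0x12: 0xA7})
cache.access(0x10, reuse_hint=True)    # device says it will come back
cache.access(0x11, reuse_hint=False)   # one-shot access
value, evicted = cache.access(0x12, reuse_hint=False)
```

In this run the third access evicts the entry for 0x11 even though 0x10 is older, because 0x10 carries the re-use hint; a plain LRU policy would have evicted 0x10 instead.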

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

Methods and apparatus relating to improving address translation caching and/or input/output (I/O) cache performance in virtualized environments are described. In one embodiment, a hint provided by an endpoint device may be utilized to update information stored in an I/O cache. Such information may be utilized for implementation of a more efficient replacement policy in an embodiment. Other embodiments are also disclosed.

Description

ADDRESS TRANSLATION CACHING AND I/O CACHE PERFORMANCE IMPROVEMENT IN VIRTUALIZED ENVIRONMENTS
BACKGROUND
[0001] The present disclosure generally relates to the field of electronics. More particularly, an embodiment of the invention relates to improving address translation caching and/or input/output (I/O) cache performance in virtualized environments.
[0002] I/O virtualization is a technology being developed to ensure that I/O devices function properly in a virtualized environment. Generally, a virtualized environment may be an environment in which more than one operating system (OS) may be active at the same time. Some implementations of I/O virtualization may utilize hardware structures to improve performance. Such implementations may however require a relatively high gate count to realize, which would in turn be more costly and/or complex to implement.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
[0004] Figs. 1-3 illustrate block diagrams of embodiments of computing systems, which may be utilized to implement various embodiments discussed herein.
[0005] Fig. 4 illustrates a flow diagram of a method according to an embodiment.
DETAILED DESCRIPTION
[0006] In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, some embodiments may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments. Various aspects of embodiments of the invention may be performed using various means, such as integrated semiconductor circuits ("hardware"), computer-readable instructions organized into one or more programs ("software") or some combination of hardware and software. For the purposes of this disclosure reference to "logic" shall mean either hardware, software, or some combination thereof.
[0007] Some of the embodiments discussed herein may improve address translation caching (such as virtualization for directed I/O (VTd) address translation) and/or I/O cache performance in virtualized environments. More specifically, some virtualization services may be implemented in hardware structures that are utilized to translate a guest physical address (GPA) to host physical addresses (HPA). Accordingly, such structures may provide caching support, e.g., in the form of I/O look-aside-buffers (IOTLBs) to cache the GPA to HPA translations. In some embodiments, these caching structures may provide lower latency for requests that target the same address translation. Furthermore, some of the techniques may be utilized in various types of computing environments, such as those discussed with reference to Figs. 1-4.
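
As a rough illustration of how an IOTLB may lower latency for requests that target the same translation, the following Python sketch caches GPA-to-HPA page translations. It is a software model under stated assumptions, not the disclosed hardware: the 4 KiB page size, the `Iotlb` class, and the `page_table` dictionary that stands in for the slower translation structures are all hypothetical.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Iotlb:
    """Toy IOTLB: caches guest-page -> host-page translations."""

    def __init__(self, page_table):
        self.page_table = page_table  # hypothetical GPA page -> HPA page map
        self.entries = {}             # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, gpa):
        """Translate a guest physical address to a host physical address."""
        gpn, offset = gpa >> PAGE_SHIFT, gpa & PAGE_MASK
        if gpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # Slow path: consult the translation table and cache the result.
            self.entries[gpn] = self.page_table[gpn]
        return (self.entries[gpn] << PAGE_SHIFT) | offset


iotlb = Iotlb(page_table={0x10: 0x80, 0x11: 0x93})
first = iotlb.translate(0x10ABC)    # miss: translation is fetched and cached
second = iotlb.translate(0x10DEF)   # hit: same guest page, cached translation
```

The second request targets the same guest page as the first, so it is served from the cached entry without consulting the table again.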
[0008] More particularly, Fig. 1 illustrates a block diagram of a computing system 100, according to an embodiment of the invention. The system 100 may include one or more agents 102-1 through 102-M (collectively referred to herein as "agents 102" or more generally "agent 102"). In an embodiment, the agents 102 may be components of a computing system, such as the computing systems discussed with reference to Figs. 2-4.
[0009] As illustrated in Fig. 1, the agents 102 may communicate via a network fabric 104. In one embodiment, the network fabric 104 may include a computer network that allows various agents (such as computing devices) to communicate data. In an embodiment, the network fabric 104 may include one or more interconnects (or interconnection networks) that communicate via a serial (e.g., point-to-point) link and/or a shared communication network. For example, some embodiments may facilitate component debug or validation on links that allow communication with fully buffered dual in-line memory modules (FBD), e.g., where the FBD link is a serial link for coupling memory modules to a host controller device (such as a processor or memory hub). Debug information may be transmitted from the FBD channel host such that the debug information may be observed along the channel by channel traffic trace capture tools (such as one or more logic analyzers).
[0010] In one embodiment, the system 100 may support a layered protocol scheme, which may include a physical layer, a link layer, a routing layer, a transport layer, and/or a protocol layer. The fabric 104 may further facilitate transmission of data (e.g., in form of packets) from one protocol (e.g., caching processor or caching aware memory controller) to another protocol for a point-to-point or shared network. Also, in some embodiments, the network fabric 104 may provide communication that adheres to one or more cache coherent protocols.
[0011] Furthermore, as shown by the direction of arrows in Fig. 1, the agents 102 may transmit and/or receive data via the network fabric 104. Hence, some agents may utilize a unidirectional link while others may utilize a bidirectional link for communication. For instance, one or more agents (such as agent 102-M) may transmit data (e.g., via a unidirectional link 106), other agent(s) (such as agent 102-2) may receive data (e.g., via a unidirectional link 108), while some agent(s) (such as agent 102-1) may both transmit and receive data (e.g., via a bidirectional link 110).
[0012] Fig. 2 illustrates a block diagram of portions of a computing system 200, according to an embodiment. In one embodiment, various components of the system 200 may be implemented within one of the agents 102-1 and/or 102-M discussed with reference to Fig. 1. Further details regarding some of the operations of the computing system 200 will be discussed herein with reference to Fig. 4.
[0013] The system 200 may include one or more processors 202-1 through 202-N (collectively referred to herein as "processors 202" or more generally "processor 202"). Each of the processors 202-1 through 202-N may include various components, such as private or shared cache(s), execution unit(s), one or more cores, etc. Moreover, each of the processors 202 may have access to a memory 204 (e.g., memories 204-1 through 204-N). Also, the system 200 may include an optional system memory 206 that may be shared by various components of the system 200, including, for example, one or more of the processors 202, components of an uncore or chipset (CS) 208, or components coupled to the uncore 208, etc. One or more of the memories 204 and/or 206 may store one or more operating systems. Hence, the system 200 may be capable of executing a plurality of operating systems (e.g., at the same time) in some embodiments.
[0014] As shown in Fig. 2, the uncore 208 may include various components such as root complex (RC) cache 210 (e.g., that may be shared amongst various components of a computing system such as the system 200). In some embodiments, the RC cache 210 may be present in a memory control hub (MCH) and/or a graphics MCH (GMCH) portion of a chipset or uncore (e.g., CS/uncore 208). The RC cache 210 may communicate with other components via a data path 212 (which may include an optional core interconnect 214, e.g., to facilitate communication between one or more cores of the processors 202 and other components of the system 200). The system 200 may further include a prefetch logic 216, e.g., to prefetch data (including instructions or micro-operations) from various locations (such as one or more of the memories 204, the system memory 206, other storage devices, including for example a volatile or non-volatile memory device, etc.) into an IOTLB 220 (e.g., via virtualization or translation logics 222-1 through 222-P (collectively referred to herein as "logics 222" or more generally "logic 222")).
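
The effect of prefetching into the IOTLB can be pictured with a small Python model in which a prefetch fills an entry ahead of time so that the later demand request hits. This is a sketch under assumptions rather than a description of the prefetch logic 216: the `TranslationCache` class, its `prefetch` and `demand` methods, and the `backing` dictionary representing the slower source are hypothetical names chosen for the example.

```python
class TranslationCache:
    """Toy cache that can be warmed by prefetch before a demand request."""

    def __init__(self, backing):
        self.backing = backing   # hypothetical slow translation source
        self.entries = {}
        self.slow_lookups = 0    # how many times the slow source was consulted

    def _fill(self, key):
        self.slow_lookups += 1
        self.entries[key] = self.backing[key]

    def prefetch(self, key):
        """Fetch ahead of the demand request; nothing is returned."""
        if key not in self.entries:
            self._fill(key)

    def demand(self, key):
        """Serve a demand request; return (value, hit)."""
        hit = key in self.entries
        if not hit:
            self._fill(key)
        return self.entries[key], hit


cache = TranslationCache(backing={0x10: 0x80, 0x11: 0x93})
cache.prefetch(0x10)                        # warm the entry ahead of time
warm_value, warm_hit = cache.demand(0x10)   # hit: no slow lookup on demand
cold_value, cold_hit = cache.demand(0x11)   # miss: slow lookup on demand
```

The slow source is consulted once per entry in both cases; prefetching only moves that cost off the demand path.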
[0015] As shown in Fig. 2, in at least one embodiment, the data path 212 may be coupled to one or more I/O devices. Any type of an I/O device may be utilized. For illustrative purposes, in the embodiment illustrated in Fig. 2, the I/O devices may include one or more devices 224-1 through 224-P (collectively referred to herein as "endpoint devices 224" or more generally "endpoint 224"). The endpoint devices 224 may be peripheral component interconnect (PCI) devices in an embodiment.
[0016] For example, the endpoint devices 224 may communicate with the CS/uncore 208 in accordance with the PCI Local Bus Specification, Revision 3.0, March 9, 2004, available from the PCI Special Interest Group, Portland, Oregon, U.S.A. (hereinafter referred to as a "PCI bus"). Alternatively, the PCI-X Specification Rev. 3.0a, April 23, 2003 (hereinafter referred to as a "PCI-X bus") and/or PCI Express (PCIe) Specifications (PCIe Specification, Revision 2.0, October 2006), available from the aforesaid PCI Special Interest Group, Portland, Oregon, USA, may be utilized. Further, other peripherals coupled to the CS/uncore 208 may include, in various embodiments of the invention, integrated drive electronics (IDE) or small computer system interface (SCSI) hard drive(s), universal serial bus (USB) device(s), a keyboard, a mouse, parallel port(s), serial port(s), floppy disk drive(s), digital output support (e.g., digital video interface (DVI)), etc.
[0017] As shown in Fig. 2, the endpoint devices 224 may communicate through root ports 226-1 through 226-P (collectively referred to herein as "ports 226" or more generally "port 226") with other components of system 200 such as the logics 222. In an embodiment, the logics 222 may perform address translation operations for virtualized environments, such as translating virtual addresses into physical addresses, e.g., by reference to the IOTLB 220. The physical addresses may correspond to locations (e.g., entries) within the system memory 206. The logic 222 may additionally perform other operations such as those discussed with reference to Figs. 3 and 4, which may involve translation between the guest physical address (GPA) and the host physical address (HPA) of entries in a memory device coupled to the systems 200 and/or 300 (such as the system memory 206). Also, the logic 222 may be a root complex in accordance with the PCIe specification.
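By way of illustration only, the translation by reference to the IOTLB 220 described above may be sketched in software as follows. The sketch is not the claimed implementation: the 4 KiB page size, the class name `Iotlb`, the dictionary standing in for translation tables in the system memory 206, and the example addresses are all assumptions introduced for this example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages; the specification does not state a page size
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Iotlb:
    """Toy IOTLB: caches guest-page to host-page translations (hypothetical model)."""

    def __init__(self, page_table):
        # Stand-in for translation tables held in system memory (assumption).
        self.page_table = page_table
        # Cached translations: guest page number -> host page number.
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, gpa):
        """Translate a guest physical address (GPA) to a host physical address (HPA)."""
        gpn = gpa >> PAGE_SHIFT
        offset = gpa & PAGE_MASK
        if gpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # On a miss, consult the in-memory table and cache the result.
            self.entries[gpn] = self.page_table[gpn]
        return (self.entries[gpn] << PAGE_SHIFT) | offset


iotlb = Iotlb(page_table={0x10: 0x8A, 0x11: 0x3C})
hpa_first = iotlb.translate(0x10234)   # misses, then caches guest page 0x10
hpa_second = iotlb.translate(0x10FF0)  # hits the cached entry for guest page 0x10
```

In this sketch the second request to the same guest page is served from the cached translation, which is the latency benefit the IOTLB 220 is described as providing.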
[0018] Moreover, the processors 202 may be any type of processor such as a general purpose processor, a network processor (which may process data communicated over a computer network 250), etc. (including a reduced instruction set computer (RISC) processor or a complex instruction set computer (CISC)). Moreover, the processors 202 may have a single or multiple core design. The processors 202 with a multiple core design may integrate different types of processor cores on the same integrated circuit (IC) die. Also, the processors 202 with a multiple core design may be implemented as symmetrical or asymmetrical multiprocessors. Also, as shown in Fig. 2, at least one or more of the endpoint devices 224 may be coupled to the network 250 in an embodiment.
[0019] Further, the processors 202 may include one or more caches (not shown), which may be private and/or shared in various embodiments. Generally, a cache stores data corresponding to original data stored elsewhere or computed earlier. To reduce memory access latency, once data is stored in a cache, future use may be made by accessing a cached copy rather than refetching or re-computing the original data. The cache(s) discussed herein (including, for example, RC cache 210, IOTLB 220, combinations thereof, etc.) may be any type of cache, such as a level 1 (L1) cache, a level 2 (L2) cache, a level 3 (L3) cache, a mid-level cache, a last level cache (LLC), combinations thereof, etc. to store electronic data (e.g., including instructions) that is utilized by one or more components of the system 200.
[0020] In an embodiment, the systems 200 and/or 300 may also include other devices such as one or more of: a display device (e.g., coupled to the CS/uncore 208 to display images), an audio device (e.g., coupled to the CS/uncore 208 to process audio signals), etc. In some embodiments, such devices may be implemented as endpoint devices 224 (which may communicate with the CS/uncore 208 via root ports 226, for example).
[0021] Fig. 3 illustrates a block diagram of portions of a computing system 300, according to an embodiment. In one embodiment, various components of the system 300 may be implemented within one of the agents 102-1 and/or 102-M discussed with reference to Fig. 1. Further details regarding some of the operations of the computing system 300 will be discussed herein with reference to Fig. 4.
[0022] As shown in Fig. 3, the system 300 may include one or more of the processors 202, memories 204, system memory 206, RC cache 210, data path 212, optional core interconnect 214, prefetch logic 216, IOTLB 220, logic 222, endpoint devices 224, and root ports 226. Also, as illustrated, the RC cache 210 and IOTLB 220 may be combined into a single cache in one embodiment.
[0023] Fig. 4 illustrates a flow diagram of a method 400 to update information stored in an I/O cache to improve address translation caching and/or I/O cache performance in virtualized environments, according to an embodiment. In one embodiment, various components discussed with reference to Figs. 1-3 and 5 may be utilized to perform one or more of the operations discussed with reference to Fig. 4.
[0024] Referring to Figs. 1-4, at an operation 402, the method 400 starts with receiving a memory access request. For example, a memory access request (such as a read or write access) may be generated by one of the endpoints 224 and received by a corresponding virtualization logic 222 through one of the ports 226 at operation 402. At an operation 404, it may be determined whether an entry corresponding to the memory access request exists in a cache. In an embodiment, the virtualization logic 222 may access the IOTLB 220, the RC cache 210, and/or combinations thereof (such as shown in Fig. 3) at operation 404. If a corresponding entry is absent, the data may be fetched into the cache at operation 406 (e.g., by the virtualization logic 222 and/or the prefetch logic 216).
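By way of illustration only, operations 402 through 406 may be modeled in software as shown below. The names `MemoryAccessRequest`, `IoCache`, and `receive_request`, the request fields, and the dictionary used as backing storage are hypothetical and are not taken from the specification; the log list exists only to make the order of operations visible.

```python
from dataclasses import dataclass


@dataclass
class MemoryAccessRequest:
    endpoint_id: int        # hypothetical: identifies the issuing endpoint device
    address: int            # address carried by the request
    is_write: bool = False  # read or write access


class IoCache:
    """Toy model of the IOTLB, the RC cache, or a combination thereof."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stand-in for system memory (assumption)
        self.lines = {}

    def has_entry(self, address):
        # Operation 404: does an entry for this request exist in the cache?
        return address in self.lines

    def fetch(self, address):
        # Operation 406: bring the missing data into the cache.
        self.lines[address] = self.backing_store[address]


def receive_request(cache, request, log):
    """Walk operations 402-406 for one request and return the cached data."""
    log.append("402:receive")
    if cache.has_entry(request.address):
        log.append("404:hit")
    else:
        log.append("404:miss")
        cache.fetch(request.address)
        log.append("406:fetch")
    return cache.lines[request.address]


cache = IoCache(backing_store={0x1000: "entry-A"})
trace = []
first = receive_request(cache, MemoryAccessRequest(endpoint_id=1, address=0x1000), trace)
second = receive_request(cache, MemoryAccessRequest(endpoint_id=1, address=0x1000), trace)
```

Only the first request takes the fetch path; the second finds the entry already present at operation 404.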
[0025] In an embodiment, corresponding data may have been prefetched into the cache by the prefetch logic 216 prior to operation 402. In one embodiment, the prefetch request is issued by one of the endpoint devices 224 to fetch ahead and maintain coherent copies of the targeted address location. These prefetch requests would also warm up the IOTLB 220, RC cache 210, and/or combinations thereof; the entries would be allocated and cached until the request is issued by the device. The ACH settings of the demand request would determine whether the entry in the IOTLB 220, RC cache 210, and/or combinations thereof needs to be maintained or tagged for replacement.
[0026] At an operation 408, it may be determined (e.g., by the virtualization logic 222) whether the memory access request includes a hint (such as one or more bits of the memory access request). If no hint exists, the memory access request may be processed at an operation 410, e.g., by translating HPA and GPA addresses and/or physical/virtual addresses by reference to entries within the IOTLB 220, RC cache 210, and/or combinations thereof. In one embodiment, address translation caching and/or I/O cache performance in virtualized environments may be improved based on I/O device traffic hints (which may also be referred to herein as access control hints (ACHs)). For example, ACHs may be supplied by an I/O device (e.g., one of the endpoints 224) in the memory request (e.g., over PCIe) to indicate whether the device would access the same address again. Accordingly, an operation 412 may determine whether the hint indicates future access to the same address. This information may be stored in one or more bits corresponding to a cache entry (e.g., an entry within the IOTLB 220, RC cache 210, and/or combinations thereof) and would be useful in cache line replacement policies, for example, where cached translations without the intended re-use bit set (or cleared, depending on the implementation) would be candidates for replacement. In one embodiment, the logic 222 may perform operation 412. If no future access is indicated, the method 400 resumes with operation 410. Otherwise, the corresponding entry information may be updated at operation 414 (e.g., one or more bits for a corresponding entry in the IOTLB 220, RC cache 210, and/or combinations thereof may be updated by the corresponding logic 222). After operation 414, the method 400 resumes at operation 410.
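By way of illustration only, operations 408 through 414 may be sketched as follows. The bit positions `ACH_PRESENT` and `ACH_REUSE`, the `CacheEntry` layout, and the function name `process_request` are assumptions made for this example; the specification states only that the hint may be one or more bits of the memory access request and does not define an encoding. The sketch also assumes the entry is already present (i.e., operations 404 and 406 have completed) and only ever sets the re-use bit.

```python
from dataclasses import dataclass

ACH_PRESENT = 0b01  # hypothetical bit: the request carries an access control hint
ACH_REUSE = 0b10    # hypothetical bit: the device expects to access the address again


@dataclass
class CacheEntry:
    translation: int     # e.g., the HPA cached for a GPA
    reuse: bool = False  # re-use bit consulted later by the replacement policy


def process_request(cache, address, attr_bits):
    """Apply operations 408-414 to one request and return its translation."""
    entry = cache[address]  # assumes operations 404/406 already ensured presence
    has_hint = bool(attr_bits & ACH_PRESENT)        # operation 408
    if has_hint and (attr_bits & ACH_REUSE):        # operation 412
        entry.reuse = True                          # operation 414
    return entry.translation                        # operation 410


io_cache = {
    0x1000: CacheEntry(translation=0x8A000),
    0x2000: CacheEntry(translation=0x3C000),
}
out_a = process_request(io_cache, 0x1000, ACH_PRESENT | ACH_REUSE)  # hinted re-use
out_b = process_request(io_cache, 0x2000, 0)                        # no hint present
```

After these two requests, only the entry for address 0x1000 carries the re-use bit, so the entry for 0x2000 would be the preferred replacement candidate.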
[0027] In some embodiments, consolidating IOTLB 220 and RC cache 210 structures into a combined IOTLB cache and RC cache structure (which may be referred to herein as an I/O cache) may provide improved performance (e.g., improve latency for I/O transactions) and/or a more effective utilization of silicon real-estate (e.g., reduce the total number of gates). In an embodiment, snoops issued by a processor (e.g., one or more of the processors 202) would look up the RC cache 210 (or the combined I/O cache) using the physical address, whereas I/O accesses would look up the address in the RC cache 210 (or the combined I/O cache) based on the GPA.
[0028] In some embodiments, various cache replacement policies may be applied to the RC cache 210, IOTLB 220, and/or the combinations thereof. For example, some replacement policies may implement random replacement policies, whereas others may implement least recently used (LRU) policies.
[0029] Accordingly, in some embodiments, the address translation latency and/or latency associated with servicing I/O requests may be reduced. Also, consolidation of storage (e.g., address or data) structures used for RC cache 210 and IOTLB 220 (e.g., into a single I/O cache) may yield improved silicon efficiency and better performance or silicon area (e.g., through a reduction in gate count).
[0030] In various embodiments of the invention, the operations discussed herein, e.g., with reference to Figs. 1-4, may be implemented as hardware (e.g., circuitry), software, firmware, microcode, or combinations thereof, which may be provided as a computer program product, e.g., including a machine-readable or computer-readable medium having stored thereon instructions (or software procedures) used to program a computer to perform a process discussed herein. Also, the term "logic" may include, by way of example, software, hardware, or combinations of software and hardware. The machine-readable medium may include a storage device such as those discussed herein.
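By way of illustration only, one way to combine the LRU replacement policy mentioned above with the re-use bit discussed with reference to operation 414 is sketched below. The class `HintAwareLruCache`, its capacity parameter, and the choice to fall back to plain LRU when every entry carries the re-use bit are assumptions made for this example and are not mandated by the specification.

```python
from collections import OrderedDict


class HintAwareLruCache:
    """LRU cache that evicts entries without the re-use bit first (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (value, reuse_bit), oldest first

    def access(self, key, value, reuse_hint=False):
        """Insert or touch an entry; return the evicted key, or None."""
        evicted = None
        if key not in self.entries and len(self.entries) >= self.capacity:
            evicted = self._pick_victim()
            del self.entries[evicted]
        # The re-use bit reflects the hint of the most recent request (assumption).
        self.entries[key] = (value, reuse_hint)
        self.entries.move_to_end(key)  # mark as most recently used
        return evicted

    def _pick_victim(self):
        # Prefer the least recently used entry whose re-use bit is not set.
        for key, (_, reuse) in self.entries.items():
            if not reuse:
                return key
        # Every entry is marked for re-use: fall back to plain LRU.
        return next(iter(self.entries))


lru = HintAwareLruCache(capacity=2)
lru.access("a", 1, reuse_hint=True)
lru.access("b", 2)
victim_1 = lru.access("c", 3)                   # "a" is older but protected by its hint
victim_2 = lru.access("d", 4, reuse_hint=True)  # "c" is the only unhinted entry
victim_3 = lru.access("e", 5)                   # all hinted: the oldest entry goes
```

A random replacement policy could be substituted in `_pick_victim` by choosing randomly among the entries without the re-use bit.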
[0031] For example, a storage device as discussed herein may include volatile and/or nonvolatile memory (or storage). Nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive, a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media capable of storing electronic data (e.g., including instructions). Volatile storage (or memory) may include devices such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), etc.
[0032] Additionally, such computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a bus, a modem, or a network connection).
[0033] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase "in one embodiment" in various places in the specification may or may not be all referring to the same embodiment.
[0034] Also, in the description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. In some embodiments of the invention, "connected" may be used to indicate that two or more elements are in direct physical or electrical contact with each other. "Coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
[0035] Thus, although embodiments of the invention have been described in language specific to structural features and/or methodological acts, it is to be understood that claimed subject matter may not be limited to the specific features or acts described. Rather, the specific features and acts are disclosed as sample forms of implementing the claimed subject matter.

Claims

What is claimed is:
1. An apparatus comprising: a cache to store one or more entries, wherein each entry corresponds to an input/output (I/O) memory access request between a guest physical address (GPA) and a host physical address (HPA); and a first logic to receive a first I/O memory access request from an endpoint device and to determine whether the first I/O memory access request comprises a future access hint associated with an address, wherein the first logic is to cause an update to one or more bits of a corresponding cache entry in response to a determination that the first I/O memory access request comprises the hint.
2. The apparatus of claim 1, wherein the endpoint device is to generate the memory access request.
3. The apparatus of claim 1, further comprising a prefetch logic to prefetch data into the cache in response to a request issued by the endpoint device.
4. The apparatus of claim 1, wherein the endpoint device comprises a peripheral component interconnect (PCI) express device.
5. The apparatus of claim 1, wherein the future access hint is to indicate that future access is to be made to the address.
6. The apparatus of claim 1, wherein one or more of the first logic, one or more processor cores, or the cache are on a same integrated circuit die.
7. The apparatus of claim 1, wherein the cache comprises one or more of a root complex cache, an I/O translation look-aside buffer (IOTLB), or combinations thereof.
8. The apparatus of claim 1, wherein the cache is a shared or private cache.
9. The apparatus of claim 1, wherein the cache comprises one or more of a level 1 (L1) cache, a level 2 (L2) cache, a level 3 (L3) cache, a mid-level cache, a last level cache (LLC), or combinations thereof.
10. The apparatus of claim 1, further comprising a root port to couple the first logic and the endpoint device.
11. A method comprising: receiving a first input/output (I/O) memory access request from an endpoint device; storing one or more entries in a cache, wherein each entry corresponds to an input/output (I/O) memory access request between a guest physical address (GPA) and a host physical address (HPA); and determining whether the first I/O memory access request comprises a future access hint associated with an address, wherein the future access hint is to indicate that future access is to be made to the address.
12. The method of claim 11, further comprising updating one or more bits of a corresponding cache entry in response to a determination that the first I/O memory access request comprises the hint.
13. The method of claim 11, further comprising replacing entries in the cache that do not comprise a hint prior to entries that comprise a hint.
14. The method of claim 11, further comprising translating addresses corresponding to the first I/O memory access.
15. A system comprising: a memory to store one or more entries; a cache to store one or more entries corresponding to the one or more entries stored in the memory, wherein each entry of the cache is to correspond to an input/output (I/O) memory access request between a guest physical address (GPA) and a host physical address (HPA); and a first logic to receive a first I/O memory access request from an endpoint device and to determine whether the first I/O memory access request comprises a future access hint associated with an address, wherein the first logic is to cause an update to one or more bits of a corresponding cache entry in response to a determination that the first I/O memory access request comprises the hint.
16. The system of claim 15, wherein the endpoint device is to generate the memory access request.
17. The system of claim 15, further comprising a prefetch logic to prefetch data into the cache in response to a request issued by the endpoint device.
18. The system of claim 15, wherein the endpoint device comprises a peripheral component interconnect (PCI) express device.
19. The system of claim 15, wherein the future access hint is to indicate that future access is to be made to the address.
20. The system of claim 15, further comprising a display device coupled to an uncore that comprises the cache.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN200880110445.8A CN101868786A (en) 2007-09-28 2008-09-26 Address translation caching and I/O cache performance improvement in virtualized environments
RU2010104040/08A RU2483347C2 (en) 2007-09-28 2008-09-26 Caching apparatus, method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/906,176 2007-09-28
US11/906,176 US8161243B1 (en) 2007-09-28 2007-09-28 Address translation caching and I/O cache performance improvement in virtualized environments

Publications (2)

Publication Number Publication Date
WO2009045884A2 true WO2009045884A2 (en) 2009-04-09
WO2009045884A3 WO2009045884A3 (en) 2009-06-25

Family

ID=40418368

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/077819 WO2009045884A2 (en) 2007-09-28 2008-09-26 Address translation caching and i/o cache performance improvement in virtualized environments

Country Status (5)

Country Link
US (2) US8161243B1 (en)
CN (2) CN101868786A (en)
DE (1) DE102008048421A1 (en)
RU (1) RU2483347C2 (en)
WO (1) WO2009045884A2 (en)

Also Published As

Publication number Publication date
CN101868786A (en) 2010-10-20
CN101398787A (en) 2009-04-01
US20120203950A1 (en) 2012-08-09
RU2010104040A (en) 2011-08-20
US8161243B1 (en) 2012-04-17
US8407422B2 (en) 2013-03-26
WO2009045884A3 (en) 2009-06-25
RU2483347C2 (en) 2013-05-27
DE102008048421A1 (en) 2009-04-09

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200880110445.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08834867

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2010104040

Country of ref document: RU

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08834867

Country of ref document: EP

Kind code of ref document: A2