US9367478B2 - Controlling direct memory access page mappings - Google Patents

Controlling direct memory access page mappings

Info

Publication number
US9367478B2
US9367478B2
Authority
US
United States
Prior art keywords
page
component
memory
request
logical partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US14/281,979
Other versions
US20150205729A1
Inventor
Cary L. Bates
Lee N. Helgeson
Justin K. King
Michelle A. Schlicht
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US14/281,979
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignors: BATES, CARY L.; KING, JUSTIN K.; HELGESON, LEE N.; SCHLICHT, MICHELLE A.
Publication of US20150205729A1
Application granted
Publication of US9367478B2
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G06F — Electric digital data processing (Section G: Physics; Class G06: Computing; Calculating or Counting)
    • G06F12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F12/10 — Address translation
    • G06F12/1009 — Address translation using page tables, e.g., page table structures
    • G06F12/1081 — Address translation for peripheral access to main memory, e.g., direct memory access [DMA]
    • G06F13/28 — Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g., direct memory access [DMA], cycle steal
    • G06F9/45533 — Hypervisors; virtual machine monitors
    • G06F9/45558 — Hypervisor-specific management and integration aspects
    • G06F9/50 — Allocation of resources, e.g., of the central processing unit [CPU]
    • G06F2009/45579 — I/O management, e.g., providing access to device drivers or storage
    • G06F2009/45583 — Memory management, e.g., access or allocation

Definitions

  • the disclosure relates generally to information technology, and more particularly to mapping memory pages used for direct memory access (DMA) transfers between memory and input/output (I/O) devices in a virtualized computer system.
  • virtualization may refer to different aspects of computing.
  • hardware virtualization generally refers to methods that allow multiple applications to run concurrently on a computer.
  • Hardware virtualization may be implemented by a hypervisor, which may also be referred to as a virtual machine manager.
  • virtual memory refers to techniques in which virtual addresses are mapped to actual (physical) addresses in the memory of a computer system. With virtual addressing mechanisms, applications are able to address more memory than is physically available in the main memory of the system. In addition, application programming is simplified as applications are not required to manage memory that is shared with other applications.
  • a method may include receiving a first request to map a first page of the memory.
  • the request may identify a first requester.
  • the method may include determining a first logical partition associated with the first page and determining that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single requester.
  • the method may include determining that the first page is available to be mapped to a requester.
  • the method may include mapping the first page to the first requester and setting a flag to indicate that the first page is unavailable for an additional mapping.
  • the first request is received from a device driver on behalf of an input/output adapter and the first requester is the input/output adapter.
  • the first page is requested for use in a direct memory access (DMA) transfer.
  • a method may include processing a DMA access request from the first requester. The DMA access request may specify the first page.
  • a method may include rejecting a DMA access request from a second requester. Again, the DMA access request may specify the first page.
  • the determining that the first page is available to be mapped to a requester may include reading a target exclusive page flag stored in a target exclusive page table.
  • the determining that the first page is available to be mapped to a first requester may include reading a target exclusive page flag stored in a translation control entry table. Further, the request may identify a first logical address of the first page, and the determining the first logical partition associated with the first page may include translating the first logical address into a first physical address.
  • Various embodiments are directed to a system that includes a processor, a memory, at least one input/output adapter, and a hypervisor.
  • the hypervisor may be configured to receive a first request to map a first page of the memory for use in a direct memory access (DMA) transfer operation. The request may identify a first component.
  • the hypervisor may be additionally configured to determine a first logical partition associated with the first page, and to determine that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single component. Further, the hypervisor may determine that the first page is available to be mapped. The hypervisor may map the first page to the first component and set a flag to indicate that the first page is unavailable for an additional mapping.
  • the first request is received from a device driver on behalf of an input/output adapter and the first component is the input/output adapter.
  • the hypervisor processes a DMA access request from the first component.
  • the DMA access request may specify the first page.
  • the hypervisor rejects a DMA access request from a component other than the first component. Again, the DMA access request may specify the first page.
  • the hypervisor may determine that the first page is available to be mapped by reading a target exclusive page flag stored in a target exclusive page table. In other embodiments, the hypervisor may determine that the first page is available to be mapped by reading a target exclusive page flag stored in a translation control entry table.
  • Yet other embodiments are directed to a computer readable storage medium having instructions stored thereon for controlling access to a memory which, when executed, cause a processor to perform various operations.
  • the operations may include receiving a first request to map a first page of the memory.
  • the request may identify a first requester and specify that the first page is requested for use in a direct memory access (DMA) transfer.
  • the operations may include determining a first logical partition associated with the first page, and determining that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single requester.
  • the operations may include determining that the first page is available to be mapped to a requester.
  • the operations may include mapping the first page to the first requester and setting a flag to indicate that the first page is unavailable for an additional mapping.
  • FIG. 1 depicts a block diagram of an exemplary computer system in which various embodiments of the invention may be implemented.
  • FIG. 2 is a block diagram showing a view of the computer hardware, hypervisor, operating systems, applications, and I/O adapters of the computer system of FIG. 1 in a virtualized environment, according to various embodiments.
  • FIG. 3 is a diagram of an alternative view of the memory, TCE tables, and I/O adapters of the computer system of FIG. 1 , according to various embodiments.
  • FIG. 4 is a diagram of the memory, TCE tables, and I/O adapters of the computer system of FIG. 3 showing an alternative mapping of the I/O adapters to the memory, according to various embodiments.
  • FIG. 5 illustrates a format of an entry in a translation control entry table according to various embodiments.
  • FIG. 6 illustrates a target exclusive page table for a logical partition according to an embodiment.
  • FIG. 7 is a flow diagram of a method for processing a mapping request according to various embodiments.
  • FIG. 8 illustrates an exclusive target table according to various embodiments.
  • a hypervisor may not prevent two or more components, such as I/O adapters, firmware, or virtual I/O servers from mapping the same page of memory for use in DMA transfers. Allowing two or more components to map the same page is sometimes desirable, but at other times it is not desirable. For example, a bug (defect) can occur when multiple mappings are allowed and DMA writes from two or more components conflict with one another. These bugs are generally difficult to find.
  • methods and systems are provided for controlling access to a page of memory of a logical partition. When a request to map a page of memory for use in a DMA transfer operation is received, a logical partition associated with the page is determined.
  • an attribute of the logical partition limits access to individual pages of the logical partition to a single requester, such as a single input/output adapter or other component. If the logical partition limits access, it is determined whether the page is available to be mapped to a component, i.e., the page has not already been mapped. If the page is available, the page is mapped to the requester. In addition, a page exclusivity attribute or flag may be set to indicate that the page is unavailable for an additional mapping. This ensures that the requester has exclusive access to the page. Exclusivity is maintained until the page exclusivity flag is reset to indicate that the page is available.
  • Each logical partition may have an exclusive target for DMA (ET for DMA) attribute, which may be stored in an exclusive target table. If the ET for DMA attribute of the logical partition limits access to individual pages of the partition to a single requester, a target exclusive page table is provided for the logical partition. Page exclusivity attributes or flags may be stored in the target exclusive page table. In alternative embodiments, page exclusivity may be maintained using a bit of an entry of a TCE table or a bit of an entry of a page translation table.
  • if a DMA write request is received for a logical partition having the ET for DMA attribute enabled, and the request specifies a page that has been exclusively mapped (i.e., the page exclusivity attribute is set) to a component, such as an input/output adapter, the request is only processed if the requester is the exclusively-mapped component.
  • the DMA write request is rejected if the requester is a component other than the exclusively-mapped component.
  • FIG. 1 depicts a high-level block diagram of an exemplary computer system 100 for implementing various embodiments.
  • the mechanisms and apparatus of the various embodiments disclosed herein apply equally to any appropriate computing system.
  • the major components of the computer system 100 may include one or more processors 102 , a memory 104 , one or more input/output (I/O) adapters 106 A- 106 C, all of which are communicatively coupled, directly or indirectly, for inter-component communication via a host bus 108 , a memory bus 110 , a bus 112 , an I/O bus 114 , a bus interface unit (IF) 116 , and an I/O bus interface unit 118 .
  • the computer system 100 may contain one or more general-purpose programmable CPUs, herein generically referred to as the processor 102 .
  • the computer system 100 may contain multiple processors 102 ; however, in another embodiment, the computer system 100 may alternatively include a single CPU.
  • Each processor 102 executes instructions stored in the memory 104 and may include one or more levels of on-board cache.
  • Each processor 102 may include one or more cores 103 , e.g., cores 103 A- 103 D.
  • the memory 104 may include a random-access semiconductor memory, storage device, or storage medium (either volatile or non-volatile) for storing or encoding data and programs.
  • the memory 104 represents the entire virtual memory of the computer system 100 , and may also include the virtual memory of other computer systems coupled to the computer system 100 or connected via a network.
  • the memory 104 is conceptually a single monolithic entity, but in other embodiments the memory 104 is a more complex arrangement, such as a hierarchy of caches and other memory devices.
  • memory may exist in multiple levels of caches, and these caches may be further divided by function, so that one cache holds instructions while another holds non-instruction data, which is used by the processor 102 .
  • Memory may be further distributed and associated with different CPUs or sets of CPUs, as is known in any of various so-called non-uniform memory access (NUMA) computer architectures.
  • the memory 104 may store all or a portion of a hypervisor 120 , one or more operating systems 122 , one or more device drivers 124 , and one or more application programs 126 .
  • a device driver 124 may be a computer program that controls a particular device using low-level commands that the device understands. The device driver 124 may translate higher-level application code to low-level, device-specific commands.
  • a portion of the memory 104 may be allocated for one or more DMA buffers 128 , one or more page translation tables (PTT) 127 , one or more translation control entry (TCE) tables 129 , an exclusive target (ET) table 133 , and a target exclusive page (TEP) table 135 .
  • the memory 104 may store a virtual I/O server 131 .
  • These programs and data structures are illustrated as being included within the memory 104 in the computer system 100 , however, in other embodiments, some or all of them may be on different computer systems and may be accessed remotely, e.g., via a network.
  • the computer system 100 may use virtual addressing mechanisms that allow the programs of the computer system 100 to behave as if they only have access to a large, single storage entity instead of access to multiple, smaller storage entities.
  • the processor 102 and various devices, such as the I/O adapters 106 A- 106 C, may use virtual addresses that are translated into physical addresses in the memory 104 .
  • while the hypervisor 120, operating systems 122, application programs 126, device drivers 124, DMA buffers 128, PTT tables 127, TCE tables 129, ET table 133, TEP table 135, and virtual I/O server 131 are illustrated as being included within the memory 104, one or more of them are not necessarily all completely contained in the same storage device at the same time.
  • modules, units, and databases of the hypervisor 120 , operating systems 122 , application programs 126 , device drivers 124 , DMA buffers 128 , PTT tables 127 , TCE tables 129 , ET table 133 , TEP table 135 , and virtual I/O server 131 are illustrated as being separate entities in FIG. 1 , in other embodiments some of them, portions of some of them, or all of them may be packaged together.
  • the modules, units, and databases of the hypervisor 120 , operating systems 122 , application programs 126 , device drivers 124 , and virtual I/O server 131 may include instructions or statements that execute on the processor 102 or instructions or statements that are interpreted by instructions or statements that execute on the processor 102 to carry out the functions as further described below.
  • the modules, units, and databases of the hypervisor 120 , operating systems 122 , application programs 126 , device drivers 124 , and virtual I/O server 131 are implemented in hardware or firmware via semiconductor devices, chips, logical gates, circuits, circuit cards, and/or other physical hardware devices in lieu of, or in addition to, a processor-based system.
  • the modules, units, and databases of the hypervisor 120 , operating systems 122 , application programs 126 , device drivers 124 , and virtual I/O server 131 may include data in addition to instructions or statements.
  • the bus interface unit 116 may handle communications among the processor 102 , the memory 104 , and the I/O bus interface unit 118 .
  • the bus interface unit 116 may include a memory management unit (MMU) 130 .
  • the MMU 130 handles memory requests for the processor 102 .
  • the MMU 130 may translate processor-visible virtual addresses to physical addresses of the memory 104 .
  • one or more of the functions provided by the bus interface unit 116 may be on board an integrated circuit that also includes the processor 102 .
  • the I/O bus interface unit 118 may be coupled with the I/O bus 114 for transferring data to and from the various I/O units.
  • the I/O bus interface unit 118 may communicate with multiple I/O adapters 106 A, 106 B, and 106 C, which are also known as I/O processors (IOPs) or I/O interface units, through the I/O bus 114 .
  • the I/O bus interface unit 118 may include an I/O MMU 132 and a DMA unit 134 .
  • the I/O MMU 132 translates virtual addresses visible to various I/O devices to physical addresses of the memory 104 .
  • the DMA unit 134 may be used to transfer data between the memory 104 and the memory of any of the I/O adapters 106 A- 106 C.
  • the DMA unit 134 may provide two or more DMA channels.
  • the I/O adapters 106 may support communication with a variety of storage and I/O devices 136 A- 136 C.
  • the I/O adapters 106 A- 106 C may support the attachment of one or more disk drives or direct access storage devices.
  • the I/O adapters 106 A- 106 C may support the attachment of solid state memory devices.
  • the I/O adapters 106 A- 106 C may provide an interface to any of various other I/O devices or devices of other types, such as printers or fax machines.
  • the I/O adapters 106 A- 106 C may provide one or more communication paths from the computer system 100 to other digital devices and computer systems; these communication paths may include one or more networks.
  • an I/O adapter 106 may be a device for connecting SCSI, Fibre Channel, or eSATA devices. In various embodiments, an I/O adapter 106 may be a device for connecting to IDE, Ethernet, Firewire, PCIe, or USB buses. In an embodiment, an I/O adapter 106 may be a host Ethernet adapter.
  • the computer system 100 shown in FIG. 1 illustrates a particular bus structure providing a direct communication path among the processors 102 , the memory 104 , the bus interface 116 , and the I/O bus interface unit 118
  • the computer system 100 may include different buses or communication paths, which may be arranged in any of various forms, such as point-to-point links in hierarchical, star or web configurations, multiple hierarchical buses, parallel and redundant paths, or any other appropriate type of configuration.
  • the I/O bus interface unit 118 and the bus interface unit 116 are shown as single respective units, the computer system 100 may, in fact, contain multiple I/O bus interface units 118 or multiple bus interface units 116 .
  • multiple I/O devices 136 are shown as being coupled to the I/O bus 114 via various communications paths running through I/O adapters 106 , in other embodiments, some or all of the I/O devices are connected directly to one or more system I/O buses.
  • FIG. 2 is a block diagram showing a view of the computer hardware 202 , hypervisor 120 , operating systems 122 , applications 126 , and I/O adapters 106 of the computer system 100 of FIG. 1 in a virtualized environment, according to various embodiments.
  • the hardware 202 may include the processor 102 , memory 104 , buses 114 , and various other components of the computer system 100 .
  • the hypervisor 120 is used to implement logical partitions (LPAR) 204 , 206 , and 208 .
  • Applications APP 1 , 126 A and APP 2 , 126 B run under operating system 1 , 122 A in LPAR 2 , 206 .
  • Applications APP 3 , 126 C and APP 4 , 126 D run under operating system 2 , 122 B in LPAR 3 , 208 .
  • the virtual I/O server 131 operates in LPAR 1 , 204 .
  • FIGS. 3 and 4 are diagrams showing an alternative view of the memory 104, TCE tables 129A-129E, and I/O adapters 106A-106E of the computer system 100 of FIG. 1, according to various embodiments. As shown in FIG. 3, in various embodiments one TCE table 129 may be provided per I/O adapter 106. FIGS. 3 and 4 are used to explain the use of the TCE tables 129 and the mapping between I/O adapters 106 and memory 104. While the TCE tables 129A, 129B, 129C, 129D, and 129E are depicted as being outside the memory 104, this is to clarify their use; in practice, the TCE tables 129 are stored in the memory 104.
  • FIGS. 3 and 4 show a type of map of the memory 104 commonly used in the art. Maps of this type may include multiple rows, with each row corresponding to a memory location. Each row has an address, with the bottom row corresponding to the lowest possible address and the top row to the highest possible address.
  • diagonally striped regions P 1 -P 5 represent pages in memory, which may be a range of 4K of addresses.
  • a first range R 1 of addresses has been allocated for use by the LPAR 1
  • a second range R 2 of addresses has been allocated for use by the LPAR 2
  • a third range R 3 of addresses has been allocated for use by the LPAR 3 .
  • Data may be moved (read or written) between the system memory 104 and the I/O adapters 106 using DMA.
  • An I/O adapter uses virtual addresses when it makes a DMA transfer.
  • the virtual addresses are translated into physical addresses using a translation control entry (TCE) table 129 .
  • address translations for the I/O adapters 106 A, 106 B, 106 C, 106 D, or 106 E use the adapter's respective TCE table 129 A, 129 B, 129 C, 129 D or 129 E.
  • Host bridge hardware, e.g., the I/O MMU 132, uses a TCE table 129 to convert I/O bus logical addresses to physical real addresses.
  • FIG. 5 illustrates the format of an entry 500 in a TCE table 129 .
  • each entry in a TCE table 129 is 64 bits.
  • Bits 12-63 contain the translation of an I/O bus page address to a memory page address. Memory and I/O pages may be 4K in size.
  • Bits 2-11 are reserved for firmware control.
  • Bits 1 and 0 are, respectively, a read access bit, which, if set, authorizes an I/O adapter to read system memory, and a write access bit, which, if set, authorizes an I/O adapter to write to system memory.
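As a concrete illustration of this entry layout and of the host-bridge translation described above, here is a minimal C sketch; the macro and function names are assumptions for illustration, and bounds checking is omitted.

      #include <stdint.h>

      #define TCE_PAGE_MASK  (~0xFFFull)  /* bits 12-63: memory page address */
      #define TCE_READ       (1ull << 1)  /* bit 1: adapter may read memory  */
      #define TCE_WRITE      (1ull << 0)  /* bit 0: adapter may write memory */

      /* Convert an I/O bus logical address to a physical real address the
       * way host bridge hardware (e.g., the I/O MMU 132) might, assuming 4K
       * pages and one TCE per I/O page. Returns 0 on success, -1 if the
       * required access bit is not set. */
      int tce_translate(const uint64_t *tce_table, uint64_t io_addr,
                        int is_write, uint64_t *phys_addr)
      {
          uint64_t tce = tce_table[io_addr >> 12];
          if (!(tce & (is_write ? TCE_WRITE : TCE_READ)))
              return -1;                        /* access denied */
          *phys_addr = (tce & TCE_PAGE_MASK) | (io_addr & 0xFFF);
          return 0;
      }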
  • FIG. 4 shows an alternative mapping of the I/O adapters 106 A- 106 E to the memory 104 of FIG. 3 , according to various embodiments. As shown in FIG. 4 , both of the I/O adapters 106 A and 106 B are mapped to the same memory page P 1 . In addition, both of the I/O adapters 106 D and 106 E are mapped to the same memory page P 4 .
  • mapping two or more I/O adapters to a single page is advantageous for performance reasons. For example, statically mapped pages avoid the overhead associated with remapping pages on an as-needed basis.
  • mapping two or more I/O adapters to a single page causes problems. DMA write operations from two I/O adapters may conflict with one another, causing corruption of the data stored in the common page. These types of problems may be highly dependent on timing, and as such, may be difficult to correct.
  • the operating system 122 creates and maintains the entries in the TCE tables 129 .
  • in virtualized systems, entries in the TCE tables 129 are managed differently.
  • TCE tables 129 are only accessible by the hypervisor 120 .
  • when an operating system 122 needs to create or modify an entry in a TCE table 129, it must issue a hypervisor call, which may include as an argument a “logical real” address of the operating system.
  • the hypervisor 120 translates the “logical real” address of the client operating system 122 into a “physical real” address that the I/O hardware can understand.
  • the hypervisor 120 also enforces an isolation mechanism that prevents an operating system 122 from setting up a mapping to a region of memory 104 not allocated to it. However, the hypervisor 120 permits an individual I/O adapter 106 to be mapped to two or more partitions. In addition, in various embodiments, the hypervisor 120 does not prevent two different I/O adapters 106 that both belong to the same operating system 122 from mapping to the same page of memory. This last feature is illustrated in FIG. 4.
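A minimal sketch of that translation-plus-isolation step follows. It assumes, for simplicity, that each LPAR owns one contiguous physical range; real hypervisors typically translate through per-partition tables, and the names here are illustrative.

      #include <stdbool.h>
      #include <stdint.h>

      typedef struct {
          uint64_t phys_base;  /* start of the LPAR's physical memory */
          uint64_t size;       /* bytes allocated to the LPAR         */
      } lpar_t;

      /* Translate a client OS "logical real" address into a "physical real"
       * address, rejecting addresses outside the memory allocated to the LPAR. */
      bool logical_to_phys(const lpar_t *lpar, uint64_t logical, uint64_t *phys)
      {
          if (logical >= lpar->size)
              return false;        /* isolation: not this partition's memory */
          *phys = lpar->phys_base + logical;
          return true;
      }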
  • Correctly-written device drivers may avoid problems caused by conflicting writes by multiple I/O adapters.
  • system firmware such as a flexible service processor (FSP) may be allowed to perform DMA write operations into the same page that is mapped to an I/O adapter.
  • the FSP provides diagnostics, initialization, configuration, and run-time error detection and correction.
  • the FSP connects a computer system to a hardware management console.
  • An FSP may have its own TCE table.
  • An operating system 122 or the firmware itself may request the hypervisor 120 to map a page for DMA transfers by the firmware. Mappings by these components are not controlled by a device driver 124 .
  • the virtual I/O server 131 may establish its own DMA mappings.
  • thus, even a correctly-written device driver 124 would not prevent a page conflict with firmware or virtual I/O server mappings.
  • a computer system 100 includes an exclusive target (ET) table 133 .
  • the ET table 133 is used to store an exclusive target for DMA (ET for DMA or ET DMA) attribute for each LPAR.
  • the ET DMA attribute may be used to control access by multiple requesters, e.g., multiple I/O adapters, to the same page in memory. If an LPAR has its ET DMA attribute enabled, the hypervisor 120 will allow a single logical page to be mapped to a maximum of one requester, e.g., one I/O adapter 106, for DMA write purposes.
  • a user may wish to enable the ET DMA attribute for an LPAR in a development environment for debug purposes.
  • a user may wish to enable the ET DMA attribute on critical LPARs. The user may wish to disable the ET DMA attribute for performance-sensitive or minimal-memory LPARs.
  • FIG. 8 illustrates an ET table 133 according to an embodiment.
  • the ET table 133 may include a field to identify the LPAR and a field for an associated ET DMA attribute.
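A minimal data-structure sketch of the FIG. 8 table, with one entry per LPAR, might look as follows; the field, type, and limit names are assumptions, not the patent's.

      #include <stdbool.h>

      #define MAX_LPARS 64        /* illustrative limit */

      struct et_entry {
          int  lpar_id;           /* field identifying the LPAR     */
          bool et_dma;            /* ET DMA attribute for the LPAR  */
      };

      static struct et_entry et_table[MAX_LPARS];

      bool et_dma_enabled(int lpar_id)
      {
          return et_table[lpar_id].et_dma;
      }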
  • Page exclusivity may be maintained in several ways.
  • page exclusivity may be maintained using a target exclusive page table (TEP) 135 .
  • FIG. 6 depicts one example of a TEP table 135 .
  • the TEP 135 includes a field to identify each logical page in an LPAR and an associated bit to store a page exclusivity attribute or flag for the page.
  • the field to identify logical pages may be eliminated and the identification of logical pages may be determined implicitly from the sequential position of the page exclusivity flag in the table. This embodiment would require one (1) bit of hypervisor storage per 4 Kbytes of virtual machine memory, for an overhead of 0.003%, which may generally be much better than the overhead for the DMA translation table, e.g., 1.95%.
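A minimal sketch of this implicit-index variant, in which a page's flag is located purely by its sequential position (1 bit per 4 KB page is 1/32768 of memory, matching the roughly 0.003% figure); the names and the fixed table size are illustrative.

      #include <stdbool.h>
      #include <stdint.h>

      #define TEP_WORDS 1024               /* illustrative: tracks 64K pages */
      static uint64_t tep_bits[TEP_WORDS]; /* one bit per 4 KB logical page  */

      bool tep_is_exclusive(uint64_t page)
      {
          return (tep_bits[page / 64] >> (page % 64)) & 1;
      }

      void tep_set(uint64_t page)   { tep_bits[page / 64] |=  1ull << (page % 64); }
      void tep_clear(uint64_t page) { tep_bits[page / 64] &= ~(1ull << (page % 64)); }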
  • page exclusivity may be maintained using a bit of the TCE entry.
  • one bit of the translation index (bits 12 - 63 of FIG. 5 ) may be used for the TEP flag.
  • one bit of a page translation table (PTT) 127 entry may be used for the TEP flag.
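For the TCE-resident variant, one of the firmware-reserved bits (bits 2-11 of FIG. 5) could carry the flag; the choice of bit 2 below is an arbitrary illustrative assumption, and a page translation table entry could carry an analogous bit.

      #include <stdint.h>

      #define TCE_TEP_FLAG (1ull << 2)     /* page is exclusively mapped */

      int      tce_is_exclusive(uint64_t tce)   { return (tce & TCE_TEP_FLAG) != 0; }
      uint64_t tce_mark_exclusive(uint64_t tce) { return tce | TCE_TEP_FLAG; }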
  • the ET DMA attribute may be changed in a virtual machine power off/on cycle, i.e., an LPAR power off/on cycle.
  • the TCE table 129 may be reset so that it is in a known state, e.g., no I/O adapter mappings.
  • the ET DMA attribute may be enabled for the LPAR.
  • the hypervisor 120 may perform a search of all mappings in all TCE tables 129 before accepting a request to enable the ET DMA attribute for an LPAR. If the search does not identify any pages as being mapped to multiple I/O adapters, the hypervisor 120 would enable the requested ET DMA attribute for the LPAR.
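A hedged sketch of that pre-enable scan, assuming the TCE constants from the earlier sketch and a contiguous per-LPAR page range (both simplifying assumptions):

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define TCE_WRITE     (1ull << 0)
      #define TCE_PAGE_MASK (~0xFFFull)

      /* Illustrative: one contiguous physical page range per LPAR. */
      typedef struct { uint64_t first_page, n_pages; } lpar_range_t;

      static int64_t lpar_page_index(const lpar_range_t *r, uint64_t page)
      {
          if (page < r->first_page || page >= r->first_page + r->n_pages)
              return -1;                     /* not this LPAR's memory */
          return (int64_t)(page - r->first_page);
      }

      /* Scan every adapter's TCE table before enabling the ET DMA attribute;
       * refuse if any page of the LPAR already has more than one write
       * mapping. map_count is caller-provided scratch, zeroed, one counter
       * per LPAR page. */
      bool can_enable_et_dma(const lpar_range_t *r,
                             uint64_t *const *tce_tables, const size_t *n_entries,
                             size_t n_adapters, uint8_t *map_count)
      {
          for (size_t a = 0; a < n_adapters; a++)
              for (size_t i = 0; i < n_entries[a]; i++) {
                  uint64_t tce = tce_tables[a][i];
                  if (!(tce & TCE_WRITE))
                      continue;          /* read-only mappings cannot conflict */
                  int64_t p = lpar_page_index(r, (tce & TCE_PAGE_MASK) >> 12);
                  if (p >= 0 && ++map_count[p] > 1)
                      return false;      /* page already write-mapped */
              }
          return true;
      }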
  • when an operating system, an I/O adapter via a device driver, or a firmware component (all of which may be referred to collectively as “components”) seeks to establish a DMA write-capable mapping, it makes a hypervisor call to request that the hypervisor put the mapping in the corresponding TCE table 129 for the component.
  • the hypervisor call is necessary because TCE tables 129 are only accessible by the hypervisor 120 in virtualized systems.
  • the operating system 122 , device driver 124 (on behalf of an I/O adapter), or firmware component may seek to establish a DMA write-capable mapping at any time. For example, an operating system 122 , device driver 124 (on behalf of an I/O adapter), or firmware component may request a mapping at initialization time or at the time a particular DMA operation is being set up.
  • FIG. 7 shows a flow chart of a method 700 for processing a mapping request according to various embodiments.
  • a mapping request is received by a hypervisor 120 .
  • the requester may be an operating system, an I/O adapter, firmware, or other component.
  • the request may be received directly from a requester or indirectly, such as when the requester is an I/O adapter and a device driver makes the request on behalf of the I/O adapter.
  • when the hypervisor 120 receives the mapping request, it translates the “logical real” address of the page into a physical one (operation 704). The hypervisor 120 then determines which LPAR the physical address is allocated to and whether the ET DMA attribute for that LPAR is enabled (operation 706). If the ET DMA attribute is not enabled, the hypervisor 120 adds an entry to the TCE table for the requester (operation 712).
  • if the ET DMA attribute is enabled, the hypervisor 120 examines at least the write bit of the proposed entry contained in the request (operation 708). In various embodiments, the hypervisor 120 may additionally examine both the read and write access bits of the proposed entry. If the write bit is disabled, the hypervisor 120 adds an entry to the TCE table (operation 712). However, if the write bit is enabled, the hypervisor 120 determines whether the page has previously been exclusively mapped to a requester, e.g., an I/O adapter 106 (operation 710). The hypervisor 120 may determine whether the page has been previously mapped from the target exclusive page (TEP) table 135.
  • if the page has previously been exclusively mapped, the hypervisor 120 rejects the mapping request (operation 714).
  • the hypervisor 120 may return an error code to the requester, e.g., an operating system, device driver for an I/O adapter, firmware, or other component. If the page is not exclusively associated with a particular requester, the hypervisor 120 atomically sets a flag indicating that the page has been exclusively mapped to the requesting entity, e.g., an I/O adapter 106. The hypervisor 120 may then insert the entry into the TCE table 129 (operation 712).
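The flow of operations 702-714 can be summarized in code. The sketch below is one illustrative reading of FIG. 7 in C, reusing names assumed in the earlier sketches; the function name, error codes, and table-slot argument are inventions for illustration, not the patent's interface. tep_try_claim, the atomic test-and-set of the page exclusivity flag, is sketched after the atomicity discussion below.

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define TCE_PAGE_MASK (~0xFFFull)
      #define TCE_READ      (1ull << 1)
      #define TCE_WRITE     (1ull << 0)

      typedef struct { uint64_t phys_base, size; } lpar_t;
      static lpar_t lpars[64];                  /* illustrative LPAR table */

      /* Assumed helpers from the sketches above and below. */
      bool logical_to_phys(const lpar_t *lpar, uint64_t logical, uint64_t *phys);
      bool et_dma_enabled(int lpar_id);
      bool tep_try_claim(uint64_t page);        /* atomic test-and-set */

      enum { H_SUCCESS = 0, H_BAD_ADDR = -1, H_PAGE_IN_USE = -2 };

      /* One mapping request (FIG. 7); the entry goes into the requesting
       * component's own TCE table, which is how the requester is identified. */
      int handle_map_request(int lpar_id, uint64_t logical_addr,
                             uint64_t proposed_tce, uint64_t *tce_table,
                             size_t slot)
      {
          uint64_t phys;
          if (!logical_to_phys(&lpars[lpar_id], logical_addr, &phys))
              return H_BAD_ADDR;                /* operation 704 */

          if (et_dma_enabled(lpar_id) &&        /* operation 706 */
              (proposed_tce & TCE_WRITE) &&     /* operation 708 */
              !tep_try_claim(phys >> 12))       /* operation 710 */
              return H_PAGE_IN_USE;             /* operation 714: reject */

          tce_table[slot] = (phys & TCE_PAGE_MASK) |      /* operation 712 */
                            (proposed_tce & (TCE_READ | TCE_WRITE));
          return H_SUCCESS;
      }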
  • the hypervisor 120 may set a flag indicating that a page has been exclusively mapped to an I/O adapter 106, or may perform all or part of the method 700 for processing a mapping request, in an atomic operation.
  • the atomicity may be provided by any known suitable technique, e.g., a lock, a mutual exclusion algorithm (mutex), or by an “ldarx” (Load Double word Reserve Indexed) instruction used in conjunction with a subsequent “stdcx” (Store Double Word Conditional Indexed) instruction.
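As one concrete possibility, the atomic test-and-set used above might be sketched with C11 atomics; this is a portable stand-in, and on POWER the same effect could be built from the ldarx/stdcx reservation pair the text mentions.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define TEP_WORDS 1024                   /* illustrative table size */
      static _Atomic uint64_t tep_bits_atomic[TEP_WORDS];

      /* Atomically test-and-set the page's exclusivity bit; returns true
       * if this caller set the bit first (i.e., won the exclusive claim). */
      bool tep_try_claim(uint64_t page)
      {
          uint64_t mask = 1ull << (page % 64);
          uint64_t old = atomic_fetch_or(&tep_bits_atomic[page / 64], mask);
          return (old & mask) == 0;
      }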
  • if a hypervisor 120 supports a page un-map request, that process may be used. However, if the hypervisor 120 does not explicitly support a page un-map request, the method 700 may be modified and used in its modified form to un-map a page exclusively allocated to an I/O adapter 106 or other component. In this alternative, the method 700 may be used to establish a new mapping for the page with the read and write bits disabled. This will effectively remove a previously established exclusive mapping. Operations 702, 704, and 706 are unchanged. Operations 708 and 710 are modified to recognize that if a proposed entry contained in the request specifies that the write bit is disabled but the page has been mapped, the page exclusivity flag for the page should be disabled.
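A minimal sketch of that modified path, assuming the helpers above: a remap with both access bits disabled revokes access and releases the page's exclusivity flag.

      #include <stddef.h>
      #include <stdint.h>

      #define TCE_READ  (1ull << 1)
      #define TCE_WRITE (1ull << 0)
      void tep_clear(uint64_t page);       /* from the TEP table sketch above */

      /* Un-map via re-map: disable both access bits (modified operation 708)
       * and clear the exclusivity flag (modified operation 710). */
      void unmap_via_remap(uint64_t *tce_table, size_t slot)
      {
          uint64_t tce = tce_table[slot];
          tce_table[slot] = tce & ~(TCE_READ | TCE_WRITE);
          tep_clear((tce & ~0xFFFull) >> 12);
      }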
  • the computer system 100 may be a multi-user mainframe computer system, a single-user system, or a server computer or similar device that has little or no direct user interface, but receives requests from other computer systems (clients).
  • the computer system 100 may be implemented as a desktop computer, portable computer, laptop or notebook computer, tablet computer, pocket computer, telephone, smart phone, or any other appropriate type of electronic device.
  • the computer system 100 may include some or all of the hardware and/or computer program elements of the computer system 100 .
  • the various program components implementing various embodiments of the invention may be implemented in a number of manners, including using various computer applications, routines, components, programs, objects, modules, data structures, etc., and are referred to herein as “computer programs,” or simply “programs.”
  • the computer programs include one or more instructions or statements that are resident at various times in various memory and storage devices in the computer system 100 and that, when read and executed by one or more processors in the computer system 100 , or when interpreted by instructions that are executed by one or more processors, cause the computer system 100 to perform the actions necessary to execute steps or elements including the various aspects of embodiments of the invention. Aspects of embodiments of the invention may be embodied as a system, method, or computer program product.
  • aspects of embodiments of the invention may take the form of an entirely hardware embodiment, an entirely program embodiment (including firmware, resident programs, micro-code, etc., which are stored in a storage device), or an embodiment combining program and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Further, embodiments of the invention may take the form of a computer program product embodied in one or more computer-readable medium(s) having computer-readable program code embodied thereon.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • a computer-readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or Flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that can contain, or store, a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied thereon, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that communicates, propagates, or transports a program for use by, or in connection with, an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to, wireless, wire line, optical fiber cable, Radio Frequency, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of embodiments of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages and conventional procedural programming languages.
  • the program code may execute entirely on the user's computer, partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture, including instructions that implement the function/act specified by the flowchart and/or block diagram block or blocks.
  • the computer programs defining the functions of various embodiments of the invention may be delivered to a computer system via a variety of tangible computer-readable storage media that may be operatively or communicatively connected (directly or indirectly) to the processor or processors.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide processes for implementing the functions/acts specified in the flowcharts and/or block diagram block or blocks.
  • each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
  • Embodiments of the invention may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, or internal organizational structure. Aspects of these embodiments may include configuring a computer system to perform, and deploying computing services (e.g., computer-readable code, hardware, and web services) that implement, some or all of the methods described herein. Aspects of these embodiments may also include analyzing the client company, creating recommendations responsive to the analysis, generating computer-readable code to implement portions of the recommendations, integrating the computer-readable code into existing processes, computer systems, and computing infrastructure, metering use of the methods and systems described herein, allocating expenses to users, and billing users for their use of these methods and systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method for controlling access to a memory of a computer system configured with at least one logical partition may include receiving a first request to map a first page of the memory, the request identifying a first requester. A first logical partition associated with the first page may be determined. It may be determined that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single requester, and that the first page is available to be mapped to a requester. The first page may be mapped to the first requester and a flag indicating that the first page is unavailable for an additional mapping may be set. The first request may be from a device driver on behalf of an input/output adapter, as the first requester, to use the first page in a direct memory access transfer.

Description

BACKGROUND
The disclosure relates generally to information technology, and more particularly to mapping memory pages used for direct memory access (DMA) transfers between memory and input/output (I/O) devices in a virtualized computer system.
The term “virtualization” may refer to different aspects of computing. In one aspect, hardware virtualization generally refers to methods that allow multiple applications to run concurrently on a computer. Hardware virtualization may be implemented by a hypervisor, which may also be referred to as a virtual machine manager. In another aspect, virtual memory refers to techniques in which virtual addresses are mapped to actual (physical) addresses in the memory of a computer system. With virtual addressing mechanisms, applications are able to address more memory than is physically available in the main memory of the system. In addition, application programming is simplified as applications are not required to manage memory that is shared with other applications.
SUMMARY
Various embodiments are directed to methods for controlling access to a memory of a computer system configured with at least one logical partition. A method may include receiving a first request to map a first page of the memory. The request may identify a first requester. The method may include determining a first logical partition associated with the first page and determining that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single requester. In addition, the method may include determining that the first page is available to be mapped to a requester. Further, the method may include mapping the first page to the first requester and setting a flag to indicate that the first page is unavailable for an additional mapping.
In various embodiments, the first request is received from a device driver on behalf of an input/output adapter and the first requester is the input/output adapter. In various embodiments, the first page is requested for use in a direct memory access (DMA) transfer. In addition, a method may include processing a DMA access request from the first requester. The DMA access request may specify the first page. Moreover, a method may include rejecting a DMA access request from a second requester. Again, the DMA access request may specify the first page. In some embodiments, the determining that the first page is available to be mapped to a requester may include reading a target exclusive page flag stored in a target exclusive page table. In other embodiments, the determining that the first page is available to be mapped to a first requester may include reading a target exclusive page flag stored in a translation control entry table. Further, the request may identify a first logical address of the first page, and the determining the first logical partition associated with the first page may include translating the first logical address into a first physical address.
Various embodiments are directed to a system that includes a processor, a memory, at least one input/output adapter, and a hypervisor. The hypervisor may be configured to receive a first request to map a first page of the memory for use in a direct memory access (DMA) transfer operation. The request may identify a first component. The hypervisor may be additionally configured to determine a first logical partition associated with the first page, and to determine that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single component. Further, the hypervisor may determine that the first page is available to be mapped. The hypervisor may map the first page to the first component and set a flag to indicate that the first page is unavailable for an additional mapping.
In various embodiments directed to a system, the first request is received from a device driver on behalf of an input/output adapter and the first component is the input/output adapter. In various embodiments, the hypervisor processes a DMA access request from the first component. The DMA access request may specify the first page. In various embodiments, the hypervisor rejects a DMA access request from a component other than the first component. Again, the DMA access request may specify the first page. In some embodiments, the hypervisor may determine that the first page is available to be mapped by reading a target exclusive page flag stored in a target exclusive page table. In other embodiments, the hypervisor may determine that the first page is available to be mapped by reading a target exclusive page flag stored in a translation control entry table.
Yet other embodiments are directed to a computer readable storage medium having instructions stored thereon for controlling access to a memory which, when executed, cause a processor to perform various operations. The operations may include receiving a first request to map a first page of the memory. The request may identify a first requester and specify that the first page is requested for use in a direct memory access (DMA) transfer. In addition, the operations may include determining a first logical partition associated with the first page, and determining that an attribute of the first logical partition limits access to individual pages of the first logical partition to a single requester. Further, the operations may include determining that the first page is available to be mapped to a requester. Additionally, the operations may include mapping the first page to the first requester and setting a flag to indicate that the first page is unavailable for an additional mapping.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a block diagram of an exemplary computer system in which various embodiments of the invention may be implemented.
FIG. 2 is a block diagram showing a view of the computer hardware, hypervisor, operating systems, applications, and I/O adapters of the computer system of FIG. 1 in a virtualized environment, according to various embodiments.
FIG. 3 is a diagram of an alternative view of the memory, TCE tables, and I/O adapters of the computer system of FIG. 1, according to various embodiments.
FIG. 4 is a diagram of the memory, TCE tables, and I/O adapters of the computer system of FIG. 3 showing an alternative mapping of the I/O adapters to the memory, according to various embodiments.
FIG. 5 illustrates a format of an entry in a translation control entry table according to various embodiments.
FIG. 6 illustrates a target exclusive page table for a logical partition according to an embodiment.
FIG. 7 is a flow diagram of a method for processing a mapping request according to various embodiments.
FIG. 8 illustrates an exclusive target table according to various embodiments.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
A hypervisor may not prevent two or more components, such as I/O adapters, firmware, or virtual I/O servers from mapping the same page of memory for use in DMA transfers. Allowing two or more components to map the same page is sometimes desirable, but at other times it is not desirable. For example, a bug (defect) can occur when multiple mappings are allowed and DMA writes from two or more components conflict with one another. These bugs are generally difficult to find. According to various embodiments, methods and systems are provided for controlling access to a page of memory of a logical partition. When a request to map a page of memory for use in a DMA transfer operation is received, a logical partition associated with the page is determined. Once the logical partition is known, it is determined whether an attribute of the logical partition limits access to individual pages of the logical partition to a single requester, such as a single input/output adapter or other component. If the logical partition limits access, it is determined whether the page is available to be mapped to a component, i.e., the page has not already been mapped. If the page is available, the page is mapped to the requester. In addition, a page exclusivity attribute or flag may be set to indicate that the page is unavailable for an additional mapping. This ensures that the requester has exclusive access to the page. Exclusivity is maintained until the page exclusivity flag is reset to indicate that the page is available.
Each logical partition may have an exclusive target for DMA (ET for DMA) attribute, which may be stored in an exclusive target table. If the ET for DMA attribute of the logical partition limits access to individual pages of the partition to a single requester, a target exclusive page table is provided for the logical partition. Page exclusivity attributes or flags may be stored in the target exclusive page table. In alternative embodiments, page exclusivity may be maintained using a bit of an entry of a TCE table or a bit of an entry of a page translation table. If a DMA write request is received for a logical partition having the ET for DMA attribute enabled, and the request specifies a page that has been exclusively mapped (i.e., the page exclusivity attribute is set) to a component, such as an input/output adapter, the request is only processed if the requester is the exclusively-mapped component. The DMA write request is rejected if the requester is a component other than the exclusively-mapped component.
FIG. 1 depicts a high-level block diagram of an exemplary computer system 100 for implementing various embodiments. The mechanisms and apparatus of the various embodiments disclosed herein apply equally to any appropriate computing system. The major components of the computer system 100 may include one or more processors 102, a memory 104, one or more input/output (I/O) adapters 106A-106C, all of which are communicatively coupled, directly or indirectly, for inter-component communication via a host bus 108, a memory bus 110, a bus 112, an I/O bus 114, a bus interface unit (IF) 116, and an I/O bus interface unit 118.
The computer system 100 may contain one or more general-purpose programmable CPUs, herein generically referred to as the processor 102. In an embodiment, the computer system 100 may contain multiple processors 102; however, in another embodiment, the computer system 100 may alternatively include a single CPU. Each processor 102 executes instructions stored in the memory 104 and may include one or more levels of on-board cache. Each processor 102 may include one or more cores 103, e.g., cores 103A-103D.
In an embodiment, the memory 104 may include a random-access semiconductor memory, storage device, or storage medium (either volatile or non-volatile) for storing or encoding data and programs. In another embodiment, the memory 104 represents the entire virtual memory of the computer system 100, and may also include the virtual memory of other computer systems coupled to the computer system 100 or connected via a network. The memory 104 is conceptually a single monolithic entity, but in other embodiments the memory 104 is a more complex arrangement, such as a hierarchy of caches and other memory devices. For example, memory may exist in multiple levels of caches, and these caches may be further divided by function, so that one cache holds instructions while another holds non-instruction data, which is used by the processor 102. Memory may be further distributed and associated with different CPUs or sets of CPUs, as is known in any of various so-called non-uniform memory access (NUMA) computer architectures.
The memory 104 may store all or a portion of a hypervisor 120, one or more operating systems 122, one or more device drivers 124, and one or more application programs 126. A device driver 124 may be a computer program that controls a particular device using low-level commands that the device understands. The device driver 124 may translate higher-level application code to low-level, device-specific commands. In addition, a portion of the memory 104 may be allocated for one or more DMA buffers 128, one or more page translation tables (PTT) 127, one or more translation control entry (TCE) tables 129, an exclusive target (ET) table 133, and a target exclusive page (TEP) table 135. Further, the memory 104 may store a virtual I/O server 131. These programs and data structures are illustrated as being included within the memory 104 in the computer system 100; however, in other embodiments, some or all of them may be on different computer systems and may be accessed remotely, e.g., via a network.
The computer system 100 may use virtual addressing mechanisms that allow the programs of the computer system 100 to behave as if they only have access to a large, single storage entity instead of access to multiple, smaller storage entities. The processor 102 and various devices, such as the I/O adapters 106A-106C, may use virtual addresses that are translated into physical addresses in the memory 104. Thus, while the hypervisor 120, operating systems 122, application programs 126, device drivers 124, DMA buffers 128, PTT tables 127, TCE tables 129, ET table 133, TEP table 135, and virtual I/O server 131 are illustrated as being included within the memory 104, one or more of them are not necessarily all completely contained in the same storage device at the same time. Further, although the modules, units, and databases of the hypervisor 120, operating systems 122, application programs 126, device drivers 124, DMA buffers 128, PTT tables 127, TCE tables 129, ET table 133, TEP table 135, and virtual I/O server 131 are illustrated as being separate entities in FIG. 1, in other embodiments some of them, portions of some of them, or all of them may be packaged together.
In an embodiment, the modules, units, and databases of the hypervisor 120, operating systems 122, application programs 126, device drivers 124, and virtual I/O server 131 may include instructions or statements that execute on the processor 102 or instructions or statements that are interpreted by instructions or statements that execute on the processor 102 to carry out the functions as further described below. In another embodiment, the modules, units, and databases of the hypervisor 120, operating systems 122, application programs 126, device drivers 124, and virtual I/O server 131 are implemented in hardware or firmware via semiconductor devices, chips, logical gates, circuits, circuit cards, and/or other physical hardware devices in lieu of, or in addition to, a processor-based system. In an embodiment, the modules, units, and databases of the hypervisor 120, operating systems 122, application programs 126, device drivers 124, and virtual I/O server 131 may include data in addition to instructions or statements.
The bus interface unit 116 may handle communications among the processor 102, the memory 104, and the I/O bus interface unit 118. The bus interface unit 116 may include a memory management unit (MMU) 130. The MMU 130 handles memory requests for the processor 102. The MMU 130 may translate processor-visible virtual addresses to physical addresses of the memory 104. In addition, one or more of the functions provided by the bus interface unit 116 may be on board an integrated circuit that also includes the processor 102.
The I/O bus interface unit 118 may be coupled with the I/O bus 114 for transferring data to and from the various I/O units. The I/O bus interface unit 118 may communicate with multiple I/O adapters 106A, 106B, and 106C, which are also known as I/O processors (IOPs) or I/O interface units, through the I/O bus 114. The I/O bus interface unit 118 may include an I/O MMU 132 and a DMA unit 134. The I/O MMU 132 translates virtual addresses visible to various I/O devices to physical addresses of the memory 104. The DMA unit 134 may be used to transfer data between the memory 104 and the memory of any of the I/O adapters 106A-106C. The DMA unit 134 may provide two or more DMA channels.
The I/O adapters 106 may support communication with a variety of storage and I/O devices 136A-136C. In addition, the I/O adapters 106A-106C may support the attachment of one or more disk drives or direct access storage devices. The I/O adapters 106A-106C may support the attachment of solid state memory devices. The I/O adapters 106A-106C may provide an interface to any of various other I/O devices or devices of other types, such as printers or fax machines. The I/O adapters 106A-106C may provide one or more communication paths from the computer system 100 to other digital devices and computer systems; these communication paths may include one or more networks. In various embodiments, an I/O adapter 106 may be a device for connecting SCSI, Fibre Channel, or eSATA devices. In various embodiments, an I/O adapter 106 may be a device for connecting to IDE, Ethernet, Firewire, PCIe, or USB buses. In an embodiment, an I/O adapter 106 may be a host Ethernet adapter.
Although the computer system 100 shown in FIG. 1 illustrates a particular bus structure providing a direct communication path among the processors 102, the memory 104, the bus interface 116, and the I/O bus interface unit 118, in alternative embodiments the computer system 100 may include different buses or communication paths, which may be arranged in any of various forms, such as point-to-point links in hierarchical, star or web configurations, multiple hierarchical buses, parallel and redundant paths, or any other appropriate type of configuration. Furthermore, while the I/O bus interface unit 118 and the bus interface unit 116 are shown as single respective units, the computer system 100 may, in fact, contain multiple I/O bus interface units 118 or multiple bus interface units 116. While multiple I/O devices 136 are shown as being coupled to the I/O bus 114 via various communications paths running through I/O adapters 106, in other embodiments, some or all of the I/O devices are connected directly to one or more system I/O buses.
FIG. 2 is a block diagram showing a view of the computer hardware 202, hypervisor 120, operating systems 122, applications 126, and I/O adapters 106 of the computer system 100 of FIG. 1 in a virtualized environment, according to various embodiments. The hardware 202 may include the processor 102, memory 104, buses 114, and various other components of the computer system 100. In FIG. 2, the hypervisor 120 is used to implement logical partitions (LPAR) 204, 206, and 208. Applications APP1, 126A and APP2, 126B run under operating system 1, 122A in LPAR 2, 206. Applications APP3, 126C and APP4, 126D run under operating system 2, 122B in LPAR 3, 208. The virtual I/O server 131 operates in LPAR 1, 204.
FIGS. 3 and 4 are diagrams showing an alternative view of the memory 104, the TCE tables 129A-129E, and the I/O adapters 106A-106E of the computer system 100 of FIG. 1, according to various embodiments. As shown in FIG. 3, in various embodiments one TCE table 129 may be provided per I/O adapter 106. FIGS. 3 and 4 are used to explain the use of the TCE tables 129 and the mapping between I/O adapters 106 and memory 104. While the TCE tables 129A, 129B, 129C, 129D, and 129E are depicted as being outside the memory 104, this is to clarify their use; in practice, the TCE tables 129 are stored in the memory 104. FIGS. 3 and 4 show a type of map of the memory 104 commonly used in the art. Maps of this type may include multiple rows, with each row corresponding to a memory location. Each row has an address, with the bottom row corresponding to the lowest possible address and the top row to the highest possible address. In FIGS. 3 and 4, diagonally striped regions P1-P5 represent pages in memory, each of which may be a range of 4K of addresses. In FIGS. 3 and 4, a first range R1 of addresses has been allocated for use by the LPAR 1, a second range R2 of addresses has been allocated for use by the LPAR 2, and a third range R3 of addresses has been allocated for use by the LPAR 3.
Data may be moved (read or written) between the system memory 104 and the I/O adapters 106 using DMA. An I/O adapter uses virtual addresses when it makes a DMA transfer. The virtual addresses are translated into physical addresses using a translation control entry (TCE) table 129. Specifically, in DMA operations, address translations for the I/O adapters 106A, 106B, 106C, 106D, or 106E use the adapter's respective TCE table 129A, 129B, 129C, 129D, or 129E. Host bridge hardware, e.g., the I/O MMU 132, uses a TCE table 129 to convert I/O bus logical addresses to physical real addresses.
FIG. 5 illustrates the format of an entry 500 in a TCE table 129. In various embodiments, each entry in a TCE table 129 is 64 bits. Bits 12-63 contain the translation of an I/O bus page address to a memory page address. Memory and I/O pages may be 4K in size. Bits 2-11 are reserved for firmware control. Bits 1 and 0 are, respectively, a read access bit, which, if set, authorizes an I/O adapter to read system memory, and a write access bit, which, if set, authorizes an I/O adapter to write to system memory. Some embodiments do not increase the number of bits required for an entry 500 in a TCE table 129, which may be an advantage.
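The entry format lends itself to simple masks and shifts. Below is a sketch of the layout just described, using least-significant-bit-0 numbering; the macro and helper names are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the 64-bit TCE entry of FIG. 5: bit 0 = write access,
 * bit 1 = read access, bits 2-11 reserved for firmware, bits 12-63 =
 * 4K-aligned memory page address. Names are illustrative. */
#define TCE_WRITE     (1ULL << 0)        /* adapter may DMA-write    */
#define TCE_READ      (1ULL << 1)        /* adapter may DMA-read     */
#define TCE_FW_MASK   (0x3FFULL << 2)    /* bits 2-11, firmware use  */
#define TCE_PAGE_MASK (~0xFFFULL)        /* bits 12-63, page address */

/* Build an entry translating an I/O bus page to a real memory page. */
static inline uint64_t tce_make(uint64_t real_addr, bool rd, bool wr)
{
    return (real_addr & TCE_PAGE_MASK)
         | (rd ? TCE_READ : 0)
         | (wr ? TCE_WRITE : 0);
}

/* Recover the 4K-aligned real page address from an entry. */
static inline uint64_t tce_page(uint64_t entry)
{
    return entry & TCE_PAGE_MASK;
}
```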
FIG. 4 shows an alternative mapping of the I/O adapters 106A-106E to the memory 104 of FIG. 3, according to various embodiments. As shown in FIG. 4, both of the I/ O adapters 106A and 106B are mapped to the same memory page P1. In addition, both of the I/ O adapters 106D and 106E are mapped to the same memory page P4.
There are situations where mapping two or more I/O adapters to a single page is advantageous for performance reasons. For example, statically mapped pages avoid the overhead associated with remapping pages on an as-needed basis. However, in other situations, mapping two or more I/O adapters to a single page causes problems. DMA write operations from two I/O adapters may conflict with one another, causing corruption of the data stored in the common page. These types of problems may be highly dependent on timing, and as such, may be difficult to correct.
Generally, in a non-virtualized system, i.e., one without LPARs, the operating system 122 creates and maintains the entries in the TCE tables 129. However, in a virtualized system, i.e., one with LPARs, entries in the TCE tables 129 are managed differently. In virtualized systems, TCE tables 129 are only accessible by the hypervisor 120. When an operating system 122 needs to create or modify an entry in a TCE table 129, it must issue a hypervisor call, which may include as an argument a “logical real” address of the operating system. (From the perspective of the operating system the argument is a real address; however, the argument is in fact a logical address.) The hypervisor 120 translates the “logical real” address of the client operating system 122 into a “physical real” address that the I/O hardware can understand. The hypervisor 120 also enforces an isolation mechanism that prevents an operating system 122 from setting up a mapping to a region of memory 104 not allocated to it. However, the hypervisor 120 permits an individual I/O adapter 106 to be mapped to two or more partitions. In addition, in various embodiments, the hypervisor 120 does not prevent two different I/O adapters 106 that both belong to the same operating system 122 from mapping to the same page of memory. This last feature is illustrated in FIG. 4.
Correctly-written device drivers may avoid problems caused by conflicting writes by multiple I/O adapters. However, system firmware, such as a flexible service processor (FSP), may be allowed to perform DMA write operations into the same page that is mapped to an I/O adapter. (An FSP provides diagnostics, initialization, configuration, and run-time error detection and correction, and connects a computer system to a hardware management console.) An FSP may have its own TCE table. An operating system 122 or the firmware itself may request the hypervisor 120 to map a page for DMA transfers by the firmware. Mappings by these components are not controlled by a device driver 124. In addition, the virtual I/O server 131 may establish its own DMA mappings. Accordingly, a correctly-written device driver 124 would not prevent a page conflict with firmware or virtual I/O server mappings. Thus, there are multiple entities, each with its own code base, that may establish a DMA mapping to the same page, and a bug in any entity may corrupt important data.
According to various embodiments, a computer system 100 includes an exclusive target (ET) table 133. The ET table 133 is used to store an exclusive target for DMA (ET for DMA or ET DMA) attribute for each LPAR. The ET DMA attribute may be used to control access by multiple requesters, e.g., multiple I/O adapters, to the same page in memory. If an LPAR has its ET DMA attribute enabled, the hypervisor 120 will allow a single logical page to be mapped to a maximum of one requester, e.g., one I/O adapter 106, for DMA write purposes. A user may wish to enable the ET DMA attribute for an LPAR in a development environment for debug purposes. In addition, a user may wish to enable the ET DMA attribute on critical LPARs. The user may wish to disable the ET DMA attribute for performance-sensitive or minimal-memory LPARs.
FIG. 8 illustrates an ET table 133 according to an embodiment. As shown in FIG. 8, the ET table 133 may include a field to identify the LPAR and a field for an associated ET DMA attribute.
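A minimal in-memory representation of FIG. 8 might pair each LPAR identifier with its attribute, as in the sketch below; the structure, sizes, and lookup are assumptions, not the patent's layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the ET table 133: one row per LPAR. */
struct et_entry {
    uint32_t lpar_id;   /* identifies the logical partition             */
    bool     et_dma;    /* true: one DMA-write mapping per page allowed */
};

#define MAX_LPARS 64                     /* illustrative bound */
static struct et_entry et_table[MAX_LPARS];

static bool et_dma_enabled(uint32_t lpar_id)
{
    for (int i = 0; i < MAX_LPARS; i++)
        if (et_table[i].lpar_id == lpar_id)
            return et_table[i].et_dma;
    return false;   /* unknown partition: treat as unrestricted */
}
```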
Page exclusivity may be maintained in several ways. In various embodiments, page exclusivity may be maintained using a target exclusive page (TEP) table 135. FIG. 6 depicts one example of a TEP table 135. In one embodiment, the TEP table 135 includes a field to identify each logical page in an LPAR and an associated bit to store a page exclusivity attribute or flag for the page. In another embodiment, the field to identify logical pages may be eliminated and the identification of logical pages may be determined implicitly from the sequential position of the page exclusivity flag in the table. This embodiment would require one (1) bit of hypervisor storage per 4 Kbytes of virtual machine memory, for an overhead of 0.003%, which may generally be much better than the overhead for the DMA translation table, e.g., 1.95%.
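The implicit-index variant is just a bitmap: the logical page number selects the bit, so no page-identifier field is stored, and 1 bit per 4 Kbyte page works out to 1/32,768, about 0.003% overhead. A sketch, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the implicit-index TEP table: one exclusivity bit per 4K
 * logical page, indexed by page number. */
struct tep_table {
    uint64_t *bits;     /* npages/64 words, zero-initialized */
    uint64_t  npages;   /* number of 4K pages in the LPAR    */
};

static bool tep_test(const struct tep_table *t, uint64_t page)
{
    return (t->bits[page / 64] >> (page % 64)) & 1;
}

static void tep_set(struct tep_table *t, uint64_t page)
{
    t->bits[page / 64] |= 1ULL << (page % 64);
}

static void tep_clear(struct tep_table *t, uint64_t page)
{
    t->bits[page / 64] &= ~(1ULL << (page % 64));
}
```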
In an alternative embodiment, page exclusivity may be maintained using a bit of the TCE entry. For example, one bit of the translation index (bits 12-63 of FIG. 5) may be used for the TEP flag. In another alternative, one bit of a page translation table (PTT) 127 entry may be used for the TEP flag.
According to various embodiments, the ET DMA attribute may be changed in a virtual machine power off/on cycle, i.e., an LPAR power off/on cycle. In a virtual machine power off/on cycle, the TCE table 129 may be reset so that it is in a known state, e.g., no I/O adapter mappings. Once the TCE table 129 is in a known state, the ET DMA attribute may be enabled for the LPAR. Alternatively, in various embodiments, the hypervisor 120 may perform a search of all mappings in all TCE tables 129 before accepting a request to enable the ET DMA attribute for an LPAR. If the search does not identify any pages as being mapped to multiple I/O adapters, the hypervisor 120 would enable the requested ET DMA attribute for the LPAR.
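The alternative enable path amounts to a nested scan over all TCE tables, refusing the request if any page in the partition already has two or more write-capable mappings. The sketch below reuses the TCE macros and helpers sketched earlier; the table structure and the page_in_lpar helper are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: a per-adapter TCE table as a flat array of 64-bit entries. */
struct tce_table {
    uint64_t *entries;
    uint64_t  nentries;
};

/* Hypothetical: true if the page lies in the LPAR's allocated range. */
extern bool page_in_lpar(uint64_t real_page, uint32_t lpar_id);

static bool safe_to_enable_et_dma(struct tce_table *t, int ntables,
                                  uint32_t lpar_id)
{
    for (int i = 0; i < ntables; i++)
        for (uint64_t j = 0; j < t[i].nentries; j++) {
            uint64_t e = t[i].entries[j];
            if (!(e & TCE_WRITE) || !page_in_lpar(tce_page(e), lpar_id))
                continue;
            /* any other write-capable entry translating the same page? */
            for (int k = 0; k < ntables; k++)
                for (uint64_t m = 0; m < t[k].nentries; m++) {
                    if (k == i && m == j)
                        continue;
                    uint64_t f = t[k].entries[m];
                    if ((f & TCE_WRITE) && tce_page(f) == tce_page(e))
                        return false;   /* page is multiply mapped */
                }
        }
    return true;
}
```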
When an operating system, an I/O adapter (via a device driver), or a firmware component (all of which may be referred to collectively as “components”) seeks to establish a DMA write-capable mapping, it makes a hypervisor call to request that the hypervisor put the mapping in the corresponding TCE table 129 for the component. The hypervisor call is necessary because TCE tables 129 are only accessible by the hypervisor 120 in virtualized systems. The operating system 122, device driver 124 (on behalf of an I/O adapter), or firmware component may seek to establish a DMA write-capable mapping at any time. For example, an operating system 122, device driver 124 (on behalf of an I/O adapter), or firmware component may request a mapping at initialization time or at the time a particular DMA operation is being set up.
FIG. 7 shows a flow chart of a method 700 for processing a mapping request according to various embodiments. In operation 702, a mapping request is received by a hypervisor 120. The requester may be an operating system, an I/O adapter, firmware, or other component. The request may be received directly from a requester or indirectly, such as when the requester is an I/O adapter and a device driver makes the request on behalf of the I/O adapter. (Note that while, technically, a device driver makes requests for page mappings and an I/O adapter does not make requests for page mappings, an I/O adapter is nonetheless included within the meaning of the term “requester” as that term is used in the claims.) When the hypervisor 120 receives the mapping request, it translates the “logical real” address of the page into a physical one (operation 704). The hypervisor 120 then determines which LPAR the physical address is allocated to and whether the ET DMA attribute for that LPAR is enabled (operation 706). If the ET DMA attribute is not enabled, the hypervisor 120 adds an entry to the TCE table for the requester (operation 712). However, if the ET DMA attribute is enabled, the hypervisor 120 examines at least the write bit of the proposed entry contained in the request (operation 708). In various embodiments, the hypervisor 120 may additionally examine both the read and write access bits of the proposed entry contained in the request. If the write bit is disabled, the hypervisor 120 adds an entry to the TCE table (operation 712). However, if the write bit is enabled, the hypervisor 120 determines whether the page has previously been exclusively mapped to a requester, e.g., an I/O adapter 106 (operation 710). The hypervisor 120 may determine whether the page has been previously mapped by consulting the target exclusive page (TEP) table 135. If the page exclusivity flag in the TEP table 135 is set for the page, then the page is exclusively associated with a particular requester, e.g., an I/O adapter 106, and the hypervisor 120 rejects the mapping request (operation 714). In addition, the hypervisor 120 may return an error code to the requester, e.g., an operating system, device driver for an I/O adapter, firmware, or other component. If the page is not exclusively associated with a particular requester, the hypervisor 120 atomically sets a flag indicating that the page has been exclusively mapped to the requesting entity, e.g., an I/O adapter 106. The hypervisor 120 may then insert the entry into the TCE table 129 (operation 712).
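Condensed into code, the flow of FIG. 7 might look like the sketch below, which reuses the hypothetical helpers from the earlier sketches (tce_make, et_dma_enabled, tep_test/tep_set, struct tce_table, and struct tep_table); the translation and partition-lookup helpers are likewise assumptions, not the patent's interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

enum map_status { MAP_OK, MAP_REJECTED_EXCLUSIVE };

/* Hypothetical helpers standing in for hypervisor internals. */
extern uint64_t logical_to_physical(uint64_t logical_real);  /* op 704 */
extern uint32_t lpar_of(uint64_t phys);                      /* op 706 */

static enum map_status handle_map_request(struct tce_table *tce,
                                          struct tep_table *tep,
                                          uint64_t index,
                                          uint64_t logical_real,
                                          bool rd, bool wr)
{
    uint64_t phys = logical_to_physical(logical_real);   /* operation 704 */
    uint64_t page = phys >> 12;                          /* 4K page number */

    /* operations 706 and 708: exclusivity applies to write mappings in
     * partitions with the ET DMA attribute enabled */
    if (et_dma_enabled(lpar_of(phys)) && wr) {
        if (tep_test(tep, page))                         /* operation 710 */
            return MAP_REJECTED_EXCLUSIVE;               /* operation 714 */
        tep_set(tep, page);  /* atomically, in a real implementation */
    }
    tce->entries[index] = tce_make(phys, rd, wr);        /* operation 712 */
    return MAP_OK;
}
```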
In various embodiments, the hypervisor 120 may set a flag indicating that a page has been exclusively mapped to an I/O adapter 106 or may perform all or part of the method 700 for processing a mapping request in an atomic operation. The atomicity may be provided by any known suitable technique, e.g., a lock, a mutual exclusion algorithm (mutex), or an “ldarx” (Load Doubleword and Reserve Indexed) instruction used in conjunction with a subsequent “stdcx.” (Store Doubleword Conditional Indexed) instruction.
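One lock-free way to realize the atomic claim is a single atomic fetch-or on the bitmap word, which compilers lower to larx/stcx.-style sequences on Power. The sketch below applies the GCC/Clang __atomic builtin to the tep_table sketched earlier; exactly one caller observes the bit as previously clear and so wins the claim.

```c
#include <stdbool.h>
#include <stdint.h>

/* Atomically claim exclusivity for a page. Returns true only for the
 * caller that actually flipped the bit from 0 to 1. */
static bool tep_try_claim(struct tep_table *t, uint64_t page)
{
    uint64_t mask = 1ULL << (page % 64);
    uint64_t old  = __atomic_fetch_or(&t->bits[page / 64], mask,
                                      __ATOMIC_ACQ_REL);
    return (old & mask) == 0;   /* already set: another requester owns it */
}
```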
At various times, it may be desirable to un-map a page exclusively allocated to an I/O adapter 106. If a hypervisor 120 supports a page un-map request, that process may be used. However, if the hypervisor 120 does not explicitly support a page un-map request, the method 700 may be modified and used in its modified form to un-map a page exclusively allocated to an I/O adapter 106 or other component. In this alternative, the method 700 may be used to establish a new mapping for the page with the read and write bits disabled. This effectively removes a previously established exclusive mapping. Operations 702, 704, and 706 are unchanged. Operations 708 and 710 are modified to recognize that if a proposed entry contained in the request specifies that the write bit is disabled but the page has been mapped, the page exclusivity flag for the page should be disabled.
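Under those modifications, an un-map request collapses to clearing the exclusivity flag and installing an entry with both access bits disabled, roughly as sketched below with the same hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: un-map by installing a mapping with read and write disabled,
 * releasing the page's exclusivity flag along the way. */
static void handle_unmap_request(struct tce_table *tce,
                                 struct tep_table *tep,
                                 uint64_t index, uint64_t logical_real)
{
    uint64_t phys = logical_to_physical(logical_real);
    tep_clear(tep, phys >> 12);                 /* release exclusivity */
    tce->entries[index] = tce_make(phys, false, false);
}
```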
Referring back to FIG. 1, in various embodiments, the computer system 100 may be a multi-user mainframe computer system, a single-user system, or a server computer or similar device that has little or no direct user interface, but receives requests from other computer systems (clients). In other embodiments, the computer system 100 may be implemented as a desktop computer, portable computer, laptop or notebook computer, tablet computer, pocket computer, telephone, smart phone, or any other appropriate type of electronic device.
The various program components implementing various embodiments of the invention may be implemented in a number of manners, including using various computer applications, routines, components, programs, objects, modules, data structures, etc., and are referred to herein as “computer programs,” or simply “programs.”
The computer programs include one or more instructions or statements that are resident at various times in various memory and storage devices in the computer system 100 and that, when read and executed by one or more processors in the computer system 100, or when interpreted by instructions that are executed by one or more processors, cause the computer system 100 to perform the actions necessary to execute steps or elements including the various aspects of embodiments of the invention. Aspects of embodiments of the invention may be embodied as a system, method, or computer program product. Accordingly, aspects of embodiments of the invention may take the form of an entirely hardware embodiment, an entirely program embodiment (including firmware, resident programs, micro-code, etc., which are stored in a storage device), or an embodiment combining program and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Further, embodiments of the invention may take the form of a computer program product embodied in one or more computer-readable medium(s) having computer-readable program code embodied thereon.
Any combination of one or more computer-readable medium(s) may be utilized. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage media may include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or Flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain, or store, a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied thereon, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that communicates, propagates, or transports a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to, wireless, wire line, optical fiber cable, Radio Frequency, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of embodiments of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages and conventional procedural programming languages. The program code may execute entirely on the user's computer, partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of embodiments of the invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products. Each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams may be implemented by computer program instructions embodied in a computer-readable medium. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified by the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture, including instructions that implement the function/act specified by the flowchart and/or block diagram block or blocks.
The computer programs defining the functions of various embodiments of the invention may be delivered to a computer system via a variety of tangible computer-readable storage media that may be operatively or communicatively connected (directly or indirectly) to the processor or processors. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide processes for implementing the functions/acts specified in the flowcharts and/or block diagram block or blocks.
The flowchart and the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products, according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). In some embodiments, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flow chart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Embodiments of the invention may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, or internal organizational structure. Aspects of these embodiments may include configuring a computer system to perform, and deploying computing services (e.g., computer-readable code, hardware, and web services) that implement, some or all of the methods described herein. Aspects of these embodiments may also include analyzing the client company, creating recommendations responsive to the analysis, generating computer-readable code to implement portions of the recommendations, integrating the computer-readable code into existing processes, computer systems, and computing infrastructure, metering use of the methods and systems described herein, allocating expenses to users, and billing users for their use of these methods and systems. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. But, any particular program nomenclature used herein is used merely for convenience, and thus embodiments of the invention are not limited to use solely in any specific application identified and/or implied by such nomenclature. The exemplary environments illustrated in FIG. 1 are not intended to limit the present invention. Indeed, other alternative hardware and/or program environments may be used without departing from the scope of embodiments of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “includes” and/or “including,” when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. In the previous detailed description of exemplary embodiments of the invention, reference was made to the accompanying drawings (where like numbers represent like elements), which form a part hereof, and in which is shown by way of illustration specific exemplary embodiments in which the invention may be practiced. These embodiments were described in sufficient detail to enable those skilled in the art to practice the invention, but other embodiments may be utilized and logical, mechanical, electrical, and other changes may be made without departing from the scope of the present invention. In the previous description, numerous specific details were set forth to provide a thorough understanding of embodiments of the invention. But, embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure embodiments of the invention.
Different instances of the word “embodiment” as used within this specification do not necessarily refer to the same embodiment, but they may. Any data and data structures illustrated or described herein are examples only, and in other embodiments, different amounts of data, types of data, fields, numbers and types of fields, field names, numbers and types of rows, records, entries, or organizations of data may be used. In addition, any data may be combined with logic, so that a separate data structure may not be necessary. The previous detailed description is, therefore, not to be taken in a limiting sense.
While the foregoing is directed to exemplary embodiments, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (18)

What is claimed is:
1. A system, comprising:
a processor;
a memory;
at least one input/output adapter; and
a hypervisor to:
receive a first request to map a first page of the memory for use in a direct memory access (DMA) transfer operation, the first request identifying a first component, wherein a mapping of a page of memory to a component associates the page with the component,
determine that an address of the first page is within a range of addresses allocated for a first logical partition of two or more logical partitions,
determine whether an attribute of the first logical partition limits access to individual pages of the first logical partition to a single component,
determine that the first page is available to be mapped when the attribute limits access to individual pages of the first logical partition to a single component, and
exclusively map the first page to the first component and set a target exclusive page (TEP) flag to indicate that the first page is exclusively associated with the first component and is unavailable for mapping to a second component when the first page is available to be mapped and the attribute limits access to individual pages of the first logical partition to a single component.
2. The system of claim 1, wherein the first request is received from a device driver on behalf of an input/output adapter and the first component is the input/output adapter.
3. The system of claim 1, wherein the first request is received from an operating system.
4. The system of claim 1, further comprising a DMA unit to process a DMA access request from the first component, wherein the DMA access request specifies the first page.
5. The system of claim 1, further comprising the hypervisor to: reject a second request to map the first page of the memory for use in a DMA transfer operation, the second request identifying the second component.
6. The system of claim 5, wherein the second component is one of an input/output adapter, a flexible service processor, or a firmware component.
7. The system of claim 1, wherein the hypervisor determines that the first page is available to be mapped by reading the target exclusive page flag stored in a target exclusive page table.
8. The system of claim 1, wherein the hypervisor determines that the first page is available to be mapped by reading the target exclusive page flag stored in a translation control entry table.
9. A system, comprising:
a processor;
a memory;
at least one input/output adapter; and
a hypervisor to:
receive a first request to map a first page of the memory for use in a direct memory access (DMA) transfer operation, the first request identifying a first component, wherein a mapping of a page of memory to a component associates the page with the component,
determine that an address of the first page is within a range of addresses allocated for a first logical partition of two or more logical partitions,
determine whether an attribute of the first logical partition limits access to individual pages of the first logical partition to a single component,
determine that the first page is unavailable to be mapped when the attribute limits access to individual pages of the first logical partition to a single component and the first page is exclusively associated with a second component, and
reject the first request to map the first page of memory.
10. The system of claim 9, wherein the first request is received from a device driver on behalf of an input/output adapter and the first component is the input/output adapter.
11. The system of claim 9, wherein the first request is received from an operating system, a firmware component, or a flexible service processor.
12. The system of claim 9, wherein the hypervisor determines that the first page is unavailable to be mapped by reading a target exclusive page flag stored in a target exclusive page table.
13. The system of claim 9, wherein the hypervisor determines that the first page is unavailable to be mapped by reading a target exclusive page flag stored in a translation control entry table.
14. The system of claim 9, wherein the first request to map the first page of the memory for use in a DMA transfer operation specifies that the DMA operation is a write operation.
15. The system of claim 1, wherein the first request is received from a firmware component.
16. The system of claim 1, wherein the first request is received from a flexible service processor.
17. The system of claim 1, wherein the first request to map the first page of the memory for use in a DMA transfer operation specifies that the DMA operation is a write operation.
18. A system, comprising:
a processor;
a memory;
at least one input/output adapter; and
a hypervisor to:
receive a first request to map a first page of the memory for use in a direct memory access (DMA) transfer operation, the first request identifying a first component, wherein a mapping of a page of memory to a component associates the page with the component,
determine that an address of the first page is within a range of addresses allocated for a first logical partition of two or more logical partitions,
determine whether an attribute of the first logical partition limits access to individual pages of the first logical partition to a single component,
determine that the first page is available to be mapped when the attribute limits access to individual pages of the first logical partition to a single component, and
exclusively map the first page to the first component and set a target exclusive page (TEP) flag to indicate that the first page is exclusively associated with the first component and is unavailable for mapping to a second component when the first page is available to be mapped and the attribute limits access to individual pages of the first logical partition to a single component; and
receive a second request to map a second page of the memory for use in a direct memory access (DMA) transfer operation, the second request identifying a third component, wherein a mapping of a page of memory to a component associates the page with the component,
determine that an address of the second page is within a range of addresses allocated for a second logical partition of the two or more logical partitions,
determine whether an attribute of the second logical partition limits access to individual pages of the second logical partition to a single component,
determine that the second page is unavailable to be mapped when the attribute limits access to individual pages of the second logical partition to a single component and the second page is exclusively associated with a fourth component, and
reject the second request to map the second page of memory.
US14/281,979 2014-01-17 2014-05-20 Controlling direct memory access page mappings Expired - Fee Related US9367478B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/281,979 US9367478B2 (en) 2014-01-17 2014-05-20 Controlling direct memory access page mappings

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/157,593 US9639478B2 (en) 2014-01-17 2014-01-17 Controlling direct memory access page mappings
US14/281,979 US9367478B2 (en) 2014-01-17 2014-05-20 Controlling direct memory access page mappings

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/157,593 Continuation US9639478B2 (en) 2014-01-17 2014-01-17 Controlling direct memory access page mappings

Publications (2)

Publication Number Publication Date
US20150205729A1 US20150205729A1 (en) 2015-07-23
US9367478B2 true US9367478B2 (en) 2016-06-14

Family

ID=53544934

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/157,593 Active 2034-10-30 US9639478B2 (en) 2014-01-17 2014-01-17 Controlling direct memory access page mappings
US14/281,979 Expired - Fee Related US9367478B2 (en) 2014-01-17 2014-05-20 Controlling direct memory access page mappings

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/157,593 Active 2034-10-30 US9639478B2 (en) 2014-01-17 2014-01-17 Controlling direct memory access page mappings

Country Status (1)

Country Link
US (2) US9639478B2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9639478B2 (en) * 2014-01-17 2017-05-02 International Business Machines Corporation Controlling direct memory access page mappings
CN107430548B (en) * 2015-03-06 2021-02-05 东芝存储器株式会社 Storage device control method and storage device
US10209889B2 (en) * 2016-07-14 2019-02-19 International Business Machines Corporation Invalidation of shared memory in a virtual environment
US10176115B2 (en) 2016-07-14 2019-01-08 International Business Machines Corporation Shared memory in a virtual environment
US10241956B2 (en) * 2016-09-12 2019-03-26 International Business Machines Corporation Virtualizing coherent hardware accelerators
US11113440B1 (en) * 2017-03-17 2021-09-07 Synopsys, Inc. Memory migration in hybrid emulation
CN111181736B (en) * 2019-12-31 2022-04-05 奇安信科技集团股份有限公司 Data transmission method, device, system and medium


Patent Citations (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5553307A (en) * 1992-04-17 1996-09-03 Hitachi, Ltd. Method and device for transferring noncontiguous blocks in one transfer start by creating bit-map indicating which block is to be transferred
US6314501B1 (en) * 1998-07-23 2001-11-06 Unisys Corporation Computer system and method for operating multiple operating systems in different partitions of the computer system and for allowing the different partitions to communicate with one another through shared memory
US6496909B1 (en) * 1999-04-06 2002-12-17 Silicon Graphics, Inc. Method for managing concurrent access to virtual memory data structures
US6892383B1 (en) * 2000-06-08 2005-05-10 International Business Machines Corporation Hypervisor function sets
US6823404B2 (en) * 2000-06-08 2004-11-23 International Business Machines Corporation DMA windowing in an LPAR environment using device arbitration level to allow multiple IOAs per terminal bridge
US6629162B1 (en) * 2000-06-08 2003-09-30 International Business Machines Corporation System, method, and product in a logically partitioned system for prohibiting I/O adapters from accessing memory assigned to other partitions during DMA
US6654906B1 (en) * 2000-06-08 2003-11-25 International Business Machines Corporation Recovery from instruction fetch errors in hypervisor code
US20020010811A1 (en) * 2000-06-08 2002-01-24 International Business Machines Corporation DMA windowing in an LPAR environment using device arbitration level to allow multiple IOAs per terminal bridge
US6973510B2 (en) * 2000-06-08 2005-12-06 International Business Machines Corporation DMA windowing in an LPAR environment using device arbitration level to allow multiple IOAs per terminal bridge
US20050055470A1 (en) * 2000-06-08 2005-03-10 Arndt Richard Louis DMA windowing in an LPAR environment using device arbitration level to allow multiple IOAs per terminal bridge
US7272671B2 (en) * 2000-11-16 2007-09-18 International Business Machines Corporation Means of control bits protection in a logical partition environment having a first and second distinct operating system
US7039692B2 (en) * 2001-03-01 2006-05-02 International Business Machines Corporation Method and apparatus for maintaining profiles for terminals in a configurable data processing system
US20020124194A1 (en) * 2001-03-01 2002-09-05 International Business Machines Corporation Method and apparatus to power off and/or reboot logical partitions in a data processing system
US6820207B2 (en) * 2001-03-01 2004-11-16 International Business Machines Corporation Method for rebooting only a specific logical partition in a data processing system as per a request for reboot
US6938114B2 (en) * 2001-03-01 2005-08-30 International Business Machines Corporation Method and apparatus for managing access to a service processor
US20030212873A1 (en) * 2002-05-09 2003-11-13 International Business Machines Corporation Method and apparatus for managing memory blocks in a logical partitioned data processing system
US6941436B2 (en) * 2002-05-09 2005-09-06 International Business Machines Corporation Method and apparatus for managing memory blocks in a logical partitioned data processing system
US20040064601A1 (en) 2002-09-30 2004-04-01 International Business Machines Corporation Atomic memory migration apparatus and method
US20040073766A1 (en) * 2002-10-10 2004-04-15 International Business Machines Corporation Method, apparatus and system for allocating and accessing memory-mapped facilities within a data processing system
US6829762B2 (en) * 2002-10-10 2004-12-07 International Business Machnies Corporation Method, apparatus and system for allocating and accessing memory-mapped facilities within a data processing system
US20050223127A1 (en) * 2004-03-31 2005-10-06 International Business Machines Corporation Logical memory tags for redirected DMA operations
US20060010276A1 (en) 2004-07-08 2006-01-12 International Business Machines Corporation Isolation of input/output adapter direct memory access addressing domains
US20060026383A1 (en) * 2004-07-31 2006-02-02 Dinechin Christophe De Method for efficient virtualization of physical memory in a virtual-machine monitor
US20060036830A1 (en) * 2004-07-31 2006-02-16 Dinechin Christophe De Method for monitoring access to virtual memory pages
US7330942B2 (en) * 2004-07-31 2008-02-12 Hewlett-Packard Development Company, L.P. Method for efficient virtualization of physical memory in a virtual-machine monitor
US20060212608A1 (en) 2005-02-25 2006-09-21 International Business Machines Corporation System, method, and computer program product for a fully trusted adapter validation of incoming memory mapped I/O operations on a physical adapter that supports virtual adapters or virtual resources
US20060212870A1 (en) * 2005-02-25 2006-09-21 International Business Machines Corporation Association of memory access through protection attributes that are associated to an access control level on a PCI adapter that supports virtualization
US20080168461A1 (en) * 2005-02-25 2008-07-10 Richard Louis Arndt Association of memory access through protection attributes that are associated to an access control level on a pci adapter that supports virtualization
US20060209724A1 (en) * 2005-02-28 2006-09-21 International Business Machines Corporation Method and system for fully trusted adapter validation of addresses referenced in a virtual host transfer request
US20060253682A1 (en) * 2005-05-05 2006-11-09 International Business Machines Corporation Managing computer memory in a computing environment with dynamic logical partitioning
US20060288187A1 (en) 2005-06-16 2006-12-21 International Business Machines Corporation Method and mechanism for efficiently creating large virtual memory pages in a multiple page size environment
US20070136554A1 (en) * 2005-12-12 2007-06-14 Giora Biran Memory operations in a virtualized system
US7617340B2 (en) * 2007-01-09 2009-11-10 International Business Machines Corporation I/O adapter LPAR isolation with assigned memory space
US20090037941A1 (en) * 2007-08-02 2009-02-05 International Business Machines Corporation Multiple partition adjunct instances interfacing multiple logical partitions to a self-virtualizing input/output device
US7752485B2 (en) * 2007-08-17 2010-07-06 International Business Machines Corporation Method and system for virtual removal of physical field replaceable units
US20090144731A1 (en) * 2007-12-03 2009-06-04 Brown Aaron C System and method for distribution of resources for an i/o virtualized (iov) adapter and management of the adapter through an iov management partition
US20090144508A1 (en) 2007-12-03 2009-06-04 Freimuth Douglas M PCI Express Address Translation Services Invalidation Synchronization with TCE Invalidation
US20090276605A1 (en) * 2008-05-05 2009-11-05 International Business Machines Corporation Retaining an Association Between a Virtual Address Based Buffer and a User Space Application that Owns the Buffer
US7908457B2 (en) * 2008-05-05 2011-03-15 International Business Machines Corporation Retaining an association between a virtual address based buffer and a user space application that owns the buffer
US20090307690A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Managing Assignment of Partition Services to Virtual Input/Output Adapters
US20090307438A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Automated Paging Device Management in a Shared Memory Partition Data Processing System
US20090307439A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Dynamic Control of Partition Memory Affinity in a Shared Memory Partition Data Processing System
US20090307445A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Shared Memory Partition Data Processing System With Hypervisor Managed Paging
US20090307440A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Transparent Hypervisor Pinning of Critical Memory Areas in a Shared Memory Partition Data Processing System
US20090307713A1 (en) * 2008-06-06 2009-12-10 International Business Machines Corporation Hypervisor-Based Facility for Communicating Between a Hardware Management Console and a Logical Partition
WO2009158178A2 (en) 2008-06-26 2009-12-30 Microsoft Corporation Direct memory access filter for virtualized operating systems
CN102077188A (en) 2008-06-26 2011-05-25 微软公司 Direct memory access filter for virtualized operating systems
US20090327643A1 (en) * 2008-06-27 2009-12-31 International Business Machines Corporation Information Handling System Including Dynamically Merged Physical Partitions
US20100125708A1 (en) * 2008-11-17 2010-05-20 International Business Machines Corporation Recursive Logical Partition Real Memory Map
US20100125709A1 (en) * 2008-11-17 2010-05-20 International Business Machines Corporation Logical Partition Memory
US20100161850A1 (en) 2008-12-24 2010-06-24 Sony Computer Entertainment Inc. Methods And Apparatus For Providing User Level DMA And Memory Access Management
US20130024648A1 (en) 2010-01-08 2013-01-24 International Business Machines Corporation TLB exclusion range
US20120110275A1 (en) * 2010-10-27 2012-05-03 IBM Corporation Supporting Virtual Input/Output (I/O) Server (VIOS) Active Memory Sharing in a Cluster Environment
US8458413B2 (en) * 2010-10-27 2013-06-04 International Business Machines Corporation Supporting virtual input/output (I/O) server (VIOS) active memory sharing in a cluster environment
US8914587B2 (en) * 2011-02-15 2014-12-16 Phison Electronics Corp. Multi-threaded memory operation using block write interruption after a number or threshold of pages have been written in order to service another request
US20120216188A1 (en) 2011-02-22 2012-08-23 Red Hat Israel, Ltd. Exposing a dma engine to guests in a virtual machine system
US20150143037A1 (en) 2011-04-06 2015-05-21 P4tents1, LLC System, method and computer program product for multi-thread operation involving first memory of a first memory class and second memory of a second memory class
US20150205738A1 (en) * 2014-01-17 2015-07-23 International Business Machines Corporation Controlling direct memory access page mappings

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
Anonymous, "A Method for Direct Memory Access (DMA)," An IP.com Prior Art Database Technical Disclosure, IP.com No. IPCOM000127437D, Aug. 30, 2005. http://ip.com/IPCOM/000127437.
Anonymous, "DMA aware user space memory buffer allocation," An IP.com Prior Art Database Technical Disclosure, IP.com No. IPCOM000225155D, Jan. 28, 2013. http://ip.com/IPCOM/000225155.
Bates et al., "Controlling Direct Memory Access Page Mappings," U.S. Appl. No. 14/157,593, filed Jan. 17, 2014.
"HMC History Lesson and Firmware Overview," IBM Systems Magazine, archived from Jun. 2, 2012. *
IBM, "Logical Partition Security in the IBM eServer pSeries 690", First Edition, Feb. 15, 2002, © International Business Machines Corporation, 2002.
Rixner, Scott, "Network Virtualization: Breaking the Performance Barrier," ACM Queue, Jan./Feb. 2008. *
Unknown, "Device driver," Wikipedia, last modified Dec. 20, 2013 https://web.archive.org/web/20131230094253/http://en.wikipedia.org/wiki/Device-driver.
Unknown, "Intel Itanium Architecture Software Developer's Manual," Revision 2.3, vol. 2: System Architecture, May 2010, p. 2:55, Copyright © 1999-2010, Intel Corporation.
Unknown, "Virtual memory," Wikipedia, last modified Dec. 12, 2014 http://en.wikipedia.org/wiki/Virtual-memory.
Wikipedia article on "Device driver," archived from Dec. 30, 2013. *

Also Published As

Publication number Publication date
US9639478B2 (en) 2017-05-02
US20150205738A1 (en) 2015-07-23
US20150205729A1 (en) 2015-07-23

Similar Documents

Publication Publication Date Title
US9367478B2 (en) Controlling direct memory access page mappings
US10546361B2 (en) Unified memory systems and methods
US9671970B2 (en) Sharing an accelerator context across multiple processes
US9436537B2 (en) Enhanced restart of a core dumping application
US7290112B2 (en) System and method for virtualization of processor resources
JP2008165789A (en) Guest to host address translation for device to access memory in partitioned system
US9875132B2 (en) Input output memory management unit based zero copy virtual machine to virtual machine communication
US10061701B2 (en) Sharing of class data among virtual machine applications running on guests in virtualized environment using memory management facility
US20140310484A1 (en) System and method for globally addressable GPU memory
US10983833B2 (en) Virtualized and synchronous access to hardware accelerators
US10365825B2 (en) Invalidation of shared memory in a virtual environment
US20160188251A1 (en) Techniques for Creating a Notion of Privileged Data Access in a Unified Virtual Memory System
US10430221B2 (en) Post-copy virtual machine migration with assigned devices
US20170003893A1 (en) Memory state indicator
US10884946B2 (en) Memory state indicator check operations
US8984179B1 (en) Determining a direct memory access data transfer mode
US10185679B2 (en) Multi-queue device assignment to virtual machine groups
US20190377671A1 (en) Memory controller with memory resource memory management
US10656834B2 (en) Transparent CAPI exploitation for legacy application
US20170322889A1 (en) Computing resource with memory resource memory management

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BATES, CARY L.;HELGESON, LEE N.;KING, JUSTIN K.;AND OTHERS;SIGNING DATES FROM 20140113 TO 20140115;REEL/FRAME:032928/0360

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Expired due to failure to pay maintenance fee

Effective date: 20200614