CN101751284B - I/O resource scheduling method for distributed virtual machine monitor - Google Patents

Publication number
CN101751284B
Authority
CN
China
Prior art keywords
virtual
resource
interrupt
instruction
dvmm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200910243088
Other languages
Chinese (zh)
Other versions
CN101751284A (en)
Inventor
肖利民
李卓
姜兆龙
陈思名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN 200910243088
Publication of CN101751284A
Application granted
Publication of CN101751284B
Expired - Fee Related

Abstract

The invention relates to an I/O resource scheduling model for a distributed virtual machine monitor. The model is implemented in four steps: 1, a physical resource detection stage; 2, establishing a global virtual I/O bitmap; 3, processing I/O instructions; and 4, interrupt virtualization. The model lets the distributed virtual machine monitor access devices directly under a single system image, provides a complete I/O resource scheduling scheme, improves the utilization of I/O resources, and has good practical value and broad development prospects.

Description

An I/O resource scheduling method for a distributed virtual machine monitor
(1) Technical field
The present invention relates to the field of computer technology, specifically to I/O virtualization within computer virtualization technology, and in particular to an I/O resource scheduling method for a distributed virtual machine monitor. Using the latest hardware virtualization technology, the method integrates the I/O resources distributed across multiple hosts into a global virtual I/O space and schedules those resources for the upper-layer guest operating system, so that the guest operating system can manage and use the distributed I/O resources globally.
(2) Background art
Virtualization is a broad term. In computing it usually means that computation and processing run on a virtual basis rather than a real one. Computer virtualization technology abstracts a machine's physical resources and isolates hardware from software. With virtualization technology, a cluster can be abstracted into a virtual machine with shared-memory characteristics that supports the shared-memory programming model, which overcomes a shortcoming of clusters.
In recent years, large-scale cooperative operation of computers has become an important way to improve overall computing capability. Large computer systems such as clusters contain a large number of I/O resources. Managing and using these resources effectively improves overall system performance, reduces wasted I/O resources, lowers operation and maintenance costs, and improves system availability.
A distributed virtual machine monitor (DVMM) is a cluster system that presents a single system image. The system uses symmetric multiprocessors (SMP) as nodes, is completely transparent to the guest, simplifies virtual machine monitor (VMM) design, and has low virtualization overhead.
Meanwhile, I/O virtualization technology has developed rapidly. Major companies and universities are cooperating and competing on I/O device virtualization; examples include the commercial VMware software, the Xen software, and the Virtual Multiprocessor and vNUMA projects, which remain at the laboratory stage. The two major processor manufacturers, Intel and AMD, have each published technical specifications for hardware-assisted I/O virtualization: Intel's Virtualization Technology for Directed I/O (VT-d) and AMD's I/O Memory Management Unit (IOMMU) technology. I/O virtualization technology makes large-scale I/O resource management, allocation and scheduling easier to implement, and it has been widely adopted for those purposes. As a result, enterprises, universities and research institutes can make better use of large computer networks for production and research without deploying large numbers of physical computers.
VT-d provides hardware support for virtualization solutions and new virtualization support for I/O devices. It can help end users improve system security and reliability, and it improves I/O device performance in virtualized environments.
Intel VT-d provides protection by restricting a device's direct memory access (DMA) to pre-assigned domains or physical memory regions. This is achieved through a hardware capability called DMA remapping. The VT-d DMA-remapping hardware logic in the chipset sits between DMA-capable peripheral I/O devices and the computer's physical memory, and it is programmed by system software. In a virtualized environment the system software is the VMM; in a native environment without virtualization software it is the native operating system. DMA remapping translates the address in an incoming DMA request into the corresponding physical memory address and checks permission to access that physical address according to the information supplied by system software.
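The translate-then-check behaviour of DMA remapping described above can be illustrated with a minimal Python sketch. It is a toy software model, not the VT-d hardware interface: the class `DmaRemapper`, its methods, the device names and the 4 KiB page size are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages


@dataclass(frozen=True)
class Mapping:
    machine_page: int  # machine page frame number
    readable: bool
    writable: bool


class DmaRemapper:
    """Toy model: per-device translation tables programmed by system software."""

    def __init__(self):
        self._tables = {}  # device id -> {DMA page number -> Mapping}

    def map_page(self, device, dma_page, machine_page, readable=True, writable=False):
        # System software (the VMM) grants a device access to one page.
        self._tables.setdefault(device, {})[dma_page] = Mapping(
            machine_page, readable, writable
        )

    def translate(self, device, dma_addr, write):
        # Translate a DMA address to a machine address, then check permission.
        page, offset = divmod(dma_addr, PAGE_SIZE)
        mapping = self._tables.get(device, {}).get(page)
        if mapping is None:
            raise PermissionError(f"{device}: no mapping for DMA page {page:#x}")
        allowed = mapping.writable if write else mapping.readable
        if not allowed:
            raise PermissionError(f"{device}: access to DMA page {page:#x} denied")
        return mapping.machine_page * PAGE_SIZE + offset
```

A device that has been granted a page gets a translated machine address; any other request raises `PermissionError`, which stands in for the fault a real remapping unit would report.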
At present, I/O virtualization technology helps servers allocate resources more rationally in two directions. One direction is "splitting": one physical I/O device is virtualized into several independent logical I/O devices, so that multiple guest operating systems can use them concurrently. The other direction is "merging": several dispersed physical I/O devices are virtualized into one global I/O space, so that to the guest operating system all of the virtual logical I/O devices appear to sit on the same system bus. The commercial software VMware and the open-source project Xen concentrate mainly on splitting, while the Virtual Multiprocessor software from the University of Tokyo and the vNUMA project from the University of New South Wales concentrate mainly on merging.
VMware and Xen use full virtualization and paravirtualization, respectively. Full virtualization emulates physical I/O devices entirely in software in the virtualization layer, but every privileged instruction executed causes a switch between user mode and privileged mode, which makes I/O access inefficient. Paravirtualization modifies the source code of the guest operating system (GOS) and replaces privileged instructions with hypercalls, so that as many instructions as possible run directly on the processor. This improves I/O access efficiency but requires extensive modification of the guest operating system code. Both VMware and Xen virtualize a single physical I/O device into multiple logical I/O devices, which differs from the direction of the present invention, but the way they present virtual I/O device information to the guest operating system can be used for reference.
Virtual Multiprocessor is a distributed VMM for IA-32 clusters. It uses paravirtualization: the guest operating system is modified so that it cooperates with Virtual Multiprocessor. Virtual Multiprocessor runs in user mode, and the I/O devices of the guest operating system are all stored as files in the host operating system. Accessing a physical I/O device requires a system call into the host operating system, which is inefficient. The whole project is still at the experimental stage.
vNUMA is a distributed VMM for IA-64 clusters that runs directly on the hardware. The guest operating system (Linux) cooperates with vNUMA through paravirtualization. The main goal of vNUMA is to provide transparent distributed shared memory for scientific computing. vNUMA can perform I/O operations only on the boot node, which greatly limits its use of the many available I/O resources and leaves the I/O capability of the whole system low. It is also still at the laboratory stage.
In sum, Virtual Multiprocessor, vNUMA and the present invention all pursue the "merging" direction in software, while using the "splitting" approach to schedule the real physical hardware resources. The present invention analyzes the mechanisms, advantages and disadvantages of cross-node I/O access in Virtual Multiprocessor and vNUMA, combines them with the techniques for presenting and accessing virtual I/O devices in the open-source Xen software, and introduces VT-d technology to achieve global scheduling and management of distributed I/O resources.
(3) Summary of the invention
The object of the present invention is to provide an I/O resource scheduling method for a distributed virtual machine monitor. It mainly uses the latest virtualization technology combining hardware and software, together with a distributed shared memory algorithm, to present a single shared physical address space, so that the distributed virtual machine monitor can access devices directly under a single system image and I/O resource utilization improves.
The I/O resource scheduling method for a distributed virtual machine monitor of the present invention is based on a cluster system with a single system image. The system uses symmetric multiprocessors (SMP) as nodes, provides the guest with a complete single system image (SSI), and is completely transparent to the guest. It simplifies VMM design, allows commercial operating systems that support the SMP architecture to run in the virtual machine without modification, and has low virtualization overhead. Such a cluster system is a DVMM.
The I/O resource scheduling model belongs to the I/O virtualization module of the DVMM. The I/O virtualization module uses I/O virtualization technology to integrate the distributed I/O resources and provides a global virtual I/O environment for the guest operating system, which does not need to be modified in any way.
The implementation relies on other DVMM modules and technologies: distributed shared memory (DSM) technology, the DVMM communication module, and the virtualization technology of the Intel x86 platform (VT-x). DSM is the technology implemented by the memory virtualization module of the DVMM. The communication module is a cross-physical-node communication mechanism built on bare metal. VT-x is the instruction set virtualization technology used by the instruction processing module; see the Intel technical manuals for details.
Fig. 1 is an overall structural diagram of the I/O resource scheduling model of the DVMM.
Taking two nodes as an example, either the left or the right node in the figure is selected as the master node of the virtual machine and the other is a slave node. At the bottom of the DVMM are the physical nodes, connected by a high-speed network. The Xen layer consists of the following modules: the DSM and low-level communication modules; and, in the privileged domain (Domain 0), the intra-node I/O request handling, cross-node I/O request handling and device emulation modules, together with the resource scheduling module, which are the modules this system implements. The resource scheduling module comprises four modules: resource detection, global I/O bitmap, instruction processing and interrupt virtualization. The resource detection module integrates the physical hardware resources of each node. The global I/O bitmap establishes the mapping from guest physical addresses to machine addresses. The instruction processing module handles the I/O instructions intercepted by the DVMM. The interrupt virtualization module handles hardware interrupts in the distributed environment. Through the resource detection module the DVMM is aware of the I/O resources to be scheduled, and for I/O instructions issued by a guest CPU or physical interrupts received from below, it uses the I/O bitmap to guarantee the safety of address accesses.
Fig. 2 shows the I/O resource detection module:
This module detects the physical resources of each node. The resources made available to the DVMM include processors, memory, storage media, core devices (used for instruction emulation), external devices, buses, interrupts and so on; external devices, buses and interrupts are collectively called I/O resources. During system startup and initialization the module integrates the I/O resources of all nodes and prepares an extended basic input/output system (eBIOS) for booting the guest operating system. Through the boot information page, the shared information page and the global I/O bitmap, it completes all the preparation the guest operating system needs for a normal boot.
Fig. 3 shows the processing of I/O instructions in detail.
The bottom layer is the hardware layer, which represents the real physical hardware resources. Above it is the DVMM layer, where the distributed virtual machine monitor is deployed. Above that is the visible resource layer, a virtual layer that presents the I/O resources the GOS can see, including I/O ports and I/O memory. The top layer is the GOS layer, which runs the guest operating system and applications.
The global physical-to-machine address mapping table (P2M table) records the mapping from the virtual system's physical addresses to machine addresses. The global I/O bitmap records the correspondence between the machine I/O device port addresses of every node (the host node) and the virtual I/O device port addresses of the virtual machine, using a fixed mapping. The I/O request structure (IOREQ) send/receive module is used by the I/O virtualization module to receive and send requests through the communication module. The port update module records the direct relationship, obtained by searching the modules above, between virtual machine I/O port addresses and node machine I/O port addresses.
I/O device management in the DVMM borrows the device emulation method of the QEMU emulator in the software emulation case. In the direct access case it must register its own virtual bus and attach the real devices to it, completing the takeover of bus transactions (such as PCI transactions) at the root complex. It is also responsible for the most basic I/O management of the VMM, such as initializing the I/O devices of each node, registering and unregistering I/O devices, scheduling I/O devices, and mapping I/O devices to I/O memory. These basic functions are determined mainly by the I/O requirements of the other DVMM modules and by the minimum I/O functionality the DVMM needs to run normally.
Fig. 4 shows the interrupt virtualization process.
The interrupt virtualization module can respond to interrupts generated by both physical and virtual devices. Interrupt virtualization is implemented on platforms whose VT-d supports interrupt remapping. Virtualization here means that the guest operating system does not handle bus transactions directly; instead, an interrupt from the actual physical hardware is processed as a message signaled interrupt and delivered as an interrupt to the GOS virtual CPU. The guest's interrupt handler sees an interrupt source on the virtual bus, while what it actually handles is the real physical interrupt.
An inter-processor interrupt is triggered when a virtual processor writes data to the interrupt command register (ICR) of its virtual local Advanced Programmable Interrupt Controller (Local APIC). This write causes a page fault, which is caught by the DVMM. The DVMM locates the register being accessed by decoding the operand.
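As a rough illustration of how a trapped write could be resolved to a Local APIC register, the Python sketch below decodes a faulting address against the standard xAPIC register offsets (EOI 0xB0, LDR 0xD0, DFR 0xE0, ICR 0x300 and 0x310) at the default base address 0xFEE00000. The function name `decode_apic_write` and its return format are assumptions for illustration; the patent does not specify how the DVMM decodes the operand.

```python
APIC_BASE = 0xFEE00000  # default xAPIC MMIO base address

# Offsets of the Local APIC registers named in the text.
REGISTERS = {
    0x0B0: "EOI",
    0x0D0: "LDR",
    0x0E0: "DFR",
    0x300: "ICR_LOW",
    0x310: "ICR_HIGH",
}


def decode_apic_write(fault_addr, value):
    """Map a trapped write to (register name, decoded fields)."""
    name = REGISTERS.get(fault_addr - APIC_BASE)
    if name is None:
        return ("OTHER", {})
    if name == "ICR_LOW":
        # Bits 0-7 of the low doubleword carry the interrupt vector.
        return (name, {"vector": value & 0xFF})
    if name == "ICR_HIGH":
        # Bits 24-31 of the high doubleword carry the destination APIC ID.
        return (name, {"destination": (value >> 24) & 0xFF})
    return (name, {"value": value})
```

A caller would emulate an IPI when the result names the ICR and propagate the new value to other nodes when it names the LDR or DFR.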
In summary, the present invention is an I/O resource scheduling method for a distributed virtual machine monitor. The method is implemented in the following steps:
Step 1, physical resource detection stage:
1) The DVMM prototype system starts, running from the power-on self-test of each node's basic input/output system (BIOS) until inter-node packets are ready to be sent.
2) Establish the communication infrastructure and the protocol stack, which provide communication services to the VMMs distributed on different physical machines and communication support for DVMM functions such as cross-machine I/O access and cross-machine interrupt handling.
3) Read the resource information table built by the BIOS and reserve some resources, mainly memory, for storing control information. Read the ACPI tables, parse the DMAR table and its DRHD (DMA Remapping Hardware Unit Definition) structures, and set up the device scope structure (DSS) of each node. This completes initialization of the VT-d hardware environment.
4) Each node informs the other nodes of the resources it can offer to the operating system (OS) and collects the resource information of the other nodes.
5) After the global resource information has been collected, integrate the resources, build the corresponding virtual resource information tables, and prepare to build the global I/O bitmap. I/O virtualization is handled either by software-emulated devices or by hardware-assisted remapping structures.
6) The communication module reports that its setup is complete, and DVMM initialization is complete.
7) Load the OS boot module and start booting the guest operating system (GOS).
8) Intercept the BIOS interrupt calls issued by the OS.
9) Pass the information in each BIOS interrupt call to the resource detection module, which emulates the BIOS interrupt service and reads the virtual resource information from the resource information table.
10) Report the virtual hardware configuration to the GOS.
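Items 4), 5), 9) and 10) of Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the report format (`memory_mb`, `devices`), the contiguous guest memory layout and both function names are hypothetical and are not taken from the patent.

```python
def build_virtual_resource_table(node_reports):
    """Merge per-node reports into one virtual resource table.

    node_reports: {node_id: {"memory_mb": int, "devices": [str, ...]}}
    (hypothetical format used only for this illustration).
    """
    table = {"memory": [], "devices": []}
    next_base = 0
    for node_id in sorted(node_reports):
        report = node_reports[node_id]
        # Lay out each node's memory contiguously in guest physical space.
        table["memory"].append(
            {"guest_base_mb": next_base, "size_mb": report["memory_mb"], "node": node_id}
        )
        next_base += report["memory_mb"]
        # Remember which node hosts each device.
        for name in report["devices"]:
            table["devices"].append({"name": name, "node": node_id})
    table["total_memory_mb"] = next_base
    return table


def emulate_bios_memory_query(table):
    """Answer an intercepted BIOS memory query from the merged table."""
    return [(entry["guest_base_mb"], entry["size_mb"]) for entry in table["memory"]]
```

The guest sees one memory map and one device list, although the entries are backed by different physical nodes.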
Step 2, establishing the global virtual I/O bitmap:
A global virtual I/O bitmap is established in the DVMM to provide a single physical address space to the upper-layer GOS.
1) Allocate the I/O bitmap. Two allocation schemes are used for the global I/O resources: static and dynamic. Static allocation means that I/O resources are assigned to the GOS after the DVMM starts and before the GOS boots; they include software-emulated device resources and hardware resources that are accessed directly. Dynamic allocation means that, after the GOS has booted, a user-mode program specifies the I/O resources to be accessed directly. The difference from static allocation is that the kernel loads the support as a module, and a hypercall that triggers a privilege-level transition must be maintained to inject entries into the global virtual I/O bitmap.
2) Two-level arbitration mechanism. When the guest-physical-to-machine address translation is performed for an address in I/O memory, a global P2M table must be built to guarantee the safety of memory addresses. When a node performs the guest-physical-to-machine address translation locally, a single-node P2M table must be built so that the VT-d hardware can perform DMA remapping and interrupt remapping safely.
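The two structures in Step 2 might be modelled as in the Python sketch below. It assumes a 64 Ki x86 port space, and it interprets the two-level arbitration as a global table that names the owning node plus a per-node table that gives the machine page. That interpretation, the class `GlobalIoBitmap` and the function `translate_two_level` are all assumptions, not the patent's data layout.

```python
class GlobalIoBitmap:
    """Fixed mapping from guest-visible I/O ports to (node, machine port)."""

    PORTS = 65536  # size of the x86 I/O port space

    def __init__(self):
        self._owner = [None] * self.PORTS

    def assign(self, vport_base, count, node, mport_base):
        # Used both for static allocation (before the GOS boots) and for
        # dynamic allocation (requested later, for example via a hypercall).
        if vport_base < 0 or vport_base + count > self.PORTS:
            raise ValueError("port range outside the I/O port space")
        ports = range(vport_base, vport_base + count)
        if any(self._owner[p] is not None for p in ports):
            raise ValueError("virtual port already assigned")
        for i, p in enumerate(ports):
            self._owner[p] = (node, mport_base + i)

    def lookup(self, vport):
        # Returns (node, machine port), or None if the port is unassigned.
        return self._owner[vport]


def translate_two_level(global_p2m, node_p2m, guest_page):
    """Level 1 picks the owning node; level 2 gives that node's machine page."""
    node = global_p2m[guest_page]
    return node, node_p2m[node][guest_page]
```

Keeping the per-node table separate reflects the point in item 2) that the local table is the one the VT-d hardware on that node relies on for remapping.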
Step 3, processing I/O instructions:
1) When the guest operating system causes a VM exit on a slave node, the I/O virtualization module handles the I/O instruction according to which of three classes it belongs to:
a. Non-memory-access I/O instructions, namely input (IN) and output (OUT), the two instructions that access I/O ports; they do not access system memory during execution.
b. Memory-access I/O instructions, namely the two string I/O instructions INS and OUTS and instructions that access a memory-mapped I/O (MMIO) region; they access system memory during execution.
c. Direct memory access (DMA) instructions. DMA is controlled by string I/O instructions, and DMA involves a large number of memory accesses during execution.
2) Intercept the I/O instruction and look up the I/O port mapping table to determine whether the I/O address belongs to the local node or a remote node.
3) Execute the instruction. After port mapping, if the instruction is to execute locally, either the address is remapped through the single-node P2M table using VT-d, or the instruction is handed to QEMU for software emulation. If it is to execute remotely, it is packed into an IOREQ structure and sent; the remote node receives the IOREQ through the communication module.
4) Return the result. After an instruction has been executed in software, or when a change in the state of a physical hardware device changes an I/O address, the GOS must be notified. The port update module of the DVMM performs this function.
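The classification and local/remote decision in Step 3 can be sketched in Python as below. The `IoReq` fields, the `kind` strings and the port map format are hypothetical; the patent names the IOREQ structure but does not list its fields.

```python
from dataclasses import dataclass


@dataclass
class IoReq:
    """Hypothetical IOREQ payload; the patent does not define the fields."""
    node: int       # node that owns the physical port
    port: int       # machine I/O port on that node
    is_write: bool
    size: int       # access width in bytes
    data: int = 0


def classify(kind):
    """Sort an intercepted access into the three classes of item 1)."""
    kind = kind.upper()
    if kind in ("IN", "OUT"):
        return "non-memory"
    if kind in ("INS", "OUTS", "MMIO"):
        return "memory"
    if kind == "DMA":
        return "dma"
    raise ValueError(f"unknown I/O access kind: {kind}")


def dispatch(port_map, local_node, vport, is_write, size, data=0):
    """Items 2) and 3): look up the port, then route locally or remotely."""
    entry = port_map.get(vport)
    if entry is None:
        raise LookupError(f"virtual port {vport:#x} is not mapped")
    node, mport = entry
    request = IoReq(node, mport, is_write, size, data)
    # A local request is executed on this node; a remote one is sent
    # to the owning node through the communication module.
    return ("local" if node == local_node else "remote", request)
```

Returning the packed request in both cases keeps the local and remote paths symmetrical, so the same structure can be replayed on whichever node owns the port.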
Step 4, interrupt virtualization:
1) An interrupt from a physical device beneath the DVMM is sent to the physical I/O APIC, which converts it into an interrupt vector and sends it to the local APIC.
2) The local APIC injects the interrupt into the CPU.
3) The CPU jumps to the handler indicated by the interrupt descriptor table (IDT); the interrupt in-service register (ISR) bit of the local APIC is set, masking the interrupt.
4) The VMM's interrupt handler runs and determines whether the interrupt was generated by a physical device assigned directly to the guest. It sets the mask bit of the redirection table entry (RTE) in the physical I/O APIC, writes the end-of-interrupt (EOI) register of the physical local APIC, and clears the ISR mask.
5) The virtual I/O APIC receives the interrupt, calls the virtual local APIC interface function, and masks the interrupt.
6) Determine whether an inter-processor interrupt (IPI) must be sent to inject the interrupt into a remote CPU. This operation causes a page fault, which is caught by the DVMM; after decoding the operand, the DVMM locates the register being accessed. If it is the interrupt command register (ICR), the DVMM emulates the IPI operation. If it is a write to the LDR or DFR register, then after emulating the update locally the DVMM calls the packet send-update function to send the logical destination register (LDR) or destination format register (DFR) update to all other servers.
7) On the remote node, the virtual local APIC injects the interrupt into the GOS through the virtual machine control structure (VMCS), and the GOS runs its interrupt handler.
8) The GOS writes EOI to the virtual local APIC. The DVMM intercepts this write, clears the interrupt mask of the virtual local APIC, and notifies the GOS that handling is complete. The DVMM then clears the interrupt mask bit of the RTE in the physical I/O APIC.
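The mask, inject and unmask ordering of Step 4 can be traced with a small Python state model. It only tracks the RTE mask bit and the queue of interrupts handed to the guest; the class `InterruptVirtualizer` and its methods are hypothetical, and cross-node delivery is omitted.

```python
class InterruptVirtualizer:
    """Toy state model of the mask / inject / EOI sequence in Step 4."""

    def __init__(self):
        self.rte_masked = {}     # physical IRQ -> RTE mask bit
        self.pending_guest = []  # (irq, vector) pairs injected into the guest
        self.log = []            # actions taken, in order

    def on_physical_interrupt(self, irq, vector, assigned_to_guest):
        if not assigned_to_guest:
            # Not a directly assigned device: the VMM handles it itself.
            self.log.append(("host_handled", irq))
            return
        # Item 4): mask the RTE, then EOI the physical local APIC.
        self.rte_masked[irq] = True
        self.log.append(("physical_eoi", irq))
        # Items 5) to 7): hand the interrupt to the virtual APIC and guest.
        self.pending_guest.append((irq, vector))
        self.log.append(("injected", irq))

    def on_guest_eoi(self):
        # Item 8): the guest's EOI is intercepted and the RTE is unmasked.
        irq, vector = self.pending_guest.pop(0)
        self.rte_masked[irq] = False
        self.log.append(("rte_unmasked", irq))
        return vector
```

Keeping the RTE masked until the guest writes EOI prevents the physical device from raising the same interrupt again while the guest is still handling it.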
The abbreviations used above are defined as follows:
BIOS: basic input/output system
eBIOS: extended basic input/output system
OS: operating system
GOS: guest operating system
ACPI: Advanced Configuration and Power Interface
DMAR: DMA remapping reporting
DRHD: DMA remapping hardware unit definition
DSS: device scope structure
VT-d: Intel Virtualization Technology for Directed I/O
DMA: direct memory access
P2M: physical-to-machine address mapping
IOREQ: I/O request structure
I/O APIC: I/O Advanced Programmable Interrupt Controller
Local APIC: local Advanced Programmable Interrupt Controller
VMCS: virtual machine control structure
ISR: interrupt in-service register
IPI: inter-processor interrupt
ICR: inter-processor interrupt command register
EOI: end-of-interrupt register
LDR: logical destination register
DFR: destination format register
IDT: interrupt descriptor table
RTE: redirection table entry
Advantages and effects of the present invention: by combining the reuse of software-emulated devices with hardware-assisted I/O virtualization, the present invention provides a complete I/O resource scheduling scheme for distributed systems. It achieves direct access by the virtual machine to real physical devices in a distributed system and improves the robustness of I/O virtualization in distributed systems. As hardware develops rapidly, direct access to physical hardware under a distributed virtual machine monitor will greatly improve the utilization of device resources. It allows commercial operating systems and application software to run on a distributed cluster and to access I/O resources across nodes efficiently and securely. In summary, the present invention has good practical value and broad prospects for development.
(4) Description of drawings
Fig. 1 is an overall structural diagram of the I/O resource scheduling model of the present invention.
Fig. 2 is a diagram of the I/O resource detection module of the present invention.
Fig. 3 is a diagram of the I/O instruction processing module of the present invention.
Fig. 4 is a diagram of interrupt virtualization in the present invention.
(5) Embodiment
See Fig. 1, Fig. 2, Fig. 3, shown in Figure 4, the implementation step is as follows:
Step 1, detection Physics resource stage:
1) the .DVMM prototype system starts, from each node Basic Input or Output System (BIOS) (BIOS) Power-On Self-Test, till packet preparation transmission between node.
2) set up the communications infrastructure, set up protocol stack, for the VMM that is distributed on the different physical machines provides communication service, for realizing striding physical machine I/O access, striding physical machine and interrupt the function such as processings the communication support is provided of DVMM.
3) read the resource information table that BIOS makes up, the reserved part resource mainly is memory source, is used for storing all kinds of control informations, reads the ACPI table, resolves DMAR and DHRD structure, and each node DSS is set.Finish the initialization of VT-d hardware environment.
4). this node can be informed other nodes for the resource information of operating system (OS), collect the resource information of other nodes;
5). after collecting global resource information, integrate all kinds of resources, make up relevant virtual resource information table, prepare to make up global I/O bitmap, to process I/O virtual by software simulation equipment or by the auxiliary structure that remaps of hardware.
6) communication module setting completed the report, setting completed in the DVMM initialization.
7). load the OS bootstrap module, begin to start client operating system (GOS);
8). intercept and capture the BIOS interrupt call that OS sends;
9). the information of BIOS interrupt call is passed to the resource detection module, and virtual resource information is read in simulation BIOS break in service from the resource information table;
10). the virtual hardware configuration information is reported to GOS.
Step 2, set up overall virtual i/o bitmap:
Set up the virtual I/O bitmap of the overall situation among the DVMM, for upper strata GOS improves single physical address space.
1) distributes the I/O bitmap.For the I/O resource of the overall situation, adopt two kinds of allocative decisions, static allocation and dynamic assignment.After static allocation refers to that DVMM starts, before GOS starts, for GOS distributes the I/O resource, comprise the device resource of software simulation and the hardware resource that hardware is directly accessed.After dynamic assignment refers to that GOS starts, specify the directly I/O resource of access by user's attitude program, be that with the former difference kernel loads with modular manner, need to safeguard a super investigation that causes the injection overall situation virtual i/o bitmap of level of privilege conversion.
2) secondary arbitration mechanism.I/O internal memory address pointed needs to make up the P2M table of the overall situation when finishing client's physics to the machine address conversion, guarantee the security of memory address; The node machine local carries out client's physics to machine address when conversion, needs to make up unit P2M and shows to guarantee that VT-d hardware can carry out safely that DMA remaps and interruption remaps.
Step 3, processing I/O instruction
1) when client operating system after causing that from node virtual machine withdraws from, the I/O virtualization modules is divided into three classes with this I/O instruction and processes respectively: the non-memory access class of a. I/O instruction.I.e. input (IN), instructions of output (OUT) these two kinds access I/O port, they are access system internal memory not in the process of implementation; B. memory access class I/O instruction.Be INS, this two kinds of character string IO instruction of OUTS, and the instruction in access memory mapping IO (MMIO) zone, they are the access system internal memory in the process of implementation; C. direct memory access (DMA) instruction.The DMA steering order is character string IO instruction, and there are a large amount of accessing operations in DMA when carrying out.
2) intercept and capture the I/O instruction, inquiry I/O port mapping table judges that I/O address drops on this locality or remote node.
3) instruction is carried out.After the port mapping, if the decision instruction executing location in this locality, by the VT-d technology, is carried out the address by unit P2M table and is remapped, and perhaps gives the QEMU software simulation and carries out.If long-range, be packaged in the IOREQ structure, send, remote node receives IOREQ by communication module.
4) result returns.By the instruction after the software execution, or the I/O address variation of hardware physical equipment state variation generation, need notice to GOS.DVMM middle port update module is implemented this function.
Step 4, interrupt virtualization:
1) A physical device beneath the DVMM sends an interrupt to the physical I/O APIC, which converts it into an interrupt vector and sends it to the local APIC.
2) The local APIC injects the interrupt into the CPU.
3) The CPU jumps to the handler indicated by the interrupt descriptor table (IDT); the in-service register (ISR) of the local APIC is set, and the interrupt is masked.
4) The VMM's interrupt handler runs and determines whether the interrupt was generated by a physical device directly assigned to the guest. It sets the mask bit of the redirection table entry (RTE) in the physical I/O APIC, writes the end-of-interrupt (EOI) register of the physical local APIC, and clears the ISR mask.
5) The virtual I/O APIC receives the interrupt, calls the virtual local APIC interface function, and masks the interrupt.
6) Determine whether an inter-processor interrupt (IPI) must be sent to inject the interrupt into a remote CPU. This operation causes a page fault, which is caught by the DVMM; after decoding the operand, the DVMM locates the register being accessed. If it is the inter-processor interrupt command register (ICR), the IPI operation is emulated in the DVMM. If it is a write to the LDR or DFR register, then after the local update has been emulated, the packet-sending update function is called to send the logical destination register (LDR) or destination format register (DFR) update to all other servers.
7) On the remote node, the virtual local APIC injects the interrupt into the GOS through the virtual machine control structure (VMCS), and the GOS runs its interrupt handler.
8) The GOS writes EOI to the virtual local APIC; the DVMM intercepts this operation, clears the virtual local APIC interrupt mask, and notifies the GOS that handling is complete. The DVMM then clears the interrupt mask bit of the physical I/O APIC RTE.
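The mask and unmask ordering in the eight steps above can be followed with a small state model. This is a sketch under assumptions, not the DVMM's code: the `IrqLine` fields and the four function names are hypothetical, and real APIC state is far richer than three booleans.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IrqLine:
    """Hypothetical state for one directly assigned device interrupt."""
    rte_masked: bool = False        # mask bit of the physical I/O APIC RTE
    phys_in_service: bool = False   # in-service bit in the physical local APIC
    virt_masked: bool = False       # mask held by the virtual local APIC
    injected_on: List[int] = field(default_factory=list)  # nodes that got the interrupt


def physical_delivery(irq: IrqLine) -> None:
    """Steps 1-3: the interrupt reaches the CPU and the in-service bit is set."""
    irq.phys_in_service = True


def vmm_handler(irq: IrqLine) -> None:
    """Step 4: mask the RTE, then write EOI to the physical local APIC."""
    irq.rte_masked = True
    irq.phys_in_service = False


def virtual_delivery(irq: IrqLine, target_node: int) -> None:
    """Steps 5-7: the virtual APICs mask the interrupt and inject it on a node."""
    irq.virt_masked = True
    irq.injected_on.append(target_node)


def guest_eoi(irq: IrqLine) -> None:
    """Step 8: the guest's EOI write clears the virtual mask, then the RTE mask."""
    irq.virt_masked = False
    irq.rte_masked = False
```

The point of the ordering is that the physical RTE stays masked from step 4 until the guest's EOI in step 8, so the device cannot raise the same interrupt again while the guest, possibly on a remote node, is still handling it.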

Claims (1)

1. An I/O resource scheduling method for a distributed virtual machine monitor, implemented in the following steps:
Step 1, physical resource detection stage:
1) The DVMM prototype system starts, running from the BIOS power-on self-test on each node until the nodes are ready to exchange packets;
2) Set up the communication infrastructure and the protocol stack to provide communication services for the VMMs distributed across different physical machines, and to provide communication support for the DVMM's cross-machine I/O access and cross-machine interrupt handling;
3) Read the resource information table built by the BIOS and reserve part of the memory for storing control information; read the ACPI tables, parse the DMA remapping reporting structure and the DMA remapping hardware unit structures, set the device scope structure (DSS) of each node, and complete initialization of the VT-d hardware environment;
4) Each node announces to the other nodes the resource information it can make available to the operating system, and collects the resource information of the other nodes;
5) After the global resource information has been collected, integrate the resources, build the corresponding virtual resource information tables, and prepare to build the global I/O bitmap, so that I/O virtualization can be handled either by software-emulated devices or by hardware-assisted remapping structures;
6) The communication module reports that its setup is complete, and DVMM initialization is complete;
7) Load the OS boot module and begin starting the guest operating system;
8) Intercept the BIOS interrupt calls issued by the OS;
9) Pass the information of the BIOS interrupt call to the resource detection module, emulate the BIOS interrupt service, and read the virtual resource information from the resource information table;
10) Report the virtual hardware configuration information to the guest operating system (GOS);
Step 2, establishing the global virtual I/O bitmap:
A global virtual I/O bitmap is established in the DVMM to provide a single physical address space to the upper-layer GOS;
1) Allocate the I/O bitmap: two allocation schemes are used for global I/O resources, static allocation and dynamic allocation. Static allocation means that after the DVMM starts and before the GOS starts, I/O resources are allocated to the GOS, including software-emulated device resources and hardware resources that are accessed directly. Dynamic allocation means that after the GOS starts, a user-mode program specifies the I/O resources to be accessed directly; it differs from static allocation in that it is loaded into the kernel as a module, and a hypercall that triggers a privilege-level switch must be maintained to inject the resources into the global virtual I/O bitmap;
2) Two-level arbitration mechanism: when the memory address that an I/O operation points to is translated from guest-physical address to machine address, a global physical-to-machine address mapping table (P2M table) must be built to guarantee the safety of the memory address; when a node performs the guest-physical-to-machine translation locally, a per-node P2M table must be built so that the VT-d hardware can safely perform DMA remapping and interrupt remapping;
Step 3, processing I/O instructions:
1) When the guest operating system triggers a virtual machine exit on a node, the I/O virtualization module sorts the I/O instruction into one of three classes and handles each class separately: a. non-memory-access I/O instructions, i.e. the input and output instructions that access I/O ports, which do not access system memory during execution; b. memory-access I/O instructions, i.e. the two string I/O instructions INS and OUTS, and instructions that access memory-mapped I/O regions, which access system memory during execution; c. direct memory access instructions; DMA control instructions are string I/O instructions, and DMA performs a large number of memory accesses during execution;
2) Intercept the I/O instruction and query the I/O port mapping table to determine whether the I/O address falls on the local node or on a remote node;
3) Instruction execution: after port mapping, if the instruction is determined to execute locally, either the address is remapped through the per-node P2M table using VT-d, or the instruction is handed to QEMU for software emulation; if it executes remotely, it is packed into an IOREQ structure and sent through the communication module, and the remote node receives the IOREQ;
4) Result return: after an instruction has been executed in software, or when a state change in a physical hardware device changes an I/O address, the GOS must be notified; the port update module in the DVMM implements this function;
Step 4, interrupt virtualization:
1) A physical device beneath the DVMM sends an interrupt to the physical I/O APIC, which converts it into an interrupt vector and sends it to the local APIC;
2) The local APIC injects the interrupt into the CPU;
3) The CPU jumps to the handler indicated by the interrupt descriptor table; the in-service register of the local APIC is set, and the interrupt is masked;
4) The VMM's interrupt handler runs and determines whether the interrupt was generated by a physical device directly assigned to the guest; it sets the mask bit of the redirection table entry in the physical I/O APIC, writes the end-of-interrupt register of the physical local APIC, and clears the ISR mask;
5) The virtual I/O APIC receives the interrupt, calls the virtual local APIC interface function, and masks the interrupt;
6) Determine whether an inter-processor interrupt must be sent to inject the interrupt into a remote CPU; this operation causes a page fault, which is caught by the DVMM; after decoding the operand, the DVMM locates the register being accessed; if it is the inter-processor interrupt command register, the inter-processor interrupt (IPI) operation is emulated in the DVMM; if it is a write to the LDR or DFR register, then after the local update has been emulated, the packet-sending update function is called to send the logical destination register or destination format register update to all other servers;
7) On the remote node, the virtual local APIC injects the interrupt into the GOS through the virtual machine control structure, and the GOS runs its interrupt handler;
8) The GOS writes EOI to the virtual local APIC; the DVMM intercepts this operation, clears the virtual local APIC interrupt mask, and notifies the GOS that handling is complete; the DVMM then clears the interrupt mask bit of the physical I/O APIC RTE.
CN 200910243088 2009-12-25 2009-12-25 I/O resource scheduling method for distributed virtual machine monitor Expired - Fee Related CN101751284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910243088 CN101751284B (en) 2009-12-25 2009-12-25 I/O resource scheduling method for distributed virtual machine monitor

Publications (2)

Publication Number Publication Date
CN101751284A CN101751284A (en) 2010-06-23
CN101751284B true CN101751284B (en) 2013-04-24

Family

ID=42478298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910243088 Expired - Fee Related CN101751284B (en) 2009-12-25 2009-12-25 I/O resource scheduling method for distributed virtual machine monitor

Country Status (1)

Country Link
CN (1) CN101751284B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112099941A (en) * 2016-08-26 2020-12-18 华为技术有限公司 Method, equipment and system for realizing hardware acceleration processing

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923507B (en) * 2010-07-30 2012-09-26 华中科技大学 Universal virtual machine monitoring system based on driving
DE112011105745T5 (en) * 2011-10-21 2014-08-14 Hewlett-Packard Development Company, L.P. Providing a function of a basic data exchange system (BIOS) in a privileged domain
JP5881835B2 (en) 2011-10-21 2016-03-09 ヒューレット−パッカード デベロップメント カンパニー エル.ピー.Hewlett‐Packard Development Company, L.P. Web-based interface to access basic input / output system (BIOS) functionality
CN103270491B (en) * 2011-12-06 2016-12-21 华为技术有限公司 A kind of hardware resource protection method and system and virtual machine manager
CN102521054B (en) * 2011-12-15 2013-07-17 中国人民解放军国防科学技术大学 DMA (direct memory access) resource allocation method for virtual machine under sun4v architecture
CN103179048B (en) * 2011-12-21 2016-04-13 中国电信股份有限公司 Main frame qos policy transform method and the system of cloud data center
CN103514222B (en) * 2012-06-29 2017-09-19 无锡江南计算技术研究所 Storage method, management method, memory management unit and the system of virtual machine image
CN102799465B (en) * 2012-06-30 2015-05-27 华为技术有限公司 Virtual interrupt management method and device of distributed virtual system
US9697031B2 (en) 2013-10-31 2017-07-04 Huawei Technologies Co., Ltd. Method for implementing inter-virtual processor interrupt by writing register data in a single write operation to a virtual register
CN103559087B (en) * 2013-10-31 2017-11-28 华为技术有限公司 Implementation method, relevant apparatus and the system of a kind of interruption between virtual processor
CN106062717B (en) * 2014-11-06 2019-05-03 华为技术有限公司 A kind of distributed storage dubbing system and method
WO2016090577A1 (en) * 2014-12-10 2016-06-16 华为技术有限公司 Computer and device accessing method
CN106200448B (en) * 2015-05-09 2019-02-22 精航伟泰测控仪器(北京)有限公司 A kind of long-range mapped system of industry interface implementation
CN106484031A (en) * 2015-08-26 2017-03-08 鸿富锦精密工业(深圳)有限公司 Server management system and method
CN106844258B (en) * 2015-12-03 2019-09-20 华为技术有限公司 Heat addition CPU enables the method and server system of x2APIC
CN106302628B (en) * 2015-12-29 2019-12-27 北京典赞科技有限公司 Unified management scheduling method for computing resources in ARM architecture network cluster
CN106990998B (en) * 2016-01-21 2020-10-27 阿里巴巴集团控股有限公司 Virtual machine monitoring method and device
CN107783913A (en) * 2016-08-31 2018-03-09 华为技术有限公司 A kind of resource access method and computer applied to computer
CN106445635A (en) * 2016-09-23 2017-02-22 生活立方家(武汉)科技有限公司 Computer transmission method
CN107239696B (en) * 2017-04-11 2019-07-19 中国科学院信息工程研究所 A kind of hot restorative procedure of loophole for virtualization hypercalls function
CN107491340B (en) * 2017-07-31 2020-07-14 上海交通大学 Method for realizing huge virtual machine crossing physical machines
CN108063737B (en) * 2017-11-23 2020-09-08 华中科技大学 FCoE storage area network read request processing method and system
CN108073451B (en) * 2017-12-20 2020-09-22 北京东土科技股份有限公司 Interrupt processing method and device between heterogeneous operating systems on multi-core CPU
CN109062671A (en) * 2018-08-15 2018-12-21 无锡江南计算技术研究所 A kind of high-performance interconnection network software virtual method of lightweight

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430674A (en) * 2008-12-23 2009-05-13 北京航空航天大学 Intraconnection communication method of distributed virtual machine monitoring apparatus
CN101477495A (en) * 2008-10-28 2009-07-08 北京航空航天大学 Implementing method for distributed internal memory virtualization technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Xiao et al., "Research on Distributed I/O Resource Virtualization Technology" (分布式I/O资源虚拟化技术的研究), Microelectronics & Computer (《微电子学与计算机》), vol. 25, no. 10, 2008-10-31, pp. 178-181 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
C41 Transfer of patent application or patent right or utility model
ASS Succession or assignment of patent right

Owner name: HUAWEI TECHNOLOGY CO LTD

Free format text: FORMER OWNER: BEIJING AERONAUTICS AND ASTRONAUTICS UNIV.

Effective date: 20110926

TA01 Transfer of patent application right

Effective date of registration: 20110926

Address after: 518129 headquarter office building of Bantian HUAWEI base, Longgang District, Shenzhen, Guangdong, China

Applicant after: Huawei Technologies Co., Ltd.

Address before: 100191 School of computer science and engineering, Beihang University, Xueyuan Road 37, Beijing, Haidian District

Applicant before: Beihang University

COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100191 HAIDIAN, BEIJING TO: 518129 SHENZHEN, GUANGDONG PROVINCE

C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130424

Termination date: 20171225