WO2023093418A1 - Data migration method, apparatus and electronic device - Google Patents

Data migration method, apparatus and electronic device

Info

Publication number
WO2023093418A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
migrated
memory
migration
host
Prior art date
Application number
PCT/CN2022/127151
Other languages
English (en)
French (fr)
Inventor
卢胜文
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22897501.7A priority Critical patent/EP4421631A1/en
Publication of WO2023093418A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F 2009/45583 Memory management, e.g. access or allocation
    • G06F 2009/45595 Network integration; Enabling network access in virtual machine instances
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues

Definitions

  • the present application relates to the field of computer technology, in particular to a data migration method, device and electronic equipment.
  • a virtual machine refers to a virtual device obtained by virtualizing physical computing resources, storage resources, and network resources through virtualization technology.
  • the physical device on which the virtual machine runs is called the host machine.
  • Live migration technology is usually used to migrate a VM from the source host (also called the source-end host) to the destination host (also called the destination-end host), so as to achieve hardware fault tolerance or load balancing.
  • Live migration must include the migration of the VM's memory data, which involves processing dirty page data in the memory; a dirty page refers to a storage region in the memory whose data has been modified.
  • the processor of the source host executes the hot migration process, including identifying dirty pages, relocating data in the dirty pages, and instructing the network card to send the data in the dirty pages to the destination host.
  • the above process completely depends on the processing power and input/output (I/O) bandwidth of the processor of the source host.
  • the present application provides a data migration method, device and electronic equipment, which solve the problem of service performance degradation of the source host during the data migration process of the virtual machine.
  • a data migration method is provided, and the data migration method can be applied to a source host, or a physical device supporting the implementation of the data migration method, for example, the physical device includes a chip system.
  • the source host includes a memory and a network card
  • the data migration method is executed by the network card of the source host.
  • The data migration method includes: first, the network card of the source host obtains a first data migration notification message, and the first data migration notification message is used to indicate the identifier of the virtual machine to be migrated.
  • the network card of the source host determines the data to be migrated according to the identifier, and the data to be migrated is stored in the memory of the source host and associated with the aforementioned virtual machine to be migrated.
  • the network card of the source host migrates the data to be migrated to the destination host.
  • In this way, the data to be migrated is determined by the network card based on the identifier of the virtual machine to be migrated, which avoids the process of the source host determining the data to be migrated for the virtual machine, reduces the computing resources required by the source host during the live migration of the virtual machine, and improves the ability of the source host to perform other services (such as compute-intensive and latency-sensitive services like AI and HPC).
  • In a possible implementation, the network card determines the data to be migrated according to the identifier as follows: the network card determines, according to the identifier, the memory page set associated with the virtual machine to be migrated in the memory of the source host, and uses the data stored in the memory page set as the data to be migrated.
  • the memory page set includes one or more memory pages.
  • The data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, which avoids the performance degradation of the source host caused by the processor of the source host determining the data to be migrated, improves the ability of the source host to process other services, and reduces lag on the source host.
  • the data to be migrated provided in the above embodiment includes dirty page data
  • The dirty page data is the data stored in those memory pages, among the one or more memory pages, whose data has been modified.
  • The network card determines the dirty page data included in the data to be migrated according to the identifier as follows: the network card queries the dirty page tag information stored in the network card and determines the dirty page set associated with the identifier; then, the network card uses the data stored in the dirty page set as the dirty page data.
  • The dirty page set includes one or more dirty pages; a dirty page is a memory page, among the one or more memory pages, whose data has been modified, and the aforementioned dirty page tag information is used to indicate the memory addresses of the dirty pages.
  • The network card marks the dirty pages among the memory pages associated with the virtual machine to be migrated in the memory of the source host; that is, the memory dirty-marking function is offloaded from the processor of the source host to the network card, which avoids the process of the processor marking dirty pages in the memory and reduces the resource consumption of the processor, thereby avoiding the impact on other computing services of the source host caused by the processor managing the live migration process of the virtual machine.
  • the dirty page marking information includes at least one of the first dirty page table and the second dirty page table.
  • The first dirty page table is used to mark a dirty page as being in the marked-dirty state, where the marked-dirty state is the state in which the source host modifies the data stored in the dirty page.
  • The second dirty page table is used to mark a dirty page as being in the data migration state, where the data migration state is the state in which the network card migrates the data stored in the dirty page.
  • In a possible implementation, the network card migrates the data to be migrated to the destination host as follows. First, the network card sends page information of the data to be migrated to the destination host, where the page information is used to indicate the memory address and offset of the data to be migrated in the memory. Second, the network card receives a migration message fed back by the destination host based on the page information, where the migration message is used to indicate the receive queue (RQ) corresponding to the virtual machine to be migrated in the destination host. Third, the network card sends the data to be migrated to the RQ indicated by the migration message.
  • In this way, the network card in the destination host allocates a migration message for the virtual machine to be migrated in the source host according to the page information sent by the network card in the source host, and the network card in the source host migrates the data to be migrated to the RQ indicated by the migration message, which prevents the data to be migrated of the virtual machine to be migrated (such as VM1) from being sent to the RQ corresponding to another virtual machine (such as VM2), and improves the accuracy of virtual machine data migration.
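  • As an illustration of the three messages described above, the following C sketch shows one possible shape for the page information, the migration message, and the tagged data frame; the struct names and fields (vm_id, gpa, offset, rq_id) are assumptions for illustration, not the wire format defined by this application.

```c
#include <stdint.h>

/* Hypothetical message formats for the three-step flow described above.
 * Field names are illustrative assumptions, not the patent's wire format. */

struct page_info {        /* step 1: source network card -> destination */
    uint32_t vm_id;       /* identifier of the virtual machine to be migrated */
    uint64_t gpa;         /* memory address of the data to be migrated        */
    uint32_t offset;      /* offset/length of the data                        */
};

struct migration_msg {    /* step 2: destination network card -> source */
    uint32_t vm_id;       /* echoes the VM identifier                   */
    uint32_t rq_id;       /* RQ assigned to this VM on the destination  */
};

struct data_frame {       /* step 3: payload tagged with the assigned RQ,
                           * so VM1's data cannot land in VM2's RQ       */
    uint32_t rq_id;
    uint32_t len;
    uint8_t  payload[];   /* flexible array member (C99)                */
};
```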
  • In a possible implementation, the send queue (SQ) in the network card maintains SG information including the memory address and offset of the dirty page, and the aforementioned page information includes the SG information corresponding to the dirty page.
  • The network card sends the data to be migrated to the RQ indicated by the migration message as follows: the network card obtains the SG information corresponding to the dirty page from the SQ, and sends the data indicated by the SG information to the RQ indicated by the migration message.
  • In the foregoing process, the data migration function of the source host is offloaded to the network card, which prevents the processor in the source host from performing multiple data copy operations for the virtual machine, reduces the consumption of computing resources of the source host, and improves the efficiency with which the source host processes other computing services.
  • In a possible implementation, the source host establishes a control connection and a data connection with the destination host through the transmission control protocol/internet protocol (TCP/IP), where the control connection is used to transmit the page information and the migration message, and the data connection is used to transmit the data to be migrated.
  • In this way, different transmission channels are used to transmit different information or data, which avoids the situation in which page information transmitted over the data connection cannot be processed by the network card on the receiving side (the destination host) and the virtual machine live migration therefore fails, and improves the stability of virtual machine data migration.
  • the source host establishes a single connection with the destination host through TCP/IP, and the single connection is used to transmit page information, migration messages and data to be migrated.
  • In a possible implementation, the migration message is a message processing identifier (ID) allocated by the destination host to the virtual machine to be migrated. Since a single connection between the source host and the destination host can carry the data of different VMs, the above message processing ID can be used to distinguish the VM to which data on the single connection belongs, so as to avoid transmission errors when multiple VMs share a single connection, such as the data of VM(1) being mistakenly identified as the data of VM(2), thereby improving the accuracy of VM live migration.
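  • A minimal sketch of how such a message processing ID could demultiplex VM data on a single connection is given below; the frame header layout and all names are assumed for illustration, not taken from this application.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame header for a single shared TCP connection: each frame
 * carries the message processing ID the destination host assigned to a VM. */
struct frame_hdr {
    uint32_t msg_id;   /* message processing ID: identifies the owning VM */
    uint32_t len;      /* payload length in bytes                         */
};

/* Append one received frame to the buffer of the VM it belongs to, so the
 * data of VM(1) is never mistaken for the data of VM(2). */
void route_frame(const struct frame_hdr *h, const uint8_t *payload,
                 uint8_t *vm_buf[], size_t vm_off[])
{
    memcpy(vm_buf[h->msg_id] + vm_off[h->msg_id], payload, h->len);
    vm_off[h->msg_id] += h->len;
}
```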
  • In a possible implementation, the source host communicates with the destination host through a remote direct memory access (RDMA) network, and a memory protection table (MPT) is stored in the network card. The MPT table is used to indicate the correspondence between the host physical address (HPA) of the memory and the guest physical address (GPA) of the virtual machine to be migrated, and the MPT table contains the physical function (PF) information of the virtual machine to be migrated.
  • The network card sends the data to be migrated to the RQ indicated by the migration message as follows. First, the network card determines the PF information of the virtual machine to be migrated according to the migration message. Second, the network card queries the MPT table according to the PF information and the GPA of the virtual machine to be migrated, and determines the HPA corresponding to the page information. Finally, the network card sends the data to be migrated stored at the HPA of the virtual machine to be migrated to the destination host.
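  • The lookup described above can be pictured with the following C sketch, assuming a flat MPT of (PF, GPA page, HPA page) entries and 4 KB pages; a real network card would use a faster lookup structure, and the entry layout here is an assumption, not the MPT format of this application.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative MPT entry: maps (PF, GPA page) -> HPA page. */
struct mpt_entry {
    uint32_t pf_id;     /* physical function of the VM to be migrated */
    uint64_t gpa_page;  /* guest physical page number                 */
    uint64_t hpa_page;  /* host physical page number                  */
};

/* Linear scan for clarity; returns the HPA the NIC should read from,
 * or (uint64_t)-1 if no mapping exists. */
uint64_t mpt_translate(const struct mpt_entry *mpt, size_t n,
                       uint32_t pf_id, uint64_t gpa)
{
    uint64_t page = gpa >> 12, off = gpa & 0xFFF;    /* 4 KB pages */
    for (size_t i = 0; i < n; i++)
        if (mpt[i].pf_id == pf_id && mpt[i].gpa_page == page)
            return (mpt[i].hpa_page << 12) | off;
    return (uint64_t)-1;
}
```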
  • the data migration method further includes: the network card acquires a second data migration notification message, where the second data migration notification message is used to indicate the identity of the virtual machine to be received.
  • the network card migrates the to-be-received data sent by other hosts to the memory of the source host, and the to-be-received data is stored in the memory of other hosts and associated with the virtual machine to be received.
  • In this way, the source host can serve as the sending end to send the data to be migrated of the virtual machine to be migrated to the destination host, and the source host can also serve as the receiving end to receive virtual machine data sent by other hosts; thus the source host multiplexes the sending and receiving functions during virtual machine migration, which improves the data migration performance of the source host.
  • The present application provides a data migration device; the data migration device is applied to the network card of the source host, and the data migration device includes modules for performing the data migration method in the first aspect or any possible implementation of the first aspect.
  • the data migration device includes: a communication unit, a data identification unit and a migration unit.
  • the communication unit is configured to acquire a first data migration notification message, where the first data migration notification message is used to indicate the identity of the virtual machine to be migrated.
  • The data identification unit is used to determine the data to be migrated according to the identifier; the data to be migrated is stored in the memory of the source host and associated with the virtual machine to be migrated.
  • the migration unit is used to migrate the aforementioned data to be migrated to the destination host.
  • the data migration device has the function of realizing the behavior in the method example of any implementation manner in the first aspect above.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides an electronic device, which includes: an interface circuit and a control circuit.
  • The interface circuit is used to receive signals from devices other than the electronic device and transmit them to the control circuit, or to send signals from the control circuit to devices other than the electronic device; the control circuit is used to implement the foregoing method through a logic circuit or by executing code instructions.
  • the electronic device may refer to a network card.
  • the electronic device may also refer to a processor contained in a network card.
  • the electronic device may also refer to a dedicated processing device including a network card, and the dedicated processing device may implement the method in any implementation manner in the first aspect.
  • The present application provides a computer-readable storage medium, in which computer programs or instructions are stored; when the computer programs or instructions are executed by a host or a network card, the method in the first aspect or any possible implementation of the first aspect is realized.
  • The present application provides a computer program product; the computer program product includes instructions, and when the computer program product runs on the host or the network card, the host or the network card executes the instructions, so as to realize the method in the first aspect or any possible implementation of the first aspect.
  • The present application provides a chip, including a memory and a processor; the memory is used to store computer instructions, and the processor is used to call and run the computer instructions from the memory, so as to implement the method in the above first aspect or any possible implementation of the first aspect.
  • FIG. 1 is an application scenario diagram of a communication system provided by the present application
  • FIG. 2 is a schematic structural diagram of a host provided by the present application.
  • FIG. 3 is a first schematic flowchart of a data migration method provided by the present application.
  • FIG. 4 is a second schematic flowchart of a data migration method provided by the present application.
  • FIG. 5 is a schematic diagram of data migration provided by the present application.
  • FIG. 6 is a schematic structural diagram of a data migration device provided by the present application.
  • The present application provides a data migration method: first, the network card of the source host obtains a data migration notification message, and the data migration notification message is used to indicate the identifier of the virtual machine to be migrated; second, the network card determines the data to be migrated according to the identifier, where the data to be migrated is stored in the memory of the source host and associated with the virtual machine to be migrated; finally, the network card migrates the data to be migrated to the destination host.
  • In this way, the data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, which avoids the process of the source host determining the data to be migrated of the virtual machine, reduces the computing resources required by the source host during the live migration of the virtual machine, and improves the ability of the source host to perform other services (such as compute-intensive and latency-sensitive services like AI and HPC).
  • FIG. 1 is an application scenario diagram of a communication system provided by the present application. The communication system includes a computer cluster 110 and a client 120; the computer cluster 110 can communicate with the client 120 through a network 130, and the network 130 may be the Internet or another network such as Ethernet.
  • the network 130 may include one or more network devices, for example, the network devices may be routers or switches.
  • the client 120 may be a computer running an application program, and the computer running the application program may be a physical machine or a virtual machine.
  • the computer running the application program is a physical computing device
  • the physical computing device may be a host or a terminal (Terminal).
  • The terminal may also be called terminal equipment, user equipment (UE), mobile station (MS), mobile terminal (MT), and so on.
  • The terminal may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, a wireless terminal in smart home, and so on.
  • the embodiment of the present application does not limit the specific technology and specific device form adopted by the client 120 .
  • the client 120 may be a software module running on any one or more hosts in the computer cluster 110 .
  • The computer cluster 110 refers to a collection of computers connected through a local area network or the Internet, and is usually used to execute large-scale tasks (also called jobs).
  • The jobs here are usually large-scale jobs that require more computing resources and are processed in parallel; this embodiment does not limit the nature or quantity of the jobs.
  • a job may contain multiple computing tasks, which can be assigned to multiple computing resources for execution. Most tasks are executed concurrently or in parallel, while some tasks depend on data produced by other tasks.
  • Each computing device in the computer cluster may use the same hardware and the same operating system; alternatively, the hosts in the computer cluster may use different hardware and different operating systems according to business requirements. Since the tasks deployed on the computer cluster can be executed concurrently, the overall performance can be improved.
  • The computer cluster 110 includes multiple hosts, such as hosts 111 to 114, and each host can be used to provide computing resources. A single host can contain multiple processors or processor cores, and each processor or processor core can be a computing resource, so one physical host can provide multiple computing resources. For example, a physical host may be a server.
  • Computer cluster 110 can handle many types of jobs.
  • The number of tasks that can be executed in parallel and the data of those tasks are not limited.
  • The above job refers to live migration of a virtual machine or master-slave backup between hosts, such as data backup.
  • a job may be submitted to computer cluster 110 from client 120 to host 111 over network 130 .
  • The host 111 can be used to manage all the hosts in the computer cluster 110 to complete one or more tasks included in the job, such as scheduling the computing resources or storage resources of other hosts.
  • the position where the job is submitted may also be another host in the computer cluster 110, and this embodiment does not limit the position where the job is submitted.
  • a virtual machine refers to a virtual device obtained by virtualizing physical computing resources, storage resources, and network resources through virtualization technology.
  • one or more VMs run on one host, for example, two VMs run on the host 111 and one VM runs on the host 114 .
  • one VM runs on multiple hosts, for example, one VM utilizes the processing resources of the host 111 and the storage resources of the host 114 .
  • FIG. 1 is only an example provided by this embodiment, and should not be construed as a limitation to this application.
  • This application takes a VM running on a host as an example for illustration.
  • FIG. 1 is only a schematic diagram; the communication system may also include other devices that are not shown in FIG. 1, and the embodiments of the present application do not limit the number and types of hosts (computing devices) and clients included in the system.
  • the computer cluster 110 may also include more or fewer computing devices.
  • the computer cluster 110 includes two computing devices, one computing device is used to realize the functions of the above-mentioned host 111 and host 112, and the other computing device is used to realize Functions of the host 113 and the host 114 described above.
  • FIG. 2 is a schematic structural diagram of a host provided by the present application. Exemplarily, any host in FIG. 1 can be implemented by the host 200 shown in FIG. 2; the host 200 includes a baseboard management controller (BMC) 210, a processor 220, a memory 230, a hard disk 240, and a network card 250.
  • the baseboard management controller 210 can upgrade the firmware of the device, manage the running state of the device, troubleshoot, and so on.
  • The processor 220 can access the baseboard management controller 210 through a bus such as peripheral component interconnect express (PCIe), universal serial bus (USB), or inter-integrated circuit (I2C).
  • the baseboard management controller 210 may also be connected to at least one sensor.
  • the state data of the computer equipment is obtained through the sensor, wherein the state data includes: temperature data, current data, voltage data and so on. In this application, there is no specific limitation on the type of status data.
  • the baseboard management controller 210 communicates with the processor 220 through the PCIe bus or other types of buses, for example, transfers the acquired state data to the processor 220 for processing.
  • the baseboard management controller 210 can also maintain the program codes in the memory, including upgrading or restoring.
  • the baseboard management controller 210 may also control a power supply circuit or a clock circuit in the host computer 200 .
  • the BMC 210 can manage the host 200 through the above methods.
  • the baseboard management controller 210 is only an optional device.
  • the processor 220 can directly communicate with the sensor, so as to directly manage and maintain the computer equipment.
  • The connections between devices in the host can also be extended through the extended industry standard architecture (EISA) bus, the unified bus (Ubus or UB), the compute express link (CXL), the cache coherent interconnect for accelerators (CCIX), and so on.
  • the bus can also be divided into address bus, data bus, control bus and so on.
  • the processor 220 is connected to the memory 230 through a double data rate (DDR) bus.
  • different memories 230 may use different data buses to communicate with the processor 220, so the DDR bus may also be replaced with other types of data buses, and the embodiment of the present application does not limit the bus type.
  • the host 200 also includes various input/output (I/O) devices, and the processor 220 can access these I/O devices through the PCIe bus.
  • The processor 220 is the computing core and control core of the host 200.
  • the processor 220 may include one or more processor cores (core) 221 .
  • The processor 220 may be a very large-scale integration (VLSI) circuit. An operating system and other software programs are installed in the processor 220, so that the processor 220 can access the memory 230 and various PCIe devices. It can be understood that, in the embodiments of the present application, the processor 220 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC).
  • The processor 220 can also be another general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the host 200 may also include multiple processors.
  • the memory 230 is also called main memory.
  • The memory 230 is usually used to store software currently running in the operating system, input and output data, information exchanged with external storage, and so on. To increase the access speed of the processor 220, the memory 230 needs to provide fast access.
  • A dynamic random access memory (DRAM) is usually used as the memory 230.
  • the processor 220 can access the memory 230 at high speed through the memory controller, and perform read and write operations on any storage unit in the memory 230 .
  • The memory 230 may also be another random access memory, such as a static random access memory (SRAM). This embodiment does not limit the quantity and type of the memory 230.
  • The memory 230 can be configured to have a power-failure protection function.
  • The power-failure protection function means that when the system is powered off and then powered on again, the data stored in the memory will not be lost.
  • The memory 230 having the power-failure protection function is called a non-volatile memory.
  • the memory 230 includes multiple memory pages (pages).
  • a memory page is the smallest unit of data I/O operations of the memory 230, and a memory page is also called an atomic unit of data read and write.
  • Each memory page corresponds to a section of the storage address space of the memory 230; for example, if a memory page can store 4 kilobytes (KB) of data, the memory page corresponds to a 4 KB storage address space.
  • A memory page can also correspond to a larger or smaller storage address space, such as 2 KB or 8 KB, as in the sketch below.
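  • As a worked example of the page-to-address correspondence described above, the following C fragment computes the page number and base address for 4 KB pages; only PAGE_SHIFT changes for 2 KB or 8 KB pages. The names are illustrative.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* 4 KB = 2^12 bytes; 11 for 2 KB, 13 for 8 KB */
#define PAGE_SIZE  (1ull << PAGE_SHIFT)

/* A 4 KB page n covers addresses [n * 4096, n * 4096 + 4095]. */
static inline uint64_t page_of(uint64_t addr)    { return addr >> PAGE_SHIFT; }
static inline uint64_t page_base(uint64_t pagen) { return pagen << PAGE_SHIFT; }
```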
  • If the data stored in a memory page is modified, the memory page can be called a dirty page in the live migration process of the virtual machine, and the data stored in the modified memory page is called dirty page data of the virtual machine.
  • An I/O device refers to hardware capable of data transmission, and can also be understood as a device connected to an I/O interface.
  • Common I/O devices include network cards, printers, keyboards, mice, etc.
  • the I/O devices may be the network card 250 shown in FIG. 2 .
  • All external storage can also be used as I/O devices, such as hard disks, floppy disks, and CDs.
  • The processor 220 can access various I/O devices through the PCIe bus. It should be noted that the PCIe bus is just one example, and it can be replaced by another bus, such as the unified bus (UB or Ubus) or the compute express link (CXL).
  • the network card 250 includes a processor 251 , a memory 252 and a communication interface 253 .
  • A network card that includes a processing unit in addition to the network interface card (NIC) function is also referred to as an intelligent NIC (iNIC).
  • The processor 251 refers to a processor with processing capabilities, such as a data processing unit (DPU).
  • A DPU has the generality and programmability of a CPU, but is more specialized, operating efficiently on network packets, storage requests, or analytics requests. DPUs are distinguished from CPUs by a greater degree of parallelism (the need to process a large number of requests).
  • The DPU here can also be replaced with a graphics processing unit (GPU), a neural-network processing unit (NPU), or another processing chip.
  • The memory 252 may be an internal memory that exchanges data directly with the processor 251; it can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other running programs.
  • The memory 252 includes at least two types of memory; for example, the memory 252 can be either a random access memory or a read-only memory (ROM).
  • Random access memory is, for example, DRAM, or storage class memory (SCM).
  • DRAM is a semiconductor memory, which, like most RAM, is a volatile memory device.
  • SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory; storage-class memory can provide faster read and write speeds than a hard disk, but its access speed is slower than DRAM, and it is also cheaper than DRAM.
  • DRAM and SCM are only illustrative examples in this embodiment, and the memory 252 may also include other random access memories, such as SRAM.
  • The read-only memory, for example, may be a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), and so on.
  • The memory 252 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid-state disk (SSD).
  • multiple memories 252 and different types of memories 252 may be configured in the network card 250 . This embodiment does not limit the quantity and type of the memory 252 .
  • The memory 252 can be configured to have a power-failure protection function.
  • The power-failure protection function means that the data stored in the memory 252 will not be lost when the system is powered off and then powered on again.
  • A memory with the power-failure protection function is called a non-volatile memory.
  • a software program is stored in the memory 252, and the processor 251 runs the software program in the memory 252 to manage VM migration, for example, migrate the VM data stored in the memory 230 to other devices.
  • The communication interface 253 is the network interface through which the host 200 communicates with other devices; for example, the communication interface 253 can implement one or more functions such as parallel-to-serial data conversion, assembly and disassembly of data packets, network access control, data caching, and network signal processing.
  • One or more VMs may run on the host 200, such as the virtual machine 200A to the virtual machine 200C shown in FIG. 2.
  • the computing resources required by the VM come from the local processor 220 and memory 230 of the host 200, while the storage resources required by the VM can come from the local hard disk 240 of the host 200 or hard disks in other hosts.
  • the virtual machine 200A includes a virtual processor 200A1, a virtual memory 200A2, and a virtual network card 200A3.
  • The computing resources required by the virtual processor 200A1 are provided by the processor 220, and the storage resources required by the virtual memory 200A2 are provided by the memory 230 (or the hard disk 240).
  • the network resource required by the virtual network card 200A3 is provided by the network card 250 .
  • various applications can run in the VM, and users can trigger services such as read/write data requests through the applications in the VM.
  • the storage space of the virtual memory 200A2 in the VM can be provided by memory pages included in the memory 230 .
  • VM live migration can migrate a VM from one host to another without interrupting business.
  • The key technology for ensuring uninterrupted service is the migration of memory data, which refers to the memory page data included in the memory 230. If the data stored in a memory page is modified within a period of time, the memory page can be called a dirty page in the VM live migration process, and the data stored in the modified memory page (dirty page) is called dirty page data of the VM.
  • The following takes the case where the source host is the host 111 in FIG. 1 and the destination host is the host 112 as an example for illustration.
  • The memory migration can include the following stages.
  • Iterative pre-copy stage: when the VM migration starts, the VM still runs on the source host; to ensure uninterrupted VM services, the data of the virtual machine to be migrated is written into the memory of the source host and the memory of the destination host at the same time.
  • Downtime copy stage: the running of the VM on the source host is interrupted, and the data stored by the VM in the memory pages of the source host is transferred to the memory of the destination host.
  • Dirty page copy stage: the VM is still running on the source host; the source host monitors and records any modification of all transferred memory pages during the migration process (that is, the dirty pages among the memory pages), and after all the memory pages used by the VM have been transferred, the dirty page data stored in the dirty pages is transferred.
  • The source host estimates the data transmission speed during the migration process.
  • Then the VM on the source host is shut down, and the remaining VM data, namely the dirty page data, is transferred to the destination host.
  • Virtual machine recovery stage: the VM is started on the destination host, and the entire migration process of the VM is completed.
  • the shared storage system means that the image file directories of the source and destination virtual machines are on a shared storage.
  • FIG. 3 is a schematic flow diagram of a data migration method provided by the present application.
  • the data migration method can be applied to the communication system shown in FIG. 1.
  • The host 310 can implement the functions of the host 111 in FIG. 1, and the host 320 can implement the functions of the host 112 in FIG. 1.
  • This data migration method can be executed by the network card 313. For the hardware implementation of the host 310 where the network card 313 is located, refer to the host 200 shown in FIG. 2, and the network card 313 can also have the functions of the network card 250 in FIG. 2. The host 320 is similar to the host 310 and is not described again here.
  • the host 310 shown in FIG. 3 is called a source host, or a sending host, a first host, a source node, etc.
  • The memory 312 stores the data to be migrated of one or more VMs, and the data to be migrated includes the running data of the VM before copying and the data stored in dirty pages during live migration.
  • a dirty page refers to a memory page (page) in which data in the memory 312 is modified.
  • the network card 313 may run a virtual machine live migration management program, and the computing resources required by the virtual machine live migration management program are provided by the processor and memory included in the network card 313.
  • The virtual machine live migration management program may manage the dirty page data stored in the memory 312 of the host 310, for example, read, write, or erase it.
  • In this way, the data migration function of the processor 311 is offloaded to the network card 313, which prevents the processor 311 in the host 310 from performing multiple virtual machine data copy operations, reduces the resource consumption of the processor 311, and improves the efficiency with which the host 310 processes other computing services.
  • The host 320 shown in FIG. 3 is referred to as the destination host, or the receiving-end host, the second host, the destination node, and so on. The memory 322 is used to store the data to be migrated of one or more VMs, and the data to be migrated includes the running data of the VM before copying and the data stored in dirty pages during live migration.
  • a dirty page refers to a memory page (page) where data stored in the memory 322 is modified during the live migration process of the VM.
  • the above-mentioned virtual machine live migration management program may also run in the network card 323 .
  • virtual machine management software may run on the source host and the destination host, and the virtual machine management software is used to manage virtual machines on the source host and the destination host.
  • the aforementioned virtual machine live migration management program may be a part of the virtual machine management software, or a thread started in the network card by the virtual machine management software to realize the virtual machine live migration.
  • The network card usually receives and sends data in the form of message queues, and the message queues include a group of queue pairs (QP); a QP includes a send queue and a receive queue (the message queue used to send data in the network card 313 is a send queue (SQ), and the message queue used to receive data in the network card 323 is a receive queue (RQ)).
  • Message queuing is a connection mode used for communication between multiple hosts. For example, multiple hosts can use the TCP/IP protocol to establish multiple connections; each connection has a receive queue and a send queue, and the receive queue and the send queue are used to transfer the data of this connection, as in the sketch below.
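  • A queue pair as described above can be sketched in C as follows; the queue depth and field names are illustrative assumptions, and the WQE type is only forward-declared here (a WQE sketch appears in a later section).

```c
#include <stdint.h>

#define QDEPTH 256                 /* illustrative queue depth */

struct wqe;                        /* work queue element; sketched later */

struct queue {                     /* one message queue (SQ or RQ)       */
    struct wqe *entries[QDEPTH];
    uint32_t head, tail;           /* producer/consumer indices          */
};

struct queue_pair {                /* one QP per connection              */
    struct queue sq;               /* send queue: used by the sending side    */
    struct queue rq;               /* receive queue: used by the receiving side */
};
```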
  • the host 310 and the host 320 realize the virtualization of hardware network resources (such as network cards) based on the single root I/O virtualization (SR-IOV) technology, and the SR-IOV technology is a technical realization of "virtual channel" .
  • SR-IOV technology virtualizes a PCIe device that implements a physical function (PF) to obtain one or more PCIe devices that implement a virtual function (VF). Each PCIe device that implements a VF (hereinafter referred to as a VF) can be directly allocated to a virtual machine, and the host also provides independent memory space, interrupts, and direct memory access (DMA) streams for each VF.
  • The PF driver of an SR-IOV device is used to manage the physical function of a device with the SR-IOV capability; the physical function supports the PCI functions of the SR-IOV capability defined by the SR-IOV specification. A physical function is a full-featured PCIe function of the device and can be discovered, managed, and handled like any other physical PCIe device. Physical functions can be used to configure and control virtual PCIe devices.
  • a VF is a virtual network card or instance virtualized by a physical network card that supports SR-IOV.
  • the VF will be presented to the virtual machine as an independent network card.
  • Each VF has an exclusive PCI configuration region and may share the same physical resource (such as a physical network port) with other VFs.
  • VF has a lightweight PCIe function, and can share one or more physical resources (such as physical network cards) with the PF and other VFs associated with the PF.
  • In the source host, a VF corresponds to one or more SQs; in the destination host, a VF corresponds to one or more RQs.
  • A work queue element (WQE) is stored in the message queue, and the WQE stores information indicating the address and length of the data in the send queue or receive queue.
  • the length of the data can be determined by the address and offset of the data.
  • The information indicating the address and offset of data in the WQE is also called scatter/gather (SG) information. If a set of data includes multiple pieces of data, and the SG information of each piece of data includes the address and offset of that data, then the multiple pieces of SG information about the set of data in the WQE can also be called an SG chain, or a scatter/gather list (SGL), as sketched below.
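  • The following C sketch shows one possible layout for a WQE carrying an SGL, matching the description above; the field names and the fixed SGL capacity are assumptions for illustration, not the format used by any particular network card.

```c
#include <stdint.h>

/* One SG entry: the address and offset (length) of one piece of data. */
struct sg_entry {
    uint64_t addr;       /* GPA of the data (or HPA in other embodiments) */
    uint32_t length;     /* offset/length of this piece                   */
};

/* A WQE carrying an SG chain (SGL); the capacity of 4 is an assumption. */
struct wqe {
    uint32_t num_sge;           /* number of valid entries in sgl[] */
    struct sg_entry sgl[4];     /* the scatter/gather list          */
};
```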
  • The address included in the above SG information refers to the guest physical address (GPA) of the VM data, where the guest refers to the virtual machine (VM) running on the source host.
  • the GPA refers to the memory address where the VM data is stored in the memory (for example, the memory 312 ).
  • The GPA may also be referred to as an intermediate physical address (IPA).
  • The VM can access the corresponding data based on the GPA, but a hardware device (such as the processor or the network card of the host) needs to use the host physical address (HPA) to access the data. Therefore, the GPA needs to undergo two levels of address translation, from GPA to host virtual address (HVA) and from HVA to HPA, so that the hardware device can access the data indicated by the SG information.
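  • The two-level translation can be sketched as follows, with two toy lookup tables standing in for the real GPA-to-HVA and HVA-to-HPA page tables; all names, sizes, and table contents are assumptions for illustration.

```c
#include <stdint.h>

/* Two-level translation GPA -> HVA -> HPA, as described above. */
#define NPAGES 8
static uint64_t gpa_to_hva_map[NPAGES];   /* stage 1: guest physical -> host virtual */
static uint64_t hva_to_hpa_map[NPAGES];   /* stage 2: host virtual  -> host physical */

/* 4 KB pages: translate the page number, keep the in-page offset. */
static uint64_t walk(const uint64_t map[], uint64_t addr)
{
    return (map[(addr >> 12) % NPAGES] << 12) | (addr & 0xFFF);
}

/* The hardware (processor or NIC) must use the resulting HPA to access
 * the data indicated by the SG information. */
uint64_t resolve_sg_addr(uint64_t gpa)
{
    uint64_t hva = walk(gpa_to_hva_map, gpa);
    return walk(hva_to_hpa_map, hva);
}
```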
  • Alternatively, the address included in the above SG information refers to the HPA of the VM data, and a hardware device (such as the processor or the network card of the host) can access the VM data directly based on the HPA.
  • The memory 312 includes a plurality of memory pages, such as pages with serial numbers 1 to 5, which are respectively denoted as page1, page2, page3, page4, and page5; among the plurality of pages, 2 pages are memory pages associated with the virtual machine to be migrated, such as page2 and page4 in the memory 312.
  • network card 313 may send the data stored in page2 and page4 in memory 312 to other hosts, such as host 320 .
  • the data migration method provided in this embodiment includes the following steps.
  • the network card 313 acquires a data migration notification message.
  • the data migration notification message is used to indicate the identity of the virtual machine to be migrated, as shown in the black-filled small block in FIG. 3 .
  • The data migration notification message can be used to indicate the virtual machine to be migrated among multiple virtual machines, so as to prevent the data of virtual machines that do not need to be migrated from being migrated to other hosts, reducing the probability that the data of multiple virtual machines is confused during the data migration process.
  • the data migration notification message is generated when the virtual machine management software triggers the live migration operation of the virtual machine.
  • the virtual machine management software manages multiple virtual machines.
  • After the host 310 (or the client) triggers the live migration operation, the virtual machine management software generates a data migration notification message, which may include the identifier of the virtual machine to be migrated; for example, the identifier is a serial number of the virtual machine, or a label of the virtual machine in a virtual local area network (VLAN).
  • the trigger conditions for the live migration operation of the virtual machine may be: user operations and resource usage of the host.
  • the resource usage status may be information such as the amount of resources to be released by the host, and specifically, such as computing resources, storage resources, and network bandwidth of the host.
  • the network card 313 determines the data to be migrated according to the identifier indicated by the data migration notification message.
  • the data to be migrated is stored in the memory (memory 312 ) of the source host (host 310 ) and associated with the virtual machine to be migrated.
  • the network card 313 may determine the memory page set associated with the virtual machine to be migrated in the memory 312 according to the identifier indicated by the data migration notification message, and use the data stored in the memory page set as the data to be migrated.
  • the memory page set includes one or more memory pages.
  • the memory pages associated with the virtual machine to be migrated in memory 312 are page2 and page4 , and the data stored in page2 and page4 is the data to be migrated of the VM.
  • the data to be migrated is determined by the network card 313 according to the identity of the virtual machine to be migrated, which avoids the performance degradation of the host 310 caused by the processor 311 determining the data to be migrated, improves the ability of the host 310 to process other services, and reduces the lag of the host 310 .
  • the data to be migrated includes dirty page data, where the dirty page data is data stored in a memory page where data is modified among one or more memory pages.
  • A memory page whose data has been modified is a dirty page; for example, page2 shown in FIG. 3 is a dirty page, and the data of the virtual machine to be migrated that is stored in page2 is dirty page data.
  • The network card 313 queries the dirty page mark information stored in the network card, determines the dirty page set associated with the identifier, and uses the data stored in the dirty page set as the dirty page data.
  • the dirty page set includes one or more dirty pages, such as page2 shown in FIG. 3 .
  • the aforementioned dirty page mark information is used to indicate the memory address of the dirty page.
  • the dirty page marking information includes at least one of the first dirty page table and the second dirty page table.
  • one or more dirty page tables may be stored in the network card 313, and the multiple dirty page tables may be used to implement different functions at the same time.
  • the first dirty page table is used to mark a dirty page as a marked dirty state
  • the marked dirty state is a state in which the source host (such as the processor 311 in the host 310 ) modifies the data stored in the dirty page.
  • the second dirty page table is used to mark the dirty page as a data migration state, and the data migration state is a state in which the data stored in the dirty page is migrated by the network card (such as the network card 313 ).
  • In a possible case, the dirty page marking information may include only one dirty page table, and the dirty page table is used to record the migration status of dirty pages.
  • For example, when the source host performs data access to a dirty page, only the first dirty page table needs to be maintained in the network card 313, and the dirty page is recorded as being in the marked-dirty state; for another example, when the network card 313 performs data access to a dirty page, only the second dirty page table needs to be maintained in the network card 313, and the dirty page is recorded as being in the data migration state.
  • In another possible case, two dirty page tables, such as the first dirty page table and the second dirty page table above, can be set in the network card 313 to mark the migration status of memory pages; at the same time node, the two dirty page tables are used to implement different functions.
  • the network card 313 can use a dirty page table to mark the state of the page.
  • A dirty page table represents the migration status of the dirty pages among all the memory pages associated with the virtual machine to be migrated; for example, the migration status of each memory page is represented by a bit.
  • one or more status flags may be set in the dirty page table, for example, the structure of the dirty page table may be shown in Table 1 below.
  • M and S are status flags used by the network card (such as the network card 313) to mark the migration status of a page.
  • Table 1 is only an example of the dirty page table provided in this embodiment, and the meanings represented by the status flags (such as M and S) in the dirty page table can be adjusted according to different usage scenarios and requirements. This is not limited.
  • The virtual machine management software can run a dirty-marking program and a migration program.
  • The dirty-marking program and the migration program may be software modules in the virtual machine live migration management program, or independent software units triggered by the virtual machine management software, which is not limited in this application.
  • In a possible implementation, the dirty-marking program and the migration program run on the network card 313.
  • In another possible implementation, the dirty-marking program and the migration program run on the processor 311.
  • This application takes the case where the dirty-marking program and the migration program run on the network card 313 as an example for illustration.
  • The network card 313 marks the dirty pages among the memory pages associated with the virtual machine to be migrated in the memory 312; that is, the memory dirty-marking function is offloaded from the processor 311 to the network card 313, which avoids the process of the processor 311 marking dirty pages in the memory 312 and reduces the resource consumption of the processor 311, thereby avoiding the impact on other computing services of the host 310 caused by the processor 311 managing the live migration process of the virtual machine.
  • For example, the migration program can modify the second dirty page table, which prevents the dirty-marking program and the migration program from modifying the same dirty page table at the same time and causing data migration errors.
  • the dirty page scanning process includes the following possible steps:
  • Step 1: The network card 313 sets up two dirty page tables: dirty page table 1 and dirty page table 2. Each of the two dirty page tables has two flag bits (M and S shown in Table 1 above); the flag bits in dirty page table 1 are all set to 0 and the flag bits in dirty page table 2 are all set to 1, where "1" indicates a dirty page, "0" indicates a non-dirty page, and all "1"s in dirty page table 2 indicate that every page recorded in dirty page table 2 is a dirty page.
  • Step 2: The network card 313 initializes dirty page table 1 and dirty page table 2, so that the states of all dirty pages recorded in dirty page table 1 are set to the marked dirty state and the management of dirty page table 1 is handed to the above dirtying program, while the states of all dirty pages recorded in dirty page table 2 are set to the data migration state and the management of dirty page table 2 is handed to the above migration program.
  • the two dirty page tables are managed separately by different programs, which prevents one dirty page table from being modified by different programs and causing errors in data migration.
  • the dirty page table stored in the network card 313 also records the dirty page address information, for example, the address information refers to the HPA of the dirty page in the host 310, or the GPA corresponding to the dirty page in the VM.
  • the data migration method provided in this embodiment further includes the following steps.
  • the network card 313 migrates the data to be migrated to the host 320.
  • the data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, which avoids the process of the source host determining the data to be migrated and reduces the computing resources the source host needs during live migration of the virtual machine, improving the source host's ability to perform other services (such as computing-intensive and delay-sensitive services like AI and HPC).
  • the data migration method provided by this embodiment further includes the following step S340.
  • the network card 323 migrates the data to be migrated to the memory 322 of the host 320.
  • the network card of the destination host writes the data to be migrated into the memory of the destination host, avoiding the copy process of the data to be migrated from the network card to the memory in the destination host, reducing the data migration delay and improving the efficiency of data migration.
  • since the processor of the destination host does not need to copy the data to be migrated of the virtual machine, the consumption of computing resources and storage resources of the destination host during the live migration of the virtual machine is reduced, and the ability of the destination host to perform other services is improved.
  • the host 310 may also serve as a destination host to receive virtual machine data sent by other hosts; for example, the host 310 receives another data migration notification message, and that data migration notification message is used to indicate the identifier of the virtual machine to be received by the host 310;
  • the host 310 migrates the data to be received, sent by other hosts, into the memory 312, the data to be received being data stored in the memory of other hosts and associated with the virtual machine to be received.
  • the source host can serve as the sender to send the data to be migrated of the virtual machine to be migrated to the destination host, and can also serve as the receiver to receive virtual machine data sent by other hosts, thereby providing both the sending and the receiving function during the migration of the source host's virtual machines and improving the data migration performance of the source host.
  • FIG. 4 shows a possible implementation of the above S330, and the above S330 includes the following steps.
  • the network card 313 sends the page information of the data to be migrated to the host 320.
  • the page information is used to indicate the memory address and offset of the data to be migrated in the memory 312 .
  • the sending queue SQ in the network card 313 maintains SG information including memory addresses and offsets of dirty pages, and the aforementioned page information includes SG information corresponding to dirty pages.
  • the page information may be called description information of the data to be migrated, for example, the description information refers to metadata used to describe the service data stored in the dirty page.
  • the network card 323 sends to the network card 313 a migration message determined based on the page information.
  • the migration message is used to indicate the receiving queue (RQ) corresponding to the virtual machine to be migrated in the destination host.
  • the migration message is the RQ identifier or RQ serial number assigned by the network card 323 to the virtual machine to be migrated in the host 310 according to the page information.
  • after the network card 313 and the network card 323 obtain the data migration notification message, they allocate one or more RQs to the virtual machine to be migrated; the migration message is the serial number (or identifier) of one RQ selected from the one or more RQs after the network card 323 receives the page information of the data to be migrated.
  • alternatively, after the network card 323 receives the page information of the data to be migrated, the network card 323 and the network card 313 establish a data connection (or transmission channel) for transmitting the data to be migrated; the data transmission in this connection is realized through a QP, and the migration message refers to the serial number (or identifier) of the receive queue (RQ) included in the QP.
  • the network card 313 sends the data to be migrated to the RQ indicated by the migration message.
  • the network card 323 assigns a migration message to the virtual machine to be migrated in the host 310 according to the page information sent by the network card 313, and the network card 313 migrates the data to be migrated to the RQ indicated by the migration message, which prevents the data to be migrated of the virtual machine to be migrated (such as VM1) from being sent to the RQ corresponding to another virtual machine (such as VM2) and improves the accuracy of the virtual machine's data migration.
  • FIG. 5 is a schematic diagram of data migration provided by this application; the network card 313 and the network card 323 establish a transmission channel by means of a QP, where the QP includes the send queue in the network card 313 and the receive queue in the network card 323, the send queue is located at the sending end of the transmission channel, and the receive queue is located at the receiving end of the transmission channel.
  • the SG information maintained by the send queue (SQ) included in the network card 313 has a first relationship with the memory page in the memory 312 associated with the virtual machine to be migrated.
  • the first relationship is constructed by the network card 313 which, after determining the data to be migrated according to the identifier of the virtual machine to be migrated, uses the memory addresses of the memory pages in the memory 312 associated with the identifier.
  • the first relationship refers to the corresponding relationship between the service data stored in the memory page in the memory 312 and the SG information. As shown in FIG. 5, page2-SG1 and page4-SG2 of the network card 313 are shown.
  • the first relationship refers to the corresponding relationship between the memory address of the memory page and the SG information, for example, the memory address of page2 is 001, and the memory address of page4 is 011, the corresponding relationship includes 001-SG1, and 011-SG2.
  • in another possible case, the first relationship refers to the correspondence between the memory addresses (or dirty page data) of the dirty pages associated with the virtual machine to be migrated and the SG information.
  • when the memory addresses of multiple dirty pages are not contiguous, each dirty page corresponds to one piece of SG information in the SQ. For example, if SG1 includes "001 4KB", it means that page2 with the address "001" in the memory 312 is a dirty page, and the data length of the service data stored in that dirty page is 4KB; if SG2 includes "011 4KB", it means that page4 with the address "011" in the memory 312 is a dirty page, and the data length of the service data stored in that dirty page is 4KB.
  • the basic unit of reading and writing data in the memory 312 is the page, and the storage space of a single page is 4KB. If multiple dirty pages are involved and the memory addresses of the multiple dirty pages are contiguous, the multiple dirty pages may correspond to only one piece of SG information in the SQ. For example, if the addresses of two dirty pages are "001" and "010" and the data length of the service data stored in each dirty page is 4KB, then when the network card 313 maps the service data stored in the two dirty pages to the SQ, the SG information corresponding to the two dirty pages can be "001 8KB".
  • the network card 313 maps the dirty pages in the memory 312 to the SG information in the SQ, and the service data stored in the dirty pages does not need to be copied to the memory in the network card 313, which avoids copying the service data from the memory 312 to the network card 313, reduces the data migration time of the VM, and improves the live migration efficiency of the VM.
  • the SG information maintained by the receive queue (RQ) included in the network card 323 has a second relationship with the memory pages in the memory 322 .
  • the second relationship refers to the corresponding relationship between the service data stored in the memory page and the SG information.
  • page2-SG1 and page4-SG2 in the network card 323 are shown.
  • the second relationship may also refer to the correspondence between the memory addresses of the memory pages in the memory 322 and the SG information; for example, the memory address of page2 is 001 and the memory address of page4 is 011, and the second relationship includes 001-SG1 and 011-SG2.
  • the second relationship refers to the corresponding relationship between the memory address (or dirty page data) of the dirty page associated with the virtual machine to be migrated and the SG information.
  • the network card 323 constructs the above-mentioned second relationship according to the memory address contained in the page information.
  • in the source host, the virtual machine to be migrated is called the source VM; in the destination host, the virtual machine to be migrated is called the destination VM.
  • the memory address of a set of data on the source host (such as the source VM GPA) and its memory address on the destination host (such as the destination VM GPA) should be consistent.
  • the consistency between the source VM GPA and the destination VM GPA is realized by the virtual machine management software; for example, after the network card 313 builds the above first relationship, the network card 323 uses the virtual machine management software and the aforementioned first relationship to construct the second relationship, so that the source VM GPA stays consistent with the destination VM GPA, which prevents the memory addresses indicating data within the virtual machine from changing after the virtual machine is migrated and improves the accuracy of the virtual machine migration.
  • the network card 313 obtains the SG information corresponding to the dirty pages from the SQ, and sends the data indicated by the SG information to the RQ indicated by the migration message.
  • the memory address included in the SG information corresponding to the dirty page may be the HPA or GPA provided in the above example.
  • the network card 313 migrates the dirty page data stored in the dirty page to the host 320 based on the storage address space indicated by the HPA.
  • the network card 313 performs address translation based on the GPA to obtain the HPA of the dirty page, and then migrates the dirty page data stored in the storage address space indicated by the HPA to the host 320 .
  • the address translation performed by the network card 313 based on the GPA can be realized by an IO memory management unit (IOMMU).
  • the IOMMU can perform address translation on the GPA based on the VF information adopted by the virtual machine to be migrated on the host 310, to obtain the HPA of the dirty page.
  • the VF information is used to indicate the identity of the virtual PCIe device (such as network card or memory) adopted by the virtual machine to be migrated.
  • since the host provides an independent memory space for each VF, and in the source host one VF corresponds to one or more SQs, the network card 313 performs address translation based on the VF information on the SG information in the SQ that includes the GPA, and obtains the HPA of the dirty page; the network card 313 can then migrate the dirty page data to the host 320 based on the HPA, realizing the data migration of the virtual machine to be migrated, avoiding the copy of the dirty page data from the memory 312 to the network card 313, reducing the data migration time, and improving the migration efficiency of the virtual machine.
  • the dirty pages in the memory 312 are mapped to the SG information in the SQ by the network card 313, and the service data stored in the dirty pages does not need to be copied to the memory in the network card 313, which avoids copying the service data from the memory 312 to the network card 313, reduces the VM live migration time, and improves the VM live migration efficiency.
  • the above-mentioned embodiment is illustrated by taking the SQ and RQ in the message queue pair (QP) as 1:1 as an example.
  • the ratio of SQ to RQ may also be N:M, where N and M are both positive integers and N≠M.
  • for example, when the quantity ratio of SQ to RQ is 2:1, in the network card 313 the SG1 information corresponding to page2 is stored in SQ(1) and the SG2 information corresponding to page4 is stored in SQ(2), while in the network card 323 the SG1 information and the SG2 information are stored in the same RQ.
  • the network card 313 migrates the data stored in page2 and page4 to the storage address space mapped by the RQ based on the above SG1 information and SG2 information.
  • the migration of the data to be migrated may be implemented through TCP/IP protocol or RDMA network.
  • the host 310 (source host) establishes two transmission channels with the host 320 (destination host) through the TCP/IP protocol: a control connection and a data connection.
  • the control connection is used to transmit page information and migration messages
  • the data connection is used to transmit data to be migrated.
  • different transmission channels are used to transmit different information or data, which avoids the page information being transmitted over the data connection, where it could not be processed by the network card 323 on the receiving side and could cause problems in the live migration of the virtual machine, thereby improving the data migration stability of the virtual machine.
  • the host 310 may establish only one transmission channel with the host 320 (destination host) through the TCP/IP protocol: a single connection.
  • This single connection can be used to transmit the above-mentioned page information, migration message and data to be migrated.
  • Step 1 The network card 313 sends the page information (SG information) of the dirty page to be copied to the network card 323 through a single connection.
  • the addresses contained in the SG information in the send queue and the receive queue can be GPA or HPA; as long as the memory spaces of the two (the source VM and the destination VM) are completely consistent at the GPA level, a complete copy of the source host's memory to the destination host's memory can be realized.
  • when GPA is used, the IOMMU needs to realize the GPA→HPA address translation.
  • the data transmitted by the network card 313 over the single connection is sent directly to the host 320; that is, within this single connection the information of the PCIe device implementing the PF (referred to as PF information) is the same, so the network card 313 can configure the PF information in the TCP connection context.
  • the PF information is used to indicate the identity of the physical PCIe device used by the virtual machine to be migrated.
  • the network card 313 configures the PF information in the TCP connection context, which avoids configuring PF information for each set of data when multiple sets of data of a virtual machine are migrated between the source host and the destination host, reduces the total data migration time, and improves the data migration efficiency of the virtual machine.
  • PF and VF are concepts of virtualized devices under the PCIe interface and should not be construed as limiting the present application; for other interfaces, the virtual device is simply marked with the corresponding identifier.
  • Step 2 The destination host allocates a message processing ID (such as the above-mentioned migration message) based on the page information received by the network card 323 , and a message processing ID corresponds to a receiving queue in the network card 323 .
  • the message processing ID is allocated by the network card 323, so as to reduce the load of the destination host and increase the computing capability of the destination host to process other services.
  • the network card 323 can also use the hot migration management program to set up a receiving queue according to the received page information (SG information) of the dirty page.
  • the SG sequence of the corresponding receiving memory blocks in the receive queue is exactly the same as the SG sequence of the corresponding memory blocks in the send queue, the corresponding memory spaces are completely consistent at the GPA level of the VM, and what is placed in the receive queue is the aforementioned page information (SG information or SG information blocks).
  • Step 3: The network card 313 takes the memory blocks (dirty page data) corresponding to the dirty pages to be sent this time, adds a message header carrying the aforementioned message processing ID, and places them in SG mode into the SQ of TCP in the same memory organization order as the SG information; the network card 313 then notifies the hardware (such as a communication interface) to fetch the data from the memory specified by the SQ and send it to the RQ of the destination host.
  • Step 4 After receiving the data carrying the message header (message processing ID), the network card 323 obtains the SG information stored in the corresponding RQ according to the message processing ID, and writes the data into the memory of the destination host corresponding to the SG information.
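  • To make steps 3 and 4 concrete, the following C sketch shows a hypothetical message header that carries the message processing ID in front of each batch of dirty-page data, together with the receive-side dispatch that writes the payload into the memory described by the matching RQ. The header fields and the rq_memory_for lookup are illustrative assumptions, not definitions from this embodiment.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical header prepended to each batch of dirty-page data
     * on the single TCP connection (step 3). */
    struct msg_header {
        uint32_t msg_id;  /* message processing ID allocated by the receiver */
        uint32_t length;  /* payload length in bytes                         */
        uint64_t gpa;     /* guest physical address of the first page        */
    };

    /* Stand-in for the real lookup of the RQ bound to a message ID. */
    extern void *rq_memory_for(uint32_t msg_id, uint64_t gpa, uint32_t len);

    /* Receive-side dispatch (step 4): find the destination memory via
     * the message processing ID, then copy the payload into it. */
    static int deliver(const uint8_t *frame, size_t n)
    {
        struct msg_header h;
        if (n < sizeof h)
            return -1;                      /* truncated header  */
        memcpy(&h, frame, sizeof h);
        if (n - sizeof h < h.length)
            return -1;                      /* truncated payload */
        void *dst = rq_memory_for(h.msg_id, h.gpa, h.length);
        if (dst == NULL)
            return -1;                      /* unknown message ID */
        memcpy(dst, frame + sizeof h, h.length);
        return 0;
    }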
  • the network card 323 may also notify the network card 313 of the cache information of the memory 322, where the cache information includes the available margin of the buffer space for live migration on the host, such as 10 megabytes (MB); during the transmission of service data, the service data sent by the network card 313 does not exceed the available margin of the buffer space notified by the host 320, which prevents the volume of service data from becoming so large that the network card 323 cannot quickly write the data in the buffer space into the memory 322.
  • the above PF information/VF information can also be used to distinguish the VM to which the data in a single connection belongs, preventing transmission errors among the data of multiple VMs in a single connection, such as misidentifying the data of VM(1) as the data of VM(2), thereby improving the accuracy of VM live migration.
  • the live migration management process of the VM is implemented by the network card without consuming the computing resources of the source host's processor; and the service data of the VM is copied directly from the memory of the source host to the memory of the destination host by the network card, reducing the number of copies of the service data and the data copy time, thereby reducing the live migration latency of the VM.
  • the network card 313 can also migrate the service data of one unit (such as 10 dirty pages) at a time to the destination host (host 320); if the response fed back by the destination host indicates that the network card 323 has completed writing that unit's service data, the network card 313 initiates the migration of the next unit's service data.
  • the service data of the dirty pages does not need to be saved in the network card, and the amount of metadata is smaller than the amount of service data, which reduces the consumption of storage resources in the network card.
  • since the network card can use the metadata of the dirty pages to send the service data stored in the dirty pages of the memory to the destination host, this avoids the copying of service data from the memory to the network card, reduces the data copy time, and thus reduces the latency of the VM's live migration and improves the efficiency of VM live migration.
  • the virtual machine live migration process is implemented based on the TCP protocol, but in some possible cases, the virtual machine live migration process may also be implemented based on the RDMA network.
  • RDMA is a technology that bypasses the kernel of the remote host's operating system to access data in its memory. Because it does not go through the operating system, it not only saves a large amount of CPU resources, but also improves system throughput and reduces the system's network communication latency; it is especially suitable for massively parallel computer clusters, where it is widely used.
  • RDMA has several major characteristics: (1) data is transmitted between the network and the remote machine; (2) without the participation of the operating system kernel, all content related to sending and transmission is offloaded to the smart network card; (3) direct data transmission between virtualized memory in user space and the smart network card does not involve the operating system kernel, with no additional data movement or copying.
  • Infiniband is a network specially designed for RDMA, which ensures reliable transmission from the hardware, and requires network cards and switches that support this technology.
  • RoCE and iWARP are Ethernet-based RDMA technologies and only require special network cards. In terms of performance, the Infiniband network is the best, but the prices of its network cards and switches are high, while RoCE and iWARP only need special network cards and are comparatively much cheaper.
  • the network card 313 and the network card 323 can also be called smart network cards; one possible hardware and software implementation of a smart network card is given here: the smart network card includes a CPU and a network adapter (network interface card, NIC), and a live migration management program runs on the CPU.
  • the network card 313 registers the entire memory space (GPA and HPA) of the VM to be migrated (or source VM) as a memory region and generates a memory protection table (MPT) and a memory translation table (MTT), where the PF information of the accessing host is added to the MPT table to obtain the local source local key (S_LKey) and source remote key (S_RKey).
  • the MPT table is used to indicate the corresponding relationship between the HPA of the memory and the GPA of the virtual machine to be migrated.
  • the network card 323 registers the entire memory space (GPA and HPA) of the VM as a memory region and generates an MPT table and an MTT table.
  • the PF information of the accessing host is added to the MPT table to obtain the local destination local key (D_LKey) and destination remote key (D_RKey).
  • the network card 313 sending the data to be migrated to the RQ indicated by the migration message may include the following process: first, the network card 313 determines the PF information of the virtual machine to be migrated according to the migration information; second, the network card 313 queries the MPT table according to the PF information and the GPA of the virtual machine to be migrated to determine the HPA corresponding to the page information; finally, the network card 313 sends the data to be migrated stored at the HPA of the virtual machine to be migrated to the destination host (host 320).
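  • As a simplified, hypothetical rendering of the PF-plus-GPA lookup just described (a real MPT is a hardware table whose layout this embodiment does not specify), the following C sketch resolves an HPA from a (PF, GPA) pair:

    #include <stdint.h>
    #include <stddef.h>

    /* One MPT entry: maps (PF, GPA range) to the backing HPA. */
    struct mpt_entry {
        uint16_t pf;        /* physical function of the VM's virtual device */
        uint64_t gpa_base;  /* start of the guest-physical range            */
        uint64_t hpa_base;  /* corresponding host-physical address          */
        uint64_t len;       /* length of the range in bytes                 */
    };

    /* Resolve a GPA for a given PF to an HPA; returns 0 on a miss. */
    static uint64_t mpt_lookup(const struct mpt_entry *mpt, size_t n,
                               uint16_t pf, uint64_t gpa)
    {
        for (size_t i = 0; i < n; i++) {
            if (mpt[i].pf == pf &&
                gpa >= mpt[i].gpa_base &&
                gpa <  mpt[i].gpa_base + mpt[i].len)
                return mpt[i].hpa_base + (gpa - mpt[i].gpa_base);
        }
        return 0;
    }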
  • the live migration management program running in the network card 313 obtains the description information (metadata) of the dirty pages in the memory 312, uses the D_RKey registered in the network card 323 as the R_Key together with the GPA address, and initiates "RDMA write" operations batch by batch to write the data into the memory 322 in the destination host (host 320).
  • send/receive is a bilateral operation, that is, it requires remote application awareness to participate in sending and receiving.
  • read and write are unilateral operations that only need the local end to specify the source and destination addresses of the information; the remote application does not need to be aware of the communication, the reading or storing of data is completed through the remote network card, and the remote network card then encapsulates the result into a message and returns it to the local end.
  • send/receive can be used to transmit description information of dirty pages
  • read and write can be used to transmit business data stored in dirty pages.
  • the data migration method of this embodiment is described by taking the RDMA write operation as an example.
  • the RDMA write operation is used for the requester (such as the network card 313 ) to write data into the storage space of the responder (such as the host 320 ).
  • before allowing the network card 313 to perform RDMA write operations, the host 320 first allocates a storage space for the QP (or QP group) of the host 320 to access.
  • the channel adapter of host 320 associates a key with the virtual address of this memory space.
  • the host 320 sends the virtual address, length and key of the storage space to the network card 313 that can access the memory area.
  • the above information may be sent to the network card 313 through the above-mentioned sending operation.
  • the virtual address and key of the memory space can be used to determine the HPA of the dirty page.
  • the network card 313 can initiate an RDMA write operation by sending an RDMA write message, which includes the data to be written to the host 320, the virtual address of the storage space of the host 320, the length of the data, and the key.
  • the length of the data can be between 0 and 2^31 bytes. Similar to the send operation, if the length of the data is greater than the path maximum transmission unit (PMTU), it is segmented into multiple messages according to the PMTU size, and the host 320 reassembles these packets to obtain the data.
  • the host 320 sends acknowledgment messages for the messages it receives: the host 320 can send an acknowledgment message to the network card 313 for each message, or send one acknowledgment message to the network card 313 for multiple consecutive messages of the same data, or send an acknowledgment message to the network card 313 only for the tail packet of the data; in addition, whether the data is a short message or a long message, the host 320 can send one acknowledgment covering multiple previously received messages; for example, the acknowledgment of an RDMA write message whose packet sequence number (PSN) is X may be used to confirm that the messages with PSN less than X before that RDMA write message have been successfully received by the host 320.
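  • For readers who want to see the shape of such an RDMA write in code, the sketch below posts a single RDMA write work request using the standard libibverbs API; it assumes the QP, the registered local buffer, and the remote virtual address and key have already been set up and exchanged as described above, and is an illustration rather than the implementation of this embodiment.

    #include <infiniband/verbs.h>
    #include <stdint.h>

    /* Post one RDMA write: push `len` bytes from a registered local
     * buffer to the responder's memory at (remote_addr, rkey). */
    static int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                          void *local, uint32_t len,
                          uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local,   /* local dirty-page data   */
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .wr_id      = 1,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_RDMA_WRITE,
            .send_flags = IBV_SEND_SIGNALED, /* request a completion */
        };
        wr.wr.rdma.remote_addr = remote_addr; /* responder's virtual address */
        wr.wr.rdma.rkey        = rkey;        /* key issued by the responder */

        struct ibv_send_wr *bad = NULL;
        return ibv_post_send(qp, &wr, &bad);  /* 0 on success */
    }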
  • the host and the network card include corresponding hardware structures and/or software modules for performing respective functions.
  • the present application can be implemented in the form of hardware, or a combination of hardware and computer software, with reference to the units and method steps of the examples described in the embodiments disclosed in this application; whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • FIG. 6 is a schematic structural diagram of a data migration device provided by the present application.
  • the data migration device 600 can be used to implement the functions of the host or the network card in the above-mentioned method embodiment, so it can also realize the beneficial effects of the above-mentioned method embodiment.
  • the data migration device 600 may be the network card 250 shown in FIG. 2, or the network card 313 or the network card 323 shown in FIGS. 3 to 5, or a module (such as a chip) applied to a network card.
  • the data migration apparatus 600 includes a communication unit 610 , a data identification unit 620 , a migration unit 630 and a storage unit 640 .
  • the data migration device 600 is used to implement the functions of the network card in the method embodiments shown in FIGS. 3 to 5 above.
  • the specific process of the data migration apparatus 600 for implementing the above data migration method includes the following contents 1-3.
  • the communication unit 610 is configured to obtain a first data migration notification message.
  • the first data migration notification message is used to indicate the identifier of the virtual machine to be migrated.
  • the data identifying unit 620 is configured to determine the data to be migrated according to the identifier, the data to be migrated is stored in the memory of the source host and associated with the virtual machine to be migrated.
  • the data identification unit 620 may determine the memory page set associated with the virtual machine to be migrated in the memory of the source host according to the aforementioned identification, and use the data stored in the memory page set as the data to be migrated.
  • the storage unit 640 included in the data migration apparatus 600 is used to store dirty page marking information, and the data identification unit 620 can use the identifier of the virtual machine to be migrated in the data migration notification message together with the dirty page marking information to determine the dirty pages in the memory of the source host associated with the virtual machine to be migrated, so as to determine the dirty page data stored in those dirty pages.
  • for more detailed content about the dirty page marking information, reference may be made to the relevant descriptions of Table 1 in the foregoing method embodiments, and details are not repeated here.
  • the migration unit 630 is configured to migrate the aforementioned data to be migrated to the destination host.
  • the data to be migrated is determined by the data identification unit according to the identifier of the virtual machine to be migrated, which avoids the process of the source host determining the data to be migrated, reduces the computing resources the source host needs during live migration of the virtual machine, and improves the source host's ability to perform other services (such as computing-intensive and delay-sensitive services like AI and HPC).
  • the communication unit 610 is used to perform S310; the data identification unit 620 is used to perform S320; and the migration unit 630 is used to perform S330.
  • the communication unit 610 is used to perform S330; the migration unit 630 is used to perform S340.
  • the communication unit 610 is used to perform S410; the migration unit 630 is used to perform S430.
  • the communication unit 610 is used to execute S410-S430.
  • the communication unit 610 can be used to receive virtual machine data sent by other hosts, and the migration unit 630 can migrate the virtual machine data to the memory address space mapped by the SG information in the receive queue, which prevents the data migration device 600 from making multiple copies of the virtual machine data on the receiving host, reduces the consumption of computing resources and storage resources of the destination host, and improves the live migration efficiency of the virtual machine and the processing capability of the destination host for other computing services.
  • the data migration device 600 of the embodiments of the present invention can be implemented by a CPU, an ASIC, or a programmable logic device (PLD), and the PLD may be a complex programmable logic device (CPLD), an FPGA, generic array logic (GAL), or any combination thereof.
  • the hardware may be an electronic device, such as the aforementioned network card, or a processor or chip applied to the network card, for example, the electronic device includes an interface circuit and a control circuit.
  • the interface circuit is used to receive signals from other devices other than the electronic device and transmit them to the control circuit, or send signals from the control circuit to other devices other than the electronic device.
  • the control circuit implements the method in any possible implementation of the foregoing embodiments through a logic circuit or by executing code instructions.
  • for the beneficial effects, reference may be made to the description in any of the foregoing embodiments, and details are not repeated here.
  • the network card according to the embodiments of this application may correspond to the data migration apparatus 600 in the embodiments of this application, and may correspond to the corresponding subjects performing the methods in FIGS. 3 to 5; the above and other operations and/or functions of the modules in the data migration apparatus 600 are respectively intended to implement the corresponding procedures of the methods in FIGS. 3 to 5, and are not repeated here for brevity.
  • the processor in the embodiments of this application may be a CPU, an NPU, or a GPU, and may also be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • an exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium, for example a floppy disk, hard disk, or magnetic tape; an optical medium, for example a digital video disc (DVD); or a semiconductor medium, for example a solid state drive (SSD).


Abstract

A data migration method and apparatus, and an electronic device, relating to the field of computer technology. The data migration method comprises: first, a network card of a source host obtains a data migration notification message, the data migration notification message being used to indicate an identifier of a virtual machine to be migrated; second, the network card determines data to be migrated according to the identifier, the data to be migrated being data that is stored in the memory of the source host and associated with the virtual machine to be migrated; finally, the network card migrates the data to be migrated to a destination host. Because the data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, the source host does not need to determine the data to be migrated itself, which reduces the computing resources the source host needs during live migration of the virtual machine and improves the source host's ability to run other services (for example, computing-intensive and latency-sensitive services such as AI and HPC).

Description

Data migration method and apparatus, and electronic device
This application claims priority to Chinese Patent Application No. 202111426045.1, filed with the China National Intellectual Property Administration on November 26, 2021 and entitled "Data migration method and apparatus, and electronic device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular, to a data migration method and apparatus, and an electronic device.
Background
A virtual machine (VM) is a virtual device obtained by virtualizing physical computing, storage, and network resources. The physical device on which a virtual machine runs is called its host. Because a host may suffer hardware faults and similar problems, live migration is commonly used to move a VM from a source host to a destination host in order to keep the VM running, providing dual-machine fault tolerance or load balancing. To keep services uninterrupted, live migration must migrate the VM's memory data, which involves handling dirty page data in memory; a dirty page is a region of memory whose data has been modified. Conventionally, the processor of the source host performs the live migration processing, including identifying dirty pages, moving the data in the dirty pages, and instructing the network card to send that data to the destination host. This process depends entirely on the processing capability and input/output (I/O) bandwidth of the source host's processor. For computing-intensive and latency-sensitive services such as artificial intelligence (AI) and high performance computing (HPC), existing solutions cannot meet service performance requirements; how to provide a more efficient data migration method has therefore become an urgent technical problem.
Summary
This application provides a data migration method and apparatus, and an electronic device, which solve the problem that the service performance of a source host degrades during data migration of a virtual machine.
According to a first aspect, a data migration method is provided. The data migration method may be applied to a source host, or to a physical device that supports implementing the data migration method; for example, the physical device includes a chip system. For example, the source host includes a memory and a network card, and the data migration method is performed by the network card of the source host. The data migration method includes: first, the network card of the source host obtains a first data migration notification message, the first data migration notification message being used to indicate an identifier of a virtual machine to be migrated; second, the network card of the source host determines data to be migrated according to the identifier, the data to be migrated being data that is stored in the memory of the source host and associated with the virtual machine to be migrated; finally, the network card of the source host migrates the data to be migrated to a destination host.
In this application, the data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, so the source host does not need to determine the data to be migrated itself. This reduces the computing resources the source host needs during live migration of the virtual machine and improves its ability to run other services (for example, computing-intensive and latency-sensitive services such as AI and HPC).
In an optional implementation, determining, by the network card, the data to be migrated according to the identifier includes: the network card determines, according to the identifier, a set of memory pages in the memory of the source host that are associated with the virtual machine to be migrated, and takes the data stored in the set of memory pages as the data to be migrated, where the set of memory pages includes one or more memory pages. Having the network card, rather than the processor of the source host, determine the data to be migrated avoids the performance degradation caused by the processor doing this work, improves the source host's capacity to process other services, and reduces stalls on the source host.
In another optional implementation, the data to be migrated provided in the foregoing embodiment includes dirty page data, the dirty page data being the data stored in those of the one or more memory pages whose data has been modified.
In another optional implementation, determining, by the network card according to the identifier, the dirty page data included in the data to be migrated includes: the network card queries dirty page marking information stored in the network card to determine a set of dirty pages associated with the identifier; the network card then takes the data stored in the set of dirty pages as the dirty page data. The set of dirty pages includes one or more dirty pages, a dirty page being a memory page, among the one or more memory pages, whose data has been modified; the foregoing dirty page marking information is used to indicate the memory addresses of the dirty pages.
The network card marks the dirty pages among the memory pages in the source host's memory that are associated with the virtual machine to be migrated; in other words, the dirtying function for the memory is offloaded from the processor of the source host to the network card. This avoids the process in which the processor marks dirty pages in the memory, reduces the processor's resource consumption, and thus prevents other computing services of the source host from being affected by the processor managing the live migration of the virtual machine.
In another optional implementation, the dirty page marking information includes at least one of a first dirty page table and a second dirty page table. The first dirty page table is used to mark a dirty page as being in the marked dirty state, the marked dirty state being a state in which the source host is modifying the data stored in the dirty page; the second dirty page table is used to mark a dirty page as being in the data migration state, the data migration state being a state in which the network card is migrating the data stored in the dirty page.
In another optional implementation, migrating, by the network card, the data to be migrated to the destination host includes: first, the network card sends page information of the data to be migrated to the destination host, the page information being used to indicate the memory address and offset of the data to be migrated in the memory; second, the network card receives a migration message fed back by the destination host based on the page information, the migration message being used to indicate a receive queue (RQ) in the destination host corresponding to the virtual machine to be migrated; third, the network card sends the data to be migrated to the RQ indicated by the migration message. When the network card of the destination host contains multiple RQs and each RQ corresponds to one virtual machine, the network card of the destination host allocates the migration message for the virtual machine to be migrated according to the page information sent by the network card of the source host, and the network card of the source host migrates the data to the RQ indicated by that migration message; this prevents the data of the virtual machine to be migrated (for example, VM1) from being sent to the RQ corresponding to another virtual machine (for example, VM2), improving the accuracy of the virtual machine's data migration.
In another optional implementation, a send queue (SQ) in the network card maintains SG information containing the memory addresses and offsets of the dirty pages, and the foregoing page information includes the SG information corresponding to the dirty pages.
In another optional implementation, sending, by the network card, the data to be migrated to the RQ indicated by the migration message includes: the network card obtains the SG information corresponding to the dirty pages from the SQ, and sends the data indicated by the SG information to the RQ indicated by the migration message. During live migration of the virtual machine, the data migration function of the source host is offloaded to the network card, which avoids the processor of the source host performing multiple copies of the virtual machine's data, reduces the consumption of the source host's computing resources, and improves the efficiency with which the source host processes other computing services.
In another optional implementation, the source host establishes a control connection and a data connection with the destination host over the transmission control protocol/internet protocol (TCP/IP), where the control connection is used to transmit the page information and the migration message, and the data connection is used to transmit the data to be migrated. In this embodiment, different transmission channels are used to transmit different information or data, which avoids the page information being carried on the data connection, where it could not be processed by the network card on the receiving side (the destination host) and could cause problems in the live migration of the virtual machine, thereby improving the stability of the virtual machine's data migration.
In another optional implementation, the source host establishes a single connection with the destination host over TCP/IP, and the single connection is used to transmit the page information, the migration message, and the data to be migrated.
In another optional implementation, the migration message is a message processing identifier (ID) allocated by the destination host for the virtual machine to be migrated. Since a single connection between the source host and the destination host may carry data of different VMs, the message processing ID can also be used to distinguish which VM the data in the single connection belongs to, preventing transmission errors among the data of multiple VMs on a single connection, such as mistaking the data of VM(1) for the data of VM(2), and thereby improving the accuracy of VM live migration.
In another optional implementation, the source host communicates with the destination host over a remote direct memory access (RDMA) network, and the network card stores a memory protection table (MPT). The MPT table is used to indicate the correspondence between the host physical addresses (HPA) of the memory and the guest physical addresses (GPA) of the virtual machine to be migrated, and the MPT table contains the physical function (PF) information used by the virtual machine to be migrated. Sending, by the network card, the data to be migrated to the RQ indicated by the migration message includes: first, the network card determines the PF information of the virtual machine to be migrated according to the migration information; second, the network card queries the MPT table according to the PF information and the GPA of the virtual machine to be migrated, to determine the HPA corresponding to the page information; finally, the network card sends to the destination host the data to be migrated that is stored at the HPA of the virtual machine to be migrated.
In another optional implementation, the data migration method further includes: the network card obtains a second data migration notification message, the second data migration notification message being used to indicate an identifier of a virtual machine to be received; the network card migrates data to be received, sent by another host, into the memory of the source host, the data to be received being data that is stored in the memory of the other host and associated with the virtual machine to be received. In this embodiment, the source host can act as a sender that sends the data of the virtual machine to be migrated to the destination host, and can also act as a receiver that receives virtual machine data sent by other hosts, thereby providing both sending and receiving functions during the migration of the source host's multiple virtual machines and improving the data migration performance of the source host.
According to a second aspect, this application provides a data migration apparatus. The data migration apparatus is applied to a network card of a source host and includes modules for performing the data migration method in the first aspect or any possible implementation of the first aspect.
For example, the data migration apparatus includes a communication unit, a data identification unit, and a migration unit. The communication unit is configured to obtain a first data migration notification message, the first data migration notification message being used to indicate an identifier of a virtual machine to be migrated. The data identification unit is configured to determine data to be migrated according to the identifier, the data to be migrated being data that is stored in the memory of the source host and associated with the virtual machine to be migrated. The migration unit is configured to migrate the foregoing data to be migrated to a destination host.
For the beneficial effects, reference may be made to the description of any implementation of the first aspect, which is not repeated here. The data migration apparatus has the function of implementing the behavior in the method examples of any implementation of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing function.
According to a third aspect, this application provides an electronic device. The electronic device includes an interface circuit and a control circuit. The interface circuit is configured to receive signals from devices other than the electronic device and transmit them to the control circuit, or to send signals from the control circuit to devices other than the electronic device; the control circuit implements the method of any implementation of the first aspect through a logic circuit or by executing code instructions. For the beneficial effects, reference may be made to the description of any implementation of the first aspect, which is not repeated here.
In one possible example, the electronic device may be a network card.
In another possible example, the electronic device may be a processor included in a network card.
In yet another possible example, the electronic device may be a dedicated processing device that includes a network card, and the dedicated processing device can implement the method of any implementation of the first aspect.
It should be noted that the foregoing three examples are only possible implementations provided by this embodiment and should not be construed as limiting this application.
According to a fourth aspect, this application provides a computer-readable storage medium storing a computer program or instructions. When the computer program or instructions are executed by a host or a network card, the method in the first aspect or any possible implementation of the first aspect is implemented.
According to a fifth aspect, this application provides a computer program product including instructions. When the computer program product runs on a host or a network card, the host or network card executes the instructions to implement the method in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, this application provides a chip including a memory and a processor. The memory is configured to store computer instructions, and the processor is configured to call and run the computer instructions from the memory to perform the method in the first aspect and any possible implementation of the first aspect.
Based on the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.
Brief Description of the Drawings
FIG. 1 is an application scenario diagram of a communication system provided by this application;
FIG. 2 is a schematic structural diagram of a host provided by this application;
FIG. 3 is a first schematic flowchart of a data migration method provided by this application;
FIG. 4 is a second schematic flowchart of a data migration method provided by this application;
FIG. 5 is a schematic diagram of data migration provided by this application;
FIG. 6 is a schematic structural diagram of a data migration apparatus provided by this application.
Detailed Description
This application provides a data migration method: first, a network card of a source host obtains a data migration notification message, the data migration notification message being used to indicate an identifier of a virtual machine to be migrated; second, the network card determines data to be migrated according to the identifier, the data to be migrated being data that is stored in the memory of the source host and associated with the virtual machine to be migrated; finally, the network card migrates the data to be migrated to a destination host. In this embodiment, the data to be migrated is determined by the network card according to the identifier of the virtual machine to be migrated, so the source host does not need to determine the data to be migrated itself, which reduces the computing resources the source host needs during live migration of the virtual machine and improves its ability to run other services (for example, computing-intensive and latency-sensitive services such as AI and HPC).
The data migration method provided by this application is described in detail below with reference to the accompanying drawings.
FIG. 1 is an application scenario diagram of a communication system provided by this application. The communication system includes a computer cluster 110 and a client 120. The computer cluster 110 can communicate with the client 120 through a network 130, and the network 130 may be the Internet or another network (such as Ethernet). The network 130 may include one or more network devices, such as routers or switches.
The client 120 may be a computer running an application program, and the computer running the application program may be a physical machine or a virtual machine. For example, if the computer running the application program is a physical computing device, the physical computing device may be a host or a terminal. A terminal may also be called a terminal device, user equipment (UE), a mobile station (MS), a mobile terminal (MT), and so on. The terminal may be a mobile phone, a tablet computer, a laptop computer, a desktop PC, a desktop computer, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and so on. The embodiments of this application do not limit the specific technology or device form adopted by the client 120. In one possible implementation, the client 120 may be a software module running on any one or more hosts in the computer cluster 110.
The computer cluster 110 refers to a collection of computers connected through a local area network or the Internet, and is usually used to execute large tasks (which may also be called jobs). A job here is usually a large job that requires many computing resources to be processed in parallel; this embodiment does not limit the nature or number of jobs. A job may contain multiple computing tasks, and these tasks can be assigned to multiple computing resources for execution. Most tasks are executed concurrently or in parallel, while some tasks depend on data produced by other tasks. Each computing device in the computer cluster may use the same hardware and the same operating system; alternatively, depending on service requirements, different hardware and different operating systems may be used on the hosts of the computer cluster. Because tasks deployed on a computer cluster can be executed concurrently, overall performance can be improved.
As shown in FIG. 1, the computer cluster 110 includes multiple hosts, such as hosts 111 to 114, each of which can provide computing resources. A single host may contain multiple processors or processor cores, and each processor or processor core may be a computing resource, so one physical host can provide multiple computing resources. For example, a physical host may be a server.
The computer cluster 110 can process multiple types of jobs. This embodiment places no limit on the number of tasks or on the tasks that can be executed in parallel. For example, the foregoing jobs include live migration of virtual machines or master-slave backup between hosts, such as data backup.
In FIG. 1, a job can be submitted from the client 120 to the computer cluster 110 through the network 130 via the host 111. When a job is submitted to the computer cluster 110 through the host 111, the host 111 can manage all the hosts in the computer cluster 110 to complete one or more tasks included in the job, for example, by scheduling the computing or storage resources of other hosts. In another possible implementation, the job may also be submitted to another host in the computer cluster 110; this embodiment does not limit where the job is submitted.
As shown in FIG. 1, one or more virtual machines may run in the computer cluster 110. A virtual machine is a virtual device obtained by virtualizing physical computing, storage, and network resources.
In one possible example, one or more VMs run on a single host; for example, two VMs run in the host 111 and one VM runs in the host 114.
In another possible example, one VM runs on multiple hosts; for example, a VM uses the processing resources of the host 111 and the storage resources of the host 114.
FIG. 1 is only an example provided by this embodiment and should not be construed as limiting this application; this application is described by taking one VM running on one host as an example.
It should be noted that FIG. 1 is only a schematic diagram; the communication system may also include other devices not shown in FIG. 1, and the embodiments of this application do not limit the number or types of hosts (computing devices) and clients included in the system. For example, the computer cluster 110 may also include more or fewer computing devices; for instance, the computer cluster 110 may include two computing devices, one implementing the functions of the hosts 111 and 112 and the other implementing the functions of the hosts 113 and 114.
FIG. 2 is a schematic structural diagram of a host provided by this application. For example, any host in FIG. 1 may be implemented by the host 200 shown in FIG. 2. The host 200 includes a baseboard management controller (BMC) 210, a processor 220, a memory 230, a hard disk 240, and a network card 250.
The baseboard management controller 210 can upgrade the firmware of the device, manage the running state of the device, troubleshoot faults, and so on. The processor 220 can access the baseboard management controller 210 over a bus such as a Peripheral Component Interconnect express (PCIe) bus, a Universal Serial Bus (USB), or an Inter-Integrated Circuit (I2C) bus. The baseboard management controller 210 may also be connected to at least one sensor, through which it obtains state data of the computer device, the state data including temperature data, current data, voltage data, and so on; this application does not specifically limit the types of state data. The baseboard management controller 210 communicates with the processor 220 over a PCIe bus or another type of bus, for example, passing the obtained state data to the processor 220 for processing. The baseboard management controller 210 can also maintain the program code in the memory, including upgrading or restoring it, and can control the power supply circuit or clock circuit in the host 200. In short, the baseboard management controller 210 can manage the host 200 in the above ways. However, the baseboard management controller 210 is only an optional device; in some implementations, the processor 220 can communicate with the sensors directly, managing and maintaining the computer device directly.
It should be noted that, besides the above PCIe, USB, and I2C buses, the devices in the host may also be connected through an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), and so on. A bus can also be divided into an address bus, a data bus, a control bus, and the like.
The processor 220 is connected to the memory 230 through a double data rate (DDR) bus. Different memories 230 may communicate with the processor 220 over different data buses, so the DDR bus may be replaced by another type of data bus; the embodiments of this application do not limit the bus type.
In addition, the host 200 includes various input/output (I/O) devices, which the processor 220 can access over the PCIe bus.
The processor 220 is the computing core and control core of the host 200. It may include one or more processor cores 221. The processor 220 may be a very large scale integrated circuit. An operating system and other software programs are installed in the processor 220, enabling it to access the memory 230 and various PCIe devices. It can be understood that, in the embodiments of the present invention, the processor 220 may be a central processing unit (CPU) or another application specific integrated circuit (ASIC). The processor 220 may also be another general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. In practice, the host 200 may also include multiple processors.
The memory 230 is also called main memory. It is usually used to store the running software of the operating system, input and output data, information exchanged with external storage, and so on. To speed up access by the processor 220, the memory 230 needs a high access speed. In a traditional computer system architecture, dynamic random access memory (DRAM) is usually used as the memory 230. The processor 220 can access the memory 230 at high speed through a memory controller, performing read and write operations on any storage unit in the memory 230. Besides DRAM, the memory 230 may be another random access memory, such as static random access memory (SRAM). This embodiment does not limit the number or type of memories 230. In addition, the memory 230 can be configured to have a power-protection function, meaning that the data stored in the memory is not lost when the system powers off and then powers on again. A memory 230 with the power-protection function is called a non-volatile memory.
For example, the memory 230 includes multiple memory pages. A memory page is the smallest unit of data I/O operations on the memory 230 and is also called the atomic unit of data reads and writes. Each memory page corresponds to a segment of the storage address space of the memory 230; for example, a memory page used to store 4 kilobytes (KB) of data corresponds to a 4 KB storage address space. A memory page may also correspond to a larger or smaller storage address space, such as 2 KB or 8 KB.
In some possible scenarios, such as during live migration of a virtual machine, if the virtual machine data stored in a memory page is modified within a period of time, the memory page may be called a dirty page of the virtual machine's live migration, and the data stored in the modified memory page (the dirty page) is called the dirty page data of the virtual machine.
An I/O device is hardware that can transfer data, and may also be understood as a device connected to an I/O interface. Common I/O devices include network cards, printers, keyboards, and mice; for example, an I/O device may be the network card 250 shown in FIG. 2. All external storage can also serve as I/O devices, such as hard disks, floppy disks, and optical discs. The processor 220 can access each I/O device over the PCIe bus. It should be noted that the PCIe bus is only one example and can be replaced with another bus, such as a unified bus (UB or Ubus) or a compute express link (CXL).
As shown in FIG. 2, the network card 250 includes a processor 251, a memory 252, and a communication interface 253. In some possible examples, a network card containing a processing unit and a network adapter (network interface card, NIC) is also called an intelligent NIC (iNIC).
The processor 251 is a processor with processing capability, such as a data processing unit (DPU). A DPU has the generality and programmability of a CPU but is more specialized, running efficiently on network packets, storage requests, or analysis requests. The DPU is distinguished from the CPU by a greater degree of parallelism (the need to process a large number of requests). Optionally, the DPU here may also be replaced with a processing chip such as a graphics processing unit (GPU) or a neural-network processing unit (NPU).
The memory 252 may be an internal memory that exchanges data directly with the processor 251; it can read and write data at any time, very quickly, and serves as temporary data storage for the operating system or other running programs. The memory 252 includes at least two types of memory; for example, the memory 252 may be either a random access memory or a ROM. For example, the random access memory is DRAM or storage class memory (SCM). DRAM is a semiconductor memory and, like most RAM, is a volatile memory device. SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory; storage class memory provides faster read and write speeds than a hard disk, but is slower in access speed than DRAM and cheaper in cost than DRAM. However, DRAM and SCM are only exemplary in this embodiment; the memory 252 may also include other random access memories, such as SRAM. The read-only memory may be, for example, PROM or EPROM. The memory 252 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD). In practice, multiple memories 252, and memories 252 of different types, may be configured in the network card 250. This embodiment does not limit the number or type of memories 252. In addition, the memory 252 may be configured to have a power-protection function, meaning that the data stored in the memory 252 is not lost when the system powers off and then powers on again. A memory with the power-protection function is called a non-volatile memory.
In one possible scenario, the memory 252 stores a software program, and the processor 251 runs the software program in the memory 252 to manage VM migration, for example, migrating the VM data stored in the memory 230 to another device.
The communication interface 253 is a network interface card enabling the host 200 to communicate with other devices; for example, the network adapter 253 can implement one or more of the following functions: parallel-to-serial data conversion, packet assembly and disassembly, network access control, data caching, and network signaling.
As shown in FIG. 2, one or more VMs may run on the host 200, such as the virtual machines 200A to 200C shown in FIG. 2. The computing resources required by a VM come from the local processor 220 and memory 230 of the host 200, while the storage resources required by the VM may come from the local hard disk 240 of the host 200 or from hard disks in other hosts. For example, the virtual machine 200A includes a virtual processor 200A1, a virtual memory 200A2, and a virtual network card 200A3; the computing resources required by the virtual processor 200A1 are provided by the processor 220, the storage resources required by the virtual memory 200A2 are provided by the memory 230 (or the hard disk 240), and the network resources required by the virtual network card 200A3 are provided by the network card 250. In addition, various application programs can run in a VM, and a user can trigger services such as read/write data requests through the applications in the VM.
For example, the storage space of the virtual memory 200A2 in the VM may be provided by the memory pages included in the memory 230. Generally, live migration of a VM can move the VM from one host to another without interrupting services, and the key technology for keeping services uninterrupted is the migration of memory data, the memory data being the data stored in the memory pages included in the memory 230. If the data stored in a memory page is modified within a period of time, the memory page may be called a dirty page of the VM's live migration process, and the data stored in the modified memory page (the dirty page) is therefore called the dirty page data of the VM.
Taking the source host being the host 111 in FIG. 1 and the destination host being the host 112 as an example, memory migration can proceed in the following stages.
1. Iterative pre-copy stage: when VM migration starts, the VM is still running on the source host; to keep the VM's services uninterrupted, the data of the virtual machine to be migrated is written into the memories of both the source host and the destination host.
2. Stop-and-copy stage: the VM's operation on the source host is suspended, and the data that the VM stores in the memory pages of the source host is transferred to the memory of the destination host.
3. Dirty page copy stage: while the VM is still running on the source host, the source host monitors and records any modification of the memory pages that have already been transferred during migration (that is, the dirty pages in memory), and after all the memory pages used by the VM have been transferred, transfers the dirty page data stored in the dirty pages.
In addition, the source host estimates the data transfer speed during migration; when the remaining amount of memory data can be transferred within a configurable time period (for example, 30 milliseconds), the VM on the source host is shut down, and the VM's remaining dirty page data is transferred to the destination host.
4. Virtual machine resume stage: the VM is started on the destination host, and the entire migration process of the VM is complete.
In one possible scenario, if the source host and the destination host share a storage system, the source host only needs to send the VM's execution state, the contents of its memory, and the state of its virtual devices to the destination host over the network; otherwise, the VM's disk storage also needs to be sent to the destination host. A shared storage system means that the image file directories of the source and destination virtual machines are on shared storage.
It should be noted that the memory migration process provided in this embodiment is implemented between multiple hosts within the computer cluster 110; in an optional implementation, memory migration may also be data migration between a host in the computer cluster 110 and another device outside the computer cluster 110, which is not limited in this application.
The implementation of the embodiments of this application is described in detail below with reference to the accompanying drawings.
FIG. 3 is a first schematic flowchart of a data migration method provided by this application. The data migration method can be applied to the communication system shown in FIG. 1; for example, the host 310 can implement the functions of the host 111 in FIG. 1, and the host 320 can implement the functions of the host 112 in FIG. 1. The data migration method may be performed by the network card 313; for the hardware implementation of the host 310 where the network card 313 is located, reference may be made to the host 200 shown in FIG. 2, and the network card 313 may also have the functions of the network card 250 in FIG. 2. The host 320 is similar to the host 310 and is not described again here.
For ease of description, the host 310 shown in FIG. 3 is called the source host, or the sending host, the first host, the source node, and so on. The memory 312 stores the data to be migrated of one or more VMs, including the VM's running data before copying and the data stored in dirty pages during live migration. A dirty page is a memory page in the memory 312 whose data has been modified.
For example, a virtual machine live migration management program may run in the network card 313, with the computing resources it needs provided by the processor and memory included in the network card 313. Specifically, the live migration management program can manage the dirty page data stored in the memory 312 of the host 310, such as reading, writing, or erasing it. During live migration of a virtual machine, the data migration function of the processor 311 is offloaded to the network card 313, which avoids the processor 311 in the host 310 performing multiple copies of the virtual machine's data, reduces the resource consumption of the processor 311, and improves the efficiency with which the host 310 processes other computing services.
For ease of description, the host 320 shown in FIG. 3 is called the destination host, or the receiving host, the second host, the destination node, and so on. The memory 322 is used to store the data to be migrated of one or more VMs, including the VM's running data before copying and the data stored in dirty pages during live migration. A dirty page here is a memory page whose data stored in the memory 322 is modified during live migration of a VM. For example, to implement the live migration management process of virtual machines, the above virtual machine live migration management program may also run in the network card 323.
It should be noted that virtual machine management software may run on the source host and the destination host to manage the virtual machines on them. For example, the aforementioned virtual machine live migration management program may be part of the virtual machine management software, or a thread started in the network card by the virtual machine management software to implement live migration.
A network card usually receives and sends data by means of message queues. A message queue includes a set of queue pairs (QP), and a QP includes a send queue and a receive queue; for example, the message queue used by the network card 313 to send data is a send queue (SQ), and the message queue used by the network card 323 to receive data is a receive queue (RQ). Message queues are the connection mechanism used for communication between multiple hosts; for example, multiple connections can be established between hosts using the TCP/IP protocol, and each connection has a receive queue and a send queue used to transfer the data of that connection.
For example, the host 310 and the host 320 implement virtualization of hardware network resources (such as network cards) based on single root I/O virtualization (SR-IOV) technology; SR-IOV is one technical realization of a "virtual channel". With SR-IOV, a PCIe device implementing a physical function (PF) is virtualized to obtain one or more PCIe devices implementing virtual functions (VF); each PCIe device that can implement a VF (hereinafter referred to as a VF) is directly assigned to a virtual machine, and the host also provides each VF with independent memory space, interrupts (interrupt numbers), and direct memory access (DMA) streams.
In SR-IOV, the PF driver of an SR-IOV device is used to manage the physical functions of the SR-IOV-capable device. A physical function is a PCI function that supports the SR-IOV capabilities defined by the SR-IOV specification; it is a full PCIe function that can be discovered, managed, and handled like any other physical PCIe device. Physical functions can be used to configure and control virtual PCIe devices.
A VF is a virtual network card or instance virtualized by a physical network card that supports SR-IOV. A VF is presented to the virtual machine as an independent network card; each VF has an exclusive PCI configuration region and may share the same physical resource with other VFs (such as sharing a physical network port). A VF has lightweight PCIe functions and can share one or more physical resources (such as a physical network card) with the PF and the other VFs associated with that PF.
For example, for one QP of a virtual machine, in the source host one VF corresponds to one or more SQs, and in the destination host one VF corresponds to one or more RQs.
A message queue stores work queue elements (WQE). A WQE stores information pointing to the address and length of the data of a send queue or a receive queue; the length of the data can be determined from the data's address and offset. The information in a WQE that indicates the address and offset of data is also called scatter/gather (SG) information. If a set of data includes multiple segments, and the SG information of one segment includes that segment's address and offset, then the multiple pieces of SG information about that set of data in the WQE may also be called an SG chain, or a scatter/gather list (SGL).
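To make the relationship between a WQE, its SG information, and an SGL concrete, a minimal C sketch is given below; the type names are hypothetical, since real network cards define their own WQE formats.

    #include <stdint.h>

    /* One scatter/gather entry: where one segment of the data lives
     * and how long it is (address plus offset/length). */
    struct sg_entry {
        uint64_t addr;   /* GPA or HPA of the segment, per the text */
        uint32_t length; /* segment length in bytes                 */
    };

    /* A work queue element referencing a set of data through an SGL. */
    struct wqe {
        uint32_t        num_sge;  /* entries in the scatter/gather list */
        struct sg_entry sgl[4];   /* the SG chain for this WQE          */
    };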
In a first feasible implementation, the address included in the above SG information is the guest physical address (GPA) of the VM data, where the guest refers to a virtual machine (VM) running on the source host, and the GPA is the memory address at which the VM data is stored in memory (for example, the memory 312). In one possible scenario, the GPA may be an intermediate physical address (IPA). The VM can access the corresponding data based on the GPA, but hardware devices (such as the host's processor or network card) need to use the host physical address (HPA) to access that data; therefore, the GPA must undergo two levels of address translation, GPA to host virtual address (HVA) and then HVA to HPA, before a hardware device can access the data indicated by the SG information.
In a second feasible implementation, the address included in the above SG information is the HPA of the VM data, and hardware devices (such as the host's processor or network card) can access the VM data based on that HPA.
It should be noted that the above SG information is only an example provided by this embodiment and should not be construed as limiting this application.
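The two-level translation mentioned in the first implementation can be sketched as two lookups chained together. In the following C sketch, the single-level helpers are placeholders for whatever mechanism (the VMM's GPA-to-HVA mapping and the host page tables, respectively) actually performs each step; they are assumptions for illustration only.

    #include <stdint.h>

    /* Placeholder single-level lookups; in practice these would walk
     * the VMM's GPA->HVA mapping and the host page tables. */
    extern uint64_t gpa_to_hva(uint64_t gpa);
    extern uint64_t hva_to_hpa(uint64_t hva);

    /* GPA -> HVA -> HPA, as required before hardware can touch the data. */
    static uint64_t gpa_to_hpa(uint64_t gpa)
    {
        return hva_to_hpa(gpa_to_hva(gpa));
    }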
Still referring to FIG. 3, the memory 312 includes multiple memory pages, such as pages numbered 1 to 5, denoted page1, page2, page3, page4, and page5. Among these pages, two are memory pages associated with the virtual machine to be migrated, such as page2 and page4 in the memory 312. For example, during live migration of the VM, the network card 313 can send the data stored in page2 and page4 of the memory 312 to another host, such as the host 320.
As shown in FIG. 3, the data migration method provided by this embodiment includes the following steps.
S310: The network card 313 obtains a data migration notification message.
The data migration notification message is used to indicate the identifier of the virtual machine to be migrated, as indicated by the small black-filled block in FIG. 3.
For example, if multiple virtual machines run on the host 310, the data migration notification message can indicate which of the multiple virtual machines is to be migrated, preventing the data of virtual machines that do not need migration from being migrated to other hosts and reducing the probability of confusion among the data of multiple virtual machines during migration.
Optionally, the data migration notification message is generated when the virtual machine management software triggers a live migration operation of a virtual machine. For example, the virtual machine management software manages multiple virtual machines; after the host 310 (or a client) triggers a live migration operation, the virtual machine management software generates a data migration notification message, which may contain the identifier of the virtual machine to be migrated, for example, the serial number of the virtual machine or the virtual machine's tag in a virtual local area network (VLAN).
The trigger conditions for a live migration operation of a virtual machine may be user operations and the resource usage of the host. The resource usage may be information such as the amount of resources the host needs to release, specifically the host's computing resources, storage resources, network bandwidth, and so on.
S320: The network card 313 determines the data to be migrated according to the identifier indicated by the data migration notification message.
The data to be migrated is data that is stored in the memory (memory 312) of the source host (host 310) and associated with the virtual machine to be migrated.
Optionally, the network card 313 may determine, according to the identifier indicated by the data migration notification message, a set of memory pages in the memory 312 associated with the virtual machine to be migrated, and take the data stored in the set of memory pages as the data to be migrated. For example, the set of memory pages includes one or more memory pages.
As shown in FIG. 3, the memory pages in the memory 312 associated with the virtual machine to be migrated are page2 and page4, and the data stored in page2 and page4 is the VM's data to be migrated.
Having the network card 313 determine the data to be migrated according to the identifier of the virtual machine to be migrated avoids the performance degradation of the host 310 that would be caused by the processor 311 determining the data to be migrated, improves the capacity of the host 310 to process other services, and reduces stalls on the host 310.
In one possible scenario, the data to be migrated includes dirty page data, the dirty page data being the data stored in those of the one or more memory pages whose data has been modified. In the set of memory pages in the memory 312 associated with the virtual machine to be migrated, a memory page whose data has been modified is a dirty page; for example, page2 shown in FIG. 3 is a dirty page, so the data of the virtual machine to be migrated stored in page2 is dirty page data.
For the dirty page data included in the data to be migrated, one possible specific implementation is provided below: the host 310 queries the dirty page marking information stored in the network card, determines the set of dirty pages associated with the identifier, and takes the data stored in the set of dirty pages as the dirty page data.
The set of dirty pages includes one or more dirty pages, such as page2 shown in FIG. 3.
The aforementioned dirty page marking information is used to indicate the memory addresses of the dirty pages. In an optional example, the dirty page marking information includes at least one of a first dirty page table and a second dirty page table. For example, one or more dirty page tables may be stored in the network card 313, and at the same time the multiple dirty page tables can be used to implement different functions.
The first dirty page table is used to mark a dirty page as being in the marked dirty state, the marked dirty state being a state in which the source host (such as the processor 311 in the host 310) is modifying the data stored in the dirty page.
The second dirty page table is used to mark a dirty page as being in the data migration state, the data migration state being a state in which the network card (such as the network card 313) is migrating the data stored in the dirty page.
During data migration, the dirty page marking information may include only one dirty page table, which is used to record the migration of dirty pages. For example, when the source host performs data access to a dirty page, only the first dirty page table needs to be maintained in the network card 313, recording the dirty page as being in the marked dirty state; as another example, when the network card 313 performs data access to a dirty page, only the second dirty page table needs to be maintained in the network card 313, recording the dirty page as being in the data migration state.
In addition, because one dirty page table cannot be accessed by multiple programs or hardware devices at the same time, two dirty page tables may also be set in the network card 313 to mark the migration state of memory pages, such as the first and second dirty page tables above; at the same point in time, the two dirty page tables are respectively used to implement different functions.
After the network card 313 determines the set of memory pages associated with the virtual machine to be migrated, a feasible implementation is provided below to describe the dirty page tables included in the above dirty page marking information.
For a single memory page in the memory 312, the network card 313 can use a dirty page table to mark the state of the page. For example, one dirty page table represents the migration states of the dirty pages among all memory pages associated with the virtual machine to be migrated, with the migration state of each dirty page represented by one bit.
For example, one or more status flags may be set in the dirty page table; for example, the structure of the dirty page table may be as shown in Table 1 below.
Table 1

           M    S
Case 1     1    0
Case 2     0    1
Here, M and S are status flags used to mark the migration state of a page.
The status flag M indicates the host 310's access to the page; for example, M=1 means the host 310 has performed data access to the page, and M=0 means the host 310 has not performed data access to the page.
The status flag S indicates the network card 313's access to the page; for example, S=1 means the network card 313 has performed data access to the page (such as migrating the page's data to another host), and S=0 means the network card 313 has not performed data access to the page.
As shown in Table 1, the status flags can mark the page's state in the following two possible cases.
Case 1: M=1, S=0 — the page is in the marked dirty state, the marked dirty state being a state in which the source host (such as the processor 311 in the host 310) is modifying the data stored in the dirty page, as in the first dirty page table above.
Case 2: M=0, S=1 — the page is in the data migration state, the data migration state being a state in which the network card (such as the network card 313) is migrating the data stored in the dirty page, as in the second dirty page table above.
Because one page cannot provide data access services to multiple hardware devices at the same point in time, the case "M=1, S=1" does not occur in a dirty page table. Also, because "M=0, S=0" would indicate that the page is not a dirty page, with neither the processor 311 nor the network card 313 performing data access to the page indicated by the dirty page table, and a dirty page table does not need to record the migration state of non-dirty pages, the case "M=0, S=0" does not occur either.
It should be noted that Table 1 is only an example of the dirty page table provided by this embodiment; the meanings of the status flags (such as M and S) in the dirty page table can be adjusted according to different usage scenarios and requirements, which is not limited in this application.
To manage the dirty page marking information during data migration, a dirtying program and a migration program may run in the virtual machine management software. The dirtying program and the migration program may be software modules in the virtual machine live migration management program, or independent software units triggered by the virtual machine management software, which is not limited in this application.
In one possible example, the dirtying program and the migration program run on the network card 313.
In another possible example, the dirtying program and the migration program run on the processor 311.
The following description takes the dirtying program and the migration program running on the network card 313 as an example.
For example, the dirtying program manages the first dirty page table: after the data stored in a memory page is modified, the dirtying program sets the page's status flags in the first dirty page table to "M=1, S=0".
As another example, the migration program manages the second dirty page table: when the modified data in a memory page is migrated to another host, the migration program sets the page's status flags in the second dirty page table to "M=0, S=1".
In this embodiment, the network card 313 marks the dirty pages among the memory pages in the memory 312 associated with the virtual machine to be migrated; that is, the dirtying function for the memory 312 is offloaded from the processor 311 to the network card 313. This avoids the process of the processor 311 marking dirty pages in the memory 312, reduces the resource consumption of the processor 311, and thus prevents other computing services of the host 310 from being affected by the processor 311 managing the live migration of the virtual machine.
In the network card 313, for one page of the memory 312, the network card 313 can use two dirty page tables to mark the page: if the page is a dirty page and the processor 311 is modifying the data stored in it, the network card 313 marks the page in dirty page table 1 (the first dirty page table) as M=1, S=0, as in case 1 above; if the network card 313 needs to send the data stored in the dirty page, the network card 313 marks the dirty page in dirty page table 2 (the second dirty page table) as M=0, S=1, as in case 2 above. When two dirty page tables are set in the network card 313 to mark the migration state of one dirty page, the dirtying program can modify the first dirty page table and the migration program can modify the second dirty page table, which prevents the two programs from modifying the same dirty page table at the same time and causing data migration errors.
关于网卡313利用脏页标记信息来识别内存页集合中的脏页的过程,这里给出一种可能的具体示例,该脏页扫描过程包括以下可能的步骤:
Step 1: The network card 313 sets up two dirty page tables, dirty page table 1 and dirty page table 2. Each of the two tables has two flag bits (the M and S shown in Table 1 above); all flag bits in dirty page table 1 are set to 0 and all flag bits in dirty page table 2 are set to 1, where "1" denotes a dirty page and "0" a non-dirty page, so an all-"1" dirty page table 2 means that every page it records is a dirty page.
Step 2: The network card 313 initializes the all-0 dirty page table 1, marking every page in it as "M=1, S=0", and initializes the all-1 dirty page table 2, marking every page in it as "M=0, S=1". Through this initialization, all dirty pages recorded in dirty page table 1 are placed in the dirtied state and table 1 is handed over to the dirtying program above, while all dirty pages recorded in dirty page table 2 are placed in the data migration state and table 2 is handed over to the migration program above. Having a different program manage each of the two dirty page tables prevents one table from being modified by different programs, which could cause data migration errors.
Step 3: The network card 313 scans the pages of the memory 312 in the host 310 and marks the dirty ones among them as "M=1, S=0". After the network card 313 determines which memory pages in the memory 312 have had their data modified by the processor 311, it records those pages as dirty pages so that it can start their data migration.
Step 4: The network card 313 scans dirty page table 2 for entries with M=0, S=1. After completing the full scan, it clears all the status flags of every dirty page recorded in dirty page table 2 (marking them "M=1, S=0") and sets the corresponding pages in dirty page table 1 (marking them "M=0, S=1").
In this way, the network card 313 can scan dirty page table 1 (or dirty page table 2) for pages marked "M=0, S=1", and after sending the data stored in such a page to the other host, set the page's flags in dirty page table 1 to "M=1, S=0", thereby implementing the dirtying and data migration (or push) process for the dirty pages in the memory 312.
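As an illustration only, the following C sketch shows one shape the scan-and-swap loop of steps 1 to 4 could take; dp_entry repeats the per-page flags from the earlier sketch, and send_page_to_peer is a hypothetical transmit helper, neither of which is named in the patent:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t hpa;
    unsigned int m : 1;
    unsigned int s : 1;
} dp_entry;

/* Hypothetical helper that pushes one page to the destination host. */
extern void send_page_to_peer(uint64_t hpa);

/* One pass over table2 (the migration program's table): every page
 * marked M=0,S=1 is sent, its flags are then cleared to M=1,S=0, and
 * the matching entry in table1 is set to M=0,S=1, mirroring step 4. */
static void migrate_pass(dp_entry *table1, dp_entry *table2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table2[i].m == 0 && table2[i].s == 1) {
            send_page_to_peer(table2[i].hpa);
            table2[i].m = 1; table2[i].s = 0; /* cleared after the send */
            table1[i].m = 0; table1[i].s = 1; /* handed to table 1 */
        }
    }
}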
Note that the above embodiment is described with a single dirty page as an example; when a VM's live migration involves multiple dirty pages, the dirty page tables stored in the network card 313 also record each dirty page's address information, for example the dirty page's HPA on the host 310 or the dirty page's corresponding GPA in the VM.
Still referring to FIG. 3, the data migration method provided by this embodiment further includes the following steps.
S330: The network card 313 migrates the to-be-migrated data to the host 320.
In this embodiment, the to-be-migrated data is determined by the network card from the identifier of the virtual machine to be migrated, which spares the source host from determining the virtual machine's to-be-migrated data itself, reduces the computing resources the source host needs during the virtual machine's live migration, and improves the source host's capacity to run other services (such as compute-intensive and latency-sensitive services like AI and HPC).
After the host 320 receives the to-be-migrated data, still referring to FIG. 3, the data migration method provided by this embodiment further includes the following step S340.
S340: The network card 323 migrates the to-be-migrated data into the memory 322 of the host 320.
Having the destination host's network card write the to-be-migrated data into the destination host's memory avoids the copy of the to-be-migrated data from the network card to the memory on the destination host, reducing migration latency and improving migration efficiency. Furthermore, because the destination host's processor does not need to copy the virtual machine's to-be-migrated data, the destination host's computing and storage resource consumption during the virtual machine's live migration is reduced, and its capacity to run other services improves.
In addition, the host 310 may also act as a destination host and receive virtual machine data sent by another host. For example, the host 310 receives another data migration notification message indicating the identifier of the virtual machine it is to receive, and migrates the to-be-received data sent by the other host into the memory 312, where the to-be-received data is data stored in the other host's memory and associated with the virtual machine to be received. For the process in which the host 310 receives a migrated virtual machine's data, refer to the description of the host 320 receiving data in the above embodiment; details are not repeated here.
In this embodiment, the source host can act as a sender that sends the to-be-migrated data of the virtual machine to be migrated to the destination host, and can also act as a receiver of virtual machine data sent by other hosts, thereby providing both the send and receive functions needed to migrate the source host's multiple virtual machines and improving the source host's data migration performance.
In one optional implementation, a feasible concrete implementation of the source host migrating the to-be-migrated data to the destination host is provided here. FIG. 4 is a second flowchart of a data migration method provided by this application and shows one possible implementation of S330 above; S330 includes the following steps.
S410: The network card 313 sends the page information of the to-be-migrated data to the host 320.
The page information indicates the memory address and offset of the to-be-migrated data in the memory 312.
For example, the send queue (SQ) in the network card 313 maintains SG information containing the dirty pages' memory addresses and offsets, and the aforementioned page information includes the SG information corresponding to the dirty pages. In some cases, the page information may be called the description information of the to-be-migrated data; for example, the description information is the metadata describing the service data stored in the dirty pages.
S420: The network card 323 sends the network card 313 a migration message determined based on the page information.
The migration message indicates the receive queue (RQ) in the destination host that corresponds to the virtual machine to be migrated.
For example, the migration message is an RQ identifier, or an RQ sequence number, that the network card 323 allocates to the host 310's to-be-migrated virtual machine based on the page information.
In one possible case, after the network card 313 and the network card 323 obtain the data migration notification message, one or more RQs are allocated to the virtual machine to be migrated, and the migration message is the sequence number (or identifier) of one RQ selected from those RQs after the network card 323 receives the page information of the to-be-migrated data.
In another possible case, after the network card 323 receives the page information of the to-be-migrated data, the network card 323 and the network card 313 establish a data connection (or transmission channel) for transferring the to-be-migrated data; data transfer over this connection is implemented through a QP, and the migration message is the sequence number (or identifier) of the receive queue (RQ) included in that QP.
The above two cases are merely examples of how the migration message may be generated in this embodiment and should not be construed as limiting this application.
S430: The network card 313 sends the to-be-migrated data to the RQ indicated by the migration message.
When the network card 323 holds multiple RQs and each RQ corresponds to one virtual machine, the network card 323 allocates the migration message for the host 310's to-be-migrated virtual machine based on the page information sent by the network card 313, and the network card 313 migrates the to-be-migrated data to the RQ indicated by that migration message. This prevents the to-be-migrated data of the virtual machine to be migrated (such as VM1) from being sent to the RQ of another virtual machine (such as VM2), improving the accuracy of virtual machine data migration.
Regarding the relationship between the above page information and the to-be-migrated data, a possible implementation is provided here. As shown in FIG. 5, a schematic diagram of data migration provided by this application, the network card 313 and the network card 323 establish a transmission channel by means of a QP. Here the QP includes a send queue in the network card 313 and a receive queue in the network card 323; the send queue sits at the channel's sending end and the receive queue at its receiving end.
As shown in FIG. 5, the SG information maintained by the send queue (SQ) of the network card 313 has a first relationship with the memory pages in the memory 312 associated with the virtual machine to be migrated.
For example, the first relationship is built by the network card 313, after it has determined the to-be-migrated data from the identifier of the virtual machine to be migrated, from the memory addresses of the memory pages in the memory 312 associated with the identifier.
In one case, the first relationship is the correspondence between the service data stored in the memory pages of the memory 312 and the SG information, such as page2-SG1 and page4-SG2 of the network card 313 shown in FIG. 5.
In another case, the first relationship is the correspondence between the memory pages' memory addresses and the SG information; for example, if page2's memory address is 001 and page4's is 011, the correspondence includes 001-SG1 and 011-SG2.
Note that the above two cases are examples provided by this embodiment and should not be construed as limiting this application; in another possible case, the first relationship is the correspondence between the memory addresses of the dirty pages associated with the virtual machine to be migrated (or the dirty page data) and the SG information.
As shown in FIG. 5, when the memory addresses of multiple dirty pages are not contiguous, each dirty page corresponds to one piece of SG information in the SQ. For example, SG1 contains "001 4KB", meaning that page2 at address "001" in the memory 312 is a dirty page and that the service data stored in it is 4 KB long; SG2 contains "011 4KB", meaning that page4 at address "011" in the memory 312 is a dirty page whose stored service data is likewise 4 KB long.
Note that the example based on FIG. 5 assumes that the basic unit of data reads and writes in the memory 312 is a page and that a single page provides 4 KB of storage. A VM's live migration may, however, involve multiple dirty pages whose memory addresses are contiguous, in which case those dirty pages may correspond to just one piece of SG information in the SQ. For example, if two dirty pages have the addresses "001" and "010" and each stores 4 KB of service data, then when the network card 313 maps the service data of the two dirty pages into the SQ, the SG information for the two pages may be "001 8KB".
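For illustration, the following C sketch shows one way contiguous dirty pages could be coalesced into SG entries as in the "001 8KB" example above; the sg_entry type, the 4 KB page size, and build_sg_list are assumptions of this sketch rather than names from the patent:

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size, matching the 4KB example */

typedef struct {
    uint64_t addr; /* start address of a run of dirty pages */
    uint32_t len;  /* total length of the run in bytes */
} sg_entry;

/* Builds SG entries from a sorted list of dirty page addresses, merging
 * runs of contiguous pages into one entry (e.g. pages at 001 and 010
 * become a single "001 8KB" entry). Returns the number of entries. */
static size_t build_sg_list(const uint64_t *pages, size_t npages,
                            sg_entry *sg, size_t max_sg)
{
    size_t n = 0;
    for (size_t i = 0; i < npages && n < max_sg; i++) {
        if (n > 0 && sg[n - 1].addr + sg[n - 1].len == pages[i]) {
            sg[n - 1].len += PAGE_SIZE; /* extend the current run */
        } else {
            sg[n].addr = pages[i];      /* start a new SG entry */
            sg[n].len  = PAGE_SIZE;
            n++;
        }
    }
    return n;
}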
During a VM's data migration, the network card 313 maps the dirty pages of the memory 312 to SG information in the SQ, so the service data stored in the dirty pages need not be copied into the network card 313's own memory. On the host 310, this avoids the copy of service data from the memory 312 to the network card 313, shortening the VM's data migration time and improving the VM's live migration efficiency.
As shown in FIG. 5, the SG information maintained by the receive queue (RQ) of the network card 323 has a second relationship with the memory pages in the memory 322.
For example, the second relationship is the correspondence between the service data stored in the memory pages and the SG information, such as page2-SG1 and page4-SG2 in the network card 323 shown in FIG. 5.
Alternatively, the second relationship is the correspondence between the memory addresses of the memory pages in the memory 322 and the SG information; for example, if page2's memory address is 001 and page4's is 011, the second relationship includes 001-SG1 and 011-SG2.
In another possible case, the second relationship is the correspondence between the memory addresses of the dirty pages associated with the virtual machine to be migrated (or the dirty page data) and the SG information.
For example, after the network card 313 and the network card 323 establish the transmission channel for the to-be-migrated data, the network card 323 builds the above second relationship from the memory addresses contained in the page information.
In the remainder of this document, the virtual machine to be migrated is called the source VM on the source host and the destination VM on the destination host.
To prevent the storage location of the to-be-migrated data from changing between the source host and the destination host, a given group of data should have consistent memory addresses on the two hosts (its source VM GPA and its destination VM GPA should match). In one feasible example, this consistency between the source VM GPA and the destination VM GPA is enforced by the virtual machine management software: after the network card 313 builds the first relationship above, the network card 323 builds the second relationship using the virtual machine management software and the first relationship, keeping the source VM GPA and the destination VM GPA consistent. This prevents the memory addresses that reference the virtual machine's data from changing after migration, improving migration accuracy.
If the network card 313 first copied data (such as dirty page data) out of the memory 312 and then sent the copy to the host 320, migration latency would grow and the virtual machine would stall noticeably. To avoid this, an optional implementation of S430 above, combined with the SQ, RQ, and SG information shown in FIG. 5, is as follows: the network card 313 obtains from the SQ the SG information corresponding to the dirty pages, and the network card sends the data indicated by the SG information to the RQ indicated by the migration message.
In this embodiment, the memory address contained in a dirty page's SG information may be the HPA or the GPA described in the above examples.
For example, if the memory address contained in the SG information is an HPA, the network card 313 migrates the dirty page data stored in the address space indicated by that HPA to the host 320.
Alternatively, if the memory address contained in the SG information is a GPA, the network card 313 translates that GPA to obtain the dirty page's HPA, and then migrates the dirty page data stored in the address space indicated by the HPA to the host 320.
The network card 313's GPA-based address translation may be performed by an IO memory management unit (IOMMU). Specifically, the IOMMU may translate the GPA into the dirty page's HPA based on the VF information that the virtual machine to be migrated uses on the host 310, where the VF information indicates the identifier of the virtual PCIe device (such as a network card or memory) used by the virtual machine to be migrated.
Because the host provides an independent memory space for each VF, and on the source host one VF corresponds to one or more SQs, the network card 313 translates the GPAs contained in the SQ's SG information based on the VF information to obtain the dirty pages' HPAs. The network card 313 can then migrate the dirty page data to the host 320 based on those HPAs, accomplishing the data migration of the virtual machine to be migrated while avoiding the copy of dirty page data from the memory 312 to the network card 313, shortening migration time and improving the virtual machine's migration efficiency.
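As a minimal illustration of this per-VF translation step, the following C sketch resolves a GPA to an HPA keyed by VF; the flat table layout and the names xlate_rec and gpa_to_hpa are assumptions of this sketch, since a real IOMMU walks hardware page tables instead:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One translation record: for a given VF, maps one guest physical
 * page (GPA) to its host physical page (HPA). */
typedef struct {
    uint16_t vf;
    uint64_t gpa;
    uint64_t hpa;
} xlate_rec;

/* Linear lookup, kept simple for clarity. */
static bool gpa_to_hpa(const xlate_rec *tbl, size_t n,
                       uint16_t vf, uint64_t gpa, uint64_t *hpa)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vf == vf && tbl[i].gpa == gpa) {
            *hpa = tbl[i].hpa;
            return true;
        }
    }
    return false; /* no mapping; the access would fault */
}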
In this embodiment, the network card 313 maps the dirty pages of the memory 312 to SG information in the SQ, and the service data stored in the dirty pages need not be copied into the network card 313's memory; on the host 310, this avoids the copy of service data from the memory 312 to the network card 313, shortening the VM's live migration time and improving live migration efficiency.
In addition, the above embodiment assumes a 1:1 ratio of SQ to RQ within a queue pair (QP). In other possible cases, the SQ:RQ ratio may be N:M, where N and M are positive integers and N is not equal to M. For example, with an SQ:RQ ratio of 2:1, in the network card 313 the SG1 information for page2 is stored in SQ(1) and the SG2 information for page4 in SQ(2), while in the network card 323 the SG1 and SG2 information are held in one RQ; during the migration of the to-be-migrated data, the network card 313 uses the SG1 and SG2 information to migrate the data stored in page2 and page4 to the storage address space mapped by the RQ.
In the above embodiments provided by this application, the migration of the to-be-migrated data may be carried out over the TCP/IP protocol, an RDMA network, or the like.
In the TCP/IP scenario, the host 310 (source host) establishes two transmission channels with the host 320 (destination host) over TCP/IP: a control connection and a data connection. The control connection carries the page information and the migration message, and the data connection carries the to-be-migrated data.
In this embodiment, different transmission channels carry different information or data. This prevents the page information from being carried over the data connection, where the receiving network card 323 could not process it and the virtual machine's live migration would run into problems, improving the stability of the virtual machine's data migration.
Alternatively, in the TCP/IP scenario, the host 310 (source host) may establish only one transmission channel with the host 320 (destination host) over TCP/IP: a single connection. The single connection can carry the above page information, migration message, and to-be-migrated data.
A possible concrete example is given here to describe the data migration process between the source host and the destination host in the TCP/IP scenario.
Step 1: The network card 313 sends the page information (SG information) of the dirty pages to be copied in this round to the network card 323 over the single connection.
The addresses contained in the SG information of the send and receive queues may be GPAs or HPAs; it suffices that the memory spaces of the source VM and the destination VM are exactly consistent at the GPA level, so that the source host's memory can be fully replicated into the destination host's memory.
If the addresses contained in the SG information are GPAs, the IOMMU must perform the GPA-to-HPA address translation.
If the addresses contained in the SG information are HPAs, no IOMMU address translation is needed: the network card 313 directly targets the PCIe device implementing the PF to which the host 320 belongs and delivers the data straight to the host 320. Because all data the network card 313 transmits over the single connection is destined for the host 320, that is, the information of the PCIe device implementing the PF (PF information for short) is the same throughout the single connection, the network card 313 can configure the PF in the TCP connection context. The PF information indicates the identifier of the physical PCIe device used by the virtual machine to be migrated.
In this example, the network card 313 configures the PF information in the TCP connection context, which avoids configuring PF information separately for every group of data when multiple groups of one virtual machine's data are migrated between the source host and the destination host, reducing the total migration time and improving the virtual machine's data migration efficiency.
Note that PF and VF are virtualization concepts of the PCIe interface and should not be construed as limiting this application; for other interfaces, the corresponding identifier is simply used to mark the virtual device.
Step 2: The destination host allocates a message handling ID (the migration message above) based on the page information received by the network card 323; one message handling ID corresponds to one receive queue in the network card 323.
In one possible example, the message handling ID is allocated by the network card 323 itself, to reduce the destination host's load and preserve its computing capacity for other services.
In addition, the network card 323 may also use the live migration manager to set up the receive queue according to the received dirty page information (SG information): the SG order of the corresponding receive memory blocks in the receive queue exactly matches the SG order of the corresponding memory blocks in the send queue, their memory spaces are exactly consistent at the VM GPA level, and what is placed in the receive queue is the aforementioned page information (SG information or SG information blocks).
Step 3: The network card 313 prepends a message header carrying the aforementioned message handling ID to the memory blocks (dirty page data) of the dirty pages to be sent in this round, places them into the TCP SQ as SGs in the same memory organization order as the SG information, and notifies the hardware (such as the communication interface) to fetch the data from the memory designated by the SQ and send it to the destination host's RQ.
Step 4: After the network card 323 receives the data carrying the message header (message handling ID), it uses the message handling ID to obtain the SG information stored in the corresponding RQ and writes the data into the destination host memory designated by that SG information.
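To illustrate step 4 only, here is a hedged C sketch of receive-side dispatch by message handling ID; the header layout and the helpers rq_lookup and copy_to_host are hypothetical names introduced for this sketch:

#include <stdint.h>
#include <string.h>

typedef struct { uint64_t addr; uint32_t len; } sg_entry;

/* Hypothetical on-wire header prepended by the sender in step 3. */
typedef struct { uint32_t msg_id; uint32_t len; } msg_hdr;

/* Assumed helpers: resolve the RQ SG entry registered for this ID, and
 * copy the payload into the destination host memory that the SG names. */
extern const sg_entry *rq_lookup(uint32_t msg_id);
extern void copy_to_host(uint64_t host_addr, const void *src, uint32_t len);

static void on_receive(const uint8_t *frame)
{
    msg_hdr hdr;
    memcpy(&hdr, frame, sizeof hdr);            /* parse the header */

    const sg_entry *sg = rq_lookup(hdr.msg_id); /* ID selects one RQ entry */
    if (sg != NULL && hdr.len <= sg->len)
        copy_to_host(sg->addr, frame + sizeof hdr, hdr.len);
    /* else: unknown ID or oversized payload, so the message is dropped */
}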
In one optional implementation, while sending the migration message to the network card 313, the network card 323 may also notify the network card 313 of the buffer information of the memory 322. For example, the buffer information includes the available headroom of the buffer space used on the host 320 for virtual machine live migration, such as 10 megabytes (MB). During the transmission of service data, the network card 313 sends no more service data than the buffer headroom the host 320 announced, which prevents an oversized burst of service data that the network card 323 could not write from the buffer into the memory 322 quickly enough.
In addition, because the single connection between the source host and the destination host may carry the data of different VMs, the above PF information/VF information can also be used to distinguish which VM each piece of data on the single connection belongs to, avoiding transmission errors among the data of multiple VMs on the single connection, such as misidentifying VM(1)'s data as VM(2)'s, and thereby improving the accuracy of VM live migration.
The VM's live migration management is carried out by the network card and consumes none of the computing resources of the source host's processor; moreover, the VM's service data is copied by the network card directly from the source host's memory into the destination host's memory, reducing the number of data copies and the copy time and thus lowering the VM's live migration latency.
Because the PF information used by the virtual machine to be migrated is configured on the single connection, the network card 313 may also wait until one unit (such as 10 dirty pages) of service data has been migrated to the destination host (host 320) and the network card 313 has received the destination host's response, for example a response indicating that the network card 323 has finished writing that unit of service data, before initiating the migration of the next unit of service data.
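Purely to illustrate this flow-control pattern, the following C sketch gates each unit of dirty pages on the peer's advertised buffer headroom and on its write-completion response; the unit size and the helpers send_unit, peer_headroom, and wait_for_ack are assumptions of this sketch:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_PAGES 10 /* one unit, e.g. 10 dirty pages as above */

/* Assumed helpers: send one unit of pages, report the peer's advertised
 * buffer headroom in bytes, and block until the peer confirms a write. */
extern void send_unit(const uint64_t *pages, size_t n);
extern uint64_t peer_headroom(void);
extern bool wait_for_ack(void);

/* Sends dirty pages unit by unit; the next unit starts only after the
 * peer acknowledges the previous one, and a unit never exceeds the
 * buffer headroom the peer announced. */
static bool push_dirty_pages(const uint64_t *pages, size_t npages,
                             uint32_t page_size)
{
    for (size_t off = 0; off < npages; off += UNIT_PAGES) {
        size_t n = npages - off < UNIT_PAGES ? npages - off : UNIT_PAGES;
        if ((uint64_t)n * page_size > peer_headroom())
            return false; /* the peer's migration buffer is full */
        send_unit(pages + off, n);
        if (!wait_for_ack())
            return false; /* the peer failed to commit the unit */
    }
    return true;
}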
In summary, because the VM's live migration is offloaded from the host's processor to the network card, the processing resource consumption of the host's processor is reduced.
Second, during the VM's live migration, the source host's network card stores only the dirty pages' metadata (such as SG information) and need not store the dirty pages' service data; since the metadata is smaller than the service data, the network card's storage resource consumption is reduced.
Furthermore, because the network card can use the dirty pages' metadata to send the service data stored in the memory's dirty pages to the destination host, the copy of service data from the memory to the network card is avoided and copy time is reduced, which lowers the latency of VM live migration and improves its efficiency.
In the above embodiments of this application, the virtual machine live migration is implemented over the TCP protocol; in some possible cases, however, it may instead be implemented over an RDMA network.
RDMA is a technique for accessing data in a remote host's memory while bypassing the remote host's operating system kernel. Because it bypasses the operating system, it not only saves substantial CPU resources but also raises system throughput and lowers the system's network communication latency, making it especially suitable for wide use in large-scale parallel computer clusters. RDMA has several defining characteristics: (1) data is transferred over the network to and from the remote machine; (2) the operating system kernel is not involved, and everything related to sending and transmission is offloaded to the smart network card; (3) data is transferred directly between user-space virtual memory and the smart network card without involving the operating system kernel, with no extra data movement or copying.
There are currently roughly three kinds of RDMA networks: InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and the internet wide area RDMA protocol (iWARP). InfiniBand is a network designed specifically for RDMA that guarantees reliable transmission in hardware and requires network cards and switches supporting the technology. RoCE and iWARP are both Ethernet-based RDMA technologies that only require special network cards. InfiniBand offers the best performance, but its network cards and switches are also very expensive, whereas RoCE and iWARP need only the special network cards and are considerably cheaper.
A possible example is given below to describe the virtual machine data migration method in the RDMA scenario.
When the above network card 313 and network card 323 implement the data migration method provided by this application over an RDMA network, they may also be called smart network cards. One possible hardware and software implementation of such a smart network card is as follows: the smart network card includes a CPU and a network interface card (NIC), and a live migration manager runs on the CPU.
In the RDMA scenario, the network card 313 registers the entire memory space of the VM to be migrated (or source VM), its GPAs together with its HPAs, as one memory region, generating a memory protect table (MPT) and a memory translation table (MTT), where the PF information for accessing the host is added to the MPT, yielding the local source local key (S_LKey) and source remote key (S_RKey). The MPT indicates the correspondence between the memory's HPAs and the GPAs of the virtual machine to be migrated.
Likewise, the network card 323 registers the VM's entire memory space, GPAs together with HPAs, as one memory region, generating an MPT and an MTT, where the PF information for accessing the host is added to the MPT, yielding the local destination local key (D_LKey) and destination remote key (D_RKey).
The MPT in the RDMA scenario fulfils the functions that the page information and the migration message fulfil in the TCP/IP scenario above, thereby implementing the data migration method provided by this application.
For example, the network card 313 sending the to-be-migrated data to the RQ indicated by the migration message may include the following process: first, the network card 313 determines the PF information of the virtual machine to be migrated from the migration message; next, it queries the MPT using the PF information and the GPA of the virtual machine to be migrated, determining the HPA corresponding to the page information; finally, the network card 313 sends the to-be-migrated data stored at the HPA of the virtual machine to be migrated to the destination host (host 320).
For example, the live migration manager running on the network card 313 obtains the description information (metadata) of the dirty pages in the memory 312, uses the GPA address to determine that the D_RKey registered in the network card 323 is the R-KEY, and issues "RDMA write" operations batch by batch to write the data into the memory 322 of the destination host (host 320).
Note that RDMA offers both two-sided and one-sided transfer modes. send/receive are two-sided operations: the remote application must be aware of and participate in the transfer for it to complete. read and write are one-sided operations: the local end only needs to know the source and destination addresses of the information, the remote application need not be aware of the communication, and the reading or storing of the data is completed by the remote network card, which then encapsulates the result into a message returned to the local end.
For example, in the data migration method provided by this embodiment, send/receive may be used to transfer the dirty pages' description information, while read and write may be used to transfer the service data stored in the dirty pages.
An RDMA write operation is taken here as an example to describe the data migration method of this embodiment; an RDMA write operation lets the requester (such as the network card 313) write data into the storage space of the responder (such as the host 320).
Before the network card 313 is allowed to perform an RDMA write, the host 320 first allocates a storage space for access by the host 320's QP (or QP group). The host 320's channel adapter associates a key with the virtual address of this storage space. The host 320 sends the virtual address, length, and key of the storage space to the network card 313, which may access this memory region; for example, this information may be sent through the send operation described earlier. The virtual address and key of the storage space can be used to determine the dirty pages' HPAs.
The network card 313 can initiate an RDMA write operation by sending an RDMA write message containing the data to be written to the host 320, the virtual address of the host 320's storage space, the length of the data, and the key. The length of the data may range from 0 bytes to 2^31 bytes. As with the send operation, if the data is longer than the path maximum transmission unit (PMTU), it is segmented into multiple packets of PMTU size, which the host 320 reassembles into the data. For a reliable connection, if the data is a short message (one that need not be segmented into multiple packets), the host 320 sends the network card 313 an acknowledgment for each packet; if the data is a long message (segmented into multiple packets), the host 320 may acknowledge each packet, or send one acknowledgment for several consecutive packets of the same data, or acknowledge the final packet. In addition, whether the data is a short or a long message, the host 320 may send one acknowledgment covering multiple previously received packets; for example, the acknowledgment of an RDMA write message whose packet sequence number (PSN) is X can confirm that the messages preceding that RDMA write message with PSNs smaller than X have been successfully received by the host 320.
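For readers unfamiliar with how such a one-sided write is posted in practice, here is a minimal libibverbs sketch of one RDMA write; it assumes an already connected queue pair qp, a registered local memory region mr, and the remote virtual address and rkey exchanged as described above, with none of that setup shown:

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Posts one RDMA write of len bytes from the local buffer buf (covered
 * by the registered region mr) to remote_addr on the responder, using
 * the rkey the responder advertised. Returns 0 on success. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof wr);
    wr.opcode              = IBV_WR_RDMA_WRITE; /* one-sided operation */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}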
It can be understood that, to implement the functions of the above embodiments, the host and the network card include hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that, for the units and method steps of the examples described in connection with the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
The data migration method provided by this application has been described above in detail with reference to FIG. 1 to FIG. 5; the data migration apparatus provided by this application is described below with reference to FIG. 6.
FIG. 6 is a schematic structural diagram of a data migration apparatus provided by this application. The data migration apparatus 600 can be used to implement the functions of the host or the network card in the above method embodiments and can therefore also achieve the beneficial effects of those method embodiments. In the embodiments of this application, the data migration apparatus 600 may be the network card 250 shown in FIG. 2, the network card 313 or the network card 323 shown in FIG. 3 to FIG. 5, or a module (such as a chip) applied to a network card.
As shown in FIG. 6, the data migration apparatus 600 includes a communication unit 610, a data identification unit 620, a migration unit 630, and a storage unit 640. The data migration apparatus 600 implements the functions of the network card in the method embodiments shown in FIG. 3 to FIG. 5. In one possible example, the specific process by which the data migration apparatus 600 implements the above data migration method includes items 1 to 3 below.
1. The communication unit 610 obtains a data migration notification message. The first data migration notification message indicates the identifier of the virtual machine to be migrated.
2. The data identification unit 620 determines the to-be-migrated data based on the identifier; the to-be-migrated data is data that is stored in the source host's memory and associated with the virtual machine to be migrated.
For example, the data identification unit 620 may determine, based on the aforementioned identifier, the set of memory pages in the source host's memory associated with the virtual machine to be migrated, and use the data stored in that set of memory pages as the to-be-migrated data.
In addition, as shown in FIG. 6, the storage unit 640 included in the data migration apparatus 600 stores the dirty page marking information. Based on the identifier of the virtual machine to be migrated in the data migration notification message and on that dirty page marking information, the data identification unit 620 can determine the dirty pages in the source host's memory associated with the virtual machine to be migrated, and thereby the dirty page data stored in those associated dirty pages. For more details on the dirty page marking information, refer to the description of Table 1 in the above method embodiment; details are not repeated here.
3. The migration unit 630 migrates the aforementioned to-be-migrated data to the destination host.
Because the to-be-migrated data is determined by the data identification unit from the identifier of the virtual machine to be migrated, the source host is spared from determining the virtual machine's to-be-migrated data itself, the computing resources the source host needs during the virtual machine's live migration are reduced, and the source host's capacity to run other services (such as compute-intensive and latency-sensitive services like AI and HPC) improves.
When the data migration apparatus 600 implements the functions of the host 310 in the method embodiment shown in FIG. 3: the communication unit 610 performs S310, the data identification unit 620 performs S320, and the migration unit 630 performs S330.
When the data migration apparatus 600 implements the functions of the host 320 in the method embodiment shown in FIG. 3: the communication unit 610 performs S330, and the migration unit 630 performs S340.
When the data migration apparatus 600 implements the functions of the host 310 in the method embodiment shown in FIG. 4: the communication unit 610 performs S410, and the migration unit 630 performs S430.
When the data migration apparatus 600 implements the functions of the host 320 in the method embodiment shown in FIG. 4: the communication unit 610 performs S410 to S430.
In addition, when the data migration apparatus 600 is deployed on a receiving-side host (such as the destination host in a virtual machine migration), the communication unit 610 can receive virtual machine data sent by another host, and the migration unit 630 can migrate that virtual machine data into the memory address space mapped by the SG information in the receive queue. This prevents the data migration apparatus 600 from copying the virtual machine data multiple times on the receiving-side host, reduces the destination host's computing and storage resource consumption, and improves the virtual machine's live migration efficiency as well as the destination host's processing capacity for other computing services.
It should be understood that the data migration apparatus 600 of the embodiments of this application may be implemented by a CPU, by an ASIC, or by a programmable logic device (PLD), where the PLD may be a complex programmable logical device (CPLD), an FPGA, generic array logic (GAL), or any combination thereof. When the data migration apparatus 600 implements any of the data migration methods shown in FIG. 3 to FIG. 5 in software, the data migration apparatus 600 and its modules may also be software modules.
A more detailed description of the above data migration apparatus 600 can be obtained directly from the relevant descriptions of the embodiments shown in FIG. 3 to FIG. 5 and is not repeated here.
For example, when the data migration apparatus 600 is implemented in hardware, the hardware may be an electronic device, such as the above network card, or a processor or chip applied in a network card; such an electronic device includes an interface circuit and a control circuit.
The interface circuit receives signals from devices other than the electronic device and transmits them to the control circuit, or sends signals from the control circuit to devices other than the electronic device.
The control circuit, through a logic circuit or by executing code instructions, implements the method of any possible implementation of the above embodiments. For the beneficial effects, refer to the descriptions in any of the above embodiments; they are not repeated here.
It should be understood that the network card according to the embodiments of this application may correspond to the data migration apparatus 600 in the embodiments of this application and to the corresponding entities performing the methods of FIG. 3 to FIG. 5 according to the embodiments of this application, and that the above and other operations and/or functions of the modules of the data migration apparatus 600 respectively implement the corresponding flows of the methods in FIG. 3 to FIG. 5; for brevity, they are not repeated here.
It can be understood that the processor in the embodiments of this application may be a CPU, an NPU, or a GPU, or may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of this application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a network device or a terminal device; of course, the processor and the storage medium may also exist as discrete components in a network device or a terminal device.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer programs or instructions; when the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are executed wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, they may be transferred from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium such as a floppy disk, hard disk, or magnetic tape; an optical medium such as a digital video disc (DVD); or a semiconductor medium such as a solid state drive (SSD).
In the embodiments of this application, unless otherwise specified or in case of logical conflict, the terms and/or descriptions of different embodiments are consistent and may be cross-referenced, and the technical features of different embodiments may be combined, according to their inherent logical relationships, to form new embodiments. The various numerals involved in the embodiments of this application are merely distinctions made for ease of description and are not intended to limit the scope of the embodiments of this application; the ordinal numbers of the above processes do not imply an order of execution, which should be determined by their functions and internal logic.

Claims (15)

  1. A data migration method, wherein the method is performed by a network card of a source host, and the method comprises:
    obtaining a first data migration notification message, wherein the first data migration notification message indicates an identifier of a virtual machine to be migrated;
    determining to-be-migrated data based on the identifier, wherein the to-be-migrated data is data that is stored in a memory of the source host and is associated with the virtual machine to be migrated; and
    migrating the to-be-migrated data to a destination host.
  2. The method according to claim 1, wherein
    determining the to-be-migrated data based on the identifier comprises:
    determining, based on the identifier, a set of memory pages in the memory of the source host associated with the virtual machine to be migrated, wherein the set of memory pages comprises one or more memory pages; and
    using the data stored in the set of memory pages as the to-be-migrated data.
  3. The method according to claim 2, wherein the to-be-migrated data comprises dirty page data, the dirty page data being the data stored in those of the one or more memory pages whose data has been modified.
  4. The method according to claim 3, wherein
    determining the dirty page data comprised in the to-be-migrated data based on the identifier comprises:
    querying dirty page marking information stored in the network card to determine a set of dirty pages associated with the identifier, wherein the set of dirty pages comprises one or more dirty pages, a dirty page is a memory page, among the one or more memory pages, whose data has been modified, and the dirty page marking information indicates memory addresses of the dirty pages; and
    using the data stored in the set of dirty pages as the dirty page data.
  5. The method according to claim 4, wherein
    the dirty page marking information comprises at least one of a first dirty page table and a second dirty page table;
    the first dirty page table marks a dirty page as being in a dirtied state, the dirtied state being a state in which the source host modifies the data stored in the dirty page; and
    the second dirty page table marks a dirty page as being in a data migration state, the data migration state being a state in which the network card migrates the data stored in the dirty page.
  6. The method according to any one of claims 3 to 5, wherein
    migrating the to-be-migrated data to the destination host comprises:
    sending page information of the to-be-migrated data to the destination host, wherein the page information indicates a memory address and an offset of the to-be-migrated data in the memory;
    receiving a migration message fed back by the destination host based on the page information, wherein the migration message indicates a receive queue (RQ) in the destination host corresponding to the virtual machine to be migrated; and
    sending the to-be-migrated data to the RQ indicated by the migration message.
  7. The method according to claim 6, wherein a send queue (SQ) in the network card maintains SG information containing the memory addresses and offsets of the dirty pages, and the page information comprises the SG information corresponding to the dirty pages.
  8. The method according to claim 7, wherein
    sending the to-be-migrated data to the RQ indicated by the migration message comprises:
    obtaining, from the SQ, the SG information corresponding to the dirty pages; and
    sending the data indicated by the SG information to the RQ indicated by the migration message.
  9. The method according to claim 6, wherein the source host establishes a control connection and a data connection with the destination host over the Transmission Control Protocol/Internet Protocol (TCP/IP), the control connection being used to transmit the page information and the migration message, and the data connection being used to transmit the to-be-migrated data.
  10. The method according to claim 6, wherein the source host establishes a single connection with the destination host over TCP/IP, the single connection being used to transmit the page information, the migration message, and the to-be-migrated data.
  11. The method according to claim 10, wherein
    the migration message is a message handling identifier (ID) allocated by the destination host to the virtual machine to be migrated.
  12. The method according to claim 6, wherein the source host communicates with the destination host over a remote direct memory access (RDMA) network, the network card stores a memory protect table (MPT), the MPT indicates a correspondence between host physical addresses (HPAs) of the memory and guest physical addresses (GPAs) of the virtual machine to be migrated, and the MPT contains physical function (PF) information used by the virtual machine to be migrated; and
    sending the to-be-migrated data to the RQ indicated by the migration message comprises:
    determining the PF information of the virtual machine to be migrated based on the migration message;
    querying the MPT based on the PF information and a GPA of the virtual machine to be migrated, to determine an HPA corresponding to the page information; and
    sending, to the destination host, the to-be-migrated data stored at the HPA of the virtual machine to be migrated.
  13. The method according to claim 1, wherein the method further comprises:
    obtaining a second data migration notification message, wherein the second data migration notification message indicates an identifier of a virtual machine to be received; and
    migrating to-be-received data sent by another host into the memory of the source host, wherein the to-be-received data is data that is stored in a memory of the other host and is associated with the virtual machine to be received.
  14. A data migration apparatus, wherein the data migration apparatus is applied to a network card of a source host, and the data migration apparatus comprises:
    a communication unit, configured to obtain a first data migration notification message, wherein the first data migration notification message indicates an identifier of a virtual machine to be migrated;
    a data identification unit, configured to determine to-be-migrated data based on the identifier, wherein the to-be-migrated data is data that is stored in a memory of the source host and is associated with the virtual machine to be migrated; and
    a migration unit, configured to migrate the to-be-migrated data to a destination host.
  15. An electronic device, comprising an interface circuit and a control circuit;
    wherein the interface circuit is configured to receive signals from devices other than the electronic device and transmit them to the control circuit, or to send signals from the control circuit to devices other than the electronic device, and the control circuit is configured to implement, through a logic circuit or by executing code instructions, the method according to any one of claims 1 to 13.
PCT/CN2022/127151 2021-11-26 2022-10-24 Data migration method and apparatus, and electronic device WO2023093418A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22897501.7A EP4421631A1 (en) 2021-11-26 2022-10-24 Data migration method and apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111426045.1A 2021-11-26 2021-11-26 Data migration method and apparatus, and electronic device
CN202111426045.1 2021-11-26

Publications (1)

Publication Number Publication Date
WO2023093418A1 true WO2023093418A1 (zh) 2023-06-01

Family

ID=86444788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127151 2021-11-26 2022-10-24 Data migration method and apparatus, and electronic device

Country Status (3)

Country Link
EP (1) EP4421631A1 (zh)
CN (1) CN116185553A (zh)
WO (1) WO2023093418A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117851304B (zh) * 2024-03-07 2024-07-30 济南浪潮数据技术有限公司 Hard disk replacement method, apparatus, device, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530167A (zh) * 2013-09-30 2014-01-22 华为技术有限公司 Virtual machine memory data migration method, related apparatus, and cluster system
CN103618809A (zh) * 2013-11-12 2014-03-05 华为技术有限公司 Method, apparatus, and system for communication in a virtualized environment
US20160139944A1 (en) * 2014-11-13 2016-05-19 Freescale Semiconductor, Inc. Method and Apparatus for Combined Hardware/Software VM Migration
CN109918172A (zh) * 2019-02-26 2019-06-21 烽火通信科技股份有限公司 Virtual machine live migration method and system
CN111666036A (zh) * 2019-03-05 2020-09-15 华为技术有限公司 Data migration method, apparatus, and system
CN111736945A (zh) * 2019-08-07 2020-10-02 北京京东尚科信息技术有限公司 Smart-network-card-based virtual machine live migration method, apparatus, device, and medium


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117519908A (zh) * 2023-12-29 2024-02-06 Virtual machine live migration method, computer device, and medium
CN117519908B (zh) * 2023-12-29 2024-04-09 Virtual machine live migration method, computer device, and medium
CN117785757A (zh) * 2024-02-23 2024-03-29 CXL memory module, memory page swapping method, chip, medium, and system
CN117785757B (zh) * 2024-02-23 2024-05-28 CXL memory module, memory page swapping method, chip, medium, and system

Also Published As

Publication number Publication date
EP4421631A1 (en) 2024-08-28
CN116185553A (zh) 2023-05-30

Similar Documents

Publication Publication Date Title
WO2023093418A1 (zh) Data migration method and apparatus, and electronic device
EP3706394B1 (en) Writes to multiple memory destinations
US11704059B2 (en) Remote direct attached multiple storage function storage device
US9672143B2 (en) Remote memory ring buffers in a cluster of data processing nodes
CN112540941B (zh) Data forwarding chip and server
CN107995129B (zh) NFV packet forwarding method and apparatus
WO2019233322A1 (zh) Resource pool management method and apparatus, resource pool control unit, and communication device
US20110004732A1 (en) DMA in Distributed Shared Memory System
US20100269027A1 (en) User level message broadcast mechanism in distributed computing environment
WO2020087927A1 (zh) Memory data migration method and apparatus
WO2019153702A1 (zh) Interrupt processing method, apparatus, and server
WO2017201984A1 (zh) Data processing method, related device, and storage system
US11741039B2 (en) Peripheral component interconnect express device and method of operating the same
US20240330087A1 (en) Service processing method and apparatus
WO2023125524A1 (zh) Data storage method and system, storage access configuration method, and related device
EP3465450A1 (en) Improving throughput in openfabrics environments
Shim et al. Design and implementation of initial OpenSHMEM on PCIe NTB based cloud computing
US11334487B2 (en) Cache sharing in virtual clusters
US20230342087A1 (en) Data Access Method and Related Device
WO2024051292A1 (zh) Data processing system, memory mirroring method and apparatus, and computing device
WO2022222977A1 (zh) Memory management method and apparatus for a physical server running cloud service instances
CN108139980B (zh) Method for merging memory pages and memory merging function
US10860334B2 (en) System and method for centralized boot storage in an access switch shared by multiple servers
WO2024169157A1 (zh) Container live migration method, processor, host, chip, and interface card
US11601515B2 (en) System and method to offload point to multipoint transmissions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897501

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022897501

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022897501

Country of ref document: EP

Effective date: 20240523

NENP Non-entry into the national phase

Ref country code: DE