WO2017157145A1 - Data prefetching method and apparatus - Google Patents

Data prefetching method and apparatus

Info

Publication number
WO2017157145A1
WO2017157145A1 (application PCT/CN2017/074388)
Authority
WO
WIPO (PCT)
Prior art keywords
prefetching
target
host
data
data block
Prior art date
Application number
PCT/CN2017/074388
Other languages
English (en)
French (fr)
Inventor
徐晓忻
陈立钢
廖义祥
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2017157145A1
Priority to US 16/133,179 (published as US20190037043A1)

Classifications

    • G06F12/0862: Addressing of a memory level in which access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
    • G06F12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F12/0842: Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F12/109: Address translation for multiple virtual address spaces, e.g. segmentation
    • G06F9/4416: Network booting; Remote initial program loading [RIPL]
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G06F2009/45575: Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45579: I/O management, e.g. providing access to device drivers or storage
    • G06F2212/1021: Hit rate improvement
    • G06F2212/1024: Latency reduction
    • G06F2212/151: Emulated environment, e.g. virtual machine
    • G06F2212/152: Virtualized environment, e.g. logically partitioned system
    • G06F2212/6028: Prefetching based on hints or prefetch instructions
    • H04L67/101: Server selection for load balancing based on network conditions
    • H04L67/34: Network arrangements or protocols involving the movement of software or configuration parameters
    • H04L67/5681: Pre-fetching or pre-delivering data based on network characteristics

Definitions

  • the present invention relates to the field of data storage, and in particular, to a data prefetching method and apparatus.
  • Virtualization technology generally deploys multiple virtual machines (English: virtual machine, abbreviation: VM) on a host, and uses a hypervisor to allocate the resources of the host to each VM, so that each VM can perform computing functions independently.
  • When a VM in the host starts, it needs to read the boot image data of the VM from the storage device connected to the host.
  • Some of the boot image data read by different VMs is repeated. Therefore, at the current stage of technology, when starting a VM cluster, one VM is generally started first, and its boot image data is written into the cache of the host. In this way, when the other VMs are started, the repeated boot image data can be obtained directly from the local cache, and only a small amount of non-repeating data needs to be read from the storage device.
  • the present invention provides a data prefetching method for improving service performance of a host in a cluster system.
  • the first aspect of the present invention provides a data prefetching method suitable for a cluster system.
  • the cluster system includes multiple prefetching devices, each of which is uniquely connected to one host and has one or more disks connected thereto.
  • the prefetching devices are also connected to each other.
  • the present invention is described by taking a first prefetching device to which a first host and a first disk are connected as an example.
  • The first prefetching device receives a data prefetching instruction from the first host before the first host starts its virtual machines, where the data prefetching instruction indicates the startup data that the virtual machines in the first host will need.
  • the first prefetching device determines one or more target data blocks according to the data prefetch instruction.
  • If the target data block is not saved locally, the first prefetching device obtains the identification information of the target prefetching device from the second prefetching device. The second prefetching device is the prefetching device connected to the target storage device that stores the target data block, and the target prefetching device is a prefetching device, among the multiple prefetching devices of the cluster system, on which the target data block is saved. If the original storage location of the target data block is the target storage device in the cluster system, then when a target prefetching device acquires the target data block, the second prefetching device connected to the target storage device records the identification information of that target prefetching device.
  • the first prefetching device can acquire the identification information of the target prefetching device from the second prefetching device.
  • The first prefetching device determines a target storage location of the target data block according to the identification information of the target prefetching device, and prefetches the target data block from the target storage location to the local storage of the first prefetching device.
  • In this way, the boot image data that was originally saved in the host cache is saved on the prefetching device outside the host, and the VMs in the host obtain the boot image data directly from the prefetching device at startup.
  • Data that is repeated across VMs only needs to be written into the prefetching device once, which reduces the number of data reads and writes and the bandwidth they occupy.
  • Compared with the current technology, in which the boot image data is saved in the host cache, the boot image data in the method provided by the present invention does not occupy a large amount of the host's cache. This avoids the problems of a low cache hit ratio and a high cache occupancy rate on the host, accelerates the host's business processing, and improves the service performance of the host.
  • The first prefetching device may request the identification information of the target prefetching device from the second prefetching device, and receive the identification information returned by the second prefetching device.
  • The identification information of one or more target prefetching devices may be recorded in an identification information list. If the list returned by the second prefetching device is empty, no prefetching device has read the target data block from the second storage device, and the target data block is saved only in the second storage device. In this case, the first prefetching device determines that the target storage location of the target data block is the second storage device.
  • the first prefetching device may determine the target storage location of the target data block to be acquired based on the identification information of the target prefetching device recorded in the identification information list. Specifically, the first prefetching device determines, according to the identification information of each target prefetching device, the shortest delay in the delay of accessing each target prefetching device, and the target prefetching device corresponding to the shortest delay.
  • If the shortest delay is less than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is determined to be the target prefetching device corresponding to the shortest delay; if the shortest delay is greater than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is determined to be the target storage device.
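The selection rule above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function and parameter names, and the idea of passing latency as a lookup function, are assumptions for the sketch.

```python
def choose_target_location(target_devices, storage_device, latency_of):
    """Pick where to fetch the target data block from.

    target_devices: ids of prefetching devices that already hold the
        block (the identification information list; may be empty).
    storage_device: the device where the block is originally stored.
    latency_of: maps a device id to the measured access delay.
    """
    if not target_devices:
        # Empty list: the block exists only on the original storage device.
        return storage_device
    # Target prefetching device with the shortest access delay.
    best = min(target_devices, key=latency_of)
    # Fetch from it only if that is faster than the original storage.
    if latency_of(best) < latency_of(storage_device):
        return best
    return storage_device
```

For example, with delays `{"pf2": 5, "pf3": 2, "disk2": 8}`, the first prefetching device would fetch from `pf3`; with an empty list it falls back to `disk2`.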
  • The first prefetching device may perform alignment cutting on the boot image data according to the data prefetching instruction to obtain one or more target data blocks.
  • the first prefetching device may register the virtual storage disk with the hypervisor in the first host to present the connected storage device to the first host in the form of a virtual storage disk.
  • the first host's hypervisor sends a data prefetch command in the form of a data set management (DSM) command to the virtual storage disk in the first host, and the first prefetching device receives the data prefetch command.
  • When a virtual machine in the first host is started, the first host sends a data read instruction to the first prefetching device to indicate that the target data block is to be read.
  • the first prefetching device sends the locally saved target data block to the first host according to the data read command.
  • a second aspect of the invention provides a prefetching device for use as a first prefetching device in a cluster system.
  • The prefetching device includes: an instruction receiving module, configured to receive a data prefetching instruction from the first host before the first host starts its virtual machines, where the data prefetching instruction indicates the startup data that the virtual machines in the first host will need; a data determining module, configured to determine one or more target data blocks according to the data prefetching instruction; an information acquisition module, configured to, when the target data block is not saved in the first prefetching device, obtain the identification information of the target prefetching device from the second prefetching device, where the second prefetching device is the prefetching device connected to the target storage device storing the target data block, and the target prefetching device is a prefetching device, among the multiple prefetching devices of the cluster system, on which the target data block is saved; and a location determining module, configured to determine a target storage location of the target data block according to the identification information of the target prefetching device.
  • The information acquisition module is specifically configured to request the identification information of the target prefetching device from the second prefetching device, and receive the identification information list returned by the second prefetching device. The identification information of one or more target prefetching devices may be recorded in the identification information list.
  • The location determining module is specifically configured to: if the identification information list of the target prefetching device is empty, determine that the target storage location is the target storage device.
  • The location determining module is further configured to: if the identification information list returned by the second prefetching device is not empty, determine the target storage location of the target data block according to the identification information of the target prefetching devices recorded in the list. Specifically, it determines, according to the identification information of each target prefetching device, the shortest delay among the delays of accessing each target prefetching device and the target prefetching device corresponding to that shortest delay.
  • If the shortest delay is less than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is determined to be the target prefetching device corresponding to the shortest delay; if the shortest delay is greater than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is determined to be the target storage device.
  • The data determining module is specifically configured to: perform alignment cutting on the boot image data according to the data prefetching instruction to obtain one or more target data blocks.
  • the instruction receiving module is specifically configured to: register, in an initial running phase of the cluster system, a virtual storage disk to the hypervisor in the first host, to present the connected storage device to the first host in the form of a virtual storage disk.
  • the hypervisor of the first host sends a data prefetch command in the form of a DSM command to the virtual storage disk in the first host, and the first prefetching device receives the data prefetch command.
  • When a virtual machine in the first host is started, the first host sends a data read instruction to the first prefetching device to indicate that the target data block is to be read.
  • the instruction receiving module is further configured to receive the data read instruction.
  • the data prefetching device may further include a data sending module, configured to send the locally saved target data block to the first host according to the data reading instruction.
  • a third aspect of the invention provides a computing device comprising a processor, a memory, a communication interface, and a bus.
  • the processor is configured to execute the data prefetching method provided by the first aspect of the present invention by calling program code saved in the memory.
  • FIG. 1 is a schematic diagram of the architecture of a cluster system in the current technology;
  • FIG. 2 is a schematic structural diagram of a cluster system provided by the present invention.
  • FIG. 3 is a structural diagram of an embodiment of a computing device provided by the present invention.
  • FIG. 4 is a flowchart of an embodiment of a data prefetching method provided by the present invention.
  • FIG. 5 is a structural diagram of an embodiment of a prefetching device provided by the present invention.
  • the present invention provides a data prefetching method for improving the cache hit ratio of a host of a cluster system when starting a virtual machine.
  • the present invention also provides related prefetching device arrangements, which are described separately below.
  • FIG. 1 shows the basic architecture of the cluster system in virtualization technology. There are multiple hosts in the cluster system; multiple VMs are deployed in each host, and a hypervisor is deployed to allocate the resources of the host to each VM, so that each VM can independently perform computing functions. Each host is connected southbound to a storage device for storing data.
  • the storage device may be a disk or a solid state disk (English: solid state disk, abbreviated as SSD).
  • a large number of VMs are often deployed on one host.
  • When the cluster system starts a large number of VMs, it generates a huge volume of data read and write operations in a short time. These read and write operations consume a lot of network bandwidth, affect business services, and can even cause VM downtime.
  • For example, if VM 1 runs a Windows system and VM 2 runs a Linux system, the repetition rate of the boot image data of VM 1 and VM 2 is not high, and the host needs to save both the boot image of the Windows system and the boot image of the Linux system in its cache. Therefore, when there are many types of VMs in the host, the amount of boot image data saved in the host cache greatly increases. This causes a series of problems, such as an excessively high cache occupancy rate, a low cache hit rate, and slow host business processing, which seriously affect the performance of the host.
  • Based on the current technology, the present application provides a data prefetching method for improving host performance.
  • the application adds a prefetching device between the host and the storage device, and obtains a cluster system different from the current technology.
  • the architecture thereof is shown in FIG. 2 .
  • The northbound side of each prefetching device is connected to a host, the southbound side is connected to a storage device, and the prefetching devices are connected to each other in the east-west direction.
  • The prefetching device is configured to prefetch the boot image data (that is, to fetch it before the virtual machine is started) to the prefetching device, and to send the saved boot image data to the host when the host starts a VM. In this way, the host does not need to save the boot image data in its local cache.
  • a computing device 300 includes a processor 301, a memory 302, a communication interface 303, and a bus 304.
  • the communication interface 303 is a collection of interfaces that the computing device 300 communicates with the host, the storage device, and other computing devices.
  • The communication interface 303 may include a peripheral component interconnect express (PCIE) interface, a non-volatile memory express (NVMe) interface, a serial attached SCSI (SAS) interface, a serial advanced technology attachment (SATA) interface, or another interface for connecting to the host. The computing device 300 receives the host's data prefetch instruction, data read instruction, or other instructions through the PCIE interface or another interface, and sends the locally saved target data block to the host.
  • Communication interface 303 may also include a disk controller or other interface for connection to a storage device through which computing device 300 accesses the storage device.
  • the communication interface 303 may further include a network interface (English: network interface card, abbreviated as NIC) for accessing the Ethernet so that multiple computing devices can access each other through the Ethernet.
  • the communication interface 303 can also be other types of interfaces, which are not limited herein.
  • The memory 302 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or an SSD; the memory 302 may also include a combination of the above categories of memory.
  • The computing device 300 is used to prefetch the target data block to the computing device 300 locally, and the prefetched target data block is stored in the memory 302.
  • the program code for implementing the data prefetching method provided by FIG. 4 of the present invention may be stored in the memory 302 and executed by the processor 301.
  • the processor 301 can be a central processing unit (English: central processing unit, CPU for short), a hardware chip or a combination of a CPU and a hardware chip.
  • When the processor 301 runs, by calling the program code in the memory 302, it may perform the following steps: receiving a data prefetch instruction from the first host before the first host starts its virtual machines; determining the target data block according to the data prefetch instruction; obtaining the identification information of the target prefetching device from the second prefetching device; determining the target storage location of the target data block according to the identification information of the target prefetching device; acquiring and saving the target data block according to its target storage location; and receiving a data read instruction and sending the target data block to the first host according to the data read instruction.
  • the processor 301, the memory 302, and the communication interface 303 can implement communication connections with each other through the bus 304, and can also implement communication by other means such as wireless transmission.
  • The present invention also provides a data prefetching method, which the prefetching device of FIG. 2 and the computing device 300 of FIG. 3 execute at runtime.
  • the data prefetching method is described by taking the first prefetching device as an example. For the basic process, refer to FIG. 4, including:
  • The first prefetching device receives a data prefetching instruction sent by the first host, where the data prefetching instruction indicates the startup data needed by the virtual machines that the first host is about to start.
  • The first prefetching device may register a virtual storage disk with the hypervisor in the first host, to present the southbound-connected storage devices of the cluster system to the first host as the virtual storage disk.
  • the virtual storage disk may be in the form of a virtual disk such as a virtual NVMe disk, a virtual SAS disk, or a virtual SATA disk, or other forms.
  • a mapping table is configured in the memory of the first prefetching device, where the mapping table is used to record the correspondence between the storage device in the cluster system and the virtual storage disk in the host.
  • The VMs and the hypervisor in the first host cannot perceive whether the virtual storage disk is real, and treat the virtual storage disk as real physical storage.
  • the Hypervisor is responsible for managing the VMs in the host, so it can detect the startup of the VM.
  • The hypervisor of the first host sends a DSM command to the virtual storage disk in the first host before the VMs in the first host start, and the DSM command indicates the data required by the VMs that are to be started in the first host.
  • the DSM command sent to the virtual storage disk is actually received by the first prefetch device.
  • the first prefetching device cuts the boot image data to be prefetched into one or more target data blocks according to the data prefetch instruction.
  • The first prefetching device can perform alignment cutting on the boot image data according to the storage granularity of the cluster system. For example, if the storage granularity of the cluster system is 1 MB, and the logical address range of the boot image data to be prefetched is 2.5 MB to 4.5 MB, the first prefetching device can cut the boot image data into three target data blocks: 2.5 MB to 3 MB, 3 MB to 4 MB, and 4 MB to 4.5 MB.
  • Because the boot image data is cut along storage-granularity boundaries, the data in a single target data block is stored in the same storage device, while the data in different target data blocks may be stored in different storage devices.
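The alignment cutting in the 1 MB example above can be sketched as follows. This is an illustrative sketch only; the function name and the byte-offset interface are assumptions, not part of the patent.

```python
MB = 1 << 20  # storage granularity used in the example: 1 MB

def align_cut(start, end, granularity=MB):
    """Cut the logical range [start, end) into blocks aligned to the
    storage granularity, so each block lies in a single storage device."""
    blocks = []
    cur = start
    while cur < end:
        # Next granularity boundary after cur, capped at the end of range.
        boundary = (cur // granularity + 1) * granularity
        nxt = min(boundary, end)
        blocks.append((cur, nxt))
        cur = nxt
    return blocks

# The 2.5 MB..4.5 MB range yields three blocks:
# 2.5 MB-3 MB, 3 MB-4 MB, 4 MB-4.5 MB.
blocks = align_cut(int(2.5 * MB), int(4.5 * MB))
```

Cutting at granularity boundaries is what guarantees that each resulting block maps to exactly one storage device.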
  • After the first prefetching device determines the target data block, it determines whether the data of the target data block has already been saved locally in the first prefetching device.
  • The first prefetching device may, according to the globally unique identifier (GUID) of the virtual storage disk corresponding to the target data block and the logical address of the target data block in the virtual storage disk, look up in the saved mapping table the storage device where the target data block is located and its logical address in that storage device. Then, by searching the local logical address table for the target data block's logical address in the storage device, it determines whether the target data block is saved locally in the first prefetching device.
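The two-step lookup just described (mapping table, then local logical-address table) might look like the sketch below. The dict-based tables, key layout, and example values are assumptions made for illustration.

```python
# Mapping table: (virtual disk GUID, logical address in virtual disk)
#   -> (storage device id, logical address in that storage device).
mapping_table = {
    ("guid-vdisk-1", 0x300000): ("storage-2", 0x80000),
}

# Local logical-address table: locations of blocks already prefetched
# onto this prefetching device.
local_blocks = {("storage-2", 0x80000)}

def locate_block(guid, vdisk_addr):
    """Resolve a virtual-disk address to (storage device, device address)."""
    return mapping_table.get((guid, vdisk_addr))

def is_local(guid, vdisk_addr):
    """Is the target data block already saved on this prefetching device?"""
    loc = locate_block(guid, vdisk_addr)
    return loc is not None and loc in local_blocks
```

If `is_local` returns true the device can serve the block immediately (step 406); otherwise it must go on to prefetch the block.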
  • If the target data block is already saved locally, step 406 is directly executed.
  • Otherwise, the first prefetching device needs to acquire the target data block into the local storage of the first prefetching device.
  • the method of prefetching the target data block by the first prefetching device will be described in detail below through steps 403 to 405.
  • the first prefetching device needs to obtain the identification information of the target prefetching device.
  • the first prefetching device can find the storage device where the target data block is located.
  • The case where the target data block is stored in the second storage device of the cluster system is described as an example. Similar to the first host, first prefetching device, and first storage device, the second storage device is connected northbound to the second prefetching device, and the second prefetching device is connected northbound to the second host. It follows that other prefetching devices in the cluster system must pass through the second prefetching device when accessing the second storage device.
  • the prefetching device that saves the target data block is referred to as a target prefetching device.
  • the target prefetching device does not include the first prefetching device. However, it may be any prefetching device (including the second prefetching device) other than the first prefetching device in the storage system.
  • the second prefetching device records the identification information of the target prefetching device, such as an IP address and a device number. . Therefore, the first prefetching device can acquire the identification information of the target prefetching device from the second prefetching device.
Optionally, to keep any prefetching device from being accessed too frequently, an access threshold may be set for each prefetching device; only a prefetching device that has saved the target data block and has been accessed by other prefetching devices fewer times than the threshold is considered a target prefetching device.
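As a sketch, the threshold filter might look like this; the dictionary of access counts is an assumed bookkeeping structure, not something the patent specifies:

```python
def target_prefetchers(holders, access_counts, threshold):
    """Of the prefetching devices that have saved the target data block,
    treat only those accessed by other prefetching devices fewer times
    than the threshold as target prefetching devices."""
    return [dev for dev in holders if access_counts.get(dev, 0) < threshold]

# "p1" is excluded because it has already been accessed 5 times
candidates = target_prefetchers(["p1", "p2", "p3"], {"p1": 5, "p2": 1}, 3)
```

Devices absent from the count table are treated as never accessed, so freshly added holders remain eligible.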
Optionally, the first prefetching device may request the address information of the target prefetching devices from the second prefetching device and receive the identification information list returned by the second prefetching device; the list may record the identification information of one or more target prefetching devices.

Notably, this embodiment merely uses the second storage device to refer to the storage device that holds the target data block; in practice the second storage device may be the same storage device as the first storage device, in which case the second prefetching device and the first prefetching device are actually also the same prefetching device.
After obtaining the identification information, the first prefetching device determines the target storage location of the target data block according to it. The target storage location is one of the one or more locations in the cluster system where the target data block is stored. Many criteria can be used to select the target storage location from among the block's storage locations in the cluster system: for example, the stored copy with the shortest network distance to the first prefetching device, or the copy with the shortest access delay for the first prefetching device, may be determined as the target storage location. The target storage location may also be determined according to other criteria, which are not limited here.
If the identification information list returned by the second prefetching device is empty, no prefetching device has read the target data block from the second storage device, and the block is saved only there; in this case the first prefetching device determines the second storage device to be the target storage location of the target data block. If the list is not empty, there are target prefetching devices that have read the target data block from the second storage device, so the block is saved not only in the second storage device but also in those target prefetching devices. In this case the first prefetching device may determine the target storage location of the target data block from the identification information recorded in the list; see specifically the determination method shown in (1) to (3):
(1) The first prefetching device determines the delay of accessing each target prefetching device, the shortest delay t1 among those delays, and the target prefetching device corresponding to t1.

(2) The first prefetching device determines the delay t2 of accessing the second storage device through the second prefetching device.

(3) If t1 is smaller than t2, the first prefetching device determines the target prefetching device corresponding to t1 to be the target storage location of the target data block; if t1 is larger than t2, it determines the second storage device to be the target storage location; if t1 equals t2, either the target prefetching device corresponding to t1 or the second storage device may be determined as the target storage location. The first prefetching device may also determine the target storage location by other methods, which are not limited here.
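The comparison of t1 against t2 in (1) to (3) can be sketched in Python as follows; in practice the delays would be measured, and the names here are illustrative only:

```python
def choose_target_location(prefetcher_delays, storage_delay):
    """prefetcher_delays maps each target prefetching device to the
    delay of accessing it; storage_delay is t2, the delay of reaching
    the second storage device through the second prefetching device."""
    if not prefetcher_delays:
        # empty identification information list: only the second
        # storage device holds the target data block
        return "second storage device"
    best = min(prefetcher_delays, key=prefetcher_delays.get)
    t1 = prefetcher_delays[best]
    # when t1 == t2 either choice is valid; this sketch prefers the
    # target prefetching device
    return best if t1 <= storage_delay else "second storage device"
```

The selection keeps the delay of obtaining the target data block as short as possible, which is the stated goal of the criterion.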
After determining the acquisition path of the target data block, the first prefetching device prefetches the target data block to itself along that path. Optionally, after this step, the second prefetching device may record the identification information of the first prefetching device to indicate that the target data block is now also saved in the first prefetching device.
The data prefetching method provided by the present invention adds a prefetching device between the host and the storage device, which obtains in advance, according to the host's data prefetch instruction, the boot image data the host needs at startup, for the host's use. In this way, boot image data that would otherwise be kept in the host cache is kept on a prefetching device outside the host, and a VM in the host can obtain its boot image data directly from the prefetching device at startup. Compared with reading the boot image data directly from the storage device, duplicated data in this embodiment needs to be written into the prefetching device only once, reducing the number of data reads and writes and the bandwidth occupied. Compared with keeping the boot image data in the host cache, the boot image data in the method provided by the present invention does not occupy a large amount of host cache, so the problems of a low cache hit ratio or high cache occupancy do not arise; host service processes are accelerated and the service performance of the host is improved.
After completing the prefetching of the boot image data, the method provided by the present invention may further perform step 406. After steps 401 to 405, the target data block has been prefetched locally. When a virtual machine in the first host starts, the first host sends a data read instruction to the first prefetching device indicating that the target data block is to be read. The first prefetching device receives the data read instruction and sends the locally saved target data block to the first host according to it.
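The read path of step 406 amounts to serving the host from the local copy. A minimal sketch follows; the block-id keying and the class name are assumptions for illustration:

```python
class PrefetchStore:
    """Minimal sketch of the prefetching device's local store: blocks
    are prefetched once, then every host data read instruction for them
    is served locally instead of from the storage device."""

    def __init__(self):
        self._blocks = {}

    def prefetch(self, block_id, data):
        self._blocks[block_id] = data

    def read(self, block_id):
        # a data read instruction from the host returns the locally
        # saved target data block
        return self._blocks[block_id]

store = PrefetchStore()
store.prefetch("boot-0", b"boot image bytes")
```

Because duplicated boot image data is written into the store only once, repeated VM startups cost one local read each instead of a storage-device read.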
The embodiment shown in FIG. 4 introduced the data prefetching method provided by the present invention. The prefetching device that implements the method is described below; its basic structure, shown in FIG. 5, is as follows:
  • the instruction receiving module 501 is configured to perform the operations in step 401 in the embodiment shown in FIG. 4;
  • the data determining module 502 is configured to perform the operations in step 402 in the embodiment shown in FIG. 4;
  • the information obtaining module 503 is configured to perform the operations in step 403 in the embodiment shown in FIG. 4;
  • a location determining module 504 configured to perform the operations in step 404 in the embodiment shown in FIG. 4;
  • the data saving module 505 is configured to perform the operations in step 405 in the embodiment shown in FIG. 4.
Optionally, the instruction receiving module 501 is further configured to receive a data read instruction sent by the first host, the data read instruction indicating that the target data block is to be read. The prefetching device shown in FIG. 5 may further include a data sending module 506, configured to send the target data block to the first host after the instruction receiving module 501 receives the data read instruction.

The prefetching device of FIG. 5 sits between the host and the storage device: the instruction receiving module 501 receives the host's data prefetch instruction; the data determining module 502 determines the target data block according to that instruction; the information obtaining module 503 obtains the identification information of the one or more target prefetching devices that have saved the target data block; the location determining module 504 determines the target storage location of the target data block; and the data saving module 505 prefetches the target data block from the target storage location to the prefetching device for use by the host.
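Tying the five modules together, the device's control flow might be sketched as below. Every helper is a placeholder standing in for the corresponding step of FIG. 4; none of this is the patent's implementation:

```python
class FirstPrefetchingDevice:
    def __init__(self):
        self.local_blocks = {}           # blocks already prefetched

    def handle_prefetch(self, instruction):
        """Modules 501-505: receive the instruction, determine the
        blocks, and for each missing block obtain holder IDs, pick a
        location, and prefetch the block locally."""
        blocks = self.determine_blocks(instruction)        # module 502
        for blk in blocks:
            if blk in self.local_blocks:                   # already local
                continue
            holders = self.fetch_holder_ids(blk)           # module 503
            location = self.pick_location(blk, holders)    # module 504
            self.local_blocks[blk] = self.fetch(blk, location)  # 505
        return blocks

    # placeholder step implementations, for illustration only
    def determine_blocks(self, instruction):
        return list(instruction)

    def fetch_holder_ids(self, blk):
        return []

    def pick_location(self, blk, holders):
        return "second storage device"

    def fetch(self, blk, location):
        return (blk, location)

device = FirstPrefetchingDevice()
handled = device.handle_prefetch(["boot-0"])
```

The skeleton makes the division of labor among the modules explicit: each placeholder corresponds to one numbered step of the method.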
In this way, boot image data that would otherwise be kept in the host cache is kept on the prefetching device outside the host, and a VM in the host can obtain its boot image data directly from the prefetching device at startup. Compared with reading the boot image data directly from the storage device, duplicated data in this embodiment needs to be written into the prefetching device only once, reducing the number of data reads and writes and the bandwidth occupied. Compared with keeping the boot image data in the host cache, the boot image data in the method provided by the present invention does not occupy a large amount of host cache, so the problems of a low cache hit ratio or high cache occupancy do not arise; host service processes are accelerated and the service performance of the host is improved.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. The device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data prefetching method and a prefetching device for improving the service performance of a system. The data prefetching method includes: a first prefetching device receives a data prefetch instruction from a first host; determines one or more target data blocks according to the data prefetch instruction; if the target data block is not saved in the first prefetching device, obtains identification information of target prefetching devices from a second prefetching device; determines the target storage location of the target data block according to the identification information; and prefetches the target data block from the target storage location to the first prefetching device. In this way, boot image data that would otherwise be kept in the host cache is kept on a prefetching device outside the host, which reduces the number of data reads and writes and the bandwidth occupied, accelerates host service processes, and improves the service performance of the host.

Description

Data prefetching method and apparatus. Technical Field

The present invention relates to the field of data storage, and in particular to a data prefetching method and apparatus.

Background

The rapid development of cloud computing is underpinned by virtualization technology. Virtualization generally deploys multiple virtual machines (VMs) on a host and uses a hypervisor to allocate the host's resources to the VMs, so that each VM can perform computing functions independently.

When a VM in a host starts, it needs to read the VM's boot image data from a storage device connected to the host. Part of the boot image data read by different VMs at startup is identical. Current techniques therefore usually start one VM first when booting a VM cluster and write that VM's boot image data into the host's cache, so that when the other VMs start they can obtain the duplicated boot image data directly from the local cache and need to read only a small amount of non-duplicated data from the storage device.

In practice, however, different VMs in one host may be of different types, and the boot image data of different VM types differs considerably. When a host runs multiple types of VM, the boot image data held in the cache overlaps little with the boot image data needed by a VM that is about to start. To minimize the data read from the storage device, the boot image data of every VM type must be written into the host's cache. This leads to high cache occupancy and a low cache hit ratio on the host, which in turn slows the host's service processes until performance no longer meets requirements.
Summary

The present invention provides a data prefetching method for improving the service performance of hosts in a cluster system.

A first aspect of the present invention provides a data prefetching method applicable to a cluster system. The cluster system includes multiple prefetching devices; each prefetching device is uniquely connected to one host and to one or more disks, and the prefetching devices are also connected to one another. The invention is described using, as an example, a first prefetching device connected to a first host and a first disk. Before the first host starts a virtual machine, the first prefetching device receives a data prefetch instruction from the first host, the instruction indicating the boot data that the first host needs in order to start the virtual machines in the first host. The first prefetching device determines one or more target data blocks according to the data prefetch instruction. If a target data block is not saved in the first prefetching device, the first prefetching device obtains identification information of target prefetching devices from a second prefetching device, where the second prefetching device is the prefetching device connected to the target storage device that stores the target data block, and a target prefetching device is a prefetching device, among the multiple prefetching devices of the cluster system, that has saved the target data block. If the original storage location of the target data block is the target storage device in the cluster system, then when a target prefetching device obtains the target data block, the second prefetching device connected to the target storage device records that target prefetching device's identification information; the first prefetching device can therefore obtain the identification information of the target prefetching devices from the second prefetching device. The first prefetching device determines the target storage location of the target data block according to the identification information, and prefetches the target data block from the target storage location to the first prefetching device. In this way, boot image data that would otherwise be kept in the host cache is kept on a prefetching device outside the host, and a VM in the host can obtain its boot image data directly from the prefetching device at startup. Compared with reading the boot image data directly from the storage device as in the prior art, duplicated data in this scheme needs to be written into the prefetching device only once, reducing the number of data reads and writes and the bandwidth occupied. Compared with keeping the boot image data in the host cache as in the prior art, the boot image data in the method provided by the present invention does not occupy a large amount of host cache, so the problems of a low cache hit ratio or high cache occupancy on the host do not arise; host service processes are accelerated and the service performance of the host is improved.

Optionally, the first prefetching device may request the address information of the target prefetching devices from the second prefetching device and receive an identification information list returned by the second prefetching device. The list may record the identification information of one or more target prefetching devices. If the list returned by the second prefetching device is empty, no prefetching device has read the target data block from the second storage device, and the target data block is saved only in the second storage device; in that case the first prefetching device determines the second storage device to be the target storage location of the target data block.

Optionally, if the identification information list returned by the second prefetching device is not empty, there are target prefetching devices that have read the target data block from the second storage device, and the target data block is saved not only in the second storage device but also in those target prefetching devices. In that case the first prefetching device may determine the target storage location of the target data block from the identification information recorded in the list. Specifically, the first prefetching device determines, from the identification information of each target prefetching device, the shortest delay among the delays of accessing each target prefetching device, together with the target prefetching device corresponding to that shortest delay. If the shortest delay is smaller than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is the target prefetching device corresponding to the shortest delay; if it is larger, the target storage location is the target storage device. This keeps the delay of obtaining the target data block from the target storage location as short as possible.

Optionally, the first prefetching device may cut the boot image data with aligned cuts according to the data prefetch instruction to obtain one or more target data blocks.

Optionally, during the initial running phase of the cluster system, the first prefetching device may register a virtual storage disk with the hypervisor in the first host, so as to present the connected storage devices to the first host in the form of virtual storage disks. The hypervisor of the first host issues a data prefetch command in the form of a data set management (DSM) command to the virtual storage disk in the first host, and the first prefetching device receives that command.

Optionally, when a virtual machine in the first host starts, the first host issues a data read instruction to the first prefetching device indicating that the target data block is to be read. The first prefetching device sends the locally saved target data block to the first host according to that instruction.
A second aspect of the present invention provides a prefetching device for use as the first prefetching device in a cluster system. The prefetching device includes: an instruction receiving module, configured to receive a data prefetch instruction from the first host before the first host starts a virtual machine, the instruction indicating the boot data that the first host needs in order to start the virtual machines in the first host; a data determining module, configured to determine one or more target data blocks according to the data prefetch instruction; an information obtaining module, configured to obtain, when the target data block is not saved in the first prefetching device, identification information of target prefetching devices from a second prefetching device, where the second prefetching device is the prefetching device connected to the target storage device that stores the target data block, and a target prefetching device is a prefetching device, among the multiple prefetching devices of the cluster system, that has saved the target data block; a location determining module, configured to determine the target storage location of the target data block according to the identification information; and a data saving module, configured to prefetch the target data block from the target storage location to the first prefetching device.

Optionally, the information obtaining module is specifically configured to request the address information of the target prefetching devices from the second prefetching device and to receive the identification information list returned by the second prefetching device; the list may record the identification information of one or more target prefetching devices. The location determining module is specifically configured to determine the target storage device as the target storage location if the identification information list of the target prefetching devices is empty.

Optionally, the location determining module is further configured to: if the identification information list returned by the second prefetching device is not empty, determine the target storage location of the target data block from the identification information recorded in the list. Specifically, it determines, from the identification information of each target prefetching device, the shortest delay among the delays of accessing each target prefetching device, together with the corresponding target prefetching device. If the shortest delay is smaller than the delay of the first prefetching device accessing the target storage device, the target storage location of the target data block is the target prefetching device corresponding to the shortest delay; if it is larger, the target storage location is the target storage device. This keeps the delay of obtaining the target data block from the target storage location as short as possible.

Optionally, the data determining module is specifically configured to cut the boot image data with aligned cuts according to the data prefetch instruction to obtain one or more target data blocks.

Optionally, the instruction receiving module is specifically configured to: during the initial running phase of the cluster system, register a virtual storage disk with the hypervisor in the first host, so as to present the connected storage devices to the first host in the form of virtual storage disks. The hypervisor of the first host issues a data prefetch command in the form of a DSM command to the virtual storage disk in the first host, and the first prefetching device receives that command.

Optionally, when a virtual machine in the first host starts, the first host issues a data read instruction to the first prefetching device indicating that the target data block is to be read. The instruction receiving module is further configured to receive that data read instruction. The data prefetching device may further include a data sending module, configured to send the locally saved target data block to the first host according to the data read instruction.

A third aspect of the present invention provides a computing device including a processor, a memory, a communication interface, and a bus. By invoking program code saved in the memory, the processor performs the data prefetching method provided by the first aspect of the present invention.
Brief Description of the Drawings

FIG. 1 is a schematic architecture diagram of a cluster system in the current technology;

FIG. 2 is a schematic architecture diagram of a cluster system provided by the present invention;

FIG. 3 is a structural diagram of an embodiment of a computing device provided by the present invention;

FIG. 4 is a flowchart of an embodiment of a data prefetching method provided by the present invention;

FIG. 5 is a structural diagram of an embodiment of a prefetching device provided by the present invention.
Detailed Description

The present invention provides a data prefetching method for improving the cache hit ratio of a host in a cluster system when it starts virtual machines. The present invention also provides a related prefetching device. These are described separately below.

The rapid development of cloud computing is underpinned by virtualization technology. The basic architecture of a cluster system under virtualization is shown in FIG. 1. A cluster system contains multiple hosts; each host runs multiple VMs, together with a hypervisor that allocates the host's resources to the VMs so that each VM can perform computing functions independently. Southbound of each host is a storage device for storing data; the storage device may specifically be a magnetic disk or a solid state disk (SSD). When starting a VM, the host needs to read the VM's boot image data from the storage device for the VM to use.

In a cluster system, a large number of VMs are often deployed on one host. When the cluster system starts a large number of VMs, an enormous volume of data read and write operations is generated within a short time. These operations occupy a large amount of network bandwidth, affect services, and may even cause VMs to go down.

Research shows that part of the boot image data read by different VMs at startup is identical. Current techniques therefore usually start one VM first when booting a VM cluster and write that VM's boot image data into the host's cache, so that when the other VMs start they can obtain the duplicated boot image data directly from the local cache and need to read only a small amount of non-duplicated data from the storage device. This eliminates a large number of storage device reads and writes, saving system bandwidth, read/write resources, and VM startup time.

In practice, however, different VMs in one host may be of different types, and the boot image data of different VM types differs considerably. For example, if VM 1 runs Windows while VM 2 runs Linux, the overlap between the boot image data of VM 1 and VM 2 is not high. To still save system bandwidth, read/write resources, and VM startup time, the host has to keep both the Windows boot image and the Linux boot image in its cache. When a host has many types of VM, the amount of boot image data kept in the host cache therefore grows substantially, which causes a series of problems, such as excessive cache occupancy, a low cache hit ratio, and slow host service processes, severely affecting host performance.

To address these problems, this application builds on the current technology and provides a data prefetching method for improving host performance. A prefetching device is added between the host and the storage device, yielding a cluster system different from the current one; its architecture is shown in FIG. 2. As FIG. 2 shows, each prefetching device is connected northbound to a host and southbound to a storage device, and the prefetching devices are connected to one another east-west. The prefetching device prefetches the boot image data (that is, obtains it in advance, before the virtual machines start) to the prefetching device itself, and sends the saved boot image data up to the host when the host starts a VM. The host then does not need to keep the boot image data in its local cache.

The prefetching device in FIG. 2 may be implemented by the computing device 300 in FIG. 3. The organization of computing device 300, shown in FIG. 3, includes a processor 301, a memory 302, a communication interface 303, and a bus 304.

The communication interface 303 is the set of interfaces through which computing device 300 communicates with hosts, storage devices, and other computing devices. For example, communication interface 303 may include a peripheral component interconnect express (PCIE) interface, a non-volatile memory express (NVMe) interface, a serial attached SCSI (SAS) interface, a serial advanced technology attachment (SATA) interface, or other interfaces for connecting to the host; through the PCIE or other interface, computing device 300 receives the host's data prefetch instructions, data read instructions, or other instructions, and sends locally saved target data blocks to the host. Communication interface 303 may also include a disk controller or another interface for connecting to storage devices, through which computing device 300 accesses the storage devices. In addition, communication interface 303 may include a network interface card (NIC) for access to Ethernet, so that multiple computing devices can access one another over Ethernet. Communication interface 303 may also take other forms, which are not limited here.

The memory 302 may include volatile memory, for example random-access memory (RAM); it may also include non-volatile memory, for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or an SSD; the memory 302 may also include a combination of the above kinds of memory. Computing device 300 prefetches target data blocks to itself, and the prefetched target data blocks are saved in memory 302. When the technical solution provided by the present invention is implemented in software, the program code implementing the data prefetching method of FIG. 4 may be saved in memory 302 and executed by processor 301.

The processor 301 may be a central processing unit (CPU), a hardware chip, or a combination of a CPU and a hardware chip. At run time, by invoking the program code in memory 302, processor 301 can perform the following steps: before the first host starts a virtual machine, receive a data prefetch instruction from the first host; determine the target data blocks according to the data prefetch instruction; obtain the identification information of the target prefetching devices from the second prefetching device; determine the target storage location of each target data block according to the identification information; obtain and save the target data block according to its target storage location; and receive a data read instruction and send the target data block to the first host according to it.

The processor 301, memory 302, and communication interface 303 may communicate with one another through the bus 304, or by other means such as wireless transmission.
The present invention further provides a data prefetching method, which the prefetching device in FIG. 2 and the computing device 300 in FIG. 3 perform at run time. The method is described below using only the first prefetching device as an example; its basic flow, shown in FIG. 4, includes:

401. Before the first host starts a virtual machine, receive a data prefetch instruction from the first host.

The first prefetching device receives the data prefetch instruction issued by the first host; the instruction indicates the boot data that the first host needs in order to start the virtual machines in the first host.

Optionally, during the initial running phase of the cluster system, the first prefetching device may register a virtual storage disk with the hypervisor in the first host, so as to present the cluster system's southbound-connected storage devices to the first host in the form of virtual storage disks. The virtual storage disk may specifically take the form of a virtual NVMe disk, a virtual SAS disk, a virtual SATA disk, or another form of virtual disk. The memory of the first prefetching device may hold a mapping table recording the correspondence between the storage devices in the cluster system and the virtual storage disks in the hosts. The VMs and the hypervisor in the first host do not perceive that the virtual storage disk is virtual, and treat it as real physical storage.

The hypervisor manages the VMs in the host and can therefore detect VM startup. Optionally, before a VM in the first host starts, the hypervisor of the first host issues a DSM command to the virtual storage disk in the first host; the DSM command indicates the data the VM in the first host needs to start. The DSM command issued to the virtual storage disk is actually received by the first prefetching device.

402. Determine the target data blocks according to the data prefetch instruction.

The first prefetching device cuts the boot image data to be prefetched into one or more target data blocks according to the data prefetch instruction. Optionally, the first prefetching device may cut the boot image data with cuts aligned to the storage granularity of the cluster system. For example, if the storage granularity of the cluster system is 1 MB and the logical addresses of the boot image data to be prefetched are 2.5 MB to 4.5 MB, the first prefetching device may cut the boot image data into three target data blocks: 2.5 MB to 3 MB, 3 MB to 4 MB, and 4 MB to 4.5 MB. Notably, if the boot image data is cut with alignment to the storage granularity, the data in a single target data block is always stored in the same storage device, while the data in different target data blocks may be stored in different storage devices.

After determining the target data blocks, the first prefetching device performs all of the subsequent steps 403 to 406 of this embodiment for each data block.

After determining a target data block, the first prefetching device judges whether the data of the target data block is already saved locally in the first prefetching device. Optionally, the first prefetching device may look up the storage device where the target data block is located, and the block's logical address within that storage device, according to the globally unique identifier (GUID) of the virtual storage disk corresponding to the target data block, the block's logical address in the virtual storage disk, and the saved mapping table. It then judges whether the target data block is saved locally by searching the local logical address table for the block's logical address in the storage device.

If the target data block is saved locally in the first prefetching device, the data prefetching operations of steps 403 to 405 are unnecessary and step 406 is executed directly.

If the target data block is not saved locally, the first prefetching device needs to fetch the target data block to itself. Steps 403 to 405 below describe in detail how the first prefetching device prefetches the target data block.

403. Obtain the identification information of the target prefetching devices from the second prefetching device.

If the target data block is not saved locally in the first prefetching device, the first prefetching device needs to obtain the identification information of the target prefetching devices.

As mentioned in step 402, the first prefetching device can look up the storage device where the target data block is located. This embodiment is described using only the example of the target data block being saved in a second storage device of the cluster system. Analogously to the connections among the first host, the first prefetching device, and the first storage device, the second storage device is connected northbound to a second prefetching device, and the second prefetching device is connected northbound to a second host. Evidently, the other prefetching devices in the cluster system must go through the second prefetching device when accessing the second storage device. In the present invention, a prefetching device that has saved the target data block is called a target prefetching device. Understandably, since the first prefetching device has not saved the target data block, the target prefetching devices do not include the first prefetching device, but a target prefetching device may be any prefetching device in the storage system other than the first prefetching device (including the second prefetching device). Generally, when a target prefetching device accesses the target data block in the second storage device through the second prefetching device, the second prefetching device records the identification information of the target prefetching device, such as its IP address and device number. The first prefetching device can therefore obtain the identification information of the target prefetching devices from the second prefetching device.

Optionally, to keep any prefetching device from being accessed too frequently, an access threshold may be set for each prefetching device; only a prefetching device that has saved the target data block and has been accessed by other prefetching devices fewer times than its access threshold is considered a target prefetching device.

Optionally, the first prefetching device may request the address information of the target prefetching devices from the second prefetching device and receive the identification information list returned by the second prefetching device. The list may record the identification information of one or more target prefetching devices.

Notably, this embodiment merely uses the second storage device to refer to the storage device that holds the target data block; in practice the second storage device may be the same storage device as the first storage device, in which case the second prefetching device and the first prefetching device are actually also the same prefetching device.
404. Determine the target storage location of the target data block according to the identification information.

After obtaining the identification information, the first prefetching device determines the target storage location of the target data block according to it. The target storage location is one of the one or more locations in the cluster system where the target data block is stored. Many criteria can be used to select the target storage location from among the block's storage locations in the cluster system: for example, the stored copy with the shortest network distance to the first prefetching device, or the copy with the shortest access delay for the first prefetching device, may be determined as the target storage location. The target storage location may also be determined according to other criteria, which are not limited here.

Optionally, if the identification information list returned by the second prefetching device is empty, no prefetching device has read the target data block from the second storage device, and the target data block is saved only in the second storage device. In this case the first prefetching device determines the second storage device to be the target storage location of the target data block.

Optionally, if the identification information list returned by the second prefetching device is not empty, there are target prefetching devices that have read the target data block from the second storage device, and the target data block is saved not only in the second storage device but also in those target prefetching devices. In this case the first prefetching device may determine the target storage location of the target data block from the identification information recorded in the list; see specifically the determination method shown in (1) to (3):

(1) The first prefetching device determines the delay of accessing each target prefetching device, the shortest delay t1 among those delays, and the target prefetching device corresponding to t1.

(2) The first prefetching device determines the delay t2 of accessing the second storage device through the second prefetching device.

(3) If t1 is smaller than t2, the first prefetching device determines the target prefetching device corresponding to t1 to be the target storage location of the target data block;

if t1 is larger than t2, the first prefetching device determines the second storage device to be the target storage location of the target data block;

if t1 equals t2, the first prefetching device may determine either the target prefetching device corresponding to t1 or the second storage device to be the target storage location of the target data block.

The first prefetching device may also determine the target storage location of the target data block by other methods, which are not limited here.

405. Obtain and save the target data block according to its target storage location.

After determining the acquisition path of the target data block, the first prefetching device prefetches the target data block to itself along that path.

Optionally, after step 405, the second prefetching device may record the identification information of the first prefetching device to indicate that the target data block is now saved in the first prefetching device.

The data prefetching method provided by the present invention adds a prefetching device between the host and the storage device, which obtains in advance, according to the host's data prefetch instruction, the boot image data the host needs at startup, for the host's use. In this way, boot image data that would otherwise be kept in the host cache is kept on a prefetching device outside the host, and a VM in the host can obtain its boot image data directly from the prefetching device at startup. Compared with reading the boot image data directly from the storage device as in the prior art, duplicated data in this embodiment needs to be written into the prefetching device only once, reducing the number of data reads and writes and the bandwidth occupied. Compared with keeping the boot image data in the host cache as in the prior art, the boot image data in the method provided by the present invention does not occupy a large amount of host cache, so the problems of a low cache hit ratio or high cache occupancy do not arise; host service processes are accelerated and the service performance of the host is improved.

Optionally, after the prefetching of the boot image data is completed, the method provided by the present invention may further perform step 406:

406. Receive a data read instruction and send the target data block to the first host according to it.

After performing steps 401 to 405, the first prefetching device has prefetched the target data block locally. When a virtual machine in the first host starts, the first host issues a data read instruction to the first prefetching device indicating that the target data block is to be read. The first prefetching device receives the data read instruction and sends the locally saved target data block to the first host according to it.
The embodiment shown in FIG. 4 introduced the data prefetching method provided by the present invention. The prefetching device that implements the method is introduced below; its basic structure, shown in FIG. 5, includes:

an instruction receiving module 501, configured to perform the operations of step 401 in the embodiment shown in FIG. 4;

a data determining module 502, configured to perform the operations of step 402 in the embodiment shown in FIG. 4;

an information obtaining module 503, configured to perform the operations of step 403 in the embodiment shown in FIG. 4;

a location determining module 504, configured to perform the operations of step 404 in the embodiment shown in FIG. 4;

a data saving module 505, configured to perform the operations of step 405 in the embodiment shown in FIG. 4.

The relevant descriptions of the device shown in FIG. 5 can be understood with reference to the corresponding descriptions and effects of the method embodiment shown in FIG. 4, and are not repeated here.

Optionally, the instruction receiving module 501 may further receive a data read instruction issued by the first host, the instruction indicating that the target data block is to be read. The prefetching device shown in FIG. 5 may further include a data sending module 506, configured to send the target data block to the first host after the instruction receiving module 501 receives the data read instruction.

The prefetching device of FIG. 5 sits between the host and the storage device: the instruction receiving module 501 receives the host's data prefetch instruction; the data determining module 502 determines the target data blocks according to that instruction; the information obtaining module 503 obtains the identification information of the one or more target prefetching devices that have saved the target data block; the location determining module 504 determines the target storage location of the target data block; and the data saving module 505 prefetches the target data block from the target storage location to the prefetching device for use by the host. In this way, boot image data that would otherwise be kept in the host cache is kept on a prefetching device outside the host, and a VM in the host can obtain its boot image data directly from the prefetching device at startup. Compared with reading the boot image data directly from the storage device as in the prior art, duplicated data in this embodiment needs to be written into the prefetching device only once, reducing the number of data reads and writes and the bandwidth occupied. Compared with keeping the boot image data in the host cache as in the prior art, the boot image data in the method provided by the present invention does not occupy a large amount of host cache, so the problems of a low cache hit ratio or high cache occupancy do not arise; host service processes are accelerated and the service performance of the host is improved.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

The embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or replace some of their technical features with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

  1. The first prefetching device obtains the data prefetch instruction issued by the first host to the virtual storage disk.
  2. The data prefetching method according to any one of claims 1 to 5, wherein after the saving of the target data block to the first prefetching device according to the acquisition path, the method further comprises:
    the first prefetching device receives a data read instruction, the data read instruction indicating that the target data block is to be read, and the data read instruction being sent by the first host when starting a virtual machine in the first host;
    the first prefetching device sends the target data block saved in the first prefetching device to the first host according to the data read instruction.
  3. A prefetching device, serving as a first prefetching device in a cluster system, the cluster system comprising multiple prefetching devices, each prefetching device being connected to a host, each prefetching device being connected to a storage device, and the prefetching devices being connected to one another, the prefetching device comprising:
    an instruction receiving module, configured to receive a data prefetch instruction from a first host before the first host starts a virtual machine in the first host, the data prefetch instruction indicating the boot image data needed by the first host to start the virtual machine in the first host;
    a data determining module, configured to determine, according to the data prefetch instruction, a target data block to be used by the first host;
    an information obtaining module, configured to obtain identification information of a target prefetching device from a second prefetching device when the target data block is not saved in the first prefetching device, wherein the second prefetching device is connected to a target storage device, the target storage device saves the target data block, and the target prefetching device saves the target data block;
    a location determining module, configured to determine a target storage location of the target data block according to the identification information of the target prefetching device;
    a data saving module, configured to save the target data block to the first prefetching device according to the target storage location.
  4. The prefetching device according to claim 7, wherein the information obtaining module is specifically configured to request the identification information of the target prefetching device from the second prefetching device and to receive an identification information list of the target prefetching device returned by the second prefetching device, the identification information list recording the identification information of one or more target prefetching devices;
    the location determining module is specifically configured to determine the target storage device as the target storage location if the identification information list of the target prefetching device is empty.
  5. The prefetching device according to claim 8, wherein the location determining module is further configured to:
    if the identification information list of the target prefetching device is not empty, determine, according to the identification information of each target prefetching device, the shortest delay among the delays of accessing each target prefetching device, and the target prefetching device corresponding to the shortest delay;
    if the shortest delay is smaller than the delay of the first prefetching device accessing the target storage device through the second prefetching device, determine the target prefetching device corresponding to the shortest delay as the target storage location of the target data block;
    if the shortest delay is larger than the delay of the first prefetching device accessing the target storage device through the second prefetching device, determine the target storage device as the target storage location of the target data block.
  6. The prefetching device according to any one of claims 6 to 9, wherein the data determining module is specifically configured to:
    cut the boot image data according to the data prefetch instruction to obtain one or more of the target data blocks.
  7. The prefetching device according to any one of claims 7 to 10, wherein the instruction receiving module is specifically configured to:
    create a virtual storage disk in the first host;
    obtain the data prefetch instruction issued by the first host to the virtual storage disk.
  8. The prefetching device according to any one of claims 7 to 11, wherein the instruction receiving module is further configured to receive a data read instruction, the data read instruction indicating that the target data block is to be read, and the data read instruction being sent by the first host when starting a virtual machine in the first host;
    the prefetching device further comprises a data sending module, configured to send the target data block saved in the first prefetching device to the first host according to the data read instruction.
PCT/CN2017/074388 2016-03-17 2017-02-22 Data prefetching method and apparatus WO2017157145A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/133,179 US20190037043A1 (en) 2016-03-17 2018-09-17 Data Prefetching Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610153153.9 2016-03-17
CN201610153153.9A CN107203480B (zh) 2016-03-17 2016-03-17 一种数据预取方法以及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/133,179 Continuation US20190037043A1 (en) 2016-03-17 2018-09-17 Data Prefetching Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2017157145A1 true WO2017157145A1 (zh) 2017-09-21

Family

ID=59850734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/074388 WO2017157145A1 (zh) 2016-03-17 2017-02-22 一种数据预取方法以及装置

Country Status (3)

Country Link
US (1) US20190037043A1 (zh)
CN (2) CN112486858A (zh)
WO (1) WO2017157145A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552349B1 (en) * 2018-05-31 2020-02-04 Lightbits Labs Ltd. System and method for dynamic pipelining of direct memory access (DMA) transactions

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308288B (zh) * 2018-09-26 2020-12-08 New H3C Cloud Computing Technology Co., Ltd. Data processing method and apparatus
CN115344197A (zh) * 2019-06-24 2022-11-15 Huawei Technologies Co., Ltd. Data access method, network interface card, and server
CN117348793A (zh) * 2022-06-28 2024-01-05 Huawei Technologies Co., Ltd. Data reading method, data loading apparatus, and communication system
CN114995960A (zh) * 2022-07-19 2022-09-02 Galaxy Kylin Software (Changsha) Co., Ltd. Virtual machine resource pool startup optimization method, system, and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100115206A1 (en) * 2008-11-04 2010-05-06 Gridlron Systems, Inc. Storage device prefetch system using directed graph clusters
CN102148870A (zh) * 2011-03-07 2011-08-10 Inspur (Beijing) Electronic Information Industry Co., Ltd. Cloud storage system and implementation method thereof
CN103098043A (zh) * 2010-09-10 2013-05-08 International Business Machines Corporation On-demand virtual machine image streaming
CN103902469A (zh) * 2012-12-25 2014-07-02 Huawei Technologies Co., Ltd. Data prefetching method and system
CN104933110A (zh) * 2015-06-03 2015-09-23 University of Electronic Science and Technology of China MapReduce-based data prefetching method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8607005B2 (en) * 2006-02-17 2013-12-10 International Business Machines Corporation Monitoring program execution to learn data blocks accessed by software process for facilitating efficient prefetching
JP4909021B2 (ja) * 2006-11-20 2012-04-04 Hitachi, Ltd. Copy control method and storage device
US8555278B2 (en) * 2011-05-02 2013-10-08 Symantec Corporation Method and system for migrating a selected set of virtual machines between volumes
CN102508638B (zh) * 2011-09-27 2014-09-17 Huawei Technologies Co., Ltd. Data prefetching method and apparatus for non-uniform memory access
CN102629941B (zh) * 2012-03-20 2014-12-31 Wuhan Research Institute of Posts and Telecommunications Method for caching virtual machine images in a cloud computing system
US10474691B2 (en) * 2012-05-25 2019-11-12 Dell Products, Lp Micro-staging device and method for micro-staging
US9460024B2 (en) * 2013-03-15 2016-10-04 Vmware, Inc. Latency reduction for direct memory access operations involving address translation
US9547600B2 (en) * 2013-07-30 2017-01-17 Vmware, Inc. Method and system for restoring consumed memory after memory consolidation
CN103559075B (zh) * 2013-10-30 2016-10-05 Huawei Technologies Co., Ltd. Data transmission method, apparatus, and system, and memory apparatus



Also Published As

Publication number Publication date
CN112486858A (zh) 2021-03-12
CN107203480A (zh) 2017-09-26
US20190037043A1 (en) 2019-01-31
CN107203480B (zh) 2020-11-17

Similar Documents

Publication Publication Date Title
US9819739B2 (en) Systems and methods for supporting hot plugging of remote storage devices accessed over a network via NVME controller
US10289555B1 (en) Memory read-ahead using learned memory access patterns
WO2017157145A1 (zh) 一种数据预取方法以及装置
US10691341B2 (en) Method for improving memory system performance in virtual machine systems
JP5592942B2 (ja) 仮想マシンシステムにおけるショートカット入出力
US20160077740A1 (en) Systems and methods for enabling local caching for remote storage devices over a network via nvme controller
US20150317088A1 (en) Systems and methods for nvme controller virtualization to support multiple virtual machines running on a host
EP3608790B1 (en) Modifying nvme physical region page list pointers and data pointers to facilitate routing of pcie memory requests
US9983997B2 (en) Event based pre-fetch caching storage controller
US10657052B2 (en) Information handling system with priority based cache flushing of flash dual in-line memory module pool
WO2014089967A1 (zh) 建立虚拟机共享存储缓存的方法及装置
KR101636878B1 (ko) 가상화 환경에서의 데이터 처리 방법 및 드라이버
JP7227907B2 (ja) バイトアドレス可能メモリとして不揮発性メモリにアクセスする方法及び装置
CN115687193A (zh) 存储模块、包括其的系统以及存储模块的操作方法
US20240086113A1 (en) Synchronous write method and device, storage system and electronic device
WO2015172391A1 (zh) 快速数据读写方法和装置
US11983115B2 (en) System, device and method for accessing device-attached memory
US20230325277A1 (en) Memory controller performing selective and parallel error correction, system including the same and operating method of memory device
CN114860625A (zh) 数据访问方法、装置、设备及可读存储介质
US10795771B2 (en) Information handling system with reduced data loss in block mode
US11809341B2 (en) System, device and method for indirect addressing
CN113704165B (zh) 一种超融合服务器、数据处理方法及装置
CN117130551A (zh) 存储装置及其数据访问方法

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17765687

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17765687

Country of ref document: EP

Kind code of ref document: A1