WO2014094421A1 - Data processing method and virtual machine management platform - Google Patents

Data processing method and virtual machine management platform

Info

Publication number
WO2014094421A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
data
hard disk
storage information
identification information
Prior art date
Application number
PCT/CN2013/079573
Other languages
English (en)
French (fr)
Inventor
任努努
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2014094421A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a data processing method and a virtual machine management platform.
  • Background
  • Virtualization technology is one of the key technologies in the field of cloud computing.
  • Its main principle is to virtualize the physical resources of one physical machine into multiple virtual machines, each of which can run an operating system independently. Each virtual machine can independently perform the functions of a physical machine without interfering with the others, and a virtual machine is used in the same way as a physical machine, so a virtual machine can be regarded as an abstract form of the physical machine.
  • FIG. 1 is a schematic structural diagram of an existing virtualization technology.
  • In FIG. 1, the physical resources on a hardware platform 101, including a central processing unit 102, a memory 103, a hard disk 104, and a network card 105, are managed by a virtual machine management platform 110 and abstracted into a plurality of virtual machines (VMs) 130, where the hard disks 104 can be local or remote.
  • The function of each virtual resource is fully consistent with that of the real physical resource, and a virtual resource is operated inside a virtual machine exactly as on a physical machine.
  • A separate operating system 132 and one or more applications 131 can be installed in each virtual machine.
  • The virtual machine management platform 110 is responsible for abstracting the physical hard disk 104 into separate virtual hard disks 124 for use by different virtual machines 130.
  • On the surface, each virtual machine 130 uses its own separate virtual hard disk 124; in fact, the virtual machines use different spaces in one or more physical hard disks 104.
  • The virtual machine management platform 110 maps requests for a virtual hard disk 124 to requests for different spaces of the physical hard disk 104, so that the virtual machines 130 each have an independent virtual hard disk 124 and do not interfere with each other.
  • IOPS: Input/Output Operations Per Second
  • I/O read/write
  • Embodiments of the present invention provide a data processing method and a virtual machine management platform.
  • One aspect of the present invention provides a data processing method, including:
  • Acquiring the duplicate data block groups having the same data block content between the virtual hard disks includes:
  • The method further includes:
  • Querying the duplicate data record table according to the identification information and, if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, writing the data block to the physical hard disk according to the identification information and deleting the storage information corresponding to the data block from the duplicate data record table.
  • Obtaining the corresponding storage information according to the identification information of the data block to be read, and reading the data block from memory according to the storage information, includes:
  • Querying the duplicate data record table according to the identification information and, if it is determined that the duplicate data record table stores storage information corresponding to the identification information and the storage information indicates that the data block is stored in memory, reading the data block from memory according to the storage information.
  • The method further includes:
  • If the storage information indicates that the data block is stored in the physical hard disk, reading the data block from the physical hard disk according to the identification information, storing the data block into memory, and updating the storage information corresponding to the data block in the duplicate data record table.
  • Another aspect of the present invention provides a virtual machine management platform, including:
  • An obtaining module, configured to scan the data blocks corresponding to the virtual hard disks stored on the physical hard disk, obtain the duplicate data block groups having the same data block content between the virtual hard disks, and store in the duplicate data record table the correspondence between the identification information of each data block in each duplicate data block group and the storage information;
  • A processing module, configured to, when any data block in a duplicate data block group is stored into memory from the physical hard disk, update, according to the memory address, all the storage information in the duplicate data record table for the duplicate data block group in which the data block is located;
  • A reading module, configured to, when a data block is read, obtain the corresponding storage information according to the identification information of the data block to be read, and read the data block from memory according to the storage information.
  • The obtaining module is specifically configured to:
  • The platform further includes a write module, configured to receive a data block write request carrying the identification information;
  • The reading module is specifically configured to:
  • Query the duplicate data record table according to the identification information and, if it is determined that the duplicate data record table stores storage information corresponding to the identification information and the storage information indicates that the data block is stored in memory, read the data block from memory according to the storage information.
  • The reading module is further configured to:
  • If the storage information indicates that the data block is stored in the physical hard disk, read the data block from the physical hard disk according to the identification information, store the data block into memory, and update the storage information corresponding to the data block in the duplicate data record table.
  • The data processing method and the virtual machine management platform provided by the embodiments of the present invention scan the data blocks corresponding to the virtual hard disks stored on the physical hard disk, obtain the duplicate data blocks whose data content is repeated between the virtual hard disks, and store in the duplicate data record table the correspondence between the identification information of all the duplicate data blocks and the storage information.
  • When a duplicate data block is stored into memory from the physical hard disk, all the storage information corresponding to the identification information of that duplicate data block in the duplicate data record table is updated according to the memory address, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read and the data block is read from memory according to the storage information.
  • FIG. 1 is a schematic structural diagram of an existing virtualization technology
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention
  • FIG. 3 is a flowchart of data reading performed by the data processing method of FIG. 2;
  • FIG. 4 is a schematic structural diagram of a virtual machine management platform according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another virtual machine management platform according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
  • Step 100: Scan the data blocks corresponding to the virtual hard disks stored on the physical hard disk, obtain the duplicate data block groups having the same data block content between the virtual hard disks, and store in the duplicate data record table the correspondence between the identification information of each data block in each duplicate data block group and the storage information.
  • The invention is not limited thereto.
  • The following embodiments are described by taking the virtual machine management platform as an example.
  • For the specific execution process of other execution entities, refer to that of the virtual machine management platform.
  • The virtual machine management platform scans the data blocks corresponding to the virtual hard disks stored on the physical hard disk and, according to the data content of each scanned data block, obtains the duplicate data block groups having the same data block content between the virtual hard disks, where each duplicate data block group includes at least two data blocks having the same data content.
  • It should be noted that obtaining the duplicate data block groups by comparing the contents of the data blocks one by one is inefficient. A hash algorithm can therefore be applied to the data content of each data block to obtain a hash value, and the hash values of the data blocks are then compared to obtain the duplicate data block groups whose hash values are the same between the virtual hard disks.
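  • The hash-based grouping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `find_duplicate_groups`, the use of SHA-256, and the representation of identification information as a `(disk number, offset number)` tuple are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(blocks):
    """Group block identifiers by the hash of their content.

    `blocks` maps (virtual_disk_number, block_offset_number) -> bytes.
    Returns {hash_value: [identifiers]} for hashes shared by >= 2 blocks.
    """
    by_hash = defaultdict(list)
    for block_id, content in blocks.items():
        digest = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
        by_hash[digest].append(block_id)
    # Only hashes seen at least twice form a duplicate data block group.
    return {h: sorted(ids) for h, ids in by_hash.items() if len(ids) >= 2}


# Hypothetical sample data: two disks share one block, a third block is unique.
blocks = {(1, "0000"): b"kernel", (2, "0005"): b"kernel", (1, "0001"): b"log"}
groups = find_duplicate_groups(blocks)
```

  • Grouping by hash replaces pairwise content comparison with a single pass over the blocks. Equal hash values make identical content very likely but not certain, so a byte-for-byte check of candidates could be added where that matters.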
  • According to the location on the virtual hard disk of each data block in each duplicate data block group, the virtual machine management platform stores in the duplicate data record table the correspondence between the identification information and the storage information of each data block in each duplicate data block group. The identification information identifies the storage location of the data block in the virtual hard disk and includes the virtual hard disk number and the virtual hard disk data block offset number.
  • The virtual hard disk number and the virtual hard disk data block offset number are operated on according to a preset logical algorithm to obtain the storage location of the data block in the physical hard disk. For example, if the virtual hard disk number of a data block is 2 and its virtual hard disk data block offset number is 0005, combining the virtual hard disk number with the virtual hard disk data block offset number in order gives 20005 as the storage location of the data block in the physical hard disk.
  • The combination algorithm in this example is only illustrative; the specific logical algorithm is chosen by the technician according to actual application needs.
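  • The combination example above amounts to a simple concatenation. The sketch below reproduces only that illustrative algorithm; the function name `physical_location` and the choice to keep the offset number as a zero-padded string are assumptions.

```python
def physical_location(disk_number: int, offset_number: str) -> str:
    """Concatenate the disk number and the zero-padded offset number, in order."""
    return f"{disk_number}{offset_number}"


location = physical_location(2, "0005")  # the example from the text
```

  • Keeping the offset number as a string preserves its leading zeros, which the combined value 20005 depends on.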
  • The storage information indicates whether the data block has been stored into memory from the physical hard disk and, if so, its specific location in memory.
  • The storage information can be expressed in various forms.
  • For example, when the storage information is represented by a memory address, a memory address of 0xFFFFFFFF indicates that the data block has not been stored into memory, while a specific memory address such as 0x11110000 indicates that the data block is stored in memory at address 0x11110000.
  • It is worth noting that when the virtual machine management platform performs the initial scan of the physical hard disk to obtain the duplicate data block groups, none of the data blocks has yet been stored into memory, that is, every memory address is 0xFFFFFFFF.
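  • A minimal sketch of this storage-information convention, assuming the duplicate data record table is a plain dictionary from identification information to memory address. The names `NOT_IN_MEMORY`, `is_in_memory`, and `initial_table` are hypothetical.

```python
NOT_IN_MEMORY = 0xFFFFFFFF  # sentinel from the text: block not yet in memory


def is_in_memory(storage_info: int) -> bool:
    """A block is in memory when its address is not the sentinel value."""
    return storage_info != NOT_IN_MEMORY


def initial_table(groups):
    """Build the record table after the first scan: nothing is in memory yet."""
    return {block_id: NOT_IN_MEMORY for members in groups for block_id in members}


# One hypothetical duplicate data block group with two members.
table = initial_table([[(1, "0000"), (2, "0005")]])
```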
  • Each duplicate data block group is stored in the duplicate data record table, which can be organized in more than one way.
  • Method 1: the duplicate data record table is stored by group, with a group identifier for each duplicate data block group, as in Table 1.
  • Method 2: the duplicate data record table is stored in the order of the virtual hard disks, and includes the correspondence between the identification information of each data block in each duplicate data block group, the storage information, and the content identifier of the data block.
  • The data block content identifier can take many forms, such as characters corresponding to the content of each data block or hash values. In Table 2, the data block content identifier is a hash value and the storage information is a memory address.
  • Step 101: When any data block in a duplicate data block group is stored into memory from the physical hard disk, update, according to the memory address, all the storage information in the duplicate data record table for the duplicate data block group in which the data block is located, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read and the data block is read from memory according to the storage information.
  • The virtual machine management platform compares the number of data blocks in each duplicate data block group with a preset threshold. If the number of data blocks in a duplicate data block group is greater than or equal to the preset threshold, the data blocks of that group have a high repetition frequency and are accessed by a large number of virtual machines. Any one data block in the group is then pre-stored from the physical hard disk into memory, and all the storage information of the group in the duplicate data record table is updated according to the memory address. Taking the storage format shown in Table 1 as an example:
  • The number of data blocks in the duplicate data block group with group identifier 2 in Table 1 is 3. The virtual machine management platform compares this number with the preset threshold of 3 and, because the threshold is reached, stores a data block of the group from the physical hard disk into memory at memory address 0x00001111. It then updates all the storage information of that group in the duplicate data record table from 0xFFFFFFFF to 0x00001111 according to the memory address. After all the duplicate data block groups are processed in this way, the resulting duplicate data record table is as shown in Table 3.
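  • The threshold-based pre-storing can be sketched as follows. The function `preload_hot_groups`, the callback `load_into_memory`, and the block identifiers in the sample data are all hypothetical; only the threshold of 3, the group identifier 2, and the addresses 0xFFFFFFFF and 0x00001111 come from the example in the text.

```python
NOT_IN_MEMORY = 0xFFFFFFFF


def preload_hot_groups(groups, table, threshold, load_into_memory):
    """Pre-store one block of every group with at least `threshold` members."""
    for members in groups.values():
        if len(members) >= threshold:
            address = load_into_memory(members[0])  # any member will do
            for block_id in members:                # update the whole group
                table[block_id] = address


# Group 1 has two members, group 2 has three (made-up identifiers).
groups = {1: [(1, "0000"), (2, "0005")],
          2: [(1, "0003"), (2, "0007"), (3, "0001")]}
table = {b: NOT_IN_MEMORY for members in groups.values() for b in members}
preload_hot_groups(groups, table, threshold=3,
                   load_into_memory=lambda block_id: 0x00001111)
```

  • Only one copy of the content is loaded, yet every member of the group ends up pointing at the same memory address, which is what lets later reads from any virtual hard disk share it.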
  • The virtual machine management platform receives a data block read request that is sent by a virtual machine and carries identification information, and queries the duplicate data record table according to the identification information of the data block. If the identification information of the data block is stored in the duplicate data record table, the data block is determined to belong to a duplicate data block group. The data block is stored from the physical hard disk into memory according to the identification information and sent to the corresponding virtual machine, and all the storage information in the duplicate data record table for the duplicate data block group in which the data block is located is updated according to the memory address. Take the storage format of Table 2 as an example.
  • The virtual machine management platform receives from a virtual machine a data block read request whose identification information is virtual hard disk number 1 and virtual hard disk data block offset number 0000. It queries the duplicate data record table according to the identification information and determines that the data block belongs to a duplicate data block group. It therefore reads the data block from the physical hard disk according to virtual hard disk number 1 and virtual hard disk data block offset number 0000, stores it at memory address 0x0000AAAA, and then reads it from memory location 0x0000AAAA and sends it to the corresponding virtual machine. It then updates, according to the memory address, all the storage information in the duplicate data record table for the duplicate data block group in which the data block is located.
  • In the duplicate data record table, the data block with the same hash value has the identification information virtual hard disk number 2 and virtual hard disk data block offset number 0005, so the two data blocks form one duplicate data block group. Therefore, according to memory address 0x0000AAAA, the memory address corresponding to the data block with virtual hard disk number 1 and offset number 0000 and the memory address corresponding to the data block with virtual hard disk number 2 and offset number 0005 are both changed from 0xFFFFFFFF to 0x0000AAAA.
  • The duplicate data record table established after the data block is processed according to the data block read request is as shown in Table 4.
  • Because the storage information directly identifies the storage location in memory of the data blocks in a duplicate data block group (as in the duplicate data record tables shown in Table 3 or Table 4), when a virtual machine needs to read a data block, the virtual machine management platform looks up the duplicate data record table according to the identification information of the data block to obtain the corresponding storage information, reads the data content of the required data block directly from memory according to the storage information, and returns it to the corresponding virtual machine, without having to read the data content from the physical hard disk.
  • The data processing method provided in this embodiment scans the data blocks corresponding to the virtual hard disks stored on the physical hard disk, obtains the duplicate data blocks whose data content is repeated between the virtual hard disks, and stores in the duplicate data record table the correspondence between the identification information of all the duplicate data blocks and the storage information.
  • When a duplicate data block is stored into memory from the physical hard disk, the storage information corresponding to all the identification information of that duplicate data block in the duplicate data record table is updated according to the memory address, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read and the data block is read from memory according to the storage information.
  • In this way, repeated access to the physical hard disk is reduced, the read response speed for the virtual machines is increased, and the service life of the hard disk is prolonged.
  • When the virtual machine management platform receives a data block write request that is sent by a virtual machine and carries identification information, it queries the duplicate data record table according to the identification information. If it is determined that the duplicate data record table stores storage information corresponding to the identification information, the data block is written to the physical hard disk according to the identification information and the storage information corresponding to the data block is deleted from the duplicate data record table. The deleted record serves as a modification record, indicating that the data block corresponding to the identification information has been overwritten by the newly written data block and no longer has the content of the original data block.
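  • A minimal sketch of this write path, assuming the physical hard disk and the duplicate data record table are both modelled as dictionaries keyed by identification information. The function name `write_block` and the sample data are hypothetical, and the handling of blocks that have no record (a plain write) is an assumption, since the text only describes the case where a record exists.

```python
def write_block(block_id, content, table, disk):
    """Write to disk and drop the block's record, since its content changed.

    Returns True when a record existed and was deleted.
    """
    disk[block_id] = content
    return table.pop(block_id, None) is not None


disk = {(1, "0000"): b"old", (2, "0005"): b"old"}
table = {(1, "0000"): 0x0000AAAA, (2, "0005"): 0x0000AAAA}
was_duplicate = write_block((1, "0000"), b"new", table, disk)
```

  • Deleting the record means later reads of the overwritten block bypass the shared in-memory copy, while the other member of the group keeps its record.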
  • The virtual machine management platform updates the duplicate data record table according to a preset trigger condition. The trigger condition is met when the number of data blocks written to the physical hard disk is greater than or equal to a preset first threshold, when the physical machine is in an idle state (the CPU usage and memory usage of the physical machine stay less than or equal to a preset second threshold for a period of time), or when storage information in the duplicate data record table is deleted.
  • The specific update process is as follows: the virtual machine management platform obtains all data blocks newly written between the last update and the current update, calculates the hash value of each of these data blocks, and compares the new hash values with the existing hash values.
  • FIG. 3 is a flowchart of data reading performed by the data processing method of FIG. 2. As shown in FIG. 3, the method includes:
  • Step 200: Receive a data block read request that carries identification information, and query the duplicate data record table according to the identification information.
  • The virtual machine management platform receives a data block read request that is sent by a virtual machine and carries identification information, and queries the duplicate data record table according to the identification information of the data block to be read.
  • Step 201: Determine whether storage information corresponding to the identification information is stored in the duplicate data record table. If so, perform step 203; otherwise, perform step 202.
  • The virtual machine management platform determines whether storage information corresponding to the identification information of the data block to be read is stored in the duplicate data record table. If it is, the data block is determined to be a data block in a duplicate data block group and step 203 is performed; if it is not, the data block is determined not to be a data block in a duplicate data block group and step 202 is performed.
  • Step 202: Read the data block from the physical hard disk according to the identification information.
  • Having determined that the data block to be read is not a data block in a duplicate data block group, the virtual machine management platform obtains the storage location of the data block on the physical hard disk according to the virtual hard disk number and the virtual hard disk data block offset number in the identification information, reads the data block from the physical hard disk according to the storage location, and sends it to the corresponding virtual machine.
  • Step 203: Determine, according to the storage information, whether the data block is stored in memory. If so, perform step 204; otherwise, perform step 205.
  • Whether the data block is stored in memory is determined according to the storage information that the duplicate data record table holds for the identification information of the data block to be read. If the data block has been stored into memory from the physical hard disk, step 204 is performed; if it has not, step 205 is performed.
  • Step 204: Read the data block from memory according to the storage information.
  • Step 205: Read the data block from the physical hard disk according to the identification information, store the data block into memory, read the data block from memory, and update the storage information corresponding to the data block in the duplicate data record table.
  • The virtual machine management platform obtains the storage location of the data block on the physical hard disk according to the virtual hard disk number and the virtual hard disk data block offset number in the identification information, reads the data block from the physical hard disk into memory according to the storage location, sends the data block to the corresponding virtual machine, and then updates the storage information corresponding to the identification information of the data block in the duplicate data record table according to the memory address at which the data block is stored.
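  • Steps 200 to 205 can be sketched as a single function. This is an illustration under several assumptions: the disk and memory are dictionaries, each table entry holds a hash value and a memory address, and `allocate` is a hypothetical callback that returns a free memory address. In step 205 the sketch updates every entry with the same hash, following the group-wide update described for step 101.

```python
NOT_IN_MEMORY = 0xFFFFFFFF


def read_block(block_id, table, disk, memory, allocate):
    entry = table.get(block_id)                 # steps 200-201: query the table
    if entry is None:
        return disk[block_id]                   # step 202: not a duplicate block
    if entry["address"] != NOT_IN_MEMORY:       # step 203: already in memory?
        return memory[entry["address"]]         # step 204: serve from memory
    address = allocate()                        # step 205: load from disk
    memory[address] = disk[block_id]
    for other in table.values():                # update the whole group
        if other["hash"] == entry["hash"]:
            other["address"] = address
    return memory[address]


disk = {(1, "0000"): b"A", (2, "0005"): b"A", (1, "0001"): b"B"}
table = {(1, "0000"): {"hash": "hA", "address": NOT_IN_MEMORY},
         (2, "0005"): {"hash": "hA", "address": NOT_IN_MEMORY}}
memory = {}
first = read_block((1, "0000"), table, disk, memory, lambda: 0x0000AAAA)
del disk[(2, "0005")]  # remove the disk copy so the next read must use memory
second = read_block((2, "0005"), table, disk, memory, lambda: 0x0000AAAA)
unique = read_block((1, "0001"), table, disk, memory, lambda: 0x0000AAAA)
```

  • The second read succeeds even though its disk copy was removed, because the first read already placed the shared content in memory and updated both records.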
  • When a virtual machine reads a data block, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read directly from memory according to the storage information rather than from the physical hard disk. As a result, when multiple virtual machines access the same data content located at different locations on the physical hard disk, repeated access to the physical hard disk is reduced, the read response speed for the virtual machines is improved, and the service life of the hard disk is extended.
  • The foregoing program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments.
  • The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 4 is a schematic structural diagram of a virtual machine management platform according to an embodiment of the present invention.
  • The virtual machine management platform includes an obtaining module 11, a processing module 12, and a reading module 13. The obtaining module 11 is configured to scan the data blocks corresponding to the virtual hard disks stored on the physical hard disk, obtain the duplicate data block groups having the same data block content between the virtual hard disks, and store in the duplicate data record table the correspondence between the identification information of each data block in each duplicate data block group and the storage information.
  • The processing module 12 is configured to, when any data block in a duplicate data block group is stored into memory from the physical hard disk, update, according to the memory address, all the storage information in the duplicate data record table for the duplicate data block group in which the data block is located.
  • The reading module 13 is configured to, when a data block is read, obtain the corresponding storage information according to the identification information of the data block to be read, and read the data block from memory according to the storage information.
  • The obtaining module 11 is specifically configured to: apply a hash algorithm to the data content of each data block to obtain a hash value, compare the hash values of the data blocks, and obtain the duplicate data block groups whose hash values are the same between the virtual hard disks.
  • FIG. 5 is a schematic structural diagram of another virtual machine management platform according to an embodiment of the present invention.
  • The virtual machine management platform further includes a write module 14, configured to receive a data block write request carrying identification information, query the duplicate data record table according to the identification information and, if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
  • The processing module 12 is further configured to update the duplicate data record table according to a preset trigger condition. The trigger condition is met when the number of data blocks written to the physical hard disk is greater than or equal to a preset first threshold, when the physical machine is in an idle state (the CPU usage and memory usage of the physical machine stay less than or equal to a preset second threshold for a period of time), or when storage information in the duplicate data record table is deleted.
  • The specific update process is as follows: the virtual machine management platform obtains all data blocks newly written between the last update and the current update, calculates the hash value of each of these data blocks, and compares the new hash values with the existing hash values.
  • If a new data block is found to belong to an existing duplicate data block group in the duplicate data record table, the correspondence between the identification information of the new data block and the storage information is added to that group. If new data blocks are found to form a new duplicate data block group with the same data content, the correspondence between the identification information and the storage information of each data block in the new group is added to the duplicate data record table. If the number of data blocks in an existing duplicate data block group in the duplicate data record table is found to be one, the correspondence between the identification information of that data block and the storage information is deleted from the duplicate data record table.
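  • The three update rules above can be sketched as follows. The function `refresh_table`, the entry layout (hash value plus memory address), SHA-256, and the sample data are assumptions. The sketch also assumes that overwritten blocks were already removed from the table by the write path, and that a new member of a group inherits the group's current memory address; the text does not state either point explicitly.

```python
import hashlib

NOT_IN_MEMORY = 0xFFFFFFFF


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption


def refresh_table(table, new_blocks):
    """Apply the three update rules to `table` in place."""
    groups = {}                                   # hash -> ids already recorded
    for block_id, entry in table.items():
        groups.setdefault(entry["hash"], []).append(block_id)
    fresh = {}                                    # hash -> newly written ids
    for block_id, content in new_blocks.items():
        fresh.setdefault(digest(content), []).append(block_id)
    for h, ids in fresh.items():
        if h in groups:                           # rule 1: join an existing group
            address = table[groups[h][0]]["address"]
        elif len(ids) >= 2:                       # rule 2: new blocks form a group
            address = NOT_IN_MEMORY
        else:
            continue                              # a single new block: no record
        for block_id in ids:
            table[block_id] = {"hash": h, "address": address}
        groups.setdefault(h, []).extend(ids)
    for ids in groups.values():                   # rule 3: drop one-member groups
        if len(ids) == 1:
            del table[ids[0]]


table = {(1, "0000"): {"hash": digest(b"A"), "address": 0x0000AAAA},
         (2, "0005"): {"hash": digest(b"A"), "address": 0x0000AAAA},
         (1, "0009"): {"hash": digest(b"B"), "address": NOT_IN_MEMORY}}
new_blocks = {(3, "0002"): b"A", (3, "0003"): b"C",
              (4, "0004"): b"C", (5, "0005"): b"D"}
refresh_table(table, new_blocks)
```

  • New blocks are compared only against hashes already in the table and against each other; a new block that matches an old block which was never a duplicate would need a wider rescan, which the text does not describe.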
  • The reading module 13 is specifically configured to: receive a data block read request carrying identification information, query the duplicate data record table according to the identification information and, if it is determined that the duplicate data record table stores storage information corresponding to the identification information and the storage information indicates that the data block is stored in memory, read the data block from memory according to the storage information.
  • The reading module 13 is further configured to: if the storage information indicates that the data block is stored in the physical hard disk, read the data block from the physical hard disk according to the identification information, store the data block into memory, and update the storage information corresponding to the data block in the duplicate data record table.
  • The virtual machine management platform 300 includes: a processor 301, a memory 302, a communication interface 303, and a bus 304.
  • The processor 301, the memory 302, and the communication interface 303 are connected by the bus 304.
  • The bus 304 can be an ISA bus, a PCI bus, or an EISA bus.
  • The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 6, but this does not mean that there is only one bus or one type of bus.
  • The memory 302 is used to store program code, and the program code includes computer operation instructions.
  • The memory 302 may be a high-speed random access memory or a non-volatile memory, for example, at least one disk storage device.
  • The processor 301 executes the program code for:
  • The process in which the processor 301 obtains the duplicate data block groups whose data block content is identical across the virtual hard disks includes:
  • The processor 301 is further configured to:
  • Receive a data block write request carrying identification information; query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
  • The process in which the processor 301 obtains the corresponding storage information according to the identification information of the data block to be read, and reads the data block from the memory according to the storage information, specifically includes:
  • Querying the duplicate data record table according to the identification information, and if it is determined that the duplicate data record table stores storage information corresponding to the identification information and the storage information indicates that the data block is stored in the memory, reading the data block from the memory according to the storage information.
  • The process in which the processor 301 obtains the corresponding storage information according to the identification information of the data block to be read, and reads the data block from the memory according to the storage information, further includes:
  • If the storage information indicates that the data block is stored on the physical hard disk, reading the data block from the physical hard disk according to the identification information, storing the data block into the memory, and updating the storage information corresponding to the data block in the duplicate data record table.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

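The present invention provides a data processing method and a virtual machine management platform. Data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks are scanned to obtain the duplicate data blocks whose data content is repeated across the virtual hard disks, and the correspondence between the identification information and the storage information of all duplicate data blocks is stored in a duplicate data record table. When any duplicate data block is stored from the physical hard disk into memory, the storage information corresponding to all identification information of that duplicate data block in the duplicate data record table is updated according to the memory address, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from memory according to the storage information. As a result, when multiple virtual machines access identical data content located at different positions on the physical hard disk, repeated access to the physical hard disk is reduced, the response speed to virtual machine read requests is improved, and the service life of the hard disk is extended.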

Description

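Data Processing Method and Virtual Machine Management Platform
Technical Field
Embodiments of the present invention relate to the field of communications technologies, and in particular, to a data processing method and a virtual machine management platform.
Background
Virtualization technology is one of the key technologies in the field of cloud computing. Its main principle is to virtualize the physical resources of one physical machine into multiple virtual machines, each of which can run an operating system independently. Each virtual machine can independently implement the functions of a physical machine without interfering with the others, and using a virtual machine is the same as using a physical machine. A virtual machine can therefore be regarded as an abstract form of a physical machine.
Figure 1 is a schematic structural diagram of existing virtualization technology. As shown in Figure 1, the physical resources on a hardware platform 101 (including a central processing unit 102, a memory 103, a hard disk 104, and a network card 105) are abstracted by a virtual machine management platform 110 into multiple virtual machines (Virtual Machine, VM) 130, where the hard disk 104 may be local or remote. A virtual resource functions exactly like the real physical resource, and a user operates on virtual resources in a virtual machine exactly as on a physical machine. An independent operating system 132 and one or more application programs 131 can be installed in each virtual machine. The virtual machine management platform 110 is responsible for abstracting the physical hard disk 104 into independent virtual hard disks 124 for use by the different virtual machines 130. On the surface, each virtual machine 130 uses its own independent virtual hard disk 124, but in fact the virtual machines use different spaces on one or more physical hard disks 104. When a virtual machine 130 accesses its corresponding virtual hard disk 124, the virtual machine management platform 110 maps the request for the virtual hard disk 124 to a request for the corresponding space on the physical hard disk 104, so that the different virtual machines 130 each have an independent virtual hard disk 124 and do not interfere with one another.
In virtualization technology, multiple virtual machines share the same physical hard disk. This means that at any given time one, several, or even dozens of virtual machines may be accessing their own virtual hard disks, which on the physical side means accessing the same physical hard disk. A physical hard disk has several technical specifications, the most important of which is Input/Output Operations Per Second (IOPS), that is, the number of read/write (I/O) operations performed per second. The larger this value, the faster the hard disk responds to requests and the lower the latency. For this specification, read operations affect virtual machine performance more than write operations do. At the current stage of hard disk development, the IOPS that a hard disk can provide is limited. When a physical hard disk is accessed by one or a few virtual machines at the same time, there is no significant performance bottleneck and users perceive no obvious delay. However, when dozens of users access the physical hard disk at the same time, obvious delays occur, which greatly affects virtual machine performance. For example, when a large number of users boot their machines at the same time in the morning, the hard disk must be accessed simultaneously to obtain system data, and boot speed drops noticeably; this phenomenon is called a "boot storm". Similarly, when a large number of users run antivirus scans, obvious delays also occur, which can be called an "antivirus storm".
Summary
In view of the foregoing defects of the prior art, embodiments of the present invention provide a data processing method and a virtual machine management platform.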
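According to a first aspect, the present invention provides a data processing method, including:
scanning the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtaining the duplicate data block groups whose data block content is identical across the virtual hard disks, and storing, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group; and
when any data block in a duplicate data block group is stored from the physical hard disk into memory, updating, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from the memory according to the storage information.
In a first possible implementation, the obtaining of the duplicate data block groups whose data block content is identical across the virtual hard disks includes:
computing a hash value from the data content of each data block by using a hash algorithm; and
comparing the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the method further includes:
receiving a data block write request carrying identification information; and
querying the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, writing the data block to the physical hard disk according to the identification information and deleting the storage information corresponding to the data block from the duplicate data record table.
With reference to the first aspect or the first possible implementation of the first aspect, in a third possible implementation, the obtaining of the corresponding storage information according to the identification information of the data block to be read, and the reading of the data block from the memory according to the storage information, include:
receiving a data block read request carrying identification information; and
querying the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, reading the data block from the memory according to the storage information.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the method further includes:
if the storage information indicates that the data block is stored on the physical hard disk, reading the data block from the physical hard disk according to the identification information, storing the data block into the memory, and updating the storage information corresponding to the data block in the duplicate data record table.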
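According to a second aspect, the present invention provides a virtual machine management platform, including:
an obtaining module, configured to scan the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtain the duplicate data block groups whose data block content is identical across the virtual hard disks, and store, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group;
a processing module, configured to: when any data block in a duplicate data block group is stored from the physical hard disk into memory, update, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table; and
a reading module, configured to: when a data block is read, obtain the corresponding storage information according to the identification information of the data block to be read, and read the data block from the memory according to the storage information.
In a first possible implementation, the obtaining module is specifically configured to:
compute a hash value from the data content of each data block by using a hash algorithm; and
compare the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the platform further includes a writing module, configured to:
receive a data block write request carrying identification information; and
query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
With reference to the second aspect or the first possible implementation of the second aspect, in a third possible implementation, the reading module is specifically configured to:
receive a data block read request carrying identification information; and
query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, read the data block from the memory according to the storage information.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the reading module is further configured to:
if the storage information indicates that the data block is stored on the physical hard disk, read the data block from the physical hard disk according to the identification information, store the data block into the memory, and update the storage information corresponding to the data block in the duplicate data record table.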
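In the data processing method and the virtual machine management platform provided by the embodiments of the present invention, the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks are scanned to obtain the duplicate data blocks whose data content is repeated across the virtual hard disks, and the correspondence between the identification information and the storage information of all duplicate data blocks is stored in a duplicate data record table. When any duplicate data block is stored from the physical hard disk into memory, the storage information corresponding to all identification information related to that duplicate data block in the duplicate data record table is updated according to the memory address, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from memory according to the storage information. As a result, when multiple virtual machines access identical data content located at different positions on the physical hard disk, repeated access to the physical hard disk is reduced, the response speed to virtual machine read requests is improved, and the service life of the hard disk is extended.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of existing virtualization technology;
Figure 2 is a flowchart of a data processing method according to an embodiment of the present invention;
Figure 3 is a flowchart of data reading performed for the data processing method of Figure 2;
Figure 4 is a schematic structural diagram of a virtual machine management platform according to an embodiment of the present invention;
Figure 5 is a schematic structural diagram of another virtual machine management platform according to an embodiment of the present invention.
Description of Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.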
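Figure 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in Figure 2, the method includes:
Step 100: scan the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtain the duplicate data block groups whose data block content is identical across the virtual hard disks, and store, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group.
The entity that executes this method can be selected according to the specifically deployed software and hardware resources and user requirements, for example, a virtual machine management platform, a processor on the hardware platform, or a data processing apparatus located on the physical machine; the present invention does not limit this. To describe the implementation process more clearly, the following embodiments all use the virtual machine management platform as an example, and the execution process of any other executing entity can be understood by reference to the virtual machine management platform. After the physical machine starts, the virtual machine management platform scans the data blocks that are stored on the physical hard disk and correspond to the individual virtual hard disks, and, based on the data content of each scanned data block, obtains the duplicate data block groups whose data block content is identical across the virtual hard disks, where each duplicate data block group includes at least two data blocks with identical data content. It should be noted that obtaining the duplicate data block groups by comparing the content of the data blocks one by one is inefficient. Therefore, a hash algorithm can be used to compute a hash value from the data content of each data block, and the hash values of the data blocks are then compared to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.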
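The hash-based grouping in step 100 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text only says "a hash algorithm", so SHA-256 is an arbitrary choice here, and the function name `find_duplicate_groups` and the in-memory `blocks` dictionary keyed by (virtual hard disk number, block offset) are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(blocks):
    """Group block identifiers by the hash of their content.

    blocks: dict mapping (virtual_disk_no, block_offset) -> bytes content.
    Returns a dict mapping hash value -> sorted list of block identifiers,
    keeping only groups with at least two blocks, as the text requires.
    """
    by_hash = defaultdict(list)
    for block_id, content in blocks.items():
        # SHA-256 is an assumption; the source does not name the algorithm.
        by_hash[hashlib.sha256(content).hexdigest()].append(block_id)
    return {h: sorted(ids) for h, ids in by_hash.items() if len(ids) >= 2}
```

Comparing fixed-size hash values avoids the pairwise content comparison that the text describes as inefficient; a production system would also need to consider hash collisions, which the source does not discuss.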
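Based on the position of each data block of each duplicate data block group in its virtual hard disk, the virtual machine management platform stores, in the duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group. The identification information identifies the storage position of a data block in the virtual hard disk, and specifically includes a virtual hard disk number and a virtual hard disk data block offset number. The storage position of the data block on the physical hard disk can be obtained by applying a preset logical algorithm to the virtual hard disk number and the virtual hard disk data block offset number. For example, if a data block is located on the virtual hard disk numbered 2 and its virtual hard disk data block offset number is 0005, concatenating the virtual hard disk number and the offset number in order gives 20005 as the storage position of the data block on the physical hard disk. The concatenation in this example is only for illustration, and the specific logical algorithm can be adjusted by a skilled person according to actual application needs. The storage information indicates whether a data block has been stored from the physical hard disk into memory and, if so, its specific position in memory. The storage information can take many forms. For example, if the storage information is represented by a memory address, a memory address of 0xFFFFFFFF indicates that the data block has not yet been stored from the physical hard disk into memory, whereas a memory address other than 0xFFFFFFFF, such as 0x11110000, indicates that the data block has been stored from the physical hard disk into memory at address 0x11110000. Note that when the virtual machine management platform performs the initial scan of the physical hard disk to obtain the duplicate data block groups, no data block has been stored into memory yet, so all memory addresses are 0xFFFFFFFF.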
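The identification and storage information described above can be modeled roughly as follows. This sketch assumes the illustrative concatenation rule from the text (disk 2, offset 0005 gives 20005) and the 0xFFFFFFFF marker for "not yet in memory"; the names `NOT_IN_MEMORY`, `physical_location`, and `is_in_memory`, and the four-digit offset width, are assumptions made for the example.

```python
NOT_IN_MEMORY = 0xFFFFFFFF  # marker value used in the text for "still on disk only"


def physical_location(disk_no, block_offset):
    """Concatenate disk number and 4-digit offset, as in the text's example.

    The source stresses that this rule is only illustrative; a real
    platform would use its own mapping algorithm.
    """
    return int(f"{disk_no}{block_offset:04d}")


def is_in_memory(memory_address):
    """A block is cached exactly when its address differs from the marker."""
    return memory_address != NOT_IN_MEMORY


# After the initial scan every entry starts out with the marker address.
record_table = {(1, 0): NOT_IN_MEMORY, (2, 5): NOT_IN_MEMORY}
```

Using a reserved address as the marker keeps the storage information to a single field, which matches the form the text describes.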
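It should be noted that when the virtual machine management platform performs the initial scan of the physical hard disk to obtain the duplicate data block groups, no data block has been stored into memory yet. At this point, the correspondence between the identification information and the storage information of each data block in each duplicate data block group can be stored in the duplicate data record table in many specific forms, and a person skilled in the art can choose among them according to the specific application requirements. Details are as follows.
Method 1: the duplicate data record table is stored by group and includes a group identifier and the correspondence between the identification information and the storage information of each data block in each duplicate data block group. The storage information in Table 1 is represented by a memory address, as shown in Table 1.
[Table 1: duplicate data record table stored by group. The table was an image in the original and is not reproduced here. The rows that survive extraction appear to read: group N, virtual hard disk number 1, data block offset 0032, memory address 0xFFFFFFFF; and virtual hard disk number 5, data block offset 0003, memory address 0xFFFFFFFF.]
Method 2: the duplicate data record table is stored in virtual hard disk order and includes the correspondence among the identification information, the storage information, and a data block content identifier of each data block in each duplicate data block group. The data block content identifier can take many forms, such as a character string or a hash value corresponding to the content of each data block. In Table 2, the data block content identifier is represented by a hash value and the storage information is represented by a memory address, as shown in Table 2.
[Table 2: duplicate data record table stored in virtual hard disk order, with a hash value as the data block content identifier. The table was an image in the original and is not reproduced here.]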
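Step 101: when any data block in a duplicate data block group is stored from the physical hard disk into memory, update, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from the memory according to the storage information.
When the virtual machine management platform learns that any data block in a duplicate data block group has been stored from the physical hard disk into memory, it updates, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table. Specifically, a data block of a duplicate data block group may be stored from the physical hard disk into memory in many situations. Two application scenarios are used as examples:
In the first application scenario, the virtual machine management platform compares the number of data blocks in each duplicate data block group with a preset threshold. If the number of data blocks in a duplicate data block group is greater than or equal to the preset threshold, the data blocks of that group are repeated frequently and will be accessed by a large number of virtual machines. Therefore, any one data block of the group is stored from the physical hard disk into memory in advance, and all storage information of the group to which the data block belongs in the duplicate data record table is updated according to the memory address. Taking the storage form of Table 1 as an example: the duplicate data block group with group identifier 2 in Table 1 contains 3 data blocks. After comparing this number with the preset threshold of 3, the virtual machine management platform stores any one data block of the group from the physical hard disk into memory at memory address 0x00001111, and then updates all storage information of the group in the duplicate data record table according to that memory address; that is, all memory addresses of the group in the duplicate data record table are changed from 0xFFFFFFFF to 0x00001111. The duplicate data record table built after all duplicate data block groups are processed in this way is shown in Table 3.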
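[Table 3: duplicate data record table after preloading in the first application scenario. The table was an image in the original and is not reproduced here.]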
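The threshold-based preloading of the first application scenario can be sketched as follows. The layout follows Method 1 (a table keyed by group identifier); the function name `preload_hot_groups`, the `load_into_memory` callback, and the block identifiers used in the usage note are hypothetical, and skipping groups that are already in memory is an added assumption.

```python
NOT_IN_MEMORY = 0xFFFFFFFF


def preload_hot_groups(table, threshold, load_into_memory):
    """Preload one block of every group whose size reaches the threshold.

    table: dict group_id -> dict of block_id -> memory address.
    load_into_memory: hypothetical callback that copies one block from the
    physical disk into memory and returns the memory address it used.
    """
    for entries in table.values():
        already_loaded = any(a != NOT_IN_MEMORY for a in entries.values())
        if len(entries) >= threshold and not already_loaded:
            any_block = next(iter(entries))      # any one block of the group
            address = load_into_memory(any_block)
            for block_id in entries:             # update the whole group
                entries[block_id] = address
    return table
```

With a threshold of 3 and a loader that returns 0x00001111, a three-block group ends up with 0x00001111 on every entry, mirroring the example in the text, while smaller groups keep 0xFFFFFFFF.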
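In the second application scenario, the virtual machine management platform receives a data block read request that is sent by a virtual machine and carries identification information, and queries the duplicate data record table according to the identification information of the data block. If the identification information of the data block is stored in the duplicate data record table, the data block is determined to belong to a duplicate data block group. The platform then stores the data block from the physical hard disk into memory according to the identification information, sends the data block to the corresponding virtual machine, and updates, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table. Taking the storage form of Table 2 as an example: the virtual machine management platform receives a read request from a virtual machine for the data block whose identification information is virtual hard disk number 1 and virtual hard disk data block offset number 0000. By querying the duplicate data record table according to the identification information, the platform determines that the data block belongs to a duplicate data block group. It therefore reads the data block from the physical hard disk according to virtual hard disk number 1 and data block offset number 0000, stores it at memory address 0x0000AAAA, reads it from memory address 0x0000AAAA, and sends it to the corresponding virtual machine. It then updates, according to that memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table. The hash value of the data block is ABC123, and the data block in the duplicate data record table with the same hash value has the identification information virtual hard disk number 2 and data block offset number 0005, so these two data blocks form one duplicate data block group. Accordingly, based on memory address 0x0000AAAA, the memory address of the data block identified by virtual hard disk number 1 and offset number 0000 and the memory address of the data block identified by virtual hard disk number 2 and offset number 0005 are both changed from 0xFFFFFFFF to 0x0000AAAA in the duplicate data record table. The duplicate data record table built as data blocks are gradually processed in response to read requests is shown in Table 4.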
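[Table 4: duplicate data record table after read-triggered updates in the second application scenario. The table was an image in the original and is not reproduced here.]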
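The group-wide update of the second application scenario can be sketched with the Method 2 layout, where each entry carries a content hash. The function name `on_block_cached` and the third table row (hash "DEF456") are hypothetical; the first two rows reuse the values from the example in the text.

```python
NOT_IN_MEMORY = 0xFFFFFFFF


def on_block_cached(table, block_id, address):
    """Record that block_id now lives at `address` and propagate the address
    to every entry with the same content hash (its duplicate group).

    table: dict block_id -> {"hash": str, "addr": int}, as in Method 2.
    """
    content_hash = table[block_id]["hash"]
    for entry in table.values():
        if entry["hash"] == content_hash:
            entry["addr"] = address


table = {
    (1, 0): {"hash": "ABC123", "addr": NOT_IN_MEMORY},
    (2, 5): {"hash": "ABC123", "addr": NOT_IN_MEMORY},
    (3, 9): {"hash": "DEF456", "addr": NOT_IN_MEMORY},  # hypothetical extra row
}
on_block_cached(table, (1, 0), 0x0000AAAA)
```

After the call, both ABC123 entries point at 0x0000AAAA, so a later read of block (2, 0005) can be served from memory even though that block itself was never read from the disk.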
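In summary, when any data block in a duplicate data block group is stored from the physical hard disk into memory, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table is updated according to the memory address, and the storage information directly identifies the storage position in memory of the data blocks of the group (see the duplicate data record tables in Table 3 or Table 4). As a result, when a virtual machine needs to read a data block, the virtual machine management platform looks up the duplicate data record table according to the identification information of the data block to be read, obtains the corresponding storage information, reads the data content of the required data block directly from memory according to the storage information, and returns it to the corresponding virtual machine. The data content no longer needs to be read from the physical hard disk.
In the data processing method provided by this embodiment, the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks are scanned to obtain the duplicate data blocks whose data content is repeated across the virtual hard disks, and the correspondence between the identification information and the storage information of all duplicate data blocks is stored in a duplicate data record table. When any duplicate data block is stored from the physical hard disk into memory, the storage information corresponding to all identification information of that duplicate data block in the duplicate data record table is updated according to the memory address, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from memory according to the storage information. As a result, when multiple virtual machines access identical data content located at different positions on the physical hard disk, repeated access to the physical hard disk is reduced, the response speed to virtual machine read requests is improved, and the service life of the hard disk is extended.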
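Further, when the virtual machine management platform receives a data block write request that is sent by a virtual machine and carries identification information, it queries the duplicate data record table according to the identification information. If it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, the platform writes the data block to the physical hard disk according to the identification information and deletes the storage information corresponding to the data block from the duplicate data record table. The deleted record serves as a modification record: it indicates that the data block corresponding to the identification information has been overwritten by the newly written data block and no longer holds the original content.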
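The write path can be sketched as follows, assuming a flat record table keyed by block identifier and a dictionary standing in for the physical hard disk. The name `handle_write` is hypothetical, and writing blocks that have no record straight to the disk is an assumption, since the text only spells out the case where a record exists.

```python
def handle_write(table, disk, block_id, data):
    """Write a block to the (simulated) physical disk and invalidate its record.

    Returns True when a record was removed from the duplicate data record
    table, which the text treats as the modification record for that block.
    """
    had_record = block_id in table
    if had_record:
        # The block content changes, so it no longer belongs to its group.
        del table[block_id]
    disk[block_id] = data
    return had_record
```

Removing the record means a later read of this block bypasses the shared in-memory copy, which still holds the old content for the remaining members of the group.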
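The virtual machine management platform performs an update operation on the duplicate data record table according to a preset trigger condition, where the trigger condition includes: when the number of data blocks written to the physical hard disk is greater than or equal to a preset first threshold; or when the physical machine is in an idle state (that is, when the CPU usage and memory usage of the physical machine stay less than or equal to a preset second threshold over a period of time); or each time storage information in the duplicate data record table is deleted. The specific update process is as follows: the virtual machine management platform obtains all data blocks newly written between the previous update and the current update, calculates the hash value of each of these data blocks, and compares the new hash values with the existing hash values. If it is determined that a new data block belongs to an existing duplicate data block group in the duplicate data record table, the correspondence between the identification information and the storage information of the new data block is added to that duplicate data block group. If it is determined that a new duplicate data block group with identical data content exists, the correspondence between the identification information and the storage information of each data block in the new duplicate data block group is added to the duplicate data record table. If it is determined that an existing duplicate data block group in the duplicate data record table contains only one data block, the correspondence between the identification information and the storage information of that data block is deleted from the duplicate data record table.
Based on the foregoing embodiment, to describe more clearly the process of reading data according to the duplicate data record table, the following gives a specific description with reference to Figure 3. Figure 3 is a flowchart of data reading performed for the data processing method of Figure 2. As shown in Figure 3, the method includes: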
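Step 200: receive a data block read request carrying identification information, and query the duplicate data record table according to the identification information.
The virtual machine management platform receives a data block read request that is sent by a virtual machine and carries identification information, and queries the duplicate data record table according to the identification information of the data block to be read.
Step 201: determine whether storage information corresponding to the identification information is stored in the duplicate data record table; if yes, perform step 203; otherwise, perform step 202.
The virtual machine management platform determines whether storage information corresponding to the identification information to be read is stored in the duplicate data record table. If such storage information is stored, the data block is determined to belong to a duplicate data block group, and step 203 is performed. If no such storage information is stored, the data block is determined not to belong to any duplicate data block group, and step 202 is performed.
Step 202: read the data block from the physical hard disk according to the identification information.
When the virtual machine management platform determines that the data block to be read does not belong to a duplicate data block group, it obtains the storage position of the data block on the physical hard disk according to the virtual hard disk number and the virtual hard disk data block offset number in the identification information, reads the data block from the physical hard disk according to that storage position, and sends it to the corresponding virtual machine.
Step 203: determine, according to the storage information, whether the data block is stored in memory; if yes, perform step 204; otherwise, perform step 205.
When the virtual machine management platform determines that the data block to be read belongs to a duplicate data block group, it determines whether the data block is stored in memory according to the storage information that is stored in the duplicate data record table and corresponds to the identification information of the data block. If the data block has already been stored from the physical hard disk into memory, step 204 is performed; if it has not, step 205 is performed.
Step 204: read the data block from the memory according to the storage information.
The virtual machine management platform obtains, from the storage information, the memory address at which the data block is stored, reads the data block directly from memory according to that memory address, and sends it to the corresponding virtual machine.
Step 205: read the data block from the physical hard disk according to the identification information, store the data block into the memory, read the data block from the memory, and update the storage information corresponding to the data block in the duplicate data record table.
The virtual machine management platform obtains the storage position of the data block on the physical hard disk according to the virtual hard disk number and the virtual hard disk data block offset number in the identification information, reads the data block from the physical hard disk into memory according to that storage position, sends the data block to the corresponding virtual machine, and then updates the storage information corresponding to the identification information of the data block in the duplicate data record table according to the memory address at which the data block is stored.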
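Steps 200 to 205 can be sketched as a single read function. This is an illustration under assumptions: dictionaries stand in for the physical hard disk and for memory, the record table is a flat mapping from block identifier to memory address, and `allocate` is a hypothetical callback that returns a free memory address. Following the wording of step 205, only the entry of the requested block is updated here; propagating the address to the rest of the group is described under step 101.

```python
NOT_IN_MEMORY = 0xFFFFFFFF


def read_block(block_id, table, memory, disk, allocate):
    """Serve a read request following the flow of Figure 3."""
    address = table.get(block_id)        # steps 200-201: query the table
    if address is None:
        return disk[block_id]            # step 202: not a duplicate block
    if address != NOT_IN_MEMORY:         # step 203: already in memory?
        return memory[address]           # step 204: read from memory
    data = disk[block_id]                # step 205: read from disk,
    address = allocate()
    memory[address] = data               # store into memory,
    table[block_id] = address            # and update the record
    return memory[address]
```

Only the first read of a duplicate block touches the disk; later reads of that block are answered from memory.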
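Therefore, compared with the prior art, when a virtual machine reads a data block, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read directly from memory according to the storage information instead of from the physical hard disk. As a result, when multiple virtual machines access identical data content located at different positions on the physical hard disk, repeated access to the physical hard disk is reduced, the response speed to virtual machine read requests is improved, and the service life of the hard disk is extended.
A person of ordinary skill in the art can understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.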
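Figure 4 is a schematic structural diagram of a virtual machine management platform according to an embodiment of the present invention. As shown in Figure 4, the virtual machine management platform includes an obtaining module 11, a processing module 12, and a reading module 13. The obtaining module 11 is configured to scan the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtain the duplicate data block groups whose data block content is identical across the virtual hard disks, and store, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group. The processing module 12 is configured to: when any data block in a duplicate data block group is stored from the physical hard disk into memory, update, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table. The reading module 13 is configured to: when a data block is read, obtain the corresponding storage information according to the identification information of the data block to be read, and read the data block from the memory according to the storage information.
The obtaining module 11 is specifically configured to: compute a hash value from the data content of each data block by using a hash algorithm, and compare the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
For the functions and processing procedures of the modules in the virtual machine management platform provided by this embodiment, refer to the foregoing method embodiments. The implementation principles and technical effects are similar and are not described again here.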
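Figure 5 is a schematic structural diagram of another virtual machine management platform according to an embodiment of the present invention. As shown in Figure 5, based on the embodiment shown in Figure 4, the virtual machine management platform further includes a writing module 14, configured to: receive a data block write request carrying identification information; query the duplicate data record table according to the identification information; and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
Further, the processing module 12 is also configured to perform an update operation on the duplicate data record table according to a preset trigger condition, where the trigger condition includes: when the number of data blocks written to the physical hard disk is greater than or equal to a preset first threshold; or when the physical machine is in an idle state (that is, when the CPU usage and memory usage of the physical machine stay less than or equal to a preset second threshold over a period of time); or each time storage information in the duplicate data record table is deleted. The specific update process is as follows: the virtual machine management platform obtains all data blocks newly written between the previous update and the current update, calculates the hash value of each of these data blocks, and compares the new hash values with the existing hash values. If it is determined that a new data block belongs to an existing duplicate data block group in the duplicate data record table, the correspondence between the identification information and the storage information of the new data block is added to that duplicate data block group. If it is determined that a new duplicate data block group with identical data content exists, the correspondence between the identification information and the storage information of each data block in the new duplicate data block group is added to the duplicate data record table. If it is determined that an existing duplicate data block group in the duplicate data record table contains only one data block, the correspondence between the identification information and the storage information of that data block is deleted from the duplicate data record table.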
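The three update rules can be sketched as one maintenance pass over the record table. Several details are assumptions rather than statements from the source: the table is keyed by content hash, SHA-256 stands in for the unspecified hash algorithm, a block joining an existing group inherits that group's current memory address, and new hashes are compared only against the table and against each other. The name `update_record_table` is hypothetical.

```python
import hashlib
from collections import defaultdict

NOT_IN_MEMORY = 0xFFFFFFFF


def update_record_table(table, new_blocks):
    """Apply the three update rules to the duplicate data record table.

    table: dict content_hash -> dict of block_id -> memory address.
    new_blocks: dict block_id -> bytes, the blocks written since the last update.
    """
    pending = defaultdict(list)
    for block_id, content in new_blocks.items():
        pending[hashlib.sha256(content).hexdigest()].append(block_id)

    for content_hash, ids in pending.items():
        if content_hash in table:
            # Rule 1: the new block joins an existing group.
            address = next(iter(table[content_hash].values()), NOT_IN_MEMORY)
            for block_id in ids:
                table[content_hash][block_id] = address
        elif len(ids) >= 2:
            # Rule 2: the new blocks form a new duplicate group.
            table[content_hash] = {block_id: NOT_IN_MEMORY for block_id in ids}

    # Rule 3: drop groups that no longer hold at least two blocks.
    for content_hash in [h for h, e in table.items() if len(e) <= 1]:
        del table[content_hash]
    return table
```

Running this pass only when a trigger condition fires keeps the hashing cost off the normal write path, which appears to be the purpose of the write-count and idle-state triggers.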
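Based on the foregoing embodiment, the reading module 13 is specifically configured to: receive a data block read request carrying identification information, and query the duplicate data record table according to the identification information; if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, read the data block from the memory according to the storage information.
Further, the reading module 13 is also configured to: if the storage information indicates that the data block is stored on the physical hard disk, read the data block from the physical hard disk according to the identification information, store the data block into the memory, and update the storage information corresponding to the data block in the duplicate data record table.
For the functions and processing procedures of the modules in the virtual machine management platform provided by this embodiment, refer to the foregoing method embodiments. The implementation principles and technical effects are similar and are not described again here.
As shown in Figure 6, the virtual machine management platform 300 includes a processor 301, a memory 302, a communication interface 303, and a bus 304. The processor 301, the memory 302, and the communication interface 303 are connected by the bus 304. The bus 304 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 6, but this does not mean that there is only one bus or one type of bus.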
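The memory 302 is used to store program code, and the program code includes computer operation instructions. The memory 302 may be a high-speed random access memory or a non-volatile memory, for example, at least one disk storage device.
The processor 301 executes the program code to:
scan the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtain the duplicate data block groups whose data block content is identical across the virtual hard disks, and store, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group; and
when any data block in a duplicate data block group is stored from the physical hard disk into memory, update, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from the memory according to the storage information.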
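The process in which the processor 301 obtains the duplicate data block groups whose data block content is identical across the virtual hard disks specifically includes:
computing a hash value from the data content of each data block by using a hash algorithm; and
comparing the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
Further, the processor 301 is also configured to:
receive a data block write request carrying identification information; and
query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
The process in which the processor 301 obtains the corresponding storage information according to the identification information of the data block to be read, and reads the data block from the memory according to the storage information, specifically includes:
receiving a data block read request carrying identification information; and
querying the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, reading the data block from the memory according to the storage information.
The process in which the processor 301 obtains the corresponding storage information according to the identification information of the data block to be read, and reads the data block from the memory according to the storage information, further includes:
if the storage information indicates that the data block is stored on the physical hard disk, reading the data block from the physical hard disk according to the identification information, storing the data block into the memory, and updating the storage information corresponding to the data block in the duplicate data record table.
For the processing procedure in which the processor executes the program code in the memory in the virtual machine management platform provided by this embodiment, refer to the foregoing method embodiments. The implementation principles and technical effects are similar and are not described again here.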
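Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.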

Claims

WO 2014/094421 PCT/CN2013/079573 Claims
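1. A data processing method, comprising:
scanning the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtaining the duplicate data block groups whose data block content is identical across the virtual hard disks, and storing, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group; and
when any data block in a duplicate data block group is stored from the physical hard disk into memory, updating, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table, so that when a data block is read, the corresponding storage information is obtained according to the identification information of the data block to be read, and the data block is read from the memory according to the storage information.
2. The data processing method according to claim 1, wherein the obtaining of the duplicate data block groups whose data block content is identical across the virtual hard disks comprises:
computing a hash value from the data content of each data block by using a hash algorithm; and
comparing the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
3. The data processing method according to claim 1 or 2, further comprising:
receiving a data block write request carrying identification information; and
querying the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, writing the data block to the physical hard disk according to the identification information and deleting the storage information corresponding to the data block from the duplicate data record table.
4. The data processing method according to claim 1 or 2, wherein the obtaining of the corresponding storage information according to the identification information of the data block to be read, and the reading of the data block from the memory according to the storage information, comprise:
receiving a data block read request carrying identification information; and
querying the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, reading the data block from the memory according to the storage information.
5. The data processing method according to claim 4, further comprising:
if the storage information indicates that the data block is stored on the physical hard disk, reading the data block from the physical hard disk according to the identification information, storing the data block into the memory, and updating the storage information corresponding to the data block in the duplicate data record table.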
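6. A virtual machine management platform, comprising:
an obtaining module, configured to scan the data blocks that are stored on a physical hard disk and correspond to the individual virtual hard disks, obtain the duplicate data block groups whose data block content is identical across the virtual hard disks, and store, in a duplicate data record table, the correspondence between the identification information and the storage information of each data block in each duplicate data block group;
a processing module, configured to: when any data block in a duplicate data block group is stored from the physical hard disk into memory, update, according to the memory address, all storage information of the duplicate data block group to which the data block belongs in the duplicate data record table; and
a reading module, configured to: when a data block is read, obtain the corresponding storage information according to the identification information of the data block to be read, and read the data block from the memory according to the storage information.
7. The virtual machine management platform according to claim 6, wherein the obtaining module is specifically configured to:
compute a hash value from the data content of each data block by using a hash algorithm; and
compare the hash values of the data blocks to obtain the duplicate data block groups whose hash values are identical across the virtual hard disks.
8. The virtual machine management platform according to claim 6 or 7, further comprising a writing module, configured to:
receive a data block write request carrying identification information; and
query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table, write the data block to the physical hard disk according to the identification information and delete the storage information corresponding to the data block from the duplicate data record table.
9. The virtual machine management platform according to claim 6 or 7, wherein the reading module is specifically configured to:
receive a data block read request carrying identification information; and
query the duplicate data record table according to the identification information, and if it is determined that storage information corresponding to the identification information is stored in the duplicate data record table and the storage information indicates that the data block is stored in the memory, read the data block from the memory according to the storage information.
10. The virtual machine management platform according to claim 9, wherein the reading module is further configured to:
if the storage information indicates that the data block is stored on the physical hard disk, read the data block from the physical hard disk according to the identification information, store the data block into the memory, and update the storage information corresponding to the data block in the duplicate data record table.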
PCT/CN2013/079573 2012-12-21 2013-07-18 Data processing method and virtual machine management platform WO2014094421A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210560675.2A CN103064797B (zh) 2012-12-21 2012-12-21 Data processing method and virtual machine management platform
CN201210560675.2 2012-12-21

Publications (1)

Publication Number Publication Date
WO2014094421A1 true WO2014094421A1 (zh) 2014-06-26

Family

ID=48107428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/079573 WO2014094421A1 (zh) 2012-12-21 2013-07-18 Data processing method and virtual machine management platform

Country Status (2)

Country Link
CN (1) CN103064797B (zh)
WO (1) WO2014094421A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064797B (zh) * 2012-12-21 2016-06-29 华为技术有限公司 Data processing method and virtual machine management platform
CN103593147B (zh) * 2013-11-07 2016-08-17 华为技术有限公司 Data reading method and apparatus
CN104185060B (zh) * 2014-02-26 2017-07-07 无锡天脉聚源传媒科技有限公司 Video deduplication method and apparatus
JP6042025B2 (ja) * 2014-02-27 2016-12-14 三菱電機株式会社 Software-equipped device and software update method
CN104951244B (zh) * 2014-03-31 2018-04-27 伊姆西公司 Method and device for accessing data
CN104765571A (zh) * 2015-03-17 2015-07-08 深信服网络科技(深圳)有限公司 Method and system for writing and reading virtual data
CN105929851B (zh) * 2016-04-07 2019-08-09 广州盈可视电子科技有限公司 Method and apparatus for controlling a pan-tilt head using a joystick device
EP3985949B1 (en) * 2017-12-26 2024-07-31 Huawei Technologies Co., Ltd. Method and apparatus for managing storage device in storage system
CN112181301A (zh) * 2020-09-27 2021-01-05 北京金山云网络技术有限公司 Data export method and apparatus for a cloud hard disk, and server device
CN112433675B (zh) * 2020-11-23 2024-03-08 山东可信云信息技术研究院 Storage space optimization method and system for a hyper-converged architecture
CN112530474B (zh) * 2020-12-29 2024-02-23 北京中科开迪软件有限公司 Intelligent hard disk storage cabinet
CN114138198B (zh) * 2021-11-29 2024-05-28 苏州浪潮智能科技有限公司 Data deduplication method, apparatus, device, and readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7606868B1 (en) * 2006-03-30 2009-10-20 VMware, Inc. Universal file access architecture for a heterogeneous computing environment
CN101697134A (zh) * 2009-10-27 2010-04-21 北京大学 Method for supporting fast startup of similar virtual machines
CN102467408A (zh) * 2010-11-12 2012-05-23 阿里巴巴集团控股有限公司 Method and device for accessing virtual machine data
CN102722450A (zh) * 2012-05-25 2012-10-10 清华大学 Deduplicating block device storage method based on locality-sensitive hashing
CN103064797A (zh) * 2012-12-21 2013-04-24 华为技术有限公司 Data processing method and virtual machine management platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008033392A (ja) * 2006-07-26 2008-02-14 Nec Corp Virtual computer system and operating method thereof

Also Published As

Publication number Publication date
CN103064797A (zh) 2013-04-24
CN103064797B (zh) 2016-06-29

Similar Documents

Publication Publication Date Title
WO2014094421A1 (zh) Data processing method and virtual machine management platform
US10871960B2 (en) Upgrading a storage controller operating system without rebooting a storage system
US10534547B2 (en) Consistent transition from asynchronous to synchronous replication in hash-based storage systems
US20210141917A1 (en) Low latency access to physical storage locations by implementing multiple levels of metadata
US11436157B2 (en) Method and apparatus for accessing storage system
US10210191B2 (en) Accelerated access to objects in an object store implemented utilizing a file storage system
US10552089B2 (en) Data processing for managing local and distributed storage systems by scheduling information corresponding to data write requests
EP3531666B1 (en) Method for managing storage devices in a storage system, and storage system
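CN105518631B (zh) Memory management method, apparatus, and system, and network on chip
CN107153512B (zh) Data migration method and apparatus
US20150286414A1 (en) Scanning memory for de-duplication using rdma
CN112632069B (zh) Hash table data storage management method, apparatus, medium, and electronic device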
US11321021B2 (en) Method and apparatus of managing mapping relationship between storage identifier and start address of queue of storage device corresponding to the storage identifier
US20180239649A1 (en) Multi Root I/O Virtualization System
JP6268116B2 (ja) Data processing device, data processing method, and computer program
US20190114076A1 (en) Method and Apparatus for Storing Data in Distributed Block Storage System, and Computer Readable Storage Medium
US11016676B2 (en) Spot coalescing of distributed data concurrent with storage I/O operations
CN110417777A (zh) Optimized method and apparatus for communication between microservices
US10061725B2 (en) Scanning memory for de-duplication using RDMA
US12032849B2 (en) Distributed storage system and computer program product
WO2019143967A1 (en) Methods for automated artifact storage management and devices thereof
CN110658999B (zh) Information update method, apparatus, device, and computer-readable storage medium
US11061835B1 (en) Sensitivity matrix for system load indication and overload prevention
CN114640678A (zh) SR-IOV-based Pod management method, device, and medium
CN105786608A (zh) Deduplicated migration method and system for remote virtual machines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13865918

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13865918

Country of ref document: EP

Kind code of ref document: A1