WO2021057377A1 - Data storage method and data storage device - Google Patents

Publication number
WO2021057377A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
storage device
processing software
nodes
Prior art date
Application number
PCT/CN2020/111903
Other languages
French (fr)
Chinese (zh)
Inventor
杨艳伟
孙荣宗
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2021057377A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

A data storage method, comprising: sending a first control instruction that instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, where each virtual machine stores data in the storage resource pool created in its corresponding storage device and serves as a selectable data node of the data processing software, and N is an integer greater than or equal to 2 (101); obtaining a configuration instruction that includes the number of copies M to keep when storing data and a storage awareness policy, where the storage awareness policy specifies that M data nodes are determined for storing the data and that the M data nodes are located in M different storage devices, and M is an integer less than or equal to N (102); determining, according to the configuration instruction, the M data nodes that are to hold the data to be stored (103); and storing the data in the M data nodes (104). Configuring the storage awareness policy helps improve the reliability of data storage.

Description

Data storage method and data storage device
This application claims priority to Chinese Patent Application No. 201910926872.3, entitled "Data storage method and data storage device", filed with the China National Intellectual Property Administration on September 27, 2019, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer networks, and in particular to a data storage method and a data storage device.
Background
To make stored data safer, data is usually stored as multiple copies: a data file is duplicated into several copies that are stored on different servers or disk arrays. Taking storage on multiple servers as an example, the data file remains accessible as long as any one server holding a copy is available. This avoids the data loss and inaccessibility that a single server would suffer because of a network failure, disk damage, power outage, crash, or similar cause.
Multi-copy storage achieves reliability through high redundancy. Although it makes data easy to recover, several copies sometimes end up in the same data storage device. When such a device loses power or crashes, the number of usable copies is not the total number of copies minus one, but the total minus the number of copies stored in that device. Fewer copies are actually available than expected, which lowers storage reliability.
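The arithmetic behind this problem can be illustrated with a minimal Python sketch. The placement lists and device names are hypothetical and serve only to show how co-located copies reduce what survives a single device failure.

```python
from collections import Counter


def surviving_copies(placement, failed_device):
    """Count the copies still reachable after one storage device fails.

    placement: list of device names, one entry per copy (hypothetical model).
    """
    per_device = Counter(placement)
    return len(placement) - per_device[failed_device]


# Three copies, two of which landed on the same device.
colocated = ["dev-a", "dev-a", "dev-b"]
# Three copies spread over three different devices.
spread = ["dev-a", "dev-b", "dev-c"]

print(surviving_copies(colocated, "dev-a"))  # 1: total minus copies on the failed device
print(surviving_copies(spread, "dev-a"))     # 2: total minus one
```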
How to improve the reliability of data storage is therefore a problem to be solved.
Summary of the invention
The embodiments of this application provide a data storage method and a data storage device that can improve the reliability of data storage.
In a first aspect, an embodiment of this application provides a data storage method that includes the following steps:
sending a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices; each virtual machine stores data in the storage resource pool created in its corresponding storage device and serves as a selectable data node of the data processing software; and N is an integer greater than or equal to 2;
obtaining a configuration instruction, where the configuration instruction includes the number of copies M to keep when storing data and a storage awareness policy; the storage awareness policy specifies that M data nodes are determined for storing the data and that the M data nodes are located in M different storage devices; and M is an integer less than or equal to N;
determining, according to the configuration instruction, the M data nodes that are to hold the data to be stored; and
storing the data to be stored in the M data nodes.
With the data storage method provided by the embodiments of this application, the copies of a piece of data are located in different storage devices, so when one storage device holding a copy fails, exactly one copy is lost. In the prior art, by contrast, a failure of one storage device may make several copies unavailable. The method therefore improves storage reliability.
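The node-selection step can be sketched as follows. This is only an illustration under assumptions: the function name `choose_data_nodes`, the dictionary model of devices and nodes, and the random choice among candidates are all hypothetical, since the source only requires that the M data nodes sit on M different storage devices.

```python
import random


def choose_data_nodes(nodes_by_device, m, rng=random):
    """Pick m data nodes, each hosted on a different storage device.

    nodes_by_device: dict mapping a storage-device name to the list of
    data-node (virtual machine) names it hosts. Hypothetical model.
    """
    devices = [dev for dev, nodes in nodes_by_device.items() if nodes]
    n = len(devices)
    if n < 2:
        raise ValueError("need at least two storage devices (N >= 2)")
    if not 1 <= m <= n:
        raise ValueError("copy count M must satisfy 1 <= M <= N")
    # Sampling devices without replacement guarantees M distinct devices.
    chosen_devices = rng.sample(devices, m)
    return [(dev, rng.choice(nodes_by_device[dev])) for dev in chosen_devices]


cluster = {"dev-1": ["vm-1"], "dev-2": ["vm-2"], "dev-3": ["vm-3"]}
for device, node in choose_data_nodes(cluster, m=2):
    print(device, node)
```

Sampling the devices first, then a node within each device, keeps the "different devices" constraint true by construction rather than by checking afterwards.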
In some possible implementations, the storage resource pools created in the N storage devices use erasure coding (EC). With EC in the storage resource pools, lost or damaged data can be recomputed, so the data remains usable after part of it is lost. A traditional distributed system keeps three copies of every piece of data so that it can continue to provide service after a hardware failure or similar fault. Compared with that approach, the embodiments of this application save storage space and improve storage utilization.
In some possible implementations, when the EC uses the 8D1P mode (8 data blocks and 1 parity block), the storage utilization of the hard disks is 88.89%; with the 4D1P mode it is 80%; with the 8D2P mode it is 80%; and with the 4D2P mode it is 66.67%.
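The utilization figures follow from the ratio of data blocks to total blocks, D / (D + P). A short Python check (the function name is illustrative, not from the source):

```python
def ec_utilization(data_blocks, parity_blocks):
    """Fraction of raw capacity that holds user data under an xDyP EC mode."""
    return data_blocks / (data_blocks + parity_blocks)


for d, p in [(8, 1), (4, 1), (8, 2), (4, 2)]:
    print(f"{d}D{p}P: {ec_utilization(d, p):.2%}")
```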
In a second aspect, an embodiment of this application provides a data storage device that includes:
a sending unit, configured to send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices; each virtual machine stores data in the storage resource pool created in its corresponding storage device and serves as a selectable data node of the data processing software; and N is an integer greater than or equal to 2;
an obtaining unit, configured to obtain a configuration instruction, where the configuration instruction includes the number of copies M to keep when storing data and a storage awareness policy; the storage awareness policy specifies that M data nodes are determined for storing the data and that the M data nodes are located in M different storage devices; and M is an integer less than or equal to N;
a determining unit, configured to determine, according to the configuration instruction, the M data nodes that are to hold the data to be stored; and
a processing unit, configured to store the data to be stored in the M data nodes.
With the data storage device provided by the embodiments of this application, the copies of a piece of data are located in different storage devices, so when one storage device holding a copy fails, exactly one copy is lost. In the prior art, by contrast, a failure of one storage device may make several copies unavailable. The device therefore improves storage reliability.
In some possible implementations, the storage resource pools created in the N storage devices use erasure coding (EC). With EC in the storage resource pools, lost or damaged data can be recomputed, so the data remains usable after part of it is lost. A traditional distributed system keeps three copies of every piece of data so that it can continue to provide service after a hardware failure or similar fault. Compared with that approach, the embodiments of this application save storage space and improve storage utilization.
In some possible implementations, when the EC uses the 8D1P mode (8 data blocks and 1 parity block), the storage utilization of the hard disks is 88.89%; with the 4D1P mode it is 80%; with the 8D2P mode it is 80%; and with the 4D2P mode it is 66.67%.
In some possible implementations, the storage device includes a distributed server or a disk array.
In some possible implementations, the data processing software includes the distributed processing software Hadoop.
In some possible implementations, M = 2.
In a third aspect, an embodiment of this application provides a data storage system that includes N storage devices and the data storage device described in the second aspect or in any possible implementation of the second aspect, where N is an integer greater than or equal to 2.
In a fourth aspect, an embodiment of this application provides an electronic device, including:
one or more processors; and
a storage device, configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect or in any possible implementation of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer-readable medium that stores a computer program. When the program is executed by a processor, it implements the method described in the first aspect or in any possible implementation of the first aspect.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a data storage method provided by an embodiment of this application.
FIG. 2 is a schematic flowchart of a data storage method provided by another embodiment of this application.
FIG. 3 is a schematic diagram of the interaction flow of a data storage method provided by an embodiment of this application.
FIG. 4 is a schematic structural diagram of a data storage device provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, a person of ordinary skill in the art can also obtain other embodiments.
Refer to FIG. 1, which shows a data storage method provided by an embodiment of this application. The method includes the following steps.
101. Send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device and serves as a selectable data node of the data processing software. N is an integer greater than or equal to 2.
For example, a storage device may be a distributed server or a disk array.
102. Obtain a configuration instruction, where the configuration instruction includes the number of copies M to keep when storing data and a storage awareness policy. The storage awareness policy specifies that M data nodes are determined for storing the data and that the M data nodes are located in M different storage devices. M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs that data processing software be installed in 3 storage devices and that a storage resource pool and a virtual machine be created in each of the 3 storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device, and the 3 created virtual machines serve as selectable data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
103. Determine, according to the configuration instruction, the M data nodes that are to hold the data to be stored.
For example, if M is 2, two data nodes for holding the data to be stored are determined according to the configuration instruction.
104. Store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
With the data storage method provided by the embodiments of this application, the copies of a piece of data are located in different storage devices, so when one storage device holding a copy fails, exactly one copy is lost. In the prior art, by contrast, a failure of one storage device may make several copies unavailable. The method therefore improves storage reliability.
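The N = 3, M = 2 example can be walked through with a small in-memory simulation. Everything here is a hypothetical stand-in: `StorageDevice`, its `pool` dictionary (in place of a storage resource pool), and the deterministic choice of the first M devices are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    name: str
    online: bool = True
    # Stands in for the storage resource pool used by the device's data node.
    pool: dict = field(default_factory=dict)


def store(devices, m, key, data):
    """Write one copy of data to each of m different devices (steps 103 and 104)."""
    if not 1 <= m <= len(devices):
        raise ValueError("copy count M must satisfy 1 <= M <= N")
    targets = devices[:m]  # deterministic choice, for the demo only
    for dev in targets:
        dev.pool[key] = data
    return [dev.name for dev in targets]


def read(devices, key):
    """Return the data from any online device that still holds a copy."""
    for dev in devices:
        if dev.online and key in dev.pool:
            return dev.pool[key]
    raise KeyError(key)


devices = [StorageDevice(f"dev-{i}") for i in (1, 2, 3)]  # N = 3
print(store(devices, 2, "block-0", b"payload"))           # M = 2
devices[0].online = False                                 # one device fails
print(read(devices, "block-0"))                           # still served by the other copy
```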
Refer to FIG. 2, which is a schematic flowchart of a data storage method provided by another embodiment of this application. The method includes the following steps.
201. Send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device and serves as a selectable data node of the data processing software. N is an integer greater than or equal to 2, and the storage resource pools use erasure coding (EC).
For example, a storage device may be a distributed server or a disk array. The following description uses a distributed server as an example.
In some possible implementations, when the EC uses the 8D1P mode (8 data blocks and 1 parity block), the storage utilization of the hard disks is 88.89%; with the 4D1P mode it is 80%; with the 8D2P mode it is 80%; and with the 4D2P mode it is 66.67%.
202. Obtain a configuration instruction, where the configuration instruction includes the number of copies M to keep when storing data and a storage awareness policy. The storage awareness policy specifies that M data nodes are determined for storing the data and that the M data nodes are located in M different storage devices. M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs that data processing software be installed in 3 storage devices and that a storage resource pool and a virtual machine be created in each of the 3 storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device, and the 3 created virtual machines serve as selectable data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
203. Determine, according to the configuration instruction, the M data nodes that are to hold the data to be stored.
For example, if M is 2, two data nodes for holding the data to be stored are determined according to the configuration instruction.
204. Store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
In this embodiment, EC is used in the storage resource pools created in the storage devices. Lost or damaged data can be recomputed, so the data remains usable after part of it is lost. A traditional distributed system keeps three copies of every piece of data so that it can continue to provide service after a hardware failure or similar fault. Compared with that approach, this embodiment saves storage space and improves storage utilization.
Refer to FIG. 3, which is a schematic diagram of the interaction flow of a data storage method provided by an embodiment of this application. As shown in FIG. 3, data storage in this embodiment includes the following steps.
301. Install software-defined storage (SDS) on a first rack, Rack1, and a second rack, Rack2, and create a separate storage resource pool on each of Rack1 and Rack2 using EC (N:N+M) coding. In this embodiment, the EC uses the 8D1P mode.
In this embodiment, as shown in FIG. 3, SDS creates storage resource pool 1, which uses EC in the 8D1P mode, and storage resource pool 2, which also uses EC in the 8D1P mode.
SDS is a storage architecture that separates storage software from hardware. Unlike traditional network attached storage (NAS) or storage area network (SAN) systems, SDS generally runs on industry-standard or x86 systems, which removes the software's dependence on proprietary hardware. SDS usually adopts a distributed architecture to improve reliability and scalability, so it is sometimes called distributed storage. The two terms differ, however: distributed storage describes the architecture and stresses that it is distributed, whereas SDS stresses the decoupling of software from hardware.
SDS has the following advantages. (1) Decoupling of software and hardware. The storage hardware is commercial off-the-shelf (COTS) equipment, which avoids vendor lock-in, and purchasing software and hardware separately lowers procurement cost. (2) Strong scalability. SDS uses a distributed architecture, so storage capacity can in theory grow without limit and increases linearly with the number of servers (scale-out). A SAN is limited by the processing power of its controllers, so the capacity of a single disk array is bounded; once the required capacity exceeds it, another set of storage equipment has to be added (scale-up). (3) High reliability.
302. Create the virtual machines used by Hadoop. The disk of each virtual machine uses the storage pool on the rack where that virtual machine is located.
Specifically, the virtual machine disks on Rack1 use storage resource pool 1, and the virtual machine disks on Rack2 use storage resource pool 2.
Hadoop is a distributed system infrastructure commonly used in the prior art. The Hadoop Distributed File System (HDFS) divides nodes into two types: the name node (NameNode) and data nodes (DataNode). The NameNode manages the file system namespace. It maintains the file system tree and all files and directories in the tree, and persists this information on its local disk as two files: the namespace image file and the edit log file. The NameNode also records which data nodes hold each block of each file, but it does not persist block locations; this information is rebuilt from the data nodes when the system starts.
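The split between persisted namespace and rebuilt block locations can be pictured with a toy model. This is not Hadoop code: the class `NameNodeSketch`, its methods, and the block and node names are hypothetical and only mirror the description above.

```python
from collections import defaultdict


class NameNodeSketch:
    """Toy model: the namespace is persistent, block locations are rebuilt."""

    def __init__(self, namespace):
        # Persistent part: file path -> ordered list of block ids.
        self.namespace = dict(namespace)
        # Volatile part: block id -> set of data nodes, empty after every start.
        self.block_locations = defaultdict(set)

    def receive_block_report(self, data_node, block_ids):
        """A data node announces the blocks it holds after start-up."""
        for block_id in block_ids:
            self.block_locations[block_id].add(data_node)

    def locate(self, path):
        """Return each block of the file with the data nodes known to hold it."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, ())))
            for block_id in self.namespace[path]
        ]


nn = NameNodeSketch({"/logs/a.txt": ["blk_1", "blk_2"]})
print(nn.locate("/logs/a.txt"))  # no locations known yet
nn.receive_block_report("dn-rack1", ["blk_1", "blk_2"])
nn.receive_block_report("dn-rack2", ["blk_1", "blk_2"])
print(nn.locate("/logs/a.txt"))  # both data nodes listed for each block
```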
303. Install Hadoop in the virtual machines created in step 302. Hadoop registers these virtual machines as DataNodes.
304. Configure Hadoop's rack awareness policy and set the number of copies to 2.
When data is written to Hadoop, one DataNode is selected on each of the two racks, so the data is written to two different storage pools. Rack-level redundancy keeps the data active-active and improves overall reliability.
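One common way to configure rack awareness in Hadoop is a topology script named by the `net.topology.script.file.name` property, with the copy count set through `dfs.replication`. Hadoop passes host names or IP addresses as arguments and reads one rack path per argument from standard output. The sketch below assumes that mechanism; the host names and the host-to-rack table are hypothetical, and the source does not say which mechanism the embodiment uses. How HDFS then places the two copies depends on its block placement policy, which this sketch does not model.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script mapping DataNode hosts to racks."""
import sys

# Assumed host-to-rack table; replace with the real virtual machine host names.
RACK_BY_HOST = {
    "vm-rack1-a": "/rack1",
    "vm-rack1-b": "/rack1",
    "vm-rack2-a": "/rack2",
    "vm-rack2-b": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop may pass several hosts in a single invocation.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```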
可以理解的,实际搭建存储系统时,机架个数可以不限于两个,多个机架也可以采用同样的策略。It is understandable that when the storage system is actually built, the number of racks may not be limited to two, and the same strategy may be adopted for multiple racks.
In this embodiment of the present application, erasure coding (EC) is used in the storage resource pool created in the storage device. When data is lost or damaged, the lost data can be recomputed, so the data remains usable after a partial loss. Compared with the approach in which a traditional distributed system keeps three copies of each piece of data so that it can continue to provide service after a hardware failure or similar fault, this embodiment saves storage space and improves storage utilization.
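The application does not say which erasure code the storage pool uses. As a minimal sketch of how lost data can be recomputed, assume the simplest single-parity case (the "1P" modes) with the parity block formed by byte-wise XOR of the data blocks; any one missing block is then the XOR of the survivors. The function names are hypothetical, and modes with two parity blocks need a stronger code (for example Reed-Solomon), which is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_single_parity(data_blocks):
    """Return the stripe: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_block(stripe, lost_index):
    """Recompute the block at lost_index from all other blocks in the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # A 4D1P-style stripe: four data blocks plus one parity block.
    stripe = encode_single_parity([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
    print(recover_block(stripe, 2))
```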
Referring to FIG. 4, an embodiment of the present application provides a data storage device 400. The data storage device 400 includes: a sending unit 401, configured to send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2.
An obtaining unit 402, configured to obtain a configuration instruction. The configuration instruction includes a replica count M to be used for data storage and a storage awareness policy. The storage awareness policy includes: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs that data processing software be installed in the three storage devices and that a storage resource pool and a virtual machine be created in each of the three storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device, and the three created virtual machines can serve as optional data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
A determining unit 403, configured to determine, according to the configuration instruction, the M data nodes used to hold the data to be stored.
For example, if M is 2, two data nodes for holding the data to be stored are determined according to the configuration instruction.
A processing unit 404, configured to store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
In the data storage method provided by this embodiment of the present application, when multiple replicas are stored, different replicas are located in different storage devices. When one storage device holding a replica fails, only one replica is lost. In the prior art, by contrast, a problem in a single storage device may make multiple replicas unavailable, so this embodiment improves storage reliability.
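The selection performed by the determining unit can be sketched as follows. This is only an illustration under assumed data structures: `nodes_by_device` (a mapping from a storage device identifier to the data nodes hosted on it) and the function name are hypothetical, and the application does not prescribe how a node is chosen within a device.

```python
def choose_data_nodes(nodes_by_device, m):
    """Pick m data nodes such that no two share a storage device.

    nodes_by_device: dict mapping a storage device id to a list of data
    node names hosted on that device (hypothetical structure).
    Returns a list of (device_id, node_name) pairs.
    """
    # One candidate per device; devices without data nodes are skipped.
    candidates = [
        (device, nodes[0])
        for device, nodes in sorted(nodes_by_device.items())
        if nodes
    ]
    if m > len(candidates):
        raise ValueError("replica count M exceeds the number of usable storage devices")
    return candidates[:m]


if __name__ == "__main__":
    # N = 3 storage devices, M = 2 replicas.
    pool = {"device1": ["vm1"], "device2": ["vm2"], "device3": ["vm3"]}
    print(choose_data_nodes(pool, 2))
```

Because every selected node sits on a different device, the failure of any single device removes at most one replica of the data.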
In some possible implementations, the storage resource pools created in the N storage devices use erasure coding (EC). When data is lost or damaged, the lost data can be recomputed, so the data remains usable after a partial loss. Compared with the approach in which a traditional distributed system keeps three copies of each piece of data so that it can continue to provide service after a hardware failure or similar fault, this embodiment saves storage space and improves storage utilization.
In some possible implementations, when the EC coding uses the 8D1P mode (8 data blocks and 1 parity block), the storage utilization of the hard disks is 88.89%; in the 4D1P mode it is 80%; in the 8D2P mode it is 80%; and in the 4D2P mode it is 66.67%.
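These percentages follow from the ratio of data blocks to total blocks in a stripe. The short sketch below reproduces them and, for comparison, computes the utilization of plain three-copy replication; the function names are hypothetical.

```python
def ec_utilization(data_blocks, parity_blocks):
    """Fraction of raw capacity that holds user data in one EC stripe."""
    return data_blocks / (data_blocks + parity_blocks)


def replica_utilization(copies):
    """Fraction of raw capacity that holds user data with full copies."""
    return 1 / copies


# EC modes named in the text: (data blocks, parity blocks).
MODES = {"8D1P": (8, 1), "4D1P": (4, 1), "8D2P": (8, 2), "4D2P": (4, 2)}


def utilization_table():
    """Utilization per EC mode, as a percentage rounded to two decimals."""
    return {
        name: round(ec_utilization(d, p) * 100, 2)
        for name, (d, p) in MODES.items()
    }


if __name__ == "__main__":
    for name, percent in utilization_table().items():
        print(f"{name}: {percent}%")
    print(f"3 copies: {round(replica_utilization(3) * 100, 2)}%")
```

These are per-pool figures. If each block is additionally written to M = 2 such pools, the end-to-end figure would presumably be the per-pool value divided by M; the application itself states only the per-pool percentages.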
An embodiment of the present application further provides a data storage system, including N storage devices and the data storage device of any of the foregoing embodiments, where N is an integer greater than or equal to 2. The data storage device is shown in FIG. 4 and includes: a sending unit 401, configured to send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2.
An obtaining unit 402, configured to obtain a configuration instruction. The configuration instruction includes a replica count M to be used for data storage and a storage awareness policy. The storage awareness policy includes: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs that data processing software be installed in the three storage devices and that a storage resource pool and a virtual machine be created in each of the three storage devices. Each virtual machine stores data in the storage resource pool created in its corresponding storage device, and the three created virtual machines can serve as optional data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
A determining unit 403, configured to determine, according to the configuration instruction, the M data nodes used to hold the data to be stored.
For example, if M is 2, two data nodes for holding the data to be stored are determined according to the configuration instruction.
A processing unit 404, configured to store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
In the data storage method provided by this embodiment of the present application, when multiple replicas are stored, different replicas are located in different storage devices. When one storage device holding a replica fails, only one replica is lost. In the prior art, by contrast, a problem in a single storage device may make multiple replicas unavailable, so this embodiment improves storage reliability.
In some possible implementations, the storage resource pools created in the N storage devices use erasure coding (EC). When data is lost or damaged, the lost data can be recomputed, so the data remains usable after a partial loss. Compared with the approach in which a traditional distributed system keeps three copies of each piece of data so that it can continue to provide service after a hardware failure or similar fault, this embodiment saves storage space and improves storage utilization.
In some possible implementations, when the EC coding uses the 8D1P mode (8 data blocks and 1 parity block), the storage utilization of the hard disks is 88.89%; in the 4D1P mode it is 80%; in the 8D2P mode it is 80%; and in the 4D2P mode it is 66.67%.
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage apparatus configured to store one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the data storage method described in any of the foregoing method embodiments. The method includes:
sending a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2;
obtaining a configuration instruction, where the configuration instruction includes a replica count M to be used for data storage and a storage awareness policy; the storage awareness policy includes: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N;
determining, according to the configuration instruction, the M data nodes used to hold the data to be stored; and
storing the data to be stored in the M data nodes.
In some possible embodiments, the storage resource pools created in the N storage devices use erasure coding (EC).
In some possible embodiments, the storage device includes a distributed server or a disk array.
In some possible embodiments, the EC coding includes the 8D1P mode (8 data blocks and 1 parity block), the 4D1P mode, the 8D2P mode, or the 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In the data storage method provided by this embodiment of the present application, when multiple replicas are stored, different replicas are located in different storage devices. When one storage device holding a replica fails, only one replica is lost. In the prior art, by contrast, a problem in a single storage device may make multiple replicas unavailable, so this embodiment improves storage reliability. When the storage resource pools created in the N storage devices use erasure coding (EC), lost or damaged data can be recomputed, so the data remains usable after a partial loss. Compared with the approach in which a traditional distributed system keeps three copies of each piece of data so that it can continue to provide service after a hardware failure or similar fault, this embodiment saves storage space and improves storage utilization.
An embodiment of the present application further provides a computer-readable medium on which a computer program is stored. When the program is executed by a processor, the data storage method described in any of the foregoing method embodiments is implemented. The method includes:
sending a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2;
obtaining a configuration instruction, where the configuration instruction includes a replica count M to be used for data storage and a storage awareness policy; the storage awareness policy includes: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N;
determining, according to the configuration instruction, the M data nodes used to hold the data to be stored; and
storing the data to be stored in the M data nodes.
In some possible embodiments, the storage resource pools created in the N storage devices use erasure coding (EC).
In some possible embodiments, the storage device includes a distributed server or a disk array.
In some possible embodiments, the EC coding includes the 8D1P mode (8 data blocks and 1 parity block), the 4D1P mode, the 8D2P mode, or the 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In the data storage method provided by this embodiment of the present application, when multiple replicas are stored, different replicas are located in different storage devices. When one storage device holding a replica fails, only one replica is lost. In the prior art, by contrast, a problem in a single storage device may make multiple replicas unavailable, so this embodiment improves storage reliability. When the storage resource pools created in the N storage devices use erasure coding (EC), lost or damaged data can be recomputed, so the data remains usable after a partial loss. Compared with the approach in which a traditional distributed system keeps three copies of each piece of data so that it can continue to provide service after a hardware failure or similar fault, this embodiment saves storage space and improves storage utilization.
It should be understood that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention. A person of ordinary skill in the art can understand that implementations of all or part of the processes of the above embodiments, and equivalent changes made according to the claims of the present invention, still fall within the scope covered by the invention.

Claims (14)

  1. A data storage method, characterized in that the method comprises the following steps:
    sending a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2;
    obtaining a configuration instruction, where the configuration instruction comprises a replica count M to be used for data storage and a storage awareness policy; the storage awareness policy comprises: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N;
    determining, according to the configuration instruction, the M data nodes used to hold the data to be stored; and
    storing the data to be stored in the M data nodes.
  2. The data storage method according to claim 1, characterized in that
    the storage resource pools created in the N storage devices use erasure coding (EC).
  3. The data storage method according to claim 2, characterized in that the storage device comprises a distributed server or a disk array.
  4. The data storage method according to claim 2, characterized in that the EC coding comprises the 8D1P mode (8 data blocks and 1 parity block), the 4D1P mode, the 8D2P mode, or the 4D2P mode.
  5. The data storage method according to claim 1, characterized in that the data processing software comprises the distributed processing software Hadoop.
  6. The data storage method according to any one of claims 1 to 5, characterized in that M = 2.
  7. A data storage device, characterized by comprising:
    a sending unit, configured to send a first control instruction, where the first control instruction instructs that data processing software be installed in N storage devices and that a storage resource pool and a virtual machine be created in any one of the N storage devices; any one of the virtual machines stores data in the storage resource pool created in its corresponding storage device and serves as an optional data node of the data processing software; and N is an integer greater than or equal to 2;
    an obtaining unit, configured to obtain a configuration instruction, where the configuration instruction comprises a replica count M to be used for data storage and a storage awareness policy; the storage awareness policy comprises: determining M data nodes for storing data, where the M data nodes are located in M different storage devices and M is an integer less than or equal to N;
    a determining unit, configured to determine, according to the configuration instruction, the M data nodes used to hold the data to be stored; and
    a processing unit, configured to store the data to be stored in the M data nodes.
  8. The data storage device according to claim 7, characterized in that the storage resource pools created in the N storage devices use erasure coding (EC).
  9. The data storage device according to claim 8, characterized in that the storage device comprises a distributed server or a disk array.
  10. The data storage device according to claim 8, characterized in that the EC coding comprises the 8D1P mode (8 data blocks and 1 parity block), the 4D1P mode, the 8D2P mode, or the 4D2P mode.
  11. The data storage device according to claim 7, characterized in that the data processing software comprises the distributed processing software Hadoop.
  12. The data storage device according to any one of claims 7 to 11, characterized in that M = 2.
  13. An electronic device, characterized by comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs;
    where, when the one or more programs are executed by the one or more processors, the one or more processors implement the data storage method according to any one of claims 1 to 6.
  14. A computer-readable medium, characterized in that a computer program is stored thereon, and when the computer program is executed by a processor, the data storage method according to any one of claims 1 to 6 is implemented.
PCT/CN2020/111903 2019-09-27 2020-08-27 Data storage method and data storage device WO2021057377A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910926872.3A CN112578992B (en) 2019-09-27 2019-09-27 Data storage method and data storage device
CN201910926872.3 2019-09-27

Publications (1)

Publication Number Publication Date
WO2021057377A1 true WO2021057377A1 (en) 2021-04-01

Family

ID=75110606

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111903 WO2021057377A1 (en) 2019-09-27 2020-08-27 Data storage method and data storage device

Country Status (2)

Country Link
CN (1) CN112578992B (en)
WO (1) WO2021057377A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium
CN108846009A (en) * 2018-04-28 2018-11-20 北京奇艺世纪科技有限公司 Copy data storage method and device in a kind of ceph
CN109408597A (en) * 2018-11-29 2019-03-01 广东电网有限责任公司 A kind of power grid metering big data storage system and its creation method
US20190220208A1 (en) * 2018-01-18 2019-07-18 Dell Products L.P. Method, device and computer program product for storing data
CN110169008A (en) * 2018-07-10 2019-08-23 深圳花儿数据技术有限公司 A kind of distributed data redundancy storage method based on consistency hash algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10248319B2 (en) * 2015-03-31 2019-04-02 International Business Machines Corporation Storage pool capacity management
CN108667867B (en) * 2017-03-29 2021-05-18 华为技术有限公司 Data storage method and device
CN109799948B (en) * 2017-11-17 2023-05-16 航天信息股份有限公司 Data storage method and device
CN109828868B (en) * 2019-01-04 2023-02-03 新华三技术有限公司成都分公司 Data storage method, device, management equipment and double-active data storage system

Also Published As

Publication number Publication date
CN112578992B (en) 2022-07-22
CN112578992A (en) 2021-03-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20870145

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20870145

Country of ref document: EP

Kind code of ref document: A1