CN112578992B - Data storage method and data storage device - Google Patents

Info

Publication number
CN112578992B
Authority
CN
China
Prior art keywords
data
storage
storing
nodes
storage device
Prior art date
Legal status
Active
Application number
CN201910926872.3A
Other languages
Chinese (zh)
Other versions
CN112578992A (en)
Inventor
杨艳伟
孙荣宗
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201910926872.3A
Priority to PCT/CN2020/111903 (WO2021057377A1)
Publication of CN112578992A
Application granted
Publication of CN112578992B
Current legal status: Active
Anticipated expiration

Classifications

All classifications fall under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING.
    • G06F3/0614 Improving the reliability of storage systems
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0662 Virtualisation aspects
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

An embodiment of the present application discloses a data storage method, including the following steps: sending a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, and each virtual machine serves as a candidate data node of the data processing software; obtaining a configuration instruction, where the configuration instruction includes the number of copies M and a storage-aware policy, the storage-aware policy specifying that the M data nodes used to store data are located on M different storage devices; determining, according to the configuration instruction, M data nodes for storing data to be stored; and storing the data to be stored on the M data nodes. The storage-aware policy set in this data storage method helps improve the reliability of data storage.

Description

Data storage method and data storage device
Technical Field
The present application relates to the field of computer networks, and in particular, to a data storage method and a data storage device.
Background
To improve the security of stored data, a multi-copy approach is commonly used: a data file is copied into multiple copies, which are stored on multiple servers or disk arrays. Taking storage on multiple servers as an example, the data file remains accessible as long as any one server holding a copy of it is available, so data loss and inaccessibility caused by a single server's network failure, disk damage, power failure, downtime and the like can be avoided.
Although this approach makes data easy to restore, when a data storage device that holds several of the copies is powered off or goes down, the number of usable copies is not the total number of copies minus one, but the total number of copies minus the number of copies stored on that device. That is, fewer copies are actually usable than expected, which reduces storage reliability.
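The shortfall described above can be shown with a small count. The sketch below is a hypothetical model (the device names and placements are illustrative, not taken from the source): it counts how many copies survive a single device failure when two of three copies share one device, versus when every copy is on its own device.

```python
from collections import Counter

def usable_copies(placement, failed_device):
    """Copies still usable after `failed_device` fails.

    `placement` lists, for each copy, the device it is stored on
    (a hypothetical model used only for illustration).
    """
    return sum(1 for device in placement if device != failed_device)

def worst_case_usable(placement):
    """Copies left after the single worst-case device failure."""
    copies_per_device = Counter(placement)
    return len(placement) - max(copies_per_device.values())

co_located = ["A", "A", "B"]  # two of the three copies share device "A"
spread = ["A", "B", "C"]      # one copy per device

print(usable_copies(co_located, "A"))  # 1, not 3 - 1 = 2
print(worst_case_usable(spread))       # 2, i.e. exactly 3 - 1
```

Spreading the copies makes the worst case equal to the total minus one, which is the property the storage-aware policy below is meant to guarantee.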
Therefore, how to improve the reliability of data storage is a problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a data storage method and a data storage device that can improve the reliability of data storage.
In a first aspect, an embodiment of the present application provides a data storage method, where the method includes the following steps:
sending a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, where the configuration instruction includes the number of copies M used for data storage and a storage-aware policy; the storage-aware policy specifies that the M data nodes used to store data are located on M different storage devices, and M is an integer less than or equal to N;
determining, according to the configuration instruction, M data nodes for storing data to be stored;
and storing the data to be stored on the M data nodes.
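As a minimal sketch, assuming the configuration instruction can be modeled as a plain record (the class, field and function names below are hypothetical), the numeric constraints stated in the first aspect can be checked before any data node is chosen. The lower bound M ≥ 1 is an added assumption; the source only states M ≤ N and N ≥ 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigurationInstruction:
    """Hypothetical record for the configuration instruction."""
    copies: int          # the number of copies M
    storage_aware: bool  # if True, the M data nodes must sit on M different devices

def validate(config: ConfigurationInstruction, num_devices: int) -> list[str]:
    """Return the violated constraints (an empty list means the instruction is usable)."""
    problems = []
    if num_devices < 2:
        problems.append("N must be an integer >= 2")
    if config.copies < 1:
        problems.append("M must be at least 1")  # assumption, not stated in the source
    if config.copies > num_devices:
        # M <= N is what makes "one copy per storage device" satisfiable.
        problems.append(f"M={config.copies} exceeds N={num_devices}")
    return problems

print(validate(ConfigurationInstruction(copies=2, storage_aware=True), num_devices=3))  # []
```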
In the data storage method provided by this embodiment, different copies are located on different storage devices when multiple copies are stored. When one storage device holding backup data fails, only one copy is actually lost. In the prior art, by contrast, the failure of one storage device may make several copies unavailable at once. Storage reliability is therefore improved.
In some possible embodiments, the storage resource pools created on the N storage devices are encoded with erasure codes (EC).
Because the storage resource pools created on the storage devices use EC, lost or damaged data can be recalculated, and the storage devices remain usable after some data is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue to serve data after failures such as hardware faults, this embodiment saves storage space and improves storage utilization.
In some possible embodiments, the storage utilization of the hard disk is 88.89% when the EC encoding uses the 8D1P mode (8 data blocks and 1 parity block), 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
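These percentages are the ratio of data blocks to total blocks, D / (D + P). A short check, assuming utilization means exactly that ratio (the source states only the resulting figures), reproduces them alongside the 1/3 utilization of the 3-copy approach mentioned above:

```python
from fractions import Fraction

def ec_utilization(data_blocks: int, parity_blocks: int) -> Fraction:
    """Share of raw capacity that holds user data in a DdPp erasure-coded stripe."""
    return Fraction(data_blocks, data_blocks + parity_blocks)

def replica_utilization(copies: int) -> Fraction:
    """Share of raw capacity that holds user data with full replication."""
    return Fraction(1, copies)

for d, p in [(8, 1), (4, 1), (8, 2), (4, 2)]:
    print(f"{d}D{p}P: {float(ec_utilization(d, p)):.2%}")
print(f"3 copies: {float(replica_utilization(3)):.2%}")
# prints 88.89%, 80.00%, 80.00%, 66.67% for the EC modes, then 33.33%
```

The parity count P also sets how many blocks per stripe can be lost and still recalculated, so 8D2P trades some utilization against 8D1P for tolerance of a second failure.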
In a second aspect, an embodiment of the present application provides a data storage device, including:
a sending unit, configured to send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
an obtaining unit, configured to obtain a configuration instruction, where the configuration instruction includes the number of copies M used for data storage and a storage-aware policy; the storage-aware policy specifies that the M data nodes used to store data are located on M different storage devices, and M is an integer less than or equal to N;
a determining unit, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored;
and a processing unit, configured to store the data to be stored on the M data nodes.
When the data storage device provided by this embodiment performs multi-copy storage, different copies are located on different storage devices. When one storage device holding backup data fails, only one copy is actually lost. In the prior art, by contrast, the failure of one storage device may make several copies unavailable at once. Storage reliability is therefore improved.
In some possible embodiments, the storage resource pools created on the N storage devices are encoded with erasure codes (EC). Because the storage resource pools use EC, lost or damaged data can be recalculated, and the storage devices remain usable after some data is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue to serve data after failures such as hardware faults, this embodiment saves storage space and improves storage utilization.
In some possible embodiments, the storage utilization of the hard disk is 88.89% when the EC encoding uses the 8D1P mode (8 data blocks and 1 parity block), 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
In some possible embodiments, the storage devices include distributed servers or disk arrays.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In a third aspect, an embodiment of the present application provides a data storage system, which includes N storage devices and the data storage device described in the second aspect or any possible implementation of the second aspect, where N is an integer greater than or equal to 2.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
storage means for storing one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in the first aspect or any possible implementation of the first aspect.
Drawings
Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a data storage method according to another embodiment of the present application.
Fig. 3 is an interaction flow diagram of a data storage method according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a data storage device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. Other embodiments can be derived by those skilled in the art based on the embodiments in the present application.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application, including the following steps.
101. Send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2.
The storage device may be, for example, a distributed server or a disk array.
102. Obtain a configuration instruction, where the configuration instruction includes the number of copies M used for data storage and a storage-aware policy; the storage-aware policy specifies that the M data nodes used to store data are located on M different storage devices, and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data using the storage resource pool on its own storage device, and the 3 virtual machines so created can serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
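To make the N = 3 example concrete, the sketch below models what the first control instruction sets up on each storage device. The structure and names (`Deployment`, `plan_deployment`, `pool-i`, `vm-i`, `server1` to `server3`) are hypothetical; the source specifies only that each device gets a storage resource pool and a virtual machine whose storage comes from that local pool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deployment:
    device: str  # storage device, e.g. a distributed server
    pool: str    # storage resource pool created on that device
    vm: str      # virtual machine whose disk uses `pool` on the same device

def plan_deployment(devices):
    """Create one pool and one VM per device; each VM becomes a candidate data node."""
    if len(devices) < 2:
        raise ValueError("N must be an integer >= 2")
    return [Deployment(d, f"pool-{i}", f"vm-{i}") for i, d in enumerate(devices, start=1)]

plan = plan_deployment(["server1", "server2", "server3"])  # N = 3
candidate_nodes = [(p.device, p.vm) for p in plan]
# [('server1', 'vm-1'), ('server2', 'vm-2'), ('server3', 'vm-3')]
```

Keeping the device next to each VM in the candidate list is what later lets the storage-aware policy reason about which device a data node lives on.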
103. Determine, according to the configuration instruction, M data nodes for storing the data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
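One way to honor the storage-aware policy when determining the M data nodes is to group the candidate data nodes by storage device and then sample M distinct devices. The sketch below is illustrative only: `choose_data_nodes` is a hypothetical helper, candidates are hypothetical (device, node) pairs, and the random choice stands in for whatever criteria (load, free space) a real system would apply.

```python
import random

def choose_data_nodes(candidates, m, rng=random):
    """Pick m data nodes located on m different storage devices.

    candidates: iterable of (device_id, node_id) pairs.
    Raises ValueError if fewer than m distinct devices are available.
    """
    nodes_by_device = {}
    for device, node in candidates:
        nodes_by_device.setdefault(device, []).append(node)
    if len(nodes_by_device) < m:
        raise ValueError(f"need {m} distinct devices, found {len(nodes_by_device)}")
    devices = rng.sample(sorted(nodes_by_device), m)  # m different devices
    return [(d, rng.choice(nodes_by_device[d])) for d in devices]

candidates = [("dev1", "vm1"), ("dev1", "vm1b"), ("dev2", "vm2"), ("dev3", "vm3")]
print(choose_data_nodes(candidates, 2, random.Random(0)))
```

Choosing devices first and only then a node within each device guarantees the M-different-devices constraint by construction instead of checking it after the fact.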
104. Store the data to be stored on the M data nodes.
For example, if M is 2, the data to be stored is saved to the two data nodes so determined.
In the data storage method provided by this embodiment, different copies are located on different storage devices when multiple copies are stored. When one storage device holding backup data fails, only one copy is actually lost. In the prior art, by contrast, the failure of one storage device may make several copies unavailable at once. Storage reliability is therefore improved.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a data storage method according to another embodiment of the present application. The method includes the following steps:
201. Send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, each virtual machine serves as a candidate data node of the data processing software, N is an integer greater than or equal to 2, and the storage resource pools are encoded with erasure codes (EC).
The storage device may be, for example, a distributed server or a disk array. The following description takes a distributed server as an example.
In some possible embodiments, the storage utilization of the hard disk is 88.89% when the EC encoding uses the 8D1P mode (8 data blocks and 1 parity block), 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
202. Obtain a configuration instruction, where the configuration instruction includes the number of copies M used for data storage and a storage-aware policy; the storage-aware policy specifies that the M data nodes used to store data are located on M different storage devices, and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data using the storage resource pool on its own storage device, and the 3 virtual machines so created can serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
203. Determine, according to the configuration instruction, M data nodes for storing the data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
204. Store the data to be stored on the M data nodes.
For example, if M is 2, the data to be stored is saved to the two data nodes so determined.
In this embodiment, the storage resource pools created on the storage devices use EC, so lost or damaged data can be recalculated and data remains usable after some of it is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue to serve data after failures such as hardware faults, this embodiment saves storage space and improves storage utilization.
Referring to fig. 3, fig. 3 is an interaction flow diagram of a data storage method according to an embodiment of the present application. As shown in fig. 3, the data storage in this embodiment includes the following steps.
301. Install Software Defined Storage (SDS) on the first rack, Rack1, and the second rack, Rack2, and create a separate storage resource pool on each of Rack1 and Rack2 using EC (N:N+M) encoding, where N is the number of data blocks and M the number of parity blocks in a stripe (not the device and copy counts used in the earlier steps). In this embodiment, the EC encoding uses the 8D1P mode.
In this embodiment, as shown in Fig. 3, storage resource pool 1 may be created by the SDS and encoded with EC in 8D1P mode, and storage resource pool 2 is likewise created by the SDS and encoded with EC in 8D1P mode.
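With a single parity block, as in 8D1P, the parity can be the bitwise XOR of the data blocks, which is enough to recalculate any one lost block in a stripe. The sketch below assumes XOR parity purely for illustration; the source does not say which code the SDS uses, and modes with two parity blocks (8D2P, 4D2P) need a stronger code such as Reed-Solomon. The function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_stripe(data, d=8):
    """Split `data` into d equal blocks (zero-padded) and append one XOR parity block."""
    size = -(-len(data) // d)  # ceiling division
    padded = data.ljust(size * d, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(d)]
    return blocks + [xor_blocks(blocks)]

def recover(stripe, lost_index):
    """Rebuild the block at `lost_index` by XOR-ing the d surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

stripe = encode_stripe(b"data to be stored on Rack1", d=8)  # 8 data + 1 parity
assert recover(stripe, 3) == stripe[3]  # a lost data block is recalculated
```

This is why one failed disk inside a rack's pool does not cost a whole copy: the pool rebuilds the block locally, while the second copy on the other rack covers the loss of an entire rack.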
SDS is a storage architecture that separates storage software from hardware. Unlike conventional Network Attached Storage (NAS) or Storage Area Network (SAN) systems, SDS typically runs on industry-standard or x86 systems, which removes the dependency of the software on proprietary hardware. SDS typically uses a distributed architecture to improve reliability and scalability, and is therefore sometimes equated with distributed storage. The two are clearly different, however: distributed storage refers to an architecture and emphasizes that the architecture is distributed, whereas SDS refers to software-defined storage and emphasizes the decoupling of software from hardware.
SDS has the following advantages: (1) Decoupling of software and hardware. The storage hardware is commercial off-the-shelf (COTS), which avoids vendor lock-in, and purchasing hardware and software separately reduces equipment cost. (2) Strong scalability. Because SDS uses a distributed architecture, its storage capacity can in theory be expanded without limit and grows linearly with the number of servers (horizontal expansion). A SAN is limited by the processing capacity of its controller, so a single disk array has a limited capacity; once storage needs exceed that capacity, another set of storage equipment must be added (vertical expansion). (3) High reliability.
302. Create the virtual machines used by Hadoop; the disk of each virtual machine uses the storage pool on the rack where that virtual machine is located.
Specifically, the disk of the virtual machine on Rack1 uses storage resource pool 1, and the disk of the virtual machine on Rack2 uses storage resource pool 2.
Hadoop is a widely used distributed system infrastructure. The Hadoop Distributed File System (HDFS) divides nodes into two types: NameNodes and DataNodes. The NameNode manages the file system namespace: it maintains the file system tree and the metadata of all files and directories in that tree. This information is persisted on the local disk in two files, the namespace image file and the edit log file. The NameNode also records which DataNodes hold each block of each file, but it does not persist these block locations; they are reconstructed from the DataNodes when the system starts.
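The split described above, in which the NameNode persists the namespace but rebuilds block locations at startup, can be sketched as folding DataNode block reports into an in-memory map. This is a simplified, hypothetical model, not HDFS code: real block reports also carry block lengths and generation stamps, and the file path, block IDs and DataNode names below are made up.

```python
from collections import defaultdict

def rebuild_block_locations(block_reports):
    """Build block_id -> set of DataNodes from startup block reports.

    block_reports: mapping of DataNode name -> iterable of block IDs it holds.
    """
    locations = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block_id in blocks:
            locations[block_id].add(datanode)
    return dict(locations)

# Persisted by the NameNode: which blocks make up each file (namespace metadata).
namespace = {"/logs/app.log": ["blk_1", "blk_2"]}
# Not persisted: learned from what each DataNode reports after startup.
reports = {"dn-rack1": ["blk_1", "blk_2"], "dn-rack2": ["blk_1", "blk_2"]}

locations = rebuild_block_locations(reports)
for block in namespace["/logs/app.log"]:
    print(block, sorted(locations[block]))
```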
303. Install Hadoop in the virtual machines created in step 302, and have Hadoop manage these virtual machines as DataNodes.
304. Configure Hadoop's rack awareness policy and set the number of copies to 2.
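In stock Hadoop, the copy count is the `dfs.replication` property (hdfs-site.xml), and rack awareness can be supplied by an executable named in `net.topology.script.file.name` (core-site.xml). Hadoop runs that script with one or more DataNode IPs or hostnames as arguments and reads one rack path per argument from its output. A minimal topology script for the two-rack example might look like the following; the IP addresses and the host-to-rack table are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical HDFS topology script for the Rack1/Rack2 example.

Referenced from core-site.xml via net.topology.script.file.name; Hadoop passes
DataNode IPs/hostnames as arguments and reads one rack path per argument.
"""
import sys

# Hypothetical mapping of the Hadoop VMs (DataNodes) to their racks.
RACKS = {
    "10.0.1.11": "/Rack1",
    "10.0.1.12": "/Rack1",
    "10.0.2.11": "/Rack2",
    "10.0.2.12": "/Rack2",
}
DEFAULT_RACK = "/default-rack"  # returned for hosts not in the table

def resolve_racks(hosts):
    """Return the rack path for each host, in argument order."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    print(" ".join(resolve_racks(sys.argv[1:])))
```

With `dfs.replication` set to 2, HDFS's default block placement puts the second replica on a different rack from the first, so each rack, and therefore each storage pool, ends up holding one copy.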
When data is written to Hadoop, one DataNode is selected on each of the two racks, so the data is written into two different storage pools. This rack-level reliability keeps the data available on both racks (active-active) and improves overall reliability.
It can be understood that, when a storage system is actually built, the number of racks is not limited to two; the same policy can be applied to more racks.
In this embodiment, the storage resource pools created on the storage devices use EC, so lost or damaged data can be recalculated and data remains usable after some of it is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue to serve data after failures such as hardware faults, this embodiment saves storage space and improves storage utilization.
Referring to Fig. 4, Fig. 4 is a block diagram of a data storage device 400 according to an embodiment of the present application. The data storage device 400 includes:
a sending unit 401, configured to send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices and creation of a storage resource pool and a virtual machine on each of the N storage devices, each virtual machine stores data using the storage resource pool created on its own storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
an obtaining unit 402, configured to obtain a configuration instruction, where the configuration instruction includes the number of copies M used for data storage and a storage-aware policy; the storage-aware policy specifies that the M data nodes used to store data are located on M different storage devices, and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data using the storage resource pool on its own storage device, and the 3 virtual machines so created can serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
a determining unit 403, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
a processing unit 404, configured to store the data to be stored on the M data nodes.
For example, if M is 2, the data to be stored is saved to the two data nodes so determined.
With the data storage device provided by this embodiment, different copies are located on different storage devices when multiple copies are stored. When one storage device holding backup data fails, only one copy is actually lost. In the prior art, by contrast, the failure of one storage device may make several copies unavailable at once. Storage reliability is therefore improved.
In some possible embodiments, the storage resource pools created on the N storage devices are encoded with erasure codes (EC). Because the storage resource pools use EC, lost or damaged data can be recalculated and data remains usable after some of it is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue to serve data after failures such as hardware faults, this embodiment saves storage space and improves storage utilization.
In some possible embodiments, the storage utilization of the hard disk is 88.89% when the EC encoding uses the 8D1P mode (8 data blocks and 1 parity block), 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
The embodiment of the application also provides a data storage system, which comprises N storage devices and an embodiment corresponding to any one of the data storage devices, wherein N is an integer greater than or equal to 2. As shown in fig. 4, the data storage device includes: a sending unit 401, configured to send a first control instruction, where the first control instruction instructs to install data processing software in N storage devices, and create a storage resource pool and a virtual machine in any one of the N storage devices, where any one of the virtual machines uses the created storage resource pool in the storage device corresponding to the virtual machine to store data, and any one of the virtual machines serves as an optional data node of the data processing software, where N is an integer greater than or equal to 2.
An obtaining unit 402, configured to obtain a configuration instruction, where the configuration instruction includes: setting the copy number M and a storage perception strategy when data storage is carried out; the storage aware policy includes: determining M data nodes for storing data, the M data nodes being located in M different storage devices, the M being an integer less than or equal to the N.
For example, if N is 3 and M is 2, the first control instruction instructs that the data processing software be installed on the 3 storage devices and that a storage resource pool and a virtual machine be created in each of them; each virtual machine stores data using the storage resource pool in its own storage device, and the 3 virtual machines so created serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
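On stock Hadoop HDFS, one way to approximate the storage-aware policy (an assumption, not necessarily how the patent implements it) is rack awareness: treat each storage device as a "rack", point the `net.topology.script.file.name` property in core-site.xml at a script that maps each VM data node to its device, and set `dfs.replication` in hdfs-site.xml to M. HDFS's default block placement writes the second replica to a different rack from the first when one is available, so for M = 2 the two copies land on different storage devices. The VM addresses and `/device-N` rack names below are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical HDFS topology script: maps each VM data node to the storage
device hosting it, so HDFS treats every storage device as a separate rack."""
from __future__ import annotations

import sys

# Hypothetical VM addresses for N = 3 storage devices (one VM per device).
VM_TO_DEVICE = {
    "10.0.0.11": "/device-1",
    "10.0.0.12": "/device-2",
    "10.0.0.13": "/device-3",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: list[str]) -> list[str]:
    """Return one rack path per host, in the order HDFS passed them."""
    return [VM_TO_DEVICE.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # HDFS passes one or more host addresses as arguments and reads one rack per line.
    print("\n".join(resolve(sys.argv[1:])))
```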
A determining unit 403, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored.
For example, if M is 2, two data nodes for saving data to be stored are determined according to the configuration instruction.
A processing unit 404, configured to store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved in the two data nodes so determined.
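The determining and processing steps can be sketched as follows: group candidate data nodes by storage device, pick M distinct devices, take one node from each, and write one copy to every chosen node. Random selection is an assumption (the text does not say how the M devices are chosen), and the in-memory `blocks` dict is a hypothetical stand-in for each VM's storage.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class DataNode:
    vm_name: str
    device_id: str
    blocks: dict[str, bytes] = field(default_factory=dict)  # stand-in for VM storage


def choose_data_nodes(
    nodes: list[DataNode], m: int, rng: random.Random | None = None
) -> list[DataNode]:
    """Storage-aware choice: M data nodes located on M different storage devices."""
    rng = rng or random.Random()
    by_device: dict[str, list[DataNode]] = {}
    for node in nodes:
        by_device.setdefault(node.device_id, []).append(node)
    if m > len(by_device):
        raise ValueError(f"need {m} distinct devices, only {len(by_device)} available")
    devices = rng.sample(sorted(by_device), m)           # M distinct storage devices
    return [rng.choice(by_device[d]) for d in devices]   # one node per chosen device


def store(
    key: str, data: bytes, nodes: list[DataNode], m: int,
    rng: random.Random | None = None,
) -> list[DataNode]:
    """Write one copy of the data to each of the M chosen data nodes."""
    targets = choose_data_nodes(nodes, m, rng)
    for node in targets:
        node.blocks[key] = data
    return targets
```

Grouping by device before choosing nodes is what enforces the policy even if a device were to host more than one data node.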
In the data storage method provided by the embodiments of the application, when multiple copies are stored, different copies reside in different storage devices, so the failure of one storage device holding a copy reduces the number of available copies by exactly one. In the prior art, by contrast, the failure of a single storage device holding one copy could make several copies unavailable at once. Storage reliability is therefore improved.
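The reliability argument reduces to a count: for one block, record the storage device that holds each copy; the worst single-device failure removes as many copies as the most heavily used device holds. The placements below are hypothetical; the "device-unaware" case stands for two data nodes that happen to share one storage device, which the storage-aware policy rules out.

```python
from __future__ import annotations

from collections import Counter


def copies_surviving(replica_devices: list[str], failed_device: str) -> int:
    """Copies still readable after one storage device fails."""
    return sum(1 for device in replica_devices if device != failed_device)


def worst_case_copies_lost(replica_devices: list[str]) -> int:
    """Largest number of copies that a single storage-device failure can remove."""
    return max(Counter(replica_devices).values())


if __name__ == "__main__":
    aware = ["device-1", "device-2"]    # M = 2 copies on 2 different devices
    unaware = ["device-1", "device-1"]  # 2 data nodes sharing one device
    for label, placement in (("storage-aware", aware), ("device-unaware", unaware)):
        print(label, "worst-case copies lost:", worst_case_copies_lost(placement))
```

With distinct devices the worst case is always one copy, whatever M is.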
In some possible embodiments, the storage resource pools created in the N storage devices are encoded with erasure codes (EC). Because the storage resource pools created in the storage devices use EC, lost or damaged data can be recomputed, and the storage devices remain usable after some data is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue serving data after failures such as hardware faults, the embodiments of the application save storage space and improve storage utilization.
In some possible embodiments, the hard-disk storage utilization (data blocks divided by total blocks in a stripe) is 88.89% when the EC scheme uses the 8-data-block, 1-parity-block (8D1P) mode, 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data storage method of any of the preceding method embodiments. The method includes the following steps:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, wherein the configuration instruction sets the number of copies M to be used when storing data and a storage-aware policy; the storage-aware policy specifies that the M data nodes determined for storing data are located in M different storage devices, and M is an integer less than or equal to N;
determining M data nodes for storing data to be stored according to the configuration instruction;
and storing the data to be stored in the M data nodes.
In some possible embodiments, the storage resource pool created in the N storage devices is encoded with erasure codes EC.
In some possible embodiments, the storage device includes a distributed server or a disk array.
In some possible embodiments, the EC scheme includes an 8-data-block, 1-parity-block (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In the data storage method provided by the embodiments of the application, when multiple copies are stored, different copies reside in different storage devices, so the failure of one storage device holding a copy reduces the number of available copies by exactly one. In the prior art, by contrast, the failure of a single storage device could make several copies unavailable at once. Storage reliability is therefore improved. When the storage resource pools created in the N storage devices are encoded with erasure codes (EC), lost or damaged data can be recomputed, so the data remains usable after some of it is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue serving data after failures such as hardware faults, the embodiments of the application save storage space and improve storage utilization.
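Combining the two mechanisms, the raw capacity consumed per byte of user data is M × (k + m) / k. The calculation below assumes M = 2 with each copy stored in an EC pool of the given mode, and takes as the baseline 3 full copies on disks without EC; both are assumptions about how the comparison above is meant, since the text does not state it numerically. Helper names are hypothetical.

```python
from fractions import Fraction

# (data blocks k, parity blocks m) for the EC modes named in the text
EC_MODES = {"8D1P": (8, 1), "4D1P": (4, 1), "8D2P": (8, 2), "4D2P": (4, 2)}


def raw_per_user_byte(copies: int, data_blocks: int = 1, parity_blocks: int = 0) -> Fraction:
    """Raw disk bytes consumed per byte of user data: copies * (k + m) / k."""
    return Fraction(copies * (data_blocks + parity_blocks), data_blocks)


if __name__ == "__main__":
    baseline = raw_per_user_byte(3)  # 3 plain copies, no EC
    for name, (k, m) in EC_MODES.items():
        cost = raw_per_user_byte(2, k, m)  # M = 2 copies, each in an EC pool
        print(f"M=2 on {name}: {float(cost):.3f}x raw (baseline: {float(baseline):.0f}x)")
```

Under these assumptions the raw cost is 2.25x for 8D1P and 2.5x for 4D1P and 8D2P, below the 3x of three plain copies, while 4D2P equals it at 3x.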
The present application also provides a computer-readable medium storing a computer program which, when executed by a processor, implements the data storage method of any of the preceding method embodiments. The method includes the following steps:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, wherein the configuration instruction sets the number of copies M to be used when storing data and a storage-aware policy; the storage-aware policy specifies that the M data nodes determined for storing data are located in M different storage devices, and M is an integer less than or equal to N;
determining M data nodes for storing data to be stored according to the configuration instruction;
and storing the data to be stored in the M data nodes.
In some possible embodiments, the storage resource pool created in the N storage devices is encoded with erasure codes EC.
In some possible embodiments, the storage device includes a distributed server or a disk array.
In some possible embodiments, the EC scheme includes an 8-data-block, 1-parity-block (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In the data storage method provided by the embodiments of the application, when multiple copies are stored, different copies reside in different storage devices, so the failure of one storage device holding a copy reduces the number of available copies by exactly one. In the prior art, by contrast, the failure of a single storage device could make several copies unavailable at once. Storage reliability is therefore improved. When the storage resource pools created in the N storage devices are encoded with erasure codes (EC), lost or damaged data can be recomputed, so the storage resource pools remain usable after some data is lost. Compared with the traditional approach, in which a distributed system keeps 3 copies of each piece of data so that it can continue serving data after failures such as hardware faults, the embodiments of the application save storage space and improve storage utilization.
It is to be understood that the terms "first," "second," and the like in the description and in the claims, and in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be implemented in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (14)

1. A method of storing data, the method comprising the steps of:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, wherein the configuration instruction sets the number of copies M to be used when storing data and a storage-aware policy; the storage-aware policy specifies that the M data nodes determined for storing data are located in M different storage devices, and M is an integer less than or equal to N;
determining M data nodes for storing data to be stored according to the configuration instruction;
and storing the data to be stored in the M data nodes.
2. The data storage method of claim 1, wherein the storage resource pools created in the N storage devices are encoded with erasure codes (EC).
3. The data storage method of claim 2, wherein the storage device comprises a distributed server or a disk array.
4. The data storage method of claim 2, wherein the EC scheme comprises: an 8-data-chunk 1-parity-chunk (8D1P) mode, a 4-data-chunk 1-parity-chunk (4D1P) mode, an 8-data-chunk 2-parity-chunk (8D2P) mode, or a 4-data-chunk 2-parity-chunk (4D2P) mode.
5. The data storage method of claim 1, wherein the data processing software comprises the distributed processing software Hadoop.
6. The data storage method of any one of claims 1 to 5, wherein M = 2.
7. A data storage device, comprising:
a sending unit, configured to send a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, and each virtual machine serves as a candidate data node of the data processing software, wherein N is an integer greater than or equal to 2;
an obtaining unit, configured to obtain a configuration instruction, wherein the configuration instruction sets the number of copies M to be used when storing data and a storage-aware policy; the storage-aware policy specifies that the M data nodes determined for storing data are located in M different storage devices, wherein M is an integer less than or equal to N;
a determining unit, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored;
and a processing unit, configured to store the data to be stored in the M data nodes.
8. The data storage device of claim 7, wherein the storage resource pools created in the N storage devices are encoded with erasure codes (EC).
9. The data storage device of claim 8, wherein the storage device comprises a distributed server or a disk array.
10. The data storage device of claim 8, wherein the EC scheme comprises: an 8-data-chunk 1-parity-chunk (8D1P) mode, a 4-data-chunk 1-parity-chunk (4D1P) mode, an 8-data-chunk 2-parity-chunk (8D2P) mode, or a 4-data-chunk 2-parity-chunk (4D2P) mode.
11. The data storage device of claim 7, wherein the data processing software comprises the distributed processing software Hadoop.
12. The data storage device of any one of claims 7 to 11, wherein M = 2.
13. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a data storage method as recited in any of claims 1-6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the data storage method of any one of claims 1 to 6.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910926872.3A CN112578992B (en) 2019-09-27 2019-09-27 Data storage method and data storage device
PCT/CN2020/111903 WO2021057377A1 (en) 2019-09-27 2020-08-27 Data storage method and data storage device

Publications (2)

Publication Number Publication Date
CN112578992A CN112578992A (en) 2021-03-30
CN112578992B true CN112578992B (en) 2022-07-22

Family

ID=75110606

Also Published As

Publication number Publication date
WO2021057377A1 (en) 2021-04-01
CN112578992A (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant