CN112578992A - Data storage method and data storage device - Google Patents
- Publication number
- CN112578992A
- Application number
- CN201910926872.3A
- Authority
- CN
- China
- Prior art keywords
- data
- storage
- storage device
- storing
- processing software
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F3/0614—Improving the reliability of storage systems
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
- G06F3/0662—Virtualisation aspects
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Retry When Errors Occur (AREA)
Abstract
The embodiments of the application disclose a data storage method comprising the following steps: sending a first control instruction that instructs installation of data processing software on N storage devices, where a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data in the storage resource pool created on its own storage device, and each virtual machine serves as a candidate data node of the data processing software; obtaining a configuration instruction that includes a copy number M and a storage-aware policy, the storage-aware policy requiring that the M data nodes used to store data be located on M different storage devices; determining, according to the configuration instruction, M data nodes for storing the data to be stored; and storing the data to be stored in the M data nodes. Through this storage-aware policy, the data storage method improves the reliability of data storage.
Description
Technical Field
The present application relates to the field of computer networks, and in particular, to a data storage method and a data storage apparatus.
Background
To improve the security of stored data, a multi-copy approach is generally used: a data file is copied several times and the copies are stored on multiple servers or disk arrays. Taking storage on multiple servers as an example, the data file remains accessible as long as any one server holding a copy is available, which avoids data loss or inaccessibility caused by a single server suffering a network failure, disk damage, power failure, crash, or the like.
Although this approach makes data easy to restore, it has a weakness: when a storage device that holds several of the copies loses power or goes down, the number of usable copies is not the total number of copies minus one, but the total minus the number of copies held on that device. The number of copies actually available is therefore smaller than expected, which reduces storage reliability.
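To make the arithmetic concrete, the following Python sketch (hypothetical server names and placements, not taken from the application) counts how many copies of one file remain after a single device fails, first when two of three copies share a server and then when every copy sits on its own server.

```python
from collections import Counter


def surviving_copies(placement, failed_device):
    """Count copies still available after `failed_device` goes down.

    `placement` lists the device that holds each copy of one data file.
    """
    return sum(1 for device in placement if device != failed_device)


def worst_case_loss(placement):
    """Largest number of copies lost when any single device fails."""
    return max(Counter(placement).values())


# Hypothetical 3-copy placements of one file.
shared = ["server-a", "server-a", "server-b"]  # two copies on server-a
spread = ["server-a", "server-b", "server-c"]  # one copy per server

print(surviving_copies(shared, "server-a"))  # 1: total minus the 2 copies on server-a
print(surviving_copies(spread, "server-a"))  # 2: total minus one
print(worst_case_loss(shared), worst_case_loss(spread))  # 2 1
```

The first placement loses two copies at once and the second only one, which is the gap the storage-aware policy described below is meant to close.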
Therefore, how to improve the reliability of data storage is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a data storage method and a data storage device, which can improve the reliability of data storage.
In a first aspect, an embodiment of the present application provides a data storage method, where the method includes the following steps:
sending a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data in the storage resource pool created on its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, where the configuration instruction includes the copy number M and a storage-aware policy to be applied when data is stored; the storage-aware policy includes: determining M data nodes for storing data, where the M data nodes are located on M different storage devices and M is an integer less than or equal to N;
determining, according to the configuration instruction, M data nodes for storing data to be stored; and
storing the data to be stored in the M data nodes.
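The four steps can be pictured with a minimal Python sketch. It assumes every candidate data node is tagged with the storage device that hosts it; `DataNode`, `select_data_nodes` and `store` are hypothetical names, and the in-memory dictionary merely stands in for the data nodes. It illustrates the placement constraint only and is not the application's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str    # hypothetical identifier of the VM acting as a data node
    device: str  # storage device (server or disk array) hosting that VM


def select_data_nodes(candidates, m):
    """Return M data nodes that sit on M different storage devices."""
    if m < 1:
        raise ValueError("the copy number M must be at least 1")
    chosen, used_devices = [], set()
    for node in candidates:
        if node.device in used_devices:
            continue  # storage-aware policy: never two copies on one device
        chosen.append(node)
        used_devices.add(node.device)
        if len(chosen) == m:
            return chosen
    # Fewer than M distinct devices: the constraint M <= N cannot be met.
    raise ValueError(f"need {m} distinct storage devices, found {len(used_devices)}")


def store(data, nodes, backend):
    """Write one copy of `data` to each selected node; `backend` is a stand-in dict."""
    for node in nodes:
        backend.setdefault(node.name, []).append(data)


# N = 3 storage devices with one VM each, copy number M = 2.
candidates = [DataNode("vm1", "dev1"), DataNode("vm2", "dev2"), DataNode("vm3", "dev3")]
nodes = select_data_nodes(candidates, m=2)
backend = {}
store(b"payload", nodes, backend)
print([n.name for n in nodes])  # ['vm1', 'vm2']
```

A greedy first-fit scan is enough here because the only rule being modelled is "one copy per device"; a real scheduler would also weigh free space or load when choosing among eligible nodes.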
In the data storage method provided by the embodiments of the application, the multiple copies are placed on different storage devices, so when one storage device holding backup data fails, the number of available copies drops by exactly one. In the prior art, by contrast, the failure of a single storage device may make several copies unavailable at once; the method therefore improves storage reliability.
In some possible embodiments, the storage resource pools created on the N storage devices use erasure coding (EC).
Because the storage resource pools created on the storage devices use EC, lost or damaged data can be recomputed from the remaining data, so a storage device remains usable after losing some data. Compared with the traditional approach, in which a distributed system keeps three copies of every piece of data so that it can continue serving data after failures such as hardware faults, the embodiments of the application save storage space and improve storage utilization.
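How lost data can be "calculated" is easiest to see with the simplest erasure code, a single XOR parity block, which fits kD1P layouts such as 8D1P. The Python sketch below is an illustration under that assumption only: it tolerates the loss of any one block, whereas two-parity modes such as 8D2P and 4D2P need a stronger code (commonly Reed-Solomon), and the application does not say which code its storage pools use.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """kD1P stripe: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Recompute the block at `lost_index` from the k surviving blocks of the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# 8D1P: eight 4-byte data blocks plus one parity block (9 blocks in total).
data = [bytes([i] * 4) for i in range(8)]
stripe = encode(data)
print(recover(stripe, 3) == data[3])  # True: the lost data block is recomputed
```

Because XOR is its own inverse, combining the parity block with the seven surviving data blocks yields exactly the missing one; the same call also rebuilds a lost parity block.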
In some possible embodiments, the storage utilization of the hard disks (data blocks divided by total blocks) is 88.89% when EC uses the 8D1P mode (8 data blocks, 1 parity block), 80% in 4D1P mode, 80% in 8D2P mode, and 66.67% in 4D2P mode.
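The percentages follow from utilization = data blocks / (data blocks + parity blocks). A short Python check, with the three-copy scheme mentioned above added for comparison (three full copies leave one third of raw capacity for user data):

```python
from fractions import Fraction


def ec_utilization(data_blocks, parity_blocks):
    """Share of raw capacity that holds user data in a kDmP layout: k / (k + m)."""
    return Fraction(data_blocks, data_blocks + parity_blocks)


modes = {"8D1P": (8, 1), "4D1P": (4, 1), "8D2P": (8, 2), "4D2P": (4, 2)}
for name, (k, m) in modes.items():
    print(f"{name}: {float(ec_utilization(k, m)):.2%}")
# 8D1P: 88.89%
# 4D1P: 80.00%
# 8D2P: 80.00%
# 4D2P: 66.67%
print(f"3 copies: {1 / 3:.2%}")  # 3 copies: 33.33%
```

Using `Fraction` keeps the ratios exact (for example 8/9) until they are formatted, so equal modes such as 4D1P and 8D2P compare equal.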
In a second aspect, an embodiment of the present application provides a data storage device, including:
a sending unit, configured to send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices and creation of a storage resource pool and a virtual machine on each of the N storage devices; each virtual machine stores data in the storage resource pool created on its corresponding storage device and serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
an obtaining unit, configured to obtain a configuration instruction, where the configuration instruction includes the copy number M and a storage-aware policy to be applied when data is stored; the storage-aware policy includes: determining M data nodes for storing data, where the M data nodes are located on M different storage devices and M is an integer less than or equal to N;
a determining unit, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored; and
a processing unit, configured to store the data to be stored in the M data nodes.
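One way to picture how the four units cooperate is the hypothetical Python class below. The method names, the `vm-<device>` naming scheme and the in-memory `stored` dictionary are illustrative assumptions; the sketch relies on there being exactly one virtual machine per storage device, so any M of them lie on M different devices.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigInstruction:
    copies: int             # copy number M
    distinct_devices: bool  # storage-aware policy: the M copies go to M different devices


@dataclass
class DataStorageDevice:
    """Hypothetical model of the four units; all names are illustrative."""
    devices: list                               # the N storage devices
    nodes: list = field(default_factory=list)   # candidate data nodes, one VM per device
    stored: dict = field(default_factory=dict)  # stand-in for the data nodes' storage

    def send_first_control_instruction(self):
        # Sending unit: one resource pool and one VM (candidate data node) per device.
        if len(self.devices) < 2:
            raise ValueError("N must be at least 2")
        self.nodes = [f"vm-{device}" for device in self.devices]
        return [{"device": d, "pool": f"pool-{d}", "vm": f"vm-{d}"} for d in self.devices]

    def obtain_configuration(self, copies):
        # Obtaining unit: the copy number M must satisfy 1 <= M <= N.
        if not 1 <= copies <= len(self.devices):
            raise ValueError("M must satisfy 1 <= M <= N")
        return ConfigInstruction(copies=copies, distinct_devices=True)

    def determine_nodes(self, config):
        # Determining unit: with one VM per device, any M VMs sit on M different devices.
        return self.nodes[: config.copies]

    def process(self, data, chosen_nodes):
        # Processing unit: store one copy of the data on each chosen node.
        for node in chosen_nodes:
            self.stored.setdefault(node, []).append(data)


device = DataStorageDevice(["rack1", "rack2", "rack3"])
device.send_first_control_instruction()
config = device.obtain_configuration(copies=2)
chosen = device.determine_nodes(config)
device.process(b"block-0", chosen)
print(chosen)  # ['vm-rack1', 'vm-rack2']
```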
When the data storage device provided by the embodiments of the application stores multiple copies, the copies are placed on different storage devices, so the failure of one storage device holding backup data reduces the number of available copies by exactly one. In the prior art, a single device failure may make several copies unavailable at once; the device therefore improves storage reliability.
In some possible embodiments, the storage resource pools created on the N storage devices use erasure coding (EC). With EC, lost or damaged data can be recomputed, so a storage device remains usable after losing some data. Compared with the traditional approach of keeping three copies of every piece of data so that a distributed system can keep serving data after hardware faults and similar failures, this saves storage space and improves storage utilization.
In some possible embodiments, the storage utilization of the hard disks is 88.89% when EC uses the 8D1P mode (8 data blocks, 1 parity block), 80% in 4D1P mode, 80% in 8D2P mode, and 66.67% in 4D2P mode.
In some possible embodiments, the storage device is a distributed server or a disk array.
In some possible embodiments, the data processing software is the distributed processing software Hadoop.
In some possible embodiments, M is 2.
In a third aspect, an embodiment of the present application provides a data storage system that includes N storage devices and the data storage device according to the second aspect or any possible implementation of the second aspect, where N is an integer greater than or equal to 2.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a storage apparatus, configured to store one or more programs;
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect or any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program that, when executed by a processor, implements the method according to the first aspect or any possible implementation of the first aspect.
Drawings
Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a data storage method according to another embodiment of the present application.
Fig. 3 is an interaction flow diagram of a data storage method according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a data storage device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. Other embodiments can be derived by those skilled in the art based on the embodiments in the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application. The method includes the following steps.
101. Sending a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data in the storage resource pool created on its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2.
The storage device may be, for example, a distributed server or a disk array.
102. Obtaining a configuration instruction, where the configuration instruction includes the copy number M and a storage-aware policy to be applied when data is stored; the storage-aware policy includes: determining M data nodes for storing data, where the M data nodes are located on M different storage devices and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data in the storage resource pool on its own storage device, and the 3 virtual machines serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
103. Determining, according to the configuration instruction, M data nodes for storing the data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
104. Storing the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
With this method, the copies are placed on different storage devices, so the failure of one storage device holding backup data reduces the number of available copies by only one, whereas in the prior art a single device failure may make several copies unavailable. Storage reliability is therefore improved.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data storage method according to another embodiment of the present application. The method includes the following steps:
201. Sending a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices, a storage resource pool and a virtual machine are created on each of the N storage devices, each virtual machine stores data in the storage resource pool created on its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, N is an integer greater than or equal to 2, and the storage resource pools use erasure coding (EC).
The storage device may be, for example, a distributed server or a disk array. A distributed server is used as the example below.
In some possible embodiments, the storage utilization of the hard disks is 88.89% when EC uses the 8D1P mode (8 data blocks, 1 parity block), 80% in 4D1P mode, 80% in 8D2P mode, and 66.67% in 4D2P mode.
202. Obtaining a configuration instruction, where the configuration instruction includes the copy number M and a storage-aware policy to be applied when data is stored; the storage-aware policy includes: determining M data nodes for storing data, where the M data nodes are located on M different storage devices and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data in the storage resource pool on its own storage device, and the 3 virtual machines serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
203. Determining, according to the configuration instruction, M data nodes for storing the data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
204. Storing the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
Because the storage resource pools use EC, lost or damaged data can be recomputed, so a storage device remains usable after losing some data. Compared with the traditional approach of keeping three copies of every piece of data so that a distributed system can keep serving data after hardware faults and similar failures, this embodiment saves storage space and improves storage utilization.
Referring to fig. 3, fig. 3 is a schematic interaction flow diagram of a data storage method according to an embodiment of the present application. As shown in fig. 3, the data storage in this embodiment includes the following steps.
301. Software-defined storage (SDS) is installed on the first rack, Rack1, and the second rack, Rack2, and a separate storage resource pool is created on each of them. The pools use EC of the form N:N+M (here N is the number of data blocks out of N+M blocks in total); in this embodiment, the 8D1P mode is used.
In this embodiment, as shown in fig. 3, storage resource pool 1 is created by SDS and uses EC in 8D1P mode, and storage resource pool 2 is likewise created by SDS and uses EC in 8D1P mode.
SDS is a storage architecture that separates storage software from hardware. Unlike traditional network attached storage (NAS) or storage area network (SAN) systems, SDS typically runs on industry-standard or x86 systems, which removes the software's dependence on proprietary hardware. SDS usually adopts a distributed architecture to improve reliability and scalability, which is why it is sometimes equated with distributed storage. The two are nonetheless different: distributed storage refers to an architecture and emphasizes that the architecture is distributed, whereas SDS refers to software-defined storage and emphasizes the decoupling of software from hardware.
SDS has the following advantages: (1) Decoupled software and hardware. The storage hardware is commercial off-the-shelf (COTS), which avoids vendor lock-in, and buying hardware and software separately lowers equipment procurement cost. (2) Strong scalability. Because SDS uses a distributed architecture, storage capacity can in theory be expanded without limit and grows linearly with the number of servers (horizontal scale-out). A SAN, by contrast, is limited by the processing capacity of its controller, so a single disk array has a fixed maximum specification; once the required capacity exceeds it, another complete set of storage equipment must be added (vertical expansion). (3) High reliability.
302. Virtual machines used by Hadoop are created, and the disk of each virtual machine uses the storage pool on the rack where that virtual machine is located.
Specifically, the virtual machine disk on Rack1 uses storage resource pool 1, and the virtual machine disk on Rack2 uses storage resource pool 2.
Hadoop is a widely used distributed system infrastructure. Its Hadoop Distributed File System (HDFS) divides nodes into two types: NameNodes and DataNodes. The NameNode manages the file system namespace: it maintains the file system tree and the metadata of all files and directories in that tree. This information is persisted on the local disk in two files, the namespace image file and the edit log file. The NameNode also records which DataNodes hold each block of each file, but it does not persist these block locations; they are rebuilt from information reported by the DataNodes when the system starts.
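The split between persisted namespace metadata and rebuilt block locations can be modelled with a toy Python class. It is not HDFS code: `ToyNameNode` and its methods are hypothetical, and real DataNodes send block reports to the NameNode over RPC rather than through method calls.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model (not HDFS code): persistent namespace vs. in-memory block locations."""

    def __init__(self, namespace):
        # Persisted part (what the namespace image and edit log capture): path -> block IDs.
        self.namespace = dict(namespace)
        # Not persisted: block ID -> DataNodes holding a replica; empty after a restart.
        self.block_locations = defaultdict(set)

    def receive_block_report(self, datanode, block_ids):
        """Rebuild location info from a DataNode's report of the blocks it holds."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return, for each block of `path`, the DataNodes currently known to hold it."""
        return {b: sorted(self.block_locations.get(b, ())) for b in self.namespace[path]}


nn = ToyNameNode({"/logs/app.log": ["blk_1", "blk_2"]})
nn.receive_block_report("dn-rack1", ["blk_1", "blk_2"])
nn.receive_block_report("dn-rack2", ["blk_1"])
print(nn.locate("/logs/app.log"))  # {'blk_1': ['dn-rack1', 'dn-rack2'], 'blk_2': ['dn-rack1']}
```

Right after a restart `locate` returns empty lists; block locations only reappear as DataNodes report in, which is why the NameNode does not need to store them on disk.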
303. Hadoop is installed in the virtual machines created in step 302, and these virtual machines join Hadoop as DataNodes.
304. Hadoop's rack awareness policy is configured, and the number of copies is set to 2.
When data is written into Hadoop, one DataNode is taken from each of the two racks, so the two copies are written into two different storage pools. This rack-level redundancy keeps both copies of the data active (dual-active), which improves overall reliability.
It can be understood that, when a storage system is actually built, the number of racks is not limited to two; with more racks, the same policy can be applied.
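In a stock Hadoop deployment, rack awareness is usually enabled by pointing the `net.topology.script.file.name` property (core-site.xml) at an executable that maps host names or IP addresses to rack paths, and the copy number of step 304 corresponds to `dfs.replication` set to 2 in hdfs-site.xml. Under HDFS's default block placement policy, the second replica is then put on a different rack from the first. Below is a minimal topology script in Python; the host-to-rack table is a hypothetical example for the two-rack layout of fig. 3 and would gain one entry per DataNode as racks are added.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: prints one rack path per host argument."""
import sys

# Hypothetical mapping of DataNode VMs (by IP or host name) to the rack hosting them.
HOST_TO_RACK = {
    "10.0.1.11": "/rack1",
    "vm-rack1": "/rack1",
    "10.0.2.11": "/rack2",
    "vm-rack2": "/rack2",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's own name for hosts with no known rack


def resolve_racks(hosts):
    """Map each host name or IP to its rack path, falling back to the default rack."""
    return [HOST_TO_RACK.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes one or more hosts as arguments and reads the rack paths,
    # whitespace-separated and in the same order, from standard output.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Keeping the mapping in a plain dictionary makes adding Rack3, Rack4 and so on a one-line change per DataNode, which matches the note that the same policy extends to more racks.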
Because the storage resource pools created on the storage devices use EC, lost or damaged data can be recomputed, so a storage device remains usable after losing some data. Compared with the traditional approach of keeping three copies of every piece of data so that a distributed system can keep serving data after hardware faults and similar failures, this embodiment saves storage space and improves storage utilization.
Referring to fig. 4, fig. 4 is a block diagram of a data storage device 400 according to an embodiment of the present application. The data storage device 400 includes:
a sending unit 401, configured to send a first control instruction, where the first control instruction instructs installation of data processing software on N storage devices and creation of a storage resource pool and a virtual machine on each of the N storage devices; each virtual machine stores data in the storage resource pool created on its corresponding storage device and serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2.
An obtaining unit 402, configured to obtain a configuration instruction, where the configuration instruction includes the copy number M and a storage-aware policy to be applied when data is stored; the storage-aware policy includes: determining M data nodes for storing data, where the M data nodes are located on M different storage devices and M is an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs installation of the data processing software on 3 storage devices and creation of a storage resource pool and a virtual machine on each of them; each virtual machine stores data in the storage resource pool on its own storage device, and the 3 virtual machines serve as candidate data nodes of the data processing software. In some possible embodiments, the data processing software may be the distributed processing software Hadoop.
A determining unit 403, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored.
For example, if M is 2, two data nodes for storing the data to be stored are determined according to the configuration instruction.
A processing unit 404, configured to store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved to the two determined data nodes.
With the data storage device provided by the embodiments of the application, the copies are placed on different storage devices, so the failure of one storage device holding backup data reduces the number of available copies by only one, whereas in the prior art a single device failure may make several copies unavailable. Storage reliability is therefore improved.
In some possible embodiments, the storage resource pools created on the N storage devices use erasure coding (EC). With EC, lost or damaged data can be recomputed, so a storage device remains usable after losing some data. Compared with the traditional approach of keeping three copies of every piece of data so that a distributed system can keep serving data after hardware faults and similar failures, this saves storage space and improves storage utilization.
In some possible embodiments, the storage utilization of the hard disks is 88.89% when EC uses the 8D1P mode (8 data blocks, 1 parity block), 80% in 4D1P mode, 80% in 8D2P mode, and 66.67% in 4D2P mode.
The embodiment of the application also provides a data storage system, which comprises N storage devices and an embodiment corresponding to any one of the data storage devices, wherein N is an integer greater than or equal to 2. As shown in fig. 4, the data storage device includes: a sending unit 401, configured to send a first control instruction, where the first control instruction instructs to install data processing software in N storage devices, and create a storage resource pool and a virtual machine in any one of the N storage devices, where any one of the virtual machines uses the created storage resource pool in the storage device corresponding to the virtual machine to store data, and any one of the virtual machines serves as an optional data node of the data processing software, where N is an integer greater than or equal to 2.
A obtaining unit 402, configured to obtain a configuration instruction, where the configuration instruction includes: setting the copy number M and a storage perception strategy when data storage is carried out; the storage aware policy includes: determining M data nodes for storing data, the M data nodes being located in M different storage devices, M being an integer less than or equal to N.
For example, if N is 3 and M is 2, the first control instruction instructs to install the data processing software in 3 storage devices, create a storage resource pool and a virtual machine in any one of the 3 storage devices, where any virtual machine creates data using the created storage resource pool in the storage device corresponding to the virtual machine, and the created 3 virtual machines may serve as optional data nodes of the data processing software. In some possible embodiments, the data processing software may be distributed processing software Hadoop.
A determining unit 403, configured to determine, according to the configuration instruction, M data nodes for storing data to be stored.
For example, if M is 2, two data nodes for saving data to be stored are determined according to the configuration instruction.
A processing unit 404, configured to store the data to be stored in the M data nodes.
For example, if M is 2, the data to be stored is saved into the determined consecutive data nodes.
In the data storage method provided by the embodiment of the application, when multiple copies are stored, different copies are located in different storage devices, so that when one storage device for storing backup data fails, the actually reduced number of copies is one, and compared with the situation that a plurality of copies are possibly unavailable when a storage device for storing one copy in the prior art fails, the storage reliability is improved.
In some possible embodiments, the pool of storage resources created in the N storages is encoded with erasure codes EC. The EC codes are adopted in the storage resource pool created by the storage device, when data are lost or damaged, the lost data can be calculated, and the storage device can still be used after certain data are lost. Compared with the traditional mode that the distributed system can still continuously provide 3 copies of each data after failures such as hardware failure occur, the embodiment of the application saves the storage space and improves the storage utilization rate.
In some possible embodiments, the storage utilization of the hard disks is 88.89% when the EC scheme uses the 8-data-block, 1-parity-block (8D1P) mode, 80% in the 4D1P mode, 80% in the 8D2P mode, and 66.67% in the 4D2P mode.
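The percentages above follow from the ratio of data chunks to total chunks, k / (k + p), for a kDpP mode. A short check using exact fractions; `ec_utilization`, `replication_utilization`, and `MODES` are illustrative names, and the 3-copy figure is included only as the comparison point the text uses.

```python
from fractions import Fraction

# Mode name -> (data chunks k, parity chunks p).
MODES = {"8D1P": (8, 1), "4D1P": (4, 1), "8D2P": (8, 2), "4D2P": (4, 2)}

def ec_utilization(k, p):
    """Fraction of raw capacity that holds user data in a kDpP stripe."""
    return Fraction(k, k + p)

def replication_utilization(copies):
    """Fraction of raw capacity that holds user data under full replication."""
    return Fraction(1, copies)

def utilization_table(modes):
    """Format each mode's utilization as a percentage with two decimals."""
    return {name: f"{float(ec_utilization(k, p)):.2%}" for name, (k, p) in modes.items()}

print(utilization_table(MODES))
# {'8D1P': '88.89%', '4D1P': '80.00%', '8D2P': '80.00%', '4D2P': '66.67%'}
print(f"3 copies: {float(replication_utilization(3)):.2%}")  # 3 copies: 33.33%
```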
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data storage method of any of the preceding method embodiments. The method comprises the following steps:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, the configuration instruction comprising: setting the copy number M and a storage-aware policy used when storing data, wherein the storage-aware policy comprises: determining M data nodes for storing data, the M data nodes being located in M different storage devices, and M being an integer less than or equal to N;
determining, according to the configuration instruction, M data nodes for storing the data to be stored;
and storing the data to be stored in the M data nodes.
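The first two steps can be modelled as plain data plus a validity check. This is a hypothetical sketch: the document does not define a message format, so `FirstControlInstruction`, `ConfigurationInstruction`, `plan`, and `validate` are illustrative names, the pool and VM names are made up, and the lower bound M ≥ 1 is an assumption (the text states only N ≥ 2 and M ≤ N).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FirstControlInstruction:
    software: str             # e.g. "hadoop"
    devices: tuple[str, ...]  # the N storage devices

    def plan(self):
        """One storage resource pool and one VM (candidate data node) per device."""
        return [
            {"device": d, "pool": f"pool-{d}", "vm": f"vm-{d}", "software": self.software}
            for d in self.devices
        ]

@dataclass(frozen=True)
class ConfigurationInstruction:
    copies: int                 # the copy number M
    storage_aware: bool = True  # place the M copies on M different devices

def validate(instruction, config):
    """Check N >= 2 and 1 <= M <= N (M >= 1 is an assumption); return (N, M)."""
    n = len(set(instruction.devices))
    if n < 2:
        raise ValueError(f"N must be at least 2, got {n}")
    if not 1 <= config.copies <= n:
        raise ValueError(f"M must satisfy 1 <= M <= N, got M={config.copies}, N={n}")
    return n, config.copies

instr = FirstControlInstruction("hadoop", ("dev-1", "dev-2", "dev-3"))
print(validate(instr, ConfigurationInstruction(copies=2)))  # (3, 2)
```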
In some possible embodiments, the storage resource pools created in the N storage devices use erasure coding (EC).
In some possible embodiments, the storage devices include distributed servers or disk arrays.
In some possible embodiments, the EC scheme includes an 8-data-chunk, 1-parity-chunk (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
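With Hadoop as the data processing software and M = 2, one way to approximate the storage-aware policy is HDFS rack awareness: set `dfs.replication` to 2 in `hdfs-site.xml` and point `net.topology.script.file.name` in `core-site.xml` at a script that reports each data-node VM's storage device as its "rack". HDFS's default block placement policy normally puts the second replica on a different rack from the first, so the two copies would land on different storage devices. This is an assumption about how the policy could be realised, not the mechanism the document describes; the addresses, device names, and `VM_TO_DEVICE` table below are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical HDFS topology script: report each data-node VM's storage device as its rack."""
import sys

# Hypothetical inventory: VM data node address -> storage device hosting it.
VM_TO_DEVICE = {
    "10.0.0.11": "dev-1",
    "10.0.0.12": "dev-2",
    "10.0.0.13": "dev-3",
}

DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in the order Hadoop passed them."""
    return [f"/{VM_TO_DEVICE[h]}" if h in VM_TO_DEVICE else DEFAULT_RACK for h in hosts]

if __name__ == "__main__":
    # Hadoop calls the script with one or more host addresses as arguments
    # and reads the corresponding rack paths from standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

Mapping at the device level, rather than per VM, is what lets the existing rack-aware placement stand in for "M data nodes on M different storage devices".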
In the data storage method provided by the embodiments of this application, when multiple copies are stored, each copy is placed on a different storage device, so if a storage device holding one copy fails, the number of available copies drops by only one. In the prior art, by contrast, the failure of a single storage device may make several copies unavailable, so the method improves storage reliability. When the storage resource pools created in the N storage devices use erasure coding (EC), lost or corrupted data can be recomputed, and the storage resource pools remain usable after some data is lost. Compared with the traditional approach of keeping 3 copies of every piece of data so that a distributed system can keep serving data after failures such as hardware faults, the embodiments of this application save storage space and improve storage utilization.
The present application further provides a computer-readable medium storing a computer program that, when executed by a processor, implements the data storage method of any of the preceding method embodiments. The method comprises the following steps:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, the configuration instruction comprising: setting the copy number M and a storage-aware policy used when storing data, wherein the storage-aware policy comprises: determining M data nodes for storing data, the M data nodes being located in M different storage devices, and M being an integer less than or equal to N;
determining, according to the configuration instruction, M data nodes for storing the data to be stored;
and storing the data to be stored in the M data nodes.
In some possible embodiments, the storage resource pools created in the N storage devices use erasure coding (EC).
In some possible embodiments, the storage devices include distributed servers or disk arrays.
In some possible embodiments, the EC scheme includes an 8-data-chunk, 1-parity-chunk (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
In some possible embodiments, the data processing software includes the distributed processing software Hadoop.
In some possible embodiments, M = 2.
In the data storage method provided by the embodiments of this application, when multiple copies are stored, each copy is placed on a different storage device, so if a storage device holding one copy fails, the number of available copies drops by only one. In the prior art, by contrast, the failure of a single storage device may make several copies unavailable, so the method improves storage reliability. When the storage resource pools created in the N storage devices use erasure coding (EC), lost or corrupted data can be recomputed, and the storage resource pools remain usable after some data is lost. Compared with the traditional approach of keeping 3 copies of every piece of data so that a distributed system can keep serving data after failures such as hardware faults, the embodiments of this application save storage space and improve storage utilization.
It is to be understood that the terms "first," "second," and the like in the description and in the claims, and in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (14)
1. A method of storing data, the method comprising the steps of:
sending a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
obtaining a configuration instruction, the configuration instruction comprising: setting the copy number M and a storage-aware policy used when storing data, wherein the storage-aware policy comprises: determining M data nodes for storing data, the M data nodes being located in M different storage devices, and M being an integer less than or equal to N;
determining, according to the configuration instruction, M data nodes for storing the data to be stored;
and storing the data to be stored in the M data nodes.
2. The data storage method of claim 1, wherein the storage resource pools created in the N storage devices are encoded using erasure coding (EC).
3. The data storage method of claim 2, wherein the storage devices comprise distributed servers or disk arrays.
4. The data storage method of claim 2, wherein the EC encoding comprises an 8-data-chunk, 1-parity-chunk (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
5. The data storage method of claim 1, wherein the data processing software comprises the distributed processing software Hadoop.
6. The data storage method according to any one of claims 1 to 5, wherein M = 2.
7. A data storage device, comprising:
a sending unit, configured to send a first control instruction, wherein the first control instruction instructs that data processing software be installed on N storage devices and that a storage resource pool and a virtual machine be created in each of the N storage devices, each virtual machine stores data using the storage resource pool created in its corresponding storage device, each virtual machine serves as a candidate data node of the data processing software, and N is an integer greater than or equal to 2;
an obtaining unit, configured to obtain a configuration instruction, the configuration instruction comprising: setting the copy number M and a storage-aware policy used when storing data, wherein the storage-aware policy comprises: determining M data nodes for storing data, the M data nodes being located in M different storage devices, and M being an integer less than or equal to N;
a determining unit, configured to determine, according to the configuration instruction, M data nodes for storing the data to be stored;
and a processing unit, configured to store the data to be stored in the M data nodes.
8. The data storage device of claim 7, wherein the storage resource pools created in the N storage devices are encoded using erasure coding (EC).
9. The data storage device of claim 8, wherein the storage devices comprise distributed servers or disk arrays.
10. The data storage device of claim 8, wherein the EC encoding comprises an 8-data-chunk, 1-parity-chunk (8D1P) mode, a 4D1P mode, an 8D2P mode, or a 4D2P mode.
11. The data storage device of claim 7, wherein the data processing software comprises the distributed processing software Hadoop.
12. The data storage device of any one of claims 7 to 11, wherein M = 2.
13. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data storage method of any one of claims 1 to 6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the data storage method of any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910926872.3A CN112578992B (en) | 2019-09-27 | 2019-09-27 | Data storage method and data storage device |
PCT/CN2020/111903 WO2021057377A1 (en) | 2019-09-27 | 2020-08-27 | Data storage method and data storage device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910926872.3A CN112578992B (en) | 2019-09-27 | 2019-09-27 | Data storage method and data storage device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112578992A (en) | 2021-03-30 |
CN112578992B CN112578992B (en) | 2022-07-22 |
Family ID: 75110606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910926872.3A (Active) CN112578992B (en) | Data storage method and data storage device | 2019-09-27 | 2019-09-27 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112578992B (en) |
WO (1) | WO2021057377A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108667867A (en) * | 2017-03-29 | 2018-10-16 | 华为技术有限公司 | Data storage method and device |
US20190107955A1 (en) * | 2015-03-31 | 2019-04-11 | International Business Machines Corporation | Storage Pool Capacity Management |
CN109799948A (en) * | 2017-11-17 | 2019-05-24 | 航天信息股份有限公司 | A data storage method and device |
CN109828868A (en) * | 2019-01-04 | 2019-05-31 | 新华三技术有限公司成都分公司 | Data storage method, device, management equipment and dual-active data storage system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110058790B (en) * | 2018-01-18 | 2022-05-13 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for storing data |
CN108287669B (en) * | 2018-01-26 | 2019-11-12 | 平安科技(深圳)有限公司 | Data storage method, device and storage medium |
CN108846009B (en) * | 2018-04-28 | 2021-02-05 | 北京奇艺世纪科技有限公司 | Copy data storage method and device in ceph |
WO2020010502A1 (en) * | 2018-07-10 | 2020-01-16 | 深圳花儿数据技术有限公司 | Distributed data redundant storage method based on consistent hash algorithm |
CN109408597A (en) * | 2018-11-29 | 2019-03-01 | 广东电网有限责任公司 | A kind of power grid metering big data storage system and its creation method |
- 2019-09-27: CN application CN201910926872.3A (CN112578992B), status: active
- 2020-08-27: WO application PCT/CN2020/111903 (WO2021057377A1), application filing
Also Published As
Publication number | Publication date |
---|---|
CN112578992B (en) | 2022-07-22 |
WO2021057377A1 (en) | 2021-04-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11726850B2 (en) | Increasing or decreasing the amount of log data generated based on performance characteristics of a device | |
US11740826B2 (en) | Policy-based hierarchical data protection in distributed storage | |
EP2787446B1 (en) | Distributed storage method, device and system | |
EP3101530B1 (en) | Data distribution method, data storage method, related apparatus, and system | |
US9916198B2 (en) | Erasure coding and replication in storage clusters | |
CN105378676B (en) | The simple correcting and eleting codes locally generated | |
CN101539873B (en) | Data recovery method, data node and distributed file system | |
CN106776130B (en) | Log recovery method, storage device and storage node | |
US8839031B2 (en) | Data consistency between virtual machines | |
CN103942112A (en) | Magnetic disk fault-tolerance method, device and system | |
US11442827B2 (en) | Policy-based hierarchical data protection in distributed storage | |
CN104486438A (en) | Disaster-tolerant method and disaster-tolerant device of distributed storage system | |
CN109165112B (en) | Fault recovery method, system and related components of metadata cluster | |
CN115113819A (en) | Data storage method, single-node server and equipment | |
CN109324931B (en) | Method for realizing vmware mount recovery in data de-duplication system | |
CN111177104B (en) | Log downloading and flushing method and device of NAS storage system | |
CN112578992B (en) | Data storage method and data storage device | |
CN116389233A (en) | Container cloud management platform active-standby switching system, method and device and computer equipment | |
US20150269044A1 (en) | Storage aggregate restoration | |
CN118626432A (en) | Data processing method, storage system, network interface device and storage medium | |
CN117493074A (en) | Virtual machine increment recovery method, device, computer equipment and storage medium | |
CN118605795A (en) | Distributed data processing method, apparatus, device, medium and program product | |
CN116069545A (en) | Method, device and equipment for data asynchronous redundancy coding | |
CN116974482A (en) | ESS elastic storage system, method and device | |
CN116775596A (en) | Data migration method and device, server and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |