WO2017162177A1 - Redundant storage system, redundant storage method and redundant storage device

Publication number
WO2017162177A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
redundant
network
devices
node
Prior art date
Application number
PCT/CN2017/077754
Other languages
French (fr)
Chinese (zh)
Inventor
王东临
金友兵
莫仲华
Original Assignee
北京书生国际信息技术有限公司
书生云公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京书生国际信息技术有限公司, 书生云公司
Publication of WO2017162177A1
Priority to US16/139,712 (US10782898B2)
Priority to US16/378,076 (US20190235777A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects

Definitions

  • the present invention relates to the field of data storage technologies, and in particular, to a redundant storage system, a redundant storage method, and a redundant storage device.
  • FIG. 1 shows a schematic diagram of the architecture of a prior art redundant storage system.
  • Each storage node S connects through an access network switch to the TCP/IP network, which is implemented by a core switch.
  • Each storage node is a separate physical server, and each server has its own storage medium.
  • Each storage node is connected by a storage network such as an IP network to form a storage pool.
  • Each compute node C likewise connects through an access network switch to the TCP/IP network, which is implemented by the core network switch, and accesses the entire storage pool over that network.
  • The storage node is located on the storage medium side: the storage medium is a built-in disk of the physical machine that hosts the storage node. The storage node acts as the controller of all storage media in its local physical machine, and the storage node together with all storage media in that machine constitutes one storage device.
  • Although the disks mounted on each storage node S can be managed redundantly through redundant storage, when a storage node S fails the disks mounted under it can no longer be read or written, and restoring the data on those disks seriously reduces the working efficiency of the entire redundant storage system.
  • the embodiments of the present invention provide a redundant storage system, a redundant storage method, and a redundant storage device, which solve the problem of low efficiency of disaster recovery processing based on the structure of the traditional redundant storage system.
  • An embodiment of the invention provides a redundant storage system, including:
  • At least two storage nodes connected to the storage network
  • Each of the storage nodes accesses at least two storage devices through the storage network, and data is saved in a redundant storage manner between at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or part of a storage medium.
  • An embodiment of the present invention further provides a redundant storage method, where the redundant storage system includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium; wherein each of the storage nodes accesses at least two storage devices through the storage network; the method includes:
  • An embodiment of the present invention further provides a redundant storage device, where the redundant storage system includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium; wherein each of the storage nodes accesses at least two storage devices through the storage network; the redundant storage device includes:
  • a redundant storage module configured to store data in a redundant storage manner between at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
  • An embodiment of the present invention also provides a computer program product embodied in a computer readable storage medium having computer readable program code portions stored therein, the computer readable program code portions being configured to perform the redundant storage method as described previously.
  • the present invention provides a redundant storage system, a redundant storage method, and a redundant storage device.
  • The storage node and the storage device are independently connected to the storage network, each storage node can access multiple storage devices through the storage network, and data is stored redundantly between the multiple storage devices accessed by the same storage node. In this way, even if a storage device fails, the data in that storage device can be quickly recovered from the other working storage devices, which greatly improves the disaster recovery efficiency of the entire redundant storage system.
  • Figure 1 shows the architecture of a traditional storage system.
  • FIG. 2 is a schematic structural diagram of a storage system according to an embodiment of the invention.
  • FIG. 3 is a schematic structural diagram of a storage system according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a storage pool using redundant storage according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a storage pool using redundant storage according to another embodiment of the present invention.
  • As shown in FIG. 2, the storage system includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium.
  • the storage node is a software module that provides a storage service, instead of a hardware server including a storage medium in a general sense.
  • the storage nodes in the description of the subsequent embodiments also refer to the same concepts, and therefore will not be described again.
  • Each storage node accesses at least two storage devices through the storage network, and data is saved in a redundant storage manner between at least one storage block of each of the at least two storage devices accessed by the same storage node, where a storage block is a complete storage medium or part of a storage medium. Because the data is stored redundantly in storage blocks of different storage devices, the storage system is a redundant storage system.
  • In a traditional storage system, the storage node is located on the storage medium side or, strictly speaking, the storage medium is a built-in disk of the physical machine where the storage node is located.
  • In this embodiment, the physical machine where the storage node is located is independent of the storage device, and the storage device serves mainly as a channel connecting the storage medium to the storage network. The storage node and the storage device are independently connected to the storage network, each storage node can access multiple storage devices through the storage network, and data is stored redundantly between the multiple storage devices accessed by the same storage node, thereby implementing redundant storage across storage devices under the same storage node. In this way, even if a storage device fails, the data in that storage device can be quickly restored from the other working storage devices, which greatly improves the disaster recovery efficiency of the entire storage system.
  • the storage network is configured such that each storage node can access all storage media without the aid of other storage nodes.
  • all storage media of the present invention can be shared by all storage nodes, and all storage media in the storage system actually constitute a global storage pool accessible by all storage nodes.
  • The storage node side further includes a compute node, and the compute node and the storage node are set up in one physical server that is connected to the storage device through the storage network.
  • An aggregated storage system built according to this embodiment, in which the computing node and the storage node are located in the same physical machine, reduces the number of physical devices required and therefore the cost.
  • the compute node can also access the storage resources it wishes to access locally.
  • the data exchange between the two can be as simple as shared memory, and the performance is particularly excellent.
  • In this embodiment, the I/O data path between the computing node and the storage medium includes: (1) the storage medium to the storage node; and (2) the storage node to the computing node aggregated in the same physical server (a CPU bus path).
  • In the conventional storage system, by contrast, the I/O data path between the compute node and the storage medium includes: (1) storage medium to storage node; (2) storage node to storage network access network switch; (3) storage network access network switch to core network switch; (4) core network switch to computing network access network switch; and (5) computing network access network switch to computing node.
  • The total data path of the storage system of this embodiment is therefore close to item (1) alone of the conventional storage system. By shortening the I/O data path to this extent, the storage system provided by the embodiment can greatly improve I/O channel performance, and its actual running effect is very close to that of the I/O channel of a local hard disk.
  • The storage node may be a virtual machine of the physical server, a container, a module running directly on the physical operating system of the server, or a combination thereof (for example, one part of the storage node is firmware on an expansion card, another part is a module in the physical operating system, and another part runs in a virtual machine). The computing node may likewise be a virtual machine of the same physical server, a container, a module running directly on the physical operating system of the server, or a combination of these.
  • each storage node may correspond to one or more compute nodes.
  • One physical server can be divided into multiple virtual machines, one of which is used as a storage node while the other virtual machines are used as computing nodes; alternatively, a module on the physical OS can serve as the storage node for better performance.
  • The virtualization technology forming the virtual machine may be KVM, Xen, VMware, or Hyper-V virtualization technology, and the container technology forming the container may be Docker, Rocket, Odin, Chef, LXC, Vagrant, Ansible, Zone, Jail, or Hyper-V container technology.
  • At any given time each storage node is responsible for managing only a fixed set of storage media, and one storage medium is never written by multiple storage nodes simultaneously, which avoids data conflicts. As a result, each storage node can access the storage media it manages without resorting to other storage nodes, and the integrity of the data stored in the storage system can be guaranteed.
  • all the storage media in the system may be divided according to storage logic.
  • the storage pool of the entire system may be divided into a logical storage hierarchy structure such as a storage area, a storage group, and a storage block.
  • the storage block is the smallest storage unit.
  • the storage pool may be divided into at least two storage areas.
  • each storage area may be divided into at least one storage group. In a preferred embodiment, each storage area is divided into at least two storage groups.
  • the storage area and the storage group can be merged such that one level can be omitted in the storage hierarchy.
  • each storage area may be composed of at least one storage block, wherein the storage block may be a complete storage medium or a part of a storage medium.
  • Each storage area may be composed of at least two storage blocks, so that when any one of the storage blocks fails, the complete data stored in it can be calculated from the remaining storage blocks in the group.
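The recovery property described above can be illustrated with a small sketch. The text does not say which code is used, so the example below assumes the simplest possible scheme, a single XOR parity block (redundancy 1); the function names `xor_blocks`, `make_group`, and `recover` are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_group(data_blocks):
    """Return the data blocks followed by one XOR parity block (assumed scheme)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(group, failed_index):
    """Rebuild the block at failed_index from all remaining blocks in the group."""
    survivors = [block for i, block in enumerate(group) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    group = make_group([b"abcd", b"efgh", b"ijkl"])
    # Pretend the second block is lost and rebuild it from the others.
    print(recover(group, 1))
```

With one parity block the group survives the loss of any single block, data or parity; tolerating more simultaneous failures would need a stronger code such as the erasure codes mentioned elsewhere in the text.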
  • The redundant storage mode can be a multi-copy mode, a redundant array of independent disks (RAID) mode, or an erasure code mode.
  • the redundant storage mode can be established by the ZFS file system.
  • The plurality of storage blocks included in each storage area (or storage group) are not located in the same storage medium, or even in the same storage device. In an embodiment of the invention, any two storage blocks included in each storage area (or storage group) are not located in the same storage medium/storage device. In another embodiment of the present invention, the number of storage blocks located in the same storage medium/storage device in the same storage area (or storage group) is preferably less than or equal to the redundancy of the redundant storage.
  • For example, when the redundancy of the redundant storage is 1, at most 1 storage block of the same storage group may reside on the same storage device; for RAID 6, where the redundancy is 2, at most 2 storage blocks of the same storage group may reside on the same storage device.
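The placement rule above (no storage device holds more blocks of one storage group than the redundancy allows) is easy to check mechanically. The following is a minimal sketch, assuming a storage group is represented as a list of `(device_id, block_id)` pairs; that representation and the name `placement_ok` are illustrative assumptions, not part of the described system.

```python
from collections import Counter


def placement_ok(group_blocks, redundancy):
    """Check that no device holds more than `redundancy` blocks of the group.

    group_blocks: iterable of (device_id, block_id) pairs (assumed layout).
    redundancy:   number of blocks the redundant storage can lose at once.
    """
    blocks_per_device = Counter(device for device, _ in group_blocks)
    return all(count <= redundancy for count in blocks_per_device.values())


if __name__ == "__main__":
    spread = [("JBOD1", "D1"), ("JBOD2", "D2"), ("JBOD3", "D3")]
    stacked = [("JBOD1", "D1"), ("JBOD1", "D2"), ("JBOD2", "D3")]
    print(placement_ok(spread, 1), placement_ok(stacked, 1), placement_ok(stacked, 2))
```

A layout that passes this check keeps the group recoverable when any single storage device is lost, because the device takes down no more blocks than the redundancy can absorb.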
  • The storage system further includes a fault tolerance level adjustment module, configured to adjust the fault tolerance level of the storage pool by adjusting the number of storage blocks in a storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group. Specifically, let D denote the number of storage blocks in the storage group that are allowed to fail simultaneously, N the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group, and M the number of storage devices in the storage pool that are allowed to fail simultaneously.
  • Each storage node can only read and write the storage areas it manages. However, read operations on the same storage block by multiple storage nodes do not conflict with each other, whereas simultaneous writes to one storage block by multiple storage nodes easily produce conflicts. Therefore, in another embodiment, each storage node can write only the storage areas it manages but can read both its own storage areas and the storage areas managed by other storage nodes; that is, write operations are local, but read operations can be global.
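The "local write, global read" rule can be sketched as a small ownership table. This is only an illustration under assumed names (`StorageAreaMap`, `can_write`, `can_read`); the text does not describe how the rule is enforced.

```python
class StorageAreaMap:
    """Tracks which storage node manages each storage area (hypothetical helper)."""

    def __init__(self, owner_by_area):
        # owner_by_area: mapping of storage-area id -> managing storage node id
        self.owner_by_area = dict(owner_by_area)

    def can_write(self, node, area):
        """Writes are local: only the managing node may write an area."""
        return self.owner_by_area.get(area) == node

    def can_read(self, node, area):
        """Reads are global: any node may read any known area."""
        return area in self.owner_by_area


if __name__ == "__main__":
    areas = StorageAreaMap({"area-1": "S1", "area-2": "S2"})
    print(areas.can_write("S1", "area-1"), areas.can_write("S1", "area-2"))
    print(areas.can_read("S1", "area-2"))
```

Keeping a single writer per storage area is what removes the need for write locking between storage nodes, while readers never have to coordinate.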
  • the storage system may further include a storage control node coupled to the storage network for determining a storage area managed by each storage node.
  • Each storage node may include a storage allocation module for determining the storage areas managed by that storage node. This can be achieved by a communication and coordination algorithm among the storage allocation modules of the storage nodes, which may, for example, be based on load balancing between the storage nodes.
  • When a storage node fails, some or all of the other storage nodes may be configured to take over the storage areas previously managed by the failed storage node.
  • A storage area managed by the failed storage node may be taken over by a single storage node, or by at least two other storage nodes, each of which takes over a portion of that storage area; for example, at least two other storage nodes respectively take over different storage groups in the storage area.
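The takeover described above can be sketched as a reassignment of the failed node's storage groups to the surviving nodes. The least-loaded-first policy below is one possible reading of "load balancing" and is an assumption, as are the function name `take_over` and the dictionary layout.

```python
def take_over(groups_by_node, failed_node):
    """Reassign the failed node's storage groups to the surviving nodes.

    groups_by_node: mapping of storage node id -> list of storage group ids.
    Each orphaned group goes to the survivor that currently manages the
    fewest groups (ties broken by node id); this policy is an assumption.
    """
    assignment = {
        node: list(groups)
        for node, groups in groups_by_node.items()
        if node != failed_node
    }
    if not assignment:
        raise ValueError("no surviving storage node can take over")
    for group in groups_by_node.get(failed_node, []):
        target = min(assignment, key=lambda node: (len(assignment[node]), node))
        assignment[target].append(group)
    return assignment


if __name__ == "__main__":
    before = {"S1": ["G1", "G2"], "S2": ["G3"], "S3": ["G4"]}
    print(take_over(before, "S1"))
```

Because every storage node can reach every storage medium over the storage network, such a takeover only changes ownership; no data has to be copied between nodes.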
  • The storage medium may include, but is not limited to, a hard disk, a flash memory, an SRAM, a DRAM, an NVMe, or an NVRAM.
  • The access interface of the storage medium may include, but is not limited to, a SAS interface, a SATA interface, a PCI/e interface, a DIMM interface, an NVMe interface, a SCSI interface, and an AHCI interface.
  • the storage network may include at least one storage switching device, and the storage node accesses the storage medium through data exchange between the storage switching devices included therein.
  • the storage node and the storage medium are respectively connected to the storage switching device through the storage channel.
  • the storage switching device may be a SAS switch or a PCI/e switch.
  • the storage channel may be a SAS (Serial Attached SCSI) channel or a PCI/e channel.
  • the SAS-based switching solution has the advantages of high performance, large bandwidth, and a large number of disks per device.
  • Through an HBA (host bus adapter), the storage provided by the SAS system can be easily accessed by multiple connected servers simultaneously.
  • the SAS switch is connected to the storage device through a SAS line, and the storage device and the storage medium are also connected by a SAS interface.
  • Inside the storage device, the SAS channel is connected to each storage medium (a SAS switch chip may be set inside the storage device). The bandwidth of a SAS network can reach 24 Gb or 48 Gb, which is dozens of times that of Gigabit Ethernet and several times that of 10 Gigabit Ethernet. At the link layer, SAS offers about an order of magnitude improvement over an IP network. At the transport layer, the TCP three-way handshake carries high overhead, and the TCP delayed acknowledgement and slow start mechanisms sometimes cause delays on the order of 100 milliseconds.
  • SAS networks offer significant advantages in terms of bandwidth and latency over Ethernet-based TCP/IP. Those skilled in the art will appreciate that the performance of the PCI/e channel can also be adapted to the needs of the system.
  • The storage network may include at least two storage switching devices, and each storage node can connect to any storage device, and thus to its storage media, through any one of the storage switching devices.
  • If one storage switching device fails, the storage node reads and writes data on the storage devices through the other storage switching devices.
  • The storage devices in the storage system 30 are constructed as a plurality of JBODs 307-310, each connected through SAS data lines to the two SAS switches 305 and 306; these two switches constitute the switching core of the storage network included in the storage system.
  • the front end is at least two servers 301 and 302, each of which is connected to the two SAS switches 305 and 306 via an HBA device (not shown) or a SAS interface on the motherboard.
  • Each server has a storage node that manages some or all of the disks in all JBODs using information obtained from the SAS links.
  • The storage areas, storage groups, and storage blocks described above may be used to divide the JBOD disks into different storage groups.
  • Each storage node manages one or more sets of such storage groups.
  • Redundant storage is used inside each storage group, and the metadata of the redundant storage can reside on the disks, so that the redundant storage can be recognized directly from the disks by other storage nodes.
  • the storage node can install a monitoring and management module that is responsible for monitoring the status of local storage and other servers.
  • When a JBOD as a whole or a disk on a JBOD becomes abnormal, data reliability is ensured by the redundant storage.
  • When a server fails, the management module in the storage node on another pre-configured server locally identifies and takes over the disks managed by the storage node of the failed server, according to the data on the disks.
  • The storage services originally provided by the storage node of the failed server are then continued by the storage node on the new server. In this way, a new, highly available global storage pool structure is realized.
  • the exemplary storage system 30 is constructed to provide a multi-point, controllable, globally accessible storage pool.
  • The hardware uses multiple servers to provide external services, and uses JBODs to house the disks.
  • Multiple JBODs are connected to two SAS switches, and the two switches are respectively connected to the server's HBA cards, thereby ensuring that all disks on the JBOD can be accessed by all servers.
  • the SAS redundant link also ensures high availability on the link.
  • Each server uses redundant storage technology, selecting disks from each JBOD to form the redundant storage, so as to avoid data loss when a JBOD fails.
  • When a server fails, the module that monitors the overall state schedules another server to access, through the SAS channel, the disks managed by the storage node of the failed server and to quickly take over those disks, achieving highly available global storage.
  • Although JBODs are used as the storage devices in the example of FIG. 3, it should be understood that the embodiment of the present invention shown in FIG. 3 also supports storage devices other than JBODs.
  • the above is an example in which one storage medium (entire) is used as one storage block, and the same applies to a case where a part of one storage medium is used as one storage block.
  • An embodiment of the present invention further provides a redundant storage method, where the applicable storage system includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device including at least one storage medium; wherein each storage node accesses at least two storage devices through the storage network; the method includes:
  • all storage media in the storage system form a storage pool.
  • the storage pool is a global storage pool as described above, that is, all storage media in the storage pool can be shared by all storage nodes in the storage system, and each storage node can access the storage pool without using other storage nodes. All storage media.
  • The redundant storage method based on the global storage pool may be implemented as follows: first select multiple storage devices from the storage pool, then select at least one storage block from each of the selected storage devices, and finally aggregate all the selected storage blocks into a storage group. Data is then stored redundantly across all storage blocks of the storage group. When a storage block in the storage group fails, the data in the failed storage block can be obtained from the data in the other storage blocks of the group.
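The selection steps above can be sketched as follows. The pool is modeled as a mapping from storage device to its free storage blocks, and the first free blocks are taken from each chosen device; this data model, the selection order, and the name `build_storage_group` are assumptions made for illustration.

```python
def build_storage_group(pool, devices, blocks_per_device=1):
    """Aggregate blocks from several storage devices into one storage group.

    pool:              mapping of device id -> list of free block ids (assumed model).
    devices:           the storage devices selected from the pool.
    blocks_per_device: how many blocks to take from each selected device.
    Returns a list of (device_id, block_id) pairs; the pool is not modified.
    """
    group = []
    for device in devices:
        free_blocks = pool[device]
        if len(free_blocks) < blocks_per_device:
            raise ValueError(f"{device} has too few free storage blocks")
        group.extend((device, block) for block in free_blocks[:blocks_per_device])
    return group


if __name__ == "__main__":
    pool = {
        "JBOD1": ["D1", "D6"],
        "JBOD2": ["D2", "D7"],
        "JBOD3": ["D3", "D8"],
    }
    print(build_storage_group(pool, ["JBOD1", "JBOD2", "JBOD3"]))
```

Devices and blocks left out of the selection stay in the pool untouched, so they remain available for other storage groups or as spares.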
  • The storage blocks in one storage group do not necessarily come from all storage devices in the storage pool, and the storage devices in the storage pool are not necessarily all used for redundant storage. Storage devices and storage blocks that are not selected for redundant storage can serve as hot spares that are not normally used.
  • The redundant storage between the storage blocks in the storage group may be implemented by a multi-copy mode, a RAID mode, or an erasure code mode. The present invention does not limit the specific manner of redundant storage between the storage blocks in the storage group.
  • a plurality of storage groups may also be aggregated into a storage area.
  • The fault tolerance level of the storage pool is related to the fault tolerance level of the redundant storage in the storage group, so the fault tolerance level of the storage pool can be adjusted by adjusting the number of storage blocks in a storage group that are allowed to fail at the same time and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group.
  • the specific adjustment manner may be the same as the method performed by the fault tolerance level adjustment module in the foregoing storage system, and details are not described herein again.
  • FIG. 4 is a schematic structural diagram of a storage pool using redundant storage according to an embodiment of the present invention.
  • The storage pool 40 includes five storage devices JBOD1 to JBOD5, and each storage device includes five storage blocks.
  • The five storage devices JBOD1 to JBOD5 in the storage pool 40 are used for redundant storage; one storage block is selected from each storage device, and the selected blocks are aggregated into a storage group using an erasure code.
  • For example, the storage blocks D1 to D5 are aggregated into one storage group P1, and D11 to D15 can be aggregated into another storage group.
  • If the data is stored in the storage blocks D1 to D5 using an erasure code whose check level is 2, that is, the number of storage blocks allowed to fail simultaneously in the storage group P1 is 2, then the number of storage devices allowed to fail simultaneously in the storage pool 40 is also 2.
  • FIG. 5 is a schematic structural diagram of a storage pool using redundant storage according to another embodiment of the present invention.
  • The five storage devices JBOD1 to JBOD5 in the storage pool 50 are also used for redundant storage, but two storage blocks are selected from each storage device and aggregated into a storage group using an erasure code.
  • For example, the storage blocks D1 to D15 are aggregated into one storage group P2, and the storage blocks D21 to D35 can be aggregated into another storage group.
  • If the check level of the erasure code is 3, that is, the number of storage blocks allowed to fail simultaneously in the storage group P2 is 3, then the number of storage devices allowed to fail simultaneously in the storage pool 50 is the integer part of 3/2, namely 1; that is, only one storage device in the storage pool 50 is allowed to fail.
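Both worked examples are consistent with taking M as the integer part of D/N, where D is the number of storage blocks in a group that may fail simultaneously, N the number of blocks taken from each storage device, and M the number of storage devices that may fail simultaneously. The sketch below encodes that reading; the relation is inferred from the FIG. 4 and FIG. 5 examples rather than stated as a general formula, and the function name is hypothetical.

```python
def tolerable_device_failures(d, n):
    """Return M, the number of storage devices that may fail at once.

    d: storage blocks per group allowed to fail simultaneously (D).
    n: storage blocks taken from each device for the same group (N).
    Assumes M = floor(D / N), as suggested by the two examples.
    """
    if d < 0 or n < 1:
        raise ValueError("d must be >= 0 and n must be >= 1")
    return d // n


if __name__ == "__main__":
    print(tolerable_device_failures(2, 1))  # FIG. 4: check level 2, one block per device
    print(tolerable_device_failures(3, 2))  # FIG. 5: check level 3, two blocks per device
```

The intuition is that losing one device removes N blocks from the group at once, so the group survives as long as the lost devices account for no more than D blocks.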
  • An embodiment of the present invention further provides a redundant storage device, where the storage system includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium; wherein each of the storage nodes accesses at least two storage devices through the storage network; the redundant storage device includes:
  • a redundant storage module configured to store data in a redundant manner between at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or part of a storage medium.
  • An embodiment of the invention further provides a computer program product on a computer readable storage medium, comprising computer program code which, when executed by a processor, enables the processor to implement the redundant storage method according to the embodiments of the invention.
  • the computer storage medium can be any tangible medium such as a floppy disk, CD-ROM, DVD, hard drive, or even network media.
  • an implementation form of the embodiments of the present invention described above may be a computer program product
  • the method or apparatus of the embodiments of the present invention may be implemented in software, hardware, or a combination of software and hardware.
  • the hardware portion can be implemented using dedicated logic; the software portion can be stored in memory and executed by a suitable instruction execution system, such as a microprocessor or dedicated design hardware.
  • The methods and apparatus described above may be implemented with processor control code, provided for example on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
  • The method and apparatus of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; they may also be implemented by software executed by various types of processors, or by a combination of the above hardware circuits and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Hardware Redundancy (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a redundant storage system, a redundant storage method and a redundant storage device, which solve the problem that architectures based on traditional redundant storage handle disaster tolerance inefficiently. The redundant storage system comprises a storage network, at least two storage nodes and at least two storage devices, wherein the at least two storage nodes are connected to the storage network; the at least two storage devices are connected to the storage network; each storage device comprises at least one storage medium; each storage node accesses the at least two storage devices via the storage network; at least one storage block of each of the at least two storage devices accessed by the same storage node stores data in a redundant storage manner; and the storage block is a complete storage medium or a part of a storage medium.

Description

Redundant storage system, redundant storage method and redundant storage device
Technical field
The present invention relates to the field of data storage technologies, and in particular to a redundant storage system, a redundant storage method, and a redundant storage device.
Background
As computer applications grow in scale, the demand for storage space increases day by day. Accordingly, pooling the storage resources (such as storage media) of multiple devices into a single storage pool that provides storage services has become the mainstream approach. A conventional redundant storage system usually consists of multiple distributed storage nodes connected by a TCP/IP network. FIG. 1 shows a schematic diagram of the architecture of a prior-art redundant storage system. As shown in FIG. 1, in a conventional redundant storage system, each storage node S is connected through an access switch to the TCP/IP network (implemented by a core switch). Each storage node is a separate physical server, and each server has several storage media of its own. The storage nodes are connected through a storage network, such as an IP network, to form a storage pool. On the other side of the core switch, each computing node C is likewise connected through an access switch to the TCP/IP network (implemented by the core switch), so as to access the entire storage pool over the TCP/IP network.
In this conventional redundant storage system, the storage node is located on the storage medium side: the storage media are built-in disks of the physical machine that hosts the storage node, the storage node is in effect the controller of all storage media in its local physical machine, and the storage node together with all storage media in the local physical machine constitutes one storage device. Although the disks mounted under each storage node S can provide disaster tolerance among themselves through redundant storage, once a storage node S fails, none of the disks mounted under that node can be read or written any longer, and recovering the data on the disks mounted under the failed storage node S seriously degrades the working efficiency of the entire redundant storage system.
Summary of the invention
In view of this, embodiments of the present invention provide a redundant storage system, a redundant storage method, and a redundant storage device, which solve the problem of inefficient disaster-tolerance processing in architectures based on conventional redundant storage systems.
An embodiment of the present invention provides a redundant storage system, comprising:
a storage network;
at least two storage nodes connected to the storage network; and
at least two storage devices connected to the storage network, each storage device comprising at least one storage medium;
wherein each storage node accesses at least two storage devices through the storage network, and data is stored redundantly across at least one storage block of each of the at least two storage devices accessed by the same storage node, the storage block being a complete storage medium or a part of a storage medium.
An embodiment of the present invention further provides a redundant storage method applicable to a redundant storage system that comprises: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device comprising at least one storage medium; wherein each storage node accesses at least two storage devices through the storage network. The method comprises:
storing data redundantly across at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
An embodiment of the present invention further provides a redundant storage device applicable to a redundant storage system that comprises: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device comprising at least one storage medium; wherein each storage node accesses at least two storage devices through the storage network. The redundant storage device comprises:
a redundant storage module configured to store data redundantly across at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
An embodiment of the present invention further provides a computer program product embodied in a computer-readable storage medium, the computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions being configured to perform the redundant storage method described above.
In the redundant storage system, redundant storage method and redundant storage device provided by embodiments of the present invention, the storage nodes and the storage devices are each independently connected to the storage network, each storage node can access multiple storage devices through the storage network, and data is stored redundantly across the multiple storage devices accessed by the same storage node. Thus, even if one storage device fails, its data can still be recovered quickly from the other storage devices that are working normally, which greatly improves the disaster-tolerance efficiency of the entire redundant storage system.
Brief description of the drawings
FIG. 1 is a schematic diagram of the architecture of a conventional storage system.
FIG. 2 is a schematic diagram of the architecture of a storage system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the architecture of a storage system according to another embodiment of the present invention.
FIG. 4 is a schematic structural diagram of a storage pool using redundant storage according to an embodiment of the present invention.
FIG. 5 is a schematic structural diagram of a storage pool using redundant storage according to another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
FIG. 2 is a schematic diagram of the architecture of a storage system according to an embodiment of the present invention. As shown in FIG. 2, the storage system comprises: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device comprising at least one storage medium. In embodiments of the present invention, a storage node is a software module that provides storage services, rather than a hardware server containing storage media in the usual sense. The storage nodes mentioned in the subsequent embodiments refer to the same concept, which is not repeated below.
In an embodiment of the present invention, each storage node accesses at least two storage devices through the storage network, and data is stored redundantly across at least one storage block of each of the at least two storage devices accessed by the same storage node, where a storage block is a complete storage medium or a part of a storage medium. Because the data is stored redundantly in storage blocks of different storage devices, the storage system is a redundant storage system.
In a conventional storage system architecture, the storage node is located on the storage medium side; strictly speaking, the storage media are built-in disks of the physical machine that hosts the storage node. In embodiments of the present invention, by contrast, the physical machine hosting the storage node is independent of the storage device, and the storage device serves mainly as a channel connecting the storage media to the storage network. The storage nodes and the storage devices are each independently connected to the storage network, each storage node can access multiple storage devices through the storage network, and data is stored redundantly across the multiple storage devices accessed by the same storage node, which achieves redundant storage across storage devices under a single storage node. Thus, even if one storage device fails, its data can still be recovered quickly from the other storage devices that are working normally, which greatly improves the disaster-tolerance efficiency of the entire storage system.
In this way, when dynamic balancing is needed, physical data does not have to be migrated between storage media; it suffices to rebalance, through configuration, which storage media each storage node manages.
In an embodiment of the present invention, the storage network is configured so that every storage node can access all storage media without going through other storage nodes. All storage media of the invention can therefore be shared by all storage nodes, and all storage media in the storage system in effect form a global storage pool accessible by every storage node.
In another embodiment of the present invention, the storage node side further includes computing nodes, and a computing node and a storage node are placed in the same physical server, which is connected to the storage devices through the storage network. A converged storage system built according to embodiments of the invention, in which the computing node and the storage node reside on the same physical machine, reduces the overall number of physical devices required and thus lowers cost. The computing node can also access the storage resources it needs locally. Moreover, because the computing node and the storage node are aggregated on the same physical server, data exchange between them can be as simple as shared memory, which gives particularly good performance.
In the storage system provided by embodiments of the present invention, the I/O data path between a computing node and a storage medium consists of: (1) storage medium to storage node; and (2) storage node to the computing node aggregated on the same physical server (a CPU bus path). By comparison, in the prior-art storage system shown in FIG. 1, the I/O data path between a computing node and a storage medium consists of: (1) storage medium to storage node; (2) storage node to the storage-network access switch; (3) storage-network access switch to the core switch; (4) core switch to the computing-network access switch; and (5) computing-network access switch to the computing node. Clearly, the total data path of the storage system of this embodiment is close to item (1) alone of the conventional storage system. That is, by compressing the I/O data path length to the extreme, the storage system provided by embodiments of the present invention greatly improves I/O channel performance, and in practice its behavior is very close to that of the I/O channel of a local hard disk.
In an embodiment of the present invention, a storage node may be a virtual machine of the physical server, a container, a module running directly on the physical operating system of the server, or a combination of these (for example, part of the storage node may be firmware on an expansion card, another part a module of the physical operating system, and yet another part in a virtual machine). A computing node may likewise be a virtual machine of the same physical server, a container, a module running directly on the physical operating system of the server, or a combination of these. In one embodiment, each storage node may correspond to one or more computing nodes.
Specifically, one physical server may be divided into multiple virtual machines, one of which serves as the storage node while the others serve as computing nodes; alternatively, a module on the physical OS may serve as the storage node to achieve better performance.
In an embodiment of the present invention, the virtualization technology used to form the virtual machines may be KVM, Xen, VMware or Hyper-V, and the container technology used to form the containers may be Docker, Rocket, Odin, Chef, LXC, Vagrant, Ansible, Zone, Jail or Hyper-V containers.
In an embodiment of the present invention, each storage node is responsible for managing only a fixed set of storage media at any one time, and a storage medium is never written by multiple storage nodes simultaneously, so as to avoid data conflicts. In this way every storage node can access the storage media it manages without going through other storage nodes, and the integrity of the data stored in the storage system is guaranteed.
In an embodiment of the present invention, all storage media in the system may be partitioned according to storage logic. Specifically, the storage pool of the entire system may be divided into a logical storage hierarchy of storage areas, storage groups and storage blocks, where the storage block is the smallest storage unit. In an embodiment of the present invention, the storage pool may be divided into at least two storage areas.
In an embodiment of the present invention, each storage area may be divided into at least one storage group. In a preferred embodiment, each storage area is divided into at least two storage groups.
In some embodiments, the storage area and the storage group may be merged, so that one level of the storage hierarchy is omitted.
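The pool, area, group and block hierarchy described above can be pictured as a small set of nested records. The following is only an illustrative sketch: the class names, the `device_id` and `medium_id` fields, and the `part` index are assumptions introduced here and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StorageBlock:
    """Smallest storage unit: a whole medium or one part of it (hypothetical fields)."""
    device_id: str   # storage device (for example a JBOD) that holds the medium
    medium_id: str   # storage medium (for example a disk) inside that device
    part: int = 0    # 0 means the whole medium; other values index a part of it


@dataclass
class StorageGroup:
    name: str
    blocks: List[StorageBlock] = field(default_factory=list)


@dataclass
class StorageArea:
    name: str
    groups: List[StorageGroup] = field(default_factory=list)


@dataclass
class StoragePool:
    areas: List[StorageArea] = field(default_factory=list)

    def blocks(self) -> List[StorageBlock]:
        """Flatten the hierarchy into the list of all storage blocks."""
        return [b for a in self.areas for g in a.groups for b in g.blocks]


# Example: one area with two groups, each group spanning two devices.
pool = StoragePool(areas=[
    StorageArea("area0", groups=[
        StorageGroup("g0", [StorageBlock("jbod1", "d1"), StorageBlock("jbod2", "d1")]),
        StorageGroup("g1", [StorageBlock("jbod1", "d2"), StorageBlock("jbod2", "d2")]),
    ]),
])
```

Merging the area and group levels, as some embodiments allow, would simply mean giving each area exactly one group in this sketch.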
In an embodiment of the present invention, each storage area (or storage group) may consist of at least one storage block, where a storage block may be a complete storage medium or a part of a storage medium. To build redundant storage within a storage area, each storage area (or storage group) may consist of at least two storage blocks, so that when any one of them fails, the complete stored data can be computed from the remaining storage blocks in the group. The redundant storage mode may be multi-copy (replication), redundant array of independent disks (RAID), or erasure coding. In an embodiment of the present invention, the redundant storage may be established through the ZFS file system. In an embodiment of the present invention, to withstand hardware failures of storage devices or storage media, the storage blocks of a storage area (or storage group) are not located on the same storage medium, and not even in the same storage device. In an embodiment of the present invention, no two storage blocks of a storage area (or storage group) are located on the same storage medium or storage device. In another embodiment of the present invention, the number of storage blocks of one storage area (or storage group) that are located on the same storage medium or storage device is preferably less than or equal to the redundancy of the redundant storage. For example, with RAID 5 the redundancy is 1, so at most one storage block of a given storage group may reside on the same storage device; with RAID 6 the redundancy is 2, so at most two storage blocks of a given storage group may reside on the same storage device.
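The placement rule above (no more blocks of one group on a single device than the redundancy allows) can be checked mechanically. A minimal sketch follows, assuming a group is represented simply as a list of device identifiers, one per storage block; the function name `placement_ok` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def placement_ok(block_devices: Iterable[str], redundancy: int) -> bool:
    """Return True if no device holds more blocks of the group than `redundancy`.

    block_devices: the device id of each storage block in one storage group.
    redundancy: number of block failures the group tolerates
                (1 for RAID 5, 2 for RAID 6, as in the text above).
    """
    per_device = Counter(block_devices)
    return all(count <= redundancy for count in per_device.values())
```

With a RAID 5 group (redundancy 1), `placement_ok(["jbod1", "jbod1", "jbod2"], 1)` is rejected because losing `jbod1` would take out two blocks at once, while the same layout is acceptable for a RAID 6 group (redundancy 2).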
Because the storage blocks in a storage group actually come from different storage devices, the fault-tolerance level of the storage pool depends on the fault-tolerance level of the redundant storage within the storage group. Therefore, in an embodiment of the present invention, the storage system further includes a fault-tolerance level adjustment module configured to adjust the fault-tolerance level of the storage pool by adjusting the number of storage blocks in a storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group. Specifically, let D denote the number of storage blocks in the storage group that are allowed to fail simultaneously, N the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group, and M the number of storage devices in the storage pool that are allowed to fail simultaneously. The fault-tolerance level of the storage pool determined by the fault-tolerance level adjustment module is then M = D/N, taking only the integer part of D/N. In this way, storage systems with different fault-tolerance levels can be realized according to actual needs.
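The relation M = D/N (integer part only) is easy to evaluate for a few configurations. The sketch below is a direct transcription of that formula; the function name and the argument validation are assumptions added for illustration.

```python
def pool_fault_tolerance(d: int, n: int) -> int:
    """Number of storage devices M that may fail simultaneously.

    d: storage blocks per group that may fail simultaneously (D in the text).
    n: storage blocks taken from each device for one group (N in the text).
    Returns the integer part of D / N.
    """
    if d < 0 or n <= 0:
        raise ValueError("d must be non-negative and n must be positive")
    return d // n
```

For instance, a group that tolerates two block failures (D = 2) with one block per device (N = 1) survives two device failures; with two blocks per device (N = 2) it survives one; and a group that tolerates one failure with two blocks per device survives none, since 1/2 has integer part 0.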
In an embodiment of the present invention, each storage node can read and write only the storage areas it manages. Because read operations by multiple storage nodes on the same storage block do not conflict with one another, whereas simultaneous writes by multiple storage nodes to one storage block easily conflict, in another embodiment each storage node can write only the storage areas it manages but can read both the storage areas it manages and those managed by other storage nodes; that is, write operations are local while read operations may be global.
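The two access rules just described (strictly local reads and writes, or local writes with global reads) can be sketched as a small policy object. The class name `AccessPolicy`, the `owner_of` mapping and the `global_reads` flag are hypothetical names chosen for this illustration.

```python
from typing import Dict


class AccessPolicy:
    """Decides whether a storage node may read or write a storage area."""

    def __init__(self, owner_of: Dict[str, str], global_reads: bool = True):
        # owner_of maps a storage area name to the node that manages it.
        self.owner_of = owner_of
        # global_reads=False models the embodiment where reads are local too.
        self.global_reads = global_reads

    def can_write(self, node: str, area: str) -> bool:
        # Writes are always local: only the managing node may write.
        return self.owner_of.get(area) == node

    def can_read(self, node: str, area: str) -> bool:
        # Unknown areas are never readable.
        if area not in self.owner_of:
            return False
        return self.global_reads or self.owner_of[area] == node


policy = AccessPolicy({"area0": "node1", "area1": "node2"})
```

Under the default settings, `node1` may read `area1` but not write it, which avoids concurrent writers on any storage block.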
In one implementation, the storage system may further include a storage control node connected to the storage network and used to determine the storage areas managed by each storage node. In another implementation, each storage node may include a storage allocation module used to determine the storage areas managed by that storage node. This can be realized through communication and a coordination algorithm among the storage allocation modules of the individual storage nodes, and the algorithm may, for example, aim at balancing the load across the storage nodes.
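The patent does not specify the coordination algorithm, only that it may aim at load balancing. As one possible illustration, the sketch below uses a simple greedy heuristic (assign the heaviest remaining area to the currently least-loaded node); the heuristic, the function name and the notion of a numeric per-area load are all assumptions, not the patented method.

```python
import heapq
from typing import Dict, List


def allocate_areas(area_loads: Dict[str, int], nodes: List[str]) -> Dict[str, str]:
    """Greedy load balancing: heaviest area first, to the least-loaded node."""
    if not nodes:
        raise ValueError("at least one storage node is required")
    # Heap of (current total load, node name); ties are broken by node name.
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    # Sort by descending load, then by name so the result is deterministic.
    for area, load in sorted(area_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        assignment[area] = node
        heapq.heappush(heap, (total + load, node))
    return assignment
```

Because only the mapping from nodes to storage areas changes, rebalancing of this kind requires no movement of the data itself, in line with the configuration-only balancing described earlier.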
In one embodiment, when a storage node is detected to have failed, some or all of the other storage nodes may be configured to take over the storage areas previously managed by the failed storage node. For example, a single storage node may take over the storage areas managed by the failed storage node, or at least two other storage nodes may take over, each taking over part of the storage areas managed by the failed storage node; for instance, the at least two other storage nodes may each take over different storage groups within the storage area.
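The takeover described above amounts to rewriting an ownership table; no data is copied. The following sketch spreads the failed node's storage groups over the surviving nodes in round-robin order. The round-robin choice and the names `take_over` and `ownership` are assumptions made for illustration only.

```python
from typing import Dict, List


def take_over(ownership: Dict[str, str], failed: str, survivors: List[str]) -> Dict[str, str]:
    """Return a new ownership map with the failed node's groups reassigned.

    ownership: storage group name -> managing storage node.
    survivors: nodes that may take over; one node means a single takeover,
               several nodes split the failed node's groups among them.
    """
    if not survivors:
        raise ValueError("no surviving storage node available")
    if failed in survivors:
        raise ValueError("failed node cannot be a survivor")
    updated = dict(ownership)
    orphaned = sorted(g for g, node in ownership.items() if node == failed)
    for i, group in enumerate(orphaned):
        updated[group] = survivors[i % len(survivors)]
    return updated
```

Passing a single survivor models the case where one storage node takes over everything; passing several models the split takeover of different storage groups.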
In one embodiment, the storage medium may include, but is not limited to, a hard disk, flash memory, SRAM, DRAM, NVMe, NVRAM or other forms, and the access interface of the storage medium may include, but is not limited to, a SAS interface, a SATA interface, a PCI/e interface, a DIMM interface, an NVMe interface, a SCSI interface, or an AHCI interface.
In an embodiment of the present invention, the storage network may include at least one storage switching device, and storage nodes access storage media through data exchange among the storage switching devices it contains. Specifically, the storage nodes and the storage media are each connected to the storage switching device through storage channels.
In an embodiment of the present invention, the storage switching device may be a SAS switch or a PCI/e switch; correspondingly, the storage channel may be a SAS (Serial Attached SCSI) channel or a PCI/e channel.
Taking the SAS channel as an example, compared with conventional IP-based storage solutions, a solution based on SAS switching offers high performance, large bandwidth, and a large number of disks per device. When used together with a host bus adapter (HBA) or the SAS interface on a server motherboard, the storage provided by the SAS fabric can easily be accessed simultaneously by multiple connected servers.
Specifically, a SAS switch is connected to a storage device by a SAS cable, and the storage device is connected to its storage media through SAS interfaces as well; for example, the storage device routes the SAS channel internally to each storage medium (a SAS switching chip may be provided inside the storage device). The bandwidth of a SAS network can reach 24 Gb or 48 Gb, which is tens of times that of Gigabit Ethernet and several times that of expensive 10 Gigabit Ethernet. At the link layer, SAS offers roughly an order of magnitude improvement over IP networks. At the transport layer, TCP's three-way handshake and four-way close incur high overhead, and TCP's delayed acknowledgment mechanism and slow start can sometimes cause delays on the order of 100 milliseconds, whereas the latency of the SAS protocol is tens of times lower than that of TCP, which yields an even greater performance gain. In short, a SAS network has a large advantage over Ethernet-based TCP/IP in both bandwidth and latency. Those skilled in the art will appreciate that the performance of a PCI/e channel can also meet the needs of the system.
In an embodiment of the present invention, the storage network may include at least two storage switching devices, and each storage node can connect to any storage device, and thereby to its storage media, through any one of the storage switching devices. When any storage switching device, or a storage channel connected to a storage switching device, fails, the storage node reads and writes the data on the storage device through another storage switching device.
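The switch-level redundancy described above can be illustrated as a path selection over the set of currently healthy links. This is only a sketch under simplifying assumptions: a path is usable when both the node-to-switch link and the switch-to-device link are up, and the names `pick_switch` and `healthy_links` are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_switch(node: str, device: str, switches: Iterable[str],
                healthy_links: Set[Tuple[str, str]]) -> Optional[str]:
    """Return the first switch through which `node` can reach `device`.

    healthy_links holds (endpoint, switch) pairs for links that are up,
    where an endpoint is either a storage node or a storage device.
    Returns None when no switch currently offers a complete path.
    """
    for switch in switches:
        if (node, switch) in healthy_links and (device, switch) in healthy_links:
            return switch
    return None


# Two switches, every endpoint cabled to both (as in a dual-switch layout).
links = {("srv1", "sw1"), ("srv1", "sw2"), ("jbod1", "sw1"), ("jbod1", "sw2")}
```

If the channel between `jbod1` and `sw1` goes down, the same call falls back to `sw2` without any change on the storage node side.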
参考图3,其示出了根据本发明一个实施方式所构建的一个具体的存储系统30。存储系统30中的存储设备被构建成多台JBOD307-310,分别通过SAS数据线连接至两个SAS交换机305和306,这两个SAS交换机构成了存储系统所包括的存储网络的交换核心。前端为至少两个服务器301和302,每台服务器通过HBA设备(未示出)或主板上SAS接口连接至这两个SAS交换机305和306。服务器之间存在基本的网络连接用来监控和通信。每台服务器中都有一个存储节点,利用从SAS链路获取的信息,管理所有JBOD磁盘中的部分或全部磁盘。具体而言,可以利用本申请文件以上描述的存储区域、存储组、存储块来将JBOD磁盘划分成不同的存储组。每个存储节点都管理一组或多组这样的存储组。当每个存储组内部采用冗余存储的方式时,可以将冗余存储的元数据存在于磁盘之上,使得冗余存储能够被其他存储节点直接从磁盘识别。Referring to Figure 3, there is shown a particular storage system 30 constructed in accordance with one embodiment of the present invention. The storage devices in the storage system 30 are constructed as a plurality of JBODs 307-310, which are respectively connected to the two SAS switches 305 and 306 through SAS data lines, which constitute the switching core of the storage network included in the storage system. The front end is at least two servers 301 and 302, each of which is connected to the two SAS switches 305 and 306 via an HBA device (not shown) or a SAS interface on the motherboard. There is a basic network connection between the servers for monitoring and communication. Each server has a storage node that manages some or all of the disks in all JBOD disks using information obtained from the SAS links. Specifically, the storage area, the storage group, and the storage block described above in the application file may be used to divide the JBOD disk into different storage groups. Each storage node manages one or more sets of such storage groups. When redundant storage is used inside each storage group, redundantly stored metadata can exist on the disk, so that redundant storage can be directly recognized from the disk by other storage nodes.
在所示的示例性存储系统30中,存储节点可以安装监控和管理模块,负责监控本地存储和其它服务器的状态。当某台JBOD整体异常,或者JBOD上某个磁盘异常时,数据可靠性由冗余存储来确保。当某台服务器故障时, 另一台预先设定好的服务器上的存储节点中的管理模块,将按照磁盘上的数据,在本地识别并接管原来由故障服务器的存储节点所管理的磁盘。故障服务器的存储节点原本对外提供的存储服务,也将在新的服务器上的存储节点得到延续。至此,实现了一种全新的高可用的全局存储池结构。In the exemplary storage system 30 shown, the storage node can install a monitoring and management module that is responsible for monitoring the status of local storage and other servers. When a JBOD is abnormal overall or a disk on the JBOD is abnormal, data reliability is ensured by redundant storage. When a server fails, The management module in the storage node on another pre-configured server will locally identify and take over the disk managed by the storage node of the failed server according to the data on the disk. The storage node originally provided by the storage node of the faulty server will also be extended on the storage node on the new server. So far, a new highly available global storage pool structure has been implemented.
As can be seen, the exemplary storage system 30 provides a globally accessible storage pool that can be controlled from multiple points. On the hardware side, multiple servers provide services externally and JBODs house the disks. Each JBOD is connected to two SAS switches, and each switch is in turn connected to the servers' HBA cards, which ensures that every disk in the JBODs can be accessed by every server. The redundant SAS links also ensure high availability of the links.
Locally on each server, redundant storage technology is used to select disks from each JBOD to form the redundant storage, so that the loss of a single JBOD does not make data unavailable. When a server fails, the module that monitors the overall state schedules another server to access, over the SAS channel, the disks managed by the failed server's storage node and to quickly take over those disks, achieving highly available global storage.
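The takeover step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StorageNode` class, the `fail_over` function, the `standby_of` mapping, and the disk identifiers are hypothetical names that do not appear in the application, and failure detection itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical model of the storage node running on one server."""
    name: str
    alive: bool = True
    disks: set = field(default_factory=set)  # IDs of the disks this node manages


def fail_over(nodes, failed_name, standby_of):
    """Hand the failed node's disks to its pre-designated standby node.

    `standby_of` maps a node name to the node preset to take over for it.
    Only ownership changes here, because every disk is assumed to be
    reachable from every server over the SAS network.
    """
    failed = nodes[failed_name]
    failed.alive = False
    standby = nodes[standby_of[failed_name]]
    if not standby.alive:
        raise RuntimeError(f"standby node {standby.name} is also down")
    standby.disks |= failed.disks  # take over management of the disks
    failed.disks = set()
    return standby


nodes = {
    "server301": StorageNode("server301", disks={"jbod307/d0", "jbod308/d0"}),
    "server302": StorageNode("server302", disks={"jbod309/d0", "jbod310/d0"}),
}
standby_of = {"server301": "server302", "server302": "server301"}
new_owner = fail_over(nodes, "server301", standby_of)
print(new_owner.name, sorted(new_owner.disks))
```

No data is copied in this sketch; the surviving node simply starts managing disks it could already reach, which is the property the shared SAS network is meant to provide.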
Although Figure 3 uses JBODs housing disks as an example, it should be understood that the embodiment of the present invention shown in Figure 3 also supports storage devices other than JBODs. In addition, the description above takes one whole storage medium as one storage block; it applies equally to the case where part of a storage medium serves as one storage block.
An embodiment of the present invention further provides a redundant storage method. The storage system to which it applies includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device including at least one storage medium; wherein each storage node accesses at least two storage devices through the storage network. The method includes:
storing data in a redundant storage manner across at least one storage block of each of at least two storage devices accessed by the same storage node, wherein a storage block is a complete storage medium or a part of a storage medium.
In an embodiment of the present invention, all storage media in the storage system constitute a storage pool, and this storage pool is a global storage pool as described earlier: all storage media in the pool can be shared by all storage nodes in the storage system, and every storage node can access all storage media in the pool without going through other storage nodes.
Specifically, the redundant storage method based on this global storage pool can be implemented as follows: first select multiple storage devices from the storage pool, then select at least one storage block from each of the selected storage devices, and aggregate all the storage blocks so selected into a storage group. Within the storage group, data is stored redundantly across all of its storage blocks. When a storage block in the group fails, the data of the failed block can be obtained from the data in the other storage blocks of the group.
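The selection procedure above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function `build_storage_group`, the dictionary layout of `pool`, and the block names are assumptions introduced here.

```python
def build_storage_group(pool, devices, blocks_per_device=1):
    """Pick `blocks_per_device` free blocks from each chosen storage device.

    `pool` maps a device name to the list of its free storage blocks.
    Selected blocks are removed from the pool so they are not reused
    by another storage group.
    """
    group = []
    for dev in devices:
        free = pool[dev]
        if len(free) < blocks_per_device:
            raise ValueError(f"{dev} has too few free storage blocks")
        group.extend((dev, blk) for blk in free[:blocks_per_device])
        pool[dev] = free[blocks_per_device:]
    return group


# Hypothetical pool: five devices with five free blocks each.
pool = {f"JBOD{i}": [f"blk{j}" for j in range(5)] for i in range(1, 6)}
group = build_storage_group(pool, ["JBOD1", "JBOD2", "JBOD3", "JBOD4", "JBOD5"])
print(group)
```

Because each block of the group lives on a different device in this example, losing one whole device costs the group only one block.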
It should be understood that the storage blocks of a storage group need not come from all the storage devices in the storage pool, and that not all storage devices in the pool need be used for redundant storage. Storage devices and storage blocks not selected for redundant storage can serve as hot spares that are not used in normal operation.
It should be understood that redundant storage among the storage blocks of a storage group can be implemented in multi-copy mode, RAID mode, or erasure-code mode; the present invention does not limit the specific manner of redundant storage among the storage blocks of a storage group.
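As one illustration of how a failed storage block can be rebuilt from the surviving blocks of its group, the sketch below uses single XOR parity in the style of RAID 5. The application does not prescribe this scheme; the choice of XOR parity, the helper `xor_blocks`, and the sample data are assumptions made only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks plus one parity block form a five-block storage group.
data_blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data_blocks)
group = data_blocks + [parity]

# Simulate the loss of one block and rebuild it from the survivors.
lost_index = 2
survivors = [blk for i, blk in enumerate(group) if i != lost_index]
recovered = xor_blocks(survivors)
print(recovered == group[lost_index])
```

Single parity tolerates the loss of only one block per group; tolerating more simultaneous failures requires additional copies or an erasure code with more parity blocks.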
In an embodiment of the present invention, to allow more flexible storage settings according to the specific content being stored, multiple storage groups can also be aggregated into a storage area.
As described earlier, because the storage blocks of a storage group actually come from different storage devices, the fault tolerance level of the storage pool is related to the fault tolerance level of the redundant storage in the storage group. The fault tolerance level of the storage pool can therefore be adjusted by adjusting the number of storage blocks in the storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the pool for aggregation into the same storage group. The specific adjustment can be the same as the method performed by the fault tolerance level adjustment module of the storage system described earlier and is not repeated here.
It can thus be seen that, with the redundant storage method applied to a storage system provided by the embodiments of the present invention, different fault tolerance levels of the storage pool can be achieved by adjusting the fault tolerance level of the storage groups and the policy for selecting the storage blocks of a storage group, so as to meet practical storage requirements of different degrees.
Figure 4 is a schematic structural diagram of a storage pool using redundant storage according to an embodiment of the present invention. As shown in Figure 4, the storage pool 40 includes five storage devices JBOD1 to JBOD5, each of which includes five storage blocks. All five storage devices JBOD1 to JBOD5 in the storage pool 40 are used for redundant storage, and one storage block is selected from each storage device and aggregated into a storage group using an erasure code. For example, storage blocks D1 to D5 are aggregated into a storage group P1, and D11 to D15 can be aggregated into another storage group. In storage group P1, data is stored in storage blocks D1 to D5 using an erasure code whose parity level is 2; that is, storage group P1 allows two storage blocks to fail simultaneously, so the storage pool 40 also allows two storage devices to fail simultaneously.
Figure 5 is a schematic structural diagram of a storage pool using redundant storage according to another embodiment of the present invention. As shown in Figure 5, all five storage devices JBOD1 to JBOD5 in the storage pool 50 are likewise used for redundant storage, but two storage blocks are selected from each storage device and aggregated into a storage group using an erasure code. For example, storage blocks D1 to D15 are aggregated into a storage group P2, and storage blocks D21 to D35 can be aggregated into another storage group. In storage group P2, the parity level of the erasure code is 3; that is, storage group P2 allows three storage blocks to fail simultaneously. The number of storage devices in the storage pool 50 allowed to fail simultaneously is then the integer part of 3/2, which is 1; that is, the storage pool 50 tolerates the simultaneous failure of only one storage device.
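The arithmetic in the two examples follows the relation M = D/N with only the integer part kept, where D is the number of storage blocks a group can lose, N is the number of blocks the group takes from each storage device, and M is the number of storage devices the pool can lose. A short sketch, in which the function name `pool_fault_tolerance` is an assumption introduced here:

```python
def pool_fault_tolerance(d, n):
    """Return M, the number of storage devices allowed to fail simultaneously.

    d: number of storage blocks in a group allowed to fail simultaneously (D)
    n: number of storage blocks the group takes from each storage device (N)
    Losing one device costs the group n blocks, so M is the integer part of d / n.
    """
    if n < 1 or d < 0:
        raise ValueError("n must be at least 1 and d must be non-negative")
    return d // n


print(pool_fault_tolerance(2, 1))  # Figure 4: D = 2, N = 1 gives M = 2
print(pool_fault_tolerance(3, 2))  # Figure 5: D = 3, N = 2 gives M = 1
```

The second case shows why taking more blocks per device lowers device-level tolerance: a parity level of 3 is not enough to absorb two failed devices, which would remove four blocks from the group.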
An embodiment of the present invention further provides a redundant storage apparatus. The storage system to which it applies includes: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each storage device including at least one storage medium; wherein each storage node accesses at least two storage devices through the storage network. The redundant storage apparatus includes:
a redundant storage module configured to store data in a redundant storage manner across at least one storage block of each of at least two storage devices accessed by the same storage node, wherein a storage block is a complete storage medium or a part of a storage medium. It should be understood that the method performed by the redundant storage module is the same as the redundant storage method described above and achieves the same functional effects, so it is not described again here.
An embodiment of the present invention further provides a computer program product on a computer-readable storage medium, comprising computer program code which, when executed by a processor, enables the processor to carry out the redundant storage method of the embodiments described herein. The computer storage medium can be any tangible medium, such as a floppy disk, CD-ROM, DVD, hard disk drive, or even a network medium.
It should be understood that although one form of implementation of the embodiments of the present invention described above may be a computer program product, the methods or apparatus of the embodiments can be implemented in software, hardware, or a combination of software and hardware. The hardware part can be implemented with dedicated logic; the software part can be stored in memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will understand that the methods and devices described above can be implemented using computer-executable instructions and/or processor control code, with such code provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The methods and apparatus of the present invention can be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, or by programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they can also be implemented in software executed by various types of processors, or by a combination of such hardware circuits and software, for example firmware.
It should also be understood that, so as not to obscure the embodiments of the present invention, the specification describes only some key, though not necessarily essential, techniques and features, and may leave undescribed some features that those skilled in the art are able to implement.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

  1. A redundant storage system, comprising:
    a storage network;
    at least two storage nodes connected to the storage network; and
    at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium;
    wherein each of the storage nodes accesses at least two storage devices through the storage network, and data is stored in a redundant storage manner across at least one storage block of each of the at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
  2. The redundant storage system according to claim 1, wherein the storage network is configured such that every storage node can access all storage media without going through other storage nodes.
  3. The redundant storage system according to claim 2, wherein all storage media included in the redundant storage system constitute a storage pool, the storage pool is divided into at least two storage areas, and each storage node is responsible for managing zero or more storage areas.
  4. The redundant storage system according to claim 3, wherein each of the storage areas includes at least two storage blocks, the at least two storage blocks constituting each storage area are divided into one or more storage groups, and data is stored in a redundant storage manner among the storage blocks within each storage group.
  5. The redundant storage system according to claim 4, wherein the number of storage blocks of one storage group that are located in the same storage device is less than or equal to the redundancy of the redundant storage.
  6. The redundant storage system according to claim 4, further comprising:
    a fault tolerance level adjustment module configured to adjust the fault tolerance level of the storage pool by adjusting the number of storage blocks in the storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group.
  7. The redundant storage system according to claim 6, wherein D denotes the number of storage blocks in the storage group that are allowed to fail simultaneously, N denotes the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group, and M denotes the number of storage devices in the storage pool that are allowed to fail simultaneously; the fault tolerance level of the storage pool determined by the fault tolerance level adjustment module is then M = D/N, where only the integer part of D/N is taken.
  8. The redundant storage system according to claim 4, wherein one storage group has at most one storage block in any one storage device.
  9. The redundant storage system according to any one of claims 1 to 8, wherein the redundant storage manner is RAID, erasure code, or multi-copy mode; or
    the storage device is a JBOD; and/or the storage medium is a hard disk, flash memory, DRAM, or NVRAM; and/or the interface of the storage medium is a SAS interface, SATA interface, PCI/e interface, DIMM interface, NVMe interface, SCSI interface, or AHCI interface.
  10. The redundant storage system according to any one of claims 1 to 8, wherein the storage node is one of, or a combination of several of, the following: a virtual machine of the server, a container, and a module running directly on the physical operating system of the server.
  11. A redundant storage method, wherein the redundant storage system to which it applies comprises: a storage network;
    at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium; wherein each of the storage nodes accesses at least two storage devices through the storage network; the method comprising:
    storing data in a redundant storage manner across at least one storage block of each of at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
  12. The method according to claim 11, wherein storing data in a redundant storage manner across at least one storage block of each of at least two storage devices accessed by the same storage node comprises:
    aggregating at least one storage block of each of the at least two storage devices accessed by the same storage node into a storage group in a redundant storage manner.
  13. The method according to claim 12, wherein the storage network is configured such that every storage node can access all storage media without going through other storage nodes, and all storage media included in the redundant storage system constitute a storage pool, the method further comprising:
    adjusting the fault tolerance level of the storage pool by adjusting the number of storage blocks in the storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group.
  14. The method according to claim 13, wherein adjusting the fault tolerance level of the storage pool by adjusting the number of storage blocks in the storage group that are allowed to fail simultaneously and/or the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group comprises:
    with D denoting the number of storage blocks in the storage group that are allowed to fail simultaneously, N denoting the number of storage blocks selected from each of the at least two storage devices of the storage pool for aggregation into the same storage group, and M denoting the number of storage devices in the storage pool that are allowed to fail simultaneously, setting M = D/N, where only the integer part of D/N is taken.
  15. The method according to any one of claims 11 to 14, further comprising: aggregating a plurality of the storage groups into a storage area.
  16. A redundant storage apparatus, wherein the redundant storage system to which it applies comprises: a storage network; at least two storage nodes connected to the storage network; and at least two storage devices connected to the storage network, each of the storage devices including at least one storage medium; wherein each of the storage nodes accesses at least two storage devices through the storage network; the redundant storage apparatus comprising:
    a redundant storage module configured to store data in a redundant storage manner across at least one storage block of each of at least two storage devices accessed by the same storage node, wherein the storage block is a complete storage medium or a part of a storage medium.
PCT/CN2017/077754 2011-10-11 2017-03-22 Redundant storage system, redundant storage method and redundant storage device WO2017162177A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/139,712 US10782898B2 (en) 2016-02-03 2018-09-24 Data storage system, load rebalancing method thereof and access control method thereof
US16/378,076 US20190235777A1 (en) 2011-10-11 2019-04-08 Redundant storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610176288.7A CN105843557B (en) 2016-03-24 2016-03-24 Redundant storage system, redundant storage method and redundant storage device
CN201610176288.7 2016-03-24

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/077758 Continuation-In-Part WO2017162179A1 (en) 2011-10-11 2017-03-22 Load rebalancing method and apparatus for use in storage system
PCT/CN2017/077753 Continuation-In-Part WO2017162176A1 (en) 2011-10-11 2017-03-22 Storage system, access method for storage system, and access device for storage system

Related Child Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/077757 Continuation-In-Part WO2017162178A1 (en) 2011-10-11 2017-03-22 Access control method and device for storage system
PCT/CN2017/077755 Continuation-In-Part WO2017167106A1 (en) 2011-10-11 2017-03-22 Storage system

Publications (1)

Publication Number Publication Date
WO2017162177A1 true WO2017162177A1 (en) 2017-09-28

Family

ID=56583383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077754 WO2017162177A1 (en) 2011-10-11 2017-03-22 Redundant storage system, redundant storage method and redundant storage device

Country Status (2)

Country Link
CN (1) CN105843557B (en)
WO (1) WO2017162177A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843557B (en) * 2016-03-24 2019-03-08 天津书生云科技有限公司 Redundant storage system, redundant storage method and redundant storage device
US20180046383A1 (en) * 2016-08-12 2018-02-15 Hewlett Packard Enterprise Development Lp Movement of frequently accessed data chunks between storage tiers
CN107967117B (en) * 2016-10-20 2020-10-20 杭州海康威视数字技术股份有限公司 Data storage, reading and cleaning method and device and cloud storage system
CN106708431B (en) * 2016-12-01 2020-02-14 华为技术有限公司 Data storage method and device, host equipment and storage equipment
WO2018102967A1 (en) * 2016-12-05 2018-06-14 华为技术有限公司 Control method, storage device and system for data read/write command in nvme over fabric architecture
CN108153622B (en) * 2016-12-06 2021-08-31 华为技术有限公司 Fault processing method, device and equipment
CN111966540B (en) 2017-09-22 2024-03-01 成都华为技术有限公司 Storage medium management method and device and readable storage medium
US11436113B2 (en) * 2018-06-28 2022-09-06 Twitter, Inc. Method and system for maintaining storage device failure tolerance in a composable infrastructure
CN109130558A (en) * 2018-07-25 2019-01-04 福州市联奇智能科技有限公司 A kind of more chapters intelligence selection automatic stamping machine device people based on big data
CN109814803B (en) * 2018-12-17 2022-12-09 深圳创新科技术有限公司 Fault tolerance self-adaptive adjusting method and device in distributed storage system
CN109992445B (en) * 2019-04-11 2020-10-02 苏州浪潮智能科技有限公司 Processing method and device for modifying write operation, electronic equipment and storage medium
CN112579384B (en) * 2019-09-27 2023-07-04 杭州海康威视数字技术股份有限公司 Method, device and system for monitoring nodes of SAS domain and nodes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424052A (en) * 2013-09-11 2015-03-18 杭州信核数据科技有限公司 Automatic redundant distributed storage system and method
CN104657316A (en) * 2015-03-06 2015-05-27 北京百度网讯科技有限公司 Server
CN105843557A (en) * 2016-03-24 2016-08-10 天津书生云科技有限公司 Redundant storage system, redundant storage method and redundant storage device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037361B2 (en) * 2009-11-04 2011-10-11 International Business Machines Corporation Selective write protect for disaster recovery testing
CN203982354U (en) * 2014-06-19 2014-12-03 天津书生投资有限公司 A kind of redundant storage system

Also Published As

Publication number Publication date
CN105843557A (en) 2016-08-10
CN105843557B (en) 2019-03-08

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17769454

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17769454

Country of ref document: EP

Kind code of ref document: A1