WO2021139571A1 - Data storage method, data reading method, apparatus, and system in a storage system - Google Patents

Data storage method, data reading method, apparatus, and system in a storage system

Info

Publication number
WO2021139571A1
Authority
WO
WIPO (PCT)
Prior art keywords
hard disk
data
storage
units
read
Prior art date
Application number
PCT/CN2020/141063
Other languages
English (en)
French (fr)
Inventor
Chen Can (陈灿)
Chen Ming (陈明)
Tan Chunyi (谭春毅)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP20912213.4A (published as EP4075252A4)
Priority to JP2022536646A (published as JP2023510500A)
Publication of WO2021139571A1
Priority to US17/859,378 (published as US20220342567A1)

Classifications

    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F11/0757 Error or fault detection not based on redundancy, by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F11/2094 Redundant storage or storage space
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/065 Replication mechanisms
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • This application relates to the field of information technology, and in particular to a data storage method, data reading method, device and system in a storage system.
  • the distributed storage system may include multiple storage nodes.
  • a storage node is a storage server, and each storage server contains storage resources, such as multiple hard disks.
  • the distributed storage system organizes storage resources in storage nodes to provide storage services.
  • When a client stores data in the distributed storage system, it usually divides the data into M data units and obtains N check units of those data units based on an erasure coding (EC) algorithm.
  • the client stores the M data units and N check units in M+N storage nodes; that is, each of the M+N storage nodes stores one corresponding unit (a data unit or a check unit).
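  • For illustration only (this is not the patent's implementation): the following Python sketch shows the M-data-unit / N-check-unit split described above, using a single XOR parity unit (N = 1), the simplest erasure-coding instance; practical systems typically use Reed-Solomon codes to support N > 1. The function name and padding scheme are assumptions.

      def ec_encode(data: bytes, m: int):
          """Split data into m data units and one XOR check unit (N = 1)."""
          unit_len = -(-len(data) // m)                 # ceiling division
          padded = data.ljust(m * unit_len, b"\x00")    # pad to a whole stripe
          data_units = [padded[i * unit_len:(i + 1) * unit_len] for i in range(m)]
          check = bytearray(unit_len)
          for unit in data_units:                       # XOR-fold all data units
              for i, b in enumerate(unit):
                  check[i] ^= b
          return data_units, [bytes(check)]             # M data units, N = 1 check unit

      units, checks = ec_encode(b"example payload", m=4)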
  • This application provides a data storage method, data reading method, device, and system in a storage system to make full use of storage resources in the storage system.
  • the technical solution is as follows:
  • the first device stores K units in K hard disk modules in the storage system, where the K units include M data units and N check units, and each hard disk module of the K hard disk modules stores one of the K units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
  • the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules. Since each storage node communicates with the interface modules of multiple hard disk modules, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, thereby reducing the waste of computing resources.
  • the first device is a client of the storage system.
  • the client sends the K units to the target storage node among the multiple storage nodes.
  • the target storage node stores the K units in the K hard disk modules in the storage system. In this way, storage is performed with the hard disk module in the storage system as the granularity, so that the storage resources in the storage system can be fully utilized.
  • the first device is one of the multiple storage nodes.
  • the interface module is a host bus adapter, a redundant array of independent disks (RAID) card, an expander card, or a network interface card.
  • the storage system includes a second device, and there is a mutual backup relationship or an active backup relationship between the second device and the first device.
  • the second device can take over the hard disk module corresponding to the first device, so there is no need to restore the data stored in the hard disk module corresponding to the first device, which improves the reliability of the storage system.
  • In a second aspect, the present application provides a data reading method in a storage system: the first device receives a read request, and the read request includes a data identifier of the data to be read.
  • the first device determines the hard disk module storing the data to be read from the K hard disk modules in the storage system according to the data identifier.
  • the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules. Since each storage node communicates with the interface modules of multiple hard disk modules, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, thereby reducing the waste of computing resources.
  • the first device is a client of the storage system.
  • the client sends a data read request to a target storage node among the multiple storage nodes, where the read request carries the data identifier.
  • the target storage node reads the data to be read from the hard disk module storing the data to be read according to the data identifier. Since the target storage node can read the data to be read from the hard disk module storing the data to be read according to the data identifier, the data can be stored with the hard disk module in the storage system as the granularity when storing the data.
  • the first device is one of the multiple storage nodes.
  • the interface module is a host bus adapter, a redundant array of independent disks (RAID) card, an expander card, or a network interface card.
  • the storage system includes a second device, and there is a mutual backup relationship or an active backup relationship between the second device and the first device.
  • the second device can take over the hard disk module corresponding to the first device, so there is no need to restore the data stored in the hard disk module corresponding to the first device, which improves the reliability of the storage system.
  • the present application provides a data storage device in a storage system, which is used to execute the first aspect or the method in any one of the possible implementation manners of the first aspect.
  • the device includes a unit for executing the method of the first aspect or any one of the possible implementation manners of the first aspect.
  • the present application provides a data reading device in a storage system, which is used to execute the second aspect or the method in any one of the possible implementation manners of the second aspect.
  • the device includes a unit for executing the second aspect or any one of the possible implementation manners of the second aspect.
  • the present application provides a data storage device in a storage system.
  • the device includes a processor and a communication interface.
  • the processor communicates with the communication interface.
  • the processor and the communication interface are respectively configured to execute corresponding steps in the first aspect or any possible implementation manner of the first aspect.
  • the present application provides a data reading device in a storage system, the device including: a processor and a communication interface. Wherein, the processor communicates with the communication interface. The processor and the communication interface are respectively configured to execute corresponding steps in the second aspect or any possible implementation manner of the second aspect.
  • the present application provides a computer-readable storage medium with program code stored therein, which, when run on a computer, causes the computer to execute the method in the first aspect, the second aspect, or any possible implementation manner of the first aspect or the second aspect.
  • the present application provides a computer program product containing program code, which, when run on a computer, enables the computer to execute the method in the first aspect, the second aspect, or any possible implementation manner of the first aspect or the second aspect.
  • the present application provides a storage system that includes a storage device and K hard disk modules, wherein the storage device is used to implement the method in the first aspect or any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a storage system provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of another storage system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another storage system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another storage system provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another storage system provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a data storage method of a storage system provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of another data storage method of a storage system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a data storage device of a storage system provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data reading device of a storage system provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data storage device of another storage system provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another data reading device of a storage system provided by an embodiment of the present application.
  • an embodiment of the present application provides a storage system.
  • the storage system includes a plurality of storage nodes, and each storage node corresponds to a plurality of hard disk modules, and the storage node accesses hard disks in the plurality of hard disk modules.
  • the hard disk module includes an interface module and multiple hard disks.
  • the hard disk can be a mechanical disk or a solid state drive (SSD), etc.
  • the interface module can be a host bus adapter (HBA), a redundant array of independent disks (RAID) card, an expander card (Expander), or a network interface controller (NIC), etc. This embodiment of the present invention does not limit this.
  • the interface module in the hard disk module communicates with the hard disk.
  • the storage node communicates with the interface module of the hard disk module to access the hard disk in the hard disk module.
  • the interface module can exist in the form of a card, that is, the interface module can be an interface card.
  • the interface of the hard disk can be Serial Attached SCSI (SAS), Serial Advanced Technology Attachment (SATA), Peripheral Component Interconnect Express (PCIe), and so on.
  • the storage node and the hard disk module can communicate through a bus, for example, through a PCIe bus.
  • the storage node and the hard disk module can also communicate through a network, such as Ethernet.
  • the embodiment of the present invention does not limit this.
  • Storage nodes communicate with each other, and there is a mutual backup relationship between one storage node and one or more other storage nodes in the storage system.
  • The mutual backup relationship means that one storage node can access the multiple hard disk modules corresponding to the other storage node; that is, a storage node communicates with the interface modules of the hard disk modules of another storage node with which it has a mutual backup relationship.
  • When the storage nodes that have a mutual backup relationship are all in a normal state, each storage node establishes communication connections only with its own corresponding multiple hard disk modules; that is, each storage node has direct read and write access only to its corresponding multiple hard disk modules. When one of the storage nodes in the mutual backup relationship fails, the other storage nodes take over the failed storage node and access the multiple hard disk modules of the failed storage node.
  • In an active-standby backup relationship, the standby storage node backs up the primary storage node; when the primary storage node fails, the standby storage node can take over the primary storage node.
  • For example, the first storage node and the second storage node are any two storage nodes in the storage system that have a mutual backup relationship or an active-standby backup relationship.
  • the first storage node corresponds to a plurality of first hard disk modules
  • the second storage node corresponds to a plurality of second hard disk modules
  • the first storage node is connected to the plurality of first hard disk modules and the plurality of second hard disk modules through a first bus.
  • the second storage node is connected to the plurality of first hard disk modules and the plurality of second hard disk modules through a second bus.
  • When the first storage node and the second storage node are in a mutual backup relationship and both are in a normal state, the first storage node establishes communication connections with the plurality of first hard disk modules over the first bus and directly performs read and write access only to the plurality of first hard disk modules; likewise, the second storage node establishes communication connections with the plurality of second hard disk modules over the second bus and directly performs read and write access only to the plurality of second hard disk modules.
  • When the second storage node fails, the first storage node takes over the second storage node and accesses the multiple second hard disk modules of the second storage node; that is, the first storage node can then directly perform read and write access to both the plurality of first hard disk modules corresponding to itself and the plurality of second hard disk modules corresponding to the second storage node.
  • the first storage node includes a first communication interface, a processing unit, and a second communication interface, and the processing unit is connected to the first communication interface and the second communication interface.
  • the first storage node establishes communication connections with other storage nodes in the storage system through the first communication interface, and the second communication interface of the first storage node is connected, through the first bus, to the plurality of first hard disk modules corresponding to the first storage node and to the plurality of second hard disk modules corresponding to the second storage node.
  • the second communication interface of the first storage node establishes a communication connection with the plurality of first hard disk modules on the first bus.
  • the processing unit of the first storage node may send data to other storage nodes in the storage system through the first communication interface of the first storage node, or receive data or access requests sent by other storage nodes.
  • the processing unit of the first storage node may send data to the client through the first communication interface of the first storage node, or receive an access request sent by the client.
  • the processing unit of the first storage node performs read and write access to the plurality of first hard disk modules through the second communication interface of the first storage node; that is, the processing unit of the first storage node communicates with the interface modules of the plurality of first hard disk modules through the second communication interface. In the case of a failure of the second storage node, the second communication interface of the first storage node establishes communication connections with the plurality of second hard disk modules over the first bus, and the processing unit of the first storage node can then read and write the multiple second hard disk modules through the second communication interface.
  • the first communication interface and the processing unit in the first storage node are two separate modules, and the first communication interface and the processing unit may be connected by a high-speed bus, which may be PCIe, Intel QuickPath Interconnect (QPI), or the like.
  • the first communication interface and the processing unit in the first storage node may be integrated.
  • the second communication interface and the processing unit in the first storage node are two separate modules, and the second communication interface and the processing unit can also be connected through a high-speed bus; alternatively, the second communication interface and the processing unit in the first storage node can be integrated.
  • the first communication interface in the first storage node may be a network card or the like.
  • the first communication interface may be a 10G network card.
  • the processing unit in the first storage node may be a central processing unit (CPU), or a processing unit composed of one or more chips, for example, a data compression card, an artificial intelligence (AI) inference card, an image processing card, or a video capture card.
  • the second communication interface in the first storage node may be a PCIe conversion chip or a SAS conversion chip.
  • the storage node and the hard disk module may be connected through a shared-link board, and the shared-link board includes a bus composed of one or more physical lines.
  • the second communication interface of the first storage node is connected to the plurality of first hard disk modules and the plurality of second hard disk modules through the first bus, and the second communication of the second storage node The interface is connected to the plurality of first hard disk modules and the plurality of second hard disk modules through a second bus.
  • a first hard disk module and a second hard disk module are connected by communication, for example, connected by a bus.
  • the specific communication connection mode is not limited in the embodiment of the present invention.
  • Such a first hard disk module and a second hard disk module form a cascade relationship.
  • In this case, the second communication interface of the first storage node does not need to be connected to each second hard disk module, and the second communication interface of the second storage node does not need to be connected to each first hard disk module, which reduces the number of connections on a storage node's second communication interface. The first storage node can access a second hard disk module through its own second communication interface, a first hard disk module, and the cascade relationship between the hard disk modules; that is, it communicates with the interface module of the second hard disk module.
  • the first storage node and the second storage node are storage nodes that have a mutual backup relationship. The first storage node and the second storage node can mutually determine whether each other is faulty, for example, through a heartbeat to determine whether a fault occurs.
  • the storage node may scan the multiple hard disk modules corresponding to the other storage node to identify the hard disk modules in a normal state corresponding to the other storage node, and establish communication connections with those hard disk modules in the normal state.
  • the corresponding relationship between the node identifier of the storage node and the module identifier of the hard disk module may be stored in the storage node in the storage system.
  • the node identifier of the storage node may be the address of the storage node, for example, may be an Internet Protocol (IP) address or a Media Access Control Address (MAC) address.
  • the module identifier of the hard disk module may be the number of the hard disk module in the storage system, the identifier of the interface module, the address of the interface module, and so on. The embodiment of the present invention does not limit this.
  • After taking over, the storage node updates, in the correspondence between node identifiers and module identifiers, the node identifier corresponding to the module identifier of each of the multiple hard disk modules to its own node identifier, and sends an update request to the other storage nodes of the storage system.
  • the update request includes the node identifier of the storage node and the module identifiers of the multiple hard disk modules.
  • Any other storage node of the storage system that receives the update request updates, in its correspondence between node IDs and module IDs, the node ID corresponding to the module ID of each of the plurality of hard disk modules to the node ID of the storage node that sent the request.
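  • A hypothetical Python sketch of this takeover update; the mapping layout, the node and module names, and the send_update helper are assumptions made for the example, not the patent's interfaces.

      module_owner = {"hdm-1": "node-A", "hdm-2": "node-A", "hdm-3": "node-B"}

      def send_update(peer: str, node_id: str, module_ids: list) -> None:
          # stand-in for the update request sent to another storage node
          print(f"-> {peer}: modules {module_ids} now owned by {node_id}")

      def take_over(surviving: str, failed: str, peers: list) -> None:
          taken = [m for m, owner in module_owner.items() if owner == failed]
          for module_id in taken:
              module_owner[module_id] = surviving       # update the local mapping
          for peer in peers:                            # propagate to all other nodes
              send_update(peer, surviving, taken)

      take_over("node-A", "node-B", peers=["node-C"])   # node-A takes over node-B's modules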
  • the first storage node can determine whether the second storage node has a failure, and the second storage node can also determine whether the first storage node has a failure.
  • the first storage node and the second storage node can determine whether each other is faulty in the following two ways:
  • the first storage node periodically sends heartbeat information to the second storage node, and the second storage node also periodically sends heartbeat information to the first storage node. If the first storage node does not receive the heartbeat information sent again by the second storage node within the first time period after receiving the heartbeat information sent by the second storage node, it is determined that the second storage node is faulty. Similarly, if the second storage node does not receive the heartbeat information sent again by the first storage node within the first time period after receiving the heartbeat information sent by the first storage node, it is determined that the first storage node is faulty.
  • In the second way, when the first storage node detects its own failure, it can send interruption information to the second storage node, and the second storage node determines from the received interruption information that the first storage node has failed; in the same way, when the second storage node detects its own failure, it can send interruption information to the first storage node, and the first storage node determines from the received interruption information that the second storage node has failed.
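  • A minimal sketch of the first detection scheme, assuming heartbeats are delivered to on_heartbeat and the timeout equals the first time period; the class and method names are illustrative, not from the patent.

      import time

      class HeartbeatMonitor:
          """Declare a peer failed if no heartbeat arrives within `timeout` seconds."""

          def __init__(self, timeout: float):
              self.timeout = timeout
              self.last_seen = {}

          def on_heartbeat(self, peer: str) -> None:
              self.last_seen[peer] = time.monotonic()   # record the latest heartbeat

          def failed_peers(self) -> list:
              now = time.monotonic()
              return [p for p, t in self.last_seen.items() if now - t > self.timeout]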
  • the storage system may further include a client, and the client may communicate with a storage node in the storage system.
  • the client can store the data to be stored in the storage system, and can also read the data to be read from the storage system.
  • For the detailed implementation process of storing the data to be stored in the storage system, refer to the embodiment shown in FIG. 6 or FIG. 7; for the detailed implementation process of reading the data to be read from the storage system, refer to the embodiment shown in FIG. 9. Details are not described here.
  • an embodiment of the present application provides a data storage method in a storage system.
  • the storage system may be the storage system shown in FIGS. 1 to 5 above.
  • A storage node in the storage system receives the data to be stored sent by a client, divides the data to be stored into M data units, generates N check units for the M data units, and saves the K units in K hard disk modules.
  • the K units include the M data units and the N check units, and both M and N are positive integers.
  • the method includes:
  • Step 101 The client sends a storage request to the first storage node, where the storage request includes data to be stored.
  • the first storage node may determine the K hard disk modules for storing the K units according to the correspondence between the partition to which the K units belong and the hard disk modules.
  • the record of each hard disk module in the partition also includes the storage node to which the hard disk module belongs.
  • One of the storage nodes will be recorded in the partition as the primary storage node.
  • the embodiment of the present invention takes the first storage node as the primary storage node as an example.
  • the client determines the corresponding partition according to the data identifier of the data to be stored, and determines to send a storage request to the first storage node according to the information of the primary storage node in the partition.
  • the storage system in the embodiment of the present invention includes multiple partitions, and each hard disk module belongs to one or more partitions.
  • the length of the stripe in the partition is fixed.
  • the length of the stripe is the sum of the numbers of data units and check units, that is, M+N units.
  • usually one unit in a stripe is stored in one hard disk module, and M+N units require M+N hard disk modules. Therefore, one partition contains M+N hard disk modules.
  • the storage node stores the correspondence between partitions and hard disk modules.
  • the correspondence between the partition and the hard disk module may include the partition identifier and the module identifier of the hard disk module of the storage node.
  • the record of a hard disk module of the storage node includes the module identifier and the storage node to which the hard disk module belongs.
  • the embodiment of the present invention does not limit this.
  • the storage node divides the data to be stored into data units, calculates the check units of the data units, and determines the hard disk modules for storing the data units and check units according to the partition to which the data units belong.
  • the storage system may use a hash ring to divide partitions; the embodiment of the present invention does not limit the specific implementation.
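  • An illustrative sketch of the partition lookup, using a simple hash-modulo mapping rather than a full hash ring; the partition table contents (three partitions of M+N = 3 modules each) are invented for the example.

      import hashlib

      partition_table = {                      # partition -> its M + N hard disk modules
          0: ["hdm-1", "hdm-4", "hdm-7"],
          1: ["hdm-2", "hdm-5", "hdm-8"],
          2: ["hdm-3", "hdm-6", "hdm-9"],
      }

      def modules_for(data_id: str) -> list:
          """Map a data identifier to the hard disk modules of its partition."""
          digest = hashlib.sha1(data_id.encode()).digest()
          partition = int.from_bytes(digest[:4], "big") % len(partition_table)
          return partition_table[partition]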
  • Step 102 The first storage node cuts the data to be stored into M data units, and generates N check units for the M data units, where M and N are both positive integers.
  • the first storage node generates N check units according to the M data units, and the N check units may be used to restore at least one data unit of the M data units.
  • Step 103 The first storage node stores the K units in K hard disk modules, where the K units include the M data units and the N check units, and each of the K hard disk modules stores one of the K units.
  • Each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
  • the first storage node may determine the K hard disk modules for storing the K units according to the correspondence between the partition to which the K units belong and the hard disk modules. At the same time, each storage node saves the relationship between each hard disk module and the storage node to which the hard disk module belongs. Therefore, after the first storage node determines the K hard disk modules for storing the K units, it sends a corresponding storage command to the storage nodes to which the non-local hard disk modules among the K hard disk modules belong: it sends a first storage command containing the K2 units to the second storage node, and, for the K3 third hard disk modules corresponding to the third storage node, a second storage command to the third storage node.
  • the K hard disk modules include K1 first hard disk modules corresponding to the first storage node, K2 second hard disk modules corresponding to the second storage node, and K3 third hard disk modules corresponding to the third storage node.
  • the first storage node saves the K1 units to the K1 first hard disk modules respectively, sends a first storage command containing the K2 units to the second storage node, and sends a second storage command containing the K3 units to the third storage node.
  • the second storage node receives the first storage command, and saves the K2 units included in the first storage command to the hard disks in the K2 second hard disk modules.
  • the third storage node receives the second storage command, and saves the K3 units included in the second storage command to the hard disks in the K3 third hard disk modules.
  • the first storage command includes the data identifier of the data to be stored
  • the second storage command includes the data identifier of the data to be stored.
  • the first storage node sends a storage response to the client after saving the K1 copy unit and sending the first storage command and the second storage command.
  • Alternatively, the first storage node may save the K1 units to the K1 first hard disk modules and the K2 units to the K2 second hard disk modules, respectively.
  • When a storage node saves a unit to a hard disk module, the storage node sends the unit to the interface module included in the hard disk module, and the interface module receives the unit and saves it to a hard disk included in the hard disk module.
  • the storage node may also compress the unit and save the compressed unit to the hard disk module.
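  • The patent does not specify a codec; as one hedged possibility, a unit could be compressed with zlib before being handed to the interface module:

      import zlib

      def compress_unit(unit: bytes) -> bytes:
          """Compress a unit before it is written to the hard disk module."""
          return zlib.compress(unit)

      def decompress_unit(payload: bytes) -> bytes:
          return zlib.decompress(payload)

      assert decompress_unit(compress_unit(b"unit payload")) == b"unit payload"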
  • After storing the K1 units in the hard disks of the K1 first hard disk modules, the first storage node also obtains the location information of the K1 units.
  • the location information of the unit includes the data type of the unit, the module identification of the hard disk module storing the unit, and the address information of the unit in the hard disk module.
  • the address information can include hard disk identification, starting storage address, data length, and so on.
  • When the unit is a data unit, the location information of the unit may also include the unit identifiers of the check units corresponding to the data unit; and/or, when the unit is a check unit, the location information of the unit may also include the unit identifiers of the at least one data unit corresponding to the check unit.
  • the first storage node correspondingly saves the data identifier of the data to be stored and the location information of the K1 units into the corresponding relationship between the data identifier and the location information.
  • After the second storage node saves the K2 units to the K2 second hard disk modules, it also obtains the location information of the K2 units, and saves the data identifier of the data to be stored and the location information of the K2 units into the correspondence between data identifiers and location information.
  • Similarly, after the third storage node saves the K3 units to the K3 third hard disk modules, it also obtains the location information of the K3 units, and saves the data identifier of the data to be stored and the location information of the K3 units into the correspondence between data identifiers and location information.
  • the storage nodes in the storage system also synchronize the correspondence between data identifiers and location information, so that each storage node stores the same correspondence between data identifiers and location information.
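  • One possible shape for the per-unit location information and the synchronized data-identifier correspondence described above; all field names here are assumptions made for illustration, not the patent's data layout.

      from dataclasses import dataclass, field

      @dataclass
      class UnitLocation:
          unit_type: str                     # "data" or "check"
          module_id: str                     # hard disk module storing the unit
          disk_id: str                       # hard disk inside the module
          start_address: int
          length: int
          peer_units: list = field(default_factory=list)  # IDs of related data/check units

      # data identifier -> list of UnitLocation, kept identical on every storage node
      location_index: dict = {}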
  • the storage nodes to which the K hard disk modules belonging to the same partition belong are storage nodes with a mutual backup relationship or an active backup relationship.
  • the first storage node receives the storage request and obtains a first number, where the first number is the number of normal hard disk modules included in the storage system, and M and N are obtained according to the first number.
  • the operation of the first storage node to obtain the first number may be: the first storage node obtains its corresponding number of hard disk modules in a normal state, and sends a query command to other storage nodes in a normal state in the storage system.
  • Any storage node that receives the query command obtains its corresponding number of hard disk modules in a normal state, and sends that number to the first storage node.
  • the first storage node receives the number of hard disk modules sent by each of the other storage nodes, and accumulates the number of hard disk modules obtained and the number of hard disk modules received to obtain the first number.
  • A communication connection is established between the first storage node and at least one hard disk module. For any hard disk module among the at least one hard disk module, when that hard disk module fails, the communication connection between the first storage node and that hard disk module is disconnected. Therefore, the first storage node can obtain its corresponding number of hard disk modules in a normal state as follows: the first storage node determines the hard disk modules with which it still has communication connections, and counts the determined hard disk modules to obtain the number of its corresponding hard disk modules in a normal state.
  • the storage node obtains its corresponding number of hard disk modules in a normal state in the same manner as the first storage node.
  • the operation of the first storage node to obtain M and N may be: the first storage node obtains the saved M and N; when M+N is less than the first number, it obtains N according to the first number and subtracts N from the first number to obtain M; when M+N is equal to or greater than the first number, it uses the saved M and N to perform the operation of step 102.
  • the first storage node compares the first number with the second number, and the second number is the number of normal hard disk modules included in the storage system acquired last time; when the two are different, N is obtained according to the first number, M is obtained by subtracting N from the first number; when the two are the same, the saved M and N obtained last time are obtained.
  • the first storage node may update the stored second number to the first number, and update the stored values of M and N to the values of M and N obtained in this step.
  • the first storage node may obtain N according to the first number starting from an initial value.
  • the initial value may be a preset value, for example, 2, 3, 4, or 5.
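  • A sketch of this adaptive stripe-width logic under stated assumptions: the saved M and N are kept while they still span the healthy modules, and otherwise N falls back to a preset initial value with M = first number - N. How N is actually derived from the first number is not spelled out in the text, so the fallback here is an assumption.

      def choose_stripe(first_number: int, saved_m: int, saved_n: int, initial_n: int = 2):
          """Re-derive (M, N) when more healthy hard disk modules become available."""
          if saved_m + saved_n >= first_number:
              return saved_m, saved_n        # saved stripe already spans enough modules
          n = initial_n                      # preset initial value for N (e.g. 2..5)
          return first_number - n, n         # M = first_number - N

      assert choose_stripe(first_number=12, saved_m=6, saved_n=2) == (10, 2)  # widen
      assert choose_stripe(first_number=8, saved_m=6, saved_n=2) == (6, 2)    # keep saved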
  • In the embodiment of the present application, storage is performed with the hard disk modules in the storage system as the granularity: the data is divided into M data units, the corresponding N check units are calculated from the data units, and M+N hard disk modules are selected to store the corresponding data units and check units respectively.
  • In the related art, by contrast, M+N storage nodes are selected to store the corresponding data units and check units respectively. Because the number of hard disk modules is greater than the number of storage nodes, the embodiment can make fuller use of the storage resources in the storage system.
  • In addition, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, thereby reducing the waste of computing resources.
  • storage resource management is performed at the granularity of hard disk modules, instead of storage resource management based on the granularity of storage nodes.
  • One of the implementations is that the interface module of the hard disk module establishes a storage resource process, and the storage node recognizes, according to the storage resource process, the granularity of the storage resources that can be used.
  • the storage resource process includes the module ID of the hard disk module.
  • an embodiment of the present application provides a data storage method in a storage system.
  • the storage system may be the storage system shown in FIGS. 1 to 5 above.
  • the client divides the data to be stored into M data units, generates N check units for the M data units, and sends K units to K hard disk modules in the storage system.
  • the K units include the M data units and the N check units, and the K units are respectively stored in the K hard disk modules; that is, each hard disk module in the K hard disk modules stores one of the K units, and both M and N are positive integers.
  • the method includes:
  • Step 201 The client cuts the data to be stored into M data units, and generates N check units for the M data units.
  • Step 202 The client sends the K units to K hard disk modules in the storage system, where the K units include the M data units and the N check units.
  • the client can determine the K hard disk modules for storing the K units according to the correspondence between the partition to which the K units belong and the hard disk modules.
  • the correspondence between the partition and the hard disk modules includes the information of the primary storage node; the first storage node is still taken as the primary storage node as an example.
  • the corresponding relationship between the partition and the hard disk module can refer to the previous description, which will not be repeated here.
  • the client sends the K units to the K hard disk modules in the storage system; specifically, this includes sending the K units to the first storage node.
  • After the first storage node determines the K hard disk modules for storing the K units, it sends a corresponding storage command to the storage nodes to which the non-local hard disk modules among the K hard disk modules belong: it sends a first storage command containing the K2 units to the second storage node, and, for the K3 third hard disk modules corresponding to the third storage node, a second storage command to the third storage node.
  • As described above, the client can determine the K hard disk modules for storing the K units according to the correspondence between the partition to which the K units belong and the hard disk modules.
  • the correspondence between the partition to which the K units belong and the hard disk modules also includes the information of the storage node to which each hard disk module belongs. Take as an example the K hard disk modules including K1 hard disk modules of the first storage node, K2 hard disk modules of the second storage node, and K3 hard disk modules of the third storage node.
  • the first storage node stores the K1 units in the K1 hard disk modules, sends the K2 units to be stored in the K2 hard disk modules to the second storage node, and sends the K3 units to be stored in the K3 hard disk modules to the third storage node.
  • the storage nodes to which the K hard disk modules belonging to the same partition belong are storage nodes with a mutual backup relationship or an active backup relationship.
  • storage resource management is performed at the granularity of hard disk modules, instead of storage resource management based on the granularity of storage nodes.
  • the interface module of the hard disk module establishes a storage resource process, and the client recognizes, according to the storage resource process, the granularity of the storage resources that can be used.
  • the storage resource process includes the module ID of the hard disk module. Further, the storage resource process also includes the identification of the storage node to which the hard disk module belongs.
  • the embodiment of the present application provides a method for reading data in a storage system.
  • the storage system may be the storage system shown in FIG. 1 to FIG. 5 above.
  • A storage node in the storage system receives a read request from the client; the read request contains the data identifier of the data to be read. The storage node obtains the data to be read according to the data identifier, and sends the data to be read to the client.
  • As described above, the first storage node may determine the K hard disk modules storing the K units according to the correspondence between the partition to which the K units belong and the hard disk modules.
  • the record of each hard disk module in the partition also includes the storage node to which the hard disk module belongs.
  • One of the storage nodes will be recorded in the partition as the primary storage node.
  • the embodiment of the present invention takes the first storage node as the primary storage node as an example.
  • the client determines the corresponding partition according to the data identifier of the data to be read, and determines to send a read request to the first storage node according to the information of the primary storage node in the partition.
  • the read request contains the data identifier of the data to be read.
  • the first storage node determines the hard disk module where the data to be read is located according to the correspondence between the partition to which the K units belong and the hard disk modules and according to the data identifier of the data to be read, and reads the data to be read from the determined hard disk module.
  • If the first storage node fails, the second storage node or the third storage node takes over the first storage node to execute the above read request, and the storage node that takes over the first storage node can directly access the K1 hard disk modules.
  • When the hard disk module storing the data to be read fails, or the hard disk in the hard disk module where the data to be read is located fails, or the data to be read stored in the hard disk module has errors and cannot be recovered locally, the data to be read would otherwise be lost.
  • In this case, the embodiment of the present invention can use the units stored in the other hard disk modules among the K hard disk modules to recover the data unit where the data to be read is located; that is, the check protection relationship formed by the M data units and the N check units provides the ability to restore up to N lost units among the M+N units.
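  • Continuing the earlier XOR illustration (N = 1): any single lost unit is the XOR of the surviving units of the stripe; Reed-Solomon codes generalize this to recovering up to N lost units out of M+N. This sketch is illustrative, not the patent's recovery algorithm.

      def ec_recover(surviving_units: list) -> bytes:
          """XOR-fold the surviving units of a single-parity stripe to rebuild the lost one."""
          missing = bytearray(len(surviving_units[0]))
          for unit in surviving_units:
              for i, b in enumerate(unit):
                  missing[i] ^= b
          return bytes(missing)

      stripe = [b"AAAA", b"BBBB", b"CCCC"]
      parity = ec_recover(stripe)            # XOR of all data units is also the parity
      assert ec_recover([stripe[1], stripe[2], parity]) == stripe[0]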
  • the client determines the corresponding partition according to the data identifier of the data to be read, and determines the hard disk module corresponding to the data identifier in the partition according to the partition.
  • the client determines, according to the correspondence between hard disk modules and storage nodes in the partition, the storage node to which the hard disk module storing the data to be read belongs, and sends a read request to that storage node.
  • After receiving the read request, the storage node reads the data to be read from the hard disk module according to the data identifier carried in the read request; that is, the storage node communicates with the interface module of the hard disk module.
  • For the M data units and N check units, reference may be made to the description of the above embodiments, which is not repeated here.
  • the client can communicate with the interface module of the hard disk module, that is, the client can directly access the hard disk module without passing through the storage node. That is, the client directly sends the K units to the corresponding K hard disk modules according to the corresponding relationship between the partitions and the hard disk modules, or reads data from the corresponding hard disk modules.
  • an embodiment of the present application provides a data storage device 300 in a storage system.
  • the device 300 may be deployed in a storage node or client in any of the foregoing embodiments, including:
  • the storage unit 302 is configured to store the K units in K hard disk modules in the storage system, where the K units include M data units and N check units, and each of the K hard disk modules stores one of the K units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
  • For the detailed operation of the storage unit 302 storing the units in the K hard disk modules, refer to the related content in step 103 of the embodiment shown in FIG. 6 or step 202 of the embodiment shown in FIG. 7.
  • the storage system includes multiple storage nodes, and each storage node communicates with interface modules of K hard disk modules.
  • the device 300 is a client of a storage system; the device 300 includes: a sending unit 303;
  • the sending unit 303 is configured to send the K units to the target storage node among the multiple storage nodes, so that the target storage node stores the K units in K hard disk modules in the storage system.
  • For the detailed operation of the target storage node storing the K units in the K hard disk modules in the storage system, refer to the related content in step 202 of the embodiment shown in FIG. 7.
  • the device 300 is one of the multiple storage nodes.
  • the interface module is a host bus adapter, a redundant array of independent disks (RAID) card, an expander card, or a network interface card.
  • the storage system includes a second device, and there is a mutual backup relationship or an active backup relationship between the second device and the device.
  • the generating unit generates N check units for M data units.
  • the storage unit stores the K units in the K hard disk modules in the storage system, so that the storage unit realizes storage with the hard disk modules in the storage system as the granularity; that is, K hard disk modules are used to store the corresponding K units separately.
  • In the related art, storage nodes are used as the granularity; because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be more fully utilized.
  • an embodiment of the present application provides a data reading device 400 in a storage system.
  • the device 400 may be deployed in a storage node or client in any of the above embodiments, and includes:
  • the receiving unit 401 is configured to receive a read request, where the read request includes a data identifier of the data to be read;
  • the processing unit 402 is configured to determine the hard disk module storing the data to be read from the K hard disk modules in the storage system according to the data identifier.
  • the storage system includes multiple storage nodes, and each storage node communicates with interface modules of K hard disk modules.
  • the device 400 is a client of a storage system; the device 400 further includes: a sending unit 403;
  • the sending unit 403 is configured to send a data read request to a target storage node among multiple storage nodes, where the read request carries the data identifier;
  • the processing unit 402 is configured to read the data to be read from the hard disk module storing the data to be read according to the data identifier.
  • the apparatus 400 is one of the multiple storage nodes.
  • the interface module is a host bus adapter, a redundant array of independent hard disks, an expander card, or a network interface card.
  • the storage system includes a second device, and there is a mutual backup relationship or an active-standby relationship between the second device and the apparatus 400.
  • the receiving unit receives a read request, and the read request includes a data identifier of the data to be read.
  • an embodiment of the present application provides a schematic diagram of a data storage device 500 in a storage system.
  • the apparatus 500 may be a client or a storage node in any of the above embodiments.
  • the device 500 includes at least one processor 501, a bus system 502, and at least one communication interface 503.
  • the storage system includes a plurality of hard disk modules, and the at least one processor 501 also communicates with a plurality of hard disk modules in the storage system through the at least one communication interface 503.
  • the processor 501 may be a central processing unit (CPU), or may be replaced by a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or other hardware; alternatively, an FPGA or other hardware may work together with the CPU as the processor 501.
  • the device 500 is a hardware-structured device and can be used to implement the functional modules in the device 300 described in FIG. 8.
  • the generating unit 301 and the storage unit 302 in the device 300 shown in FIG. 8 may be implemented by the one or more CPUs calling code in the memory.
  • the sending unit 303 in the device 300 shown in FIG. 8 may be implemented through the communication interface 503.
  • the aforementioned communication interface 503 is used to communicate with other devices or communication networks.
  • the above-mentioned memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory is used to store the application program code for executing the solutions of the present application, and the processor 501 controls the execution.
  • the processor 501 is configured to execute the application program code stored in the memory, so as to implement the functions in the methods of this patent.
  • an embodiment of the present application provides a schematic diagram of a data reading device 600 in a storage system.
  • the apparatus 600 may be a storage node or a client in any of the foregoing embodiments.
  • the device 600 includes at least one processor 601, a bus system 602, and at least one communication interface 603.
  • the processor 601 may be a central processing unit (CPU), or may be replaced by a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or other hardware; alternatively, an FPGA or other hardware may work together with the CPU as the processor 601.
  • the device 600 is a hardware-structured device and can be used to implement the functional modules in the device 400 described in FIG. 9.
  • the processing unit 402 in the apparatus 400 shown in FIG. 9 may be implemented by the one or more CPUs calling code in the memory.
  • the receiving unit 401 and the sending unit 403 in the device 400 shown in FIG. 9 can be implemented through the communication interface 603.
  • the above-mentioned bus system 602 may include a path for transferring information between the above-mentioned components.
  • the aforementioned communication interface 603 is used to communicate with other devices or a communication network.
  • the above-mentioned memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory is used to store the application program code for executing the solutions of the present application, and the processor 601 controls the execution.
  • the processor 601 is configured to execute the application program code stored in the memory, so as to implement the functions in the methods of this patent.
  • M in the M data units is 1, and the N check units are replicas of the data unit; that is, protection of the data unit is implemented based on multiple replicas, and the data unit is restored based on the multiple replicas.
  • the storage system is a storage array, and the storage node is an array controller of the storage array.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage method, a data reading method, an apparatus, and a system in a storage system, belonging to the field of information technology. The method includes: a first device generates N check units for M data units, where M and N are each positive integers and M+N=K; the first device stores the K units in K hard disk modules in the storage system, where the K units include the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk. This method can make full use of the storage resources in the storage system.

Description

Data storage method, data reading method, apparatus, and system in a storage system
This application claims priority to Chinese Patent Application No. 202010018706.6, filed on January 8, 2020 and entitled "Data storage method in a storage system and storage system", and to Chinese Patent Application No. 202010096222.3, filed on February 17, 2020 and entitled "Data storage method, data reading method, apparatus, and system in a storage system", both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of information technology, and in particular to a data storage method, a data reading method, an apparatus, and a system in a storage system.
Background
A distributed storage system may include multiple storage nodes. A storage node is a storage server, and each storage server contains storage resources, for example multiple hard disks. The distributed storage system organizes the storage resources of the storage nodes to provide a storage service.
When a client stores data in the distributed storage system, the data is usually divided into M data units, and N check units of the data units are obtained based on an erasure coding (EC) algorithm. The client stores the M data units and the N check units in M+N storage nodes; that is, each of the M+N storage nodes stores one corresponding unit (a data unit or a check unit).
Although hard disk capacity keeps increasing and a storage node can mount more hard disks, when the client stores data, the storage node is still used as the storage granularity of the EC algorithm, so the storage resources of the hard disks cannot be fully utilized.
Summary
This application provides a data storage method, a data reading method, an apparatus, and a system in a storage system, so as to make full use of the storage resources in the storage system. The technical solutions are as follows:
According to a first aspect, an embodiment of this application provides a data storage method in a storage system. The method includes: a first device generates N check units for M data units, where M and N are each positive integers and M+N=K; the first device stores the K units in K hard disk modules in the storage system, where the K units include the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
In this way, storage is performed with the hard disk modules in the storage system as the granularity; that is, K hard disk modules are used to separately store the corresponding K units. Compared with the prior art, in which storage nodes are used as the granularity, because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be fully utilized.
In a possible implementation, the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules. Because each storage node communicates with the interface modules of multiple hard disk modules, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, reducing the waste of computing resources.
In another possible implementation, the first device is a client of the storage system. The client sends the K units to a target storage node among the multiple storage nodes, and the target storage node stores the K units in the K hard disk modules in the storage system. Storage is thus performed with the hard disk modules in the storage system as the granularity, so the storage resources in the storage system can be fully utilized.
In another possible implementation, the first device is one of the multiple storage nodes.
In another possible implementation, the interface module is a host bus adapter, a redundant array of independent disks (RAID) card, an expander card, or a network interface card.
In another possible implementation, the storage system includes a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the first device. In this way, when the first device is faulty, the second device can take over the hard disk modules corresponding to the first device, so the data stored in those hard disk modules does not need to be restored, improving the reliability of the storage system.
According to a second aspect, an embodiment of this application provides a data reading method in a storage system. In the method, a first device receives a read request, where the read request includes a data identifier of the data to be read. The first device determines, according to the data identifier, the hard disk module storing the data to be read from among K hard disk modules in the storage system, and reads the data to be read from that hard disk module. The data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules stores one of the K units; the K units include the M data units and the N check units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk. Because each of the K hard disk modules stores one of the K units, storage is performed with the hard disk modules in the storage system as the granularity. Compared with the prior art, in which storage nodes are used as the granularity, because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be fully utilized.
In a possible implementation, the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules. Because each storage node communicates with multiple hard disk modules, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, reducing the waste of computing resources.
In another possible implementation, the first device is a client of the storage system. The client sends a data read request, carrying the data identifier, to a target storage node among the multiple storage nodes. The target storage node reads, according to the data identifier, the data to be read from the hard disk module storing it. Because the target storage node can read the data in this way, data can be stored with the hard disk modules in the storage system as the granularity.
In another possible implementation, the first device is one of the multiple storage nodes.
In another possible implementation, the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
In another possible implementation, the storage system includes a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the first device. In this way, when the first device is faulty, the second device can take over the hard disk modules corresponding to the first device, so the data stored in those hard disk modules does not need to be restored, improving the reliability of the storage system.
According to a third aspect, this application provides a data storage apparatus in a storage system, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus includes units for performing that method.
According to a fourth aspect, this application provides a data reading apparatus in a storage system, configured to perform the method in the second aspect or any possible implementation of the second aspect. Specifically, the apparatus includes units for performing that method.
According to a fifth aspect, this application provides a data storage apparatus in a storage system. The apparatus includes a processor and a communication interface, where the processor communicates with the communication interface, and the processor and the communication interface are respectively configured to perform the corresponding steps of the method in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, this application provides a data reading apparatus in a storage system. The apparatus includes a processor and a communication interface, where the processor communicates with the communication interface, and the processor and the communication interface are respectively configured to perform the corresponding steps of the method in the second aspect or any possible implementation of the second aspect.
According to a seventh aspect, this application provides a computer-readable storage medium storing program code which, when run on a computer, causes the computer to perform the method in the first aspect, the second aspect, any possible implementation of the first aspect, or any possible implementation of the second aspect.
According to an eighth aspect, this application provides a computer program product containing program code which, when run on a computer, causes the computer to perform the method in the first aspect, the second aspect, any possible implementation of the first aspect, or any possible implementation of the second aspect.
According to a ninth aspect, this application provides a storage system. The storage system includes a storage device and K hard disk modules, where the storage device is configured to perform the method in the first aspect, any possible implementation of the first aspect, the second aspect, or any possible implementation of the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a storage system according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of another storage system according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of another storage system according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another storage system according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of another storage system according to an embodiment of this application;
FIG. 6 is a flowchart of a data storage method in a storage system according to an embodiment of this application;
FIG. 7 is a flowchart of another data storage method in a storage system according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a data storage apparatus in a storage system according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data reading apparatus in a storage system according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of another data storage apparatus in a storage system according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another data reading apparatus in a storage system according to an embodiment of this application.
Detailed Description
The following further describes the implementations of this application in detail with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of this application provides a storage system. The storage system includes multiple storage nodes and, for each storage node, multiple corresponding hard disk modules; a storage node accesses the hard disks in the multiple hard disk modules. A hard disk module contains an interface module and multiple hard disks. A hard disk may be a mechanical disk, a solid state drive (SSD), or the like. The interface module may be a host bus adapter (HBA), a redundant array of independent disks (RAID) card, an expander card (Expander), a network interface controller (NIC), or the like, which is not limited in this embodiment of the present invention. The interface module in a hard disk module communicates with the hard disks. A storage node communicates with the interface module of a hard disk module, thereby accessing the hard disks in the hard disk module.
In implementation, the interface module may exist in the form of a card; that is, the interface module may be an interface card.
The interface of a hard disk may be Serial Attached Small Computer System Interface (SAS), Serial Advanced Technology Attachment (SATA), Peripheral Component Interconnect Express (PCIe), or the like.
A storage node and a hard disk module may communicate over a bus, for example a PCIe bus, or over a network such as Ethernet; this is not limited in this embodiment of the present invention.
In this embodiment of the present invention, the storage nodes communicate with one another. A mutual backup relationship exists between one storage node and one or more other storage nodes in the storage system. The mutual backup relationship means that the one storage node can access the multiple hard disk modules corresponding to the other storage node; that is, a storage node communicates with the hard disk modules of another storage node with which it has a mutual backup relationship, in other words, with the interface modules of that storage node's hard disk modules.
It should be noted that, when the storage nodes in a mutual backup relationship are all in a normal state, each storage node establishes communication connections only with its own corresponding hard disk modules; that is, each storage node directly performs read/write access only to its own hard disk modules. Only when a storage node in the mutual backup relationship is faulty do the other storage nodes take over the faulty storage node and access its hard disk modules.
Optionally, an active-standby relationship may also exist between one storage node and one or more other storage nodes in the storage system; that is, the one storage node is the active storage node, and the other storage node or nodes are standby storage nodes that back up the active storage node. In this way, when the active storage node is faulty, a standby storage node can take over the active storage node.
For example, referring to FIG. 2, assume that a first storage node and a second storage node are any two storage nodes in the storage system that have a mutual backup or active-standby relationship, and that the storage nodes and the hard disk modules communicate over buses. The first storage node corresponds to multiple first hard disk modules, and the second storage node corresponds to multiple second hard disk modules; the first storage node is connected to the multiple first hard disk modules and the multiple second hard disk modules through a first bus, and the second storage node is connected to the multiple first hard disk modules and the multiple second hard disk modules through a second bus.
In one implementation, the first storage node and the second storage node are in a mutual backup relationship. When both are in a normal state, the first storage node establishes communication connections with the multiple first hard disk modules on the first bus and directly performs read/write access only to the multiple first hard disk modules; likewise, the second storage node establishes communication connections with the multiple second hard disk modules on the second bus and directly performs read/write access only to the multiple second hard disk modules. If the second storage node becomes faulty, the first storage node takes over the second storage node and accesses its multiple second hard disk modules; at this point the first storage node can directly perform read/write access to both its own multiple first hard disk modules and the second storage node's multiple second hard disk modules.
Referring to FIG. 3, the first storage node includes a first communication interface, a processing unit, and a second communication interface, with the processing unit connected to the first and second communication interfaces. The first storage node establishes communication connections with the other storage nodes in the storage system through the first communication interface, and the second communication interface of the first storage node is connected, through the first bus, to the multiple first hard disk modules corresponding to the first storage node and the multiple second hard disk modules corresponding to the second storage node.
When the first storage node is in a normal state, its second communication interface has established communication connections with the multiple first hard disk modules on the first bus. The processing unit of the first storage node can send data to the other storage nodes in the storage system through the first communication interface, or receive data or access requests sent by the other storage nodes; alternatively, the processing unit can send data to a client, or receive access requests sent by the client, through the first communication interface. The processing unit of the first storage node performs read/write access to the multiple first hard disk modules through the second communication interface; that is, the processing unit communicates with the interface modules of the multiple first hard disk modules through the second communication interface. When the second storage node is faulty, the second communication interface of the first storage node establishes communication connections with the multiple second hard disk modules on the first bus, and the processing unit of the first storage node can then also perform read/write access to the multiple second hard disk modules through the second communication interface.
Optionally, the first communication interface and the processing unit of the first storage node are two separate modules and may be connected by a high-speed bus, which may be PCIe, Intel QuickPath Interconnect (QPI), or the like; alternatively, the first communication interface and the processing unit may be integrated together.
Optionally, the second communication interface and the processing unit of the first storage node are likewise two separate modules that may be connected by a high-speed bus, or the second communication interface and the processing unit may be integrated together.
Optionally, the first communication interface of the first storage node may be a network interface card or the like; for example, referring to FIG. 4, the first communication interface may be a 10G network interface card. The processing unit of the first storage node may be a central processing unit (CPU) or a processing unit composed of one or more chips, for example a data compression card, an artificial intelligence (AI) inference card, an image processing card, or a video capture card. The second communication interface of the first storage node may be a PCIe switch chip, a SAS switch chip, or the like.
Optionally, the storage nodes and the hard disk modules may be connected through a shared-link board, which includes a bus composed of one or more physical lines.
Optionally, referring to FIG. 3, the second communication interface of the first storage node is connected to the multiple first hard disk modules and the multiple second hard disk modules through the first bus, and the second communication interface of the second storage node is connected to the multiple first hard disk modules and the multiple second hard disk modules through the second bus.
Optionally, referring to FIG. 5, a first hard disk module and a second hard disk module are connected by a communication connection, for example a bus; the specific connection is not limited in this embodiment of the present invention. The first hard disk module and the second hard disk module thus form a cascade. In this way, the second communication interface of the first storage node need not be connected to every second hard disk module, and the second communication interface of the second storage node need not be connected to every first hard disk module, reducing the number of connections on each storage node's second communication interface. For a second hard disk module cascaded from such a first hard disk module, when the second storage node is faulty, the first storage node can access the second hard disk module (that is, communicate with its interface module) through the first storage node's second communication interface, the first hard disk module, and the cascade relationship. For example, the first storage node and the second storage node are storage nodes in a mutual backup relationship, and they can determine whether the other is faulty, for example by means of heartbeats.
Optionally, a storage node can scan the multiple hard disk modules corresponding to another storage node, obtain the hard disk modules of that storage node that are in a normal state, and establish communication connections with those hard disk modules.
Optionally, the storage nodes in the storage system may store a correspondence between the node identifiers of storage nodes and the module identifiers of hard disk modules.
Optionally, the node identifier of a storage node may be the address of the storage node, for example an Internet Protocol (IP) address or a Media Access Control (MAC) address. The module identifier of a hard disk module may be the number of the hard disk module in the storage system, the identifier of its interface module, the address of its interface module, or the like; this is not limited in this embodiment of the present invention.
When a storage node takes over the multiple hard disk modules corresponding to another storage node, the storage node updates, in the node identifier-module identifier correspondence, the node identifier corresponding to the module identifier of each of those hard disk modules to its own node identifier, and sends an update request, including its node identifier and the module identifiers of those hard disk modules, to the other storage nodes in the storage system.
Any other storage node in the storage system that receives the update request updates, in its node identifier-module identifier correspondence, the node identifier corresponding to the module identifier of each of those hard disk modules to the node identifier of the storage node that took over.
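To make this bookkeeping concrete, the following minimal sketch (Python) shows a surviving node repointing the module identifier-to-node identifier correspondence and notifying its peers; the names take_over_modules and send_update, and the dictionary layout, are hypothetical, since the embodiments do not prescribe any implementation:

```python
from typing import Callable

def take_over_modules(mapping: dict[str, str],
                      module_ids: list[str],
                      surviving_node: str,
                      peer_nodes: list[str],
                      send_update: Callable[[str, str, list[str]], None]) -> None:
    """Repoint each taken-over module at the surviving node, then ask
    every other storage node to update its copy of the correspondence."""
    for module_id in module_ids:
        mapping[module_id] = surviving_node            # module ID -> node ID
    for peer in peer_nodes:
        send_update(peer, surviving_node, module_ids)  # the update request
```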
For example, for a first storage node and a second storage node in a mutual backup relationship, the first storage node can determine whether the second storage node is faulty, and the second storage node can likewise determine whether the first storage node is faulty. They can do so in the following two ways:
In the first way, the first storage node periodically sends heartbeat information to the second storage node, and the second storage node periodically sends heartbeat information to the first storage node. If, within a first time length after receiving heartbeat information from the second storage node, the first storage node does not receive further heartbeat information from the second storage node, it determines that the second storage node is faulty. Likewise, if, within the first time length after receiving heartbeat information from the first storage node, the second storage node does not receive further heartbeat information from the first storage node, it determines that the first storage node is faulty.
In the second way, when the first storage node detects a fault in itself, it can send interrupt information to the second storage node, which receives the interrupt information and determines that the first storage node is faulty; likewise, when the second storage node detects a fault in itself, it can send interrupt information to the first storage node, which receives the interrupt information and determines that the second storage node is faulty.
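Both detection modes reduce to simple timer bookkeeping. A minimal sketch follows (Python; HeartbeatMonitor and first_time_length are hypothetical names, and the embodiments do not fix any concrete implementation):

```python
import time

class HeartbeatMonitor:
    """Tracks peer liveness per the two modes above:
    (1) a heartbeat timeout, and (2) an explicit interrupt message."""

    def __init__(self, first_time_length: float):
        self.first_time_length = first_time_length   # seconds without a heartbeat
        self.last_heartbeat: dict[str, float] = {}
        self.faulty: set[str] = set()

    def on_heartbeat(self, peer_id: str) -> None:
        # Mode 1: record the arrival time of the peer's periodic heartbeat.
        self.last_heartbeat[peer_id] = time.monotonic()
        self.faulty.discard(peer_id)

    def on_interrupt(self, peer_id: str) -> None:
        # Mode 2: the peer detected its own fault and sent interrupt information.
        self.faulty.add(peer_id)

    def check(self) -> set[str]:
        # Declare a peer faulty if no heartbeat arrived within first_time_length.
        now = time.monotonic()
        for peer, seen in self.last_heartbeat.items():
            if now - seen > self.first_time_length:
                self.faulty.add(peer)
        return self.faulty
```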
Optionally, the storage system may further include a client, which can communicate with the storage nodes in the storage system. The client can store data to be stored into the storage system, and can also read data to be read from the storage system. For the detailed process of storing data, refer to the embodiment shown in FIG. 6 or FIG. 7; for the detailed process of reading data, refer to the embodiment shown in FIG. 9; these are not described in detail here.
Referring to FIG. 6, an embodiment of this application provides a data storage method in a storage system, which may be the storage system shown in any of FIG. 1 to FIG. 5. In the method, a storage node in the storage system receives the data to be stored sent by a client, divides it into M data units, generates N check units for the M data units, and stores the K units in K hard disk modules, where the K units include the M data units and the N check units, and M and N are each positive integers. The method includes the following steps.
Step 101: The client sends a storage request to a first storage node, where the storage request includes the data to be stored.
In this embodiment of the present invention, the first storage node can determine the K hard disk modules that store the K units according to the correspondence between the partition to which the K units belong and hard disk modules. In addition, the hard disk modules recorded in a partition also include the storage node to which each hard disk module belongs, and one of the storage nodes is recorded in the partition as the primary storage node; this embodiment takes the first storage node as the primary storage node as an example. The client determines the corresponding partition according to the data identifier of the data to be stored and determines, from the primary storage node information in the partition, to send the storage request to the first storage node.
Optionally, the storage system in this embodiment of the present invention contains multiple partitions, and each hard disk module belongs to one or more partitions. The length of a stripe within a partition is determined according to the EC algorithm; the stripe length is the sum of the lengths of the M data units and the N check units, namely M+N. According to the EC algorithm, one unit of a stripe is usually stored in one hard disk module, so M+N units require M+N hard disk modules; therefore, a partition contains M+N hard disk modules. A storage node stores the correspondence between partitions and hard disk modules. Specifically, the correspondence may contain a partition identifier and the module identifiers of the storage nodes' hard disk modules, where such an entry contains the module identifier and the identifier of the storage node to which the hard disk module belongs; this is not limited in this embodiment of the present invention. A storage node divides the data to be stored into data units, computes the check units of the data units, and determines, according to the partition to which the data units belong, the hard disk modules that store the data units and check units. The storage system usually divides partitions by means of a hash ring; the specific implementation is not limited in the present invention.
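A minimal sketch of mapping a data identifier to its partition and then to the partition's M+N hard disk modules follows (Python; the SHA-256 hash, the table layout, and all identifiers are illustrative assumptions, not the prescribed scheme):

```python
import hashlib

# partition -> the M + N disk modules of its stripe, as (module ID, owning node ID)
partition_table: dict[int, list[tuple[str, str]]] = {
    0: [("dm-0", "node-a"), ("dm-7", "node-b"), ("dm-12", "node-c")],  # K = M + N = 3
    1: [("dm-3", "node-a"), ("dm-9", "node-b"), ("dm-15", "node-c")],
}

def partition_of(data_id: str) -> int:
    # Hash the data identifier onto the ring; here simply modulo the partition count.
    digest = hashlib.sha256(data_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % len(partition_table)

def modules_for(data_id: str) -> list[tuple[str, str]]:
    # The K hard disk modules that will each hold one unit of the stripe.
    return partition_table[partition_of(data_id)]
```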
Step 102: The first storage node divides the data to be stored into M data units and generates N check units for the M data units, where M and N are each positive integers.
The first storage node generates the N check units from the M data units; the N check units can be used to restore at least one data unit among the M data units.
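For the simplest case N = 1, the check unit can be a plain XOR of the M data units; general N would use an erasure code such as Reed-Solomon. A minimal, purely illustrative sketch (Python):

```python
def xor_check_unit(data_units: list[bytes]) -> bytes:
    """One check unit (N = 1) computed as the XOR of M equal-length data units."""
    assert data_units and len({len(u) for u in data_units}) == 1
    check = bytearray(len(data_units[0]))
    for unit in data_units:
        for i, b in enumerate(unit):
            check[i] ^= b
    return bytes(check)

data_units = [b"unit-one", b"unit-two"]              # M = 2
stripe = data_units + [xor_check_unit(data_units)]   # K = M + N = 3 units, one per module
```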
Step 103: The first storage node stores the K units in K hard disk modules, where the K units include the M data units and the N check units, and each of the K hard disk modules stores one of the K units.
Each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
In this embodiment of the present invention, the first storage node can determine the K hard disk modules storing the K units according to the correspondence between the partition to which the K units belong and hard disk modules. Meanwhile, each storage node stores the relationship between a hard disk module and the storage node to which it belongs. Therefore, after determining the K hard disk modules storing the K units, the first storage node sends corresponding storage commands to the storage nodes to which the non-local hard disk modules among the K belong; that is, the first storage node sends a first storage command, including K2 units, to the second storage node, and sends a second storage command to the third storage node, which corresponds to K3 third hard disk modules.
In one implementation, the K hard disk modules include K1 first hard disk modules corresponding to the first storage node, K2 second hard disk modules corresponding to the second storage node, and K3 third hard disk modules corresponding to the third storage node, where K1, K2, and K3 are each positive integers and K1+K2+K3=K.
In this step, the first storage node stores K1 units in the K1 first hard disk modules, sends a first storage command including K2 units to the second storage node, and sends a second storage command including K3 units to the third storage node.
The second storage node receives the first storage command and stores the K2 units it includes in the hard disks of the K2 second hard disk modules. The third storage node receives the second storage command and stores the K3 units it includes in the hard disks of the K3 third hard disk modules.
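The fan-out of the K units from the primary storage node can be pictured as in the sketch below (Python; placement, store_local, and send_command are hypothetical names for illustration only):

```python
from typing import Callable

def fan_out_units(units: list[bytes],
                  placement: dict[str, list[int]],
                  local_node: str,
                  store_local: Callable[[list[bytes]], None],
                  send_command: Callable[[str, list[bytes]], None]) -> None:
    """Distribute the K units of a stripe: units placed on the primary
    node's own modules are written locally; the rest travel in the
    first / second storage commands to their owning nodes."""
    for node_id, unit_indexes in placement.items():
        batch = [units[i] for i in unit_indexes]
        if node_id == local_node:
            store_local(batch)            # the K1 units on local disk modules
        else:
            send_command(node_id, batch)  # the K2 / K3 units for peer nodes
```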
Optionally, the first storage command includes the data identifier of the data to be stored, and the second storage command includes the data identifier of the data to be stored.
Optionally, after storing the K1 units and sending the first storage command and the second storage command, the first storage node sends a storage response to the client.
Optionally, when the second storage node is faulty, the first storage node stores the K1 units in the K1 first hard disk modules and also stores the K2 units in the K2 second hard disk modules.
Optionally, when a storage node stores a unit into a hard disk module, the storage node sends the unit to the interface module of the hard disk module, and the interface module receives the unit and stores it in a hard disk of the hard disk module.
Optionally, before storing a unit into a hard disk module, the storage node may also compress the unit and store the compressed unit into the hard disk module.
Optionally, after storing the K1 units in the hard disks of the K1 first hard disk modules, the first storage node also obtains the location information of the K1 units. For any one of the K1 units, the location information of the unit includes the data type of the unit, the module identifier of the hard disk module storing the unit, and the address information of the unit in that hard disk module. The address information may contain a hard disk identifier, a start storage address, a data length, and the like. When the unit is a data unit, its data type is data unit; when the unit is a check unit, its data type is check unit. When the unit is a data unit, its location information may further include the unit identifiers of the check units corresponding to the data unit; and/or, when the unit is a check unit, its location information may further include the unit identifiers of at least one data unit corresponding to the check unit. The first storage node stores the data identifier of the data to be stored and the location information of the K1 units correspondingly in a data identifier-location information correspondence.
Likewise, after storing the K2 units in the K2 second hard disk modules, the second storage node obtains the location information of the K2 units and stores the data identifier of the data to be stored and the location information of the K2 units correspondingly in the data identifier-location information correspondence; and, after storing the K3 units in the K3 third hard disk modules, the third storage node obtains the location information of the K3 units and stores the data identifier of the data to be stored and the location information of the K3 units correspondingly in the data identifier-location information correspondence.
The storage nodes in the storage system also synchronize the data identifier-location information correspondence with one another, so that all storage nodes hold the same correspondence between data identifiers and locations.
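The location information and its index might look like the following sketch (Python; the field names are illustrative, chosen only to mirror the items listed above):

```python
from dataclasses import dataclass, field

@dataclass
class UnitLocation:
    unit_type: str          # "data" or "check"
    module_id: str          # hard disk module storing the unit
    disk_id: str            # address info: hard disk identifier...
    offset: int             # ...start storage address...
    length: int             # ...and data length
    related_units: list[str] = field(default_factory=list)  # peer unit identifiers

# data identifier -> location info of each stored unit; synchronized across nodes
location_index: dict[str, list[UnitLocation]] = {}
```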
In this embodiment of the present invention, the storage nodes to which the K hard disk modules belonging to the same partition belong are storage nodes having a mutual backup or active-standby relationship.
Optionally, before performing step 102, the first storage node receives the storage request and obtains a first number, which is the number of hard disk modules in a normal state included in the storage system, and obtains M and N according to the first number.
The first storage node may obtain the first number as follows: the first storage node obtains the number of its own corresponding hard disk modules in a normal state and sends a query command to each other storage node in a normal state in the storage system. Any other storage node in the storage system receives the query command, obtains the number of its own corresponding hard disk modules in a normal state, and sends that number to the first storage node. The first storage node receives the numbers sent by the other storage nodes and adds the number it obtained and the numbers it received to obtain the first number.
A communication connection is established between the first storage node and at least one hard disk module; for any one of these hard disk modules, when that hard disk module becomes faulty, the communication connection between the first storage node and it is broken. Therefore, the first storage node can obtain the number of its corresponding hard disk modules in a normal state by determining the hard disk modules with which it has communication connections and counting them.
Likewise, any other storage node obtains the number of its own corresponding hard disk modules in a normal state in the same way as the first storage node.
Optionally, the first storage node may obtain M and N as follows: the first storage node obtains the saved M and N; when M+N is less than the first number, it obtains N according to the first number and subtracts N from the first number to obtain M; when M+N is equal to or greater than the first number, it uses the saved M and N to perform the operation of step 102. Alternatively, the first storage node compares the first number with a second number, which is the previously obtained number of hard disk modules in a normal state in the storage system; if they differ, it obtains N according to the first number and subtracts N from the first number to obtain M; if they are the same, it uses the previously obtained and saved M and N.
Optionally, when the first number and the second number differ, the first storage node may update the saved second number to the first number, and update the saved values of M and N to the values of M and N obtained in this step.
Optionally, the first storage node may obtain N according to the first number as follows:
When the first number is less than a number threshold and N is greater than X, the first storage node sets N=N-X, where X is an integer greater than 0; when the first number is greater than or equal to the number threshold and N is less than an initial value, it sets N equal to the initial value; when the first number is greater than or equal to the number threshold and N is equal to the initial value, it uses the saved N.
The initial value may be a preset value; for example, the initial value may be 2, 3, 4, or 5.
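Putting the adjustment rules together, a sketch of deriving M and N from the first number follows (Python; threshold, x, and the initial value are deployment parameters shown with purely illustrative defaults):

```python
def get_m_n(first_number: int, saved_m: int, saved_n: int,
            threshold: int = 12, x: int = 1, initial_n: int = 2) -> tuple[int, int]:
    """Derive (M, N) from the first number, i.e. the count of
    normal-state hard disk modules in the storage system."""
    if saved_m + saved_n >= first_number:
        return saved_m, saved_n              # keep the saved M and N
    n = saved_n
    if first_number < threshold and n > x:
        n = n - x                            # shrink N when few modules remain
    elif first_number >= threshold and n < initial_n:
        n = initial_n                        # restore N to the initial value
    return first_number - n, n               # M = first number - N
```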
In this embodiment of this application, storage is performed with the hard disk modules in the storage system as the granularity: the data is divided into M data units, the corresponding N check units are computed from the data units, and M+N hard disk modules are selected to separately store the corresponding data units and check units. Compared with the prior art, which uses storage nodes as the granularity and selects M+N storage nodes to separately store the corresponding data units and check units, because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be fully utilized. Moreover, because each storage node corresponds to multiple hard disk modules, the computing resources of the storage nodes, in particular the computing power of their CPUs, can be fully used, reducing the waste of computing resources.
This embodiment of the present invention manages storage resources at the granularity of hard disk modules rather than storage nodes. In one implementation, the interface module of a hard disk module establishes a storage resource process, and a storage node identifies the usable storage resource granularity from the storage resource process. The storage resource process contains the module identifier of the hard disk module.
Referring to FIG. 7, an embodiment of this application provides a data storage method in a storage system, which may be the storage system shown in any of FIG. 1 to FIG. 5. In this method, the client divides the data to be stored into M data units, generates N check units for the M data units, and sends K units to K hard disk modules in the storage system, where the K units include the M data units and the N check units; the K units are stored in the K hard disk modules, that is, each of the K hard disk modules stores one of the K units, and M and N are each positive integers. The method includes the following steps.
Step 201: The client divides the data to be stored into M data units and generates N check units for the M data units.
Step 202: The client sends K units to K hard disk modules in the storage system, where the K units include the M data units and the N check units.
In this embodiment of the present invention, the client can determine the K hard disk modules storing the K units according to the correspondence between the partition to which the K units belong and hard disk modules. In one implementation, the correspondence between partitions and hard disk modules contains the primary storage node information; the first storage node is again taken as the primary storage node. For the correspondence between partitions and hard disk modules, refer to the foregoing description; details are not repeated here. The client sending the K units to the K hard disk modules specifically includes sending the K units to the first storage node. After determining the K hard disk modules storing the K units, the first storage node sends corresponding storage commands to the storage nodes to which the non-local hard disk modules among the K belong; that is, the first storage node sends a first storage command, including K2 units, to the second storage node, and sends a second storage command to the third storage node, which corresponds to K3 third hard disk modules. K1, K2, and K3 are each positive integers, and K1+K2+K3=K.
In this embodiment of the present invention, the client can determine the K hard disk modules storing the K units according to the correspondence between the partition to which the K units belong and hard disk modules. In one implementation, that correspondence further contains the information of the storage node to which each hard disk module belongs. Taking as an example K hard disk modules that include K1 hard disk modules of the first storage node, K2 hard disk modules of the second storage node, and K3 hard disk modules of the third storage node, the client sends the K1 units to be stored in the K1 hard disk modules to the first storage node, sends the K2 units to be stored in the K2 hard disk modules to the second storage node, and sends the K3 units to be stored in the K3 hard disk modules to the third storage node.
In this embodiment of the present invention, the storage nodes to which the K hard disk modules belonging to the same partition belong are storage nodes having a mutual backup or active-standby relationship.
This embodiment of the present invention manages storage resources at the granularity of hard disk modules rather than storage nodes. In one implementation, the interface module of a hard disk module establishes a storage resource process, and the client identifies the usable storage resource granularity from the storage resource process. The storage resource process contains the module identifier of the hard disk module and, further, the identifier of the storage node to which the hard disk module belongs.
An embodiment of this application provides a data reading method in a storage system, which may be the storage system shown in any of FIG. 1 to FIG. 5. In the method, a storage node in the storage system receives a read request sent by the client, where the read request contains the data identifier of the data to be read; the storage node obtains the data to be read according to the data identifier and sends the data to be read to the client.
In one implementation, the first storage node can determine the K hard disk modules storing the K units according to the correspondence between the partition to which the K units belong and hard disk modules. In addition, the hard disk modules recorded in a partition also include the storage node to which each hard disk module belongs, and one of the storage nodes is recorded in the partition as the primary storage node; this embodiment takes the first storage node as the primary storage node as an example. The client determines the corresponding partition according to the data identifier of the data to be read and determines, from the primary storage node information in the partition, to send the read request, containing the data identifier of the data to be read, to the first storage node. The first storage node determines the hard disk module where the data to be read is located according to the partition-hard disk module correspondence and the data identifier, and reads the data to be read from the determined hard disk module. In connection with the embodiment shown in FIG. 6, when the first storage node is faulty, because the second storage node and the third storage node have a mutual backup or active-standby relationship with the first storage node, the second or third storage node takes over the first storage node to execute the read request, and the storage node taking over the first storage node can directly access the K1 hard disk modules. In addition, when the hard disk module storing the data to be read is faulty, or a hard disk in that hard disk module is faulty, or the stored data to be read is erroneous and the hard disk module cannot locally recover it, so that the data to be read is lost, this embodiment of the present invention can recover the data unit containing the data to be read from the units stored in the other hard disk modules among the K; that is, the check protection relationship formed by the M data units and the N check units provides the ability to recover up to N of the data units.
In another implementation, the client determines the corresponding partition according to the data identifier of the data to be read and determines the hard disk module corresponding to the data identifier in the partition. According to the correspondence between hard disk modules and storage nodes in the partition, that is, the storage node to which each hard disk module belongs, the client determines the storage node to which the hard disk module storing the data to be read belongs and sends a read request to that storage node. After receiving the read request, the storage node reads the data to be read from the hard disk module according to the data identifier carried in the read request; that is, the storage node communicates with the interface module of the hard disk module. For data recovery based on the M data units and the N check units, refer to the description of the foregoing embodiment; details are not repeated here.
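For the N = 1 XOR stripe sketched earlier, a degraded read reduces to XORing the surviving units; for general N, the EC decoder would be invoked instead. A minimal, purely illustrative sketch (Python):

```python
def recover_lost_unit(surviving_units: list[bytes]) -> bytes:
    """Degraded read for the N = 1 XOR stripe sketched earlier: XORing the
    K - 1 surviving units reproduces the single lost unit."""
    recovered = bytearray(len(surviving_units[0]))
    for unit in surviving_units:
        for i, byte in enumerate(unit):
            recovered[i] ^= byte
    return bytes(recovered)

# Example: if stripe = [d0, d1, p] with p = d0 XOR d1, then
# recover_lost_unit([d1, p]) == d0.
```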
In another storage architecture of the present invention, the client can communicate with the interface modules of the hard disk modules; that is, the client can directly access the hard disk modules without going through a storage node. In other words, the client directly sends the K units to the corresponding K hard disk modules, or reads data from the corresponding hard disk module, according to the partition-hard disk module correspondence.
Referring to FIG. 8, an embodiment of this application provides a data storage apparatus 300 in a storage system. The apparatus 300 may be deployed in a storage node or the client of any of the foregoing embodiments, and includes:
a generating unit 301, configured to generate N check units for M data units, where M and N are each positive integers and M+N=K;
a storage unit 302, configured to store the K units in K hard disk modules in the storage system, where the K units include the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
Optionally, for the detailed operation of the storage unit 302 storing the units in the K hard disk modules, refer to the related content in step 103 of the embodiment shown in FIG. 6 or step 202 of the embodiment shown in FIG. 7.
Optionally, the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
Optionally, the apparatus 300 is a client of the storage system; the apparatus 300 includes a sending unit 303;
the sending unit 303 is configured to send the K units to a target storage node among the multiple storage nodes, so that the target storage node stores the K units in the K hard disk modules in the storage system.
Optionally, for the detailed operation of the target storage node storing the K units in the K hard disk modules in the storage system, refer to the related content in step 202 of the embodiment shown in FIG. 7.
Optionally, the apparatus 300 is one of the multiple storage nodes.
Optionally, the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
Optionally, the storage system includes a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the apparatus.
In this embodiment of this application, the generating unit generates N check units for M data units, and the storage unit stores the K units in the K hard disk modules in the storage system, so that the storage unit realizes storage with the hard disk modules in the storage system as the granularity; that is, K hard disk modules are used to separately store the corresponding K units. Compared with the prior art, in which storage nodes are used as the granularity, because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be fully utilized.
Referring to FIG. 9, an embodiment of this application provides a data reading apparatus 400 in a storage system. The apparatus 400 may be deployed in a storage node or the client of any of the foregoing embodiments, and includes:
a receiving unit 401, configured to receive a read request, where the read request includes a data identifier of the data to be read;
a processing unit 402, configured to determine, according to the data identifier, the hard disk module storing the data to be read from among the K hard disk modules in the storage system.
The processing unit 402 is further configured to read the data to be read from the hard disk module storing the data to be read, where the data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules stores one of the K units; the K units include the M data units and the N check units; each hard disk module includes an interface module and a hard disk, and the interface module communicates with the hard disk.
Optionally, the storage system includes multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
Optionally, the apparatus 400 is a client of the storage system; the apparatus 400 further includes a sending unit 403;
the sending unit 403 is configured to send a data read request, carrying the data identifier, to a target storage node among the multiple storage nodes;
the processing unit 402 is configured to read, according to the data identifier, the data to be read from the hard disk module storing the data to be read.
Optionally, the apparatus 400 is one of the multiple storage nodes.
Optionally, the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
Optionally, the storage system includes a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the apparatus 400.
In this embodiment of this application, the receiving unit receives a read request including the data identifier of the data to be read. The processing unit reads the data to be read from the hard disk module storing it, where the data to be read belongs to M data units, the storage system further contains the N check units of the M data units, M+N=K, and each of the K hard disk modules in the storage system stores one of the K units. Because each of the K hard disk modules stores one of the K units, storage is performed with the hard disk modules in the storage system as the granularity. Compared with the prior art, in which storage nodes are used as the granularity, because the number of hard disk modules is greater than the number of storage nodes, the storage resources in the storage system can be fully utilized.
Referring to FIG. 10, an embodiment of this application provides a schematic diagram of a data storage apparatus 500 in a storage system. The apparatus 500 may be the client or a storage node of any of the foregoing embodiments. The apparatus 500 includes at least one processor 501, a bus system 502, and at least one communication interface 503.
Optionally, the storage system includes multiple hard disk modules, and the at least one processor 501 also communicates with the multiple hard disk modules in the storage system through the at least one communication interface 503.
In specific implementation, the processor 501 may be a central processing unit (CPU), or may be replaced by a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or other hardware; alternatively, an FPGA or other hardware may work together with the CPU as the processor 501.
The apparatus 500 is a hardware-structured apparatus and can be used to implement the functional modules of the apparatus 300 described in FIG. 8.
Optionally, when the processor 501 is implemented by one or more CPUs, the generating unit 301 and the storage unit 302 of the apparatus 300 shown in FIG. 8 may be implemented by the one or more CPUs calling code in a memory, and the sending unit 303 of the apparatus 300 shown in FIG. 8 may be implemented through the communication interface 503.
The communication interface 503 is used to communicate with other devices or a communication network.
The memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus, or the memory may be integrated with the processor.
The memory is used to store the application program code for executing the solutions of this application, and execution is controlled by the processor 501. The processor 501 is configured to execute the application program code stored in the memory, thereby implementing the functions of the methods of this patent.
Referring to FIG. 11, an embodiment of this application provides a schematic diagram of a data reading apparatus 600 in a storage system. The apparatus 600 may be a storage node or the client of any of the foregoing embodiments. The apparatus 600 includes at least one processor 601, a bus system 602, and at least one communication interface 603.
In specific implementation, the processor 601 may be a central processing unit (CPU), or may be replaced by a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or other hardware; alternatively, an FPGA or other hardware may work together with the CPU as the processor 601.
The apparatus 600 is a hardware-structured apparatus and can be used to implement the functional modules of the apparatus 400 described in FIG. 9. Optionally, when the processor 601 is implemented by one or more CPUs, the processing unit 402 of the apparatus 400 shown in FIG. 9 may be implemented by the one or more CPUs calling code in a memory, and the receiving unit 401 and the sending unit 403 of the apparatus 400 shown in FIG. 9 may be implemented through the communication interface 603.
The bus system 602 may include a path for transferring information between the foregoing components.
The communication interface 603 is used to communicate with other devices or a communication network.
The memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus, or the memory may be integrated with the processor.
The memory is used to store the application program code for executing the solutions of this application, and execution is controlled by the processor 601. The processor 601 is configured to execute the application program code stored in the memory, thereby implementing the functions of the methods of this patent.
In an embodiment of the present invention, M in the M data units is 1, and the N check units are replicas of the data unit; that is, protection of the data unit is implemented based on multiple replicas, and the data unit is restored based on the multiple replicas. For specific implementation, refer to the description of the foregoing embodiments; details are not repeated here.
In another embodiment of the present invention, the storage system is a storage array, and the storage node is an array controller of the storage array.
A person of ordinary skill in the art can understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.

Claims (36)

  1. A data storage method in a storage system, wherein the method comprises:
    generating, by a first device, N check units for M data units, wherein M and N are each positive integers and M+N=K;
    storing, by the first device, the K units in K hard disk modules in the storage system, wherein the K units comprise the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  2. The method according to claim 1, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  3. The method according to claim 2, wherein the first device is a client of the storage system, and the storing, by the first device, of the K units in the K hard disk modules in the storage system specifically comprises:
    sending, by the client, the K units to a target storage node among the multiple storage nodes;
    storing, by the target storage node, the K units in the K hard disk modules in the storage system.
  4. The method according to claim 2, wherein the first device is one of the multiple storage nodes.
  5. The method according to any one of claims 1 to 4, wherein the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
  6. The method according to claim 1, wherein the storage system comprises a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the first device.
  7. A data reading method in a storage system, wherein the method comprises:
    receiving, by a first device, a read request, wherein the read request comprises a data identifier of data to be read;
    determining, by the first device according to the data identifier, the hard disk module storing the data to be read from among K hard disk modules in the storage system;
    reading, by the first device, the data to be read from the hard disk module storing the data to be read, wherein the data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules stores one of the K units; the K units comprise the M data units and the N check units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  8. The method according to claim 7, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  9. The method according to claim 7, wherein the first device is a client of the storage system, and the reading, by the first device, of the data to be read from the hard disk module storing the data to be read specifically comprises:
    sending, by the client, a data read request to a target storage node among the multiple storage nodes, wherein the data read request carries the data identifier;
    reading, by the target storage node according to the data identifier, the data to be read from the hard disk module storing the data to be read.
  10. The method according to claim 8, wherein the first device is one of the multiple storage nodes.
  11. The method according to any one of claims 7 to 10, wherein the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
  12. The method according to claim 7, wherein the storage system comprises a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the first device.
  13. A data storage apparatus in a storage system, wherein the apparatus comprises:
    a generating unit, configured to generate N check units for M data units, wherein M and N are each positive integers and M+N=K;
    a storage unit, configured to store the K units in K hard disk modules in the storage system, wherein the K units comprise the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  14. The apparatus according to claim 13, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  15. The apparatus according to claim 14, wherein the apparatus is a client of the storage system, and the apparatus comprises a sending unit;
    the sending unit is configured to send the K units to a target storage node among the multiple storage nodes, so that the target storage node stores the K units in the K hard disk modules in the storage system.
  16. The apparatus according to claim 14, wherein the apparatus is one of the multiple storage nodes.
  17. The apparatus according to any one of claims 13 to 16, wherein the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
  18. The apparatus according to claim 13, wherein the storage system comprises a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the apparatus.
  19. A data reading apparatus in a storage system, wherein the apparatus comprises:
    a receiving unit, configured to receive a read request, wherein the read request comprises a data identifier of data to be read;
    a processing unit, configured to determine, according to the data identifier, the hard disk module storing the data to be read from among K hard disk modules in the storage system;
    the processing unit is further configured to read the data to be read from the hard disk module storing the data to be read, wherein the data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules stores one of the K units; the K units comprise the M data units and the N check units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  20. The apparatus according to claim 19, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  21. The apparatus according to claim 19, wherein the apparatus is a client of the storage system, and the apparatus further comprises a sending unit;
    the sending unit is configured to send a data read request to a target storage node among the multiple storage nodes, wherein the data read request carries the data identifier;
    the processing unit is configured to read, according to the data identifier, the data to be read from the hard disk module storing the data to be read.
  22. The apparatus according to claim 20, wherein the apparatus is one of the multiple storage nodes.
  23. The apparatus according to any one of claims 19 to 22, wherein the interface module is a host bus adapter, a RAID card, an expander card, or a network interface card.
  24. The apparatus according to claim 19, wherein the storage system comprises a second device, and a mutual backup relationship or an active-standby relationship exists between the second device and the apparatus.
  25. A storage device in a storage system, wherein the storage device comprises a processor and a communication interface, the processor communicates with the communication interface, and the processor is configured to:
    generate N check units for M data units, wherein M and N are each positive integers and M+N=K;
    store the K units in K hard disk modules in the storage system, wherein the K units comprise the M data units and the N check units, and each of the K hard disk modules stores one of the K units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  26. The storage device according to claim 25, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  27. The storage device according to claim 26, wherein the storage device is a client of the storage system, and the communication interface is configured to send the K units to a target storage node among the multiple storage nodes, so that the target storage node stores the K units in the K hard disk modules in the storage system.
  28. A storage device in a storage system, wherein the storage device comprises a processor and a communication interface, and the processor communicates with the communication interface;
    the communication interface is configured to receive a read request, wherein the read request comprises a data identifier of data to be read;
    the processor is configured to: determine, according to the data identifier, the hard disk module storing the data to be read from among K hard disk modules in the storage system; and
    read the data to be read from the hard disk module storing the data to be read, wherein the data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules stores one of the K units; the K units comprise the M data units and the N check units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  29. The storage device according to claim 28, wherein the storage system comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  30. The storage device according to claim 29, wherein the storage device is a client of the storage system; the communication interface is further configured to send a data read request to a target storage node among the multiple storage nodes, wherein the data read request carries the data identifier;
    the processor is further configured to read, according to the data identifier, the data to be read from the hard disk module storing the data to be read.
  31. A storage system, wherein the storage system comprises a storage device and K hard disk modules;
    wherein the storage device is configured to:
    generate N check units for M data units, wherein M and N are each positive integers and M+N=K;
    store the K units in the K hard disk modules, wherein the K units comprise the M data units and the N check units, and each of the K hard disk modules is configured to store one of the K units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  32. The storage system according to claim 31, wherein the storage system further comprises multiple storage nodes, and each storage node communicates with the interface modules of the K hard disk modules.
  33. The storage system according to claim 32, wherein the storage device is a client of the storage system, and the storage device being configured to store the K units in the K hard disk modules in the storage system specifically comprises:
    the client being configured to send the K units to a target storage node among the multiple storage nodes;
    the target storage node being configured to store the K units in the K hard disk modules in the storage system.
  34. A storage system, wherein the storage system comprises a storage device and K hard disk modules;
    the storage device is configured to:
    receive a read request, wherein the read request comprises a data identifier of data to be read;
    determine, according to the data identifier, the hard disk module storing the data to be read from among the K hard disk modules; and
    read the data to be read from the hard disk module storing the data to be read, wherein the data to be read belongs to M data units; the storage system further contains N check units of the M data units; M and N are each positive integers; M+N=K; each of the K hard disk modules is configured to store one of the K units; the K units comprise the M data units and the N check units; each hard disk module comprises an interface module and a hard disk, and the interface module communicates with the hard disk.
  35. A computer program product, wherein the computer program product contains program code which, when run by a computer, causes the computer to perform the method according to any one of claims 1 to 6.
  36. A computer program product, wherein the computer program product contains program code which, when run by a computer, causes the computer to perform the method according to any one of claims 7 to 12.
PCT/CN2020/141063 2020-01-08 2020-12-29 Data storage method, data reading method, apparatus, and system in a storage system WO2021139571A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20912213.4A EP4075252A4 (en) 2020-01-08 2020-12-29 DATA STORAGE METHOD, DEVICE, SYSTEM AND DATA READING METHOD, DEVICE AND SYSTEM IN A STORAGE SYSTEM
JP2022536646A JP2023510500A (ja) Data storage method, data reading method, data storage apparatus, data reading apparatus, storage device, and system in a storage system
US17/859,378 US20220342567A1 (en) 2020-01-08 2022-07-07 Data Storage Method, Data Reading Method, Data Storage Apparatus, Data Reading Apparatus, Storage Device in Storage System, and System

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010018706.6 2020-01-08
CN202010018706 2020-01-08
CN202010096222.3A CN111399766B (zh) Data storage method, data reading method, apparatus, and system in a storage system
CN202010096222.3 2020-02-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/859,378 Continuation US20220342567A1 (en) 2020-01-08 2022-07-07 Data Storage Method, Data Reading Method, Data Storage Apparatus, Data Reading Apparatus, Storage Device in Storage System, and System

Publications (1)

Publication Number Publication Date
WO2021139571A1 true WO2021139571A1 (zh) 2021-07-15

Family

ID=71428516

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141063 WO2021139571A1 (zh) 2020-01-08 2020-12-29 存储系统中的数据存储方法、数据读取方法、装置及系统

Country Status (5)

Country Link
US (1) US20220342567A1 (zh)
EP (1) EP4075252A4 (zh)
JP (1) JP2023510500A (zh)
CN (1) CN111399766B (zh)
WO (1) WO2021139571A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111399766B (zh) 2020-01-08 2021-10-22 华为技术有限公司 Data storage method, data reading method, apparatus, and system in a storage system
CN114461134B (zh) 2021-11-19 2024-05-14 中航航空电子有限公司 Hard disk fragmented block read/write apparatus and method, computer device, and storage medium
US20230236755A1 (en) * 2022-01-27 2023-07-27 Pure Storage, Inc. Data Resiliency Using Container Storage System Storage Pools

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214314A1 (en) * 2006-03-07 2007-09-13 Reuter James M Methods and systems for hierarchical management of distributed data
CN106201338A (zh) 2016-06-28 2016-12-07 华为技术有限公司 Data storage method and apparatus
CN107273048A (zh) 2017-06-08 2017-10-20 浙江大华技术股份有限公司 Data writing method and apparatus
CN109213420A (zh) 2017-06-29 2019-01-15 杭州海康威视数字技术股份有限公司 Data storage method, apparatus, and system
CN109271360A (zh) 2018-08-03 2019-01-25 北京城市网邻信息技术有限公司 Distributed object storage data redundancy method, apparatus, device, and storage medium
CN109783280A (zh) 2019-01-15 2019-05-21 上海海得控制系统股份有限公司 Shared storage system and shared storage method
CN111399766A (zh) 2020-01-08 2020-07-10 华为技术有限公司 Data storage method, data reading method, apparatus, and system in a storage system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7594134B1 (en) * 2006-08-14 2009-09-22 Network Appliance, Inc. Dual access pathways to serially-connected mass data storage units
US8862847B2 (en) * 2013-02-08 2014-10-14 Huawei Technologies Co., Ltd. Distributed storage method, apparatus, and system for reducing a data loss that may result from a single-point failure
CN103699494B (zh) 2013-12-06 2017-03-15 北京奇虎科技有限公司 Data storage method, data storage device, and distributed storage system
US9727437B2 (en) * 2014-02-18 2017-08-08 Quantum Corporation Dynamically controlling erasure code distribution in an object store
US9965336B2 (en) * 2014-04-30 2018-05-08 International Business Machines Corporation Delegating iterative storage unit access in a dispersed storage network
EP3208714B1 (en) * 2015-12-31 2019-08-21 Huawei Technologies Co., Ltd. Data reconstruction method, apparatus and system in distributed storage system
US10270469B2 (en) * 2017-05-24 2019-04-23 Vmware, Inc. Efficient data write approach for distributed multi-mirror erasure coding system
EP3495939B1 (en) * 2017-10-13 2021-06-30 Huawei Technologies Co., Ltd. Method and device for storing data in distributed block storage system, and computer readable storage medium
CN109783002B (zh) 2017-11-14 2021-02-26 华为技术有限公司 Data read/write method, management device, client, and storage system
CN107943421B (zh) 2017-11-30 2021-04-20 成都华为技术有限公司 Partition division method and apparatus based on a distributed storage system
CN108780386B (zh) 2017-12-20 2020-09-04 华为技术有限公司 Data storage method, apparatus, and system
CN110096220B (zh) 2018-01-31 2020-06-26 华为技术有限公司 Distributed storage system, data processing method, and storage node
CN109726036B (zh) 2018-11-21 2021-08-20 华为技术有限公司 Data reconstruction method and apparatus in a storage system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214314A1 (en) * 2006-03-07 2007-09-13 Reuter James M Methods and systems for hierarchical management of distributed data
CN106201338A (zh) 2016-06-28 2016-12-07 华为技术有限公司 Data storage method and apparatus
CN107273048A (zh) 2017-06-08 2017-10-20 浙江大华技术股份有限公司 Data writing method and apparatus
CN109213420A (zh) 2017-06-29 2019-01-15 杭州海康威视数字技术股份有限公司 Data storage method, apparatus, and system
CN109271360A (zh) 2018-08-03 2019-01-25 北京城市网邻信息技术有限公司 Distributed object storage data redundancy method, apparatus, device, and storage medium
CN109783280A (zh) 2019-01-15 2019-05-21 上海海得控制系统股份有限公司 Shared storage system and shared storage method
CN111399766A (zh) 2020-01-08 2020-07-10 华为技术有限公司 Data storage method, data reading method, apparatus, and system in a storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4075252A4

Also Published As

Publication number Publication date
JP2023510500A (ja) 2023-03-14
CN111399766B (zh) 2021-10-22
CN111399766A (zh) 2020-07-10
EP4075252A1 (en) 2022-10-19
EP4075252A4 (en) 2023-02-08
US20220342567A1 (en) 2022-10-27

Similar Documents

Publication Publication Date Title
WO2021139571A1 (zh) Data storage method, data reading method, apparatus, and system in a storage system
US11163653B2 (en) Storage cluster failure detection
CN110071821B (zh) Method for determining the status of a transaction log, node, and storage medium
CN107544862B (zh) Erasure-code-based stored data reconstruction method and apparatus, and storage node
US9916113B2 (en) System and method for mirroring data
US11307776B2 (en) Method for accessing distributed storage system, related apparatus, and related system
CN106776130B (zh) Log recovery method, storage apparatus, and storage node
US7313722B2 (en) System and method for failover
US6578160B1 (en) Fault tolerant, low latency system resource with high level logging of system resource transactions and cross-server mirrored high level logging of system resource transactions
US7219260B1 (en) Fault tolerant system shared system resource with state machine logging
US6594775B1 (en) Fault handling monitor transparently using multiple technologies for fault handling in a multiple hierarchal/peer domain file server with domain centered, cross domain cooperative fault handling mechanisms
US11409471B2 (en) Method and apparatus for performing data access management of all flash array server
CN113326006A (zh) Erasure-code-based distributed block storage system
EP4036732A1 (en) Verification data calculation method and device
EP4027243A1 (en) Data recovery method and related device
CN113918083A (zh) Stripe management method, storage system, stripe management apparatus, and storage medium
CN112104729A (zh) Storage system and caching method thereof
WO2020034695A1 (zh) Data storage method, data recovery method, apparatus, device, and storage medium
CN113051428A (zh) Method and apparatus for camera front-end storage backup
CN115470041A (zh) Data disaster recovery management method and apparatus
WO2022033269A1 (zh) Data processing method, device, and system
CN114518973A (zh) Recovery method for crash and restart of distributed cluster nodes
CN113868017B (zh) Data management method and system for an all-flash system
TWI766594B (zh) Server and control method applied to the server
CN117555493B (zh) Data processing method, system, apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912213

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022536646

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020912213

Country of ref document: EP

Effective date: 20220711

NENP Non-entry into the national phase

Ref country code: DE