WO2023000686A1 - Data storage method and device in a storage system - Google Patents

Data storage method and device in a storage system

Info

Publication number
WO2023000686A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
blocks
storage system
data blocks
Application number
PCT/CN2022/080193
Other languages
English (en)
French (fr)
Inventor
李林波
唐军
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP22844859.3A (patent/EP4369170A1/en)
Publication of WO2023000686A1 (patent/WO2023000686A1/zh)
Priority to US18/418,737 (patent/US20240160528A1/en)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • The present application relates to the field of storage technologies, and in particular to a data storage method and device in a storage system.
  • The data reliability of a storage system can be improved through data redundancy technology.
  • Data redundancy technologies include erasure coding (EC) technology.
  • In EC technology, Q check blocks are calculated from P data blocks, and the P data blocks and Q check blocks (collectively referred to as P+Q blocks) are then stored in different storage locations of the storage system.
  • A storage system that uses EC technology is configured with a redundancy ratio and then stores data according to that ratio.
  • However, when storage nodes are added to the storage system, continuing to use the originally configured redundancy ratio wastes storage resources.
  • The present application provides a data storage method and device in a storage system, which solve the problem of storage resources being wasted because the storage system uses an inappropriate redundancy ratio.
  • According to a first aspect, the present application provides a data storage method in a storage system, which can be used in a storage system (such as a centralized storage system or a distributed storage system). The method includes: calculating M check blocks of N first data blocks according to a first EC technology; storing the N first data blocks and their M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system; and updating the first EC technology to a second EC technology.
  • R check blocks of S second data blocks are then calculated according to the second EC technology, where S is greater than N and the ratio of S to R is greater than the ratio of N to M. A second data block is a data block received after the first EC technology is updated to the second EC technology.
  • The S second data blocks and their R check blocks are stored in corresponding storage nodes among (S+R) storage nodes of the storage system.
  • By updating the EC technology of the storage system in this way, the newly adopted EC technology (the second EC technology) has both more data blocks and a larger proportion of data blocks than the original EC technology (the first EC technology), that is, S is greater than N and the ratio of S to R is greater than the ratio of N to M. This improves the capacity utilization of the storage system and avoids wasting its storage resources.
  • In a possible implementation, the method further includes: selecting S target data blocks from the data blocks stored by the storage system according to the first EC technology, calculating R check blocks of the S target data blocks according to the second EC technology, and storing the S target data blocks and their R check blocks in corresponding storage nodes of the storage system.
  • In this way, data stored with the original redundancy ratio EC N+M can be converted into data with the new redundancy ratio EC S+R at an appropriate time (for example, during an idle period of the storage system), further improving capacity utilization.
  • In addition, the layout of existing data does not have to change immediately: the EC N+M data in the storage system can be converted into EC S+R data later, during a suitable period (such as an idle period of the storage system), which shortens the expansion time of the storage system.
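The idle-period conversion described above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the source: every old stripe holds exactly N data blocks, old stripes are converted only in whole batches (so no stripe is left half-converted with stale check blocks), and `encode` is a hypothetical stand-in for the check-block computation of the second EC technology. The name `convert_batch` is also hypothetical.

```python
import math

def convert_batch(old_stripes, n, s, r, encode):
    """Regroup whole EC N+M stripes into EC S+R stripes.

    old_stripes: list of stripes, each a list of n data blocks (the old check
        blocks are not needed: they are discarded once a stripe is converted).
    encode: hypothetical callable (data_blocks, r) -> list of r check blocks.
    Returns (new_stripes, untouched_old_stripes).
    """
    # Old stripes per batch so that their data blocks exactly fill whole new stripes.
    batch = math.lcm(n, s) // n
    usable = len(old_stripes) - len(old_stripes) % batch
    pool = [blk for stripe in old_stripes[:usable] for blk in stripe]
    new_stripes = []
    for i in range(0, len(pool), s):
        group = pool[i:i + s]
        new_stripes.append((group, encode(group, r)))
    # Remaining old stripes keep their EC N+M layout until a later idle period.
    return new_stripes, old_stripes[usable:]
```

With N = 4 and S = 8, two old stripes form one new stripe; with N = 4 and S = 6, three old stripes form two new ones. `math.lcm` requires Python 3.9 or later.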
  • In a possible implementation, R is not less than M.
  • Keeping the number R of check blocks in the redundancy ratio of the second EC technology no smaller than the number M of check blocks in the first EC technology ensures the reliability of stored data.
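The conditions on the two redundancy ratios (S > N, S/R > N/M, all four positive integers, and optionally R ≥ M) can be checked as in the sketch below. `is_valid_update` is a hypothetical helper; `fractions.Fraction` compares the ratios exactly rather than with floating point.

```python
from fractions import Fraction

def is_valid_update(n: int, m: int, s: int, r: int, require_r_ge_m: bool = True) -> bool:
    """Check whether EC N+M may be updated to EC S+R under the conditions above."""
    if min(n, m, s, r) <= 0:
        return False  # S, R, N and M must all be positive integers
    if s <= n:
        return False  # the new scheme must have more data blocks per stripe
    if Fraction(s, r) <= Fraction(n, m):
        return False  # and a larger data-to-check ratio
    if require_r_ge_m and r < m:
        return False  # optional implementation: keep at least M check blocks
    return True
```

For example, EC 4+2 to EC 8+2 passes, while EC 4+2 to EC 6+3 fails because 6/3 equals 4/2.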
  • In a possible implementation, the method further includes receiving a read request.
  • If the read request asks to read data in the N first data blocks, the data in the N first data blocks is read according to the first EC technology.
  • If the read request asks to read the S second data blocks, the data in the S second data blocks is read according to the second EC technology.
  • In this way, data is read with whichever EC technology matches the data targeted by the read request.
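One way to let reads use the EC technology that a block was written with is to record the redundancy ratio in each stripe's metadata, as in this minimal Python sketch. `ECScheme`, `Stripe`, and `StripeStore` are hypothetical names, check blocks are passed in precomputed, and degraded reads (rebuilding lost blocks) are omitted.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ECScheme:
    data_blocks: int   # N before the update, S after it
    check_blocks: int  # M before the update, R after it

@dataclass
class Stripe:
    scheme: ECScheme   # recorded at write time; a later update does not rewrite it
    blocks: list       # data blocks first, then check blocks

@dataclass
class StripeStore:
    scheme: ECScheme                              # scheme used for new writes
    stripes: list = field(default_factory=list)

    def update_scheme(self, new_scheme: ECScheme) -> None:
        # Only data received after this call is stored with the new scheme.
        self.scheme = new_scheme

    def write(self, data_blocks: list, check_blocks: list) -> int:
        expected = (self.scheme.data_blocks, self.scheme.check_blocks)
        if (len(data_blocks), len(check_blocks)) != expected:
            raise ValueError("block counts do not match the current EC scheme")
        self.stripes.append(Stripe(self.scheme, data_blocks + check_blocks))
        return len(self.stripes) - 1  # stripe id

    def read(self, stripe_id: int) -> bytes:
        stripe = self.stripes[stripe_id]
        # The stripe's own scheme tells how many leading blocks hold user data.
        return b"".join(stripe.blocks[: stripe.scheme.data_blocks])
```

Because each stripe carries its own scheme, stripes written as EC N+M and EC S+R can coexist and be read side by side.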
  • In a possible implementation, before the first EC technology is updated to the second EC technology, the method further includes adding storage nodes to the storage system.
  • In a possible implementation, the method further includes migrating one or more of the N first data blocks and their M check blocks to the newly added storage nodes.
  • In a possible implementation, a storage node is any one of a hard disk, a hard disk enclosure, or a storage server.
  • According to a second aspect, the present application provides a data storage device, including a processing unit configured to calculate M check blocks of N first data blocks according to a first EC technology.
  • A read-write unit is configured to store the N first data blocks and their M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system.
  • The processing unit is further configured to update the first EC technology to a second EC technology.
  • The processing unit is further configured to calculate R check blocks of S second data blocks according to the second EC technology, where S is greater than N, the ratio of S to R is greater than the ratio of N to M, S, R, N, and M are all positive integers, and a second data block is a data block received after the first EC technology is updated to the second EC technology.
  • The read-write unit is further configured to store the S second data blocks and their R check blocks in corresponding storage nodes among (S+R) storage nodes of the storage system.
  • In a possible implementation, the processing unit is further configured to, during an idle period of the storage system, select S target data blocks from the data blocks stored by the storage system according to the first EC technology, and calculate R check blocks of the S target data blocks according to the second EC technology.
  • The read-write unit is further configured to store the S target data blocks and their R check blocks in corresponding storage nodes of the storage system.
  • In a possible implementation, R is not less than M.
  • In a possible implementation, the data storage device further includes a receiving unit configured to receive a read request. The read-write unit is configured to read the data in the N first data blocks according to the first EC technology when the read request asks for data in the N first data blocks.
  • The read-write unit is further configured to read the data in the S second data blocks according to the second EC technology when the read request asks for the S second data blocks.
  • In a possible implementation, the processing unit is further configured to add storage nodes to the storage system before updating the first EC technology to the second EC technology.
  • The read-write unit is further configured to migrate one or more of the N first data blocks and their M check blocks to the newly added storage nodes.
  • In a possible implementation, a storage node is any one of a hard disk, a hard disk enclosure, or a storage server.
  • According to a third aspect, the present application provides a data storage device, including a processor and an interface circuit. The processor receives or sends data through the interface circuit, and uses a logic circuit or executes code instructions to implement the method of the first aspect or any implementation of the first aspect.
  • The present application further provides a storage system, including the data storage device of the second aspect or any implementation of the second aspect, or the data storage device of the third aspect.
  • The present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the method of the first aspect or any implementation of the first aspect is implemented.
  • The present application further provides a computer program product including instructions. When the instructions run on a processor, the method of the first aspect or any implementation of the first aspect is implemented.
  • FIG. 1 is a first schematic structural diagram of a storage system according to an embodiment of the present application.
  • FIG. 2 is a second schematic structural diagram of a storage system according to an embodiment of the present application.
  • FIG. 3 is a first schematic structural diagram of a data storage device according to an embodiment of the present application.
  • FIG. 4 is a first schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 5 is a second schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 6 is a third schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 7 is a fourth schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 8 is a fifth schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 9 is a second schematic structural diagram of a data storage device according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present application.
  • The storage system 100 can be understood as a storage system that stores data on multiple independent storage nodes.
  • Terminals 121 to 125 in FIG. 1 can write data to the storage system or read data from it.
  • Storage nodes 111 to 114 are used to store data.
  • The storage nodes 111 to 114 in FIG. 1 may be independent servers.
  • FIG. 2 is a schematic diagram of a distributed storage system.
  • The distributed storage system includes one or more servers 210 (three servers 210 are shown as an example in FIG. 2), and the servers 210 can communicate with each other.
  • A server 210 is a device with both computing and storage capabilities, such as a server or a desktop computer. In terms of software, each server 210 runs an operating system.
  • A virtual machine 207 can be created on the server 210. The computing resources required by the virtual machine 207 come from the local processor 212 and memory 213 of the server 210, while its storage resources can come from the local hard disk 205 of the server 210 or from hard disks 205 in other servers 210.
  • Various applications can run in the virtual machine 207, and a user can trigger a read/write request through these applications.
  • The virtual machine 207 accesses the distributed storage system as a client.
  • The server 210 includes at least a processor 212, a memory 213, a network card 214, and a hard disk 205.
  • The processor 212, the memory 213, the network card 214, and the hard disk 205 are connected through a bus.
  • The processor 212 and the memory 213 provide computing resources.
  • The processor 212 is a central processing unit (CPU) configured to process data access requests from outside the server 210 or requests generated inside the server 210.
  • The processor 212 writes data held in the memory 213 to the hard disk 205 for persistent storage.
  • The processor 212 is also used for data computation or processing, such as metadata management, data deduplication, data compression, data verification, storage space virtualization, and address translation. Only one CPU 212 is shown in FIG. 2; in practice there are often multiple CPUs 212, and each CPU 212 has one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores.
  • The memory 213 is internal memory that exchanges data directly with the processor. It can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other running programs.
  • The memory includes at least two types of memory; for example, it can include both random access memory and read-only memory (ROM).
  • For example, the random access memory is dynamic random access memory (DRAM) or storage class memory (SCM).
  • DRAM is a semiconductor memory and, like most random access memory (RAM), is volatile.
  • SCM is a composite storage technology that combines characteristics of traditional storage devices and memory.
  • Storage class memory provides faster reads and writes than hard disks, but its access speed is slower than DRAM and it costs less than DRAM.
  • DRAM and SCM are only examples in this embodiment; the memory may also include other random access memory, such as static random access memory (SRAM).
  • The read-only memory may be, for example, programmable read-only memory (PROM) or erasable programmable read-only memory (EPROM).
  • The memory 213 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD).
  • In practice, multiple memories 213, including memories 213 of different types, may be configured in the server 210.
  • This embodiment does not limit the quantity or type of the memory 213.
  • In addition, the memory 213 can be configured with a power-loss protection function.
  • The power-loss protection function means that data stored in the memory 213 is not lost when the system is powered off and then powered on again. Memory with this function is called non-volatile memory.
  • The hard disk 205 provides storage resources, for example for storing data. It can be a magnetic disk or another type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive.
  • The network card 214 is used to communicate with other servers 210.
  • FIG. 2 shows only one example framework of a distributed storage system.
  • The distributed storage system may also adopt other frameworks.
  • In the framework of FIG. 2, the server 210 uses local computing resources (such as the processor and memory) and storage resources (such as the hard disk) to complete read/write requests.
  • In another framework, the distributed storage system may include a computing node cluster and a storage node cluster.
  • The computing node cluster includes one or more computing nodes that can communicate with each other.
  • Each computing node is a computing device, such as a server, a desktop computer, or a storage array controller.
  • Each computing node can communicate with any storage node in the storage node cluster over the network, to write data to or read data from the hard disks in that storage node.
  • The application scenarios of the embodiments of the present application are introduced above mainly by using a distributed storage system as an example.
  • This description should not be construed as limiting the storage system frameworks to which this application applies.
  • The embodiments of the present application can also be applied to a centralized storage system.
  • A centralized storage system can be understood as one in which a central node, composed of one or more master devices, stores data centrally, and the data processing services of the whole system are deployed on that central node. In other words, the embodiments of the present application do not limit the framework of the storage system to which the technical solutions apply.
  • The data reliability of a storage system can be guaranteed through data redundancy technology, and a commonly used data redundancy technology is erasure coding (EC).
  • In EC technology, Q check blocks (also called check columns) are calculated from P data blocks (also called data columns), and the P data blocks and Q check blocks (collectively referred to as P+Q blocks) are stored in different storage locations of the storage system. For example, in a distributed storage system, the P+Q blocks are stored on different storage servers.
  • When the number of damaged blocks among the P+Q blocks does not exceed Q, the damaged blocks can be recovered from the undamaged ones.
  • The number P of data blocks and the number Q of check blocks together are called the redundancy ratio, written EC P+Q.
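To make the P+Q idea concrete, the following Python sketch implements one well-known erasure code with Q = 2: RAID-6-style P/Q parity over GF(2^8). This is a swapped-in example technique; the source does not specify which code its EC technology uses. The sketch computes two check blocks for an EC 4+2 stripe and rebuilds any two lost data blocks. All function names are hypothetical, and losing P or Q themselves (handled by simply re-encoding) is not shown.

```python
import os

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial commonly used for RAID-6

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    # a^255 == 1 for every nonzero a, so a^254 is the multiplicative inverse.
    return gf_pow(a, 254)

def encode(data_blocks):
    """Return check blocks (P, Q): P = XOR of all D_i, Q = sum of g^i * D_i with g = 2."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data_blocks):
        coeff = gf_pow(2, i)
        for j in range(size):
            p[j] ^= block[j]
            q[j] ^= gf_mul(coeff, block[j])
    return bytes(p), bytes(q)

def recover_two(data_blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (x != y) from the surviving blocks plus P and Q."""
    size = len(p)
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data_blocks):
        if i in (x, y):
            continue  # the lost blocks are skipped
        coeff = gf_pow(2, i)
        for j in range(size):
            pxy[j] ^= block[j]                  # ends as D_x ^ D_y
            qxy[j] ^= gf_mul(coeff, block[j])   # ends as g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # (g^x + g^y) * D_x = qxy + g^y * pxy, where + is XOR in GF(2^8)
    dx = bytes(gf_mul(qxy[j] ^ gf_mul(gy, pxy[j]), inv) for j in range(size))
    dy = bytes(pxy[j] ^ dx[j] for j in range(size))
    return dx, dy

# EC 4+2 example: lose two of the four data blocks and rebuild them.
blocks = [os.urandom(16) for _ in range(4)]
p, q = encode(blocks)
damaged = [blocks[0], None, blocks[2], None]
assert recover_two(damaged, 1, 3, p, q) == (blocks[1], blocks[3])
```

Production systems usually use general Reed-Solomon or Cauchy matrices so that Q can exceed 2; this sketch only covers the Q = 2 case and at most 255 data blocks.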
  • The EC technology used by a storage system, that is, its redundancy ratio EC P+Q, is usually configured when the storage system is created, and data is then stored according to that ratio. However, as storage nodes are added to the storage system, continuing to use the original redundancy ratio wastes storage resources.
  • For example, suppose a storage system includes 6 storage nodes when it is created.
  • It stores data with the redundancy ratio EC 4+2, so its capacity utilization is about 66.7% (that is, 4/(4+2) × 100%).
  • If the storage system is later expanded, for example to 10 storage nodes, and still stores data with EC 4+2, storage resources are wasted.
  • In this case, a redundancy ratio with a larger proportion of data blocks can be used to improve capacity utilization while still meeting data reliability constraints.
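The utilization figure in the example above is simply N/(N+M). The short Python sketch below computes it. EC 8+2 is a hypothetical alternative for the 10-node system, chosen only for illustration and not taken from the source.

```python
def capacity_utilization(n: int, m: int) -> float:
    """Fraction of raw capacity that holds user data under EC N+M."""
    return n / (n + m)

for n, m in [(4, 2), (8, 2)]:
    print(f"EC {n}+{m}: {capacity_utilization(n, m):.1%}")
# EC 4+2: 66.7%
# EC 8+2: 80.0%
```

Both schemes tolerate two failed blocks per stripe, so on 10 nodes the wider scheme stores more user data without reducing the number of tolerated failures.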
  • In one technical solution, the original storage system includes 6 storage nodes and stores data with the redundancy ratio EC 4+2. When the storage system needs to be expanded with 10 storage nodes, a new storage system is created from the 10 newly added storage nodes.
  • The 6 storage nodes of the original storage system can then be added to the new storage system to further expand its capacity.
  • This technical solution improves capacity utilization when the storage system is expanded.
  • However, on the one hand, it places high demands on the amount of newly added hardware: in the example above, if fewer than six storage nodes are added, the solution cannot be implemented. On the other hand, its O&M operations are complex, capacity expansion takes a long time and consumes considerable resources, and it is not user-friendly. For storage systems that require continuous capacity expansion, the O&M costs and operational risks are even higher.
  • In another technical solution, the data is first moved to temporary storage space, which can be another storage system. A new storage system is then created with the new redundancy ratio, the temporarily stored data is migrated to the new storage system, and the service switchover is completed.
  • In yet another technical solution, the storage system is created from the start with a redundancy ratio that has many data blocks and many check blocks. For example, a storage system with two storage nodes would generally use EC 1+1 to ensure data reliability, but this solution may use a ratio with more data blocks and more check blocks, such as EC 10+10.
  • The number of check blocks in the redundancy ratio of the storage system is then reduced later.
  • In the embodiments of the present application, the EC technology of the storage system can be updated to change the redundancy ratio, for example from EC N+M to EC S+R, so that newly stored data uses a redundancy ratio with more data blocks and a larger proportion of data blocks, that is, S is greater than N and the ratio of S to R is greater than the ratio of N to M.
  • Converting data with the original redundancy ratio EC N+M into data with the new redundancy ratio EC S+R takes a long time and occupies considerable resources.
  • Therefore, the existing EC N+M data can be left unchanged at first and converted into EC S+R data during a later idle period of the storage system, which shortens the time needed to update the redundancy ratio.
  • An embodiment of the present application provides a data storage method in a storage system, which in a specific implementation can be executed by the data storage device 30 shown in FIG. 3.
  • The data storage device 30 includes at least one processor 301 and a memory 302.
  • The data storage device 30 may further include a communication line 303 and a communication interface 304.
  • The processor 301 executes computer-executable instructions in the memory 302 to implement the data storage method provided in this application.
  • The processor 301 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution.
  • The memory 302 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • The memory 302 may exist independently and be connected to the processor through the communication line 303.
  • The memory 302 may alternatively be integrated with the processor.
  • The communication line 303 may include a data bus for transferring information between the foregoing components.
  • The communication interface 304 is used to communicate with other devices.
  • For example, the data storage device 30 may communicate with other hardware devices in the storage system through the communication interface 304 to execute the data storage method provided in the embodiments of the present application.
  • The data storage device 30 may be a hardware device in the storage system that manages and controls the storage system.
  • The method includes the following steps.
  • The data storage device calculates M check blocks of the N first data blocks according to the first EC technology.
  • The N first data blocks may be N data blocks of the to-be-stored data that the storage system receives after the first EC technology is configured for it.
  • For example, the storage system may temporarily buffer the received to-be-stored data, such as in the memory of a storage node of the storage system (which may be the storage node where the data storage device is located). When the amount of buffered data reaches a threshold, the data storage device divides the buffered data evenly into N data blocks (such as the N first data blocks above) and calculates M check blocks of the N data blocks.
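The buffering step above can be sketched in Python as follows. The byte-count threshold, the zero padding of the last block, and the names `WriteBuffer` and `split_evenly` are assumptions for illustration; a real system would also need to record the padding length in metadata so that it can be stripped on read.

```python
class WriteBuffer:
    """Accumulate incoming data; once `threshold` bytes are buffered, cut N equal data blocks."""

    def __init__(self, n: int, threshold: int):
        self.n = n
        self.threshold = threshold  # hypothetical flush threshold, in bytes
        self.pending = bytearray()

    def append(self, data: bytes):
        """Buffer data; return N data blocks when the threshold is reached, else None."""
        self.pending += data
        if len(self.pending) < self.threshold:
            return None
        chunk = bytes(self.pending[: self.threshold])
        del self.pending[: self.threshold]  # anything beyond the threshold stays buffered
        return split_evenly(chunk, self.n)

def split_evenly(data: bytes, n: int) -> list:
    """Split data into n equally sized blocks, zero-padding the tail."""
    block_size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(block_size * n, b"\0")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(n)]
```

The returned N blocks would then be passed to the first EC technology to compute the M check blocks.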
  • The first EC technology can be understood as an EC technology that calculates check blocks of data blocks according to the pre-configured redundancy ratio of the first EC technology (N data blocks corresponding to M check blocks, written EC N+M).
  • When check blocks are calculated with the first EC technology, data redundancy is usually performed with N data blocks corresponding to M check blocks; but in some cases (such as storage system …), a redundancy relationship with fewer data blocks may be used, such as N-1 data blocks corresponding to M check blocks, or N-2 data blocks corresponding to M check blocks.
  • In other cases, a redundancy relationship with more data blocks may be used, such as N+1 data blocks corresponding to M check blocks, or N+2 data blocks corresponding to M check blocks.
  • The data storage device stores the N first data blocks and the M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system.
  • For example, the current storage system includes 6 storage nodes and the redundancy ratio of the first EC technology is EC 4+2. The data storage device divides the to-be-stored data evenly into 4 first data blocks, calculates 2 check blocks of the 4 first data blocks, and stores the 4 first data blocks and the 2 check blocks on different storage nodes among the 6 storage nodes.
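The placement rule above (each of the N+M blocks on a different storage node) can be sketched as follows. The source only requires distinct nodes; the rotating start offset, which spreads check blocks of successive stripes across nodes, and the name `place_stripe` are hypothetical.

```python
def place_stripe(block_ids, nodes, start=0):
    """Assign each block of one EC stripe to a different storage node.

    block_ids: the N data blocks followed by the M check blocks of one stripe.
    nodes: available storage node ids (at least N+M of them).
    start: hypothetical rotation offset (e.g. the stripe number) so that the
        check blocks of successive stripes do not always land on the same nodes.
    """
    if len(block_ids) > len(nodes):
        raise ValueError("an EC N+M stripe needs at least N+M storage nodes")
    # i < len(nodes), so (start + i) % len(nodes) yields a distinct node per block.
    return {blk: nodes[(start + i) % len(nodes)] for i, blk in enumerate(block_ids)}

# EC 4+2 on 6 nodes: blocks 1-4 are data blocks, blocks 5-6 are check blocks.
nodes = ["node1", "node2", "node3", "node4", "node5", "node6"]
layout = place_stripe([1, 2, 3, 4, 5, 6], nodes)
```

With `start=0` this reproduces the example: blocks 1 to 4 on nodes 1 to 4 and the check blocks on nodes 5 and 6.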
  • In a distributed system, a storage node in this method may be a storage server, a hard disk, or a hard disk enclosure that implements the storage node function.
  • Alternatively, a storage node in this method may be one or more hard disks or hard disk enclosures.
  • The data storage device adds storage nodes to the storage system.
  • The storage system can be expanded by adding storage nodes (such as storage servers, hard disk enclosures, or hard disks).
  • For example, O&M personnel can select the storage servers, hard disk enclosures, or hard disks to be added on an operation interface, which triggers the data storage device to add them to the storage system as storage nodes.
  • the data storage device migrates one or more of the N first data blocks and their M check blocks to the new storage nodes.
  • the newly added storage nodes have more free storage space, so part of the data stored in the original storage nodes can be migrated to them; specifically, at least some of the N first data blocks and their M check blocks can be migrated to the new storage nodes, so that data is evenly distributed across the storage nodes and the load of each storage node is balanced.
  • before expansion, the 4 first data blocks (blocks 1-4 in the figure) are stored in nodes 1-4 respectively, and the two check blocks (blocks 5-6 in the figure, shaded) are stored in nodes 5 and 6 respectively;
  • after expansion, the first data block in node 4 (that is, block 4) can be migrated to node 7, and the check block in node 6 (that is, block 6) can be migrated to node 8, thereby reducing the load on nodes 4 and 6.
  • the data storage device updates the first EC technology to the second EC technology.
  • the second EC technology is used to store, according to its redundancy ratio, the data blocks that the storage system receives after the first EC technology is updated to the second EC technology.
  • the number S of data blocks in the redundancy ratio of the second EC technology (EC S+R) is greater than the number N of data blocks in the redundancy ratio of the first EC technology (EC N+M), and the ratio of S to R is greater than the ratio of N to M.
  • the second EC technology in the embodiments of the present application can be understood as an EC technique that calculates the check blocks of data blocks according to the redundancy ratio of the second EC technology (that is, EC S+R).
  • when the second EC technology with redundancy ratio EC S+R is used to calculate the check blocks of data blocks, data redundancy usually follows the redundancy relationship in which S data blocks correspond to R check blocks.
  • in the related art, to keep expansion simple, the EC technology adopted by a running storage system is usually not updated to an EC technology with more data blocks.
  • the present application breaks this technical prejudice: when the storage system is expanded, the EC technology of the storage system is directly updated to the above-mentioned second EC technology, thereby improving the capacity utilization of the storage system; and
  • the complexity of the expansion process can be addressed by other technical means on top of updating the EC technology of the storage system to the second EC technology.
  • on the one hand, data is read by using the first EC technology for the previous data and the second EC technology for the data stored after expansion; on the other hand, during a later idle period of the storage system, the second EC technology is used to recalculate and save the check blocks of the data blocks that originally used the first EC technology. This avoids having to change the redundancy ratio of the previous data to that of the second EC technology during expansion merely to keep the redundancy relationship of the stored data consistent, thereby reducing the complexity of the expansion process.
  • in addition, because the method provided by the embodiments of the present application avoids consuming resources during expansion to change the EC technology adopted by the previous data, the expansion process is shorter and the read/write performance of the storage system during expansion is improved.
  • the specific manner of determining the values of S and R in the redundancy ratio EC S+R corresponding to the second EC technology can be set according to actual requirements in the specific implementation process.
  • the values of S and R can be manually configured by operation and maintenance personnel when expanding the storage system.
  • the data storage device may determine the values of S and R according to the number of storage nodes after expansion.
  • for example, if the storage system has 6 storage nodes and uses EC4+2 before expansion and has 8 storage nodes after expansion, the data storage device determines, according to constraints on data reliability or capacity utilization, that the redundancy ratio corresponding to the second EC technology is EC6+2.
  • the method may also include:
  • the data storage device calculates R check blocks of the S second data blocks according to the second EC technology.
  • the second data block is a data block received after the first EC technology is updated to the second EC technology.
  • the above R is not less than M.
  • the reliability of stored data is ensured by making the number R of check blocks in the redundancy ratio of the second EC technology not less than the number M of check blocks in the first EC technology.
  • the data storage device stores the above-mentioned S second data blocks and R check blocks of the S second data blocks respectively in corresponding storage nodes among the (S+R) storage nodes in the storage system.
  • the storage system not only stores data stored using the first EC technology (ie, blocks 1-6) but also stores data stored using the second EC technology (ie, blocks 7-14).
  • the method may also include:
  • the data storage device receives a read request.
  • the read request is used to request to read data stored in the storage system.
  • the data storage device reads the metadata corresponding to the N first data blocks to determine their storage addresses (which may be physical or logical addresses) and the first EC technology they use, and then reads the requested data in the N first data blocks according to the first EC technology.
  • the data storage device reads the metadata corresponding to the S second data blocks to determine their storage addresses (which may be physical or logical addresses) and the second EC technology they use, and then reads the requested data in the S second data blocks according to the second EC technology.
  • the method may also include:
  • the data storage device selects S data blocks from the data blocks stored by the storage system according to the first EC technology (for convenience of description, the S data blocks are referred to as S third data blocks hereinafter), and according to the second EC technology R check blocks of the S third data blocks are calculated.
  • for example, S411 may be executed during an idle period of the storage system to convert data stored using the first EC technology into data stored using the second EC technology, thereby maintaining the consistency of the data structure of the storage system.
  • the idle period of the storage system may also be referred to as a period when the operating load of the storage system is lower than a load threshold.
  • the idle period of the storage system can manifest as one or more of the following: the data currently waiting to be written in the storage system is less than a preset threshold (which may be referred to as the "first preset threshold"), the data currently waiting to be read is less than a preset threshold (the "second preset threshold"), or the utilization of related hardware resources of the storage system is lower than a preset threshold (the "third preset threshold").
  • the data storage device respectively stores the S third data blocks and the R check blocks of the S third data blocks to corresponding storage nodes in the storage system.
  • the S third data blocks and R check blocks of the S third data blocks are respectively stored in (S+R) storage nodes in the storage system.
  • after the EC technology is updated, the previous data (that is, the data stored before the EC technology is updated) keeps its original redundancy relationship. Then, when the storage system is in an idle period, the redundancy relationship of the previous data is converted by using the second EC technology. In this way, the data structure of the storage system stays consistent while the complexity of the expansion process is reduced and the load of the storage system is balanced.
  • when previous data is deleted, the storage system can also reclaim the storage space occupied by the data stored according to the first EC technology and then store data in that space according to the second EC technology, thereby further improving the capacity utilization of the storage system.
  • the data storage method provided in the embodiment of the present application may further include:
  • the data storage device calculates M check blocks of the N first data blocks according to the first EC technology.
  • the data storage device respectively stores N first data blocks and M check blocks in (N+M) storage nodes in the storage system.
  • the data storage device updates the first EC technology to the second EC technology.
  • this method need not be applied in a storage system expansion scenario; instead, the EC technology of the storage system can be updated directly, so that the storage system stores data received after the update according to the second EC technology.
  • the data storage device calculates R check blocks of the S second data blocks according to the second EC technology.
  • the data storage device stores the above-mentioned S second data blocks and R check blocks of the S second data blocks respectively in (S+R) storage nodes in the storage system.
  • the data storage device may perform some or all of the steps in the embodiments of the present application, and these steps or operations are only examples. In the embodiments of the present application, other operations or variations of various operations may also be performed. In addition, the steps may be performed in an order different from that presented in the embodiments of the present application, and not all operations in the embodiments of the present application necessarily need to be performed.
  • the data storage device includes hardware structures and/or software modules corresponding to each function.
  • with reference to the units and method steps of the examples described in the embodiments disclosed in the present application, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • FIG. 9 is a schematic structural diagram of another data storage device provided by the present application.
  • the data storage device 600 can be used to implement the functions of the steps in the above-mentioned method embodiments, so the beneficial effects of the above-mentioned method embodiments can also be realized.
  • the data storage device 600 may be a storage server with management and control functions in the distributed storage system shown in FIG. 2, or part of the internal hardware of that storage server; alternatively, it may be the storage engine in a centralized storage system or part of the hardware inside the storage engine.
  • the data storage device 600 includes a processing unit 601 , a reading and writing unit 602 and a receiving unit 603 .
  • the data storage device 600 is used to realize the functions of each step in the above-mentioned method embodiment shown in FIG. 4 or FIG. 6 to FIG. 8 .
  • the processing unit 601 is used to perform one or more of S401, S403, S405 or S406; the read-write unit 602 is used to perform one or more of S402, S404 or S407.
  • the receiving unit 603 is used to perform S408;
  • the read-write unit 602 is also used to perform one or more of S409 or S410;
  • the processing unit 601 is also used to perform S411; the read-write unit 602 is also used to perform S412.
  • the processing unit 601 is used to execute one or more of S501, S503 or S504; the read-write unit 602 is used to execute one or more of S502 or S505.
  • more detailed descriptions of the processing unit 601, the read-write unit 602 and the receiving unit 603 can be obtained directly from the relevant descriptions in the method embodiments shown in FIG. 4 or FIGS. 6-8, and are not repeated here.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, hard disk, mobile hard disk, CD-ROM or any other form of storage medium known in the art .
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; it may also be an optical medium, such as a digital video disc (digital video disc, DVD); it may also be a semiconductor medium, such as an SSD.
  • “at least one” means one or more
  • “multiple” means two or more
  • other quantifiers are similar.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three kinds of relationships, for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, and B exists independently.
  • the singular forms “a”, “an” and “the” do not mean “one or only one” but “one or more” unless the context clearly dictates otherwise.
  • “a device” means one or more such devices.
  • “at least one of” means one or any combination of the subsequent associated objects; for example, “at least one of A, B and C” includes A, B, C, AB, AC, BC, or ABC.
  • the character “/” generally indicates an “or” relationship between the associated objects before and after it; in the formulas of the application, the character “/” indicates a “division” relationship between them.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

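A data storage method and device in a storage system, relating to the field of data storage and used to improve the capacity utilization of the storage system. The method includes: calculating M check blocks of N first data blocks according to a first erasure coding technology; storing the N first data blocks and the M check blocks of the N first data blocks in (N+M) storage nodes of the storage system respectively; updating the first erasure coding technology to a second erasure coding technology; calculating R check blocks of S second data blocks according to the second erasure coding technology, where S is greater than N, the ratio of S to R is greater than the ratio of N to M, S, R, N and M are all positive integers, and the second data blocks are data blocks received after the first erasure coding technology is updated to the second erasure coding technology; and storing the S second data blocks and the R check blocks of the S second data blocks in (S+R) storage nodes of the storage system respectively.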

Description

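Data storage method and device in storage system
This application claims priority to Chinese Patent Application No. 202110831638.X, filed with the China National Intellectual Property Administration on July 22, 2021 and entitled "Data storage method and device in storage system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of storage technologies, and in particular, to a data storage method and device in a storage system.
Background
Currently, the data reliability of a storage system can be improved by data redundancy technologies. A commonly used data redundancy technology is erasure coding (EC).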
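EC technology calculates Q check blocks for P data blocks and stores the P data blocks and the Q check blocks (collectively, P+Q blocks) at different storage locations of the storage system. As long as the number of damaged blocks among the P+Q blocks does not exceed Q, the damaged blocks can be recovered from the undamaged ones.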
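In a storage system that uses EC, a redundancy ratio is configured for the storage system and data is stored according to that ratio. However, as storage nodes are added to the storage system, this mechanism wastes storage resources.
Summary
This application provides a data storage method and device in a storage system, which solve the problem of storage resources being wasted because the redundancy ratio used by the storage system is unsuitable.
To achieve the foregoing objective, this application uses the following technical solutions: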
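According to a first aspect, this application provides a data storage method in a storage system. The method can be used in a storage system (for example, a centralized or distributed storage system) and includes: calculating M check blocks of N first data blocks according to a first erasure coding (EC) technology; storing the N first data blocks and their M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system; updating the first erasure coding technology to a second erasure coding technology; and calculating R check blocks of S second data blocks according to the second erasure coding technology, where S is greater than N, the ratio of S to R is greater than the ratio of N to M, and S, R, N and M are all positive integers. The second data blocks are data blocks received after the first erasure coding technology is updated to the second erasure coding technology. The S second data blocks and their R check blocks are stored in corresponding storage nodes among (S+R) storage nodes of the storage system.
In the foregoing method, when the originally configured redundancy ratio no longer suits the storage system (for example, after the storage system is expanded), the EC technology of the storage system can be updated so that both the proportion and the number of data blocks in the newly used EC technology (the second EC technology) exceed those in the originally used EC technology (the first EC technology), that is, S/R is greater than N/M and S is greater than N. This improves the capacity utilization of the storage system and avoids wasting its storage resources.
In a possible implementation, the method further includes: selecting S target data blocks from the data blocks stored by the storage system according to the first erasure coding technology, calculating R check blocks of the S target data blocks according to the second erasure coding technology, and storing the S target data blocks and their R check blocks in corresponding storage nodes of the storage system. In this implementation, after the first erasure coding technology is updated to the second, data in the storage system that uses the original redundancy ratio EC N+M can be converted, at an appropriate time (for example, an idle period of the storage system), into data that uses the new redundancy ratio EC S+R, further improving capacity utilization. In addition, because the structure of previously stored data need not be changed when the storage system is expanded, and the EC N+M data is instead converted into EC S+R data in a later suitable period (for example, an idle period of the storage system), the expansion time of the storage system can be shortened.
In a possible implementation, R is not less than M. Making the number R of check blocks in the redundancy ratio of the second EC technology not less than the number M of check blocks in the first EC technology ensures the reliability of stored data.
In a possible implementation, the method further includes: receiving a read request; when the read request requests data in the N first data blocks, reading the data in the N first data blocks according to the first erasure coding technology; and when the read request requests the S second data blocks, reading the data in the S second data blocks according to the second erasure coding technology. In this implementation, after a read request is received, different EC technologies can be used to read data depending on which data is requested.
In a possible implementation, before the first erasure coding technology is updated to the second erasure coding technology, the method further includes: adding storage nodes to the storage system. With this implementation, the method provided in the embodiments of this application improves capacity utilization and avoids wasting storage resources when storage nodes are added to the storage system.
In a possible implementation, the method further includes: migrating one or more of the N first data blocks and their M check blocks to the newly added storage nodes. In this implementation, after the storage system is expanded, some or all of the data on the original storage nodes can be migrated to the new storage nodes to reduce the load on the original storage nodes and balance the load of the entire storage system.
In a possible implementation, a storage node is any one of a hard disk, a hard disk enclosure, or a storage server.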
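According to a second aspect, this application provides a data storage device, including: a processing unit, configured to calculate M check blocks of N first data blocks according to a first erasure coding (EC) technology; and a read-write unit, configured to store the N first data blocks and their M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system. The processing unit is further configured to update the first erasure coding technology to a second erasure coding technology, and to calculate R check blocks of S second data blocks according to the second erasure coding technology, where S is greater than N, the ratio of S to R is greater than the ratio of N to M, S, R, N and M are all positive integers, and the second data blocks are data blocks received after the first erasure coding technology is updated to the second erasure coding technology. The read-write unit is further configured to store the S second data blocks and their R check blocks in corresponding storage nodes among (S+R) storage nodes of the storage system.
In a possible implementation, the processing unit is further configured to: during an idle period of the storage system, select S target data blocks from the data blocks stored by the storage system according to the first erasure coding technology, and calculate R check blocks of the S target data blocks according to the second erasure coding technology. The read-write unit is further configured to store the S target data blocks and their R check blocks in corresponding storage nodes of the storage system.
In a possible implementation, R is not less than M.
In a possible implementation, the data storage device further includes a receiving unit, configured to receive a read request. The read-write unit is configured to: when the read request requests data in the N first data blocks, read the data in the N first data blocks according to the first erasure coding technology;
the read-write unit is further configured to: when the read request requests the S second data blocks, read the data in the S second data blocks according to the second erasure coding technology.
In a possible implementation, the processing unit is further configured to add storage nodes to the storage system before the first erasure coding technology is updated to the second erasure coding technology.
In a possible implementation, after storage nodes are added to the storage system, the read-write unit is further configured to migrate one or more of the N first data blocks and their M check blocks to the newly added storage nodes.
In a possible implementation, a storage node is any one of a hard disk, a hard disk enclosure, or a storage server.
According to a third aspect, this application provides a data storage device including a processor and an interface circuit. The processor receives or sends data through the interface circuit, and implements, through a logic circuit or by executing code instructions, the method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, this application provides a storage system, including the data storage device according to the second aspect, any implementation of the second aspect, or the third aspect.
According to a fifth aspect, this application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the method according to the first aspect or any implementation of the first aspect is implemented.
According to a sixth aspect, this application provides a computer program product including instructions. When the instructions run on a processor, the method according to the first aspect or any implementation of the first aspect is implemented.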
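Brief Description of Drawings
FIG. 1 is a first schematic structural diagram of a storage system according to an embodiment of this application;
FIG. 2 is a second schematic structural diagram of a storage system according to an embodiment of this application;
FIG. 3 is a first schematic structural diagram of a data storage device according to an embodiment of this application;
FIG. 4 is a first schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 5 is a second schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 6 is a third schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 7 is a fourth schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 8 is a fifth schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 9 is a second schematic structural diagram of a data storage device according to an embodiment of this application.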
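Detailed Description
The following describes the technical solutions in the embodiments of this application with reference to the accompanying drawings. To describe the technical solutions clearly, words such as "first" and "second" are used in the embodiments of this application to distinguish between identical or similar items whose functions and effects are basically the same. A person skilled in the art can understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items to be different. In the embodiments of this application, words such as "exemplary" or "for example" are used to give an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner for ease of understanding.
First, the application scenarios of the technical solutions provided in the embodiments of this application are introduced. Specifically, the technical solutions may be applied to storage systems of various architectures:
For example, FIG. 1 is a schematic diagram of a storage system according to an embodiment of this application. The storage system 100 can be understood as a storage system that stores data on multiple independent storage nodes. In FIG. 1, terminals 121 to 125 may write data to or read data from the storage system, and storage nodes 111 to 114 store data.
In a possible design, storage nodes 111 to 114 in FIG. 1 may each be an independent server; FIG. 2 accordingly shows a schematic diagram of a distributed storage system. The distributed storage system includes one or more servers 210 (three are shown in FIG. 2 as an example) that can communicate with each other. A server 210 is a device with both computing and storage capabilities, such as a server or a desktop computer. In software, each server 210 runs an operating system. Virtual machines 207 can be created on a server 210; the computing resources required by a virtual machine 207 come from the local processor 212 and memory 213 of the server 210, while its storage resources may come from the local hard disk 205 of the server 210 or from hard disks 205 in other servers 210. Various applications can run in the virtual machine 207, and users can trigger read/write data requests through these applications. The virtual machine 207 accesses the distributed storage system as a client.
In hardware, as shown in FIG. 2, a server 210 includes at least a processor 212, a memory 213, a network interface card 214, and a hard disk 205, which are connected by a bus. The processor 212 and the memory 213 provide computing resources. Specifically, the processor 212 is a central processing unit (CPU) that processes data access requests from outside the server 210 or requests generated inside the server 210. For example, when the processor 212 receives write data requests from a terminal, it temporarily stores the data in these requests in the memory 213. When the total amount of data in the memory 213 reaches a threshold, the processor 212 sends the data stored in the memory 213 to the hard disk 205 for persistent storage. The processor 212 is also used for data computation or processing, such as metadata management, deduplication, data compression, data verification, storage space virtualization, and address translation. Only one CPU 212 is shown in FIG. 2; in practice there are often multiple CPUs 212, each with one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores.
The memory 213 is internal memory that exchanges data directly with the processor. It can read and write data at any time at high speed and serves as temporary data storage for the operating system or other running programs. The memory includes at least two kinds of storage; for example, it may be random access memory or read-only memory (ROM). For example, the random access memory is dynamic random access memory (DRAM) or storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device. SCM is a hybrid storage technology that combines the characteristics of traditional storage devices and memory: it provides faster reads and writes than a hard disk, but is slower than DRAM and cheaper than DRAM. DRAM and SCM are merely examples in this embodiment; the memory may also include other random access memory, such as static random access memory (SRAM). The read-only memory may be, for example, programmable read-only memory (PROM) or erasable programmable read-only memory (EPROM). In addition, the memory 213 may be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD). In practice, a server 210 may be configured with multiple memories 213 of different types; this embodiment does not limit their number or type. The memory 213 may also be configured with a power protection function, which means that data stored in the memory 213 is not lost when the system is powered off and then powered on again. Memory with a power protection function is called non-volatile memory.
The hard disk 205 provides storage resources, for example, for storing data. It may be a magnetic disk or another type of storage medium, such as a solid state disk or a shingled magnetic recording hard disk. The network interface card 214 is used to communicate with other servers 210.
It is easy to understand that FIG. 2 shows only one example architecture of a distributed storage system. In other possible designs, the distributed storage system may use other architectures. For example, no virtual machines are created on the server 210, and the server 210 uses its local computing resources (such as the processor and memory) and storage resources (such as hard disks) to complete read/write data requests. For another example, the distributed storage system may include a compute node cluster and a storage node cluster. The compute node cluster includes one or more compute nodes that can communicate with each other. Each compute node is a computing device, such as a server, a desktop computer, or a storage array controller. Each compute node can communicate with any storage node in the storage node cluster over a network to write data to, or read data from, the hard disks in the storage nodes.
In addition, to make the technical solutions provided in the embodiments of this application easier to understand, the application scenarios above are mainly described using a distributed storage system as an example. This description should not be construed as limiting the architecture of the storage systems to which this application applies. For example, in other application scenarios, the embodiments of this application may also be applied to a centralized storage system. Unlike a distributed storage system, a centralized storage system can be understood as a central node composed of one or more master devices; data is stored centrally in this central node, and the data processing services of the whole system are deployed centrally on it. In other words, the embodiments of this application do not limit the architecture of the storage system to which the technical solutions are applied.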
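Currently, the data reliability of a storage system can be ensured by data redundancy technologies, a commonly used one being erasure coding (EC). EC technology calculates Q check blocks (also called check columns) for P data blocks (also called data columns) and stores the P data blocks and the Q check blocks (collectively, P+Q blocks) at different storage locations of the storage system; for example, in a distributed storage system, the P+Q blocks are stored on different storage servers. As long as the number of damaged blocks among the P+Q blocks does not exceed Q, they can be recovered from the undamaged blocks. For brevity, the number P of data blocks and the number Q of check blocks are referred to below as the redundancy ratio, written EC P+Q.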
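The application does not specify which erasure code computes the check blocks. As a minimal sketch of the P+Q idea only, the Python below implements the simplest special case, Q = 1, in which the single check block is the byte-wise XOR of the P data blocks and any one lost block is rebuilt by XOR-ing the survivors. Schemes with Q > 1 (such as the EC 4+2 used in the examples) typically rely on a Reed-Solomon-style code instead, which is not shown. All function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks in one stripe must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_p_plus_1(data_blocks: list[bytes]) -> bytes:
    """Compute the single check block of an EC P+1 stripe as the XOR of all P data blocks."""
    return reduce(xor_blocks, data_blocks)


def recover_p_plus_1(stripe: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one lost block (marked None) of a P+1 stripe from the survivors."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        raise ValueError("an EC P+1 stripe tolerates at most one lost block")
    rebuilt = list(stripe)
    if missing:
        survivors = [blk for blk in stripe if blk is not None]
        # The XOR of every block in the stripe (data plus check) is all zeros,
        # so the XOR of the survivors equals the missing block.
        rebuilt[missing[0]] = reduce(xor_blocks, survivors)
    return rebuilt


data = [b"ab", b"cd", b"ef", b"gh"]          # P = 4 data blocks
stripe = data + [encode_p_plus_1(data)]      # EC 4+1 stripe
stripe[2] = None                             # lose the third data block
print(recover_p_plus_1(stripe)[2])
```

Running it prints `b'ef'`, the lost third data block.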
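In a storage system that uses EC, the EC technology used by the storage system, that is, the redundancy ratio EC P+Q, is usually configured when the storage system is created, and the system then stores data according to this redundancy ratio. However, as storage nodes are added to the storage system, continuing to use the original redundancy ratio wastes storage resources.
For example, when the storage system is created it includes 6 storage nodes and stores data with the redundancy ratio EC4+2, giving a capacity utilization of about 66.7% (that is, 4/(4+2) × 100%). If the storage system is later expanded, for example to 10 storage nodes, and still stores data with EC 4+2, storage resources are wasted. In fact, a redundancy ratio with a larger proportion of data blocks could then be used to improve capacity utilization while still meeting data reliability constraints.
To solve this problem, the related art proposes the following three technical solutions for expanding a storage system.
Technical solution 1:
First, create a new storage system with a redundancy ratio that gives higher capacity utilization; then migrate the data of the original storage system to the new storage system and switch services over; finally, add the hardware of the original storage system to the new storage system.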
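For example, the original storage system includes 6 storage nodes and stores data with EC 4+2. When 10 storage nodes are to be added, a new storage system is created on the 10 newly added storage nodes with a redundancy ratio that gives higher capacity utilization (for example, EC 8+2); the data of the original storage system is then migrated into the new storage system according to the new redundancy ratio EC 8+2, and services are switched over. Afterwards, the 6 storage nodes of the original storage system can also be added to the new storage system to further expand its capacity.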
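As can be seen, although this solution improves capacity utilization when the storage system is expanded, on the one hand it places high demands on the amount of new hardware: in the example above, it cannot be implemented if fewer than 6 storage nodes are added. On the other hand, it requires data migration and service switchover between storage systems during expansion, so operation and maintenance are complex, expansion takes a long time, resource overhead is high, and the process is not user-friendly. For storage systems that need continuous expansion in particular, the operation and maintenance cost and operational risk are even higher.
Technical solution 2:
First, temporarily store the data of the storage system in some storage space (for example, another storage system); then use the hardware of the original storage system together with that of the newly added storage nodes to create a new storage system with a redundancy ratio that gives higher capacity utilization; finally, migrate the temporarily stored data to the new storage system and switch services over.
As can be seen, although this solution improves capacity utilization when the storage system is expanded, on the one hand it requires considerable additional storage space to hold the data of the original storage system temporarily, and the larger the storage system, the more temporary space is needed. On the other hand, it also requires data migration and service switchover between storage systems during expansion, so operation and maintenance are complex, expansion takes a long time, resource overhead is high, and the process is not user-friendly. For storage systems that need continuous expansion in particular, the operation and maintenance cost and operational risk are even higher.
Technical solution 3:
When the storage system is initially created, it is created with a redundancy ratio that has more data blocks and more check blocks. For example, a storage system with two storage nodes would normally use EC 1+1 to ensure data reliability; in this solution, a redundancy ratio with more data blocks and more check blocks, such as EC10+10, is used instead.
Then, when the storage system is expanded, the number of check blocks in its redundancy ratio is reduced. Continuing the example, when the system is expanded to three storage nodes, the check blocks of the stored data blocks are calculated with EC10+5 as shown in Table 1 below, that is, 5 check blocks for every 10 data blocks, and the 10 data blocks and 5 check blocks are distributed across the three storage nodes. For another example, when the system is expanded to four storage nodes, EC10+4 is used as shown in Table 1: 4 check blocks are calculated for every 10 data blocks in the storage system, and the 10 data blocks and 4 check blocks are distributed across the four storage nodes.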
Table 1
Storage nodes   Redundancy ratio   Capacity utilization
2               10+10              50.0%
3               10+5               66.7%
4               10+4               71.4%
5               10+3               76.9%
6               10+2               83.3%
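The capacity-utilization column of Table 1 is simply P/(P+Q). The sketch below recomputes it. The one-node-loss check is an assumption about why each row pairs a node count with its ratio (blocks spread as evenly as possible, and losing the fullest node must not cost more than Q blocks); the table itself does not state this. Names are hypothetical.

```python
import math

# Rows of Table 1: (storage nodes, data blocks P, check blocks Q)
TABLE_1 = [(2, 10, 10), (3, 10, 5), (4, 10, 4), (5, 10, 3), (6, 10, 2)]


def capacity_utilization(p: int, q: int) -> float:
    """Share of raw capacity that holds user data under EC P+Q, i.e. P / (P + Q)."""
    return p / (p + q)


def survives_one_node_loss(nodes: int, p: int, q: int) -> bool:
    """Assumed reading of Table 1: with P+Q blocks spread as evenly as possible,
    the fullest node holds ceil((P+Q)/nodes) blocks, which must not exceed Q."""
    return math.ceil((p + q) / nodes) <= q


for nodes, p, q in TABLE_1:
    print(f"{nodes} nodes  EC {p}+{q}  {capacity_utilization(p, q):.1%}  "
          f"one-node loss OK: {survives_one_node_loss(nodes, p, q)}")
```

Every row of Table 1 passes the one-node-loss check, and the same formula gives 66.7% for the EC 4+2 example above.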
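As can be seen, although this solution improves capacity utilization when the storage system is expanded, it requires configuring a redundancy ratio with more data blocks and check blocks when the storage system is created. Moreover, in a continuous-expansion scenario, there is a lower limit to how far the number of check blocks can be reduced; for example, in Table 1, after EC10+2 the number of check blocks cannot be reduced further because of reliability constraints.
To solve the foregoing technical problems, the EC technology of the storage system can first be updated to change the redundancy ratio, for example from EC N+M to EC S+R, so that newly stored data is stored with a redundancy ratio that has more data blocks and a larger proportion of data blocks (that is, S is greater than N and S/R is greater than N/M), thereby improving capacity utilization. Further, because converting data that uses the original redundancy ratio EC N+M into data that uses the new redundancy ratio EC S+R takes a long time and consumes considerable resources, the embodiments of this application may leave the structure of existing EC N+M data unchanged when updating the redundancy ratio, and convert the EC N+M data into EC S+R data during a later idle period of the storage system, thereby shortening the time needed to update the redundancy ratio.
The technical solutions provided in the embodiments of this application are described below with reference to the accompanying drawings.
Specifically, an embodiment of this application provides a data storage method in a storage system. In a specific implementation, the method may be implemented by the data storage device 30 shown in FIG. 3.
The data storage device 30 includes at least one processor 301 and a memory 302. The data storage device 30 may further include a communication line 303 and a communication interface 304.
The processor 301 is configured to execute computer-executable instructions in the memory 302 to implement the data storage method provided in this application.
Specifically, the processor 301 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control execution of programs of the solutions of this application.
The memory 302 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 302 may exist independently and be connected to the processor through the communication line 303, or it may be integrated with the processor.
The communication line 303 may include a data bus for transferring information between the foregoing components.
The communication interface 304 is used to communicate with other devices. For example, the data storage device 30 may communicate with other hardware devices in the storage system through the communication interface 304 to perform the data storage method provided in the embodiments of this application.
In practice, the data storage device 30 may be a hardware device that manages and controls the storage system. For example, it may be a storage server with management and control functions in the distributed storage system shown in FIG. 2, or part of the hardware inside such a storage server; alternatively, it may be a storage engine in a centralized storage system, or part of the hardware inside the storage engine.
The method is described below using storage system expansion as an example scenario. As shown in FIG. 4, the method includes the following steps.
S401. The data storage device calculates M check blocks of N first data blocks according to the first EC technology.
The N first data blocks may be N data blocks of the data to be stored that the storage system receives after the first EC technology has been configured for it.
Specifically, after the first EC technology is configured for the storage system, the storage system may buffer received data to be stored, for example in the memory of one storage node of the storage system (which may be the storage node where the data storage device resides). When the amount of buffered data reaches a threshold, the data storage device divides the buffered data evenly into N data blocks (such as the N first data blocks above) and calculates M check blocks for these N data blocks.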
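A minimal sketch of this buffering step, assuming a byte-count threshold and zero-padding of the last block so that all N blocks have equal size. The application says only that buffered data is divided evenly; a real system would also record the original length so that padding can be stripped on read. `WriteBuffer` and `split_into_blocks` are hypothetical names, and check-block calculation is left out.

```python
from __future__ import annotations


def split_into_blocks(chunk: bytes, n: int) -> list[bytes]:
    """Divide a buffered chunk evenly into n equal-size data blocks, zero-padding the tail."""
    block_size = -(-len(chunk) // n)  # ceiling division
    padded = chunk.ljust(block_size * n, b"\x00")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(n)]


class WriteBuffer:
    """Hypothetical in-memory staging buffer for incoming writes."""

    def __init__(self, threshold: int, n: int) -> None:
        self.threshold = threshold  # bytes to accumulate before forming one stripe
        self.n = n                  # data blocks per stripe (N of EC N+M)
        self.pending = bytearray()

    def write(self, data: bytes) -> list[list[bytes]]:
        """Append data; return the N data blocks of every stripe that is now full."""
        self.pending += data
        stripes = []
        while len(self.pending) >= self.threshold:
            chunk = bytes(self.pending[:self.threshold])
            del self.pending[:self.threshold]
            stripes.append(split_into_blocks(chunk, self.n))
        return stripes


buf = WriteBuffer(threshold=16, n=4)
print(buf.write(b"x" * 10))   # below the threshold: nothing to encode yet
print(buf.write(b"y" * 10))   # crosses the threshold: one stripe of 4 data blocks
```

After the second write, one stripe of four 4-byte blocks is emitted and 4 bytes stay buffered for the next stripe.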
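It should be noted that, in the embodiments of this application, the first EC technology can be understood as an EC technology that calculates check blocks of data blocks according to the pre-configured redundancy ratio of the first EC technology (that is, N data blocks correspond to M check blocks, EC N+M for short). When check blocks are calculated with the first EC technology, data redundancy usually follows the relationship in which N data blocks correspond to M check blocks. In some cases, however, for example when a storage node in the storage system fails, the first EC technology with redundancy ratio EC N+M may also perform data redundancy with fewer data blocks (for example, N-1 data blocks corresponding to M check blocks, or N-2 data blocks corresponding to M check blocks). Because of the EC N+M constraint, however, data redundancy is never performed with more data blocks (for example, N+1 or N+2 data blocks corresponding to M check blocks).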
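The rule above (at most N data blocks per stripe, fewer when a node has failed, always M check blocks) can be sketched as follows. The choice k = min(N, healthy nodes - M) is a hypothetical policy used only for illustration; the application does not say how many data blocks a degraded stripe uses.

```python
def stripe_data_blocks(n: int, m: int, healthy_nodes: int) -> int:
    """Hypothetical rule under EC N+M: use N data blocks when enough nodes are healthy,
    otherwise shrink to fewer (N-1, N-2, ...) while keeping all M check blocks.
    Never returns more than N."""
    k = min(n, healthy_nodes - m)
    if k < 1:
        raise ValueError("not enough healthy nodes for one data block plus M check blocks")
    return k


def is_allowed_shape(k: int, r: int, n: int, m: int) -> bool:
    """A stripe of k data + r check blocks is allowed under EC N+M iff 1 <= k <= N and r == M."""
    return 1 <= k <= n and r == m


print(stripe_data_blocks(4, 2, healthy_nodes=6))  # all 6 nodes healthy
print(stripe_data_blocks(4, 2, healthy_nodes=5))  # one node failed: N-1 data blocks
```

Even with many healthy nodes, the sketch never exceeds N data blocks, mirroring the constraint that EC N+M never produces N+1 or N+2 data blocks per stripe.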
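S402. The data storage device stores the N first data blocks and the M check blocks in corresponding storage nodes among (N+M) storage nodes of the storage system.
For example, the current storage system includes 6 storage nodes and the redundancy ratio of the first EC technology is EC4+2. The data storage device divides the data to be stored evenly into 4 first data blocks and calculates 2 check blocks for them, then stores the 4 first data blocks and the 2 check blocks on different storage nodes among the 6 storage nodes.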
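One way to express the "different storage nodes" requirement is a placement function that maps the N+M blocks of a stripe onto distinct nodes. The rotation by stripe number is an assumed load-spreading policy, not part of the described method, and the names are hypothetical.

```python
from __future__ import annotations


def place_stripe(blocks: list[str], nodes: list[str], stripe_id: int = 0) -> dict[str, str]:
    """Map each block of one N+M stripe to a distinct storage node.

    The starting node rotates with stripe_id (an assumed policy) so that
    consecutive stripes do not always put their check blocks on the same nodes.
    """
    if len(blocks) > len(nodes):
        raise ValueError("an N+M stripe needs at least N+M distinct storage nodes")
    start = stripe_id % len(nodes)
    return {blk: nodes[(start + i) % len(nodes)] for i, blk in enumerate(blocks)}


blocks = [f"block{i}" for i in range(1, 7)]   # 4 data blocks + 2 check blocks (EC 4+2)
nodes = [f"node{i}" for i in range(1, 7)]     # 6 storage nodes
print(place_stripe(blocks, nodes))
```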
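It should be noted that when the method provided in the embodiments of this application is applied to a distributed storage system, a storage node in the method may be a storage server, hard disk, or hard disk enclosure that implements the storage node function in the distributed system. When the method is applied to a centralized storage system, a storage node may be one or more hard disks or hard disk enclosures.
S403. The data storage device adds storage nodes to the storage system.
Specifically, when the storage space of the storage system becomes insufficient during use, the storage system can be expanded by adding storage nodes (such as storage servers, hard disk enclosures, or hard disks). For example, after the storage servers, hard disk enclosures, or hard disks to be added are powered on and connected to the network where the storage system resides, operation and maintenance personnel select them on an operation interface, which triggers the data storage device to add them to the storage system, thereby adding storage nodes to the storage system.
S404. The data storage device migrates one or more of the N first data blocks and their M check blocks to the newly added storage nodes.
Specifically, after storage nodes are added, the new storage nodes have more free storage space than the original ones. Part of the data stored on the original storage nodes can therefore be migrated to the new storage nodes; specifically, at least some of the N first data blocks and their M check blocks can be migrated to the new storage nodes, so that data is evenly distributed across the storage nodes and their load is balanced.
For example, take 4 first data blocks and their 2 check blocks. As shown in FIG. 5, before expansion the 4 first data blocks (blocks 1 to 4 in the figure) are stored on nodes 1 to 4 respectively, and the 2 check blocks (blocks 5 and 6, shaded in the figure) are stored on nodes 5 and 6 respectively. After expansion, the first data block on node 4 (block 4) can be migrated to node 7 and the check block on node 6 (block 6) to node 8, reducing the load on nodes 4 and 6.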
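The FIG. 5 migration can be modeled as a change to one stripe's block-to-node map, with a check that no two blocks of the stripe end up on the same node (otherwise a single node failure would cost the stripe two blocks). This is a sketch; `migrate` is a hypothetical helper, and the copy-then-update ordering a real system needs is not modeled.

```python
from __future__ import annotations


def migrate(layout: dict[str, str], moves: dict[str, str]) -> dict[str, str]:
    """Return a new block->node layout for one stripe after applying the given moves.

    Rejects any result in which two blocks of the stripe share a node, since that
    node's failure would then cost the stripe more than one block.
    """
    new_layout = dict(layout)
    for blk, target in moves.items():
        if blk not in new_layout:
            raise KeyError(f"unknown block {blk}")
        new_layout[blk] = target
    if len(set(new_layout.values())) != len(new_layout):
        raise ValueError("two blocks of the same stripe would share a node")
    return new_layout


before = {f"block{i}": f"node{i}" for i in range(1, 7)}       # pre-expansion layout
after = migrate(before, {"block4": "node7", "block6": "node8"})
print(after)
```

The resulting layout (data on nodes 1, 2, 3 and 7, check blocks on nodes 5 and 8) is the one the later FIG. 5 discussion builds on.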
It should be noted that, in a specific implementation, S404 may be skipped to simplify the expansion process of the storage system.
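S405. The data storage device updates the first EC technology to the second EC technology.
The second EC technology is used to store, according to its redundancy ratio, the data blocks that the storage system receives after the first EC technology is updated to the second EC technology. The number S of data blocks in the redundancy ratio of the second EC technology (EC S+R) is greater than the number N of data blocks in the redundancy ratio of the first EC technology (EC N+M), and the ratio of S to R is greater than the ratio of N to M.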
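The two conditions on the new redundancy ratio, plus the optional condition R >= M described for S406, can be checked with integer arithmetic only. Because all four values are positive, S/R > N/M is equivalent to S*M > N*R, which avoids floating-point rounding. The function name is hypothetical.

```python
def is_valid_update(n: int, m: int, s: int, r: int, require_r_ge_m: bool = True) -> bool:
    """Check the conditions for moving from EC N+M to EC S+R.

    S > N and S/R > N/M, the ratio test done as S*M > N*R (valid because all
    values are positive). Optionally also require R >= M.
    """
    if min(n, m, s, r) < 1:
        raise ValueError("N, M, S and R must be positive integers")
    ok = s > n and s * m > n * r
    if require_r_ge_m:
        ok = ok and r >= m
    return ok


print(is_valid_update(4, 2, 6, 2))   # EC 4+2 -> EC 6+2
print(is_valid_update(4, 2, 8, 4))   # EC 4+2 -> EC 8+4: ratio unchanged
```

EC 4+2 to EC 6+2 passes, while EC 4+2 to EC 8+4 fails because 8/4 equals 4/2, so the share of data blocks would not grow.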
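Similarly to the description of the first EC technology above, the second EC technology in the embodiments of this application can be understood as an EC technology that calculates check blocks of data blocks according to the redundancy ratio of the second EC technology (that is, EC S+R). When check blocks are calculated with the second EC technology, data redundancy usually follows the relationship in which S data blocks correspond to R check blocks. In some cases, however, for example when a storage node in the storage system fails, the second EC technology may also perform data redundancy with fewer data blocks (for example, S-1 data blocks corresponding to R check blocks, or S-2 data blocks corresponding to R check blocks). Because of the EC S+R constraint, however, data redundancy is never performed with more data blocks (for example, S+1 or S+2 data blocks corresponding to R check blocks).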
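In other words, as described for the related art above, to improve capacity utilization when a storage system is expanded, one approach is to rebuild the storage system during expansion, which makes expansion complex, requires a large amount of new hardware, and needs borrowed temporary storage space. Another approach is to build the storage system with a redundancy ratio that has more data blocks and check blocks, and in later expansions keep the number of data blocks unchanged while recalculating with fewer check blocks; but this requires a redundancy ratio with more data blocks and check blocks when the storage system is built, which complicates building it. That is, to keep expansion simple, the related art usually does not update the EC technology of a running storage system to one with more data blocks. This application breaks that technical prejudice: when the storage system is expanded, the EC technology of the storage system is directly updated to the second EC technology described above, thereby improving capacity utilization. The complexity of expansion can then be addressed by other technical means on top of this update. For example, as described below, the EC technology of previous data (data stored in the storage system before the EC technology is updated) can be kept unchanged during expansion, so no data migration is needed; only EC metadata is migrated when necessary (for example, when the master node of the storage system changes, the EC metadata of the previous data is migrated to the new master node). On the one hand, data is then read by using the first EC technology for previous data and the second EC technology for data stored after expansion. On the other hand, during a later idle period of the storage system, check blocks are recalculated with the second EC technology for data blocks that originally used the first EC technology, and saved. This avoids having to change the redundancy ratio of previous data to that of the second EC technology during expansion merely to keep the redundancy relationships of stored data consistent, which reduces the complexity of expansion. In addition, because the method provided in the embodiments of this application avoids consuming resources during expansion to change the EC technology of previous data, it shortens the expansion process and improves the read/write performance of the storage system during expansion.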
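The specific manner of determining S and R in the redundancy ratio EC S+R of the second EC technology can be set according to actual requirements in a specific implementation. For example, operation and maintenance personnel may manually configure S and R when expanding the storage system. For another example, during expansion, the data storage device may determine S and R according to the number of storage nodes after expansion. For example, if the storage system includes 6 storage nodes before expansion and the redundancy ratio of the first EC technology is EC4+2, and the storage system includes 8 storage nodes after expansion, the data storage device determines, according to constraints on data reliability or capacity utilization, that the redundancy ratio of the second EC technology is EC6+2.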
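One possible automatic policy for determining S and R from the node count is sketched below. It is an assumption, not the method's rule: keep R = M check blocks and let every remaining node hold one data block, optionally capped. With 8 nodes and M = 2 it yields EC 6+2, matching the example above; a real policy would also apply the data-reliability constraints the text mentions, which are not specified.

```python
from __future__ import annotations


def choose_second_ec(node_count: int, m: int,
                     max_data_blocks: int | None = None) -> tuple[int, int]:
    """Hypothetical policy: keep R = M check blocks (so R >= M holds) and give
    every remaining node one data block, S = node_count - R, optionally capped."""
    r = m
    s = node_count - r
    if max_data_blocks is not None:
        s = min(s, max_data_blocks)
    if s < 1:
        raise ValueError("not enough storage nodes for any data block")
    return s, r


print(choose_second_ec(8, 2))    # 6 -> 8 nodes with EC 4+2 before expansion
```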
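Further, the method may include:
S406. The data storage device calculates R check blocks of S second data blocks according to the second EC technology.
The second data blocks are data blocks received after the first EC technology is updated to the second EC technology.
In an implementation, R is not less than M. Making the number R of check blocks in the redundancy ratio of the second EC technology not less than the number M of check blocks in the first EC technology ensures the reliability of stored data.
S407. The data storage device stores the S second data blocks and their R check blocks in corresponding storage nodes among (S+R) storage nodes of the storage system.
For example, if the redundancy ratio of the first EC technology is EC4+2 and that of the second EC technology is EC6+2, the S second data blocks and their R check blocks may be 6 second data blocks and 2 check blocks. This example assumes that R equals M (R = M = 2); in a specific implementation R and M may differ, which is not limited in the embodiments of this application. Continuing the example of FIG. 5, after the 4 first data blocks are stored on nodes 1, 2, 3 and 7 and their 2 check blocks on nodes 5 and 8, the 6 second data blocks (blocks 7 to 12 in the figure) can be stored on nodes 1 to 6 and their two check blocks (blocks 13 and 14 in the figure) on nodes 7 and 8. That is, the storage system now holds both data stored with the first EC technology (blocks 1 to 6) and data stored with the second EC technology (blocks 7 to 14).
In an implementation, because the storage system in the method provided in the embodiments of this application may simultaneously hold data stored with two (or more) different EC technologies, the method may further include the following steps, as shown in FIG. 6:
S408. The data storage device receives a read request.
The read request requests data stored in the storage system.
When the read request requests data in the N first data blocks, S409 below is performed; when it requests data in the S second data blocks, S410 below is performed.
S409. Read the data in the N first data blocks according to the first EC technology.
For example, when the read request requests data in the N first data blocks, the data storage device reads the metadata of the N first data blocks to determine their storage addresses (which may be physical or logical addresses) and the first EC technology they use, and then reads the requested data from the N first data blocks according to the first EC technology.
S410. Read the data in the S second data blocks according to the second EC technology.
For example, when the read request requests data in the S second data blocks, the data storage device reads the metadata of the S second data blocks to determine their storage addresses (which may be physical or logical addresses) and the second EC technology they use, and then reads the requested data from the S second data blocks according to the second EC technology.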
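Because old and new stripes coexist, the read path has to take the EC scheme from each stripe's own metadata rather than from a global setting. The sketch below assumes a hypothetical `StripeMeta` record listing data-block locations before check-block locations, and an MDS code (such as Reed-Solomon) in which any `data_blocks` surviving blocks suffice to decode; it only plans which blocks to fetch and leaves decoding out. The example stripes follow the FIG. 5 layout: blocks 1 to 6 under EC 4+2 and blocks 7 to 14 under EC 6+2.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StripeMeta:
    """Hypothetical per-stripe metadata: EC scheme plus block locations."""
    data_blocks: int            # N for first-EC stripes, S for second-EC stripes
    check_blocks: int           # M or R
    locations: tuple[str, ...]  # data-block locations first, then check-block locations


def plan_read(meta: StripeMeta,
              failed: frozenset[str] = frozenset()) -> tuple[tuple[int, int], list[str]]:
    """Plan a read of one stripe using the EC scheme recorded in its own metadata."""
    scheme = (meta.data_blocks, meta.check_blocks)
    data_locs = list(meta.locations[:meta.data_blocks])
    if not failed.intersection(data_locs):
        return scheme, data_locs  # all data blocks healthy: read them directly
    # Assuming an MDS code, any `data_blocks` surviving blocks suffice to decode.
    survivors = [loc for loc in meta.locations if loc not in failed]
    if len(survivors) < meta.data_blocks:
        raise OSError("more blocks lost than the stripe's check blocks can cover")
    return scheme, survivors[:meta.data_blocks]


old_stripe = StripeMeta(4, 2, ("node1", "node2", "node3", "node7", "node5", "node8"))
new_stripe = StripeMeta(6, 2, tuple(f"node{i}" for i in range(1, 9)))
print(plan_read(old_stripe))                        # decoded as EC 4+2
print(plan_read(new_stripe, frozenset({"node3"})))  # decoded as EC 6+2, node3 down
```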
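In addition, in an implementation, data stored in the storage system with the first EC technology can be converted into data stored with the second EC technology, keeping the data structure of the storage system consistent and making stored data easier to manage. To this end, as shown in FIG. 7, the method may further include:
S411. The data storage device selects S data blocks from the data blocks stored by the storage system according to the first EC technology (for ease of description, referred to below as S third data blocks), and calculates R check blocks of the S third data blocks according to the second EC technology.
For example, S411 may be performed during an idle period of the storage system to convert data stored with the first EC technology into data stored with the second EC technology, keeping the data structure of the storage system consistent.
An idle period of the storage system may also be called a period in which the operating load of the storage system is below a load threshold. Specifically, an idle period may manifest as one or more of the following: the data currently waiting to be written to the storage system is less than a preset threshold (the "first preset threshold"); the data currently waiting to be read is less than a preset threshold (the "second preset threshold"); or the utilization of related hardware resources of the storage system is lower than a preset threshold (the "third preset threshold").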
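The idle-period test can be sketched as a check over the three thresholds. Because the text says "one or more" of the conditions, the number that must hold is a parameter here; the `LoadSample` fields and the threshold values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """Hypothetical snapshot of storage-system load."""
    pending_write_bytes: int
    pending_read_bytes: int
    hw_utilization: float  # e.g. CPU or disk busy fraction, 0.0 to 1.0


def is_idle(sample: LoadSample, first_threshold: int, second_threshold: int,
            third_threshold: float, min_conditions: int = 1) -> bool:
    """Idle if at least `min_conditions` of the three threshold checks hold."""
    checks = [
        sample.pending_write_bytes < first_threshold,   # first preset threshold
        sample.pending_read_bytes < second_threshold,   # second preset threshold
        sample.hw_utilization < third_threshold,        # third preset threshold
    ]
    return sum(checks) >= min_conditions


busy_reads = LoadSample(pending_write_bytes=0, pending_read_bytes=10**9, hw_utilization=0.9)
print(is_idle(busy_reads, 1 << 20, 1 << 20, 0.3))                    # any one condition
print(is_idle(busy_reads, 1 << 20, 1 << 20, 0.3, min_conditions=3))  # all three required
```

Requiring all three conditions is the more conservative choice: background EC conversion then never competes with heavy foreground reads.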
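S412. The data storage device stores the S third data blocks and their R check blocks in corresponding storage nodes of the storage system.
For example, the S third data blocks and their R check blocks are stored in (S+R) storage nodes of the storage system.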
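S411 and S412 can be sketched as regrouping the data blocks of existing EC N+M stripes into groups of S, each of which then receives R new check blocks. The sketch covers only the grouping; encoding and writing are omitted, and the names are hypothetical. One caveat it surfaces: if leftover blocks belong to a partially regrouped stripe, that stripe's old check blocks are still needed, so they could presumably be released only once all of its data blocks are covered by EC S+R stripes (an assumption, not stated in the application).

```python
from __future__ import annotations


def regroup_for_second_ec(old_stripes: list[list[str]],
                          s: int) -> tuple[list[list[str]], list[str]]:
    """Select 'third data blocks' S at a time from stripes written under EC N+M.

    Returns (groups, leftover): each group holds exactly S data blocks whose R
    check blocks would then be computed under EC S+R; leftover blocks stay in
    their original EC N+M stripes for now.
    """
    pool = [blk for stripe in old_stripes for blk in stripe]
    usable = len(pool) // s * s
    groups = [pool[i:i + s] for i in range(0, usable, s)]
    return groups, pool[usable:]


# Three EC 4+2 stripes (data blocks only) regrouped for EC 6+2.
old = [[f"d{i}_{j}" for j in range(4)] for i in range(3)]
groups, leftover = regroup_for_second_ec(old, s=6)
print(len(groups), leftover)
```

For example, the 12 data blocks of three EC 4+2 stripes (6 check blocks) regroup into two EC 6+2 stripes that need only 4 check blocks.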
上述实现方式中,考虑到在扩容存储系统的场景下,在更新存储系统的EC技术并改变所采用EC技术的冗余配比后,可以对先前数据(即更新EC技术之前存入的数据)保持原来的冗余关系。然后,在存储系统处于空闲时段时,再利用第二EC技术转换先前数据的冗余关系。这样一来可以在保持存储系统的数据结构的一致性的同时,达到降低扩容过程的复杂度、均衡存储系统负载的效果。
另外,当对存储系统中的先前数据进行删除操作时,存储系统也可以回收根据第一EC技术存储的数据所占用的存储空间,然后根据第二EC技术向该存储空间存入数据,从而进一步提升存储系统的容量利用率。
上述主要是以对存储系统进行扩容的场景,为本申请实施例所提供方法进行介绍。在实际应用过程中,本申请也可以用于其他场景中,例如存储系统中初始配置的第一EC技术的冗余配比不合理,采用第二EC技术才更加合理的场景下。因此,如图8所示,本申请实施例所提供的数据存储方法还可以包括:
S501、数据存储装置根据第一EC技术计算N个第一数据块的M个校验块。
其中,关于S501的具体实施过程可以参照上述S401的相应内容,在此不再赘述。
S502、数据存储装置将N个第一数据块和M个校验块分别存储到存储系统中的(N+M)个存储节点。
其中,关于S502的具体实施过程可以参照上述S402的相应内容,在此不再赘述。
S503、数据存储装置将第一EC技术更新为第二EC技术。
也就是说,与上述图4、图6或图7所描述方法不同的是,本方法中可以不应用在存储系统扩容的场景下,而是可以直接对存储系统的EC技术进行更新,以使得存储系统对更新后接收到的数据根据第二EC技术进行存储。
S504、数据存储装置根据第二EC技术计算S个第二数据块的R个校验块。
其中,关于S504的具体实施过程可以参照上述S406的相应内容,在此不再赘述。
S505、数据存储装置将上述S个第二数据块以及S个第二数据块的R个校验块分别存储到存储系统中的(S+R)个存储节点。
其中,关于S505的具体实施过程可以参照上述S407的相应内容,在此不再赘述。
另外,可以理解的是,在例如存储系统中初始配置的第一EC技术的冗余配比不合理,采用第二EC技术才更加合理的场景下,可以采用与上述S408-S410同理的方式,读取存储系统中的数据;另外还可以采用与上述S411-S412同理的方式,将按照第一EC技术存储的数据转换为按照第二EC技术存储的数据。也就是说,本申请提供的各个方法流程之间是可以关联的,并且可以相互参考或引用。
另外,本申请实施例中,数据存储装置可以执行本申请实施例中的部分或全部步骤,这些步骤或操作仅是示例,本申请实施例中,还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照本申请实施例呈现的不同的顺序来执行,并且有可能并非要执行本申请实施例中的全部操作。
可以理解的是,为了实现上述实施例中功能,数据存储装置包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行, 取决于技术方案的特定应用场景和设计约束条件。
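FIG. 9 is a schematic structural diagram of another data storage apparatus according to this application. The data storage apparatus 600 may be configured to implement the functions of the steps in the foregoing method embodiments, and can therefore also achieve the beneficial effects of those embodiments. In the embodiments of this application, the data storage apparatus 600 may be a storage server with management and control functions in the distributed storage system shown in FIG. 2, or part of the hardware inside such a storage server. Alternatively, the data storage apparatus may be a storage engine in a centralized storage system, or part of the hardware inside the storage engine.
As shown in FIG. 9, the data storage apparatus 600 includes a processing unit 601, a read/write unit 602, and a receiving unit 603. The data storage apparatus 600 is configured to implement the functions of the steps in the method embodiments shown in FIG. 4 or FIG. 6 to FIG. 8.
For example, when the data storage apparatus 600 is used to implement the method shown in FIG. 4, the processing unit 601 is configured to perform one or more of S401, S403, S405, or S406, and the read/write unit 602 is configured to perform one or more of S402, S404, or S407.
For another example, when the data storage apparatus 600 is used to implement the method shown in FIG. 6, the receiving unit 603 is configured to perform S408, and the read/write unit 602 is further configured to perform one or more of S409 or S410.
For another example, when the data storage apparatus 600 is used to implement the method shown in FIG. 7, the processing unit 601 is further configured to perform S411, and the read/write unit 602 is further configured to perform S412.
For another example, when the data storage apparatus 600 is used to implement the method shown in FIG. 8, the processing unit 601 is configured to perform one or more of S501, S503, or S504, and the read/write unit 602 is configured to perform one or more of S502 or S505.
For more detailed descriptions of the processing unit 601, the read/write unit 602, and the receiving unit 603, refer directly to the related descriptions in the method embodiments shown in FIG. 4 or FIG. 6 to FIG. 8. Details are not repeated here.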
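The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a RAM, a flash memory, a ROM, a PROM, an EPROM, an EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from and write information to the storage medium. Certainly, the storage medium may alternatively be a component of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device or a terminal device. Certainly, the processor and the storage medium may alternatively exist as discrete components in a network device or a terminal device.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are performed completely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer programs or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, an SSD.
In the embodiments of this application, unless otherwise specified or logically conflicting, terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and technical features in different embodiments may be combined according to their inherent logical relationships to form new embodiments.
The terms "first", "second", "third", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between different objects, not to indicate a particular order. In the embodiments of this application, words such as "exemplary" or "for example" are used to give an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner.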
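In this application, "at least one" means one or more, and "a plurality of" means two or more; other quantifiers are interpreted similarly. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, both A and B exist, or only B exists. In addition, an element preceded by the singular article "a", "an", or "the" does not mean "one and only one" unless the context clearly dictates otherwise, but rather "one or more". For example, "a device" means one or more such devices. Further, "at least one of ..." means one of the subsequently listed objects or any combination of them; for example, "at least one of A, B, and C" includes A, B, C, AB, AC, BC, or ABC. In the text of this application, the character "/" generally indicates an "or" relationship between the associated objects; in the formulas of this application, the character "/" indicates a division relationship between them.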
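The "at least one of" convention enumerates the non-empty combinations of the listed items. The short check below (the function name `at_least_one_of` is hypothetical) reproduces the seven cases listed for A, B, and C and the three cases for "A and/or B".

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """All non-empty combinations of items, ordered by size, then by listing order."""
    return ["".join(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


print(at_least_one_of(["A", "B", "C"]))  # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
print(at_least_one_of(["A", "B"]))       # ['A', 'B', 'AB']
```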
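It can be understood that the numerical labels in the embodiments of this application are used only to distinguish items for ease of description and are not intended to limit the scope of the embodiments. The sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic.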

Claims (18)

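  1. A data storage method in a storage system, comprising:
    calculating M parity blocks of N first data blocks according to a first erasure code (EC) technique;
    storing the N first data blocks and the M parity blocks of the N first data blocks in corresponding storage nodes among (N+M) storage nodes in the storage system;
    updating the first erasure code technique to a second erasure code technique;
    calculating R parity blocks of S second data blocks according to the second erasure code technique, wherein S is greater than N, the ratio of S to R is greater than the ratio of N to M, S, R, N, and M are all positive integers, and the second data blocks are data blocks received after the first erasure code technique is updated to the second erasure code technique; and
    storing the S second data blocks and the R parity blocks of the S second data blocks in corresponding storage nodes among (S+R) storage nodes in the storage system.
  2. The method according to claim 1, wherein the method further comprises:
    selecting S target data blocks from data blocks stored in the storage system according to the first erasure code technique, and calculating R parity blocks of the S target data blocks according to the second erasure code technique; and
    storing the S target data blocks and the R parity blocks of the S target data blocks in corresponding storage nodes of the storage system.
  3. The method according to claim 1 or 2, wherein R is not less than M.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    receiving a read request;
    when the read request requests data in the N first data blocks, reading the data in the N first data blocks according to the first erasure code technique; and
    when the read request requests data in the S second data blocks, reading the data in the S second data blocks according to the second erasure code technique.
  5. The method according to any one of claims 1 to 4, wherein before the first erasure code technique is updated to the second erasure code technique, the method further comprises:
    adding a storage node to the storage system.
  6. The method according to claim 5, wherein after the storage node is added to the storage system, the method further comprises:
    migrating one or more of the N first data blocks and the M parity blocks of the N first data blocks to the newly added storage node.
  7. The method according to any one of claims 1 to 6, wherein the storage node is any one of a hard disk, a disk enclosure, or a storage server.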
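  8. A data storage apparatus, comprising:
    a processing unit, configured to calculate M parity blocks of N first data blocks according to a first erasure code (EC) technique; and
    a read/write unit, configured to store the N first data blocks and the M parity blocks of the N first data blocks in corresponding storage nodes among (N+M) storage nodes in a storage system, wherein
    the processing unit is further configured to update the first erasure code technique to a second erasure code technique;
    the processing unit is further configured to calculate R parity blocks of S second data blocks according to the second erasure code technique, wherein S is greater than N, the ratio of S to R is greater than the ratio of N to M, S, R, N, and M are all positive integers, and the second data blocks are data blocks received after the first erasure code technique is updated to the second erasure code technique; and
    the read/write unit is further configured to store the S second data blocks and the R parity blocks of the S second data blocks in corresponding storage nodes among (S+R) storage nodes in the storage system.
  9. The data storage apparatus according to claim 8, wherein the processing unit is further configured to select S target data blocks from data blocks stored in the storage system according to the first erasure code technique, and calculate R parity blocks of the S target data blocks according to the second erasure code technique; and
    the read/write unit is further configured to store the S target data blocks and the R parity blocks of the S target data blocks in corresponding storage nodes of the storage system.
  10. The data storage apparatus according to claim 8 or 9, wherein R is not less than M.
  11. The data storage apparatus according to any one of claims 8 to 10, wherein the data storage apparatus further comprises:
    a receiving unit, configured to receive a read request, wherein
    the read/write unit is configured to, when the read request requests data in the N first data blocks, read the data in the N first data blocks according to the first erasure code technique; and
    the read/write unit is further configured to, when the read request requests data in the S second data blocks, read the data in the S second data blocks according to the second erasure code technique.
  12. The data storage apparatus according to any one of claims 8 to 11, wherein the processing unit is further configured to add a storage node to the storage system before the first erasure code technique is updated to the second erasure code technique.
  13. The data storage apparatus according to claim 12, wherein after the storage node is added to the storage system, the read/write unit is further configured to migrate one or more of the N first data blocks and the M parity blocks of the N first data blocks to the newly added storage node.
  14. The data storage apparatus according to any one of claims 8 to 13, wherein the storage node is any one of a hard disk, a disk enclosure, or a storage server.
  15. A data storage apparatus, comprising a processor and an interface, wherein the processor receives or sends data through the interface, and the processor is configured to implement the method according to any one of claims 1 to 7.
  16. A storage system, comprising the data storage apparatus according to any one of claims 8 to 15 and a plurality of storage nodes.
  17. A computer-readable storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
  18. A computer program product, wherein the computer program product comprises instructions, and when the instructions are run on a processor, the method according to any one of claims 1 to 7 is implemented.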
PCT/CN2022/080193 2021-07-22 2022-03-10 Data storage method and apparatus in storage system WO2023000686A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22844859.3A EP4369170A1 (en) 2021-07-22 2022-03-10 Method and apparatus for data storage in storage system
US18/418,737 US20240160528A1 (en) 2021-07-22 2024-01-22 Data Storage Method and Apparatus in Storage System

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110831638.X 2021-07-22
CN202110831638.XA CN115686342A (zh) 2021-07-22 2021-07-22 Data storage method and apparatus in storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/418,737 Continuation US20240160528A1 (en) 2021-07-22 2024-01-22 Data Storage Method and Apparatus in Storage System

Publications (1)

Publication Number Publication Date
WO2023000686A1 true WO2023000686A1 (zh) 2023-01-26

Family

ID=84978869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/080193 WO2023000686A1 (zh) 2021-07-22 2022-03-10 Data storage method and apparatus in storage system

Country Status (4)

Country Link
US (1) US20240160528A1 (zh)
EP (1) EP4369170A1 (zh)
CN (1) CN115686342A (zh)
WO (1) WO2023000686A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150378820A1 (en) * 2014-06-25 2015-12-31 Quantum Corporation High Reliability Erasure Code Distribution
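CN109426590A (zh) * 2017-09-01 2019-03-05 Alibaba Group Holding Limited Method for storing data by a data node and method for recovering data
US20200250031A1 (en) * 2019-02-05 2020-08-06 Alibaba Group Holding Limited Method and system for mitigating read disturb impact on persistent memory
CN112115001A (zh) * 2020-09-18 2020-12-22 Shenzhen Huantai Technology Co., Ltd. Data backup method and apparatus, computer storage medium, and electronic device
CN113918083A (zh) * 2020-07-10 2022-01-11 Huawei Technologies Co., Ltd. Stripe management method, storage system, stripe management apparatus, and storage medium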

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
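CN117075821A (zh) * 2023-10-13 2023-11-17 Hangzhou Youyun Technology Co., Ltd. Distributed storage method and apparatus, electronic device, and storage medium
CN117075821B (zh) * 2023-10-13 2024-01-16 Hangzhou Youyun Technology Co., Ltd. Distributed storage method and apparatus, electronic device, and storage medium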

Also Published As

Publication number Publication date
EP4369170A1 (en) 2024-05-15
US20240160528A1 (en) 2024-05-16
CN115686342A (zh) 2023-02-03

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22844859

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022844859

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022844859

Country of ref document: EP

Effective date: 20240207

NENP Non-entry into the national phase

Ref country code: DE