WO2019028799A1 - Data access method, apparatus, and system - Google Patents
Data access method, apparatus, and system
- Publication number
- WO2019028799A1 (PCT/CN2017/096958)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- storage node
- read
- node
- storage
Classifications
- (All within G06F: Electric digital data processing)
- G06F3/0604: Improving or facilitating administration, e.g. storage management
- G06F3/061: Improving I/O performance
- G06F3/0611: Improving I/O performance in relation to response time
- G06F3/0617: Improving the reliability of storage systems in relation to availability
- G06F3/0629: Configuration or reconfiguration of storage systems
- G06F3/064: Management of blocks
- G06F3/0653: Monitoring storage devices or systems
- G06F3/0656: Data buffering arrangements
- G06F3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F3/0665: Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
- G06F3/0689: Disk arrays, e.g. RAID, JBOD
- G06F11/2221: Detection or location of defective computer hardware by testing during standby operation or during idle time, using arrangements specific to the hardware being tested, to test input/output devices or peripheral units
- G06F12/1009: Address translation using page tables, e.g. page table structures
Definitions
- the present application relates to the field of storage technologies, and in particular, to a data access method, apparatus, and system.
- FIG. 1 is a schematic structural diagram of a storage system provided by the prior art.
- the storage system is connected to the host through two switches.
- the storage system also includes a plurality of dual control arrays coupled to each switch.
- Each dual control array includes two storage controllers and a plurality of hard disk drives (HDDs) connected to each storage controller.
- Two storage controllers are connected through a redundant mirror channel to implement mirroring operations in the write data flow.
- Each dual control array acts as a dual control array unit, and each dual control array unit corresponds to a part of the logical block address (LBA) range of the host.
- A read/write request sent by the host is forwarded by the switch to the dual control array unit corresponding to the LBA carried in the request; that dual control array unit then performs the read/write operation locally.
- The system architecture shown in Figure 1 was designed around HDDs. With the development of solid state disks (SSDs) based on non-volatile memory express (NVMe), SSD performance has improved by a factor of hundreds or even thousands. For example, Intel's P3600 NVMe SSD reaches 450,000 read-only IOPS and 70,000 write-only IOPS, where IOPS is the abbreviation for the number of input/output operations per second.
- Because the processing in the architecture shown in FIG. 1 is concentrated in two storage controllers of limited processing capability, the dual-control-array storage architecture of FIG. 1 is no longer suitable for a storage system that uses NVMe SSDs as storage media, and a new system architecture is urgently needed.
- the embodiment of the present application provides a data access method, apparatus, and system for providing a storage system suitable for using an NVMe SSD as a storage medium.
- A first aspect provides a data access method applied to a first storage node in a storage system, where the first storage node communicates, through a switch, with a host and with at least one second storage node in the storage system, and a physical disk included in the at least one second storage node is mapped to a virtual disk of the first storage node.
- The method may include: receiving a first write request carrying first data to be written; striping the first data to be written to obtain stripe data; writing the stripe data into a physical disk and/or a virtual disk of the first storage node; and recording the write location of the stripe data.
- the first storage node may be any one of the storage nodes in the storage system.
- the first write request received by the first storage node may be a first write request sent by the host, or may be a first write request from the host forwarded by any one of the second storage nodes.
- Part or all of the physical disks (for example, memory chips) included in each storage node may be mapped to other storage nodes as virtual disks of those nodes, for example but not limited to by means of the NOF protocol. Thus, compared with the prior art, the solution is freed from the limitation of the processing capability of the CPU or the storage controller in a dual control array, and the processing capability of the storage system can be greatly improved.
- In a possible design, the stripe data is written to the physical disk of the second storage node that maps the virtual disk. Specifically, the first storage node sends the corresponding stripe data to the corresponding second storage node, and the second storage node stores the received data in its local disk (i.e., the physical disk mapped to the virtual disk).
- the fingerprint of the first data to be written is also recorded when the write position of the stripe data is recorded.
- the write position of the stripe data and the fingerprint of the first data to be written are recorded in the distribution information of the first data to be written.
- the LBA of the first data to be written is also recorded, wherein the LBA is the LBA carried in the write request.
- the write position of the stripe data is recorded in the distribution information of the first data to be written, and the LBA of the first data to be written.
- In a possible design, the first storage node may also perform other steps. For example, the first storage node may receive a second write request sent by the host, where the second write request carries second data to be written; it then determines the home node of the second write request according to the second write request. If the home node of the second write request is the first storage node, the first storage node performs a write operation for the second write request; if the home node is a second storage node, the first storage node forwards the second write request to that second storage node so that the second storage node performs the write operation.
- For the write operation, reference may be made to the technical solutions provided above or the specific implementations below; details are not described again here.
- In a possible design, determining the home node of the second write request according to the second write request may include: calculating the fingerprint of the second data to be written, and then determining the home node of the second write request according to that fingerprint.
- the home node of the second write request is specifically the home node of the second data to be written.
- In a possible design, the method may further include: determining the home node of the LBA carried by the second write request, where the home node of the LBA is configured to manage the mapping relationship between the LBA and the fingerprint of the second data to be written.
- In this design, determining the home node of the second write request according to the second write request may include: determining, according to the LBA carried by the second write request, the home node of the second write request.
- the home node of the second write request is specifically the home node of the LBA carried by the second write request.
- the above is an example of the steps performed by the first storage node in the write data flow.
- the following describes the steps performed by the first storage node in the read data flow.
- In the read data flow, the first storage node receives the fingerprint of the first data to be read requested by a first read request; then, according to that fingerprint, it obtains the write location of the first data to be read and reads the stripe data of the first data to be read from that write location.
- the mapping relationship between the write location of the first data to be written and the fingerprint of the first data to be written is stored in the first storage node.
- In a possible design, the first storage node receives a first read request, where the first read request carries a first LBA; then, according to the first LBA, it obtains the write location of the first data to be read requested by the first read request, and reads the stripe data of the first data to be read from that write location.
- the first storage node stores a mapping relationship between the write location of the first data to be written and the first LBA.
- the first storage node may perform other steps:
- The first storage node receives a second read request sent by the host, and determines the home node of the second read request according to the second read request. If the home node of the second read request is the first storage node, the first storage node performs a read operation for the second read request; if the home node of the second read request is a second storage node, the first storage node forwards the second read request to that second storage node, so that the second storage node performs the read operation.
- the read operation reference may be made to the technical solution provided above or the specific implementation manners below, and details are not described herein again.
- In a possible design, determining the home node of the second read request according to the second read request may include: determining the home node of the LBA carried by the second read request, where the home node of the LBA is used to manage the mapping between the LBA and the fingerprint of the second data to be read requested by the second read request; then acquiring the fingerprint of the second data to be read from the home node of the LBA; and determining the home node of the second read request according to the fingerprint of the second data to be read.
- Alternatively, the home node of the LBA carried by the second read request is determined, and then both the fingerprint of the second data to be read and the home node of the second read request are obtained from that home node.
- the home node of the second read request is specifically the home node of the second data to be read.
- determining the home node of the second read request according to the second read request may include: determining, according to the LBA carried by the second read request, the home node of the second read request.
- the home node of the second read request is specifically the home node of the LBA carried by the second read request.
- In a second aspect, a storage node is provided. The storage node may be divided into function modules according to the foregoing method examples.
- For example, each function module may correspond to one function, or two or more functions may be integrated into one processing module.
- In a third aspect, a storage node is provided, comprising a memory and a processor, where the memory is configured to store a computer program that, when executed by the processor, causes any of the methods provided in the first aspect to be performed.
- The memory may be a main memory and/or a memory chip, or the like. The processor may be a CPU and/or a storage controller, or the like.
- In a fourth aspect, a data access system is provided, comprising any one of the storage nodes provided in the second aspect or the third aspect, where the storage node communicates, through a switch, with a host and with at least one second storage node in the storage system, and a physical disk included in the at least one second storage node is mapped to a virtual disk of the storage node.
- the present application also provides a computer readable storage medium having stored thereon a computer program that, when executed on a computer, causes the computer to perform the method described in the first aspect above.
- the application also provides a computer program product, when run on a computer, causing the computer to perform the method of any of the above aspects.
- The present application also provides a communication chip in which instructions are stored that, when run on a storage node, cause the storage node to perform the method described in the first aspect above.
- Any of the apparatuses, computer storage media, or computer program products provided above is used to perform the corresponding method provided above; therefore, for the beneficial effects it can achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
- FIG. 1 is a schematic structural diagram of a storage system provided by the prior art
- FIG. 2 is a schematic diagram of a system architecture applicable to the technical solution provided by the embodiment of the present application.
- FIG. 3 is a schematic diagram of mapping between a physical disk and a virtual disk according to an embodiment of the present disclosure
- FIG. 4a is a front view of a hardware form of the system architecture shown in FIG. 2;
- FIG. 4b is a rear view of a hardware form of the system architecture shown in FIG. 2;
- FIG. 4c is a top view of a hardware form of the system architecture shown in FIG. 2;
- FIG. 5 is a schematic diagram of an expanded system architecture of the system architecture shown in FIG. 2;
- FIG. 6 is a flowchart 1 of a method for writing data according to an embodiment of the present application.
- FIG. 7 is a flowchart of a method for reading data based on FIG. 6 according to an embodiment of the present application.
- FIG. 8 is a second flowchart of a method for writing data according to an embodiment of the present disclosure.
- FIG. 9 is a flowchart of a method for reading data based on FIG. 8 according to an embodiment of the present application.
- FIG. 10 is a third flowchart of a method for writing data according to an embodiment of the present disclosure.
- FIG. 11 is a flowchart of a method for reading data based on FIG. 10 according to an embodiment of the present application.
- FIG. 12 is a flowchart 4 of a method for writing data according to an embodiment of the present disclosure.
- FIG. 13 is a flowchart of a method for reading data based on FIG. 12 according to an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a storage node according to an embodiment of the present disclosure.
- the term "plurality” as used herein refers to two or more.
- the terms “first”, “second”, etc. are used herein only to distinguish different objects, and the order is not limited.
- For example, the first storage node and the second storage node are used to distinguish different objects, and no order is implied.
- The term "and/or" herein merely describes an association between associated objects, indicating that three relationships are possible; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone.
- the character "/" in this article generally indicates that the contextual object is an "or" relationship.
- FIG. 2 is a schematic diagram showing a system architecture to which the technical solution provided by the present application is applied.
- the system architecture shown in FIG. 2 can include a host 1 and a storage system 2.
- the storage system 2 may include a switch 21, and a plurality of storage nodes 22 connected to the switch 21, respectively. It can be understood that, in order to improve reliability, at least two switches 21 can be generally disposed in the storage system 2, and each storage node 22 is connected to each switch 21 at this time.
- the storage system 2 includes two switches 21 as an example for description.
- The switch 21 is configured to connect the storage nodes 22 to one another and to connect the storage nodes 22 to the host 1.
- The switch 21 can be, for example but not limited to, an Ethernet switch, an InfiniBand switch, a PCIe switch, or the like.
- The switch 21 may include an internal switch port 211 and a storage service port 212, and may also include an expansion port 213.
- the internal switch port 211 is a port that connects to the storage node 22.
- One or more internal switch ports 211 may be provided on each switch 21, and each internal switch port 211 may be connected to an internal port 220 of one storage node 22.
- the storage service port 212 is a port connected to the host 1 for providing external storage services.
- One or more storage service ports 212 may be provided on each switch 21.
- The expansion port 213 is used to connect to other switches 21 to implement horizontal expansion across multiple storage systems 2. It should be noted that the above port types are distinguished by use; in physical implementation, these ports may be identical.
- For example, the expansion port 213 can be used as a storage service port 212, and the internal switch port 211 can be used as a storage service port 212 or an expansion port 213; other examples are not enumerated.
- The usage can be set according to the hardware form of the storage system. For example, in the hardware form shown in FIGS. 4a to 4c, since the internal switch ports 211 are located inside the chassis while the storage service ports 212 and the expansion ports 213 are located on the chassis surface, an internal switch port 211 is generally not used as a storage service port 212 or an expansion port 213.
- the storage node 22 is a core component of the storage system that provides input/output (I/O) processing capabilities and storage space.
- one or more internal ports 220 may be disposed on each storage node 22, wherein the internal ports 220 are ports connecting the internal switch ports 211 of the switch 21, and each internal port 220 may be connected to one switch 21.
- The internal port 220 can be provided, for example but not limited to, by a remote direct memory access (RDMA) network card. If the switch 21 is an Ethernet switch, a redundant Ethernet network (the internal Ethernet of the storage system 2) is formed, which helps maintain connectivity when any port, connection, or switch fails.
- the storage node shown in FIG. 2 includes an execution and I/O module 221, and one or more storage modules 222 that are coupled to the I/O module 221.
- the execution and I/O module 221 is responsible for the input and output of I/O requests (including read/write requests) and the execution of related processing flows.
- For example, the execution and I/O module 221 may be at least one central processing unit (CPU) connected to at least one RDMA network card through an I/O bus.
- An internal port 220 is provided on the RDMA network card to connect to the switch 21.
- The I/O bus can be, for example but not limited to, peripheral component interconnect express (PCIe).
- Some or all of the CPU, the I/O bus, and the RDMA network card may be integrated, for example as a system on chip (SoC) or a field-programmable gate array (FPGA); they may also be general-purpose devices, for example a general-purpose CPU (such as a Xeon CPU) together with a general-purpose RDMA network card.
- the execution and I/O module 221 is connected to the memory module 222 via an internal I/O bus.
- the storage module 222 can include at least one storage controller and a plurality of storage chips connected to each storage controller.
- The memory chip may be a NAND flash chip, or another non-volatile memory chip such as a phase change memory (PCM), a magnetic random access memory (MRAM), or a resistive random access memory (RRAM).
- The storage controller can be an application specific integrated circuit (ASIC) chip or an FPGA.
- The physical form of the storage module here may be a general-purpose solid state drive (SSD); alternatively, the storage controller and the memory chips may be connected to the execution and I/O module 221 through the I/O bus.
- When the execution and I/O module 221 and the storage module 222 are implemented with common components, such as a general-purpose CPU (for example, an X86 Xeon), a general-purpose RDMA network card, and general-purpose SSDs, the storage node 22 is a general-purpose server.
- The host and the storage system can communicate through the NOF (NVMe over Fabrics) protocol.
- Some or all of the physical disks (e.g., memory chips) included in each storage node may be mapped to other storage nodes as virtual disks of those nodes, for example based on the NOF protocol, so that the software system in a storage node (i.e., the instructions executed by the CPU or the storage controller) can use the virtual disks as local disks. In this way, the present application provides a distributed storage system.
- In the system, different storage nodes communicate with each other through the switches, and access each other's disks through the RDMA network cards and the RDMA capability provided by the NOF protocol.
- Figure 3 shows a schematic diagram of a mapping between a physical disk and a virtual disk.
- Figure 3 takes as an example a storage system that includes 16 storage nodes, numbered 1 to 16, in which each of the storage nodes 2 to 16 maps a physical disk to the storage node 1 as a virtual disk of the storage node 1.
- the description will be made by taking an example in which the memory chip is an NVMe SSD.
- One implementation in which each of the storage nodes 2 to 16 maps its physical disk to the storage node 1 is as follows: when the storage system is initialized, the storage nodes 2 to 16 are each configured with information on the physical disks that are allowed to be mapped to the storage node 1, and each then establishes a connection with the storage node 1.
- Through these connections, the storage node 1 can obtain the information on the physical disks that the storage nodes 2 to 16 allow to be mapped to it, assign a drive letter to each physical disk mapped to the storage node 1 as a virtual disk of the storage node 1, and record the mapping relationship between the virtual disk and the remote physical disk.
- In this way, the software system on the storage node 1 perceives 16 NVMe SSDs, but only one NVMe SSD is actually local; the remaining 15 NVMe SSDs are virtual disks presented by the NVMe SSDs of other storage nodes through the NOF protocol. Due to the low-latency nature of the NOF protocol, the performance difference between accessing a local disk (i.e., a physical disk) and a virtual disk is negligible.
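The mapping relationship that the storage node 1 records between each virtual disk and the remote physical disk might be kept in a table along these lines (a minimal Python sketch; the field names and the `nvme0n1` identifier are illustrative assumptions, not prescribed by this application):

```python
from dataclasses import dataclass

@dataclass
class VirtualDisk:
    drive_letter: str   # identifier assigned by storage node 1
    remote_node: int    # storage node that owns the backing physical disk
    remote_disk: str    # e.g. an NVMe namespace ID on the remote node

# Mapping table kept by storage node 1: besides its one local NVMe SSD, it
# records 15 virtual disks backed by the NVMe SSDs of storage nodes 2..16,
# reached over the NOF protocol.
virtual_disks = {
    f"vd{n}": VirtualDisk(f"vd{n}", remote_node=n, remote_disk="nvme0n1")
    for n in range(2, 17)
}

print(virtual_disks["vd3"])  # backed by the physical disk of storage node 3
```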
- The goal of the NOF protocol is to decouple the NVMe SSD from the local computer system: a remote NVMe SSD can be connected to the local computer using an RDMA network card and "seen" by the local computer system as a virtual NVMe SSD. Because RDMA is used, the performance of the remote NVMe SSD (i.e., the virtual NVMe SSD) is essentially the same as that of a local NVMe SSD (i.e., a physical NVMe SSD). NOF inherits all NVMe commands and adds some administrative commands, such as Authentication Send, Authentication Receive, Connect, Property Get, and Property Set.
- The data transfer mode and process of the NOF protocol differ somewhat from the original NVMe protocol: RDMA is used to transfer commands (such as read/write requests) and data, instead of the PCIe memory-space mapping used by NVMe, because in a NOF system the initiator and the target cannot "see" each other's memory space.
- In one implementation, the initiator may be a host and the target may be a storage node; in another implementation, the initiator can be a storage node and the target another storage node.
- The read data flow may include: after receiving a read request (i.e., a READ command), the target obtains from the read request the address information of the initiator's cache into which data is to be written. The target then initiates RDMA_WRITE to the initiator to write the read data into the host cache, and then initiates RDMA_SEND to the initiator to inform it that the transfer is complete.
- The write data flow may include: the initiator assembles the write request (i.e., the Write command) and sends it to the target through RDMA_SEND. After the target receives the write request, it initiates RDMA_READ and obtains the data to be written from the initiator. After the target receives the data returned by the initiator, it initiates RDMA_SEND to the initiator to notify it that the transfer is complete.
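The two flows can be summarized in a small sketch, with the RDMA verbs reduced to method stubs (RDMA_SEND, RDMA_READ, and RDMA_WRITE are the operations named in the text; the Python classes and addresses are illustrative assumptions):

```python
class Endpoint:
    """Minimal stand-in for a NOF initiator or target; methods model RDMA verbs."""
    def __init__(self, name: str):
        self.name = name
        self.memory = {}   # registered memory the peer can access by address
        self.disk = {}     # persistent storage (target side)

    def rdma_send(self, peer: "Endpoint", msg) -> None:
        print(f"RDMA_SEND {self.name} -> {peer.name}: {msg}")

    def rdma_read(self, peer: "Endpoint", addr):
        return peer.memory[addr]          # pull directly from peer memory

    def rdma_write(self, peer: "Endpoint", addr, data) -> None:
        peer.memory[addr] = data          # push directly into peer memory

def nof_write(initiator: Endpoint, target: Endpoint, addr: int, data: bytes):
    initiator.memory[addr] = data                          # data staged by initiator
    initiator.rdma_send(target, ("WRITE", addr))           # Write command via RDMA_SEND
    target.disk[addr] = target.rdma_read(initiator, addr)  # target pulls the data
    target.rdma_send(initiator, "transfer complete")

def nof_read(initiator: Endpoint, target: Endpoint, addr: int, cache_addr: int):
    initiator.rdma_send(target, ("READ", addr, cache_addr))      # READ command
    target.rdma_write(initiator, cache_addr, target.disk[addr])  # into host cache
    target.rdma_send(initiator, "transfer complete")

host, node = Endpoint("initiator"), Endpoint("target")
nof_write(host, node, 0x10, b"hello")
nof_read(host, node, 0x10, 0x20)
assert host.memory[0x20] == b"hello"
```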
- the present application does not limit the hardware configuration of the switch 21 and the storage node 22.
- the storage node 22 and the switch 21 can exist in one chassis.
- For example, when the storage node 22 is implemented by a general-purpose server, the storage node 22 and the switch 21 may exist in one rack.
- the chassis may include one or more switches, one or more power sources, multiple storage nodes, and a backplane connecting the storage node 22 and the switch 21.
- Figures 4a-4c show a hardware form of the system architecture shown in Figure 2.
- 4a shows a front view of a rack frame
- FIG. 4b shows a rear view of the rack frame
- FIG. 4c shows a top view of the rack frame.
- In this hardware form, the rack frame includes 2 Ethernet switches, 4 redundant power supplies, 16 storage nodes, and a backplane that connects the storage nodes to the Ethernet switches.
- The rack frame is provided with 16 empty slots, and each slot can accept one storage node. The slots need not be fully populated; to meet redundancy requirements, a minimum of 2 storage nodes should be inserted.
- One or more handle bars are disposed on each storage node for inserting the storage node into the empty slot.
- the storage service port and the expansion port are provided through an Ethernet switch.
- The storage service ports here can be ports of various Ethernet speeds (such as 10G/40G/25G/50G/100G); the expansion ports can be high-speed ports (such as 40G/50G/100G).
- the internal ports of the Ethernet switch and the internal ports of the storage node can be connected through the backplane.
- It should be noted that FIG. 4a to FIG. 4c show only one example of the hardware form of the system architecture shown in FIG. 2 and do not constitute a limitation on that hardware form.
- Figure 5 provides an extended system architecture.
- the storage system provides a storage service externally through the storage service port, and is connected to the M hosts through a network, where M is an integer greater than or equal to 1.
- the network here can consist of a network that is directly connected or through a switch. If the network is an Ethernet network, the external services of the storage system may be provided using an Ethernet-based storage protocol, including but not limited to any of the following: iSCSI, NOF, iSER, NFS, Samba, and the like.
- the storage system can also be scaled horizontally through the expansion port, as shown in FIG. 5, including N storage systems, where N is an integer greater than or equal to 2.
- scale-out can be connected through a directly connected network or through a switch.
- In the following embodiments, a storage system including 16 storage nodes, numbered 1 to 16, is used as an example. It should also be noted that the steps performed by each storage node may be performed by the CPU and/or the storage controller in that storage node.
- FIG. 6 is a flowchart of a method for writing data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S101 The host sends a write request to the storage system, where the write request includes LBA1 and data to be written.
- the storage node 1 in the storage system receives the write request. Specifically, the host forwards the write request to the storage node 1 via the switch.
- the storage node 1 may be any storage node in the storage system.
- the write request is backed up.
- It can be understood that S102 to S103 are optional steps.
- the storage node 1 sends the write request to the mirror node of the storage node 1, such as the storage node 2. Specifically, the storage node 1 sends the write request to the storage node 2 via the switch. The storage node 2 receives the write request.
- Any two storage nodes in the storage system may be mirror nodes of each other.
- Which two storage nodes are mirror nodes of each other may be preset according to a certain rule, for example but not limited to, a rule that implements load balancing.
- The load balancing here means that the mirroring operations are shared as evenly as possible among the storage nodes. For example, storage nodes with adjacent numbers may be set as each other's mirror nodes: the storage node 1 and the storage node 2 are mirror nodes of each other, the storage node 3 and the storage node 4 are mirror nodes of each other, and so on, as sketched below.
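The adjacent-number rule can be stated in two lines (a sketch of one possible rule; nothing in this application mandates this particular pairing):

```python
def mirror_node(node_id: int) -> int:
    """One possible pairing rule: adjacent numbers mirror each other
    (1<->2, 3<->4, ...). Assumes an even node count; the application only
    requires that mirroring work be spread evenly across the nodes."""
    return node_id + 1 if node_id % 2 == 1 else node_id - 1

assert mirror_node(1) == 2 and mirror_node(2) == 1
assert mirror_node(3) == 4 and mirror_node(16) == 15
```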
- the storage node 2 caches the write request, and returns a mirror completion indication to the storage node 1, and the storage node 1 receives the mirror completion indication.
- the storage system sends a write operation completion indication to the host.
- Subsequently, some or all of the following steps S105 to S118 are executed in the storage system to complete the writing of the data to be written carried in the write request.
- S105 The storage node 1 generates a fingerprint of the data to be written, and determines the home node of the data to be written, such as the storage node 3, according to the fingerprint.
- the fingerprint of the data is used to uniquely mark the characteristics of the data.
- the fingerprint of the data can be understood as the identity (ID) of the data. If the fingerprints of the two data are the same, the two data are considered to be the same. If the fingerprints of the two data are different, the two data are considered to be different.
- The present application does not limit how the fingerprint of the data is calculated; for example, it may be obtained by performing a hash operation on the data, where the hash operation may be, for example but not limited to, secure hash algorithm 1 (SHA-1) or cyclic redundancy check (CRC) 32, where CRC32 is a specific implementation of CRC that generates a 32-bit check value. Taking SHA-1 as an example, hashing the data yields a 160-bit digest, which is the fingerprint of the data.
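As a concrete illustration (a sketch using Python's standard hashlib/zlib; the function names are our own):

```python
import hashlib
import zlib

def fingerprint_sha1(data: bytes) -> bytes:
    """160-bit SHA-1 digest used as the fingerprint of the data."""
    return hashlib.sha1(data).digest()

def fingerprint_crc32(data: bytes) -> int:
    """32-bit CRC check value, the other example named in the text."""
    return zlib.crc32(data)

fp = fingerprint_sha1(b"example payload")
assert len(fp) * 8 == 160   # a 160-bit digest, as described above
```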
- the home node of the data is the storage node that performs the write operation on the data.
- The application does not limit how the home node of the data is determined; for example but not limited to, the home node may be determined according to a certain algorithm to implement load balancing, where load balancing means that write operations are shared as evenly as possible among the storage nodes.
- the algorithm can be a modulo operation.
- For example, the fingerprint is subjected to a modulo operation; if the result is a, the home node of the data is the storage node a+1, where a ≥ 0, a is an integer, and the storage nodes in the storage system are numbered from 1. For example, if there are 16 storage nodes in the storage system and the fingerprint of the data is 65537, taking 65537 modulo 16 yields 1, that is, the home node of the data is the storage node 2.
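In code, with the digest interpreted as an integer (a sketch; the application allows any load-balancing algorithm here):

```python
def home_node(fingerprint: int, num_nodes: int = 16) -> int:
    """Modulo placement: a result of a maps to storage node a+1,
    since the storage nodes are numbered from 1."""
    return fingerprint % num_nodes + 1

assert home_node(65537) == 2   # 65537 mod 16 == 1 -> storage node 2
assert home_node(65536) == 1   # 65536 mod 16 == 0 -> storage node 1
```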
- It should be noted that the home node of the data to be written is determined according to the fingerprint of the data, while the mirror node of the storage node 1 is determined according to the storage node 1; the two are not associated with each other and may be the same storage node or different storage nodes. In this embodiment they are described as different nodes by way of example.
- The storage node that receives the write request sent by the host (here, the storage node 1) can itself be the home node of the data to be written carried by the write request. For example, if the fingerprint of the data to be written is 65536, taking 65536 modulo 16 yields 0, that is, the home node of the data to be written is the storage node 1.
- S106 The storage node 1 forwards the write request to the home node of the data to be written (such as the storage node 3).
- the storage node 3 receives the write request.
- S107 The storage node 3 queries the data distribution information set, and determines whether the set contains the fingerprint of the data to be written.
- The home node of a piece of data can manage a data distribution information set. The number of data distribution information entries included in the set managed by a home node increases as the number of disk write operations performed by that storage node increases.
- When the storage system is initialized, the data distribution information set managed by a storage node may be considered empty, or may not yet have been established in the storage system.
- each data distribution information set may include distribution information of at least one data.
- the distribution information of the data can be represented by the metadata table M1, and the related description of M1 is as follows.
- S108 The storage node 3 performs striping on the data to be written to obtain stripe data, and writes the stripe data to the physical disk and/or the virtual disk of the storage node 3.
- This step can be understood as redundant processing of data.
- The basic principle is: a complete piece of data (specifically, the data carried in the write request) is broken up into multiple data blocks, and optionally one or more check blocks are generated. These data blocks and check blocks are then stored on different disks.
- The stripe data in S108 includes data blocks, and may further include check blocks.
- the present application does not limit the redundancy processing manner, and may be, for example but not limited to, a redundant array of independent disks (RAID) or an erasure coding (EC).
- Since a virtual disk can be used as a local disk, the storage node 3 can select a virtual disk as a disk to which stripe data is written. When writing data to a virtual disk, the storage node 3 can first determine the physical disk of the other storage node to which the virtual disk is mapped, and then, according to the NOF protocol, write the data block destined for the virtual disk to that physical disk by RDMA.
- Taking as an example a storage system that includes 16 storage nodes with EC as the redundancy processing mode, the storage node 3 stripes the data to be written according to the EC algorithm, obtaining 14 data blocks and 2 check blocks. Each of these 16 blocks is then written to one storage node of the storage system.
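To make the shape of S108 concrete, here is a sketch of striping into 14 data blocks and 2 check blocks (XOR parity is used as a stand-in for a real EC code such as Reed-Solomon, which would generate distinct, independent check blocks; the sketch is illustrative only):

```python
from functools import reduce

def xor_block(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stripe(data: bytes, n_data: int = 14, n_check: int = 2):
    """Split data into n_data equal-sized blocks plus n_check check blocks.

    XOR parity stands in for a real EC code; this only shows the shape of
    the result: 16 blocks, one per storage node in the example system.
    """
    size = -(-len(data) // n_data)               # ceiling division
    padded = data.ljust(n_data * size, b"\x00")  # pad to a whole stripe
    blocks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    parity = reduce(xor_block, blocks)
    return blocks + [parity] * n_check           # real EC: distinct check blocks

chunks = stripe(b"some data carried in the write request")
assert len(chunks) == 16                         # one block per storage node
```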
- the storage node 3 records the write position of the stripe data. Specifically, the storage node 3 can record the write position of the stripe data by recording the distribution information of the data to be written.
- the distribution information of the data can be represented by the metadata table M1, and the elements included in one metadata table M1 can be as shown in Table 1.
Table 1:

| Element | Description |
| --- | --- |
| FingerPrint | Data fingerprint |
| hostLBA | The LBA carried in the write request |
| hostLength | Total length of the data |
| Seg.type | Whether each block in the stripe data is a data block or a check block |
| Seg.diskID | ID of the disk (which can be a virtual disk or a physical disk) each block in the stripe data is written to |
| Seg.startLBA | The starting LBA of each block in the stripe data on the disk it is written to |
| Seg.length | The length of each block in the stripe data |
- The metadata table M1 indexed by FingerPrint can be used to indicate the write position of the stripe data of a piece of data. hostLBA is the LBA used in interactions between the host and the storage system, and Seg.startLBA is the starting LBA address at which the block is written in the storage module. This application does not limit the recording manner of each element in Table 1: for example, if every block in the stripe data has the same length, a single length value may be recorded; other examples are not listed one by one.
- For example, the distribution information of the data to be written recorded by the storage node 3 may include: the fingerprint of the data to be written, LBA1, the total length of the data to be written, and, for each of the 14 data blocks and 2 check blocks, information such as its type, the ID of the disk it is written to, its starting LBA, and its length.
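Table 1 maps naturally onto a record type; a sketch whose field names mirror Table 1 (this is illustrative, not an API defined by this application):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """One block of the stripe data (a row of the Seg.* columns in Table 1)."""
    seg_type: str    # "data" or "check"
    disk_id: str     # virtual or physical disk the block is written to
    start_lba: int   # starting LBA of the block on that disk
    length: int      # length of the block

@dataclass
class M1:
    """Distribution information of one piece of data, mirroring Table 1."""
    fingerprint: bytes
    host_lba: int             # LBA carried in the write request
    host_length: int          # total length of the data
    segments: List[Segment]   # 14 data blocks + 2 check blocks in the example
```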
- S108 to S109 may be referred to as the home node of the data performing a write operation on the write request / the data to be written.
- the home node of the data may perform redundancy processing on the metadata table M1.
- S110 is an optional step.
- The storage node 3 writes the write position of the stripe data to the physical disk and/or the virtual disk of the storage node 3; specifically, it can do so by writing the distribution information of the data to be written to the physical disk and/or the virtual disk of the storage node 3.
- This step can be understood as redundant processing of the distribution information of the data. It can be understood that this step is an optional step.
- The present application does not limit the redundancy processing manner, which may be, for example but not limited to, multiple copies, EC, or RAID. Taking three copies as an example, the storage node 3 can store one copy of the distribution information of the data to be written locally, select two other storage nodes from the storage system, copy the distribution information into two further copies, and write each copy to one of the two selected storage nodes.
- the application does not limit how to select the two storage nodes, for example, but not limited to, using a modulo operation.
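One modulo-based selection, sketched (the application leaves the rule open; this is an assumed example):

```python
def replica_nodes(fingerprint: int, num_nodes: int = 16, copies: int = 3):
    """Choose the nodes holding the copies of a piece of distribution info.

    One possible modulo-based rule: keep one copy on the home node and
    place the remaining copies on the next nodes in numbering order.
    """
    home = fingerprint % num_nodes + 1
    return [(home - 1 + i) % num_nodes + 1 for i in range(copies)]

assert replica_nodes(65538) == [3, 4, 5]   # home node 3 plus two neighbours
```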
- S111 The storage node 3 feeds back a write operation completion indication to the storage node 1, and the storage node 1 receives the write operation completion indication.
- It should be noted that, in S105, if the home node of the data to be written determined by the storage node 1 is the storage node 1 itself, S106 and S111 need not be executed, and the execution subject of S107 to S111 is the storage node 1.
- S112 The storage node 1 acquires the home node of the LBA1 carried by the write request, for example, the storage node 4.
- the home node of the LBA is used to manage the mapping relationship between the LBA and the fingerprint.
- The application does not limit how the home node of an LBA is determined; for example but not limited to, it may be determined according to a certain algorithm to implement load balancing, where load balancing means that the work of managing the mapping relationships between LBAs and fingerprints is shared as evenly as possible among the storage nodes.
- the algorithm can be a modulo operation.
- It should be noted that the home node of the data is determined according to the fingerprint of the data, while the home node of the LBA is determined according to the LBA; the two are not associated with each other and may be the same storage node or different storage nodes. There is likewise no association between the home node of the LBA and the mirror node of the storage node 1. This embodiment is described with the home node of LBA1, the home node of the data to be written, and the mirror node of the storage node 1 all being different nodes.
- S113 The storage node 1 sends the fingerprint of the data to be written and the LBA1 carried by the write request to the storage node 4.
- S114 The storage node 4 records the mapping relationship between the fingerprint of the data to be written and the LBA1 carried by the write request.
- the mapping relationship can be represented by a metadata table M2, and an element included in a metadata table M2 can be as shown in Table 2.
Table 2:

| Element | Description |
| --- | --- |
| FingerPrint | Fingerprint |
| LBA list | The LBA list corresponding to the fingerprint |
| NodeID | ID of the home node of the data indicated by the fingerprint |
- the metadata table M2 includes the above FingerPrint and LBA list.
- an LBA list can include one or more LBAs.
- the LBA list can be represented as a singly linked list.
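A sketch of an M2 entry (field names mirror Table 2; a Python list stands in for the singly linked LBA list, and the concrete values are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class M2Entry:
    """One row of the metadata table M2 (Table 2)."""
    fingerprint: bytes
    lba_list: List[int] = field(default_factory=list)  # LBAs mapped to this fingerprint
    node_id: int = 0        # home node of the data indicated by the fingerprint

# Writing the same data to a second LBA only appends to the LBA list,
# which is how one fingerprint comes to map to multiple LBAs.
entry = M2Entry(fingerprint=b"\x01" * 20, node_id=3)
entry.lba_list.append(100)   # hypothetical LBA1
entry.lba_list.append(200)   # hypothetical LBA2
```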
- the same fingerprint can be mapped to multiple LBAs. For example, suppose the host sends 4 write requests to the storage system. The related information of the 4 write requests is shown in Table 3.
- the metadata table M2 recorded by the storage node A is as shown in Table 4.
Table 4:

| FingerPrint | The LBA list corresponding to the fingerprint | The home node of the data indicated by the fingerprint |
| --- | --- | --- |
| Fingerprint of data to be written 1 | LBA1, LBA2 | Storage node C |
| Fingerprint of data to be written 2 | LBA4 | Storage node D |
- the metadata table M2 recorded by the storage node B is as shown in Table 5.
- S115 The storage node 4 writes the mapping relationship between the fingerprint of the data to be written and the LBA1 carried by the write request to the physical disk and/or the virtual disk of the storage node 4.
- This step can be understood as redundant processing of the mapping relationship between the fingerprint of the data to be written and the LBA1 carried by the write request. It can be understood that this step is an optional step.
- the present application does not limit the redundancy processing manner. For example, reference may be made to the above.
- S116 The storage node 4 feeds back a mapping relationship completion indication to the storage node 1, and the storage node 1 receives the mapping relationship completion indication.
- In this way, in a subsequent data read operation, the data to be written can be read from its home node by way of the LBA. For details, refer to the read data flow described in FIG. 7.
- the storage node 1 sends an instruction to delete the mirror data to the mirror node (such as the storage node 2) of the storage node 1, and the storage node 2 receives the instruction.
- FIG. 7 is a flowchart of a method for reading data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S201 The host sends a read request to the storage system, where the read request carries the LBA1.
- The storage node 1 in the storage system receives the read request.
- the host sends a read request to the switch, and after receiving the read request, the switch forwards the read request to the storage node 1.
- In this embodiment, the storage node 1 receives both the write request and the read request from the host as an example; the write data flow and the read data flow shown in Embodiment 1 are designed based on the idea that the switch may forward a write/read request to any storage node. In practice, the storage nodes that receive the write request and the read request from the host can also be different.
- S202 The storage node 1 acquires the home node of the LBA1.
- the determined home node of LBA1 is the storage node 4.
- S203 The storage node 1 sends the read request to the storage node 4, and the storage node 4 receives the read request.
- the storage node 4 obtains a fingerprint of the data to be read according to a mapping relationship between the LBA and the fingerprint of the data, such as but not limited to the metadata table M2 as shown in Table 2.
- the fingerprint of the data to be read is the fingerprint of the data to be written in the above.
- the storage node 4 can also obtain the home node of the data to be read. According to the above description, the determined home node of the data to be read is the storage node 3.
- The storage node 4 feeds back the fingerprint of the data to be read to the storage node 1; the storage node 1 receives the fingerprint of the data to be read and determines, according to this fingerprint, the home node of the data to be read, that is, the storage node 3.
- Optionally, the storage node 4 can also feed back the ID of the home node of the data to be read, that is, the storage node 3, to the storage node 1, so that the storage node 1 does not need to determine the home node of the data to be read according to the fingerprint, thereby reducing the computational complexity on the storage node 1.
- S206 The storage node 1 sends a fingerprint of the data to be read to the storage node 3, and the storage node 3 receives the fingerprint of the data to be read.
- S207 The storage node 3 determines the write positions of the stripe data of the data to be read according to the fingerprint of the data to be read, for example but not limited to according to the metadata table M1 shown in Table 1. Then, some or all of the stripe data is acquired from these write positions.
- In one case, only the data blocks of the data to be read need to be read, without reading the check blocks. In another case (for example, when a data block is unavailable), the check blocks of the data to be read may also be read, and data recovery may then be performed, for example but not limited to according to a RAID or EC algorithm, as sketched below.
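- As an illustration of such recovery, the following sketch assumes a simple single-parity (RAID-5-style XOR) layout; the patent itself only requires some RAID or EC algorithm, so the scheme and names here are assumptions:

```python
def xor_blocks(blocks):
    """XOR equal-length blocks together (simple single-parity scheme)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def rebuild_missing_block(surviving_data_blocks, check_block):
    # parity = d0 ^ d1 ^ ... ^ dn, so a missing block equals the XOR of
    # the check block with all surviving data blocks.
    return xor_blocks(surviving_data_blocks + [check_block])

data_blocks = [b"\x01\x02", b"\x0f\x0f", b"\xa0\x0a"]
check_block = xor_blocks(data_blocks)              # computed at write time
restored = rebuild_missing_block(data_blocks[1:], check_block)
assert restored == data_blocks[0]
```

An EC code such as Reed-Solomon would replace the XOR with arithmetic over a finite field, but the read path is the same: fetch the surviving blocks plus check blocks, then reconstruct.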
- The storage node 3 assembles the read data blocks into complete data, that is, restores the data as it was before striping. At this point, the storage node 3 is considered to have acquired the data to be read. The storage node 3 then feeds back the data to be read to the storage node 1, and the storage node 1 receives the data to be read.
- the read data flow shown in FIG. 7 is described based on the write data flow shown in FIG. 6.
- Those skilled in the art should be able to derive, from the read data flow shown in FIG. 7, the embodiments in the following scenarios: the scenario in which the home node of LBA1 is the same as the storage node 1, and/or the scenario in which the home node of the data to be read is the same as the storage node 1, and/or the scenario in which the home node of LBA1 is the same as the home node of the data to be read. Details are not repeated here.
- In this embodiment, the steps of performing read/write operations are allocated to the storage nodes of the storage system according to the fingerprint of the data, and the steps of managing the mapping between fingerprints and host LBAs are allocated to the storage nodes of the storage system according to the LBA. This helps achieve load balancing and improves system performance, as sketched below.
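- A hedged sketch of this distribution idea: data ownership is derived from the fingerprint and metadata ownership from the LBA, so both kinds of work spread over the nodes. The modulo placement and the SHA-256 choice are illustrative assumptions; the patent does not fix a particular algorithm:

```python
import hashlib

NODES = ["storage node 1", "storage node 2", "storage node 3", "storage node 4"]

def fingerprint(data: bytes) -> str:
    # The patent does not mandate a hash; SHA-256 is an illustrative choice.
    return hashlib.sha256(data).hexdigest()

def data_home_node(fp: str) -> str:
    # Node that performs the read/write operations for this data.
    return NODES[int(fp, 16) % len(NODES)]

def lba_home_node(lba: int) -> str:
    # Node that manages the fingerprint<->LBA mapping for this LBA.
    return NODES[lba % len(NODES)]
```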
- FIG. 8 is a flowchart of a method for writing data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S301 to S304 Reference may be made to the above S101 to S104, but the present application is not limited thereto.
- S306 The storage node 1 sends the write request to the storage node 4, and the storage node 4 receives the write request.
- S307 The storage node 4 performs striping on the data to be written to obtain stripe data, and writes the stripe data to the physical disk and/or the virtual disk of the storage node 4.
- S308 The storage node 4 records the write position of the stripe data. Specifically, the storage node 4 can record the write position of the stripe data by recording the distribution information of the data to be written.
- the distribution information of the data can be represented by the metadata table M3.
- The elements included in the metadata table M3 may be those of Table 1 above with the FingerPrint element removed.
- the storage node 4 writes the write position of the stripe data to the physical disk and/or the virtual disk of the storage node 4.
- S310 The storage node 4 records the mapping relationship between the write position of the stripe data and the LBA1.
- S311 The storage node 4 writes the mapping relationship between the write position of the stripe data and the LBA1 to the physical disk and/or the virtual disk of the storage node 4.
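- The following sketch shows one way metadata table M3 could be laid out: the fields of Table 1 with FingerPrint removed, keyed by the host LBA. The field names follow Table 1; the dataclass layout itself is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    seg_type: str    # whether the block is a data block or a check block
    disk_id: int     # ID of the (virtual or physical) disk the block was written to
    start_lba: int   # start LBA of the block within that disk
    length: int      # length of the block

@dataclass
class M3Entry:
    host_lba: int                 # LBA carried in the write request
    host_length: int              # total length of the data
    segments: list[Segment] = field(default_factory=list)

# Keyed by host LBA: m3[1] holds the stripe layout written for LBA1.
m3: dict[int, M3Entry] = {}
m3[1] = M3Entry(host_lba=1, host_length=8192, segments=[
    Segment("data", disk_id=2, start_lba=512, length=4096),
    Segment("data", disk_id=3, start_lba=768, length=4096),
    Segment("check", disk_id=4, start_lba=640, length=4096),
])
```

An entry such as m3[1] is exactly what the read step S404 below consults when serving a read for LBA1.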
- FIG. 9 is a flowchart of a method for reading data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S401 to S403 Reference may be made to S201 to S203. Of course, the present application is not limited thereto.
- S404 The storage node 4 determines, according to the LBA1, the write positions of the stripe data of the data to be read, for example but not limited to according to the metadata table obtained by removing the fingerprint of the data from Table 1. Then, some or all of the stripe data is acquired from these write positions.
- S405 The storage node 4 assembles the read data blocks into complete data, that is, restores the data as it was before striping. At this point, the storage node 4 is considered to have acquired the data to be read. The storage node 4 feeds back the data to be read to the storage node 1.
- The read data flow shown in FIG. 9 is described based on the write data flow shown in FIG. 8. Those skilled in the art should be able to derive, from the read data flow shown in FIG. 9, the embodiment in the scenario in which the home node of LBA1 is the same as the storage node 1. Details are not repeated here.
- In this embodiment, the steps of managing the mapping between data write positions and host LBAs are allocated to the storage nodes of the storage system according to the LBA. This helps achieve load balancing and improves system performance.
- FIG. 10 is a flowchart of a method for writing data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S501 The host sends a write request to the storage system, where the write request includes LBA1 and data to be written.
- the storage node 1 in the storage system receives the write request. Specifically, the host sends a write request to the switch, and after receiving the write request, the switch forwards the write request to the storage node 1 according to the information carried by the write request.
- A difference from the first embodiment and the second embodiment is that the host can send a write request carrying a specific LBA directly to a specific storage node, which can reduce the computational complexity of the storage system.
- Specifically, the host may pre-store the correspondence between LBA ranges and storage nodes, for example, LBA1 to LBA100 correspond to the storage node 1 and LBA101 to LBA200 correspond to the storage node 2; the write request then carries, in addition to the LBA, the information of the corresponding storage node. The information of the storage node may include, for example but not limited to, the network address of the storage node and, optionally, the ID of the storage node, so that when the switch receives the write request, it can determine, according to the storage node information carried in the write request, which storage node to forward the write request to. A host-side sketch is given below.
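- A minimal host-side sketch of this routing, under assumed ranges and addresses, follows:

```python
import bisect

# (exclusive upper bound of the LBA range, storage node info); the ranges,
# node IDs, and addresses below are illustrative assumptions.
RANGE_TABLE = [
    (100, {"node_id": 1, "addr": "10.0.0.1"}),   # LBA 1..100   -> storage node 1
    (200, {"node_id": 2, "addr": "10.0.0.2"}),   # LBA 101..200 -> storage node 2
]

def route(lba: int) -> dict:
    """Return the storage node info the host attaches to its request."""
    bounds = [upper for upper, _ in RANGE_TABLE]
    return RANGE_TABLE[bisect.bisect_left(bounds, lba)][1]

# The write request then carries both the LBA and this node info, so the
# switch can forward it directly to the right storage node.
print(route(42), route(150))
```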
- S502 to S511 Reference may be made to S102 to S111. Of course, the present application is not limited thereto.
- S512 The storage node 1 records the mapping relationship between the fingerprint of the data to be written and the LBA1 carried by the write request.
- S513 The storage node 1 writes the mapping relationship between the fingerprint of the data to be written and the LBA1 carried by the write request to the physical disk and/or the virtual disk of the storage node 1.
- S514 to S515 Reference may be made to S117 to S118. Of course, the present application is not limited thereto.
- FIG. 11 is a flowchart of a method for reading data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S601 The host sends a read request to the storage system, where the read request carries LBA1.
- The storage node 1 in the storage system receives the read request. Specifically, the host sends the read request to the switch, and after receiving the read request, the switch forwards it to the storage node 1 according to the information carried in the read request.
- The read data flow shown in FIG. 11 is described based on the write data flow shown in FIG. 10.
- the storage node 1 obtains a fingerprint of the data to be read according to a mapping relationship between the LBA and the fingerprint of the data, such as but not limited to the metadata table M2 as shown in Table 2.
- the fingerprint of the data to be read is the fingerprint of the data to be written in the above.
- the storage node 1 can also obtain the home node of the data to be read according to the information recorded by the metadata table M2 or according to the above description. According to the above description, the determined home node of the data to be read is the storage node 3.
- The storage node 1 acquires the home node of the data to be read, for example, the storage node 3.
- S604 to S607 Reference may be made to S206 to S209. Of course, the present application is not limited thereto.
- In this embodiment, the host determines, according to the correspondence between LBAs and storage nodes, to which storage node of the storage system to send the read/write request; that is, the storage node does not need to determine the home node of the LBA. This reduces the signaling interaction between storage nodes, thereby increasing the read/write rate. In addition, this embodiment contributes to load balancing, thereby improving system performance.
- FIG. 12 is a flowchart of a method for writing data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S701 to S704 Reference may be made to S501 to S504. Of course, the present application is not limited thereto.
- S705 The storage node 1 performs striping on the write data to obtain stripe data, and writes the stripe data to the physical disk and/or the virtual disk of the storage node 1.
- S706 The storage node 1 records the write position of the stripe data. Specifically, the storage node 1 can record the write position of the stripe data by recording the distribution information of the data to be written.
- S707 The storage node 1 writes the write position of the stripe data to the physical disk and/or the virtual disk of the storage node 1.
- S708 The storage node 1 records the mapping relationship between the write position of the stripe data and the LBA1.
- S709 The storage node 1 writes the mapping relationship between the write position of the stripe data and the LBA1 to the physical disk and/or the virtual disk of the storage node 1.
- FIG. 13 is a flowchart of a method for reading data applied to the storage system shown in FIG. 2 according to an embodiment of the present application. Specifically:
- S802 The storage node 1 determines, according to the LBA, the write positions of the stripe data of the data to be read, for example but not limited to according to the metadata table obtained by removing the fingerprint of the data from Table 1. Then, some or all of the stripe data is acquired from these write positions.
- S803 The storage node 1 assembles the read data blocks into complete data, that is, restores the data as it was before striping.
- S804 The storage node 1 feeds back the data to be read to the host.
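- A minimal sketch of the de-striping step (S803): the data blocks read back are ordered by their position within the stripe, concatenated, and trimmed to the recorded total length (hostLength in Table 1). The in-memory representation is an assumption:

```python
def destripe(blocks: list[tuple[int, bytes]], total_length: int) -> bytes:
    """blocks: (index of the block within the stripe, block contents)."""
    ordered = [payload for _, payload in sorted(blocks, key=lambda b: b[0])]
    return b"".join(ordered)[:total_length]

parts = [(1, b"world"), (0, b"hello ")]
assert destripe(parts, 11) == b"hello world"
```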
- In this embodiment, the host determines, according to the correspondence between LBAs and storage nodes, to which storage node of the storage system to send the read/write request; that is, the storage node does not need to determine the home node of the LBA. This reduces the signaling interaction between storage nodes, thereby increasing the read/write rate. In addition, this embodiment contributes to load balancing, thereby improving system performance.
- It can be understood that each node, such as a host or a storage node, includes hardware structures and/or software modules corresponding to the execution of the respective functions.
- With reference to the elements and algorithm steps of the various examples described in the embodiments disclosed herein, the present application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is implemented by hardware or by computer software driving hardware depends on the specific application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to be beyond the scope of the present application.
- In the embodiments of the present application, the storage node may be divided into function modules according to the foregoing method examples. For example, each function module may be divided according to each function, or two or more functions may be integrated into one processing module. The above integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical function division; an actual implementation may use another division manner. The following description takes division of functional modules by corresponding functions as an example:
- FIG. 14 shows a schematic structural diagram of a storage node 140.
- the storage node 140 can be any of the storage nodes referred to above.
- the storage node 140 is connected to the host and the at least one second storage node in the storage system, and the physical disk included in the at least one second storage node is mapped to the virtual disk of the storage node 140.
- The storage node 140 includes a transceiver unit 1401, a processing unit 1402, and a storage unit 1403. The transceiver unit 1401 is configured to receive a first write request, where the first write request carries first data to be written.
- the processing unit 1402 is configured to perform striping on the first data to be written to obtain stripe data, and write the stripe data into a physical disk and/or a virtual disk of the storage node 140.
- The storage unit 1403 is configured to record the write position of the stripe data.
- For example, in conjunction with FIG. 6 or FIG. 10, the storage node 140 may be the storage node 3; the transceiver unit 1401 may be configured to execute S106/S506, the processing unit 1402 may be configured to execute S108/S508, and the storage unit 1403 may be configured to execute S109/S509.
- For another example, in conjunction with FIG. 8, the storage node 140 can be the storage node 4; the transceiver unit 1401 can be used to execute S306, the processing unit 1402 can be used to execute S307, and the storage unit 1403 can be used to execute S308.
- For another example, in conjunction with FIG. 12, the storage node 140 can be the storage node 1; the transceiver unit 1401 can be used to execute S701, the processing unit 1402 can be used to execute S705, and the storage unit 1403 can be used to execute S706.
- the processing unit 1402 may be specifically configured to: when the stripe data is written into the virtual disk, write the stripe data into the physical disk of the second storage node that maps the virtual disk.
- The storage unit 1403 can also be configured to: when recording the write position of the stripe data, also record the fingerprint of the first data to be written. For example, reference can be made to Table 1 in Embodiment 1.
- the storage unit 1403 may be further configured to: when recording the stripe data writing position, also recording the LBA of the first data to be written.
- the storage unit 1403 can be used to execute S310/S708.
- the transceiver unit 1401 is further configured to: receive a second write request sent by the host, and the second write request carries the second data to be written.
- The processing unit 1402 is further configured to: determine, according to the second write request, the home node of the second write request; if the home node of the second write request is the storage node 140, perform, by the storage node 140, a write operation on the second write request; and if the home node of the second write request is the second storage node, forward, by the storage node 140, the second write request to the second storage node, so that the second storage node performs a write operation on the second write request. A sketch of this dispatch logic is given below.
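- A hedged sketch of this dispatch logic follows; `write_locally` and `forward_to` are illustrative stand-ins for the striping/disk-write path and the switch-based forwarding, respectively:

```python
class ProcessingUnit:
    def __init__(self, node_id, home_node_of):
        self.node_id = node_id
        self.home_node_of = home_node_of   # e.g. fingerprint- or LBA-based

    def on_write_request(self, request):
        home = self.home_node_of(request)
        if home == self.node_id:
            # This node is the home node: handle the write locally
            # (striping and disk writes, as in S306-S308).
            self.write_locally(request)
        else:
            # Otherwise forward the request to the home node via the switch.
            self.forward_to(home, request)

    def write_locally(self, request):
        ...  # striping and physical/virtual disk writes would go here

    def forward_to(self, node, request):
        ...  # e.g. send the request over the switch to the home node
```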
- storage node 140 can be storage node 1.
- the transceiver unit 1401 can be configured to execute S101/S301/S501.
- Processing unit 1402 can be used to execute S105/S305/S505.
- the processing unit 1402 may be specifically configured to: calculate a fingerprint of the second data to be written; and then determine a home node of the second write request according to the fingerprint of the second data to be written.
- For example, in conjunction with FIG. 6 or FIG. 10, the processing unit 1402 can be used to execute S105/S505.
- the processing unit 1402 is further configured to: determine a home node of the LBA carried by the second write request; the home node of the LBA is configured to manage a mapping relationship between the LBA and the fingerprint of the second data to be written.
- processing unit 1402 can be used to execute S112.
- processing unit 1402 may be specifically configured to: determine, according to the LBA carried by the second write request, the home node of the second write request. For example, in conjunction with FIG. 8, processing unit 1402 can be used to execute S305.
- the transceiver unit 1401 is further configured to receive a fingerprint of the first data to be read requested by the first read request.
- The processing unit 1402 is further configured to: obtain the write position of the first data to be read according to the fingerprint of the first data to be read, and read the stripe data of the first data to be read from the write position of the first data to be read.
- storage node 140 may be storage node 3.
- the transceiver unit 1401 can be configured to execute S206/S604, and the processing unit 1402 can be configured to execute S207/S605.
- the transceiver unit 1401 is further configured to receive a first read request, where the first read request carries the first LBA.
- The processing unit 1402 is further configured to: obtain, according to the first LBA, the write position of the first data to be read requested by the first read request, and read the stripe data of the first data to be read from the write position of the first data to be read.
- For example, in conjunction with FIG. 9, the storage node 140 may be the storage node 4; the transceiver unit 1401 may be configured to execute S403, and the processing unit 1402 may be configured to execute S404.
- For another example, in conjunction with FIG. 13, the storage node 140 may be the storage node 1; the transceiver unit 1401 may be configured to execute S801, and the processing unit 1402 may be configured to execute S803.
- the transceiver unit 1401 is further configured to receive a second read request sent by the host.
- The processing unit 1402 is further configured to: determine, according to the second read request, the home node of the second read request; if the home node of the second read request is the storage node 140, perform, by the storage node 140, a read operation on the second read request; and if the home node of the second read request is the second storage node, forward, by the storage node 140, the second read request to the second storage node, so that the second storage node performs a read operation on the second read request.
- storage node 140 may be storage node 1.
- the transceiver unit 1401 can be configured to execute S201/S401/S601, and the processing unit 1402 can be configured to execute S205/S402/S603.
- The processing unit 1402 may be specifically configured to: determine the home node of the LBA carried by the second read request, where the home node of the LBA is configured to manage the mapping relationship between the LBA and the fingerprint of the second data to be read requested by the second read request; obtain the fingerprint of the second data to be read from the home node of the second LBA; and determine the home node of the second read request according to the fingerprint of the second data to be read.
- the transceiving unit 1401 may be configured to execute S201/S601
- the processing unit 1402 may be configured to execute S205/S603.
- the processing unit 1402 may be specifically configured to: determine, according to the LBA carried by the second read request, the home node of the second read request.
- the transceiving unit 1401 can be used to execute S401, and the processing unit 1402 can be used to execute S402.
- In different scenarios, the storage node 140 may specifically be a different storage node in the same figure. For example, in conjunction with FIG. 6, in the scenario in which the storage system receives the first write request, the storage node 140 may specifically be the storage node 3; in the scenario in which the storage system receives the first read request, the storage node 140 may specifically be the storage node 1.
- In a specific implementation, the same storage node 140 may apply the foregoing technical solutions in scenarios of reading and writing different data multiple times.
- The foregoing describes, by way of example, the relationship between the units in the storage node 140 and some steps of the method embodiments shown above. In fact, the units in the storage node 140 may also perform other related steps of the method embodiments shown above, which are not enumerated here.
- a hardware implementation of storage node 140 may refer to the storage node shown in FIG. 2.
- For example, the transceiver unit 1401 may correspond to the internal port 220 in FIG. 2.
- The processing unit 1402 may correspond to the CPU and/or the memory controller in FIG. 2.
- the storage unit 1403 may correspond to the memory in FIG. 2, and may alternatively correspond to the memory chip in FIG. 2.
- The storage node provided by the embodiments of the present application can be used to perform the above read and write processes; therefore, for the beneficial effects that can be achieved, reference may be made to the foregoing method embodiments.
- All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When a software program is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave).
- The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
- the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (eg, an SSD) or the like.
Description
Table 1

Element | Description
---|---
FingerPrint | fingerprint of the data
hostLBA | LBA carried in the write request
hostLength | total length of the data
Seg.type | whether each block in the stripe data is a data block or a check block
Seg.diskID | ID of the disk (which may be a virtual disk or a physical disk) to which each block of the stripe data is written
Seg.startLBA | start LBA of each block of the stripe data in the disk to which it is written
Seg.length | length of each block of the stripe data
Table 2

Element | Description
---|---
FingerPrint | fingerprint
LBA list | the LBA list corresponding to the fingerprint
NodeID | ID of the home node of the data indicated by the fingerprint
Table 3

Write request | Carried information | Home node of the LBA | Home node of the data
---|---|---|---
Write request 1 | LBA1, data to be written 1 | Storage node A | Storage node C
Write request 2 | LBA2, data to be written 1 | Storage node A | Storage node C
Write request 3 | LBA3, data to be written 1 | Storage node B | Storage node C
Write request 4 | LBA4, data to be written 2 | Storage node A | Storage node D
Table 4

Fingerprint | LBA list corresponding to the fingerprint | Home node of the data indicated by the fingerprint
---|---|---
Fingerprint of data to be written 1 | LBA1, LBA2 | Storage node C
Fingerprint of data to be written 2 | LBA4 | Storage node D
Table 5

Fingerprint | LBA list corresponding to the fingerprint | Home node of the data indicated by the fingerprint
---|---|---
Fingerprint of data to be written 1 | LBA3 | Storage node C
Claims (29)
- A data access method, applied to a first storage node in a storage system, wherein the first storage node communicates, through a switch, with a host and with at least one second storage node in the storage system, and a physical disk included in the at least one second storage node is mapped as a virtual disk of the first storage node; the method comprising: receiving a first write request, the first write request carrying first data to be written; striping the first data to be written to obtain stripe data, and writing the stripe data to a physical disk and/or a virtual disk of the first storage node; and recording the write position of the stripe data.
- The method according to claim 1, wherein when the stripe data is written to the virtual disk, the stripe data is written to the physical disk, in the second storage node, to which the virtual disk is mapped.
- The method according to claim 1 or 2, wherein when the write position of the stripe data is recorded, a fingerprint of the first data to be written is also recorded.
- The method according to claim 1 or 2, wherein when the write position of the stripe data is recorded, a logical block address (LBA) of the first data to be written is also recorded.
- The method according to claim 1 or 2, further comprising: receiving a second write request sent by the host, the second write request carrying second data to be written; and determining a home node of the second write request according to the second write request, wherein if the home node of the second write request is the first storage node, the first storage node performs a write operation on the second write request, and if the home node of the second write request is the second storage node, the first storage node forwards the second write request to the second storage node, so that the second storage node performs a write operation on the second write request.
- The method according to claim 5, wherein the determining a home node of the second write request according to the second write request comprises: calculating a fingerprint of the second data to be written; and determining the home node of the second write request according to the fingerprint of the second data to be written.
- The method according to claim 6, further comprising: determining a home node of the LBA carried by the second write request, wherein the home node of the LBA is configured to manage a mapping relationship between the LBA and the fingerprint of the second data to be written.
- The method according to claim 5, wherein the determining a home node of the second write request according to the second write request comprises: determining the home node of the second write request according to the LBA carried by the second write request.
- The method according to claim 3, further comprising: receiving a fingerprint of first data to be read requested by a first read request; and obtaining a write position of the first data to be read according to the fingerprint of the first data to be read, and reading the stripe data of the first data to be read from the write position of the first data to be read.
- The method according to claim 4, further comprising: receiving a first read request, the first read request carrying a first LBA; and obtaining, according to the first LBA, a write position of first data to be read requested by the first read request, and reading the stripe data of the first data to be read from the write position of the first data to be read.
- The method according to claim 1 or 2, further comprising: receiving a second read request sent by the host; and determining a home node of the second read request according to the second read request, wherein if the home node of the second read request is the first storage node, the first storage node performs a read operation on the second read request, and if the home node of the second read request is the second storage node, the first storage node forwards the second read request to the second storage node, so that the second storage node performs a read operation on the second read request.
- The method according to claim 11, wherein the determining a home node of the second read request according to the second read request comprises: determining a home node of the LBA carried by the second read request, wherein the home node of the LBA is configured to manage a mapping relationship between the LBA and a fingerprint of second data to be read requested by the second read request; obtaining the fingerprint of the second data to be read from the home node of the second LBA; and determining the home node of the second read request according to the fingerprint of the second data to be read.
- The method according to claim 11, wherein the determining a home node of the second read request according to the second read request comprises: determining the home node of the second read request according to the LBA carried by the second read request.
- A storage node, wherein the storage node communicates, through a switch, with a host and with at least one second storage node in a storage system, and a physical disk included in the at least one second storage node is mapped as a virtual disk of the storage node; the storage node comprising: a transceiver unit, configured to receive a first write request, the first write request carrying first data to be written; a processing unit, configured to stripe the first data to be written to obtain stripe data, and to write the stripe data to a physical disk and/or a virtual disk of the storage node; and a storage unit, configured to record the write position of the stripe data.
- The storage node according to claim 14, wherein the processing unit is specifically configured to: when the stripe data is written to the virtual disk, write the stripe data to the physical disk, in the second storage node, to which the virtual disk is mapped.
- The storage node according to claim 14 or 15, wherein the storage unit is further configured to: when recording the write position of the stripe data, also record a fingerprint of the first data to be written.
- The storage node according to claim 14 or 15, wherein the storage unit is further configured to: when recording the write position of the stripe data, also record a logical block address (LBA) of the first data to be written.
- The storage node according to claim 14 or 15, wherein the transceiver unit is further configured to: receive a second write request sent by the host, the second write request carrying second data to be written; and the processing unit is further configured to: determine a home node of the second write request according to the second write request, wherein if the home node of the second write request is the storage node, the storage node performs a write operation on the second write request, and if the home node of the second write request is the second storage node, the storage node forwards the second write request to the second storage node, so that the second storage node performs a write operation on the second write request.
- The storage node according to claim 18, wherein the processing unit is specifically configured to: calculate a fingerprint of the second data to be written; and determine the home node of the second write request according to the fingerprint of the second data to be written.
- The storage node according to claim 19, wherein the processing unit is further configured to: determine a home node of the LBA carried by the second write request, wherein the home node of the LBA is configured to manage a mapping relationship between the LBA and the fingerprint of the second data to be written.
- The storage node according to claim 18, wherein the processing unit is specifically configured to: determine the home node of the second write request according to the LBA carried by the second write request.
- The storage node according to claim 16, wherein the transceiver unit is further configured to receive a fingerprint of first data to be read requested by a first read request; and the processing unit is further configured to: obtain a write position of the first data to be read according to the fingerprint of the first data to be read, and read the stripe data of the first data to be read from the write position of the first data to be read.
- The storage node according to claim 17, wherein the transceiver unit is further configured to receive a first read request, the first read request carrying a first LBA; and the processing unit is further configured to: obtain, according to the first LBA, a write position of first data to be read requested by the first read request, and read the stripe data of the first data to be read from the write position of the first data to be read.
- The storage node according to claim 14 or 15, wherein the transceiver unit is further configured to receive a second read request sent by the host; and the processing unit is further configured to: determine a home node of the second read request according to the second read request, wherein if the home node of the second read request is the storage node, the storage node performs a read operation on the second read request, and if the home node of the second read request is the second storage node, the storage node forwards the second read request to the second storage node, so that the second storage node performs a read operation on the second read request.
- The storage node according to claim 24, wherein the processing unit is specifically configured to: determine a home node of the LBA carried by the second read request, wherein the home node of the LBA is configured to manage a mapping relationship between the LBA and a fingerprint of second data to be read requested by the second read request; obtain the fingerprint of the second data to be read from the home node of the second LBA; and determine the home node of the second read request according to the fingerprint of the second data to be read.
- The storage node according to claim 24, wherein the processing unit is specifically configured to: determine the home node of the second read request according to the LBA carried by the second read request.
- A storage node, comprising a memory and a processor, wherein the memory is configured to store a computer program, and when the computer program is executed by the processor, the method according to any one of claims 1 to 13 is performed.
- A data access system, comprising the storage node according to any one of claims 14 to 27, wherein the storage node communicates, through a switch, with a host and with at least one second storage node in the storage system, and a physical disk included in the at least one second storage node is mapped as a virtual disk of the storage node.
- A computer readable storage medium, on which a computer program is stored, wherein when the program runs on a computer, the method according to any one of claims 1 to 13 is performed.