WO2019080370A1 - Data read/write method, apparatus, and storage server

Data read/write method, apparatus, and storage server

Info

Publication number
WO2019080370A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
address
virtual
storage device
server
Prior art date
Application number
PCT/CN2018/071637
Other languages
English (en)
French (fr)
Inventor
姚唐仁
王晨
王�锋
冯玮
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP18870458.9A priority Critical patent/EP3690630A4/en
Priority to CN201880000062.9A priority patent/CN110651246B/zh
Publication of WO2019080370A1 publication Critical patent/WO2019080370A1/zh
Priority to US16/856,257 priority patent/US11397668B2/en

Classifications

    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • G06F3/064 Management of blocks
    • G06F3/065 Replication mechanisms
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G06F3/0623 Securing storage systems in relation to content
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • The present invention relates to the field of storage, and in particular to distributed storage.
  • In a distributed storage system, multi-copy or erasure code (EC) redundancy strategies are used to protect data against loss.
  • In large-scale storage clusters this problem is more prominent: when the storage servers of the distributed storage system are spread across multiple data centers or multiple availability zones, the probability that the network between storage servers, or between storage servers and clients, becomes congested, or that a storage server becomes sub-healthy, increases greatly, and it is easy for a small number of storage servers to run into problems. This can cause a significant drop in the write performance of the distributed storage system and a significant increase in latency.
  • In a first aspect, the present invention provides a data read/write method applied to a first storage server in a distributed storage system, where the distributed storage system includes a plurality of storage servers, each storage server includes at least one storage device, the storage space of the distributed storage system is managed in virtual storage blocks, and each virtual storage block corresponds to a plurality of the storage devices.
  • The method includes: the first storage server receives a first write request, the first write request including a first fragment of first target data, a first virtual storage address and a first storage device ID, where the first virtual storage address is a relative position within a first virtual storage block and corresponds to a first logical block address of a first storage device, the first storage device being one of the storage devices corresponding to the first virtual storage block and managed by the first storage server.
  • The first storage server stores the first fragment in the first storage device using the first logical block address as the starting address; after the storage completes, it records the first consecutive-storage-success address range, i.e. the range that has been stored successfully and contiguously since the start position of the first virtual storage block.
  • The first storage server receives a first read request carrying a first virtual storage address segment and the first storage device ID, where the logical block addresses corresponding to the first virtual storage address segment are located on the first storage device; when the first virtual storage address segment falls within the first consecutive-storage-success address range, the data of the first virtual storage address segment is read from the first storage device.
  • With this method, after the storage server receives a write request from the client and successfully stores the fragment it carries on the storage device, the storage server records, as appropriate, the address range that has been stored successfully and contiguously. A fragment stored in this way can later be read out and returned to the client directly when it needs to be read; there is no need to read multiple fragments for additional verification, which reduces read amplification and its impact on system resources.
  • In a first possible implementation of the first aspect, the first consecutive-storage-success address range stores one fragment or multiple fragments; when multiple fragments are stored, the stored fragments are adjacent to one another.
  • In a second possible implementation of the first aspect, the first fragment is a copy of the first target data; or the first fragment is an erasure code (EC) data strip of the first target data; or the first fragment is an EC parity strip of the first target data.
  • Regarding EC data fragments and EC parity fragments: the first target data is split to generate at least 2 EC data strips, and the EC algorithm generates at least one EC parity strip from those data strips. (See the sketch below.)
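  • The following is a minimal sketch of splitting first target data into EC strips. It uses a single XOR parity strip as a stand-in for a real erasure code such as Reed-Solomon; the patent does not prescribe a particular EC algorithm, and the function name and padding scheme are illustrative assumptions.

```python
def make_strips(data: bytes, n_data: int) -> list[bytes]:
    """Split data into n_data equal-length data strips plus one XOR parity strip."""
    strip_len = -(-len(data) // n_data)                 # ceiling division
    padded = data.ljust(n_data * strip_len, b"\x00")    # pad so strips are equal length
    strips = [padded[i * strip_len:(i + 1) * strip_len] for i in range(n_data)]
    parity = bytes(strip_len)
    for s in strips:                                    # XOR all data strips together
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return strips + [parity]                            # N data strips + 1 parity strip

strips = make_strips(b"first target data, 30 bytes...", n_data=3)
assert len(strips) == 4 and all(len(s) == 10 for s in strips)
```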
  • Therefore, the embodiments of the present invention support both multi-copy distributed storage systems and EC distributed storage systems.
  • In a third possible implementation of the first aspect, the first consecutive-storage-success address range is described in the form of a virtual storage address range of the first virtual storage block, or is recorded in the form of a logical block address range of the first storage device.
  • The first consecutive-storage-success address range is a virtual address range; any description that captures the range of fragments stored successfully and contiguously from the first address of the first virtual storage block is acceptable, so the description can take various forms. In other implementations it may also be described by the LBA addresses corresponding to this virtual address range.
  • In a fourth possible implementation of the first aspect, each of the fragments is a copy of the first target data.
  • The first target data may be copied by the client to generate at least two copies, or copied by at least one storage server in the distributed storage system to generate at least two copies.
  • In a fifth possible implementation of the first aspect, the method further includes:
  • The first storage server receives a second read request carrying a second virtual storage address segment and the first storage device ID. The first storage server detects that the second virtual storage address segment does not fall within the first consecutive-storage-success address range and sends a failure response message to the client server. After receiving the failure response message, the client server sends a third read request to a second storage server in the distributed storage system.
  • The third read request carries the second virtual storage address segment and a second storage device ID; the second storage device is different from the first storage device.
  • The first storage device and the second storage device both correspond to the first virtual storage block.
  • When the second virtual storage address segment falls within a second consecutive-storage-success address range in the second storage server, the second storage server reads the data of the first virtual storage address segment from the second storage device and returns it to the client server. The second consecutive-storage-success address range indicates that, within the virtual storage address range of the first virtual storage block that is located on the second storage device, fragments are stored contiguously from the first address of the first virtual storage block to the end address of the first fragment.
  • The fragment in the fifth possible implementation of the first aspect may be a copy. It can be seen that, in the multi-copy scenario, if the data requested by the second read request cannot be read from one storage device, the next storage device among those corresponding to the first virtual storage block can be selected and the read retried, cycling in turn until the data requested by the second read request is read.
  • In a sixth possible implementation of the first aspect, before the first storage server receives the write request, the method further includes: the client server generates a plurality of fragments of the first target data, selects a first virtual storage block for storing the first target data, and sends a plurality of write requests, including the first write request, to the storage servers where the storage devices corresponding to the first virtual storage block are located; each write request includes one of the fragments and the corresponding virtual storage address.
  • Client software is installed on the client server. This embodiment describes how the first target data is stored as a whole.
  • In a seventh possible implementation, based on the sixth, after the client receives a preset number of success response messages, the first target data is considered successfully stored in the distributed storage system, where the preset number of success response messages is smaller than the total number of the write requests.
  • It can thus be seen that this embodiment does not require all write requests to succeed; it is sufficient that a predetermined number of write requests are written successfully. Embodiments of the invention therefore allow for the presence of slow nodes.
  • In a further possible implementation of the first aspect, the distributed storage system is an append-write distributed storage system. Therefore, for multiple fragments successively written into the same first logical storage unit, the virtual storage addresses the client server assigns to them are contiguous, as sketched below.
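  • Below is a minimal sketch of the append-write allocation implied here: within one virtual storage block, offsets handed to successive fragments are contiguous and never reused. The class and method names are illustrative, not taken from the patent.

```python
class ExtentAllocator:
    """Append-only offset allocation inside one virtual storage block (extent)."""

    def __init__(self, extent_id: int, capacity: int):
        self.extent_id = extent_id
        self.capacity = capacity
        self.next_offset = 0                 # append-only: earlier offsets are never reused

    def allocate(self, length: int) -> tuple[int, int]:
        """Return (extent_id, start_offset) for the next fragment; offsets are contiguous."""
        if self.next_offset + length > self.capacity:
            raise RuntimeError("extent full; the client must pick a new extent")
        start = self.next_offset
        self.next_offset += length           # the next fragment starts right after this one
        return self.extent_id, start

alloc = ExtentAllocator(extent_id=5, capacity=1 << 20)
print(alloc.allocate(50))                    # (5, 0)
print(alloc.allocate(50))                    # (5, 50) -- adjacent to the previous fragment
```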
  • In a second aspect, the present invention also provides a data storage apparatus that can perform the first aspect described above and its various possible implementations. The data storage apparatus may be the first storage server as hardware, or software code running in the first storage server.
  • In a third aspect, the present invention also provides a first storage server that can perform the first aspect described above and its various possible implementations.
  • In a fourth aspect, the present invention further provides a data storage method for a distributed storage system, where the distributed storage system has a plurality of storage servers, each storage server includes at least one storage device, the storage space of the distributed storage system is managed in virtual storage blocks, and each virtual storage block corresponds to a plurality of the storage devices.
  • The method includes: the client server generates a plurality of fragments of first target data and selects a first virtual storage block for storing the first target data; the client server generates a plurality of write requests, each carrying a fragment, a virtual storage address and a device address, where the virtual storage address is a relative position within the first virtual storage block, each virtual storage address corresponds to a segment of logical block addresses in a storage device, the device address marks a storage device in a storage server, the storage device marked by the device address corresponds to the first virtual storage block, and the device address in each write request is different. (If multi-copy is used, the fragments are identical, each being a copy of the first target data; if EC is used, the fragments differ, each being a data fragment or a parity fragment of the first target data.)
  • The client server sends the multiple write requests to the plurality of storage servers. A storage server that receives a write request obtains the fragment, virtual storage address and device address carried in the write request, finds the corresponding storage device according to the device address, and stores the fragment at the logical block address (LBA) corresponding to the virtual storage address in the write request.
  • For each storage device, the range of each virtual storage block that has been stored successfully and contiguously on this storage server is recorded: if the fragment carried by the current write request is stored successfully, and all fragments carried by write requests for the first virtual storage block received by this storage device before the current write request have also been stored successfully, then the storage server where the storage device is located records the first consecutive-storage-success address range, i.e. the range in the first storage device that has been stored successfully and contiguously from the start position of the first virtual storage block.
  • The embodiment provided by the fourth aspect describes the complete process of writing the first target data into the distributed storage system; an illustrative client-side sketch follows.
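  • The sketch below illustrates the client-side part of this flow: fan one write request out per storage device of the chosen virtual storage block and declare success once a preset number of success responses has arrived, tolerating slow nodes. send_write() is a placeholder for the real RPC and is not an API defined by the patent.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def write_target_data(fragments, devices, extent_id, start_offset, preset_number, send_write):
    """fragments[i] is sent to devices[i]; send_write(...) returns True on a success response."""
    assert len(fragments) == len(devices) and 1 <= preset_number <= len(devices)
    pool = ThreadPoolExecutor(max_workers=len(devices))
    futures = [pool.submit(send_write, dev, extent_id, start_offset, frag)
               for dev, frag in zip(devices, fragments)]
    successes, ok = 0, False
    for fut in as_completed(futures):
        if fut.result() is True:
            successes += 1
            if successes >= preset_number:   # enough copies / strips have landed
                ok = True
                break                        # slow-node writes may still complete later
    pool.shutdown(wait=False)                # do not block on the remaining slow nodes
    return ok
```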
  • FIG. 1 is a topological diagram of an embodiment of a distributed storage system of the present invention
  • FIG. 2 is a flow chart of an embodiment of a data read/write method of the present invention
  • FIG. 3 is a diagram of a plurality of write requests in an embodiment of the present invention.
  • FIG. 4 is a functional block diagram of an embodiment of a data storage device of the present invention.
  • The embodiments of the present invention can be applied to an "append write" storage mode; append-write storage modes include, for example, log-structured or append-only storage.
  • Redirect-on-write (ROW) also belongs to the append-write scenario, so the embodiments of the invention are applicable to it as well.
  • In an append-write system, newly written data is allocated free storage space, so newly written data does not occupy the storage space of existing data.
  • The storage space allocated for adjacent write requests is also adjacent, so the virtual storage locations in the storage system of the data carried by adjacent write requests are adjacent as well; generally speaking, the storage space for a write request is located immediately after the storage location of the previous write request.
  • For data that fails to be written, the allocated storage space is reserved, and subsequently written data does not occupy the storage space allocated to the unsuccessfully written data. After the unsuccessfully written data is recovered, the recovered data can be written into the reserved storage space.
  • the embodiments of the present invention are applicable to data read/write based on a distributed storage system, for example, a multi-copy based read/write mode or an erasure code (EC) read/write mode.
  • EC erasure code
  • the first target data is only used to refer to a piece of data, and there is no other limitation.
  • the first target data may be a file, an object, a block, a part of a file, or a part of an object.
  • A distributed storage system includes multiple storage servers, which can be located in different data centers or in different availability zones (AZ).
  • Storage server 11, storage server 12 and storage server 13 together form a distributed storage system and provide data read/write services for client server 10. Each storage server includes a processor and at least one storage device; the processor processes the read requests and write requests from the client server, and the storage device provides physical storage space, for example a storage medium such as a hard disk, a solid state drive (SSD) or a phase change memory (PCM).
  • The client server saves the first target data in the storage devices of the distributed storage system using multiple copies or EC.
  • A client program is installed in the client server 10, and the client program can be used by the user.
  • The storage server includes a processor and memory, and may also have built-in or external storage devices.
  • Program code is stored in the memory, and the processor can execute the program code in the memory to perform operations; the storage devices provide the storage space for data.
  • In some storage servers the processor and the memory are independent of each other; a storage server may also have no separate memory, for example when it uses a field programmable gate array (FPGA), which effectively integrates the processor and the memory.
  • The storage server may be a computer, a general-purpose server, or a dedicated storage device; a dedicated storage server includes a storage controller and storage devices.
  • A storage device can store data persistently, for example a disk or a solid state drive.
  • In the multi-copy storage mode, the client server performs a copy operation on the first target data that needs to be written, and then writes the same data (the copies) to different storage devices.
  • The first target data and the data generated by copying may be collectively referred to as copies; alternatively, the first target data may be called the primary copy and the copied data called replicas. The embodiments below use the former naming convention.
  • Each copy forms a correspondence with the storage device in the distributed storage system that stores it.
  • In the EC storage mode, the client server divides the first target data into N data strips, and then generates M parity strips corresponding to the N data strips according to the EC algorithm. The N+M strips together form a stripe, and each of the N+M strips corresponds to a storage device in the distributed storage system.
  • The embodiments of the present invention are applicable to both storage modes. Therefore, for convenience of description, the embodiments collectively refer to a copy or a strip as a fragment unless otherwise specified.
  • A fragment can thus be: a copy in the multi-copy storage mode, or a strip (data strip or parity strip) in the EC storage mode.
  • A storage server may correspond to no more than one fragment, which improves the reliability of the first target data; alternatively, one storage server may correspond to at least two fragments, so that the first target data is stored in a more centralized manner.
  • For each storage device, the address range in which writes have succeeded contiguously is recorded. It should be noted that a fragment that was not written successfully at first is regarded as contiguously written once it is later written successfully (for example through data recovery), because the address range then becomes contiguous again. Therefore, any data within the contiguous-write-success address range is data that has been written successfully.
  • The embodiments of the present invention can tolerate the existence of slow nodes when writing data; that is, even if some fragments are not written successfully, the first target data can still be written successfully as long as the number of successfully stored fragments reaches a preset value, and the distributed storage system then returns a success response message to the client server.
  • A storage device that has temporarily not been written successfully is called a slow node, and the data that the slow node failed to write can later be recovered from the storage nodes that were written successfully.
  • Applying the method described in the embodiments of the present invention, the first target data can be considered successfully stored as long as a preset number of copies is stored successfully (1 ≤ preset number ≤ total number of copies; for example, one optional choice is 1 < preset number < total number of copies), whereas the prior art often requires all copies to be stored successfully before the first target data is considered stored in the distributed storage system and a write-success response is returned to the client server, which increases the response latency.
  • There is also a multi-copy storage mechanism in the prior art called Quorum, which allows the target data to be considered successfully stored in the multi-copy storage mode without requiring all replicas to be stored successfully.
  • However, the Quorum mechanism imposes the requirement W (number of copies that must be written successfully) + R (number of copies that must be read successfully) > N (total number of copies). This means that if the number of successful writes is small, more copies must be read when reading.
  • Since R > N − W, each copy has to carry a version number so that the versions of the R copies read can be compared and the copy with the latest version determined; this causes the so-called "read amplification" problem.
  • The embodiments of the present invention, by contrast, can directly determine whether the latest version of the copy has been written successfully on a storage device; copies do not need to carry version numbers and no version comparison is needed. Therefore, in the normal case only one copy needs to be read, which largely resolves the read amplification problem. A small numeric illustration follows.
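  • A small numeric comparison, assuming N = 3 copies and W = 2 acknowledged writes; the figures follow directly from the Quorum inequality quoted above.

```python
N, W = 3, 2                  # total copies; copies that must land before a write is acknowledged
R = N - W + 1                # smallest R with W + R > N: copies that must be read and compared
print(R)                     # 2 -> every Quorum read touches at least two replicas
# With the consecutive-storage-success address range, a read that falls inside the
# recorded range needs only 1 replica and no version comparison.
```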
  • The storage resources of the distributed storage system are provided by the storage devices.
  • The storage resources of the distributed storage system can be logically divided into multiple partitions.
  • The storage resources of a partition come from multiple storage devices (each storage device provides a part of its storage space to the partition), and the storage space provided by each storage device can be of the same size.
  • The number of storage devices providing space for one partition may be smaller than the total number of storage devices in the distributed storage system.
  • Each partition can be further divided into multiple extents; among the storage devices related to the partition, each storage device provides storage space of the same size for an extent.
  • The management information of an extent describes the size of the extent, the storage servers where the extent is located, and the storage devices where the extent is located. Therefore, once an extent is selected, the storage servers and storage devices corresponding to the extent are determined, and write requests can be sent to the corresponding storage servers.
  • The embodiments of the present invention do not exclude the case where the storage resources are not divided into partitions and are instead divided directly into extents. A simplified layout sketch follows.
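  • The sketch below shows one possible layout record for a partition and its extents; the identifiers, sizes, and the flat dictionary representation are assumptions made for illustration only.

```python
partition = {
    "partition_id": 7,
    "devices": ["server-A/disk-0", "server-B/disk-3", "server-C/disk-1"],  # hypothetical IDs
    "per_device_space": 64 * 2**30,   # each backing device contributes the same amount of space
}

EXTENT_SIZE_PER_DEVICE = 1 * 2**30    # equal space contributed by every device to one extent

def extent_devices(extent_id: int) -> list[str]:
    """Choosing an extent fixes the storage devices (and thus servers) the writes go to."""
    return partition["devices"]       # every extent of a partition maps to the same device set

print(extent_devices(extent_id=5))
```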
  • The storage server manages the storage space of its storage devices using virtual storage units, and the client server sends fragments to virtual storage units in the storage devices of the storage servers for storage.
  • A virtual storage unit can be mapped to physical addresses of a storage device.
  • The client server generates multiple fragments from the first target data that needs to be stored, selects an extent for storing these fragments, and sends write requests to the storage servers where the selected extent (called the first extent) is located.
  • In the multi-copy mode, the way to generate multiple fragments is to copy the first target data to generate 2 or more copies.
  • If the fragments are fragments in the EC storage mode, the multiple fragments are generated by dividing the first target data into N data strips and then generating M parity strips from the N data strips.
  • The storage server where an extent is located refers to a storage server that provides storage space for the extent; likewise, the storage device where an extent is located is a storage device that provides storage space for the extent. There is a correspondence between an extent and the storage devices that provide storage space for it.
  • In one implementation, the virtual storage addresses in the write requests of different fragments are the same, and this same virtual storage address corresponds to logical addresses on different storage devices. In another implementation, the virtual storage addresses in the write requests of different fragments are different, and each virtual storage address corresponds to the logical address of one storage device.
  • A logical address corresponds to a physical address of the storage device, and the storage device can store data into the corresponding physical address according to the logical address.
  • The write request sent by the client server to the storage server includes: a device address, a fragment, and a virtual storage address.
  • There are two ways for a write request to carry the device address: one is to carry the device address directly in a field of the write request; the other is to carry information related to the device address, from which the storage server can derive the device address.
  • In the first way, the device address is carried directly in a field of the write request.
  • For example, the device address includes: a server address + a storage device ID.
  • For instance, the device address may be a server IP + a storage device port number, where the storage device port number directly identifies the storage device.
  • Alternatively, the write request may not carry the device port number directly but carry information related to the storage device, and the storage server then determines the corresponding storage device from that information.
  • For convenience, the information carried in a write request (or read request) that is used to determine the storage device is referred to as the storage device ID.
  • The virtual storage address is a location within a virtual storage block and can be represented as "extent ID + start address" or "extent ID + start address + length".
  • The extent ID identifies an extent, namely the extent into which the fragment is to be written; the start address is a relative position within the extent and is the start of the storage space allocated by the client server to the fragment; the length information is optional: if at least 2 write requests are transmitted in the same data stream, the length can mark the end of each write request.
  • This application is based on an append-write storage method. Therefore, in the extent indicated by the extent ID, the addresses before the start address are addresses that have already been allocated.
  • Because the embodiments of the invention use the append-write mode, the virtual storage addresses assigned to fragments are contiguous; in other words, within the same extent, the virtual storage address allocated this time is adjacent to the virtual storage address allocated last time.
  • For example, the device address of a write request is: IP address 211.133.44.13 + port 32; the virtual storage address is: extent 5 + start address 100 + fragment length 50.
  • The storage server the request is sent to therefore has address 211.133.44.13, and the storage device it is sent to is the storage device corresponding to port 32 of that storage server.
  • The storage location indicated by the write request is: in extent 5, starting at offset 100, write data of length 50. The fields of such a write request are sketched below.
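  • The fields of such a write request, filled with the worked example above (server 211.133.44.13, device port 32, extent 5, offset 100, length 50), might look as follows; the dataclass and field names are illustrative, not a wire format defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class WriteRequest:
    server_addr: str     # which storage server the request is sent to
    device_id: int       # identifies one storage device on that server (here, a port number)
    extent_id: int       # the virtual storage block (extent) being written
    start_offset: int    # relative position inside the extent (append-write)
    fragment: bytes      # the copy or EC strip to store

req = WriteRequest(server_addr="211.133.44.13", device_id=32,
                   extent_id=5, start_offset=100, fragment=b"\x00" * 50)
assert len(req.fragment) == 50   # matches "fragment length 50" in the example
```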
  • For ease of description, this write request is referred to as the first write request, and the fragment carried by the first write request is referred to as the first fragment.
  • The device address included in the first write request is referred to as the first device address; the first device address identifies the first storage device located at the first storage server.
  • The virtual storage address included in the first write request is referred to as the first virtual storage address, and the first virtual storage address belongs to the first extent.
  • The first storage server receives the first write request from the client server and stores the first fragment carried in the first write request at the first virtual storage address of the first storage device.
  • When the first fragment is the first fragment written into the first extent and the write succeeds, a consecutive-storage-success address range is generated; its start address is the start address of the first extent and its end address is the end address of the first fragment within the first extent. When the first fragment is written into the first extent but is not the first fragment of the first extent, and all fragments of the first virtual storage block written before it (as well as the first fragment itself) have been written successfully, the consecutive-storage-success address range of the first extent is updated: the start address of the updated range remains the start address of the first extent, and the end address becomes the end address of the first fragment within the first extent. If the first fragment write is unsuccessful, the consecutive-storage-success address range is not updated and remains unchanged.
  • The first storage device stores the first fragment into the storage medium corresponding to the first virtual storage address.
  • A mapping relationship between virtual storage addresses and logical block addresses is recorded for the first storage device; according to this mapping, the first storage server can convert the first virtual storage address into a first logical block address (LBA) and store the first fragment with the first LBA as its start address. A minimal sketch of such a translation follows.
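  • A minimal sketch of that translation, assuming a per-extent base LBA table on the storage device; the table contents and the byte-granular addressing are illustrative assumptions, since the patent only requires that such a mapping exist.

```python
# base address of each extent's space on this particular storage device (hypothetical values)
extent_base_lba = {5: 40_339_115}

def to_lba(extent_id: int, offset: int) -> int:
    """Translate a virtual storage address (extent, relative offset) into a device LBA."""
    return extent_base_lba[extent_id] + offset

print(to_lba(5, 100))   # 40339215 -- the fragment is then written starting at this LBA
```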
  • LBA logical block address
  • The first storage server further records, for the first storage device, the address range of the first extent that has been stored successfully and contiguously from its start position; this record may be kept in the first storage device or in another device, as long as the first storage server can read it.
  • The consecutive-storage-success address range of the first extent refers to the address range, within the portion of the first extent's address range that belongs to the first storage device, of the fragments that have been written successfully one after another starting from the start position; these successfully written fragments are immediately adjacent, with no addresses of unwritten data between them.
  • The range can therefore be described simply by the last address of the successfully written fragments.
  • The concept of the consecutive-storage-success address range is described in more detail below in conjunction with the accompanying figure. In other words, within the consecutive-storage-success address range all fragments have been written successfully and the fragments are adjacent; there are neither free addresses nor addresses where writes failed.
  • The logical storage space of an extent is distributed across multiple storage devices.
  • The consecutive-storage-success address range of an extent in the embodiments of the present invention is defined per individual storage device: within the portion of the extent's logical storage space located on a given storage device, multiple fragments have been stored successfully in succession (there is no fragment that failed to be stored).
  • "Contiguous" means that the last address of one fragment is adjacent to the first address of the next fragment; in other words, there are no holes between fragments.
  • For storage devices other than the first storage device, the principle of recording the consecutive-storage-success address range of the first extent is the same; the embodiment simply takes the first storage device as the example.
  • In the example, a total of 4 fragment storage operations are performed on the same extent, and each storage operation targets 3 fragments.
  • extent1 corresponds to three storage devices, and the logical storage spaces of extent1 on these three storage devices are extent1-1, extent1-2 and extent1-3 respectively.
  • The logical storage space of one extent on different storage devices can be described by the same virtual addresses; that is, extent1-1, extent1-2 and extent1-3 are all described using extent1.
  • Each write request received here is a write request for extent1; they originate from the same first target data, and the virtual storage address they carry is the same.
  • For each storage device corresponding to extent1, the fragment carried by the first write request received is fragment 1, which is the first fragment written to extent1, so it is stored starting from the beginning of extent1. Fragment 1 is written successfully on all three storage devices, so on these three storage devices the last address of the consecutive-storage-success address range is the last address of fragment 1; the first address of the range on all three devices, as described above, is the start position of extent1.
  • The second write request received carries fragment 2; fragment 2 is written successfully on storage device A and storage device C of extent1, but the write on storage device B is unsuccessful.
  • Consequently, for extent1 on storage device A (specifically extent1-1) and on storage device C (specifically extent1-3), the last address of the consecutive-storage-success address range becomes the last address of fragment 2, whereas for extent1 on storage device B (specifically extent1-2) the last address of the range is still the last address of fragment 1.
  • The third write request received carries fragment 3, and fragment 3 is written successfully on all 3 storage devices of extent1. Then, on storage device A and storage device C, the last address of the consecutive-storage-success address range of extent1 is the last address of fragment 3. On storage device B, since fragment 2 has not yet been recovered, the storage location of fragment 2 remains reserved and forms a "hole"; therefore the last address of the consecutive-storage-success address range of extent1 is still the last address of fragment 1.
  • The fourth write request received carries fragment 4, and fragment 4 is written successfully on the three storage devices of extent1.
  • Fragment 2 is then successfully recovered on storage device B; that is to say, the "hole" that existed on storage device B is filled by fragment 2, and the last address of the consecutive-storage-success address range of extent1 on storage device B advances to the end of the contiguously stored fragments.
  • In summary, for each storage device the start address of the consecutive-storage-success address range is the start position of the extent. After a fragment is written successfully, if all fragments written into the extent before it were also written successfully, the end of the consecutive-storage-success address range is updated to the last address of this fragment; if the fragment is not written successfully, or if a fragment before it in the extent has not been written successfully, the range is not updated. A sketch of this bookkeeping follows.
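  • The bookkeeping just described can be sketched as follows: one small record per (extent, storage device) whose end only advances over fragments that are adjacent, so a hole left by a failed write blocks the range until the missing fragment is recovered. The class and method names are illustrative.

```python
class ConsecutiveRange:
    """Tracks the consecutive-storage-success address range of one extent on one device."""

    def __init__(self):
        self.end = 0            # the range is [0, end) measured from the extent's start
        self.pending = {}       # start_offset -> length, for fragments landed beyond a hole

    def record_success(self, start: int, length: int) -> None:
        self.pending[start] = length
        while self.end in self.pending:          # advance only over adjacent fragments
            self.end += self.pending.pop(self.end)

    def covers(self, start: int, length: int) -> bool:
        """A read of [start, start+length) is served only if it lies inside the range."""
        return start + length <= self.end

r = ConsecutiveRange()
r.record_success(0, 50)            # fragment 1
r.record_success(100, 50)          # fragment 3 lands, but fragment 2 (offset 50) failed: a hole
print(r.end, r.covers(100, 50))    # 50 False  -> fragment 3 is not yet readable here
r.record_success(50, 50)           # fragment 2 recovered, the hole is filled
print(r.end, r.covers(100, 50))    # 150 True
```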
  • For the write request above, the first storage device corresponding to port 32 converts "extent 5 + offset 100 + fragment length 50" into the LBA address range "LBA 40339215 + length 50", and writes the fragment into the physical storage space corresponding to the converted LBA address range.
  • After a successful write, the first storage server sends a write-success response message for the first write request to the client server; the write-success response message carries the same write request ID as the first write request.
  • From this response the client server learns that the first fragment sent to the first storage device has been written successfully and that, on the first storage device, there is no hole between the first address of the first extent and the end address of the first fragment.
  • The client server may record that the first storage location has been written successfully, and record the first storage location into the metadata of the first target data.
  • The client server may also receive response messages for the other write requests. If the number of write-success responses received by the client server reaches a preset number, and the storage devices corresponding to those write-success response messages have no fragments that were not written successfully, the first target data is considered successfully stored in the distributed storage system. In the multi-copy storage mode, 1 ≤ preset number ≤ total number of copies; in the EC storage mode, N ≤ preset number ≤ N + M.
  • A storage device that has not yet been written successfully (a slow node) may later obtain the fragment from a storage device that was written successfully, store it, and then send its write-success response; because the time at which it stores the fragment is delayed, it is called a slow storage device.
  • For example, if the client server receives 2 success responses and the extents written by the two successfully written copies have been stored successfully and contiguously, the fragments of the first target data can be considered successfully stored in the distributed storage system.
  • The client server can then record that "extent 5 + offset 100 + fragment length 50" has been written in the distributed storage system.
  • The above is the data write flow; the data read flow is introduced below.
  • The granularity of read data is unrelated to the granularity of written data, and may be greater than, equal to, or less than the length of one fragment.
  • In the read flow, the data that the client server needs to read is called the second target data.
  • The read flow and the write flow can be relatively independent; the two do not have a fixed order in execution time.
  • The client server sends a read request to a storage server, where the read request carries a storage device address and the virtual storage address segment of the data that needs to be read.
  • The number of read requests can be just one; in general, the number of read requests is determined by the number of storage devices across which the second target data is distributed.
  • For example, the first target data is 30 bytes and is written to 5 storage devices in the distributed storage system as 3+2 (3 data fragments + 2 parity fragments), so the length of each fragment is 10 bytes.
  • The first data fragment contains the 1st to 10th bytes of the first target data, the second data fragment contains the 11th to 20th bytes, and the third data fragment contains the 21st to 30th bytes.
  • If the second target data that the client server needs to read is the 11th to 20th bytes, one read request is issued to the storage device where the second data fragment is located, and the requested data length is 10 bytes.
  • If the second target data that the client server needs to read is the 26th to 30th bytes, one read request is issued to the storage device where the third data fragment is located, and the requested data length is 5 bytes.
  • If the second target data that the client server needs to read is the 1st to 25th bytes, three read requests are issued, whose destinations are the storage devices where the three data fragments are respectively located; the data lengths requested by the three read requests are 10 bytes, 10 bytes and 5 bytes respectively. This arithmetic is reproduced in the sketch below.
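  • The sketch below reproduces that arithmetic for the 3+2 example (strip length 10 bytes); byte positions are 1-based to match the text, and split_reads() is an illustrative helper rather than an interface from the patent.

```python
STRIP_LEN = 10   # each data fragment holds 10 bytes of the 30-byte first target data

def split_reads(first_byte: int, last_byte: int):
    """Return one (data_fragment_index, offset_in_fragment, length) per fragment touched."""
    reads, pos, end = [], first_byte - 1, last_byte      # convert to 0-based, end exclusive
    while pos < end:
        fragment, offset = divmod(pos, STRIP_LEN)
        length = min(STRIP_LEN - offset, end - pos)
        reads.append((fragment, offset, length))
        pos += length
    return reads

print(split_reads(11, 20))   # [(1, 0, 10)]                        -> one 10-byte read
print(split_reads(26, 30))   # [(2, 5, 5)]                         -> one 5-byte read
print(split_reads(1, 25))    # [(0, 0, 10), (1, 0, 10), (2, 0, 5)] -> 10 + 10 + 5 bytes
```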
  • Each storage device processes a read request in a similar way, so for convenience of explanation the following still takes one read request as an example; this read request is named the first read request.
  • The operation of the client sending read requests to the corresponding storage servers includes sending the first read request to the first storage server corresponding to it.
  • The data in the first read request includes: the first device address and a first virtual storage address segment.
  • The first virtual storage address segment includes: extent ID + start address + length, or extent ID + start address + end address; it indicates the location, within the first virtual storage block, of the data that the read request needs to read.
  • Using the ID of the second target data (for example the file name of the second target data, or a hash value of the second target data), the client server can obtain the metadata of the second target data; from this metadata it learns the virtual storage address range corresponding to the second target data, which includes the extent ID. The extent ID corresponds to a partition, and the partition has a correspondence with storage devices. Therefore, the client server can obtain, from the second target data ID, the storage devices where the data to be read is located.
  • A storage device belongs to a storage server, so once the storage devices where the data to be read is located are known, the storage servers where that data is located are known as well.
  • For example, the device address of the first read request is: IP address 211.133.44.13 + port 32; the first virtual storage address segment is: extent 5 + start address 100 + data length 30.
  • The server address it is sent to is therefore 211.133.44.13, and the storage device it is sent to is the storage device corresponding to port 32.
  • The storage location indicated by the first read request is: in extent 5, read the data of length 30 starting at offset 100.
  • The first storage server determines whether the consecutive-storage-success address range of the first extent includes the first virtual storage address segment. If the result of the determination is yes, the data located at the first virtual storage address segment is read from the first storage device. As mentioned previously, the information "consecutive-storage-success address range" can be stored in the first storage device.
  • The first storage server reads the data of the first virtual storage address segment from the first storage device and returns it to the client server.
  • If the result of the determination is no, the client server may regenerate the read request and send it to the next storage server (the second storage server), attempting to obtain the data of the first virtual storage address segment from that storage server.
  • The next storage server makes a similar judgment, and this continues until a storage device is found whose consecutive-storage-success address range for the first extent includes the requested virtual storage address segment, from which the data is read and returned to the client server.
  • Taking the earlier example: for a read of fragment 2 sent to storage device A, it is determined whether storage device A stores fragment 2. Since on storage device A the end of extent1's consecutive-storage-success address range is the last address of fragment 3, the range includes the virtual storage address of fragment 2, so the result of the judgment is "yes".
  • When storage device B receives the fourth read request, it is determined whether storage device B stores fragment 2. Since on storage device B the end of extent1's consecutive-storage-success address range is the last address of fragment 1, the range does not include the virtual storage address of fragment 2, and the result of the determination is "no".
  • The client server then obtains the second target data: if only one read request was issued, the second target data can be obtained directly from the response message of the first read request; if the number of read requests sent out by the client server in step 206 is at least two, the fragments obtained from the response messages of the multiple read requests are combined to form the second target data.
  • The above steps 204-206 describe how to obtain the second target data from the first storage device of the distributed storage system.
  • If the first storage device cannot provide the data requested by the client server, the client server sends a read request to the next storage server (specifically, to a storage device of that storage server) and tries again to read the requested data; if the read is still unsuccessful, it continues to send a read request to the next storage server, and so on, until the data requested by the client server is read.
  • In that case the first storage server receives the second read request; the content carried in the second read request is analogous to that of the first read request, for example it includes a second virtual storage address segment and the first storage device ID. Unlike in step 205, the first storage server detects that the second virtual storage address segment does not fall within the first consecutive-storage-success address range, and therefore sends a failure response message to the client server.
  • After receiving the failure response message, the client server sends a third read request to the second storage server in the distributed storage system; the content carried in the third read request is analogous to that of the first read request, for example the third read request carries the second virtual storage address segment and the second storage device ID.
  • The second storage device is different from the first storage device, and the first storage device and the second storage device both correspond to the first virtual storage block.
  • When the second virtual storage address segment falls within a second consecutive-storage-success address range in the second storage server, the second storage server reads the data of the first virtual storage address segment from the second storage device and returns it to the client server; the process of reading data from the second storage device is the same as in step 205.
  • The second consecutive-storage-success address range indicates that, within the virtual storage address range of the first virtual storage block that is located on the second storage device, fragments are stored contiguously from the first address of the first virtual storage block to the end address of the first fragment. A client-side fallback sketch follows.
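  • A client-side sketch of this fallback: try each storage device that backs the first virtual storage block until one of them reports that the requested segment lies inside its consecutive-storage-success range. read_segment() is a placeholder for the real read RPC; returning None models the failure response message.

```python
def read_with_fallback(devices, extent_id, start_offset, length, read_segment):
    """Try the devices backing the virtual storage block in turn until one serves the read."""
    for dev in devices:
        data = read_segment(dev, extent_id, start_offset, length)
        if data is not None:         # the segment was inside this device's success range
            return data
    raise IOError("segment not yet readable on any storage device of the extent")
```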
  • The present invention also provides an embodiment of a storage medium in which program code is recorded; a storage server can execute the above methods by executing the program code.
  • The present invention also provides a computer program by means of which a storage server can execute the above methods.
  • The data storage apparatus may be composed of software modules running on the storage server, which jointly perform the operations performed by the storage server in steps 202-205 above; it may also be hardware, for example a hardware device consisting of a processor, a memory and a storage device, where the program running in the memory performs the operations of the storage server in steps 202-205.
  • The data storage apparatus 4 is applied to a first storage server in the distributed storage system; the distributed storage system includes a plurality of storage servers, each storage server includes at least one storage device, and each virtual storage block of the distributed storage system corresponds to a plurality of the storage devices.
  • The data read/write apparatus 4 includes an interface module 41, a storage module 42 and a reading module 43.
  • The interface module 41 is configured to receive write requests and read requests. A write request includes a first fragment of first target data, a first virtual storage address and a first storage device ID, where the first virtual storage address is a relative position within the first virtual storage block and corresponds to a first logical block address of the first storage device; the first storage device is one of the storage devices corresponding to the first virtual storage block and is managed by the first storage server.
  • The storage module 42 is configured to store the first fragment in the first storage device using the first logical block address as the starting address. After the first fragment has been stored, when, within the virtual storage address range of the first virtual storage block that is located on the first storage device, fragments are stored contiguously from the first address of the first virtual storage block to the end address of the first fragment, this contiguous address segment is recorded as the first consecutive-storage-success address range.
  • The interface module 41 is further configured to receive a read request that carries a first virtual storage address segment, where the logical block addresses corresponding to the first virtual storage address segment are located on the first storage device.
  • The reading module 43 is configured to read the data of the first virtual storage address segment from the first storage device when the first virtual storage address segment falls within the consecutive-storage-success address range.
  • In summary, the storage server receives the client's write requests and stores the fragments they carry; each write request carries the fragment to be written, the first storage device ID, and a virtual storage address of the first virtual storage block. If, within the virtual storage space of the storage device occupied by the virtual storage block, storage has succeeded continuously from the start position, the address range of this continuous storage success is recorded.
  • The data within the consecutive-storage-success address range is data that has been stored successfully, so data can be read back according to a virtual storage address segment. If the consecutive-storage-success address range provided by the embodiments of the present invention is used, the read data is guaranteed to be correct. If it is not used, the data read out may be correct, but it may also be wrong; for example, data that has already been deleted by the system can still be read out, but obviously this is not the data the user needs, in other words wrong data has been read. In that case a version number has to be used: the client server checks whether the version number carried in the read data is the same as the version number of the data that needs to be read; if they are the same, the read data is correct, otherwise wrong data has been read.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种分布式系统的数据存储方法,存储服务器(11,12,13)接收客户端服务器(10)的写请求并进行存储,每个写请求中携带:待写的分片、第一存储设备ID以及第一虚拟存储块的虚拟存储地址;如果虚拟存储块位于这个存储设备的虚拟存储空间中,从起始位置开始一直存储成功,则记录连续存储成功的地址范围。对各个存储设备而言,在连续存储成功的地址范围之内的数据都是存储成功的数据,在所述存储服务器(11,12,13)收到客户端服务器(10)对这个地址范围内的地址段的读请求时,可以直接返回需要读取的数据给客户端。

Description

一种数据读写方法、装置和存储服务器 技术领域
本发明涉及存储领域,特别涉及分布式存储领域。
背景技术
在分布式存储系统中,为了降低数据丢失的概率,采用多副本或者纠删码(EC)的冗余策略来保护数据。在使用这些冗余策略存储数据时,对于数据有强一致性要求的系统。具体而言:客户端把数据写到分布式存储系统中,对于多副本存储系统,需要等待所有副本都成功保存后,客户端才认为多副本才在分布式存储系统中存储成功;而EC存储中,需要所有条带(strip)都存储成功,客户端才认为数据在分布式存储系统存储成功。这种方式比较便利地保证了数据的强一致性,但是,任何一个节点如果暂时没有写成功,都会影响到客户端存储数据的时延和性能。
在大规模存储集群的场景中,这个问题更加突出。特别是如果分布式存储系统的存储服务器分布于多个数据中心或分布于多个可用区域的场景中,存储服务器之间、以及存储服务器和客户端之间的网络出现阻塞或存储服务器亚健康的概率大大增加,很容易出现少量的存储服务器出现问题的情况。这会导致分布式存储系统写数据的性能明显下降、时延大大增加。
发明内容
第一方面,本发明提供一种数据读写方法,应用于分布式存储系统中的第一存储服务器,所述分布式存储系统包括多个存储服务器,每个存储服务器包括至少一个存储设备,所述分布式存储系统的存储空间用虚拟存储块进行管理,每个所述虚拟存储块对应多个所述存储设备。该方法包括:第一存储服务器接收第一写请求,所述第一写请求包括第一目标数据的第一分片、第一虚拟存储地址和第一存储设备ID,所述第一虚拟存储地址是第一虚拟存储块中的相对位置,所述第一虚拟存储地址与第一存储设备的第一逻辑块地址对应,其中,所述第一存储设备是所述第一虚拟存储块对应的存储设备中的一个,所述第一存储设备属于所述第一存储服务器管理;所述第一存储服务器以所述第一逻辑块地址作为起始地址在所述第一存储设备中存储所述第一分片,存储完成后,自起始位置开始已连续存储成功的第一连续存储成功地址范围;所述第一存储服务器接收第一读请求,所述第一读请求中携带第一虚拟存储地址段和所述第一存储设备ID,其中,第一虚拟存储地址段所对应的逻辑块地址位于所述第一存储设备,当所述第一虚拟存储地址段属于所述第一连续存储成功地址范围,从所述第一存储设备中读取所述第一虚拟存储地址段的数据。
应用该方法,存储服务器接收客户端的写请求并成功把写请求中的分片存储到存储设备后,会根据情况记录连续存储成功的地址范围。基于该方法存储的分片在后续需要被读出时,可以直接读出后返回给客户端,不需要读取多个分片进行额外的校验,减少了读放大读系统资源的影响。
在第一方面的第一种可能实现方式中,所述第一连续存储成功地址范围中存储有一个分片或者多个分片,当存储的所述分片数量是多个时,存储的所述分片之间相邻。
在第一方面的第二种可能实现中,所述第一分片是:所述第一目标数据的一个副本。 或者,所述第一分片是所述第一目标数据的一个纠错码(erasure code,EC)数据条带;或者所述第一分片是所述第一目标数据的一个纠错码EC校验条带。EC数据分片和EC校验分片,第一目标数据拆分生成至少2个EC数据条带,所述多个EC数据分片通过EC算法生成至少一个EC校验条带。
因此,本发明实施例支持多副本的分布式存储系统,也支持EC的分布式存储系统。
在第一方面的第三种可能实现中,所述第一连续存储成功地址范围用所述第一虚拟存储块的虚拟存储地址范围的形式进行描述,或者,所述第一连续存储成功地址范围以所述第一存储设备中的逻辑块地址范围的形式进行记录。
第一连续存储成功地址范围是一个虚拟地址范围,只要能够描述出从第一虚拟存储块的首地址开始连续成功存储分片的地址范围即可,描述方式可以多样化。在其他实现方法中,用这段虚拟地址对应的LBA地址进行描述也是可以的。
在第一方面的第四种可能实现中,每个所述分片是所述第一目标数据的一个副本。由客户端对第一目标数据进行复制生成至少2个副本;也可以由分布式存储系统中的某一个存储服务器对第一目标数据进行复制生成至少2个副本。
在第一方面的第五种可能实现中,其中,还包括:
所述第一存储服务器接收第二读请求,所述第二读请求中携带第二虚拟存储地址段和所述第一存储设备ID,其中,第一存储服务器检测到:所述第二虚拟存储地址段不属于所述第一连续存储成功地址范围;所述第一存储服务器发送失败响应消息给客户端服务器;客户端服务器收到所述失败响应消息后,发送第三读请求给所述分布式存储系统中的第二存储服务器,所述第三读请求携带所述第二虚拟存储地址段和所述第二存储设备ID,所述第二存储设备和所述第一存储设备不同,所述第一存储设备和所述第二存储设备对应所述第一虚拟存储块;当所述第二虚拟存储地址段属于所述第二存储服务器中的第二连续存储成功地址范围,所述第二存储服务器从所述第二存储设备中读取所述第一虚拟存储地址段的数据返回给所述客户端服务器其中,所述第二连续存储成功地址范围,指示了在所述第一虚拟存储块位于所述第二存储设备的虚拟存储地址范围之中,在从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片。
第一方面的第五种可能实现中的分片可以是副本。由此可以看出,在多副本的场景下,如果从一个存储设备中没有读出第二读请求所请求的数据,可以按照第一虚拟存储块对应的存储设备中选择下一个存储设备继续尝试进行读取,依次循环,直至读出第二读请求所请求的数据为止。
第一方面的第六种可能实现中,在第一存储服务器接收写请求之前,还包括:客户端服务器生成第一目标数据的多个分片,选择用于存储第一目标数据的第一虚拟存储块,向第一虚拟存储块对应的存储设备所在存储服务器发送包括所述第一写请求在内的多个写请求,每个写请求中包括一个所述分片以及对应的虚拟存储地址。
客户端服务器安装有客户端软件。该实施例描述了第一目标数据整体的存储方法。
基于第一方面的第六种可能实现,在第一方面的第七种可能实现中,所述存储客户端收到预设数量的成功响应消息后,所述第一目标数据在上述分布式存储系统中存储成功,其中,所述预设数量的成功响应消息的总数小于所述多个写请求的总数。
由此可以看出,本实施例不需要全部写请求都写成功,只要预定数量的写请求写成功 即可。本发明实施例允许慢节点的存在。
第一方面的第七种可能实现中,所述分布式存储系统是追加写分布式系统存储。因此,对于先后写入同一个第一逻辑存储单元的多个分片,客户端服务器给它们分片的地址是连续的。
第二方面,本发明还提供一种数据存储装置,可以执行上述第一方面、以及第一方面的各个可能实现。数据存储装置可以是作为硬件的第一存储服务器,或者是第一存储服务器中运行的软件代码。
第三方面,本发明还提供一种第一存储服务器,可以执行上述第一方面、以及第一方面的各个可能实现。
第四方面,本发明还提供一种分布式存储系统的数据存储方法,所述分布式存储系统多个存储服务器,每个存储服务器包括至少一个存储设备,所述分布式存储系统的存储空间用虚拟存储块进行管理,每个所述虚拟存储块对应多个所述存储设备,该方法包括:客户端服务器生成第一目标数据的多个分片,选择用于存储第一目标数据的第一虚拟存储块,生成多个写请求,每个写请求携带分片、虚拟存储地址、设备地址,其中,所述虚拟存储地址是第一虚拟存储块中的相对位置,每个所述虚拟存储地址与存储设备中的一段逻辑块地址对应,所述设备地址用于标记存储服务器中的存储设备,设备地址标记的存储设备与所述第一虚拟存储块对应,每个写请求中的设备地址不同(如果是多副本,则分片相同,是第一目标数据的一个副本;如果是EC,则分片不同,分片是第一目标数据的一个数据分片或者一个校验分片);客户端服务器发送所述多个写请求给多个存储服务器;收到写请求的存储服务器,获得写请求中携带的分片、虚拟存储地址、设备地址,按照设备地址找到对应的存储设备,按照写请求中所述虚拟存储地址对应的逻辑地址(LBA)中写请求中的分片进行存储;对各个存储设备而言,对每个虚拟存储块在本存储服务器被连续存储成功的范围进行记录,如果本次写请求携带的分片存储成功,并且在本次写请求之前这个存储设备收到的对第一虚拟存储块的写请求所携带的分片也全部存储成功,则存储设备所在的存储服务器记录所述第一存储设备中、自所述第一虚拟存储块起始位置开始已连续存储成功的第一连续存储成功地址范围。
第四方面提供的实施例描述了把第一目标数据写入分布式存储系统的完整过程。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对本发明的实施例所需要使用的附图作简单地介绍,下面描述中的附图仅仅是本发明的一些实施例,还可以根据这些附图获得其他的附图。
图1是本发明分布式存储系统实施例的拓扑图;
图2本发明数据读写方法实施例的流程图;
图3是本发明实施例中对多个写请求的图;
图4是本发明数据存储装置实施例的功能结构图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明的技术方案进行清楚、完整地描述,显然,所描述的实施例仅是本发明一部分实施例,而不是全部的实施例。基于本发明中所介 绍的实施例所获得的所有其他实施例,都属于本发明保护的范围。
本发明实施例可以适用于“追加写”的存储方式,追加写存储方式例如是日志-结构(log-structure)或者是仅增加(append only)。此外,写时重定向(ROW)也属于追加写场景,因此也适用本发明的各发明的实施例。在追加写这种场景下,为新写入的数据分配空闲的存储空间,因此新写入的数据不会占用已有数据的存储空间。在追加写这种场景下,为相邻的写请求分配的存储空间也是相邻的,因此相邻写请求所携带的数据在存储系统中的虚拟存储位置也是相邻的,通常来讲,后一个写请求的存储空间位于前一个写请求的存储位置之后。如果有数据没有写成功,那么为其分配的存储空间会预留下来,后续写入的数据并不会占用给未写成功数据分配的存储空间。待没有写成功的数据被成功恢复后,可以将恢复得到的数据写入为其预留的存储空间。本发明实施例适用于基于分布式存储系统的数据读/写,例如基于多副本的读/写方式或者纠删码(erasure code,EC)的读/写方式。
为了方便说明,我们把需要写入分布式存储系统的数据称为第一目标数据。第一目标数据仅用于指代是一段数据,并没有其他的限制,第一目标数据可以是文件、对象(object)、块(block)、文件的一部分或者对象的一部分。分布式存储系统包括多个存储服务器,这些存储服务器可以位于不同的数据中心,或者不同的可用区域(Available Zone,AZ)。
参见附图1是本发明实施例拓扑图,存储服务器11、存储服务器12和存储服务器13共同组成分布式存储系统,为客户端服务器10提供数据读、写服务,每个存储服务器包括处理器和至少一个存储设备,处理器对客户端服务器的读请求和写请求进行处理,提供物理上的存储空间,存储设备例如是硬盘、固态硬盘(SSD)以及相变存储器(PCM)等存储介质。客户端服务器使用多副本或者EC的方式把把第一目标数据保存在分布式存储系统的存储设备中。客户端服务器10中安装有客户端程序,客户端程序可以被用户使用。
存储服务器包括处理器、内存,还可以内置或者外接存储设备。所述内存中存储有程序代码,所述处理器可以运行所述内存中的程序代码来执行操作,所述存储设备提供数据的存储空间。处理器和内存是相互独立的,在其他实施例中,存储服务器中也可以没有内存,例如现场可编程门阵列(FPGA),相当于处理器和内存集成在一起。存储服务器可以是计算机、通用服务器、或者是专用的存储设备,专用存储服务器包括存储控制器和存储设备组成。存储设备可以持久化的存储数据,例如是磁盘或者固态硬盘。
在多副本存储方式中,客户端服务器对需要写入的第一目标数据进行复制操作,然后把相同的数据(副本)写入不同的存储设备。可以把第一目标数据和复制生成的数据统称为副本;也可以把第一目标数据称为主副本,复制生成的数据称为副本(或者从副本),本申请各实施例使用前一种命名方式。副本和分布式存储系统中用于存储这个副本的存储设备形成对应关系。
而在EC存储方式中,客户端服务器把第一目标数据分成N个数据条带(data strip),然后按照EC算法生成与这N个数据条带对应的M个校验条带(parity strip)。N+M个条带共同组成一个分条(stripe)。这N+M个条带中,每个条带对应分布式存储系统中的一个存储设备。
由于这两种存储方式都可以适用本发明实施例。因此,为了方便描述,本发明各实施例在没有特别说明的情况下,把副本或者条带都统一称呼为分片。换句话说,分片可以是: 多副本存储方式中的副本,以及EC存储方式中的条带(数据条带或者校验条带)。
一个存储服务器可以对应不超过一个分片,可以提高第一目标数据的可靠性。在另外一些情况下,也可以一个存储服务器对应至少2个分片,这样对第一目标数据进行集中存储。
应用本发明实施例,在往分布式存储系统中存储第一目标数据时,如果发往某一个存储设备的虚拟存储块分片持续写入成功,会对连续写成功的地址范围进行记录。需要说明的是,对于没有写成功的分片,当在后续通过数据恢复等方式被写入成功后,地址范围也会变为连续,因此也会视为连续写成功。因此,凡是在连续写成功的地址范围之内的数据,均是写入成功的数据。那么在客户端读数据时,如果读请求携带的读地址范围属于所述连续写成功的分片的地址范围之内,就说明读请求所欲读出的数据肯定是写成功的数据,可以不需要进行额外的校验操作直接读出后返回给客户端服务器,不用担心读出的数据是错误的。因此,本发明实施例在写数据时可以容忍慢节点的存在,也就是说即使部分分片没有写成功,只要存储成功的分片数量达到预设值,就可以认为第一目标数据成功写入分布式存储系统,向客户端服务器返回写入成功的响应消息。对于暂时没有写成功的存储设备,称为慢节点,慢节点未写成功的数据可以由已写成功的存储节点数据进行恢复。
具体而言,在多副本存储方式中,应用本发明实施例介绍的方法,只要预设数量(1≤预设数量≤副本总数,例如,一种可选的方式是1<预设数量<副本总数)的副本存储成功,就可以认为第一目标数据存储成功,而现有技术往往需要所有副本都存储成功,才认为第一目标数据成功存储到分布式存储系统,向客户端服务器返回写入成功的响应消息,造成了响应时间上的延迟。
现有技术中也存在一种称为Quorum的多副本存储机制,可以允许在多副本存储方式中,在不需要所有副本存储成功的情况下,就判定为目标数据在分布式存储系统中存储成功。但是这种Quorum机制有W(写成功的副本数量)+R(读成功的副本数量)>N(副本总数)的要求。这就意味着如果写入成功的副本数量少了,在读副本时就必需读出更多的副本。具体而言,在读副本的时候必需读出R>N-W个副本,每个副本携带版本号,以便在R个副本之间进行版本的比较,从中确定出最新版本的副本,这就造成了所谓的“读放大”的问题。而使用本发明实施例,可以直接判断出最新版本的副本在存储设备中是否写成功,副本中不需要携带版本号,也不需要进行版本比较。因此,本发明实施例通常情况下只需要读出1个副本即可,较好的解决了读放大的问题。
分布式存储系统的存储资源由存储设备提供。为了方便管理,在逻辑上可以把分布式存储系统中的存储资源划分为多个分区(partition)。分区的存储资源来自于多个存储设备(每个存储设备为分区提供一部分存储空间),其中,每个存储设备提供的存储空间的大小可以相同。
其中,为一个分区提供空间的存储设备的数量,可以少于分布式存储系统中存储设备的总数。每个分区又可以进一步划分成多个区域(extent),和分区相关的存储设备中,每个存储设备为extent提供相同大小的存储空间。在extent的管理信息中,描述了extent的大小、extent所在的存储服务器、extent所在的存储设备等信息。因此,选择了extent,也就确定了与extent对应的存储服务器以及与extent对应的存储设备,就可以向相应的存储服务器发送写请求。本发明实施例不排除不把存储资源划分为分区,而直接把存储资 源划分成extent的情况。
存储服务器使用虚拟存储单元对存储设备的存储空间进行管理,客户端服务器把分片发送给存储服务器的存储设备中的虚拟存储单元进行存储。在存储设备中,虚拟存储单元可以映射到存储设备的物理地址。
不同的extent可以支持的分片数量可以不同。为了方便理解,这里举个例子:分布式存储系统中一共有10个存储设备,与第一分区相关的存储设备是这10个存储设备中的3个。那么对于所述第一分区中的每个extent,其空间来自于这3个存储设备,并且每个存储设备为extent提供的空间大小相同,第一分区中的extent可以用于3分片(3副本的多副本存储,或者M+N=3的EC存储)的存储;假设另外有第二分区,与其相关的存储设备可以是5个存储设备,那么对所述第二分区中任意一个extent,可以支持5分片的存储。
下面结合附图2,对本发明数据读写方法的实施例进行详细介绍。
201、客户端服务器根据需要存储的第一目标数据生成多个分片。选择用于存储这些分片的extent,给选择出的extent(称为第一extent)所在的存储服务器发送写请求。
如果分片是多副本存储方式中的副本,那么生成多个分片的方式是:对第一目标数据进行复制生成2个或者2个以上副本。如果分片是EC存储方式中的条带,那么生成多个分片的方式是:把第一目标数据分成N个数据条带,然后由N个数据条带生成M个校验条带。
extent所在的存储服务器是指为extent提供存储空间的存储服务器,extent和为extent提供存储空间的存储服务器存在对应关系;extent所在的存储设备是指为extent提供存储空间的存储设备,extent和为extent提供存储空间的存储设备之间存在对应关系。
对于多副本的存储方式,不同分片的写请求的虚拟存储地址相同,相同的虚拟存储地址对应到不同的存储设备的逻辑地址。对于EC的存储方式,不同分片的写请求的虚拟存储地址不同,每个虚拟存储地址对应一个存储设备的逻辑地址。逻辑地址和存储设备的物理地址对应,存储设备可以根据逻辑地址把数据存入相应的物理地址。
客户端服务器发送给存储服务器的写请求中包括:设备地址、分片、虚拟存储地址。写请求携带设备地址的方式有两种:一种是直接在写请求的一个字段中携带设备地址;另外一种是携带设备地址的相关信息,存储服务器根据设备地址的相关信息进行处理后可以获得设备地址。
设备地址包括:服务器地址+存储设备ID,例如,设备地址是服务器IP+存储设备端口号,由所述存储设备端口号可以直接确定存储设备。写请求中也可以不直接携带设备端口号,而是携带与存储设备相关的信息,然后由存储服务器根据存储设备相关的信息确定出对应的存储设备。为了便于描述,我们把写请求(或者读请求)中携带的用于确定存储设备的信息,统称为存储设备ID。
虚拟存储地址是虚拟存储块中的位置,可以用“extent ID+起始地址”表示;或者用“extent ID+起始地址+长度”表示。其中extent ID用于标记一个extent,本实施例中标记的是用于写入分片的extent;起始地址是在extent中的相对位置,是客户端服务器分配给分片的存储空间的起始位置;长度这一信息是可选的,如果同一个数据流中传输至 少2个写请求,长度可以标记所述写请求的结束位置。由于本申请是基于追加写的存储方式。因此,在extent ID所指示的extent中,所述起始地址之前的地址是已经被分配过的地址。本发明实施例采用追加写的方式。因此给分片分配的虚拟存储地址是连续的,换句话说,在同一个extent中,本次分配的虚拟存储地址和上一次分配的虚拟存储地址相邻。
下面举个具体例子来介绍设备地址和虚拟存储地址。一个写请求的设备地址是:IP地址211.133.44.13+端口32;虚拟存储地址是:extent 5+起始地址100+分片长度50。那么对这个写请求而言,发往的存储服务器地址是211.133.44.13,发往的存储设备是211.133.44.13存储服务器的端口32所对应的存储设备。写请求所指示的存储位置是:在extent 5中以偏移量100作为起始位置,写入长度是50的分片。
虽然写请求的数量不止一个。但是由于每个写请求的处理过程是类似的,下面仅以其中一个写请求为例进行具体说明这个写请求称为第一写请求,第一写请求携带的分片称为第一分片。相应的,第一写请求中包括的设备地址称为第一设备地址,第一设备地址是第一存储服务器的第一存储设备,第一写请求中包括的分片称为第一分片,第一写请求中包括的虚拟存储地址称为第一虚拟存储地址,第一虚拟存储地址属于第一extent。第一设备地址是位于第一存储服务器的第一设备的地址。
202、所述第一存储服务器接收来自所述客户端服务器的第一写请求,把第一写请求中的第一分片存入第一存储设备的第一虚拟存储地址。当第一分片是写入第一extent的首个分片,则生成连续存储成功的地址范围,所述连续存储成功的地址范围的起始地址是第一extent的起始地址,结束地址是第一分片在第一extent中的结束地址;当第一extent写入第一extent,但不是写入第一extent的第一个分片,写入第一虚拟存储块的所有分片(包括第一分片在内的)全部写成功,则更新第一extent连续存储成功的地址范围,更新后的所述第一extent连续存储成功的地址范围的起始地址是第一extent的起始地址,结束地址是第一分片在第一extent中的结束地址。如果第一分片写入不成功,则不更新连续存储成功的地址范围,保存连续存储成功的地址范围不变。
具体而言,所述第一存储设备把所述第一分片存入与所述第一虚拟存储地址对应的存储介质。第一存储设备中记录有虚拟存储地址和逻辑块地址的映射关系,所述第一存储服务器根据这个映射关系可以把所述第一虚拟存储地址转换成第一逻辑块地址(logicblock address,LBA),然后以所述第一LBA地址作为首地址存储所述第一分片。
所述存储服务器还对所述第一存储设备中、所述第一extent自起始位置开始连续存储成功的地址范围进行存储,可以存储在所述第一存储设备中也可以存储于其他设备,只要能被所述第一存储服务器读出即可。
第一extent连续存储成功的地址范围:是指从第一extent的位于第一存储设备的那部分地址范围中,自起始位置开始持续写成功的分片的地址范围。并且这些持续写成功的分片之间是紧邻的,不存在未写数据的地址。
由于写成功的分片的地址范围的起始位置始终是extent的起始位置,因此,可以仅用写成功的分片的地址范围的末地址来来描述所述写成功的分片的地址范围。下面结合图3对连续存储成功的地址范围这一概念进行了更详细的描述。换句话说,在连续存储成功的地址范围内所有的分片都写成功,并且这些分片相邻;既不存在空闲的地址,也不存在 写失败的地址。
需要说明的是,一个extent的逻辑存储空间分布在多个存储设备中,本发明各个实施例中所说的extent连续存储成功的地址范围是针对单个存储设备而言,是指extent位于某一个存储设备中的那部分逻辑存储空间连续的成功存储了多个分片(不存在没有存储成功的分片)。这里的“连续”是指分片的末地址和后一个分片的首地址相邻,换句话说,分片之间没有空洞(hole)。对于第一extent对应的多个存储设备而言,其记录第一extent连续存储成功的地址范围的原理是相同的,为了简便,本实施例仅以第一存储设备的情况进行举例。
如图3所示,同一个extent(extent 1)一共进行了4次分片存储操作,每次存储操作的操作对象是3分片。extent1对应3个存储设备,对应于3个存储设备的逻辑存储空间分别是extent1-1,extent1-2和extent1-3。
需要说明的是,对于多副本的存储方式,一个extent对应到不同存储设备的逻辑存储空间可以用相同的虚拟地址进行描述。例如extent1-1,extent1-2和extent1-3都使用extent1进行描述,对于3个存储设备而言,收到的写请求都是对extnet1的写请求,这种情况下,同一个第一目标数据的多个写请求中,携带的虚拟存储地址相同。
对于extent1对应的每个存储设备而言,收到的第一个写请求包括的分片是分片1,这是写入extent 1的首个分片,因此从extent 1的起始位置开始存储。分片1在三个存储设备中均写入成功,那么在这三个存储设备中,连续存储成功的地址范围的末地址是分片1的末地址。对于这3个连续存储成功的地址范围的首地址,如前所述,是extent 1的起始位置。
对于extent1对应的每个存储设备而言,收到的第二个写请求包括的分片是分片2,分片2在extent 1的存储设备A和存储设备C写入成功,extent 1在存储设备B写入不成功。那么在存储设备A和存储设备C中,extent 1(具体而言是extent1位于存储设备A的extent 1-1,和extent1位于存储设备C的extent 1-3)连续存储成功的地址范围的末地址是分片2的末地址。在存储设备B中,extent 1(具体而言是extent 1-2)连续存储成功的地址范围的末地址仍然是分片1的末地址。
对于extent1对应的每个存储设备而言,收到的第三个写请求包括的分片是分片3,分片3在extent 1的3个存储设备均写入成功。那么在存储设备A和存储设备C中,extent1连续存储成功的地址范围的末地址是分片3的末地址。在存储设备B中,由于分片2没有恢复,分片2的存储位置被预留出来,形成一个“空洞”。因此extent 1连续存储成功的地址范围的末地址仍然是分片1的末地址。
对于extent1对应的每个存储设备而言,收到的第四个写请求包括的分片是分片4,分片4在extent 1的3个存储设备的均写入成功。并且此时分片3在存储设备B中成功恢复,也就是说存储设备B中曾经存在的“空洞”被分片2填充。那么在3个存储设备中,extent 1连续存储成功的地址范围的末地址都是分片3的末地址。
记录连续存储成功的地址范围的方式有多种,例如一种间接的方式是:记录写入所述目标extent的每个分片的逻辑块地址(LBA)地址范围,以及记录每个分片是否写成功,那么由此可以推导出所述目标extent地址连续存储成功的LBA范围,由此可知对应的虚拟存储地址范围。或者,直接记录所述目标extent连续存储成功的虚拟存储地址范围。 连续存储成功的地址范围的起始地址是extent的起始位置。因此,如果所述分片写成功,并且在所述extent中,所述分片之前的所有分片都写成功,那么连续存储成功的地址范围会更新为所述分片的末地址;反之,如果所述分片没有写成功,或者在所述extent中所述分片之前的存在没有写成功的分片,那么连续存储成功的地址范围不会更新。
沿用步骤201、步骤202中的具体举例,端口32所对应的第一存储设备把extent 5+偏移量100+分片长度50转换成LBA地址范围:LBA 40339215+分片长度50。并把所述分片写入这段转换生成的LBA地址范围所对应的物理存储空间。
203、所述第一存储服务器发送第一写请求的写成功响应消息给所述客户端服务器。
所述写成功响应消息和所述第一写请求携带有相同的写请求ID。所述客户端服务器收到第一次存储设备的写成功响应消息后,可以获知发往第一存储设备的第一分片被成功写入,并且,在第一存储设备中,从所述extent1的首地址开始到所述第一分片的结束地址之间不存在空洞。客户端服务器可以记录第一存储位置写入成功,把第一存储位置记录到所述第一目标数据的元数据。
写请求的数量不止一个,因此除了第一写请求的响应消息之外,所述客户端服务器可能还会接收到其他写请求的响应消息。如果所述客户端服务器收到的写成功响应达到预设数量,并且写成功响应消息所对应的存储设备中不存在没有写成功的分片,那么所述第一目标数据在所述分布式存储中存储成功。在多副本存储方式中,1≤预设数量的值≤副本总数,对于EC的存储方式,N≤预设数量的值≤N+M。
可选的,没有写成功的存储设备(慢节点),可以从写成功的存储设备中获得所述分片并进行存储,存储成功后再发送写成功响应。对于这样的存储设备,由于存储所述分片的时间被滞后,所以称其为慢存储设备。
对于没有写成功的存储设备,如果是多副本的存储方式,可以直接从其他存储设备获得需要的分片;如果是EC的存储方式,通过获取至少N个分片并进行EC校验后,可以获得需要的分片。
沿用步骤201、202、203的举例,假设预设阈值是2。那么客户端服务器收到2个成功响应,并且这两个写成功的副本所写入的extent均连续存储成功,可以认为所述第一目标数据分片在所述分布式存储系统中存储成功。客户端服务器可以记录extent 5+偏移量100+分片长度50在所述分布式存储系统中写入。
以上是写数据的流程,下面对读数据的流程进行介绍。读出数据的粒度和写入数据的粒度没有关系,可以大于、等于或者小于一个分片的长度。
现有技术中,对于多副本的存储方式,通常必须所有副本均成功存储才可以认为在所述分布式存储系统中存储成功,不允许出现慢存储设备。本发明实施例与之相比,允许有存储设备存储失败,提高了对慢存储设备的容忍度,从而提高了存储副本的总体效率。
下面介绍读流程,客户端服务器需要读出的数据是第二目标数据。读流程和写流程可以相对独立,二者没有执行时间上的先后关系。
204、所述客户端服务器发送读请求给存储服务器,所述读请求中携带存储设备地址,需要读出的数据的虚拟存储地址段。
如果是多副本的存储方式,读请求的数量可以只有一个。
如果是EC的存储方式,读请求的数量由第二目标数据所分布的存储设备数量确定。 例如:第一目标数据是30字节,以3+2(3个数据分片+2个校验分片)的方式写入分布式存储系统中的5个存储设备,每个分片的长度是10字节。其中,第一个数据分片包含第一目标数据的第1-第10字节,第二个数据分片包含第一目标数据的第11-第20字节,第三个数据分片包含第一目标数据的第21-第30字节。如果客户端服务器需要读出的第二目标数据是第11-第20字节,那么向第二数据分片所在的存储设备发出读请求,请求的数据长度是10字节。如果客户端服务器需要读出的第二目标数据是第26-第30字节,那么向第三数据分片所在的存储设备发出读请求,请求读出的数据长度是5字节。如果客户端服务器需要读出的第二目标数据是第1-第25字节,那么分别发出三个读请求,这三个读请求的目的地是三个数据分片各自所在的存储设备,这三个读请求所请求读出的数据长度分别是10字节、10字节和5字节。当然,也可以选择任意读出3个分片,通过EC校验算法重新获得所述第一目标数据,然后从中所述第一目标数据中获得所述第二目标数据。
由于多副本存储方式和EC存储方式下,存储设备对读请求的处理方式相似,因此,为了方便说明,下面仍然以其中一个读请求为例进行介绍,这个读请求命名为第一读请求。客户端发送读请求给对应的存储服务器的操作,包括了发送读第一读请求给与第一读请求对应的第一存储服务器。
第一读请求中的数据包括:第一设备地址、第一虚拟存储地址段。第一虚拟存储地址段包括:extent ID+起始地址+长度,或者extent ID+起始地址+结束地址。第一虚拟存储地址段指示的是读请求所需要读出的数据在第一虚拟存储块中的位置。
客户端服务器获得第一读请求中各项信息的方式,简单介绍如下:由第二目标数据的ID(第一目标数据的ID例如是第二目标数据的文件名,或者是第二目标数据的哈希值)可以获得第二目标数据的元数据,由第二目标数据的元数据可以知道第二目标数据对应的虚拟存储地址范围,虚拟存储地址范围中包括了extent ID,extent ID对应有分区,而分区和存储设备有对应关系。因此,客户端服务器由第二目标数据ID可以获得待读数据所在的存储设备。存储设备归属于存储服务器,因此在获得待读数据所在的存储设备后,也就获得了待读数据所在的存储服务器。
沿用步骤201的例子举例,第一读请求的设备地址是:IP地址211.133.44.13+端口32;第一虚拟存储地址段是:extent 5+起始地址100+数据长度30。那么对所述第一读请求而言,发往的服务器地址是211.133.44.13,发往的存储设备是端口32所对应的存储设备。第一读请求所指示的存储服务器的存储位置是:在extent 5中,以偏移量100作为起始位置读出长度是30的数据。
205、第一存储服务器接收第一读请求后,判断第一extent连续存储成功的地址范围是否包括了第一虚拟存储地址段。如果判断结果为是,则从所述第一存储设备中读出位于第一虚拟存储地址段的数据。如前所述,所述“连续存储成功的地址范围”这一信息可以存储在第一存储设备中。
如果判断结果为是,则第一存储服务器从第一存储设备的第二虚拟存储地址读取数据返回给客户端服务器。
如果判断结果为否,意味着在所述第一存储设备的第一extent的第一虚拟存储地址段没有存储需要读取的数据,因此可以反馈失败响应消息给所述客户端服务器。对于多副本的存储方式,由于多个存储设备存储的数据相同,所述客户端服务器在收到失败响应消 息后,可以重新生成读请求发送给下一个存储服务器(第二存储服务器),尝试从下一个存储服务器获取第一虚拟存储地址段的数据。下一个存储服务器继续进行相似的判断,直至找到第一extent连续存储成功的地址范围是否包括了第二虚拟存储地址的存储设备,并从中读取数据返回给所述客户端服务器。
以附图3为例,在存储设备A收到第四个读请求之前,客户端服务器判断存储设备A是否存储有分片2。由于在存储设备A中,extent1连续存储成功的地址范围的末端是分片3的末地址,第一extent连续存储成功的地址范围包括了分片2的虚拟存储地址,因此判断的结果为“是”。在存储设备B收到第四个读请求之前,如果判断存储设备B是否存储了分片2,由于在存储设备B中,extent1连续存储成功的地址范围的末端是分片1的末地址,因此第一extent连续存储成功的地址范围不包括分片2的虚拟存储地址,判断的结果为“否”。
206、所述第一客户端服务器获得所述第二目标数据。
对于多副本的存储方式,从第一读请求的响应消息中可以直接获得第二目标数据。
对于EC的存储方式,如果步骤205中所述客户端服务器只发出一个读请求,那么从第一读请求的响应消息中可以直接获得第二目标数据;如果步骤206中所述客户端服务器发出的读请求数量是至少2个,那么把从这多个读请求中的响应消息中获得的分片组合起来形成所述第二目标数据。
上述步骤204-206介绍了如何从分布式存储系统的第一存储设备中获得第二目标数据。还存在另外一种情况:第一存储设备无法提供客户端服务器所请求的数据,于是客户端服务器向下一个存储服务器(具体而言是存储服务器的存储设备)发送读请求,继续尝试读出客户端服务器所请求的数据,如果还是没有读成功,则继续往下一个存储服务器(具体而言是存储服务器的存储设备)发送读请求,如此循环,直至读出客户端服务器所请求的数据为止。在这种情况下,没有一次性命中客户端服务器所请求的数据,因此读数据的效率变低,但是这种情况只是发送在少数情况下,大多数情况是像不再204-206一样可以一次性命名客户端服务器所请求的数据。因为,从统计上来看,总体的读效率仍然明显高于现有技术。由于读数据的原理和步骤204-206所介绍的原理相同,只是多了几次读数据的常识,因此对这种情况简单不做详细说明,简单介绍如下。
所述第一存储服务器接收第二读请求,所述第二读请求中携带的内容参照第一读读请求,例如包括第二虚拟存储地址段和所述第一存储设备ID。与步骤205不同的是,第一存储服务器检测到:所述第二虚拟存储地址段不属于所述第一连续存储成功地址范围。于是,所述第一存储服务器发送失败响应消息给客户端服务器。
客户端服务器收到所述失败响应消息后,发送第三读请求给所述分布式存储系统中的第二存储服务器,所述第三读请求中携带的内容参照第一读读请求,例如所述第三读请求携带所述第二虚拟存储地址段和所述第二存储设备ID。需要说明的是,所述第二存储设备和所述第一存储设备不同,所述第一存储设备和所述第二存储设备对应所述第一虚拟存储块。
当所述第二虚拟存储地址段属于所述第二存储服务器中的第二连续存储成功地址范围,所述第二存储服务器从所述第二存储设备中读取所述第一虚拟存储地址段的数据返回给所述客户端服务器其中。从第二存储设备读出数据的过程可以参照步骤205。其中,所 述第二连续存储成功地址范围,指示了在所述第一虚拟存储块位于所述第二存储设备的虚拟存储地址范围之中,在从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片。
本发明还提供一种存储介质的实施方式,存储介质中记录程序代码。存储服务器通过执行程序代码可以执行上述方法。
本发明还提供一种计算机程序,存储服务器通过运行所述计算机程序可以执行上述方法。
下面介绍一种存储装置。存储装置可以由运行在存储服务器上的软件模块组成,共同执行上述步骤202-205中存储服务器所执行的操作。存储装置也可以是硬件,例如由处理器、内存和存储设备组成的硬件装置,由处理器运行内存中的程序执行步骤202-205中存储服务器的操作。
如图4所示,数据存储装置4,应用于所述分布式存储系统中的第一存储服务器,所述分布式存储系统包括多个存储服务器,每个存储服务器包括至少一个存储设备,所述分布式存储系统的每个虚拟存储块对应多个所述存储设备,该数据读写装置4包括:接口模块41,存储模块42和读取模块43。
接口模块41,用于接收写请求和读请求,所述写请求包括第一目标数据的第一分片、第一虚拟存储地址和第一存储设备ID,所述第一虚拟存储地址是第一虚拟存储块中的相对位置,所述第一虚拟存储地址与第一存储设备的第一逻辑块地址对应,其中,所述第一存储设备是所述第一虚拟存储块对应的存储设备中的一个,所述第一存储设备属于所述第一存储服务器管理。
存储模块42,用于以所述第一逻辑块地址作为起始地址在所述第一存储设备中存储所述第一分片,在所述第一分片存储完成后,当所述第一虚拟存储块位于所述第一存储设备的虚拟存储地址范围之中,从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片,把连续的地址段记录为第一连续存储成功地址范围。
所述接口模块41,还用于接收读请求,所述读请求中携带第一虚拟存储地址段,其中,第一虚拟存储地址段所对应的逻辑块地址位于所述第一存储设备。
读取模块43,用于当所述第一虚拟存储地址段属于所述连续存储成功地址范围,从所述第一存储设备中读取所述第一虚拟存储地址段的数据。
依靠本发明提供的数据存储方法和数据存储装置,存储服务器接收客户端的写请求并进行存储,每个写请求中携带:待写的分片、第一存储设备ID以及第一虚拟存储块的虚拟存储地址;如果虚拟存储块位于这个存储设备的虚拟存储空间中,从起始位置开始一直存储成功,则记录连续存储成功的地址范围。对各个存储设备而言,在连续存储成功的地址范围之内的数据都是存储成功的数据,在所述存储服务器收到客户端对这个地址范围内的地址段的读请求时,可以直接返回需要读取的据给客户端。
在追加写的存储方式中,可以按照虚拟存储地址段读出数据。如果使用本发明实施例提供的连续存储成功的地址范围,可以确保读出的数据是正确的。但是,如果不使用本发明实施例提供的连续存储成功的地址范围,有可能读出的是正确的数据,也有可能读出的错误的数据。例如:对于已经被系统删除的数据,实际上仍然可以被读出来,但显然这并不是用户所需要的数据,换句话说,读出的是错误的数据。这种情况下就必须要用到版本号,客户端服务器判断读出的数据中所携带的版本号,与自己需要读取的数据的版本号是否相同,如果相同就说明读出的数据是正确的,否则读出的就是错误数据。

Claims (19)

  1. 一种数据读写方法,应用于分布式存储系统中的第一存储服务器,所述分布式存储系统包括多个存储服务器,每个存储服务器包括至少一个存储设备,所述分布式存储系统的存储空间用虚拟存储块进行管理,每个所述虚拟存储块对应多个所述存储设备,其特征在于,该方法包括:
    第一存储服务器接收第一写请求,所述第一写请求包括第一目标数据的第一分片、第一虚拟存储地址和第一存储设备ID,所述第一虚拟存储地址是第一虚拟存储块中的相对位置,所述第一虚拟存储地址与第一存储设备的第一逻辑块地址对应,其中,所述第一存储设备是所述第一虚拟存储块对应的存储设备中的一个,所述第一存储设备属于所述第一存储服务器管理;
    所述第一存储服务器以所述第一逻辑块地址作为起始地址在所述第一存储设备中存储所述第一分片,在所述第一分片存储完成后,从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片,把连续的地址段记录为第一连续存储成功地址范围;
    所述第一存储服务器接收第一读请求,所述第一读请求中携带第一虚拟存储地址段和所述第一存储设备ID,其中,第一虚拟存储地址段所对应的逻辑块地址位于所述第一存储设备,当所述第一虚拟存储地址段属于所述第一连续存储成功地址范围,从所述第一存储设备中读取所述第一虚拟存储地址段的数据。
  2. 根据权利要求1中任一所述的方法,其中:
    所述第一连续存储成功地址范围中存储有一个分片或者多个分片,当存储的所述分片数量是多个时,存储的所述分片之间相邻。
  3. 根据权利要求1所述的方法,其中,所述第一分片是下述一种:
    所述第一目标数据的一个副本;
    所述第一目标数据的一个纠错码EC数据条带;
    所述第一目标数据的一个纠错码EC校验条带。
  4. 根据权利要求1所述的方法,其中,
    所述第一连续存储成功地址范围用所述第一虚拟存储块的虚拟存储地址范围的形式进行描述,或者,所述第一连续存储成功地址范围以所述第一存储设备中的逻辑块地址范围的形式进行记录。
  5. 根据权利要求1所述的方法,其中:
    每个所述分片是所述第一目标数据的一个副本。
  6. 根据权利要求5所述的方法,其中,还包括:
    所述第一存储服务器接收第二读请求,所述第二读请求中携带第二虚拟存储地址段和所述第一存储设备ID,其中,第一存储服务器检测到:所述第二虚拟存储地址段不属于所述第一连续存储成功地址范围;
    所述第一存储服务器发送失败响应消息给客户端服务器;
    客户端服务器收到所述失败响应消息后,发送第三读请求给所述分布式存储系统中的第二存储服务器,所述第三读请求携带所述第二虚拟存储地址段和所述第二存储设备ID,所述 第二存储设备和所述第一存储设备不同,所述第一存储设备和所述第二存储设备对应所述第一虚拟存储块;
    当所述第二虚拟存储地址段属于所述第二存储服务器中的第二连续存储成功地址范围,所述第二存储服务器从所述第二存储设备中读取所述第一虚拟存储地址段的数据返回给所述客户端服务器其中,所述第二连续存储成功地址范围,指示了在所述第一虚拟存储块位于所述第二存储设备的虚拟存储地址范围之中,在从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片。
  7. 根据权利要求1所述的方法,在第一存储服务器接收写请求之前,还包括:
    客户端服务器生成第一目标数据的多个分片,选择用于存储第一目标数据的第一虚拟存储块,向第一虚拟存储块对应的存储设备所在存储服务器发送包括所述第一写请求在内的多个写请求,每个写请求中包括一个所述分片以及对应的虚拟存储地址。
  8. 根据权利要求7所述的方法,其中:
    所述存储客户端收到预设数量的成功响应消息后,所述第一目标数据在上述分布式存储系统中存储成功,其中,所述预设数量的成功响应消息的总数小于所述多个写请求的总数。
  9. 根据权利要求1-7任一项所述的方法,其中:
    所述第一存储服务器是基于追加写的方式写入所述第一分片。
  10. 一种数据存储装置,应用于所述分布式存储系统中的第一存储服务器,所述分布式存储系统包括多个存储服务器,每个存储服务器包括至少一个存储设备,所述分布式存储系统的每个虚拟存储块对应多个所述存储设备,其特征在于,该数据读写装置包括:
    接口模块,用于接收写请求和读请求,所述写请求包括第一目标数据的第一分片、第一虚拟存储地址和第一存储设备ID,所述第一虚拟存储地址是第一虚拟存储块中的相对位置,所述第一虚拟存储地址与第一存储设备的第一逻辑块地址对应,其中,所述第一存储设备是所述第一虚拟存储块对应的存储设备中的一个,所述第一存储设备属于所述第一存储服务器管理;
    存储模块,用于以所述第一逻辑块地址作为起始地址在所述第一存储设备中存储所述第一分片,在所述第一分片存储完成后,从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片,把连续的地址段记录为第一连续存储成功地址范围;
    所述接口模块,还用于接收读请求,所述读请求中携带第一虚拟存储地址段和所述第一存储设备ID,其中,第一虚拟存储地址段所对应的逻辑块地址位于所述第一存储设备;
    读取模块,用于当所述第一虚拟存储地址段属于所述连续存储成功地址范围,从所述第一存储设备中读取所述第一虚拟存储地址段的数据。
  11. 根据权利要求10所述的数据存储装置,其中:
    所述连续存储成功地址范围中存储有一个分片或者多个分片,当存储的所述分片数量是多个时,存储的所述分片之间相邻。
  12. 根据权利要求10所述的数据存储装置,其中,
    所述第一连续存储成功地址范围用所述第一虚拟存储块的虚拟存储地址范围进行描述;或者,所述第一连续存储成功地址范围以所述第一存储设备中的逻辑块地址范围进行记录。
  13. 根据权利要求10所述的数据存储装置,其中:所述第一分片是下述一种:
    所述第一目标数据的一个副本;
    所述第一目标数据的一个纠错码EC数据条带;
    所述第一目标数据的一个纠错码EC校验条带。
  14. 根据权利要求10-13任一项所述的数据存储,其中:
    所述存储模块基于追加写方式的存储所述第一分片。
  15. 一种第一存储服务器,位于于分布式存储系统中的第一存储服务器,所述分布式存储系统包括多个存储服务器,每个存储服务器包括处理器和至少一个存储设备,所述分布式存储系统的存储空间用虚拟存储块进行管理,每个所述虚拟存储块对应多个所述存储设备,所述第一存储服务器的处理器用于运行所述计算机指令执行:
    接收写请求,所述写请求包括第一目标数据的第一分片、第一虚拟存储地址和第一存储设备ID,所述第一虚拟存储地址是第一虚拟存储块中的相对位置,所述第一虚拟存储地址与第一存储设备的第一逻辑块地址对应,其中,所述第一存储设备是所述第一虚拟存储块对应的存储设备中的一个,所述第一存储设备属于所述第一存储服务器管理;
    以所述第一逻辑块地址作为起始地址在所述第一存储设备中存储所述第一分片,在所述第一分片存储完成后,从所述第一虚拟存储块的首地址开始到所述第一分片的结束地址之间连续存储有分片,把连续的地址段记录为第一连续存储成功地址范围;
    接收读请求,所述读请求中携带第一虚拟存储地址段,其中,第一虚拟存储地址段所对应的逻辑块地址位于所述第一存储设备,当所述第一虚拟存储地址段属于所述连续存储成功地址范围,从所述第一存储设备中读取所述第一虚拟存储地址段的数据。
  16. 根据权利要求14所述的第一存储服务器,其中:
    所述连续存储成功地址范围中存储有一个分片或者至少2个分片,当存储的所述分片数量是至少2个时,存储的所述分片之间相邻。
  17. 根据权利要求14所述的第一存储服务器,其中,所述第一分片是下述一种:
    所述第一目标数据的一个副本;
    所述第一目标数据的一个纠错码EC数据条带;
    所述第一目标数据的一个纠错码EC校验条带。
  18. 根据权利要求14所述的第一存储服务器,其中,
    所述第一连续存储成功地址范围用所述第一虚拟存储块的虚拟存储地址范围进行描述,或者,所述第一连续存储成功地址范围以所述第一存储设备中的逻辑块地址范围进行记录。
  19. 根据权利要求14-18任一项所述的第一存储服务器,其中:
    所述第一存储服务器是基于追加写的方式写入所述第一分片。
PCT/CN2018/071637 2017-10-25 2018-01-05 一种数据读写方法、装置和存储服务器 WO2019080370A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18870458.9A EP3690630A4 (en) 2017-10-25 2018-01-05 DATA READING AND WRITING PROCESS AND APPARATUS, AND STORAGE SERVER
CN201880000062.9A CN110651246B (zh) 2017-10-25 2018-01-05 一种数据读写方法、装置和存储服务器
US16/856,257 US11397668B2 (en) 2017-10-25 2020-04-23 Data read/write method and apparatus, and storage server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2017/107695 2017-10-25
PCT/CN2017/107695 WO2019080015A1 (zh) 2017-10-25 2017-10-25 一种数据读写方法、装置和存储服务器

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/856,257 Continuation US11397668B2 (en) 2017-10-25 2020-04-23 Data read/write method and apparatus, and storage server

Publications (1)

Publication Number Publication Date
WO2019080370A1 true WO2019080370A1 (zh) 2019-05-02

Family

ID=66247132

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/107695 WO2019080015A1 (zh) 2017-10-25 2017-10-25 一种数据读写方法、装置和存储服务器
PCT/CN2018/071637 WO2019080370A1 (zh) 2017-10-25 2018-01-05 一种数据读写方法、装置和存储服务器

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/107695 WO2019080015A1 (zh) 2017-10-25 2017-10-25 一种数据读写方法、装置和存储服务器

Country Status (4)

Country Link
US (1) US11397668B2 (zh)
EP (1) EP3690630A4 (zh)
CN (1) CN110651246B (zh)
WO (2) WO2019080015A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11334279B2 (en) 2019-11-14 2022-05-17 Western Digital Technologies, Inc. Hierarchical blacklisting of storage system components
US11314431B2 (en) * 2019-11-14 2022-04-26 Western Digital Technologies, Inc. Distributed data blocks using storage path cost values
US11340807B2 (en) * 2019-12-17 2022-05-24 Vmware, Inc. Mounting a shared data store of a server cluster on a client cluster for use as a remote data store
CN113360077B (zh) * 2020-03-04 2023-03-03 华为技术有限公司 数据存储方法、计算节点及存储系统
CN111399780B (zh) * 2020-03-19 2021-08-24 蚂蚁金服(杭州)网络技术有限公司 一种数据的写入方法、装置以及设备
CN113076068B (zh) * 2021-04-27 2022-10-21 哈尔滨工业大学(深圳) 一种数据存储方法、装置、电子设备及可读存储介质
CN116204136B (zh) * 2023-05-04 2023-08-15 山东浪潮科学研究院有限公司 一种数据存储、查询方法、装置、设备及存储介质
CN116580739B (zh) * 2023-07-14 2023-11-03 上海海栎创科技股份有限公司 一种快速掩膜编程rom自定时方法、电路及电子装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064765A (zh) * 2012-12-28 2013-04-24 华为技术有限公司 数据恢复方法、装置及集群存储系统
CN105630808A (zh) * 2014-10-31 2016-06-01 北京奇虎科技有限公司 基于分布式文件系统的文件读取、写入方法及节点服务器
US20160357440A1 (en) * 2015-06-04 2016-12-08 Huawei Technologies Co.,Ltd. Data distribution method, data storage method, related apparatus, and system
CN106302702A (zh) * 2016-08-10 2017-01-04 华为技术有限公司 数据的分片存储方法、装置及系统
CN107273048A (zh) * 2017-06-08 2017-10-20 浙江大华技术股份有限公司 一种数据写入方法及装置

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7072971B2 (en) * 2000-11-13 2006-07-04 Digital Foundation, Inc. Scheduling of multiple files for serving on a server
JP5431148B2 (ja) * 2006-05-31 2014-03-05 インターナショナル・ビジネス・マシーンズ・コーポレーション ストレージ用論理データオブジェクトの変換方法およびシステム
JP4148529B2 (ja) * 2006-12-28 2008-09-10 インターナショナル・ビジネス・マシーンズ・コーポレーション データベースにおける索引の整合性をチェックするためのシステム、方法およびプログラム
KR100925441B1 (ko) 2008-01-07 2009-11-06 엘지전자 주식회사 분산형 가상자원블록 스케쥴링 방법
US8688867B2 (en) * 2008-02-13 2014-04-01 Honeywell International Inc. System and methods for communicating between serial communications protocol enabled devices
US8275966B2 (en) * 2009-07-30 2012-09-25 Cleversafe, Inc. Dispersed storage network virtual address generations
US9201732B2 (en) * 2010-01-28 2015-12-01 Cleversafe, Inc. Selective activation of memory to retrieve data in a dispersed storage network
WO2012016089A2 (en) * 2010-07-28 2012-02-02 Fusion-Io, Inc. Apparatus, system, and method for conditional and atomic storage operations
US9454431B2 (en) * 2010-11-29 2016-09-27 International Business Machines Corporation Memory selection for slice storage in a dispersed storage network
US8793328B2 (en) * 2010-12-17 2014-07-29 Facebook, Inc. Distributed storage system
US8583852B1 (en) * 2011-09-01 2013-11-12 Symantec Operation Adaptive tap for full virtual machine protection
CN102622412A (zh) * 2011-11-28 2012-08-01 中兴通讯股份有限公司 一种分布式文件系统中的并发写入方法及装置
US9203625B2 (en) * 2011-11-28 2015-12-01 Cleversafe, Inc. Transferring encoded data slices in a distributed storage network
US9465861B2 (en) * 2012-01-31 2016-10-11 International Business Machines Corporation Retrieving indexed data from a dispersed storage network
US9277011B2 (en) * 2012-10-30 2016-03-01 International Business Machines Corporation Processing an unsuccessful write request in a dispersed storage network
US9456035B2 (en) * 2013-05-03 2016-09-27 International Business Machines Corporation Storing related data in a dispersed storage network
US9565252B2 (en) * 2013-07-31 2017-02-07 International Business Machines Corporation Distributed storage network with replication control and methods for use therewith
US9998538B2 (en) * 2013-08-29 2018-06-12 International Business Machines Corporation Dispersed storage with coordinated execution and methods for use therewith
CN103593147B (zh) * 2013-11-07 2016-08-17 华为技术有限公司 一种数据读取的方法及装置
WO2015077955A1 (zh) * 2013-11-28 2015-06-04 华为技术有限公司 一种写数据方法、装置和系统
CN105100146B (zh) * 2014-05-07 2018-07-20 腾讯科技(深圳)有限公司 数据存储方法、装置及系统
CN105468718B (zh) * 2015-11-18 2020-09-08 腾讯科技(深圳)有限公司 数据一致性处理方法、装置和系统
EP3220275B1 (en) * 2015-12-03 2020-11-04 Huawei Technologies Co., Ltd. Array controller, solid state disk and data writing control method for solid state disk

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064765A (zh) * 2012-12-28 2013-04-24 华为技术有限公司 数据恢复方法、装置及集群存储系统
CN105630808A (zh) * 2014-10-31 2016-06-01 北京奇虎科技有限公司 基于分布式文件系统的文件读取、写入方法及节点服务器
US20160357440A1 (en) * 2015-06-04 2016-12-08 Huawei Technologies Co.,Ltd. Data distribution method, data storage method, related apparatus, and system
CN106302702A (zh) * 2016-08-10 2017-01-04 华为技术有限公司 数据的分片存储方法、装置及系统
CN107273048A (zh) * 2017-06-08 2017-10-20 浙江大华技术股份有限公司 一种数据写入方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3690630A4 *

Also Published As

Publication number Publication date
CN110651246A (zh) 2020-01-03
US11397668B2 (en) 2022-07-26
EP3690630A1 (en) 2020-08-05
CN110651246B (zh) 2020-12-25
EP3690630A4 (en) 2021-03-03
WO2019080015A1 (zh) 2019-05-02
US20200250080A1 (en) 2020-08-06

Similar Documents

Publication Publication Date Title
WO2019080370A1 (zh) 一种数据读写方法、装置和存储服务器
US10467246B2 (en) Content-based replication of data in scale out system
US9298386B2 (en) System and method for improved placement of blocks in a deduplication-erasure code environment
US9336076B2 (en) System and method for controlling a redundancy parity encoding amount based on deduplication indications of activity
US9411685B2 (en) Parity chunk operating method and data server apparatus for supporting the same in distributed raid system
US20170308437A1 (en) Parity protection for data chunks in an object storage system
US8972779B2 (en) Method of calculating parity in asymetric clustering file system
US11074129B2 (en) Erasure coded data shards containing multiple data objects
US20160246677A1 (en) Virtual chunk service based data recovery in a distributed data storage system
WO2015010394A1 (zh) 数据发送方法、数据接收方法和存储设备
US20110055471A1 (en) Apparatus, system, and method for improved data deduplication
JP2017518565A (ja) 散在ストレージ・ネットワークにおける多世代記憶されたデータの読取り
US10725662B2 (en) Data updating technology
WO2019001521A1 (zh) 数据存储方法、存储设备、客户端及系统
WO2011140991A1 (zh) 分布式文件系统的文件处理方法及装置
WO2019137323A1 (zh) 一种数据存储方法、装置及系统
JP2016513306A (ja) データ格納方法、データストレージ装置、及びストレージデバイス
WO2018120844A1 (zh) 一种差异数据备份方法和差异数据备份装置
CN109445681B (zh) 数据的存储方法、装置和存储系统
WO2015085529A1 (zh) 数据复制方法、数据复制装置和存储设备
US9798638B2 (en) Systems and methods providing mount catalogs for rapid volume mount
WO2021088586A1 (zh) 一种存储系统中的元数据的管理方法及装置
WO2023197937A1 (zh) 数据处理方法及其装置、存储介质、计算机程序产品
WO2020034695A1 (zh) 数据存储方法、数据恢复方法、装置、设备及存储介质
US20180293137A1 (en) Distributed storage system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18870458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018870458

Country of ref document: EP

Effective date: 20200429