WO2011157144A2 - Data reading and writing method, apparatus and storage system - Google Patents

Data reading and writing method, apparatus and storage system Download PDF

Info

Publication number
WO2011157144A2
WO2011157144A2 (PCT/CN2011/075048)
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
operation request
read
node
Prior art date
Application number
PCT/CN2011/075048
Other languages
English (en)
French (fr)
Other versions
WO2011157144A3 (zh)
Inventor
刘显
王道辉
杨德平
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN201180000715.1A priority Critical patent CN102918509B
Priority to EP11795110.3A priority patent EP2698718A2
Priority to PCT/CN2011/075048 priority patent WO2011157144A2
Publication of WO2011157144A2
Publication of WO2011157144A3
Priority to US13/706,068 priority patent US8938604B2


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • G06F12/1018Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626Reducing size or complexity of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device

Definitions

  • The embodiments of the present invention relate to the field of information technology, and in particular, to a data reading and writing method, apparatus, and storage system. Background Art
  • In the prior art, a typical block storage system can be simplified into two parts: an initiator end and a target end.
  • The initiator generates a volume locally, establishes a connection with the target end, and forwards the Input/Output (hereinafter: I/O) requests for the local device file to the target end for processing; the target end manages the storage devices and processes the final I/O requests.
  • The two ends communicate through the block interface of a storage protocol such as the Internet Small Computer System Interface (hereinafter: iSCSI), Fibre Channel (hereinafter: FC), Advanced Technology Attachment over Ethernet (hereinafter: AOE), or Network Block Device (hereinafter: NBD).
  • The originating end mainly comprises an access unit entity, which implements local volume management and establishes connections and communicates with the target end through various protocols (for example, iSCSI, FC, or AOE); the target end mainly comprises a volume control unit and a Redundant Array of Independent Disks (hereinafter: RAID) control unit.
  • The RAID control unit manages specific physical disks, builds RAID groups, and forms logical disks.
  • The volume control unit manages the logical disks generated by the RAID control unit, divides them into logical volumes as needed, and exposes the logical volumes through protocols such as iSCSI, FC, or AOE for use by the initiator.
  • However, in the prior art, the block storage service provided by the target end is subject to certain constraints in terms of reliability, availability, scalability, and cost.
  • In terms of reliability and availability, the prior art mainly ensures data reliability by constructing RAID inside the target-end rack and using multiple redundant controller heads (such as a dual-controller disk array); but if the rack suffers a power failure, or two or more controller heads fail at the same time, data loss or service interruption may occur, which further affects availability.
  • In terms of scalability, if the target end adopts an Internet Storage Area Network (hereinafter: IP SAN) or an FC Storage Area Network (hereinafter: FC SAN), the capacity of the target end is limited by the processing capability of the IP SAN or FC SAN controller heads, so the maximum supported capacity is constrained and cannot be expanded on a large scale.
  • In terms of maintainability, for a disk array built from traditional RAID groups, a failed disk in a RAID group must be replaced promptly and the RAID rebuilt to keep redundancy and data reliability; this requires maintenance personnel to be available for replacement at any time, and the system cannot handle such failures automatically.
  • In terms of cost, both IP SAN and FC SAN are relatively expensive as the target end, especially FC SAN, whose associated switching equipment is also costly; a storage server as the target end is relatively cheap, but because storage servers have low requirements on processor and memory, their disk I/O performance is low.
  • Embodiments of the present invention provide a data reading and writing method, apparatus, and storage system, to improve the reliability, availability, scalability, and cost-effectiveness of a storage system.
  • An embodiment of the present invention provides a data reading and writing method, including:
  • the block storage driver of the originating device receives an operation request, based on a logical block addressing mode, for a volume;
  • the block storage driver converts the operation request based on the logical block addressing mode into an operation request based on a key value addressing mode, and the operation request based on the key value addressing mode carries a key value corresponding to the data to be operated;
  • the block storage driver sends the operation request based on the key value addressing mode to a routing library, so that the routing library, according to the key value corresponding to the data to be operated, sends the operation request based on the key value addressing mode to the storage master node and the at least one backup node of the to-be-operated data, and the storage master node and the at least one backup node perform a read or write operation on the to-be-operated data.
  • the embodiment of the invention further provides a data reading and writing device, comprising:
  • a receiving module configured to receive a logical block addressing mode-based operation request for the volume
  • a conversion module configured to convert the logical block addressing mode-based operation request into an operation request based on the key value addressing mode, where the operation request based on the key value addressing mode carries the key value corresponding to the to-be-operated data;
  • a sending module configured to send the operation request based on the key value addressing mode to the routing library, so that the routing library, according to the key value corresponding to the to-be-operated data, sends the operation request based on the key value addressing manner to the storage master node and the at least one backup node of the to-be-operated data, and the storage master node and the at least one backup node perform a read or write operation on the to-be-operated data.
  • the present invention also provides a storage system, including: a block storage driver, a routing library, a storage primary node, and at least one backup node;
  • the block storage driver, configured to receive a logical block addressing mode-based operation request for a volume, convert the operation request based on the logical block addressing mode into an operation request based on the key value addressing mode, and send the operation request based on the key value addressing mode to the routing library, where the operation request based on the key value addressing mode carries the key value corresponding to the data to be operated;
  • the routing library, configured to receive the operation request based on the key value addressing manner sent by the block storage driver, and send the operation request based on the key value addressing mode, according to the key value corresponding to the data to be operated, to the storage master node and the at least one backup node of the data to be operated;
  • the storage master node, configured to receive the operation request based on the key value addressing manner sent by the routing library, and perform a read or write operation on the to-be-operated data;
  • the at least one backup node is configured to receive the operation request based on the key value addressing manner sent by the routing library, and perform a read or write operation on the to-be-operated data.
  • With the embodiments of the present invention, after the block storage driver of the originating device receives the operation request based on the logical block addressing mode for the volume, the request is converted into an operation request based on the key value addressing mode and sent to the routing library, so that the routing library sends it to the storage master node and the at least one backup node of the data to be operated, and the storage master node and the at least one backup node perform the read or write operation on the to-be-operated data; this improves the reliability, availability, scalability, and cost-effectiveness of the storage system, meeting the need for highly scalable, highly reliable, highly available, and low-cost mass storage.
  • FIG. 1 is a flow chart of an embodiment of a data reading and writing method according to the present invention.
  • FIG. 2 is a flow chart of an embodiment of a method for creating a local volume according to the present invention.
  • FIG. 3 is a flow chart of an embodiment of a data writing method according to the present invention.
  • FIG. 4 is a flow chart of an embodiment of a data reading method according to the present invention.
  • FIG. 5 is a schematic structural diagram of an embodiment of a data read/write device according to the present invention.
  • FIG. 6 is a schematic structural diagram of another embodiment of a data read/write device according to the present invention.
  • FIG. 7 is a schematic structural diagram of an embodiment of a storage system according to the present invention.
  • FIG. 8 is a schematic diagram of an embodiment of a route supported by a storage system of the present invention. Detailed Description
  • FIG. 1 is a flowchart of an embodiment of a data reading and writing method according to the present invention. As shown in FIG. 1, the data reading and writing method may include:
  • Step 101 The block storage driver of the originating device receives an operation request based on a Logical Block Addressing (LBA) manner for the volume.
  • the initiating device may be any application server.
  • the embodiment does not limit the specific configuration of the initiating device.
  • Step 102 The block storage driver converts the operation request based on the LBA mode into an operation request based on a key value (Key) mode, and the operation request based on the Key addressing mode carries a key corresponding to the data to be operated.
  • Step 103 The block storage driver sends an operation request based on the Key addressing mode to the routing library, so that the routing library sends an operation request based on the Key addressing mode to the storage primary node of the to-be-operated data according to the key corresponding to the data to be operated. And at least one backup node, wherein the storage master node and the at least one backup node perform a read or write operation on the to-be-operated data.
  • Before the block storage driver of the initiating device receives the LBA-based operation request for the volume, the initiating device initializes and starts the block storage driver, and saves the Internet Protocol (hereinafter: IP) address and service port of at least one storage node in the storage system.
  • Then, when the routing library is located in the above block storage driver, the initiating device saves the connections between the block storage driver and all or part of the storage nodes in the storage system; or, when the routing library is located in a storage node, the block storage driver directly establishes a connection between the block storage driver and that storage node.
  • Next, the initiating device can receive a command to create a volume, carrying the volume name of the volume to be created and the volume size of the volume to be created; the block storage driver of the initiating device can then create a volume logical device in the local operating system of the initiating device according to the above command.
  • In addition, the routing library needs to obtain the IP address, service port, and responsible hash region of each storage node in the storage system, and establish connections between the routing library and all or part of the storage nodes in the storage system. Through the above process, a local volume can be created on the initiating device.
  • Specifically, the routing library may send the operation request based on the Key addressing mode to the storage master node and the at least one backup node of the data to be operated according to the key corresponding to the data, as follows:
  • the key corresponding to the data is hashed, the storage node responsible for the hash region to which the hashed key belongs is determined to be the storage master node of the data to be operated, and the operation request based on the Key addressing mode is sent to that storage master node;
  • at least one backup node of the data to be operated is determined according to a predetermined backup policy, and the operation request based on the Key addressing mode is sent to the at least one backup node.
  • Each storage node in the storage system has a node identifier. When determining the at least one backup node of the data to be operated, the backup nodes may be selected in turn, in ascending or descending order of node identifier, according to the predetermined backup policy.
  • For example, when the predetermined backup policy keeps two backups for each piece of data to be operated, two storage nodes are selected in turn, in ascending or descending order of node identifier, as the backup nodes of the data to be operated (see the sketch below).
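  • The following is a minimal sketch of this lookup and selection (assumptions: MD5 as the hash function, a fixed four-node hash-region table, and two backups per key; the patent fixes none of these choices):

```python
import hashlib
from bisect import bisect_right

# Hypothetical layout: each storage node owns one hash region on a ring of
# size 2**32; region_starts[i] is the first hash value that node_ids[i] covers.
node_ids = [10, 20, 30, 40]
region_starts = [0, 0x40000000, 0x80000000, 0xC0000000]

def primary_and_backups(key: str, n_backups: int = 2):
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    # The node responsible for the hash region that the hashed key falls
    # into is the storage master node of the data to be operated.
    i = bisect_right(region_starts, h) - 1
    primary = node_ids[i]
    # Backup nodes are selected in ascending node-identifier order,
    # wrapping around the ring.
    backups = [node_ids[(i + k) % len(node_ids)] for k in range(1, n_backups + 1)]
    return primary, backups

print(primary_and_backups("nbd0_4"))   # e.g. (30, [40, 10])
```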
  • When the operation request is a write operation request and the data to be operated is data to be written, the write operation request also carries the data to be written; the storage master node and the at least one backup node may write the data as follows:
  • the storage master node and the at least one backup node write the data to be written locally according to the key value of the data to be written and record the version of the written data; after the write operation is completed, each returns a write operation response to the routing library.
  • The routing library may receive the above write operation responses according to a preset write operation policy, count the number of successful writes, and, based on the number of successful writes and the preset write operation policy, return a response to the above write operation request based on the Key addressing mode to the block storage driver.
  • When the operation request is a read operation request and the data to be operated is data to be read, the storage master node and the at least one backup node may read the data as follows: the storage master node and the at least one backup node read the locally stored data to be read and the version of the data to be read according to the key value of the data to be read, and return the read data, the version of the read data, and a read operation response to the above routing library.
  • The routing library can receive the returned data according to a preset read operation policy, count the number of successful reads, and, according to the number of successful reads and the preset read operation policy, identify and return to the block storage driver the latest version of the data among the read data, so that the block storage driver processes the latest-version data and returns the data corresponding to the LBA-based read operation request.
  • the block storage driver, the routing library, the storage master node, and the at least one backup node may all be implemented based on a Distributed Hash Table (hereinafter referred to as DHT) technology.
  • Since the data to be operated has at least one backup, the storage system can provide reads and writes at all times, and data does not become unreadable or unwritable because a single storage node in the storage system fails; therefore, the data reading and writing method provided in this embodiment can improve the availability of the storage system.
  • DHT technology is itself characterized by high scalability, so the data reading and writing method provided by this embodiment can improve the scalability of the storage system.
  • General-purpose hardware, such as personal computers (hereinafter: PCs), can be used without specially customized hardware; therefore, the data reading and writing method provided in this embodiment can reduce the cost of the storage system.
  • In summary, the data reading and writing method provided by this embodiment can meet the requirements of highly scalable, highly reliable, highly available, and low-cost mass storage.
  • In the following embodiments, description is given by taking an example in which the block storage driver is a DHT block storage driver, the routing library is a DHT routing library, and the storage master node and the at least one backup node are nodes in a DHT-based Key-Value storage system.
  • The DHT-based Key-Value storage system is a distributed Key-Value storage system implemented using DHT technology, where the Key is the unique identifier of a piece of data and the value is the data content (see the node sketch below).
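  • A minimal sketch of the storage-node interface this implies is shown below: each node stores (value, version) pairs addressed only by Key. The class and method names are illustrative, not from the patent:

```python
import time

class StorageNode:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}  # key -> (value, version)

    def put(self, key: str, value: bytes):
        # A version is recorded with every write; a timestamp is one of
        # the version representations the embodiments mention.
        version = time.time()
        self.store[key] = (value, version)
        return True, version            # "write success" response

    def get(self, key: str):
        if key not in self.store:
            return False, None, None    # "read failure" / empty response
        value, version = self.store[key]
        return True, value, version
```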
  • FIG. 2 is a flowchart of an embodiment of a method for creating a local volume according to the present invention. As shown in FIG. 2, the method may include:
  • Step 201 The initiator device initializes and starts the DHT block storage driver.
  • Specifically, the initiating device specifies a Uniform Resource Locator (hereinafter: URL) list of the storage nodes of the DHT-based Key-Value storage system, where the URL list saves the IP address and service port of at least one storage node.
  • The purpose of saving the IP address and service port of at least one storage node is mainly so that another storage node can be used if one storage node cannot communicate normally.
  • Step 202 Initialize the DHT routing library.
  • The main purpose of initializing the DHT routing library is to establish connections with the DHT-based Key-Value storage system. Specifically, if the DHT routing library is located in the DHT block storage driver, the initiating device saves a connection pool after initializing the DHT routing library; the connection pool holds the connections between the DHT routing library and all or part of the storage nodes in the DHT-based Key-Value storage system.
  • If the DHT routing library is located in the storage nodes of the DHT-based Key-Value storage system, the DHT block storage driver directly establishes connections with the storage nodes of the DHT-based Key-Value storage system that contain the DHT routing library.
  • Step 203 The DHT routing library establishes a connection with all or part of the storage nodes in the DHT-based Key-value storage system.
  • the DHT routing library also obtains information about each storage node in the DHT-based Key-value storage system, including the IP address, port, and responsible Hash region of each storage node.
  • The connections between the DHT routing library and all or part of the storage nodes in the DHT-based Key-Value storage system are maintained in the form of a pool, as sketched below.
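  • A minimal sketch of such a connection pool, assuming one lazily created and reused socket per storage node (the addresses and the pooling policy are illustrative assumptions):

```python
import socket

class ConnectionPool:
    def __init__(self, node_addrs):
        self.node_addrs = node_addrs    # {node_id: (ip, port)}
        self.conns = {}                 # node_id -> connected socket

    def get(self, node_id):
        conn = self.conns.get(node_id)
        if conn is None:
            ip, port = self.node_addrs[node_id]
            conn = socket.create_connection((ip, port), timeout=5)
            self.conns[node_id] = conn  # keep the connection for reuse
        return conn

    def drop(self, node_id):
        # Called when a node cannot communicate normally, so that another
        # saved storage node can be tried instead (see step 201 above).
        conn = self.conns.pop(node_id, None)
        if conn is not None:
            conn.close()
```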
  • Step 204 The DHT block storage driver of the initiating device receives a command to create a volume, where the command includes a volume name and a volume size of the volume to be created.
  • the volume name may be any character, string, and/or number.
  • the representation of the volume name is not limited in this embodiment, as long as the volume name of the volume to be created is unique within the storage system.
  • Step 205 After receiving the foregoing command, the DHT block storage driver of the initiating device creates a volume logical device in the local operating system of the initiating device according to the foregoing command. At this point, the local volume is created.
  • FIG. 3 is a flowchart of an embodiment of a data writing method according to the present invention. In this embodiment, it is assumed that there are three copies of each data. As shown in FIG. 3, the data writing method may include:
  • Step 301 The DHT block storage driver receives an LBA-based write operation request for the volume.
  • the LBA-based write operation request carries the data to be written, and the starting sector number to be written and the number of sectors to be written.
  • the sector is the smallest access unit of the disk, and the default sector size of the existing disk is 512 bytes (Byte).
  • Step 302: The DHT block storage driver converts the LBA mode-based write operation request into a Key addressing mode-based write operation request, where the Key addressing mode-based write operation request carries the to-be-written data and the Key corresponding to the to-be-written data.
  • There are various specific ways to convert a write operation request based on the LBA method into write operation requests based on the Key addressing mode; a typical conversion is:
  • Key = volume name + "_" + (LBA number of the LBA-based write operation request x 512 / value data block size), where the division takes only the integer part of the quotient.
  • Here, the volume name is the volume name of the volume; the LBA number of the LBA-based write operation request is an integer number carried in the LBA-based write operation request; 512 is the default sector size of existing disks; and the value data block size refers to the fixed length of the value corresponding to each Key stored in the DHT-based Key-Value storage system.
  • Under this conversion, for example, LBA-based write operation requests with LBA numbers 32, 33, 34, and 35 are all translated into write operations addressed by the key "nbd0_4".
  • The length of an LBA-based write operation request (that is, the number of sectors to be written) determines how many Key addressing mode-based write operation requests it maps to after conversion: if the length of the data to be written is greater than the value data block size, the LBA-based write operation request is converted into at least two write operation requests based on the Key addressing mode.
  • For example, if the LBA numbers carried in the LBA-based write operation request are 32, 33, 34, 35, 36, 37, and 38, the request is converted into two write operations based on the Key addressing mode, the second of which corresponds to the sectors 36, 37, and 38 to be written; a sketch of this grouping follows below.
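  • A minimal sketch of this conversion, assuming a hypothetical value data block of 2048 bytes (4 sectors); the actual block size is a system parameter, so the key numbers printed here are illustrative only (the "nbd0_4" example above corresponds to a 4096-byte block):

```python
SECTOR = 512        # default sector size of existing disks
VALUE_BLOCK = 2048  # assumed; determines how many sectors share one key

def lba_write_to_key_writes(volume: str, start_lba: int, n_sectors: int):
    """Group consecutive sectors into per-key write operations."""
    ops = {}  # key -> list of LBA numbers covered by that key
    for lba in range(start_lba, start_lba + n_sectors):
        # Key = volume name + "_" + integer part of (LBA x 512 / block size)
        block_no = lba * SECTOR // VALUE_BLOCK
        ops.setdefault(f"{volume}_{block_no}", []).append(lba)
    return ops

# Sectors 32..38 split into two key-addressed writes, as in the example:
print(lba_write_to_key_writes("nbd0", 32, 7))
# {'nbd0_8': [32, 33, 34, 35], 'nbd0_9': [36, 37, 38]}
```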
  • Step 303 The DHT block storage driver sends a write operation request based on the Key addressing mode to the DHT routing library.
  • The DHT routing library may be located in the DHT block storage driver or in a storage node of the DHT-based Key-Value storage system. If the DHT routing library is located in the DHT block storage driver, the DHT block storage driver can send the write operation request based on the Key addressing mode to the DHT routing library through a local language interface call; if the DHT routing library is located in a storage node, the DHT block storage driver interacts with the storage node where the DHT routing library is located and sends the write operation request based on the Key addressing mode to the DHT routing library.
  • Step 304: The DHT routing library sends a write operation request based on the Key addressing mode to the storage master node of the data to be written.
  • Specifically, the DHT routing library first hashes the Key of the to-be-written data carried in the received Key-addressing-based write operation request, then determines the storage node responsible for the hash region to which the hashed Key belongs as the storage master node of the data to be written, and finally sends the write operation request based on the Key addressing mode to that storage master node.
  • Step 305 The storage master node writes the data to be written to the local area according to the key value of the data to be written, and records the version of the written data.
  • The version of the written data may be represented by a timestamp, a vector clock, or another method; this embodiment does not limit the representation (a small comparison sketch is given below).
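  • Below is a small comparison sketch for the two representations: timestamp versions are totally ordered, while vector clocks can additionally detect conflicting concurrent writes. This is an illustration, not the scheme mandated by the patent:

```python
def compare_vector_clocks(a: dict, b: dict) -> str:
    """Return 'a_newer', 'b_newer', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"   # neither side dominates: a write conflict

print(compare_vector_clocks({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a_newer
```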
  • Step 306 The storage master node returns a write operation response to the DHT routing library.
  • Specifically, if the write is successful, the storage master node returns a write success response to the DHT routing library; if the write fails, the storage master node returns a write failure response to the DHT routing library.
  • Step 307: The DHT routing library determines the first backup node of the data to be written according to a predetermined backup policy, and sends a write operation request based on the Key addressing mode to the first backup node.
  • The backup policy may be, for example, a cross-rack or cross-data-center backup policy; this embodiment does not limit it, as long as the DHT routing library can determine the backup nodes of the data to be written according to the backup policy.
  • Step 308 The first backup node writes the data to be written to the local according to the key value of the data to be written, and records the version of the written data.
  • the version of the written data may be represented by a timestamp, a vector clock, or other modes.
  • the version of the written data is not limited in this embodiment.
  • Step 309: The first backup node returns a write operation response to the DHT routing library. Specifically, if the write is successful, the first backup node returns a write success response to the DHT routing library; if the write fails, the first backup node returns a write failure response to the DHT routing library.
  • Step 310: The DHT routing library determines the second backup node of the data to be written according to the predetermined backup policy, and sends a write operation request based on the Key addressing mode to the second backup node.
  • Step 311 The second backup node writes the data to be written to the local according to the key value of the data to be written, and records the version of the written data.
  • Step 312 The second backup node returns a write operation response to the DHT routing library.
  • Specifically, if the write is successful, the second backup node returns a write success response to the DHT routing library; if the write fails, the second backup node returns a write failure response to the DHT routing library.
  • It should be noted that steps 304 to 312 may be performed asynchronously.
  • Step 313 The DHT routing library receives the write operation response according to the preset write operation policy, and calculates the number of times the write operation succeeds.
  • the DHT-based Key-value storage system supports different write operation policies.
  • For example, when 3 copies are kept for each piece of data, the write operation policy can be set to "writing 2 of 3 copies counts as success"; that is, the data to be written is written to three different storage nodes (3 copies), and if writing to two of the storage nodes succeeds during the write operation, the entire write operation is considered successful.
  • The remaining copy can then be synchronized in the background, which speeds up the write operation without reducing the number of copies of the data.
  • Step 314: The DHT routing library returns a response to the above write operation request based on the Key addressing mode to the DHT block storage driver according to the number of successful writes and the preset write operation policy.
  • Following the example in step 313, after writing to two storage nodes succeeds, the entire write operation is considered successful and the DHT routing library returns a write operation success response to the DHT block storage driver; if writing succeeded on only one storage node, or failed on all storage nodes, the DHT routing library returns a write operation failure response to the DHT block storage driver. A sketch of this quorum counting follows below.
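  • A minimal sketch of this quorum counting ("2 of 3 copies"): the routing library counts write-success responses and declares success once the quorum W is reached. The node and request plumbing is elided; send_write stands in for steps 304 to 312:

```python
def quorum_write(nodes, key, value, send_write, w=2):
    """Return True if at least `w` of the replica writes succeed."""
    successes = 0
    for node in nodes:                      # master node first, then backups
        if send_write(node, key, value):    # may also run asynchronously
            successes += 1
        if successes >= w:
            # The remaining copies can be synchronized in the background.
            return True
    return False

# Example: the second replica fails, yet the write succeeds with W=2 of N=3.
replies = {"primary": True, "backup1": False, "backup2": True}
print(quorum_write(replies, "nbd0_8", b"...", lambda n, k, v: replies[n]))  # True
```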
  • Since the data to be written has at least two backups, data is not lost when a single storage node fails, so the data writing method provided in this embodiment can improve the reliability of the DHT-based Key-Value storage system.
  • Because the written data has at least two backups, the DHT-based Key-Value storage system can perform write operations at all times, and data does not become unwritable because a storage node in the storage system fails; therefore, the data writing method provided in this embodiment can improve the availability of the DHT-based Key-Value storage system.
  • DHT technology is itself characterized by high scalability, so the data writing method provided in this embodiment can improve the scalability of the DHT-based Key-Value storage system.
  • The DHT-based storage system does not require specially customized hardware, and general-purpose hardware such as PCs can be used; therefore, the data writing method provided in this embodiment can reduce the cost of the DHT-based Key-Value storage system. In summary, the data writing method provided in this embodiment can meet the requirements of highly scalable, highly reliable, highly available, and low-cost mass storage.
  • FIG. 4 is a flowchart of an embodiment of a data reading method according to the present invention. As shown in FIG. 4, the data reading method may include:
  • Step 401 The DHT block storage driver receives an LBA-based read operation request for the volume.
  • the LBA-based read operation request carries the LBA number and the number of sectors to be read.
  • the sector is the smallest access unit of the disk, and the default sector size of the existing disk is 512 bytes (Byte).
  • Step 402 The DHT block storage driver converts the read operation request based on the LBA mode into a read operation request based on the Key addressing mode, and the read operation request based on the Key addressing mode carries the Key corresponding to the data to be read.
  • Step 403 The DHT block storage driver sends a read operation request based on the Key addressing mode to the DHT routing library.
  • The DHT routing library may be located in the DHT block storage driver or in a storage node of the DHT-based Key-Value storage system. If the DHT routing library is located in the DHT block storage driver, the DHT block storage driver can send the read operation request based on the Key addressing mode to the DHT routing library through a local language interface call; if the DHT routing library is located in a storage node, the DHT block storage driver can interact with the storage node where the DHT routing library is located and send the read operation request based on the Key addressing mode to the DHT routing library.
  • Step 404 The DHT routing library sends a read operation request based on the Key addressing mode to the storage primary node of the data to be read.
  • Specifically, the DHT routing library first hashes the Key of the data to be read carried in the received Key-addressing-based read operation request, then determines the storage node responsible for the hash region to which the hashed Key belongs as the storage master node of the data to be read, and finally sends the read operation request based on the Key addressing mode to that storage master node.
  • Step 405 The storage master node reads the locally stored data to be read and the version of the data to be read according to the key value of the data to be read.
  • the representation of the version may be a timestamp, a vector clock, or other manners, and the manner in which the version of the data to be read is expressed in this embodiment is not limited.
  • Step 406 If the read operation is successful, the storage master node returns the read data, the version of the read data, and the read success response to the DHT routing library.
  • Step 407: The DHT routing library determines the first backup node of the data to be read according to a predetermined backup policy, and sends a read operation request based on the Key addressing mode to the first backup node.
  • The backup policy may be, for example, a cross-rack or cross-data-center backup policy; this embodiment does not limit it, as long as the DHT routing library can determine the backup nodes of the data to be read according to the backup policy.
  • Step 408 The first backup node reads the locally stored data to be read and the version of the data to be read according to the key value of the data to be read.
  • Step 409: If the read operation is successful, the first backup node returns the read data, the version of the read data, and a read success response to the DHT routing library.
  • If the read operation fails, the first backup node returns an empty or other failure response to the DHT routing library.
  • Step 410: The DHT routing library determines the second backup node of the data to be read according to the predetermined backup policy, and sends a read operation request based on the Key addressing mode to the second backup node.
  • Step 411 The second backup node reads the locally stored data to be read and the version of the data to be read according to the key value of the data to be read.
  • Step 412 If the read operation is successful, the second backup node returns the read data, the version of the read data, and the read success response to the DHT routing library.
  • If the read operation fails, the second backup node returns an empty or other failure response to the DHT routing library.
  • It should be noted that steps 404 to 412 may be performed asynchronously.
  • Step 413 The DHT routing library receives the returned data according to a preset read operation policy, and calculates the number of successful read operations.
  • the DHT-based Key-value storage system supports different read operation strategies.
  • For example, when 3 copies are kept for each piece of data, the read operation policy can be set to "reading 1 of 3 copies counts as success"; that is, the data can be read from any of the three storage nodes (3 copies), and as long as the read from one storage node succeeds, the entire read operation can be considered successful.
  • Step 414: The DHT routing library identifies and returns the latest version of the data among the read data to the DHT block storage driver according to the number of successful reads and the preset read operation policy, as sketched below.
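  • A minimal sketch of steps 413 and 414: collect the (data, version) replies, check them against the read policy R, and return the latest version. Timestamp versions and the reply format are assumptions for illustration:

```python
def quorum_read(replies, r=1):
    """replies: list of (ok, data, version) from the master node and backups."""
    successful = [(data, ver) for ok, data, ver in replies if ok]
    if len(successful) < r:
        return None                        # the read operation failed
    # Identify the latest version among the successfully read copies.
    data, _ = max(successful, key=lambda dv: dv[1])
    return data

replies = [(True, b"old", 100), (False, None, None), (True, b"new", 200)]
print(quorum_read(replies))                # b'new'
```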
  • Step 415 After processing the latest version of the data, the DHT block storage driver returns data corresponding to the LBA-based read operation request.
  • It should be noted that if one LBA-based read operation request corresponds to at least two read operation requests based on the Key addressing mode, the DHT block storage driver needs to merge the latest-version data; if one Key-addressing-based read operation request corresponds to at least two LBA-based read operation requests, the DHT block storage driver needs to split the latest-version data.
  • Since the data to be read has at least two backups, data is not lost, so the data reading method provided in this embodiment can improve the reliability of the DHT-based Key-Value storage system.
  • Because the data to be read has at least two backups, the DHT-based Key-Value storage system can perform read operations at all times, and data does not become unreadable because a storage node in the storage system fails; therefore, the data reading method provided in this embodiment can improve the availability of the DHT-based Key-Value storage system.
  • DHT technology is itself characterized by high scalability, so the data reading method provided in this embodiment can improve the scalability of the DHT-based Key-Value storage system.
  • The DHT-based storage system does not require specially customized hardware, and general-purpose hardware such as PCs can be used; therefore, the data reading method provided in this embodiment can reduce the cost of the DHT-based Key-Value storage system; in summary, the data reading method provided in this embodiment can meet the requirements of highly scalable, highly reliable, highly available, and low-cost mass storage.
  • FIG. 5 is a schematic structural diagram of an embodiment of a data read/write device according to the present invention.
  • The data read/write device in this embodiment can serve as the initiator device, or as a part of the initiator device, to implement the data reading and writing methods of the embodiments above. As shown in FIG. 5, the data reading and writing device may include:
  • the receiving module 51 is configured to receive an LBA-based operation request for the volume
  • the conversion module 52, configured to convert the foregoing LBA-based operation request into an operation request based on the Key addressing mode, where the operation request based on the Key addressing mode carries the key value corresponding to the data to be operated;
  • the sending module 53, configured to send the operation request based on the Key addressing mode to the routing library, so that the routing library sends the operation request based on the Key addressing mode, according to the key value corresponding to the data to be operated, to the storage master node and the at least one backup node of the to-be-operated data, and the storage master node and the at least one backup node perform read or write operations on the to-be-operated data.
  • The above data read/write device can improve the reliability, availability, scalability, and cost-effectiveness of the storage system, meeting the requirements of highly scalable, highly reliable, highly available, and low-cost mass storage.
  • FIG. 6 is a schematic structural diagram of another embodiment of the data read/write device of the present invention. Compared with the data read/write device shown in FIG. 5, the data read/write device shown in FIG. 6 may further include:
  • An initialization module 54 is configured to initialize and start a block storage driver
  • a saving module 55 configured to save an IP address and a service port of at least one storage node in the storage system; and, when the routing library is located in the block storage driver, save the connection between the block storage driver and all or part of the storage nodes in the storage system;
  • the creating module 56 is configured to: after the receiving module 51 receives the command to create a volume, create a volume logical device in the local operating system of the initiating device according to the foregoing command, where the command includes the volume name of the volume to be created and the volume of the volume to be created. size.
  • The above data read/write device can improve the reliability, availability, scalability, and cost-effectiveness of the storage system, meeting the requirements of highly scalable, highly reliable, highly available, and low-cost mass storage.
  • FIG. 7 is a schematic structural diagram of an embodiment of a storage system according to the present invention. As shown in FIG. 7, the storage system may include: a block storage driver 71, a routing library 72, a storage master node 73, and at least one backup node 74;
  • the block storage driver 71, configured to receive an LBA-based operation request for a volume, convert the LBA-based operation request into an operation request based on the Key addressing mode, and send the operation request based on the Key addressing mode to the routing library 72, where the operation request based on the Key addressing mode carries the Key corresponding to the data to be operated;
  • the routing library 72, configured to receive the operation request based on the Key addressing mode sent by the block storage driver 71, and send it, according to the Key corresponding to the data to be operated, to the storage master node 73 and the at least one backup node 74 of the data to be operated;
  • the storage master node 73 is configured to receive an operation request based on the Key addressing mode sent by the routing library 72, and perform a read or write operation on the data to be operated;
  • the at least one backup node 74 is configured to receive an operation request based on the Key addressing mode sent by the routing library 72, and perform a read or write operation on the data to be operated.
  • The storage system in this embodiment may further include the initiator device 70 where the block storage driver 71 is located, configured to: initialize and start the block storage driver 71 before the block storage driver 71 receives the LBA-based operation request for the volume; save the IP address and service port of at least one storage node in the storage system; save the connections between the block storage driver 71 and all or part of the storage nodes in the storage system when the routing library 72 is located in the block storage driver 71; and receive a command to create a volume, which contains the volume name of the volume to be created and the volume size of the volume to be created.
  • The block storage driver 71 can also establish a connection between the block storage driver 71 and a storage node when the routing library 72 is located in that storage node, and create a volume logical device in the local operating system of the originating device 70 according to the volume-creation command received by the originating device 70.
  • The routing library 72 in this embodiment is further configured to obtain the IP address, service port, and responsible hash region of each storage node in the storage system, and to establish connections between the routing library 72 and all or part of the storage nodes in the storage system.
  • Specifically, the routing library 72 may hash the Key corresponding to the data to be operated, determine the storage node responsible for the hash region to which the hashed Key belongs as the storage master node 73 of the data to be operated, and send the operation request based on the Key addressing mode to the storage master node 73 of the data to be operated; and determine at least one backup node 74 of the data to be operated according to a predetermined backup policy and send the operation request based on the Key addressing mode to the at least one backup node.
  • When the operation request is a write operation request and the data to be operated is data to be written, the write operation request also carries the data to be written; the storage master node 73 is specifically configured to write the data to be written locally according to the Key of the data to be written, record the version of the written data, and, after the write operation is completed, return a write operation response to the routing library 72;
  • the at least one backup node 74 is specifically configured to write the data to be written locally according to the Key of the data to be written, record the version of the written data, and, after the write operation is completed, return a write operation response to the routing library 72.
  • The routing library 72 can also receive the above write operation responses according to a preset write operation policy, count the number of successful writes, and return a response to the write operation request based on the Key addressing method to the block storage driver 71 according to the number of successful writes and the preset write operation policy.
  • When the operation request is a read operation request and the data to be operated is data to be read, the storage master node 73 is specifically configured to read the locally stored data to be read and the version of the data to be read according to the Key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library 72;
  • the at least one backup node 74 is specifically configured to read the locally stored data to be read and the version of the data to be read according to the key value of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library 72.
  • The routing library 72 is further configured to receive the returned data according to the preset read operation policy, count the number of successful reads, and, according to the number of successful reads and the preset read operation policy, identify and return the latest version of the data among the read data to the block storage driver 71; after processing the latest-version data, the block storage driver 71 returns the data corresponding to the LBA-based read operation request.
  • In this embodiment, the block storage driver 71 can be implemented based on DHT technology, and implements the functions of the receiving module 51, the conversion module 52, and the sending module 53 in the embodiment shown in FIG. 5 of the present invention. Specifically, the block storage driver 71 establishes and maintains a connection with the target end, converts an LBA-based operation request into an operation request based on the Key addressing mode when a read/write operation on the volume occurs, and sends the operation request based on the Key addressing mode to the routing library 72. In this embodiment, the block storage driver 71 and the volume together constitute the initiator device 70.
  • the routing library 72 is a virtual entity, which can be implemented based on the DHT technology.
  • The routing library 72 can be placed in the block storage driver 71 as a library, or in the storage engine of a storage node, or in both the block storage driver 71 and the storage engine of a storage node; alternatively, the routing library 72 may exist as a separate entity. The main function of the routing library 72 is to distribute operation requests based on the Key addressing mode.
  • Figure 8 is a schematic diagram of an embodiment of a route supported by the storage system of the present invention. As shown in Figure 8, it is assumed that the number of copies reserved for each data is 3, and the data requested is stored in node 1, node 3, and node j, respectively.
  • the request routing supported by the storage system mainly includes the following three types:
  • Client routing: the client sends the operation request based on the Key addressing mode directly to the nodes where the data is located; here, the client is the initiator device 70.
  • Server-side primary-data-node routing: the client forwards the operation request based on the Key addressing mode to the storage master node of the data, which then redistributes it to the backup nodes; this routing manner can also be regarded as a special case of server-side proxy routing.
  • Server-side proxy routing: the client sends the operation request based on the Key addressing mode to any storage node, and that storage node, acting as a proxy node, forwards the request to the storage nodes where the data is located; the proxy node can also itself be a node where the data resides. A sketch contrasting these three modes follows below.
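  • The following sketch contrasts the three routing modes; forward is a placeholder for a network send, primary_and_backups is a stub for the hash-region lookup sketched earlier, and all names are illustrative:

```python
def primary_and_backups(key):               # stub; see the earlier ring sketch
    return "n1", ["n2", "n3"]

def client_routing(key, value, forward):
    # The client itself sends the request to every node where the data lives.
    primary, backups = primary_and_backups(key)
    for node in [primary] + backups:
        forward(node, key, value)

def primary_node_routing(key, value, forward):
    # The client sends only to the storage master node, which then
    # redistributes the request to the backup nodes.
    primary, backups = primary_and_backups(key)
    forward(primary, key, value, redistribute_to=backups)

def proxy_routing(any_node, key, value, forward):
    # The client sends to an arbitrary storage node, which acts as a proxy
    # and forwards the request to the nodes where the data resides
    # (the proxy may itself be one of those nodes).
    forward(any_node, key, value, proxy=True)
```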
  • When the routing library 72 is placed in the block storage driver 71, the system uses client routing; when the routing library 72 is placed in the storage nodes, the system uses one of the two server-side routing manners.
  • Of course, some special requests may use a combination of client routing and server-side routing; in that case, both the block storage driver 71 and the storage nodes include the routing library 72.
  • The storage system provided in this embodiment is a DHT-based Key-Value storage system; the storage master node 73 and the at least one backup node 74 are storage nodes in the target end of the DHT-based Key-Value storage system, and the target end is the server side.
  • the DHT-based Key-value storage system is a distributed Key-value storage system implemented by using DHT technology.
  • The target end of the DHT-based Key-Value storage system includes several storage nodes; the physical storage nodes may be located in different racks and different data centers, but all logical storage nodes are located on the same hash ring and are responsible for different hash regions.
  • The Key carried in a Key-addressing-based operation request is hashed to find the storage node to which the data to be operated belongs; a piece of continuous data to be processed is therefore likely to be distributed and stored across multiple storage nodes.
  • The storage nodes use data copies and different replication policies (cross-rack backup or cross-data-center backup) to ensure data redundancy and improve reliability; and because the data to be operated is distributed and stored, the DHT-based Key-Value storage system can use concurrent I/O at the initiator device or the target end to reduce I/O latency and improve performance.
  • the storage node is composed of a DHT storage engine and storage hardware. It is the smallest unit visible to the outside of the storage system. It is used to process the operation request based on the Key addressing mode and complete the reading and writing of data.
  • The DHT storage engine completes the data storage of the DHT-based Key-Value storage system; if it contains the routing library, it can also implement server-side request routing.
  • The storage hardware includes physical disks; a physical disk can be a common hard disk, for example an Integrated Device Electronics (hereinafter: IDE) hard disk or a Serial Advanced Technology Attachment (hereinafter: SATA) hard disk, or a solid-state drive.
  • In addition to physical disks, the storage hardware of a single storage node includes other hardware necessary for the system to operate, such as a Central Processing Unit (hereinafter: CPU), memory, motherboard, and network card. Because the main bottleneck of storage system reads and writes is the disk, special hardware can be built to reduce costs, for example CPUs with a low clock frequency and the relatively inexpensive Advanced RISC Machines (hereinafter: ARM) architecture, and/or reduced memory.
  • The interface between the routing library 72 and the storage nodes of the target end is a Key-Value interface based on the Transmission Control Protocol (hereinafter: TCP) / User Datagram Protocol (hereinafter: UDP); as described above, the Key is the unique identifier of the data to be operated, and the value is the content of the data to be operated. A hypothetical frame layout for such an interface is sketched below.
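  • A sketch of one possible TCP request frame for such a Key-Value interface; the patent only states that the interface runs over TCP/UDP, so this particular layout (an op byte, two big-endian lengths, then the key and value bytes) is purely hypothetical:

```python
import struct

OP_PUT, OP_GET = 1, 2

def encode_request(op: int, key: bytes, value: bytes = b"") -> bytes:
    # Frame layout: [op:1][key_len:4][value_len:4][key][value]
    return struct.pack("!BII", op, len(key), len(value)) + key + value

def decode_request(frame: bytes):
    op, key_len, value_len = struct.unpack_from("!BII", frame)
    key = frame[9:9 + key_len]
    value = frame[9 + key_len:9 + key_len + value_len]
    return op, key, value

frame = encode_request(OP_PUT, b"nbd0_8", b"sector data")
print(decode_request(frame))   # (1, b'nbd0_8', b'sector data')
```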
  • The storage system provided in this embodiment is a DHT-based Key-Value storage system. The DHT-based Key-Value storage system automatically handles storage node faults and the like, and can maintain data redundancy well, thereby ensuring storage reliability and improving availability.
  • DHT-based Key-Value storage systems are also highly scalable and, in theory, can be expanded without limit to grow system capacity.
  • The storage nodes in a DHT-based Key-Value storage system also support the use of inexpensive hardware, which can reduce the cost of the storage system.
  • The modules in the apparatuses of the embodiments may be distributed in the apparatuses as described in the embodiments, or may be located, with corresponding changes, in one or more apparatuses different from those of the embodiments.
  • The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a data reading and writing method, apparatus, and storage system. The data reading and writing method includes: a block storage driver of an initiator device receives an LBA-based operation request for a volume; converts the LBA-based operation request into an operation request based on the Key addressing mode; and then sends the operation request based on the Key addressing mode to a routing library. This can improve the reliability, availability, scalability, and cost-effectiveness of the storage system, meeting the need for highly scalable, highly reliable, highly available, and low-cost mass storage.

Description

Data read/write method, apparatus and storage system

Technical Field

The embodiments of the present invention relate to the field of information technology, and in particular to a data read/write method, an apparatus, and a storage system.

Background Art

With the development of networks, information has grown explosively and human data has reached an unprecedented scale. Storing and managing such ultra-large-scale data has become a major challenge. Although hard-disk capacity keeps growing and storage speed keeps increasing, traditional direct-attached disk storage and the use of multiple external disk arrays can no longer satisfy the scalability, reliability, and high-availability requirements that mass information management places on the storage subsystem.

In the prior art, a typical block storage system can generally be simplified into two parts, an initiator and a target. The initiator creates a volume locally, establishes a connection with the target, and forwards input (hereinafter: I)/output (hereinafter: O) requests for the local device file to the target for processing; the target manages the storage devices and processes the final I/O requests. The two communicate through block interfaces of storage protocols such as the internet Small Computer System Interface (hereinafter: iSCSI), Fiber Channel (hereinafter: FC), Advanced Technology Attachment over Ethernet (hereinafter: AOE), or Network Block Device (hereinafter: NBD).

The initiator mainly contains an access unit, an entity that implements local volume management and establishes connections and communicates with the target through various protocols (for example iSCSI, FC, or AOE). The target mainly includes a volume control unit and a Redundant Array of Independent Disks (hereinafter: RAID) control unit. The RAID control unit manages the specific physical disks, builds RAID groups, and forms logical disks; the volume control unit manages the logical disks generated by the RAID control unit, partitions logical volumes as needed, and exposes the logical volumes through protocols such as iSCSI, FC, or AOE for use by the initiator.

However, in the prior art, the block storage service provided by the target is subject to certain constraints in reliability, availability, scalability, and cost.

Specifically, in terms of reliability and availability, the prior art mainly guarantees data reliability by building RAID inside the target cabinet and using redundant controller heads (such as a dual-controller disk array). However, if the cabinet suffers a power failure, or two or more controller heads fail at the same time, data loss or service interruption may occur, which further affects availability. In terms of scalability, if the target uses an Internet Storage Area Network (hereinafter: IP SAN) or an FC Storage Area Network (hereinafter: FC SAN), the capacity of the target is limited by the processing capability of the controller heads of the IP SAN or FC SAN; the maximum supported capacity is constrained and large-scale expansion is impossible.

In terms of maintainability, in a disk array built from traditional RAID groups, when a disk in a RAID group fails, it must be replaced promptly and the RAID rebuilt to guarantee redundancy and data reliability. This requires maintenance personnel to perform replacements at any time; the system cannot handle such faults automatically.

As for cost, both IP SAN and FC SAN are expensive as targets, especially FC SAN, whose accompanying switching equipment is also costly. A storage server as the target is comparatively cheap, but because storage servers place only modest requirements on processors and memory, their disk I/O performance is low.

Summary of the Invention
The embodiments of the present invention provide a data read/write method, an apparatus, and a storage system, so as to improve the reliability, availability, scalability, and low cost of a storage system.

An embodiment of the present invention provides a data read/write method, including:

a block storage driver of an initiator device receiving a logical-block-addressing-based operation request for a volume;

the block storage driver converting the logical-block-addressing-based operation request into a key-value-addressing-based operation request, where the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

the block storage driver sending the key-value-addressing-based operation request to a routing library, so that the routing library, according to the key corresponding to the data to be operated on, sends the key-value-addressing-based operation request to the primary storage node and at least one backup node of the data to be operated on, and the primary storage node and the at least one backup node read or write the data to be operated on.

An embodiment of the present invention further provides a data read/write apparatus, including:

a receiving module, configured to receive a logical-block-addressing-based operation request for a volume; a conversion module, configured to convert the logical-block-addressing-based operation request into a key-value-addressing-based operation request, where the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

a sending module, configured to send the key-value-addressing-based operation request to a routing library, so that the routing library, according to the key corresponding to the data to be operated on, sends the key-value-addressing-based operation request to the primary storage node and at least one backup node of the data to be operated on, and the primary storage node and the at least one backup node read or write the data to be operated on.

The present invention further provides a storage system, including: a block storage driver, a routing library, a primary storage node, and at least one backup node;

the block storage driver is configured to receive a logical-block-addressing-based operation request for a volume, convert the logical-block-addressing-based operation request into a key-value-addressing-based operation request, and send the key-value-addressing-based operation request to the routing library, where the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

the routing library is configured to receive the key-value-addressing-based operation request sent by the block storage driver and send it, according to the key corresponding to the data to be operated on, to the primary storage node and at least one backup node of the data to be operated on; the primary storage node is configured to receive the key-value-addressing-based operation request sent by the routing library and read or write the data to be operated on;

the at least one backup node is configured to receive the key-value-addressing-based operation request sent by the routing library and read or write the data to be operated on.

Through the embodiments of the present invention, after the block storage driver of the initiator device receives a logical-block-addressing-based operation request for a volume, it converts the request into a key-value-addressing-based operation request and then sends it to the routing library, so that the routing library, according to the key carried in the key-value-addressing-based operation request and corresponding to the data to be operated on, sends the request to the primary storage node and at least one backup node of the data to be operated on, and the primary storage node and the at least one backup node read or write the data. The reliability, availability, scalability, and low cost of the storage system can thereby be improved, satisfying the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are introduced briefly below. Evidently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of a data read/write method of the present invention;

FIG. 2 is a flowchart of an embodiment of a local volume creation method of the present invention;

FIG. 3 is a flowchart of an embodiment of a data write method of the present invention;

FIG. 4 is a flowchart of an embodiment of a data read method of the present invention;

FIG. 5 is a schematic structural diagram of an embodiment of a data read/write apparatus of the present invention; FIG. 6 is a schematic structural diagram of another embodiment of a data read/write apparatus of the present invention;

FIG. 7 is a schematic structural diagram of an embodiment of a storage system of the present invention;

FIG. 8 is a schematic diagram of an embodiment of routing supported by the storage system of the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of a data read/write method of the present invention. As shown in FIG. 1, the data read/write method may include:

Step 101: A block storage driver of an initiator device receives a Logical Block Addressing (hereinafter: LBA)-based operation request for a volume.

In this embodiment, the initiator device may be any application server; of course, this embodiment is not limited thereto, and the specific form of the initiator device is not limited in this embodiment.

Step 102: The block storage driver converts the LBA-based operation request into a key-value (Key)-addressed operation request, where the Key-addressed operation request carries the Key corresponding to the data to be operated on.

Step 103: The block storage driver sends the Key-addressed operation request to a routing library, so that the routing library, according to the Key corresponding to the data to be operated on, sends the Key-addressed operation request to the primary storage node and at least one backup node of the data, and the primary storage node and the at least one backup node read or write the data.

In this embodiment, before the block storage driver of the initiator device receives the LBA-based operation request for the volume, the initiator device initializes and starts the block storage driver and saves the Internet Protocol (hereinafter: IP) address and service port of at least one storage node in the storage system. Then, when the routing library resides in the block storage driver, the initiator device saves the connections between the block storage driver and all or some of the storage nodes in the storage system; or, when the routing library resides in a storage node, the block storage driver directly establishes a connection between the block storage driver and the storage node. Next, the initiator device may receive a volume creation command containing the name and size of the volume to be created, and the block storage driver of the initiator device may create a volume logical device in the local operating system of the initiator device according to the command. In addition, the routing library also needs to obtain the IP address and service port of each storage node in the storage system and the hash (Hash) region each storage node is responsible for, and establish connections between the routing library and all or some of the storage nodes in the storage system. Through the above process, a local volume is created on the initiator device.

In this embodiment, specifically, the routing library sending the Key-addressed operation request to the primary storage node and at least one backup node of the data according to the Key corresponding to the data may proceed as follows: the routing library hashes the Key corresponding to the data, determines the storage node responsible for the hash region to which the hashed Key belongs as the primary storage node of the data, and sends the Key-addressed operation request to the primary storage node; and it determines at least one backup node of the data according to a predetermined backup policy and sends the Key-addressed operation request to the at least one backup node.

Specifically, each storage node in the storage system has a node identifier. When determining the at least one backup node of the data, the at least one backup node may be selected, according to the predetermined backup policy, in ascending or descending node-identifier order starting from the primary storage node. For example, when the predetermined backup policy requires two backups for each piece of data, two backup nodes may be selected in ascending or descending node-identifier order starting from the primary storage node.
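As an illustration only (not part of the original disclosure), the following minimal Python sketch shows one way the routing library's two decisions could look in code: hashing the Key to locate the primary storage node on the Hash ring, then taking the next nodes in ascending ring order as backups. The MD5 hash, the 32-bit ring, and the node names are assumptions of the sketch; the patent fixes neither a hash function nor a ring layout.

    import bisect
    import hashlib

    def locate_replicas(key, ring, n_backups=2):
        # `ring` is a sorted list of (position, node_id) pairs; each node
        # is responsible for the hash region ending at its position.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)
        positions = [pos for pos, _ in ring]
        i = bisect.bisect_left(positions, h) % len(ring)  # primary's slot
        # Backups follow the primary in ring order, one of the selection
        # orders (ascending node identifiers) that the text allows.
        return [ring[(i + k) % len(ring)][1] for k in range(1 + n_backups)]

    # Hypothetical five-node ring; with a two-backup policy each Key maps
    # to a primary plus the two nodes that follow it on the ring.
    ring = [(0x20000000, "node1"), (0x60000000, "node2"),
            (0x90000000, "node3"), (0xC0000000, "node4"),
            (0xF0000000, "node5")]
    primary, backup1, backup2 = locate_replicas("nbd0_4", ring)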
In one implementation of this embodiment, when the operation request is a write operation request and the data to be operated on is data to be written, the write operation request may also carry the data to be written. In this case, the primary storage node and the at least one backup node performing the write operation may proceed as follows:

The primary storage node and the at least one backup node write the data to be written locally according to the key of the data, record the version of the written data, and, after the write operation is completed, return a write operation response to the routing library.

Afterwards, the routing library may receive the write operation responses according to a preset write operation policy, count the number of successful writes, and return a response to the Key-addressed write operation request to the block storage driver according to the number of successful writes and the preset write operation policy.

In another implementation of this embodiment, when the operation request is a read operation request and the data to be operated on is data to be read, the primary storage node and the at least one backup node performing the read operation may proceed as follows: the primary storage node and the at least one backup node read the locally stored data to be read and its version according to the key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library.

Afterwards, the routing library may receive the returned data according to a preset read operation policy, count the number of successful reads, and, according to the number of successful reads and the preset read operation policy, identify and return to the block storage driver the latest version among the read data, so that the block storage driver, after processing the latest-version data, returns the data corresponding to the LBA-based read operation request.

In this embodiment, the block storage driver, the routing library, the primary storage node, and the at least one backup node can all be implemented based on Distributed Hash Table (hereinafter: DHT) technology. In this embodiment, because the data to be operated on has at least two backups, data will not be lost, so the data read/write method provided in this embodiment can improve the reliability of the storage system; and because the data has at least two backups, the storage system can keep serving reads and writes, and the failure of a single storage node will not make data unreadable or unwritable, so the method can improve the availability of the storage system. In addition, high scalability is an inherent characteristic of DHT technology, so the method can improve the scalability of the storage system. Finally, because a storage system implemented with DHT technology does not require specially customized hardware, and general-purpose hardware such as Personal Computer (hereinafter: PC) machines suffices, the method can make the storage system inexpensive. In summary, the data read/write method provided in this embodiment can satisfy the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.

In the description of the following embodiments of the present invention, the block storage driver is a DHT block storage driver, the routing library is a DHT routing library, and the primary storage node and the at least one backup node are nodes in a DHT-based Key-value storage system. A DHT-based Key-value storage system is a distributed Key-value storage system implemented with DHT technology; the Key is the unique identifier of the data, and the value is the data content.
The interaction flow in which the initiator device creates a local volume in the embodiments of the present invention is introduced below. FIG. 2 is a flowchart of an embodiment of a local volume creation method of the present invention. As shown in FIG. 2, the method may include:

Step 201: The initiator device initializes and starts the DHT block storage driver.

Specifically, the initiator device specifies a Uniform Resource Locator (hereinafter: URL) list of the storage nodes of the DHT-based Key-value storage system; the URL list may hold the IP address and service port of at least one storage node. The main purpose of saving the IP address and service port of at least one storage node is to allow other storage nodes to be used if some storage node cannot communicate normally.

Step 202: Initialize the DHT routing library.

In this embodiment, the main purpose of initializing the DHT routing library is to establish connections with the DHT-based Key-value storage system. Specifically, if the DHT routing library resides in the DHT block storage driver, the initiator device will, after initializing the DHT routing library, hold a connection pool containing the connections between the DHT block storage driver of the initiator device and all or some of the storage nodes in the DHT-based Key-value storage system; in other words, the connection pool in the initiator device holds the connections between the DHT routing library and all or some of the storage nodes in the DHT-based Key-value storage system. If the DHT routing library resides in a storage node of the DHT-based Key-value storage system, the DHT block storage driver directly establishes connections with the storage nodes of the DHT-based Key-value storage system that contain the DHT routing library.

Step 203: The DHT routing library establishes connections with all or some of the storage nodes in the DHT-based Key-value storage system.
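For illustration (not part of the original disclosure), a minimal sketch of the connection pool described in steps 202-203, assuming one TCP connection per storage node, created on first use; the class and method names are invented for the sketch.

    import socket

    class ConnectionPool:
        # One TCP connection per storage node, created lazily; a
        # simplified stand-in for the pool the DHT routing library keeps.
        def __init__(self, nodes):
            self.nodes = dict(nodes)   # node_id -> (ip_address, service_port)
            self.conns = {}            # node_id -> connected socket

        def get(self, node_id):
            if node_id not in self.conns:
                self.conns[node_id] = socket.create_connection(self.nodes[node_id])
            return self.conns[node_id]

        def drop(self, node_id):
            # Discard a broken connection; the next get() reconnects.
            conn = self.conns.pop(node_id, None)
            if conn is not None:
                conn.close()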
In addition, the DHT routing library also obtains the information of each storage node in the DHT-based Key-value storage system, including the IP address, port, and responsible Hash region of each storage node.

In this embodiment, the DHT routing library maintains the connections between itself and all or some of the storage nodes of the DHT-based Key-value storage system in the form of a pool.

Step 204: The DHT block storage driver of the initiator device receives a volume creation command containing the name and size of the volume to be created.

The volume name may be any characters, character strings, and/or digits; this embodiment does not limit the representation of the volume name, as long as the name of the volume to be created is unique within the storage system.

Step 205: After receiving the command, the DHT block storage driver of the initiator device creates a volume logical device in the local operating system of the initiator device according to the command. At this point, local volume creation is complete.
FIG. 3 is a flowchart of an embodiment of a data write method of the present invention. In this embodiment it is assumed that three replicas of each piece of data exist. As shown in FIG. 3, the data write method may include:

Step 301: The DHT block storage driver receives an LBA-based write operation request for a volume.

The LBA-based write operation request carries the data to be written, the starting sector number to be written, and the number of sectors to be written.

It should be noted that a sector is the smallest access unit of a disk; the default sector size of existing disks is 512 bytes (Byte).

Step 302: The DHT block storage driver converts the LBA-based write operation request into a Key-addressed write operation request carrying the data to be written and the Key corresponding to the data to be written.

In this embodiment, there may be multiple specific ways to convert an LBA-based write operation request into a Key-addressed write operation request. A typical conversion is: Key = volume name + (LBA number of the LBA-based write operation request x 512 / value data-block size), where the division keeps only the integer part of the quotient. Here the volume name is the name of the volume; the LBA number of the LBA-based write operation request is an integer number carried in the LBA-based write operation request; 512 is the default sector size of existing disks; and the value data-block size is the fixed length of the value corresponding to each Key stored in the DHT-based Key-value storage system. For example, suppose the volume name is "nbd0", the LBA number of the LBA-based write operation request is 35, and the value data-block size is 4096 bytes; then key = "nbd0_4". Thus, LBA-based write operation requests with LBA numbers 32, 33, 34, and 35 are all converted into write operation requests addressed by the Key "nbd0_4".

In addition, it should be noted that the length of an LBA-based write operation request (that is, the number of sectors to be written) maps to the number of Key-addressed write operation requests after conversion. If the data to be written in one LBA-based write operation request spans more than one value data block, the LBA-based write operation request is converted into at least two Key-addressed write operation requests. For example, suppose an LBA-based write operation request carries LBA numbers 36, 37, 38, 39, 40, 41, and 42; then, following the conversion above, the LBA-based write operation request is converted into two Key-addressed write operation requests, whose Keys are "nbd0_4" and "nbd0_5" respectively. The write operation request with Key = "nbd0_4" corresponds to sectors 36, 37, 38, and 39 to be written; the write operation request with Key = "nbd0_5" corresponds to sectors 40, 41, and 42 to be written.
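For illustration (not part of the original disclosure), the conversion rule and the splitting behaviour described above can be sketched as follows; the function name lba_to_keys is invented, and a 512-byte sector with a 4096-byte value data block is assumed, as in the example.

    SECTOR_SIZE = 512  # default sector size of existing disks, in bytes

    def lba_to_keys(volume_name, start_lba, sector_count, value_block_size=4096):
        # Key = volume name + "_" + integer part of (LBA x 512 / value
        # data-block size); consecutive sectors falling into the same
        # value data block share one Key.  Returns a list of
        # (key, first_sector, last_sector) tuples.
        sectors_per_block = value_block_size // SECTOR_SIZE  # 8 for 4096 B
        requests = []
        lba = start_lba
        end = start_lba + sector_count - 1
        while lba <= end:
            block_no = lba * SECTOR_SIZE // value_block_size
            last = min(end, (block_no + 1) * sectors_per_block - 1)
            requests.append(("%s_%d" % (volume_name, block_no), lba, last))
            lba = last + 1
        return requests

    # The example above: sectors 36..42 of volume "nbd0" become two
    # Key-addressed requests, "nbd0_4" (36-39) and "nbd0_5" (40-42).
    assert lba_to_keys("nbd0", 36, 7) == [("nbd0_4", 36, 39),
                                          ("nbd0_5", 40, 42)]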
Step 303: The DHT block storage driver sends the Key-addressed write operation request to the DHT routing library.

In this embodiment, the DHT routing library may reside in the DHT block storage driver or in a storage node of the DHT-based Key-value storage system. If the DHT routing library resides in the DHT block storage driver, the DHT block storage driver may send the Key-addressed write operation request to the DHT routing library through a local language interface call; if the DHT routing library resides in a storage node, the DHT block storage driver may interact with the storage node where the DHT routing library resides and send the Key-addressed write operation request to the DHT routing library.

Step 304: The DHT routing library sends the Key-addressed write operation request to the primary storage node of the data to be written.

Specifically, the DHT routing library first hashes the Key of the data to be written carried in the received Key-addressed write operation request, then determines the storage node responsible for the Hash region where the hashed Key falls as the primary storage node of the data to be written, and finally sends the Key-addressed write operation request to the primary storage node.

Step 305: The primary storage node writes the data to be written locally according to the key of the data and records the version of the written data.

The version of the written data may be represented by a timestamp, a vector clock, or in other ways; this embodiment does not limit the representation of the version of the written data.
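For illustration (not part of the original disclosure), a minimal sketch of a storage node's local write and read in steps 305/405, assuming a wall-clock timestamp as the version; the patent equally allows vector clocks or other representations, and the class name StorageNode is invented.

    import time

    class StorageNode:
        # Minimal local store: each key maps to (value, version).
        def __init__(self):
            self.data = {}

        def write(self, key, value):
            self.data[key] = (value, time.time())  # record value + version
            return True                             # write response: success

        def read(self, key):
            return self.data.get(key)               # (value, version) or None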
Step 306: The primary storage node returns a write operation response to the DHT routing library.

Specifically, if the write succeeds, the primary storage node returns a write success response to the DHT routing library; if the write fails, the primary storage node returns a write failure response to the DHT routing library.

Step 307: The DHT routing library determines the first backup node of the data to be written according to the predetermined backup policy, and sends the Key-addressed write operation request to the first backup node.

The backup policy may be a cross-rack or cross-data-center backup policy or the like; this embodiment does not limit it, as long as the DHT routing library can determine the backup nodes of the data to be written according to the backup policy.

Step 308: The first backup node writes the data to be written locally according to the key of the data and records the version of the written data.

The version of the written data may be represented by a timestamp, a vector clock, or in other ways; this embodiment does not limit the version of the written data.

Step 309: The first backup node returns a write operation response to the DHT routing library. Specifically, if the write succeeds, the first backup node returns a write success response to the DHT routing library; if the write fails, the first backup node returns a write failure response.

Step 310: The DHT routing library determines the second backup node of the data to be written according to the predetermined backup policy, and sends the Key-addressed write operation request to the second backup node.

Step 311: The second backup node writes the data to be written locally according to the key of the data and records the version of the written data.

Step 312: The second backup node returns a write operation response to the DHT routing library.

Specifically, if the write succeeds, the second backup node returns a write success response to the DHT routing library; if the write fails, the second backup node returns a write failure response to the DHT routing library.

In this embodiment, steps 304 to 312 may be an asynchronous process.

Step 313: The DHT routing library receives the write operation responses according to the preset write operation policy and counts the number of successful writes.

In this embodiment, the DHT-based Key-value storage system supports different write operation policies. For example, the write operation policy may be set so that writing 2 of 3 replicas successfully counts as a successful write operation. This means that when one piece of data is stored into the DHT-based Key-value storage system it is written to three different storage nodes (three replicas); during the write, as long as the writes to two storage nodes succeed, the whole write operation can be considered successful, and the remaining replica can be synchronized in the background. This speeds up writes without reducing the number of data backups.
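For illustration (not part of the original disclosure), the 2-of-3 write policy of step 313 can be sketched as follows, reusing the StorageNode sketch above; the quorum parameter w is an assumption standing in for whatever write policy is configured.

    def quorum_write(replicas, key, value, w=2):
        # Write to every replica node; succeed once `w` of them
        # acknowledge (e.g. 2 of 3).  The remaining copy can be
        # synchronized in the background, as described above.
        acks = 0
        for node in replicas:
            try:
                if node.write(key, value):
                    acks += 1
            except OSError:
                continue  # a failed replica does not abort the whole write
        return acks >= w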
Step 314: The DHT routing library returns a response to the Key-addressed write operation request to the DHT block storage driver according to the number of successful writes and the preset write operation policy.

Referring to the example in step 313, after writes to two storage nodes succeed, the whole write operation can be considered successful, and the DHT routing library returns a write success response to the DHT block storage driver; if the write succeeds on only one storage node, or succeeds on none of the storage nodes, the DHT routing library returns a write failure response to the DHT block storage driver.

In this embodiment, because the data to be written has at least two backups, data will not be lost, so the data write method provided in this embodiment can improve the reliability of the DHT-based Key-value storage system; and because the data to be written has at least two backups, the DHT-based Key-value storage system can keep performing write operations, and the failure of a single storage node will not prevent data from being written, so the method can improve the availability of the DHT-based Key-value storage system. In addition, high scalability is an inherent characteristic of DHT technology, so the method can improve the scalability of the DHT-based Key-value storage system. Finally, because a storage system implemented with DHT technology does not require specially customized hardware, and general-purpose hardware such as PC machines suffices, the method can make the DHT-based Key-value storage system inexpensive. In summary, the data write method provided in this embodiment can satisfy the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.
FIG. 4 is a flowchart of an embodiment of a data read method of the present invention. In this embodiment it is assumed that three replicas of each piece of data exist. As shown in FIG. 4, the data read method may include:

Step 401: The DHT block storage driver receives an LBA-based read operation request for a volume.

The LBA-based read operation request carries the LBA number and the number of sectors to be read.

It should be noted that a sector is the smallest access unit of a disk; the default sector size of existing disks is 512 bytes (Byte).

Step 402: The DHT block storage driver converts the LBA-based read operation request into a Key-addressed read operation request carrying the Key corresponding to the data to be read.

Specifically, the way of converting an LBA-based read operation request into a Key-addressed read operation request is the same as the way of converting an LBA-based write operation request into a Key-addressed write operation request, and is not repeated here.

Step 403: The DHT block storage driver sends the Key-addressed read operation request to the DHT routing library.

In this embodiment, the DHT routing library may reside in the DHT block storage driver or in a storage node of the DHT-based Key-value storage system. If the DHT routing library resides in the DHT block storage driver, the DHT block storage driver may send the Key-addressed read operation request to the DHT routing library through a local language interface call; if the DHT routing library resides in a storage node, the DHT block storage driver may interact with the storage node where the DHT routing library resides and send the Key-addressed read operation request to the DHT routing library.

Step 404: The DHT routing library sends the Key-addressed read operation request to the primary storage node of the data to be read.

Specifically, the DHT routing library first hashes the Key of the data to be read carried in the received Key-addressed read operation request, then determines the storage node responsible for the Hash region where the hashed Key falls as the primary storage node of the data to be read, and finally sends the Key-addressed read operation request to the primary storage node.

Step 405: The primary storage node reads the locally stored data to be read and its version according to the key of the data to be read.

The version may be represented by a timestamp, a vector clock, or in other ways; this embodiment does not limit the representation of the version of the data to be read.

Step 406: If the read operation succeeds, the primary storage node returns the read data, the version of the read data, and a read success response to the DHT routing library.

If the read operation fails, the primary storage node returns null or another failure response to the DHT routing library.

Step 407: The DHT routing library determines the first backup node of the data to be read according to the predetermined backup policy, and sends the Key-addressed read operation request to the first backup node. The backup policy may be a cross-rack or cross-data-center backup policy or the like; this embodiment does not limit it, as long as the DHT routing library can determine the backup nodes of the data to be read according to the backup policy.

Step 408: The first backup node reads the locally stored data to be read and its version according to the key of the data to be read.

Step 409: If the read operation succeeds, the first backup node returns the read data, the version of the read data, and a read success response to the DHT routing library.

If the read operation fails, the first backup node returns null or another failure response to the DHT routing library.

Step 410: The DHT routing library determines the second backup node of the data to be read according to the predetermined backup policy, and sends the Key-addressed read operation request to the second backup node.

Step 411: The second backup node reads the locally stored data to be read and its version according to the key of the data to be read.

Step 412: If the read operation succeeds, the second backup node returns the read data, the version of the read data, and a read success response to the DHT routing library.

If the read operation fails, the second backup node returns null or another failure response to the DHT routing library.

In this embodiment, steps 404 to 412 may be an asynchronous process.

Step 413: The DHT routing library receives the returned data according to the preset read operation policy and counts the number of successful reads.

In this embodiment, the DHT-based Key-value storage system supports different read operation policies. For example, the read operation policy may be set so that reading 1 of 3 replicas counts as a successful read operation. This means that when one piece of data is read from the DHT-based Key-value storage system, it may be read from any one of the three storage nodes (three replicas); as long as the read from one storage node succeeds, the whole read operation can be considered successful.

Step 414: The DHT routing library, according to the number of successful reads and the preset read operation policy, identifies and returns to the DHT block storage driver the latest version among the read data.
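For illustration (not part of the original disclosure), steps 413 and 414 can be sketched as follows, again reusing the StorageNode sketch; the read-quorum parameter r stands in for the configured read policy, and the newest version is picked among the successful answers.

    def quorum_read(replicas, key, r=1):
        # Ask the replicas in turn; the read succeeds once `r` of them
        # answer (e.g. 1 of 3).  Among the successful answers, return
        # the data of the latest version, as the routing library does.
        answers = []
        for node in replicas:
            try:
                result = node.read(key)  # (value, version) or None
            except OSError:
                continue
            if result is not None:
                answers.append(result)
            if len(answers) >= r:
                break
        if not answers:
            return None                  # read failed on every replica
        value, _version = max(answers, key=lambda pair: pair[1])
        return value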
Step 415: After processing the latest-version data, the DHT block storage driver returns the data corresponding to the LBA-based read operation request.

Specifically, if one LBA-based read operation request was converted into at least two Key-addressed read operation requests, the DHT block storage driver needs to merge the latest-version data; if one Key-addressed read operation request corresponds to at least two LBA-based read operation requests, the DHT block storage driver needs to split the latest-version data.
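For illustration (not part of the original disclosure), the merging and splitting of step 415 can be sketched as slicing the returned value blocks down to exactly the requested sector range; the parameter names mirror the earlier lba_to_keys sketch and are likewise invented.

    def assemble_lba_read(blocks, start_lba, sector_count,
                          value_block_size=4096, sector_size=512):
        # `blocks` maps block number -> value bytes, one entry per Key
        # the LBA read was converted into.  Values are merged and cut
        # down to the sectors the original LBA request asked for.
        out = bytearray()
        for lba in range(start_lba, start_lba + sector_count):
            block_no = lba * sector_size // value_block_size
            offset = lba * sector_size - block_no * value_block_size
            out += blocks[block_no][offset:offset + sector_size]
        return bytes(out)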
In this embodiment, because the data to be read has at least two backups, data will not be lost, so the data read/write method provided in this embodiment can improve the reliability of the DHT-based Key-value storage system; and because the data to be read has at least two backups, the DHT-based Key-value storage system can keep performing read operations, and the failure of a single storage node will not prevent data from being read, so the method can improve the availability of the DHT-based Key-value storage system. In addition, high scalability is an inherent characteristic of DHT technology, so the method can improve the scalability of the DHT-based Key-value storage system. Finally, because a storage system implemented with DHT technology does not require specially customized hardware, and general-purpose hardware such as PC machines suffices, the method can make the DHT-based Key-value storage system inexpensive. In summary, the data read method provided in this embodiment can satisfy the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.

Those of ordinary skill in the art can understand that all or part of the steps implementing the above method embodiments can be accomplished by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

FIG. 5 is a schematic structural diagram of an embodiment of a data read/write apparatus of the present invention. The data read/write apparatus in this embodiment can serve as the initiator device, or as part of the initiator device, to implement the flow of the embodiment shown in FIG. 1 of the present invention. As shown in FIG. 5, the data read/write apparatus may include:
a receiving module 51, configured to receive an LBA-based operation request for a volume; a conversion module 52, configured to convert the LBA-based operation request into a Key-addressed operation request, where the Key-addressed operation request carries the key corresponding to the data to be operated on;

a sending module 53, configured to send the Key-addressed operation request to a routing library, so that the routing library, according to the key corresponding to the data to be operated on, sends the Key-addressed operation request to the primary storage node and at least one backup node of the data, and the primary storage node and the at least one backup node read or write the data.

The above data read/write apparatus can improve the reliability, availability, scalability, and low cost of the storage system, satisfying the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.
FIG. 6 is a schematic structural diagram of another embodiment of a data read/write apparatus of the present invention. Compared with the data read/write apparatus shown in FIG. 5, the difference is that the data read/write apparatus shown in FIG. 6 may further include:

an initialization module 54, configured to initialize and start the block storage driver;

a saving module 55, configured to save the IP address and service port of at least one storage node in the storage system, and, when the routing library resides in the block storage driver, to save the connections between the block storage driver and all or some of the storage nodes in the storage system;

a creation module 56, configured to, after the receiving module 51 receives a volume creation command, create a volume logical device in the local operating system of the initiator device according to the command, where the command contains the name and size of the volume to be created.

The above data read/write apparatus can improve the reliability, availability, scalability, and low cost of the storage system, satisfying the demand for highly scalable, highly reliable, highly available, and inexpensive mass storage.
FIG. 7 is a schematic structural diagram of an embodiment of a storage system of the present invention. As shown in FIG. 7, the storage system may include: a block storage driver 71, a routing library 72, a primary storage node 73, and at least one backup node 74;

the block storage driver 71 is configured to receive an LBA-based operation request for a volume, convert the LBA-based operation request into a Key-addressed operation request, and send the Key-addressed operation request to the routing library 72, where the Key-addressed operation request carries the Key corresponding to the data to be operated on;

the routing library 72 is configured to receive the Key-addressed operation request sent by the block storage driver 71 and send it, according to the Key corresponding to the data to be operated on, to the primary storage node 73 and at least one backup node 74 of the data;

the primary storage node 73 is configured to receive the Key-addressed operation request sent by the routing library 72 and read or write the data to be operated on;

the at least one backup node 74 is configured to receive the Key-addressed operation request sent by the routing library 72 and read or write the data to be operated on.

In addition, the storage system in this embodiment may further include: the initiator device 70 where the block storage driver 71 resides, configured to, before the block storage driver 71 receives the LBA-based operation request for a volume, initialize and start the block storage driver 71 and save the IP address and service port of at least one storage node in the storage system; and, when the routing library 72 resides in the block storage driver 71, to save the connections between the block storage driver 71 and all or some of the storage nodes in the storage system; and to receive a volume creation command containing the name and size of the volume to be created;

the block storage driver 71 may also, when the routing library 72 resides in a storage node, establish the connection between the block storage driver 71 and the storage node, and create a volume logical device in the local operating system of the initiator device 70 according to the volume creation command received by the initiator device 70.

The routing library 72 in this embodiment is also configured to obtain the IP address and service port of each storage node in the storage system and the Hash region each storage node is responsible for, and to establish connections between the routing library 72 and all or some of the storage nodes in the storage system.
Specifically, the routing library 72 may hash the Key corresponding to the data to be operated on, determine the storage node responsible for the Hash region to which the hashed Key belongs as the primary storage node 73 of the data, and send the Key-addressed operation request to the primary storage node 73; and determine at least one backup node 74 of the data according to the predetermined backup policy and send the Key-addressed operation request to the at least one backup node.

In one implementation of this embodiment, when the operation request is a write operation request, the data to be operated on is data to be written, and the write operation request also carries the data to be written, the primary storage node 73 may write the data locally according to the Key of the data to be written, record the version of the written data, and, after the write operation is completed, return a write operation response to the routing library 72;

when the operation request is a write operation request, the data to be operated on is data to be written, and the write operation request also carries the data to be written, the at least one backup node 74 may write the data locally according to the Key of the data to be written, record the version of the data to be written, and, after the write operation is completed, return a write operation response to the routing library 72.

In this case, the routing library 72 may also receive the write operation responses according to the preset write operation policy, count the number of successful writes, and return a response to the Key-addressed write operation request to the block storage driver 71 according to the number of successful writes and the preset write operation policy.

In another implementation of this embodiment, when the operation request is a read operation request and the data to be operated on is data to be read, the primary storage node 73 may read the locally stored data to be read and its version according to the Key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library 72;

the at least one backup node 74 is specifically configured to, when the operation request is a read operation request and the data to be operated on is data to be read, read the locally stored data to be read and its version according to the key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library 72.

In this case, the routing library 72 is also configured to receive the returned data according to the preset read operation policy, count the number of successful reads, and, according to the number of successful reads and the preset read operation policy, identify and return to the block storage driver 71 the latest version among the read data, so that the block storage driver 71, after processing the latest-version data, returns the data corresponding to the LBA-based read operation request.
In this embodiment, the block storage driver 71 can be implemented based on DHT technology and implements the functions of the receiving module 51, the conversion module 52, and the sending module 53 in the embodiment shown in FIG. 5 of the present invention. Specifically, the block storage driver 71 can establish and maintain the connection with the target end; when a read/write operation on a volume occurs, it converts the LBA-based operation request into a Key-addressed operation request and sends the Key-addressed operation request to the routing library 72. In this embodiment, the block storage driver 71 and the volume together constitute the initiator device 70.

In this embodiment, the routing library 72 is a virtual entity that can be implemented based on DHT technology. The routing library 72 can be placed as a library in the block storage driver 71, or in the storage engine of a storage node; of course, it can also be placed both in the block storage driver 71 and in the storage engine of a storage node; alternatively, the routing library 72 exists as a separate entity. The main function of the routing library 72 is the dispatch and routing of Key-addressed operation requests. FIG. 8 is a schematic diagram of an embodiment of routing supported by the storage system of the present invention. As shown in FIG. 8, assuming three replicas of each piece of data are kept and the requested data is stored on node 1, node 3, and node j, the request routing supported by the storage system mainly includes the following three kinds:

1) Client-side routing: the client sends the Key-addressed operation requests to the nodes where the data resides; here the client is the initiator device 70;

2) Server-side primary-data-node routing: the client forwards the Key-addressed operation request to the primary storage node of the data, and the primary storage node then distributes it to the backup nodes; this routing mode can also be regarded as a special case of server-side proxy routing;

3) Server-side proxy routing: the client sends the Key-addressed operation request to any storage node, and that storage node, acting as a proxy node, forwards the Key-addressed operation request to the storage nodes where the data resides; of course, the proxy node may also be a node where the data resides.

When the routing library 72 is placed in the block storage driver 71, it corresponds to client-side routing; when placed in a storage node, it corresponds to the two server-side routing modes. Of course, some special requests may also use a combination of client-side and server-side routing, in which case both the block storage driver 71 and the storage nodes contain the routing library 72.

The storage system provided in this embodiment is a DHT-based Key-value storage system, and the primary storage node 73 and the at least one backup node 74 are both storage nodes in the target end of the DHT-based Key-value storage system; the target end is the server end. A DHT-based Key-value storage system is a distributed Key-value storage system implemented with DHT technology. The target end of the DHT-based Key-value storage system contains a number of storage nodes; physically these storage nodes may be located in different racks and different data centers, but logically all storage nodes lie on the same Hash ring and are responsible for different Hash regions. The storage node to which the data to be operated on belongs is found by hashing the Key carried in the Key-addressed operation request. A contiguous piece of data to be operated on is likely to be stored scattered across the storage nodes. Storage nodes use data replicas and different replication policies (cross-rack backup or cross-data-center backup) to guarantee data redundancy and improve reliability; at the same time, because the data to be operated on is stored in a scattered way, the DHT-based Key-value storage system can use concurrent I/O at the initiator device or the target end to lower I/O latency and improve performance.
A storage node consists of a DHT storage engine and storage hardware; it is the smallest externally visible unit of the storage system and is used to process Key-addressed operation requests and complete the reading and writing of data. 1) DHT storage engine: completes the data storage of the DHT-based Key-value storage system; if it contains the routing library, it can also implement server-side request routing. 2) Storage hardware: includes the physical disks, which are the main component of the storage hardware; a physical disk can be an ordinary hard disk, for example an Integrated Device Electronics (hereinafter: IDE) hard disk, a Serial Advanced Technology Attachment (hereinafter: SATA) hard disk, or a solid-state drive. In addition, the storage hardware of a single storage node also includes the other hardware necessary for the system to run, for example a Central Processing Unit (hereinafter: CPU), memory, a motherboard, or network cards. Because the main read/write bottleneck of the storage system lies in the disks, special hardware can be built to reduce cost, such as a low-clock-frequency CPU based on the relatively inexpensive Advanced Reduced Instruction Set Computer Machines (hereinafter: ARM) architecture and/or reduced memory.

In the storage system provided in this embodiment, the interface between the routing library 72 and the storage nodes of the target end is a Key-value interface based on the Transmission Control Protocol (hereinafter: TCP) / User Datagram Protocol (hereinafter: UDP); as described above, the Key is the unique identifier of the data to be operated on, and the value is the content of the data to be operated on.
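For illustration (not part of the original disclosure), one round trip over such an interface might look as follows; the patent fixes only that it is a Key-value interface over TCP/UDP, so the newline-delimited JSON framing and the field names here are assumptions of the sketch, not the actual wire format.

    import json
    import socket

    def kv_request(address, op, key, value=None):
        # One Key-value operation sent to a storage node over TCP.
        request = {"op": op, "key": key}  # op: "get" or "put" (assumed)
        if value is not None:
            request["value"] = value
        with socket.create_connection(address) as conn:
            conn.sendall((json.dumps(request) + "\n").encode())
            reply = conn.makefile().readline()
        return json.loads(reply)          # e.g. {"status": "ok"}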
The storage system provided in this embodiment is a DHT-based Key-value storage system, which internally handles storage node faults and the like automatically and can maintain data redundancy well, thereby guaranteeing storage reliability and improving availability. In addition, a DHT-based Key-value storage system is also highly scalable and can in theory be expanded without limit to grow system capacity. Moreover, the storage nodes in a DHT-based Key-value storage system themselves support the use of inexpensive hardware, which can lower the cost of the storage system.

Those skilled in the art can understand that the accompanying drawings are merely schematic diagrams of preferred embodiments, and the modules or flows in the drawings are not necessarily required for implementing the present invention.

Those skilled in the art can understand that the modules in the apparatus of the embodiments may be distributed in the apparatus of the embodiments as described, or may be relocated, with corresponding changes, into one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.

Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of the technical features therein, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data read/write method, comprising:

a block storage driver of an initiator device receiving a logical-block-addressing-based operation request for a volume;

the block storage driver converting the logical-block-addressing-based operation request into a key-value-addressing-based operation request, wherein the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

the block storage driver sending the key-value-addressing-based operation request to a routing library, so that the routing library, according to the key corresponding to the data to be operated on, sends the key-value-addressing-based operation request to the primary storage node and at least one backup node of the data to be operated on, and the primary storage node and the at least one backup node read or write the data to be operated on.

2. The method according to claim 1, wherein before the block storage driver of the initiator device receives the logical-block-addressing-based operation request for the volume, the method further comprises: the initiator device initializing and starting the block storage driver and saving the Internet Protocol address and service port of at least one storage node in the storage system;

when the routing library resides in the block storage driver, the initiator device saving the connections between the block storage driver and all or some of the storage nodes in the storage system; or, when the routing library resides in the storage node, the block storage driver establishing the connection between the block storage driver and the storage node;

the initiator device receiving a volume creation command, the command containing the name of the volume to be created and the size of the volume to be created;

the block storage driver of the initiator device creating a volume logical device in the local operating system of the initiator device according to the command.

3. The method according to claim 2, wherein before the block storage driver of the initiator device receives the logical-block-addressing-based operation request for the volume, the method further comprises: the routing library obtaining the Internet Protocol address and service port of each storage node in the storage system and the hash region each storage node is responsible for, and establishing the connections between the routing library and all or some of the storage nodes in the storage system.

4. The method according to claim 3, wherein the routing library sending the key-value-addressing-based operation request, according to the key corresponding to the data to be operated on, to the primary storage node and at least one backup node of the data to be operated on comprises:

the routing library hashing the key corresponding to the data to be operated on, determining the storage node responsible for the hash region to which the hashed key belongs as the primary storage node of the data to be operated on, and sending the key-value-addressing-based operation request to the primary storage node of the data to be operated on;

the routing library determining at least one backup node of the data to be operated on according to a predetermined backup policy, and sending the key-value-addressing-based operation request to the at least one backup node.

5. The method according to claim 3 or 4, wherein, when the operation request is a write operation request and the data to be operated on is data to be written, the write operation request further carries the data to be written; and the primary storage node and the at least one backup node writing the data to be operated on comprises:

the primary storage node and the at least one backup node writing the data to be written locally according to the key of the data to be written and recording the version of the written data, and, after the write operation is completed, returning a write operation response to the routing library.

6. The method according to claim 5, further comprising:

the routing library receiving the write operation responses according to a preset write operation policy, counting the number of successful write operations, and returning a response to the key-value-addressing-based write operation request to the block storage driver according to the number of successful write operations and the preset write operation policy.

7. The method according to claim 3 or 4, wherein, when the operation request is a read operation request and the data to be operated on is data to be read, the primary storage node and the at least one backup node reading the data to be operated on comprises:

the primary storage node and the at least one backup node reading the locally stored data to be read and the version of the data to be read according to the key of the data to be read, and returning the read data, the version of the read data, and a read operation response to the routing library.

8. The method according to claim 7, further comprising:

the routing library receiving the returned data according to a preset read operation policy, counting the number of successful read operations, and, according to the number of successful read operations and the preset read operation policy, identifying and returning to the block storage driver the data of the latest version among the read data, so that the block storage driver, after processing the data of the latest version, returns the data corresponding to the logical-block-addressing-based read operation request.
9. A data read/write apparatus, comprising:

a receiving module, configured to receive a logical-block-addressing-based operation request for a volume; a conversion module, configured to convert the logical-block-addressing-based operation request into a key-value-addressing-based operation request, wherein the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

a sending module, configured to send the key-value-addressing-based operation request to a routing library, so that the routing library, according to the key corresponding to the data to be operated on, sends the key-value-addressing-based operation request to the primary storage node and at least one backup node of the data to be operated on, and the primary storage node and the at least one backup node read or write the data to be operated on.

10. The apparatus according to claim 9, further comprising:

an initialization module, configured to initialize and start a block storage driver;

a saving module, configured to save the Internet Protocol address and service port of at least one storage node in the storage system, and, when the routing library resides in the block storage driver, to save the connections between the block storage driver and all or some of the storage nodes in the storage system;

a creation module, configured to, after the receiving module receives a volume creation command, create a volume logical device in the local operating system of the initiator device according to the command, wherein the command contains the name of the volume to be created and the size of the volume to be created.
11. A storage system, comprising: a block storage driver, a routing library, a primary storage node, and at least one backup node;

the block storage driver is configured to receive a logical-block-addressing-based operation request for a volume, convert the logical-block-addressing-based operation request into a key-value-addressing-based operation request, and send the key-value-addressing-based operation request to the routing library, wherein the key-value-addressing-based operation request carries the key corresponding to the data to be operated on;

the routing library is configured to receive the key-value-addressing-based operation request sent by the block storage driver and send it, according to the key corresponding to the data to be operated on, to the primary storage node and at least one backup node of the data to be operated on; the primary storage node is configured to receive the key-value-addressing-based operation request sent by the routing library and read or write the data to be operated on;

the at least one backup node is configured to receive the key-value-addressing-based operation request sent by the routing library and read or write the data to be operated on.

12. The system according to claim 11, further comprising:

the initiator device where the block storage driver resides, configured to, before the block storage driver receives the logical-block-addressing-based operation request for the volume, initialize and start the block storage driver and save the Internet Protocol address and service port of at least one storage node in the storage system; and, when the routing library resides in the block storage driver, to save the connections between the block storage driver and all or some of the storage nodes in the storage system; and to receive a volume creation command, the command containing the name of the volume to be created and the size of the volume to be created;

the block storage driver is further configured to, when the routing library resides in the storage node, establish the connection between the block storage driver and the storage node, and to create a volume logical device in the local operating system of the initiator device according to the volume creation command received by the initiator device.

13. The system according to claim 12, wherein

the routing library is further configured to obtain the Internet Protocol address and service port of each storage node in the storage system and the hash region each storage node is responsible for, and to establish the connections between the routing library and all or some of the storage nodes in the storage system.

14. The system according to claim 13, wherein

the routing library is specifically configured to hash the key corresponding to the data to be operated on, determine the storage node responsible for the hash region to which the hashed key belongs as the primary storage node of the data to be operated on, and send the key-value-addressing-based operation request to the primary storage node of the data to be operated on; and to determine at least one backup node of the data to be operated on according to a predetermined backup policy and send the key-value-addressing-based operation request to the at least one backup node.

15. The system according to claim 13 or 14, wherein

the primary storage node is specifically configured to, when the operation request is a write operation request, the data to be operated on is data to be written, and the write operation request further carries the data to be written, write the data to be written locally according to the key of the data to be written, record the version of the written data, and, after the write operation is completed, return a write operation response to the routing library;

the at least one backup node is specifically configured to, when the operation request is a write operation request, the data to be operated on is data to be written, and the write operation request further carries the data to be written, write the data to be written locally according to the key of the data to be written, record the version of the data to be written, and, after the write operation is completed, return a write operation response to the routing library.

16. The system according to claim 15, wherein

the routing library is further configured to receive the write operation responses according to a preset write operation policy, count the number of successful write operations, and return a response to the key-value-addressing-based write operation request to the block storage driver according to the number of successful write operations and the preset write operation policy.

17. The system according to claim 13 or 14, wherein

the primary storage node is specifically configured to, when the operation request is a read operation request and the data to be operated on is data to be read, read the locally stored data to be read and the version of the data to be read according to the key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library;

the at least one backup node is specifically configured to, when the operation request is a read operation request and the data to be operated on is data to be read, read the locally stored data to be read and the version of the data to be read according to the key of the data to be read, and return the read data, the version of the read data, and a read operation response to the routing library.

18. The system according to claim 17, wherein

the routing library is further configured to receive the returned data according to a preset read operation policy, count the number of successful read operations, and, according to the number of successful read operations and the preset read operation policy, identify and return to the block storage driver the data of the latest version among the read data, so that the block storage driver, after processing the data of the latest version, returns the data corresponding to the logical-block-addressing-based read operation request.
PCT/CN2011/075048 2011-05-31 2011-05-31 数据读写方法、装置和存储系统 WO2011157144A2 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201180000715.1A CN102918509B (zh) 2011-05-31 2011-05-31 数据读写方法、装置和存储系统
EP11795110.3A EP2698718A2 (en) 2011-05-31 2011-05-31 Data reading and writing method, device and storage system
PCT/CN2011/075048 WO2011157144A2 (zh) 2011-05-31 2011-05-31 数据读写方法、装置和存储系统
US13/706,068 US8938604B2 (en) 2011-05-31 2012-12-05 Data backup using distributed hash tables

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/075048 WO2011157144A2 (zh) 2011-05-31 2011-05-31 数据读写方法、装置和存储系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/706,068 Continuation US8938604B2 (en) 2011-05-31 2012-12-05 Data backup using distributed hash tables

Publications (2)

Publication Number Publication Date
WO2011157144A2 true WO2011157144A2 (zh) 2011-12-22
WO2011157144A3 WO2011157144A3 (zh) 2012-04-19

Family

ID=45348624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/075048 WO2011157144A2 (zh) 2011-05-31 2011-05-31 数据读写方法、装置和存储系统

Country Status (4)

Country Link
US (1) US8938604B2 (zh)
EP (1) EP2698718A2 (zh)
CN (1) CN102918509B (zh)
WO (1) WO2011157144A2 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143410A1 (zh) * 2019-01-10 2020-07-16 阿里巴巴集团控股有限公司 数据存储方法及装置、电子设备、存储介质

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104007938B (zh) * 2014-05-29 2017-04-05 华为技术有限公司 在存储网络中的键值生成方法及装置
CN104077374B (zh) * 2014-06-24 2018-09-11 华为技术有限公司 一种实现ip盘文件存储的方法及装置
JP2016099969A (ja) * 2014-11-26 2016-05-30 富士通株式会社 情報処理装置、データ保存システム、及びデータ保存方法
KR20160131359A (ko) * 2015-05-07 2016-11-16 에스케이하이닉스 주식회사 메모리 모듈, 메모리 모듈의 모듈 콘트롤러 및 메모리 모듈의 동작 방법
CN107748702B (zh) * 2015-06-04 2021-05-04 华为技术有限公司 一种数据恢复方法和装置
US9927984B2 (en) * 2015-10-14 2018-03-27 Samsung Electronics Co., Ltd. Electronic system with interface control mechanism and method of operation thereof
CN105516254B (zh) * 2015-11-26 2019-03-12 南京佰联信息技术有限公司 无人驾驶装置的数据读写方法及装置
US10534547B2 (en) * 2015-12-29 2020-01-14 EMC IP Holding Company LLC Consistent transition from asynchronous to synchronous replication in hash-based storage systems
JP6734058B2 (ja) * 2016-01-27 2020-08-05 株式会社バイオス 制御装置
JP6542152B2 (ja) * 2016-03-29 2019-07-10 東芝メモリ株式会社 オブジェクトストレージ、コントローラおよびプログラム
US11120002B2 (en) * 2016-07-20 2021-09-14 Verizon Media Inc. Method and system for concurrent database operation
US10445199B2 (en) * 2016-12-22 2019-10-15 Western Digital Technologies, Inc. Bad page management in storage devices
CN110286849B (zh) * 2019-05-10 2023-07-21 深圳物缘科技有限公司 数据存储系统的数据处理方法和装置
CN110336857B (zh) * 2019-06-03 2022-04-12 平安科技(深圳)有限公司 网络块设备的创建方法、装置、设备和存储介质
US11321244B2 (en) * 2019-12-16 2022-05-03 Samsung Electronics Co., Ltd. Block interface emulation for key value device
CN113037772B (zh) * 2021-03-30 2023-05-02 苏州科达科技股份有限公司 数据处理方法、系统、设备及存储介质
CN113806125B (zh) * 2021-09-07 2023-12-22 济南浪潮数据技术有限公司 一种卸载卷异常的处理方法、装置、设备及可读介质

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6470438B1 (en) * 2000-02-22 2002-10-22 Hewlett-Packard Company Methods and apparatus for reducing false hits in a non-tagged, n-way cache
US7443841B2 (en) * 2002-10-30 2008-10-28 Nortel Networks Limited Longest prefix matching (LPM) using a fixed comparison hash table
JP2005140823A (ja) 2003-11-04 2005-06-02 Sony Corp 情報処理装置、制御方法、プログラム、並びに記録媒体
FR2878673B1 (fr) * 2004-11-26 2007-02-09 Univ Picardie Jules Verne Etab Systeme et procede de sauvegarde distribuee perenne
JP2007156597A (ja) * 2005-12-01 2007-06-21 Hitachi Ltd ストレージ装置
JP2007219611A (ja) * 2006-02-14 2007-08-30 Hitachi Ltd バックアップ装置及びバックアップ方法
US8161353B2 (en) 2007-12-06 2012-04-17 Fusion-Io, Inc. Apparatus, system, and method for validating that a correct data segment is read from a data storage device
US8352692B1 (en) * 2007-03-30 2013-01-08 Symantec Corporation Utilizing peer-to-peer services with single instance storage techniques
US8775817B2 (en) * 2008-05-12 2014-07-08 Microsoft Corporation Application-configurable distributed hash table framework
US8335889B2 (en) 2008-09-11 2012-12-18 Nec Laboratories America, Inc. Content addressable storage systems and methods employing searchable blocks
CN101783761A (zh) * 2009-01-21 2010-07-21 华为技术有限公司 一种存储及查找路由表的方法及装置
CN102023809B (zh) * 2009-09-21 2012-10-17 成都市华为赛门铁克科技有限公司 存储系统、从存储系统读取数据的方法及写入数据的方法
US8996803B2 (en) * 2010-07-02 2015-03-31 Futurewei Technologies, Inc. Method and apparatus for providing highly-scalable network storage for well-gridded objects
EP2622452A4 (en) * 2010-09-30 2017-10-04 Nec Corporation Storage system
GB2486462B (en) * 2010-12-16 2019-04-24 Maidsafe Found Distributed file system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None
See also references of EP2698718A4

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143410A1 (zh) * 2019-01-10 2020-07-16 阿里巴巴集团控股有限公司 数据存储方法及装置、电子设备、存储介质

Also Published As

Publication number Publication date
WO2011157144A3 (zh) 2012-04-19
EP2698718A4 (en) 2014-02-19
CN102918509A (zh) 2013-02-06
US8938604B2 (en) 2015-01-20
CN102918509B (zh) 2014-06-04
US20130111187A1 (en) 2013-05-02
EP2698718A2 (en) 2014-02-19

Similar Documents

Publication Publication Date Title
WO2011157144A2 (zh) 数据读写方法、装置和存储系统
US11249857B2 (en) Methods for managing clusters of a storage system using a cloud resident orchestrator and devices thereof
JP5047165B2 (ja) 仮想化ネットワークストレージシステム、ネットワークストレージ装置及びその仮想化方法
US8452856B1 (en) Non-disruptive storage server migration
JP5026283B2 (ja) 協調的共用ストレージアーキテクチャ
US11921597B2 (en) Cross-platform replication
US8793432B2 (en) Consistent distributed storage communication protocol semantics in a clustered storage system
JP2005267327A (ja) ストレージシステム
AU2015360953A1 (en) Dataset replication in a cloud computing environment
US20090154472A1 (en) Packet Forwarding Apparatus And Method For Virtualization Switch
US11768624B2 (en) Resilient implementation of client file operations and replication
US10872036B1 (en) Methods for facilitating efficient storage operations using host-managed solid-state disks and devices thereof
US11343308B2 (en) Reduction of adjacent rack traffic in multi-rack distributed object storage systems
US20180373457A1 (en) Methods for copy-free data migration across filesystems and devices thereof
US10503409B2 (en) Low-latency lightweight distributed storage system
US10798159B2 (en) Methods for managing workload throughput in a storage system and devices thereof
US11995354B2 (en) Storage area network controller with integrated circuit having a plurality of logic paths
WO2014077451A1 (ko) Iscsi 스토리지 시스템을 이용한 네트워크 분산 파일 시스템 및 방법
JP2023541069A (ja) アクティブ-アクティブストレージシステムおよびそのデータ処理方法
US20210026780A1 (en) Methods for using extended physical region page lists to improve performance for solid-state drives and devices thereof
JP4478000B2 (ja) データ仲介方法およびデータ仲介装置
WO2016170619A1 (ja) 計算機システム
TW200820071A (en) Storage controllers

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180000715.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11795110

Country of ref document: EP

Kind code of ref document: A2

REEP Request for entry into the european phase

Ref document number: 2011795110

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2011795110

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE