WO2012097588A1 - Data storage method, device and system - Google Patents

Data storage method, device and system - Download PDF

Info

Publication number
WO2012097588A1
Authority
WO
WIPO (PCT)
Prior art keywords
volume
data
virtual block
primary volume
write operation
Prior art date
Application number
PCT/CN2011/078476
Other languages
English (en)
French (fr)
Inventor
周文明
钟炎培
吴清
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2012097588A1 publication Critical patent/WO2012097588A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0617Improving the reliability of storage systems in relation to availability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L69/162Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms

Definitions

  • the embodiments of the present invention relate to the field of communications technologies, and in particular, to a data storage method, device, and system.
  • NBD: Network Block Device
  • TCP: Transmission Control Protocol
  • IP: Internet Protocol
  • An NBD is a technique by which a file or block device on a server is abstracted over a TCP/IP network for use by a client.
  • The corresponding software is Linux-based network storage software that can be used to build Linux-based network storage systems.
  • As a storage system, especially one intended for commercial solutions, a storage system is complex and has high performance and reliability requirements.
  • However, the existing NBD only performs ordinary network transmission and lacks consideration of network anomalies and storage node anomalies.
  • For example, an NBD client (NBD Client) is deployed on local server A with a virtual NBD device nbd1, and an NBD server (NBD Server) is deployed on remote server B.
  • A virtual machine created on nbd1 is running, but because of a network anomaly or a storage node anomaly, data cannot be read from the NBD server, causing the virtual machine to shut down.
  • To solve the above reliability problem, the prior art proposes a Redundant Array of Independent Disks 1 (hereinafter: RAID 1) scheme, in which multiple hard disks on a single underlying storage node form a RAID 1 array for use by the NBD server.
  • However, the inventors found that the RAID 1 scheme has at least the following disadvantages: (1) it cannot store data across nodes and can only reduce the probability of a single-node failure, and if the RAID card fails the upper-layer business program becomes unavailable; (2) it cannot solve the problem of service unavailability caused by network anomalies; (3) for the storage node, the amount of data is doubled after passing through the RAID card, which greatly increases the load on the storage node.
  • To solve the above reliability problem, the prior art also provides another scheme, namely a primary volume and backup volume scheme. Specifically, when the upper-layer business program writes input (Input; hereinafter: I)/output (Output; hereinafter: O), the data is first written to the primary volume, the primary volume then transfers the data to the backup volume, the backup volume reports the write I/O result to the primary volume after finishing writing, and finally the primary volume reports the write I/O result to the upper-layer business program.
  • The primary and backup volume scheme physically separates the primary and the backup. Compared with the RAID 1 scheme, data is not damaged when a single node fails, which further improves reliability. However, the inventors found that the primary and backup volume scheme still has the following disadvantages:
  • (1) The backup volume is only used to back up data; while the system is running, all of the load is on the primary volume, that is, the node where the primary volume is located becomes an I/O bottleneck; (2) one I/O requires the exchange of four messages, resulting in heavy message traffic.
  • Embodiments of the present invention provide a data storage method, device, and system to improve storage reliability and reduce message traffic.
  • the embodiment of the invention provides a data storage method, including:
  • receiving data to be written through a socket connection between a primary volume and a virtual block storage client; and writing the data to be written into a volume file of the primary volume, and sending the data to be written to a backup volume through a socket connection between the primary volume and the backup volume, so that after writing the data to be written into a volume file of the backup volume, the backup volume reports the result of the write operation to the kernel of the virtual block storage client.
  • the embodiment of the invention further provides a primary volume node device, including:
  • a receiving module, configured to receive data to be written through a socket connection between the primary volume node device and a virtual block storage client;
  • a writing module, configured to write the data to be written received by the receiving module into a volume file of the primary volume node device, and send the data to be written to a backup volume node device through a socket connection between the primary volume node device and the backup volume node device, so that after the backup volume node device writes the data to be written into the volume file of the backup volume node device, the backup volume node device reports the result of the write operation to the kernel of the virtual block storage client.
  • the embodiment of the invention further provides a virtual block storage client device, including:
  • a connection establishing module, configured to establish a socket connection with a pre-specified backup volume according to a pre-specified listening port of the backup volume, and establish a socket connection with a pre-specified primary volume according to a pre-specified listening port of the primary volume;
  • an obtaining module, configured to obtain a volume size, a detection point, and a single-node bitmap of the pre-specified backup volume, and obtain a volume size, a detection point, and a single-node bitmap of the pre-specified primary volume;
  • a comparison module, configured to compare the detection points of the pre-specified primary volume and the pre-specified backup volume;
  • a determining module, configured to determine, according to the comparison result of the comparison module, that the volume corresponding to the newest detection point is the real primary volume and the volume corresponding to the second newest detection point is the real backup volume;
  • a registration module, configured to register the roles of the real primary volume and the real backup volume with the kernel of the virtual block storage client device and, when the link between the real primary volume and the real backup volume is normal, register the descriptor of the socket connection established by the connection establishing module with the pre-specified backup volume and the descriptor of the socket connection established by the connection establishing module with the pre-specified primary volume to the kernel of the virtual block storage client device;
  • the calling module is configured to invoke a system function to enter a kernel mode thread, and process the write operation request sent by the upper layer service program in the kernel state thread.
  • the embodiment of the invention further provides a backup volume node device, including:
  • a data receiving module configured to receive data to be written sent by the primary volume node device by using a socket connection between the primary volume node device and the backup volume node device;
  • a data writing module, configured to write the data to be written received by the data receiving module into a volume file of the backup volume node device; and
  • a result reporting module, configured to report the result of the write operation to the kernel of the virtual block storage client device.
  • the embodiment of the present invention further provides a storage system, including: the foregoing primary volume node device, the virtual block storage client device, and the backup volume node device.
  • In the embodiments of the present invention, after the primary volume receives the data to be written through the socket connection between the primary volume and the virtual block storage client, the data to be written is written into the volume file of the primary volume and is sent to the backup volume through the socket connection between the primary volume and the backup volume, so that the backup volume writes the data to be written into the volume file of the backup volume, thereby improving storage reliability. After the backup volume writes the data to be written into its volume file, the backup volume reports the result of the write operation to the kernel of the virtual block storage client, thereby reducing message traffic and sharing part of the load of the primary volume, so as to achieve dynamic load balancing.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow chart of an embodiment of a data storage method according to the present invention.
  • FIG. 2 is a schematic diagram of an embodiment of a network architecture of the present invention.
  • FIG. 3 is a flow chart of an embodiment of establishing a triangular model according to the present invention.
  • FIG. 4 is a flow chart of another embodiment of a data storage method according to the present invention.
  • FIG. 5 is a schematic diagram of another embodiment of a network architecture according to the present invention.
  • FIG. 6 is a flow chart of still another embodiment of a data storage method according to the present invention.
  • FIG. 7 is a schematic diagram of still another embodiment of a network architecture according to the present invention.
  • FIG. 8 is a flowchart of still another embodiment of a data storage method according to the present invention.
  • FIG. 9 is a schematic diagram of still another embodiment of a network architecture according to the present invention.
  • FIG. 10 is a schematic structural diagram of an embodiment of a primary volume node device according to the present invention;
  • FIG. 11 is a schematic structural diagram of another embodiment of a primary volume node device according to the present invention;
  • FIG. 12 is a schematic structural diagram of an embodiment of a virtual block storage client device according to the present invention;
  • FIG. 13 is a schematic structural diagram of another embodiment of a virtual block storage client device according to the present invention;
  • FIG. 14 is a schematic structural diagram of an embodiment of a backup volume node device according to the present invention;
  • FIG. 15 is a schematic structural diagram of another embodiment of a backup volume node device according to the present invention;
  • FIG. 16 is a schematic structural diagram of an embodiment of a storage system according to the present invention;
  • FIG. 17 is a schematic diagram of an embodiment of a cloud storage system according to the present invention.
  • DETAILED DESCRIPTION
  • FIG. 1 is a flowchart of an embodiment of a data storage method according to the present invention. As shown in FIG. 1, the data storage method may include:
  • Step 101: Receive data to be written through a socket connection between the primary volume and the virtual block storage client.
  • Step 102: Write the data to be written into the volume file of the primary volume, and send the data to be written to the backup volume through a socket connection between the primary volume and the backup volume, so that after the backup volume writes the data to be written into the volume file of the backup volume, the backup volume reports the result of the write operation to the kernel of the virtual block storage client.
  • Further, before receiving the data to be written, the primary volume may receive a write operation request sent by the kernel of the virtual block storage client and forward the write operation request to the backup volume; the write operation request is used to notify the primary volume and/or the backup volume to prepare to receive the data to be written. The write operation request is obtained by the kernel of the virtual block storage client from a request queue registered by the kernel; the write operation request is placed in the kernel-registered request queue after the virtual block storage client receives the write operation request sent by the upper-layer business program.
  • After that, the primary volume may send a detection point record request to the backup volume, so that the primary volume and the backup volume update their respective detection points.
  • The primary volume may also check whether there is dirty data in the dirty data block list; when there is dirty data in the dirty data block list and a predetermined condition is met, the dirty data is written to disk. Here, dirty data is data that is temporarily stored in a memory buffer and has not yet been written to the volume file; the dirty data block list is used to record which data is dirty.
  • Before the above steps, the virtual block storage client may establish a socket connection with the pre-specified backup volume according to a pre-specified listening port of the backup volume, and obtain the volume size, detection point, and single-node bitmap of the pre-specified backup volume; the virtual block storage client may also establish a socket connection with the pre-specified primary volume according to a pre-specified listening port of the primary volume, and obtain the volume size, detection point, and single-node bitmap of the pre-specified primary volume. The virtual block storage client may then compare the detection points of the pre-specified primary volume and the pre-specified backup volume, determine that the volume corresponding to the newest detection point is the real primary volume, and that the volume corresponding to the second newest detection point is the real backup volume. After that, the virtual block storage client may register the roles of the real primary volume and the real backup volume with the kernel of the virtual block storage client; when the link between the real primary volume and the real backup volume is normal, the virtual block storage client registers the descriptor of the socket connection established with the pre-specified backup volume and the descriptor of the socket connection established with the pre-specified primary volume to the kernel of the virtual block storage client. If the data between the real primary volume and the real backup volume is not synchronized, the virtual block storage client may send a data synchronization request to the real primary volume to synchronize the real primary volume with the real backup volume.
  • When an abnormality occurs in the primary volume, after the kernel of the virtual block storage client performs an active/standby switchover and registers the backup volume as the new primary volume, the new primary volume may receive the data to be written through the socket connection between the new primary volume and the virtual block storage client, update the detection point and single-node bitmap of the new primary volume after writing the data to be written into the volume file of the new primary volume, and then report the result of the write operation to the kernel of the virtual block storage client.
  • When the link between the virtual block storage client and the backup volume is abnormal, the primary volume may receive a link abnormality message sent by the virtual block storage client and forward the link abnormality message to the backup volume, so that the backup volume sends the result of the write operation to the primary volume; the primary volume then sends the result of the write operation to the kernel of the virtual block storage client. The link abnormality message is sent to the virtual block storage client by the heartbeat process of the virtual block storage client.
  • When the link between the primary volume and the backup volume is abnormal, the primary volume may update the detection point and single-node bitmap of the primary volume after writing the data to be written into the volume file of the primary volume; the result of the write operation is then reported by the primary volume to the kernel of the virtual block storage client.
  • After receiving the result of the write operation, the kernel of the virtual block storage client may first determine whether the result of the write operation corresponds to a previously sent write operation request. If it does, the kernel of the virtual block storage client may send the result of the write operation to the upper-layer business program; if the result of the write operation does not correspond to any sent write operation request, the kernel of the virtual block storage client may discard the result of the write operation, or cache it without processing it, which is not limited in the embodiments of the present invention. The embodiments are described by taking as an example the case where the kernel of the virtual block storage client discards a result of a write operation that does not correspond to a sent write operation request. A minimal sketch of this request/result matching is given below.
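  • The correspondence check described above can be implemented, for example, by tagging each write operation request with an identifier and matching incoming results against the set of outstanding requests. The sketch below is illustrative only; the class and field names are assumptions, and the patent merely requires that a result be checked against the previously sent request and discarded (or cached) if it does not correspond.

```python
# Minimal sketch of matching write-operation results to outstanding requests.
# All names here (OutstandingRequests, request_id, ...) are illustrative
# assumptions; the patent only requires that a result be checked against the
# previously sent write operation request and discarded if it does not
# correspond.

class OutstandingRequests:
    def __init__(self):
        self._pending = {}          # request_id -> callback to the upper-layer program

    def register(self, request_id, completion_callback):
        """Record a write operation request that has been sent to the PV."""
        self._pending[request_id] = completion_callback

    def on_result(self, request_id, result):
        """Handle a write-operation result reported by the BV (or PV)."""
        callback = self._pending.pop(request_id, None)
        if callback is None:
            # Result does not correspond to any sent request: discard it
            # (the embodiment also allows caching it without processing).
            return False
        callback(result)            # report the result to the upper-layer program
        return True
```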
  • In this way, after the primary volume receives the data to be written through the socket connection between the primary volume and the virtual block storage client, the primary volume writes the data to be written into the volume file of the primary volume and sends the data to be written to the backup volume through the socket connection between the primary volume and the backup volume, so that the backup volume writes the data to be written into the volume file of the backup volume, thereby improving storage reliability. After the backup volume writes the data to be written into its volume file, the backup volume reports the result of the write operation to the kernel of the virtual block storage client, thereby reducing message traffic and sharing part of the load of the primary volume, achieving dynamic load balancing.
  • the data storage method provided by the embodiment of the present invention can improve storage reliability, and can further reduce message traffic and improve performance under the premise of ensuring storage reliability.
  • The primary volume and the backup volume can be deployed on different storage nodes, which solves the problem of service unavailability caused by the failure of a single storage node, and also solves the problem of service unavailability caused by a network abnormality between the node where the service is located and the node where the primary volume or the backup volume is located. In addition, the embodiments of the present invention can reduce message traffic on the premise of ensuring that a single-node fault does not interrupt the service.
  • The backup volume participates in the business process (responding with the result of the write operation) while implementing data backup, and can share part of the load of the primary volume, achieving dynamic load balancing.
  • FIG. 2 is a schematic diagram of an embodiment of a network architecture according to the present invention.
  • As shown in FIG. 2, the network architecture is a stable triangular model, in which solid lines with arrows pointing to the server side indicate data flows, and dotted lines indicate control message flows.
  • For a read operation request, the backup volume (hereinafter: BV) is not involved; the read data operation and its response are both handled by the primary volume (hereinafter: PV). For a write operation request, the data to be written is received by the PV, while the BV responds with the result of the write operation. In this way the message traffic can be reduced by 25%, improving performance: the four messages required per write I/O in the conventional primary and backup volume scheme (write to the PV, data transfer from the PV to the BV, result from the BV to the PV, result from the PV to the client) are reduced to three, because the BV reports the result directly to the client.
  • The network architecture shown in FIG. 2 can effectively tolerate storage node anomalies and network anomalies: as long as no two of the virtual block storage client (hereinafter: vbs-client), the PV, and the BV are abnormal at the same time, and the link between the vbs-client and the PV and the link between the vbs-client and the BV are not abnormal at the same time, the upper-layer business program can run without interruption.
  • FIG. 3 is a flowchart of an embodiment of establishing a triangular model according to the present invention. As shown in FIG. 3, the process of establishing the triangular model shown in FIG. 2 may include:
  • Step 301 The vbs-client establishes a SOCKET connection with the pre-designated BV according to the pre-designated BV listening port.
  • the pre-designated BV listening port is provided in advance by a pre-designated BV.
  • Step 302: The vbs-client negotiates with the pre-specified BV; the content of the negotiation includes: (1) notifying the pre-specified BV that it is currently connected by the vbs-client;
  • Step 303: The pre-specified BV returns its volume size, check point, and solo bitmap to the vbs-client.
  • Step 304: The vbs-client establishes a SOCKET connection with the pre-specified PV according to the pre-specified PV listening port.
  • Step 305: The vbs-client negotiates with the pre-specified PV; the content of the negotiation includes: (1) notifying the pre-specified PV that it is currently connected by the vbs-client;
  • Step 306: The pre-specified PV returns its volume size, check point, and solo bitmap to the vbs-client.
  • Step 307: The vbs-client compares the check points of the pre-specified PV and the pre-specified BV, determines that the volume corresponding to the newest check point is the real PV, and that the volume corresponding to the second newest check point is the real BV.
  • For example, if the vbs-client finds that the check point of the pre-specified BV is the newest, that is, the data on the pre-specified BV side is the latest, this indicates that the pre-specified PV had an abnormality before this; the kernel of the vbs-client then performs an active/standby switchover and uses the pre-specified BV as the real PV, after which the storage system writes data to the pre-specified BV. This embodiment is described by taking as an example the case where the pre-specified PV is the real PV and the pre-specified BV is the real BV.
  • Step 308: The vbs-client registers the role of the real BV with the kernel of the vbs-client, and notifies the pre-specified BV that it is the real BV.
  • Step 309: The vbs-client registers the role of the real PV with the kernel of the vbs-client, and notifies the pre-specified PV that it is the real PV. If the link between the real PV and the real BV is normal, go to Step 311 to Step 313; if the link between the real PV and the real BV is abnormal, exit this process and enter the rebuild process.
  • Step 311 If the data between the real PV and the real BV is out of synchronization, the vbs-client sends a data synchronization request to the real PV to synchronize the real PV with the real BV.
  • Step 312 vbs-client registers the descriptor of the SOCKET connection established in step 301 and step 304 to the kernel of the vbs-client.
  • In this way, when sending data or control messages, the kernel of the vbs-client sends, for example, a read operation request to the real PV; when receiving data or control messages, the kernel of the vbs-client selects the appropriate SOCKET connection: for the result of a write operation, the vbs-client starts receiving data from the SOCKET connection between the kernel of the vbs-client and the real BV, and for the result of a read operation, the vbs-client starts receiving data from the SOCKET connection between the kernel of the vbs-client and the real PV.
  • Step 313: The vbs-client calls a system function, for example ioctl, to enter a kernel-mode thread, and the kernel-mode thread processes I/O requests sent by the upper-layer business program, for example write operation requests and read operation requests, through callback functions registered with the system.
  • The above embodiment establishes the triangular model; by processing write operation requests through the triangular model, storage reliability can be improved, message traffic can be reduced, and storage performance can be improved. A minimal client-side sketch of the role election in this flow is given below.
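  • The role election in steps 301 to 313 can be sketched as follows. This is a minimal, self-contained illustration: the volumes are represented as plain dictionaries, and the function names, fields, and example values are assumptions of the sketch; in the patent the same information (volume size, check point, solo bitmap) is obtained over SOCKET connections to pre-specified listening ports, and the chosen roles are then registered with the kernel of the vbs-client.

```python
# Minimal, self-contained sketch of the role election in steps 301-313.

def elect_roles(pre_specified_pv, pre_specified_bv):
    """Step 307: the volume whose check point is newest becomes the real PV;
    the other volume becomes the real BV."""
    if pre_specified_bv["check_point"] > pre_specified_pv["check_point"]:
        # The pre-specified PV must have failed earlier, so the roles switch.
        return pre_specified_bv, pre_specified_pv
    return pre_specified_pv, pre_specified_bv


def needs_sync(real_pv, real_bv):
    """Step 311: if the solo bitmaps differ, the data is out of sync and a data
    synchronization request should be sent to the real PV."""
    return real_pv["solo_bitmap"] != real_bv["solo_bitmap"]


if __name__ == "__main__":
    pv = {"name": "volume-A", "check_point": 1005, "solo_bitmap": {3, 7}}
    bv = {"name": "volume-B", "check_point": 1009, "solo_bitmap": set()}
    real_pv, real_bv = elect_roles(pv, bv)
    print("real PV:", real_pv["name"], " real BV:", real_bv["name"])
    print("synchronization required:", needs_sync(real_pv, real_bv))
```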
  • FIG. 4 is a flowchart of another embodiment of a data storage method according to the present invention.
  • In the triangular model, the processing of a read operation request involves only interaction between the vbs-client and the PV, which is the same as in the prior art; therefore this embodiment only describes the processing of a write operation request.
  • the data storage method may include:
  • Step 401 the triangular model is established, and the storage system is in a stable state.
  • Step 402 After the triangle model is established, the vbs-client will start a kernel state thread specifically for processing the write operation request.
  • Step 403: A write operation request sent by the upper-layer business program is placed into the request queue registered by the kernel by the operating system (Operating System; hereinafter: OS) of the vbs-client.
  • Step 404: The kernel thread of the vbs-client obtains a write operation request from the request queue.
  • Specifically, the kernel thread of the vbs-client may obtain a write operation request from the request queue according to a predetermined rule. The predetermined rule may be a first-in first-out rule or another rule, which is not limited in this embodiment, as long as the kernel thread of the vbs-client can obtain a write operation request from the request queue according to the predetermined rule. This embodiment takes the predetermined rule being the first-in first-out rule as an example for description. A minimal sketch of this queue consumption is given below.
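  • To make steps 402 to 404 concrete, the sketch below models the kernel-registered request queue as a first-in first-out queue consumed by a dedicated worker thread. The queue, thread, and function names are ordinary user-space Python stand-ins for the kernel-mode mechanisms the patent describes, not part of the patent itself.

```python
import queue
import threading

# Stand-in for the request queue registered with the vbs-client kernel (step 403).
request_queue = queue.Queue()


def send_to_pv(write_request):
    """Placeholder for step 405: forwarding a write operation request to the PV."""
    print("sending to PV:", write_request)


def kernel_write_thread():
    """Stand-in for the kernel-mode thread started after the triangular model is
    established (step 402): it takes write requests in first-in first-out order
    (step 404) and dispatches each one to the PV."""
    while True:
        write_request = request_queue.get()   # first-in first-out rule
        if write_request is None:             # shutdown sentinel for the sketch
            break
        send_to_pv(write_request)


worker = threading.Thread(target=kernel_write_thread, daemon=True)
worker.start()

# The upper-layer business program issuing write requests (step 403).
request_queue.put({"id": 1, "offset": 0, "data": b"hello"})
request_queue.put({"id": 2, "offset": 4096, "data": b"world"})
request_queue.put(None)
worker.join()
```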
  • Step 405 The kernel of the vbs-client sends a write operation request to the PV, and the write operation request is used to notify the PV that it is ready to receive data to be written.
  • Step 406 After receiving the write operation request, the PV sub-process forwards the received write operation request to the BV sub-process; the write operation request is used to notify the BV to prepare to receive the data to be written.
  • Step 407: The PV receives the data to be written through the SOCKET connection between the PV and the vbs-client, writes the data to be written into the volume file of the PV, and sends the data to be written to the BV through the SOCKET connection between the PV and the BV, so that the BV writes the data to be written into the volume file of the BV.
  • Step 408: After the BV writes the data to be written into the volume file of the BV, the BV reports the result of the write operation to the kernel of the vbs-client.
  • Step 409: The PV sends a check point record request to the BV.
  • Step 410: The PV and the BV update their respective check points. The check point is the sole criterion for judging whether the data on a given side is the latest.
  • Step 411: The PV checks whether there is dirty data in the dirty block list (Dirty Block List; hereinafter: DBL). If there is, and a predetermined condition is met, the PV forcibly writes the dirty data in the DBL to disk. Here, dirty data is data that is temporarily stored in a memory buffer and has not yet been written to the volume file; the DBL is used to record which data is dirty. A minimal sketch of such a flush policy is given after the condition list below.
  • the predetermined condition may be one or a combination of the following:
  • the dirty data in the DBL can be written to the disk after the predetermined time interval has elapsed.
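  • The flush decision for the DBL can be expressed as a simple policy check, sketched below. The patent leaves the exact predetermined conditions open, so the time interval and block-count threshold used here are illustrative assumptions.

```python
import time

# Illustrative flush policy for the dirty block list (DBL).  The patent only
# states that dirty data is forced to disk when the DBL is non-empty and a
# predetermined condition is met; the particular conditions below (a time
# interval and a size threshold) are assumptions made for the sketch.

FLUSH_INTERVAL_SECONDS = 5.0
MAX_DIRTY_BLOCKS = 128


def should_flush(dirty_block_list, last_flush_time, now=None):
    if not dirty_block_list:                 # nothing dirty: nothing to do
        return False
    now = now if now is not None else time.monotonic()
    interval_elapsed = (now - last_flush_time) >= FLUSH_INTERVAL_SECONDS
    too_many_dirty = len(dirty_block_list) >= MAX_DIRTY_BLOCKS
    return interval_elapsed or too_many_dirty
```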
  • Step 412: After receiving the result of the write operation sent by the BV, the kernel of the vbs-client determines whether the received result of the write operation corresponds to a sent write operation request; if so, step 413 is performed. If the result of the write operation does not correspond to any sent write operation request, the kernel of the vbs-client may discard the result of the write operation, or cache it without processing it, which is not limited in this embodiment; this embodiment takes the case where the kernel of the vbs-client discards such a result as an example.
  • step 413 the kernel of the vbs-client sends the result of the write operation to the upper layer service program.
  • In this embodiment, after the PV receives the data to be written through the SOCKET connection between the PV and the vbs-client, the PV writes the data to be written into the volume file of the PV and sends the data to be written to the BV through the SOCKET connection between the PV and the BV, so that the BV writes the data to be written into the volume file of the BV, thereby improving storage reliability. After the BV writes the data to be written into the volume file of the BV, the BV reports the result of the write operation to the kernel of the vbs-client; in this way, message traffic can be reduced and part of the load of the PV can be shared, achieving dynamic load balancing. A minimal PV-side sketch of this write path is given below.
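  • Putting steps 405 to 408 together, the PV-side handling of a single write can be sketched as follows. The connection objects, helper names, and the toy BV class are assumptions made so the sketch is self-contained; in the patent the PV and BV communicate over the SOCKET connections of the triangular model.

```python
# Sketch of the PV-side write path in the triangular model (steps 405-408).
# `volume_file` and the FakeBV class stand in for the PV's volume file and for
# the SOCKET connection between the PV and the BV; all names are illustrative.

def pv_handle_write(write_request, data, volume_file, bv):
    # Step 406: forward the write operation request so the BV can prepare
    # to receive the data to be written.
    bv.prepare(write_request)

    # Step 407: write the data into the PV's own volume file ...
    volume_file.seek(write_request["offset"])
    volume_file.write(data)

    # ... and send the same data to the BV over the PV-BV connection.
    bv.write(write_request, data)
    # Step 408 happens on the BV side: after writing its own volume file, the
    # BV reports the result of the write operation directly to the kernel of
    # the vbs-client, not back through the PV.


class FakeBV:
    """Toy BV used only to make the sketch runnable."""

    def __init__(self, volume_file, client_report):
        self.volume_file = volume_file
        self.client_report = client_report

    def prepare(self, write_request):
        pass  # a real BV would get ready to receive the data to be written

    def write(self, write_request, data):
        self.volume_file.seek(write_request["offset"])
        self.volume_file.write(data)
        # Step 408: the BV reports the result to the vbs-client kernel.
        self.client_report(write_request["id"], "OK")


if __name__ == "__main__":
    import io

    pv_file, bv_file = io.BytesIO(), io.BytesIO()
    bv = FakeBV(bv_file, client_report=lambda rid, res: print("BV reports", rid, res))
    pv_handle_write({"id": 1, "offset": 0}, b"hello", pv_file, bv)
    print(pv_file.getvalue(), bv_file.getvalue())
```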
  • FIG. 5 is a schematic diagram of another embodiment of the network architecture of the present invention.
  • the network architecture shown in Figure 5 is a single node (SOLO) model.
  • FIG. 6 is a flowchart of still another embodiment of a data storage method according to the present invention. This embodiment introduces a data storage process in the network architecture shown in FIG. 5.
  • the data storage method may include:
  • Step 601: The kernel-mode thread of the vbs-client receives a write operation request before the abnormality of the PV occurs.
  • Step 602: The PV abnormality causes the SOCKET connection between the vbs-client and the PV to become abnormal, and the kernel of the vbs-client performs an active/standby switchover to register the original BV as the new PV.
  • Step 603: The kernel-mode thread of the vbs-client starts a process for detecting whether the abnormal PV has recovered.
  • Step 604 the vbs-client kernel state thread sends the above write operation request to the new PV (ie, the original BV).
  • Step 605 the new PV writes the data to be written into the volume file of the new PV.
  • Step 606 the new PV updates the checkpoint.
  • Step 607: The new PV updates the solo bitmap.
  • Step 608 the new PV reports the result of the write operation to the kernel of the vbs-client.
  • Step 609 After the kernel of the vbs-client determines that the result of the write operation corresponds to the previously sent write operation request, the result of the write operation is sent to the upper layer service program.
  • Step 610 The process of detecting whether the abnormal PV is restored sends a detection message to the kernel of the vbs-client, so that the kernel of the vbs-client determines, according to the detection message, whether the abnormal PV is restored.
  • In the SOLO model, the solo bitmap and the check point are updated after each write: the solo bitmap records which data blocks have changed, and the check point records the time at which the data to be written was written. The purpose of recording the solo bitmap and the check point is that, when the triangular model is next established, the storage system judges from the check points which side's data is the latest, and only the volume with the latest data can become the PV. The changed data on the PV can then be synchronized to the BV according to the information recorded in the solo bitmap, ensuring the consistency of the data on the PV and the BV, and vice versa. A minimal sketch of this bitmap-based resynchronization follows.
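  • The bookkeeping role of the solo bitmap and its use during resynchronization can be illustrated as below; the set-based bitmap, the 4 KiB block size, and the function names are assumptions of the sketch.

```python
# Sketch of solo-bitmap bookkeeping in the SOLO model and of the subsequent
# resynchronization when the triangular model is rebuilt.  The set-based
# bitmap and the 4 KiB block size are illustrative choices.

BLOCK_SIZE = 4096


def record_write(solo_bitmap, offset, length):
    """While running in the SOLO model, mark every block touched by a write."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    solo_bitmap.update(range(first, last + 1))


def resync(solo_bitmap, pv_volume, bv_volume):
    """When the triangular model is re-established, copy only the blocks that
    changed on the PV to the BV, then clear the bitmap."""
    for block in sorted(solo_bitmap):
        start = block * BLOCK_SIZE
        pv_volume.seek(start)
        data = pv_volume.read(BLOCK_SIZE)
        bv_volume.seek(start)
        bv_volume.write(data)
    solo_bitmap.clear()
```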
  • FIG. 7 is a schematic diagram of another embodiment of the network architecture of the present invention, and the network architecture shown in FIG. 7 is an L-model.
  • In the L model, the BV cannot send the result of the write operation to the upper-layer business program; the BV instead reports the result of the write operation to the PV, and finally the result of the write operation is reported by the PV to the upper-layer business program.
  • FIG. 8 is a flowchart of still another embodiment of a data storage method according to the present invention. This embodiment introduces a data storage process in the network architecture shown in FIG. 7.
  • the data storage method may include:
  • Step 801: In the steady state, the link between the vbs-client and the BV suddenly becomes abnormal. Step 802: A result of a write operation that should have been reported by the BV cannot be sent to the kernel of the vbs-client because the link between the vbs-client and the BV is abnormal, so the BV remains blocked on the send.
  • Step 803: When the heartbeat mechanism of the vbs-client detects that the link is abnormal, it sends a link abnormality message to the vbs-client.
  • Step 804 The vbs-client sends a link exception message to the PV.
  • Step 805 The PV forwards the link abnormality message to the BV.
  • Step 806 After receiving the link abnormality message, the BV sends the result of the unsent write operation to the PV.
  • Step 807 the PV sends the result of the write operation to the kernel of the vbs-client.
  • Step 808 The kernel of the vbs-client reports the result of the write operation to the upper layer service program.
  • Step 809: For subsequent write operation requests, after the BV writes the data to be written into the volume file of the BV, the BV directly reports the result of the write operation to the PV without attempting to send it to the kernel of the vbs-client. A minimal sketch of this result routing is given below.
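  • Across the triangular model, the L model, and step 809, the BV's decision about where to send the result of a write operation reduces to a simple routing choice, sketched below; the flag and connection names are illustrative.

```python
# Sketch of how the BV routes the result of a write operation, based on the
# behaviour described for the triangular model (report to the vbs-client
# kernel), the L model (report via the PV once a link abnormality message has
# been received), and step 809 (report directly to the PV for later writes).
# The flag and connection objects are illustrative.

def bv_report_result(result, client_link_ok, client_conn, pv_conn):
    if client_link_ok:
        # Triangular model: the BV answers the kernel of the vbs-client directly.
        client_conn.send(result)
    else:
        # L model / step 809: the link to the vbs-client is abnormal, so the
        # result goes to the PV, which forwards it to the vbs-client kernel.
        pv_conn.send(result)
```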
  • When the link between the PV and the BV is abnormal, the network architecture can be as shown in FIG. 9.
  • FIG. 9 is a schematic diagram of still another embodiment of the network architecture of the present invention.
  • the network structure shown in FIG. 9 is an inverted V model.
  • In the inverted V model, the storage system can ignore the BV when processing a write operation request: the PV writes the data to be written into the volume file of the PV, the PV updates the check point and the solo bitmap of the PV, and the PV reports the result of the write operation to the kernel of the vbs-client; that is, the vbs-client receives the result of the write operation from the PV rather than from the BV. The overall processing flow is similar to that of the SOLO model and is not described again here.
  • A person of ordinary skill in the art may understand that all or part of the steps of the foregoing method embodiments may be implemented by a program instructing relevant hardware; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • FIG. 10 is a schematic structural diagram of an embodiment of a primary volume node device according to the present invention.
  • the primary volume node device can implement the process of the embodiment shown in FIG. 1.
  • As shown in FIG. 10, the primary volume node device may include a receiving module 1001 and a writing module 1002;
  • the receiving module 1001 is configured to receive data to be written by using a socket connection between the primary volume node device and the virtual block storage client.
  • the writing module 1002 is configured to write the data to be written received by the receiving module 1001 into the volume file of the primary volume node device, and connect the data to be written through the socket between the primary volume node device and the backup volume node device. Sended to the backup volume node device, so that the backup volume node device reports the result of the write operation to the kernel of the virtual block storage client after writing the data to be written to the volume file of the backup volume node device.
  • In this embodiment, the writing module 1002 writes the data to be written into the volume file of the primary volume node device and sends the data to be written to the backup volume node device through the socket connection between the primary volume node device and the backup volume node device, so that the backup volume node device writes the data to be written into the volume file of the backup volume node device, which can improve storage reliability. After the backup volume node device writes the data to be written into its volume file, the backup volume node device reports the result of the write operation to the kernel of the virtual block storage client, thereby reducing message traffic and sharing part of the load of the primary volume node device, achieving dynamic load balancing.
  • FIG. 11 is a schematic structural diagram of another embodiment of a primary volume node device according to the present invention. Compared with the primary volume node device shown in FIG. 10, the difference is that the primary volume node device shown in FIG. 11 may further include a sending module 1003;
  • the receiving module 1001 may further receive a write operation request sent by the kernel of the virtual block storage client before receiving the data to be written.
  • the sending module 1003 is configured to forward the write operation request received by the receiving module 1001 to the backup volume node device; the write operation request is used to notify the primary volume node device and/or the backup volume node device to prepare to receive the data to be written. The write operation request is obtained by the kernel of the virtual block storage client from the request queue registered by the kernel; the write operation request is placed in the kernel-registered request queue after the virtual block storage client receives the write operation request sent by the upper-layer business program.
  • the sending module 1003 may further send a checkpoint record request to the backup volume node device, so that the primary volume node device and the backup volume node device update respective detection points.
  • the primary volume node device in this embodiment may further include:
  • a checking module 1004, configured to check whether there is dirty data in the dirty data block list;
  • the writing module 1002 may further write the dirty data to disk when there is dirty data in the dirty data block list and the predetermined condition is satisfied.
  • The receiving module 1001 may further receive a link abnormality message sent by the virtual block storage client when the link between the virtual block storage client and the backup volume node device is abnormal; the sending module 1003 may further forward the link abnormality message received by the receiving module 1001 to the backup volume node device, so that the backup volume node device sends the result of the write operation to the primary volume node device; the sending module 1003 then sends the result of the write operation to the kernel of the virtual block storage client. The link abnormality message is sent to the virtual block storage client by the heartbeat process of the virtual block storage client.
  • the primary volume node device in this embodiment may further include:
  • an update module 1005, configured to, when the link between the primary volume node device and the backup volume node device is abnormal, update the detection point and single-node bitmap of the primary volume node device after the writing module 1002 writes the data to be written into the volume file of the primary volume node device; in this case, the sending module 1003 reports the result of the write operation to the kernel of the virtual block storage client.
  • FIG. 12 is a schematic structural diagram of an embodiment of a virtual block storage client device according to the present invention.
  • As shown in FIG. 12, the virtual block storage client device may include a connection establishing module 1201, an obtaining module 1202, a comparison module 1203, a determining module 1204, a registration module 1205, and a calling module 1206;
  • the connection establishing module 1201 is configured to establish a socket connection with a pre-specified backup volume according to a pre-specified listening port of the backup volume, and establish a socket connection with a pre-specified primary volume according to a pre-specified listening port of the primary volume;
  • the obtaining module 1202 is configured to obtain a volume size, a detection point, and a single node bitmap of the pre-specified backup volume, and obtain a volume size, a detection point, and a single node bitmap of the pre-specified primary volume;
  • the comparison module 1203 is configured to compare the detection points of the pre-specified primary volume and the pre-specified backup volume.
  • the determining module 1204 is configured to determine, according to the comparison result of the comparison module 1203, that the volume corresponding to the newest detection point is the real primary volume and that the volume corresponding to the second newest detection point is the real backup volume;
  • the registration module 1205 is configured to register the roles of the real primary volume and the real backup volume with the kernel of the virtual block storage client device and, when the link between the real primary volume and the real backup volume is normal, register the descriptor of the socket connection established by the connection establishing module 1201 with the pre-specified backup volume and the descriptor of the socket connection established by the connection establishing module 1201 with the pre-specified primary volume to the kernel of the virtual block storage client device;
  • the calling module 1206 is configured to invoke a system function to enter a kernel mode thread, and process a write operation request sent by the upper layer service program in the kernel state thread.
  • FIG. 13 is a schematic structural diagram of another embodiment of a virtual block storage client device according to the present invention. Compared with the virtual block storage client device shown in FIG. 12, the difference is that the virtual block storage client device shown in FIG. 13 may further include: a request sending module 1207, configured to send, when the data between the real primary volume and the real backup volume is out of synchronization, a data synchronization request to the real primary volume so as to synchronize the real primary volume with the real backup volume; and
  • an active/standby switchover module 1208, configured to perform an active/standby switchover when the primary volume is abnormal and register the backup volume as the new primary volume, so that the new primary volume receives the data to be written through the socket connection between the new primary volume and the virtual block storage client, updates the detection point and single-node bitmap of the new primary volume after writing the data to be written into the volume file of the new primary volume, and reports the result of the write operation to the kernel of the virtual block storage client device.
  • Further, the virtual block storage client device in this embodiment may also include a result receiving module 1209 and a result sending module 1210;
  • the result receiving module 1209 is configured to receive a result of the write operation.
  • the determining module 1204 may further determine whether the result of the write operation received by the result receiving module 1209 corresponds to the sent write operation request;
  • the result sending module 1210 is configured to send the result of the writing operation to the upper layer service program after the determining module 1204 determines that the result of the writing operation received by the result receiving module 1209 corresponds to the sent write operation request.
  • FIG. 14 is a schematic structural diagram of an embodiment of a backup volume node device according to the present invention.
  • the backup volume node device may include: a data receiving module 1401, a data writing module 1402, and a result reporting module 1403;
  • the data receiving module 1401 is configured to receive data to be written sent by the primary volume node device by using a socket connection between the primary volume node device and the backup volume node device;
  • the data writing module 1402 is configured to write the data to be written received by the data receiving module 1401 into a volume file of the backup volume node device;
  • the result reporting module 1403 is configured to report the result of the write operation to the kernel of the virtual block storage client device.
  • FIG. 15 is a schematic structural diagram of another embodiment of a backup volume node device according to the present invention.
  • Compared with the backup volume node device shown in FIG. 14, the difference is that the backup volume node device shown in FIG. 15 may further include:
  • a request receiving module 1404, configured to receive, before the data receiving module 1401 receives the data to be written, a write operation request sent by the primary volume node device, where the write operation request is used to notify the backup volume node device to prepare to receive the data to be written; the request receiving module 1404 may further receive a detection point record request sent by the primary volume node device, so as to update the detection point of the backup volume node device.
  • FIG. 16 is a schematic structural diagram of an embodiment of a storage system according to the present invention.
  • the storage system may include: a vbs client 1601, a PV 1602, and a BV 1603;
  • The PV 1602 is configured to receive data to be written through a socket connection between the PV 1602 and the vbs client 1601, write the data to be written into the volume file of the PV 1602, and send the data to be written to the BV 1603 through a socket connection between the PV 1602 and the BV 1603, so that after the BV 1603 writes the data to be written into the volume file of the BV 1603, the BV 1603 reports the result of the write operation to the kernel of the vbs client 1601.
  • The vbs client 1601 can be implemented by the virtual block storage client device shown in FIG. 12 or FIG. 13 of the present invention, the PV 1602 can be implemented by the primary volume node device shown in FIG. 10 or FIG. 11, and the BV 1603 can be implemented by the backup volume node device shown in FIG. 14 or FIG. 15.
  • the vbs client 1601 may include a triangle model establishing module 16011, a control message processing module 16012, an I/O request sending module 16013, a link selecting module 16014, and an I/O result receiving module 16015;
  • The triangular model establishing module 16011 is configured to re-establish the triangular model or perform model switching upon initial startup, a node abnormality, or a network abnormality.
  • For the specific process of establishing the triangular model, the triangular model establishing module 16011 may refer to the method provided by the embodiment shown in FIG. 3 of the present invention. The triangular model establishing module 16011 can implement the functions of the connection establishing module 1201, the obtaining module 1202, the comparison module 1203, the determining module 1204, the registration module 1205, the calling module 1206, and the request sending module 1207 in the virtual block storage client device provided by the embodiments shown in FIG. 12 and FIG. 13.
  • The control message processing module 16012 serves as an interface between the user mode and the kernel mode.
  • I/O requests from the upper-layer business program, for example write operation requests or read operation requests, are sent to the PV 1602 through the I/O request sending module 16013.
  • Link selection module 16014: when an abnormality occurs, data is sent and received differently from the triangular model, so in the SOLO model or the L model the kernel of the vbs-client selects an appropriate link when transmitting an I/O request. The link selection module 16014 can implement the function of the active/standby switchover module 1208 in the virtual block storage client device provided by the embodiment shown in FIG. 13.
  • I/O result receiving module 16015: the result of a write operation from the BV 1603 is processed by the I/O result receiving module 16015. The I/O result receiving module 16015 can implement the functions of the result receiving module 1209 and the result sending module 1210 in the virtual block storage client device provided by the embodiment shown in FIG. 13.
  • The PV 1602 may include a read operation request processing module 16021, a write operation request processing module 16022, an active/standby data synchronization module 16023, a PV state detection module 16024, a single-node bitmap (solo bitmap) 16025, a DBL 16026, and a volume file 16027.
  • The read operation request processing module 16021 is configured to process read operation requests from the vbs client 1601; in the triangular model a read operation request only reaches the PV 1602, and the BV 1603 is not aware of it.
  • The write operation request processing module 16022 is configured to process write operation requests from the vbs client 1601; in the triangular model the write operation request is forwarded to the BV 1603, and finally the BV 1603 reports the result of the write operation to the vbs client 1601. The write operation request processing module 16022 can implement part of the functions of the receiving module 1001, the writing module 1002, and the sending module 1003 in the primary volume node device provided by the embodiments shown in FIG. 10 and FIG. 11 of the present invention.
  • Active/standby data synchronization module 16023 When an abnormality occurs in PV 1602 or BV 1603, the storage system enters the SOLO model. When there is a write operation request, the solo bitmap 16025 records the changed data. When the triangular model is reconstructed, the primary and backup data are The synchronization module 16023 synchronizes the data to the other side to keep the data consistent;
  • PV state detection module 16024 When an abnormality occurs in the PV 1602, the storage system enters the SOLO model, and the original BV 1603 becomes a true PV. The PV state detection module 16024 polls whether the abnormal PV 1602 is restored to normal, and the triangle is restored after recovery. Model
  • Solo bitmap 16025: records the data blocks that change on the PV 1602 while the storage system runs under the SOLO model. The solo bitmap 16025 can implement part of the function of the update module 1005 in the primary volume node device provided by the embodiment shown in FIG. 11 of the present invention.
  • DBL 16026 When a write operation request occurs, the storage system first writes the data to be written to the buffer. These data to be written are considered dirty data before they are actually written to disk. DBL 16026 is used to record the dirty data. When there is dirty data in the DBL 16026 and the predetermined conditions are met, the PV 1602 can force the dirty data in the DBL 16026 to be written to the disk.
  • the predetermined condition may be one or a combination of the following:
  • PV 1602 may write dirty data in DBL 16026 to disk after a predetermined time interval has elapsed;
  • Volume file 16027: a sparse file based on the advanced journaling file system XFS, used to store all the data. Before the data to be written is written, the volume file 16027 does not occupy disk space; this is thin allocation, which provides users with great flexibility. A minimal sketch of creating such a sparse volume file follows.
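  • The thin allocation described for the volume file can be reproduced with an ordinary sparse file. The sketch below is a generic POSIX/Python illustration (the file name and size are made up) and is not specific to XFS.

```python
import os

def create_sparse_volume(path, size_bytes):
    """Create a volume file with the requested logical size without consuming
    disk space up front: truncating a newly created file leaves it sparse on
    file systems such as XFS, so blocks are only allocated when data is
    actually written (thin allocation)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)   # sets the logical size; no data blocks allocated
    return path

if __name__ == "__main__":
    volume = create_sparse_volume("volume0001.img", 10 * 1024 ** 3)  # 10 GiB logical size
    print("apparent size:", os.path.getsize(volume))
    print("512-byte blocks actually allocated:", os.stat(volume).st_blocks)
```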
  • the functions of the modules in the BV 1603 are the same as those in the PV 1602, and are not described here.
  • In the storage system of this embodiment, after the PV 1602 receives the data to be written through the socket connection between the PV 1602 and the vbs client 1601, the PV 1602 writes the data to be written into the volume file of the PV 1602 and sends the data to be written to the BV 1603 through the socket connection between the PV 1602 and the BV 1603, so that the BV 1603 writes the data to be written into the volume file of the BV 1603, thereby improving storage reliability; after the BV 1603 writes the data to be written into its volume file, the BV 1603 reports the result of the write operation to the kernel of the vbs client 1601. In this way, message traffic can be reduced and part of the load of the PV 1602 can be shared, achieving dynamic load balancing.
  • Cloud storage is a new concept extended and developed in the concept of cloud computing.
  • FIG. 17 is a schematic diagram of an embodiment of a cloud storage system according to the present invention. As shown in FIG. 17, the cloud storage system in this embodiment may include the following devices:
  • Block Storage Providers (Block Storage Provider; hereinafter: BSP), denoted BSP1, BSP2 and BSP3 respectively; BSP1, BSP2 and BSP3 provide storage space for the upper-layer Block Storage Agents (Block Storage Agent; hereinafter: BSA).
  • Each BSP contains both PVs and BVs of the triangular model provided by the embodiments of the present invention; with this deployment, the I/O load of each BSP can be effectively and evenly distributed. A minimal placement sketch follows.
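  • One way to spread the PVs and BVs of different volumes across the BSPs so that the I/O load is balanced is a simple round-robin placement, sketched below. The placement policy itself is an assumption; the patent only states that each BSP hosts both PVs and BVs and that the load can thereby be distributed evenly.

```python
# Illustrative round-robin placement of each volume's PV and BV across block
# storage providers (BSPs).  The policy is an assumption made for the sketch.

def place_volumes(volume_names, bsps):
    placement = {}
    for i, name in enumerate(volume_names):
        pv_bsp = bsps[i % len(bsps)]
        bv_bsp = bsps[(i + 1) % len(bsps)]      # BV always on a different BSP
        placement[name] = {"PV": pv_bsp, "BV": bv_bsp}
    return placement

if __name__ == "__main__":
    layout = place_volumes(["vol-1", "vol-2", "vol-3"], ["BSP1", "BSP2", "BSP3"])
    for vol, roles in layout.items():
        print(vol, roles)
```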
  • The BSA, as the interface between the storage management middleware and the underlying BSPs, is responsible for providing virtual NBD devices to the upper layer; load balancing is performed among the BSPs. The monitoring system is used to monitor in real time whether each node is abnormal; it is also responsible for monitoring the performance of BSP1, BSP2 and BSP3 and for notifying the storage resource management system in time if any abnormality occurs.
  • The data storage method, device and system provided by the embodiments of the present invention make it easy to build a large, reliable storage system.
  • While improving storage reliability, the embodiments of the present invention also effectively reduce the message traffic of the whole storage system; in scenarios with a high I/O load the advantage is even more pronounced.
  • The modules in the apparatuses of the embodiments may be distributed within the apparatus of an embodiment as described, or may, with corresponding changes, be located in one or more apparatuses different from that of the embodiment.
  • The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.


Description

数据存储方法、 设备和系统
本申请要求于 2011 年 1 月 19 日提交中国专利局、 申请号为 201110021715.1、 发明名称为 "数据存储方法、 设备和系统" 的中国专利 申请的优先权, 其全部内容通过引用结合在本申请中。
技术领域
本发明实施例涉及通信技术领域, 尤其涉及一种数据存储方法、 设备和 系统。
背景技术
网络块设备(Network Block Device; 以下简称: NBD ) , 是通过传输控 制协议 ( Transmission Control Protocol; 以下简称: TCP ) /因特网协议 ( Internet
Protocol; 以下简称: IP ) 网络把服务器上的文件或块设备抽象出来供客户端 使用的技术。 与之对应的软件是一种基于 Linux的网络存储软件, 利用这一软 件可以构建基于 Linux的网络存储系统。 作为一套存储系统, 尤其是用于商业 解决方案, 存储系统非常复杂, 对性能和可靠性的要求都很高。 但是, 现有 的 NBD只是完成普通的网络传输, 缺少对网络异常和存储节点异常的考虑。
举例来说, 在本地服务器 A上布署 NBD客户端(NBD Client ), 并虚拟一 个 NBD设备 nbdl, 在远程服务器 B上布署 NBD服务器(NBD Server ) 。 创建 于 nbdl上的虚拟机处于运行状态,但是由于网络异常或者存储节点发生异常, 无法从 NBD服务器读取数据, 造成虚拟机停机。
为解决上述可靠性问题, 现有技术提出独立磁盘冗余阵列 1 ( Redundant
Array of Independent Disks 1; 以下简称: RAID 1 ) 方案, 即由底层的单个存 储节点上的多个硬盘组成 RAID1 , 供 NBD服务器使用。
但是, 发明人发现上述 RAID1方案至少存在以下缺点:
( l ) RAIDl方案不能跨节点存储, 只能降低单节点发生故障的概率, 如 果 RAID卡出现故障, 则上层业务程序不可用;
( 2 )无法解决网络异常引起的业务不可用问题;
( 3 )对于存储节点来说, 经过 RAID卡之后数据量增倍, 大大增加了存 储节点的负荷。
为解决上述可靠性问题, 现有技术还提供另外一种方案, 即主卷和备份 卷方案, 具体地, 上层业务程序写输入(Input; 以下简称: 1 ) /输出(Output; 以下简称: 0 ) 时, 先向主卷写数据, 然后由主卷将数据传到备份卷, 备份 卷写完后再向主卷报告写 I/O结果, 最后由主卷向上层业务程序报告写 I/O结 果。
主卷和备份卷方案在物理上将主备分开了, 与 RAID1方案相比, 单节点 故障时数据不会损坏, 进一步提高了可靠性。 但发明人发现主卷和备份卷方 案仍然存在以下缺点:
( 1 )备份卷只是用来备份数据, 在系统运行过程中, 所有的负荷都在主 卷, 也就是说主卷所在的节点会成为 I/O瓶颈;
( 2 )—次 I/O要交互 4条消息, 消息流量较大。 发明内容
本发明实施例提供一种数据存储方法、设备和系统, 以提高存储可靠性, 降低消息流量。
本发明实施例提供一种数据存储方法, 包括:
通过主卷与虚拟块存储客户端之间的套接字连接接收待写入数据; 将所述待写入数据写入所述主卷的卷文件, 并将所述待写入数据通过所 述主卷与备份卷之间的套接字连接发送给所述备份卷, 以便所述备份卷在将 所述待写入数据写入所述备份卷的卷文件之后, 向所述虚拟块存储客户端的 内核上报写入操作的结果。
本发明实施例还提供一种主卷节点设备, 包括:
接收模块, 用于通过所述主卷节点设备与虚拟块存储客户端之间的套接 字连接接收待写入数据;
写入模块, 用于将所述接收模块接收的待写入数据写入所述主卷节点设 备的卷文件, 并将所述待写入数据通过所述主卷节点设备与备份卷节点设备 之间的套接字连接发送给所述备份卷节点设备, 以便所述备份卷节点设备在 将所述待写入数据写入所述备份卷节点设备的卷文件之后, 向所述虚拟块存 储客户端的内核上报写入操作的结果。
本发明实施例还提供一种虚拟块存储客户端设备, 包括:
连接建立模块, 用于根据预先指定的备份卷的监听端口与所述预先指定 的备份卷建立套接字连接, 以及根据预先指定的主卷的监听端口与所述预先 指定的主卷建立套接字连接;
获得模块, 用于获得所述预先指定的备份卷的卷大小、 检测点和单节点 位图, 以及获得所述预先指定的主卷的卷大小、 检测点和单节点位图;
比较模块, 用于比较所述预先指定的主卷和所述预先指定的备份卷的检 测点;
确定模块, 用于根据所述比较模块的比较结果确定最新检测点对应的卷 为真正的主卷, 次新检测点对应的卷为真正的备份卷;
注册模块, 用于向所述虚拟块存储客户端设备的内核注册所述真正的主 卷和所述真正的备份卷的角色, 以及当所述真正的主卷与所述真正的备份卷 之间的链路正常时, 将所述连接建立模块与所述预先指定的备份卷建立的套 接字连接的描述符, 以及所述连接建立模块与所述预先指定的主卷建立的套 接字连接的描述符注册到所述虚拟块存储客户端设备的内核;
调用模块, 用于调用系统函数进入内核态线程, 在所述内核态线程中处 理上层业务程序发送的写操作请求。
本发明实施例还提供一种备份卷节点设备, 包括:
数据接收模块, 用于接收主卷节点设备通过所述主卷节点设备与备份卷 节点设备之间的套接字连接发送的待写入数据;
数据写入模块, 用于将所述数据接收模块接收的待写入数据写入所述备 份卷节点设备的卷文件;
结果上报模块, 用于向虚拟块存储客户端设备的内核上报写入操作的结 果。
本发明实施例还提供一种存储系统, 包括: 上述主卷节点设备、 上述虚 拟块存储客户端设备和上述备份卷节点设备。
通过本发明实施例, 主卷通过该主卷与虚拟块存储客户端之间的套接字 连接接收到待写入数据之后, 将上述待写入数据写入主卷的卷文件, 并将上 述待写入数据通过主卷与备份卷之间的套接字连接发送给备份卷, 以便备份 卷将待写入数据写入该备份卷的卷文件, 从而可以提高存储可靠性; 在备份 卷将待写入数据写入该备份卷的卷文件之后, 由备份卷向虚拟块存储客户端 的内核上报写入操作的结果; 从而可以降低消息流量, 并且可以分担主卷的 部分负荷, 达到动态负载均衡的目的。 附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案, 下面将对实 施例或现有技术描述中所需要使用的附图作一简单地介绍, 显而易见地, 下 面描述中的附图是本发明的一些实施例, 对于本领域普通技术人员来讲, 在 不付出创造性劳动的前提下, 还可以根据这些附图获得其他的附图。
图 1为本发明数据存储方法一个实施例的流程图;
图 2为本发明网络架构一个实施例的示意图;
图 3为本发明建立三角模型一个实施例的流程图;
图 4为本发明数据存储方法另一个实施例的流程图;
图 5为本发明网络架构另一个实施例的示意图;
图 6为本发明数据存储方法又一个实施例的流程图;
图 7为本发明网络架构又一个实施例的示意图;
图 8为本发明数据存储方法再一个实施例的流程图;
图 9为本发明网络架构再一个实施例的示意图;
图 10为本发明主卷节点设备一个实施例的结构示意图; 图 11为本发明主卷节点设备另一个实施例的结构示意图; 图 12为本发明虚拟块存储客户端设备一个实施例的结构示意图; 图 13为本发明虚拟块存储客户端设备另一个实施例的结构示意图;; 图 14为本发明备份卷节点设备一个实施例的结构示意图;
图 15为本发明备份卷节点设备另一个实施例的结构示意图;
图 16为本发明存储系统一个实施例的结构示意图;
图 17为本发明云存储系统一个实施例的示意图。 具体实施方式
为使本发明实施例的目的、 技术方案和优点更加清楚, 下面将结合本发 明实施例中的附图, 对本发明实施例中的技术方案进行清楚、 完整地描述, 显然, 所描述的实施例是本发明一部分实施例, 而不是全部的实施例。 基于 本发明中的实施例, 本领域普通技术人员在没有做出创造性劳动的前提下所 获得的所有其他实施例, 都属于本发明保护的范围。
图 1为本发明数据存储方法一个实施例的流程图, 如图 1所示, 该数据存 储方法可以包括:
步骤 101,通过主卷与虚拟块存储客户端之间的套接字连接接收待写入数 据。
步骤 102, 将上述待写入数据写入该主卷的卷文件, 并将上述待写入数据 通过主卷与备份卷之间的套接字连接发送给备份卷, 以便备份卷在将待写入 数据写入该备份卷的卷文件之后, 向虚拟块存储客户端的内核上报写入操作 的结果。
本实施例中, 在通过主卷与虚拟块存储客户端之间的套接字连接接收待 写入数据之前,主卷还可以接收虚拟块存储客户端的内核发送的写操作请求, 并将该写操作请求转发给备份卷, 该写操作请求用于通知主卷和 /或备份卷准 备接收待写入数据; 上述写操作请求是虚拟块存储客户端的内核从该内核注 册的请求队列中获取的; 该内核注册的请求队列中的写操作请求是虚拟块存 储客户端接收到上层业务程序发送的写操作请求之后, 放入该内核注册的请 求队列中的。
本实施例中, 在备份卷向虚拟块存储客户端的内核上报写入操作的结果 之后, 主卷可以向备份卷发送检测点记录请求, 以使主卷和备份卷更新各自 的检测点。
另外, 在备份卷向虚拟块存储客户端的内核上报写入操作的结果之后, 主卷还可以检查脏数据块列表中是否有脏数据; 当该脏数据块列表中有脏数 据, 且满足预定条件之后, 将上述脏数据写到磁盘中。 其中, 脏数据为暂时 存放在内存緩沖区中, 还未写入卷文件的数据; 脏数据块列表用于记录哪些 数据是脏数据。
进一步地, 本实施例中, 主卷接收虚拟块存储客户端的内核发送的写操 作请求之前, 虚拟块存储客户端可以根据预先指定的备份卷的监听端口与预 先指定的备份卷建立套接字连接, 并获得预先指定的备份卷的卷大小、 检测 点和单节点位图; 虚拟块存储客户端可以根据预先指定的主卷的监听端口与 预先指定的主卷建立套接字连接, 并获得预先指定的主卷的卷大小、 检测点 和单节点位图; 然后, 虚拟块存储客户端可以比较预先指定的主卷和预先指 定的备份卷的检测点, 确定最新检测点对应的卷为真正的主卷, 次新检测点 对应的卷为真正的备份卷; 之后, 虚拟块存储客户端可以向该虚拟块存储客 户端的内核注册真正的主卷和真正的备份卷的角色; 当上述真正的主卷与真 正的备份卷之间的链路正常时, 虚拟块存储客户端将与预先指定的备份卷建 立的套接字连接的描述符, 以及与预先指定的主卷建立的套接字连接的描述 符注册到虚拟块存储客户端的内核; 接下来, 虚拟块存储客户端可以调用系 统函数, 例如: ioctrl进入内核态线程, 在该内核态线程中处理上层业务程序 发送的写操作请求。
其中, 在虚拟块存储客户端将与预先指定的备份卷建立的套接字连接的 描述符, 以及与预先指定的主卷建立的套接字连接的描述符注册到虚拟块存 果真正的主卷与真正的备份卷之间的数据不同步, 则虚拟块存储客户端可以 向真正的主卷发送数据同步请求, 以使真正的主卷与真正的备份卷进行数据 同步。
另外, 本实施例的一种实现方式中, 当主卷发生异常时, 在虚拟块存储 客户端的内核进行主备倒换, 将备份卷注册为新的主卷之后, 该新的主卷可 以通过新的主卷与虚拟块存储客户端之间的套接字连接接收待写入数据, 在 将该待写入数据写入新的主卷的卷文件之后, 更新新的主卷的检测点和单节 点位图; 然后, 该新的主卷向虚拟块存储客户端的内核上报写入操作的结果。
本实施例的另一种实现方式中, 当虚拟块存储客户端与备份卷之间的网 络发生异常时, 主卷可以接收虚拟块存储客户端发送的链路异常消息, 并将 该链路异常消息转发给备份卷, 以使备份卷将写入操作的结果发送给主卷, 再由主卷将该写入操作的结果发送给虚拟块存储客户端的内核; 其中, 上述 链路异常消息由虚拟块存储客户端的心跳进程发送给虚拟块存储客户端。
本实施例的再一种实现方式中, 当主卷和备份卷之间的网络发生异常时, 主卷可以在将待写入数据写入该主卷的卷文件之后, 更新主卷的检测点和单 节点位图; 然后, 再由主卷向虚拟块存储客户端的内核上报写入操作的结果。
本实施例中, 接收到写入操作的结果之后, 虚拟块存储客户端的内核可 以先确定写入操作的结果是否与已发送的写操作请求对应; 如果对应, 则虚 拟块存储客户端的内核可以将上述写入操作的结果发送给上层业务程序; 如 果写入操作的结果与已发送的写操作请求不对应, 则虚拟块存储客户端的内 核可以丟弃该写入操作的结果, 或者緩存但不处理该写入操作的结果, 本发 明实施例对此不作限定, 但本发明实施例以写入操作的结果与已发送的写操 作请求不对应时, 虚拟块存储客户端的内核丟弃该写入操作的结果为例进行 说明。 上述实施例中, 主卷通过该主卷与虚拟块存储客户端之间的套接字连接 接收到待写入数据之后, 将上述待写入数据写入主卷的卷文件, 并将上述待 写入数据通过主卷与备份卷之间的套接字连接发送给备份卷, 以便备份卷将 待写入数据写入该备份卷的卷文件, 从而可以提高存储可靠性; 在备份卷将 待写入数据写入该备份卷的卷文件之后, 由备份卷向虚拟块存储客户端的内 核上报写入操作的结果; 从而可以降低消息流量, 并且可以分担主卷的部分 负荷, 达到动态负载均衡的目的。
本发明实施例提供的数据存储方法可以提高存储可靠性, 并在保证存储 可靠性的前提下, 可以进一步降低消息流量, 提高性能。
本发明实施例中, 可以将主卷和备份卷部署在不同的存储节点上, 既可 以解决单存储节点故障引起的业务不可用问题, 也可以解决业务与主卷所在 存储节点或备份卷所在存储节点间的网络异常引起的业务不可用问题; 另夕卜, 在保证单节点故障业务不中断的前提下, 本发明实施例可以将消息流量减少
25%, 性能上有很大的提高。 另外, 备份卷在实现数据备份的同时, 参与了 业务流程(回应写入操作的结果) , 可以分担主卷的部分负荷, 达到一个动 态负载均衡的目的。
本发明实施例采用图 2所示的网络架构, 图 2为本发明网络架构一个实施 例的示意图, 如图 2所示, 该网络架构是一个稳定的三角模型, 在该三角模型 中, 实线代表套接字 (SOCKET )连接, 箭头指向服务器侧, 虚线为控制消 息的流向。
本发明实施例中, 在处理读操作请求时, 备份卷( Backup Volume; 以下 简称: BV ) 不感知, 读数据操作和回应都由主卷(Primary Volume; 以下简 称: PV ) 负责; 在处理写操作请求时, 由 PV接收待写入数据, BV回应写入 操作的结果。 这样, 可以将消息量减少 25%, 从而达到提高性能的目的。
另外, 图 2所示的网络架构可以有效防止存储节点异常和网络异常, 只要 不是虚拟块存储客户端 (virtual block storage client; 以下简称: vbs-client ) 、 PV和 BV中的任意两个同时发生异常, 或者 vbs-client与 PV之间的链路, 以及 vbs-client与 BV之间的链路同时发生异常, 上层业务程序可以不中断的运行。
图 3为本发明建立三角模型一个实施例的流程图, 如图 3所示, 图 2所示三 角模型的建立流程可以包括:
步骤 301, vbs-client根据预先指定的 BV的监听端口与该预先指定的 BV建 立 SOCKET连接。
本实施例中, 预先指定的 BV的监听端口是由预先指定的 BV预先对外提 供的。
步骤 302, vbs-client与预先指定的 BV进行协商, 协商的内容包括: ( 1 )通知预先指定的 BV当前连接的是 vbs-client;
( 2 )通知预先指定的 BV预先指定的 PV的信息, 包括预先指定的 PV的名 称和监听端口; 其中,预先指定的 PV的监听端口也是由预先指定的 PV预先提 供的;
( 3 ) 向预先指定的 BV发送获取卷大小, 检测 (check )点和单节点位图 ( solo bitmap ) 的请求; 其中, check点记录的是写入数据的时间点。
步骤 303, 预先指定的 BV向 vbs-client返回卷大小、 check点和 solo bitmap。 步骤 304, vbs-client根据预先指定的 PV的监听端口与该预先指定的 PV建 立 SOCKET连接。
步骤 305, vbs-client与预先指定的 PV进行协商, 协商的内容包括: ( 1 )通知预先指定的 PV当前连接的是 vbs-client;
( 2 )通知预先指定的 PV当前预先指定的 BV的信息, 包括预先指定的 BV 的名称和监听端口;
( 3 ) 向预先指定的 PV发送获取卷大小、 check点和 solo bitmap的请求。 步骤 306, 预先指定的 PV向 vbs-client返回卷大小、 check点和 solo bitmap。 步骤 307, vbs-client比较预先指定的 PV和预先指定的 BV的 check点, 确定 最新检测点对应的卷为真正的 PV, 次新检测点对应的卷为真正的 B V。 具体地, 如果比较 check点之后, vbs-client发现预先指定的 BV的 check点 是最新的, 也就是说预先指定的 BV—侧的数据是最新的, 这说明在这之前预 先指定的 PV发生过异常, 然后由 vbs-client的内核进行主备倒换, 将预先指定 的 B V作为真正的 P V, 之后存储系统在预先指定的 B V写过数据。
本实施例以预先指定的 PV为真正的 PV, 预先指定的 BV为真正的 BV为例 进行说明。
步骤 308, vbs-client向 vbs-client的内核注册真正的 BV的角色, 并通知预 先指定的 BV, 其为真正的 BV。
步骤 309, vbs-client向 vbs-client的内核注册真正的 PV的角色,并通知预先 指定的 PV, 其为真正的 PV。 果正常, 则执行步骤 311〜步骤 313; 如果真正的 PV与真正的 BV之间的链路发 生异常, 则退出本流程, 进入重建流程。
步骤 311, 如果真正的 PV与真正的 BV之间的数据不同步, 则 vbs-client向 真正的 PV发送数据同步请求, 以使该真正的 PV与真正的 BV进行数据同步。
步骤 312, vbs-client将步骤 301与步骤 304建立的 SOCKET连接的描述符注 册到 vbs-client的内核。这样, vbs-client的内核在向真正的 PV发送读操作请求、 接;而在接收数据或控制消息时, vbs-client的内核会选择一个合适的 SOCKET 连接。 因为, 在三角模型下, vbs-client是从 vbs-client的内核与真正的 BV之间 的 SOCKET连接开始接收数据的, 而在 L模型或 SOLO模型下, vbs-client是从 vbs-client的内核与真正的 PV之间的 SOCKET连接开始接收数据的。
步骤 313, vbs-client调用系统函数, 例如: ioctrl进入内核态线程, 在内核 态线程由注册到系统的回调函数处理上层业务程序发送的 I/O请求, 例如: 写 操作请求和读操作请求。
上述实施例可以实现建立三角模型, 通过该三角模型处理写操作请求, 可以实现提高存储可靠性, 降低消息流量, 提高存储性能。
图 4为本发明数据存储方法另一个实施例的流程图, 由于本发明实施例 中,读操作请求的处理过程仅是 vbs-client与 PV之间的交互,与现有技术一样, 因此, 本实施例仅对写操作请求的处理过程进行介绍。
如图 4所示, 该数据存储方法可以包括:
步骤 401, 三角模型建立, 存储系统处于稳定状态。
步骤 402, 三角模型建立后, vbs-client会启动一个内核态线程专门用来处 理写操作请求。
步骤 403, 上层业务程序发送的写操作请求被 vbs-client的操作系统 ( Operating System; 以下简称: OS )放入内核注册的一个请求队列中。
步骤 404, vbs-client的内核线程从上述请求队列中获取一个写操作请求。 本实施例中, vbs-client的内核线程可以按照预定的规则从上述请求队列 中获取一个写操作请求, 该预定的规则可以为先入先出规则或其他规则, 本 实施例对此不作限定, 只要 vbs-client的内核线程可以按照该预定的规则从上 述请求队列中获取一个写操作请求即可; 但本实施例以该预定的规则为先入 先出规则为例进行说明。
步骤 405, vbs-client的内核向 PV发送写操作请求, 该写操作请求用于通 知 PV准备接收待写入数据。
步骤 406, PV子进程收到写操作请求之后, 将接收到的写操作请求转发 给 BV子进程; 该写操作请求用于通知 BV准备接收待写入数据。
步骤 407, PV通过该 PV与 vbs-client之间的 SOCKET连接接收待写入数据, 将该待写入数据写入 PV的卷文件(volume file ) , 并将该待写入数据通过 PV 与 BV之间的 SOCKET连接发送给 BV, 以便 BV在将待写入数据写入该 BV的卷 文件。
步骤 408, BV在将待写入数据写入该 BV的卷文件之后, 向 vbs-client的内 核上报写入操作的结果。 步骤 409, PV向 BV发送 check点记录请求。
步骤 410, PV和 BV更新各自的 check点。
本实施例中, check点是判断当前侧数据是最新数据的唯一标准。
步骤 411, PV检查脏数据块列表(Dirty Block List; 以下简称: DBL ) 中 是否有脏数据, 如果有, 且满足预定条件, 则 PV将 DBL中的脏数据强行写到 磁盘中。 其中, 脏数据为暂时存放在内存緩沖区中, 还未写入卷文件的数据; DBL用于记录哪些数据是脏数据。
本实施例中, 上述预定条件可以为以下之一或组合:
( 1 )如果在轮询时发现 DBL没有变化, 说明存储系统不忙, 可以直接将 DBL中的脏数据写到磁盘中;
( 2 )如果 DBL不为空, 且在轮询时发现该 DBL有变化, 则可以在达到预 定的时间间隔后将 DBL中的脏数据写到磁盘中。
步骤 412, vbs-client的内核接收到 BV发送的写入操作的结果之后, 判断 接收到的写入操作的结果是否与已发送的写操作请求对应; 如果对应, 则执 行步骤 413 ; 如果上述写入操作的结果与已发送的写操作请求不对应, 则 vbs-client的内核可以丟弃该写入操作的结果, 或者緩存但不处理该写入操作 的结果, 本实施例对此不作限定, 但本实施例以写入操作的结果与已发送的 写操作请求不对应时, vbs-client的内核丟弃该写入操作的结果为例进行说明。
步骤 413, vbs-client的内核将写入操作的结果发送给上层业务程序。
上述实施例中, PV通过该 PV与 vbs-client之间的 SOCKET连接接收到待写 入数据之后, 将上述待写入数据写入 PV的卷文件, 并将上述待写入数据通过 PV与 BV之间的 SOCKET连接发送给 BV, 以便 BV将待写入数据写入该 BV的 卷文件, 从而可以提高存储可靠性; 在 BV将待写入数据写入该 BV的卷文件 之后,由 BV向 vbs-client的内核上报写入操作的结果;从而可以降低消息流量, 并且可以分担 PV的部分负荷, 达到动态负载均衡的目的。
以下介绍在各种异常发生时, 存储系统是如何从稳定的三角模型切换到 其他模型, 从而保证上层业务程序不受影响的。
当 PV发生异常时, 原来的 BV变成真正的 PV, 既负责接收请求, 也负责 回应结果, 这时网络结构可以如图 5所示, 图 5为本发明网络架构另一个实施 例的示意图。 图 5所示的网络架构为单节点 (SOLO )模型
图 6为本发明数据存储方法又一个实施例的流程图, 本实施例对图 5所示 网络架构下的数据存储过程进行介绍。
如图 6所示, 该数据存储方法可以包括:
步骤 601, vbs-client内核态线程接收 PV发生异常之前的写操作请求。
步骤 602, PV发生异常导致 vbs-client与 PV之间的 SOCKET连接发生异常, vbs-client的内核进行主备倒换, 将原来的 BV注册为新的 PV。
步骤 603, vbs-client内核态线程启动检测发生异常的 P V是否恢复的进程。 步骤 604, vbs-client内核态线程将上述写操作请求发送给新的 PV (即原 来的 BV ) 。
步骤 605, 新的 PV将待写入数据写入该新的 PV的卷文件。
步骤 606, 新的 PV更新 check点。
步聚 607, 新的 PV更新 solo bitmap。
步骤 608, 新的 PV向 vbs-client的内核上报写入操作的结果。
步骤 609, vbs-client的内核确定该写入操作的结果与之前发送的写操作请 求对应之后, 将该写入操作的结果发送给上层业务程序。
步骤 610, 检测发生异常的 PV是否恢复的进程发送检测消息给 vbs-client 的内核, 以便 vbs-client的内核根据该检测消息确定发生异常的 PV是否恢复正 常。 之后, 要更新 solo bitmap和 check点, 该 solo bitmap用于记录哪一块数据发生 了变化, 该 check点用于记录写入待写入数据的时间点。 记录 solo bitmap和 check点的作用是,在下次三角模型建立时,存储系统会根据 check点判断哪一 侧的数据是最新的, 只有具有最新数据的卷才可以做 PV。 另外, 如果 PV的数 据比 BV的新, 则可以根据 solo bitmap中记录的信息, 将 PV上相应的数据同步 到 BV, 保证 PV和 BV上数据的一致性; 反之亦然。
当 vbs-client与 BV之间的链路发生异常时, 网络架构可以如图 7所示, 图 7 为本发明网络架构又一个实施例的示意图, 图 7所示的网络架构为 L模型。 在 图 7所示的网络架构下, BV无法向上层业务程序发送写入操作的结果, 这时, BV转向 PV上报写入操作的结果, 最后由 PV向上层业务程序上报写入操作的 结果。
图 8为本发明数据存储方法再一个实施例的流程图, 本实施例对图 7所示 网络架构下的数据存储过程进行介绍。
如图 8所示, 该数据存储方法可以包括:
步骤 801, 在稳定状态下, vbs-client与 BV之间的链路突然发生异常。 步骤 802, 本来需要 BV上报的写入操作的结果, 因为 vbs-client与 BV之间 的链路发生异常, BV无法将写入操作的结果发送给 vbs-client的内核, 一直阻 塞。
步骤 803, 当 vbs-client的心跳机制检测到链路异常时, 向 vbs-client发送链 路异常消息。
步骤 804, vbs-client将链路异常消息发送给 PV。
步骤 805, PV将链路异常消息转发给 BV。
步骤 806, BV接收到链路异常消息之后, 将未发出的写入操作的结果发 送给 PV。
步骤 807, PV将写入操作的结果发送给 vbs-client的内核。
步骤 808, vbs-client的内核将写入操作的结果上报给上层业务程序。
步骤 809, 之后有写操作请求时, BV在将待写入数据写入 BV的卷文件之 后, 直接将写入操作的结果上报给 PV而不再进行尝试发送给 vbs-client的内 核。 当 PV与 BV之间的链路发生异常时, 网络架构可以如图 9所示, 图 9为本发 明网络架构再一个实施例的示意图, 图 9所示的网络结构为倒 V模型。 在倒 V 模型下, 存储系统可以忽略 BV, 以处理写操作请求为例, PV将待写入数据写 入 PV的卷文件之后, 更新 PV的 check点和 solo bitmap; 然后, PV向 vbs-client 的内核上报写入操作的结果, 即 vbs-client不从 BV接收写入操作的结果, 转而 从 PV接收, 整体的处理流程与 SOLO模型类似, 在此不再赘述。
本领域普通技术人员可以理解: 实现上述方法实施例的全部或部分步骤 可以通过程序指令相关的硬件来完成, 前述的程序可以存储于一计算机可读 取存储介质中, 该程序在执行时, 执行包括上述方法实施例的步骤; 而前述 的存储介质包括: ROM, RAM, 磁碟或者光盘等各种可以存储程序代码的介 质。
图 10为本发明主卷节点设备一个实施例的结构示意图, 该主卷节点设备 可以实现本发明图 1所示实施例的流程, 如图 10所示, 该主卷节点设备可以包 括: 接收模块 1001和写入模块 1002;
其中, 接收模块 1001, 用于通过该主卷节点设备与虚拟块存储客户端之 间的套接字连接接收待写入数据;
写入模块 1002, 用于将接收模块 1001接收的待写入数据写入主卷节点设 备的卷文件, 并将待写入数据通过主卷节点设备与备份卷节点设备之间的套 接字连接发送给备份卷节点设备, 以便备份卷节点设备在将该待写入数据写 入备份卷节点设备的卷文件之后, 向虚拟块存储客户端的内核上报写入操作 的结果。
上述实施例中, 接收模块 1001通过该主卷节点设备与虚拟块存储客户端 之间的套接字连接接收到待写入数据之后, 写入模块 1002将上述待写入数据 写入主卷节点设备的卷文件, 并将上述待写入数据通过主卷节点设备与备份 卷节点设备之间的套接字连接发送给备份卷节点设备, 以便备份卷节点设备 将待写入数据写入该备份卷节点设备的卷文件, 从而可以提高存储可靠性; 在备份卷节点设备将待写入数据写入该备份卷的卷文件之后, 由备份卷节点 设备向虚拟块存储客户端的内核上报写入操作的结果; 从而可以降低消息流 量, 并且可以分担主卷节点设备的部分负荷, 达到动态负载均衡的目的。
图 11为本发明主卷节点设备另一个实施例的结构示意图, 与图 10所示的 主卷节点设备相比, 不同之处在于, 图 11所示的主卷节点设备还可以包括: 发送模块 1003;
本实施例中, 接收模块 1001还可以在接收待写入数据之前, 接收虚拟块 存储客户端的内核发送的写操作请求; 则发送模块 1003用于将接收模块 1001 接收的写操作请求转发给备份卷节点设备, 该写操作请求用于通知主卷节点 设备和 /或备份卷节点设备准备接收待写入数据; 写操作请求是虚拟块存储客 户端的内核从该内核注册的请求队列中获取的; 该内核注册的请求队列中的 写操作请求是虚拟块存储客户端接收到上层业务程序发送的写操作请求之 后, 放入该内核注册的请求队列中的。
进一步地, 本实施例中, 发送模块 1003还可以向备份卷节点设备发送检 测点记录请求, 以使主卷节点设备和备份卷节点设备更新各自的检测点。
进一步地, 本实施例中的主卷节点设备还可以包括:
检查模块 1004, 用于检查脏数据块列表中是否有脏数据;
则写入模块 1002还可以当脏数据块列表中有脏数据, 且满足预定条件之 后, 将上述脏数据写到磁盘中。
进一步地, 本实施例中, 接收模块 1001还可以当虚拟块存储客户端与备 份卷节点设备之间的链路发生异常时, 接收虚拟块存储客户端发送的链路异 常消息; 发送模块 1003还可以将接收模块 1001接收的链路异常消息转发给备 份卷节点设备,以使备份卷节点设备将写入操作的结果发送给主卷节点设备, 再由发送模块 1003将上述写入操作的结果发送给虚拟块存储客户端的内核; 该链路异常消息由虚拟块存储客户端的心跳进程发送给虚拟块存储客户端。
进一步地, 本实施例中的主卷节点设备还可以包括: 更新模块 1005, 用于当主卷节点设备和备份卷节点设备之间的链路发生 异常时, 在写入模块 1002将待写入数据写入主卷节点设备的卷文件之后, 更 新主卷节点设备的检测点和单节点位图; 这时, 发送模块 1003可以向虚拟块 存储客户端的内核上报写入操作的结果。
采用上述主卷节点设备构建存储系统, 可以提高存储可靠性, 降低消息 流量, 提高存储性能。
图 12为本发明虚拟块存储客户端设备一个实施例的结构示意图, 如图 12 所示,该虚拟块存储客户端设备可以包括:连接建立模块 1201、获得模块 1202、 比较模块 1203、 确定模块 1204、 注册模块 1205和调用模块 1206;
其中, 连接建立模块 1201, 用于根据预先指定的备份卷的监听端口与预 先指定的备份卷建立套接字连接, 以及根据预先指定的主卷的监听端口与预 先指定的主卷建立套接字连接;
获得模块 1202, 用于获得预先指定的备份卷的卷大小、 检测点和单节点 位图, 以及获得预先指定的主卷的卷大小、 检测点和单节点位图;
比较模块 1203,用于比较预先指定的主卷和预先指定的备份卷的检测点; 确定模块 1204, 用于根据比较模块 1203的比较结果确定最新检测点对应 的卷为真正的主卷, 次新检测点对应的卷为真正的备份卷;
注册模块 1205, 用于向虚拟块存储客户端设备的内核注册真正的主卷和 将连接建立模块 1201与预先指定的备份卷建立的套接字连接的描述符, 以及 连接建立模块 1201与预先指定的主卷建立的套接字连接的描述符注册到虚拟 块存储客户端设备的内核;
调用模块 1206, 用于调用系统函数进入内核态线程, 在内核态线程中处 理上层业务程序发送的写操作请求。
采用上述虚拟块存储客户端设备构建存储系统, 可以提高存储可靠性, 降低消息流量, 提高存储性能。 图 13为本发明虚拟块存储客户端设备另一个实施例的结构示意图, 与图 12所示的虚拟块存储客户端设备相比, 不同之处在于, 图 13所示的虚拟块存 储客户端设备 12还可以包括: 时, 如果真正的主卷与真正的备份卷之间的数据不同步, 则向真正的主卷发 送数据同步请求, 以使真正的主卷与真正的备份卷进行数据同步;
主备倒换模块 1208, 用于当主卷发生异常时, 在虚拟块存储客户端设备 的内核进行主备倒换, 将备份卷注册为新的主卷, 以使新的主卷通过新的主 卷与虚拟块存储客户端之间的套接字连接接收待写入数据, 并在将待写入数 据写入新的主卷的卷文件之后, 更新新的主卷的检测点和单节点位图, 以及 向虚拟块存储客户端设备的内核上报写入操作的结果。
进一步地, 本实施例中的虚拟块存储客户端设备 12还可以包括: 结果接 收模块 1209和结果发送模块 1210;
其中, 结果接收模块 1209, 用于接收写入操作的结果;
这时确定模块 1204还可以确定结果接收模块 1209接收的写入操作的结果 是否与已发送的写操作请求对应;
结果发送模块 1210, 用于在确定模块 1204确定结果接收模块 1209接收的 写入操作的结果与已发送的写操作请求对应之后, 将上述写入操作的结果发 送给上层业务程序。
采用上述虚拟块存储客户端设备构建存储系统, 可以提高存储可靠性, 降低消息流量, 提高存储性能。
图 14为本发明备份卷节点设备一个实施例的结构示意图, 如图 14所示, 该备份卷节点设备可以包括: 数据接收模块 1401、 数据写入模块 1402和结果 上报模块 1403;
其中, 数据接收模块 1401, 用于接收主卷节点设备通过该主卷节点设备 与备份卷节点设备之间的套接字连接发送的待写入数据; 数据写入模块 1402, 用于将数据接收模块 1401接收的待写入数据写入备 份卷节点设备的卷文件;
结果上报模块 1403, 用于向虚拟块存储客户端设备的内核上报写入操作 的结果。
采用上述备份卷节点设备构建存储系统, 可以提高存储可靠性, 降低消 息流量, 提高存储性能。
图 15为本发明备份卷节点设备另一个实施例的结构示意图, 与图 13所示 的备份卷节点设备相比, 不同之处在于, 图 14所示的备份卷节点设备还可以 包括:
请求接收模块 1404, 用于在数据接收模块 1401接收待写入数据之前, 接 收主卷节点设备发送的写操作请求, 该写操作请求用于通知所述备份卷节点 设备准备接收所述待写入数据; 还可以在结果上报模块 1403上报写入操作的 结果之后, 接收主卷节点设备发送的检测点记录请求, 以更新该备份卷节点 设备的检测点。
采用上述备份卷节点设备构建存储系统, 可以提高存储可靠性, 降低消 息流量, 提高存储性能。
图 16为本发明存储系统一个实施例的结构示意图, 如图 16所示, 该存储 系统可以包括: vbs客户端 1601、 PV 1602和 BV 1603;
其中, PV 1602用于通过 PV 1602与 vbs客户端 1601之间的套接字连接接收 待写入数据,将该待写入数据写入 PV 1602的卷文件,并将待写入数据通过 PV 1602与 BV 1603之间的套接字连接发送给 BV 1603, 以便 BV 1603在将待写入 数据写入 B V 1603的卷文件之后, 向 vbs客户端 1601的内核上报写入操作的结 果。
具体地, vbs客户端 1601可以通过本发明图 12或图 13所示的虚拟块存储客 户端设备实现, PV 1602可以通过本发明图 10或图 11所示的主卷节点设备实 现。 本实施例中, vbs客户端 1601可以包括三角模型建立模块 16011、 控制消 息处理模块 16012、 I/O请求发送模块 16013、 链路选择模块 16014和 I/O结果接 收模块 16015;
其中,三角模型建立模块 16011,用于初次启动、节点异常或网络异常时, 重新建立三角模型或进行模型切换; 具体地, 三角模型建立模块 16011可以参 照本发明图 3所示实施例提供的方法建立三角模型,实现图 12所示实施例提供 的虚拟块存储客户端设备中连接建立模块 1201、 获得模块 1202、 比较模块 1203、 确定模块 1204、 注册模块 1205、 调用模块 1206和请求发送模块 1207的 功能。
控制消息处理模块 16012: 为用户态与内核态之间的接口;
I/O请求发送模块 16013 : 来自上层业务程序的 I/O请求, 例如: 写操作请 求或读操作请求通过 I/O请求发送模块 16013发送到 PV 1602;
链路选择模块 16014: 当异常发生时, 数据的收发不同于三角模型, 所以 在 SOLO模型或 L模型时, vbs-client的内核在发送 I/O请求时要选择合适的链 路; 本实施例中, 链路选择模块 16014可以实现图 12所示实施例提供的虚拟块 存储客户端设备中主备倒换模块 1208的功能;
I/O结果接收模块 16015 : 来自 BV的写入操作的结果由 I/O结果接收模块 16015处理; 本实施例中, I/O结果接收模块 16015可以实现图 12所示实施例提 供的虚拟块存储客户端设备中结果接收模块 1209和结果发送模块 1210的功 能。
本实施例中, PV 1602可以包括读操作请求处理模块 16021、 写操作请求 处理模块 16022、 主备数据同步模块 16023、 PV状态检测模块 16024、 单节点 位图 (solo bitmap ) 16025、 DBL 16026和卷文件 16027。
其中, 读操作请求处理模块 16021, 用于处理来自 vbs客户端 1601的读操 作请求, 在三角模型下读操作请求只到 PV 1602, BV 1603不感知;
写操作请求处理模块 16022, 用于处理来自 vbs客户端 1601的写操作请求, 在三角模型下写操作请求会转到 BV 1603 , 最后由 BV 1603向 vbs客户端 1601 报告写入操作的结果; 本实施例中, 写操作请求处理模块 16022可以实现本发 明图 10所示实施例提供的主卷节点设备中接收模块 1001、 写入模块 1002和发 送模块 1003的部分功能;
主备数据同步模块 16023: 当 PV 1602或 BV 1603发生异常时,该存储系统 进入 SOLO模型, 以后有写操作请求时, solo bitmap 16025会记录发生变化的 数据, 当重建三角模型时, 主备数据同步模块 16023将数据同步到另一侧, 保 持数据一致;
PV状态检测模块 16024: 当 PV 1602发生异常时, 存储系统进入 SOLO模 型,原来的 BV 1603会成为真正的 PV, PV状态检测模块 16024轮询检测发生异 常的 PV 1602是否恢复正常, 恢复后重建三角模型;
solo bitmap 16025: 记录存储系统在 SOLO模型下运行期间, PV 1602上发 生变化的数据块; 本实施例中, solo bitmap 16025可以实现本发明图 10所示实 施例提供的主卷节点设备中更新模块 1005的部分功能;
DBL 16026: 写操作请求发生时, 存储系统首先将待写入数据写入緩沖 区, 这些待写入数据在真正写到磁盘之前都被认为是脏数据, DBL 16026用 于记录这些脏数据。 当 DBL 16026中有脏数据, 且满足预定条件时, PV 1602 可以将 DBL 16026中的脏数据强行写到磁盘中。 本实施例中, 上述预定条件 可以为以下之一或组合:
( 1 )如果在轮询时发现 DBL 16026没有变化, 说明存储系统不忙, PV
1602可以直接将 DBL 16026中的脏数据写到磁盘中;
( 2 )如果 DBL 16026不为空, 且在轮询时发现该 DBL 16026有变化, 则 PV 1602可以在达到预定的时间间隔后将 DBL 16026中的脏数据写到磁盘中; 卷文件 16027: 基于高级日志文件系统( XFS )文件系统的稀疏文件, 用 于存放所有的数据。 写入待写入数据之前, 卷文件 16027并不占用磁盘空间, 是一种瘦分配, 为用户提供了很大的灵活性。 本实施例中, BV 1603中各模块的功能与 P V 1602中相同模块的功能相同, 在此不再赘述。
上述存储系统中, PV 1602通过该 PV 1602与 vbs客户端 1601之间的套接字 连接接收到待写入数据之后, 将上述待写入数据写入 PV 1602的卷文件, 并将 上述待写入数据通过 PV 1602与 BV 1603之间的套接字连接发送给 BV 1603 , 以便 BV 1603将待写入数据写入该 BV 1603的卷文件, 从而可以提高存储可靠 客户端 1601的内核上报写入操作的结果; 从而可以降低消息流量, 并且可以 分担 PV 1602的部分负荷, 达到动态负载均衡的目的。 云存储是在云计算概念上延伸和发展出来的一个新的概念, 是指通过集 群应用、 网格技术或分布式文件系统等功能, 将网络中大量各种不同类型的 存储设备通过应用软件集合起来协同工作,共同对外提供数据存储和业务访 问功能的一个系统。 当云计算系统运算和处理的核心是大量数据的存储和管 理时, 云计算系统中就需要配置大量的存储设备, 那么云计算系统就转变成 为一个云存储系统, 所以云存储系统是一个以数据存储和管理为核心的云计 算系统。
图 17为本发明云存储系统一个实施例的示意图, 如图 17所示, 本实施例 中的云存储系统可以包括以下设备:
( 1 )三台块存储提供者(Block Storage Provider; 以下简称: BSP ), 分 别记为 BSP1、 BSP2和 BSP3, BSP1、 BSP2和 BSP3为上层的块存储代理( Block Storage Agent; 以下简称: BSA )提供存储空间。 同时, 每个 BSP上分别有本 发明实施例提供的三角模型中的 PV和 BV。在这种部署下,可以有效地将每个 BSP的 I/O负荷平均分配。
( 2 )—台 BSA, 作为存储管理中间件和底层 BSP之间的接口, 负责向上 层提供虚拟的 NBD设备。 ( 3 )另一台服务器, 该服务器上部署监控系统、 计费系统和存储资源管 理系统等; 其中, 存储资源管理系统, 主要负责 BSP1、 BSP2和 BSP3上 PV和 BV的选择, 从而达到 BSP1、 BSP2和 BSP3之间负载的均衡; 监控系统用于实 时监控各节点是否异常, 另外, 也负责监控 BSP1、 BSP2和 BSP3的性能, 如 有异常及时通知存储资源管理系统。
本发明实施例提供的数据存储方法、 设备和系统, 可以方便地构建大型 可靠的存储系统。 同时, 本发明实施例在提高存储可靠性的同时, 也有效降 低了整个存储系统的消息流量。 在 I/O负荷很高的场合, 优势更加明显。
本领域技术人员可以理解附图只是一个优选实施例的示意图, 附图中的 模块或流程并不一定是实施本发明所必须的。
本领域技术人员可以理解实施例中的装置中的模块可以按照实施例描述 进行分布于实施例的装置中, 也可以进行相应变化位于不同于本实施例的一 个或多个装置中。 上述实施例的模块可以合并为一个模块, 也可以进一步拆 分成多个子模块。
最后应说明的是: 以上实施例仅用以说明本发明的技术方案, 而非对其 限制; 尽管参照前述实施例对本发明进行了详细的说明, 本领域的普通技术 人员应当理解: 其依然可以对前述各实施例所记载的技术方案进行修改, 或 者对其中部分技术特征进行等同替换; 而这些修改或者替换, 并不使相应技 术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims

权 利 要 求
1、 一种数据存储方法, 其特征在于, 包括:
通过主卷与虚拟块存储客户端之间的套接字连接接收待写入数据; 将所述待写入数据写入所述主卷的卷文件,并将所述待写入数据通过所述 主卷与备份卷之间的套接字连接发送给所述备份卷,以便所述备份卷在将所述 待写入数据写入所述备份卷的卷文件之后,向所述虚拟块存储客户端的内核上 报写入操作的结果。
2、 根据权利要求 1所述的方法, 其特征在于, 所述通过主卷与虚拟块存储 客户端之间的套接字连接接收待写入数据之前, 还包括:
接收虚拟块存储客户端的内核发送的写操作请求,并将所述写操作请求转 发给备份卷, 所述写操作请求用于通知所述主卷和 /或所述备份卷准备接收所 述待写入数据;所述写操作请求是所述虚拟块存储客户端的内核从所述内核注 册的请求队列中获取的;所述内核注册的请求队列中的写操作请求是所述虚拟 块存储客户端接收到上层业务程序发送的写操作请求之后,放入所述内核注册 的请求队列中的。
3、 根据权利要求 1所述的方法, 其特征在于, 所述向所述虚拟块存储客户 端的内核上报写入操作的结果之后, 还包括:
向所述备份卷发送检测点记录请求,以使所述主卷和所述备份卷更新各自 的检测点。
4、 根据权利要求 1或 3所述的方法, 其特征在于, 所述向所述虚拟块存储 客户端的内核上报写入操作的结果之后, 还包括:
检查脏数据块列表中是否有脏数据; 当所述脏数据块列表中有脏数据,且满足预定条件之后, 将所述脏数据写 到磁盘中。
5、 根据权利要求 2所述的方法, 其特征在于, 所述接收虚拟块存储客户端 的内核发送的写操作请求之前, 还包括:
所述虚拟块存储客户端根据预先指定的备份卷的监听端口与所述预先指 定的备份卷建立套接字连接, 并获得所述预先指定的备份卷的卷大小、检测点 和单节点位图;
所述虚拟块存储客户端根据预先指定的主卷的监听端口与所述预先指定 的主卷建立套接字连接, 并获得所述预先指定的主卷的卷大小、检测点和单节 点位图;
所述虚拟块存储客户端比较所述预先指定的主卷和所述预先指定的备份 卷的检测点,确定最新检测点对应的卷为真正的主卷, 次新检测点对应的卷为 真正的备份卷;
所述虚拟块存储客户端向所述虚拟块存储客户端的内核注册所述真正的 主卷和所述真正的备份卷的角色;
当所述真正的主卷与所述真正的备份卷之间的链路正常时,所述虚拟块存 储客户端将与所述预先指定的备份卷建立的套接字连接的描述符,以及与所述 预先指定的主卷建立的套接字连接的描述符注册到所述虚拟块存储客户端的 内核;
所述虚拟块存储客户端调用系统函数进入内核态线程,在所述内核态线程 中处理上层业务程序发送的写操作请求。
6、 根据权利要求 5所述的方法, 其特征在于, 所述虚拟块存储客户端将与 所述预先指定的备份卷建立的套接字连接的描述符,以及与所述预先指定的主 卷建立的套接字连接的描述符注册到所述虚拟块存储客户端的内核之前,还包 括:
当所述真正的主卷与所述真正的备份卷之间的链路正常时,如果所述真正 的主卷与所述真正的备份卷之间的数据不同步,则所述虚拟块存储客户端向所 述真正的主卷发送数据同步请求,以使所述真正的主卷与所述真正的备份卷进 行数据同步。
7、 根据权利要求 2所述的方法, 其特征在于, 还包括:
当所述主卷发生异常时, 在所述虚拟块存储客户端的内核进行主备倒换, 将所述备份卷注册为新的主卷之后,所述新的主卷通过所述新的主卷与所述虚 拟块存储客户端之间的套接字连接接收待写入数据;
所述新的主卷将所述待写入数据写入所述新的主卷的卷文件之后,更新所 述新的主卷的检测点和单节点位图;
所述新的主卷向所述虚拟块存储客户端的内核上报写入操作的结果。
8、 根据权利要求 2所述的方法, 其特征在于, 还包括:
当所述虚拟块存储客户端与所述备份卷之间的链路发生异常时,所述主卷 接收所述虚拟块存储客户端发送的链路异常消息,并将所述链路异常消息转发 给所述备份卷, 以使所述备份卷将所述写入操作的结果发送给所述主卷, 由所 述主卷将所述写入操作的结果发送给所述虚拟块存储客户端的内核;所述链路 异常消息由所述虚拟块存储客户端的心跳进程发送给所述虚拟块存储客户端。
9、 根据权利要求 2所述的方法, 其特征在于, 还包括:
当所述主卷和所述备份卷之间的链路发生异常时,所述主卷将所述待写入 数据写入所述主卷的卷文件之后, 更新所述主卷的检测点和单节点位图; 所述主卷向所述虚拟块存储客户端的内核上报写入操作的结果。
10、 根据权利要求 2、 7、 8或 9所述的方法, 其特征在于, 还包括: 接收到所述写入操作的结果之后,所述虚拟块存储客户端的内核确定所述 写入操作的结果是否与已发送的写操作请求对应;
如果对应,则所述虚拟块存储客户端的内核将所述写入操作的结果发送给 所述上层业务程序。
11、 一种主卷节点设备, 其特征在于, 包括:
接收模块,用于通过所述主卷节点设备与虚拟块存储客户端之间的套接字 连接接收待写入数据;
写入模块,用于将所述接收模块接收的待写入数据写入所述主卷节点设备 的卷文件,并将所述待写入数据通过所述主卷节点设备与备份卷节点设备之间 的套接字连接发送给所述备份卷节点设备,以便所述备份卷节点设备在将所述 待写入数据写入所述备份卷节点设备的卷文件之后,向所述虚拟块存储客户端 的内核上报写入操作的结果。
12、 根据权利要求 11所述的设备, 其特征在于, 还包括发送模块; 所述接收模块,还用于在接收所述待写入数据之前,接收虚拟块存储客户 端的内核发送的写操作请求;
所述发送模块,用于将所述接收模块接收的写操作请求转发给所述备份卷 节点设备, 所述写操作请求用于通知所述主卷节点设备和 /或所述备份卷节点 设备准备接收所述待写入数据;所述写操作请求是所述虚拟块存储客户端的内 核从所述内核注册的请求队列中获取的;所述内核注册的请求队列中的写操作 请求是所述虚拟块存储客户端接收到上层业务程序发送的写操作请求之后,放 入所述内核注册的请求队列中的。
13、 根据权利要求 12所述的设备, 其特征在于,
所述发送模块,还用于向所述备份卷节点设备发送检测点记录请求, 以使 所述主卷节点设备和所述备份卷节点设备更新各自的检测点。
14、 根据权利要求 11或 13所述的设备, 其特征在于, 还包括检查模块; 所述检查模块, 用于检查脏数据块列表中是否有脏数据;
所述写入模块,还用于当所述脏数据块列表中有脏数据, 且满足预定条件 之后, 将所述脏数据写到磁盘中。
15、 根据权利要求 12所述的设备, 其特征在于,
所述接收模块,还用于当所述虚拟块存储客户端与所述备份卷节点设备之 间的链路发生异常时, 接收所述虚拟块存储客户端发送的链路异常消息; 所述发送模块,还用于将所述接收模块接收的链路异常消息转发给所述备 份卷节点设备,以使所述备份卷节点设备将所述写入操作的结果发送给所述主 卷节点设备,再由所述发送模块将所述写入操作的结果发送给所述虚拟块存储 客户端的内核;所述链路异常消息由所述虚拟块存储客户端的心跳进程发送给 所述虚拟块存储客户端。
16、 根据权利要求 12所述的设备, 其特征在于, 还包括更新模块; 更新模块,用于当所述主卷节点设备和所述备份卷节点设备之间的链路发 生异常时,在所述写入模块将所述待写入数据写入所述主卷节点设备的卷文件 之后, 更新所述主卷节点设备的检测点和单节点位图;
所述发送模块,还用于向所述虚拟块存储客户端的内核上报写入操作的结 果。
17、 一种虚拟块存储客户端设备, 其特征在于, 包括:
连接建立模块,用于根据预先指定的备份卷的监听端口与所述预先指定的 备份卷建立套接字连接,以及根据预先指定的主卷的监听端口与所述预先指定 的主卷建立套接字连接;
获得模块, 用于获得所述预先指定的备份卷的卷大小、检测点和单节点位 图, 以及获得所述预先指定的主卷的卷大小、 检测点和单节点位图;
比较模块,用于比较所述预先指定的主卷和所述预先指定的备份卷的检测 点;
确定模块,用于根据所述比较模块的比较结果确定最新检测点对应的卷为 真正的主卷, 次新检测点对应的卷为真正的备份卷;
注册模块,用于向所述虚拟块存储客户端设备的内核注册所述真正的主卷 和所述真正的备份卷的角色,以及当所述真正的主卷与所述真正的备份卷之间 的链路正常时,将所述连接建立模块与所述预先指定的备份卷建立的套接字连 接的描述符,以及所述连接建立模块与所述预先指定的主卷建立的套接字连接 的描述符注册到所述虚拟块存储客户端设备的内核;
调用模块, 用于调用系统函数进入内核态线程,在所述内核态线程中处理 上层业务程序发送的写操作请求。
18、 根据权利要求 17所述的设备, 其特征在于, 还包括:
请求发送模块,用于当所述真正的主卷与所述真正的备份卷之间的链路正 常时,如果所述真正的主卷与所述真正的备份卷之间的数据不同步, 则向所述 真正的主卷发送数据同步请求,以使所述真正的主卷与所述真正的备份卷进行 数据同步。
19、 根据权利要求 17所述的设备, 其特征在于, 还包括:
主备倒换模块, 用于当所述主卷发生异常时,在所述虚拟块存储客户端设 备的内核进行主备倒换, 将所述备份卷注册为新的主卷, 以使所述新的主卷通 过所述新的主卷与所述虚拟块存储客户端之间的套接字连接接收待写入数据, 并在将所述待写入数据写入所述新的主卷的卷文件之后,更新所述新的主卷的 检测点和单节点位图,以及向所述虚拟块存储客户端设备的内核上报写入操作 的结果。
20、 根据权利要求 19所述的设备, 其特征在于, 还包括结果接收模块和结 果发送模块;
所述结果接收模块, 用于接收写入操作的结果;
所述确定模块,还用于确定所述结果接收模块接收的写入操作的结果是否 与已发送的写操作请求对应;
所述结果发送模块,用于在所述确定模块确定所述结果接收模块接收的写 入操作的结果与已发送的写操作请求对应之后,将所述写入操作的结果发送给 所述上层业务程序。
21、 一种备份卷节点设备, 其特征在于, 包括:
数据接收模块,用于接收主卷节点设备通过所述主卷节点设备与备份卷节 点设备之间的套接字连接发送的待写入数据;
数据写入模块,用于将所述数据接收模块接收的待写入数据写入所述备份 卷节点设备的卷文件;
结果上报模块, 用于向虚拟块存储客户端设备的内核上报写入操作的结 果。
22、 根据权利要求 21所述的设备, 其特征在于, 还包括:
请求接收模块, 用于在所述数据接收模块接收待写入数据之前,接收所述 主卷节点设备发送的写操作请求,所述写操作请求用于通知所述备份卷节点设 备准备接收所述待写入数据。
23、 根据权利要求 22所述的设备, 其特征在于,
所述请求接收模块, 还用于在所述结果上报模块上报写入操作的结果之 后,接收所述主卷节点设备发送的检测点记录请求, 以更新所述备份卷节点设 备的检测点。
24、 一种存储系统, 其特征在于, 包括: 如权利要求 11-16任意一项所述 的主卷节点设备、 如权利要求 17-20任意一项所述的虚拟块存储客户端设备和 如权利要求 21-23任意一项所述的备份卷节点设备。
PCT/CN2011/078476 2011-01-19 2011-08-16 数据存储方法、设备和系统 WO2012097588A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110021715.1 2011-01-19
CN 201110021715 CN102088490B (zh) 2011-01-19 2011-01-19 数据存储方法、设备和系统

Publications (1)

Publication Number Publication Date
WO2012097588A1 true WO2012097588A1 (zh) 2012-07-26

Family

ID=44100102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/078476 WO2012097588A1 (zh) 2011-01-19 2011-08-16 数据存储方法、设备和系统

Country Status (2)

Country Link
CN (1) CN102088490B (zh)
WO (1) WO2012097588A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102088490B (zh) * 2011-01-19 2013-06-12 华为技术有限公司 数据存储方法、设备和系统
CN102291466B (zh) * 2011-09-05 2014-02-26 浪潮电子信息产业股份有限公司 一种优化集群存储网络资源配置的方法
TWI676898B (zh) * 2013-12-09 2019-11-11 安然國際科技有限公司 分散式記憶體磁碟群集儲存系統運作方法
CN105940658B (zh) * 2015-01-04 2019-04-26 华为技术有限公司 一种用户数据的传输方法、装置及终端
CN108804248B (zh) * 2017-04-28 2021-07-06 南京壹进制信息科技有限公司 一种卷实时保护数据的自动校验方法
CN110879760B (zh) * 2018-09-05 2022-09-02 北京鲸鲨软件科技有限公司 一种统一存储系统及方法、电子设备
CN110837442B (zh) * 2019-11-14 2022-08-16 北京京航计算通讯研究所 基于脏数据位图和网络块设备的kvm虚拟机备份系统
CN110837441A (zh) * 2019-11-14 2020-02-25 北京京航计算通讯研究所 基于脏数据位图和网络块设备的kvm虚拟机备份方法
CN112559445B (zh) * 2020-12-11 2022-12-27 上海哔哩哔哩科技有限公司 数据写入方法及装置
CN113032768B (zh) * 2021-03-31 2021-11-16 广州锦行网络科技有限公司 鉴权方法、装置、设备及计算机可读介质
CN113721857B (zh) * 2021-09-05 2023-08-25 苏州浪潮智能科技有限公司 一种双活存储系统管理方法、设备及存储介质
CN117421160B (zh) * 2023-11-01 2024-04-30 广州鼎甲计算机科技有限公司 数据备份方法、装置、计算机设备和存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050216535A1 (en) * 2004-03-29 2005-09-29 Nobuyuki Saika Backup method, storage system, and program for backup
US20080059734A1 (en) * 2006-09-06 2008-03-06 Hitachi, Ltd. Storage subsystem and back-up/recovery method
CN101291205A (zh) * 2008-06-16 2008-10-22 杭州华三通信技术有限公司 传输备份数据的方法、系统、镜像服务器和客户端
CN101478560A (zh) * 2008-01-04 2009-07-08 宏正自动科技股份有限公司 数据备份装置及其系统
US20090204700A1 (en) * 2008-02-07 2009-08-13 Gosukonda Naga Venkata Satya Sudhakar Coordinated peer-to-peer (p2p) replicated backup and versioning
US20090228671A1 (en) * 2008-03-04 2009-09-10 Takeshi Shigemura Storage apparatus and control method thereof
CN101706805A (zh) * 2009-10-30 2010-05-12 中国科学院计算技术研究所 对象存储方法及其系统
CN101808127A (zh) * 2010-03-15 2010-08-18 成都市华为赛门铁克科技有限公司 数据备份方法、系统和服务器
CN102088490A (zh) * 2011-01-19 2011-06-08 华为技术有限公司 数据存储方法、设备和系统


Also Published As

Publication number Publication date
CN102088490B (zh) 2013-06-12
CN102088490A (zh) 2011-06-08


Legal Events

Date Code Title Description
     121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11856204; Country of ref document: EP; Kind code of ref document: A1)
     NENP Non-entry into the national phase (Ref country code: DE)
     122  Ep: pct application non-entry in european phase (Ref document number: 11856204; Country of ref document: EP; Kind code of ref document: A1)