WO2022053033A1 - Active-active storage system and data processing method therefor - Google Patents

Active-active storage system and data processing method therefor

Info

Publication number
WO2022053033A1
WO2022053033A1 (application PCT/CN2021/117843, CN2021117843W)
Authority
WO
WIPO (PCT)
Prior art keywords
storage device
file
node
data
virtual
Application number
PCT/CN2021/117843
Other languages
English (en)
French (fr)
Inventor
杜翔
翁宇佳
李小华
张鹏
Original Assignee
Huawei Technologies Co., Ltd.
Priority claimed from CN202011628940.7A (published as CN114168066A)
Application filed by Huawei Technologies Co., Ltd.
Priority to JP2023516240A (published as JP2023541069A)
Priority to BR112023003725A (published as BR112023003725A2)
Priority to EP21866085.0A (published as EP4198701A4)
Publication of WO2022053033A1
Priority to US18/178,541 (published as US20230205638A1)


Classifications

    • G06F 3/065 Replication mechanisms
    • G06F 3/0617 Improving the reliability of storage systems in relation to availability
    • G06F 3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F 3/0626 Reducing size or complexity of storage systems
    • G06F 3/0643 Management of files
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 Management of the data involved in backup or backup restore
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1469 Backup restoration techniques
    • G06F 2201/815 Virtual
    • G06F 2201/84 Using snapshots, i.e. a logical point-in-time copy of the data

Definitions

  • The present application relates to the field of storage, and in particular to a dual-active storage system and a method for processing data.
  • In the related art, multiple storage devices may form a network storage cluster, such as a Network Attached Storage (NAS) cluster.
  • In the related art, when active-active is implemented, a first storage device that receives write data writes the received data locally and, at the same time, synchronizes it to the peer storage device as backup data. When the first storage device fails, or the first storage device is disconnected from the second storage device, the second storage device can use the backup data to take over the service of the first storage device, ensuring that the service is not interrupted. This, however, is dual-active in Active-Passive mode; a true Active-Active mode is not realized.
  • The present application provides a dual-active storage system and a method for implementing the dual-active storage system, which are used to realize dual-active in Active-Active mode, so that the storage devices in the dual-active storage system can access data in the same file system.
  • a first aspect of the present application provides an active-active storage system.
  • the active-active storage system includes a first storage device and a second storage device.
  • The first storage device is configured to receive data of a first file that the client cluster sends to the file system, store the data of the first file, and send first copy data of the first file to the second storage device.
  • The second storage device is configured to receive data of a second file that the client cluster sends to the file system, store the data of the second file, and send second copy data of the second file to the first storage device.
  • Because both the first storage device and the second storage device can store file data through the same file system and can back up the file data of the opposite end, a dual-active storage system in Active-Active mode is realized.
  • Traditional NAS devices also have file systems, but two storage devices in Active-Passive mode each have an independent file system. Both independent file systems occupy computing/storage resources of their storage device, resulting in low resource utilization efficiency; management is also more complicated, so this is not true active-active.
  • In this application, the first storage device and the second storage device share the same file system, which improves resource utilization efficiency and reduces management complexity. Whichever storage device a client sends an access request to, the request is served by the same file system, so access efficiency for the client is also improved.
  • In one implementation, the active-active storage system further includes a virtual node set. The virtual node set includes a plurality of virtual nodes, each virtual node is allocated computing resources, and the computing resources come from physical nodes in the first storage device or the second storage device.
  • A physical node may be a control node of the first storage device or the second storage device, or may be a CPU in the control node, or a core in the CPU.
  • A virtual node is a logical concept that acts as a medium for resource allocation, achieving isolation of computing resources in the system. Under this resource management method, each virtual node is allocated independent computing resources, so the computing resources used by files/directories corresponding to different virtual nodes are also independent. This facilitates capacity expansion or reduction of the active-active storage system, enables a lock-free mechanism between computing resources, and reduces complexity.
  • In one implementation, the active-active storage system further includes a management device. The management device is configured to create a global view, where the global view records the correspondence between each virtual node and the computing resources allocated to it. The management device is further configured to send the global view to the first storage device and the second storage device, and the first storage device and the second storage device save the global view.
  • the management device can be used as a software module installed on the first storage device or the second storage device, or can be an independent device.
  • For example, when the management device is a software module installed on the first storage device, after generating the global view it sends the global view to the first storage device and the second storage device for storage by interacting with other modules in the storage devices.
  • Through the global view, the virtual nodes in the virtual node set are presented to the applications in the first storage device and the second storage device, so that the applications in each storage device can use the physical resources of the opposite end as if they were resources of the local end, which facilitates interaction with the peer physical nodes.
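  • As an illustration of this correspondence, the following Python sketch pictures the global view as a small data structure; the class and field names are our own, not taken from the patent. It maps each virtual node to the computing resource backing it, lets any physical node look up where a peer virtual node lives, and can drop the virtual nodes of a failed device, which is how the global view is modified during failover later in this document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComputeResource:
    device: str   # which storage device provides the resource, e.g. "storage_dev_1"
    node: str     # physical node (controller / CPU / core) backing the vnode
    ip: str       # address used to reach the physical node

class GlobalView:
    """Records the virtual-node -> computing-resource correspondence."""

    def __init__(self):
        self._mapping = {}   # vnode name -> ComputeResource

    def add_vnode(self, vnode: str, resource: ComputeResource) -> None:
        self._mapping[vnode] = resource

    def resource_of(self, vnode: str) -> ComputeResource:
        return self._mapping[vnode]

    def remove_device(self, device: str) -> None:
        # Used during failover: drop every vnode backed by the failed device.
        self._mapping = {v: r for v, r in self._mapping.items()
                         if r.device != device}

# The management device builds the view once and pushes a copy to every
# physical node in both storage devices.
view = GlobalView()
view.add_vnode("Vnode A", ComputeResource("storage_dev_1", "node_A", "10.0.0.1"))
view.add_vnode("Vnode C", ComputeResource("storage_dev_2", "node_C", "10.0.1.1"))
assert view.resource_of("Vnode C").device == "storage_dev_2"
```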
  • When storing the data of the first file, the first storage device determines, according to the address of the data of the first file, a first virtual node corresponding to the first file; determines, according to the first virtual node and the global view, the computing resources allocated to the first virtual node; and sends the data to the physical node corresponding to those computing resources, and that physical node stores the data of the first file in its memory.
  • The first storage device can receive data of files that belong to the physical node corresponding to any virtual node in the virtual node set, and forward the received file data to the physical node to which the file belongs. In this way, when writing data, the user does not need to know the actual storage location of the file and can operate on the file through any storage device.
  • In one implementation, the first virtual node has at least one backup virtual node, and the physical node corresponding to the first virtual node and the physical node corresponding to the backup virtual node are located on different physical nodes.
  • After determining the first virtual node corresponding to the first file, the first storage device further determines the backup virtual node corresponding to the first virtual node; determines, according to the backup virtual node and the global view, the physical node corresponding to the backup virtual node; and sends the first copy data to that physical node, and the physical node corresponding to the backup virtual node stores the first copy data in its memory.
  • In this way, when the first storage device fails, its service can be taken over through the backup data, thereby improving the reliability of the system.
  • the files and directories included in the file system are distributed among physical nodes corresponding to multiple virtual nodes in the virtual node set.
  • Distributing the files and directories included in the file system among the physical nodes corresponding to multiple virtual nodes in the virtual node set specifically means distributing the files and directories of the file system to multiple physical nodes for processing.
  • In this way, the physical resources of the first storage device and the second storage device can be fully utilized, and file processing efficiency can be improved.
  • In one implementation, each virtual node in the virtual node set is assigned one or more fragment (shard) identifiers, and each directory and file in the file system is allocated a fragment identifier. The physical nodes in the first storage device and the second storage device distribute each directory and file, according to its fragment identifier, to the physical node corresponding to the virtual node to which that fragment identifier belongs. In this way, the files and directories of the file system can be conveniently distributed across the physical nodes of the first storage device and the second storage device.
  • In one implementation, a first physical node in the first storage device is configured to receive a creation request for a first file, select a fragment identifier for the first file from the one or more fragment identifiers set for the virtual node corresponding to the first physical node, and create the first file in the storage device.
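  • Purely as an illustration (the shard ranges follow FIG. 3B below, and the function name is an assumption of ours), the following sketch shows how a fragment identifier attached to a file or directory resolves to the virtual node, and through the global view the physical node, that should process it:

```python
# Hypothetical shard layout: each virtual node owns a contiguous shard range,
# mirroring the shard view of FIG. 3B (4096 shards split over four vnodes).
SHARD_VIEW = {
    "Vnode A": range(0, 1024),
    "Vnode B": range(1024, 2048),
    "Vnode C": range(2048, 3072),
    "Vnode D": range(3072, 4096),
}

def vnode_for_shard(shard_id: int) -> str:
    """Find the virtual node that owns a given fragment (shard) identifier."""
    for vnode, shards in SHARD_VIEW.items():
        if shard_id in shards:
            return vnode
    raise ValueError(f"shard {shard_id} not in the shard view")

# A directory or file carries its shard ID in its handle; routing a request is
# a shard-view lookup followed by a global-view lookup (see earlier sketch).
assert vnode_for_shard(0) == "Vnode A"        # e.g. the root directory, shard 0
assert vnode_for_shard(2500) == "Vnode C"
```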
  • When the second storage device fails, or the link between the first storage device and the second storage device is disconnected, the first storage device is further configured to restore the second file based on the second copy data of the second file, and to take over the service that the client cluster sends to the second storage device.
  • In this way, when one of the storage devices fails, its service can be taken over by means of the backup data, thereby improving the reliability of the system.
  • the first storage device is further configured to delete the virtual node corresponding to the computing resource of the second storage device from the global view.
  • In one implementation, the first storage device further has a first file system of its own, and the second storage device further has a second file system of its own, in addition to the shared cluster file system.
  • Running both the local file system and the clustered file system on the same storage device provides users with multiple ways to access data in the storage device.
  • A second aspect of the present application provides a method for implementing an active-active file system, and the method includes steps for implementing each function performed by the first storage device and the second storage device in the active-active storage system provided in the first aspect of the present application.
  • A third aspect of the present application provides a management device. The management device is configured to create a global view that records the correspondence between each virtual node and its allocated computing resources, and to send the global view to the first storage device and the second storage device for storage.
  • The management device is further configured to monitor changes of the virtual nodes in the first storage device and the second storage device. When it detects that a new virtual node is added to the virtual cluster, or that a virtual node is deleted (for example, because the physical node corresponding to the virtual node fails), it updates the global view.
  • the monitoring module can monitor changes of virtual nodes in the virtual node cluster in real time, so as to update the global view in time.
  • a fourth aspect of the present application provides a storage medium for storing program instructions, where the program instructions are used to implement various functions provided by the management device provided in the third aspect.
  • Figure 1 is an architecture diagram of an active-active storage system in Active-passive mode.
  • FIG. 2 is an architectural diagram of a dual-active storage system in an Active-Active mode provided by an embodiment of the present application.
  • FIG. 3A is a flowchart of a method for establishing an active-active storage system in an embodiment of the present application.
  • FIG. 3B is a schematic diagram of various parameters generated in the process of constructing an active-active storage system in an embodiment of the present application.
  • FIG. 4A is a flowchart of establishing a file system of the active-active storage system according to an embodiment of the present application.
  • FIG. 4B is a schematic diagram of a dual-active system constructed in an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for creating a directory in a file system according to an embodiment of the present application.
  • FIG. 6 is a flowchart of a method for querying a directory in a file system according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a method for creating a file in a file system according to an embodiment of the present application.
  • FIG. 8 is a flowchart of a method for writing data in a file in a file system according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a method for reading data of a file in a file system according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a first storage device taking over services of the second storage device in an embodiment of the present application.
  • FIG. 11 is a flowchart of a method for a first storage device to take over a service of the second storage device in an embodiment of the present application.
  • the system 10 includes a first storage device 100 and a second storage device 200 .
  • a first file system 102 is provided in the control node 101 of the first storage device 100 (the first storage device may include a plurality of control nodes, and for the convenience of description, only one is taken as an example)
  • a second file system 202 is provided in the control node 201 (the second storage device may also include a plurality of control nodes, and for convenience of description, only one is used as an example for description).
  • the first storage device 100 mounts the first file system 102 to the first client 300 .
  • the second storage device 200 mounts the second file system 202 to the second client 400 .
  • Each file system has a root directory.
  • Mounting the file system on the client by the storage device means that the storage device provides the root directory of the file system to the client, and the client sets the root directory of that file system in the client's own file system. The client can then obtain the root directory of the storage device's file system and access the storage device's file system through it.
  • the first client 300 reads and writes data through the first file system 102 , and the written data is stored as the local data 103 .
  • the backup data of the second storage device 200 is also stored in the first storage device 100 .
  • the second client 400 reads and writes data through the second file system 202 , and the written data is stored as the local data 203 .
  • The second storage device 200 also stores the backup data of the first storage device 100, that is, the peer backup data 204. In this way, after the first storage device 100 fails or its link is disconnected, the services of the first client 300 can be taken over using the peer backup data 204; that is, dual-active in Active-Passive mode is implemented.
  • the first client 300 can only access the files in the first storage device 100 through the first file system.
  • The second client 400 can only access the data in the second storage device 200 through the second file system, and cannot access the data in the first storage device 100; that is, active-active in Active-Active mode cannot be implemented.
  • The technical solutions provided by the embodiments of the present application set up a global view, which is a collection of virtual nodes. Each virtual node in the global view is allocated computing resources, and the computing resources come from physical nodes of the first storage device and the second storage device. A physical node can be a controller in the first storage device or in the second storage device, a CPU in a controller, a core in a CPU, or a server in a distributed storage system.
  • Each physical node can obtain the global view, and in addition, each physical node uses the same file system, so that the first client connected to the first storage device and the second client connected to the second storage device mount the same file system. The first client can therefore access data belonging to the file system in the second storage device through the file system and the global view.
  • the system 500 includes a first storage device 600 and a second storage device 700 .
  • the first storage device 600 includes a physical node A and a physical node B.
  • the second storage device 700 includes a physical node C and a physical node D.
  • the first storage device 600 and the second storage device 700 may include more physical nodes.
  • this embodiment only takes that each storage device includes two physical nodes as an example for description.
  • the first storage device 600 and the second storage device 700 respectively include persistent storage devices 601 and 701 composed of multiple storage disks for persistently storing data.
  • Based on the physical storage space provided by the storage disks of the persistent storage devices 601 and 701, the first storage device 600 and the second storage device 700 create a first volume 609 and a second volume 703, respectively.
  • the first storage device 600 and the second storage device 700 may store data into persistent storage devices 601 and 701 according to the first volume 609 and the second volume 703, respectively.
  • The storage disk can be, for example, a persistent storage medium such as a solid state disk (Solid State Disk, SSD) or a hard disk drive (Hard Disk Drive, HDD).
  • the structures of the physical node A, the physical node B, the physical node C, and the physical node D are the same, and only the structure of the node A is used as an example for description in this embodiment of the present application.
  • the physical node A includes a processor 602 and a memory 603 .
  • the memory 603 stores application program instructions (not shown) and data generated during the running of the processor.
  • the processor 602 executes the application program instructions to implement the active-active function of the Active-Active mode provided by the embodiment of the present application.
  • the memory 603 also stores a global view 604 , a file system 605 , cache data 606 and backup data 607 .
  • In some embodiments, each physical node includes two file systems: one is the file system shared by all physical nodes (the cluster file system), and the other is the local file system of the physical node itself.
  • The other data in the memory 603 is introduced below in conjunction with the methods for implementing active-active, such as the flowcharts shown in FIG. 5 to FIG. 9.
  • The first client 800 is connected to the first storage device 600 to access data in the first storage device 600, and the second client 900 is connected to the second storage device 700 to access data in the second storage device 700.
  • FIG. 3A is a flowchart of a method for establishing a global view provided by an embodiment of the present application.
  • Step S301 the physical node A of the first storage device 600 receives a virtual cluster establishment request sent by the client.
  • In this embodiment, the first storage device is the master array, and the physical node A in the first storage device 600 is the master node, so the physical node A processes the request.
  • Step S302 the physical node A establishes a global view 604, and synchronizes the established global view 604 to physical nodes corresponding to other virtual nodes in the global view.
  • After the first storage device 600 establishes a network connection with the second storage device 700, it acquires the identifier of each physical node in the second storage device 700 and the IP address of each physical node.
  • The node A assigns a virtual identifier to each physical node in the first storage device 600 and the second storage device 700 to identify the corresponding virtual node, and establishes a global view to record the virtual identifiers of the virtual nodes.
  • The computing resources of each physical node, such as processor resources and memory resources, are the computing resources allocated to the corresponding virtual node. In other embodiments, other physical resources, such as bandwidth, may also be allocated to each virtual node in addition to the computing resources.
  • The physical resources allocated to each virtual node are independent of each other. This makes it easier to expand a storage device: for example, when new physical resources are added to a storage device, a new virtual node is generated from the new physical resources, the number of virtual nodes increases, and the newly added virtual node is added to the global view. In distributed storage, added servers serve as new physical resources, and virtual nodes are established for them, likewise increasing the number of virtual nodes in the global view. The established global view is shown as Vcluster in FIG. 3B.
  • Virtual identifiers VNode A and VNode B are allocated for physical node A and physical node B in the first storage device 600, and virtual identifiers VNode C and VNode D are allocated for physical node C and physical node D of the second storage device 700.
  • The node A stores the global view 604 in the memory 603 and the persistent storage device 601, and then synchronizes the global view 604 to the physical nodes corresponding to the other virtual nodes (physical nodes B, C, and D) and to the persistent storage medium 701 of the second storage device 700.
  • Step S303 the physical node A generates a shard view according to the node set, and synchronizes the shard view to physical nodes corresponding to other virtual nodes in the virtual node cluster.
  • a preset number of shards such as 4096 are set for the virtual cluster, and these shards are evenly distributed to each virtual node in the global view 604, that is, a shard view is generated.
  • the produced shard view is shown as the shard view in Figure 3B.
  • The shards are used to distribute the directories and files of the file system 605 among the physical nodes corresponding to the virtual nodes in the global view 604 for storage. The specific function of the shard view is described in detail below.
  • After the shard view is generated, the physical node A stores the shard view in the local memory 603 and the persistent storage medium 601, and synchronizes it to the physical nodes corresponding to the other virtual nodes (physical nodes B, C, and D) and to the persistent storage medium 701 of the second storage device 700.
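  • A minimal sketch of the shard-view generation in step S303, assuming 4096 shards and an even contiguous split across the virtual nodes of the global view (the function name is ours):

```python
def build_shard_view(vnodes, num_shards=4096):
    """Evenly distribute shard IDs 0..num_shards-1 over the virtual nodes."""
    per_node = num_shards // len(vnodes)
    return {vnode: list(range(i * per_node, (i + 1) * per_node))
            for i, vnode in enumerate(vnodes)}

shard_view = build_shard_view(["Vnode A", "Vnode B", "Vnode C", "Vnode D"])
# Matches FIG. 3B: Vnode A owns [0, 1023], ..., Vnode C owns [2048, 3071].
assert shard_view["Vnode C"][0] == 2048 and shard_view["Vnode C"][-1] == 3071
```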
  • Step S304 the physical node A generates a data backup policy, and synchronizes the data backup policy to the physical nodes corresponding to other virtual nodes in the virtual node cluster.
  • a data backup policy may be set in this embodiment of the present application, that is, the generated data is backed up to multiple nodes.
  • the backup strategy in the embodiment of the present application is to perform three-copy backup of data, two of which are stored in two local physical nodes, and the other one is stored in a physical node of a remote storage device.
  • a set of backup nodes is set for each virtual node.
  • For example, the backup nodes of virtual node Vnode A are set as virtual nodes Vnode B and Vnode C, the backup nodes of virtual node Vnode B are Vnode A and Vnode D, and the backup nodes of virtual node Vnode D are Vnode C and Vnode B.
  • The node A stores the backup policy in the local memory 603 and the persistent storage device 601, and synchronizes the backup policy to the physical nodes corresponding to the other virtual nodes and to the persistent storage device 701 of the second storage device 700.
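  • The three-copy backup policy can be written down as a per-vnode list of backup vnodes. The sketch below is an illustration under assumptions: the backup set of Vnode C is not fully spelled out above, so it is completed symmetrically here. One copy stays on the owner, one on the other local node, and one on a node of the remote storage device.

```python
# Hypothetical encoding of the backup policy: owner vnode -> its backup vnodes.
BACKUP_POLICY = {
    "Vnode A": ["Vnode B", "Vnode C"],   # per the text above
    "Vnode B": ["Vnode A", "Vnode D"],   # per the text above
    "Vnode C": ["Vnode D", "Vnode A"],   # assumed, completing the pattern
    "Vnode D": ["Vnode C", "Vnode B"],   # per the text above
}

def replica_targets(owner_vnode: str) -> list:
    """All vnodes that must hold a copy of data owned by owner_vnode."""
    return [owner_vnode] + BACKUP_POLICY[owner_vnode]

# Data homed on Vnode A is written to Vnode A itself, to the local peer
# Vnode B, and to the remote Vnode C: three copies in total.
assert replica_targets("Vnode A") == ["Vnode A", "Vnode B", "Vnode C"]
```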
  • the establishment of the virtual cluster in FIG. 3A is performed by a management module.
  • In this embodiment, the management module is located in the first storage device as an example. After the management module generates the file system and the global view, it sends the generated file system and global view to the first storage device and the second storage device for storage.
  • The management module may also be located on an independent third-party management device. After generating the file system and the global view, the third-party management device sends them to the first storage device and the second storage device for storage, so that each physical node can obtain the global view.
  • In addition, a monitoring module monitors changes of the virtual nodes in the first storage device and the second storage device. When it detects that a new virtual node is added to the virtual cluster, or that a virtual node is deleted (for example, because the physical node corresponding to the virtual node fails), the monitoring module notifies the management module to update the global view.
  • the monitoring module may be located on the third-party management device, or may be located in the first storage device and the second storage device.
  • When the first storage device is used as the main storage device, the second storage device sends the monitored changes to the first storage device, and the management module in the first storage device updates the global view. In this way, the establishment of the virtual node cluster is completed.
  • the first storage device 600 and the second storage device 700 can establish a file system according to the request of the client. Specifically, as shown in the flowchart of FIG. 4A .
  • Step S401 physical node A receives a file system creation request.
  • The first client 800 may send the file system creation request to the first storage device 600, or may send it to the second storage device 700. If the first storage device 600 receives the request, the request is processed by the physical node A. If the second storage device 700 receives the request, it forwards the request to the physical node A of the first storage device 600 for processing.
  • Step S402 the physical node A sets a root directory for the file system.
  • When the master node sets the root directory, it first generates a mark for the root directory (by default, "/"), and then assigns identification information and a shard ID to the root directory. Since the shard view created on the master node is synchronized to all nodes, the master node obtains the shard view from its own memory and selects the shard ID for the root directory from it. As shown in FIG. 3B, each virtual node in the shard view is assigned multiple shard IDs. Therefore, to reduce cross-network and cross-node access, a shard ID is preferentially assigned to the root directory from the shard IDs of the virtual node Vnode A corresponding to physical node A. Since this is the root directory and no shard ID has been allocated yet, shard 0, for example, can be selected as the shard ID of the root directory.
  • Step S403 the physical node A sends a file system mount command to the first client 800.
  • After the root directory of the cluster file system is generated, in order to enable the first client 800 to access the file system, the physical node A mounts the file system to the file system of the first client 800.
  • The physical node A provides the root directory of the file system to the first client 800 through the mount command. When the physical node A sends the mount command, it carries the parameter information of the root directory, that is, the handle information of the root directory; the handle information carries the shard ID and identification information of the root directory.
  • Step S404 the first client 800 mounts the cluster file system to the file system of the first client 800 according to the mount command.
  • After the first client 800 receives the parameter information of the root directory of the file system, it generates a mount point on the file system of the first client and records the parameter information of the root directory of the file system at the mount point; the mount point is a segment of storage space.
  • In addition, the first client 800 may also perform data transmission with the first storage device 600 through the file system 605. Users can select the file system to be accessed according to actual needs.
  • Step S405 the physical node A allocates a virtual volume to the file system.
  • Each newly created file system is allocated a virtual volume, Vvolume 0, for storing the data written to the file system by the first client or the second client.
  • Step S406 the physical node A creates a mirrored volume pair for the virtual volume.
  • After the virtual volume Vvolume 0 is established, the physical node A first creates a local volume based on the persistent storage medium 601, such as the first volume in FIG. 2, and then creates a mirrored volume of the first volume in the second storage device, such as the second volume in FIG. 2.
  • Step S407 the physical node A generates a disk flushing policy by recording the mirrored volume pair corresponding to the virtual volume.
  • The generated flushing policy is shown as the flushing policy in FIG. 3B: the virtual volume of the file system corresponds to the mirrored volume pair (the first volume and the second volume).
  • In this way, in order to ensure the reliability of the data, the data of the file system cached in the memory can be stored both in the persistent storage medium 601 of the first storage device 600 and in the persistent storage medium 701 of the second storage device 700.
  • how to write the data in the memory into the persistent storage medium 601 and the persistent storage medium 701 according to the disk flushing strategy will be described in detail in FIG. 9 .
  • After generating the flushing policy, the physical node A stores it in the local memory 603 and the persistent storage device 601, and synchronizes it to the physical nodes corresponding to the other virtual nodes and to the persistent storage device 701 of the second storage device 700.
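  • The flushing policy is essentially a map from a file system's virtual volume to its mirrored volume pair. A minimal Python sketch with invented identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MirroredPair:
    first_volume: str    # volume in the first storage device
    second_volume: str   # its mirror in the second storage device

# Hypothetical flushing policy: virtual volume -> mirrored volume pair.
FLUSH_POLICY = {
    "Vvolume 0": MirroredPair(first_volume="volume_1", second_volume="volume_2"),
}

def flush_targets(virtual_volume: str) -> MirroredPair:
    """Where cached file-system data must be persisted on a flush."""
    return FLUSH_POLICY[virtual_volume]

pair = flush_targets("Vvolume 0")
# Cached data is written to pair.first_volume on the first device and to
# pair.second_volume on the second device, so either device can recover it.
assert pair.second_volume == "volume_2"
```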
  • The schematic diagram of the active-active storage system after the file system is created is shown in FIG. 4B: a cross-device file system, virtual volume, shard view, and global view are generated on the first storage device and the second storage device.
  • directories and files can be created and accessed based on the file system.
  • the process of creating a directory under the file system will be described with reference to the flowchart shown in FIG. 5 .
  • the root directory is taken as a parent directory, and the directory to be created is introduced as a subdirectory of the parent directory.
  • the user may access the first storage device through the first client to create the subdirectory, and may also access the second storage device through the second client to create the subdirectory.
  • When the first storage device mounts the file system to the first client, a path for the first client to access the file system is established. For example, if the first storage device mounts the file system to the first client through physical node A, the first client accesses the file system through physical node A.
  • In order to realize Active-Active access, the second storage device also mounts the file system to the file system of the second client, thereby establishing a path for the second client to access the file system. A request from the second client to access the file system is sent to the physical node through which the file system was mounted, such as physical node C.
  • the following describes the process of creating a subdirectory by using the second client to send a subdirectory creation request to the second storage device as an example.
  • Step S501 the second client sends a subdirectory creation request to the physical node C.
  • the physical node C is the master node of the second storage device 700, that is, the node that mounts the file system to the second client.
  • the subdirectory creation request includes parameter information of the parent directory and the name of the subdirectory.
  • Step S502 the physical node C receives the creation request sent by the second client, and generates parameter information for the subdirectory according to the creation request.
  • The parameter information includes identification information and a shard ID. The identification information is used to uniquely identify the subdirectory and is, for example, an object ID in the NFS file system.
  • The physical node C looks up the shard view, assigns the subdirectory a shard ID from the shard IDs recorded in the shard view, and then creates the subdirectory in the physical node corresponding to the virtual node to which that shard ID belongs. It should be noted that each directory is assigned one shard ID, but one shard ID can be assigned to multiple directories.
  • Generally, in order to reduce cross-node access, a shard ID is allocated to the subdirectory from the shard IDs of the virtual node corresponding to the physical node that receives the creation request; that is, the subdirectory is assigned a shard ID from the shards [2048, 3071] corresponding to the virtual node Vnode C. If the number of directories corresponding to the shard IDs of the virtual node Vnode C exceeds a preset threshold, the subdirectory is assigned a shard ID corresponding to another virtual node, as sketched below.
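  • A sketch of this allocation rule under stated assumptions: the threshold value and the per-shard directory counters below are invented for illustration, and the shard layout mirrors the earlier hypothetical shard view.

```python
import itertools

DIRS_PER_SHARD_LIMIT = 10_000        # assumed threshold, not from the patent
shard_dir_count = {}                 # shard ID -> number of directories using it

def pick_shard(local_vnode, shard_view):
    """Prefer a shard of the receiving node's own vnode; spill over when full."""
    candidates = itertools.chain(
        shard_view[local_vnode],                                   # local first
        *(s for v, s in shard_view.items() if v != local_vnode))   # then others
    for shard in candidates:
        if shard_dir_count.get(shard, 0) < DIRS_PER_SHARD_LIMIT:
            shard_dir_count[shard] = shard_dir_count.get(shard, 0) + 1
            return shard
    raise RuntimeError("all shards are at capacity")

# Physical node C received the creation request, so the shards [2048, 3071] of
# Vnode C are tried first; other vnodes' shards are used only past the threshold.
shard_view = {"Vnode B": list(range(1024, 2048)),
              "Vnode C": list(range(2048, 3072))}
assert pick_shard("Vnode C", shard_view) == 2048
```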
  • Step S503 the physical node C creates the subdirectory.
  • Creating the subdirectory includes generating a directory entry table (DET) and an Inode table for the subdirectory.
  • After the subdirectory is successfully created, the directory entry table records the parameter information of each subdirectory or file established under the subdirectory acting as a parent directory. The parameter information includes, for example, the name of the directory or file, its identification information, and its shard ID.
  • the Inode table is used to record the detailed information of the file subsequently created in the subdirectory, such as the file length of the file, the operation authority of the user on the file, the modification time of the file, and other information.
  • Step S504 the physical node C determines, according to the parameter information of the parent directory, the physical node B in the first storage device to which the parent directory belongs.
  • The parameter information of the parent directory includes its shard ID. Through the shard view, the virtual node corresponding to the shard ID can be determined to be the virtual node Vnode B, and the physical node corresponding to the virtual node Vnode B is then determined to be the physical node B in the first storage device.
  • Step S505 the physical node C sends the parameter information of the subdirectory and the parameter information of the parent directory to the physical node B.
  • Step S506 the physical node B finds the directory entry table of the parent directory according to the parameter information of the parent directory.
  • the parent directory can be found according to the shard ID and the parent directory name in the parameter information of the parent directory.
  • Step S507 the physical node B records the parameter information of the subdirectory in the directory entry table of the parent directory.
  • Step S508 the physical node B first returns the parameters of the subdirectory to the physical node C, and the physical node C then returns the parameters of the subdirectory to the second client.
  • For example, to access the directory favorite under the directory user1 in the file system filesystem1, the client first queries the parameter information of the subdirectory user1 according to the parameter information of the root directory of filesystem1, that is, it generates a request for querying user1; after obtaining the parameter information of user1, it then queries the parameter information of favorite according to the parameter information of user1, that is, it generates a request for querying favorite.
  • the method for querying the parameter information of the directories of each level is the same.
  • In the following, the upper-level directory is taken as the parent directory and the directory to be queried as the subdirectory, to illustrate the process of a single directory query.
  • The case where the physical node C of the second storage device receives the query request is taken as an example for description.
  • Step S601 the second client sends a subdirectory query request to the physical node C.
  • the query request carries the parameter information of the parent directory and the name of the subdirectory.
  • the parameter information of the parent directory is, for example, the handle of the parent directory.
  • the handle of the root directory is obtained from the file system of the client.
  • the handle of the parent directory can be queried through a query request for querying the parent directory.
  • the handle of the parent directory includes identification information and shardID of the parent directory.
  • Step S602 the physical node C receives the query request sent by the second client, and determines the physical node B to which the parent directory belongs according to the query request.
  • The physical node C obtains the shard ID of the parent directory (for a first-level query, the root directory) from the parameter information of the parent directory, and determines the virtual node to which the parent directory belongs according to the shard ID.
  • Since physical node A synchronized the created shard view to all nodes, physical node C obtains the shard view from its own memory, determines the virtual node to which the parent directory belongs according to the shard ID of the parent directory, and then determines the physical node corresponding to that virtual node.
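  • Steps such as S602 (and likewise S504, S702 and S802 below) reduce to two lookups: the shard view, then the global view. A schematic Python sketch using the hypothetical structures from the earlier examples:

```python
def home_node(shard_id, shard_view, global_view):
    """Resolve a directory's or file's shard ID to the physical node owning it."""
    # 1. Shard view: shard ID -> owning virtual node.
    owner_vnode = next(v for v, shards in shard_view.items() if shard_id in shards)
    # 2. Global view: virtual node -> physical node backing it.
    return global_view[owner_vnode]

shard_view = {"Vnode B": list(range(1024, 2048))}
global_view = {"Vnode B": "physical_node_B"}   # simplified vnode -> node map
# The parent directory's handle carries, say, shard 1500, so the request is
# forwarded to physical node B, as in step S603.
assert home_node(1500, shard_view, global_view) == "physical_node_B"
```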
  • Step S603 the physical node C sends the parameters of the parent directory and the name of the subdirectory to the physical node B where the parent directory is located.
  • Step S604 the physical node B determines the directory entry table of the parent directory according to the parameters of the parent directory.
  • Step S605 the physical node B obtains the parameter information of the subdirectory from the directory entry table of the parent directory.
  • Step S606 the physical node B returns the parameter information of the subdirectory to the physical node C.
  • Step S607 the physical node C returns the parameter information of the subdirectory to the second client.
  • FIG. 5 and FIG. 6 illustrate the case where the second client accesses the second storage device to create and query a subdirectory in the file system; in practical applications, the first client can also access the first storage device to create and query subdirectories.
  • Through the methods shown in FIG. 5 and FIG. 6, the first client or the second client can obtain the parameter information of the subdirectory, and can then create a file in the subdirectory according to the parameter information of the subdirectory.
  • the following describes a process in which a user accesses the first storage device through the first client and creates a file in a subdirectory, as shown in FIG. 7 .
  • Step S701 the client sends a file generation request to the physical node A.
  • the file generation request carries the parameter information and file name of the subdirectory.
  • In the directory query process, the physical node A has already sent the parameter information of the subdirectory to the client, so when the client needs to create a file in the subdirectory, the file generation request can carry the parameter information of the subdirectory and the file name of the file.
  • Step S702 after receiving the file generation request, the physical node A determines the physical node D to which the subdirectory belongs according to the parameter information of the subdirectory.
  • The manner of determining the physical node D to which the subdirectory belongs is the same as that of step S602 in FIG. 6, and is not repeated here.
  • Step S703 the physical node A sends the parameter information of the subdirectory and the file name to the physical node D.
  • Step S704 the physical node D determines whether the file has been created.
  • The physical node D finds the subdirectory according to the shard ID and the subdirectory name in the parameter information of the subdirectory, then finds the DET corresponding to the subdirectory and searches for the file name in the DET. If the file name exists, a file with the same name has already been created, and step S705 is executed; if it does not exist, the file can be created in the subdirectory, and step S706 is executed.
  • Step S705 the node D feeds back to the node A that a file with the file name has already been created, and the node A further feeds this back to the first client.
  • the first client can further notify the user through a notification message that a file with the same file name already exists, and the user can perform further operations, such as modifying the file name, according to the prompt information.
  • Step S706 the node D creates the file.
  • When the node D creates the file, it sets parameters for the file, such as assigning a shard ID and file identification information, and adds the shard ID and file identification information to the DET of the subdirectory.
  • As described above, when a subdirectory is created, an Inode table is generated for it to record the information of the files created under the subdirectory. Therefore, in this step, after the node D creates the file, the information of the file is added to the Inode table of the subdirectory.
  • the file information includes information such as the length of the file, the user's authority to operate the file, and the modification time of the file.
  • Step S707 the physical node D feeds back the file parameters.
  • the physical node D first sends the feedback information to the node A, and the node A further feeds back the feedback information to the first client.
  • In step S702, when the physical node A determines that the home node of the subdirectory is the physical node A itself, the physical node A executes the above steps S704 to S707.
  • the user can write data in the file.
  • A user can write data to the file through the first client connected to the first storage device or through the second client connected to the second storage device.
  • the following describes a process in which the user accesses the first storage device through the first client and writes data to the file as an example, as shown in FIG. 8 .
  • Step S801 physical node A receives a write request to the file.
  • Since every node stores the file system, a user can access the files in the file system through a client connected to any node.
  • The write request carries the address information of the file, where the address information includes the parameter information of the file, an offset address, and the data to be written.
  • the parameter information of the file is the handle of the file, including a file system identifier, a file identifier, and a shard ID.
  • Step S802 the physical node A determines the home node D of the file according to the write request.
  • For the method of determining the home node D of the file according to the shard ID of the file, refer to step S602 in FIG. 6, which is not repeated here.
  • Step S803 the physical node A forwards the write request to the physical node D.
  • Step S804 the physical node D converts the access to the file system into the access to the virtual volume corresponding to the file system.
  • Since the virtual volume created for the file system is recorded in every physical node, the physical node D replaces the identifier of the file system in the write request with the identifier of the virtual volume.
  • Step S805 the physical node D finds the file according to the file identifier and shard ID in the write request and updates the information of the file.
  • Specifically, the inode entry corresponding to the file is found in the Inode table according to the inode number contained in the file identifier, and the file information recorded there is updated; for example, the length and offset address of the file are updated according to the length and offset address of the data to be written carried in the write request, and the current time is recorded as the file update time.
  • Step S806 the physical node D writes multiple copies of the data to be written according to a preset backup policy.
  • As described above, a backup policy is established for the file system, and backup nodes are set for each node in the backup policy. From the backup policy, it can be determined that the backup nodes of the physical node D are the physical node C and the physical node B.
  • After the physical node D writes the to-be-written data into the local memory, it sends the to-be-written data to the physical node C and the physical node B.
  • the physical node C and the physical node B write the to-be-written data into their own memory.
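  • A schematic of the multi-copy write of step S806, assuming the hypothetical backup policy sketched earlier; send_to stands in for whatever inter-node transport the storage devices actually use:

```python
def send_to(node, data):
    # Placeholder for the real inter-node transport (RDMA, TCP, ...).
    print(f"replicating {len(data)} bytes to {node}")

def write_three_copies(backups, memory, key, data):
    """Step S806: keep one copy in local memory, push one to each backup node."""
    memory[key] = data        # local in-memory copy on the owner (node D)
    for node in backups:      # e.g. physical node C and physical node B
        send_to(node, data)
    # Only after all copies are placed does the owner acknowledge the client
    # (step S807); persistence to disk happens later, on flush (step S808).

local_memory = {}
write_three_copies(["physical_node_C", "physical_node_B"],
                   local_memory, "file_42@offset_0", b"hello")
```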
  • Step S807 after determining that the multi-copy writing is completed, the physical node D returns a write request completion message to the first client.
  • Step S808 the physical node D persistently stores the data to be written.
  • the virtual volume of the file system corresponds to the mirrored volume pair: the first volume in the first storage device and the second volume in the second storage device.
  • When the physical node D determines that the data to be written needs to be evicted to persistent storage, that is, when flushing the disk, it first obtains, according to the virtual volume recorded in the address of the data to be written, the mirrored volume pair corresponding to the virtual volume from the disk flushing policy, that is, the first volume in the first storage device and the second volume in the second storage device. It then writes the to-be-written data in its memory into the physical space corresponding to the second volume in the persistent storage 701 of the second storage device, and sends the memory address of the data to be written to the backup node B corresponding to the physical node D in the first storage device; the physical node B writes the to-be-written data stored in its memory into the physical space corresponding to the first volume in the persistent storage 601 of the first storage device according to the memory address.
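  • A simplified sketch of the flush sequence of step S808, using the flushing-policy structure sketched earlier (all function names are invented); node D sits in the second storage device and its backup node B in the first:

```python
def persist(volume, data):
    print(f"writing {len(data)} bytes into the physical space of {volume}")

def notify_backup(node, data_key, volume):
    print(f"asking {node} to flush {data_key} into {volume}")

def flush(data_key, virtual_volume, memory, flush_policy):
    """Step S808: persist one copy through each volume of the mirrored pair."""
    first_volume, second_volume = flush_policy[virtual_volume]
    # Node D (second storage device) writes its in-memory copy into the
    # physical space of the second volume in persistent storage 701.
    persist(second_volume, memory[data_key])
    # Node D then tells backup node B (first storage device) which memory
    # address to flush; B writes its copy into the first volume in storage 601.
    notify_backup("physical_node_B", data_key, first_volume)

flush("file_42@offset_0", "Vvolume 0",
      {"file_42@offset_0": b"hello"},
      {"Vvolume 0": ("volume_1", "volume_2")})
```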
  • FIG. 9 is a flowchart of a method for reading a file in an embodiment of the present application.
  • the user can also access files in the file system through any client. This embodiment is described by taking the user reading the file through the second client as an example.
  • Step S901 the physical node C receives the read request of the file.
  • the read request carries the address information of the file, and the address information includes the parameter information of the file and the offset address, and the parameter information is the handle of the file, including the file system identifier, the file identifier, and the shard ID.
  • the parameter information of the file has been acquired according to the method shown in FIG. 6 .
  • Step S902 the physical node C determines the home node B of the file according to the read request.
  • For the way of determining the home node B of the file, refer to the description of step S602 in FIG. 6, which is not repeated here.
  • Step S903 the physical node C forwards the read request to the home node B.
  • Step S904 the physical node B converts the access of the read request to the file system into an access to the virtual volume of the file system.
  • Step S905 the physical node B reads the file from the memory of the physical node B according to the address in the read request.
  • Step S906 the physical node B returns the file.
  • Step S907: when the file is not in the memory, the physical node B reads the file from persistent storage 601 according to the first volume in the first storage device that corresponds to the virtual volume in the disk-flushing policy, and returns it to the physical node C; the physical node C then returns the file to the second client.
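A sketch of this memory-first read with fallback to the persistent volume; the cache and volume interfaces are illustrative assumptions.

```python
def read_file(request, memory_cache, flush_policy, read_volume):
    """Serve a read from the home node's memory, else from the persistent volume."""
    key = (request["file_id"], request["offset"])
    if key in memory_cache:                      # step S905: hit in node B's memory
        return memory_cache[key]
    # Step S907: miss, so read from the first volume that the disk-flushing
    # policy maps to this file system's virtual volume.
    first_volume, _second_volume = flush_policy[request["virtual_volume"]]
    return read_volume(first_volume, request["offset"], request["length"])
```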
  • In this embodiment, when the first storage device and the second storage device access files and directories, they forward each access request to the home node of the file or directory based on the shard ID, which results in cross-device data access and thus affects access efficiency. In a possible implementation provided by this embodiment, because both the first storage device and the second storage device back up the data of the opposite end, when a storage device receives a request to access the opposite end's data, it can obtain the requested data from the locally stored backup of the opposite end's data, without needing to fetch it from the opposite end, thereby improving data access efficiency.
  • When one storage device in the active-active storage system fails, its services can be taken over using the backup data. As shown in FIG. 10, when the link between the first storage device and the second storage device is disconnected, or the second storage device fails, the backup data of the second storage device stored in the first storage device can be used to take over the services of the second storage device. The following description takes the disconnection of the link between the first storage device and the second storage device as an example, as shown in the flowchart of FIG. 11.
  • Step S111: the first storage device and the second storage device each detect the heartbeat of the opposite end.
  • Step S112: when the heartbeat of the opposite end is not detected, the first storage device and the second storage device suspend the services being executed.
  • Step S113: the first storage device and the second storage device modify the global view and the file system. When the heartbeat of the peer cannot be detected, each storage device prepares to take over the peer's services: it deletes the virtual nodes corresponding to the peer's physical nodes from the global view, and deletes the peer's backup nodes from the backup policy. For example, the first storage device modifies the global view to (Vnode A, Vnode B), and the second storage device modifies the global view to (Vnode C, Vnode D). In addition, in the shard view of the file system, the shards of the virtual nodes corresponding to the peer nodes are reassigned to the virtual nodes corresponding to the local nodes.
  • Step S114: both the first storage device and the second storage device send an arbitration request to the arbitration device.
  • Step S115: the arbitration device arbitrates that the first storage device takes over the services. The arbitration device may determine the takeover device according to the order in which the arbitration requests are received; for example, the storage device corresponding to the first received arbitration request becomes the takeover device.
  • Step S116: the arbitration device notifies the first storage device and the second storage device of the arbitration result respectively.
  • Step S117: after receiving the notification, the second storage device releases the connection with the second client, that is, stops executing the services.
  • Step S118: after receiving the notification, the first storage device floats the IP address of the second storage device over to the first storage device and establishes a connection with the second client.
  • Step S119: the first storage device takes over the services of the second storage device through the backup data of the second storage device.
  • Because the backup data of the second storage device is stored in the first storage device, when the first storage device receives an access to data that belongs to the second storage device, it can use the shard ID in the access request to direct the access requests of the first client and the second client for data in the second storage device to the backup data, so that the first client and the second client do not perceive the link interruption.
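A sketch of how the surviving device can keep serving after takeover: requests whose shard IDs used to belong to the peer are simply resolved against the local backup data. The names are illustrative assumptions.

```python
def route_after_takeover(request, local_shards, local_data, peer_backup_data):
    """Serve a request on the surviving device after the peer is gone.

    local_shards: shard IDs originally owned by this device's virtual nodes
    peer_backup_data: the peer's data replicated here by the backup policy
    """
    key = (request["file_id"], request["offset"])
    if request["shard_id"] in local_shards:
        return local_data[key]
    # The shard belonged to the failed peer: answer from its local backup,
    # so clients never observe the link interruption.
    return peer_backup_data[key]
```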
  • During subsequent data access, because the backup policy and the disk-flushing policy have been modified, newly written data is written only into the memory of the nodes of the first storage device and is stored only in the volume of the first storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

An active-active storage system and a data processing method based on the active-active storage system. The active-active storage system includes a first storage device and a second storage device. The first storage device is configured to receive data of a first file sent by a client cluster to a file system, store the data of the first file, and send first copy data of the data of the first file to the second storage device. The second storage device is configured to receive data of a second file sent by the client cluster to the file system, store the data of the second file, and send second copy data of the second file to the first storage device. Because both the first storage device and the second storage device can store file data through the same file system and can back up the file data of the opposite end, an Active-Active mode storage system is implemented.

Description

Active-active storage system and data processing method thereof
Technical field
This application relates to the storage field, and in particular to an active-active storage system and a data processing method thereof.
Background
For a network storage cluster, for example a network attached storage (Network Attached Storage, NAS) cluster, when dual-activity is implemented, the first storage device, upon receiving write data, writes the received data locally and at the same time synchronizes it to the opposite-end storage device as backup data. In this way, when the first storage device fails or is disconnected from the second storage device, the second storage device can use the backup data to take over the services of the first storage device, ensuring that services are not interrupted; that is, Active-Passive mode dual-activity is implemented, but Active-Active mode dual-activity cannot be implemented.
Summary
This application provides an active-active storage system and a method for implementing the active-active storage system, which are used to implement Active-Active mode dual-activity, so that the storage devices in the active-active storage system can access data in the same file system.
A first aspect of this application provides an active-active storage system. The active-active storage system includes a first storage device and a second storage device. The first storage device is configured to receive data of a first file sent by a client cluster to a file system, store the data of the first file, and send first copy data of the data of the first file to the second storage device. The second storage device is configured to receive data of a second file sent by the client cluster to the file system, store the data of the second file, and send second copy data of the second file to the first storage device.
Because both the first storage device and the second storage device can store file data through the same file system and can back up the file data of the opposite end, an Active-Active mode storage system is implemented. Conventional NAS devices also have file systems, but in Active-Passive mode the two storage devices each own an independent file system; both independent file systems consume the computing/storage resources of the storage devices, making resource utilization inefficient and management complex, which is not dual-activity in a true sense. In this application, the first storage device and the second storage device share the same file system, which improves resource utilization and reduces management complexity. In addition, when a client sends an access request to a storage device, from the client's perspective the request is sent to the same file system, so access efficiency is also improved for the client.
In a possible implementation of the first aspect, the active-active storage system further includes a virtual node set, the virtual node set includes multiple virtual nodes, each virtual node is allocated computing resources, and the computing resources come from physical nodes in the first storage device or the second storage device.
A physical node may be a control node of the first storage device or the second storage device, or a CPU in a control node, or a core in a CPU. A virtual node is a logical concept; as a medium of resource allocation, it is used to isolate the computing resources in the system. Under this resource management approach, each virtual node is allocated independent computing resources, so the computing resources used by files/directories corresponding to different virtual nodes are also independent. This facilitates capacity expansion or reduction of the active-active storage system, facilitates a lock-free mechanism between computing resources, and reduces complexity.
In a possible implementation of the first aspect, the active-active storage system further includes a management device. The management device is configured to create a global view, where the global view records the correspondence between each virtual node and its allocated computing resources; the management device is further configured to send the global view to the first storage device and the second storage device, and the first storage device and the second storage device store the global view.
The management device may be a software module installed on the first storage device or the second storage device, or may be an independent device. When it is a software module installed on the first storage device, after generating the global view it sends, through interaction with the other modules in the storage device, the global view to the first storage device and the second storage device for storage.
By presenting the virtual nodes of the virtual node set to the applications in the first storage device and the second storage device through the global view, the applications in each storage device treat the opposite end's physical nodes as local resources, which makes interaction with the opposite end's physical nodes more convenient.
In a possible implementation of the first aspect, when storing the data of the first file, the first storage device determines, according to the address of the data of the first file, the first virtual node corresponding to the first file; determines, according to the first virtual node and the global view, the computing resources allocated to the first virtual node; and based on the computing resources allocated to the first virtual node, sends the data of the first file to the physical node corresponding to those computing resources, and that physical node stores the data of the first file in its memory.
Through the virtual node set provided by the global view, the first storage device can receive data of files belonging to the physical node corresponding to any virtual node in the virtual node set, and forward the received file data to the physical node to which the file belongs for processing. In this way, when writing data, a user does not need to be aware of where a file is actually stored and can operate on the file through any storage device.
In a possible implementation of the first aspect, the first virtual node has at least one backup virtual node, and the physical node corresponding to the first virtual node and the physical node corresponding to the backup virtual node are located in different storage devices. After determining the first virtual node corresponding to the first file, the first storage device further determines the backup virtual node corresponding to the first virtual node; determines, according to the backup virtual node and the global view, the physical node corresponding to the backup virtual node; and sends the first copy data to the physical node corresponding to the backup virtual node, which stores the first copy data in that physical node.
By backing up the data of files written to the first storage device into the second storage device, after the first storage device fails or is disconnected from the second storage device, the services of the first storage device can be taken over through the backup data, thereby improving the reliability of the system.
In a possible implementation of the first aspect, the files and directories included in the file system are distributed among the physical nodes corresponding to the multiple virtual nodes in the virtual node set.
Distributing the files and directories of the file system among the physical nodes corresponding to multiple virtual nodes in the virtual node set specifically means scattering the files and directories of the file system across multiple physical nodes for processing. This makes full use of the physical resources of the first storage device and the second storage device and improves file processing efficiency.
In a possible implementation of the first aspect, each virtual node in the virtual node set is provided with one or more shard identifiers, each directory and file in the file system is assigned one shard identifier, and the physical nodes in the first storage device and the second storage device distribute the directories and files, according to the shard identifier of each directory and file, to the physical node corresponding to the virtual node to which the shard identifier belongs.
The shard identifiers make it more convenient to distribute the files and directories of the file system among the physical nodes of the first storage device and the second storage device.
In a possible implementation of the first aspect, a first physical node in the first storage device is configured to receive a creation request for the first file, select a shard identifier for the first file from the one or more shard identifiers set for the virtual node corresponding to the first physical node, and create the first file in the storage device.
When creating a file, assigning the file a shard identifier of the virtual node corresponding to the physical node that received the file creation request avoids forwarding the file creation request to other physical nodes, thereby improving processing efficiency.
In a possible implementation of the first aspect, after the second storage device fails or the link between the first storage device and the second storage device is disconnected, the first storage device is further configured to restore the second file based on the second copy data of the second file and take over the services sent by the client cluster to the second storage device.
After a storage device fails or the connection between the two devices is broken, the services of the failed device can be taken over through the backup data, thereby improving the reliability of the system.
In a possible implementation of the first aspect, the first storage device is further configured to delete, from the global view, the virtual nodes corresponding to the computing resources of the second storage device.
In a possible implementation of the first aspect, the first storage device further has a first file system, and the second storage device further has a second file system.
Running a local file system and the cluster's file system simultaneously in the same storage device provides users with multiple ways to access the data in the storage device.
A second aspect of this application provides a method for implementing an active-active file system. The steps of the method are used to implement the functions performed by the first storage device and the second storage device in the active-active storage system provided in the first aspect of this application.
A third aspect of this application provides a management device. The management device is configured to create a global view, where the global view records the correspondence between each virtual node and its allocated computing resources, and is further configured to send the global view to the first storage device and the second storage device for storage.
The management device is configured to monitor changes of the virtual nodes in the first storage device and the second storage device; when it detects that a new virtual node has been added to the virtual cluster, or that a virtual node has been deleted, for example because the physical node corresponding to the virtual node has failed, it updates the global view.
Through the monitoring module, changes of the virtual nodes in the virtual node cluster can be monitored in real time, so that the global view is updated promptly.
A fourth aspect of this application provides a storage medium for storing program instructions, where the program instructions are used to implement the functions provided by the management device of the third aspect.
Brief description of drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art.
FIG. 1 is an architecture diagram of an Active-Passive mode storage system.
FIG. 2 is an architecture diagram of the Active-Active mode storage system provided by an embodiment of this application.
FIG. 3A is a flowchart of a method for establishing an active-active storage system in an embodiment of this application.
FIG. 3B is a schematic diagram of the parameters generated in the process of building the active-active storage system in an embodiment of this application.
FIG. 4A is a flowchart of establishing the file system of the active-active storage system in an embodiment of this application.
FIG. 4B is a schematic diagram of the active-active system built in an embodiment of this application.
FIG. 5 is a flowchart of a method for creating a directory in the file system in an embodiment of this application.
FIG. 6 is a flowchart of a method for querying a directory in the file system in an embodiment of this application.
FIG. 7 is a flowchart of a method for creating a file in the file system in an embodiment of this application.
FIG. 8 is a flowchart of a method for writing data to a file in the file system in an embodiment of this application.
FIG. 9 is a flowchart of a method for reading a file in the file system in an embodiment of this application.
FIG. 10 is a schematic diagram of the first storage device taking over the services of the second storage device in an embodiment of this application.
FIG. 11 is a flowchart of a method for the first storage device to take over the services of the second storage device in an embodiment of this application.
Detailed description of embodiments
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application. Apparently, the described embodiments are merely some rather than all of the embodiments of this application.
FIG. 1 is a schematic architecture diagram of an Active-Passive mode system. The system 10 includes a first storage device 100 and a second storage device 200. A first file system 102 is provided in a control node 101 of the first storage device 100 (the first storage device may include multiple control nodes; for ease of description, only one is used as an example), and a second file system 202 is provided in a control node 201 of the second storage device 200 (the second storage device may also include multiple control nodes; for ease of description, only one is used as an example). After a first client 300 connects to the first storage device 100, the first storage device 100 mounts the first file system 102 to the first client 300. After a second client 400 connects to the second storage device 200, the second storage device 200 mounts the second file system 202 to the second client 400. Every file system has a root directory. Mounting a file system to a client means that the storage device provides the root directory of its file system to the client, and the client places that root directory in the client's own file system, so that the client can obtain the root directory of the storage device's file system and access the storage device's file system through it. Thus, after the first file system 102 is mounted to the first client 300, the first client 300 reads and writes data through the first file system 102, and written data is stored as local data 103. In addition, the first storage device 100 also stores the backup data of the second storage device 200, namely peer backup data 104. Similarly, the second client 400 reads and writes data through the second file system 202, and written data is stored as local data 203; the second storage device 200 also stores the backup data of the first storage device 100, namely peer backup data 204. In this way, after the first storage device 100 fails or the connection to it is broken, the second client can use the peer backup data 204 to take over the services of the first client 300, that is, Active-Passive mode dual-activity is implemented. However, in the Active-Passive system 10, while both the first storage device 100 and the second storage device 200 operate normally, the first client 300 can access only the data in the first storage device 100 through the first file system and cannot access the data in the second storage device 200, and the second client 400 can access only the data in the second storage device 200 through the second file system and cannot access the data in the first storage device 100; that is, Active-Active mode dual-activity cannot be implemented.
The technical solutions provided by the embodiments of this application set up a global view, which is a set of virtual nodes. Each virtual node in the global view is allocated computing resources, and the computing resources come from the physical nodes of the first storage device and the second storage device. A physical node may be a controller of the first storage device or of the second storage device, a CPU in a controller, a core in a CPU, or a server in a distributed storage system. In the embodiments of this application, every physical node can obtain the global view, and every physical node also uses the same file system. In this way, the first client connected to the first storage device and the second client connected to the second storage device mount the same file system, so the first client can access, through the file system and the global view, the data in the second storage device that belongs to the file system. The solutions provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 2 is an architecture diagram of the Active-Active mode system 500 provided by an embodiment of this application. The system 500 includes a first storage device 600 and a second storage device 700. The first storage device 600 includes physical node A and physical node B. The second storage device 700 includes physical node C and physical node D. In practical applications, the first storage device 600 and the second storage device 700 may include more physical nodes; for ease of description, this embodiment uses two physical nodes per storage device as an example. The first storage device 600 and the second storage device 700 respectively include persistent storage 601 and 701, each composed of multiple storage disks, for persistently storing data. Based on the physical storage space provided by the storage disks of persistent storage 601 and 701, the first storage device 600 and the second storage device 700 create a first volume 609 and a second volume 703 respectively, through which data can be stored into persistent storage 601 and 701. The storage disks may be persistent storage media such as solid state disks (Solid State Disk, SSD) or hard disk drives (Hard Disk Drive, HDD).
Physical node A, physical node B, physical node C, and physical node D have the same structure; this embodiment uses the structure of node A as an example. Physical node A includes a processor 602 and a memory 603. The memory 603 stores application program instructions (not shown) and data generated while the processor is running. The processor 602 executes the application program instructions to implement the Active-Active dual-activity function provided by the embodiments of this application. In addition to the first file system 608, the memory 603 also stores a global view 604, a file system 605, cached data 606, and backup data 607. The function of the first file system 608 is the same as that of the first file system 102 in FIG. 1 and is not repeated here. That is, in this embodiment of this application, each physical node includes two file systems: one is the file system shared by all physical nodes, and the other is each physical node's own file system. The other data in the memory 603 is introduced in detail together with the methods for implementing dual-activity, for example the flowcharts shown in FIG. 5 to FIG. 9. The first client 800 connects to the first storage device 600 to access data in the first storage device 600, and the second client 900 connects to the second storage device 700 to access data in the second storage device 700.
The following describes, with reference to the flowcharts in FIG. 3A, FIG. 4A, and FIG. 5 to FIG. 9, the methods for implementing Active-Active mode dual-activity in the embodiments of this application.
First, FIG. 3A is a flowchart of a method for establishing a global view provided by an embodiment of this application.
Step S301: physical node A of the first storage device 600 receives a virtual cluster establishment request sent by a client.
When an active-active system needs to be built, a global view is established; a user can send a global view establishment request to the first storage device 600 through a client. The first storage device is the primary array, and physical node A in the first storage device 600 is the primary node, so physical node A processes the request.
Step S302: physical node A establishes the global view 604 and synchronizes the established global view 604 to the physical nodes corresponding to the other virtual nodes in the global view.
After the first storage device 600 establishes a network connection with the second storage device 700, the first storage device 600 obtains the identifiers of the physical nodes in the second storage device 700 and the IP address of each physical node. When establishing the global view 604, node A allocates a virtual identifier to each physical node in the first storage device 600 and the second storage device 700 to identify a virtual node, and establishes the global view to record the virtual identifiers of the virtual nodes. The computing resources of each physical node, for example its processor resources and memory resources, are the computing resources allocated to the corresponding virtual node; in other embodiments, besides the computing resources, other physical resources such as bandwidth may also be allocated to each virtual node. In the embodiments of this application, the physical resources allocated to the virtual nodes are independent of one another. This makes it easier to expand a storage device: for example, when new physical resources are added to a storage device, new virtual nodes are generated from the new physical resources, increasing the number of virtual nodes, and the newly added virtual nodes are added to the global view. In distributed storage, an added server serves as the new physical resource, and virtual nodes are created from the added server, increasing the number of virtual nodes in the global view. The established global view is shown as Vcluster in FIG. 3B: for example, physical node A and physical node B in the first storage device 600 are allocated virtual identifiers Vnode A and Vnode B, and physical node C and physical node D of the second storage device 700 are allocated virtual identifiers Vnode C and Vnode D. After generating the global view 604, node A stores the global view 604 in the memory 603 and persistent storage 601, and then synchronizes it to the physical nodes corresponding to the other virtual nodes (physical nodes B, C, and D) and to the persistent storage 701 of the second storage device 700.
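To make the bookkeeping concrete, the following is a minimal sketch of how such a global view might be represented before being synchronized; the class and field names (GlobalView, Vnode, and so on) are illustrative assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vnode:
    """A virtual node: a virtual identifier bound to one physical node's resources."""
    vnode_id: str        # e.g. "Vnode A"
    phys_node_id: str    # e.g. "node-A" in the first or second storage device
    ip_address: str      # IP of the physical node that contributes the resources

@dataclass
class GlobalView:
    """Records the mapping of each virtual node to its allocated computing resources."""
    vnodes: dict = field(default_factory=dict)   # vnode_id -> Vnode

    def add_physical_node(self, phys_node_id: str, ip: str) -> Vnode:
        vnode_id = f"Vnode {chr(ord('A') + len(self.vnodes))}"
        vnode = Vnode(vnode_id, phys_node_id, ip)
        self.vnodes[vnode_id] = vnode
        return vnode

# Node A builds the view from all physical nodes of both storage devices,
# then synchronizes it to nodes B, C and D (transport is out of scope here).
view = GlobalView()
for node, ip in [("node-A", "10.0.0.1"), ("node-B", "10.0.0.2"),
                 ("node-C", "10.0.1.1"), ("node-D", "10.0.1.2")]:
    view.add_physical_node(node, ip)
```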
Step S303: physical node A generates a shard view according to the node set and synchronizes the shard view to the physical nodes corresponding to the other virtual nodes in the virtual node cluster.
In this embodiment of this application, a preset number of shards, for example 4096, is set for the virtual cluster. These shards are evenly divided among the virtual nodes in the global view 604, generating the shard view, as shown in FIG. 3B. The shards are used to distribute the directories and files of the file system 605 among the physical nodes corresponding to the virtual nodes in the global view 604; the specific role of the shard view is described in detail below. After the shard view is generated, physical node A stores it in the local memory 603 and persistent storage 601, and synchronizes it to the physical nodes corresponding to the other virtual nodes (physical nodes B, C, and D) and to the persistent storage 701 of the second storage device 700.
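A minimal sketch of the even division of the preset number of shards (4096 in the embodiment) among the virtual nodes of the global view; the function and variable names are assumptions for illustration.

```python
def build_shard_view(vnode_ids, shard_count=4096):
    """Evenly divide shard IDs [0, shard_count) among the virtual nodes."""
    per_node = shard_count // len(vnode_ids)
    shard_view = {}
    for i, vnode_id in enumerate(vnode_ids):
        lo = i * per_node
        hi = shard_count - 1 if i == len(vnode_ids) - 1 else (i + 1) * per_node - 1
        shard_view[vnode_id] = range(lo, hi + 1)   # inclusive shard range
    return shard_view

# Matches FIG. 3B for four virtual nodes: Vnode A gets [0, 1023],
# Vnode B [1024, 2047], Vnode C [2048, 3071], Vnode D [3072, 4095].
shard_view = build_shard_view(["Vnode A", "Vnode B", "Vnode C", "Vnode D"])
```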
Step S304: physical node A generates a data backup policy and synchronizes the data backup policy to the physical nodes corresponding to the other virtual nodes in the virtual node cluster.
To ensure data reliability and prevent data loss after a device failure, this embodiment of this application can set a data backup policy, that is, back up generated data to multiple nodes. The backup policy in this embodiment keeps three copies of the data: two copies are stored in two local physical nodes, and the third is stored in a physical node of the remote storage device. Specifically, in the backup policy shown in FIG. 3B, a group of backup nodes is set for each virtual node. For example, the backup nodes of virtual node Vnode A are virtual nodes Vnode B and Vnode C, the backup nodes of Vnode B are Vnode A and Vnode D, the backup nodes of Vnode C are Vnode D and Vnode A, and the backup nodes of Vnode D are Vnode C and Vnode B. After the backup policy is generated, node A stores the backup policy in the local memory 603 and persistent storage 601, and synchronizes it to the persistent storage 701 of the second storage device 700 and to the physical nodes corresponding to the other virtual nodes in the global view.
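Expressed as data, the three-copy policy of FIG. 3B is just a per-virtual-node list of two backup virtual nodes, one in the same device and one in the opposite device. A sketch follows; the entry for Vnode C is inferred from the pattern of the other entries rather than stated explicitly in the text.

```python
# Backup policy per FIG. 3B: each virtual node has one local and one remote
# backup virtual node, giving three copies in total (primary + two backups).
# The Vnode C entry is an assumption inferred from the pattern of the others.
BACKUP_POLICY = {
    "Vnode A": ("Vnode B", "Vnode C"),
    "Vnode B": ("Vnode A", "Vnode D"),
    "Vnode C": ("Vnode D", "Vnode A"),
    "Vnode D": ("Vnode C", "Vnode B"),
}

def backup_targets(vnode_id):
    """Return the virtual nodes that must also hold a copy of data owned by vnode_id."""
    return BACKUP_POLICY[vnode_id]
```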
The establishment of the virtual cluster in FIG. 3A is executed by a management module. In FIG. 3A and FIG. 4A, the management module is located in the first storage device as an example; after generating the file system and the global view, the management module may send the generated file system and global view to the first storage device and the second storage device for storage. In other embodiments, the management module may be located on an independent third-party management device, which, after generating the file system and the global view, sends them to the first storage device and the second storage device for storage, so that every physical node can obtain the global view.
While the established virtual cluster is running, a monitoring module monitors changes of the virtual nodes in the first storage device and the second storage device. When it detects that a new virtual node has been added to the virtual cluster, or that a virtual node has been deleted, for example because the physical node corresponding to the virtual node has failed, the monitoring module notifies the management module to update the global view. The monitoring module may be located on the third-party management device, or in the first storage device and the second storage device. The first storage device serves as the primary storage device; the second storage device sends the detected changes to the first storage device, and the management module in the first storage device updates the global view. The establishment of the virtual node cluster is thus complete. After the virtual node cluster is established, the first storage device 600 and the second storage device 700 can establish a file system according to a client request, as shown in the flowchart of FIG. 4A.
Step S401: physical node A receives a file system creation request.
The first client 800 may send the file system creation request to the first storage device 600 or to the second storage device 700. If the first storage device 600 receives the request, physical node A processes it; if the second storage device 700 receives it, the second storage device 700 forwards the file system creation request to physical node A of the first storage device 600 for processing.
Step S402: physical node A sets a root directory for the file system.
When setting the root directory, the primary node first generates the root directory's mark; in general, the default mark of the root directory is "/". It then allocates identification information and a shard ID to the root directory. Because the shard view created by the primary node has been synchronized to all nodes, the primary node obtains the shard view from its own memory and selects a shard ID for the root directory from it. As shown in FIG. 3B, each virtual node in the shard view is allocated multiple shard IDs, so, to reduce cross-network and cross-node access, a shard ID is preferentially allocated to the root directory from the shard IDs of virtual node Vnode A corresponding to physical node A. Since this is the root directory and no shard ID has been allocated yet, shard 0, for example, can be selected as the root directory's shard ID.
Step S403: physical node A sends a mount command for the file system to the first client 800.
After the root directory of the cluster file system is generated, to enable the first client 800 to access the file system, physical node A mounts the file system into the file system of the first client 800. For example, physical node A provides the root directory of the file system to the first client 800 through the mount command. When sending the mount command, physical node A carries the parameter information of the root directory, that is, the handle information of the root directory, which carries the root directory's shard ID and identification information.
Step S404: the first client 800 mounts the cluster file system into the file system of the first client 800 according to the mount command.
After receiving the parameter information of the file system's root directory, the first client 800 generates a mount point on the first client's file system and records the parameter information of the file system's root directory at the mount point; the mount point is a section of storage space.
In this way, in addition to transferring data with the first storage device 600 through the first file system 608, the first client 800 can also transfer data with the first storage device 600 through the file system 605. Users can choose which file system to access according to actual needs.
Step S405: physical node A allocates a virtual volume to the file system.
Each newly created file system is allocated a virtual volume, Vvolume 0, used to hold the data that the first client or the second client writes into the file system.
Step S406: physical node A creates a mirrored volume pair for the virtual volume.
After the virtual volume Vvolume 0 is established, physical node A first creates a local volume based on persistent storage 601, for example the first volume in FIG. 2, and then requests the second storage device 700 to create a mirror volume of the first volume in the second storage device 700, for example the second volume in FIG. 2.
Step S407: physical node A generates a disk-flushing policy by recording the mirrored volume pair corresponding to the virtual volume.
The generated disk-flushing policy is shown in FIG. 3B: the virtual volume of the file system corresponds to the mirrored volume pair (the first volume and the second volume). According to the disk-flushing policy shown in FIG. 3B, the data of the file system cached in memory can be stored into both the persistent storage 601 of the first storage device 600 and the persistent storage 701 of the second storage device 700, thereby ensuring data reliability. How data in memory is written to persistent storage 601 and 701 according to the disk-flushing policy is described in detail in FIG. 8.
After generating the disk-flushing policy, physical node A stores the disk-flushing policy of the file system in the local memory 603 and persistent storage 601, and synchronizes it to the persistent storage 701 of the second storage device 700 and to the physical nodes corresponding to the other virtual nodes in the global view.
By performing the methods of FIG. 3A and FIG. 4A, the creation of the Active-Active file system is complete. A schematic diagram of the active-active storage system after file system creation is shown in FIG. 4B: a cross-device file system, virtual volume, shard view, and global view are generated on top of the first storage device and the second storage device.
After the Active-Active storage system is created, directories and files can be created and accessed based on the file system.
First, the process of creating a directory under the file system is described with reference to the flowchart shown in FIG. 5. In the following, the root directory is the parent directory, and the directory to be created is a subdirectory of that parent directory. In this embodiment of this application, a user can create the subdirectory by accessing the first storage device through the first client, or by accessing the second storage device through the second client. When the first storage device mounts the file system to the first client, a path for the first client to access the file system is established; for example, if the first storage device mounts the file system to the first client through physical node A, the first client accesses the file system through physical node A. To implement Active-Active access, the second storage device also mounts the file system into the second client's file system, which establishes a path for the second client to access the file system; the second client's requests to access the file system are sent to the physical node that mounted the file system, for example physical node C. The following describes the subdirectory creation process with an example in which a user sends a subdirectory creation request to the second storage device through the second client.
The specific creation process is shown in the flowchart of FIG. 5.
Step S501: the second client sends a subdirectory creation request to physical node C.
Physical node C is the primary node of the second storage device 700, that is, the node that mounted the file system to the second client. The subdirectory creation request includes the parameter information of the parent directory and the name of the subdirectory.
Step S502: physical node C receives the creation request sent by the second client and generates parameter information for the subdirectory according to the creation request.
The parameter information includes the subdirectory's identification information and shard ID. The identification information uniquely identifies the subdirectory and is, for example, an object ID in an NFS file system. When generating the shard ID, physical node C looks up the shard view, allocates a shard ID to the subdirectory from the shard IDs recorded in the shard view, and then creates the subdirectory in the physical node corresponding to the virtual node to which that shard ID belongs. Note that each directory is allocated one shard ID, but one shard ID can be allocated to multiple directories. In this embodiment of this application, to reduce data forwarding, the shard ID is allocated to the subdirectory from the shard IDs of the virtual node corresponding to the physical node that received the subdirectory request, that is, from Shard[2048, 3071] of virtual node Vnode C corresponding to physical node C. However, when the number of directories corresponding to the shard IDs of virtual node Vnode C exceeds a preset threshold, the subdirectory is allocated a shard ID corresponding to another virtual node.
Step S503: physical node C creates the subdirectory.
Creating the subdirectory includes generating a directory entry table (directory entry table, DET) and an inode table for the subdirectory. The directory entry table records, after the subdirectory is successfully created, the parameter information of the subdirectories or files created under it when it serves as a parent directory; the parameter information includes, for example, a subdirectory's name and a directory's or file's identification information and shard ID.
The inode table records the details of the files subsequently created in the subdirectory, for example the file length, the users' operation permissions on the file, and the file's modification time.
Step S504: physical node C determines, according to the parameter information of the parent directory, physical node B in the first storage device to which the parent directory belongs.
The parameter information of the parent directory includes a shard ID. Through the shard view, the virtual node corresponding to the shard ID can be determined to be virtual node Vnode B, and it is then further determined, according to virtual node Vnode B, that the physical node corresponding to Vnode B is physical node B in the first storage device.
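The resolution performed here, from shard ID to virtual node to physical node, is a pair of table lookups over the shard view and the global view. A sketch, reusing the illustrative structures from the earlier fragments:

```python
def home_physical_node(shard_id, shard_view, global_view):
    """Resolve a shard ID to the physical node that owns it.

    shard_view: vnode_id -> range of shard IDs (as built above)
    global_view: vnode_id -> Vnode (carrying the physical node and its IP)
    """
    for vnode_id, shard_range in shard_view.items():
        if shard_id in shard_range:
            return global_view.vnodes[vnode_id].phys_node_id
    raise KeyError(f"shard {shard_id} not present in the shard view")

# A shard ID of, say, 1500 falls in Vnode B's range [1024, 2047],
# so the request is forwarded to physical node B in the first storage device.
```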
Step S505: physical node C sends the parameter information of the subdirectory and the parameter information of the parent directory to physical node B.
Step S506: physical node B finds the directory entry table of the parent directory according to the parent directory's parameter information.
Specifically, the parent directory can be found according to the shard ID in the parent directory's parameter information and the parent directory's name.
Step S507: physical node B records the subdirectory's parameter information in the parent directory's directory entry table.
Step S508: physical node B first returns the subdirectory's parameters to physical node C, and physical node C then returns the subdirectory's parameters to the second client.
When accessing a file in the file system, for example reading or creating a file, because files are created under directories, the directory must first be found before the files under it can be accessed. If the file being accessed is under multiple levels of directories, the directories must be queried level by level until the lowest-level directory is found. For example, for the multi-level directory filesystem1/user1/favorite, since the parameter information of the root directory has been recorded in the first client's file system, the client first queries the parameter information of the subdirectory user1 according to the parameter information of the root directory filesystem1, that is, it generates a request to query user1; after obtaining user1's parameter information, it then queries the parameter information of favorite according to user1's parameter information, that is, it generates a request to query favorite. The parameter information of each directory level is queried in the same way. In the following, taking the upper-level directory as the parent directory and the directory to be queried as the subdirectory, one directory query is described. This embodiment again takes physical node C of the second storage device receiving the query request as an example.
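Before walking through the steps, here is a sketch of that level-by-level resolution as the client would drive it; `query_subdirectory` stands in for the request/response exchange of steps S601 to S607 below and is a hypothetical helper, not an interface defined by the patent.

```python
def resolve_path(root_handle, path, query_subdirectory):
    """Walk a multi-level path such as 'user1/favorite' one directory at a time.

    root_handle: handle (shard ID + identification info) recorded at the mount point
    query_subdirectory(parent_handle, name) -> child handle, per FIG. 6
    """
    handle = root_handle
    for name in path.strip("/").split("/"):
        # Each round trip queries one level: the parent's handle plus the
        # child's name yields the child's handle from the parent's DET.
        handle = query_subdirectory(handle, name)
    return handle
```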
Step S601: the second client sends a query request for the subdirectory to physical node C.
The query request carries the parameter information of the parent directory and the name of the subdirectory. The parent directory's parameter information is, for example, the parent directory's handle. When the parent directory is the root directory, the root directory's handle is obtained from the client's file system. When the parent directory is not the root directory, the parent directory's handle can be obtained through a query request for the parent directory.
The parent directory's handle includes the parent directory's identification information and shard ID.
Step S602: physical node C receives the query request sent by the second client and determines, according to the query request, physical node B to which the parent directory belongs.
Physical node C obtains the parent directory's shard ID from the parent directory's parameter information and obtains, according to the shard ID, the virtual node to which the parent directory belongs.
Because physical node A synchronized the created shard view to all nodes, physical node C obtains the shard view from its own memory, determines the virtual node to which the parent directory belongs according to the parent directory's shard ID, and then determines the physical node corresponding to that virtual node.
Step S603: physical node C sends the parent directory's parameters and the subdirectory's name to physical node B where the parent directory resides.
Step S604: physical node B determines the parent directory's directory entry table according to the parent directory's parameters.
Referring to the description of FIG. 5, when physical node B created the parent directory, it created a directory entry table for the parent directory, which records the parameter information of all subdirectories created under the parent directory.
Step S605: physical node B obtains the subdirectory's parameter information from the parent directory's directory entry table.
Step S606: physical node B returns the subdirectory's parameter information to physical node C.
Step S607: physical node C returns the subdirectory's parameter information to the second client.
FIG. 5 and FIG. 6 are described with an example in which the second client accesses the second storage device to create and query a subdirectory in the file system, but in practical applications the first client can also create and query the subdirectory by accessing the first storage device.
After querying a subdirectory or creating a new subdirectory, the first client or the second client can obtain the subdirectory's parameter information and then create a file in the subdirectory according to that parameter information. The following describes the process in which a user accesses the first storage device through the first client and creates a file in the subdirectory, as shown in FIG. 7.
Step S701: the client sends a file creation request to physical node A.
The file creation request carries the subdirectory's parameter information and the file name.
As described in FIG. 5 or FIG. 6, physical node A has already sent the subdirectory's parameter information to the client, so when the client needs to create a file in the subdirectory, it can carry the subdirectory's parameter information and the file's file name in the file creation request.
Step S702: after receiving the file creation request, physical node A determines, according to the subdirectory's parameter information, physical node D to which the subdirectory belongs.
The way of determining physical node D to which the subdirectory belongs is the same as step S602 in FIG. 6 and is not repeated here.
Step S703: physical node A sends the subdirectory's parameter information and the file name to physical node D.
Step S704: physical node D determines whether the file has already been created.
Physical node D finds the subdirectory according to the shard ID in the subdirectory's parameters and the subdirectory's name, then finds the DET corresponding to the subdirectory and looks up the file name in the DET. If it exists, a file with the same file name has already been created, and step S705 is performed; if it does not exist, the file can be created in the subdirectory, and step S706 is performed.
Step S705: node D sends feedback that the file name has already been created to node A, and node A forwards the feedback to the first client.
After receiving the feedback message, the first client can further notify the user through a notification message that a file with the same name already exists; the user can take further action based on the prompt, for example modifying the file name.
Step S706: node D creates the file.
When creating the file, node D sets parameters for the file, for example allocating a shard ID and file identification information, and adds the shard ID and file identification information to the subdirectory's DET. As described in step S503 of FIG. 5, when the subdirectory was created, an inode table was generated for it to record the information of the files created under it; therefore, in this step, after node D creates the file, it adds the file's information to the inode table of the subdirectory. The file information includes the file length, the users' operation permissions on the file, the file's modification time, and so on.
Step S707: physical node D returns the file's parameters.
Physical node D first sends the feedback information to node A, and node A then forwards the feedback information to the first client.
In step S702, when physical node A determines that the home node of the subdirectory is physical node A itself, physical node A performs the above steps S704 to S707.
It should be noted that the subdirectories created in FIG. 5 and the files created in FIG. 7 are backed up to the corresponding backup nodes according to the backup policy set in FIG. 3B.
After the file is created, the user can write data into the file. The user can write data into the file through the first client connected to the first storage device or through the second client connected to the second storage device. The following takes the process of a user accessing the first storage device through the first client and writing data into the file as an example, as shown in FIG. 8.
Step S801: physical node A receives a write request for the file.
In this embodiment of this application, because every node stores the file system, a user can access the files in the file system through a client connected to any node.
The write request carries the file's address information, which includes the file's parameter information, an offset address, and the data to be written. In this embodiment of this application, the file's parameter information is the file's handle, comprising the file system identifier, the file identifier, and the shard ID.
Step S802: physical node A determines the file's home node D according to the write request.
For the way of determining the file's home node D according to the file's shard ID, refer to step S602 in FIG. 6; details are not repeated here.
Step S803: physical node A forwards the write request to physical node D.
Step S804: physical node D converts the access to the file system into an access to the virtual volume corresponding to the file system.
Because the virtual volume created for the file system is recorded in every physical node, physical node D replaces the file system identifier in the write request with the virtual volume's identifier.
Step S805: physical node D finds the file according to the file identifier and shard ID in the write request and updates the file's information.
After finding the file according to the file identifier and shard ID, physical node D uses the inode number of the file contained in the file identifier to find the file's inode entry in the inode table and records the file's information there; for example, according to the length and offset address of the data to be written carried in the write request, it updates the file's length and offset address and records the current time as the file's update time.
Step S806: physical node D performs a multi-copy write of the data to be written according to the preset backup policy.
When the virtual file cluster was established, a backup policy was created for the file system, and backup nodes were set for each node in the policy. For example, according to the backup policy set in FIG. 3B, it can be determined that the backup nodes of physical node D are physical node C and physical node B. While writing the data to be written into its local memory, physical node D sends the data to physical node C and physical node B, and physical node C and physical node B write the data into their own memories.
Step S807: after determining that the multi-copy write is complete, physical node D returns a write-completion message to the first client.
Step S808: physical node D persistently stores the data to be written.
As shown in the disk-flushing policy of FIG. 3B, the virtual volume of the file system corresponds to a mirrored volume pair: the first volume in the first storage device and the second volume in the second storage device. When physical node D determines, according to a preset memory eviction algorithm, that the data to be written needs to be evicted to persistent storage, that is, flushed to disk, it first obtains, from the disk-flushing policy and according to the virtual volume recorded in the address of the data to be written, the mirrored volume pair corresponding to the virtual volume, namely the first volume in the first storage device and the second volume in the second storage device; it then writes the data to be written from the memory of the second storage device into the physical space corresponding to the second volume in persistent storage 701, and sends the memory address of the data to be written to backup node B corresponding to physical node D in the first storage device; physical node B writes, according to the memory address, the data to be written stored in its memory into the physical space corresponding to the first volume in persistent storage 601 of the first storage device.
FIG. 9 is a flowchart of a method for reading a file in an embodiment of this application.
In this embodiment of this application, the user can likewise access the files in the file system through any client. This embodiment takes the user reading a file through the second client as an example.
Step S901: physical node C receives a read request for the file.
The read request carries the file's address information, which includes the file's parameter information and an offset address. The parameter information is the file's handle, comprising the file system identifier, the file identifier, and the shard ID. When the second client sends the read request, it has already obtained the file's parameter information according to the method shown in FIG. 6.
Step S902: physical node C determines the file's home node B according to the read request.
For the way of confirming the file's home node B, refer to the description of step S602 in FIG. 6; details are not repeated here.
Step S903: physical node C forwards the read request to home node B.
Step S904: physical node B converts the read request's access to the file system into an access to the file system's virtual volume.
Step S905: physical node B reads the file from the memory of physical node B according to the address in the read request.
Step S906: physical node B returns the file.
Step S907: when the file is not in the memory, physical node B reads the file from persistent storage 601 according to the first volume in the first storage device that corresponds to the virtual volume in the disk-flushing policy, and returns the file to physical node C, which then returns the file to the second client.
In this embodiment of this application, when the first storage device and the second storage device access files and directories, they forward access requests to the home nodes of the files and directories based on shard IDs, which leads to cross-device data access and affects access efficiency. In a possible implementation provided by this embodiment, because both the first storage device and the second storage device back up the opposite end's data, when a request to access the opposite end's data is received, the required data can be obtained from the local backup of the opposite end's data without fetching the accessed data from the opposite end, thereby improving data access efficiency.
When a storage device in the Active-Active storage system fails, the services of the failed storage device can be taken over through the backup data. As shown in FIG. 10, when the link between the first storage device and the second storage device is disconnected, or the second storage device fails, the backup data of the second storage device stored in the first storage device can be used to take over the services of the second storage device. The following takes the disconnection of the link between the first storage device and the second storage device as an example, as shown in the flowchart of FIG. 11.
Step S111: the first storage device and the second storage device each detect the heartbeat of the opposite end.
Step S112: when the heartbeat of the opposite end is not detected, the first storage device and the second storage device suspend the services being executed.
Suspending services means stopping the access requests being executed.
Step S113: the first storage device and the second storage device modify the global view and the file system.
When the heartbeat of the peer cannot be detected, the first storage device and the second storage device prepare to take over the peer's services: they modify the global view and the file system, deleting from the global view the virtual nodes corresponding to the peer's physical nodes and deleting the peer's backup nodes from the backup policy. For example, the first storage device modifies the global view to (Vnode A, Vnode B), and the second storage device modifies the global view to (Vnode C, Vnode D). In addition, the shards of the virtual nodes corresponding to the peer nodes in the file system's shard view are reassigned to the virtual nodes corresponding to the local nodes. For example, the shard view in the first storage device is modified to Vnode A[0, 2047], Vnode B[2048, 4095], and the shard view in the second storage device is modified to Vnode C[0, 2047], Vnode D[2048, 4095]; at the same time, the peer node's volume is deleted from the disk-flushing policy.
Step S114: both the first storage device and the second storage device send an arbitration request to the arbitration device.
Step S115: the arbitration device arbitrates that the first storage device takes over the services.
The arbitration device may determine the takeover device according to the order in which the arbitration requests are received; for example, the storage device corresponding to the first received arbitration request becomes the takeover device.
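A sketch of this first-come-first-served arbitration; the queue-based framing is an assumption for illustration.

```python
def arbitrate(requests):
    """Pick the takeover device: the storage device whose request arrived first.

    requests: list of (arrival_time, device_id) collected by the arbitration device
    """
    _, winner = min(requests)   # earliest arrival wins
    return winner

# Example: if the first storage device's request arrives first, it takes over.
assert arbitrate([(1.2, "second"), (0.8, "first")]) == "first"
```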
Step S116: the arbitration device notifies the first storage device and the second storage device of the arbitration result respectively.
Step S117: after receiving the notification, the second storage device releases the connection with the second client, that is, stops executing the services.
Step S118: after receiving the notification, the first storage device floats the IP address of the second storage device over to the first storage device and establishes a connection with the second client.
Step S119: the first storage device takes over the services of the second storage device through the backup data of the second storage device.
Because the first storage device stores the backup data of the second storage device, when the first storage device receives an access to data of the second storage device, it can use the shard ID in the access request to direct the access requests of the first client and the second client for data in the second storage device to the backup data, so that the first client and the second client do not perceive the link interruption.
During data access, because the backup policy and the disk-flushing policy have been changed, written data is written only into the memory of the nodes of the first storage device and is stored only in the volume of the first storage device.
The solutions provided in the embodiments of this application have been described above. Specific examples are used herein to explain the principles and implementations of this application; the descriptions of the above embodiments are merely intended to help understand the methods and core ideas of this application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and the application scope according to the ideas of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.

Claims (22)

  1. An active-active storage system, characterized by comprising a first storage device and a second storage device, wherein the first storage device is configured to receive data of a first file sent by a client cluster to a file system, store the data of the first file, and send first copy data of the data of the first file to the second storage device;
    the second storage device is configured to receive data of a second file sent by the client cluster to the file system, store the data of the second file, and send second copy data of the second file to the first storage device.
  2. The system according to claim 1, characterized in that the active-active storage system further comprises a virtual node set, the virtual node set comprises multiple virtual nodes, each virtual node is allocated computing resources, and the computing resources come from physical nodes in the first storage device or the second storage device.
  3. The system according to claim 2, characterized in that the active-active storage system further comprises a management device, the management device is configured to create a global view, and the global view is used to record the correspondence between each virtual node and its allocated computing resources;
    the management device is further configured to send the global view to the first storage device and the second storage device;
    the first storage device is further configured to store the global view;
    the second storage device is further configured to store the global view.
  4. The system according to claim 3, characterized in that, when storing the data of the first file, the first storage device is specifically configured to:
    determine, according to the address of the data of the first file, a first virtual node corresponding to the first file;
    determine, according to the first virtual node and the global view, the computing resources allocated to the first virtual node;
    based on the computing resources allocated to the first virtual node, send the data of the first file to the physical node corresponding to the computing resources, wherein the physical node stores the data of the first file in the memory of the physical node.
  5. The system according to claim 4, characterized in that the first virtual node has at least one backup virtual node, and the physical node corresponding to the first virtual node and the physical node corresponding to the backup virtual node are located in different storage devices;
    the first storage device is further configured to:
    determine the backup virtual node corresponding to the first virtual node;
    determine, according to the backup virtual node and the global view, the physical node corresponding to the backup virtual node;
    send the first copy data to the physical node corresponding to the backup virtual node, wherein the physical node corresponding to the backup virtual node stores the first copy data in the physical node.
  6. The system according to any one of claims 2 to 5, characterized in that the files and directories included in the file system are distributed among the physical nodes corresponding to the multiple virtual nodes in the virtual node set.
  7. The system according to claim 6, characterized in that each virtual node in the virtual node set is provided with one or more shard identifiers, each directory and file in the file system is assigned one shard identifier, and the physical nodes in the first storage device and the second storage device distribute the directories and files, according to the shard identifier of each directory and file, to the physical node corresponding to the virtual node to which the shard identifier belongs.
  8. The system according to claim 7, characterized in that a first physical node in the first storage device is configured to receive a creation request for the first file, select a shard identifier for the first file from the one or more shard identifiers set for the virtual node corresponding to the first physical node, and create the first file in the storage device.
  9. The system according to any one of claims 2 to 8, characterized in that, after the second storage device fails or the link between the first storage device and the second storage device is disconnected, the first storage device is further configured to restore the second file based on the second copy data of the second file and take over the services sent by the client cluster to the second storage device.
  10. The system according to claim 9, characterized in that the first storage device is further configured to delete, from the global view, the virtual nodes corresponding to the computing resources of the second storage device.
  11. The system according to claim 1, characterized in that the first storage device further has a first file system, and the second storage device further has a second file system.
  12. A data processing method, applied to an active-active storage system, the active-active storage system comprising a first storage device and a second storage device, characterized in that the method comprises:
    receiving, by the first storage device, data of a first file sent by a client cluster to a file system, storing the data of the first file, and sending first copy data of the data of the first file to the second storage device;
    receiving, by the second storage device, data of a second file sent by the client cluster to the file system, storing the data of the second file, and sending second copy data of the second file to the first storage device.
  13. The method according to claim 12, characterized in that the active-active storage system further comprises a virtual node set, the virtual node set comprises multiple virtual nodes, each virtual node is allocated computing resources, and the computing resources come from physical nodes in the first storage device or the second storage device.
  14. The method according to claim 13, characterized in that the active-active storage system further comprises a management device, and the method further comprises:
    creating, by the management device, a global view, wherein the global view is used to record the correspondence between each virtual node and its allocated computing resources;
    sending, by the management device, the global view to the first storage device and the second storage device;
    storing, by the first storage device and the second storage device, the global view.
  15. The method according to claim 14, characterized in that storing the data of the first file by the first storage device comprises:
    determining, according to the address of the data of the first file, a first virtual node corresponding to the first file;
    determining, according to the first virtual node and the global view, the computing resources allocated to the first virtual node;
    based on the computing resources allocated to the first virtual node, sending the data of the first file to the physical node corresponding to the computing resources, wherein the physical node stores the data of the first file in the memory of the physical node.
  16. The method according to claim 15, characterized in that the first virtual node has at least one backup virtual node, and the physical node corresponding to the first virtual node and the physical node corresponding to the backup virtual node are located in different storage devices;
    the method further comprises:
    determining, by the first storage device, the backup virtual node corresponding to the first virtual node;
    determining, by the first storage device according to the backup virtual node and the global view, the physical node corresponding to the backup virtual node;
    sending, by the first storage device, the first copy data to the physical node corresponding to the backup virtual node, wherein the physical node corresponding to the backup virtual node stores the first copy data in the physical node.
  17. The method according to any one of claims 13 to 16, characterized in that the files and directories included in the file system are distributed among the physical nodes corresponding to the multiple virtual nodes in the virtual node set.
  18. The method according to claim 17, characterized in that each virtual node in the virtual node set is provided with one or more shard identifiers, each directory and file in the file system is assigned one shard identifier, and the physical nodes in the first storage device and the second storage device distribute the directories and files, according to the shard identifier of each directory and file, to the physical node corresponding to the virtual node to which the shard identifier belongs.
  19. The method according to claim 18, characterized in that the method further comprises:
    receiving, by a first physical node in the first storage device, a creation request for the first file, selecting a shard identifier for the first file from the one or more shard identifiers set for the virtual node corresponding to the first physical node, and creating the first file in the first storage device.
  20. The method according to any one of claims 13 to 19, characterized in that the method further comprises: after the second storage device fails or the link between the first storage device and the second storage device is disconnected, restoring, by the first storage device, the second file based on the second copy data of the second file, and taking over the services sent by the client cluster to the second storage device.
  21. The method according to claim 20, characterized in that the method further comprises deleting, by the first storage device from the global view, the virtual nodes corresponding to the computing resources of the second storage device.
  22. The method according to any one of claims 13 to 21, characterized in that the first storage device further has a first file system, and the second storage device further has a second file system.
PCT/CN2021/117843 2020-09-11 2021-09-11 Active-active storage system and data processing method thereof WO2022053033A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2023516240A JP2023541069A (ja) 2020-09-11 2021-09-11 アクティブ-アクティブストレージシステムおよびそのデータ処理方法
BR112023003725A BR112023003725A2 (pt) 2020-09-11 2021-09-11 Sistema de armazenamento ativo-ativo e método de processamento de dados do mesmo
EP21866085.0A EP4198701A4 (en) 2020-09-11 2021-09-11 ACTIVE-ACTIVE STORAGE SYSTEM AND DATA PROCESSING METHOD BASED THEREON
US18/178,541 US20230205638A1 (en) 2020-09-11 2023-03-06 Active-active storage system and data processing method thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010955301 2020-09-11
CN202010955301.5 2020-09-11
CN202011628940.7A CN114168066A (zh) 2020-09-11 2020-12-30 一种双活存储系统及其处理数据的方法
CN202011628940.7 2020-12-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/178,541 Continuation US20230205638A1 (en) 2020-09-11 2023-03-06 Active-active storage system and data processing method thereof

Publications (1)

Publication Number Publication Date
WO2022053033A1 true WO2022053033A1 (zh) 2022-03-17

Family

ID=76011613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117843 WO2022053033A1 (zh) 2020-09-11 2021-09-11 一种双活存储系统及其处理数据的方法

Country Status (6)

Country Link
US (1) US20230205638A1 (zh)
EP (1) EP4198701A4 (zh)
JP (1) JP2023541069A (zh)
CN (2) CN116466876A (zh)
BR (1) BR112023003725A2 (zh)
WO (1) WO2022053033A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116466876A (zh) Storage system and data processing method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8346719B2 (en) * 2007-05-17 2013-01-01 Novell, Inc. Multi-node replication systems, devices and methods
CN102158546B (zh) Cluster file system and file service method thereof
US9582559B1 (en) * 2012-06-29 2017-02-28 EMC IP Holding Company LLC Multi-site storage system with replicated file system synchronization utilizing virtual block storage appliances
US9880777B1 (en) * 2013-12-23 2018-01-30 EMC IP Holding Company LLC Embedded synchronous replication for block and file objects
US9069783B1 (en) * 2013-12-31 2015-06-30 Emc Corporation Active-active scale-out for unified data path architecture
US9430480B1 (en) * 2013-12-31 2016-08-30 Emc Corporation Active-active metro-cluster scale-out for unified data path architecture
SG11201701365XA (en) * 2014-09-01 2017-03-30 Huawei Tech Co Ltd File access method and apparatus, and storage system
CN106909307B (zh) Method and apparatus for managing an active-active storage array
CN108345515A (zh) Storage method and apparatus, and storage system thereof
US10521344B1 (en) * 2017-03-10 2019-12-31 Pure Storage, Inc. Servicing input/output (‘I/O’) operations directed to a dataset that is synchronized across a plurality of storage systems
CN108958984B (zh) Ceph-based active-active synchronous online hot standby method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103827843A (zh) Data writing method, apparatus and system
CN106133676A (zh) Storage system
CN107220104A (zh) Virtual machine disaster preparedness method and apparatus
CN109964208A (zh) Active-active storage system and address allocation method
CN111542812A (zh) Enhanced cache memory allocation based on virtual node resources
WO2020133473A1 (zh) Data backup method, apparatus and system
CN112860480A (zh) Active-active storage system and data processing method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4198701A4 *

Also Published As

Publication number Publication date
CN112860480A (zh) 2021-05-28
CN116466876A (zh) 2023-07-21
EP4198701A4 (en) 2024-01-10
EP4198701A1 (en) 2023-06-21
JP2023541069A (ja) 2023-09-27
US20230205638A1 (en) 2023-06-29
CN112860480B (zh) 2022-09-09
BR112023003725A2 (pt) 2023-03-28

Similar Documents

Publication Publication Date Title
US8301654B2 (en) Geographical distributed storage system based on hierarchical peer to peer architecture
US20200081807A1 (en) Implementing automatic switchover
JP5047165B2 (ja) Virtualized network storage system, network storage device, and virtualization method thereof
US9940154B2 (en) Storage virtual machine relocation
US8046422B2 (en) Automatic load spreading in a clustered network storage system
US9378258B2 (en) Method and system for transparently replacing nodes of a clustered storage system
JP5066415B2 (ja) Method and apparatus for file system virtualization
TWI467370B (zh) Storage subsystem performing storage virtualization, storage system architecture, and method thereof
US8429360B1 (en) Method and system for efficient migration of a storage object between storage servers based on an ancestry of the storage object in a network storage system
US7836017B1 (en) File replication in a distributed segmented file system
WO2006089479A1 (fr) Method for managing data in a network storage system and network storage system based on the method
US10320905B2 (en) Highly available network filer super cluster
US8756338B1 (en) Storage server with embedded communication agent
US8924656B1 (en) Storage environment with symmetric frontend and asymmetric backend
CN109407975B (zh) Data writing method, computing node, and distributed storage system
US20230359564A1 (en) Methods and Systems for Managing Race Conditions During Usage of a Remote Storage Location Cache in a Networked Storage System
US20240103744A1 (en) Block allocation for persistent memory during aggregate transition
US11343308B2 (en) Reduction of adjacent rack traffic in multi-rack distributed object storage systems
US20230205638A1 (en) Active-active storage system and data processing method thereof
US11216204B2 (en) Degraded redundant metadata, DRuM, technique
US11194501B2 (en) Standby copies withstand cascading fails
CN111868704A (zh) Method for accelerating storage medium access and device therefor
JP6697101B2 (ja) Information processing system
WO2012046585A1 (ja) Distributed storage system, control method therefor, and program
CN114168066A (zh) Active-active storage system and data processing method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21866085

Country of ref document: EP

Kind code of ref document: A1

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023003725

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2023516240

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021866085

Country of ref document: EP

Effective date: 20230315

ENP Entry into the national phase

Ref document number: 112023003725

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20230228

NENP Non-entry into the national phase

Ref country code: DE