US20220342566A1 - Storage system with passive witness node - Google Patents
- Publication number
- US20220342566A1 US17/238,615
- Authority
- US
- United States
- Prior art keywords
- storage array
- designated
- configuration setting
- locally
- preferred
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0617—Improving the reliability of storage systems in relation to availability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0772—Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0653—Monitoring storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- a distributed storage system may include a plurality of storage devices (e.g., storage arrays) to provide data storage to a plurality of nodes.
- the plurality of storage devices and the plurality of nodes may be situated in the same physical location, or in one or more physically remote locations.
- the plurality of nodes may be coupled to the storage devices by a high-speed interconnect, such as a switch fabric.
- a method for use in a first storage array comprising: detecting whether a second storage array has designated the first storage array as a locally-preferred storage array, the detecting being performed when a first link between the second storage array and a witness node is down; setting a value of a first configuration setting to indicate that the first storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the first storage array, the value of the first configuration setting being set only when the second storage array has designated the first storage array as a locally-preferred storage array; detecting, by the first storage array, whether a second link between the first storage array and the second storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- a storage array comprising: a memory configured to store a first configuration setting and a second configuration setting; and at least one processor operatively coupled to the memory, the at least one processor being configured to perform the operations of: detecting whether a peer storage array has designated the storage array as a locally-preferred storage array, the detecting being performed when a first link between the peer storage array and a witness node is down; setting a value of the first configuration setting to indicate that the storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the storage array, the value of the first configuration setting being set only when the peer storage array has designated the storage array as a locally-preferred storage array; detecting, by the storage array, whether a second link between the storage array and the peer storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- a non-transitory computer-readable medium stores one or more processor-executable instructions, which, when executed by at least one processor of a first storage array, cause the at least one processor to perform the operations of: detecting whether a second storage array has designated the first storage array as a locally-preferred storage array, the detecting being performed when a first link between the second storage array and a witness node is down; setting a value of a first configuration setting to indicate that the first storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the first storage array, the value of the first configuration setting being set only when the second storage array has designated the first storage array as a locally-preferred storage array; detecting, by the first storage array, whether a second link between the first storage array and the second storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- FIG. 1 is a diagram of an example of a storage system, according to aspects of the disclosure.
- FIG. 2 is a state diagram illustrating the operation of the storage system of FIG. 1, according to aspects of the disclosure.
- FIG. 3 is a diagram illustrating examples of status messages, according to aspects of the disclosure.
- FIG. 4A is a diagram of an example of a state table, according to aspects of the disclosure.
- FIG. 4B is a diagram of an example of a state table, according to aspects of the disclosure.
- FIG. 4C is a diagram of an example of a state table, according to aspects of the disclosure.
- FIG. 5 is a flowchart of an example of a process, according to aspects of the disclosure.
- FIG. 6 is a flowchart of an example of a process, according to aspects of the disclosure.
- FIG. 7 is a flowchart of an example of a process, according to aspects of the disclosure.
- FIG. 8 is a flowchart of an example of a process, according to aspects of the disclosure.
- FIG. 9 is a diagram of an example of a computing device, according to aspects of the disclosure.
- FIG. 1 is a diagram of an example of a storage system 100 , according to aspects of the disclosure.
- the storage system 100 may include a storage array 101 A, a storage array 101 B, and a witness node 109 .
- the storage array 101 A may be configured to maintain a volume instance 102 A.
- the storage array 101 B may be configured to maintain a volume instance 102 B.
- Volume instances 102 A and 102 B are instances of the same volume.
- Both of the storage arrays 101 A and 101 B may be configured to service write requests to the volume when they are active. For example, to service a write request, storage array 101 A may store data associated with the write request in volume instance 102 A, and then transmit the data over link 112 to storage array 101 B, after which the transmitted data is stored in volume instance 102 B.
- In other words, when the state of volume instance 102 A is changed (e.g., by storing or deleting data), this change is propagated to volume instance 102 B, over link 112, in order to keep volume instances 102 A and 102 B consistent with one another.
- Similarly, to service a write request, storage array 101 B may store data associated with the write request in volume instance 102 B, and then transmit the data over link 112 to storage array 101 A, after which the transmitted data is stored in volume instance 102 A.
- In other words, when the state of volume instance 102 B is changed (e.g., by storing or deleting data), this change is propagated to volume instance 102 A, over link 112, in order to keep volume instances 102 A and 102 B consistent with one another.
- Maintaining a consistent state between volume instances is important for the operation of the storage system 100 . If volume instances 102 A and 102 B are not consistent with one another, this could lead to client devices receiving erroneous data.
- the maintenance of a consistent state is performed over link 112 .
- When link 112 is down, volume instances 102 A and 102 B can no longer be synchronized with one another, and they are prevented from maintaining a consistent state. Accordingly, when link 112 is down, one of the storage arrays 101 A-B may assume a passive role (and stop serving IO requests), and the other one of the storage arrays 101 A-B may assume an active role (and continue serving IO requests).
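- By way of a non-limiting illustration, the write propagation over link 112 described above, and the effect of link 112 going down, may be sketched as follows. The class `StorageArray`, its attributes, and the helper `connect` are hypothetical names introduced only for this sketch; an in-memory dictionary stands in for a volume instance, and a direct assignment stands in for transmission over link 112.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageArray:
    """Hypothetical in-memory stand-in for storage array 101A or 101B."""
    name: str
    volume: Dict[str, bytes] = field(default_factory=dict)  # volume instance
    peer: Optional["StorageArray"] = field(default=None, repr=False, compare=False)
    link_up: bool = True     # status of link 112 as seen by this array
    serving_io: bool = True  # False once the array has assumed a passive role

    def write(self, key: str, data: bytes) -> None:
        """Service a write request: store locally, then propagate to the peer."""
        if not self.serving_io:
            raise RuntimeError(f"{self.name} is passive and is not serving IO")
        self.volume[key] = data
        # Propagation is only possible while link 112 is up.
        if self.link_up and self.peer is not None:
            self.peer.volume[key] = data


def connect(a: StorageArray, b: StorageArray) -> None:
    """Pair two arrays so that each propagates its writes to the other."""
    a.peer, b.peer = b, a
```

While `link_up` is true on both arrays, a write serviced by either array leaves both volume instances with the same contents. Once `link_up` is false, writes serviced by the active array are no longer propagated, which is why the other array stops serving IO requests.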
- the examples that follow illustrate a technique that enables the storage arrays 101 A-B to choose their respective roles when link 112 fails.
- the technique allows one of storage arrays 101 A-B to choose a passive role and the other to choose an active role.
- it guarantees (e.g., at least under most circumstances) that only one of storage arrays 101 A-B will choose an active role and only one of the storage arrays 101 A-B will choose a passive role, thereby preventing a situation in which both storage arrays have chosen the same role.
- the technique allows each of storage arrays 101 A-B to choose its respective role independently of the other.
- Assuming an active role by storage array 101 A may include one or more of: (i) continuing to service IO requests, (ii) transitioning into a state in which data that is written to volume instance 102 A is not synchronously transmitted to storage array 101 B, (iii) maintaining a record of deletions performed on volume instance 102 A, so that those deletions can be synchronized into volume instance 102 B at a later time, or (iv) marking data that is written to volume instance 102 A (after the storage array 101 A has assumed an active role), so that the data can be synchronized (e.g., copied) into volume instance 102 B at a later time.
- Assuming an active role by storage array 101 B may include one or more of: (i) continuing to service IO requests, (ii) transitioning into a state in which data that is written to volume instance 102 B is not synchronously transmitted to storage array 101 A, (iii) maintaining a record of deletions performed on volume instance 102 B, so that those deletions can be synchronized into volume instance 102 A at a later time, or (iv) marking data that is written to volume instance 102 B (after the storage array 101 B has assumed an active role), so that the data can be synchronized (e.g., copied) into volume instance 102 A at a later time.
- Assuming a passive role may include ceasing to service incoming IO requests.
- when a storage array assumes a passive role, it may also transmit a message to a multipath client, or a host device, indicating that the storage array is not currently serving IO requests.
- an IO request may include a read request, a write request, a delete request, etc.
- when both of storage arrays 101 A-B are serving IO requests, they are both “active.”
- the phrase “assuming an active role” refers to the situation in which one of storage arrays 101 A-B continues to service IO requests, while believing that the other storage array is unavailable and/or not serving IO requests.
- the storage array 101 A may include one or more computing devices, such as the computing device 900 , which is shown in FIG. 9 .
- the storage array 101 A may store a locally-preferred configuration setting 103 A, a system-preferred configuration setting 104 A, a user-preferred configuration setting 106 A, and a state table 108 A.
- the storage array 101 B may include one or more computing devices, such as the computing device 900 , which is shown in FIG. 9 .
- the storage array 101 B may store a locally-preferred configuration setting 103 B, a system-preferred configuration setting 104 B, a user-preferred configuration setting 106 B, and a state table 108 B.
- Each of the user-preferred configuration settings 106 A-B includes a configuration setting that specifies which one of storage arrays 101 A and 101 B will assume an active role in the event of a failure of link 112 .
- the user-preferred configuration settings 106 A-B are specified by the user (e.g., a system administrator), whereas the system-preferred configuration settings 104 A-B are determined dynamically by the storage system 100 .
- the user-preferred configuration settings 106 A-B are stored in the memory of storage arrays 101 A-B before run-time, whereas the value of system-preferred configuration settings 104 A-B is determined at runtime.
- Each of system-preferred configuration settings 104 A-B may be determined at run-time by the storage system 100 .
- Each of the system-preferred configuration settings 104 A-B specifies which storage array in the storage system 100 will assume an active role when link 112 fails.
- Each of system-preferred configuration settings 104 A-B may have a value that is selected from the set {NONE, ARRAY_A, ARRAY_B}. The value of “NONE” indicates that neither storage array 101 A nor storage array 101 B is designated to assume an active role in the event of a failure of link 112.
- ARRAY_A indicates that storage array 101 A is designated to assume an active role in the event of a failure of link 112 (i.e., it indicates that storage array 101 A is designated as a system-preferred storage array).
- ARRAY_B indicates that storage array 101 B is designated to assume an active role in the event of a failure of link 112 (i.e., it indicates that storage array 101 B is designated as a system-preferred storage array).
- system-preferred configuration setting 104 A is stored in the memory of storage array 101 A, and it is isolated from storage array 101 B.
- system-preferred configuration setting 104 B is stored in the memory of storage array 101 B, and it is isolated from storage array 101 A.
- system-preferred configuration settings 104 A-B are initially set to NONE, and they are subsequently updated to identify one of the storage arrays 101 A-B as the storage array that would assume an active role in the event of a failure of the link 112 .
- the value of system-preferred configuration setting 104 A is updated by storage array 101 A independently of storage array 101 B, when storage array 101 A is able to confirm that the values of locally-preferred configuration settings 103 A-B are in agreement with one another. In one example, updating the value of system-preferred configuration setting 104 A includes causing the system-preferred configuration setting 104 A to equal the value of locally-preferred configuration setting 103 A.
- the value of system-preferred configuration setting 104 B is updated by storage array 101 B independently of storage array 101 A, when storage array 101 B is able to confirm that the values of locally-preferred configuration settings 103 A-B are in agreement with one another. In one example, updating the value of system-preferred configuration setting 104 B includes causing the system-preferred configuration setting 104 B to equal the value of locally-preferred configuration setting 103 B.
- Locally-preferred configuration setting 103 A may include a value that is determined locally by storage array 101 A, and it may specify which one of storage arrays 101 A and 101 B is preferred to assume an active role in the event of a failure of link 112.
- Locally-preferred configuration setting 103 B may include a value that is determined locally by storage array 101 B, and it may specify which one of storage arrays 101 A and 101 B is preferred to assume an active role in the event of a failure of link 112 .
- each of locally-preferred configuration settings 103 A-B represents an intermediate (or preliminary) value, which is used in a protocol for setting the values of system-preferred configuration setting 104 A and/or system-preferred configuration setting 104 B.
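- By way of a non-limiting illustration, the update of a system-preferred configuration setting from the two locally-preferred values may be sketched as follows. The names `Preferred` and `update_system_preferred` are hypothetical and are introduced only for this sketch. Treating a locally-preferred value of NONE as the absence of agreement is an assumption of the sketch, not a requirement of the disclosure.

```python
from enum import Enum


class Preferred(Enum):
    """Values that a preferred-array configuration setting may take."""
    NONE = "NONE"
    ARRAY_A = "ARRAY_A"
    ARRAY_B = "ARRAY_B"


def update_system_preferred(current: Preferred,
                            local_value: Preferred,
                            peer_value: Preferred) -> Preferred:
    """Return the new system-preferred value held by one storage array.

    local_value is this array's own locally-preferred setting; peer_value
    is the locally-preferred setting most recently reported by the peer.
    """
    # Assumption: NONE on either side means no agreement has been reached.
    if local_value is Preferred.NONE or peer_value is Preferred.NONE:
        return current
    # Update only when both locally-preferred values are in agreement.
    if local_value is peer_value:
        return local_value
    return current
```

Each storage array would evaluate such a function on its own copy of the values, which is consistent with each system-preferred configuration setting being updated independently of the other storage array.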
- the witness node 109 may include one or more computing devices, such as the computing device 900, which is discussed further below with respect to FIG. 9.
- the witness node 109 may be connected to storage array 101 A via link 114.
- the witness node 109 may be connected to storage array 101 B via link 116.
- the witness node 109 may relay to storage array 101 A information that is transmitted (in status messages 320 ) to the witness node 109 by storage array 101 B.
- the information may include the value of the locally-preferred configuration setting 103 B.
- the witness node 109 may further relay to storage array 101 B information that is transmitted (in status messages 310 ) to the witness node 109 by storage array 101 A.
- the information may include the value of the locally-preferred configuration setting 103 A.
- the term “link” may refer to one or more communications channels between two entities (e.g., storage array 101 A, storage array 101 B, and witness node 109, etc.).
- a link may be UP when an entity is able to transmit information over the link and subsequently receive an acknowledgment that the information has been received at its destination.
- a link may be DOWN when an entity is unable to transmit information over the link and subsequently receive an acknowledgment that the information has been received at its destination.
- link 112 may be down when one or more communications networks that are used to establish the link are not operating correctly or when storage array 101 B is unavailable.
- link 112 may be down when one or more communications networks that are used to establish the link are not operating correctly or when storage array 101 A is unavailable.
- link 114 may be DOWN, when one or more communications networks that are used to establish link 114 are not operating correctly or the witness node 109 is unavailable.
- link 114 may be DOWN, when one or more communications networks that are used to establish link 114 are not operating correctly or when storage array 101 A is unavailable.
- link 116 may be DOWN, when one or more communications networks that are used to establish link 116 are not operating correctly or the witness node 109 is unavailable.
- link 116 may be DOWN, when one or more communications networks that are used to establish link 116 are not operating correctly or when storage array 101 B is unavailable.
- any of the links 112 , 114 , and 116 may be implemented by using one or more communications networks, such as the Internet, a local area network (LAN), a wide area network (WAN), an InfiniBand network, etc. It will be understood that the present disclosure is not limited to any specific implementation of any of links 112 , 114 , and 116 and/or any specific method for determining when a link is UP or DOWN.
- when a link is DOWN, the link is considered to have failed, irrespective of whether the link being DOWN is caused by a malfunction in one or more communications networks that are used to establish the link or a malfunction in one of the entities that are connected by the link (e.g., one of storage arrays 101 A-B and witness node 109).
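- By way of a non-limiting illustration, one possible way of classifying a link as UP or DOWN based on acknowledgments may be sketched as follows. The disclosure is not limited to any specific method for determining link status; the function `link_status`, the callable `send_and_wait_for_ack`, the probe payload, and the timeout value are all hypothetical and are introduced only for this sketch.

```python
from typing import Callable

UP = "UP"
DOWN = "DOWN"


def link_status(send_and_wait_for_ack: Callable[[bytes, float], bool],
                timeout_s: float = 1.0) -> str:
    """Classify a link by whether a transmission is acknowledged.

    send_and_wait_for_ack is a caller-supplied transport function that
    transmits a payload and returns True if an acknowledgment arrives
    within timeout_s seconds.
    """
    try:
        acknowledged = send_and_wait_for_ack(b"status-probe", timeout_s)
    except OSError:
        # Covers connection errors and timeouts raised by the transport.
        acknowledged = False
    return UP if acknowledged else DOWN
```

Under this sketch, a link is reported as DOWN both when the network fails and when the entity at the other end is unavailable, because in either case no acknowledgment is received.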
- FIG. 2 is a state diagram illustrating aspects of the operation of the storage system 100 , according to one example.
- When the storage system 100 is in state 202, links 112, 114, and 116 are UP, and both storage arrays 101 A and 101 B are serving IO requests.
- the storage system 100 may transition into state 204 when at least one of links 114 and/or 116 goes DOWN.
- When the storage system 100 is in state 204, storage arrays 101 A and 101 B may attempt to set the values of system-preferred configuration settings 104 A-B, respectively.
- at least one of storage arrays 101 A-B may execute a process 700 (shown in FIG. 7 ).
- the storage system 100 may transition from state 204 into state 206 when the attempt is completed.
- When the storage system 100 is in state 206, at least one of links 114 and 116 is DOWN, link 112 is UP, and both of storage arrays 101 A and 101 B are serving IO requests.
- the storage system 100 may transition from state 206 to a state 208 when link 112 goes DOWN.
- When the storage system 100 is in state 208, one of the storage arrays 101 A-B assumes an active role and the other one of storage arrays 101 A-B assumes a passive role. Each of the storage arrays 101 A-B determines its role independently of the other.
- each (or at least one) of storage arrays 101 A-B may execute a process 800 (shown in FIG. 8 ) to assume its role.
- the storage system 100 may transition from state 208 to state 210 after one of storage arrays 101 A-B has assumed an active role and the other one has assumed a passive role.
- When the storage system 100 is in state 210, one of storage arrays 101 A-B may operate in an active role (i.e., it may be serving IO requests) and the other one may be in a passive role (i.e., it may not be serving IO requests).
- the storage system 100 may transition from state 210 to a state 212 when all of links 112 , 114 , and 116 are UP again.
- When the storage system 100 is in state 212, volume instance 102 A and volume instance 102 B are synchronized with each other and brought into a consistent state. After the volume instances 102 A and 102 B are brought into a consistent state, the storage system 100 returns to state 202.
- the storage system 100 may transition from state 202 directly to state 208 when link 112 goes DOWN, regardless of whether each of links 114 and 116 remains UP or at least one of links 114 and 116 goes DOWN at the same time. For example, if links 112 and 114 go DOWN at the same time, storage array 101 A may automatically assume a passive role, and storage array 101 B may assume an active role. As another example, if links 112 and 116 go DOWN at the same time, storage array 101 B may assume a passive role, and storage array 101 A may assume an active role.
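- By way of a non-limiting illustration, the state transitions described with respect to FIG. 2 may be sketched as a lookup table. The state numerals follow FIG. 2; the event names in `Event`, the table `TRANSITIONS`, and the function `next_state` are hypothetical and are introduced only for this sketch. Leaving the state unchanged for an event that has no listed transition is likewise an assumption.

```python
from enum import Enum


class State(Enum):
    S202 = 202  # all links UP, both arrays serving IO
    S204 = 204  # arrays attempt to set system-preferred settings
    S206 = 206  # a witness link is DOWN, link 112 UP, both arrays serving IO
    S208 = 208  # arrays choose active and passive roles
    S210 = 210  # one array active, the other passive
    S212 = 212  # volume instances are resynchronized


class Event(Enum):
    WITNESS_LINK_DOWN = "at least one of links 114 and 116 goes DOWN"
    PREFERENCE_ATTEMPT_DONE = "attempt to set system-preferred settings completes"
    LINK_112_DOWN = "link 112 goes DOWN"
    ROLES_ASSUMED = "one array is active and the other is passive"
    ALL_LINKS_UP = "links 112, 114, and 116 are UP again"
    VOLUMES_SYNCHRONIZED = "volume instances are consistent"


TRANSITIONS = {
    (State.S202, Event.WITNESS_LINK_DOWN): State.S204,
    (State.S202, Event.LINK_112_DOWN): State.S208,
    (State.S204, Event.PREFERENCE_ATTEMPT_DONE): State.S206,
    (State.S206, Event.LINK_112_DOWN): State.S208,
    (State.S208, Event.ROLES_ASSUMED): State.S210,
    (State.S210, Event.ALL_LINKS_UP): State.S212,
    (State.S212, Event.VOLUMES_SYNCHRONIZED): State.S202,
}


def next_state(state: State, event: Event) -> State:
    """Return the next state, or the same state if no transition is listed."""
    return TRANSITIONS.get((state, event), state)
```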
- FIG. 3 is a diagram illustrating examples of status messages that are transmitted by storage array 101 A, storage array 101 B, and the witness node 109 .
- Message 310 is an example of a status message that is transmitted by storage array 101 A to storage array 101 B and the witness node 109 .
- status message 310 may include fields 312 , 314 , and 316 .
- Field 312 may identify the status of link 112 . Specifically, field 312 may indicate whether link 112 appears to be UP or DOWN to the storage array 101 A.
- Field 314 may identify the status of link 114. Specifically, field 314 may indicate whether link 114 appears to be UP or DOWN to the storage array 101 A.
- Field 316 may identify the value of the locally-preferred configuration setting 103 A (shown in FIG. 1 ).
- status message 310 may be timestamped, and it may also include the timestamp of the last valid status message 320 that is received at storage array 101 A.
- Message 320 is an example of a status message that is transmitted by storage array 101 B to storage array 101 A and the witness node 109 .
- status message 320 may include fields 322 , 324 , and 326 .
- Field 322 may identify the status of link 112 . Specifically, field 322 may indicate whether link 112 appears to be UP or DOWN to the storage array 101 B.
- Field 324 may identify the status of link 116. Specifically, field 324 may indicate whether link 116 appears to be UP or DOWN to the storage array 101 B.
- Field 326 may identify the value of the locally-preferred configuration setting 103 B (shown in FIG. 1 ).
- status message 320 may be timestamped, and it may also include the timestamp of the last valid status message 310 that is received at storage array 101 B.
- Message 330 is an example of a status message that is transmitted by the witness node 109 to storage arrays 101 A and 101 B.
- status message 330 may include fields 332 , 334 , 336 , and 338 .
- Field 332 may identify the status of link 114 . Specifically, the field 332 may indicate whether link 114 appears to be UP or DOWN to the witness node 109 .
- Field 334 may identify the status of link 116 . Specifically, field 334 may indicate whether link 116 appears UP or DOWN to the witness node 109 .
- Field 336 may identify a value of locally-preferred configuration setting 103 A that has been reported to the witness node 109 by storage array 101 A.
- the value of field 336 may be equal to the value of field 316 in a status message 310 that is received by the witness node 109 from the storage array 101 A.
- Field 338 may identify a value of locally-preferred configuration setting 103 B that has been reported to the witness node 109 by storage array 101 B.
- the value of field 338 may be equal to the value of field 326 in a status message 320 that is received by the witness node 109 from the storage array 101 B.
- In instances in which status message 330 is transmitted to storage array 101 A, field 336 may be left blank. In instances in which status message 330 is transmitted to storage array 101 B, field 338 may be left blank.
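- By way of a non-limiting illustration, status messages 310, 320, and 330 and the relaying performed by the witness node 109 may be sketched as follows. The class names, attribute names, and the function `build_status_330` are hypothetical and are introduced only for this sketch. Timestamps are omitted for brevity, and a blank field is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusMessage310:
    """Status message transmitted by storage array 101A."""
    link_112: str           # field 312
    link_114: str           # field 314
    locally_preferred: str  # field 316 (setting 103A)


@dataclass(frozen=True)
class StatusMessage320:
    """Status message transmitted by storage array 101B."""
    link_112: str           # field 322
    link_116: str           # field 324
    locally_preferred: str  # field 326 (setting 103B)


@dataclass(frozen=True)
class StatusMessage330:
    """Status message transmitted by witness node 109."""
    link_114: str                   # field 332
    link_116: str                   # field 334
    preferred_by_a: Optional[str]   # field 336, None when left blank
    preferred_by_b: Optional[str]   # field 338, None when left blank


def build_status_330(link_114: str, link_116: str,
                     last_310: Optional[StatusMessage310],
                     last_320: Optional[StatusMessage320],
                     recipient: str) -> StatusMessage330:
    """Build the witness message for one recipient ("101A" or "101B")."""
    if recipient not in ("101A", "101B"):
        raise ValueError(f"unknown recipient: {recipient}")
    from_a = last_310.locally_preferred if last_310 else None
    from_b = last_320.locally_preferred if last_320 else None
    # The recipient's own reported value is left blank in its copy.
    if recipient == "101A":
        from_a = None
    else:
        from_b = None
    return StatusMessage330(link_114, link_116, from_a, from_b)
```

In this sketch, the copy of status message 330 that is sent to storage array 101 A carries only the value reported by storage array 101 B, and vice versa, which corresponds to the relaying role of the witness node 109.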
- FIG. 4A illustrates an example of the contents of a state table 108 A.
- FIG. 4A illustrates what contents might be present in state tables 108 A and 108 B when the storage system is in state 202 (i.e., when all of links 112 , 114 , and 116 are UP and the storage system 100 is operating correctly).
- FIG. 4B illustrates an example of the contents of state table 108 A, when link 116 is down.
- FIG. 4B illustrates that in response to the failure of link 116 : (i) storage array 101 A has set the value of locally-preferred configuration setting 103 A to indicate that storage array 101 A prefers storage array 101 A to assume an active role in the event of a failure of link 112 , and (ii) storage array 101 B has set the value of locally-preferred configuration setting 103 B to indicate that storage array 101 B prefers storage array 101 A to assume an active role in the event of a failure of link 112 .
- In the example of FIG. 4B, state table 108 A is in a steady state.
- FIG. 4C illustrates an example of the contents of state table 108 A when state table 108 A is in an unsteady state.
- FIG. 4C illustrates that after link 116 has failed: (i) the value of locally-preferred configuration setting 103 A is NONE, and (ii) storage array 101 B has set the value of locally-preferred configuration setting 103 B to indicate that storage array 101 B prefers storage array 101 A to assume an active role in the event of a failure of link 112 .
- State table 108 A may be in a steady state when all items of information contained in the table (or at least two items of interest) are consistent with each other. For example, state table 108 may be in a steady state when it indicates that both storage array 101 A and storage array 101 B prefer the same storage array to assume an active role in the event of a failure of link 112 . As another example, state table 108 may be in a steady state when table 108 indicates that the storage array 101 B and the witness node 109 B have reported the same status for link 116 . As yet another example, state table 108 A may be in a steady state when table 108 indicates that storage array 101 A and the witness node 109 have reported the same status for link 114 .
- State table 108 may be in an unsteady state when at least two items of information contained in the table are not consistent with each other.
- state table 108 may be in an unsteady state when it indicates that storage array 101 A and storage array 101 B prefer different storage arrays to assume an active role in the event of a failure of link 112 (i.e., when it indicates that the values of locally-preferred configuration settings 103 A-B are not in agreement with each other).
- state table 108 may be in an unsteady state when table 108 indicates that the storage array 101 B and the witness node 109 have reported conflicting status information for link 116 (e.g., one has reported that the link is UP and the other has reported that the link is DOWN).
- state table 108 may be in an unsteady state when state table 108 indicates that storage array 101 A and the witness node 109 have reported conflicting status information for link 114 (e.g., one has reported that the link is UP and the other has reported that the link is DOWN).
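- The steady/unsteady distinction described above can be sketched as a simple consistency check. This is a minimal illustration only: the `StateTable` class, its field names, and the choice to require all three pairs of values to agree are assumptions made for the sketch, not details taken from the disclosure (which also allows checking only the items of interest).

```python
from dataclasses import dataclass

UP, DOWN = "UP", "DOWN"

@dataclass
class StateTable:
    """Hypothetical flattened view of state table 108A (field names are illustrative)."""
    pref_a: str              # locally-preferred setting 103A: "ARRAY_A", "ARRAY_B", or "NONE"
    pref_b: str              # locally-preferred setting 103B, as reported by array 101B
    link114_by_a: str        # status of link 114 as seen by array 101A
    link114_by_witness: str  # status of link 114 as reported by the witness node
    link116_by_b: str        # status of link 116 as reported by array 101B
    link116_by_witness: str  # status of link 116 as reported by the witness node

def is_steady(table: StateTable) -> bool:
    """Return True when every pair of related entries agrees (assumed strictest reading)."""
    return (
        table.pref_a == table.pref_b
        and table.link114_by_a == table.link114_by_witness
        and table.link116_by_b == table.link116_by_witness
    )
```

- Under these assumptions, a table in which setting 103A is NONE while setting 103B names storage array 101A (as in FIG. 4C) is reported as unsteady, whereas a table in which both settings name storage array 101A (as in FIG. 4B) is reported as steady.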
- State table 108 B may have similar structure to state table 108 A.
- state table 108 A may include the values of fields 312 , 314 , 316 that were found in the most recent status message 310 that is transmitted by storage array 101 A.
- State table 108 A may also include the values of fields 322 , 324 , and 326 that were found in the most recent status message 320 that is received by storage array 101 A.
- State table 108 A may also include the values of fields 332 , 334 , and 336 that were found in the most recent status message 330 that is received by storage array 101 A.
- state table 108 B may include the values of fields 322 , 324 , 326 that were found in the most recent status message 320 that is transmitted by storage array 101 B. State table 108 B may also include the values of fields 312 , 314 , and 316 that were found in the most recent status message 310 that is received by storage array 101 B. State table 108 B may also include the values of fields 332 , 334 , and 336 that were found in the most recent status message 330 that is received by storage array 101 B.
- each of storage arrays 101 A-B may update its respective state table 108 with the contents of received status messages, as they arrive.
- only valid status messages may be used to update any of state tables 108 A and 108 B.
- a status message may be valid only when its timestamp is greater than the timestamp of the most recently received message (of the same type) or when it is received within a predetermined time period.
- Each of state tables 108 A and 108 B may be implemented as a single data structure or as a plurality of independent data structures.
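- The validity rule for status messages can be sketched as follows. The dictionary layout, the `type` and `timestamp` keys, the `max_delay` parameter, and the reading of "received within a predetermined time period" as a bound on the delay between the message timestamp and its receipt are all assumptions made for this illustration.

```python
from typing import Optional

def is_valid(msg_timestamp: float, last_timestamp: Optional[float],
             received_at: float, max_delay: float) -> bool:
    """A message is valid if it is newer than the last one of its type, or timely."""
    is_newer = last_timestamp is None or msg_timestamp > last_timestamp
    is_timely = (received_at - msg_timestamp) <= max_delay
    return is_newer or is_timely

def apply_status_message(table: dict, msg: dict, received_at: float,
                         max_delay: float = 5.0) -> bool:
    """Store msg in the state table (keyed by message type) only if it is valid."""
    previous = table.get(msg["type"])
    last_ts = previous["timestamp"] if previous else None
    if not is_valid(msg["timestamp"], last_ts, received_at, max_delay):
        return False  # stale message: leave the table unchanged
    table[msg["type"]] = msg
    return True
```

- Keying the table by message type keeps the most recent status messages 310, 320, and 330 separately, which mirrors the description of the state table holding the fields of the latest message of each kind.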
- FIG. 5 is a flowchart of an example of a process 500 , according to aspects of the disclosure.
- the process 500 may be performed by either one of the storage arrays 101 A-B.
- the storage array executing the process is “self” and the other storage array is a “peer”.
- Although in the present example the process 500 is described as being executed by the storage array 101 A, it will be understood that the process 500 may also be executed by storage array 101 B.
- the process 500 may be executed by each (or at least one) of storage arrays 101 A-B when the storage system 100 is in any of states 202 , 206 , and 210 .
- the process 500 may be executed during the operation of the storage system 100 , irrespective of the state of the storage system 100 . Stated succinctly, the process 500 is not limited to being executed at any particular time of the operation of the storage system 100 .
- the storage array 101 A detects the status of link 112 . Specifically, the storage array 101 A detects whether link 112 is UP or DOWN.
- the storage array 101 A detects the status of link 114 . Specifically, the storage array 101 A detects whether link 114 is UP or DOWN.
- the storage array 101 A detects the status of link 116 . Specifically, the storage array detects whether link 116 is UP or DOWN based on at least one of: (i) a status message 330 that is received by the storage array 101 A from the witness node 109 and/or (ii) a status message 320 that is received by the storage array 101 A from the storage array 101 B.
- the storage array 101 A optionally updates the value of locally-preferred configuration setting 103 A.
- the storage array 101 A may leave the value of locally-preferred configuration setting 103 A unchanged.
- storage array 101 A may allow the value of locally-preferred configuration setting 103 A to remain NONE.
- both links 114 and 116 are determined to be DOWN (at steps 504 and 506 )
- the storage array 101 A may leave the value of locally-preferred configuration setting 103 A unchanged.
- storage array 101 A may allow the value of locally-preferred configuration setting 103 A to remain NONE.
- the storage array 101 A may set the value of locally-preferred configuration setting 103 A to indicate that storage array 101 A prefers storage array 101 A to assume an active role in the event of a failure of link 112 .
- storage array 101 A may set the value of locally-preferred configuration setting 103 A to ARRAY_A.
- link 114 is determined to be DOWN and link 116 is determined to be UP (at steps 504 and 506 )
- the storage array 101 A may set the value of locally-preferred configuration setting 103 A to indicate that storage array 101 A prefers storage array 101 B to assume an active role in the event of a failure of link 112 .
- storage array 101 A may set the value of locally-preferred configuration setting 103 A to ARRAY_B.
- the storage array 101 A generates a status message 310 , based on the information determined at steps 502 - 508 , and transmits the generated status message to storage array 101 B.
- the storage array 101 A generates a status message 310 , based on the information determined at steps 502 - 508 , and transmits the generated status message to the witness node 109 .
- Although in the present example the storage array 101 A transmits different status messages to storage array 101 B and the witness node 109 , alternative implementations are possible in which the same status message is transmitted.
- storage array 101 B may generate status messages 320 (instead of status messages 310 ), at steps 510 - 512 .
- storage array 101 B may detect the status of link 114 (at step 506 ) based on at least one of: (i) a status message 330 that is received by the storage array 101 B from the witness node 109 and/or (ii) a status message 310 that is received by the storage array 101 B from the storage array 101 A.
- storage array 101 B may detect the status of link 116 (at step 504 ) in a well-known fashion (e.g., by sending a ping to the witness node 109 or determining whether it has received an acknowledgment for the most recent communication that is sent to the witness node 109 ).
- the storage array may optionally update the value of locally-preferred configuration setting 103 B by using the same rules. For example, if both links 114 and 116 are UP or if both of them are DOWN, the storage array 101 B may leave the value of locally-preferred configuration setting 103 B unchanged.
- the storage array 101 B may set the value of locally-preferred configuration setting 103 B to indicate that the storage array 101 B prefers the other storage array (e.g., the storage array whose link to the witness node 109 remains UP) to assume an active role in the event of a failure of link 112 .
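- The rules for updating the locally-preferred configuration setting in process 500 can be condensed into a short sketch. The function name, parameter names, and string constants below are illustrative assumptions; only the decision rule itself (keep the current value when both witness links have the same status, otherwise prefer the array whose witness link remains UP) follows the description above.

```python
NONE, ARRAY_A, ARRAY_B = "NONE", "ARRAY_A", "ARRAY_B"

def update_local_preference(current: str, self_id: str, peer_id: str,
                            self_witness_link_up: bool,
                            peer_witness_link_up: bool) -> str:
    """Return the new value of the locally-preferred setting for the 'self' array."""
    # Both links UP, or both DOWN: leave the setting unchanged.
    if self_witness_link_up == peer_witness_link_up:
        return current
    # Exactly one link is UP: prefer the array that can still reach the witness node.
    return self_id if self_witness_link_up else peer_id
```

- For storage array 101 A, `self_witness_link_up` corresponds to link 114 and `peer_witness_link_up` to link 116 ; for storage array 101 B the two are swapped, so both arrays converge on the same preferred array.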
- FIG. 6 is a flowchart of an example of a process 600 , according to aspects of the disclosure.
- the witness node 109 detects the status of link 114 . Specifically, the witness node 109 detects whether link 114 is UP or DOWN.
- the witness node 109 detects the status of link 116 . Specifically, the witness node 109 detects whether link 116 is UP or DOWN.
- the witness node 109 detects the value of locally-preferred configuration setting 103 A. Specifically, the witness node 109 may retrieve the value of locally-preferred configuration setting 103 A from a status message 310 that is received at the witness node 109 from storage array 101 A.
- the witness node 109 detects the value of locally-preferred configuration setting 103 B. Specifically, the witness node 109 may retrieve the value of locally-preferred configuration setting 103 B from a status message 320 that is received at the witness node 109 from storage array 101 B. At step 610 , the witness node 109 generates a status message 330 based on some (or all) of the information determined at steps 602 - 608 and transmits the status message to the storage array 101 A. At step 612 , the witness node 109 generates another status message 330 based on some (or all) of the information determined at steps 602 - 608 and transmits the status message to the storage array 101 B. Although in the present example the witness node 109 transmits different status messages to storage arrays 101 A-B, alternative implementations are possible in which the same status message is transmitted.
- the witness node 109 may execute the process 600 in a loop (as shown in FIG. 6 ).
- When the witness node 109 is first started, it might start sending out status messages 330 right away, without waiting for the status of links 112 , 114 , and/or 116 to be determined, and/or without waiting for the receipt of status messages 310 and 320 from storage arrays 101 A and 101 B, respectively.
- the witness node 109 may report the links 112 , 114 , and 116 as being DOWN, by default.
- the witness node 109 may report the values of locally-preferred configuration settings 103 A and 103 B as being NONE.
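- One iteration of process 600, including the start-up defaults described above, might be sketched as follows. The dictionary-based message layout and the `local_pref` key are assumptions made for the illustration; the disclosure does not specify how status messages 310 , 320 , and 330 are encoded.

```python
UP, DOWN, NONE = "UP", "DOWN", "NONE"

def build_status_330(link114=None, link116=None, msg_310=None, msg_320=None) -> dict:
    """Assemble a witness status message, falling back to defaults for unknown values."""
    return {
        # Links whose status has not been determined yet are reported as DOWN.
        "link114": link114 or DOWN,
        "link116": link116 or DOWN,
        # Preferences not yet received from the arrays are reported as NONE.
        "pref_a": msg_310["local_pref"] if msg_310 else NONE,
        "pref_b": msg_320["local_pref"] if msg_320 else NONE,
    }
```

- Calling `build_status_330()` with no arguments models a freshly started witness node that has not yet observed any link or received any status message.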
- FIG. 7 is a flowchart of an example of a process 700 , according to aspects of the disclosure.
- the process 700 may be performed by either one of the storage arrays 101 A-B.
- the storage array executing the process is “self” and the other storage array is a “peer”.
- Although in the present example the process 700 is described as being executed by the storage array 101 A, it will be understood that the process 700 may also be executed by storage array 101 B.
- the process 700 may be executed by each (or at least one) of storage arrays 101 A-B when the storage system 100 is in state 204 (shown in FIG. 2 ).
- the process 700 may be executed in response to the storage array 101 A detecting that one of links 114 and 116 has failed.
- the process 700 may be executed during the operation of the storage system 100 , irrespective of the state of the storage system 100 .
- the process 700 may be executed concurrently with the process 500 (shown in FIG. 5 ). Stated succinctly, the process 700 is not limited to being executed at any particular time of the operation of the storage system 100 .
- the storage array 101 A determines the current value of the locally-preferred configuration setting 103 A. As noted above, the value of the locally-preferred configuration setting 103 A may be set as a result of executing the process 500 , which is discussed above with respect to FIG. 5 .
- the storage array 101 A receives a status message 320 from the storage array 101 B.
- the storage array 101 A receives a status message 330 from the witness node 109 .
- the storage array 101 A detects if the value of the locally-preferred configuration setting 103 A (which is stored in the memory of storage array 101 A) matches the value of the locally-preferred configuration setting 103 B (which is stored in the memory of storage array 101 B). In one example, the storage array 101 A detects if the value of the locally-preferred configuration setting 103 A matches both (or at least one) of: (i) the value of the locally-preferred configuration setting 103 B that is reported in the status message 320 (received at step 704 ) and/or (ii) the value of the locally-preferred configuration setting 103 B that is reported in the status message 330 .
- the process 700 proceeds to step 710 . If the value of locally-preferred configuration setting 103 A does not match the value of locally-preferred configuration setting 103 B, the process 700 returns to step 708 .
- the values of locally-preferred configuration settings 103 A-B match when they both indicate that the same storage array is desired by both of storage arrays 101 A-B to assume an active role when link 112 fails.
- storage array 101 A sets the system-preferred configuration setting 104 A to the value of locally-preferred configuration setting 103 A. For example, if the value of locally-preferred configuration setting 103 A indicates that the storage array 101 A prefers the storage array 101 A to assume an active role in the event of a failure of link 112 (i.e., if the value is “ARRAY_A”), the storage array 101 A may also set the value of the system-preferred configuration setting 104 A to indicate that the storage array 101 A is the system-preferred storage array.
- the storage array 101 A may set the value of the system-preferred configuration setting 104 A to indicate that the storage array 101 B is the system-preferred storage array.
- the process 700 may be executed by the storage array 101 B.
- storage array 101 B may determine the value of locally-preferred configuration setting 103 B (at step 702 ) and receive a status message 310 from storage array 101 A (at step 704 ).
- the storage array 101 B may detect if the value of the locally-preferred configuration setting 103 B is the same as both (or at least one) of: (i) the value of the locally-preferred configuration setting 103 A that is reported in the status message 310 (received at step 704 ) and/or (ii) the value of the locally-preferred configuration setting 103 A that is reported in a status message 330 .
- the storage array 101 B may set the system-preferred configuration setting 104 B to the value of locally-preferred configuration setting 103 B.
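- The agreement check of process 700 can be sketched as a single function. The names below are illustrative, the `require_both` flag models the "both (or at least one)" alternatives in the description, and treating a value of NONE as "no agreement" is an assumption of this sketch rather than something the disclosure states.

```python
NONE = "NONE"

def update_system_preferred(current_system_pref: str, local_pref: str,
                            pref_from_peer: str, pref_from_witness: str,
                            require_both: bool = False) -> str:
    """Promote the local preference to system-preferred once the peer agrees with it."""
    if local_pref == NONE:
        return current_system_pref  # nothing to agree on yet (assumed behavior)
    matches = [local_pref == pref_from_peer, local_pref == pref_from_witness]
    agreed = all(matches) if require_both else any(matches)
    # On agreement, the system-preferred setting takes the locally-preferred value.
    return local_pref if agreed else current_system_pref
```

- Here `pref_from_peer` stands for the peer's preference as reported directly by the peer (status message 320 or 310 ), and `pref_from_witness` for the same preference as relayed by the witness node in a status message 330 .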
- FIG. 8 is a flowchart of an example of a process 800 , according to aspects of the disclosure.
- the process 800 may be performed by either one of the storage arrays 101 A-B. Under the nomenclature of FIG. 8 , the storage array executing the process is “self” and the other storage array is a “peer”. Although in the present example the process 800 is described as being executed by the storage array 101 A, it will be understood that the process 800 may also be executed by storage array 101 B.
- the process 800 may be executed by each (or at least one) of storage arrays 101 A-B when the storage system 100 is in state 208 (shown in FIG. 2 ). In other words, the process 800 may be executed in response to the storage array 101 A detecting that link 112 is DOWN.
- the storage array 101 A waits for a predetermined period after detecting that link 112 is DOWN.
- the storage array 101 A detects if link 112 remains DOWN after the predetermined period has passed. If link 112 is still DOWN, the process 800 proceeds to step 804 . Otherwise, if link 112 appears to be back up, the process 800 ends.
- the storage array 101 A detects the value of the system-preferred configuration setting 104 A. If the value indicates that the storage array 101 B is designated to assume an active role in the event of a failure of link 112 , the process 800 proceeds to step 806 . If the value of the system-preferred configuration setting 104 A indicates that the storage array 101 A is designated to assume an active role in the event of a failure of link 112 , the process 800 proceeds to step 808 . If the value of the system-preferred configuration setting indicates that neither storage array 101 A nor storage array 101 B is designated to assume an active role in the event of a failure of link 112 , the process 800 proceeds to step 812 .
- storage array 101 A assumes a passive role.
- storage array 101 A assumes an active role.
- the storage array 101 A detects if the state table 108 A is in a steady state. As noted above, the state table 108 A may be in an unsteady state when link 112 appears to be down to the storage array 101 A, but is reported to be UP by the storage array 101 B. If the state table 108 A is in a steady state, the process 800 proceeds to step 816 . Otherwise, if table 108 A is not in a steady state, the process 800 proceeds to step 814 .
- the storage array 101 A assumes a passive role.
- the storage array 101 A detects whether link 114 between storage array 101 A and the witness node 109 is UP. If link 114 is UP, the process 800 proceeds to step 820 . Otherwise, if link 114 is DOWN, the process 800 proceeds to step 818 .
- the storage array 101 A assumes a passive role.
- the storage array 101 A detects whether link 116 between the witness node 109 and the storage array 101 B is UP. As noted above, the storage array 101 A may determine whether link 116 is UP based on information that is reported by the witness node 109 in one or more status messages 330 that are received at storage array 101 A from the witness node 109 . If the link is UP, the process 800 proceeds to step 822 . Otherwise, if the link is DOWN, the process 800 proceeds to step 824 . At step 822 , the storage array 101 A assumes a role that is specified by the user-preferred configuration setting 106 A.
- the storage array may assume an active role.
- the user-preferred configuration setting 106 A is set to a first value (e.g., “ARRAY_A”)
- the storage array may assume an active role.
- the user-preferred configuration setting 106 A is set to a second value (e.g., “ARRAY_B”)
- the storage array 101 A assumes a passive role.
- the storage array assumes an active role.
- the process 800 may be executed by the storage array 101 B.
- storage array 101 B may determine the value of the system-preferred configuration setting 104 B and use it as a basis for executing step 804 .
- the storage array 101 B may detect whether state table 108 B is in a steady state (at step 812 ).
- the storage array 101 B may detect whether link 116 is UP, and, at step 820 , the storage array 101 B may detect whether link 114 is UP.
- the storage array 101 B may detect the value of the user-preferred configuration setting 106 B (at step 822 ) and use it as a basis for executing step 822 .
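- The decision sequence of process 800 (steps 804 through 824 ) can be summarized in a short sketch. It assumes that link 112 has already been found to remain DOWN after the waiting period of step 802 ; the function and parameter names are illustrative and not taken from the disclosure.

```python
ACTIVE, PASSIVE, NONE = "ACTIVE", "PASSIVE", "NONE"

def choose_role(self_id: str, system_pref: str, table_steady: bool,
                self_witness_link_up: bool, peer_witness_link_up: bool,
                user_pref: str) -> str:
    """Pick the role of the 'self' array once link 112 is confirmed DOWN."""
    # Step 804: an agreed system-preferred array decides the outcome (steps 806/808).
    if system_pref != NONE:
        return ACTIVE if system_pref == self_id else PASSIVE
    # Step 812: an unsteady state table leads to a passive role (step 814).
    if not table_steady:
        return PASSIVE
    # Step 816: without a link to the witness node, go passive (step 818).
    if not self_witness_link_up:
        return PASSIVE
    # Step 820: if only 'self' can reach the witness node, go active (step 824).
    if not peer_witness_link_up:
        return ACTIVE
    # Step 822: both witness links are UP, so fall back to the user-preferred setting.
    return ACTIVE if user_pref == self_id else PASSIVE
```

- Because each array evaluates the same inputs from its own point of view, the two arrays are expected to reach complementary roles whenever their views of the system agree.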
- a computing device 900 may include processor 902 , volatile memory 904 (e.g., RAM), non-volatile memory 906 (e.g., a hard disk drive, a solid-state drive such as a flash drive, a hybrid magnetic and solid-state drive, etc.), graphical user interface (GUI) 908 (e.g., a touchscreen, a display, and so forth) and input/output (I/O) device 920 (e.g., a mouse, a keyboard, etc.).
- Non-volatile memory 906 stores computer instructions 912 , an operating system 916 and data 918 such that, for example, the computer instructions 912 are executed by the processor 902 out of volatile memory 904 .
- Program code may be applied to data entered using an input device of GUI 908 or received from I/O device 920 .
- Processor 902 may be implemented by one or more programmable processors executing one or more computer programs to perform the functions of the system.
- the term “processor” describes an electronic circuit that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations may be hard-coded into the electronic circuit or soft coded by way of instructions held in a memory device.
- a “processor” may perform the function, operation, or sequence of operations using digital values or using analog signals.
- the “processor” can be embodied in an application-specific integrated circuit (ASIC).
- the “processor” may be embodied in a microprocessor with associated program memory.
- the “processor” may be embodied in a discrete electronic circuit.
- the “processor” may be analog, digital or mixed-signal.
- the “processor” may be one or more physical processors or one or more “virtual” (e.g., remotely located or “cloud”) processors.
- FIGS. 1-9 are provided as an example only. At least some of the steps discussed with respect to FIGS. 1-9 may be performed in parallel, in a different order, or altogether omitted.
- the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
- the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances.
- the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
- a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a controller and the controller can be a component.
- One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
- circuits including possible implementation as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack
- the described embodiments are not so limited.
- various functions of circuit elements may also be implemented as processing blocks in a software program.
- Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
- Some embodiments might be implemented in the form of methods and apparatuses for practicing those methods. Described embodiments might also be implemented in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid-state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the claimed invention.
- Described embodiments might also be implemented in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the claimed invention.
- When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
- Described embodiments might also be implemented in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus of the claimed invention.
- Couple refers to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. Conversely, the terms “directly coupled,” “directly connected,” etc., imply the absence of such additional elements.
- the term “compatible” means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard.
- the compatible element does not need to operate internally in a manner specified by the standard.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- A distributed storage system may include a plurality of storage devices (e.g., storage arrays) to provide data storage to a plurality of nodes. The plurality of storage devices and the plurality of nodes may be situated in the same physical location, or in one or more physically remote locations. The plurality of nodes may be coupled to the storage devices by a high-speed interconnect, such as a switch fabric.
- This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- According to aspects of the disclosure, a method is provided for use in a first storage array, comprising: detecting whether a second storage array has designated the first storage array as a locally-preferred storage array, the detecting being performed when a first link between the second storage array and a witness node is down; setting a value of a first configuration setting to indicate that the first storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the first storage array, the value of the first configuration setting being set only when the second storage array has designated the first storage array as a locally-preferred storage array; detecting, by the first storage array, whether a second link between the first storage array and the second storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- According to aspects of the disclosure, a storage array is provided, comprising: a memory configured to store a first configuration setting and a second configuration setting; and at least one processor operatively coupled to the memory, the at least one processor being configured to perform the operations of: detecting whether a peer storage array has designated the storage array as a locally-preferred storage array, the detecting being performed when a first link between the peer storage array and a witness node is down; setting a value of the first configuration setting to indicate that the storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the storage array, the value of the first configuration setting being set only when the peer storage array has designated the storage array as a locally-preferred storage array; detecting, by the storage array, whether a second link between the storage array and the peer storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- According to aspects of the disclosure, a non-transitory computer-readable medium is provided that stores one or more processor-executable instructions, which, when executed by at least one processor of a first storage array, cause the at least one processor to perform the operations of: detecting whether a second storage array has designated the first storage array as a locally-preferred storage array, the detecting being performed when a first link between the second storage array and a witness node is down; setting a value of a first configuration setting to indicate that the first storage array is designated as a system-preferred storage array, the value of the first configuration setting being stored in a memory of the first storage array, the value of the first configuration setting being set only when the second storage array has designated the first storage array as a locally-preferred storage array; detecting, by the first storage array, whether a second link between the first storage array and the second storage array is down; and when the second link is down, assuming one of an active role or a passive role based, at least in part, on the value of the first configuration setting.
- Other aspects, features, and advantages of the claimed invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements. Reference numerals that are introduced in the specification in association with a drawing figure may be repeated in one or more subsequent figures without additional description in the specification in order to provide context for other features.
- FIG. 1 is a diagram of an example of a storage system, according to aspects of the disclosure;
- FIG. 2 is a state diagram illustrating the operation of the storage system of FIG. 1 , according to aspects of the disclosure;
- FIG. 3 is a diagram illustrating examples of status messages, according to aspects of the disclosure;
- FIG. 4A is a diagram of an example of a state table, according to aspects of the disclosure;
- FIG. 4B is a diagram of an example of a state table, according to aspects of the disclosure;
- FIG. 4C is a diagram of an example of a state table, according to aspects of the disclosure;
- FIG. 5 is a flowchart of an example of a process, according to aspects of the disclosure;
- FIG. 6 is a flowchart of an example of a process, according to aspects of the disclosure;
- FIG. 7 is a flowchart of an example of a process, according to aspects of the disclosure;
- FIG. 8 is a flowchart of an example of a process, according to aspects of the disclosure; and
- FIG. 9 is a diagram of an example of a computing device, according to aspects of the disclosure.
FIG. 1 is a diagram of an example of astorage system 100, according to aspects of the disclosure. As illustrated, thestorage system 100 may include astorage array 101A, astorage array 101B, and awitness node 109. - The
storage array 101A may be configured to maintain avolume instance 102A. Thestorage array 101B may be configured to maintain avolume instance 102B.Volume instances storage arrays storage array 101A may store data associated with the write request involume instance 102A, and then transmit the data overlink 112 tostorage array 101B, after which the transmitted data is stored involume instance 102B. In other words, when the state ofvolume instance 102A is changed (e.g., by storing or deleting data), this change is propagated tovolume instance 102B, overlink 112, in order to keepvolume instances storage array 101B may store data associated with the write request involume instance 102B, and then transmit the data overlink 112 tostorage array 101A, after which the transmitted data is stored involume instance 102A. In other words, when the state ofvolume instance 102B is changed (e.g., by storing or deleting data), this change is propagated tovolume instance 102A, overlink 112, in order to keepvolume instances - Maintaining a consistent state between volume instances is important for the operation of the
storage system 100. Ifvolume instances link 112. Whenlink 112 is down,volume instances link 112 is down, one of thestorage arrays 101A-B may assume a passive role (and stop serving IO requests), and the other one of thestorage arrays 101A-B may assume an active role (and continue serving IO requests). - The examples that follow illustrate a technique that enables the
storage arrays 101A-B to choose their respective roles when link 112 fails. Specifically, the technique allows one of storage arrays 101A-B to choose a passive role and the other to choose an active role. When the technique is used, it guarantees (e.g., at least under most circumstances) that only one of storage arrays 101A-B will choose an active role and only one of the storage arrays 101A-B will choose a passive role, thereby preventing a situation in which both storage arrays have chosen the same role. Moreover, the technique allows each of storage arrays 101A-B to choose its respective role autonomously of the other. - Assuming an active role by
storage array 101A may include one or more of: (i) continuing to service IO requests, (ii) transitioning into a state in which data that is written to volume instance 102A is not synchronously transmitted to storage array 101B, (iii) maintaining a record of deletions performed on volume instance 102A, so that those deletions can be synchronized into volume instance 102B at a later time, or (iv) marking data that is written to volume instance 102A (after the storage array 101A has assumed an active role), so that the data can be synchronized (e.g., copied) into volume instance 102B at a later time. Assuming an active role by storage array 101B may include one or more of: (i) continuing to service IO requests, (ii) transitioning into a state in which data that is written to volume instance 102B is not synchronously transmitted to storage array 101A, (iii) maintaining a record of deletions performed on volume instance 102B, so that those deletions can be synchronized into volume instance 102A at a later time, or (iv) marking data that is written to volume instance 102B (after the storage array 101B has assumed an active role), so that the data can be synchronized (e.g., copied) into volume instance 102A at a later time. Assuming a passive role may include ceasing to service incoming IO requests. In some implementations, when a storage array assumes a passive role, it may also transmit a message to a multipath client, or a host device, indicating that the storage array is not currently serving IO requests. By way of example, an IO request may include a read request, a write request, a delete request, etc. When storage arrays 101A-B are serving IO requests, they are both “active.” However, under the nomenclature of the present disclosure, the phrase “assuming an active role” refers to the situation in which one of storage arrays 101A-B continues to service IO requests, while believing that the other storage array is unavailable and/or not serving IO requests. - The
storage array 101A may include one or more computing devices, such as the computing device 900, which is shown in FIG. 9. In its memory (e.g., volatile and/or non-volatile memory), the storage array 101A may store a locally-preferred configuration setting 103A, a system-preferred configuration setting 104A, a user-preferred configuration setting 106A, and a state table 108A. The storage array 101B may include one or more computing devices, such as the computing device 900, which is shown in FIG. 9. In its memory (e.g., volatile and/or non-volatile memory), the storage array 101B may store a locally-preferred configuration setting 103B, a system-preferred configuration setting 104B, a user-preferred configuration setting 106B, and a state table 108B. - Each of the user-preferred
configuration settings 106A-B includes a configuration setting that specifies which one of storage arrays 101A-B is designated to assume an active role in the event of a failure of link 112. The user-preferred configuration settings 106A-B are specified by the user (e.g., a system administrator), whereas the system-preferred configuration settings 104A-B are determined dynamically by the storage system 100. Moreover, the user-preferred configuration settings 106A-B are stored in the memory of storage arrays 101A-B before run-time, whereas the value of system-preferred configuration settings 104A-B is determined at run-time. - Each of system-
preferred configuration settings 104A-B may be determined at run-time by the storage system 100. Each of the system-preferred configuration settings 104A-B specifies which storage array in the storage system 100 will assume an active role when link 112 fails. Each of system-preferred configuration settings 104A-B may have a value that is selected from the set {NONE, ARRAY_A, ARRAY_B}. The value “NONE” indicates that neither storage array 101A nor storage array 101B is designated to assume an active role in the event of a failure of link 112. The value “ARRAY_A” indicates that storage array 101A is designated to assume an active role in the event of a failure of link 112 (i.e., it indicates that storage array 101A is designated as a system-preferred storage array). The value “ARRAY_B” indicates that storage array 101B is designated to assume an active role in the event of a failure of link 112 (i.e., it indicates that storage array 101B is designated as a system-preferred storage array). - As noted above, the system-preferred configuration setting 104A is stored in the memory of
storage array 101A, and it is isolated from storage array 101B. Similarly, the system-preferred configuration setting 104B is stored in the memory of storage array 101B, and it is isolated from storage array 101A. When the storage system 100 is first started, system-preferred configuration settings 104A-B are initially set to NONE, and they are subsequently updated to identify one of the storage arrays 101A-B as the storage array that would assume an active role in the event of a failure of the link 112. The value of system-preferred configuration setting 104A is updated by storage array 101A independently of storage array 101B, when storage array 101A is able to confirm that the values of locally-preferred configuration settings 103A-B are in agreement with one another. In one example, updating the value of system-preferred configuration setting 104A includes causing the system-preferred configuration setting 104A to equal the value of locally-preferred configuration setting 103A. The value of system-preferred configuration setting 104B is updated by storage array 101B independently of storage array 101A, when storage array 101B is able to confirm that the values of locally-preferred configuration settings 103A-B are in agreement with one another. In one example, updating the value of system-preferred configuration setting 104B includes causing the system-preferred configuration setting 104B to equal the value of locally-preferred configuration setting 103B. - Locally-preferred configuration setting 103A may be determined locally by
storage array 101A, and it may specify which one of storage arrays 101A-B it prefers to assume an active role in the event of a failure of link 112. Locally-preferred configuration setting 103B may include a value that is determined locally by storage array 101B, and it may specify which one of storage arrays 101A-B it prefers to assume an active role in the event of a failure of link 112. As is discussed further below, each of locally-preferred configuration settings 103A-B represents an intermediate (or preliminary) value, which is used in a protocol for setting the values of system-preferred configuration setting 104A and/or system-preferred configuration setting 104B. - The witness node 109 may include one or more computing devices, such as the
computing device 900, which is discussed further below with respect to FIG. 9. The witness node 109 may be connected to storage array 101A via link 114. The witness node 109 may be connected to storage array 101B via link 116. The witness node 109 may relay to storage array 101A information that is transmitted (in status messages 320) to the witness node 109 by storage array 101B. The information may include the value of the locally-preferred configuration setting 103B. The witness node 109 may further relay to storage array 101B information that is transmitted (in status messages 310) to the witness node 109 by storage array 101A. The information may include the value of the locally-preferred configuration setting 103A. - According to the present disclosure, the term “link” may refer to one or more communications channels between two entities (e.g.,
storage array 101A, storage array 101B, and witness node 109, etc.). A link may be UP when an entity is able to transmit information over the link and subsequently receive an acknowledgment that the information has been received at its destination. A link may be DOWN when an entity is unable to transmit information over the link or does not subsequently receive an acknowledgment that the information has been received at its destination. For example, from the perspective of storage array 101A, link 112 may be DOWN when one or more communications networks that are used to establish the link are not operating correctly or when storage array 101B is unavailable. Likewise, from the perspective of storage array 101B, link 112 may be DOWN when one or more communications networks that are used to establish the link are not operating correctly or when storage array 101A is unavailable. From the perspective of storage array 101A, link 114 may be DOWN when one or more communications networks that are used to establish link 114 are not operating correctly or when the witness node 109 is unavailable. From the perspective of the witness node 109, link 114 may be DOWN when one or more communications networks that are used to establish link 114 are not operating correctly or when storage array 101A is unavailable. From the perspective of storage array 101B, link 116 may be DOWN when one or more communications networks that are used to establish link 116 are not operating correctly or when the witness node 109 is unavailable. From the perspective of the witness node 109, link 116 may be DOWN when one or more communications networks that are used to establish link 116 are not operating correctly or when storage array 101B is unavailable. According to the present disclosure, in some implementations, any of the links links storage arrays 101A-B and witness node 109). -
FIG. 2 is a state diagram illustrating aspects of the operation of the storage system 100, according to one example. When the storage system 100 is in a state 202, links storage arrays storage system 100 may transition into state 204 when at least one of links 114 and/or 116 goes DOWN. When the storage system 100 is in state 204, storage arrays configuration settings 104A-B, respectively. When the storage system 100 is in state 204, at least one of storage arrays 101A-B may execute a process 700 (shown in FIG. 7). The storage system 100 may transition from state 204 into state 206 when the attempt is completed. When the storage system 100 is in state 206, at least one of links storage arrays storage system 100 may transition from state 206 to a state 208 when link 112 goes DOWN. When the storage system 100 is in state 208, one of the storage arrays 101A-B assumes an active role and the other one of storage arrays 101A-B assumes a passive role. Each of the storage arrays 101A-B determines its role independently of the other. When the storage system 100 is in state 208, each (or at least one) of storage arrays 101A-B may execute a process 800 (shown in FIG. 8) to assume its role. The storage system 100 may transition from state 208 to state 210 after one of storage arrays 101A-B has assumed an active role and the other one has assumed a passive role. When the storage system 100 is in state 210, one of storage arrays 101A-B may operate in an active role (i.e., it may be serving IO requests) and the other one may be in a passive role (i.e., it may not be serving IO requests). The storage system 100 may transition from state 210 to a state 212 when all of links storage system 100 is in state 212, volume instance 102A and volume instance 102B are synchronized with each other and brought into a consistent state. After the volume instances storage system 100 returns to state 202. - In some implementations, the
storage system 100 may transition from state 202 to state 208, when link 112 goes DOWN (and each of links links links storage array 101A may automatically assume a passive role, and storage array 101B may assume an active role. As another example, if links storage array 101B may assume a passive role, and storage array 101A may assume an active role. -
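The state transitions of FIG. 2 can be sketched as a small lookup table. This is only an illustrative sketch, not the disclosed implementation: the event names (`WITNESS_LINK_DOWN`, `SETTING_ATTEMPT_DONE`, and so on) are hypothetical labels introduced here, and the trigger for leaving state 210 is assumed to be all links returning to UP.

```python
from enum import Enum


class State(Enum):
    """States of the storage system, numbered as in FIG. 2."""
    S202 = 202  # normal operation
    S204 = 204  # arrays attempt to set system-preferred settings
    S206 = 206  # attempt completed, link 112 still UP
    S208 = 208  # link 112 is DOWN, arrays choose their roles
    S210 = 210  # one array active, the other passive
    S212 = 212  # volume instances are being resynchronized


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.S202, "WITNESS_LINK_DOWN"): State.S204,     # link 114 or 116 goes DOWN
    (State.S204, "SETTING_ATTEMPT_DONE"): State.S206,
    (State.S206, "LINK_112_DOWN"): State.S208,
    (State.S202, "LINK_112_DOWN"): State.S208,          # direct transition
    (State.S208, "ROLES_ASSUMED"): State.S210,
    (State.S210, "ALL_LINKS_UP"): State.S212,           # assumed trigger
    (State.S212, "VOLUMES_SYNCED"): State.S202,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Events that do not apply to the current state leave it unchanged, which keeps the sketch total without claiming anything about how a real system would treat unexpected events.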
FIG. 3 is a diagram illustrating examples of status messages that are transmitted by storage array 101A, storage array 101B, and the witness node 109. - Message 310 is an example of a status message that is transmitted by
storage array 101A to storage array 101B and the witness node 109. As illustrated, status message 310 may include fields 312, 314, and 316. Field 312 may identify the status of link 112. Specifically, field 312 may indicate whether link 112 appears to be UP or DOWN to the storage array 101A. Field 314 may identify the status of link 114. Specifically, field 314 may indicate whether link 114 appears to be UP or DOWN to the storage array 101A. Field 316 may identify the value of the locally-preferred configuration setting 103A (shown in FIG. 1). Although not shown, status message 310 may be timestamped, and it may also include the timestamp of the last valid status message 320 that is received at storage array 101A. -
Message 320 is an example of a status message that is transmitted by storage array 101B to storage array 101A and the witness node 109. As illustrated, status message 320 may include fields 322, 324, and 326. Field 322 may identify the status of link 112. Specifically, field 322 may indicate whether link 112 appears to be UP or DOWN to the storage array 101B. Field 324 may identify the status of link 116. Specifically, field 324 may indicate whether link 116 appears to be UP or DOWN to the storage array 101B. Field 326 may identify the value of the locally-preferred configuration setting 103B (shown in FIG. 1). Although not shown, status message 320 may be timestamped, and it may also include the timestamp of the last valid status message 310 that is received at storage array 101B. -
Message 330 is an example of a status message that is transmitted by the witness node 109 to storage arrays 101A-B. As illustrated, status message 330 may include fields 332, 334, 336, and 338. Field 332 may identify the status of link 114. Specifically, field 332 may indicate whether link 114 appears to be UP or DOWN to the witness node 109. Field 334 may identify the status of link 116. Specifically, field 334 may indicate whether link 116 appears to be UP or DOWN to the witness node 109. Field 336 may identify a value of locally-preferred configuration setting 103A that has been reported to the witness node 109 by storage array 101A. The value of field 336 may be equal to the value of field 316 in a status message 310 that is received by the witness node 109 from the storage array 101A. Field 338 may identify a value of locally-preferred configuration setting 103B that has been reported to the witness node 109 by storage array 101B. The value of field 338 may be equal to the value of field 326 in a status message 320 that is received by the witness node 109 from the storage array 101B. In instances in which status message 330 is transmitted to storage array 101A, field 336 may be left blank. In instances in which status message 330 is transmitted to storage array 101B, field 338 may be left blank. -
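The three status messages can be modeled as simple records. The sketch below is an assumption-laden illustration: the class names, attribute names, and the function `build_witness_status` are hypothetical, a blank field is represented by `None`, and timestamps are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayStatus:
    """Models status message 310 (from array A) or 320 (from array B)."""
    sender: str              # "A" or "B"
    link_112_up: bool        # field 312 / 322
    witness_link_up: bool    # field 314 (link 114) / field 324 (link 116)
    locally_preferred: str   # field 316 / 326: "NONE", "ARRAY_A" or "ARRAY_B"


@dataclass
class WitnessStatus:
    """Models status message 330 sent by the witness node."""
    link_114_up: bool              # field 332
    link_116_up: bool              # field 334
    preferred_by_a: Optional[str]  # field 336, None when left blank
    preferred_by_b: Optional[str]  # field 338, None when left blank


def build_witness_status(link_114_up: bool, link_116_up: bool,
                         from_a: ArrayStatus, from_b: ArrayStatus,
                         recipient: str) -> WitnessStatus:
    """Relay each array's preference, blanking the recipient's own value."""
    return WitnessStatus(
        link_114_up=link_114_up,
        link_116_up=link_116_up,
        preferred_by_a=None if recipient == "A" else from_a.locally_preferred,
        preferred_by_b=None if recipient == "B" else from_b.locally_preferred,
    )
```

Blanking the recipient's own preference reflects the option described above; an implementation that sends the same message 330 to both arrays would simply populate both fields.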
FIG. 4A illustrates an example of the contents of a state table 108A. FIG. 4A illustrates what contents might be present in state tables 108A and 108B when the storage system is in state 202 (i.e., when all of links 112, 114, and 116 are UP and the storage system 100 is operating correctly). -
FIG. 4B illustrates an example of the contents of state table 108A, when link 116 is down. FIG. 4B illustrates that in response to the failure of link 116: (i) storage array 101A has set the value of locally-preferred configuration setting 103A to indicate that storage array 101A prefers storage array 101A to assume an active role in the event of a failure of link 112, and (ii) storage array 101B has set the value of locally-preferred configuration setting 103B to indicate that storage array 101B prefers storage array 101A to assume an active role in the event of a failure of link 112. In the example of FIG. 4B, state table 108A is in a steady state. -
FIG. 4C illustrates an example of the contents of state table 108A when state table 108A is in an unsteady state. FIG. 4C illustrates that after link 116 has failed: (i) the value of locally-preferred configuration setting 103A is NONE, and (ii) storage array 101B has set the value of locally-preferred configuration setting 103B to indicate that storage array 101B prefers storage array 101A to assume an active role in the event of a failure of link 112. - State table 108A may be in a steady state when all items of information contained in the table (or at least two items of interest) are consistent with each other. For example, state table 108 may be in a steady state when it indicates that both
storage array 101A and storage array 101B prefer the same storage array to assume an active role in the event of a failure of link 112. As another example, state table 108 may be in a steady state when table 108 indicates that the storage array 101B and the witness node 109 have reported the same status for link 116. As yet another example, state table 108A may be in a steady state when table 108 indicates that storage array 101A and the witness node 109 have reported the same status for link 114. - State table 108 may be in an unsteady state when at least two items of information contained in the table are not consistent with each other. For example, state table 108 may be in an unsteady state when it indicates that
storage array 101A and storage array 101B prefer different storage arrays to assume an active role in the event of a failure of link 112 (i.e., when it indicates that the values of locally-preferred configuration settings 103A-B are not in agreement with each other). As another example, state table 108 may be in an unsteady state when table 108 indicates that the storage array 101B and the witness node 109 have reported conflicting status information for link 116 (e.g., one has reported that the link is UP and the other has reported that the link is DOWN). As yet another example, state table 108 may be in an unsteady state when state table 108 indicates that storage array 101A and the witness node 109 have reported conflicting status information for link 114 (e.g., one has reported that the link is UP and the other has reported that the link is DOWN). - State table 108B may have similar structure to state table 108A. In some implementations, state table 108A may include the values of
fields 312, 314, and 316 of the most recent status message 310 that is transmitted by storage array 101A. State table 108A may also include the values of fields 322, 324, and 326 of the most recent status message 320 that is received by storage array 101A. State table 108A may also include the values of fields 332, 334, 336, and 338 of the most recent status message 330 that is received by storage array 101A. In some implementations, state table 108B may include the values of fields 322, 324, and 326 of the most recent status message 320 that is transmitted by storage array 101B. State table 108B may also include the values of fields 312, 314, and 316 of the most recent status message 310 that is received by storage array 101B. State table 108B may also include the values of fields 332, 334, 336, and 338 of the most recent status message 330 that is received by storage array 101B. - In some implementations, each of
storage arrays 101A-B may update its respective state table 108 with the contents of received status messages, as they arrive. In some implementations, only valid status messages may be used to update any of state tables 108A and 108B. A status message may be valid only when its timestamp is greater than the timestamp of the most recently received message (of the same type) or when it is received within a predetermined time period. Each of state tables 108A and 108B may be implemented as a single data structure or as a plurality of independent data structures. -
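The steady-state test and the timestamp-based validity check can be sketched as follows. This is a minimal illustration under stated assumptions: the `StateTable` layout and all names are hypothetical, the table is treated as steady only when all three consistency checks pass, and only the timestamp-ordering criterion for validity is modeled (the time-window criterion is not).

```python
from dataclasses import dataclass


@dataclass
class StateTable:
    """Hypothetical layout for a state table such as 108A."""
    pref_a: str               # locally-preferred setting 103A
    pref_b: str               # locally-preferred setting 103B
    link_114_by_a: str        # "UP"/"DOWN" as seen by storage array 101A
    link_114_by_witness: str  # "UP"/"DOWN" as reported by the witness node
    link_116_by_b: str        # "UP"/"DOWN" as seen by storage array 101B
    link_116_by_witness: str  # "UP"/"DOWN" as reported by the witness node


def is_steady(table: StateTable) -> bool:
    """Steady when preferences agree and link reports do not conflict."""
    return (table.pref_a == table.pref_b
            and table.link_114_by_a == table.link_114_by_witness
            and table.link_116_by_b == table.link_116_by_witness)


def is_newer(message_ts: float, last_ts: float) -> bool:
    """Accept a message only if it is newer than the last one of its type."""
    return message_ts > last_ts
```

With the values of FIG. 4B (both arrays prefer array A, link 116 reported DOWN by both sources) `is_steady` returns `True`; with the values of FIG. 4C (setting 103A still NONE) it returns `False`.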
FIG. 5 is a flowchart of an example of a process 500, according to aspects of the disclosure. The process 500 may be performed by either one of the storage arrays 101A-B. Under the nomenclature of FIG. 5, the storage array executing the process is “self” and the other storage array is a “peer”. Although in the present example the process 500 is described as being executed by the storage array 101A, it will be understood that the process 500 may also be executed by storage array 101B. For example, and without limitation, the process 500 may be executed by each (or at least one) of storage arrays 101A-B when the storage system 100 is in any of states process 500 may be executed during the operation of the storage system 100, irrespective of the state of the storage system 100. Stated succinctly, the process 500 is not limited to being executed at any particular time of the operation of the storage system 100. - At
step 502, the storage array 101A detects the status of link 112. Specifically, the storage array 101A detects whether link 112 is UP or DOWN. At step 504, the storage array 101A detects the status of link 114. Specifically, the storage array 101A detects whether link 114 is UP or DOWN. At step 506, the storage array 101A detects the status of link 116. Specifically, the storage array detects whether link 116 is UP or DOWN based on at least one of: (i) a status message 330 that is received by the storage array 101A from the witness node 109 and/or (ii) a status message 320 that is received by the storage array 101A from the storage array 101B. - At
step 508, the storage array 101A optionally updates the value of locally-preferred configuration setting 103A. When both links 114 and 116 are determined to be UP (at steps 504 and 506), the storage array 101A may leave the value of locally-preferred configuration setting 103A unchanged. For example, storage array 101A may allow the value of locally-preferred configuration setting 103A to remain NONE. When both links 114 and 116 are determined to be DOWN (at steps 504 and 506), the storage array 101A may leave the value of locally-preferred configuration setting 103A unchanged. For example, storage array 101A may allow the value of locally-preferred configuration setting 103A to remain NONE. When link 114 is determined to be UP and link 116 is determined to be DOWN (at steps 504 and 506), the storage array 101A may set the value of locally-preferred configuration setting 103A to indicate that storage array 101A prefers storage array 101A to assume an active role in the event of a failure of link 112. For example, storage array 101A may set the value of locally-preferred configuration setting 103A to ARRAY_A. When link 114 is determined to be DOWN and link 116 is determined to be UP (at steps 504 and 506), the storage array 101A may set the value of locally-preferred configuration setting 103A to indicate that storage array 101A prefers storage array 101B to assume an active role in the event of a failure of link 112. For example, storage array 101A may set the value of locally-preferred configuration setting 103A to ARRAY_B. - At step 510, the
storage array 101A generates a status message 310, based on the information determined at steps 502-508, and transmits the generated status message to storage array 101B. At step 512, the storage array 101A generates a status message 310, based on the information determined at steps 502-508, and transmits the generated status message to the witness node 109. Although in the present example the storage array 101A transmits different status messages to storage array 101B and the witness node 109, alternative implementations are possible in which the same status message is transmitted. - In some implementations, when the
process 500 is performed by storage array 101B, storage array 101B may generate status messages 320 (instead of status messages 310), at steps 510-512. In some implementations, when the process 500 is performed by storage array 101B, storage array 101B may detect the status of link 114 (at step 506) based on at least one of: (i) a status message 330 that is received by the storage array 101B from the witness node 109 and/or (ii) a status message 310 that is received by the storage array 101B from the storage array 101A. In some implementations, when the process 500 is performed by storage array 101B, storage array 101B may detect the status of link 116 (at step 504) in a well-known fashion (e.g., by sending a ping to the witness node 109 or determining whether it has received an acknowledgment for the most recent communication that is sent to the witness node 109). In some implementations, when the process 500 is performed by storage array 101B, the storage array may optionally update the value of locally-preferred configuration setting 103B by using the same rules. For example, if both links 114 and 116 have the same status (both UP or both DOWN), storage array 101B may leave the value of locally-preferred configuration setting 103B unchanged. On the other hand, if the link between one of the storage arrays 101A-B and the witness node 109 goes DOWN, while the other remains UP, the storage array 101B may set the value of locally-preferred configuration setting 103B to indicate that the storage array 101B prefers the other storage array (e.g., the storage array whose link to the witness node 109 remains UP) to assume an active role in the event of a failure of link 112. -
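The rule applied at step 508 can be written as a single function that works for either storage array. This is a sketch only: the function name, parameter names, and the string values are assumptions chosen to mirror the text, not part of the disclosure.

```python
def update_locally_preferred(current: str,
                             self_witness_link_up: bool,
                             peer_witness_link_up: bool,
                             self_name: str,
                             peer_name: str) -> str:
    """Return the new locally-preferred value for the array running process 500.

    self_witness_link_up: status of this array's own link to the witness node.
    peer_witness_link_up: status of the peer's link to the witness node, as
    learned from status messages.
    """
    if self_witness_link_up == peer_witness_link_up:
        # Both links UP or both DOWN: leave the setting unchanged.
        return current
    # Exactly one witness link is UP: prefer the array that still has it.
    return self_name if self_witness_link_up else peer_name
```

For storage array 101A, `self_name` would be `"ARRAY_A"` and the two link arguments would describe links 114 and 116 respectively; for storage array 101B the roles are mirrored.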
FIG. 6 is a flowchart of an example of a process 600, according to aspects of the disclosure. At step 602, the witness node 109 detects the status of link 114. Specifically, the witness node 109 detects whether link 114 is UP or DOWN. At step 604, the witness node 109 detects the status of link 116. Specifically, the witness node 109 detects whether link 116 is UP or DOWN. At step 606, the witness node 109 detects the value of locally-preferred configuration setting 103A. Specifically, the witness node 109 may retrieve the value of locally-preferred configuration setting 103A from a status message 310 that is received at the witness node 109 from storage array 101A. At step 608, the witness node 109 detects the value of locally-preferred configuration setting 103B. Specifically, the witness node 109 may retrieve the value of locally-preferred configuration setting 103B from a status message 320 that is received at the witness node 109 from storage array 101B. At step 610, the witness node 109 generates a status message 330 based on some (or all) of the information determined at steps 602-608 and transmits the status message to the storage array 101A. At step 612, the witness node 109 generates another status message 330 based on some (or all) of the information determined at steps 602-608 and transmits the status message to the storage array 101B. Although in the present example the witness node 109 transmits different status messages to storage arrays 101A-B, alternative implementations are possible in which the same status message is transmitted. - In some implementations, the
witness node 109 may execute the process 600 in a loop (as shown in FIG. 6). In such implementations, when the witness node 109 is first started, it might start sending out status messages 330 right away, without waiting for the status of links status messages 310 and 320 from storage arrays links witness node 109 may report the links status messages 310 and 320 are received by the witness node 109, which identify the values of locally-preferred configuration settings witness node 109 may report their values of locally-preferred configuration settings -
FIG. 7 is a flowchart of an example of a process 700, according to aspects of the disclosure. The process 700 may be performed by either one of the storage arrays 101A-B. Under the nomenclature of FIG. 7, the storage array executing the process is “self” and the other storage array is a “peer”. Although in the present example the process 700 is described as being executed by the storage array 101A, it will be understood that the process 700 may also be executed by storage array 101B. For example, and without limitation, the process 700 may be executed by each (or at least one) of storage arrays 101A-B when the storage system 100 is in state 204 (shown in FIG. 2). In other words, the process 700 may be executed in response to the storage array 101A detecting that one of links process 700 may be executed during the operation of the storage system 100, irrespective of the state of the storage system 100. In such implementations, the process 700 may be executed concurrently with the process 500 (shown in FIG. 5). Stated succinctly, the process 700 is not limited to being executed at any particular time of the operation of the storage system 100. - At
step 702, the storage array 101A determines the current value of the locally-preferred configuration setting 103A. As noted above, the value of the locally-preferred configuration setting 103A may be set as a result of executing the process 500, which is discussed above with respect to FIG. 5. At step 704, the storage array 101A receives a status message 320 from the storage array 101B. At step 706, the storage array 101A receives a status message 330 from the witness node 109. At step 708, the storage array 101A detects if the value of the locally-preferred configuration setting 103A (which is stored in the memory of storage array 101A) matches the value of the locally-preferred configuration setting 103B (which is stored in the memory of storage array 101B). In one example, the storage array 101A detects if the value of the locally-preferred configuration setting 103A matches both (or at least one) of: (i) the value of the locally-preferred configuration setting 103B that is reported in the status message 320 (received at step 704) and/or (ii) the value of the locally-preferred configuration setting 103B that is reported in the status message 330. If the value of the locally-preferred configuration setting 103A matches the value of the locally-preferred configuration setting 103B, the process 700 proceeds to step 710. If the value of locally-preferred configuration setting 103A does not match the value of locally-preferred configuration setting 103B, the process 700 returns to step 708. The values of locally-preferred configuration settings 103A-B match when they both indicate that the same storage array is desired by both of storage arrays 101A-B to assume an active role when link 112 fails. - At
step 710, storage array 101A sets the system-preferred configuration setting 104A to the value of locally-preferred configuration setting 103A. For example, if the value of locally-preferred configuration setting 103A indicates that the storage array 101A prefers the storage array 101A to assume an active role in the event of a failure of link 112 (i.e., if the value is “ARRAY_A”), the storage array 101A may also set the value of the system-preferred configuration setting 104A to indicate that the storage array 101A is the system-preferred storage array. Alternatively, if the value of locally-preferred configuration setting 103A indicates that the storage array 101A prefers the storage array 101B to assume an active role in the event of a failure of link 112 (i.e., if the value is “ARRAY_B”), the storage array 101A may set the value of the system-preferred configuration setting 104A to indicate that the storage array 101B is the system-preferred storage array. - As noted above, in some implementations, the
process 700 may be executed by the storage array 101B. When the process 700 is performed by storage array 101B, storage array 101B may determine the value of locally-preferred configuration setting 103B (at step 702) and receive a status message 310 from storage array 101A (at step 704). Furthermore, at step 708, the storage array 101B may detect if the value of the locally-preferred configuration setting 103B is the same as both (or at least one) of: (i) the value of the locally-preferred configuration setting 103A that is reported in the status message 310 (received at step 704) and/or (ii) the value of the locally-preferred configuration setting 103A that is reported in a status message 330. In addition, at step 710, the storage array 101B may set the system-preferred configuration setting 104B to the value of locally-preferred configuration setting 103B. -
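Steps 708 and 710 of process 700 amount to an agreement check followed by a copy. The sketch below is one possible reading, with hypothetical names throughout: it assumes that every available report of the peer's preference (from the peer directly and from the witness node) must match the local value, which is the stricter of the "both (or at least one)" options mentioned above, and it uses `None` for a report that has not been received.

```python
from typing import Optional


def update_system_preferred(system_preferred: str,
                            local_preferred: str,
                            peer_pref_from_peer: Optional[str],
                            peer_pref_from_witness: Optional[str]) -> str:
    """Return the new system-preferred value for the array running process 700."""
    # Collect whichever reports of the peer's preference are available.
    reports = [p for p in (peer_pref_from_peer, peer_pref_from_witness)
               if p is not None]
    # Step 708: the local value must agree with every available report.
    if reports and all(p == local_preferred for p in reports):
        # Step 710: adopt the agreed value as the system-preferred setting.
        return local_preferred
    # No agreement yet: keep the current system-preferred value.
    return system_preferred
```

Because each array runs this check on its own copy of the data, neither array needs the other's confirmation before updating its own system-preferred setting.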
FIG. 8 is a flowchart of an example of a process 800, according to aspects of the disclosure. The process 800 may be performed by either one of the storage arrays 101A-B. Under the nomenclature of FIG. 8, the storage array executing the process is “self” and the other storage array is a “peer”. Although in the present example the process 800 is described as being executed by the storage array 101A, it will be understood that the process 800 may also be executed by storage array 101B. For example, and without limitation, the process 800 may be executed by each (or at least one) of storage arrays 101A-B when the storage system 100 is in state 208 (shown in FIG. 2). In other words, the process 800 may be executed in response to the storage array 101A detecting that link 112 is DOWN. - At
step 802, the storage array 101A waits for a predetermined period after detecting that link 112 is DOWN. At step 803, the storage array 101A detects if link 112 remains DOWN after the predetermined period has passed. If link 112 is still DOWN, the process 800 proceeds to step 804. Otherwise, if link 112 appears to be back UP, the process 800 ends. - At
step 804, the storage array 101A detects the value of the system-preferred configuration setting 104A. If the value indicates that the storage array 101B is designated to assume an active role in the event of a failure of link 112, the process 800 proceeds to step 806. If the value of the system-preferred configuration setting 104A indicates that the storage array 101A is designated to assume an active role in the event of a failure of link 112, the process 800 proceeds to step 808. If the value of the system-preferred configuration setting indicates that neither storage array 101A nor storage array 101B is designated to assume an active role in the event of a failure of link 112, the process 800 proceeds to step 812. - At
step 806, storage array 101A assumes a passive role. At step 808, storage array 101A assumes an active role. At step 812, the storage array 101A detects if the state table 108A is in a steady state. As noted above, the state table 108A may be in an unsteady state when link 112 appears to be DOWN to the storage array 101A, but is reported to be UP by the storage array 101B. If the state table 108A is in a steady state, the process 800 proceeds to step 816. Otherwise, if table 108A is not in a steady state, the process 800 proceeds to step 814. - At
step 814, the storage array 101A assumes a passive role. At step 816, the storage array 101A detects whether link 114 between storage array 101A and the witness node 109 is UP. If link 114 is UP, the process 800 proceeds to step 820. Otherwise, if link 114 is DOWN, the process 800 proceeds to step 818. - At
step 818, thestorage array 101A assumes a passive role. Atstep 820, thestorage array 101A detects whetherlink 116 between thewitness node 109 and thestorage array 101B is UP. As noted above, thestorage array 101A may determine whetherlink 116 is UP based on information that is reported by thewitness node 109 in one ormore status messages 330 that are received atstorage array 101A from thewitness node 109. If the link is UP, theprocess 800 proceeds to step 822. Otherwise, if the link is DOWN, theprocess 800 proceeds to step 824. At step 822, thestorage array 101A assumes a role that is specified by the user-preferred configuration setting 106A. For instance, if the user-preferred configuration setting 106A is set to a first value (e.g., “ARRAY_A”), the storage array may assume an active role. On the other hand, if the user-preferredconfiguration setting 106B is set to a second value (e.g., “ARRAY_B”), thestorage array 101A assumes a passive role. Atstep 824, the storage array assumes an active role. - As noted above, in some implementations, the
process 800 may be executed by thestorage array 101B. When theprocess 800 is performed bystorage array 101B,storage array 101B may determine the value of the system-preferred configuration setting 104B and use it as a basis for executingstep 804. Furthermore, thestorage array 101B may detect whether state table 108B is in a steady state (at step 812). In addition, atstep 816, thestorage array 101B may detect whetherlink 116 is UP, and, atstep 820, the storage array 110B may detect whetherlink 114 is UP. Thestorage array 101B may detect the value of the user-preferred configuration setting 106B (at step 822) and use it as a basis for executing step 822. - Referring to
FIG. 9 , acomputing device 900 may includeprocessor 902, volatile memory 904 (e.g., RAM), non-volatile memory 906 (e.g., a hard disk drive, a solid-state drive such as a flash drive, a hybrid magnetic and solid-state drive, etc.), graphical user interface (GUI) 908 (e.g., a touchscreen, a display, and so forth) and input/output (I/O) device 920 (e.g., a mouse, a keyboard, etc.).Non-volatile memory 906stores computer instructions 912, anoperating system 916 anddata 918 such that, for example, thecomputer instructions 912 are executed by theprocessor 902 out ofvolatile memory 904. Program code may be applied to data entered using an input device ofGUI 908 or received from I/O device 920. -
- Processor 902 may be implemented by one or more programmable processors executing one or more computer programs to perform the functions of the system. As used herein, the term “processor” describes an electronic circuit that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations may be hard-coded into the electronic circuit or soft coded by way of instructions held in a memory device. A “processor” may perform the function, operation, or sequence of operations using digital values or using analog signals. In some embodiments, the “processor” can be embodied in an application-specific integrated circuit (ASIC). In some embodiments, the “processor” may be embodied in a microprocessor with associated program memory. In some embodiments, the “processor” may be embodied in a discrete electronic circuit. The “processor” may be analog, digital or mixed-signal. In some embodiments, the “processor” may be one or more physical processors or one or more “virtual” (e.g., remotely located or “cloud”) processors.
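Returning to the process 800 of FIG. 8, the role selection of steps 804-824 may be summarized, as a non-limiting sketch, by the function below. The names `Role` and `select_role` and all parameter names are hypothetical. The sketch assumes that any value of the system-preferred setting other than the identifiers of the two storage arrays means that neither array is designated, and that, at step 822, a storage array assumes an active role when the user-preferred setting names that array. The wait and re-check of steps 802-803 are omitted.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


def select_role(
    self_id: str,
    peer_id: str,
    system_preferred: Optional[str],
    state_table_steady: bool,
    self_witness_link_up: bool,
    peer_witness_link_up: bool,
    user_preferred: str,
) -> Role:
    """Choose a role for "self" once the inter-array link is confirmed DOWN."""
    # Step 804: consult the system-preferred configuration setting.
    if system_preferred == peer_id:
        return Role.PASSIVE  # step 806
    if system_preferred == self_id:
        return Role.ACTIVE  # step 808
    # Neither array is designated (assumed: any other value, e.g. None).
    # Step 812: an unsteady state table leads to a passive role (step 814).
    if not state_table_steady:
        return Role.PASSIVE
    # Step 816: without a working link to the witness, go passive (step 818).
    if not self_witness_link_up:
        return Role.PASSIVE
    # Step 820: is the link between the witness and the peer UP?
    if peer_witness_link_up:
        # Step 822: follow the user-preferred configuration setting.
        return Role.ACTIVE if user_preferred == self_id else Role.PASSIVE
    # Step 824: the witness cannot reach the peer, so self becomes active.
    return Role.ACTIVE
```

For the storage array 101A, `self_witness_link_up` would correspond to link 114 and `peer_witness_link_up` to link 116 as reported by the witness node 109; for the storage array 101B the two links swap places, consistent with the description of steps 816 and 820 above.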
- FIGS. 1-9 are provided as an example only. At least some of the steps discussed with respect to FIGS. 1-9 may be performed in parallel, in a different order, or altogether omitted. As used in this application, the word “exemplary” means serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
- Additionally, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
- To the extent directional terms are used in the specification and claims (e.g., upper, lower, parallel, perpendicular, etc.), these terms are merely intended to assist in describing and claiming the invention and are not intended to limit the claims in any way. Such terms do not require exactness (e.g., exact perpendicularity or exact parallelism, etc.), but instead it is intended that normal tolerances and ranges apply. Similarly, unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about”, “substantially” or “approximately” preceded the value or range.
- Moreover, the terms “system,” “component,” “module,” “interface,” “model,” or the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
- Although the subject matter described herein may be described in the context of illustrative implementations to process one or more computing application features/operations for a computing application having user-interactive components, the subject matter is not limited to these particular embodiments. Rather, the techniques described herein can be applied to any suitable type of user-interactive component execution management methods, systems, platforms, and/or apparatus.
- While the exemplary embodiments have been described with respect to processes of circuits, including possible implementation as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack, the described embodiments are not so limited. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
- Some embodiments might be implemented in the form of methods and apparatuses for practicing those methods. Described embodiments might also be implemented in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid-state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the claimed invention. Described embodiments might also be implemented in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the claimed invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. Described embodiments might also be implemented in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus of the claimed invention.
- It should be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments.
- Also, for purposes of this description, the terms “couple,” “coupling,” “coupled,” “connect,” “connecting,” or “connected” refer to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. Conversely, the terms “directly coupled,” “directly connected,” etc., imply the absence of such additional elements.
- As used herein in reference to an element and a standard, the term “compatible” means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.
- It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of the claimed invention might be made by those skilled in the art without departing from the scope of the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/238,615 US11500556B1 (en) | 2021-04-23 | 2021-04-23 | Storage system with passive witness node |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220342566A1 (en) | 2022-10-27 |
US11500556B1 (en) | 2022-11-15 |
Family
ID=83694209
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/238,615 Active 2041-06-12 US11500556B1 (en) | 2021-04-23 | 2021-04-23 | Storage system with passive witness node |
Country Status (1)
Country | Link |
---|---|
US (1) | US11500556B1 (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110179188A1 (en) * | 2009-10-09 | 2011-07-21 | Hitachi, Ltd. | Storage system and storage system communication path management method |
US20150134929A1 (en) * | 2013-11-11 | 2015-05-14 | International Business Machines Corporation | Load Balancing Logical Units in an Active/Passive Storage System |
US20160342337A1 (en) * | 2015-04-17 | 2016-11-24 | Emc Corporation | Method and apparatus for scaling out storage devices and scaled-out storage devices |
US20170315724A1 (en) * | 2016-04-28 | 2017-11-02 | Pure Storage, Inc. | Deploying client-specific applications in a storage system utilizing redundant system resources |
US20180293017A1 (en) * | 2017-04-10 | 2018-10-11 | Pure Storage, Inc. | Migrating applications executing on a storage system |
US20200034043A1 (en) * | 2018-07-24 | 2020-01-30 | International Business Machines Corporation | Storage controller with ocs for managing active-passive backend storage arrays |
US20200042481A1 (en) * | 2018-08-01 | 2020-02-06 | EMC IP Holding Company LLC | Moving from back-to-back topology to switched topology in an infiniband network |
US20200250126A1 (en) * | 2018-11-16 | 2020-08-06 | Vmware, Inc. | Active-active architecture for distributed iscsi target in hyper-converged storage |
US20200293551A1 (en) * | 2013-12-12 | 2020-09-17 | Huawei Technologies Co., Ltd. | Data Replication Method and Storage System |
US20210042051A1 (en) * | 2019-08-05 | 2021-02-11 | Hitachi, Ltd. | Storage system and storage control method |
US20220147269A1 (en) * | 2019-06-24 | 2022-05-12 | Zhejiang Dahua Technology Co., Ltd. | Dual-controller storage systems |
Also Published As
Publication number | Publication date |
---|---|
US11500556B1 (en) | 2022-11-15 |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, NORTH CAROLINA Free format text: SECURITY AGREEMENT;ASSIGNORS:DELL PRODUCTS L.P.;EMC IP HOLDING COMPANY LLC;REEL/FRAME:056250/0541 Effective date: 20210514 |
AS | Assignment | Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, NORTH CAROLINA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE MISSING PATENTS THAT WERE ON THE ORIGINAL SCHEDULED SUBMITTED BUT NOT ENTERED PREVIOUSLY RECORDED AT REEL: 056250 FRAME: 0541. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:DELL PRODUCTS L.P.;EMC IP HOLDING COMPANY LLC;REEL/FRAME:056311/0781 Effective date: 20210514 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:DELL PRODUCTS L.P.;EMC IP HOLDING COMPANY LLC;REEL/FRAME:056295/0001 Effective date: 20210513 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:DELL PRODUCTS L.P.;EMC IP HOLDING COMPANY LLC;REEL/FRAME:056295/0280 Effective date: 20210513 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:DELL PRODUCTS L.P.;EMC IP HOLDING COMPANY LLC;REEL/FRAME:056295/0124 Effective date: 20210513 |
AS | Assignment | Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058297/0332 Effective date: 20211101 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058297/0332 Effective date: 20211101 |
AS | Assignment | Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0844 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0844 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0124);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062022/0012 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0124);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062022/0012 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0280);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062022/0255 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056295/0280);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062022/0255 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOLAN, SALLY;LOYA, LIRAN;HARDUF, YUVAL;REEL/FRAME:060164/0794 Effective date: 20220509 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |