US20220229693A1 - Centralized high-availability flows execution framework - Google Patents

Centralized high-availability flows execution framework

Info

Publication number
US20220229693A1
US20220229693A1 (application US17/153,135)
Authority
US
United States
Prior art keywords
process thread
execution
active
execute
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US17/153,135
Other versions
US11586466B2
Inventor
Inna Reznik
Ahia Lieber
Eran Banin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EMC Corp
Original Assignee
EMC IP Holding Co LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EMC IP Holding Co LLC filed Critical EMC IP Holding Co LLC
Priority to US17/153,135
Assigned to EMC IP Holding Company LLC reassignment EMC IP Holding Company LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BANIN, ERAN, LIEBER, AHIA, REZNIK, INNA
Assigned to CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH reassignment CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH SECURITY AGREEMENT Assignors: DELL PRODUCTS L.P., EMC IP Holding Company LLC
Assigned to THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT reassignment THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DELL PRODUCTS L.P., EMC IP Holding Company LLC
Assigned to EMC IP Holding Company LLC, DELL PRODUCTS L.P. reassignment EMC IP Holding Company LLC RELEASE OF SECURITY INTEREST AT REEL 055408 FRAME 0697 Assignors: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH
Assigned to DELL PRODUCTS L.P., EMC IP Holding Company LLC reassignment DELL PRODUCTS L.P. RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0342) Assignors: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT
Assigned to EMC IP Holding Company LLC, DELL PRODUCTS L.P. reassignment EMC IP Holding Company LLC RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0051) Assignors: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT
Assigned to EMC IP Holding Company LLC, DELL PRODUCTS L.P. reassignment EMC IP Holding Company LLC RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056136/0752) Assignors: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT
Publication of US20220229693A1
Publication of US11586466B2
Application granted granted Critical
Active legal-status Critical Current
Adjusted expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing from multiple instruction streams, e.g. multistreaming
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation

Definitions

  • In a high-availability (HA) cluster, storage controllers (also referred to herein as “storage nodes”) may be deployed in an active-passive configuration, in which a primary storage node takes on the role of an active node and at least one secondary storage node takes on the role of a standby node.
  • In the active-passive configuration, the active node may process storage input/output (IO) requests from host computers and maintain page reference information in its memory, while the standby node may not be currently interacting with the host computers.
  • Storage controllers in an HA cluster may also be deployed in an active-active configuration, in which two or more active nodes collaborate to process storage IO requests from host computers and maintain page reference information on their memories in image style.
  • In the event of a process or equipment malfunction (also referred to herein as a “high-availability (HA) event”) on an active node in an active-passive configuration, a system-level failover can occur, in which tasks of the active node, including processing storage IO requests and maintaining page reference information, are entirely taken over by a standby node. An appropriate set of actions can then be executed in a high-availability (HA) process flow (also referred to herein as an “HA flow”) to address actual or potential ramifications of the HA event.
  • Multiple such HA events can occur simultaneously on two or more active nodes, requiring multiple HA flows to be executed concurrently to address any actual or potential ramifications of the HA events. Because such concurrent HA flows can have dependencies, in which certain HA flows depend upon other HA flows or processes to execute their functions, a more unified approach to addressing HA events occurring on storage nodes in an active-active configuration is needed.
  • the disclosed techniques can include an HA flows execution framework manager (also referred to herein as the “framework manager”), which can be implemented on one of multiple storage nodes in an active-active configuration.
  • the framework manager can receive, periodically or at intervals, explicit or implicit notifications and/or reports of functional statuses of processes and/or equipment associated with the storage nodes in the active-active configuration.
  • the framework manager can make determinations regarding whether and/or how to address any actual or potential process and/or equipment malfunctions (or “HA events”) based on the received notifications and/or reports.
  • the framework manager can implement an HA flow for each HA event as an asynchronous process thread.
  • the framework manager can represent each HA flow as an instance of an HA flow object and store the HA flow object for each HA flow waiting to be executed in a persistent repository or database.
  • the framework manager can define each HA flow with reference to one or more dependencies specifying its relationships with one or more other HA flows and/or certain software, firmware, and/or hardware modules or components in the active-active configuration.
  • the framework manager can determine whether to refuse a request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time.
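The refuse/service/abort/postpone decision described above can be sketched as a small function. This is a minimal illustration only; the names `Decision`, `HAFlowRequest`, and `decide`, and the set-based dependency model, are assumptions rather than identifiers taken from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Decision(Enum):
    REFUSE = auto()
    SERVICE = auto()
    POSTPONE = auto()

@dataclass
class HAFlowRequest:
    flow_id: str
    depends_on: set = field(default_factory=set)  # flows that must finish first
    aborts: set = field(default_factory=set)      # flows the abort policy cancels

def decide(request: HAFlowRequest, executing: set, refused: set) -> tuple:
    """Return (decision, flows_to_abort) for a new HA flow request."""
    if request.flow_id in refused:
        return Decision.REFUSE, set()
    # Abort policy: cancel conflicting flows currently in execution.
    to_abort = request.aborts & executing
    # Dependencies: postpone until every prerequisite flow has completed.
    if request.depends_on & (executing - to_abort):
        return Decision.POSTPONE, to_abort
    return Decision.SERVICE, to_abort
```

For example, a hypothetical “reboot” flow that depends on a “reset_disk” flow still in execution would be postponed, while a flow whose abort policy cancels the conflicting flow would be serviced immediately.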
  • a method of handling execution of high-availability (HA) process threads in an active-active storage node configuration includes receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • the method further includes specifying a set of parameters and a set of executable steps for the first HA process thread.
  • the set of parameters includes the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
  • the method further includes, in response to the request to execute the first HA process thread not being refused, allocating a first HA process thread object representing the first HA process thread, and adding the first HA process thread object to a persistent database.
  • the method further includes checking the specified rules in the abort policy and aborting one or more of the HA process threads in execution based on the specified rules.
  • the method further includes checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
  • the method further includes, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • the method further includes checking the specified rules in the abort policy and aborting all of the HA process threads in execution based on the specified rules.
  • the method further includes checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database, and, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • the method further includes, for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, determining one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
  • the method further includes, having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, initiating execution of the first HA process thread.
  • a system for handling execution of high-availability (HA) process threads in an active-active storage node configuration includes a persistent database, a memory, and processing circuitry configured to execute program instructions out of the memory to receive notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, to determine that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, to perform one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • the processing circuitry is further configured to execute the program instructions out of the memory to specify a set of parameters and a set of executable steps for the first HA process thread, in which the set of parameters includes the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
  • the processing circuitry is further configured to execute the program instructions out of the memory, in response to the request to execute the first HA process thread not being refused, to allocate a first HA process thread object representing the first HA process thread, and to add the first HA process thread object to the persistent database.
  • the processing circuitry is further configured to execute the program instructions out of the memory to check the specified rules in the abort policy, and to abort one or more of the HA process threads in execution based on the specified rules.
  • the processing circuitry is further configured to execute the program instructions out of the memory to check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
  • the processing circuitry is further configured to execute the program instructions out of the memory, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, to perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • the processing circuitry is further configured to execute the program instructions out of the memory to check the specified rules in the abort policy, to abort all of the HA process threads in execution based on the specified rules, to check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database, and, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, to perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • the processing circuitry is further configured to execute the program instructions out of the memory, for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, to determine one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
  • the processing circuitry is further configured to execute the program instructions out of the memory, having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, initiating execution of the first HA process thread.
  • a computer program product includes a set of non-transitory, computer-readable media having instructions that, when executed by processing circuitry, cause the processing circuitry to perform a method of handling execution of high-availability (HA) process threads in an active-active storage node configuration.
  • the method includes receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • FIG. 1a is a block diagram of an exemplary data storage environment, in which techniques can be practiced for providing a centralized framework for handling execution of high-availability (HA) process flows in an active-active storage node configuration;
  • FIG. 1b is a block diagram of an active-active storage system in the data storage environment of FIG. 1a;
  • FIG. 2a is a block diagram of an exemplary storage controller (or “storage node”) from among multiple storage controllers (or “storage nodes”) included in the active-active storage system of FIG. 1b, in which the storage node includes an HA flows execution framework manager (or “framework manager”) and a persistent HA flow object database;
  • FIG. 2b is a block diagram of an exemplary HA flow object upon which the techniques can be practiced in the data storage environment of FIG. 1a;
  • FIG. 3 is a flow diagram of an exemplary method of providing a centralized framework for handling execution of HA process flows in an active-active storage node configuration.
  • The disclosed techniques can include receiving notifications and/or reports of functional statuses of processes and/or equipment associated with storage nodes in an active-active configuration, making determinations regarding whether and/or how to address actual or potential malfunctions (also referred to herein as “HA events”) occurring on the processes and/or equipment associated with the storage nodes based on the received notifications and/or reports, and, in response to a request to execute an HA flow for a respective HA event, determining whether to refuse the request to execute the HA flow, service the request, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time based on one or more dependencies defining conditions for the HA flow. In this way, mutual interference of HA flows or other process threads in an active-active configuration can be reduced.
  • FIG. 1a depicts an illustrative embodiment of an exemplary data storage environment 100, in which techniques can be practiced for providing a centralized framework for handling execution of HA flows in an active-active storage node configuration.
  • The data storage environment 100 can include a plurality of host computers 102.1, 102.2, . . . , 102.n, an active-active storage system 104, and a communications medium 103 that includes at least one network 108.
  • Each of the plurality of host computers 102.1, . . . , 102.n can be configured as a web server computer, a file server computer, an email server computer, an enterprise server computer, and/or any other suitable client/server computer or computerized device.
  • The plurality of host computers 102.1, . . . , 102.n can be configured to provide, over the network 108, storage input/output (IO) requests (e.g., small computer system interface (SCSI) commands, network file system (NFS) commands) to the active-active storage system 104.
  • Such storage IO requests can direct a storage controller (or “storage node”) to write or read data blocks, data pages, data files, or any other suitable data elements to/from volumes (VOLs), logical units (LUs), file systems, and/or any other suitable storage objects, such as storage objects 110.1, 110.2, . . . , 110.m maintained in association with the active-active storage system 104.
  • The communications medium 103 can be configured to interconnect the plurality of host computers 102.1, . . . , 102.n and the active-active storage system 104 to enable them to communicate and exchange data and/or control signaling.
  • the communications medium 103 can be illustrated as a “cloud” to represent different communications topologies such as a backbone topology, a hub-and-spoke topology, a loop topology, an irregular topology, and so on, or any suitable combination thereof.
  • the communications medium 103 can include copper-based data communications devices and cabling, fiber optic-based communications devices and cabling, wireless communications devices, and so on, or any suitable combination thereof.
  • the communications medium 103 can be configured to support storage area network (SAN) communications, network attached storage (NAS) communications, local area network (LAN) communications, metropolitan area network (MAN) communications, wide area network (WAN) communications, wireless communications, distributed infrastructure communications, and/or any other suitable communications.
  • FIG. 1b depicts another view of the active-active storage system 104 of FIG. 1a.
  • the term “active-active storage system” refers to a highly available storage system, in which multiple storage nodes have shared or exclusive read-write IO access to the same storage objects (e.g., volumes (VOLs), logical units (LUs), file systems).
  • The active-active storage system 104 can include at least two storage controllers (or “storage nodes”) for high availability, namely, a primary storage node A 112.1 and a secondary storage node B 112.2, which is communicably connected to the storage node A 112.1 by a communication path 111.
  • Each of the storage node A 112.1 and the storage node B 112.2 can receive storage IO requests from the respective host computers 102.1, . . . , 102.n over the network 108.
  • The storage nodes A 112.1, B 112.2 can perform storage IO operations (e.g., read-write IO operations) to write/read data blocks, data pages, data files, or any other suitable data elements to/from one or more of the plurality of storage objects 110.1, . . . , 110.m.
  • Page reference information pertaining to read-write IO operations maintained in a journal by the storage node A 112.1 can be synchronized with corresponding page reference information maintained in a journal by the storage node B 112.2. If the storage node A 112.1 is taken offline (or at any other suitable time), then the storage node B 112.2 can assume the role and/or duties of the storage node A 112.1 with regard to the handling of storage IO requests, providing high availability within the active-active storage system 104.
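A minimal sketch of the journal synchronization and takeover described above; the class and function names are illustrative assumptions, not taken from the patent.

```python
class StorageNode:
    """Toy model of a storage node with a journal of page reference info."""
    def __init__(self, name, journal=None):
        self.name = name
        self.journal = dict(journal) if journal else {}
        self.online = True
        self.roles = {name}

def synchronize(a, b):
    """Mirror page reference information between the two nodes' journals."""
    merged = {**a.journal, **b.journal}
    a.journal, b.journal = dict(merged), dict(merged)

def failover(a, b):
    """If node A is offline, node B assumes its role and duties."""
    if not a.online:
        b.roles |= a.roles
    return b
```

Because the journals are kept synchronized ahead of time, the surviving node already holds the page reference information it needs when it takes over.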
  • the active-active storage system 104 can include one or more storage devices 114 , which can be embodied as one or more non-volatile random-access memories (NVRAM), solid-state drives (SSDs), hard disk drives (HDDs), flash memories, and/or any other suitable storage device(s) for storing storage object data and/or metadata.
  • FIG. 2a depicts an exemplary storage node 200 from among multiple storage nodes included in the active-active storage system 104 of FIG. 1a.
  • The storage node 200 can include a communications interface 202, processing circuitry 204, and a memory 206.
  • the communications interface 202 can include one or more of an Ethernet interface, an InfiniBand interface, a fiber channel interface, and/or any other suitable communications interface.
  • the communications interface 202 can further include SCSI target adapters, network interface adapters, and/or any other suitable adapters for converting electronic, optical, and/or wireless signals received over the network(s) 108 to a form suitable for use by the processing circuitry 204 .
  • the memory 206 can include volatile memory such as random-access memory (RAM) or any other suitable volatile memory, as well as persistent memory such as NVRAM, read-only memory (ROM), one or more HDDs, one or more SSDs, or any other suitable persistent memory.
  • the memory 206 can be configured to store a variety of software constructs realized in the form of specialized code and data (e.g., program instructions) that can be executed by the processing circuitry 204 to carry out the techniques and/or methods disclosed herein.
  • the memory 206 can further include an operating system 208 such as the Linux OS, Unix OS, Windows OS, or any other suitable operating system, as well as a malfunction monitor 210 that can be executed by the processing circuitry 204 .
  • The processing circuitry 204 can include one or more physical processors, controllers, IO modules, and/or any other suitable computer hardware or combination thereof.
  • Each of the multiple storage nodes included in the active-active storage system 104 can be configured to include at least a communications interface, processing circuitry, a memory, an OS, and a malfunction monitor like the storage node 200 of FIG. 2a.
  • one of the multiple storage nodes can be further configured to include an HA flows execution framework manager (or “framework manager”) 212 , as well as a persistent repository or database 214 configured to store multiple instances of high-availability (HA) flow objects.
  • the malfunction monitor 210 can be configured to monitor functional statuses of processes and/or equipment associated with the storage node 200 and send notifications and/or reports of the functional statuses to the framework manager 212 .
  • a computer program product can be configured to deliver all or a portion of the specialized code and data to the respective processor(s).
  • a computer program product can include one or more non-transient computer-readable storage media, such as a magnetic disk, a magnetic tape, a compact disk (CD), a digital versatile disk (DVD), an optical disk, a flash drive, a solid-state drive (SSD), a secure digital (SD) chip or device, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), and so on.
  • the non-transient computer-readable storage media can be encoded with sets of program instructions for performing, when executed by the respective processor(s), the various techniques and/or methods disclosed herein.
  • FIG. 2 b depicts an exemplary HA flow object 216 that can be stored in the persistent HA flow object database 214 of FIG. 2 a .
  • each HA flow can be represented as an instance of an HA flow object, and each HA flow object for each HA flow waiting to be executed can be stored in the HA flow object database 214.
  • as shown in FIG. 2b, the HA flow object 216 can include a plurality of fields relating to an HA flow, including a first field for an indication of an HA flow purpose 218 (e.g., to establish a connection, reset a disk, reboot a storage node), a second field for an HA flow identifier 220 (e.g., an alphabetic, numeric, or alphanumeric identifier), a third field for an indication of an HA flow state 222 (e.g., uninitialized, waiting to be executed, in execution, completed execution), a fourth field for an indication of an HA flow progress 224 (e.g., the process flow step in execution), and a fifth field for an indication of an HA flow execution result 226 (e.g., successful or failed execution).
  • the HA flow object 216 can further include a field for logging and statistics information 228 (e.g., certain actions taken by the framework manager 212 , a timestamp taken at the start of HA flow execution).
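The fields above can be sketched as a simple record type. The class, field, and state names below are illustrative assumptions chosen for readability, not identifiers from the disclosure:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FlowState(Enum):
    """States tracked by the HA flow state field 222."""
    UNINITIALIZED = "uninitialized"
    WAITING = "waiting to be executed"
    IN_EXECUTION = "in execution"
    COMPLETED = "completed execution"


@dataclass
class HAFlowObject:
    """Sketch of the HA flow object 216 and its fields 218-228."""
    purpose: str                                  # field 218, e.g. "reset a disk"
    flow_id: str                                  # field 220: alphabetic/numeric/alphanumeric
    state: FlowState = FlowState.UNINITIALIZED    # field 222
    progress: int = 0                             # field 224: process flow step in execution
    result: Optional[bool] = None                 # field 226: True = successful, False = failed
    log_and_stats: dict = field(default_factory=dict)  # field 228: logging and statistics
```

An instance in the waiting state would carry `state=FlowState.WAITING` with `result` still unset until execution completes.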
  • the framework manager (e.g., the framework manager 212; see FIG. 2a) can receive, periodically or at intervals, notifications and/or reports of functional statuses of processes and/or equipment associated with the multiple storage nodes (e.g., storage nodes A 112.1, B 112.2) in the active-active storage system 104.
  • the framework manager 212 can make determinations regarding whether and/or how to address any actual or potential process and/or equipment malfunctions (or "HA events") based on the received notifications and/or reports. For example, such HA events can occur on one or more of the storage nodes (e.g., storage nodes A 112.1, B 112.2).
  • if the framework manager 212 determines to address one or more actual or potential HA events occurring on certain processes and/or equipment associated with storage nodes in the active-active storage system 104, then the framework manager 212 can implement an HA flow for each HA event as an asynchronous process thread.
  • Each HA flow can be represented as an instance of an HA flow object, and the HA flow object for each HA flow waiting to be executed can be stored in the persistent HA flow object database (e.g., the HA flow object database 214 ; see FIG. 2 a ).
  • the framework manager 212 can define each HA flow at least with reference to one or more dependencies specifying its relationships with one or more other HA flows and/or certain software, firmware, and/or hardware modules or components in the active-active storage system 104 . Based at least on the dependencies defining conditions for the HA flow, the framework manager 212 can determine whether to refuse a request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time. In this way, mutual interference of HA flows or other process threads in an active-active configuration can be reduced or eliminated, and recovery times from HA events occurring in the active-active configuration can be reduced.
  • the framework manager 212 (see FIG. 2 a ) has received one or more notifications of functional statuses of processes and/or equipment associated with at least one of the storage node A 112 . 1 and the storage node B 112 . 2 in the active-active storage system 104 .
  • notifications can be sent to the framework manager 212 by a malfunction monitor (e.g., the malfunction monitor 210 ; see FIG. 2 a ) executing on the storage node A 112 . 1 and/or the storage node B 112 . 2 .
  • Such notifications can be explicit notifications of the functional statuses of processes or equipment (e.g., a disk is disconnected) and/or implicit notifications of the functional statuses of processes or equipment (e.g., a storage IO operation has failed).
  • the framework manager 212 has determined, based on the received notifications, that a process or equipment malfunction (or “HA event”) associated with one of the storage nodes A 112 . 1 , B 112 . 2 has occurred.
  • the HA event can be due to a disk malfunction, an overheated hardware component, a control malfunction, or any other suitable malfunction.
  • the framework manager 212 implements a new HA flow for the HA event as an asynchronous process thread.
  • the new HA flow is defined by a set of parameters and a set of executable steps.
  • the set of parameters can include (i) zero, one, or more dependencies specifying the new HA flow's relationships with one or more other HA flows represented by HA flow objects in the persistent HA flow object database 214 , (ii) an abort policy specifying rules regarding whether and/or when to abort certain HA flows in execution at the time a request to execute the new HA flow is generated, and (iii) logging and statistics information.
  • the abort policy can be priority-based or can explicitly specify which HA flows in execution to abort. It is noted that an HA flow in execution will be aborted only if required by the abort policy; otherwise, the HA flow will not be aborted or interrupted.
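A priority-based abort policy of the kind mentioned above might be sketched as follows. The priority scheme and function names are assumptions for illustration, since the disclosure leaves the concrete rule format open:

```python
def make_priority_abort_policy(new_flow_priority: int):
    """Build an abort rule: a running flow is aborted only if the new
    flow strictly outranks it; otherwise it is left uninterrupted."""
    def should_abort(running_flow_priority: int) -> bool:
        return new_flow_priority > running_flow_priority
    return should_abort


policy = make_priority_abort_policy(new_flow_priority=5)
policy(3)  # True: the lower-priority running flow is aborted
policy(7)  # False: the higher-priority running flow keeps executing
```

An explicit policy could instead return `True` only for an enumerated set of flow identifiers; either form satisfies the "abort only if required" rule above.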
  • the set of executable steps can include a set of actions to be taken by the new HA flow to address the HA event.
  • upon implementation of the new HA flow for the HA event, the framework manager 212 generates a request to execute the new HA flow.
  • the framework manager 212 determines, as appropriate, (i) whether the request should be immediately refused, (ii) whether any HA flows in execution should be aborted, in accordance with the abort policy, and (iii) whether execution of the new HA flow should be postponed to a later time. For example, such refusal of the request to execute the new HA flow can be based on the storage node A 112 . 1 or B 112 . 2 of interest having been taken offline or any other suitable reason. If the request is not immediately refused, then the framework manager 212 allocates an HA flow object configured to represent the new HA flow and adds the HA flow object to the HA flow object database 214 .
  • the framework manager 212 checks the rules specified in the abort policy for the new HA flow and aborts zero, one, or more asynchronous process threads for HA flows in execution, as warranted by the rules. In addition, the framework manager 212 checks the dependencies of the new HA flow vis-a-vis one or more other HA flows represented by HA flow objects in the HA flow object database 214 . If the HA flow dependencies dictate a certain order in which the HA flows may be executed, then the framework manager 212 can postpone the execution of the new HA flow, as necessary, to satisfy the dependencies.
  • the framework manager 212 can determine whether any other factors exist preventing immediate execution of the new HA flow. If so, then the framework manager 212 can determine, periodically or at intervals, whether such factors preventing execution of the new HA flow continue to exist. Once it is determined that such factors no longer exist, then the framework manager 212 starts execution of the new HA flow in the asynchronous process thread.
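The decision sequence just described (refuse outright, allocate and persist the flow object, apply the abort policy, then postpone or execute based on dependencies) can be modeled as below. This is a synchronous sketch under assumed names: the disclosure runs each HA flow as an asynchronous process thread and uses a persistent database rather than an in-memory dict:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class HAFlow:
    flow_id: str
    depends_on: Set[str] = field(default_factory=set)    # flows that must complete first
    abort_policy: Callable[["HAFlow"], bool] = lambda other: False  # True => abort `other`
    steps: List[Callable[[], None]] = field(default_factory=list)   # executable steps


class FrameworkManager:
    def __init__(self) -> None:
        self.database: dict = {}        # stands in for the persistent HA flow object database 214
        self.in_execution: Set[str] = set()
        self.completed: Set[str] = set()
        self.postponed: List[str] = []

    def handle_request(self, flow: HAFlow, node_online: bool = True) -> str:
        # (i) refuse immediately, e.g. when the storage node of interest is offline
        if not node_online:
            return "refused"
        # otherwise allocate an HA flow object and add it to the database
        self.database[flow.flow_id] = flow
        # (ii) abort zero, one, or more flows in execution, as the abort policy requires
        for other_id in list(self.in_execution):
            if flow.abort_policy(self.database[other_id]):
                self.in_execution.discard(other_id)
        # (iii) postpone if dependencies dictate that other flows must finish first
        if not flow.depends_on <= self.completed:
            self.postponed.append(flow.flow_id)
            return "postponed"
        # no remaining blocking factors: run the flow's executable steps
        self.in_execution.add(flow.flow_id)
        for step in flow.steps:
            step()
        self.in_execution.discard(flow.flow_id)
        self.completed.add(flow.flow_id)
        return "executed"
```

A postponed flow would be re-checked periodically and started once its blocking factors no longer exist, matching the periodic determination described above.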
  • the framework manager 212 (see FIG. 2 a ) has received one or more additional notifications of functional statuses of processes and/or equipment associated with at least one of the storage node A 112 . 1 and the storage node B 112 . 2 in the active-active storage system 104 . It is further assumed that the framework manager 212 has determined, based on the received notifications, that a process or equipment malfunction (or “HA event”) has again occurred on a process or equipment associated with one of the storage nodes A 112 . 1 , B 112 . 2 .
  • the framework manager 212 implements another new HA flow for the HA event as an asynchronous process thread.
  • the new HA flow of the second example is defined by a set of parameters and a set of executable steps.
  • the set of parameters can include (i) zero, one, or more dependencies specifying the new HA flow's relationships with one or more other HA flows represented by HA flow objects in the persistent HA flow object database 214 , (ii) an abort policy specifying rules regarding whether and/or when to abort certain HA flows in execution at the time a request to execute the new HA flow is generated, and (iii) logging and statistics information.
  • the rules specified in the abort policy dictate that all HA flows in execution are to be aborted.
  • upon implementation of the new HA flow for the HA event, the framework manager 212 generates a request to execute the new HA flow.
  • the framework manager 212 determines, as appropriate, (i) whether the request should be immediately refused, (ii) whether any HA flows in execution should be aborted, in accordance with the abort policy, and (iii) whether execution of the new HA flow should be postponed to a later time. If the request is not immediately refused, then the framework manager 212 allocates an HA flow object configured to represent the new HA flow and adds the HA flow object to the HA flow object database 214 . Further, the framework manager 212 checks the rules specified in the abort policy for the new HA flow and aborts all asynchronous process threads for HA flows in execution, as warranted by the rules.
  • the framework manager 212 checks the dependencies of the new HA flow vis-a-vis one or more other HA flows represented by HA flow objects in the HA flow object database 214 and postpones the execution of the new HA flow, as necessary, to satisfy the dependencies. Moreover, for each HA flow from among the other HA flows represented by HA flow objects in the HA flow object database 214 , the framework manager 212 further determines, as appropriate, (i) whether the request to execute the HA flow should be immediately refused and (ii) whether execution of the HA flow should be postponed as necessary to satisfy its dependencies. Once these further determinations are made and satisfied, the framework manager 212 starts execution of the new HA flow in the asynchronous process thread.
  • a method of handling execution of HA process threads in an active-active storage node configuration is described below with reference to FIG. 3 .
  • notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration are received.
  • a determination is made that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications.
  • in response to a request to execute the HA process thread to address the HA event, one or more of refusing the request to execute the HA process thread, servicing the request to execute the HA process thread, aborting one or more HA process threads in execution, and postponing execution of the HA process thread are performed, based on one or more dependencies defining conditions for the HA process thread.
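The first two steps of the method (receiving status notifications and determining that an HA event has occurred) might look like the sketch below. The notification tuple shape and the status strings are illustrative assumptions:

```python
def detect_ha_events(notifications):
    """Return (node, component) pairs whose status signals an HA event.
    Explicit failures ("disk disconnected") and implicit ones
    ("io failed") are both treated as HA events."""
    failure_statuses = {"disk disconnected", "io failed", "component overheated"}
    return [(node, component)
            for node, component, status in notifications
            if status in failure_statuses]


events = detect_ha_events([
    ("A", "disk0", "ok"),
    ("B", "disk3", "io failed"),
])
# each detected event would then be implemented as an HA flow (an
# asynchronous process thread) and submitted to the framework manager
# as a request to execute
```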
  • the term "storage system" is intended to be broadly construed to encompass, for example, private or public cloud computing systems for storing data, as well as systems for storing data comprising virtual infrastructure and those not comprising virtual infrastructure.
  • the term "client" refers, interchangeably, to any person, system, or other entity that uses a storage system to read/write data.
  • the term “storage device” may refer to a storage array including multiple storage devices.
  • a storage device may refer to any non-volatile memory (NVM) device, including hard disk drives (HDDs), solid state drives (SSDs), flash devices (e.g., NAND flash devices, NOR flash devices), and/or similar devices that may be accessed locally and/or remotely (e.g., via a storage attached network (SAN)).
  • a storage array (drive array, disk array) may refer to a data storage system used for block-based, file-based, or object storage.
  • Storage arrays can include, for example, dedicated storage hardware containing HDDs, SSDs, and/or all-flash drives.
  • a data storage entity may be a filesystem, an object storage, a virtualized device, a logical unit (LU), a logical unit number (LUN), a logical volume (LV), a logical device, a physical device, and/or a storage medium.
  • An LU may be a logical entity provided by a storage system for accessing data from the storage system and may be used interchangeably with a logical volume.
  • the terms "LU" and "LUN" may be used interchangeably.
  • a LUN may be a logical unit number for identifying an LU and may also refer to one or more virtual disks or virtual LUNs, which may correspond to one or more virtual machines.
  • a physical storage unit may be a physical entity such as a drive or disk or an array of drives or disks for storing data in storage locations that can be accessed by addresses.
  • a physical storage unit may be used interchangeably with a physical volume.
  • the term “storage medium” may refer to one or more storage media such as a hard drive, a combination of hard drives, flash storage, a combination of flash storage, a combination of hard drives, flash storage, and other storage devices, and/or any other suitable types or combinations of computer readable storage media.
  • a storage medium may also refer to both physical and logical storage media, include multiple levels of virtual-to-physical mappings, and include an image or disk image.
  • a storage medium may be computer-readable and may be referred to as a computer-readable program medium.
  • the term "IO request," or simply "IO," may be used to refer to an input or output request, such as a data read request or a data write request.
  • the terms, “such as,” “for example,” “e.g.,” “exemplary,” and variants thereof describe non-limiting embodiments and mean “serving as an example, instance, or illustration.” Any embodiments described herein using such phrases and/or variants are not necessarily to be construed as preferred or more advantageous over other embodiments, and/or to exclude the incorporation of features from other embodiments.
  • the term “optionally” is employed herein to mean that a feature or process, etc., is provided in certain embodiments and not provided in other certain embodiments. Any particular embodiment of the present disclosure may include a plurality of “optional” features unless such features conflict with one another.

Abstract

Techniques for providing a framework for handling execution of HA flows in an active-active storage node configuration. The techniques include receiving notifications of functional statuses of processes and/or equipment associated with storage nodes in the active-active configuration, making determinations regarding how to address HA events occurring on the processes and/or equipment associated with the storage nodes based on the received notifications, and, in response to a request to execute an HA flow for a respective HA event, determining whether to refuse the request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time based on one or more dependencies defining conditions for the HA flow. In this way, mutual interference of HA flows or other process threads in the active-active configuration can be reduced or eliminated.

Description

    BACKGROUND
  • In a high-availability (HA) cluster, storage controllers (also referred to herein as “storage nodes”) may be deployed in an active-passive configuration, in which a primary storage node takes on the role of an active node and at least one secondary storage node takes on the role of a standby node. In the active-passive configuration, the active node may process storage input/output (IO) requests from host computers and maintain page reference information on its memory, while the standby node may not be currently interacting with the host computers. Storage controllers in an HA cluster may also be deployed in an active-active configuration, in which two or more active nodes collaborate to process storage IO requests from host computers and maintain page reference information on their memories in image style.
  • SUMMARY
  • In the event of a process or equipment malfunction (also referred to herein as a “high availability (HA) event”) on an active node in an active-passive configuration, a system-level failover can occur, in which tasks of the active node, including processing storage IO requests and maintaining page reference information, are entirely taken over by a standby node. An appropriate set of actions can then be executed in a high-availability (HA) process flow (also referred to herein as an “HA flow”) to address actual or potential ramifications of the HA event. However, in an active-active configuration, multiple such HA events can occur simultaneously on two or more active nodes, requiring multiple HA flows to be executed concurrently to address any actual or potential ramifications of the HA events. Because such concurrent HA flows can have dependencies in which certain HA flows are dependent upon other HA flows or processes to execute their functions, a more unified approach to addressing HA events occurring on storage nodes in an active-active configuration is needed.
  • Techniques are disclosed herein for providing a centralized framework for handling execution of high-availability (HA) process flows in an active-active storage node configuration. The disclosed techniques can include an HA flows execution framework manager (also referred to herein as the “framework manager”), which can be implemented on one of multiple storage nodes in an active-active configuration. In the disclosed techniques, the framework manager can receive, periodically or at intervals, explicit or implicit notifications and/or reports of functional statuses of processes and/or equipment associated with the storage nodes in the active-active configuration. The framework manager can make determinations regarding whether and/or how to address any actual or potential process and/or equipment malfunctions (or “HA events”) based on the received notifications and/or reports. If the framework manager determines to address one or more actual or potential HA events occurring in the active-active configuration, then the framework manager can implement an HA flow for each HA event as an asynchronous process thread. The framework manager can represent each HA flow as an instance of an HA flow object and store the HA flow object for each HA flow waiting to be executed in a persistent repository or database. The framework manager can define each HA flow with reference to one or more dependencies specifying its relationships with one or more other HA flows and/or certain software, firmware, and/or hardware modules or components in the active-active configuration. Based at least on the dependencies defining conditions for the HA flow, the framework manager can determine whether to refuse a request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time.
  • By receiving notifications and/or reports of functional statuses of processes and/or equipment associated with storage nodes in an active-active configuration, making determinations regarding whether and/or how to address actual or potential HA events occurring on the processes and/or equipment associated with the storage nodes based on the received notifications and/or reports, and, in response to a request to execute an HA flow for a respective HA event, determining whether to refuse the request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time based on one or more dependencies defining conditions for the HA flow, mutual interference of HA flows or other process threads in the active-active configuration can be reduced or eliminated. As a result, recovery times from HA events occurring in the active-active configuration can be reduced.
  • In certain embodiments, a method of handling execution of high-availability (HA) process threads in an active-active storage node configuration includes receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • In certain arrangements, the method further includes specifying a set of parameters and a set of executable steps for the first HA process thread. The set of parameters includes the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
  • In certain arrangements, the method further includes, in response to the request to execute the first HA process thread not being refused, allocating a first HA process thread object representing the first HA process thread, and adding the first HA process thread object to a persistent database.
  • In certain arrangements, the method further includes checking the specified rules in the abort policy and aborting one or more of the HA process threads in execution based on the specified rules.
  • In certain arrangements, the method further includes checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
  • In certain arrangements, the method further includes, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • In certain arrangements, the method further includes checking the specified rules in the abort policy and aborting all of the HA process threads in execution based on the specified rules.
  • In certain arrangements, the method further includes checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database, and, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • In certain arrangements, the method further includes, for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, determining one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
  • In certain arrangements, the method further includes, having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, initiating execution of the first HA process thread.
  • In certain embodiments, a system for handling execution of high-availability (HA) process threads in an active-active storage node configuration includes a persistent database, a memory, and processing circuitry configured to execute program instructions out of the memory to receive notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, to determine that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, to perform one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory to specify a set of parameters and a set of executable steps for the first HA process thread, in which the set of parameters includes the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory, in response to the request to execute the first HA process thread not being refused, to allocate a first HA process thread object representing the first HA process thread, and to add the first HA process thread object to the persistent database.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory to check the specified rules in the abort policy, and to abort one or more of the HA process threads in execution based on the specified rules.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory to check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, to perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory to check the specified rules in the abort policy, to abort all of the HA process threads in execution based on the specified rules, to check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database, and, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, to perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory, for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, to determine one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
  • In certain arrangements, the processing circuitry is further configured to execute the program instructions out of the memory, having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, initiating execution of the first HA process thread.
  • In certain embodiments, a computer program product includes a set of non-transitory, computer-readable media having instructions that, when executed by processing circuitry, cause the processing circuitry to perform a method of handling execution of high-availability (HA) process threads in an active-active storage node configuration. The method includes receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration, determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications, and, in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
  • Other features, functions, and aspects of the present disclosure will be evident from the Detailed Description that follows.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and other objects, features, and advantages will be apparent from the following description of particular embodiments of the present disclosure, as illustrated in the accompanying drawings, in which like reference characters refer to the same parts throughout the different views.
  • FIG. 1a is a block diagram of an exemplary data storage environment, in which techniques can be practiced for providing a centralized framework for handling execution of high-availability (HA) process flows in an active-active storage node configuration;
  • FIG. 1b is a block diagram of an active-active storage system in the data storage environment of FIG. 1 a;
  • FIG. 2a is a block diagram of an exemplary storage controller (or “storage node”) from among multiple storage controllers (or “storage nodes”) included in the active-active storage system of FIG. 1 b, in which the storage node includes an HA flows execution framework manager (or “framework manager”) and a persistent HA flow object database;
  • FIG. 2b is a block diagram of an exemplary HA flow object upon which the techniques can be practiced in the data storage environment of FIG. 1 a; and
  • FIG. 3 is a flow diagram of an exemplary method of providing a centralized framework for handling execution of HA process flows in an active-active storage node configuration.
  • DETAILED DESCRIPTION
  • Techniques are disclosed herein for providing a centralized framework for handling execution of high-availability (HA) process flows (also referred to herein as “HA flow(s)”) in an active-active storage node configuration. The disclosed techniques can include receiving notifications and/or reports of functional statuses of processes and/or equipment associated with storage nodes in an active-active configuration, making determinations regarding whether and/or how to address actual or potential malfunctions (also referred to herein as “HA events”) occurring on the processes and/or equipment associated with the storage nodes based on the received notifications and/or reports, and, in response to a request to execute an HA flow for a respective HA event, determining whether to refuse the request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time based on one or more dependencies defining conditions for the HA flow. In this way, mutual interference of HA flows or other process threads in an active-active configuration can be reduced or eliminated, and recovery times from HA events occurring in the active-active configuration can be reduced.
  • FIG. 1a depicts an illustrative embodiment of an exemplary data storage environment 100, in which techniques can be practiced for providing a centralized framework for handling execution of HA flows in an active-active storage node configuration. As shown in FIG. 1 a, the data storage environment 100 can include a plurality of host computers 102.1, 102.2, . . . , 102.n, an active-active storage system 104, and a communications medium 103 that includes at least one network 108. For example, each of the plurality of host computers 102.1, . . . , 102.n can be configured as a web server computer, a file server computer, an email server computer, an enterprise server computer, and/or any other suitable client/server computer or computerized device. The plurality of host computers 102.1, . . . , 102.n can be configured to provide, over the network 108, storage input/output (IO) requests (e.g., small computer system interface (SCSI) commands, network file system (NFS) commands) to the active-active storage system 104. Such storage IO requests (e.g., write IO requests, read IO requests) can direct a storage controller (or “storage node”) to write or read data blocks, data pages, data files, or any other suitable data elements to/from volumes (VOLs), logical units (LUs), file systems, and/or any other suitable storage objects, such as storage objects 110.1, 110.2, . . . , 110.m maintained in association with the active-active storage system 104.
  • The communications medium 103 can be configured to interconnect the plurality of host computers 102.1, . . . , 102.n and the active-active storage system 104 to enable them to communicate and exchange data and/or control signaling. As shown in FIG. 1 a, the communications medium 103 can be illustrated as a “cloud” to represent different communications topologies such as a backbone topology, a hub-and-spoke topology, a loop topology, an irregular topology, and so on, or any suitable combination thereof. As such, the communications medium 103 can include copper-based data communications devices and cabling, fiber optic-based communications devices and cabling, wireless communications devices, and so on, or any suitable combination thereof. Further, the communications medium 103 can be configured to support storage area network (SAN) communications, network attached storage (NAS) communications, local area network (LAN) communications, metropolitan area network (MAN) communications, wide area network (WAN) communications, wireless communications, distributed infrastructure communications, and/or any other suitable communications.
  • FIG. 1b depicts another view of the active-active storage system 104 of FIG. 1 a. As employed herein, the term “active-active storage system” refers to a highly available storage system, in which multiple storage nodes have shared or exclusive read-write IO access to the same storage objects (e.g., volumes (VOLs), logical units (LUs), file systems). As shown in FIG. 1 b, the active-active storage system 104 can include at least two storage controllers (or “storage nodes”) for high availability, namely, a primary storage node A 112.1 and a secondary storage node B 112.2, which is communicably connected to the storage node A 112.1 by a communication path 111. For example, each of the storage node A 112.1 and the storage node B 112.2 can receive storage IO requests from the respective host computers 102.1, . . . , 102.n over the network 108. In response to the storage IO requests, the storage nodes A 112.1, B 112.2 can perform storage IO operations (e.g., read-write IO operations) to write/read data blocks, data pages, data files, or any other suitable data elements to/from one or more of the plurality of storage objects 110.1, . . . , 110.m. Further, periodically or at intervals, page reference information pertaining to read-write IO operations maintained in a journal by the storage node A 112.1 can be synchronized with corresponding page reference information maintained in a journal by the storage node B 112.2. If the storage node A 112.1 is taken offline (or at any other suitable time), then the storage node B 112.2 can assume the role and/or duties of the storage node A 112.1 with regard to the handling of storage IO requests, providing high availability within the active-active storage system 104. As further shown in FIG. 
1 b, the active-active storage system 104 can include one or more storage devices 114, which can be embodied as one or more non-volatile random-access memories (NVRAM), solid-state drives (SSDs), hard disk drives (HDDs), flash memories, and/or any other suitable storage device(s) for storing storage object data and/or metadata.
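By way of a non-limiting illustration, the journal synchronization and failover behavior described above can be sketched as follows; the class and function names (`StorageNode`, `synchronize`, `route_io`) are hypothetical stand-ins and are not part of the disclosed system:

```python
class StorageNode:
    """Minimal stand-in for one storage node of an active-active pair."""
    def __init__(self, name):
        self.name = name
        self.online = True
        self.journal = {}        # page reference info for read-write IOs

    def record(self, page, ref):
        self.journal[page] = ref


def synchronize(node_a, node_b):
    """Periodically, page reference information maintained in one node's
    journal is synchronized with the corresponding information in the
    peer node's journal."""
    merged = {**node_a.journal, **node_b.journal}
    node_a.journal = dict(merged)
    node_b.journal = dict(merged)


def route_io(node_a, node_b):
    """If node A is taken offline, node B assumes its role and/or duties
    with regard to the handling of storage IO requests."""
    return node_a if node_a.online else node_b


# Node A records a page reference, the journals are synchronized, and
# node A is then taken offline; IO handling falls to node B.
a, b = StorageNode("A"), StorageNode("B")
a.record("page-1", "ref-1")
synchronize(a, b)
a.online = False
handler = route_io(a, b)
```

Because the journals were synchronized before the failover, node B can continue handling IO with the same page reference information.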
  • FIG. 2a depicts an exemplary storage node 200 from among multiple storage nodes included in the active-active storage system 104 of FIG. 1 a. As shown in FIG. 2a , the storage node 200 can include a communications interface 202, processing circuitry 204, and a memory 206. The communications interface 202 can include one or more of an Ethernet interface, an InfiniBand interface, a fiber channel interface, and/or any other suitable communications interface. The communications interface 202 can further include SCSI target adapters, network interface adapters, and/or any other suitable adapters for converting electronic, optical, and/or wireless signals received over the network(s) 108 to a form suitable for use by the processing circuitry 204. The memory 206 can include volatile memory such as random-access memory (RAM) or any other suitable volatile memory, as well as persistent memory such as NVRAM, read-only memory (ROM), one or more HDDs, one or more SSDs, or any other suitable persistent memory. The memory 206 can be configured to store a variety of software constructs realized in the form of specialized code and data (e.g., program instructions) that can be executed by the processing circuitry 204 to carry out the techniques and/or methods disclosed herein. The memory 206 can further include an operating system 208 such as the Linux OS, Unix OS, Windows OS, or any other suitable operating system, as well as a malfunction monitor 210 that can be executed by the processing circuitry 204. The processing circuitry 204 can include one or more physical processors, controllers, IO modules, and/or any other suitable computer hardware or combination thereof.
  • It is noted that each of the multiple storage nodes (e.g., storage node A 112.1, storage node B 112.2) included in the active-active storage system 104 can be configured to include at least a communications interface, processing circuitry, a memory, an OS, and a malfunction monitor like the storage node 200 of FIG. 2a . In the active-active storage system 104, one of the multiple storage nodes can be further configured to include an HA flows execution framework manager (or “framework manager”) 212, as well as a persistent repository or database 214 configured to store multiple instances of high-availability (HA) flow objects. In the disclosed techniques, the malfunction monitor 210 can be configured to monitor functional statuses of processes and/or equipment associated with the storage node 200 and send notifications and/or reports of the functional statuses to the framework manager 212.
  • In the context of the processing circuitry 204 being implemented using one or more processors executing specialized code and data, a computer program product can be configured to deliver all or a portion of the specialized code and data to the respective processor(s). Such a computer program product can include one or more non-transient computer-readable storage media, such as a magnetic disk, a magnetic tape, a compact disk (CD), a digital versatile disk (DVD), an optical disk, a flash drive, a solid-state drive (SSD), a secure digital (SD) chip or device, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), and so on. Further, the non-transient computer-readable storage media can be encoded with sets of program instructions for performing, when executed by the respective processor(s), the various techniques and/or methods disclosed herein.
  • FIG. 2b depicts an exemplary HA flow object 216 that can be stored in the persistent HA flow object database 214 of FIG. 2a . In the disclosed techniques, each HA flow can be represented as an instance of an HA flow object, and each HA flow object for each HA flow waiting to be executed can be stored in the HA flow object database 214. As shown in FIG. 2b , the HA flow object 216 can include a plurality of fields relating to an HA flow, including a first field for an indication of an HA flow purpose 218 (e.g., to establish a connection, reset a disk, reboot a storage node), a second field for an HA flow identifier 220 (e.g., alphabetic, numeric, alphanumeric identifier), a third field for an indication of an HA flow state 222 (e.g., uninitialized, waiting to be executed, in execution, completed execution), a fourth field for an indication of an HA flow progress 224 (e.g., process flow step in execution), and a fifth field for an indication of an HA flow execution result 226 (e.g., successful or failed execution). The HA flow object 216 can further include a field for logging and statistics information 228 (e.g., certain actions taken by the framework manager 212, a timestamp taken at the start of HA flow execution).
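The fields of the HA flow object 216 can be illustrated, e.g., as a simple record type; the `HAFlowObject` and `FlowState` names below are hypothetical stand-ins for the fields enumerated above:

```python
from dataclasses import dataclass, field
from enum import Enum


class FlowState(Enum):
    """Possible values for the HA flow state 222 field."""
    UNINITIALIZED = "uninitialized"
    WAITING = "waiting to be executed"
    IN_EXECUTION = "in execution"
    COMPLETED = "completed execution"


@dataclass
class HAFlowObject:
    """One record in the persistent HA flow object database 214."""
    purpose: str                 # e.g., "reset a disk", "reboot a storage node"
    flow_id: str                 # alphabetic, numeric, or alphanumeric identifier
    state: FlowState = FlowState.UNINITIALIZED
    progress: str = ""           # process flow step currently in execution
    result: str = ""             # "successful" or "failed" once execution ends
    log: list = field(default_factory=list)  # logging and statistics entries


# Example: a flow object for a disk-reset HA flow waiting to be executed.
flow = HAFlowObject(purpose="reset a disk", flow_id="HA-0001",
                    state=FlowState.WAITING)
flow.log.append("request generated")
```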
  • During operation, the framework manager (e.g., the framework manager 212; see FIG. 2a ) can receive, periodically or at intervals, notifications and/or reports of functional statuses of processes and/or equipment associated with the multiple storage nodes (e.g., storage nodes A 112.1, B 112.2) in the active-active storage system 104. The framework manager 212 can make determinations regarding whether and/or how to address any actual or potential process and/or equipment malfunctions (or “HA events”) based on the received notifications and/or reports. For example, such HA events can occur on one or more of the storage nodes (e.g., storage nodes A 112.1, B 112.2) in the active-active storage system 104, and/or one or more shared resources in the active-active storage system 104 (e.g., disks, disk drawers, PSU hardware modules). If the framework manager 212 determines to address one or more actual or potential HA events occurring on certain processes and/or equipment associated with storage nodes in the active-active storage system 104, then the framework manager 212 can implement an HA flow for each HA event as an asynchronous process thread. Each HA flow can be represented as an instance of an HA flow object, and the HA flow object for each HA flow waiting to be executed can be stored in the persistent HA flow object database (e.g., the HA flow object database 214; see FIG. 2a ). The framework manager 212 can define each HA flow at least with reference to one or more dependencies specifying its relationships with one or more other HA flows and/or certain software, firmware, and/or hardware modules or components in the active-active storage system 104. Based at least on the dependencies defining conditions for the HA flow, the framework manager 212 can determine whether to refuse a request to execute the HA flow, service the request to execute the HA flow, abort one or more HA flows in execution, and/or postpone execution of the HA flow to a later time.
In this way, mutual interference of HA flows or other process threads in an active-active configuration can be reduced or eliminated, and recovery times from HA events occurring in the active-active configuration can be reduced.
  • The disclosed techniques for providing a centralized framework for handling execution of HA flows in an active-active storage node configuration will be further understood with reference to the following illustrative examples. In a first example, it is assumed that the framework manager 212 (see FIG. 2a ) has received one or more notifications of functional statuses of processes and/or equipment associated with at least one of the storage node A 112.1 and the storage node B 112.2 in the active-active storage system 104. For example, such notifications can be sent to the framework manager 212 by a malfunction monitor (e.g., the malfunction monitor 210; see FIG. 2a ) executing on the storage node A 112.1 and/or the storage node B 112.2. Such notifications can be explicit notifications of the functional statuses of processes or equipment (e.g., a disk is disconnected) and/or implicit notifications of the functional statuses of processes or equipment (e.g., a storage IO operation has failed). Further in this first example, it is assumed that the framework manager 212 has determined, based on the received notifications, that a process or equipment malfunction (or “HA event”) associated with one of the storage nodes A 112.1, B 112.2 has occurred. For example, the HA event can be due to a disk malfunction, an overheated hardware component, a control malfunction, or any other suitable malfunction.
  • Having determined that an HA event has occurred on a process or equipment associated with one of the storage nodes A 112.1, B 112.2, the framework manager 212 implements a new HA flow for the HA event as an asynchronous process thread. In this first example, the new HA flow is defined by a set of parameters and a set of executable steps. For example, the set of parameters can include (i) zero, one, or more dependencies specifying the new HA flow's relationships with one or more other HA flows represented by HA flow objects in the persistent HA flow object database 214, (ii) an abort policy specifying rules regarding whether and/or when to abort certain HA flows in execution at the time a request to execute the new HA flow is generated, and (iii) logging and statistics information. In some embodiments, the abort policy can be priority-based or can explicitly specify which HA flows in execution to abort. It is noted that certain HA flows in execution will be aborted only if required by the abort policy. In cases where there is no need to abort or otherwise interrupt an HA flow in execution, the HA flow will not be aborted or interrupted. Further, the set of executable steps can include a set of actions to be taken by the new HA flow to address the HA event. Upon implementation of the new HA flow for the HA event, the framework manager 212 generates a request to execute the new HA flow.
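The set of parameters described above can be sketched, e.g., as follows; the `HAFlowDefinition` and `AbortPolicy` names are illustrative assumptions, and the abort policy shown is the priority-based variant (an explicit variant could instead list the flow identifiers to abort):

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AbortPolicy:
    """Rules regarding whether and/or when to abort HA flows in execution.

    Priority-based variant: abort any running flow whose priority is
    strictly lower than the new flow's priority.
    """
    priority: int

    def should_abort(self, running_flow_priority: int) -> bool:
        return running_flow_priority < self.priority


@dataclass
class HAFlowDefinition:
    """A new HA flow: a set of parameters plus a set of executable steps."""
    dependencies: List[str] = field(default_factory=list)   # ids of flows this one depends on
    abort_policy: AbortPolicy = field(default_factory=lambda: AbortPolicy(priority=0))
    steps: List[Callable[[], None]] = field(default_factory=list)  # actions addressing the HA event
    log: list = field(default_factory=list)                 # logging and statistics information


# Example: a node-reboot flow at priority 10 aborts only lower-priority
# flows in execution; an equal- or higher-priority flow is left running.
reboot_flow = HAFlowDefinition(abort_policy=AbortPolicy(priority=10))
```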
  • In this first example, once the request to execute the new HA flow has been generated, the framework manager 212 determines, as appropriate, (i) whether the request should be immediately refused, (ii) whether any HA flows in execution should be aborted, in accordance with the abort policy, and (iii) whether execution of the new HA flow should be postponed to a later time. For example, such refusal of the request to execute the new HA flow can be based on the storage node A 112.1 or B 112.2 of interest having been taken offline or any other suitable reason. If the request is not immediately refused, then the framework manager 212 allocates an HA flow object configured to represent the new HA flow and adds the HA flow object to the HA flow object database 214. Further, the framework manager 212 checks the rules specified in the abort policy for the new HA flow and aborts zero, one, or more asynchronous process threads for HA flows in execution, as warranted by the rules. In addition, the framework manager 212 checks the dependencies of the new HA flow vis-a-vis one or more other HA flows represented by HA flow objects in the HA flow object database 214. If the HA flow dependencies dictate a certain order in which the HA flows may be executed, then the framework manager 212 can postpone the execution of the new HA flow, as necessary, to satisfy the dependencies.
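The ordered determinations above can be illustrated, e.g., by the following non-limiting sketch, in which the function name, the record fields, and the in-memory stand-in for the persistent database are all hypothetical:

```python
def handle_execution_request(new_flow, database, running_flows, node_online=True):
    """Apply the framework manager's determinations in order:
    (i) refuse, (ii) allocate and persist the flow object, (iii) abort
    per the abort policy, (iv) postpone per the dependencies.
    Returns a status string for the new flow.
    """
    # (i) Immediate refusal, e.g., the storage node of interest is offline.
    if not node_online:
        return "refused"

    # (ii) Allocate an HA flow object and add it to the database.
    database[new_flow["id"]] = dict(new_flow, state="waiting")

    # (iii) Abort zero, one, or more running flows, as warranted by the
    # (priority-based) abort policy rules.
    for flow in list(running_flows):
        if flow["priority"] < new_flow["priority"]:
            flow["state"] = "aborted"
            running_flows.remove(flow)

    # (iv) Postpone if the dependencies dictate an execution order that
    # is not yet satisfied.
    unmet = [d for d in new_flow["dependencies"]
             if database.get(d, {}).get("state") != "completed"]
    if unmet:
        database[new_flow["id"]]["state"] = "postponed"
        return "postponed"

    database[new_flow["id"]]["state"] = "in execution"
    return "executing"


# In-memory stand-in for the persistent HA flow object database.
database = {}
running = [{"id": "flow-1", "priority": 1, "state": "in execution"}]

# A higher-priority flow with no dependencies is serviced immediately,
# aborting the lower-priority flow per the abort policy.
first = handle_execution_request(
    {"id": "flow-2", "priority": 5, "dependencies": []}, database, running)

# A flow depending on a not-yet-completed flow is postponed.
second = handle_execution_request(
    {"id": "flow-3", "priority": 0, "dependencies": ["flow-9"]}, database, [])
```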
  • Having determined that the request to execute the new HA flow should not be immediately refused, having aborted zero, one, or more asynchronous process threads for HA flows in execution, and/or having postponed the execution of the new HA flow as necessary to satisfy any dependencies, the framework manager 212 can determine whether any other factors exist that prevent immediate execution of the new HA flow. If so, then the framework manager 212 can determine, periodically or at intervals, whether such factors continue to exist. Once it is determined that such factors no longer exist, the framework manager 212 starts execution of the new HA flow in the asynchronous process thread.
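The periodic re-checking described above can be sketched, e.g., as a deferred-start loop; the `blocking_factors` callable, the retry interval, and the bounded check count are assumptions for illustration only:

```python
import time


def start_when_unblocked(start_flow, blocking_factors, interval=0.01, max_checks=100):
    """Periodically re-check whether factors preventing execution of the
    flow still exist; once they no longer do, start the flow in its
    asynchronous process thread.  Returns True if the flow was started,
    False if max_checks elapsed with factors still present.
    """
    for _ in range(max_checks):
        if not blocking_factors():   # no remaining factors -> start now
            start_flow()
            return True
        time.sleep(interval)         # factors persist -> check again later
    return False


# Example: a blocking "factor" that clears after three checks.
checks = {"remaining": 3}


def factors():
    checks["remaining"] -= 1
    return checks["remaining"] > 0


started = []
ok = start_when_unblocked(lambda: started.append("new HA flow"), factors)
```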
  • In a second example, it is assumed that the framework manager 212 (see FIG. 2a ) has received one or more additional notifications of functional statuses of processes and/or equipment associated with at least one of the storage node A 112.1 and the storage node B 112.2 in the active-active storage system 104. It is further assumed that the framework manager 212 has determined, based on the received notifications, that a process or equipment malfunction (or “HA event”) has again occurred on a process or equipment associated with one of the storage nodes A 112.1, B 112.2.
  • Having determined that an HA event has again occurred on a process or equipment associated with one of the storage nodes A 112.1, B 112.2, the framework manager 212 implements another new HA flow for the HA event as an asynchronous process thread. As in the first example, the new HA flow of the second example is defined by a set of parameters and a set of executable steps. For example, the set of parameters can include (i) zero, one, or more dependencies specifying the new HA flow's relationships with one or more other HA flows represented by HA flow objects in the persistent HA flow object database 214, (ii) an abort policy specifying rules regarding whether and/or when to abort certain HA flows in execution at the time a request to execute the new HA flow is generated, and (iii) logging and statistics information. In this second example, however, the rules specified in the abort policy dictate that all HA flows in execution are to be aborted. Upon implementation of the new HA flow for the HA event, the framework manager 212 generates a request to execute the new HA flow.
  • In this second example, once the request to execute the new HA flow has been generated, the framework manager 212 determines, as appropriate, (i) whether the request should be immediately refused, (ii) whether any HA flows in execution should be aborted, in accordance with the abort policy, and (iii) whether execution of the new HA flow should be postponed to a later time. If the request is not immediately refused, then the framework manager 212 allocates an HA flow object configured to represent the new HA flow and adds the HA flow object to the HA flow object database 214. Further, the framework manager 212 checks the rules specified in the abort policy for the new HA flow and aborts all asynchronous process threads for HA flows in execution, as warranted by the rules. In addition, the framework manager 212 checks the dependencies of the new HA flow vis-a-vis one or more other HA flows represented by HA flow objects in the HA flow object database 214 and postpones the execution of the new HA flow, as necessary, to satisfy the dependencies. Moreover, for each HA flow from among the other HA flows represented by HA flow objects in the HA flow object database 214, the framework manager 212 further determines, as appropriate, (i) whether the request to execute the HA flow should be immediately refused and (ii) whether execution of the HA flow should be postponed as necessary to satisfy its dependencies. Once these further determinations are made and satisfied, the framework manager 212 starts execution of the new HA flow in the asynchronous process thread.
  • A method of handling execution of HA process threads in an active-active storage node configuration is described below with reference to FIG. 3. As depicted in block 302, notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration are received. As depicted in block 304, a determination is made that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications. As depicted in block 306, in response to a request to execute an HA process thread to address the HA event, one or more of refusing the request to execute the HA process thread, servicing the request to execute the HA process thread, aborting one or more HA process threads in execution, and postponing execution of the HA process thread based on one or more dependencies defining conditions for the HA process thread, are performed.
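The blocks 302, 304, and 306 of FIG. 3 can be exercised end to end, e.g., by the following non-limiting sketch; the notification format and the event-detection rule are illustrative assumptions:

```python
def receive_notifications():
    # Block 302: receive notifications of functional statuses of
    # processes or equipment associated with storage nodes in an
    # active-active configuration.
    return [{"node": "A", "component": "disk-7", "status": "disconnected"},
            {"node": "B", "component": "psu-1", "status": "ok"}]


def detect_ha_event(notifications):
    # Block 304: determine that an HA event has occurred based on the
    # received notifications (here, any non-"ok" status counts as one).
    for notification in notifications:
        if notification["status"] != "ok":
            return notification
    return None


def address_event(event):
    # Block 306: in response to a request to execute an HA process
    # thread, perform one or more of refuse / service / abort others /
    # postpone.  This sketch simply services the request.
    return {"flow": f"recover {event['component']} on node {event['node']}",
            "action": "serviced"}


event = detect_ha_event(receive_notifications())
outcome = address_event(event)
```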
  • Several definitions of terms are provided below for the purpose of aiding the understanding of the foregoing description, as well as the claims set forth herein.
  • As employed herein, the term “storage system” is intended to be broadly construed to encompass, for example, private or public cloud computing systems for storing data, as well as systems for storing data comprising virtual infrastructure and those not comprising virtual infrastructure.
  • As employed herein, the terms “client,” “host,” and “user” refer, interchangeably, to any person, system, or other entity that uses a storage system to read/write data.
  • As employed herein, the term “storage device” may refer to a storage array including multiple storage devices. Such a storage device may refer to any non-volatile memory (NVM) device, including hard disk drives (HDDs), solid state drives (SSDs), flash devices (e.g., NAND flash devices, NOR flash devices), and/or similar devices that may be accessed locally and/or remotely (e.g., via a storage area network (SAN)). A storage array (drive array, disk array) may refer to a data storage system used for block-based, file-based, or object storage. Storage arrays can include, for example, dedicated storage hardware containing HDDs, SSDs, and/or all-flash drives. A data storage entity may be a filesystem, an object storage, a virtualized device, a logical unit (LU), a logical unit number (LUN), a logical volume (LV), a logical device, a physical device, and/or a storage medium. An LU may be a logical entity provided by a storage system for accessing data from the storage system and may be used interchangeably with a logical volume. An LU or LUN may be used interchangeably with each other. A LUN may be a logical unit number for identifying an LU and may also refer to one or more virtual disks or virtual LUNs, which may correspond to one or more virtual machines. A physical storage unit may be a physical entity such as a drive or disk or an array of drives or disks for storing data in storage locations that can be accessed by addresses. A physical storage unit may be used interchangeably with a physical volume.
  • As employed herein, the term “storage medium” may refer to one or more storage media such as a hard drive, a combination of hard drives, flash storage, a combination of flash storage, a combination of hard drives, flash storage, and other storage devices, and/or any other suitable types or combinations of computer readable storage media. A storage medium may also refer to both physical and logical storage media, include multiple levels of virtual-to-physical mappings, and include an image or disk image. A storage medium may be computer-readable and may be referred to as a computer-readable program medium.
  • As employed herein, the term “IO request” or “IO” may be used to refer to an input or output request such as a data read request or data write request.
  • As employed herein, the terms, “such as,” “for example,” “e.g.,” “exemplary,” and variants thereof describe non-limiting embodiments and mean “serving as an example, instance, or illustration.” Any embodiments described herein using such phrases and/or variants are not necessarily to be construed as preferred or more advantageous over other embodiments, and/or to exclude the incorporation of features from other embodiments. In addition, the term “optionally” is employed herein to mean that a feature or process, etc., is provided in certain embodiments and not provided in other certain embodiments. Any particular embodiment of the present disclosure may include a plurality of “optional” features unless such features conflict with one another.
  • While various embodiments of the present disclosure have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the present disclosure, as defined by the appended claims.

Claims (20)

What is claimed is:
1. A method of handling execution of high-availability (HA) process threads in an active-active storage node configuration, comprising:
receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration;
determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications; and
in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
2. The method of claim 1 further comprising:
specifying a set of parameters and a set of executable steps for the first HA process thread, the set of parameters including the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
3. The method of claim 2 further comprising:
in response to the request to execute the first HA process thread not being refused, allocating a first HA process thread object representing the first HA process thread; and
adding the first HA process thread object to a persistent database.
4. The method of claim 3 further comprising:
checking the specified rules in the abort policy; and
aborting one or more of the HA process threads in execution based on the specified rules.
5. The method of claim 4 further comprising:
checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
6. The method of claim 5 further comprising:
in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
7. The method of claim 3 further comprising:
checking the specified rules in the abort policy; and
aborting all of the HA process threads in execution based on the specified rules.
8. The method of claim 7 further comprising:
checking the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database; and
in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, performing the postponing of the execution of the first HA process thread to satisfy the dependencies.
9. The method of claim 8 further comprising:
for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, determining one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
10. The method of claim 9 further comprising:
having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, initiating execution of the first HA process thread.
11. A system for handling execution of high-availability (HA) process threads in an active-active storage node configuration, comprising:
a persistent database;
a memory; and
processing circuitry configured to execute program instructions out of the memory to:
receive notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration;
determine that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications; and
in response to a request to execute a first HA process thread to address the HA event, perform one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
12. The system of claim 11 wherein the processing circuitry is further configured to execute the program instructions out of the memory to specify a set of parameters and a set of executable steps for the first HA process thread, wherein the set of parameters includes the one or more dependencies defining the conditions for the first HA process thread and an abort policy specifying rules regarding whether or when to abort the one or more HA process threads in execution.
13. The system of claim 12 wherein the processing circuitry is further configured to execute the program instructions out of the memory, in response to the request to execute the first HA process thread not being refused, to allocate a first HA process thread object representing the first HA process thread, and to add the first HA process thread object to the persistent database.
14. The system of claim 13 wherein the processing circuitry is further configured to execute the program instructions out of the memory to check the specified rules in the abort policy and abort one or more of the HA process threads in execution based on the specified rules.
15. The system of claim 14 wherein the processing circuitry is further configured to execute the program instructions out of the memory to check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database.
16. The system of claim 15 wherein the processing circuitry is further configured to execute the program instructions out of the memory, in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, to perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
17. The system of claim 13 wherein the processing circuitry is further configured to execute the program instructions out of the memory to:
check the specified rules in the abort policy;
abort all of the HA process threads in execution based on the specified rules;
check the dependencies defining the conditions for the first HA process thread with regard to one or more other HA process threads represented by other HA process thread objects in the persistent database; and
in response to the dependencies dictating an order in which the first HA process thread and the other HA process threads are to be executed, perform the postponing of the execution of the first HA process thread to satisfy the dependencies.
18. The system of claim 17 wherein the processing circuitry is further configured to execute the program instructions out of the memory, for each respective HA process thread from among the one or more other HA process threads represented by the other HA process thread objects in the persistent database, to determine one or more of whether a request to execute the respective HA process thread should be refused and whether execution of the respective HA process thread should be postponed as necessary to satisfy its dependencies.
19. The system of claim 18 wherein the processing circuitry is further configured to execute the program instructions out of the memory, having determined whether the request to execute the respective HA process thread should be refused or whether the execution of the respective HA process thread should be postponed, to initiate execution of the first HA process thread.
20. A computer program product including a set of non-transitory, computer-readable media having instructions that, when executed by processing circuitry, cause the processing circuitry to perform a method of handling execution of high-availability (HA) process threads in an active-active storage node configuration, the method comprising:
receiving notifications of functional statuses of processes or equipment associated with storage nodes in an active-active configuration;
determining that an HA event has occurred on one of the processes or equipment associated with the storage nodes in the active-active configuration based on the received notifications; and
in response to a request to execute a first HA process thread to address the HA event, performing one or more of refusing the request to execute the first HA process thread, servicing the request to execute the first HA process thread, aborting one or more HA process threads in execution, and postponing execution of the first HA process thread based on one or more dependencies defining conditions for the first HA process thread.
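The dispatcher logic recited in the claims — refuse a duplicate request, apply the requesting thread's abort policy to threads in execution, postpone when dependencies dictate an ordering, otherwise initiate execution, and re-evaluate postponed threads as dependencies complete — can be sketched as follows. This is an illustrative sketch only: the class and attribute names (`HAFramework`, `HAThread`, `depends_on`, `aborts`) and the in-memory dict standing in for the persistent database are assumptions for demonstration, not the patented implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    REFUSED = "refused"
    SERVICED = "serviced"
    POSTPONED = "postponed"

@dataclass
class HAThread:
    name: str
    depends_on: frozenset = frozenset()  # threads that must complete before this one runs
    aborts: frozenset = frozenset()      # abort policy: in-flight threads this one cancels

class HAFramework:
    def __init__(self):
        self.db = {}            # stand-in for the persistent database of thread objects
        self.running = set()    # names of HA process threads in execution
        self.completed = set()  # names of HA process threads that have finished
        self.postponed = []     # threads waiting on unmet dependencies

    def request(self, t: HAThread) -> Decision:
        if t.name in self.db:               # duplicate request -> refuse
            return Decision.REFUSED
        self.db[t.name] = t                 # record the thread object persistently
        for victim in t.aborts & self.running:  # abort-policy check against running threads
            self.running.discard(victim)
        if t.depends_on - self.completed:   # unmet dependency -> postpone to satisfy ordering
            self.postponed.append(t)
            return Decision.POSTPONED
        self.running.add(t.name)            # otherwise initiate execution
        return Decision.SERVICED

    def finish(self, name: str) -> None:
        self.running.discard(name)
        self.completed.add(name)
        # re-evaluate postponed threads whose dependencies are now satisfied
        ready = [t for t in self.postponed if t.depends_on <= self.completed]
        for t in ready:
            self.postponed.remove(t)
            self.running.add(t.name)
```

For example, a hypothetical `journal_recovery` thread declared dependent on `node_failover` would be postponed until `node_failover` completes, at which point the framework initiates its execution.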
US17/153,135 2021-01-20 2021-01-20 Centralized high-availability flows execution framework Active 2041-05-21 US11586466B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/153,135 US11586466B2 (en) 2021-01-20 2021-01-20 Centralized high-availability flows execution framework

Publications (2)

Publication Number Publication Date
US20220229693A1 true US20220229693A1 (en) 2022-07-21
US11586466B2 US11586466B2 (en) 2023-02-21

Family

ID=82405154

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/153,135 Active 2041-05-21 US11586466B2 (en) 2021-01-20 2021-01-20 Centralized high-availability flows execution framework

Country Status (1)

Country Link
US (1) US11586466B2 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140344824A1 (en) * 2013-05-14 2014-11-20 International Business Machines Corporation Interruption of chip component managing tasks
US20160269313A1 (en) * 2015-03-09 2016-09-15 Amazon Technologies, Inc. Opportunistic resource migration to optimize resource placement
US20160277498A1 (en) * 2015-03-20 2016-09-22 Intel Corporation Location and boundary controls for storage volumes
US20170171302A1 (en) * 2015-12-15 2017-06-15 Samsung Electronics Co., Ltd. Storage system and method for connection-based load balancing
US20170228233A1 (en) * 2016-02-09 2017-08-10 Intel Corporation Methods, apparatus, and instructions for user-level thread suspension
US20190190778A1 (en) * 2017-12-20 2019-06-20 Hewlett Packard Enterprise Development Lp Distributed lifecycle management for cloud platforms

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11151082B1 (en) 2015-03-31 2021-10-19 EMC IP Holding Company LLC File system operation cancellation
US10178032B1 (en) 2015-09-23 2019-01-08 EMC IP Holding Company LLC Wide area network distribution, load balancing and failover for multiple internet protocol addresses
US10229017B1 (en) 2015-10-01 2019-03-12 EMC IP Holding Company LLC Resetting fibre channel devices for failover in high availability backup systems
US10469574B1 (en) 2016-04-20 2019-11-05 EMC IP Holding Company LLC Incremental container state persistency and replication for containerized stateful applications
US10664397B2 (en) 2018-07-31 2020-05-26 EMC IP Holding Company LLC Cache recovery method in a distributed storage system

Also Published As

Publication number Publication date
US11586466B2 (en) 2023-02-21

Similar Documents

Publication Publication Date Title
US10379759B2 (en) Method and system for maintaining consistency for I/O operations on metadata distributed amongst nodes in a ring structure
US9286344B1 (en) Method and system for maintaining consistency for I/O operations on metadata distributed amongst nodes in a ring structure
CN111488241B (en) Method and system for realizing agent-free backup and recovery operation in container arrangement platform
US9256374B1 (en) Metadata for managing I/O and storage for a virtualization environment
US8639876B2 (en) Extent allocation in thinly provisioned storage environment
US9747287B1 (en) Method and system for managing metadata for a virtualization environment
US8839030B2 (en) Methods and structure for resuming background tasks in a clustered storage environment
US8161128B2 (en) Sharing of data across disjoint clusters
US9575858B2 (en) Dynamic protection of storage resources for disaster recovery
US11144252B2 (en) Optimizing write IO bandwidth and latency in an active-active clustered system based on a single storage node having ownership of a storage object
US9280469B1 (en) Accelerating synchronization of certain types of cached data
US20140195698 2014-01-09 2014-07-10 Non-disruptive configuration of a virtualization controller in a data storage system
CN106777394B (en) Cluster file system
US9984139B1 (en) Publish session framework for datastore operation records
US9898370B2 (en) Flash copy for disaster recovery (DR) testing
US9367405B1 (en) Managing software errors in storage systems
US10719257B1 (en) Time-to-live (TTL) license management in an active/active replication session
US10146683B2 (en) Space reclamation in space-efficient secondary volumes
US11586466B2 (en) Centralized high-availability flows execution framework
US9781057B1 (en) Deadlock avoidance techniques
US11416441B2 (en) RPC-less locking mechanism based on RDMA CAW for storage cluster with active-active architecture
US11449398B2 (en) Embedded container-based control plane for clustered environment
US11315028B2 (en) Method and apparatus for increasing the accuracy of predicting future IO operations on a storage system
JP6668733B2 (en) Control device, management device, storage system, control program, management program, control method, and management method
US10929342B2 (en) Techniques for limiting the maximum storage consumed by a file system without shrinking an underlying volume

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:REZNIK, INNA;LIEBER, AHIA;BANIN, ERAN;REEL/FRAME:055365/0981

Effective date: 20210120

AS Assignment

Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, NORTH CAROLINA

Free format text: SECURITY AGREEMENT;ASSIGNORS:EMC IP HOLDING COMPANY LLC;DELL PRODUCTS L.P.;REEL/FRAME:055408/0697

Effective date: 20210225

AS Assignment

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS

Free format text: SECURITY INTEREST;ASSIGNORS:EMC IP HOLDING COMPANY LLC;DELL PRODUCTS L.P.;REEL/FRAME:055479/0342

Effective date: 20210225

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS

Free format text: SECURITY INTEREST;ASSIGNORS:EMC IP HOLDING COMPANY LLC;DELL PRODUCTS L.P.;REEL/FRAME:055479/0051

Effective date: 20210225

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS

Free format text: SECURITY INTEREST;ASSIGNORS:EMC IP HOLDING COMPANY LLC;DELL PRODUCTS L.P.;REEL/FRAME:056136/0752

Effective date: 20210225

AS Assignment

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST AT REEL 055408 FRAME 0697;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0553

Effective date: 20211101

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST AT REEL 055408 FRAME 0697;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0553

Effective date: 20211101

AS Assignment

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056136/0752);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0771

Effective date: 20220329

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (056136/0752);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0771

Effective date: 20220329

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0051);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0663

Effective date: 20220329

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0051);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0663

Effective date: 20220329

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0342);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0460

Effective date: 20220329

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (055479/0342);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:062021/0460

Effective date: 20220329

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCF Information on status: patent grant

Free format text: PATENTED CASE