US20220405170A1 - Systems and methods for application failover management using a distributed director and probe system - Google Patents
- Publication number
- US20220405170A1 (application US17/351,657)
- Authority
- US
- United States
- Prior art keywords
- application
- director
- unavailable
- systems
- primary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
- G06F3/04847—Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/81—Threshold
Definitions
- the present disclosure generally relates to computerized systems and methods for application management.
- embodiments of the present disclosure relate to inventive and unconventional systems that improve the resilience of applications and the reaction time and flexibility of failover processes by employing a distributed director and probe system to monitor an application and automatically trigger a failover when the application is unhealthy.
- One aspect of the present disclosure is directed to a computer-implemented system for application management.
- a probe system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to monitor an availability of an application and update a status associated with the availability of the application in a first data store.
- Additional embodiments may include one or more director systems comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times; upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable; determine whether at least one other director system of the one or more director systems has determined the application is unavailable; and upon determining the at least one other director system of the one or more director systems has determined the application is unavailable, trigger a failover process.
- Another aspect of the present disclosure is directed to a computer-implemented method for application management.
- certain embodiments of the method may include monitoring an availability of an application; updating a status associated with the availability of the application in a first data store; polling the first data store in intervals to retrieve the status associated with the availability of the application at different times; upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determining the application is unavailable; determining whether at least one director system of associated one or more director systems has determined the application is unavailable; and upon determining the at least one director system of the associated one or more director systems has determined the application is unavailable, triggering a failover process.
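- The director-side steps of the method above (poll at intervals, require a run of consecutive unavailable statuses, then confirm with at least one other director before triggering) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Director` class, the `threshold` parameter, the in-memory dictionary standing in for the first data store, and the function names are all assumptions introduced here.

```python
from collections import deque

UP, DOWN = "UP", "DOWN"


class Director:
    """Hypothetical director that remembers the last `threshold` polled statuses."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold            # assumed "particular number" of consecutive statuses
        self.recent = deque(maxlen=threshold)  # older statuses fall off automatically

    def poll(self, store, app):
        # Read the status most recently written by a probe (store is a plain dict here).
        self.recent.append(store[app])

    def sees_unavailable(self):
        # Unavailable only if the window is full and every status in it is DOWN.
        return (len(self.recent) == self.threshold
                and all(status == DOWN for status in self.recent))


def should_trigger_failover(director, peers):
    # Trigger only when this director and at least one other director agree.
    return director.sees_unavailable() and any(p.sees_unavailable() for p in peers)


if __name__ == "__main__":
    store = {"application-114": DOWN}
    d1, d2 = Director("d1", 3), Director("d2", 3)
    for _ in range(3):
        d1.poll(store, "application-114")
        d2.poll(store, "application-114")
    print(should_trigger_failover(d1, [d2]))
```

- Requiring agreement from a second director guards against a single director with a faulty view of the data store triggering a failover on its own.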
- Yet another aspect of the present disclosure is directed to a computer-implemented system for application management.
- a probe system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to monitor an availability of an application and update a status associated with the availability of the application in a first data store.
- Additional embodiments may include one or more secondary director systems comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times and upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable.
- Additional embodiments may include a primary director system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times, upon retrieving the at least the particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable, determine whether at least one secondary director system of the one or more secondary director systems has determined the application is unavailable, and upon determining the at least one secondary director system of the one or more secondary director systems has determined the application is unavailable, trigger a failover process.
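- The asymmetry between the roles in this aspect (secondary director systems only reach a determination, while the primary director system also checks the secondaries and triggers the failover) can be sketched as below. The `Determination` record and the `primary_should_trigger` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Determination:
    director: str
    role: str          # "primary" or "secondary" (assumed labels)
    unavailable: bool  # True if this director determined the application is unavailable


def primary_should_trigger(determinations):
    """Return True when the primary and at least one secondary agree the app is unavailable."""
    primary_down = any(d.role == "primary" and d.unavailable for d in determinations)
    secondary_down = any(d.role == "secondary" and d.unavailable for d in determinations)
    return primary_down and secondary_down
```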
- non-transitory computer readable storage media may store program instructions, which are executed by at least one processor and perform any of the methods described herein.
- FIG. 1 is a diagram of an exemplary system for application management, consistent with disclosed embodiments.
- FIG. 2 is a diagram of an exemplary director cluster, consistent with disclosed embodiments.
- FIG. 3 is a diagram of an exemplary system for application management which has undergone a failover process, consistent with disclosed embodiments.
- FIG. 4 is a diagram of a user interface for managing one or more applications, consistent with disclosed embodiments.
- FIG. 5 is a flowchart of an exemplary method for monitoring the availability of an application, consistent with disclosed embodiments.
- FIG. 6 is a flowchart of an exemplary method for application management, consistent with disclosed embodiments.
- FIG. 7 is a flowchart of an exemplary method for performing a failover process, consistent with disclosed embodiments.
- FIG. 8 is a flowchart of an exemplary method for adopting the role of a primary director system, consistent with disclosed embodiments.
- Disclosed embodiments include systems and methods for application management using a distributed configuration of director systems and probe systems to improve upon the resilience, reaction time, flexibility, convenience, and compatibility of conventional failover processes.
- the disclosed improved failover processes may be performed to allow a user to manage the failover processes for a plurality of applications from one convenient user interface, determine whether to enable a failover feature for each application, manually trigger a failover process, determine which components of each application stack to engage in failover processes, view an audit trail indicating the failover history for the plurality of applications, receive alerts regarding applications and/or associated director systems, set maintenance times for each application where a failover process must not be triggered, and more, as discussed herein.
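- Two of the per-application controls listed above, the failover enable setting and maintenance times during which a failover must not be triggered, amount to a guard evaluated before any trigger. A minimal sketch, assuming a hypothetical `AppPolicy` record and half-open maintenance windows (neither is specified in the disclosure):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AppPolicy:
    failover_enabled: bool = True
    # Assumed representation: list of (start, end) datetimes, end exclusive.
    maintenance_windows: list = field(default_factory=list)


def failover_allowed(policy, now):
    """Return True if an automatic failover may be triggered at time `now`."""
    if not policy.failover_enabled:
        return False
    return not any(start <= now < end for start, end in policy.maintenance_windows)
```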
- the disclosed embodiments improve upon the technical process of failover processes as they engage a novel distributed director and probe system which operates to improve the resilience and reaction time of failover processes.
- FIG. 1 is a diagram of an exemplary system 100 for managing an application 114 , consistent with disclosed embodiments.
- System 100 may include a director system 102 , a failover system 104 , a primary local network 110 , a primary probe system 112 , a primary application 114 , a primary database (DB) 116 , a primary network-attached storage (NAS) 118 , a secondary local network 120 , a secondary probe system 122 , a secondary application 124 , a secondary DB 126 , and a secondary NAS 128 .
- primary local network 110 and secondary local network 120 may be simply referred to as local networks 110 and 120 ; primary probe system 112 and secondary probe system 122 may be simply referred to as probe systems 112 and 122 ; primary application 114 and secondary application 124 may be simply referred to as applications 114 and 124 ; primary DB 116 and secondary DB 126 may be simply referred to as DBs 116 and 126 ; and primary NAS 118 and secondary NAS 128 may be simply referred to as NASs 118 and 128 .
- local networks 110 and 120 may be the same network and probe systems 112 and 122 may be the same probe system.
- Components of system 100 may be connected to each other through a network (not shown) such as a Wide Area Network (WAN) or a Local Area Network (LAN).
- director system 102 may be directly connected to probe systems 112 and 122 and to failover system 104 ; failover system 104 may be directly connected to director system 102 and applications 114 and 124 ; primary DB 116 may be directly connected to secondary DB 126 ; and primary NAS 118 may be directly connected to secondary NAS 128 .
- system 100 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable.
- system 100 may include a larger or smaller number of director systems, failover systems, probe systems, applications, databases, network-attached storages, or networks.
- system 100 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments.
- the exemplary components and arrangements shown in FIG. 1 are not intended to limit the disclosed embodiments.
- Director system 102 may include one or more memory units and one or more processors configured to perform operations consistent with disclosed embodiments.
- director system 102 may include hardware, software, and/or firmware modules.
- some or all components of director system 102 may be hosted on one or more servers, one or more clusters of servers, or one or more cloud services (e.g., cloud services hosted by Akamai, Microsoft, Amazon, Oracle, Google, Apache, or any other appropriate cloud service).
- Director system 102 may be connected to one or more networks and/or may be connected directly to probe systems 112 and 122 and failover system 104 .
- Director system 102 may be configured to make a plurality of determinations, and based on those determinations, trigger a failover process.
- Director system 102 is described in greater detail below with reference to FIGS. 2 and 6 .
- Failover system 104 may include one or more memory units and one or more processors configured to perform operations consistent with disclosed embodiments.
- failover system 104 may include hardware, software, and/or firmware modules.
- some or all components of failover system 104 may be hosted on one or more servers, one or more clusters of servers, or one or more cloud services (e.g., cloud services hosted by Akamai, Microsoft, Amazon, Oracle, Google, Apache, or any other appropriate cloud service).
- Failover system 104 may be connected to one or more networks and/or may be connected directly to applications 114 and 124 and director system 102 .
- Failover system 104 may be configured to perform a failover process automatically or upon receiving a trigger, such as from director system 102 . Failover system 104 is described in greater detail below with reference to FIGS. 3 and 7 .
- Local networks 110 and 120 may be public networks or private networks and may each include, for example, a wired or wireless network, including, without limitation, a Local Area Network, a Wide Area Network, a Metropolitan Area Network, an IEEE 802.11 wireless network (e.g., “Wi-Fi”), a network of networks (e.g., the Internet), a land-line telephone network, or the like.
- local networks 110 and 120 may be secure networks and require a password or other authentication criterion to access the networks.
- Probe systems 112 and 122 may each include one or more memory units and one or more processors configured to perform operations consistent with disclosed embodiments.
- probe systems 112 and 122 may include hardware, software, and/or firmware modules.
- some or all components of probe systems 112 and 122 may be hosted on one or more servers, one or more clusters of servers, or one or more cloud services (e.g., cloud services hosted by Akamai, Microsoft, Amazon, Oracle, Google, Apache, or any other appropriate cloud service).
- Probe systems 112 and 122 may be configured to send requests or run queries against one of applications 114 and 124 , DBs 116 and 126 , or NASs 118 and 128 to determine whether applications 114 and 124 are available.
- primary probe system 112 may run a query against primary DB 116 and determine that primary application 114 is available, in which case primary application 114 may be associated with a status of ‘UP.’
- secondary probe system 122 may run a query against secondary DB 126 and determine that secondary application 124 is unavailable, in which case secondary application 124 may be associated with a status of ‘DOWN.’
- Probe systems 112 and 122 may be connected to one or more networks and/or may be connected directly to director system 102 , applications 114 and 124 , DBs 116 and 126 , and NASs 118 and 128 . Probe systems 112 and 122 are described in greater detail below with reference to FIG. 5 .
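- A probe of the kind described above can be sketched as a function that runs a trivial query and records ‘UP’ or ‘DOWN’ in a status store. The sketch uses Python's standard `sqlite3` module purely as a stand-in database; the `probe_database` name, the `connect` callable, the `SELECT 1` health query, and the dictionary used as the data store are assumptions, not details from the disclosure.

```python
import sqlite3


def probe_database(connect, status_store, app_name):
    """Run a trivial query and record 'UP' or 'DOWN' for `app_name` in `status_store`."""
    try:
        conn = connect()  # hypothetical factory returning a DB-API connection
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        status = "UP"
    except sqlite3.Error:
        # Any database error is treated as the application being unavailable.
        status = "DOWN"
    status_store[app_name] = status
    return status


if __name__ == "__main__":
    store = {}
    probe_database(lambda: sqlite3.connect(":memory:"), store, "application-114")
    print(store)
```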
- Applications 114 and 124 may include programs or pieces of software (e.g., modules, code, scripts, or functions) designed and written to process data and perform a particular task or set of tasks to fulfill a particular purpose for a user.
- applications 114 and 124 may be configured to manage a bank account of a user.
- Applications 114 and 124 may be configured to perform a task in response to a triggering event.
- in response to a triggering event, such as the receipt of input data from one component of system 100, from a user, or from any other entity, applications 114 and 124 may be configured to process the input data and forward processed data to another system 100 component.
- Applications 114 and 124 may be connected to one or more networks and/or may be connected directly to failover system 104, probe systems 112 and 122, DBs 116 and 126, and NASs 118 and 128. Applications 114 and 124 may be configured to perform similar tasks.
- DBs 116 and 126 may include any collection of data values and relationships among them.
- the data may be stored linearly, horizontally, hierarchically, relationally, non-relationally, uni-dimensionally, multidimensionally, operationally, in an ordered manner, in an unordered manner, in an object-oriented manner, in a centralized manner, in a decentralized manner, in a distributed manner, in a custom manner, or in any manner enabling data access.
- DBs 116 and 126 may each include an array, an associative array, a linked list, a binary tree, a balanced tree, a heap, a stack, a queue, a set, a hash table, a record, a tagged union, ER model, or a graph.
- DBs 116 and 126 may each include an XML database, an RDBMS database, an SQL database or NoSQL alternatives for data storage/search such as, for example, MongoDB, Redis, Couchbase, Datastax Enterprise Graph, Elastic Search, Splunk, Solr, Cassandra, Amazon DynamoDB, Scylla, HBase, or Neo4J.
- DBs 116 and 126 may be components of system 100 or remote computing components (e.g., cloud-based data structures). Data in DBs 116 and 126 may be stored in contiguous or non-contiguous memory. Moreover, DBs 116 and 126 do not require information to be co-located. DBs 116 and 126 may be distributed across multiple servers, for example, that may be owned or operated by the same or different entities. Thus, the terms “database” or “data structure” as used herein in the singular are inclusive of plural databases or data structures. DBs 116 and 126 may be configured to contain the same or similar data.
- DBs 116 and 126 may be connected to one or more networks and/or may be connected directly to each other, probe systems 112 and 122 , applications 114 and 124 , and NASs 118 and 128 .
- primary DB 116 may be active and may replicate its data onto secondary DB 126 by any appropriate process, such as by replicating data between heterogeneous databases.
- one or more components of system 100 e.g., director system 102 , failover system 104 , or applications 114 and 124
- secondary DB 126 may be active and may replicate its data onto primary DB 116 by any appropriate process.
- NASs 118 and 128 may include any data storage server connected to one or more networks, such as local networks 110 and 120 .
- the data may be stored linearly, horizontally, hierarchically, relationally, non-relationally, uni-dimensionally, multidimensionally, operationally, in an ordered manner, in an unordered manner, in an object-oriented manner, in a centralized manner, in a decentralized manner, in a distributed manner, in a custom manner, or in any manner enabling data access.
- NASs 118 and 128 may each include an array, an associative array, a linked list, a binary tree, a balanced tree, a heap, a stack, a queue, a set, a hash table, a record, a tagged union, ER model, or a graph.
- NASs 118 and 128 may each include an XML database, an RDBMS database, an SQL database or NoSQL alternatives for data storage/search such as, for example, MongoDB, Redis, Couchbase, Datastax Enterprise Graph, Elastic Search, Splunk, Solr, Cassandra, Amazon DynamoDB, Scylla, HBase, or Neo4J.
- NASs 118 and 128 may be components of system 100 or remote computing components (e.g., cloud-based data structures). Data in NASs 118 and 128 may be stored in contiguous or non-contiguous memory. Moreover, NASs 118 and 128 do not require information to be co-located. NASs 118 and 128 may be distributed across multiple servers, for example, that may be owned or operated by the same or different entities. NASs 118 and 128 may be configured to contain the same or similar data.
- NASs 118 and 128 may be connected to one or more networks and/or may be connected directly to each other, probe systems 112 and 122 , applications 114 and 124 , and DBs 116 and 126 .
- primary NAS 118 may be active and may replicate its data onto secondary NAS 128 by any appropriate process, such as snapshot replication.
- one or more components of system 100 e.g., director system 102 , failover system 104 , or applications 114 and 124
- secondary NAS 128 may be active and may replicate its data onto primary NAS 118 by any appropriate process.
- FIG. 2 is a diagram of an exemplary director cluster 200 , consistent with disclosed embodiments.
- Director cluster 200 may include a primary director system 210 , a primary decision manager 212 , primary sensors 214 , a primary user interface 216 , a first secondary director system 220 , a first secondary decision manager 222 , first secondary sensors 224 , a first secondary user interface 226 , a second secondary director system 230 , a second secondary decision manager 232 , second secondary sensors 234 , and a second secondary user interface 236 .
- first and second secondary director systems 220 and 230 may be simply referred to as secondary director systems 220 and 230 ; first and second secondary decision managers 222 and 232 may be simply referred to as secondary decision managers 222 and 232 ; first and second secondary sensors 224 and 234 may be simply referred to as secondary sensors 224 and 234 ; and first and second secondary user interfaces 226 and 236 may be simply referred to as secondary user interfaces 226 and 236 .
- director cluster 200 may include only primary director system 210 , primary director system 210 and one secondary director system 220 or 230 , or primary director system 210 and any number of secondary director systems.
- primary director system 210 may be inactive, and one of secondary director systems 220 or 230 may adopt the role of primary director system 210, as discussed in greater detail herein.
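- One simple way a secondary director system could take over when the primary is inactive is a fixed priority order, sketched below. The selection rule (first active secondary in list order), the demotion of the inactive primary, and the dictionary layout are assumptions made for illustration; the disclosure's own procedure is the one described with reference to FIG. 8.

```python
def promote_if_needed(directors):
    """Promote the first active secondary if no active primary exists.

    `directors` is a list of dicts with keys 'name', 'role', and 'active',
    listed in an assumed priority order. Returns the promoted name or None.
    """
    if any(d["role"] == "primary" and d["active"] for d in directors):
        return None  # a healthy primary already exists
    for d in directors:
        if d["role"] == "primary":
            d["role"] = "secondary"  # assumption: the inactive primary is demoted
    for d in directors:
        if d["active"]:
            d["role"] = "primary"
            return d["name"]
    return None
```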
- Primary director system 210 and secondary director systems 220 and 230 may be connected to each other through a network such as a Wide Area Network (WAN) or a Local Area Network (LAN).
- decision managers 212 , 222 , and 232 may be directly connected by any appropriate means.
- director cluster 200 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable.
- director cluster 200 may include a larger or smaller number of director systems, decision managers, sensors, or user interfaces.
- director cluster 200 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments.
- the exemplary components and arrangements shown in FIG. 2 are not intended to limit the disclosed embodiments.
- Primary director system 210 may include one or more memory units and one or more processors, as discussed in greater detail herein.
- Primary director system 210 may include primary decision manager 212 , which may include programs or pieces of software (e.g., modules, code, scripts, or functions) designed and written to process data and perform a particular task or set of tasks to fulfill a particular purpose.
- primary decision manager 212 may be configured to manage an application (e.g., application 114 of FIG. 1 ) by triggering a failover process if the application becomes unavailable.
- Primary decision manager 212 may be configured to perform a task in response to a triggering event.
- in response to a triggering event, such as a number of consecutive ‘DOWN’ statuses associated with an application, primary decision manager 212 may be configured to initiate a failover protocol.
- primary decision manager 212 may be configured to process the input data and forward processed data to another director cluster 200 or system 100 component.
- Primary decision manager 212 may be connected to one or more networks and/or may be connected directly to secondary decision managers 222 and 232 and any other component of system 100 or director cluster 200 .
- Primary decision manager 212 may include primary sensors 214 , which may be software (e.g., modules, code, scripts, or functions) or hardware configured to detect or measure a status associated with an application, object, or entity and transmit a resulting signal corresponding to their findings.
- primary sensors 214 may be configured to determine the health of an application and report their findings to primary decision manager 212 .
- primary sensors 214 may be configured to poll probe systems 112 or 122 of FIG. 1 to determine the health of application 114 or 124 , respectively, and report their findings to primary decision manager 212 .
- Primary sensors 214 may be connected to one or more networks and/or may be connected directly to probe systems 112 and 122 and any other component of system 100 or director cluster 200 .
- Primary decision manager 212 may include primary user interface 216 , which may be software (e.g., modules, code, scripts, or functions) and/or hardware configured to allow a user and a computer system to interact.
- primary user interface 216 may be configured to display, on a physical or virtual display, elements to a user which allow the user to make selections regarding one or more components of system 100 of FIG. 1 or director cluster 200 .
- Primary user interface 216 may be connected to one or more networks and/or may be connected directly to one or more components of system 100 or director cluster 200 . Primary user interface 216 is described in greater detail below.
- Secondary director systems 220 and 230 may include one or more memory units and one or more processors, as discussed in greater detail herein. Secondary director systems 220 and 230 may include secondary decision managers 222 and 232, which may be similar to primary decision manager 212 and may be configured to perform similar functions. Additionally, secondary decision managers 222 and 232 may be configured to adopt the role of primary decision manager 212, as discussed in greater detail below with respect to FIG. 8. Secondary decision managers 222 and 232 may be connected to one or more networks and/or may be connected directly to each other, primary decision manager 212, and any other component of system 100 or director cluster 200.
- Secondary director systems 220 and 230 may include secondary sensors 224 and 234 , which may be similar to primary sensors 214 and may be configured to perform similar functions. Secondary sensors 224 and 234 may be connected to one or more networks and/or may be connected directly to probe systems 112 and 122 of FIG. 1 and any other component of system 100 or director cluster 200 .
- Secondary director systems 220 and 230 may include secondary user interfaces 226 and 236, which may be similar to primary user interface 216 and may be configured to perform similar functions. Additionally, secondary user interfaces 226 and 236 may be configured to adopt the role of primary user interface 216. Secondary user interfaces 226 and 236 may be connected to one or more networks and/or may be connected directly to one or more components of system 100 or director cluster 200.
- FIG. 3 is a diagram of an exemplary system 300 for managing an application 324 which has undergone a failover process, consistent with disclosed embodiments.
- System 300 may include a director system 302 , a failover system 304 , a primary local network 310 , a primary probe system 312 , a primary application 314 , a primary database (DB) 316 , a primary network-attached storage (NAS) 318 , a secondary local network 320 , a secondary probe system 322 , a secondary application 324 , a secondary DB 326 , and a secondary NAS 328 .
- The components of system 300 are similar to each corresponding component of system 100 of FIG. 1 and will not be described further with respect to FIG. 3.
- System 300 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable.
- System 300 may include a larger or smaller number of director systems, failover systems, probe systems, applications, databases, network-attached storages, or networks.
- System 300 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments.
- The exemplary components and arrangements shown in FIG. 3 are not intended to limit the disclosed embodiments.
- System 300 has undergone a failover process, causing secondary application 324, secondary DB 326, and secondary NAS 328 to become active, while primary application 314, primary DB 316, and primary NAS 318 become inactive.
- Primary application 114 of FIG. 1 may have become unavailable, causing director system 102 to trigger a failover process by, for example, instructing failover system 104 to perform the failover process.
- The failover process may involve shutting down primary local network 110, primary application 114, primary DB 116, and/or primary NAS 118; bringing up secondary local network 120, secondary application 124, secondary DB 126, and/or secondary NAS 128; and switching traffic to secondary local network 120, secondary application 124, secondary DB 126, and/or secondary NAS 128, causing system 100 to become system 300.
- Primary probe system 312 may determine that primary application 314 is unavailable, and primary application 314 may thus be associated with a status of ‘DOWN.’
- Secondary probe system 322 may determine that secondary application 324 is available, and secondary application 324 may thus be associated with a status of ‘UP.’
- Secondary DB 326 may now be active and may replicate its data onto primary DB 316 by any appropriate process.
- Secondary NAS 328 may now be active and may replicate its data onto primary NAS 318 by any appropriate process.
- One or more components of system 300 (e.g., director system 302, failover system 304, or applications 314 and 324) may ensure that primary DB 316 contains an up-to-date copy of the data in secondary DB 326 and primary NAS 318 contains an up-to-date copy of the data in secondary NAS 328.
- FIG. 4 is a diagram of a user interface 400 for managing one or more applications, consistent with disclosed embodiments.
- User interface 400 may include table 402 containing rows 404a-h corresponding to applications and columns 406a-h corresponding to data associated with the applications.
- Rows 404a-g may correspond to Applications A-G, and row 404h may describe the data contained in each column of table 402.
- Column 406a may indicate the name of an application.
- Column 406b may indicate the primary director system associated with an application and whether it is active.
- Column 406c may indicate the secondary director system associated with an application and whether it is active.
- Column 406d may indicate maintenance times for an application during which an automatic failover process should not be engaged.
- Column 406e may indicate the status of an application.
- Column 406f may indicate whether a user has selected to enable the automatic failover process.
- Column 406g may indicate whether there is an alert associated with an application.
- Column 406h may allow a user to click on a director system to trigger a failover process and switch traffic from a primary application (e.g., primary application 114 of FIG. 1) to a secondary application (e.g., secondary application 124).
- User interface 400 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable.
- User interface 400 may include a larger or smaller number of rows or columns, allowing for a larger or smaller number of applications or amount of data associated with the applications.
- User interface 400 may include an additional ‘Secondary Director’ column to allow for a director cluster with three director systems, such as director cluster 200 of FIG. 2.
- User interface 400 may further include other components not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments.
- The exemplary components and arrangements shown in FIG. 4 are not intended to limit the disclosed embodiments.
- ‘Application A,’ corresponding to row 404a, may have an inactive primary director system ‘X,’ an active secondary director system ‘Y,’ maintenance scheduled for Sunday from 02:00-06:00, a status of ‘UP,’ the automatic failover process enabled by the user, an outstanding alert, and the option to trigger a failover process to activate primary director system ‘X.’
- ‘Application G,’ corresponding to row 404g, may have an inactive primary director system ‘X,’ an inactive secondary director system ‘Z,’ maintenance scheduled for Sunday from 02:00-06:00, a status of ‘DOWN,’ the automatic failover process disabled by the user, no outstanding alerts, and the option to trigger a failover process to activate primary director system ‘X’ or secondary director system ‘Z.’
- Visual indications may be utilized in columns 406b and 406c to specify which, if any, of the director systems is currently active.
- An active or inactive director system may be specified by way of different colors, shading, text, or by any other means which may convey to a user whether a director system is active.
- A user may be able to click on a cell or data contained within a cell associated with a primary or secondary director system of an application to activate the clicked primary or secondary director system.
- Clicking on or hovering over a cell or data contained within a cell associated with a primary or secondary director system of an application may reveal information related to the clicked primary or secondary director system.
- Columns 406b and 406c may be updated automatically by a suitable component of system 100 of FIG. 1 or director cluster 200 of FIG. 2, or may be modified by a user to, for example, swap the primary or secondary director systems for a different director system.
- The maintenance time specified in column 406d indicates a period of time during which the automatic failover process, should it be enabled, will not be engaged.
- Column 406e, relating to the status of an application, may be updated by one or more of probe systems 112 and 122, director system 102, or any other suitable component of system 100 of FIG. 1.
- Column 406f may allow a user to enable the automatic failover process by, for example, clicking on or sliding a slider one way or another.
- The automatic failover process may be enabled for rows 404a-b and 404d, while manual failover may be required for rows 404c and 404e-g. The automatic failover process will be discussed in greater detail below with respect to FIG. 6.
- The data contained in the cells of column 406g may merely indicate, in binary form, whether there is an alert associated with an application.
- Different types of alerts may be indicated by way of visual indications, such as different colors, shapes, sizes, or any other appropriate visual cues.
- The alert may be retrieved by clicking on, hovering over, or activating in any appropriate manner, an element or data contained within a cell of column 406g. Additionally or alternatively, a part of or all of the alert may itself be contained within the cells of column 406g. In the example of FIG. 4, there may be an outstanding alert associated with Applications A and C-E, for example, to alert of an issue with director system ‘Y.’
- Column 406h may allow a user to click on a director system to trigger a failover process and switch traffic from a primary application to a secondary application, as discussed above.
- A user may click on or otherwise select primary director system ‘X’ to trigger a failover process and switch traffic from secondary director system ‘Y’ to primary director system ‘X.’
- Columns 406d, 406f, and 406h may be modified or updated by a user or automatically by a suitable component of system 100 of FIG. 1 or director cluster 200 of FIG. 2.
- User interface 400 may include other features that a user may interact with, such as options to sort, filter, search, or otherwise modify table 402, generate a report with all or a part of the data contained in table 402, view historical data (e.g., total number of failovers executed), view or generate statistics, view an audit trail indicating a chronological record of the sequence of activities performed on user interface 400, determine which components of an application stack are to undergo the failover process, or any other appropriate function which may be useful to a user using user interface 400.
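The maintenance-time behavior of column 406d described above (no automatic failover inside the scheduled window) can be illustrated with a minimal Python sketch. The dictionary layout, the function names, and the half-open window boundary are assumptions made for illustration only; the disclosure does not specify how maintenance times are stored or evaluated.

```python
from datetime import datetime, time

# Hypothetical encoding of one cell of column 406d:
# weekday uses Python's convention (Monday=0 ... Sunday=6).
MAINTENANCE = {"weekday": 6, "start": time(2, 0), "end": time(6, 0)}  # Sunday 02:00-06:00


def in_maintenance(now, window=MAINTENANCE):
    """Return True if `now` falls inside the maintenance window (end excluded)."""
    return (
        now.weekday() == window["weekday"]
        and window["start"] <= now.time() < window["end"]
    )


def automatic_failover_allowed(now, auto_enabled, window=MAINTENANCE):
    """Automatic failover is engaged only if enabled and outside maintenance."""
    return auto_enabled and not in_maintenance(now, window)
```

With this sketch, an application whose automatic failover is enabled would still not fail over automatically at 03:00 on a Sunday, which mirrors the example values shown for Application A.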
- FIG. 5 is a flowchart of exemplary method 500 for monitoring the availability of an application, consistent with disclosed embodiments.
- Method 500 may be performed by a component of system 100 of FIG. 1, for example, one of probe systems 112 or 122 or director system 102.
- Method 500 is described below with reference to the networked systems of FIG. 1, but any other configuration of systems, subsystems, or modules may be used to perform method 500.
- Probe system 112 may run a query against database 116.
- Probe system 112 may send a request for the query to application 114.
- A load balancer of application 114 may transmit the request to a web server of application 114, which in turn may transmit the request to an application server of application 114, which may then run the query against database 116.
- The response which probe system 112 expects may be a login webpage, a JSON file, a 200 response code, or any other suitable response which may indicate to probe system 112 whether application 114 and database 116 are available.
- Probe system 112 may connect directly to database 116.
- Probe system 112 may continuously run queries against database 116 or may run queries against database 116 in intervals, such as every minute.
- Probe system 112 may determine whether the query response is acceptable. For example, if probe system 112 is successfully directed to a login webpage or receives a ‘200 OK’ response code, probe system 112 may determine that the query response is acceptable and method 500 may proceed to step 506a. Alternatively, if probe system 112 does not receive an acceptable response, for example, because there is no response or the response is incomplete, such as being directed to a login webpage including an error, method 500 may proceed to step 506b.
- Probe system 112 may have determined that the query response is acceptable, and may label a status associated with application 114 as ‘UP.’
- Probe system 112 may have determined that the query response is not acceptable, and may label the status associated with application 114 as ‘DOWN.’
- Probe system 112 may update a data store associated with probe system 112 with the labeled status of ‘UP’ or ‘DOWN,’ depending on whether the response was acceptable or not, respectively.
- The data store may be a database which is connected to one or more networks of system 100 and which, as such, director system 102 may access.
- The data store may be a webpage which director system 102 may access through an Internet connection.
- The data store may be any repository for storing data which may include a file, email, document, database, webpage, spreadsheet, message queue, or any other suitable method for storing data which may be accessed by director system 102.
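The labeling and data store update of method 500 can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function names are hypothetical, a missing response is modeled as `None`, the word "error" in the body stands in for "a login webpage including an error," and a plain dictionary stands in for the probe data store.

```python
def classify_response(status_code, body):
    """Label a query response as 'UP' or 'DOWN'.

    Assumptions for this sketch: status_code is None when no response
    arrived, and an 'error' marker in the body signals an error page.
    """
    if status_code is None:
        return "DOWN"  # no response at all
    if status_code == 200 and "error" not in (body or "").lower():
        return "UP"  # acceptable response, e.g. '200 OK' with a login page
    return "DOWN"  # unexpected code or error content


def update_data_store(store, application, status):
    """Write the latest label into a dict standing in for the probe data store."""
    store[application] = status
    return store
```

A real probe would obtain `status_code` and `body` from an actual request to the application; that part is left out here because the disclosure allows many response types (login webpage, JSON file, response code).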
- FIG. 6 is a flowchart of exemplary method 600 for application management, consistent with disclosed embodiments.
- Method 600 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, primary director system 210.
- Method 600 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 600.
- Primary director system 210 may render a user interface (e.g., user interface 400 of FIG. 4) on a display such that a user may interact with the user interface.
- The display may include an electronic device or part of an electronic device which serves for the visual presentation of data.
- Primary director system 210 may receive a selection from the user indicating whether the automatic failover process is to be enabled for application 114.
- The selection may include an activation of an element of a cell of column 406f associated with application 114.
- Primary director system 210 may poll the probe data store of probe system 112 in intervals to retrieve the status of application 114.
- Polling the probe data store may include accessing a web page or database, receiving a file or an email, or any suitable method by which primary director system 210 may retrieve the status of application 114 via a data store.
- Primary director system 210 may determine the status of application 114 by polling application 114 directly.
- Polling the probe data store or application 114 in intervals may refer to polling the probe data store or application 114 once every ‘X’ amount of time. For example, primary director system 210 may poll the data store or application 114 once every minute.
- The interval time may be set by a user, for example, via user interface 400, automatically determined by primary director system 210, or predetermined by a manufacturer.
- Primary director system 210 may store the retrieved status, with an associated timestamp, in a second data store distinct from the probe data store.
- A timestamp may be a digital record of the time at which the status was retrieved.
- Primary director system 210 may determine whether the second data store includes a particular number of consecutive ‘DOWN’ statuses, the consecutive ‘DOWN’ statuses being the latest statuses to have been retrieved from the probe data store or application 114. For example, if the particular number is 5, primary director system 210 may determine the second data store includes the particular number of consecutive ‘DOWN’ statuses upon retrieving and/or storing 5 successive ‘DOWN’ statuses. If primary director system 210 determines that the second data store does include the particular number of consecutive ‘DOWN’ statuses, method 600 may proceed to step 612. Alternatively, method 600 may return to step 606 to continue polling the probe data store in intervals.
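The consecutive-‘DOWN’ check described above can be sketched in Python as follows. The function name and the list-of-tuples representation of the second data store are assumptions for illustration; the disclosure does not prescribe a storage layout.

```python
def has_consecutive_down(statuses, threshold):
    """Return True if the latest `threshold` stored statuses are all 'DOWN'.

    `statuses` is a list of (timestamp, status) pairs in retrieval order,
    standing in for the director system's own data store.
    """
    if threshold <= 0 or len(statuses) < threshold:
        return False
    return all(status == "DOWN" for _, status in statuses[-threshold:])
```

Because only the most recent entries are inspected, a single ‘UP’ status among the latest retrievals resets the count, so a brief glitch would not by itself lead to a determination that the application is unavailable.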
- Primary director system 210 may engage in a handshake procedure with one or more of secondary director systems 220 and/or 230 to determine whether the one or more of secondary director systems 220 and/or 230 confirm that a particular number of consecutive ‘DOWN’ statuses for application 114 has been reached.
- The handshake procedure may be an automated process of an exchange of information between one or more director systems.
- Primary director system 210 may communicate with secondary director system 220 to determine whether secondary director system 220 has determined that application 114 is unavailable. This may prevent primary director system 210 from triggering a failover process for application 114 if the problem only exists in the connection between primary director system 210 and probe system 112 and secondary director systems 220 and/or 230 do not consider application 114 to be unavailable.
- If the handshake procedure confirms that application 114 is unavailable, method 600 may proceed to step 614. Otherwise, primary director system 210 may not proceed with the failover process and may engage an error process, such as alerting a support team regarding a potential probe failure.
- Primary director system 210 may determine whether the user has enabled the automatic failover process based on the user selection of step 604.
- The automatic failover process may refer to a piece of software (e.g., modules, code, scripts, or functions) which automatically triggers a failover process without requiring a user input following the determination that an application is unavailable. If primary director system 210 determines that the user has not enabled the automatic failover process, method 600 may proceed to step 616. Otherwise, method 600 may proceed to step 624, where primary director system 210 may trigger a failover process.
- Primary director system 210 may transmit an alert to the user.
- The alert may take the form of an email, notification, update to column 406g of FIG. 4, or any other means of informing the user that application 114 is unavailable.
- Primary director system 210 may render the user interface automatically or as a result of the user attempting to access the user interface.
- Primary director system 210 may receive a selection from the user instructing primary director system 210 to trigger the failover process.
- Primary director system 210 may trigger the failover process by, for example, instructing failover system 104 to perform the failover process or by performing the failover process itself.
- Method 600 may be adjusted to be performed in fewer than ‘X’ minutes to satisfy an ‘X’-minute service level agreement (SLA). For example, method 600 may be performed in fewer than 15 minutes to satisfy a 15-minute SLA if primary director system 210 polls the probe data store every minute and the number of consecutive ‘DOWN’ statuses necessary to trigger the failover process is 5.
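The decision flow of method 600 after polling, together with the timing estimate behind the SLA example, can be sketched in Python as follows. The function names and return labels are hypothetical. The sketch treats confirmation by at least one other director system as sufficient, and the timing helper counts only the polling needed to observe the required ‘DOWN’ statuses; handshake and failover execution time are not modeled.

```python
def next_step(down_confirmed_locally, peer_confirmations, auto_failover_enabled):
    """Choose the next action of the primary director system."""
    if not down_confirmed_locally:
        return "poll"  # keep polling the probe data store in intervals
    if not any(peer_confirmations):
        return "error_process"  # e.g. alert support about a potential probe failure
    if auto_failover_enabled:
        return "trigger_failover"  # automatic failover, no user input needed
    return "alert_user"  # manual path: alert, then await the user's selection


def detection_minutes(poll_interval_minutes, required_down):
    """Minutes of polling needed to observe the required consecutive 'DOWN' statuses."""
    return poll_interval_minutes * required_down
```

With a one-minute polling interval and 5 required ‘DOWN’ statuses, `detection_minutes(1, 5)` gives 5 minutes of detection time, which leaves room within a 15-minute SLA for the handshake and the failover itself.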
- FIG. 7 is a flowchart of exemplary method 700 for performing a failover process, consistent with disclosed embodiments.
- Method 700 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, failover system 104.
- Method 700 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 700.
- Primary director system 210 may trigger a failover process by, for example, instructing failover system 104 to perform the failover process.
- Failover system 104 may shut down at least one of primary local network 110, primary application 114, primary DB 116, or primary NAS 118.
- Failover system 104 may terminate the connections by, for example, forcing primary local network 110, primary application 114, primary DB 116, or primary NAS 118 to go offline, creating a dynamic KILL statement for each connection, and/or altering the connections to have a single or restricted user.
- Failover system 104 may bring up at least one of secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128.
- Failover system 104 may switch traffic to the at least one of secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128 by reestablishing the connections from primary local network 110, primary application 114, primary DB 116, or primary NAS 118 to secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128.
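The ordering of method 700 (shut down the primary components, bring up the secondary components, then switch traffic) can be sketched in Python as follows. The function name, the callback parameters, and the pairing of primary and secondary components by position are assumptions for illustration; the actual shutdown, startup, and traffic-switching mechanisms are environment-specific and are represented here by caller-supplied callables.

```python
def run_failover(primary, secondary, shut_down=None, bring_up=None, switch_traffic=None):
    """Run the three phases of a failover in order and return the steps taken.

    `primary` and `secondary` are equal-length lists of component names;
    the callables are hypothetical hooks that default to no-ops.
    """
    def noop(*args):
        return None

    shut_down = shut_down or noop
    bring_up = bring_up or noop
    switch_traffic = switch_traffic or noop

    steps = []
    for component in primary:  # phase 1: shut down primary components
        shut_down(component)
        steps.append(("shut_down", component))
    for component in secondary:  # phase 2: bring up secondary components
        bring_up(component)
        steps.append(("bring_up", component))
    for old, new in zip(primary, secondary):  # phase 3: switch traffic
        switch_traffic(old, new)
        steps.append(("switch_traffic", old, new))
    return steps
```

Returning the list of steps gives the caller a simple record of what was done, which could feed an audit trail such as the one mentioned for user interface 400.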
- FIG. 8 is a flowchart of exemplary method 800 for adopting the role of a primary director system, consistent with disclosed embodiments.
- Method 800 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, one of secondary director systems 220 or 230.
- Method 800 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 800.
- Secondary director system 220 may poll the probe data store of method 500 of FIG. 5 in intervals to retrieve the status of application 114. Polling the probe data store may include accessing a web page or database, receiving a file or an email, or any suitable method by which secondary director system 220 may retrieve the status of application 114 via a data store. In other embodiments, secondary director system 220 may determine the status of application 114 directly. At step 804, secondary director system 220 may store the retrieved status in a data store with an associated timestamp.
- Secondary director system 220 may determine whether the second data store includes a particular number of consecutive ‘DOWN’ statuses, the consecutive ‘DOWN’ statuses being the latest statuses to have been retrieved from the probe data store or application 114. If secondary director system 220 determines that the second data store does include the particular number of consecutive ‘DOWN’ statuses, method 800 may proceed to step 808. Alternatively, method 800 may return to step 802 to continue polling the probe data store or application 114 in intervals.
- Secondary director system 220 may determine whether it has engaged in a handshake procedure with primary director system 210 to determine whether secondary director system 220 confirms that a particular number of consecutive ‘DOWN’ statuses for application 114 has been reached. If secondary director system 220 determines that it has engaged in a handshake procedure with primary director system 210, then primary director system 210 is active and secondary director system 220 may remain inactive, returning to step 802 to once again poll the probe data store or application 114 in intervals. Otherwise, method 800 may proceed to step 810.
- Secondary director system 220 may determine whether a certain amount of time has passed since it determined that application 114 was unavailable. For example, the certain amount of time may be 1 minute after retrieving and/or storing 5 consecutive ‘DOWN’ statuses. As another example, the certain amount of time may be 5 minutes since the first ‘DOWN’ status of the 5 consecutive ‘DOWN’ statuses was retrieved and/or stored. If secondary director system 220 determines that the certain amount of time has not passed, method 800 may return to step 808 to await the handshake from primary director system 210 until the certain amount of time has passed. Otherwise, method 800 may proceed to step 812.
- Secondary director system 220 may adopt the role of primary director system 210, as primary director system 210 is assumed to be unavailable or inactive. Adopting the role of primary director system 210 may involve performing the steps of method 600 of FIG. 6 beginning at step 612.
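The role-adoption logic of method 800 can be sketched in Python as follows. The function name, the return labels, and the 60-second default (taken from the "1 minute" example above) are assumptions for illustration, and elapsed time is passed in as a number rather than read from a clock.

```python
def secondary_next_step(down_confirmed, handshake_received, seconds_since_down,
                        grace_seconds=60):
    """Choose the next action of a secondary director system."""
    if not down_confirmed:
        return "poll"  # not enough consecutive 'DOWN' statuses yet
    if handshake_received:
        return "poll"  # primary director is active; secondary stays inactive
    if seconds_since_down < grace_seconds:
        return "wait_for_handshake"  # keep waiting for the primary director
    return "adopt_primary_role"  # primary assumed unavailable or inactive
```

The grace period is what prevents two director systems from acting at once: the secondary director system takes over only after the primary director system has had time to initiate the handshake and has not done so.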
- Systems and methods disclosed herein involve unconventional improvements over conventional failover systems. As compared to conventional technologies, the disclosed embodiments may improve resilience, reaction time, flexibility, convenience, and compatibility.
- Computer programs based on the written description and methods of this specification are within the skill of a software developer.
- The various functions, scripts, programs, or modules can be created using a variety of programming techniques.
- Programs, scripts, functions, program sections, or program modules can be designed in or by means of languages, including JAVASCRIPT, C, C++, JAVA, PHP, PYTHON, RUBY, PERL, BASH, or other programming or scripting languages.
- One or more of such software sections or modules can be integrated into a computer system, non-transitory computer-readable media, or existing communications software.
- The programs, modules, or code can also be implemented or replicated as firmware or circuit logic.
Abstract
Description
- The present disclosure generally relates to computerized systems and methods for application management. In particular, embodiments of the present disclosure relate to inventive and unconventional systems that maximize resilience of applications and reaction time and flexibility of failover processes by employing a distributed director and probe system to monitor and automatically trigger a failover in the event of an unhealthy application.
- Conventional failover systems and methods often require vast technical expertise to operate, and are inflexible and inconvenient. Usually, conventional failover processes do not allow a user to modify any part of the failover process unless the user can modify the code itself. Further, these systems may be too simple for complex systems, such as banking systems or financial systems, which require systems with high availability. In particular, conventional failover systems may take actions which cause applications to undergo redundant failover processes and be unavailable for some period of time, negatively impacting customer experience. For example, a system which iteratively checks the health of an application and performs a failover process immediately upon registering a moment of unhealthiness may be operating with incomplete data. For instance, the issue may not be in the application, but in the system itself, causing the application to undergo failover for no reason. On the other hand, systems which attempt to correct this by engaging a human operator may cause the reaction time for responding to a genuinely unhealthy application to increase dramatically.
- Additionally, conventional failover systems are specific to each application and must be set up individually at great financial and time cost, rendering these systems inconvenient and many times incompatible between applications.
- Therefore, in view of the shortcomings and problems with existing methods, there is a need for improved systems and methods that employ application failover management that can be used for a plurality of applications. Such unconventional systems will improve resilience, reaction time, flexibility, and convenience, decrease cost, and increase compatibility.
- One aspect of the present disclosure is directed to a computer-implemented system for application management. For example, certain embodiments may include a probe system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to monitor an availability of an application and update a status associated with the availability of the application in a first data store. Additional embodiments may include one or more director systems comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times; upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable; determine whether at least one other director system of the one or more director systems has determined the application is unavailable; and upon determining the at least one other director system of the one or more director systems has determined the application is unavailable, trigger a failover process.
- Another aspect of the present disclosure is directed to a computer-implemented method for application management. For example, certain embodiments of the method may include monitoring an availability of an application; updating a status associated with the availability of the application in a first data store; polling the first data store in intervals to retrieve the status associated with the availability of the application at different times; upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determining the application is unavailable; determining whether at least one director system of associated one or more director systems has determined the application is unavailable; and upon determining the at least one director system of the associated one or more director systems has determined the application is unavailable, triggering a failover process.
- Yet another aspect of the present disclosure is directed to a computer-implemented system for application management. For example, certain embodiments may include a probe system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to monitor an availability of an application and update a status associated with the availability of the application in a first data store. Additional embodiments may include one or more secondary director systems comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times and upon retrieving at least a particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable. Additional embodiments may include a primary director system comprising at least one memory storing instructions and one or more processors configured to execute the instructions to poll the first data store in intervals to retrieve the status associated with the availability of the application at different times, upon retrieving the at least the particular number of consecutive statuses associated with the application being unavailable, determine the application is unavailable, determine whether at least one secondary director system of the one or more secondary director systems has determined the application is unavailable, and upon determining the at least one secondary director system of the one or more secondary director systems has determined the application is unavailable, trigger a failover process.
- Consistent with other disclosed embodiments, non-transitory computer readable storage media may store program instructions, which are executed by at least one processor and perform any of the methods described herein.
- The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
- The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments and, together with the description, serve to explain the disclosed principles. In the drawings:
- FIG. 1 is a diagram of an exemplary system for application management, consistent with disclosed embodiments.
- FIG. 2 is a diagram of an exemplary director cluster, consistent with disclosed embodiments.
- FIG. 3 is a diagram of an exemplary system for application management which has undergone a failover process, consistent with disclosed embodiments.
- FIG. 4 is a diagram of a user interface for managing one or more applications, consistent with disclosed embodiments.
- FIG. 5 is a flowchart of an exemplary method for monitoring the availability of an application, consistent with disclosed embodiments.
- FIG. 6 is a flowchart of an exemplary method for application management, consistent with disclosed embodiments.
- FIG. 7 is a flowchart of an exemplary method for performing a failover process, consistent with disclosed embodiments.
- FIG. 8 is a flowchart of an exemplary method for adopting the role of a primary director system, consistent with disclosed embodiments.
- Disclosed embodiments include systems and methods for application management using a distributed configuration of director systems and probe systems to improve upon the resilience, reaction time, flexibility, convenience, and compatibility of conventional failover processes. The disclosed improved failover processes may be performed to allow a user to manage the failover processes for a plurality of applications from one convenient user interface, determine whether to enable a failover feature for each application, manually trigger a failover process, determine which components of each application stack to engage in failover processes, view an audit trail indicating the failover history for the plurality of applications, receive alerts regarding applications and/or associated director systems, set maintenance times for each application where a failover process must not be triggered, and more, as discussed herein. The disclosed embodiments improve upon the technical process of failover processes as they engage a novel distributed director and probe system which operates to improve the resilience and reaction time of failover processes.
- Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings and disclosed herein. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
-
FIG. 1 is a diagram of anexemplary system 100 for managing anapplication 114, consistent with disclosed embodiments.System 100 may include adirector system 102, afailover system 104, a primarylocal network 110, aprimary probe system 112, aprimary application 114, a primary database (DB) 116, a primary network-attached storage (NAS) 118, a secondarylocal network 120, asecondary probe system 122, asecondary application 124, asecondary DB 126, and asecondary NAS 128. Throughout this disclosure, primarylocal network 110 and secondarylocal network 120 may be simply referred to aslocal networks primary probe system 112 andsecondary probe system 122 may be simply referred to asprobe systems primary application 114 andsecondary application 124 may be simply referred to asapplications primary DB 116 andsecondary DB 126 may be simply referred to asDBs secondary NAS 128 may be simply referred to asNASs local networks probe systems - Components of
system 100 may be connected to each other through a network (not shown) such as a Wide Area Network (WAN) or a Local Area Network (LAN). As shown inFIG. 1 ,director system 102 may be directly connected toprobe systems failover system 104;failover system 104 may be directly connected todirector system 102 andapplications primary DB 116 may be directly connected tosecondary DB 126; andprimary NAS 118 may be directly connected tosecondary NAS 128. - As will be appreciated by one skilled in the art, the components of
system 100 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable. For example, as compared to the depiction in FIG. 1, system 100 may include a larger or smaller number of director systems, failover systems, probe systems, applications, databases, network-attached storages, or networks. In addition, system 100 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments. The exemplary components and arrangements shown in FIG. 1 are not intended to limit the disclosed embodiments. -
Director system 102 may include one or more memory units and one or more processors configured to perform operations consistent with disclosed embodiments. In some embodiments,director system 102 may include hardware, software, and/or firmware modules. In some embodiments, some or all components ofdirector system 102 may be hosted on one or more servers, one or more clusters of servers, or one or more cloud services (e.g., cloud services hosted by Akamai, Microsoft, Amazon, Oracle, Google, Apache, or any other appropriate cloud service).Director system 102 may be connected to one or more networks and/or may be connected directly toprobe systems failover system 104.Director system 102 may be configured to make a plurality of determinations, and based on those determinations, trigger a failover process.Director system 102 is described in greater detail below with reference toFIGS. 2 and 6 . -
Failover system 104 may include one or more memory units and one or more processors configured to perform operations consistent with disclosed embodiments. In some embodiments,failover system 104 may include hardware, software, and/or firmware modules. In some embodiments, some or all components offailover system 104 may be hosted on one or more servers, one or more clusters of servers, or one or more cloud services (e.g., cloud services hosted by Akamai, Microsoft, Amazon, Oracle, Google, Apache, or any other appropriate cloud service).Failover system 104 may be connected to one or more networks and/or may be connected directly toapplications director system 102. Failoversystem 104 may be configured to perform a failover process automatically or upon receiving a trigger, such as fromdirector system 102.Failover system 104 is described in greater detail below with reference toFIGS. 3 and 7 . - As shown in
FIG. 1 , at least one ofprimary probe system 112,primary application 114,primary DB 116, andprimary NAS 118 may connect to primarylocal network 110, and at least one ofsecondary probe system 122,secondary application 124,secondary DB 126, andsecondary NAS 128 may connect to secondarylocal network 120.Local networks local networks - Probe
systems probe systems probe systems systems applications DBs NASs applications primary probe system 112 may run a query againstprimary DB 116 to determine thatprimary application 114 is available, and thus may be associated with a status of ‘UP.’ As another example,secondary probe system 122 may run a query againstsecondary DB 126 to determine thatsecondary application 124 is unavailable, and thus may be associated with a status of ‘DOWN.’ Probesystems director system 102,applications DBs NASs systems FIG. 5 . -
Applications applications Applications system 100, from a user, or from any other entity,applications system 100 component.Applications 114 may be connected to one or more networks and/or may be connected directly tofailover system 104,probe systems DBs NASs Applications -
DBs DBs DBs DBs system 100 or remote computing components (e.g., cloud-based data structures). Data inDBs DBs DBs DBs -
DBs systems applications NASs primary DB 116 may be active and may replicate its data ontosecondary DB 126 by any appropriate process, such as by replicating data between heterogeneous databases. In such embodiments, one or more components of system 100 (e.g.,director system 102,failover system 104, orapplications 114 and 124) may ensure thatsecondary DB 126 contains an up-to-date copy of the data inprimary DB 116. Additionally or alternatively, in other embodiments,secondary DB 126 may be active and may replicate its data ontoprimary DB 116 by any appropriate process. -
NASs local networks NASs NASs NASs system 100 or remote computing components (e.g., cloud-based data structures). Data inNASs NASs NASs NASs -
NASs systems applications DBs primary NAS 118 may be active and may replicate its data ontosecondary NAS 128 by any appropriate process, such as snapshot replication. In such embodiments, one or more components of system 100 (e.g.,director system 102,failover system 104, orapplications 114 and 124) may ensure thatsecondary NAS 128 contains an up-to-date copy of the data inprimary NAS 118. Additionally or alternatively, in other embodiments,secondary NAS 128 may be active and may replicate its data ontoprimary NAS 118 by any appropriate process. -
FIG. 2 is a diagram of anexemplary director cluster 200, consistent with disclosed embodiments.Director cluster 200 may include aprimary director system 210, aprimary decision manager 212,primary sensors 214, aprimary user interface 216, a firstsecondary director system 220, a firstsecondary decision manager 222, firstsecondary sensors 224, a firstsecondary user interface 226, a secondsecondary director system 230, a secondsecondary decision manager 232, secondsecondary sensors 234, and a secondsecondary user interface 236. Throughout this disclosure, first and secondsecondary director systems secondary director systems secondary decision managers secondary decision managers secondary sensors secondary sensors secondary user interfaces secondary user interfaces - In some embodiments,
director cluster 200 may include onlyprimary director system 210,primary director system 210 and onesecondary director system primary director system 210 and any number of secondary director systems. In other embodiments,primary director system 210 may be inactive, and one ofsecondary director systems primary director system 210, as discussed in greater detail herein.Primary director system 210 andsecondary director systems FIG. 2 ,decision managers - As will be appreciated by one skilled in the art, the components of
director cluster 200 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable. For example, as compared to the depiction in FIG. 2, director cluster 200 may include a larger or smaller number of director systems, decision managers, sensors, or user interfaces. In addition, director cluster 200 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments. The exemplary components and arrangements shown in FIG. 2 are not intended to limit the disclosed embodiments. -
Primary director system 210 may include one or more memory units and one or more processors, as discussed in greater detail herein.Primary director system 210 may includeprimary decision manager 212, which may include programs or pieces of software (e.g., modules, code, scripts, or functions) designed and written to process data and perform a particular task or set of tasks to fulfill a particular purpose. For example,primary decision manager 212 may be configured to manage an application (e.g.,application 114 ofFIG. 1 ) by triggering a failover process if the application becomes unavailable.Primary decision manager 212 may be configured to perform a task in response to a triggering event. For example, in response to a triggering event such as a consecutive number of ‘DOWN’ statuses associated with an application,primary decision manager 212 may be configured to initiate a failover protocol. As another example, in response to a triggering event such as the receipt of input data from any component ofdirector cluster 200 orsystem 100 ofFIG. 1 , a user, or any other entity,primary decision manager 212 may be configured to process the input data and forward processed data to anotherdirector cluster 200 orsystem 100 component.Primary decision manager 212 may be connected to one or more networks and/or may be connected directly tosecondary decision managers system 100 ordirector cluster 200. -
Primary decision manager 212 may includeprimary sensors 214, which may be software (e.g., modules, code, scripts, or functions) or hardware configured to detect or measure a status associated with an application, object, or entity and transmit a resulting signal corresponding to their findings. For example,primary sensors 214 may be configured to determine the health of an application and report their findings toprimary decision manager 212. In particular,primary sensors 214 may be configured to pollprobe systems FIG. 1 to determine the health ofapplication primary decision manager 212.Primary sensors 214 may be connected to one or more networks and/or may be connected directly to probesystems system 100 ordirector cluster 200. -
Primary decision manager 212 may include primary user interface 216, which may be software (e.g., modules, code, scripts, or functions) and/or hardware configured to allow a user and a computer system to interact. For example, primary user interface 216 may be configured to display, on a physical or virtual display, elements to a user which allow the user to make selections regarding one or more components of system 100 of FIG. 1 or director cluster 200. Primary user interface 216 may be connected to one or more networks and/or may be connected directly to one or more components of system 100 or director cluster 200. Primary user interface 216 is described in greater detail below. -
Secondary director systems Secondary director systems secondary decision managers primary decision manager 212 and may be configured to perform similar functions. Additionally,secondary decision managers primary decision manager 212, as discussed in greater detail below with respect toFIG. 8 .Secondary decision managers primary decision manager 212, and any other component ofsystem 100 ordirector cluster 200. -
Secondary director systems secondary sensors primary sensors 214 and may be configured to perform similar functions.Secondary sensors systems FIG. 1 and any other component ofsystem 100 ordirector cluster 200. -
Secondary director systems secondary user interfaces primary user interface 216 and may be configured to perform similar functions. Additionally, secondary user interfaces may be configured to adapt to the role ofprimary user interface 216.Secondary user interfaces system 100 ordirector cluster 200. -
FIG. 3 is a diagram of an exemplary system 300 for managing an application 324 which has undergone a failover process, consistent with disclosed embodiments. System 300 may include a director system 302, a failover system 304, a primary local network 310, a primary probe system 312, a primary application 314, a primary database (DB) 316, a primary network-attached storage (NAS) 318, a secondary local network 320, a secondary probe system 322, a secondary application 324, a secondary DB 326, and a secondary NAS 328. The components of system 300 are similar to each corresponding component of system 100 of FIG. 1 and will not be described further with respect to FIG. 3.
- As will be appreciated by one skilled in the art, the components of system 300 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable. For example, as compared to the depiction in FIG. 3, system 300 may include a larger or smaller number of director systems, failover systems, probe systems, applications, databases, network-attached storages, or networks. In addition, system 300 may further include other components or devices not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments. The exemplary components and arrangements shown in FIG. 3 are not intended to limit the disclosed embodiments.
- As shown in FIG. 3, system 300 has undergone a failover process, causing secondary application 324, secondary DB 326, and secondary NAS 328 to become active, while primary application 314, primary DB 316, and primary NAS 318 become inactive. Conceivably, primary application 114 of FIG. 1 may have become unavailable, causing director system 102 to trigger a failover process by, for example, instructing failover system 104 to perform the failover process. In some embodiments, the failover process may involve shutting down primary local network 110, primary application 114, primary DB 116, and/or primary NAS 118; bringing up secondary local network 120, secondary application 124, secondary DB 126, and/or secondary NAS 128; and switching traffic to secondary local network 120, secondary application 124, secondary DB 126, and/or secondary NAS 128, causing system 100 to become system 300. For example, if primary probe system 312 runs a query against primary DB 316, primary probe system 312 may determine that primary application 314 is unavailable, and thus may be associated with a status of ‘DOWN.’ As another example, if secondary probe system 322 runs a query against secondary DB 326, secondary probe system 322 may determine that secondary application 324 is available, and thus may be associated with a status of ‘UP.’
- In some embodiments, secondary DB 326 may now be active and may replicate its data onto primary DB 316 by any appropriate process. In some embodiments, secondary NAS 328 may now be active and may replicate its data onto primary NAS 318 by any appropriate process. In such embodiments, one or more components of system 300 (e.g., director system 302, failover system 304, or applications 314 and 324) may ensure that primary DB 316 contains an up-to-date copy of the data in secondary DB 326 and primary NAS 318 contains an up-to-date copy of the data in secondary NAS 328. -
FIG. 4 is a diagram of a user interface 400 for managing one or more applications, consistent with disclosed embodiments. User interface 400 may include table 402 containing rows 404a-h corresponding to applications and columns 406a-h corresponding to data associated with the applications. For example, rows 404a-g may correspond to Applications A-G, and row 404h may describe the data contained in each column of table 402. Column 406a may indicate the name of an application; column 406b may indicate the primary director system associated with an application and whether it is active; column 406c may indicate the secondary director system associated with an application and whether it is active; column 406d may indicate maintenance times for an application during which an automatic failover process should not be engaged; column 406e may indicate the status of an application; column 406f may indicate whether a user has selected to enable the automatic failover process; column 406g may indicate whether there is an alert associated with an application; and column 406h may allow a user to click on a director system to trigger a failover process and switch traffic from a primary application (e.g., primary application 114 of FIG. 1) to a secondary application (e.g., secondary application 124).
- As will be appreciated by one skilled in the art, the components of
user interface 400 may be arranged in various ways and implemented with any suitable combination of hardware, firmware, and/or software, as applicable. For example, as compared to the depiction in FIG. 4, user interface 400 may include a larger or smaller number of rows or columns, allowing for a larger or smaller number of applications or amount of data associated with the applications. For instance, user interface 400 may include an additional ‘Secondary Director’ column to allow for a director cluster with three director systems, such as director cluster 200 of FIG. 2. In addition, user interface 400 may further include other components not depicted that perform or assist in the performance of one or more processes, consistent with the disclosed embodiments. The exemplary components and arrangements shown in FIG. 4 are not intended to limit the disclosed embodiments.
- As an example, ‘Application A’ corresponding to row 404a may have an inactive primary director system ‘X,’ an active secondary director system ‘Y,’ maintenance scheduled for Sunday from 02:00-06:00, a status of ‘UP,’ the automatic failover process enabled by the user, an outstanding alert, and the option to trigger a failover process to activate primary director system ‘X.’ As another example, ‘Application G’ corresponding to row 404g may have an inactive primary director system ‘X,’ an inactive secondary director system ‘Z,’ maintenance scheduled for Sunday from 02:00-06:00, a status of ‘DOWN,’ the automatic failover process disabled by the user, no outstanding alerts, and the option to trigger a failover process to activate primary director system ‘X’ or secondary director system ‘Z.’
- In some embodiments, visual indications may be utilized in
columns 406 b and 406 c to specify which, if any, of the director systems is currently active. For example, an active or inactive director system may be specified by way of different colors, shading, text, or by any other means which may convey to a user whether a director system is active. In some embodiments, a user may be able to click on a cell or data contained within a cell associated with a primary or secondary director system of an application to activate the clicked primary or secondary director system. In other embodiments, clicking on or hovering over a cell or data contained within a cell associated with a primary or secondary director system of an application may reveal information related to the clicked primary or secondary director system. In some embodiments,columns 406 b and 406 c may be updated automatically by a suitable component ofsystem 100 ofFIG. 1 ordirector cluster 200 ofFIG. 2 , or may be modified by a user to, for example, swap the primary or secondary director systems for a different director system. - In some embodiments, the maintenance time specified in
column 406 d indicates a period of time during which the automatic failover process, should it be enabled, will not be engaged. In some embodiments,column 406 e, relating to the status of an application, may be updated by one or more ofprobe systems director system 102, or any other suitable component ofsystem 100 ofFIG. 1 . In some embodiments,column 406 f may allow a user to enable the automatic failover process by, for example, clicking on or sliding a slider one way or another. For example, the automatic failover process may be enabled for rows 404 a-b and 404 d, while manual failover may be required forrows 404 c and 404 e-g. The automatic failover process will be discussed in greater detail below with respect toFIG. 6 . - In some embodiments, the data contained in the cells of column 406 g may merely indicate, in binary form, whether there is an alert associated with an application. In other embodiments, different types of alerts may be indicated by way of visual indications, such as different colors, shapes, sizes, or any other appropriate visual cues. In some embodiments, the alert may be retrieved by clicking on, hovering over, or activating in any appropriate manner, an element or data contained within a cell of column 406 g. Additionally or alternatively, a part of or all of the alert may itself be contained within the cells of column 406 g. In the example of
FIG. 4 , there may be an outstanding alert associated with Applications A and C-E, for example, to alert of an issue with director system ‘Y.’ - By way of example,
column 406 h may allow a user to click on a director system to trigger a failover process and switch traffic from a primary application to a secondary application, as discussed above. For example, for ‘Application A,’ a user may click on or otherwise select primary director system ‘X’ to trigger a failover process and switch traffic from secondary director ‘Y’ to primary director system ‘X.’ In some embodiments,columns system 100 ofFIG. 1 ordirector cluster 200 ofFIG. 2 . In some examples,user interface 400 may include other features that a user may interact with, such as options to sort, filter, search, or otherwise modify table 402, generate a report with all or a part of the data contained in table 402, view historical data (e.g., total number of failovers executed), view or generate statistics, view an audit trail indicating a chronological record of the sequence of activities performed onuser interface 400, determine which components of an application stack are to undergo the failover process or any other appropriate function which may be useful to a user usinguser interface 400. -
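By way of a non-limiting illustration, the maintenance time of column 406d described above may be checked before an automatic failover process is engaged. The following Python sketch is one possible reading only; the function names `in_maintenance_window` and `may_auto_failover` are hypothetical, and the sketch assumes a weekly window that does not cross midnight, such as the Sunday 02:00-06:00 example given for Application A.

```python
from datetime import datetime, time

SUNDAY = 6  # datetime.weekday() numbers Monday as 0 and Sunday as 6


def in_maintenance_window(now: datetime, weekday: int,
                          start: time, end: time) -> bool:
    """Return True if `now` falls on `weekday` within [start, end).

    Assumption: the window lies within a single day (no midnight wrap).
    """
    return now.weekday() == weekday and start <= now.time() < end


def may_auto_failover(now: datetime, auto_enabled: bool,
                      weekday: int, start: time, end: time) -> bool:
    """Automatic failover is engaged only if enabled and outside maintenance."""
    return auto_enabled and not in_maintenance_window(now, weekday, start, end)
```

For example, with a window of Sunday 02:00-06:00, a call made at 03:00 on a Sunday would return False from `may_auto_failover` even when the automatic failover process of column 406f is enabled.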
FIG. 5 is a flowchart of exemplary method 500 for monitoring the availability of an application, consistent with disclosed embodiments. In some embodiments, method 500 may be performed by a component of system 100 of FIG. 1, for example, one of probe systems 112 and 122 or director system 102. Method 500 is described below with reference to the networked systems of FIG. 1, but any other configuration of systems, subsystems, or modules may be used to perform method 500.
- At step 502, probe system 112 may run a query against database 116. In some embodiments, probe system 112 may send a request for the query to application 114. A load balancer of application 114 may transmit the request to a web server of application 114, which in turn may transmit the request to an application server of application 114, which may then run the query against database 116. The response which probe system 112 expects may be a login webpage, a JSON file, a 200 response code, or any other suitable response which may indicate to probe system 112 whether application 114 and database 116 are available. In other embodiments, probe system 112 may connect directly to database 116. Probe system 112 may continuously run queries against database 116 or may run queries against database 116 in intervals, such as every minute.
- At step 504, probe system 112 may determine whether the query response is acceptable. For example, if probe system 112 is successfully directed to a login webpage or receives a ‘200 OK’ response code, probe system 112 may determine that the query response is acceptable and method 500 may proceed to step 506a. Alternatively, if probe system 112 does not receive an acceptable response, for example, because there is no response or the response is incomplete (such as a login webpage that includes an error), method 500 may proceed to step 506b.
- At step 506a, probe system 112 may have determined that the query response is acceptable, and may label a status associated with application 114 as ‘UP.’ On the other hand, at step 506b, probe system 112 may have determined that the query response is not acceptable, and may label the status associated with application 114 as ‘DOWN.’
- At step 508, probe system 112 may update a data store associated with probe system 112 with the labeled status of ‘UP’ or ‘DOWN,’ depending on whether the response was acceptable or not, respectively. The data store may be a database which is connected to one or more networks of system 100 and which director system 102 may therefore access. In other embodiments, the data store is a webpage which director system 102 may access through an Internet connection. In yet other embodiments, the data store may be any repository for storing data, which may include a file, email, document, database, webpage, spreadsheet, message queue, or any other suitable means of storing data which may be accessed by director system 102. -
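By way of a non-limiting illustration, steps 502 through 508 of method 500 may be sketched in Python as follows. The sketch is an assumption-laden example rather than the disclosed implementation: the names `classify_response`, `run_probe`, and `ERROR_MARKER` are hypothetical, the query is abstracted as a caller-supplied `fetch` function, and the probe data store is modeled as an in-memory dictionary.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical marker indicating that a returned page contains an error.
ERROR_MARKER = "error"


def classify_response(status_code: Optional[int], body: str) -> str:
    """Steps 504-506: label a query response as 'UP' or 'DOWN'."""
    if status_code != 200:
        return "DOWN"  # no response (None) or a non-200 response code
    if not body or ERROR_MARKER in body.lower():
        return "DOWN"  # incomplete response, or a page that includes an error
    return "UP"


def run_probe(fetch: Callable[[], Tuple[int, str]],
              store: Dict[str, str], app: str) -> str:
    """Steps 502-508: run one query and record the labeled status."""
    try:
        status_code, body = fetch()  # step 502: run the query
    except Exception:
        # Assumption: timeouts and connection errors count as "no response".
        status_code, body = None, ""
    status = classify_response(status_code, body)
    store[app] = status  # step 508: update the probe data store
    return status
```

In such a sketch, a director system would read `store[app]` when polling, which keeps the probe and the director decoupled in the manner described for the data store of step 508.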
FIG. 6 is a flowchart of exemplary method 600 for application management, consistent with disclosed embodiments. In some embodiments, method 600 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, primary director system 210. Method 600 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 600.
- At step 602, primary director system 210 may render a user interface (e.g., user interface 400 of FIG. 4) on a display such that a user may interact with the user interface. The display may include an electronic device or part of an electronic device which serves for the visual presentation of data. At step 604, primary director system 210 may receive a selection from the user indicating whether the automatic failover process is to be enabled for application 114. The selection may include an activation of an element of a cell of column 406f associated with application 114.
- At step 606, primary director system 210 may poll the probe data store of probe system 112 in intervals to retrieve the status of application 114. Polling the probe data store may include accessing a web page or database, receiving a file or an email, or any suitable method by which primary director system 210 may retrieve the status of application 114 via a data store. In other embodiments, primary director system 210 may determine the status of application 114 by polling application 114 directly. Polling the probe data store or application 114 in intervals may refer to polling the probe data store or application 114 once every ‘X’ amount of time. For example, primary director system 210 may poll the data store or application 114 once every minute. The interval time may be set by a user, for example, via user interface 400, automatically determined by primary director system 210, or predetermined by a manufacturer. At step 608, primary director system 210 may store the retrieved status, with an associated timestamp, in a second data store (that is, a data store other than the probe data store). A timestamp may be a digital record of the time at which the status was retrieved.
- At step 610, primary director system 210 may determine whether the second data store includes a particular number of consecutive ‘DOWN’ statuses, the consecutive ‘DOWN’ statuses being the latest statuses to have been retrieved from the probe data store or application 114. For example, if the particular number is 5, primary director system 210 may determine that the second data store includes the particular number of consecutive ‘DOWN’ statuses upon retrieving and/or storing 5 successive ‘DOWN’ statuses. If primary director system 210 determines that the second data store does include the particular number of consecutive ‘DOWN’ statuses, method 600 may proceed to step 612. Alternatively, method 600 may return to step 606 to continue polling the probe data store in intervals.
- At step 612, primary director system 210 may engage in a handshake procedure with one or more of secondary director systems 220 and/or 230 to determine whether the one or more of secondary director systems 220 and/or 230 confirm that a particular number of consecutive ‘DOWN’ statuses for application 114 has been reached. The handshake procedure may be an automated process of an exchange of information between one or more director systems. For example, primary director system 210 may communicate with secondary director system 220 to determine whether secondary director system 220 has determined that application 114 is unavailable. This may prevent primary director system 210 from triggering a failover process for application 114 if the problem only exists in the connection between primary director system 210 and probe system 112 and secondary director systems 220 and/or 230 do not consider application 114 to be unavailable. If primary director system 210 determines that the one or more of secondary director systems 220 and/or 230 confirm that the particular number of consecutive ‘DOWN’ statuses for application 114 has been reached, method 600 may proceed to step 614. Otherwise, primary director system 210 may not proceed with the failover process and may engage an error process, such as alerting a support team regarding a potential probe failure.
- At step 614, primary director system 210 may determine whether the user has enabled the automatic failover process based on the user selection of step 604. The automatic failover process may refer to a piece of software (e.g., modules, code, scripts, or functions) which automatically triggers a failover process without requiring a user input following the determination that an application is unavailable. If primary director system 210 determines that the user has not enabled the automatic failover process, method 600 may proceed to step 616. Otherwise, method 600 may proceed to step 624, where primary director system 210 may trigger a failover process.
- At step 616, primary director system 210 may transmit an alert to the user. The alert may take the form of an email, notification, update to column 406g of FIG. 4, or any other means of informing the user that application 114 is unavailable. At step 618, primary director system 210 may render the user interface automatically or as a result of the user attempting to access the user interface. At step 620, primary director system 210 may receive a selection from the user instructing primary director system 210 to trigger the failover process. At step 622, primary director system 210 may trigger the failover process by, for example, instructing failover system 104 to perform the failover process or by performing the failover process itself. -
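By way of a non-limiting illustration, the decision flow of steps 610 through 624 may be sketched in Python as follows. The function names and the returned labels are hypothetical. The sketch also assumes that a confirmation from at least one secondary director system suffices for the handshake of step 612; the disclosure leaves open how many confirmations are required.

```python
from typing import List, Sequence


def has_consecutive_down(statuses: Sequence[str], threshold: int) -> bool:
    """Step 610: check whether the latest `threshold` statuses are all 'DOWN'."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    latest = list(statuses)[-threshold:]
    return len(latest) == threshold and all(s == "DOWN" for s in latest)


def decide(history: Sequence[str], peer_confirmations: List[bool],
           auto_failover_enabled: bool, threshold: int = 5) -> str:
    """Return the next action for the primary director system."""
    if not has_consecutive_down(history, threshold):
        return "KEEP_POLLING"        # return to step 606
    # Step 612 (assumption: one confirming secondary director is enough).
    if not any(peer_confirmations):
        return "RAISE_PROBE_ERROR"   # e.g., alert a support team
    if auto_failover_enabled:        # step 614
        return "TRIGGER_FAILOVER"    # step 624
    return "ALERT_USER"              # steps 616-622: wait for manual trigger
```

Under these assumptions, a history ending in five ‘DOWN’ statuses that no secondary director system confirms would produce the error action rather than a failover, which mirrors the safeguard described for step 612.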
Method 600 may be adjusted to be performed in fewer than ‘X’ minutes to satisfy an ‘X’-minute service level agreement (SLA). For example, method 600 may be performed in fewer than 15 minutes to satisfy a 15-minute SLA if primary director system 210 polls the probe data store every minute and the number of consecutive ‘DOWN’ statuses necessary to trigger the failover process is 5. -
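By way of a non-limiting illustration, the SLA example above may be checked with simple arithmetic. The Python sketch below uses hypothetical function names and assumes that the probe data store is current at each poll, so that the time to reach the threshold is bounded by the polling interval multiplied by the number of consecutive ‘DOWN’ statuses required.

```python
def detection_minutes(poll_interval_min: float, down_threshold: int) -> float:
    """Upper bound on minutes from an outage to the threshold being reached.

    Assumption: each poll reflects the current state of the application.
    """
    return poll_interval_min * down_threshold


def remaining_budget(sla_min: float, poll_interval_min: float,
                     down_threshold: int) -> float:
    """Minutes left within the SLA for the handshake and the failover itself."""
    return sla_min - detection_minutes(poll_interval_min, down_threshold)
```

With a one-minute polling interval and a threshold of 5, detection takes at most about 5 minutes under this assumption, leaving roughly 10 minutes of a 15-minute SLA for the handshake procedure and the failover process.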
FIG. 7 is a flowchart of exemplary method 700 for performing a failover process, consistent with disclosed embodiments. In some embodiments, method 700 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, failover system 104. Method 700 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 700.
At step 702, primary director system 210 may trigger a failover process by, for example, instructing failover system 104 to perform the failover process.
At step 704, failover system 104 may shut down at least one of primary local network 110, primary application 114, primary DB 116, or primary NAS 118. Failover system 104 may terminate the connections to these components by, for example, forcing local network 110, primary application 114, primary DB 116, or primary NAS 118 to go offline, creating a dynamic KILL statement for each connection, and/or altering the connections to have a single or restricted user.
At step 706, failover system 104 may bring up at least one of secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128.
At step 708, failover system 104 may switch traffic to the at least one of secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128 by reestablishing the connections from local network 110, primary application 114, primary DB 116, or primary NAS 118 to secondary local network 120, secondary application 124, secondary DB 126, or secondary NAS 128.
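The ordering of steps 704 through 708 (shut down the primary components, bring up the secondary components, then switch traffic) can be sketched with an in-memory model. The `Component` and `Pair` classes, the `routes` dictionary, and the event strings are hypothetical stand-ins; a real failover system would act on networks, databases, and storage rather than flags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    online: bool = False


@dataclass
class Pair:
    primary: Component
    secondary: Component


def perform_failover(pairs: Dict[str, Pair], routes: Dict[str, str]) -> List[str]:
    """Run the three phases in order and return a log of what was done.

    `routes` maps a role (e.g. 'application') to the name of the component
    currently receiving traffic; it is updated in place.
    """
    events: List[str] = []
    for pair in pairs.values():
        # Step 704: shut down the primary component.
        pair.primary.online = False
        events.append(f"shutdown {pair.primary.name}")
    for pair in pairs.values():
        # Step 706: bring up the secondary component.
        pair.secondary.online = True
        events.append(f"startup {pair.secondary.name}")
    for role, pair in pairs.items():
        # Step 708: switch traffic to the secondary component.
        routes[role] = pair.secondary.name
        events.append(f"route {role} -> {pair.secondary.name}")
    return events


if __name__ == "__main__":
    pairs = {
        "application": Pair(Component("primary application 114", True),
                            Component("secondary application 124")),
        "db": Pair(Component("primary DB 116", True),
                   Component("secondary DB 126")),
    }
    routes = {"application": "primary application 114", "db": "primary DB 116"}
    for line in perform_failover(pairs, routes):
        print(line)
```

Completing each phase for all components before starting the next mirrors the step order of method 700; the description also allows steps to be reordered, so this is one possible arrangement rather than a required one.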
FIG. 8 is a flowchart of exemplary method 800 for adopting the role of a primary director system, consistent with disclosed embodiments. In some embodiments, method 800 may be performed by a component of system 100 of FIG. 1 and/or director cluster 200 of FIG. 2, for example, one of secondary director systems 220. Method 800 is described below with reference to the networked systems of FIGS. 1 and 2, but any other configuration of systems, subsystems, or modules may be used to perform method 800.
At step 802, secondary director system 220 may poll the probe data store of method 500 of FIG. 5 at intervals to retrieve the status of application 114. Polling the probe data store may include accessing a web page or database, receiving a file or an email, or any suitable method by which secondary director system 220 may retrieve the status of application 114 via a data store. In other embodiments, secondary director system 220 may determine the status of application 114 directly. At step 804, secondary director system 220 may store the retrieved status in a data store with an associated timestamp.
At step 806, secondary director system 220 may determine whether the second data store includes a particular number of consecutive ‘DOWN’ statuses, the consecutive ‘DOWN’ statuses being the latest statuses to have been retrieved from the probe data store or application 114. If secondary director system 220 determines that the second data store does include the particular number of consecutive ‘DOWN’ statuses, method 800 may proceed to step 808. Otherwise, method 800 may return to step 802 to continue polling the probe data store or application 114 at intervals.
At step 808, secondary director system 220 may determine whether it has engaged in a handshake procedure with primary director system 210 that determines whether secondary director system 220 confirms that a particular number of consecutive ‘DOWN’ statuses for application 114 has been reached. If secondary director system 220 determines that it has engaged in a handshake procedure with primary director system 210, then primary director system 210 is active and secondary director system 220 may remain inactive, returning to step 802 to once again poll the probe data store or application 114 at intervals. Otherwise, method 800 may proceed to step 810.
At step 810, secondary director system 220 may determine whether a certain amount of time has passed since it determined that application 114 was unavailable. For example, the certain amount of time may be 1 minute after retrieving and/or storing 5 consecutive ‘DOWN’ statuses. As another example, the certain amount of time may be 5 minutes since the first ‘DOWN’ status of the 5 consecutive ‘DOWN’ statuses was retrieved and/or stored. If secondary director system 220 determines that the certain amount of time has not passed, method 800 may return to step 808 to await the handshake from primary director system 210 until the certain amount of time has passed. Otherwise, method 800 may proceed to step 812.
At step 812, secondary director system 220 may adopt the role of primary director system 210, as primary director system 210 is assumed to be unavailable or inactive. Adopting the role of primary director system 210 may involve performing the steps of method 600 of FIG. 6 beginning at step 612.
Systems and methods disclosed herein involve unconventional improvements over conventional failover systems. As compared to conventional technologies, the disclosed embodiments may improve resilience, reaction time, flexibility, convenience, and compatibility.
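The decision sequence of method 800 (steps 806 through 812) can be sketched as a pure function over the stored statuses, the handshake state, and the elapsed time. The function names, parameter names, and returned labels are hypothetical, and the sketch uses the first timing example above (a timeout measured from the moment the required number of ‘DOWN’ statuses was reached).

```python
from typing import Sequence


def has_consecutive_down(statuses: Sequence[str], required: int) -> bool:
    """Step 806: True when the latest `required` statuses are all 'DOWN'."""
    if required <= 0 or len(statuses) < required:
        return False
    return all(status == "DOWN" for status in statuses[-required:])


def secondary_decision(
    statuses: Sequence[str],
    required_down: int,
    handshake_received: bool,
    seconds_since_unavailable: float,
    takeover_timeout_seconds: float,
) -> str:
    """Return the next action for a secondary director (illustrative labels)."""
    if not has_consecutive_down(statuses, required_down):
        return "keep_polling"        # step 806 -> step 802
    if handshake_received:
        return "keep_polling"        # step 808: primary is active -> step 802
    if seconds_since_unavailable < takeover_timeout_seconds:
        return "await_handshake"     # step 810 -> step 808
    return "adopt_primary_role"      # step 812


if __name__ == "__main__":
    history = ["UP", "DOWN", "DOWN", "DOWN", "DOWN", "DOWN"]
    print(secondary_decision(history, 5, False, 30, 60))
    print(secondary_decision(history, 5, False, 60, 60))
```

Keeping the decision free of I/O makes the takeover rule easy to exercise with recorded status histories; the polling of the probe data store and the handshake transport would sit around this function.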
Descriptions of the disclosed embodiments are not exhaustive and are not limited to the precise forms or embodiments disclosed. Modifications and adaptations of the embodiments will be apparent from consideration of the specification and practice of the disclosed embodiments. Additionally, the disclosed embodiments are not limited to the examples discussed herein.
Computer programs based on the written description and methods of this specification are within the skill of a software developer. The various functions, scripts, programs, or modules can be created using a variety of programming techniques. For example, programs, scripts, functions, program sections or program modules can be designed in or by means of languages, including JAVASCRIPT, C, C++, JAVA, PHP, PYTHON, RUBY, PERL, BASH, or other programming or scripting languages. One or more of such software sections or modules can be integrated into a computer system, non-transitory computer-readable media, or existing communications software. The programs, modules, or code can also be implemented or replicated as firmware or circuit logic.
Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations or alterations based on the present disclosure. The elements in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. Further, the steps of the disclosed methods can be modified in any manner, including by reordering steps or inserting or deleting steps. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/351,657 US20220405170A1 (en) | 2021-06-18 | 2021-06-18 | Systems and methods for application failover management using a distributed director and probe system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220405170A1 (en) | 2022-12-22 |
Family
ID=84490305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/351,657 Pending US20220405170A1 (en) | 2021-06-18 | 2021-06-18 | Systems and methods for application failover management using a distributed director and probe system |
Country Status (1)
Country | Link |
---|---|
US (1) | US20220405170A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100042673A1 (en) * | 2008-08-18 | 2010-02-18 | Novell, Inc. | System and method for dynamically enabling an application for business continuity |
US7971094B1 (en) * | 2009-03-03 | 2011-06-28 | Netapp, Inc. | Method, system and apparatus for creating and executing a failover plan on a computer network |
US20140047088A1 (en) * | 2012-08-09 | 2014-02-13 | International Business Machines Corporation | Service management roles of processor nodes in distributed node service management |
US20160092322A1 (en) * | 2014-09-30 | 2016-03-31 | Microsoft Corporation | Semi-automatic failover |
US20190340060A1 (en) * | 2018-05-05 | 2019-11-07 | Dell Products L.P. | Systems and methods for adaptive proactive failure analysis for memories |
US20200349036A1 (en) * | 2019-05-03 | 2020-11-05 | EMC IP Holding Company LLC | Self-contained disaster detection for replicated multi-controller systems |
US20220255822A1 (en) * | 2021-02-08 | 2022-08-11 | Sap Se | Reverse health checks |
US20220261321A1 (en) * | 2021-02-12 | 2022-08-18 | Commvault Systems, Inc. | Automatic failover of a storage manager |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: FIDELITY INFORMATION SERVICES, LLC, FLORIDA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KUMAR, PANKAJ; MANCHIREDDY, ARAVIND; RAVI, RARISH; REEL/FRAME: 056585/0921. Effective date: 20210617 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |