WO2016067299A1 - Location aware failover solution - Google Patents

Location aware failover solution

Info

Publication number
WO2016067299A1
Authority
WO
WIPO (PCT)
Prior art keywords
location
node
information
information related
data
Application number
PCT/IN2015/000024
Other languages
French (fr)
Inventor
Rafiq AHAMED K
Priyanka RANJAN
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Publication of WO2016067299A1 publication Critical patent/WO2016067299A1/en

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
            • H04L69/40: for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
          • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
            • H04L41/06: Management of faults, events, alarms or notifications
              • H04L41/0677: Localisation of faults
          • H04L67/00: Network arrangements or protocols for supporting network services or applications
            • H04L67/50: Network services
              • H04L67/52: Network services specially adapted for the location of the user terminal
    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00: Error detection; Error correction; Monitoring
            • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/16: Error detection or correction of the data by redundancy in hardware
                • G06F11/20: using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
                  • G06F11/202: where processing functionality is redundant
                    • G06F11/2023: Failover techniques
                      • G06F11/2025: Failover techniques using centralised failover control functionality
                      • G06F11/203: Failover techniques using migration
                    • G06F11/2041: with more than one idle spare processing component
                    • G06F11/2048: where the redundant components share neither address space nor persistent storage
            • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
          • G06F2201/00: Indexing scheme relating to error detection, to error correction, and to monitoring
            • G06F2201/815: Virtual

Definitions

  • the ranker 44 can send the categorized potential external disruptive conditions to an action unit 50.
  • the action unit 50 can perform one or more actions 52 (A 1 - A S) based on the categorized threat level. The action plan 30 can define a respective set of one or more actions 52 to be performed by the action unit 50 of the location monitor system 20 for each threat level.
  • the one or more actions 52 can include logging information regarding the potential disruptive event, sending a warning to one or more administrators about the potential disruptive event (e.g., via email, text message, phone call, or the like), moving data from the potentially affected node to another node, and the like.
  • the action plan 30 can establish that higher threat levels call for progressively more detailed information and additional actions.
  • at a low threat level, the predefined action can include logging information related to the threat and/or rebalancing the data related to the affected node.
  • at a medium threat level, the predefined action can include logging more detailed information related to the threat and sending a warning to an administrator of the data center.
  • at a high threat level, the predefined action can include logging still more detailed information related to the threat, sending a warning to the administrator and another person related to the organization, and automatically migrating data from the affected node to a node at another location (e.g., based on the rack identification information and the location on the rack identification information).
  • the actions performed at each successively higher threat level can include the actions of the lower threat levels as well as one or more additional actions.
  • FIG. 4 illustrates an example of an action plan 30.
  • the action plan can include a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
  • the action plan 30 can be defined by an administrator upon setup of the system 10.
  • the action plan 30 can be edited and re-defined by an administrator at any time after the setup of the system 10.
  • the action plan 30 can define different actions to perform (e.g., one or more of actions A 1 - A S) for different threat levels 48 (e.g., level 1 threat - level R threat).
  • the action plan 30 can also include other information related to operations of the system 10 of FIG. 1.
  • the action plan 30 can include control data 64 that can define operations of the control 38 of transceiver (Tx/Rx) 32.
  • the action plan 30 can include contact information 66 for the administrator and any other person(s) in a line of contacts (and can include information defining the line of contacts).
  • the contact information 66 can include email addresses, mobile telephone numbers, landline telephone numbers, names, addresses, etc.
  • the action plan 30 can include a threat log 68 that includes logged information about potential disruptive conditions.
  • the action plan includes managing movement of data from a data store corresponding to a given node to another data store corresponding to a node at another location that is different from the physical location of the given node.
  • the action plan 30 can include balance data 60 that can define clusters of nodes and the current balance of the data between the nodes in the clusters.
  • the balance data 60 can also include a defined frequency for rebalancing the data (e.g., at a scheduled interval and/or in the presence of a potential disruptive condition). Additionally, the balance data 60 can be utilized for urgent data migration.
  • the action plan 30 can include preferences 62 related to the way that data is migrated from a potentially failing node to another node.
  • the illustrated action plan 30 shows a plurality of threat levels: level 1 threat, level 2 threat, and level R threat, where R is a positive integer denoting the number of threat levels.
  • the action plan 30 can establish criteria to establish the level 1 threat, the level 2 threat, and the level R threat.
  • the criteria can include: a risk of damage to the node due to the potential disruptive condition, a time associated with the potential disruptive condition, a proximity of the potential disruptive condition to the geographical location of the node, etc.
  • the action plan 30 can establish one or more actions 52 (e.g., A 1 - A S, where S is a positive integer) for the action unit 50 of the location monitor system 20 to execute corresponding to the respective threat level (one way to express such a plan as data is sketched just after this list).
  • the action (A 1) can be to log information related to the potential disruptive condition (e.g., within the threat log 68).
  • the action (A 2) can be to log additional information related to the potential disruptive condition, and the action (A 3) can be to send a warning to an administrator (listed in the contact information 66).
  • the action (A 4) can be to log additional information related to the potential disruptive condition.
  • the action (A 5) can be to send a warning to multiple administrators.
  • the action (A S) can be to perform an urgent action (e.g., data migration according to the preferences 62 and the balance data 60).
  • Example methods will be better appreciated with reference to FIGS. 5-7. While, for the purposes of simplicity of explanation, the example methods of FIGS. 5-7 are shown and described as executing serially, the present examples are not limited by the illustrated order, as some actions could in other examples occur in different orders and/or concurrently from that shown and described herein. Moreover, it is not necessary that all described actions be performed to implement a method.
  • the method can be stored in one or more non-transitory computer readable media and executed by one or more processing resources, such as disclosed herein.
  • FIG. 5 illustrates a flowchart of an example method 70 for implementing a location aware failover solution.
  • the method 70 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., non-transitory memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 70.
  • the location aware failover solution can be employed preemptively and automatically based on the location awareness for one or more nodes in a data center system.
  • a potential disruptive condition (e.g., a planned disruptive condition or an unplanned disruptive condition, such as due to an environmental condition, human error or machine error) related to a node can be discovered.
  • the discovery can be based on the location of the node.
  • the discovery can be based on a detected unresponsiveness of the node.
  • the discovery can be based on a reduced signal strength from the node.
  • the discovery of the potentially failing node can automatically trigger the failover solution.
  • the failover solution can include migration of data or applications from a nonresponsive server to a responsive server.
  • the nodes that are available for failover can be determined. For example, nodes (e.g., nodes in a cluster, nodes at a location, and/or nodes in the data center system 14) can be examined one-by-one and their availability determined.
  • location information can be received for the nodes that are deemed available for the failover.
  • the location information can include LDS location information, including rack identification information and position on the rack identification information.
  • the location information can include GPS information, including global geographical location information.
  • the location information can include LDS information and GPS information.
  • one or more suitable nodes for the failover can be determined.
  • the most suitable node can be determined based on the location information. Accordingly, the most suitable node can be a node that is physically located the farthest away from the non-responsive node. For example, the most suitable node can be located on a different rack than the potentially failing node. In another example, the most suitable node can be located on a different floor than the potentially failing node. In a further example, the most suitable node can be located at a different geographical location than the potentially failing node.
  • FIG. 6 illustrates a flowchart of an example method 80 for employing location awareness to trigger an automated failover solution to reduce the impact of a potential disruptive condition.
  • the method 80 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 80.
  • a threat level posed by the potential disruptive condition to the node can be categorized (e.g., by ranker 44).
  • an action that is defined by an action plan based on the threat level can be performed (e.g., by action unit 50).
  • FIG. 7 illustrates a flowchart of another example method 90 for employing location awareness to trigger an automated failover solution to reduce the impact of a potential disruptive condition (e.g., planned downtime, human error, machine error, weather, earthquake, terrorism, fire, violence, etc.).
  • the method 90 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 90.
  • the location aware failover solution can be executed preemptively based on the location awareness.
  • a subscription to an information service can begin.
  • the information service can be one or more of a news service, a weather service, a geological service, or the like.
  • the service can also be a local service that includes information regarding network conditions.
  • information can be received (e.g., by transceiver 32 from an information service (IS 1 - IS P) 42) related to a potential disruptive condition at a location where one or more nodes reside (e.g., information related to planned downtime, human error, machine error, a weather condition, a terrorist threat, an earthquake threat, a threat of violence, etc.).
  • the information can be gathered based on location information of the node.
  • the location information can include LDS location information, including rack identification information and position on the rack identification information.
  • the location information can include GPS information, including global geographical location information.
  • the location information can include LDS information and GPS information.
  • a threat level posed by the potential disruptive condition to the node can be selected from a plurality of different threat levels (e.g., by ranker 44).
  • the plurality of different threat levels can be defined in an action plan (e.g., action plan 30).
  • the plurality of threat levels can include a low threat level, a medium threat level, and a high threat level.
  • An example of a low threat level can be a hurricane with a potential path directed toward the geographical location in a predefined number of days (e.g., about five days).
  • An example of a medium threat level can be a hurricane with a path directed toward the geographical location in a shorter period of time than for the low threat level (e.g., less than about 24 hours).
  • An example of a high threat level can be a hurricane that is on an imminent track to intercept the geographical location in a still shorter period of time (e.g., about 2 hours).
  • the action plan can require an additional query of one or more information services external to the location monitor system for acquiring additional information related to the potential disruptive condition.
  • the additional query can request the additional information based on information already received and utilized to categorize the current disruptive condition.
  • the categorization can be further based on the additional information.
  • a set of actions can be selected (e.g., by action unit 50) that is defined by an action plan (e.g., action plan 30) based on the threat level.
  • the set of actions can, for example, be at least a portion of the automatic failover solution.
  • the action defined by the action plan can include a respective set of one or more actions to be performed by the location monitor system (e.g., system 20) based on the category of the identified threat level.
  • the action can include logging the information related to the potential disruptive condition, where the logged entry comprises location data describing the physical location of the node and a portion of the information related to the potential disruptive condition.
  • the action can include rebalancing data within a cluster comprising the node and at least one other node that is located at a different location from the physical location of the node.
  • the action can include sending a warning message (e.g., including location data describing the physical location of the node, a portion of the information related to the potential disruptive condition, and disruptive condition data describing the selected threat level) to an administrator.
  • the action can include moving data from the node to another node at another location that is different from the physical location of the node.
  • the set of actions can be performed (e.g., by action unit 50).
  • FIG. 8 illustrates an example of a non-transitory computer readable medium 110 to store machine readable instructions to implement a location monitor system.
  • the machine readable instructions can be executed to cause a computing device to perform operations to implement the location monitor system.
  • the instructions can include receiving instructions 116 to receive information related to a potential disruptive condition at a node in a data center system. For example, the information related to the potential disruptive condition can be based on a location of the node.
  • the instructions can also include querying instructions 118 to query an information service external to the location monitor system for additional information related to the potential disruptive condition.
  • the instructions can also include selecting instructions 120 to select a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the information related to the potential disruptive condition and the additional information related to the potential disruptive condition.
  • the instructions can also include performing instructions 122 to perform an action that is defined by an action plan based on the categorized threat level.
  • FIG. 9 illustrates an example of a location monitor system 130.
  • the location monitor system 130 can include various components that can be hardware, software (e.g., stored in a non-transitory memory and executable by a processing resource to cause the performance of operations), or a combination of hardware and software.
  • the components can include a transceiver 132 to receive data related to a potential disruptive condition to a node in a data center system from an information source external to the system.
  • the transceiver 132 can also request additional data related to the potential disruptive condition.
  • the transceiver 132 can preprocess the data related to the potential disruptive condition and the additional data.
  • the components can also include a ranker 134 to select a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the preprocessed data related to the potential disruptive condition and the additional data.
  • the components can also include an action unit 136 to perform an action that is defined by an action plan based on the selected threat level.
  • the action plan can define a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
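
For illustration, the action plan of FIG. 4 might be expressed as data along the lines of the following sketch. The field names and specific actions are assumptions, not part of the original disclosure; higher levels add actions on top of the lower levels, as described in the bullets above.

```python
# Hypothetical rendering of an action plan: threat levels map to
# cumulative action sets; contacts, preferences, and a threat log ride
# along. None of these field names come from the patent itself.

def log_threat(event, plan):
    plan["threat_log"].append(event)                      # A 1: log it

def warn_admins(event, plan):
    for contact in plan["contacts"]:                      # A 3 / A 5: warn
        print(f"warning {contact}: {event['summary']}")

def migrate_data(event, plan):
    print(f"migrating data off {event['node']} "          # A S: urgent action
          f"per preferences {plan['preferences']}")

ACTION_PLAN = {
    "contacts": ["admin@example.org"],
    "preferences": {"target": "farthest-available-node"},
    "threat_log": [],
    "levels": {                        # each level includes the lower levels'
        "level 1": [log_threat],       # actions plus one or more extras
        "level 2": [log_threat, warn_admins],
        "level R": [log_threat, warn_admins, migrate_data],
    },
}

event = {"node": "N1", "summary": "hurricane predicted within 2 hours"}
for action in ACTION_PLAN["levels"]["level R"]:
    action(event, ACTION_PLAN)
```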

Abstract

An example location monitor system can comprise a non-transitory memory to store machine readable instructions. The system can also comprise a processing resource to execute the machine readable instructions. The machine readable instructions can comprise a transceiver to receive information related to a potential disruptive condition at a node in a data center system. The information related to the potential disruptive condition can be based on a physical location of the node. The machine readable instructions can also comprise a ranker to categorize a threat level posed by the potential disruptive condition to the node. The machine readable instructions can also comprise an action unit to perform an action that is defined by an action plan based on the categorized threat level.

Description

LOCATION AWARE FAILOVER SOLUTION
BACKGROUND
[0001] Many organizations, both large and small, utilize data centers for the remote storage and distribution of large amounts of data and applications. These organizations rely on the data and applications being available at all times. Accordingly, these organizations employ high-availability (HA) solutions that ensure that their services are available at all times.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 illustrates an example of a system that can employ location awareness to trigger an automated failover solution to reduce the impact of potential disruptive conditions.
[0003] FIG. 2 illustrates an example of a data center system equipped with location data service and global positioning system capabilities.
[0004] FIG. 3 illustrates an example of a location monitor system that can employ a location aware failover solution upon detection of a potential disruptive condition based on a previously-established action plan.
[0005] FIG. 4 illustrates an example of an action plan.
[0006] FIG. 5 illustrates a flowchart of an example method for employing a location aware failover solution.
[0007] FIG. 6 illustrates a flowchart of an example method for employing location awareness to trigger an automated failover solution.
[0008] FIG. 7 illustrates a flowchart of another example method for employing location awareness to trigger an automated failover solution.
[0009] FIG. 8 illustrates an example of a non-transitory computer readable medium to store machine readable instructions to implement a location monitor system.
[0010] FIG. 9 illustrates an example of a location monitor system.
DETAILED DESCRIPTION
[0011] Even HA solutions struggle to prevent the impact of disruptive conditions (e.g., weather, earthquake, terrorism, fire, violence, etc.) on the availability of data and applications. Even employing different data centers at different geographical locations does not ensure that the data and applications will be available all the time. For example, an external disruptive condition that affects only a specific geographical area can affect one of the data centers, causing a single point of failure and an unplanned inaccessibility of portions of the data and applications. While the data centers are often located at different geographical points that may not all be affected, these external disruptive conditions often occur without sufficient warning to move data and applications stored at a data center at an affected location to another data center at another location.
[0012] An example location monitor system described herein can reduce or even prevent the impact of such disruptive conditions on the availability of data and applications. The example location monitor system can include a non-transitory memory to store machine readable instructions and a processing resource (e.g., one or more processor cores) to execute the machine readable instructions. A transceiver can receive information related to a potential disruptive condition at a node in a data center system. The information related to the potential disruptive condition can be based on a physical location of the node. A ranker can categorize a threat level posed by the potential disruptive condition to the node. An action unit can perform an action that is defined by an action plan based on the categorized threat level.
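By way of illustration, a minimal sketch (with hypothetical names; the patent does not prescribe an implementation) of how the transceiver, ranker, and action unit described above might fit together:
```python
from dataclasses import dataclass

# Illustrative sketch of the transceiver -> ranker -> action unit flow;
# the names and structures below are assumptions, not the patent's code.

@dataclass
class Condition:
    node_id: str
    location: str        # physical location the information was gathered for
    summary: str         # e.g., "hurricane forecast", "planned downtime"
    hours_to_impact: float

def receive(feed):
    """Transceiver: yield conditions parsed from an information feed."""
    for item in feed:
        yield Condition(**item)

def rank(condition):
    """Ranker: categorize the threat level posed to the node."""
    return "high" if condition.hours_to_impact <= 2 else "low"

def act(level, condition, plan):
    """Action unit: perform the plan's actions for the categorized level."""
    for action in plan[level]:
        action(condition)

plan = {"low": [lambda c: print("log:", c.summary)],
        "high": [lambda c: print("migrate data off", c.node_id)]}

feed = [{"node_id": "N1", "location": "Oklahoma, USA",
         "summary": "tornado warning", "hours_to_impact": 1.0}]
for c in receive(feed):
    act(rank(c), c, plan)
```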
[0013] FIG. 1 illustrates an example of a system 10 that can employ location awareness to trigger an automated failover solution to reduce the impact of potential disruptive conditions on a node of a data center system 14. The system 10 can perform preemptive actions that can provide an automated failover solution for the node of the data center system 14. The automated failover solution can be based on location awareness and can ensure that data and applications (referred to as "data") are accessible at all times. As used herein, the term "disruptive condition" can refer to both an internal disruptive event (e.g., planned downtime, human error, hardware error, etc.) and an external disruptive event (e.g., weather, earthquake, terrorism, fire, violence, etc.). The system 10 can recognize the potential disruptive conditions and perform preemptive actions that can provide the failover solution.
[0014] The system 10 can include a data center system 14 and a location monitor system 20 connected to a network 12. The data center system 14 can store data related to one or more organizations. In some examples, the data center system 14 can include a data center (e.g., a data repository at a physical location) to store the data. In other examples, the data center system 14 can include a plurality of data centers, and the data can be distributed across the plurality of data centers. In the example of FIG. 1, three data centers are illustrated in the data center system 14, namely a first data center (labeled in FIG. 1 as data center 1), a second data center (labeled in FIG. 1 as data center 2), and a third data center (labeled as data center Q, where Q is a positive integer denoting the number of data centers in the data center system 14).
[0015] The data center system 14 can include one or more nodes connected to the network 12 that store a portion of the data. The nodes can provide the system 10 with its location awareness. For example, the nodes can be distributed in different areas of a data center. In another example, the nodes can be distributed across multiple data centers. The data can be distributed across the one or more nodes. FIG. 2 illustrates an example of a data center system 14. In the example of FIG. 2, eight nodes (e.g., N1-N8) are illustrated. In some examples, the nodes can be physical nodes (e.g., servers, computers, etc.) within the data center. In other examples, the nodes can be virtual nodes (e.g., virtual machines) within the data center. In still other examples, the nodes can include both physical nodes and virtual nodes.
[0016] The nodes can be located on and/or associated with a rack. In the example of FIG. 2, the eight nodes (N1 - N8) are located across four racks (Rack 1 - Rack 4). In some examples, the racks can be located in one data center (e.g., on different floors). In other examples, the racks can be located within different data centers. The racks can be associated with rack identifier information. The nodes located on and/or associated with the rack can also be associated with position on the rack information. The location data service (LDS 16) of the data center system 14 can provide the rack identification information and the location on the rack identification information for each of the nodes. Additionally, the data center system 14 can include global positioning system (GPS 18) devices that can provide geographical location information for the nodes. As an example, the location of an individual node can be determined based on the geographical location information, the rack identification information, and the location on the rack identification information.
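A brief sketch of how the three location sources named above might be combined into a single per-node record; the field names are illustrative assumptions, not the LDS or GPS data formats:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeLocation:
    latitude: float      # from the GPS 18 device
    longitude: float
    rack_id: str         # rack identification information (LDS 16)
    rack_position: int   # position on the rack information (LDS 16)

n1 = NodeLocation(35.47, -97.52, "Rack 1", 3)
n2 = NodeLocation(35.47, -97.52, "Rack 2", 1)
print(n1.rack_id != n2.rack_id)  # True: same site, different racks
```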
[0017] The data center system 14 can also include a transceiver (Tx/Rx) 22, which allows the data center system to communicate across the network 12 of FIG. 1. For example, the data center system 14 can transmit the location information to the location monitor system 20. As another example, the location monitor system 20 can transmit data migration instructions to the data center system 14 based on a potential disruptive condition to implement the failover solution. The network 12 can be implemented, for example, as a public network (e.g., a wide area network, such as the Internet), a private network (e.g., a local area network) or a combination thereof. For the purposes of simplification of explanation, in the present example, different components of the system 10 are illustrated and described as performing different functions.
However, in other examples, the functionality of several components can be combined and executed on a single component. The components can be implemented, for example, as software (e.g., machine executable instructions), hardware (e.g., an application specific integrated circuit), or as a combination of both (e.g., firmware). In other examples, the components can be distributed among remote devices across the network 12. For example, the location monitor system 20 can be distributed across the data centers 1-Q 14. In other examples, the location monitor system 20 can be implemented as a stand-alone system (e.g., implemented on one or more computing devices located externally to the data centers 1-Q 14 or within one of the data centers).
[0018] As an example, the location monitor system 20 can distribute data across a cluster of different nodes. Each node of the cluster can be associated with rack identification information, location on the rack identification information, and global location information. In some examples, the location monitor system 20 can create the cluster based on the rack identification information, location on the rack identification information, and global location information. For example, a portion of nodes in the cluster can be located on different racks. In another example, a portion of nodes in the cluster can be located on different floors of a data center. In a further example, a portion of nodes in the cluster can be located at different global locations.
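A hedged sketch of such location-aware cluster creation; the diversity scoring below is an assumed heuristic, not the patent's algorithm:
```python
from collections import namedtuple

# Prefer candidates whose locations differ most from the seed node, so
# one disruptive condition cannot reach every replica.
Node = namedtuple("Node", "name site floor rack")

def diversity(a, b):
    if a.site != b.site:
        return 3   # different global locations
    if a.floor != b.floor:
        return 2   # different floors of the same data center
    if a.rack != b.rack:
        return 1   # different racks on the same floor
    return 0

def create_cluster(seed, candidates, size):
    ranked = sorted(candidates, key=lambda n: diversity(seed, n), reverse=True)
    return [seed] + ranked[: size - 1]

seed = Node("N1", "Oklahoma", 1, "Rack 1")
candidates = [Node("N2", "Oklahoma", 1, "Rack 2"),
              Node("N5", "Oklahoma", 2, "Rack 3"),
              Node("N7", "Baghdad", 1, "Rack 4")]
print([n.name for n in create_cluster(seed, candidates, 3)])
# -> ['N1', 'N7', 'N5']: the most location-diverse members first
```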
[0019] Each of the nodes in the cluster can be associated with location information (e.g. geographical location information, the rack identification information, and the location on the rack identification information). The location monitor system 20 can implement the failover solution based on the location information for the given nodes in the cluster. As used herein, the term "failover solution" can refer to a plan for switching to a redundant or standby node upon the failure or potential failure of another node. The failover solution can be automated so that the data is switched automatically (e.g., with little or no direct human control) upon or before the failure occurs. For example, the failover solution of system 10 employed by the location monitor system 20 can switch data from a failed node (or a potentially failing node) to a node that is operational based on the location information, thereby proactively eliminating both single points of failure and cascading failures due to disruptive conditions (e.g., planned downtime, human error, machine error, weather, earthquake, terrorism, fire, violence, etc.).
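As one illustration of switching to an operational node "based on the location information", the sketch below picks the available node physically farthest from the failing one (the haversine great-circle distance is an assumed metric; the disclosure does not prescribe a formula):
```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def best_failover_target(failing_pos, available):
    """Prefer the operational node farthest from the failing node."""
    return max(available, key=lambda n: haversine_km(failing_pos, n["pos"]))

targets = [{"name": "N4", "pos": (35.48, -97.52)},   # same campus
           {"name": "N7", "pos": (33.31, 44.37)}]    # different region
print(best_failover_target((35.47, -97.52), targets)["name"])  # N7
```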
[0020] FIG. 3 illustrates an example of a location monitor system 20 that can employ a location aware failover solution (including executing one or more actions 52) upon detection of a potential disruptive condition based on a previously-established action plan 30. The location monitor system 20 can include a non-transitory memory 24 to store machine-executable instructions. Examples of the non-transitory memory 24 can include volatile memory (e.g., RAM), nonvolatile memory (e.g., a hard disk, a flash memory, a solid state drive, or the like), or a combination of both. The non-transitory memory 24 can also store system data 54. For example, the system data 54 can include system location data 58 that includes location information for the nodes (e.g., rack identification information, position on the rack identification information,
geographical location information), clustering information for the nodes, and/or balancing information for the clusters. The system data 54 can also include an action plan 30 for the nodes within the data center system 14.
[0021] The location monitor system 20 can include a processing resource 26 (e.g., one or more processing cores) to access the memory and execute the machine- executable instructions to implement functions of the location monitor system 20 (e.g., to employ the location aware failover solution). As an example, the location monitor system 20 can enable the automated failover solution based on the system location data 58 and the action plan 30. The failover solution can be based on external location awareness. As an example, the external location awareness can be based on a global physical (e.g., geospatial) location of the nodes in the data center system 14. The location can be provided by the GPS 18 device of FIG. 1 and/or by other means of determining the location of the respective nodes in the data center system 14.
[0022] By way of example, the location monitor system 20 can receive geographical coordinates (e.g., latitude and longitude coordinates) for the geographical location of the nodes (e.g., from the GPS 18 device). The location monitor system 20 can contact a reverse location service to translate the geographical coordinates to location information representing the actual geographical location (e.g., by translating the coordinates to an address). The actual geographical location and/or the coordinates can be stored in the system location data 58. In another example, the failover solution can be based on internal location awareness (e.g., based on the rack identification information and the location on the rack identification information within a respective data center). In other examples, the location monitor system 20 can implement the failover solution based on both external location awareness and internal location awareness.
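A self-contained sketch of the coordinate-to-address translation step; a real deployment would call an external reverse location service, so the lookup table here is a stand-in to keep the example runnable:
```python
# Hypothetical known-site table; not a real geocoding API.
KNOWN_SITES = {
    (35.47, -97.52): "Oklahoma City, OK, USA",
    (33.31, 44.37): "Baghdad, Iraq",
}

def reverse_locate(lat, lon, sites=KNOWN_SITES, tolerance=0.1):
    """Return the nearest known address for the coordinates, if any."""
    for (slat, slon), address in sites.items():
        if abs(slat - lat) <= tolerance and abs(slon - lon) <= tolerance:
            return address
    return None

print(reverse_locate(35.47, -97.52))  # "Oklahoma City, OK, USA"
```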
[0023] In some examples, the location monitor system 20 can also include a user interface 28 that can include a user input device (e.g., keyboard, mouse, microphone, etc.) and/or a display device (e.g., a monitor, a screen, a graphical user interface, speakers, etc.). The location monitor system 20 can be coupled to the network 12 to exchange data with the data center system 14 and one or more information services (IS 1 - IS P) 42 via a transceiver (Tx/Rx) 32. The transceiver (Tx/Rx) 32 can employ one or more application programming interfaces (APIs) 34 (e.g., API 1 - API P) to
communicate with one or more corresponding information services 42 (e.g., IS 1 - IS P) across the network 12. The transceiver 32 can send a request for information to each information service 42 via the APIs 34 associated with the respective services 42, which request can include location information for each data center containing nodes of interest. For example, the APIs 34 can establish an interface with the information services 42 (e.g., Web-based global information services, local information sources related to the data center system 14, and the like). In some examples, the information services 42 can include a weather information service, a news information service, a geological information service, a local information service (for the data center system 14), or the like. In some examples, the information services to be accessed can be defined in the action plan 30. In some examples, the APIs 34 can establish that the information services 42 send data based on the locations (e.g., geographical location) of the nodes stored within the system location data 58. Additionally or alternatively, the APIs 34 can establish a frequency at which the information services 42 send the data. The frequency can be established by a control 38 of the transceiver 32 (e.g., based on frequencies defined in the action plan 30). In some examples, the API can be
implemented as including one or more rich site summary (RSS) readers, each to regularly acquire information from a respective resource location corresponding to a given one of the information services 42.
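The per-service request/receive cycle could be sketched as follows in Python; the Subscription type and poll_once function are illustrative assumptions, and a production system would go through the APIs 34 rather than bare callables:

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subscription:
    # One information service, its fetch call (API request or RSS read), and
    # the polling frequency established by the control 38 from the action plan.
    name: str
    fetch: Callable[[], str]
    interval_s: float
    next_due: float = 0.0

def poll_once(subs: list, handle: Callable[[str, str], None]) -> None:
    # Query each service whose interval has elapsed and hand the raw payload
    # to a downstream handler (e.g., the transceiver analytics 40).
    now = time.monotonic()
    for sub in subs:
        if now >= sub.next_due:
            handle(sub.name, sub.fetch())
            sub.next_due = now + sub.interval_s
```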
[0024] The information services 42 contacted and the frequencies are not necessarily the same for each location. For example, one node of the data center system 14 can be geographically located in Oklahoma, USA, which is prone to tornadoes, but not violence. Another node in the data center system 14 can be geographically located in Baghdad, Iraq, which is prone to violence, but not prone to tornadoes. In this example, the action plan 30 can define that for the node
geographically located in Oklahoma, the weather service sends data every hour and the news service sends data once per day, while for the node geographically located in Baghdad, the weather service can send data once per day and the news service can send data every hour.
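Expressed as configuration, the example above might look like the following (a hedged sketch; the location keys are hypothetical, and the authoritative values would live in the action plan 30):

```python
# Polling intervals in seconds, keyed by (node location, information service).
POLL_INTERVALS = {
    ("oklahoma-usa", "weather"): 3600,   # hourly: tornado-prone region
    ("oklahoma-usa", "news"):   86400,   # once per day
    ("baghdad-iraq", "weather"): 86400,  # once per day
    ("baghdad-iraq", "news"):    3600,   # hourly: violence-prone region
}
```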
[0025] The transceiver (Tx/Rx) 32 can include a receiver 36 to receive the data from the one or more information services 42 in response to the information request sent via the API. For example, data can be related to a potential disruptive condition for a location of a given node. Examples of the potential disruptive condition can include environmental conditions (e.g., a weather storm, an earthquake) as well as human interventions (e.g., a terrorist threat or attack, another potentially destructive event, a planned shut-down of a data center or loss of power, human error, machine error, and the like). The transceiver can include analytics 40 that can pre-process the information received from the one or more information services 42. For example, the analytics 40 can remove any data tags, metadata, or other unnecessary data sent by the respective one or more of the information services 42 to extract the information relevant to each disruptive condition that applies to the nodes of interest. In other examples, the analytics 40 can group the potential disruptive conditions based on the location of the respective node.
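A minimal sketch of such pre-processing follows, assuming Python and a hypothetical payload shape in which each item carries a "body" and the "location" of the affected node:

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")  # crude removal of markup-style data tags

def preprocess(raw_items: list) -> dict:
    # Strip tags/metadata and group surviving condition text by node location.
    grouped = defaultdict(list)
    for item in raw_items:
        text = TAG_RE.sub("", item.get("body", "")).strip()
        if text:
            grouped[item.get("location", "unknown")].append(text)
    return dict(grouped)
```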
[0026] After the initial pre-processing, the transceiver (Tx/Rx) 32 can provide the information to the ranker 44, which can categorize a threat level to each node of interest that is posed by the potential disruptive condition(s) described in the data received from the information services 42. The ranker 44 can include analytics 46 that can further process the information. For example, the analytics 46 of the ranker 44 can select the threat level from a plurality of different, discrete threat levels 48 (e.g., defined in the action plan 30). In some examples, the analytics 46 of the ranker 44 can determine that a need exists for additional information about a potential disruptive condition. The ranker can alert the control 38 of the transceiver (Tx/Rx) 32 to generate a query to one or more information services 42 (e.g., the same information service and/or another information service) for the additional information related to the potential disruptive condition. For example, the additional query can be triggered by the ranker 44 when the action plan 30 defines that for a certain threat level, the system 20 is to query another one or more of the information services 42 for additional information. The subsequent query for additional information can include text or data extracted from the information already received. In this example, the categorization by the ranker 44 can be based on the information related to the potential disruptive condition and/or the further information related to the potential disruptive condition received in response to the subsequent query. As an example, the ranker 44 can categorize the threat level as "low", "medium", or "high." An example of a low threat level can be a hurricane with a potential path directed toward the geographical location in five days. An example of a medium threat level can be a hurricane off the coast with a path directed toward the geographical location in less than 2 days. An example of a high threat level can be a hurricane that is predicted to hit the geographical location within 2 hours. That is, in some examples, the threat level can be based on temporal proximity of the disruptive condition reaching the location of the data center.
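Taking the hurricane example, the temporal-proximity categorization could be sketched as follows in Python; the thresholds mirror the five-day/two-day/two-hour figures above and are otherwise assumptions:

```python
def categorize_threat(hours_until_impact: float) -> str:
    # Select a discrete threat level from how soon the disruptive condition
    # is predicted to reach the node's location.
    if hours_until_impact <= 2:
        return "high"     # predicted to hit within about 2 hours
    if hours_until_impact < 48:
        return "medium"   # path directed toward the location in under 2 days
    return "low"          # potential path, still several days out

# categorize_threat(120) -> "low"; categorize_threat(1.5) -> "high"
```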
[0027] The ranker 44 can send the categorized potential external disruptive conditions to an action unit 50. The action unit 50 can perform one or more actions 52 (A 1 - A S) based on the categorized threat level. The action plan 30 can define a respective set of one or more actions 52 to be performed by the action unit 50 of the location monitor system 20 for each threat level. For example, the one or more actions 52 can include logging information regarding the potential disruptive event, sending a warning to one or more administrators about the potential disruptive event (e.g., via email, text message, phone call, or the like), moving data from the potentially affected node to another node, and the like. In some examples, the action plan 30 can establish that increasing threat levels require more information related to the threat to be provided.
[0028] For example, where the threat levels are "low", "medium", and "high", in the case of the low threat level, the predefined action can include logging information related to the threat and/or rebalancing the data related to the affected node. In the case of the medium threat level, the predefined action can include logging the information related to the threat with more detail and sending a warning to an administrator of the data center. In the case of the high threat level, the predefined action can include logging the information related to the threat with even more detail, sending a warning to the administrator and another person related to the organization, and automatically migrating data from the affected node to a node at another location (e.g., based on the rack identification information and the location on the rack identification information). Thus, in some examples, the actions performed at each successively increasing threat level can include the actions of the lower threat level as well as one or more additional actions.
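A sketch of that cumulative structure follows, in Python; the three action functions are hypothetical stand-ins for the actions 52:

```python
def log_threat(node: str, info: str) -> None:
    print(f"[log] {node}: {info}")

def warn_admin(node: str, info: str) -> None:
    print(f"[warn] administrator notified about {node}")

def migrate_data(node: str, info: str) -> None:
    print(f"[migrate] moving data off {node} to a node at another location")

# Each level performs the lower level's actions plus one or more additions.
ACTIONS_BY_LEVEL = {
    "low":    [log_threat],
    "medium": [log_threat, warn_admin],
    "high":   [log_threat, warn_admin, migrate_data],
}

def execute_plan(level: str, node: str, info: str) -> None:
    for action in ACTIONS_BY_LEVEL[level]:
        action(node, info)
```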
[0029] FIG. 4 illustrates an example of an action plan 30. As mentioned above, the action plan can include a respective set of one or more actions to be performed by the location monitor system based on the selected threat level. For example, the action plan 30 can be defined by an administrator upon setup of the system 10. In other examples, the action plan 30 can be edited and re-defined by an administrator at any time after the setup of the system 10. The action plan 30 can define different actions to perform (e.g., one or more of actions A 1 - A S) for different threat levels 48 (e.g., level 1 threat - level R threat). The action plan 30 can also include other information related to operations of the system 10 of FIG. 1. For example, the action plan 30 can include control data 64 that can define operations of the control 38 of transceiver (Tx/Rx) 32. As another example, the action plan 30 can include contact information 66 for the administrator and any other person(s) in a line of contacts (and can include information defining the line of contacts). The contact information 66 can include email addresses, mobile telephone numbers, landline telephone numbers, names, addresses, etc. In a further example, the action plan 30 can include a threat log 68 that includes logged information about potential disruptive conditions. In yet another example, the action plan includes managing movement of data from a data store corresponding to a given node to another data store corresponding to a node at another location that is different from the physical location of the given node. For instance, the action plan 30 can include balance data 60 that can define clusters of nodes and the current balance of the data between the nodes in the clusters. The balance data 60 can also include a defined frequency for rebalancing the data (e.g., at a scheduled interval and/or in the presence of a potential disruptive condition). Additionally, the balance data 60 can be utilized for urgent data migration. In a further example, the action plan 30 can include preferences 62 related to the way that data is migrated from a potentially failing node to another node.
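The contents enumerated above suggest a record of roughly the following shape (a sketch only; the Python field names are ours, not the disclosure's):

```python
from dataclasses import dataclass, field

@dataclass
class ActionPlan:
    threat_levels: dict = field(default_factory=dict)    # level -> [action name, ...]
    control_data: dict = field(default_factory=dict)     # transceiver control 38 settings
    contacts: list = field(default_factory=list)         # ordered line of contacts
    threat_log: list = field(default_factory=list)       # logged potential disruptive conditions
    balance_data: dict = field(default_factory=dict)     # cluster balance + rebalancing frequency
    migration_prefs: dict = field(default_factory=dict)  # how data leaves a failing node
```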
[0030] As a further example, the illustrated action plan 30 shows a plurality of threat levels: level 1 threat, level 2 threat, and level R threat, where R is a positive integer denoting the number of threat levels. The action plan 30 can establish criteria for determining the level 1 threat, the level 2 threat, and the level R threat. For example, the criteria can include: a risk of damage to the node due to the potential disruptive condition, a time associated with the potential disruptive condition, a proximity of the potential disruptive condition to the geographical location of the node, etc. The action plan 30 can establish one or more actions 52 (e.g., A 1 - A S, where S is a positive integer) for action unit 50 of the location monitor system 20 to execute corresponding to the respective threat level. For example, for a level 1 threat, the action (A 1) can be to log information related to the potential disruptive condition (e.g., within threat log 68). As another example, for a level 2 threat, the action (A 2) can be to log additional information related to the potential disruptive condition and the action (A 3) can be to send a warning to an administrator (listed in the contact information 66). As a further example, for a level R threat, the action (A 4) can be to log additional information related to the potential disruptive condition, the action (A 5) can be to send a warning to multiple administrators, and the action (A S) can be to perform an urgent action (e.g., data migration according to the preferences 62 and the balance data 60).
[0031] In view of the foregoing structural and functional features described above, example methods will be better appreciated with reference to FIGS. 5-7. While, for the purposes of simplicity of explanation, the example methods of FIGS. 5-7 are shown and described as executing serially, the present examples are not limited by the illustrated order, as some actions could in other examples occur in different orders from and/or concurrently with those shown and described herein. Moreover, it is not necessary that all described actions be performed to implement a method. The method can be stored in one or more non-transitory computer readable media and executed by one or more processing resources, such as disclosed herein.
[0032] FIG. 5 illustrates a flowchart of an example method 70 for implementing a location aware failover solution. For example, the method 70 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., non-transitory memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 70. In some examples, the location aware failover solution can be employed preemptively and automatically based on the location awareness for one or more nodes in a data center system.
[0033] At 72, a potential disruptive condition related to a node can be discovered. For example, a potential disruptive condition (e.g., a planned disruptive condition or an unplanned disruptive condition, such as due to an environmental condition, human error or machine error) can cause the node to fail. In some examples, the discovery can be based on the location of the node. In other examples, the discovery can be based on a detected unresponsiveness of the node. In still other examples, the discovery can be based on a reduced signal strength from the node. The discovery of the potentially failing node can automatically trigger the failover solution. For example, the failover solution can include migration of data or applications from a nonresponsive server to a responsive server. At 74, the nodes that are available for failover (e.g., to receive the data migrated from the potentially failing node) can be determined. For example, nodes (e.g., nodes in a cluster, nodes at a location, and/or nodes in the data center system 14) can be examined one-by-one and their availability determined.
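Discovery by unresponsiveness could be sketched as a simple liveness probe (Python; the port and timeout chosen here are assumptions):

```python
import socket

def node_responsive(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    # A node that stops accepting connections is treated as potentially failing.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_available(nodes: dict) -> list:
    # Examine nodes one-by-one and keep those able to receive failover data.
    return [node_id for node_id, host in nodes.items() if node_responsive(host)]
```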
[0034] At 76, location information can be received for the nodes that are deemed available for the failover. For example, the location information can include LDS location information, including rack identification information and position on the rack
identification information. As another example, the location information can include GPS information, including global geographical location information. In a further example, the location information can include LDS information and GPS information. At 78, one or more suitable nodes for the failover can be determined. For example, the most suitable node can be determined based on the location information. Accordingly, the most suitable node can be a node that is physically located the farthest away from the non-responsive node. For example, the most suitable node can be located on a different rack than the potentially failing node. In another example, the most suitable node can be located on a different floor than the potentially failing node. In a further example, the most suitable node can be located at a different geographical location than the potentially failing node.
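For the geographical case, "farthest away" could be computed with a great-circle distance, as in the following sketch (Python; the candidate-node dictionary keyed by identifier is an assumption):

```python
import math

def haversine_km(a: tuple, b: tuple) -> float:
    # Great-circle distance between two (lat, lon) points, in kilometres.
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def most_suitable(failing: tuple, candidates: dict) -> str:
    # Pick the available node physically farthest from the failing one.
    return max(candidates, key=lambda n: haversine_km(failing, candidates[n]))

# most_suitable((35.5, -97.5), {"east": (40.7, -74.0), "west": (37.4, -122.1)})
```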
[0035] FIG. 6 illustrates a flowchart of an example method 80 for employing location awareness to trigger an automated failover solution to reduce the impact of a potential disruptive condition. For example, the method 80 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 80. At 82, information related to a potential disruptive condition at a node can be received (e.g., by receiver 36). At 84, a threat level posed by the potential disruptive condition to the node can be categorized (e.g., by ranker 44). At 86, an action that is defined by an action plan based on the threat level can be performed (e.g., by action unit 50).
[0036] FIG. 7 illustrates a flowchart of another example method 90 for employing location awareness to trigger an automated failover solution to reduce the impact of a potential disruptive condition (e.g., planned downtime, human error, machine error, weather, earthquake, terrorism, fire, violence, etc.). For example, the method 90 can be executed by a system (e.g., location monitor system 20) that can include a non-transitory memory (e.g., memory 24) that stores machine executable instructions and a processing resource (e.g., processing resource 26) to access the memory and execute the instructions to cause a computing device to perform the acts of method 90. In some examples, the location aware failover solution can be executed preemptively based on the location awareness.
[0037] At 92, a subscription to an information service (e.g., one or more of IS 1 - IS P) can begin. For example, the information service can be one or more of a news service, a weather service, a geological service, or the like. The service can also be a local service that includes information regarding network conditions. At 94, information can be received (e.g., by the transceiver 32 from an information service IS 1 - IS P 42) related to a potential disruptive condition at a location where one or more nodes reside (e.g., information related to planned downtime, information related to human error, information related to machine error, information related to a weather condition, information related to a terrorist threat, information related to an earthquake threat, information related to a threat of violence, etc.). For example, the information can be gathered based on location information of the node. The location information can include LDS location information, including rack identification information and position on the rack identification information. As another example, the location information can include GPS information, including global geographical location information. In a further example, the location information can include LDS information and GPS information.
[0038] At 96, a threat level posed by the potential disruptive condition to the node can be selected from a plurality of different threat levels (e.g., by ranker 44). For example, the plurality of different threat levels can be defined in an action plan (e.g., action plan 30). As an example, the plurality of threat levels can include a low threat level, a medium threat level, and a high threat level. An example of a low threat level can be a hurricane with a potential path directed toward the geographical location in a predefined number of days (e.g., about five days). An example of a medium threat level can be a hurricane with a path directed toward the geographical location in a predefined period that is shorter than that of the low threat level (e.g., less than about 24 hours). An example of a high threat level can be a hurricane that is on an imminent track to intercept the geographical location in a still shorter period of time (e.g., about 2 hours). In some examples (e.g., based on the categorized threat level), the action plan can require an additional query of one or more information services external to the location monitor system for acquiring additional information related to the potential disruptive condition. The additional query can request the additional information based on information already received and utilized to categorize the current disruptive condition. In this example, the categorization can be further based on the additional information.
[0039] At 98, a set of actions can be selected (e.g., by action unit 50) that is defined by an action plan (e.g., action plan 30) based on the threat level. The set of actions can, for example, be at least a portion of the automatic failover solution. For example, the action defined by the action plan can include a respective set of one or more actions to be performed by the location monitor system (e.g., system 20) based on the category of the identified threat level. For example, the action can include logging the information related to the potential disruptive condition, wherein the information related to the potential disruptive condition comprises location data describing the physical location of the node and a portion of the information related to the potential disruptive condition. As another example, the action can include rebalancing data within a cluster comprising the node and at least one other node that is located at a different location from the physical location of the node. In a further example, the action can include sending a warning message (e.g., including location data describing the physical location of the node, a portion of the information related to the potential disruptive condition, and disruptive condition data describing the selected threat level) to an administrator. In yet another example, the action can include moving data from the node to another node at another location that is different from the physical location of the node. At 100, the set of actions can be performed (e.g., by action unit 50).
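The warning-message action, for instance, could be composed as follows (a sketch in Python; the message format is illustrative only):

```python
def build_warning(node_id: str, address: str, condition: str, level: str) -> str:
    # Combine the node's physical location, a portion of the condition
    # information, and the selected threat level, as described above.
    return (f"[{level.upper()}] Node {node_id} at {address} may be affected: "
            f"{condition[:140]}")

# build_warning("node-7", "Oklahoma City, OK, USA",
#               "Tornado warning issued for central Oklahoma ...", "medium")
```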
[0040] FIG. 8 illustrates an example of a non-transitory computer readable medium 110 to store machine readable instructions to implement a location monitor system. When the non-transitory computer readable medium 110 is accessed by a processing resource 112, the machine readable instructions can be executed to cause a computing device to perform operations to implement the location monitor system. The instructions can include receiving instructions 116 to receive information related to a potential disruptive condition at a node in a data center system. For example, the information related to the potential disruptive condition can be based on a location of the node. The instructions can also include querying instructions 118 to query an information service external to the location monitor system for additional information related to the potential disruptive condition. The instructions can also include selecting instructions 120 to select a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the information related to the potential disruptive condition and the additional information related to the potential disruptive condition. The instructions can also include performing instructions 122 to perform an action that is defined by an action plan based on the categorized threat level.
[0041] FIG. 9 illustrates an example of a location monitor system 130. The location monitor system 130 can include various components that can be hardware, software (e.g., stored in a non-transitory memory and executable by a processing resource to cause the performance of operations), and/or a combination of hardware and software. The components can include a transceiver 132 to receive data related to a potential disruptive condition to a node in a data center system from an information source external to the system. The transceiver 132 can also request additional data related to the potential disruptive condition. The transceiver 132 can preprocess the data related to the potential disruptive condition and the additional data. The
components can also include a ranker 134 to select a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the preprocessed data related to the potential disruptive condition and the additional data. The components can also include an action unit 136 to perform an action that is defined by an action plan based on the selected threat level. The action plan can define a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
[0042] What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and
permutations are possible. Accordingly, the disclosure is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite "a," "an," "a first," or "another" element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term "includes" means includes but not limited to, and the term "including" means including but not limited to. The term "based on" means based at least in part on.

Claims

What is claimed is:
1. A method, comprising:
receiving, by a location monitor system comprising a non-transitory memory and a processing resource, information related to a potential disruptive condition at a node in a data center system, wherein the information related to the potential disruptive condition is based on a physical location of the node;
categorizing, by the location monitor system, a threat level posed by the potential disruptive condition to the node; and
performing, by the location monitor system, an action that is defined by an action plan based on the categorized threat level.
2. The method of claim 1, further comprising determining the location of the node based on at least one of rack identification information, position of the node on the rack identification information, and global location information.
3. The method of claim 1, wherein the categorizing of the threat level further comprises selecting the threat level from a plurality of different threat levels, wherein the action defined by the action plan comprises a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
4. The method of claim 3, wherein the action comprises logging the information related to the potential disruptive condition, wherein the information related to the potential disruptive condition comprises location data describing the physical location of the node and a portion of the information related to the potential disruptive condition.
5. The method of claim 3, wherein the action comprises rebalancing data within a cluster comprising the node and at least one other node that is located at a different location from the physical location of the node.
6. The method of claim 3, wherein the action comprises sending a warning message to an administrator, wherein the warning message comprises location data describing the physical location of the node, a portion of the information related to the potential disruptive condition, and disruptive condition data describing the selected threat level.
7. The method of claim 3, wherein the action comprises managing movement of data from the node to another node at another location that is different from the physical location of the node.
8. The method of claim 1, wherein the information related to the potential disruptive condition comprises at least one of information related to a weather condition, information related to a terrorist threat, information related to an earthquake threat, and information related to a threat of violence.
9. The method of claim 1, further comprising querying, by the location monitor system, an information service external to the location monitor system for additional information related to the potential disruptive condition, wherein the categorizing the threat level is based on the additional information received from the information service.
10. The method of claim 9, wherein the querying of the information service is based on the categorized threat level associated with the potential disruptive condition.
11. A non-transitory computer readable medium to store machine readable instructions that when accessed and executed by a processing resource cause a computing device to perform operations, the operations comprising:
receiving information related to a potential disruptive condition at a node in a data center system, wherein the information related to the potential disruptive condition is based on a location of the node;
querying an information service external to the location monitor system for additional information related to the potential disruptive condition;
selecting a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the information related to the potential disruptive condition and the additional information related to the potential disruptive condition; and
performing an action that is defined by an action plan based on the categorized threat level.
12. The non-transitory computer readable medium of claim 11, wherein the location of the node is based on at least one of rack identification information, position of the node on the rack identification information, and global positioning information.
13. The non-transitory computer readable medium of claim 11 , wherein the action that is defined by the action plan comprises a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
14. The non-transitory computer readable medium of claim 11, wherein the information related to the potential disruptive condition comprises at least one of information related to a weather condition, information related to a terrorist threat, information related to an earthquake threat, and information related to a threat of violence.
15. A location monitor system, comprising:
a transceiver to receive data related to a potential disruptive condition to a node in a data center system from an information source external to the system, request additional data related to the potential disruptive condition, and preprocess the data related to the potential disruptive condition and the additional data;
a ranker to select a threat level posed by the potential disruptive condition to the node from a plurality of different threat levels based on the preprocessed data related to the potential disruptive condition and the additional data; and
an action unit to perform an action that is defined by an action plan based on the selected threat level, wherein the action plan defines a respective set of one or more actions to be performed by the location monitor system based on the selected threat level.
PCT/IN2015/000024 2014-10-30 2015-01-16 Location aware failover solution WO2016067299A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN5423CH2014 2014-10-30
IN5423/CHE/2014 2014-10-30

Publications (1)

Publication Number Publication Date
WO2016067299A1 (en) 2016-05-06

Family

ID=55856708

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2015/000024 WO2016067299A1 (en) 2014-10-30 2015-01-16 Location aware failover solution

Country Status (1)

Country Link
WO (1) WO2016067299A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101442479A (en) * 2007-11-22 2009-05-27 华为技术有限公司 Method, equipment and system for updating route in P2P peer-to-peer after node failure
CN101867919A (en) * 2010-07-16 2010-10-20 王晓喃 IPv6 address configuration method of wireless sensor network based on geographical position
CN101930463A (en) * 2010-08-25 2010-12-29 中国运载火箭技术研究院 Memory database-based simulation grid node quick migration method
CN104506576A (en) * 2014-12-03 2015-04-08 常州大学 Wireless sensor network and node task migration method thereof

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664570B1 (en) * 2015-10-27 2020-05-26 Blue Cross Blue Shield Institute, Inc. Geographic population health information system
US11023563B2 (en) 2015-10-27 2021-06-01 Blue Cross Blue Shield Institute, Inc. Geographic population health information system
US11550842B2 (en) 2015-10-27 2023-01-10 Blue Cross And Blue Shield Association Geographic population health information system
US11954146B2 (en) 2015-10-27 2024-04-09 Blue Cross And Blue Shield Association Geographic population health information system
WO2018126189A1 (en) * 2016-12-30 2018-07-05 AEA International Holdings Pte. Ltd. Systems and methods for web-based health and security management
CN111488675A (en) * 2020-03-18 2020-08-04 四川大学 Mining method for cascading failure potential trigger mode of power system
CN112254573A (en) * 2020-10-09 2021-01-22 中国人民解放军91404部队 Grading method for air electromagnetic threat training scene


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15855090; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15855090; Country of ref document: EP; Kind code of ref document: A1)