US10079713B2 - Determining statuses of computer modules - Google Patents
- Publication number
- US10079713B2 (Application US14/750,549)
- Authority
- US
- United States
- Prior art keywords
- computer modules
- computer
- service
- modules
- computing system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/008—Reliability or availability analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0806—Configuration setting for initial configuration or provisioning, e.g. plug-and-play
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0813—Configuration setting characterised by the conditions triggering a change of settings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
Definitions
- This description relates to computer systems.
- Computer systems may use multiple software applications and/or services to facilitate the operations of enterprises such as businesses.
- the services may not all be equally important to keeping those operations running smoothly.
- some of the services may rely on multiple computer modules, and some of the computer modules may be required to implement multiple services.
- if a computer module that is required to implement multiple services fails, multiple important services could also fail.
- a non-transitory computer-readable storage medium may include instructions stored thereon for ranking multiple computer modules to reduce failure impacts.
- when executed by at least one processor, the instructions may be configured to cause a computing system implementing the multiple computer modules to at least associate the multiple computer modules with multiple services that rely on the multiple computer modules, at least one of the multiple services relying on more than one of the multiple computer modules, determine values of the multiple services, and rank the multiple computer modules based on the determined values of the multiple services with which the respective multiple computer modules are associated.
- a computing system may include at least one processor, and a non-transitory computer-readable storage medium comprising instructions stored thereon for ranking multiple computer modules to reduce failure impacts.
- when executed by the at least one processor, the instructions may be configured to cause the computing system to at least associate multiple computer modules with multiple services that rely on the multiple computer modules, at least one of the multiple services relying on more than one of the multiple computer modules, determine values of the multiple services, and rank the multiple computer modules based on the determined values of the multiple services with which the respective multiple computer modules are associated.
- a method for ranking multiple computer modules to reduce failure impacts may include provisioning multiple services that rely on the multiple computer modules, in conjunction with the provisioning the multiple services, associating the multiple computer modules with multiple services that rely on the multiple computer modules, at least one of the multiple services relying on more than one of the multiple computer modules, determining values of the multiple services, the values of each of the multiple services being based on a determined importance of the respective service, determining scores of the multiple computer modules based on the determined values of the multiple services that rely on the respective computer modules, ranking the multiple computer modules based on the determined scores of the multiple computer modules, re-determining the values of the multiple services based on a present time being included in a different part of a schedule for at least one of the multiple services than a previous time, re-ranking the multiple computer modules based on the re-determined values of the multiple services, selecting at least one of the multiple computer modules for failover support based on the rank of the at least one of the computer modules, and providing the failover support to
- FIG. 1 is a block diagram of a computing system for ranking computer modules according to an example implementation.
- FIG. 2 is a flowchart of a method for ranking computer modules according to an example implementation.
- FIG. 3 is a diagram of a service model used in ranking computer modules according to an example implementation.
- FIG. 4A is a table showing costs of a service at different times based on a schedule according to an example implementation.
- FIG. 4B is a table showing costs of a machine reservation service at different times based on another schedule according to an example implementation.
- FIG. 4C is a table showing a cost of a payroll service according to an example implementation.
- FIG. 4D is a table showing a cost of a human resource (HR) service according to an example implementation.
- FIG. 5A is a table showing costs of services at a specified time according to an example implementation.
- FIG. 5B is a table showing scores of computer modules included in the service model shown in FIG. 3 according to an example implementation.
- FIG. 5C is a table showing the computer modules ranked according to the scores shown in FIG. 5B according to an example implementation.
- a computing system, such as a computer network including a datacenter, may include multiple computers that implement multiple computer modules to host, implement, and/or provide multiple services.
- the multiple computer modules, which may also be considered configuration items (CIs), may implement the multiple services.
- the computer modules may include virtual machines, hypervisors, web servers, software applications, and/or database servers. Some computer modules may implement a single service, some computer modules may implement multiple services, and some services may require multiple computer modules to implement the service.
- the computer modules may be prioritized and/or ranked (such as from most important to least important) for correction (such as by administrators), have their frequencies of monitoring determined and/or changed, be selected for decommissioning, be selected for provisioning a new service, and/or be selected for failover support, as non-limiting examples.
- the failover support for a computer module may include allocating one or more redundant computer modules to the service supported by the computer module so that if the computer module fails, the one or more redundant computer modules can still support the service.
- the rankings may be used to minimize the impact of disruptions or failure by the computer modules in a computer network with shared resources, the computer network being managed by the computing system described herein.
- FIG. 1 is a block diagram of a computing system 100 for ranking computer modules according to an example implementation.
- the computing system 100 may perform functions similar to those described above.
- the computing system 100 may include a service maintainer 104 .
- the service maintainer 104 may maintain the services that have been launched and/or provisioned by the service provisioner 102 .
- the service maintainer 104 may, for example, ensure that adequate computing resources continue to be provided to each of the services, and determine whether any of the services are interrupted, such as if a service drops or fails.
- the service maintainer 104 may also allocate failover resources to services and/or computer modules based on their ranks, such as by allocating failover resources to services with the highest ranks.
- the service maintainer 104 may allocate failover resources to services and/or computer modules by assigning more and/or redundant computer modules to the services so that, in the event that one or more of the computer modules assigned to the service fails, the service will still continue to function by relying on the remaining computer modules.
- the service maintainer 104 may assign the failover resources to the highest ranked computer modules.
- the service maintainer 104 may also determine that some of the services and/or computer modules should be decommissioned.
- the service maintainer 104 may determine that some of the services should be decommissioned based on their ranks, such as by decommissioning the service(s) and/or computer modules with the lowest rank(s).
- the service maintainer 104 may determine that one or more of the services and/or computer modules should be decommissioned based on insufficient computing resources, such as insufficient computer modules, to support all of the services.
- the service maintainer 104 may decommission services and/or computer modules, starting with the services and/or computer modules with lowest importance, thereby freeing the computer modules that supported and/or implemented the decommissioned services, until sufficient computing resources exist to support and/or implement all the services.
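The decommissioning loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the service maintainer 104: the tuple layout, the abstract `capacity` and `demand` units, and the function name `decommission_until_sufficient` are all hypothetical.

```python
def decommission_until_sufficient(services, capacity):
    """Drop the least important services until the rest fit in `capacity`.

    `services` is a list of (name, importance, demand) tuples, where
    `demand` is the amount of computing resources the service needs
    (hypothetical units). Returns (kept_names, decommissioned_names).
    """
    # Most important first, so the least important service is at the end.
    kept = sorted(services, key=lambda s: s[1], reverse=True)
    decommissioned = []
    # Keep freeing resources until the remaining services can be supported.
    while kept and sum(s[2] for s in kept) > capacity:
        decommissioned.append(kept.pop())
    return [s[0] for s in kept], [s[0] for s in decommissioned]
```

For example, with services demanding 4, 2, and 2 units and a capacity of 6, only the least important service is decommissioned, because removing it already frees enough resources for the others.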
- the computing system 100 may include a schedule maintainer 106 .
- the schedule maintainer 106 may maintain a schedule for each of the services, and/or determine what part of a schedule a particular time falls into. Some services may be active during some dates and/or times and inactive during other dates and/or times, or may have greater importance or criticality at some dates and/or times than other dates and/or times.
- the schedule maintainer 106 may maintain these dates and/or times of activity or inactivity, or criticality.
- the schedule maintainer 106 may also maintain importances of each of the services for each part of the schedule.
- the services may have different importances at different parts of their respective schedules, reflecting their changing importances at different times, as shown and described below with respect to FIGS. 4A, 4B, 4C, and 4D . Not all services may be equally important, and the importances and/or costs of the services may be based on a cost of impact to business operations if the services become unavailable.
- the computing system 100 may include a service importance determiner 108 .
- the service importance determiner 108 may determine the importances, which may be relative importances, of the services launched by the service provisioner 102 and/or maintained by the service maintainer 104 .
- the service importance determiner 108 may, for example, assign a numerical value or score to each of the services.
- the numerical value or score of each service may be based on the determined importance of each service.
- the importance may change based on the date and/or time, such as based on active or critical times maintained by the schedule maintainer 106 .
- the service importance determiner 108 may determine the importances based on input received by the computing system 100 from an administrator, or based on importances included in a file or template, as non-limiting examples.
- the computing system 100 may include a service and module associater 110 .
- the service and module associater 110 may associate services with the computer modules needed to implement the services.
- the service and module associater 110 may, for example, associate services with the computer modules that the service provisioner 102 determined were needed to implement the services, and may associate the services with the computer modules in response to the service provisioner 102 provisioning the services. At least one of the services may rely on more than one of the multiple computer modules.
- the computing system 100 may include a module maintainer 112 .
- the module maintainer 112 may maintain the computer modules (which may also be considered configuration items), such as by ensuring that they are continuing to run properly and have not been overloaded with requests from more services than the computer modules have resources to handle.
- the module maintainer 112 may also decommission, end, and/or turn off a computer module.
- the computing system 100 may include a module associater 117 .
- the module associater 117 may associate the computer modules with the services.
- the module associater 117 may interface and/or communicate with the service and module associater 110 to determine the services with which each computer module is associated and/or which each computer module supports.
- the computing system 100 may include a module score determiner 118 .
- the module score determiner 118 may determine scores for each computer module.
- the module score determiner 118 may determine the scores for each computer module based on the importance(s) of the services that are associated with and/or dependent on the respective computer module.
- the module score determiner 118 may, for example, add (and/or determine a sum of) the numerical importances of each of the services associated with the respective computer module.
- the module score determiner 118 may divide the numerical importance of the service by the number of computer modules associated with the service (and/or multiply, for each of the computer modules associated with the service, the importance by a fraction less than one, with the fractions for all of the computer modules associated with the service adding up to one) and add the quotient (and/or product) to the score of each computer module. If each of the computer modules is necessary for the service to function, the module score determiner 118 may add the numerical importance of the service to each of the computer modules associated with the service.
- the module score determiner 118 may either divide the numerical importance by the number of computer modules (or multiply the numerical importance by a fraction which may be the same or different for each of the computer modules), or add the numerical importance to each of the computer modules, based on whether each of the computer modules is necessary for the performance of the service and/or whether a given computer module could be replaced with another computer module while the service is still provided and/or performed.
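The two scoring rules described above (full importance for a necessary module, a fractional share for interchangeable modules) can be sketched in a few lines. This is an illustrative sketch only; the dictionary keys `importance`, `required`, and `shared`, and the helper names `score_modules` and `equal_shares`, are assumptions introduced for this example.

```python
from collections import defaultdict


def score_modules(services):
    """Return {module: score} for a list of service descriptions.

    Each service is a dict with (hypothetical) keys:
      "importance": numeric importance of the service
      "required":   modules that are each necessary for the service
      "shared":     {module: fraction} for interchangeable modules,
                    where the fractions for one service sum to one
    """
    scores = defaultdict(float)
    for service in services:
        importance = service["importance"]
        # A necessary module receives the service's full importance.
        for module in service.get("required", []):
            scores[module] += importance
        # Interchangeable modules split the importance by their fractions.
        shared = service.get("shared", {})
        if shared and abs(sum(shared.values()) - 1.0) > 1e-9:
            raise ValueError("fractions for a service must sum to one")
        for module, fraction in shared.items():
            scores[module] += importance * fraction
    return dict(scores)


def equal_shares(modules):
    """Divide a service's importance evenly among its modules."""
    return {module: 1.0 / len(modules) for module in modules}
```

Passing `equal_shares(...)` as the `shared` entry corresponds to dividing the importance by the number of modules; passing unequal fractions corresponds to the weighted alternative.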
- the computing system 100 may include a module selector 120 .
- the module selector 120 may select computer modules for provisioning and/or supporting services. If new computer modules are not available, the module selector 120 may select a lower-ranked computer module to provision the new service, so that a less important service dependent on the lower-ranked computer module may be decommissioned rather than a more important service dependent on a higher-ranked computer module.
- the module selector 120 may select computer modules to support the services based on the resource needs of the services and/or based on the resources available at each of the computer modules.
- the computing system 100 may include a ranker 122 .
- the ranker 122 may rank the computer modules based on determined importances of the multiple services, and/or based on the scores of the computer modules that the module score determiner 118 determined.
- the ranker 122 may, for example, rank the computer modules in descending order with the computer modules that have the highest scores as the highest ranked, or rank the computer modules in ascending order with the computer modules that have the lowest scores as the highest ranked.
- the computing system 100 may include a computation trigger 124.
- the computation trigger 124 may prompt the module score determiner 118 to re-determine and/or re-compute the scores of the computer modules, and/or prompt the service importance determiner 108 to re-determine the importances and/or values of the services, and/or prompt the ranker 122 to re-rank the computer modules.
- the re-computation, re-determining, and/or re-ranking may update the scores, values, and/or ranks in response to events and enable the computing system 100 to minimize the impacts of failures by providing support to the most important computer modules and/or distributing resources in such a manner as to reduce the impact of any one computer module failing.
- the computation trigger 124 may prompt the module score determiner 118 to re-determine and/or re-compute the scores of the computer modules, and/or prompt the service importance determiner 108 to re-determine the importances of the services, and/or prompt the ranker 122 to re-rank the computer modules, in response to events such as the present time being within a different part and/or period of a schedule for at least one of the services than a previous time, preconfigured and/or predetermined events occurring in a computer network monitored by the computing system 100, a preconfigured event associated with at least one of the computer modules exceeding a time threshold, a new computer module being added to the computer network that the computing system 100 is monitoring, or a computer module ceasing to function properly, as non-limiting examples.
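One way to picture the computation trigger 124 is as an event handler that re-runs the scoring and ranking whenever one of the listed events is reported. The sketch below is a hedged illustration: the `Event` names, the `ComputationTrigger` class, and the `recompute` callback are hypothetical and only mirror the example events named in the text.

```python
from enum import Enum, auto


class Event(Enum):
    """Example events that may prompt re-computation (names are illustrative)."""
    SCHEDULE_PERIOD_CHANGED = auto()
    PRECONFIGURED_NETWORK_EVENT = auto()
    EVENT_EXCEEDED_TIME_THRESHOLD = auto()
    MODULE_ADDED = auto()
    MODULE_FAILED = auto()


class ComputationTrigger:
    """Re-runs a ranking callback whenever a recognized event is reported."""

    def __init__(self, recompute):
        # `recompute` is any zero-argument callable returning a fresh ranking.
        self._recompute = recompute
        self.ranking = recompute()

    def notify(self, event):
        if not isinstance(event, Event):
            raise TypeError("event must be a member of Event")
        # Every recognized event refreshes values, scores, and ranks together.
        self.ranking = self._recompute()
        return self.ranking
```

Keeping the trigger separate from the scoring logic means new event types can be added without touching how scores are calculated.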
- the computing system 100 may include at least one processor 128 .
- the at least one processor 128 may include a processor, such as a microprocessor, capable of executing stored instructions to execute any of the functions, methods, or processes described herein.
- the computing system 100 may include at least one memory device 130 .
- the at least one memory device 130 may store data and/or instructions.
- the data may include data and/or information used to perform, and/or generated as a result of, any of the functions, methods, or processes described herein.
- the instructions may include instructions for the at least one processor 128 to execute any of the functions, methods, or processes described herein.
- the computing system 100 may include at least one input/output device 132 .
- the input/output device 132 may include one or more input devices which receive data from other computing systems and/or receive user input, and one or more output devices which send data to other computing systems and/or provide output to one or more users.
- the at least one processor 128 , at least one memory 130 , and at least one input/output device 132 may be included in a single computing device, or may be distributed among multiple computing devices in a distributed system.
- FIG. 2 is a flowchart of a method 200 for ranking computer modules to reduce failure impacts according to an example implementation.
- the method 200 may include associating the multiple computer modules with multiple services that rely on the multiple computer modules, at least one of the multiple services relying on more than one of the multiple computer modules ( 202 ).
- the method 200 may also include determining values of the multiple services ( 204 ).
- the method 200 may also include ranking the multiple computer modules based on the determined values of the multiple services with which the respective multiple computer modules are associated ( 206 ).
- the associating the multiple computer modules with the multiple services that rely on the multiple computer modules is performed in conjunction with provisioning the multiple services.
- the determined value of each of the multiple services is based on a determined importance of the respective service.
- the ranking the multiple computer modules includes ranking each of the multiple computer modules based on the determined values of the multiple services that rely on the respective computer modules.
- the method 200 may further include determining scores of the multiple computer modules based on the values of the multiple services with which the multiple computer modules are associated.
- the ranking the multiple computer modules may include ranking the multiple computer modules based on the determined scores of the multiple computer modules.
- the method 200 may further include re-determining the values of the multiple services based on a present time being included in a different part of a schedule for at least one of the multiple services than a previous time, and re-ranking the multiple computer modules based on the re-determined values of the multiple services.
- the method 200 may further include re-determining the values of the multiple services based on a preconfigured event occurring in a computer network managed by the computing system, and re-ranking the multiple computer modules based on the re-determined values of the multiple services.
- the method 200 may further include re-determining the values of the multiple services based on a preconfigured event associated with at least one of the computer modules exceeding a time threshold, and re-ranking the multiple computer modules based on the re-determined values of the multiple services.
- the method 200 may further include re-determining the values of the multiple services based on a new computer module being added to a computer network managed by the computing system, and re-ranking the multiple computer modules based on the re-determined values of the multiple services.
- the method 200 may further include re-determining the values of the multiple services based on at least one of the multiple computer modules ceasing to function properly, and re-ranking the multiple computer modules based on the re-determined values of the multiple services.
- the method 200 may further include increasing a frequency of monitoring at least one of the multiple computer modules based on the rank of the at least one computer module.
- the method 200 may further include decreasing a frequency of monitoring at least one of the multiple computer modules based on the rank of the at least one computer module.
- the method 200 may further include selecting at least one computer module for which to decrease a frequency of monitoring based on the rank of the at least one computer module.
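As a rough illustration of rank-based monitoring, the sketch below maps a ranking to polling intervals so that higher-ranked modules are checked more often. The linear spacing, the default bounds of 30 and 600 seconds, and the function name `monitoring_intervals` are assumptions made for this example; the text does not prescribe how the frequency is derived from the rank.

```python
def monitoring_intervals(ranking, fastest_s=30, slowest_s=600):
    """Map each module to a polling interval in seconds.

    `ranking` lists modules from highest ranked to lowest ranked. The top
    module is polled every `fastest_s` seconds, the bottom one every
    `slowest_s` seconds, and the rest are spaced linearly in between.
    """
    count = len(ranking)
    if count == 1:
        return {ranking[0]: float(fastest_s)}
    intervals = {}
    for position, module in enumerate(ranking):
        # A shorter interval means a higher monitoring frequency.
        share = position / (count - 1)
        intervals[module] = fastest_s + share * (slowest_s - fastest_s)
    return intervals
```

When a re-ranking moves a module up, calling the function again shortens its interval (increasing the monitoring frequency); a module that moves down gets a longer interval.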
- the method 200 may further include decommissioning at least one of the computer modules based on the rank of the at least one of the computer modules.
- the method 200 may further include determining that available computing resources are insufficient to support all of the multiple services.
- the decommissioning is performed in response to the determining that available computing resources are insufficient to support all of the multiple services.
- the method 200 may further include selecting at least one of the computer modules for provisioning a new service based on the rank of the at least one of the computer modules.
- the method 200 may further include selecting at least one of the computer modules for failover support based on the rank of the at least one of the computer modules.
- the method 200 may further include providing the failover support to the selected at least one computer module by associating a redundant computer module with a service associated with the selected at least one computer module.
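The failover step can be sketched as pairing the highest-ranked modules with available spares and attaching each spare to every service that relies on the protected module. This is a simplified sketch under assumptions: the function name `provide_failover`, the one-spare-per-module pairing, and the data layout are hypothetical.

```python
def provide_failover(ranking, service_modules, spares):
    """Give redundant modules to the highest-ranked modules.

    `ranking` lists modules from highest ranked to lowest ranked,
    `service_modules` maps each service to the modules it relies on, and
    `spares` lists the available redundant modules. Returns a pair:
    ({protected_module: spare}, updated copy of `service_modules`).
    """
    updated = {svc: list(mods) for svc, mods in service_modules.items()}
    assignments = {}
    # zip stops at the shorter list, so spares go to the top ranks only.
    for module, spare in zip(ranking, spares):
        assignments[module] = spare
        for service, modules in service_modules.items():
            if module in modules:
                # The service can fall back on the spare if `module` fails.
                updated[service].append(spare)
    return assignments, updated
```

The input mapping is left unmodified, so a caller can compare the associations before and after failover support is provided.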
- FIG. 3 is a diagram of a service model 300 used in ranking computer modules according to an example implementation.
- the service model 300 may include and/or implement a service 302 , a machine reservation service 312 , a human resources (HR) service 318 , and a payroll service 324 . While four services 302 , 312 , 318 , 324 are implemented in the service model 300 shown in FIG. 3 , more or fewer services may be implemented by other example service models.
- the service 302 may include a web application platform.
- four computer modules may implement and/or support the web application platform and/or service 302 .
- the four computer modules may include three web servers 304 , 306 , 308 , and a database 310 .
- the three web servers 304 , 306 , 308 may share the load caused by Internet traffic to and from the web application platform, and when Internet traffic is low, the web application platform may not require all three web servers 304 , 306 , 308 at once.
- the web servers 304 , 306 , 308 may be part of a cluster, and each of the web servers 304 , 306 , 308 in the cluster may be assigned a relative weight within the cluster, which may be based on whether they are a primary web server or a backup web server.
- the module score determiner 118 may assign a primary web server 304 a weight of fifty percent (50%) of the importance of the service 302, a secondary web server 306 a weight of thirty percent (30%) of the importance of the service 302, and a third web server 308 a weight of twenty percent (20%) of the importance of the service 302.
- the service 302 may require the database 310 to implement the web application platform, causing the module score determiner 118 to assign the database 310 a score equal to the full importance of the service 302 .
- the three web servers 304 , 306 , 308 and the database 310 may be dedicated solely to the service 302 .
- the service model 300 may also include and/or implement the machine reservation service 312 .
- the machine reservation service 312 may require two computer modules, a machine reservation application 314 and a machine database 316 . Both the machine reservation application 314 and the machine database 316 may be required to implement the machine reservation service 312 , and both the machine reservation application 314 and the machine database 316 may be dedicated solely to the machine reservation service 312 , causing the module score determiner 118 to assign the machine reservation application 314 and the machine database 316 each a score equal to the importance of the machine reservation service 312 .
- the service model 300 may also include and/or implement the HR service 318 .
- the HR service 318 may require, as computer modules, an HR application 320 and a server 322 .
- the HR application 320 may be dedicated solely to the HR service 318 , causing the module score determiner 118 to assign the HR application a score equal to the importance of the HR service 318 , but the server 322 may be shared between the HR service 318 and the payroll service 324 , causing the module score determiner 118 to assign the server 322 a score based on a sum of the importances of the HR service 318 and the payroll service 324 .
- the service model 300 may also include and/or implement the payroll service 324 .
- the payroll service 324 may require, as computer modules, a payroll application 326 and the server 322 .
- the payroll application 326 may be dedicated solely to the payroll service 324 , but the server 322 may be shared between the payroll service 324 and the HR service 318 , causing the module score determiner 118 to assign the payroll application 326 a score equal to the full importance of the payroll service 324 , and to add the importance of the payroll service 324 to the score of the server 322 based on the server's 322 support of the HR service 318 .
- FIGS. 4A, 4B, 4C, and 4D show costs 404 , 408 , 412 , 416 associated with services 302 , 312 , 324 , 318 .
- the costs 404 , 408 , 412 , 416 may be interchanged with, and/or be considered equivalent to or synonymous with, the importances of the services, as described herein.
- FIG. 4A is a table showing costs 404 of the service 302 at different times based on a schedule 402 according to an example implementation.
- the cost 404 of the service 302 may change based on a time period within the schedule 402 .
- during peak business hours, the service 302 has a cost 404 of 300.
- during non-peak business hours, which may be between 5 pm/17:00 and 9 am/09:00, the service 302 has a cost 404 of 200.
- on weekends, the service 302 has a cost 404 of 100.
- These costs 404 reflect the relative importance of providing the service 302 at different times, with greatest importance during peak business hours, next non-peak business hours, and least importance on weekends.
- FIG. 4B is a table showing costs 408 of the machine reservation service 312 at different times based on a schedule 406 according to an example implementation.
- weekends may be more important for the machine reservation service 312 than weekdays.
- the cost 408 of the machine reservation service 312 may change based on the time period within the schedule 406 .
- the machine reservation service 312 may have a cost 408 of 100 during weekdays (which may be Monday through Friday), and the machine reservation service 312 may have a cost 408 of 500 during weekends (which may be Saturday and Sunday).
- FIG. 4C is a table showing a cost 412 of a payroll service 324 according to an example implementation.
- the cost 412 does not vary based on the time period within the schedule 410 , and/or is the same on all days (Monday through Sunday) and at all times, and is always 75.
- FIG. 4D is a table showing a cost 416 of the HR service 318 according to an example implementation.
- the cost 416 does not vary based on the time period within the schedule 414 , and/or is the same on all days (Monday through Sunday) and at all times, and is always 125.
- FIG. 5A is a table showing costs, and/or importances, of the services 302 , 312 , 324 , 318 at a specified time according to an example implementation.
- the time may be Monday at 3 pm. This may place the service 302 in the peak business hours for a score of 300 and place the machine reservation service 312 during a weekday for a score of 100, while the payroll service 324 and the HR service 318 would be assigned their only allowable scores of 75 and 125, respectively.
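The schedule lookups of FIGS. 4A through 4D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and key names are hypothetical, and the peak window of 09:00 to 17:00 on weekdays is inferred as the complement of the stated non-peak window rather than given in the text.

```python
from datetime import datetime

# Costs transcribed from the description of FIGS. 4A-4D.
# ASSUMPTION: peak business hours are weekdays 09:00-17:00, the complement
# of the stated non-peak window (5 pm to 9 am). Names are hypothetical.

def is_weekend(t):
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6
    return t.weekday() >= 5

def service_302_cost(t):
    if is_weekend(t):
        return 100
    return 300 if 9 <= t.hour < 17 else 200

def machine_reservation_cost(t):
    return 500 if is_weekend(t) else 100

def payroll_cost(t):
    return 75  # constant across the whole schedule

def hr_cost(t):
    return 125  # constant across the whole schedule

SCHEDULES = {
    "service_302": service_302_cost,
    "machine_reservation_312": machine_reservation_cost,
    "payroll_324": payroll_cost,
    "hr_318": hr_cost,
}

def importances_at(t):
    """Return the importance (cost) of every service at time t."""
    return {name: cost(t) for name, cost in SCHEDULES.items()}

monday_3pm = datetime(2015, 6, 22, 15, 0)  # 22 June 2015 was a Monday
print(importances_at(monday_3pm))
```

For a Monday at 3 pm the lookup yields 300, 100, 75, and 125, matching the values described for FIG. 5A.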
- FIG. 5B is a table showing scores of computer modules included in the service model shown in FIG. 3 according to an example implementation.
- the database 310 , which is required to implement the service 302 , has a score of 300, equal to the importance of the service 302 .
- the machine reservation application 314 and the machine database 316 are both required to implement the machine reservation service 312 , and therefore have scores of 100, equal to the importance of the machine reservation service 312 .
- the server 322 is required to implement both the HR service 318 and the payroll service 324 . Because the server 322 is required to implement both the HR service 318 and the payroll service 324 , the server 322 has a score of 200, equal to the sum of the importance of the HR service 318 (125) and the importance of the payroll service 324 (75).
- the payroll application 326 , which is required to implement the payroll service 324 , has a score of 75, equal to the importance of the payroll service 324 .
- the HR application 320 , which is required to implement the HR service 318 , has a score of 125, equal to the importance of the HR service 318 .
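The scoring rules described for FIG. 5B (a required module receives the full importance of its service, a cluster member receives its weighted share, and a shared module receives the sum over the services it supports) can be sketched as follows. The dictionary layout and all names are hypothetical. The scores of 90 and 60 for the web servers 306 and 308 are derived here from the stated 30% and 20% weights and are not quoted from the text.

```python
# Importances at the example time (Monday at 3 pm), per FIG. 5A.
IMPORTANCE = {
    "service_302": 300,
    "machine_reservation_312": 100,
    "hr_318": 125,
    "payroll_324": 75,
}

# Hypothetical encoding of the service model of FIG. 3:
# service -> {module: percent of the service's importance carried}.
SERVICE_MODEL = {
    "service_302": {
        "web_server_304": 50,
        "web_server_306": 30,
        "web_server_308": 20,
        "database_310": 100,
    },
    "machine_reservation_312": {
        "machine_reservation_app_314": 100,
        "machine_database_316": 100,
    },
    "hr_318": {"hr_app_320": 100, "server_322": 100},
    "payroll_324": {"payroll_app_326": 100, "server_322": 100},
}

def module_scores(model, importance):
    """Sum each module's weighted share over every service it supports."""
    scores = {}
    for service, modules in model.items():
        for module, percent in modules.items():
            share = importance[service] * percent / 100
            scores[module] = scores.get(module, 0) + share
    return scores

scores = module_scores(SERVICE_MODEL, IMPORTANCE)
for module, score in scores.items():
    print(module, score)
```

The shared server 322 accumulates 125 + 75 = 200 because it appears under both the HR service and the payroll service.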
- FIG. 5C is a table showing the computer modules ranked according to the scores shown in FIG. 5B according to an example implementation.
- the database 310 with the highest score of 300, has the highest rank of one (1)
- the server 322 with a score of 200, has the rank of two (2)
- the web server 304 with a score of 150, has a rank of three (3)
- the HR application 320 which has a score of 125, has a rank of four (4)
- the machine reservation application 314 and machine database 316 which have scores of 100, can either be tied with ranks of five (5) or have ranks of five (5) and six (6) as shown in FIG.
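The ranking step can be illustrated with a short sort. The sketch below is one possible reading, not the patented implementation: it supports both options mentioned above for equal scores, either consecutive ranks or a shared rank. The function name is hypothetical, and the scores of 90 and 60 for the web servers 306 and 308 are derived from their 30% and 20% weights.

```python
# Module scores at the example time; insertion order is used to break ties
# when consecutive ranks are requested.
SCORES = {
    "database_310": 300,
    "server_322": 200,
    "web_server_304": 150,
    "hr_app_320": 125,
    "machine_reservation_app_314": 100,
    "machine_database_316": 100,
    "web_server_306": 90,
    "payroll_app_326": 75,
    "web_server_308": 60,
}

def rank_modules(scores, allow_ties=False):
    """Rank modules by descending score; rank 1 is the most important."""
    # sorted() is stable, so equal scores keep their original order
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score, previous_rank = None, 0
    for position, (module, score) in enumerate(ordered, start=1):
        if allow_ties and score == previous_score:
            rank = previous_rank  # share the rank of the equal-scored module
        else:
            rank = position
        ranks[module] = rank
        previous_score, previous_rank = score, rank
    return ranks

sequential = rank_modules(SCORES)
tied = rank_modules(SCORES, allow_ties=True)
print(sequential["machine_reservation_app_314"], sequential["machine_database_316"])
print(tied["machine_reservation_app_314"], tied["machine_database_316"])
```

With ties allowed, the module after a tied pair takes the rank of its position (seven here), which is one common convention among several.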
- the computing system 100 may recompute the scores and re-rank the computer modules when a service enters a different part of the service's respective schedule, when changes are introduced to the system, such as when computing devices such as servers are added or removed, or when a computer module goes down and/or fails.
- if the web server 306 , which is part of a cluster with the web servers 304 , 308 , went down, then the remaining web servers 304 , 308 in the cluster would be assigned higher scores by redistributing the score from the web server 306 to the web servers 304 , 308 , which may result in the web servers 304 , 308 having higher ranks.
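The text does not say how the score of a failed cluster member is divided among the survivors. The sketch below assumes, purely for illustration, that the surviving members keep their relative weights and are renormalized so that the cluster still carries the full importance of the service. The function name is hypothetical.

```python
def redistribute(weights, failed):
    """Drop the failed member and renormalize the remaining weights to 1.0.

    ASSUMPTION: proportional redistribution. An equal split would be
    another valid reading of the description.
    """
    remaining = {m: w for m, w in weights.items() if m != failed}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no surviving cluster members to absorb the score")
    return {m: w / total for m, w in remaining.items()}

# Cluster weights in percent, from the service model described earlier.
cluster = {"web_server_304": 50, "web_server_306": 30, "web_server_308": 20}
service_importance = 300  # service 302 during peak business hours

new_weights = redistribute(cluster, "web_server_306")
new_scores = {m: service_importance * w for m, w in new_weights.items()}
for module, score in new_scores.items():
    print(module, round(score, 2))
```

Under this assumption the web server 304 rises from 150 to roughly 214 and the web server 308 from 60 to roughly 86, so both may move up in the ranking.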
- the ranks of the computer modules may be used to prioritize ticket requests, such as to request support personnel and/or information technology specialists to repair or patch any issues with the computer modules (the support personnel and/or information technology specialists may be requested to repair or patch more highly ranked computer modules first), to decommission computer modules with lower ranks in the event that insufficient resources exist to support all the computer modules, and/or for the module monitor 116 to change a frequency of monitoring the computer modules, such as the module monitor 116 monitoring computer modules with higher ranks more frequently and monitoring computer modules with lower ranks less frequently.
- the computing system 100 may perform adaptive monitoring to dynamically rank the computer modules.
- the module monitor 116 may dynamically adjust a frequency of monitoring the computer modules. For example, when a service transitions from one part of a schedule to another, such as from peak business hours to non-peak business hours, or from a weekday to a weekend, the cost or importance of the service may change.
- a throttling mechanism may lower the frequency of monitoring.
- when the monitoring agent is under load, in situations such as when the monitoring agent goes down within a cluster or when network congestion is such that it is desirable to limit the transfer of monitoring data (data generated and sent by the monitoring agent), the throttling mechanism may lower the frequency of monitoring to slow down the data collection and processing performed by the monitoring agent until the monitoring agent is no longer under the excessive load.
- the module monitor 116 could lower the frequency of monitoring all the computer modules, or only the lowest ranked computer module(s).
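One way to picture rank-driven monitoring with throttling is to map each rank to a polling interval and stretch the interval when the monitoring agent is overloaded. The interval bounds, the linear mapping, and every name below are assumptions made for this sketch. The text only states that higher ranks are monitored more frequently and that throttling may apply to all modules or only to the lowest ranked ones.

```python
def monitoring_intervals(ranks, base_seconds=60.0, max_seconds=600.0,
                         throttle_factor=1.0, throttle_lowest=None):
    """Map ranks to polling intervals in seconds (rank 1 is polled most often).

    throttle_factor > 1 lengthens intervals, lowering monitoring frequency.
    throttle_lowest=None throttles every module; an integer n throttles only
    the n lowest-ranked positions.
    """
    if not ranks:
        return {}
    worst = max(ranks.values())
    intervals = {}
    for module, rank in ranks.items():
        if worst == 1:
            interval = base_seconds
        else:
            # linear interpolation between base_seconds and max_seconds
            step = (max_seconds - base_seconds) / (worst - 1)
            interval = base_seconds + step * (rank - 1)
        throttled = throttle_lowest is None or rank > worst - throttle_lowest
        intervals[module] = interval * throttle_factor if throttled else interval
    return intervals

ranks = {"database_310": 1, "server_322": 2, "web_server_304": 3}
normal = monitoring_intervals(ranks)
all_throttled = monitoring_intervals(ranks, throttle_factor=2.0)
lowest_only = monitoring_intervals(ranks, throttle_factor=2.0, throttle_lowest=1)
print(normal, all_throttled, lowest_only, sep="\n")
```

Calling the function again after each re-ranking, with the throttle factor reset to 1.0 once the agent is no longer overloaded, would restore the normal frequencies.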
- priorities and/or rankings of services and/or computer modules may be updated in response to preconfigured events such as use of computing resources, such as processing resources, memory resources, and/or network or communication resources. For example, processor usage above a certain threshold such as 60% for a predetermined time such as two minutes may be considered a medium priority event, and processor usage above another threshold such as 80% for a predetermined time such as two minutes may be considered a critical priority event.
- the priority and/or ranking of computer modules may be changed, such as increasing the rankings of computer modules, in response to the computing resource usage associated with the computer modules exceeding predetermined thresholds for predetermined times. The increased ranks of the computer modules may prompt support personnel to address any issues regarding computer resource usage with respect to the computer modules.
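The threshold example above (processor usage above 60% or 80% sustained for two minutes) can be sketched as a check over timestamped samples. The sample format, the function names, and the requirement that usage stay above the threshold continuously are assumptions of this sketch.

```python
def sustained_above(samples, threshold, window_seconds):
    """True if usage stayed above threshold for at least window_seconds.

    samples: list of (timestamp_seconds, usage_percent), sorted by time.
    """
    run_start = None
    for timestamp, usage in samples:
        if usage > threshold:
            if run_start is None:
                run_start = timestamp  # a new run above the threshold begins
            if timestamp - run_start >= window_seconds:
                return True
        else:
            run_start = None  # the run is broken; start over
    return False

def classify_cpu_event(samples, window_seconds=120):
    """Return 'critical', 'medium', or None using the example thresholds."""
    if sustained_above(samples, 80.0, window_seconds):
        return "critical"
    if sustained_above(samples, 60.0, window_seconds):
        return "medium"
    return None

hot = [(t, 85.0) for t in range(0, 121, 30)]
warm = [(t, 70.0) for t in range(0, 121, 30)]
dipped = [(0, 70.0), (30, 70.0), (60, 50.0), (90, 70.0), (120, 70.0)]
print(classify_cpu_event(hot), classify_cpu_event(warm), classify_cpu_event(dipped))
```

A returned event could then be used to raise the rank of the affected computer module so that support personnel see it sooner.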
- failover may be provided for some computer modules to ensure high availability.
- the computing system 100 may maintain a pool of failover resources, which may include computing resources dedicated to failover events, to accommodate a small percentage of the computer modules.
- the computing system 100 may assign the failover resources to the computer modules with the highest ranks in response to the computer modules being re-ranked.
- the computing system 100 may also decommission some predetermined percentage of the lowest ranked computer modules, and/or provision some other predetermined percentage of the highest ranked computer modules, in response to the computer modules being re-ranked.
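After a re-ranking, the failover and decommissioning decisions reduce to taking slices from the two ends of the ranked list. The sketch below assumes a fixed number of failover slots and a decommission fraction. Both parameters and the function name are hypothetical.

```python
import math

def plan_capacity(ranks, failover_slots, decommission_fraction):
    """Pick modules for failover protection and for decommissioning.

    ranks: {module: rank}, where rank 1 is the most important.
    Returns (failover_modules, decommission_modules).
    """
    ordered = sorted(ranks, key=ranks.get)  # best rank first
    failover = ordered[:failover_slots]
    drop_count = math.floor(len(ordered) * decommission_fraction)
    # slicing from len - drop_count yields an empty list when drop_count is 0
    decommission = ordered[len(ordered) - drop_count:]
    return failover, decommission

ranks = {
    "database_310": 1,
    "server_322": 2,
    "web_server_304": 3,
    "hr_app_320": 4,
    "machine_reservation_app_314": 5,
    "machine_database_316": 6,
    "web_server_306": 7,
    "payroll_app_326": 8,
    "web_server_308": 9,
}
failover, decommission = plan_capacity(ranks, failover_slots=2,
                                       decommission_fraction=0.2)
print(failover, decommission)
```

With nine modules, two failover slots, and a 20% decommission fraction, the two highest-ranked modules receive failover resources and the single lowest-ranked module is marked for decommissioning.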
- the service provisioner 102 may assign computer modules to the new service that have low or lowest scores. Assigning computer modules with lower scores and/or ranks to new services may prevent a single or multiple computer modules from becoming too critical or a single point of failure for multiple services.
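Selecting the lowest-scored modules for a new service is a small selection problem. The sketch below uses the standard-library `heapq.nsmallest`. The function name, the score values for the web servers 306 and 308 (derived from their weights), and the idea of adding the new service's full importance to each chosen module are illustrative assumptions.

```python
import heapq

def provision_new_service(scores, module_count, new_importance):
    """Assign the lowest-scored modules to a new service.

    Returns (chosen_modules, updated_scores). The input is not mutated.
    ASSUMPTION: each chosen module is required by the new service, so the
    full importance of the new service is added to its score.
    """
    chosen = heapq.nsmallest(module_count, scores, key=scores.get)
    updated = dict(scores)
    for module in chosen:
        updated[module] += new_importance
    return chosen, updated

scores = {
    "database_310": 300,
    "server_322": 200,
    "web_server_304": 150,
    "hr_app_320": 125,
    "web_server_306": 90,
    "payroll_app_326": 75,
    "web_server_308": 60,
}
chosen, updated = provision_new_service(scores, module_count=2, new_importance=50)
print(chosen)
print(updated)
```

Because the new load lands on the modules with the least existing responsibility, the highest score in the system is unchanged in this example, which reflects the stated goal of avoiding a single overly critical module.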
- Implementations of the various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Implementations may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers.
- a computer program such as the computer program(s) described above, can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
- Method steps may be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. Method steps also may be performed by, and an apparatus may be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read-only memory or a random access memory or both.
- Elements of a computer may include at least one processor for executing instructions and one or more memory devices for storing instructions and data.
- a computer also may include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- the processor and the memory may be supplemented by, or incorporated in special purpose logic circuitry.
- implementations may be implemented on a computer having a display device, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- Implementations may be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation, or any combination of such back-end, middleware, or front-end components.
- Components may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Hardware Redundancy (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/750,549 US10079713B2 (en) | 2015-06-25 | 2015-06-25 | Determining statuses of computer modules |
EP16175785.1A EP3109760B1 (en) | 2015-06-25 | 2016-06-22 | Ranking of computer modules |
US16/129,156 US10257022B2 (en) | 2015-06-25 | 2018-09-12 | Determining statuses of computer modules |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/750,549 US10079713B2 (en) | 2015-06-25 | 2015-06-25 | Determining statuses of computer modules |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/129,156 Continuation US10257022B2 (en) | 2015-06-25 | 2018-09-12 | Determining statuses of computer modules |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160380803A1 US20160380803A1 (en) | 2016-12-29 |
US10079713B2 US10079713B2 (en) | 2018-09-18 |
Family
ID=56235642
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/750,549 Active 2036-02-08 US10079713B2 (en) | 2015-06-25 | 2015-06-25 | Determining statuses of computer modules |
US16/129,156 Active US10257022B2 (en) | 2015-06-25 | 2018-09-12 | Determining statuses of computer modules |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/129,156 Active US10257022B2 (en) | 2015-06-25 | 2018-09-12 | Determining statuses of computer modules |
Country Status (2)
Country | Link |
---|---|
US (2) | US10079713B2 (en) |
EP (1) | EP3109760B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11132180B2 (en) * | 2018-01-05 | 2021-09-28 | Microsoft Technology Licensing, Llc | Neural-guided deductive search for program synthesis |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10783473B2 (en) * | 2017-01-13 | 2020-09-22 | Bank Of America Corporation | Near Real-time system or network incident detection |
US11030024B2 (en) * | 2019-08-28 | 2021-06-08 | Microsoft Technology Licensing, Llc | Assigning a severity level to a computing service using tenant telemetry data |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060080738A1 (en) * | 2004-10-08 | 2006-04-13 | Bezilla Daniel B | Automatic criticality assessment |
US20090177927A1 (en) * | 2003-12-16 | 2009-07-09 | Daniel Bailey | Determination of impact of a failure of a component for one or more services |
US20090183162A1 (en) * | 2008-01-15 | 2009-07-16 | Microsoft Corporation | Priority Based Scheduling System for Server |
US20090313337A1 (en) * | 2008-06-11 | 2009-12-17 | Linkool International, Inc. | Method for Generating Extended Information |
US20130111468A1 (en) * | 2011-10-27 | 2013-05-02 | Verizon Patent And Licensing Inc. | Virtual machine allocation in a computing on-demand system |
US20140165070A1 (en) * | 2012-12-06 | 2014-06-12 | Hewlett-Packard Development Company, L.P. | Ranking and scheduling of monitoring tasks |
US20140330897A1 (en) * | 2013-04-11 | 2014-11-06 | Tencent Technology (Shenzhen) Company Limited | Method, device and system for recommending access ip address of server, server and storage medium |
US20160117621A1 (en) * | 2010-02-02 | 2016-04-28 | International Business Machines Corporation | Re-factoring, rationalizing and prioritizing a service model and assessing service exposure in the service model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9940187B2 (en) * | 2015-04-17 | 2018-04-10 | Microsoft Technology Licensing, Llc | Nexus determination in a computing device |
- 2015-06-25: US application US14/750,549, patent US10079713B2 (en), status Active
- 2016-06-22: EP application EP16175785.1A, patent EP3109760B1 (en), status Active
- 2018-09-12: US application US16/129,156, patent US10257022B2 (en), status Active
Non-Patent Citations (2)
Title |
---|
Extended European Search Report for European Application No. 16175785.1, dated Oct. 20, 2016, 8 pages. |
Wang, Juite et al., "A fuzzy multicriteria group decision making approach to select configuration items for software development", Fuzzy Sets and Systems, 134, pp. 343-363, 2003. |
Also Published As
Publication number | Publication date |
---|---|
EP3109760B1 (en) | 2019-11-27 |
US20190013998A1 (en) | 2019-01-10 |
EP3109760A1 (en) | 2016-12-28 |
US10257022B2 (en) | 2019-04-09 |
US20160380803A1 (en) | 2016-12-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11416286B2 (en) | Computing on transient resources | |
US9755990B2 (en) | Automated reconfiguration of shared network resources | |
US8332862B2 (en) | Scheduling ready tasks by generating network flow graph using information receive from root task having affinities between ready task and computers for execution | |
US8966030B1 (en) | Use of temporarily available computing nodes for dynamic scaling of a cluster | |
US10754704B2 (en) | Cluster load balancing based on assessment of future loading | |
US9495214B2 (en) | Dynamic resource allocations method, systems, and program | |
US10257022B2 (en) | Determining statuses of computer modules | |
US11106508B2 (en) | Elastic multi-tenant container architecture | |
US10191771B2 (en) | System and method for resource management | |
EP3293632B1 (en) | Dynamically varying processing capacity entitlements | |
CN113886089B (en) | Task processing method, device, system, equipment and medium | |
WO2012164446A1 (en) | Resource allocation for a plurality of resources for a dual activity system | |
US9537787B2 (en) | Dynamically balancing resource requirements for clients with unpredictable loads | |
US9043575B2 (en) | Managing CPU resources for high availability micro-partitions | |
US11385972B2 (en) | Virtual-machine-specific failover protection | |
US11303546B2 (en) | Service system and control method of the same | |
WO2011096249A1 (en) | Load control device | |
US10218779B1 (en) | Machine level resource distribution | |
US20230026659A1 (en) | Transparent multiple availability zones in a cloud platform | |
US20230229477A1 (en) | Upgrade of cell sites with reduced downtime in telco node cluster running containerized applications | |
Xie et al. | A novel independent job rescheduling strategy for cloud resilience in the cloud environment | |
US20240069974A1 (en) | Workload scheduler for high availability | |
CN115469995A (en) | Container service distribution |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: BMC SOFTWARE, INC., TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PHADKE, NILESH;PHADKE, PALLAVI;REEL/FRAME:035909/0356. Effective date: 20150623 |
AS | Assignment | Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA. Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:043351/0189. Effective date: 20160523 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: CREDIT SUISSE, AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NEW YORK. Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:047185/0744. Effective date: 20181002 |
AS | Assignment | Owner name: BMC ACQUISITION L.L.C., TEXAS. Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468. Effective date: 20181002 |
AS | Assignment | Owner name: BLADELOGIC, INC., TEXAS. Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468. Effective date: 20181002 |
AS | Assignment | Owner name: BMC SOFTWARE, INC., TEXAS. Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468. Effective date: 20181002 |
AS | Assignment | Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA. Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:050327/0634. Effective date: 20190703 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS. Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:052844/0646. Effective date: 20200601 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS. Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:052854/0139. Effective date: 20200601 |
AS | Assignment | Owner name: ALTER DOMUS (US) LLC, ILLINOIS. Free format text: GRANT OF SECOND LIEN SECURITY INTEREST IN PATENT RIGHTS;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:057683/0582. Effective date: 20210930 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
AS | Assignment | Owner name: BLADELOGIC, INC., TEXAS. Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:ALTER DOMUS (US) LLC;REEL/FRAME:066567/0283. Effective date: 20240131 |
AS | Assignment | Owner name: BMC SOFTWARE, INC., TEXAS. Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:ALTER DOMUS (US) LLC;REEL/FRAME:066567/0283. Effective date: 20240131 |
AS | Assignment | Owner name: GOLDMAN SACHS BANK USA, AS SUCCESSOR COLLATERAL AGENT, NEW YORK. Free format text: OMNIBUS ASSIGNMENT OF SECURITY INTERESTS IN PATENT COLLATERAL;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS RESIGNING COLLATERAL AGENT;REEL/FRAME:066729/0889. Effective date: 20240229 |