US20150067696A1 - System and method for managing workload performance on billed computer systems - Google Patents
- Publication number
- US20150067696A1 (application US14/478,062)
- Authority
- US
- United States
- Prior art keywords
- performance
- goal
- defined performance
- workload
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
Definitions
- the present invention relates to systems and methods for managing billed computer system usage, and more particularly, to optimizing workload performance through management of workload performance goals.
- System z users have the benefit of multiple redundant mainframe computers that will continue to seamlessly execute users' workload despite the failure of individual machines.
- Each group of related computing functions being performed for a user is referred to as a logical partition (LPAR), which is executed by a given machine called a central electronic complex (CEC).
- LPAR logical partition
- CEC central electronic complex
- the user can set usage limits for LPARs and for groups of LPARs.
- the present inventors have previously developed improved systems and methods for managing LPAR capacity limits to enhance system performance and control billable costs. An example of such systems and methods can be seen in U.S. Non-provisional patent application Ser. No. 14/199,364, filed on Mar. 6, 2014, the contents of which are herein incorporated by reference in their entirety.
- In connection with assigning computing workload to LPARs, users define “service classes.” When defining a service class, a user defines a workload importance level for the workload to be performed therein, as well as a performance goal. In the System z context, there are seven importance levels ranging from 0 (most important) through 6 (least important, also called “Discretionary”). The performance goal is defined in terms of certain performance parameters, such as a percentage of operations completed within a given time. An example of a defined performance goal would be 90% of transactions finishing within 0.01 seconds of clock time.
- a service class can include multiple divisions called “periods,” assigned to different importance levels and having different defined performance goals.
- When workload is introduced into a multi-period service class, it automatically starts in the period with the highest importance level. If the workload exceeds a defined usage limit of the period in which it is currently running, it will be automatically transferred into the period having the next highest importance level.
- the usage limit is defined in terms of a usage parameter, such as time, processor cycles or the like.
- multi-period service classes are used to allow shorter running workload to pass more quickly through the system without being unduly delayed by longer running workload assigned to the same service class.
- the System z operating system includes a Workload Manager (WLM) for each LPAR, which manages service class workload within the LPAR based on importance level and also monitors achievement of the defined performance goal.
- WLM Workload Manager
- a performance index (PI) is measured for each defined performance goal by z/OS based on the performance parameters in terms of which the goal is defined.
- A PI of 1.0 indicates that a given defined performance goal is being exactly met, although a range of 0.8 to 1.2 is generally used as an indicator of satisfactory goal achievement, with PI values under 0.8 indicating overachievement (i.e., the performance goal is exceeded) and values over 1.2 indicating underachievement (i.e., the performance goal is not achieved).
- FIG. 1 a chart graphically illustrates the relationship between service classes and WLM importance levels.
- some of the service classes have multiple periods (e.g., the service classes DDFPROD and DDFTEST—while it is common for a multi-period service class to have only two periods, a service class could include more than two periods).
- Each service class or period thereof has a defined performance goal, which the WLM monitors achievement of based on the PI.
- the WLM will allocate capacity between service classes (and periods thereof) based upon the PI.
- the WLM will reduce allocated capacity to the overachieving service class or period in view of a service class or period with a PI indicating underachievement.
- in the case of a service class/period that is experiencing continuous underachievement in a capacity-limited situation, the WLM is configured to stop allocating more capacity thereto. The logic underlying this configuration is that the defined performance goal of the service class/period simply cannot be achieved with a reasonable allocation of capacity.
- a performance goal is normally defined by a user when a service class is created. While a user could manually change the defined performance goals later, this is rarely done. While the WLM will change allocated capacity based on the PI, it does not ever change the defined performance goal. Sub-optimal goal definitions can lead to undesirable results. For instance, the overachievement case described above can effectively result in higher importance workload being slowed down in favor of less important workload in another service class/period. The persistent underachievement case can effectively result in the WLM “giving up” on the affected service class/period.
- a method for managing mainframe computer system usage includes receiving a first performance optimization goal for workload performance in a first service class, the first service class having a first defined performance goal. Achievement of the first performance optimization goal is assessed, and a first preferred value for the first defined performance goal is determined based on assessing achievement of the first performance optimization goal.
- a first notification including the first preferred value is generated.
- the notification can include a request to change the first defined performance goal to match the first preferred value.
- Automatic changes can also be authorized, and implemented depending on other factors such as capacity shortages of an associated logical partition and workload performance criticality.
- the method can be applied to single- and multiple-period service classes, and repeated iteratively while workload is being performed on the mainframe computer system.
- a method for managing mainframe computer system usage includes receiving, for workload to be performed in each of a plurality of service classes having a respective plurality of defined performance goals: a performance optimization goal for workload performance; and a workload criticality designation, indicating that workload performance is critical or not critical.
- An automatic change authorization is also received, indicating that automatic changes to the respective defined performance goals are or are not authorized.
- Achievement of the respective plurality of performance optimization goals is assessed to identify achievement, underachievement or overachievement thereof.
- for each of the plurality of service classes, based on the assessed achievement, the workload criticality designation and the automatic change authorization, it is determined whether any action is to be taken in connection with the respective defined performance goal.
- a tangible data storage medium is encoded with program instructions to perform the methods and systems of the present invention when executed by a computer system.
- FIG. 1 is a graphical illustration of service classes and workload importance level definitions associated therewith;
- FIG. 2 is a schematic overview of a system for managing mainframe computer usage according to an embodiment of the present invention, including a performance management controller and a plurality of management system agents executed by central electronic complexes (CECs) of a mainframe computer system;
- CECs central electronic complexes
- FIG. 3 is a schematic overview of functional interactions involving the performance management controller of FIG. 2 ;
- FIG. 4 is a flow diagram of a configuration phase of a method of managing workload performance
- FIG. 5 is a flow diagram of a performance monitoring and optimization phase of a method of managing workload performance
- FIG. 6 is the graphical illustration of FIG. 1 , additionally reflecting workload performance criticality designations
- FIG. 7 is a decision table illustrating possible outcomes of an iteration of the performance monitoring and optimization phase of FIG. 5 , based on settings received in the configuration phase of FIG. 4 ;
- FIG. 8 is an exemplary table illustrating the determination of preferred values for defined performance goals, when such determination is dictated by the outcomes of FIG. 7 .
- IBM System z platform is the preeminent contemporary example of third party-provided computing services.
- the following description will be couched in terms relevant to this example.
- the present invention could be applied to manage workload on other billed computer systems, in which workload is assigned to classes or other divisions for which performance goals are defined and monitored in connection with capacity management.
- a user is having workload executed by a plurality of logical partitions (LPARs: LPA1, LPA2 . . . LPC3) running on a plurality of third-party mainframe computers (CEC A, CEC B, CEC C).
- a system 10 for managing workload performance is implemented via a management system controller 12 executed by one of the LPARs (LPB1 in the depicted example) and a plurality of management system agents 14 running on each of the LPARs. Data is exchanged between the system controller 12 and agents 14 using appropriate communications protocols (e.g., TCP/IP).
- a hardware management console (HMC) allows the controller 12 to implement management changes via the agents 14 .
- the agents 14 access the HMC via a base control program internal interface (BCPii).
- the management system controller 12 receives information on existing service classes/periods and corresponding definition and configuration information from the interactive system productivity facility (ISPF).
- ISPF interactive system productivity facility
- the agents 14 supply the controller 12 with information on current service class configuration and performance information.
- this information includes the performance index (PI) monitored by the Workload Manager (WLM) based on the respective defined performance goal, but preferably includes additional performance parameters that may be separate from those upon which the defined performance goal is based.
- Non-limiting examples include indications of processing capacity being used (e.g., activity in millions of service units (MSU)), speed of workload execution (e.g., input from a Delay Counter) and number of workload entities (e.g., online transactions) executed per second.
- MSU millions of service units
- workload execution e.g., input from a Delay Counter
- number of workload entities e.g. online transactions
- users can also define performance optimization goals for each service class or period thereof, based not only on the parameter(s) used to assess achievement of the defined performance goal but also on additional parameters, information on which would be gathered by the agents 14, as described above.
- the use of performance optimization goals based on a combination of parameters allows for a more accurate determination of service class workload performance.
- the ISPF can be used to allow the user to determine the performance parameters to be used and set performance optimization goals based thereon.
- the management controller 12 evaluates the achievement of the performance optimization goals based upon the usage information received from the agents 14 , and determines, for each service class/period, whether action should be taken with respect to the defined performance goal. This determination will be explained in greater detail below.
- action preferably includes sending notifications to the user and/or determining and implementing defined performance goal changes. Notifications can be sent via a write-to-operator (WTO) message, an email, or other message type. Goal changes are implemented via the respective agent 14 via the HMC.
- WTO write-to-operator
- the management controller 12 stores its inputs and outputs in databases 16 .
- a web server 20 offers a reporting interface 22 via which users can generate reports on current and historical performance and management actions.
- the method of managing workload performance includes a configuration phase (shown in FIG. 4 ), in which performance optimization goals and management preferences are defined for the service class, and a performance monitoring and optimization phase (shown in FIG. 5 ), in which performance optimization goal achievement is periodically assessed in order to determine what, if any, action(s) should be taken.
- configuration starts at block 100 for a first service class or period thereof.
- a performance optimization goal is received, which is distinct from the defined performance goal for that service class/period, although it may share one or more performance parameters therewith.
- the performance optimization goal is preferably based on at least one separate performance parameter, as well as a shared parameter.
- the user can be allowed to specify the parameters, themselves, in addition to threshold values therefor.
- a workload criticality designation is received, which indicates whether workload being performed in the respective service class/period is critical or not.
- This designation allows a user to set performance goals while distinguishing between service classes/periods where it is critical that the goal be met, as opposed to others for which the goal is desirable but failing to meet the goal is of less consequence.
- the criticality designation is used in determining whether action is necessary, as will be described in greater detail below.
- a graphic illustration of the significance of criticality designations can be seen in FIG. 6 , which reproduces FIG. 1 except that service classes or periods thereof designated to be critical are more darkly shaded.
- criticality designations can have time-based criteria. For example, batch workload can be critical during certain night time hours, but not critical during the day.
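A time-based criticality criterion of this kind can be sketched in Python as follows. This is only an illustration: the function name `is_critical` and the 22:00 to 06:00 window are assumptions chosen for the example, not values taken from the source.

```python
from datetime import time


def is_critical(now, window_start=time(22, 0), window_end=time(6, 0)):
    """Return True if `now` falls inside the critical window.

    The default window (22:00 to 06:00) is a hypothetical night-time batch
    window. A window whose start is later than its end wraps past midnight.
    """
    if window_start <= window_end:
        # Window contained within a single day.
        return window_start <= now < window_end
    # Window wraps around midnight.
    return now >= window_start or now < window_end
```

With these defaults, batch workload would count as critical at 23:30 and as not critical at noon.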
- a typical System z user would ordinarily be running multiple LPARs, each of which included a plurality of service classes.
- the actions of blocks 102 - 104 can be repeated until performance optimization goals, workload criticality designations and automatic change authorizations have been received for every service class or period thereof (block 110 ).
- an automatic change authorization is received at block 110 .
- the automatic change authorization allows a user to specify whether the performance management system is permitted to automatically implement changes to the defined performance goal for the service classes/periods. Without authorization being given for automatic changes, specific permission to implement a recommended change will always be required.
- the configuration phase ends at block 112 . A user would preferably be permitted to revisit the configuration phase, were it desired to change settings for any service class or period thereof.
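The settings gathered during the configuration phase could be held in a structure like the one sketched below. All class, field and parameter names here are hypothetical, and the sketch assumes that the automatic change authorization is stored per service class period, which the source does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class PeriodSettings:
    """Settings received for one service class period (illustrative only)."""
    optimization_thresholds: dict   # hypothetical: parameter name -> threshold
    critical: bool                  # workload criticality designation
    auto_change_authorized: bool    # automatic change authorization


@dataclass
class Configuration:
    # Keyed by (service class name, period number).
    settings: dict = field(default_factory=dict)

    def receive(self, service_class, period, period_settings):
        """Store (or overwrite) the settings for one service class period."""
        self.settings[(service_class, period)] = period_settings

    def unconfigured(self, known_periods):
        """Return the service class periods that still lack settings."""
        return [key for key in known_periods if key not in self.settings]
```

Calling `receive` again for the same key overwrites the earlier settings, which mirrors a user revisiting the configuration phase.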
- the performance monitoring and optimization phase begins at block 200 .
- the monitoring and optimization phase runs continuously for every service class/period while workload is being executed.
- An advantageous interval between iterations is five minutes.
- the performance optimization goal for a given service class/period preferably shares parameters with the defined performance goal for that service class or period thereof.
- the performance index (PI) determined by z/OS for the service class/period can be used to determine whether, with respect to the shared parameters, workload performance indicates achievement within an acceptable range, overachievement or underachievement.
- other performance information is used to determine whether workload performance is positive (i.e., meets or exceeds) or negative (i.e., fails to meet) thresholds set for such parameters.
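The positive/negative assessment of the additional parameters might look like the following sketch. It assumes that every parameter is of the "higher is better" kind and that the assessment is positive only when all thresholds are met; the function name and the parameter names in the usage note are hypothetical.

```python
def assess_other_parameters(measured, thresholds):
    """Return True (positive) if every measured value meets or exceeds its threshold.

    `measured` and `thresholds` map hypothetical parameter names to numbers.
    A parameter with a threshold but no measurement raises KeyError.
    """
    return all(measured[name] >= minimum for name, minimum in thresholds.items())
```

For example, `assess_other_parameters({"transactions_per_second": 55.0}, {"transactions_per_second": 50.0})` yields a positive assessment.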
- FIG. 7 illustrates, for a given iteration of the method for a given service class or period thereof, how the different pertinent factors affect the determination of the action to be taken.
- goal achievement status (underachieved, achieved, or overachieved)
- the existence of a capacity shortage on the respective LPAR, the workload criticality designation and the automatic change authorization can all come into play.
- the simplest case is where the assessment indicates goal achievement. In this case, no action is taken, regardless of the other factors.
- When overachievement is indicated, a preferred value for the defined performance goal for the service class/period will be determined, and a notification will be generated including the preferred value. If the overachieving service class/period is not designated as critical, then the defined performance goal will not be changed to match the preferred value, regardless of automatic change authorization status. If the overachieving service class/period is designated as critical, then the defined performance goal will be automatically changed if automatic changes are authorized. In cases where the notification is sent with the preferred value, but an automatic change is not made, the notification can allow the user to authorize the change. Upon receipt of such authorization, the defined performance goal would be changed to match the preferred value.
- When underachievement is indicated, a notification will be sent regardless of the status of the other factors. If there is not a capacity shortage and the workload performance is not critical, then the notification will simply note the underachievement and a preferred value need not be generated or communicated. However, if workload performance is critical, then a preferred value will be generated for the defined performance goal even absent a capacity shortage, and, if automatic changes are authorized, the change to the preferred value will be automatically implemented. In the case of underachievement with a capacity shortage identified, a preferred value will be generated and sent with the notification, regardless of workload criticality. An automatic change will again only be implemented in the case where the service class/period workload performance is designated as critical and automatic changes are authorized.
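The outcomes described for the achievement, overachievement and underachievement cases can be condensed into a small decision function. The sketch below is one reading of that text; the `Action` tuple, the function name and the status strings are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Action(NamedTuple):
    notify: bool                     # send a notification to the user
    determine_preferred_value: bool  # compute a preferred value for the goal
    auto_change: bool                # change the defined goal automatically


def decide_action(status, capacity_shortage, critical, auto_authorized):
    """Map one iteration's inputs to the action to be taken."""
    if status == "achieved":
        # Goal achieved: no action, regardless of the other factors.
        return Action(False, False, False)
    if status == "overachieved":
        # Always notify with a preferred value; change only if critical
        # and automatic changes are authorized.
        return Action(True, True, critical and auto_authorized)
    if status == "underachieved":
        # Always notify; a preferred value is needed when there is a
        # capacity shortage or the workload is critical.
        needs_value = capacity_shortage or critical
        return Action(True, needs_value, critical and auto_authorized)
    raise ValueError(f"unknown status: {status!r}")
```

Note that in this reading a capacity shortage affects only whether a preferred value is produced for an underachieving class; it never triggers an automatic change by itself.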
- in FIG. 8, the preferred interplay between the PI and the other performance parameters in determining a preferred value for the defined performance goal for the service class/period is illustrated.
- where the PI indicates an acceptable range of achievement (e.g., 0.8-1.2), the preferred value would not change regardless of whether the other performance parameter(s) was/were met.
- where the PI indicates overachievement (e.g., less than 0.8), the preferred value will reflect a “harder” goal, and where the PI indicates underachievement (e.g., greater than 1.2), the preferred value will reflect an “easier” goal.
- the direction of change of the preferred value relative to the current defined performance goal is independent of whether the other performance parameter(s) were met.
- the magnitude of change represented by the preferred value relative to the current goal will vary depending on whether the assessment of the other performance parameter(s) is/are positive or negative. In either case, a positive assessment will result in a larger magnitude of change.
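A minimal sketch of this rule, assuming the defined performance goal is a response-time target in seconds (so a "harder" goal is a smaller number), is given below. The step sizes of 5% and 10% are invented for illustration; the source does not specify how large the adjustments are.

```python
def preferred_goal_seconds(current_goal, pi, other_positive,
                           small_step=0.05, large_step=0.10,
                           low=0.8, high=1.2):
    """Return a preferred response-time goal in seconds.

    The PI alone sets the direction of the change; the assessment of the
    other parameters only selects the magnitude (larger when positive).
    """
    step = large_step if other_positive else small_step
    if pi < low:
        # Overachievement: propose a harder (shorter) target.
        return current_goal * (1 - step)
    if pi > high:
        # Underachievement: propose an easier (longer) target.
        return current_goal * (1 + step)
    # Acceptable range: keep the current goal.
    return current_goal
```

For a goal type where a larger number is harder (for example, a required percentage), the two branches would need to be mirrored.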
- the present invention offers automatic, dynamic notification and adjustment of workload performance bottlenecks, thereby reducing negative performance impacts of non-optimal settings in service classes.
- the present system and method further allow such adjustment to take into consideration whether or not workload is time-critical, further enhancing the effectiveness of workload definitions and capacity adjustment.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/874,052, filed on Sep. 5, 2013, the contents of which are herein incorporated by reference in their entirety.
- The present invention relates to systems and methods for managing billed computer system usage, and more particularly, to optimizing workload performance through management of workload performance goals.
- Computer users requiring exceptional reliability, redundancy or security, such as very large corporations—and particularly financial sector corporations such as banks, exchanges, brokerages and the like—will often outsource computing needs to third party providers. The preeminent example of such a provider is the International Business Machines (IBM) corporation, which has several thousand users who pay a premium for the capability and reliability of its System z (“z” standing for “zero downtime”) computing platform.
- System z users have the benefit of multiple redundant mainframe computers that will continue to seamlessly execute users' workload despite the failure of individual machines. Each group of related computing functions being performed for a user is referred to as a logical partition (LPAR), which is executed by a given machine called a central electronic complex (CEC). The user can set usage limits for LPARs and for groups of LPARs. The present inventors have previously developed improved systems and methods for managing LPAR capacity limits to enhance system performance and control billable costs. An example of such systems and methods can be seen in U.S. Non-provisional patent application Ser. No. 14/199,364, filed on Mar. 6, 2014, the contents of which are herein incorporated by reference in their entirety.
- In connection with assigning computing workload to LPARs, users define “service classes.” When defining a service class, a user defines a workload importance level for the workload to be performed therein, as well as a performance goal. In the System z context, there are seven importance levels ranging from 0 (most important) through 6 (least important, also called “Discretionary”). The performance goal is defined in terms of certain performance parameters, such as a percentage of operations completed within a given time. An example of a defined performance goal would be 90% of transactions finishing within 0.01 seconds of clock time.
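To make the example goal concrete, the following Python sketch checks a list of measured response times against a goal of the form "90% of transactions within 0.01 seconds." The function name and parameters are hypothetical and are not part of z/OS or the WLM.

```python
def meets_percentile_goal(response_times, target_seconds=0.01,
                          required_fraction=0.90):
    """Return True if enough transactions finished within the target time.

    `response_times` is a list of measured clock times in seconds.
    """
    if not response_times:
        raise ValueError("no transactions measured")
    within = sum(1 for t in response_times if t <= target_seconds)
    return within / len(response_times) >= required_fraction
```

With nine of ten transactions finishing in 0.005 seconds and one in 0.05 seconds, the goal would be met exactly; with only eight of ten it would not.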
- To allow further flexibility, a service class can include multiple divisions called “periods,” assigned to different importance levels and having different defined performance goals. When workload is introduced into a multi-period service class, it automatically starts in the period with the highest importance level. If the workload exceeds a defined usage limit of the period in which it is currently running, it will be automatically transferred into the period having the next highest importance level. The usage limit is defined in terms of a usage parameter, such as time, processor cycles or the like. In general, multi-period service classes are used to allow shorter running workload to pass more quickly through the system without being unduly delayed by longer running workload assigned to the same service class.
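The movement of workload through the periods of a service class can be illustrated with a short sketch. It assumes that each period's usage limit counts only the usage consumed within that period and that the last period has no effective limit; the `Period` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    importance: int     # lower number = more important, per the 0..6 scale
    usage_limit: float  # hypothetical usage units allowed in this period


def period_for_usage(periods, consumed):
    """Return the index (in importance order) of the period in which
    workload is running after consuming `consumed` usage units."""
    if not periods:
        raise ValueError("service class has no periods")
    ordered = sorted(periods, key=lambda p: p.importance)
    remaining = consumed
    for index, period in enumerate(ordered[:-1]):
        if remaining <= period.usage_limit:
            return index
        # Limit exceeded: move on to the next most important period.
        remaining -= period.usage_limit
    # The last period absorbs everything that is left.
    return len(ordered) - 1
```

Short-running work therefore finishes in the first, most important period, while long-running work drifts into later periods.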
- The System z operating system (z/OS) includes a Workload Manager (WLM) for each LPAR, which manages service class workload within the LPAR based on importance level and also monitors achievement of the defined performance goal. A performance index (PI) is measured for each defined performance goal by z/OS based on the performance parameters in terms of which the goal is defined. A PI of 1.0 indicates that a given defined performance goal is being exactly met, although a range of 0.8 to 1.2 is generally used as an indicator of satisfactory goal achievement, with PI values under 0.8 indicating overachievement (i.e., the performance goal is exceeded) and values over 1.2 indicating underachievement (i.e., the performance goal is not achieved).
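The PI ranges described above translate directly into a three-way classification. The sketch below uses the 0.8 and 1.2 bounds from the text; the function name and the returned strings are hypothetical labels.

```python
def classify_pi(pi, low=0.8, high=1.2):
    """Classify a performance index (PI) into a goal achievement status."""
    if pi < low:
        return "overachieved"   # goal is being exceeded
    if pi > high:
        return "underachieved"  # goal is not being achieved
    return "achieved"           # within the satisfactory range
```

Values exactly at 0.8 or 1.2 are treated here as satisfactory, since the text describes only values under 0.8 and over 1.2 as out of range.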
- Referring to FIG. 1, a chart graphically illustrates the relationship between service classes and WLM importance levels. As can be seen, some of the service classes have multiple periods (e.g., the service classes DDFPROD and DDFTEST; while it is common for a multi-period service class to have only two periods, a service class could include more than two periods). Each service class or period thereof has a defined performance goal, which the WLM monitors achievement of based on the PI.
- Significantly, when an LPAR is capacity-limited, the WLM will allocate capacity between service classes (and periods thereof) based upon the PI. In the case of overachievement, the WLM will reduce allocated capacity to the overachieving service class or period in view of a service class or period with a PI indicating underachievement. In the case of a service class/period that is experiencing continuous underachievement in a capacity-limited situation, the WLM is configured to stop allocating more capacity thereto. The logic underlying this configuration is that the defined performance goal of the service class/period simply cannot be achieved with a reasonable allocation of capacity.
- A performance goal is normally defined by a user when a service class is created. While a user could manually change the defined performance goals later, this is rarely done. While the WLM will change allocated capacity based on the PI, it does not ever change the defined performance goal. Sub-optimal goal definitions can lead to undesirable results. For instance, the overachievement case described above can effectively result in higher importance workload being slowed down in favor of less important workload in another service class/period. The persistent underachievement case can effectively result in the WLM “giving up” on the affected service class/period.
- While features like service class definitions and the WLM importance levels allow billed computer system users some flexibility to manage workload performance on LPARs, further improvements are possible.
- In view of the foregoing, it is an object of the present invention to provide an improved system and method for managing workload performance on billed computer systems.
- In a system and method for managing mainframe computer usage according to the present invention, preferred values for service class defined performance goals are determined to optimize workload performance in service classes across a logical partition. According to one method aspect, a method for managing mainframe computer system usage includes receiving a first performance optimization goal for workload performance in a first service class, the first service class having a first defined performance goal. Achievement of the first performance optimization goal is assessed, and a first preferred value for the first defined performance goal is determined based on assessing achievement of the first performance optimization goal.
- According to further aspects, a first notification including the first preferred value is generated. The notification can include a request to change the first defined performance goal to match the first preferred value. Automatic changes can also be authorized, and implemented depending on other factors such as capacity shortages of an associated logical partition and workload performance criticality. The method can be applied to single- and multiple-period service classes, and repeated iteratively while workload is being performed on the mainframe computer system.
- According to another method aspect, a method for managing mainframe computer system usage includes receiving, for workload to be performed in each of a plurality of service classes having a respective plurality of defined performance goals: a performance optimization goal for workload performance; and a workload criticality designation, indicating that workload performance is critical or not critical. An automatic change authorization is also received, indicating that automatic changes to the respective defined performance goals are or are not authorized. Achievement of the respective plurality of performance optimization goals is assessed to identify achievement, underachievement or overachievement thereof. For each of the plurality of service classes, based on the assessed achievement, the workload criticality designation and the automatic change authorization, it is determined whether any action is to be taken in connection with the respective defined performance goal.
- According to an additional aspect of the present invention, a tangible data storage medium is encoded with program instructions to perform the methods and systems of the present invention when executed by a computer system.
- These and other objects, aspects and advantages of the present invention will be better appreciated in view of the drawings and following detailed description of preferred embodiments.
- FIG. 1 is a graphical illustration of service classes and workload importance level definitions associated therewith;
- FIG. 2 is a schematic overview of a system for managing mainframe computer usage according to an embodiment of the present invention, including a performance management controller and a plurality of management system agents executed by central electronic complexes (CECs) of a mainframe computer system;
- FIG. 3 is a schematic overview of functional interactions involving the performance management controller of FIG. 2;
- FIG. 4 is a flow diagram of a configuration phase of a method of managing workload performance;
- FIG. 5 is a flow diagram of a performance monitoring and optimization phase of a method of managing workload performance;
- FIG. 6 is the graphical illustration of FIG. 1, additionally reflecting workload performance criticality designations;
- FIG. 7 is a decision table illustrating possible outcomes of an iteration of the performance monitoring and optimization phase of FIG. 5, based on settings received in the configuration phase of FIG. 4; and
- FIG. 8 is an exemplary table illustrating the determination of preferred values for defined performance goals, when such determination is dictated by the outcomes of FIG. 7.
- As discussed above, the IBM System z platform is the preeminent contemporary example of third party-provided computing services. Thus, the following description will be couched in terms relevant to this example. However, those skilled in the art will appreciate that the present invention could be applied to manage workload on other billed computer systems, in which workload is assigned to classes or other divisions for which performance goals are defined and monitored in connection with capacity management.
- According to an illustrative embodiment of the present invention, with reference to FIG. 2, a user is having workload executed by a plurality of logical partitions (LPARs: LPA1, LPA2 . . . LPC3) running on a plurality of third-party mainframe computers (CEC A, CEC B, CEC C). A system 10 for managing workload performance is implemented via a management system controller 12 executed by one of the LPARs (LPB1 in the depicted example) and a plurality of management system agents 14 running on each of the LPARs. Data is exchanged between the system controller 12 and agents 14 using appropriate communications protocols (e.g., TCP/IP). A hardware management console (HMC) allows the controller 12 to implement management changes via the agents 14. In IBM System z, the agents 14 access the HMC via a base control program internal interface (BCPii).
- Referring also to
FIG. 3, the management system controller 12 receives information on existing service classes/periods and corresponding definition and configuration information from the interactive system productivity facility (ISPF). ISPF generates the interface by which users define and configure service classes, and receives and stores these inputs. The agents 14 supply the controller 12 with current service class configuration and performance information. For each service class/period, this information includes the performance index (PI) monitored by the workload manager (WLM) based on the respective defined performance goal, but preferably includes additional performance parameters that may be separate from those upon which the defined performance goal is based. Non-limiting examples include indications of processing capacity being used (e.g., activity in millions of service units (MSU)), speed of workload execution (e.g., input from a Delay Counter) and number of workload entities (e.g., online transactions) executed per second.
- In connection with the
system 10, users can also define performance optimization goals for each service class or period thereof, based not only on the parameter(s) used to assess achievement of the defined performance goal, but also on additional parameters, information on which would be gathered by the agents 14, as described above. The use of performance optimization goals based on a combination of parameters allows for a more accurate determination of service class workload performance. The ISPF can be used to allow the user to determine the performance parameters to be used and set performance optimization goals based thereon.
- The
management controller 12 evaluates the achievement of the performance optimization goals based upon the usage information received from the agents 14, and determines, for each service class/period, whether action should be taken with respect to the defined performance goal. This determination will be explained in greater detail below. When action is taken, such action preferably includes sending notifications to the user and/or determining and implementing defined performance goal changes. Notifications can be sent via a write-to-operator (WTO) message, an email, or other message type. Goal changes are implemented by the respective agent 14 via the HMC. In addition to outputting notifications and/or changes, the management controller 12 stores its inputs and outputs in databases 16. A web server 20 offers a reporting interface 22 via which users can generate reports on current and historical performance and management actions.
- The method of managing workload performance includes a configuration phase (shown in
FIG. 4), in which performance optimization goals and management preferences are defined for the service class, and a performance monitoring and optimization phase (shown in FIG. 5), in which performance optimization goal achievement is periodically assessed in order to determine what, if any, action(s) should be taken. As a preliminary step to a first implementation of the method, the software necessary to execute the system 10 components is installed by the user on the CECs.
- With the necessary software installed, configuration starts at
block 100 for a first service class or period thereof. At block 102, a performance optimization goal is received, which is distinct from the defined performance goal for that service class/period, although it may share one or more performance parameters therewith. As discussed above, the performance optimization goal is preferably based on at least one separate performance parameter, as well as a shared parameter. In setting the performance optimization parameters, the user can be allowed to specify the parameters themselves, in addition to threshold values therefor.
- At
block 104, a workload criticality designation is received, which indicates whether workload being performed in the respective service class/period is critical or not. This designation allows a user to set performance goals while distinguishing between service classes/periods where it is critical that the goal be met, as opposed to others for which the goal is desirable but failing to meet the goal is of less consequence. The criticality designation is used in determining whether action is necessary, as will be described in greater detail below. A graphic illustration of the significance of criticality designations can be seen in FIG. 6, which reproduces FIG. 1 except that service classes or periods thereof designated to be critical are more darkly shaded. Advantageously, criticality designations can have time-based criteria. For example, batch workload can be critical during certain night time hours, but not critical during the day.
- A typical System z user would ordinarily be running multiple LPARs, each of which includes a plurality of service classes. Thus, the actions of blocks 102-104 can be repeated until performance optimization goals, workload criticality designations and automatic change authorizations have been received for every service class or period thereof (block 110).
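The settings gathered in the configuration phase could be held in a per-service-class record along the following lines. This is only an illustrative sketch in Python, not the patented implementation: the names `ServiceClassConfig`, `CriticalWindow`, `extra_thresholds` and the per-class `auto_change_authorized` flag are hypothetical, and the default PI range simply reuses the 0.8-1.2 example given elsewhere in this description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import time


@dataclass(frozen=True)
class CriticalWindow:
    """Hypothetical time-of-day window during which workload is critical."""
    start: time
    end: time

    def contains(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 22:00 to 04:00.
        return t >= self.start or t < self.end


@dataclass
class ServiceClassConfig:
    """Assumed container for the settings received in blocks 102-110."""
    name: str
    period: int
    # Shared parameter: acceptable range for the performance index (PI).
    pi_acceptable_range: tuple = (0.8, 1.2)
    # Separate parameters and their thresholds (names are illustrative).
    extra_thresholds: dict = field(default_factory=dict)
    always_critical: bool = False
    critical_windows: list = field(default_factory=list)
    auto_change_authorized: bool = False

    def is_critical(self, now: time) -> bool:
        """Criticality designation, optionally with time-based criteria."""
        return self.always_critical or any(
            w.contains(now) for w in self.critical_windows
        )


# Batch workload that is critical only during night time hours.
batch = ServiceClassConfig(
    name="BATCH",
    period=1,
    extra_thresholds={"transactions_per_second": 50.0},
    critical_windows=[CriticalWindow(time(22, 0), time(4, 0))],
)
```

With this record, `batch.is_critical(time(23, 30))` evaluates to true while `batch.is_critical(time(12, 0))` evaluates to false, mirroring the night-time batch example above.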
- Either before or after receiving the optimization goals and criticality indications for the service classes, an automatic change authorization is received at block 110. The automatic change authorization allows a user to specify whether the performance management system is permitted to automatically implement changes to the defined performance goal for the service classes/periods. Without authorization being given for automatic changes, specific permission to implement a recommended change will always be required. The configuration phase ends at block 112. A user would preferably be permitted to revisit the configuration phase, were it desired to change settings for any service class or period thereof.
- Referring to
FIG. 5, the performance monitoring and optimization phase begins at block 200. Preferably, once the optimization goals and other settings have been configured for all the service classes, the monitoring and optimization phase runs continuously for every service class/period while workload is being executed. For economy of illustration, however, only one iteration of the monitoring and optimization phase is described for a given service class or period thereof. An advantageous interval between iterations is five minutes. With reference to the appended claims, it should be noted that a reference to a service class or to goals, definitions and/or settings thereof could generically refer to either a single-period service class or to a multi-period service class (with the understanding that each of the multiple periods would have its own respective goals and other definitions and/or settings) unless further specified.
- At
block 202, achievement of the performance optimization goal is assessed. As discussed above, the performance optimization goal for a given service class/period preferably shares parameters with the defined performance goal for that service class or period thereof. Thus, the performance index (PI) determined by z/OS for the service class/period can be used to determine whether, with respect to the shared parameters, workload performance indicates achievement within an acceptable range, overachievement or underachievement. With respect to the separate parameters, other performance information is used to determine whether workload performance is positive (i.e., meets or exceeds the thresholds set for such parameters) or negative (i.e., fails to meet those thresholds).
- Based on the assessment of performance optimization goal achievement, a determination is made whether any further action is necessary at
block 204. If no action is determined to be necessary, then the method simply returns to block 202 to await the next assessment. If action is determined to be necessary at block 204, then the necessary action is determined (block 206) and taken (block 210). After the action is taken, the method again returns to block 202 until the next assessment.
- Referring to
FIG. 7, the determination of whether action is to be taken in steps FIG. 4). FIG. 7 illustrates, for a given iteration of the method for a given service class or period thereof, how the different pertinent factors affect the determination of the action to be taken. In addition to goal achievement status (underachieved, achieved, or overachieved), the existence of a capacity shortage on the respective LPAR, the workload criticality designation and the automatic change authorization can all come into play. As can be seen, the simplest case is where the assessment indicates goal achievement. In this case, no action is taken, regardless of the other factors.
- If overachievement is indicated, then whether or not there is a capacity shortage on the LPAR executing the service class/period in question is significant. The existence of a capacity shortage could be judged preemptively, based upon a proximity to a capacity limit and/or a predictive model indicating a likelihood of capacity meeting its limit within a predetermined time period, or actually based on the present existence of limited capacity. If there is no capacity shortage for the LPAR executing the overachieving service class, then no action is taken, regardless of the other factors.
- On the other hand, if a capacity shortage is determined to exist, then a preferred value for the defined performance goal for the service class/period will be determined, and notification will be generated including the preferred value. If the overachieving service class/period is not designated as critical, then the defined performance goal will not be changed to match the preferred value, regardless of automatic change authorization status. If the overachieving service class/period is designated as critical, then the defined performance goal will be automatically changed if automatic changes are authorized. In cases where the notification is sent with the preferred value, but an automatic change is not made, the notification can allow the user to authorize the change. Upon receipt of such authorization, the defined performance goal would be changed to match the preferred value.
- When underachievement is indicated, then a notification will be sent regardless of the status of the other factors. If there is not a capacity shortage and the workload performance is not critical, then the notification will simply note the underachievement and a preferred value need not be generated or communicated. However, if workload performance is critical, then a preferred value will be generated for the defined performance goal even absent a capacity shortage, and, if automatic changes are authorized, the change to the preferred value will be automatically implemented. In the case of underachievement with a capacity shortage identified, a preferred value will be generated and sent with the notification, regardless of workload criticality. An automatic change will again only be implemented in the case where the service class/period workload performance is designated as critical and automatic changes are authorized.
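The outcomes described above for achievement, overachievement and underachievement can be condensed into a small decision function. The following Python sketch is one reading of the prose, not a reproduction of FIG. 7 itself; the names `Achievement`, `Action` and `decide_action` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Achievement(Enum):
    UNDERACHIEVED = "underachieved"
    ACHIEVED = "achieved"
    OVERACHIEVED = "overachieved"


@dataclass(frozen=True)
class Action:
    notify: bool                   # send a notification to the user
    compute_preferred_value: bool  # determine a preferred goal value
    auto_change: bool              # change the defined goal automatically


NO_ACTION = Action(False, False, False)


def decide_action(status: Achievement, capacity_shortage: bool,
                  critical: bool, auto_authorized: bool) -> Action:
    """Map one iteration's inputs to the action described in the text."""
    if status is Achievement.ACHIEVED:
        return NO_ACTION  # no action, regardless of the other factors
    if status is Achievement.OVERACHIEVED:
        if not capacity_shortage:
            return NO_ACTION
        wants_value = True  # shortage: always determine a preferred value
    else:
        # Underachievement: always notify; a preferred value is generated
        # when there is a capacity shortage or the workload is critical.
        wants_value = capacity_shortage or critical
    # Automatic changes require a preferred value, a critical designation
    # and the automatic change authorization.
    return Action(True, wants_value,
                  wants_value and critical and auto_authorized)
```

For example, `decide_action(Achievement.UNDERACHIEVED, False, False, True)` yields a notification only, whereas the same call with `critical=True` also yields a preferred value and an automatic change. When a preferred value is sent without an automatic change, the notification would carry the request for user authorization.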
- Referring to FIG. 8, the preferred interplay between the PI and the other performance parameters in determining a preferred value for the defined performance goal for the service class/period is illustrated. Where the PI indicates an acceptable range of achievement (e.g., 0.8-1.2), the preferred value would not change regardless of whether the other performance parameter(s) was/were met. In general, where the PI indicates overachievement (e.g., less than 0.8), then the preferred value will reflect a "harder" goal and where the PI indicates underachievement (e.g., greater than 1.2), then the preferred value will reflect an "easier" goal. The direction of change of the preferred value relative to the current defined performance goal (i.e., harder or easier) is independent of whether the other performance parameter(s) were met. However, for a given range of under- or over-achievement, the magnitude of change represented by the preferred value relative to the current goal will vary depending on whether the assessment of the other performance parameter(s) is/are positive or negative. In either case, a positive assessment will result in a larger magnitude of change.
- From the foregoing, it will be appreciated that the present invention offers automatic, dynamic notification and adjustment of workload performance bottlenecks, thereby reducing negative performance impacts of non-optimal settings in service classes. The present system and method further allow such adjustment to take into consideration whether workload is time critical or not, further enhancing the effectiveness of workload definitions and capacity adjustment.
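The interplay between the PI and the other performance parameters described with reference to FIG. 8 can be sketched as follows. The description gives only the direction of change and the relative magnitude, so several details here are assumptions made for illustration: the goal is taken to be a response-time target in seconds (so a "harder" goal is a shorter time), and the step sizes of 10% and 5% are hypothetical, as are the function names.

```python
def classify_pi(pi: float, low: float = 0.8, high: float = 1.2) -> str:
    """Classify the performance index using the example range 0.8-1.2."""
    if pi < low:
        return "overachieved"
    if pi > high:
        return "underachieved"
    return "achieved"


def preferred_goal(current_goal_s: float, pi: float,
                   other_params_positive: bool,
                   large_step: float = 0.10,
                   small_step: float = 0.05) -> float:
    """Return a preferred value for a response-time goal (seconds).

    The direction of change depends only on the PI; the magnitude is
    larger when the other performance parameters are assessed positively.
    The step sizes are illustrative assumptions, not values from FIG. 8.
    """
    status = classify_pi(pi)
    if status == "achieved":
        return current_goal_s  # acceptable range: no change
    step = large_step if other_params_positive else small_step
    if status == "overachieved":
        return current_goal_s * (1.0 - step)  # harder: shorter target
    return current_goal_s * (1.0 + step)      # easier: longer target
```

Under these assumptions, an overachieving class with a 2.0 second goal would be given a preferred value of about 1.8 seconds when the other parameters are positive and about 1.9 seconds when they are negative; an underachieving class would move the same distances in the opposite direction.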
- The above embodiments are provided for illustrative and exemplary purposes; the present invention is not necessarily limited thereto. Rather, those skilled in the art will appreciate that various modifications, as well as adaptations to particular circumstances, will fall within the scope of the invention as herein shown and described and of the claims appended hereto.
Claims (37)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/478,062 US8978037B1 (en) | 2013-09-05 | 2014-09-05 | System and method for managing workload performance on billed computer systems |
US14/614,832 US9519519B2 (en) | 2013-09-05 | 2015-02-05 | System and method for managing workload performance on billed computer systems |
US15/375,552 US10127083B2 (en) | 2013-09-05 | 2016-12-12 | System and method for managing workload performance on billed computer systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361874052P | 2013-09-05 | 2013-09-05 | |
US14/478,062 US8978037B1 (en) | 2013-09-05 | 2014-09-05 | System and method for managing workload performance on billed computer systems |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/614,832 Continuation US9519519B2 (en) | 2013-09-05 | 2015-02-05 | System and method for managing workload performance on billed computer systems |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150067696A1 (en) | 2015-03-05 |
US8978037B1 (en) | 2015-03-10 |
Family
ID=52585172
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/478,062 Active US8978037B1 (en) | 2013-09-05 | 2014-09-05 | System and method for managing workload performance on billed computer systems |
US14/614,832 Expired - Fee Related US9519519B2 (en) | 2013-09-05 | 2015-02-05 | System and method for managing workload performance on billed computer systems |
US15/375,552 Active 2034-12-17 US10127083B2 (en) | 2013-09-05 | 2016-12-12 | System and method for managing workload performance on billed computer systems |
Country Status (1)
Country | Link |
---|---|
US (3) | US8978037B1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020065907A1 (en) * | 2000-11-29 | 2002-05-30 | Cloonan Thomas J. | Method and apparatus for dynamically modifying service level agreements in cable modem termination system equipment |
US20050066326A1 (en) * | 2003-09-19 | 2005-03-24 | International Business Machines Corporation | Program-level performance tuning |
US7228546B1 (en) * | 2000-01-28 | 2007-06-05 | Hewlett-Packard Development Company, L.P. | Dynamic management of computer workloads through service level optimization |
US7752623B1 (en) * | 2004-09-16 | 2010-07-06 | Hewlett-Packard Development Company, L.P. | System and method for allocating resources by examining a system characteristic |
US20130290972A1 (en) * | 2012-04-27 | 2013-10-31 | Ludmila Cherkasova | Workload manager for mapreduce environments |
US20140317691A1 (en) * | 2011-07-27 | 2014-10-23 | Telefonaktiebolaget L M Ericsson (Publ) | Dynamic Client Authorization in Network Management Systems |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5504894A (en) * | 1992-04-30 | 1996-04-02 | International Business Machines Corporation | Workload manager for achieving transaction class response time goals in a multiprocessing system |
US6519660B1 (en) * | 1999-09-28 | 2003-02-11 | International Business Machines Corporation | Method, system and program products for determining I/O configuration entropy |
US7640342B1 (en) * | 2002-09-27 | 2009-12-29 | Emc Corporation | System and method for determining configuration of one or more data storage systems |
US7783852B2 (en) * | 2003-11-26 | 2010-08-24 | Oracle International Corporation | Techniques for automated allocation of memory among a plurality of pools |
US8151269B1 (en) * | 2003-12-08 | 2012-04-03 | Teradata Us, Inc. | Database system having a service level goal responsive regulator |
US7395537B1 (en) * | 2003-12-08 | 2008-07-01 | Teradata, Us Inc. | Administering the workload of a database system using feedback |
US7620706B2 (en) * | 2004-03-13 | 2009-11-17 | Adaptive Computing Enterprises Inc. | System and method for providing advanced reservations in a compute environment |
US8356306B2 (en) * | 2007-07-31 | 2013-01-15 | Hewlett-Packard Development Company, L.P. | Workload management controller using dynamic statistical control |
US20090307508A1 (en) * | 2007-10-30 | 2009-12-10 | Bank Of America Corporation | Optimizing the Efficiency of an Organization's Technology Infrastructure |
US8099411B2 (en) * | 2008-12-15 | 2012-01-17 | Teradata Us, Inc. | System, method, and computer-readable medium for applying conditional resource throttles to facilitate workload management in a database system |
US20100162251A1 (en) * | 2008-12-19 | 2010-06-24 | Anita Richards | System, method, and computer-readable medium for classifying problem queries to reduce exception processing |
US8332857B1 (en) * | 2008-12-30 | 2012-12-11 | Teradota Us, Inc. | Database system having a regulator that performs workload regulation based on optimizer estimates |
US8266477B2 (en) * | 2009-01-09 | 2012-09-11 | Ca, Inc. | System and method for modifying execution of scripts for a job scheduler using deontic logic |
US8793694B2 (en) * | 2009-02-26 | 2014-07-29 | International Business Machines Corporation | Policy driven autonomic performance data collection |
US8707300B2 (en) * | 2010-07-26 | 2014-04-22 | Microsoft Corporation | Workload interference estimation and performance optimization |
CN103098014B (en) * | 2010-08-31 | 2015-08-26 | 日本电气株式会社 | Storage system |
US8751656B2 (en) * | 2010-10-20 | 2014-06-10 | Microsoft Corporation | Machine manager for deploying and managing machines |
US8856335B1 (en) * | 2011-01-28 | 2014-10-07 | Netapp, Inc. | Managing service level objectives for storage workloads |
US8997107B2 (en) * | 2011-06-28 | 2015-03-31 | Microsoft Technology Licensing, Llc | Elastic scaling for cloud-hosted batch applications |
US20130139164A1 (en) * | 2011-11-28 | 2013-05-30 | Sap Ag | Business Process Optimization |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10643193B2 (en) * | 2015-03-23 | 2020-05-05 | Bmc Software, Inc. | Dynamic workload capping |
US10812278B2 (en) | 2015-08-31 | 2020-10-20 | Bmc Software, Inc. | Dynamic workload capping |
US9846600B1 (en) * | 2016-09-12 | 2017-12-19 | Bmc Software, Inc. | System and method to dynamically allocate varying processing capacity entitlements based on workload importance |
US10108459B2 (en) | 2016-09-12 | 2018-10-23 | Bmc Software, Inc. | System and method to dynamically allocate varying processing capacity entitlements based on workload importance |
US11301443B2 (en) * | 2016-11-14 | 2022-04-12 | Bank Of America Corporation | Database work file storage limit facility |
Also Published As
Publication number | Publication date |
---|---|
US8978037B1 (en) | 2015-03-10 |
US20170090986A1 (en) | 2017-03-30 |
US9519519B2 (en) | 2016-12-13 |
US10127083B2 (en) | 2018-11-13 |
US20150150021A1 (en) | 2015-05-28 |
Legal Events
Code | Title | Description |
---|---|---|
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: ZIT CONSULTING GMBH, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: PEETERS, JOHANNES G.J.; STOEHLER, FRIEDHELM HERBERT; DOEHLER, HORST WALTER; REEL/FRAME: 041883/0288. Effective date: 20161213 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.) |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
AS | Assignment | Owner name: CA, INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ZIT CONSULTING GMBH; REEL/FRAME: 047062/0654. Effective date: 20170627 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |