WO2020246973A1 - Network optimization systems - Google Patents

Network optimization systems

Info

Publication number
WO2020246973A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
control
response
control parameters
parameters
Prior art date
Application number
PCT/US2019/035571
Other languages
French (fr)
Inventor
Anand Murugappan
Arnab Bose
Original Assignee
Google Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Llc filed Critical Google Llc
Priority to PCT/US2019/035571 priority Critical patent/WO2020246973A1/en
Publication of WO2020246973A1 publication Critical patent/WO2020246973A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/04 Network management architectures or arrangements
    • H04L 41/045 Network management architectures or arrangements comprising client-server management architectures
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L 41/0876 Aspects of the degree of configuration automation
    • H04L 41/0886 Fully automatic configuration
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L 43/00 Arrangements for monitoring or testing data switching networks
    • H04L 43/50 Testing arrangements

Definitions

  • This specification relates to systems and methods for maintaining and optimizing a hierarchical system of networked elements such as a client-server computer network.
  • an automated performance management system for maintaining or optimizing a hierarchical system of networked elements.
  • the automated performance management system comprises one or more output data feeds to output data relating to a set of control parameters for the hierarchical system, and at least one input data feed to receive a response value relating to a response of the hierarchical system to adjustment of the set of control parameters.
  • the automated performance management system may further comprise a model determination subsystem coupled to the one or more output data feeds and to the input feed and configured to determine model data defining a model of the hierarchical system.
  • the model data may comprise data defining model parameters representing a change in the response of the hierarchical system to changes in the control parameters.
  • the model parameters may comprise, i.e. may be dependent upon or represent, a set of partial derivatives of the response value with respect to each of the set of control parameters.
  • the model determination subsystem may be configured to, for each control parameter of the set of control parameters: output control data to vary the control parameter, input data representing a change in the response value to determine a change in the response of the hierarchical system to variation of the control parameter, and determine a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter.
  • each control parameter is varied in turn, e.g. one control parameter at a time. Although this involves varying one control parameter at a time, where multiple models are employed (as described later) changes in multiple response values may be determined simultaneously.
  • the automated performance management system may further comprise a performance control subsystem, in implementations coupled to the model determination subsystem, to the one or more output data feeds, and to the input feed.
  • the performance control subsystem may be configured to adjust a subset of one or more of the set of control parameters to move the response value towards a target response value to maintain or optimize the hierarchical system.
  • the subset of control parameters may comprise all the control parameters or it may be what is termed a proper subset, that is it may comprise fewer than all the control parameters.
  • the performance control subsystem may output the adjusted control parameters e.g. on the one or more output data feeds, for controlling the hierarchical system, for example for setting control parameters on one or both of a client computer system and a server computer system.
  • the target response value may be a numerical value or a range, for example defined by an upper or lower threshold value.
  • the target response value may be user-defined or predetermined, or set in some other manner.
  • Implementations of the automated performance management system may significantly reduce network traffic and/or data processing requirements.
  • a locally linear model may be assumed, in which one or just a few data points (e.g. fewer than 10 per control parameter) are used to characterize the model. This assumption is used by determining model parameters which comprise (or characterize) the set of partial derivatives of the response value with respect to each of the set of control parameters: the assumption is that a change in the response value is predicted by a (small) change in a control parameter multiplied by its corresponding partial derivative.
  • the response to each control parameter may be determined by making just one change to the control parameter, giving two data points, the current value and new value of the parameter, which are sufficient together to define a straight line.
  • three data points may be used, e.g. a current value of a control parameter and small steps up and down from this, i.e. two new values of the parameter with respective positive and negative changes to the value, but still the number of data points is small compared with the number which might otherwise be used to characterize a complex non-linear response.
  • the performance control subsystem is configured to automatically adjust the subset of the set of control parameters by gradient descent to move the response value towards the target value.
  • the performance control subsystem is configured to implement the gradient descent as a sequence of gradient descent steps. Each step comprises adjusting each of the subset of control parameters dependent upon a product of a learning rate and the partial derivative of the response value with respect to the control parameter.
  • the learning rate may be less than 0.1, for example around 0.05.
  • the response of the hierarchical system may, for example, be defined by one or more client-server traffic parameters, e.g. representing client to server traffic.
  • the gradient descent steps are performed over an extended period of time, for example at time intervals of at least a day, a week, or a month.
  • a gradient descent step is performed conditionally, dependent upon a statistical significance of the step.
  • the statistical significance of a step may be determined by a statistical test, and/or from a predicted error or confidence interval of the change in the response value e.g. by requiring less than a threshold error in order to perform a step.
  • the at least one input data feed receives a set of response values relating to the response of the hierarchical system to adjustment of the set of control parameters, each response value characterizing the response of a different aspect of the hierarchical system.
  • the model determination subsystem may then be configured to determine model data defining a set of models of the hierarchical system, one for each response value. These may use some or all of the same set of control parameters.
  • Each model has a respective set of model parameters (later θj) which define how the modelled response value responds to changes in the control parameters.
  • the performance control subsystem may then apply an optimizer to adjust the control parameters to apply a set of constraints to the set of response values, to constrain one, two, or more of these values, to thereby maintain or optimize the hierarchical system.
  • the optimizer operates on predicted response values, that is the response values which are predicted by the set of models.
  • the performance control subsystem may be further configured to apply the optimizer to adjust the control parameters to maximize or minimize an objective function of the set of response values.
  • the objective function may comprise, for example, a linear combination of one or more of the response values. This may be maximized (or, equivalently, a negative of this may be minimized) using the same convex optimization as described above for the constraints. Again because the problem is linear the convex optimization may be applied to a modified objective function dependent upon the set of predicted response values.
  • the constraints may be stochastic constraints.
  • a stochastic constraint may be a constraint to be satisfied with a certain probability, e.g. a probability greater than 1 − α where α < 0.5 or α < 0.1, for example. This can help to maintain or optimize a system where achieving one or more particular constraints may be difficult by allowing an adequate probability of optimization rather than expending large amounts of data/traffic/compute on attempting perfection.
  • the value of α defines the “hardness” of a constraint; the same value may be used for each constraint or each constraint may have a respective value of α, which enables different constraints to have different “hardnesses”.
  • Such a stochastic constraint may be implemented, for example, using a convex optimization solver which performs second-order cone programming (SOCP) or a more general conic optimization.
  • the performance control subsystem is configured to enhance the set of constraints by including a set of slack variables.
  • each constraint has a respective slack variable.
  • Each slack variable has the constraint of being equal to or greater than zero, which is applied by the convex optimization.
  • each constraint of the set of constraints is modified by a respective one of the slack variables, for example by subtracting the slack variable, to provide a modified constraint (later tj − δj).
  • the modified constraints are also applied by the convex optimization.
  • the set of modified constraints and the set of constraints on the slack variables together constitute an enhanced set of constraints which may be applied by the convex optimization. In this way the performance of the hierarchical system may be maintained or optimized in a manner which allows for imperfect solutions.
  • the enhanced set of constraints is applied stochastically as previously described.
  • the slack values may also be included in the objective function, together with the linear combination of response values.
  • a (negative) term may be added to the objective function dependent upon the set of slack variables multiplied by one or more respective weights.
  • a sum of the slack variables may be multiplied by a collective weight (later γ), or each slack variable may be multiplied by a respective weight.
  • the or each weight may be set to define a relative priority of meeting the set of constraints, collectively or individually, and maximizing (or minimizing) the objective function. For example a relatively larger weight will tend to prioritize meeting the constraints, that is achieving a slack of zero for one or more of the target response values.
  • constraints may also be placed on one or more of the control parameters when applying an optimization as described above.
  • a constraint may apply to one or more individual control parameters, and/or there may be a collective constraint on the control parameters, for example to inhibit the control parameters from moving further than a collective threshold distance metric from a current set of their values.
  • a system as described above may have provision for a user interface which allows a user to define one or more of: a constraint for a response value, use of a slack variable for a constraint, a probability or hardness of the constraint (e.g. a value for α), a weight for the constraint, and a constraint for a control parameter.
  • these data may also be loaded from storage, received from a remote source, or provided in some other manner.
  • the model determination subsystem may be configured to determine updated model data after the performance control subsystem has adjusted the subset of control parameters.
  • the model(s) may be updated after each adjustment of the set of control parameters, or there may be more than one update to the control parameters before updating the model(s).
  • updating the model parameters may comprise varying a control parameter, determining a change in the modelled response value, and determining a model parameter from a partial derivative of the response value. As previously described this may be done by varying one control parameter at a time.
  • the system may be configured to delay after adjusting the subset of control parameters and before the model determination subsystem determines updated model data, for example to allow the hierarchical system to settle. Determining updated model data may be performed over a period of time such as at least a day, week or month to allow time for a statistically meaningful quantity of data to be collected. In some implementations the system may be configured to alternate between operation of the model determination subsystem and the performance control subsystem. A period of such alternation may be at least a day, week, or month. This can facilitate traversal of a complex optimization surface by a real-world hierarchical system such as a client-server computer network.
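  • by way of illustration only, such an alternating schedule might be sketched as follows; the subsystem method names and the one-week period are assumptions for this example rather than a prescribed implementation:

        import time

        WEEK_SECONDS = 7 * 24 * 3600  # example alternation period; could equally be a day or a month

        def run_management_loop(model_subsystem, control_subsystem, periods):
            """Alternate between model determination and performance control.

            model_subsystem.determine_model() and control_subsystem.adjust() are
            hypothetical interfaces standing in for the subsystems described above.
            """
            for _ in range(periods):
                model = model_subsystem.determine_model()  # vary control parameters, estimate partial derivatives
                control_subsystem.adjust(model)            # move response value(s) towards their targets
                time.sleep(WEEK_SECONDS)                   # allow the hierarchical system to settle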
  • the performance control subsystem may also comprise a user interface to enable a user to adjust the control parameters to provide user-adjusted control parameters.
  • the model determination subsystem may then determine a predicted response value from the user-adjusted control parameters and also a confidence interval for the predicted response value, for example upper and/or lower bounds for, say, a 90% or 95% confidence level.
  • the user interface may be configured to provide both the predicted response value and its confidence interval in response to user adjustment of the subset of control parameters. This information may be provided in any convenient manner, for example numerically or using a combination of one or more numeric values and a colour code for confidence.
  • the system may output the user-adjusted control parameters to the hierarchical system in response to a user command.
  • Providing a confidence interval for a predicted response value helps a user configure the hierarchical system as it can show which control parameters matter the most and whether, for example, the effect of adjusting other control parameters may be in the noise. This can therefore facilitate efficient user maintenance/optimization of the hierarchical system.
  • the confidence interval includes a term which represents deviation of the model from linearity.
  • the hierarchical system of networked elements comprises a client-server computer system with at least one server computer system and multiple client computer systems.
  • the control parameters may then control the operation of the client computer systems, for example controlling parameters relating to a rate and/or quantity and/or format of data transmitted to or received from a client computer system.
  • the control parameters may include one or more control parameters characterizing data served by the server computer system to the client computer systems and a response value may characterize data traffic from a client computer system to the server computer system.
  • in an HTTP-based client-server system, a control parameter may be varied by the model determination subsystem using a cookie.
  • the model determination subsystem may be configured to vary the control parameters of just a subset of the client computer systems, so as to further reduce the model determination traffic overhead, for example a subset comprising less than 50%, 10% or 5% of the total number of client computer systems.
  • the hierarchical system of networked elements may comprise elements of a mobile phone network, such as radio access and/or core network elements, for example central and/or local servers, radio base stations, switching centres, base station controllers, radio network controllers and/or other elements.
  • the hierarchical system of networked elements may comprise elements of a manufacturing plant or computer data centre.
  • the method may comprise determining model data defining a model of the hierarchical system.
  • the model data may comprise data defining model parameters representing a change in a response of the hierarchical system to changes in the control parameters.
  • the model parameters may comprise a set of partial derivatives of a response value relating to the response with respect to each of the set of control parameters.
  • Figure 1 shows an example automated performance management system for a client-server system.
  • Figure 2 shows a flow diagram of an example process for determining a model for use in an automated performance management process running on the system of Figure 1.
  • Figures 3a and 3b show flow diagrams of example automated performance management processes for the system of Figure 1.
  • Figures 4a and 4b illustrate the operation of processes of the type shown in Figure 3b.
  • Figure 1 shows an example automated performance management system 100 which may be implemented as computer programs on one or more computers in one or more locations.
  • the performance management system 100 is configured for maintaining/optimizing a client-server system 160 comprising one or more server computer systems 162 coupled to multiple client computer systems 164 via a network 170, such as the Internet.
  • the performance management system 100 comprises a model determination subsystem 110 coupled to an input data feed, here a system response interface 112.
  • the system response interface 112 may comprise a query engine configured to query the client-server system 160 and to receive responses comprising response values, such as one or more metrics relating to the client-server system 160.
  • the model determination subsystem 110 is also coupled to an output data feed, here a system control interface 114.
  • the system control interface 114 may be configured to receive instructions e.g. in the form of an RPC (Remote Procedure Call) and to implement the instructions to adjust control parameters of the client-server system 160, for example relating to the format of data sent over network 170.
  • the system control interface 114 may be coupled to or part of the system response interface 112, as indicated by the connecting dashed line.
  • the performance management system 100 also comprises a performance control subsystem 120, also coupled to the output data feed, e.g. to the system control interface 114, and in implementations to the input data feed, e.g. the system response interface 112, and to the model determination subsystem 110.
  • the performance management system 100 is also coupled to, or includes, an optimizer 130, as described further later.
  • a data store 140 which may be distributed, stores data for the system 100, such as one or more of configuration data defining the modelled response(s) and defining the control parameters on which they are dependent, model parameters for the model(s), constraints e.g. on the response value(s) and/or control parameters, and one or more objective(s) for the system.
  • a user interface 150 is coupled to the model determination subsystem 110 and to the performance control subsystem 120 and has an associated interactive display 152.
  • the user interface 150 may, for example, enable a user to define one or more models, i.e. a set of control parameters and an associated response value, to thereby configure the system. It may also enable a user to define one or more constraints and/or objectives for an automated performance management process, as described later.
  • the user interface 150 may further enable a user to manually adjust one or more of the control parameters, and in response may provide one or more predicted responses of the client-server system 160, in implementations each including a confidence interval or with some other measure of error or confidence in the prediction(s).
  • the predicted response(s) may be numeric and/or in the form of a graphical display.
  • Figure 2 shows an example of a process that may be implemented on the automated performance management system 100 of Figure 1 to determine parameters of one or more models for maintaining and/or optimizing the client-server system 160.
  • the process identifies a set of control parameters, e.g. by loading them from data store 140 or obtaining them from the user interface or via a network connection.
  • the process then adjusts each control parameter in turn and outputs the adjusted control parameter, via system control interface 114, to adjust the client-server system 160 accordingly (step 202).
  • One or more response values are then read, via system response interface 112, and changes in these values are determined (step 204).
  • the process then estimates a model parameter from the or each response value, e.g. an estimate of the partial derivative of the response value with respect to the varied control parameter.
  • for example the quadrant where the target response lies may be identified and then sj, a signed step for the j-th control parameter, set to +1 or −1 accordingly. This creates n, or 2n, “axial” experiments, which is sufficient to determine a local gradient with respect to the control parameters.
  • a change in a control parameter of the client-server system 160 may be implemented by providing a cookie from the server 162 to the client 164, e.g. instructed by system control interface 114. Varying each control parameter in turn and constructing a regression estimator of f as described uses less data than, e.g., observing f for all combinations of sj (which would need 2^n experiments), although the coverage of the control parameter space is reduced.
  • the resulting estimates of the partial derivatives, e.g. the regression coefficients, may be used as the model parameters.
  • the process of Figure 2 may be used to determine a set of models, one for each of a corresponding set of response values.
  • the models may have the same set of control parameters, or disjoint or overlapping sets of control parameters.
  • the model parameters of multiple models may be determined in parallel by varying a control parameter and measuring the corresponding change in two or more response values.
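  • for illustration, a minimal sketch of estimating the model parameters from such one-at-a-time perturbations is given below; the measure() callback, the step sizes and the finite-difference estimator are assumptions of this example:

        from typing import Callable, Sequence

        def estimate_model_parameters(
            measure: Callable[[Sequence[float]], Sequence[float]],  # returns all response values for given control parameters
            x0: Sequence[float],                                     # current control parameter values
            steps: Sequence[float],                                  # small per-parameter perturbations
        ) -> list[list[float]]:
            """Estimate theta[j][i], approximating the partial derivative of response j w.r.t. control parameter i."""
            y0 = list(measure(x0))
            theta = [[0.0] * len(x0) for _ in y0]
            for i, delta in enumerate(steps):
                x = list(x0)
                x[i] += delta                # one "axial" experiment: vary a single control parameter
                y = measure(x)               # changes in multiple response values can be read at the same time
                for j, yj in enumerate(y):
                    theta[j][i] = (yj - y0[j]) / delta   # finite-difference estimate of the partial derivative
            return theta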
  • the user interface 150 may be configured to enable a user to adjust the set of control parameters and to output one or more predicted response values determined by the model(s), as indicated by example display 152. It is then useful to provide a confidence interval or similar for the response value, to facilitate a user identifying which control parameters are important in controlling the response of the client-server system 160.
  • a confidence interval may then be defined in terms of the total variance, e.g. plus or minus some number of standard deviations where the standard deviation is the square root of Var f(x + e).
  • the terms may be calculated directly from the response values and their variances, the control parameter values X and the linear model parameters b.
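  • as a sketch only, such an interval might be computed as below; the 1.96 multiplier (roughly a 95% level) and the form of the extra nonlinearity variance term are assumptions of this example:

        import math

        def predicted_response_interval(y0, theta, dx, response_variance, nonlinearity_variance=0.0, z=1.96):
            """Return (prediction, lower, upper) for a locally linear model.

            y0: current response value; theta: partial derivatives of the response with
            respect to each control parameter; dx: user-adjusted changes to the control
            parameters; nonlinearity_variance: extra term representing deviation of the
            model from linearity.
            """
            prediction = y0 + sum(t * d for t, d in zip(theta, dx))
            total_variance = response_variance + nonlinearity_variance
            half_width = z * math.sqrt(total_variance)
            return prediction, prediction - half_width, prediction + half_width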
  • Figure 3a shows a first example of a process that may be implemented on the performance management system 100 of Figure 1 to automatically maintain or optimize the client-server system 160.
  • the process of Figure 3a uses gradient descent to maximize or minimize a response value, or a function of the response value which may be termed an objective function.
  • the process identifies the control parameters concerned and a response value for a target response (300), for example as defined by the user interface 152, or the configuration data loaded from data store 140, or by data obtained over a network.
  • the process optimizes an objective function, i.e. it determines a maximum or minimum value of the objective function.
  • the objective function may comprise or consist of a response value associated with the response; e.g. it may be the response value or a function of the response value.
  • the process then determines, e.g. using a process as described with reference to Figure 2, a model parameter for each control parameter, which may comprise a partial derivative of the objective function with respect to the control parameter (302).
  • the process then obtains current values of the control parameters x. These may be known because values of the parameters have previously been defined by the process or in some other way, or they may be read from the client-server system (304).
  • the process then performs a gradient descent (or ascent) step according to a learning rate a to optimize the objective function (306).
  • the learning rate may be e.g. loaded from the data store 140 and/or user-defined; it may be in the range 1-10% e.g. around 5%.
  • Performing the gradient descent (or ascent) step may comprise adjusting each of the control parameters (or a subset thereof) by the product of a gradient determined by the respective model parameter and the learning rate, e.g. Δxi = ±a·θi, with the sign chosen according to whether the objective is to be increased or decreased.
  • the process then outputs the adjusted control parameters to the client-server system 160.
  • the process may wait for a period such as a day, week, or month or longer, depending upon the rate of response of the client-server system, and/or until a variance of the response value drops below a threshold (308).
  • the process then loops back either to step 302 to update the model or to step 304 to perform another gradient descent step.
  • the system may perform a few gradient descent steps then update the model.
  • the learning rate may be adjusted, e.g. reduced, as optimization progresses.
  • Taking a gradient descent step may be conditional on whether the estimated gradient e.g. a norm of the gradient is significant, e.g. by a statistical measure and/or manual approval.
  • the process may loop continuously or until a target is reached; the target may be defined by a threshold value of the objective function.
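  • a minimal sketch of one such gradient ascent (or descent) step is given below; the norm-based significance check is a simplification standing in for the statistical test described above:

        import math

        def gradient_step(x, theta, learning_rate=0.05, minimize=False, min_gradient_norm=1e-6):
            """One gradient step on the control parameters.

            x: current control parameter values; theta: partial derivatives of the
            objective with respect to each control parameter (the model parameters).
            """
            norm = math.sqrt(sum(t * t for t in theta))
            if norm < min_gradient_norm:      # skip steps whose estimated gradient is not significant
                return list(x)
            sign = -1.0 if minimize else 1.0
            return [xi + sign * learning_rate * ti for xi, ti in zip(x, theta)]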
  • Figure 3b shows a second example of a process that may be implemented on the performance management system 100 of Figure 1 to automatically maintain or optimize the client-server system 160.
  • the process of Figure 3b is able to optimize multiple responses simultaneously, e.g. according to a set of constraints they should meet.
  • the process identifies a set of response values for a set of target responses (350), and the control parameters for each.
  • the set of response values may be defined by the user interface 152, or by the configuration data loaded from data store 140, or by data from a network connection.
  • the process then obtains current values of the control parameters x as previously described, as well as values of any constraints i.e. bounds on the control parameters and/or response value(s) and, optionally, an objective defined by an objective function (354).
  • constraints on the response value(s) may comprise target or threshold maximum or minimum values for the optimization process, e.g. yj ≥ tj.
  • the constraints and objective may be obtained e.g. from user interface 150 and/or data store 140 and/or via a network connection. Defining the objective is optional as the process may be used to apply constraints without any further explicit objective.
  • the objective function for the process may comprise a linear combination of the response values, i.e. metrics, to be maximized (or minimized) with respect to the constraints (bounds), e.g. Σj cj yj where the vector c defines the linear combination. This may be used to determine a modified objective function defined in terms of predicted response values, i.e. in terms of the model parameters operating on the control parameters, more specifically small changes in the control parameters (356).
  • each constraint may include a term defining the minimum probability.
  • Each constraint may also include a term dependent upon the prediction error of the model for the corresponding response value, e.g. a term dependent upon the noise or variance of the response value y.
  • the process may then apply the optimizer 130 to the current values of the control parameters obtained in step 354 and to the (modified) objective function and constraints to determine a set of changes to the control parameters Δxi (358). These are applied to the client-server system 160, either directly or after manual review (360).
  • a random component may be added to the changes to the control parameters, which can help make the system more robust to model errors.
  • one or more auxiliary outputs may be provided, e.g. comprising values of the slack variable(s) and of the objective function value and a confidence interval (or similar measure) for the objective function value. These may be provided via the user interface to facilitate manual supervision and/or intervention if necessary e.g. for safety.
  • the process may then wait for a period such as a day, week, or month or longer, depending upon the rate of response of the client-server system, and/or until the variance of the response values drops below a threshold (362).
  • the process may then loop back to adjust the control parameters again and/or to update the models.
  • control parameter boundaries and/or a learning rate may be set or adjusted.
  • the control parameter learning rate may comprise a limit on the maximum change in a control parameter or on the set of control parameters, e.g. a bound on each individual change and/or on a collective distance metric for the set of changes.
  • the process may be run continuously and/or until one or more of the constraints are satisfied, taking into account the slack variables if implemented.
  • in implementations the expected prediction error may be assumed to be distributed according to a known distribution, e.g. a normal distribution, so that each stochastic constraint can include a term dependent upon the variance of the corresponding predicted response value.
  • the objective function may be to maximize Σj cj yj, where the cj define weights of the metrics yj, subject to yj ≥ tj (a bound in the opposite direction may be imposed by defining a new metric −yj).
  • each model may be written in terms of its model parameters θj = [θj1, θj2, ..., θjn], and the constraints may then be re-written θjᵀΔx ≥ tj.
  • This may then be provided to the optimizer 130, which may employ conventional convex optimization e.g. linear programming, to maximize the objective function subject to the constraints.
  • control parameter boundaries, e.g. li ≤ Δxi ≤ ui where li and ui define lower and upper bounds respectively, and/or a collective limit on the change in the set of control parameters, may also be provided to the optimizer.
  • the convex optimizer may comprise a Second Order Cone Programming optimizer, as implemented by many standard convex optimization packages.
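  • purely as an illustrative sketch, such an optimization could be posed with an off-the-shelf convex solver; the use of the cvxpy package, the constant per-response error term and all names below are assumptions of this example, not a prescribed implementation:

        import cvxpy as cp
        import numpy as np

        def plan_control_changes(theta, y0, c, t, sigma, z_alpha, lower, upper):
            """Choose control parameter changes dx maximizing the objective subject to constraints.

            theta: (m, n) matrix of model parameters (partial derivatives); y0: current
            response values; c: objective weights; t: constraint thresholds; sigma:
            per-response prediction error standard deviations; z_alpha: quantile
            reflecting the constraint "hardness"; lower/upper: control parameter bounds.
            """
            theta, y0, c, t = (np.asarray(a, dtype=float) for a in (theta, y0, c, t))
            dx = cp.Variable(theta.shape[1])
            y_pred = y0 + theta @ dx                                     # predicted response values
            constraints = [
                y_pred >= t + z_alpha * np.asarray(sigma, dtype=float),  # stochastic constraints
                dx >= np.asarray(lower, dtype=float),                    # control parameter boundaries
                dx <= np.asarray(upper, dtype=float),
            ]
            problem = cp.Problem(cp.Maximize(c @ y_pred), constraints)
            problem.solve()
            return dx.value

    With a constant error term the constraints above are linear; if the prediction error instead depends on the control parameter changes, the corresponding constraint becomes a second-order cone constraint handled by the same class of solver.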
  • a current set of control parameter values defines a point 454 corresponding to values of Metric 1 and Metric 2, and the optimization process moves the client-server system 160 to point 456, at which point a new set of experiments is conducted. In this way the client-server system 160 approaches the target region for the metrics, though it may not be able to operate within the target region.
  • the objective function may be to maximize Σj cj yj − γ(δ1 + δ2 + ...), where one or more or each response value yj has a respective slack variable δj.
  • a constant γ determines the respective weight of maximizing Σj cj yj and of meeting the constraints in the objective function: if γ is large, meeting the constraints is prioritized.
  • each value of δj may have a respective multiplier γj.
  • the objective function is maximized with respect to yj ≥ tj − δj and δj ≥ 0 to determine Δx and the δj.
  • a hard constraint may have a small αj, e.g. around 0.05, and no slack variable, whilst a soft constraint may have a slack variable and αj around 0.5, with a continuum of options in between.
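  • continuing the illustrative cvxpy sketch above, slack variables and the weight γ might be added as follows (again an assumed formulation rather than the only possible one):

        import cvxpy as cp
        import numpy as np

        def plan_with_slack(theta, y0, c, t, sigma, z_alpha, gamma, lower, upper):
            """As plan_control_changes above, but with one slack variable per response constraint."""
            theta, y0, c, t = (np.asarray(a, dtype=float) for a in (theta, y0, c, t))
            m, n = theta.shape
            dx = cp.Variable(n)
            delta = cp.Variable(m, nonneg=True)                            # slack variables, one per constraint
            y_pred = y0 + theta @ dx
            objective = cp.Maximize(c @ y_pred - gamma * cp.sum(delta))    # gamma trades off objective vs. constraints
            constraints = [
                y_pred >= t + z_alpha * np.asarray(sigma, dtype=float) - delta,  # softened constraints
                dx >= np.asarray(lower, dtype=float),
                dx <= np.asarray(upper, dtype=float),
            ]
            cp.Problem(objective, constraints).solve()
            return dx.value, delta.value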
  • Embodiments of the systems and methods described herein may be implemented as one or more computer programs comprising instructions for execution or interpretation by, or to control the operation of, data processing apparatus.
  • the instructions may be encoded on an electrical, electromagnetic or optical signal or provided on a non-transitory computer storage medium.
  • the computer storage medium may comprise a machine-readable storage device, substrate, or memory device, or a combination of one or more of these (but is not a propagated signal).
  • the computer storage medium may comprise, for example, a disk or programmed memory.
  • the instructions may be distributed across multiple computers and/or sites interconnected by a communication network.
  • the instructions may comprise, for example, source, object or executable code in any type of programming language, e.g. compiled, interpreted, declarative or procedural, or code for a hardware description language.
  • the computer program(s) may be stand-alone or may be for implementation in a computing environment e.g. as one or more additional modules.
  • the data processing apparatus may comprise any kind of apparatus, device, or machine for processing data, such as one or more programmable processors or computers, or special purpose logic circuitry such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • the apparatus may include code that creates an execution environment for the computer program(s) such as an operating system and/or database management system or protocol stack.
  • the client-server system may have the client and server remote from one another and may interact via a communications network such as a local area network (“LAN”), a wide area network (“WAN”), and/or the Internet.
  • the client computer may have a graphical user interface and/or a Web browser;
  • the server may be a data server and/or may include a middleware component such as an application server.

Abstract

An automated performance management system for maintaining or optimizing a client-server system. The automated performance management system comprises a model determination subsystem to determine model parameters representing a change in the response of the client-server system to changes in control parameters. The model determination subsystem is configured to, for each of a set of control parameters in turn: vary the control parameter, input data representing a change in a response value to determine a change in the response of the client-server system to variation of the control parameter, and determine a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter. A performance control subsystem is coupled to the model determination subsystem and is configured to optimize the hierarchical system by setting control parameters on one or both of a client computer and a server computer.

Description

NETWORK OPTIMIZATION SYSTEMS
BACKGROUND
[1] This specification relates to systems and methods for maintaining and optimizing a hierarchical system of networked elements such as a client-server computer network.
[2] Maintaining large computer networks is a difficult problem. For example there may be large numbers of parameters to adjust, and these may interact in complex ways. There may be many thousands of different computers and users and the operation of the system may deteriorate over time. It may also be necessary to set up a new service from time to time, which may need to be optimized. Such maintenance and/or optimization will often need to be performed during live operation of the system, and one particular problem which then arises is the additional traffic incurred by this process.
SUMMARY
[3] In one aspect there is therefore provided an automated performance management system for maintaining or optimizing a hierarchical system of networked elements. In implementations the automated performance management system comprises one or more output data feeds to output data relating to a set of control parameters for the hierarchical system, and at least one input data feed to receive a response value relating to a response of the hierarchical system to adjustment of the set of control parameters.
[4] The automated performance management system may further comprise a model determination subsystem coupled to the one or more output data feeds and to the input feed and configured to determine model data defining a model of the hierarchical system. The model data may comprise data defining model parameters representing a change in the response of the hierarchical system to changes in the control parameters. The model parameters may comprise, i.e. may be dependent upon or represent, a set of partial derivatives of the response value with respect to each of the set of control parameters.
[5] The model determination subsystem may be configured to, for each control parameter of the set of control parameters: output control data to vary the control parameter, input data representing a change in the response value to determine a change in the response of the hierarchical system to variation of the control parameter, and determine a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter. In some implementations each control parameter is varied in turn, e.g. one control parameter at a time. Although this involves varying one control parameter at a time, where multiple models are employed (as described later) changes in multiple response values may be determined simultaneously.
[6] The automated performance management system may further comprise a performance control subsystem, in implementations coupled to the model determination subsystem, to the one or more output data feeds, and to the input feed. The performance control subsystem may be configured to adjust a subset of one or more of the set of control parameters to move the response value towards a target response value to maintain or optimize the hierarchical system. The subset of control parameters may comprise all the control parameters or it may be what is termed a proper subset, that is it may comprise fewer than all the control parameters.
[7] The performance control subsystem may output the adjusted control parameters e.g. on the one or more output data feeds, for controlling the hierarchical system, for example for setting control parameters on one or both of a client computer system and a server computer system.
[8] The target response value may be a numerical value or a range, for example defined by an upper or lower threshold value. The target response value may be user-defined or predetermined, or set in some other manner.
[9] Implementations of the automated performance management system may significantly reduce network traffic and/or data processing requirements. Firstly it may be assumed that the system is not too far off a target configuration. This allows a locally linear model to be used in which one or just a few data points (e.g. less than 10 per control parameter) are used to characterize the model. This assumption is used by determining model parameters which comprise (or characterize) the set of partial derivatives of the response value with respect to each of the set of control parameters - the assumption is that a change in the response value is predicted by a (small) change in a control parameter multiplied by its corresponding partial derivative. Assuming such a locally linear model means that the response to each control parameter may be determined by making just one change to the control parameter, giving two data points, the current value and new value of the parameter, which are sufficient together to define a straight line. Optionally three data points may be used, e.g. a current value of a control parameter and small steps up and down from this, i.e. two new values of the parameter with respective positive and negative changes to the value, but still the number of data points is small compared with the number which might otherwise be used to characterize a complex non-linear response.
[10] Secondly, it is assumed that the effects on the response value of varying the control parameters are independent of one another. Thus it is assumed that a change in the response value due to a change in one control parameter can be predicted by varying that one control parameter without taking into account whether or not another control parameter is varying. This assumption is used by varying each of the control parameters in turn to determine the model parameters - it is assumed that there is no need to apply all the different combinations of the varying control parameters.
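By way of summary (with notation chosen here for illustration), these two assumptions amount to modelling a change in a response value $y$ as

    \Delta y \approx \sum_i \theta_i \, \Delta x_i, \qquad \theta_i = \frac{\partial y}{\partial x_i} \approx \frac{y(x + \delta_i e_i) - y(x)}{\delta_i},

where $e_i$ is the unit vector for the $i$-th control parameter, so that each $\theta_i$ can be estimated from a single small perturbation $\delta_i$ of control parameter $x_i$ on its own, independently of the other control parameters.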
[11] Taken together these approaches significantly reduce the number of “experiments” needed to characterize the response(s) of the hierarchical system of networked elements. The model parameters are then used in controlling the hierarchical system of networked elements using the performance control subsystem, to maintain or optimize the hierarchical system.
[12] In some implementations the performance control subsystem is configured to automatically adjust the subset of the set of control parameters by gradient descent to move the response value towards the target value. Thus in implementations the performance control subsystem is configured to implement the gradient descent as a sequence of gradient descent steps. Each step comprises adjusting each of the subset of control parameters dependent upon a product of a learning rate and the partial derivative of the response value with respect to the control parameter. The learning rate may be less than 0.1, for example around 0.05.
[13] The response of the hierarchical system may, for example, be defined by one or more client-server traffic parameters, e.g. representing client to server traffic. Unlike gradient descent in, say, machine learning, in implementations the gradient descent steps are performed over an extended period of time, for example at time intervals of at least a day, a week, or a month.
[14] In some implementations a gradient descent step is performed conditionally, dependent upon a statistical significance of the step. The statistical significance of a step may be determined by a statistical test, and/or from a predicted error or confidence interval of the change in the response value e.g. by requiring less than a threshold error in order to perform a step.
[15] In implementations such a gradient descent approach only adjusts one response value but there may be a need to simultaneously adjust multiple response values, for example with the aim of bringing each of them into a target range at the same time - where this is possible.
[16] Thus in some implementations the at least one input data feed receives a set of response values relating to the response of the hierarchical system to adjustment of the set of control parameters, each response value characterizing the response of a different aspect of the hierarchical system. The model determination subsystem may then be configured to determine model data defining a set of models of the hierarchical system, one for each response value. These may use some or all of the same set of control parameters. Each model has a respective set of model parameters (later θj) which define how the modelled response value responds to changes in the control parameters. The performance control subsystem may then apply an optimizer to adjust the control parameters to apply a set of constraints to the set of response values, to constrain one, two, or more of these values, to thereby maintain or optimize the hierarchical system. In implementations, however, the optimizer operates on predicted response values, that is the response values which are predicted by the set of models.
[17] This approach allows any conventional optimizer to be applied to the problem of optimizing multiple system responses simultaneously. A conventional optimizer can be used because as previously described the models are (locally) linear and thus perform a linear transform of the control parameters. Thus the entire problem is linear and any suitable optimization may be employed, for example convex optimization implemented by a standard convex optimization solver package. The ability to solve for multiple constraints simultaneously can again very substantially reduce the data requirements of the system, resulting in much faster and more efficient configuration of the hierarchical system.
[18] In implementations the performance control subsystem may be further configured to apply the optimizer to adjust the control parameters to maximize or minimize an objective function of the set of response values. The objective function may comprise, for example, a linear combination of one or more of the response values. This may be maximized (or, equivalently, a negative of this may be minimized) using the same convex optimization as described above for the constraints. Again because the problem is linear the convex optimization may be applied to a modified objective function dependent upon the set of predicted response values.
[19] In some implementations the constraints may be stochastic constraints. Here a stochastic constraint may be a constraint to be satisfied with a certain probability, e.g. a probability greater than 1 − α where α < 0.5 or α < 0.1, for example. This can help to maintain or optimize a system where achieving one or more particular constraints may be difficult by allowing an adequate probability of optimization rather than expending large amounts of data/traffic/compute on attempting perfection. The value of α defines the “hardness” of a constraint; the same value may be used for each constraint or each constraint may have a respective value of α, which enables different constraints to have different “hardnesses”. Such a stochastic constraint may be implemented, for example, using a convex optimization solver which performs second-order cone programming (SOCP) or a more general conic optimization.
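For illustration, if the prediction error for response value $y_j$ is assumed to be Gaussian with standard deviation $\sigma_j$ (an assumption made here for the example), such a stochastic constraint can be written

    \Pr(y_j \ge t_j) \ge 1 - \alpha \quad \Longleftrightarrow \quad \hat{y}_j - \Phi^{-1}(1-\alpha)\,\sigma_j \ge t_j,

where $\hat{y}_j$ is the predicted response value, $t_j$ the constraint threshold and $\Phi^{-1}$ the standard normal quantile function; when the error term depends on the control parameter changes, this becomes a second-order cone constraint.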
[20] In some cases a perfect solution may not be possible, that is it may not be possible simultaneously to satisfy all the constraints. In such a case potentially large amounts of unnecessary data/traffic/computing power could be wasted by forcing the hierarchical system into a state of continuous “experimentation”.
[21] Thus in some implementations the performance control subsystem is configured to enhance the set of constraints by including a set of slack variables. In implementations each constraint has a respective slack variable. Each slack variable has the constraint of being equal to or greater than zero, which is applied by the convex optimization. In addition each constraint of the set of constraints is modified by a respective one of the slack variables, for example by subtracting the slack variable, to provide a modified constraint (later tj − δj). The modified constraints are also applied by the convex optimization. The set of modified constraints and the set of constraints on the slack variables together constitute an enhanced set of constraints which may be applied by the convex optimization. In this way the performance of the hierarchical system may be maintained or optimized in a manner which allows for imperfect solutions. In implementations the enhanced set of constraints is applied stochastically as previously described.
[22] The slack values may also be included in the objective function, together with the linear combination of response values. For example, a (negative) term may be added to the objective function dependent upon the set of slack variables multiplied by one or more respective weights. For example a sum of the slack variables may be multiplied by a collective weight (later γ), or each slack variable may be multiplied by a respective weight. The or each weight may be set to define a relative priority of meeting the set of constraints, collectively or individually, and maximizing (or minimizing) the objective function. For example a relatively larger weight will tend to prioritize meeting the constraints, that is achieving a slack of zero for one or more of the target response values.
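In this notation (again chosen for illustration), the enhanced optimization might read

    \max_{\Delta x,\, \delta} \; \sum_j c_j \hat{y}_j \;-\; \gamma \sum_j \delta_j \qquad \text{subject to} \qquad \hat{y}_j \ge t_j - \delta_j, \quad \delta_j \ge 0,

with a relatively large $\gamma$ prioritizing satisfaction of the constraints over the linear combination of response values.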
[23] Optionally constraints may also be placed on one or more of the control parameters when applying an optimization as described above. Such a constraint may apply to one or more individual control parameters, and/or there may be a collective constraint on the control parameters, for example to inhibit the control parameters from moving further than a collective threshold distance metric from a current set of their values.
[24] In some implementations a system as described above may have provision for a user interface which allows a user to define one or more of: a constraint for a response value, use of a slack variable for a constraint, a probability or hardness of the constraint (e.g. a value for α), a weight for the constraint, and a constraint for a control parameter. However these data may also be loaded from storage, received from a remote source, or provided in some other manner.
[25] Although some implementations of the system vary one control parameter at a time to reduce the number of “experiments” performed on the hierarchical system, some of the benefits described may be obtained without such an approach. For example, the above described optimizing subject to multiple constraints may be implemented independently of how the model data defining the models is obtained, provided that the model(s) of the response value(s) is(are) linear with respect to the control parameters.
[26] In implementations, because there is an assumption of local linearity a model may only be valid for small changes in the set of control parameters, i.e. a partial derivative of a response value with respect to a control parameter may be assumed constant. Thus in implementations the model determination subsystem may be configured to determine updated model data after the performance control subsystem has adjusted the subset of control parameters. The model(s) may be updated after each adjustment of the set of control parameters, or there may be more than one update to the control parameters before updating the model(s). As previously described, updating the model parameters may comprise varying a control parameter, determining a change in the modelled response value, and determining a model parameter from a partial derivative of the response value. As previously described this may be done by varying one control parameter at a time.
[27] In some implementations the system may be configured to delay after adjusting the subset of control parameters and before the model determination subsystem determines updated model data, for example to allow the hierarchical system to settle. Determining updated model data may be performed over a period of time such as at least a day, week or month to allow time for a statistically meaningful quantity of data to be collected. In some implementations the system may be configured to alternate between operation of the model determination subsystem and the performance control subsystem. A period of such alternation may be at least a day, week, or month. This can facilitate traversal of a complex optimization surface by a real-world hierarchical system such as a client-server computer network.
[28] The performance control subsystem may also comprise a user interface to enable a user to adjust the control parameters to provide user-adjusted control parameters. The model determination subsystem may then determine a predicted response value from the user-adjusted control parameters and also a confidence interval for the predicted response value, for example upper and/or lower bounds for, say, a 90% or 95% confidence level. Thus the user interface may be configured to provide both the predicted response value and its confidence interval in response to user adjustment of the subset of control parameters. This information may be provided in any convenient manner, for example numerically or using a combination of one or more numeric values and a colour code for confidence. The system may output the user-adjusted control parameters to the hierarchical system in response to a user command.
[29] Providing a confidence interval for a predicted response value helps a user configure the hierarchical system as it can show which control parameters matter the most and whether, for example, the effect of adjusting other control parameters may be in the noise. This can therefore facilitate efficient user maintenance/optimization of the hierarchical system. However where this is relied upon it can be important that the confidence interval provided is reliable, but the locally linear model used is inherently approximate. Thus in some implementations the confidence interval includes a term which represents deviation of the model from linearity. This facilitates use of a low-overhead linear model as described, with fewer “experiments” used in determining the model and faster configuration of the hierarchical system based upon more accurate technical data, or at least a more accurate indication of which technical data may reliably be used to control the system operation.
[30] In some applications the hierarchical system of networked elements comprises a client-server computer system with at least one server computer system and multiple client computer systems. The control parameters may then control the operation of the client computer systems, for example controlling parameters relating to a rate and/or quantity and/or format of data transmitted to or received from a client computer system. For example the control parameters may include one or more control parameters characterizing data served by the server computer system to the client computer systems and a response value may characterize data traffic from a client computer system to the server computer system. In an HTTP-based client-server system a control parameter may be varied by the model determination subsystem using a cookie. In some implementations the model determination subsystem may be configured to vary the control parameters of just a subset of the client computer systems, so as to further reduce the model determination traffic overhead, for example a subset comprising less than 50%, 10% or 5% of the total number of client computer systems.
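As an illustrative sketch only, a server might vary such a control parameter for a small sampled subset of clients by setting a cookie; the cookie name, the 5% sampling fraction and the hashing scheme below are assumptions of this example:

    import hashlib

    EXPERIMENT_FRACTION = 0.05          # vary the control parameter for roughly 5% of client systems
    COOKIE_NAME = "perf_ctrl_value"     # hypothetical cookie carrying the varied control parameter

    def control_cookie_for(client_id: str, default_value: int, varied_value: int) -> str:
        """Return a Set-Cookie header value assigning this client its control parameter value."""
        bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 10_000
        value = varied_value if bucket < EXPERIMENT_FRACTION * 10_000 else default_value
        return f"{COOKIE_NAME}={value}; Max-Age=604800; Path=/"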
[31] In some other applications the hierarchical system of networked elements may comprise elements of a mobile phone network, such as radio access and/or core network elements, for example central and/or local servers, radio base stations, switching centres, base station controllers, radio network controllers and/or other elements. In other applications the hierarchical system of networked elements may comprise elements of a manufacturing plant or computer data centre.
[32] In a related aspect there is provided a method of maintaining or optimizing a hierarchical system of networked elements. The method may comprise determining model data defining a model of the hierarchical system. The model data may comprise data defining model parameters representing a change in a response of the hierarchical system to changes in the control parameters. The model parameters may comprise a set of partial derivatives of a response value relating to the response with respect to each of the set of control parameters. Determining the model data may comprise, for each control parameter of a set of control parameters in turn, one at a time, outputting control data to vary the control parameter, inputting data representing a change in the response value to determine a change in the response of the hierarchical system to variation of the control parameter, and determining a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter. Maintaining or optimizing a hierarchical system may then further comprise adjusting a subset of one or more of the set of control parameters to move the response value towards a target response value.
[33] There is also provided a computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to implement a system or method as described above.
[34] The details of one or more example implementations are set forth in the accompanying drawings and in the description below. Other features, aspects, and advantages of the subject matter will be apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[35] Figure 1 shows an example automated performance management system for a client-server system.
[36] Figure 2 shows a flow diagram of an example process for determining a model for use in an automated performance management process running on the system of Figure 1.
[37] Figures 3a and 3b show flow diagrams of example automated performance management processes for the system of Figure 1.
[38] Figures 4a and 4b illustrate the operation of processes of the type shown in Figure 3b.
[39] Like reference numerals and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
[40] Figure 1 shows an example automated performance management system 100 which may be implemented as computer programs on one or more computers in one or more locations. The performance management system 100 is configured for maintaining/optimizing a client-server system 160 comprising one or more server computer systems 162 coupled to multiple client computer systems 164 via a network 170, such as the Internet.
[41] The performance management system 100 comprises a model determination subsystem 110 coupled to an input data feed, here a system response interface 112. The system response interface 112 may comprise a query engine configured to query the client-server system 160 and to receive responses comprising response values, such as one or more metrics relating to the client-server system 160.
[42] The model determination subsystem 110 is also coupled to an output data feed, here a system control interface 114. The system control interface 114 may be configured to receive instructions e.g. in the form of an RPC (Remote Procedure Call) and to implement the instructions to adjust control parameters of the client-server system 160, for example relating to the format of data sent over network 170. The system control interface 114 may be coupled to or part of the system response interface 112, as indicated by the connecting dashed line.
[43] The performance management system 100 also comprises a performance control subsystem 120 also coupled to the output data feed e.g. to the system control interface 114, and in implementations to the input data feed e.g. to the system response interface 112. In some implementations the performance management system 100 is also coupled to, or includes, an optimizer 130, as described further later.
[44] A data store 140, which may be distributed, stores data for the system 100, such as one or more of configuration data defining the modelled response(s) and defining the control parameters on which they are dependent, model parameters for the model(s), constraints e.g. on the response value(s) and/or control parameters, and one or more objective(s) for the system.
[45] A user interface 150 is coupled to the model determination subsystem 110 and to the performance control subsystem 120 and has an associated interactive display 152. The user interface 150 may, for example, enable a user to define one or more models, i.e. a set of control parameters and an associated response value, to thereby configure the system. It may also enable a user to define one or more constraints and/or objectives for an automated performance management process, as described later. The user interface 150 may further enable a user to manually adjust one or more of the control parameters, and in response may provide one or more predicted responses of the client-server system 160, in implementations each including a confidence interval or with some other measure of error or confidence in the prediction(s). The predicted response(s) may be numeric and/or in the form of a graphical display.
[46] Figure 2 shows an example of a process that may be implemented on the automated performance management system 100 of Figure 1 to determine parameters of one or more models for maintaining and/or optimizing the client-server system 160.
[47] At step 200 the process identifies a set of control parameters, e.g. by loading them from data store 140 or obtaining them from the user interface or via a network connection. The process then adjusts each control parameter in turn and outputs the adjusted control parameter, via system control interface 114, to adjust the client-server system 160 accordingly (step 202). One or more response values are then read, via system response interface 112, and changes in these values are determined (step 204). There may be a delay between adjusting a control parameter of the client-server system and reading the response, to allow the client-server system time to react (step 206). The process then estimates a model parameter from the or each response value, e.g. by determining a partial derivative of the response value with respect to the control parameter as described below, and stores this in data store 140 (step 208). The process loops until all the model parameters have been determined. Modification of a control parameter in the process of Figure 2 may be termed an "experiment".
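A minimal sketch of the Figure 2 loop is shown below, assuming hypothetical `apply_params` and `read_response` callables standing in for the system control interface 114 and the system response interface 112; the relative step size and settling delay are illustrative.

```python
import time

def determine_model(control_params, apply_params, read_response,
                    rel_step=0.05, settle_seconds=3600):
    """Estimate one model parameter (partial derivative) per control parameter,
    varying the parameters one at a time ("axial" experiments)."""
    model = {}
    baseline = read_response()                     # response at current settings
    for name, value in control_params.items():
        delta = rel_step * value if value else rel_step
        perturbed = dict(control_params)
        perturbed[name] = value + delta            # step 202: vary one parameter
        apply_params(perturbed)
        time.sleep(settle_seconds)                 # step 206: let the system react
        change = read_response() - baseline        # step 204: observed change
        model[name] = change / delta               # step 208: partial derivative estimate
        apply_params(control_params)               # restore before the next experiment
        time.sleep(settle_seconds)
    return model
```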
[48] In more detail, let $x_i$ denote the value of the $i$-th control parameter and let the observed response of the client-server system to a set of $n$ control parameters, i.e. the response value, be defined by a function $f(x_1, x_2, \ldots, x_n)$. Consider a set of control parameters each modified by a small change $\Delta x_i$, $(x_1 + s_1\Delta x_1, x_2 + s_2\Delta x_2, \ldots, x_n + s_n\Delta x_n)$, where, for each experiment, one $s_i$ is set to $+1$ and/or $-1$ and all the other $s_j$ values are set to zero. In implementations the quadrant where the target response lies may be identified and then $s_i$ set to $+1$ or $-1$ accordingly. This creates $n$, or $2n$, "axial" experiments, which is sufficient to determine a local gradient with respect to the control parameters. Thus, assuming $f$ is differentiable,

$$f(x_1, \ldots, x_i + s_i\Delta x_i, \ldots, x_n) - f(x_1, \ldots, x_n) \approx s_i\,\Delta x_i\,\frac{\partial f}{\partial x_i}$$

[49] This allows the change in $f$ to be estimated (predicted) for a small step $\boldsymbol{\varepsilon}$ in any direction, i.e.

$$f(\mathbf{x} + \boldsymbol{\varepsilon}) \approx f(\mathbf{x}) + \sum_i \varepsilon_i\,\frac{\partial f}{\partial x_i}$$

The measured change in $f$ and the change in $x_i$ define the partial derivative $\partial f/\partial x_i$; where $s_i$ takes values of both $+1$ and $-1$ there are three points which can be used for a linear fit to estimate $\partial f/\partial x_i$. The approximation is better for smaller changes (or if $f$ is linear). A change in a control parameter of the client-server system 160 may be implemented by providing a cookie from the server 162 to the client 164, e.g. instructed by system control interface 114. Varying each control parameter in turn and constructing a regression estimator of $f$ as described uses less data than, e.g., observing $f$ for all combinations of the $s_j$ (which would need $2^n$ experiments), although the coverage of the control parameter space is reduced.
[50] In implementations the set of partial derivatives

$$\left\{\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right\}$$

may be used as the model parameters.
The process of Figure 2 may be used to determine a set of models, one for each of a corresponding set of response values. The models may have the same set of control parameters, or disjoint or overlapping sets of control parameters. The model parameters of multiple models may be determined in parallel by varying a control parameter and measuring the corresponding change in two or more response values.
[51] The user interface 150 may be configured to enable a user to adjust the set of control parameters and to output one or more predicted response values determined by the model(s), as indicated by example display 152. It is then useful to provide a confidence interval or similar for the response value, to facilitate a user identifying which control parameters are important in controlling the response of the client-server system 160.
[52] For an error $\boldsymbol{\varepsilon}$, $f(\mathbf{x} + \boldsymbol{\varepsilon}) \approx f(\mathbf{x}) + \sum_i \varepsilon_i\,\partial f/\partial x_i$ (where $\mathbf{x}$ and $\boldsymbol{\varepsilon}$ are vectors) and the variance $\mathrm{Var}\,f(\mathbf{x} + \boldsymbol{\varepsilon}) = \sum_i \varepsilon_i^2\,\sigma_i^2$, where $\sigma_i^2$ is the variance of the estimate of $\partial f/\partial x_i$ (each $x_i$ is varied independently, in turn, and covariances are zero), and an estimate of $\sigma_i^2$ is (experiment variance)$/(\Delta x_i)^2$. A confidence interval may then be defined in terms of the total variance, e.g. plus or minus some number of standard deviations, where the standard deviation is the square root of $\mathrm{Var}\,f(\mathbf{x} + \boldsymbol{\varepsilon})$.
[53] However this assumes that the response of the client-server system 160 is linear and in general error estimates under this linearity assumption do not work well. Thus an extra term may be added to account for the model bias.
[54] For generality consider a model of $m$ metrics, i.e. response values: an $m \times 1$ vector of response values $\mathbf{y}$ may be linearly modelled as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ where $X$ is an $m \times n$ matrix of control parameters, $\boldsymbol{\epsilon}$ is an $m \times 1$ experimental error vector, and $\boldsymbol{\beta}$ is a vector of model parameters of length $n$. (When modelling small changes in control parameters around their current values the intercept may be disregarded.) In a linear model an estimate of $\boldsymbol{\beta}$ which minimizes the mean square error is $\hat{\boldsymbol{\beta}} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}\mathbf{y}$. An observation of a response value $\tilde{y}_j$ depends on the unobserved ground truth $y_j$ and experimental error, $\tilde{y}_j = y_j + \epsilon_j$. The error $err$ due to deviation from the linearity assumption is given by $y_j = \mathbf{x}_j^{\mathsf T}\boldsymbol{\beta} + err(\mathbf{x}_j)$, and the variance of $err$ is given by $t^2\,\mathbf{x}^{\mathsf T}(X^{\mathsf T}X)^{-1}\mathbf{x}$, where $t^2$ is estimated from the residuals of the linear fit.

[55] The terms may be calculated directly from the response values and their variances, the control parameter values $X$ and the linear model parameters $\hat{\boldsymbol{\beta}}$. In practice a matrix $A = t^2(X^{\mathsf T}X)^{-1}$ may be computed when the model parameters are determined and then $t^2\,\mathbf{x}^{\mathsf T}(X^{\mathsf T}X)^{-1}\mathbf{x} = \mathbf{x}^{\mathsf T}A\mathbf{x}$.
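The calculation in paragraphs [52]–[55] can be sketched as follows; the residual-based estimate of $t^2$ and the 1.96 multiplier for an approximately 95% interval are assumptions made for illustration, as the document does not spell them out, and all numbers are synthetic.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares fit, beta_hat = (X^T X)^{-1} X^T y, for small parameter changes."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bias_matrix(X, y, beta):
    """A = t^2 (X^T X)^{-1}, with t^2 taken here as the mean squared residual."""
    m, n = X.shape
    resid = y - X @ beta
    t2 = float(resid @ resid) / max(m - n, 1)
    return t2 * np.linalg.inv(X.T @ X)

def prediction_interval(x, beta, A, experiment_var, z=1.96):
    """Predicted response with an interval including the non-linearity term x^T A x."""
    pred = float(x @ beta)
    var = float(x @ A @ x) + experiment_var    # model-bias term plus experimental noise
    return pred, pred - z * np.sqrt(var), pred + z * np.sqrt(var)

# Toy usage with synthetic experiment data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(scale=0.05, size=(12, 3))       # 12 experiments, 3 control parameters
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(12)
beta = fit_linear_model(X, y)
A = bias_matrix(X, y, beta)
print(prediction_interval(np.array([0.05, -0.02, 0.01]), beta, A, experiment_var=1e-4))
```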
[56] Figure 3a shows a first example of a process that may be implemented on the performance management system 100 of Figure 1 to automatically maintain or optimize the client-server system 160. The process of Figure 3a uses gradient descent to maximize or minimize a response value, or a function of the response value which may be termed an objective function.
[57] The process identifies the control parameters concerned and a response value for a target response (300), for example as defined by the user interface 152, or the configuration data loaded from data store 140, or by data obtained over a network. Experiments are performed using the process of Figure 2 to determine parameters of a model linking the control parameters to the target response (302); the model parameters may be stored in the data store 140. The process optimizes an objective function, i.e. it determines a maximum or minimum value of the objective function. The objective function may comprise or consist of a response value associated with the response; e.g. it may be the response value or a function of the response value. As previously described there may be a model parameter for each control parameter, which may comprise a partial derivative of the objective function with respect to the control parameter.
[58] The process then obtains current values of the control parameters $\mathbf{x}$. These may be known because values of the parameters have previously been defined by the process or in some other way, or they may be read from the client-server system (304).
[59] The process then performs a gradient descent (or ascent) step according to a learning rate a to optimize the objective function (306). The learning rate may be e.g. loaded from the data store 140 and/or user-defined; it may be in the range 1-10% e.g. around 5%. Performing the gradient descent (or ascent) step may comprise adjusting each of the control parameters (or a subset thereof) by the product of a gradient determined by the respective model parameter and the learning rate:
$$\Delta x_i = \pm\,\alpha\,\frac{\partial f}{\partial x_i}\,, \qquad x_i \leftarrow x_i + \Delta x_i$$

(with the negative sign for descent and the positive sign for ascent).
[60] The process then outputs the adjusted control parameters to the client-server system 160. The process may wait for a period such as a day, a week, a month, or longer, depending upon the rate of response of the client-server system, and/or until a variance of the response value drops below a threshold (308). The process then loops back either to step 302 to update the model or to step 304 to perform another gradient descent step. For example the system may perform a few gradient descent steps then update the model. The learning rate may be adjusted, e.g. reduced, as optimization progresses. Taking a gradient descent step may be conditional upon the estimated gradient, e.g. a norm of the gradient, being significant by a statistical measure, and/or upon manual approval. The process may loop continuously or until a target is reached; the target may be defined by a threshold value of the objective function.
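A sketch of a single step of block 306, assuming the update has the form $\Delta x_i = \pm\,\alpha\,\partial f/\partial x_i$ reconstructed above; the parameter names and values are purely illustrative.

```python
def gradient_step(params, model, learning_rate=0.05, minimize=True):
    """One gradient descent (or ascent) step over the control parameters.

    `params` and `model` are dicts keyed by parameter name; `model` holds the
    estimated partial derivatives of the objective w.r.t. each parameter.
    """
    sign = -1.0 if minimize else 1.0
    return {name: value + sign * learning_rate * model[name]
            for name, value in params.items()}

# e.g. reduce an objective given previously estimated partial derivatives
new_params = gradient_step({"batch_size": 100.0, "compression": 0.6},
                           {"batch_size": 0.8, "compression": -2.5})
```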
[61] Figure 3b shows a second example of a process that may be implemented on the performance management system 100 of Figure 1 to automatically maintain or optimize the client-server system 160. The process of Figure 3b is able to optimize multiple responses simultaneously, e.g. according to a set of constraints they should meet.
[62] The process identifies a set of response values for a set of target responses (350), and the control parameters for each. The set of response values may be defined by the user interface 152, or by the configuration data loaded from data store 140, or by data from a network connection. Experiments are performed using the process of Figure 2 to determine parameters of each of a set of models, each model defining a change in a response value dependent upon changes in the control parameters (352).
[63] The process then obtains current values of the control parameters $\mathbf{x}$ as previously described, as well as values of any constraints, i.e. bounds on the control parameters and/or response value(s), and, optionally, an objective defined by an objective function (354). Here constraints on the response value(s) may comprise target or threshold maximum or minimum values for the optimization process, e.g. $y_j \geq t_j$. The constraints and objective may be obtained e.g. from user interface 150 and/or data store 140 and/or via a network connection. Defining the objective is optional as the process may be used to apply constraints without any further explicit objective.
[64] The objective function for the process may comprise a linear combination of the response values, i.e. metrics, to be maximized (or minimized) with respect to the constraints (bounds), e.g. $\mathbf{c}^{\mathsf T}\mathbf{y}$ where the vector $\mathbf{c}$ defines the linear combination. This may be used to determine a modified objective function defined in terms of predicted response values, i.e. in terms of the model parameters operating on the control parameters, more specifically small changes in the control parameters (356).
[65] In implementations the constraints are also defined in terms of predicted response values, i.e. in terms of the model parameters operating on the control parameters, more specifically small changes in the control parameters. Where a probabilistic approach is employed, i.e. the constraints are stochastic to be satisfied with a minimum probability, each constraint may include a term defining the minimum probability. Each constraint may also include a term dependent upon the prediction error of the model for the corresponding response value, e.g. a term dependent upon the noise or variance of the response value y.
[66] The process may then apply the optimizer 130 to the current values of the control parameters obtained in step 354 and to the (modified) objective function and constraints to determine a set of changes to the control parameters $\Delta x_i$ (358). These are applied to the client-server system 160, either directly or after manual review (360). Optionally a random component may be added to the changes to the control parameters, which can help make the system more robust to model errors. Optionally one or more auxiliary outputs may be provided, e.g. comprising values of the slack variable(s) and of the objective function value and a confidence interval (or similar measure) for the objective function value. These may be provided via the user interface to facilitate manual supervision and/or intervention if necessary e.g. for safety.
[67] The process may then wait for a period such as a day, week, or month or longer, depending upon the rate of response of the client-server system, and/or until the variance of the response values drops below a threshold (362). The process may then loop back to adjust the control parameters again and/or to update the models. Optionally control parameter boundaries and/or a learning rate may be set or adjusted. The control parameter learning rate may comprise a limit on the maximum change in a control parameter or on the set of control parameters, e.g. $\lVert\Delta\mathbf{x}\rVert < r$. The process may be run continuously and/or until one or more of the constraints are satisfied, taking into account the slack variables if implemented.
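One way to post-process an optimizer step as suggested in paragraphs [66] and [67] — adding a small random component, respecting per-parameter bounds, and limiting $\lVert\Delta\mathbf{x}\rVert$ — is sketched below; the magnitudes and the ordering of the operations are illustrative design choices, not details from the disclosure.

```python
import numpy as np

def post_process_step(dx, lower, upper, max_norm=0.1, jitter=0.01, rng=None):
    """Add a small random component, clip to per-parameter bounds, cap the norm."""
    rng = rng or np.random.default_rng()
    dx = dx + jitter * rng.standard_normal(dx.shape)   # robustness to model error
    dx = np.clip(dx, lower, upper)                     # l_i <= dx_i <= u_i
    norm = np.linalg.norm(dx)
    if norm > max_norm:                                # enforce ||dx|| < r
        dx = dx * (max_norm / norm)                    # assumes the bounds include zero
    return dx

dx = post_process_step(np.array([0.08, -0.03, 0.12]), lower=-0.1, upper=0.1)
```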
[68] In more detail, a predicted response value $\hat{y}_j$ of a set of $m$ response values may be linearly modelled as $\hat{y}_j = \boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x}$ where $\Delta\mathbf{x}$ is a vector of control parameter changes. The expected prediction error is distributed with variance $\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}$, where $\Sigma_j$ is the covariance matrix and $\Sigma_j^{1/2} = E$ such that $E\,E = \Sigma_j$. The on-diagonal terms in $\Sigma_j$ come from the experiments; the off-diagonal terms in $\Sigma_j$ may be zero where independent experiments are conducted and only one $\Delta x_i$ is varied at a time; in implementations independent experiments are useful but not essential. Here, as previously, $\Delta\mathbf{x}$ refers to an $n \times 1$ vector comprising a set of control parameter changes, e.g. of percentage changes in the parameters, and $\boldsymbol{\theta}_j$ refers to the set of model parameters for $y_j$ as a function of the set of control parameters. In implementations $\Sigma$ and the previously described matrix $A$ are used for the same purpose and thus may be the same matrix.
[69] The objective function may be to maximize $\mathbf{c}^{\mathsf T}\mathbf{y}$, where the $c_j$ define weights of the metrics $y_j$, subject to $y_j \geq t_j$ (an upper bound may instead be imposed by defining a new metric $-y_j$). The objective may then be re-written in terms of predicted values, maximize $\sum_j \mathbf{c}'_j{}^{\mathsf T}\Delta\mathbf{x}$, where $\mathbf{c}'_j = c_j[\theta_{j,1}, \theta_{j,2}, \ldots, \theta_{j,n}]$, and the constraints may be re-written $\boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x} \geq t_j$. This may then be provided to the optimizer 130, which may employ conventional convex optimization, e.g. linear programming, to maximize the objective function subject to the constraints. Optionally control parameter boundaries, e.g. $l_i \leq \Delta x_i \leq u_i$ where $l_i$ and $u_i$ define lower and upper bounds respectively, and/or $\lVert\Delta\mathbf{x}\rVert < r$, may also be applied.
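A minimal linear-programming sketch of the formulation in paragraph [69], using `scipy.optimize.linprog` as one convenient solver (the document does not name a particular one); the model parameters, targets, weights and bounds below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

theta = np.array([[ 0.8, -0.2,  0.5],     # model parameters for metric 1
                  [-0.3,  0.6,  0.1]])    # model parameters for metric 2
t = np.array([0.02, 0.01])                # constraint thresholds t_j
c = np.array([1.0, 0.5])                  # metric weights c_j

c_prime = c @ theta                       # objective weights on the parameter changes
# linprog minimizes, so negate the objective; theta @ dx >= t becomes -theta @ dx <= -t.
res = linprog(-c_prime, A_ub=-theta, b_ub=-t,
              bounds=[(-0.1, 0.1)] * theta.shape[1], method="highs")
dx = res.x                                # suggested control parameter changes
print(dx, res.success)
```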
[70] The prediction errors mean that in practice a constraint may not be satisfied and a control parameter may need to be adjusted to a large degree to have more confidence that the constraint is satisfied. This is illustrated in Figure 4a, where for a distribution of metric values shown by dashed line 400 a threshold indicated by line 402 only provides a 50% chance of meeting the constraint, whereas a threshold indicated by line 404 provides a greater, $1-\alpha$, probability where $\alpha < 0.5$.
[71] In this case the constraints are $P(\hat{y}_j \geq t_j) \geq 1 - \alpha$, which may be re-written as

$$\boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x} \geq t_j + \Phi^{-1}(1-\alpha)\sqrt{\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}}$$

where $\Phi(\cdot)$ is the normal distribution function (and since $\alpha < 0.5$, $\Phi^{-1}(\alpha)$ is negative), and where the standard deviation of the prediction may be determined from $\sqrt{\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}}$. Thus $\Phi^{-1}(1-\alpha)\sqrt{\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}}$ is in effect subtracted from $\boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x}$ in the problem provided to the optimizer 130. In this case the convex optimizer may comprise a Second Order Cone Programming optimizer, as implemented by many standard convex optimization packages.
[72] In some cases it may not be possible to simultaneously meet all the constraints, in which case it is desirable for the system to approach a constraint so far as possible. However this is complicated with multiple constraints. A solution is to define what are here termed slack variables, associating a slack variable with each constraint and formulating the optimization task so that the total slack is minimized. This is conceptually illustrated in Figure 4b, which shows solving only for constraints (without an additional objective). A desired target region 450 for the metrics, i.e. response values, cannot be met but each metric has some respective slack, illustrated by the length of line 452 for Metric 2. A current set of control parameter values defines a point 454 corresponding to values of Metric 1 and Metric 2, and the optimization process moves the client-server system 160 to point 456, at which point a new set of experiments is conducted. In this way the client-server system 160 approaches the target region for the metrics, though it may not be able to operate within the target region.
[73] To implement this approach the objective function may be to maximize $\sum_j c_j y_j - \gamma(\delta_1 + \delta_2 + \cdots + \delta_m)$, where one or more or each response value $y_j$ has a respective slack variable $\delta_j$. A constant, $\gamma$, determines the respective weight of maximizing $\sum_j c_j y_j$ and of meeting the constraints in the objective function: if $\gamma$ is large, meeting the constraints is prioritized. In some implementations each value of $\delta_j$ may have a respective multiplier $\gamma_j$. The objective function is maximized with respect to $y_j \geq t_j - \delta_j$ and $\delta_j \geq 0$ to determine $\Delta\mathbf{x}$ and the $\delta_j$.
[74] Thus in implementations the constraints are $P(\hat{y}_j \geq t_j - \delta_j) \geq 1 - \alpha_j$ and $\delta_j \geq 0$. This is formulated for optimizer 130 as:

$$\boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x} \geq t_j - \delta_j + \Phi^{-1}(1-\alpha_j)\sqrt{\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}}$$

or equivalently

$$\boldsymbol{\theta}_j^{\mathsf T}\Delta\mathbf{x} + \delta_j - \Phi^{-1}(1-\alpha_j)\sqrt{\Delta\mathbf{x}^{\mathsf T}\Sigma_j\Delta\mathbf{x}} \geq t_j$$

[75] As before this may be solved by a standard convex optimizer, e.g. a Second Order Cone Programming optimizer. A hard constraint may have $\alpha_j$ small, e.g. around 0.05, and no slack variable, whilst a soft constraint may have a slack variable and $\alpha_j$ around 0.5, with a continuum of options in between.
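The chance-constrained, slack-variable formulation of paragraphs [70]–[75] can be expressed for a Second Order Cone Programming solver roughly as follows; `cvxpy` is used as one convenient convex optimization package (the document does not name one), both constraints are given a slack variable here for brevity (the document suggests a hard constraint would have none), and the covariances, targets, $\alpha_j$ values and bounds are illustrative.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

n, m = 3, 2                                    # control parameters, metrics
theta = np.array([[ 0.8, -0.2,  0.5],
                  [-0.3,  0.6,  0.1]])         # model parameters theta_j
Sigma = [4e-4 * np.eye(n) for _ in range(m)]   # prediction covariances Sigma_j
t = np.array([0.02, 0.01])                     # targets t_j
c = np.array([1.0, 0.5])                       # metric weights c_j
alpha = [0.05, 0.5]                            # harder and softer constraints
gamma = 10.0                                   # weight on the total slack

dx = cp.Variable(n)
delta = cp.Variable(m, nonneg=True)            # slack variables, delta_j >= 0
constraints = [cp.norm(dx, 2) <= 0.1]          # ||dx|| <= r
for j in range(m):
    k = norm.ppf(1 - alpha[j])                 # Phi^{-1}(1 - alpha_j)
    E = np.linalg.cholesky(Sigma[j])           # Sigma_j = E E^T
    # theta_j' dx + delta_j - k * sqrt(dx' Sigma_j dx) >= t_j  (second-order cone)
    constraints.append(theta[j] @ dx + delta[j]
                       - k * cp.norm(E.T @ dx, 2) >= t[j])

objective = cp.Maximize(c @ (theta @ dx) - gamma * cp.sum(delta))
problem = cp.Problem(objective, constraints)
problem.solve()
print(dx.value, delta.value)
```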
[76] Embodiments of the systems and methods described herein may be implemented as one or more computer programs comprising instructions for execution or interpretation by, or to control the operation of, data processing apparatus.
[77] The instructions may be encoded on an electrical, electromagnetic or optical signal or provided on a non-transitory computer storage medium. The computer storage medium may comprise a machine-readable storage device, substrate, or memory device, or a combination of one or more of these (but is not a propagated signal). Thus the computer storage medium may comprise, for example, a disk or programmed memory. The instructions may be distributed across multiple computers and/or sites interconnected by a communication network.
[78] The instructions may comprise, for example, source, object or executable code in any type of programming language, e.g. compiled, interpreted, declarative or procedural, or code for a hardware description language. The computer program(s) may be stand-alone or may be for implementation in a computing environment, e.g. as one or more additional modules.
[79] The data processing apparatus may comprise any kind of apparatus, device, or machine for processing data, such as one or more programmable processors or computers, or special purpose logic circuitry such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). The apparatus may include code that creates an execution environment for the computer program(s) such as an operating system and/or database management system or protocol stack.
[80] The client-server system may have the client and server remote from one another and may interact via a communications network such as a local area network ("LAN"), a wide area network ("WAN"), and/or the Internet. The client computer may have a graphical user interface and/or a Web browser; the server may be a data server and/or may include a middleware component such as an application server.
[81] Some example implementations have been described but various modifications may be made. The operations are shown in the drawings in a particular order but this should not be understood as requiring that the operations are performed in the order shown or even in sequential order, or that all the operations must be performed, to achieve desirable results. For example in some implementations multitasking and parallel processing may be useful. Thus it will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the scope of the claims appended hereto.

Claims

1. An automated performance management system for maintaining or optimizing a hierarchical system of networked elements, the system comprising: one or more output data feeds to output data relating to a set of control parameters for the hierarchical system, and at least one input data feed to receive a response value relating to a response of the hierarchical system to adjustment of the set of control parameters; a model determination subsystem coupled to the one or more output data feeds and to the input feed and configured to determine model data defining a model of the hierarchical system, the model data comprising data defining model parameters representing a change in the response of the hierarchical system to changes in the control parameters, the model parameters comprising a set of partial derivatives of the response value with respect to each of the set of control parameters, wherein the model determination subsystem is configured to, for each control parameter of the set of control parameters in turn, one at a time: output control data to vary the control parameter, input data representing a change in the response value to determine a change in the response of the hierarchical system to variation of the control parameter, and determine a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter; a performance control subsystem coupled to the model determination subsystem, to the one or more output data feeds, and to the input feed, and configured to adjust a subset of one or more of the set of control parameters to move the response value towards a target value to maintain or optimize the hierarchical system.
2. A system as claimed in claim 1 wherein the performance control subsystem is configured to automatically adjust the subset of the set of control parameters by gradient descent to move the response value towards the target value.
3. A system as claimed in claim 2 wherein the performance control subsystem is configured to implement the gradient descent as a sequence of gradient descent steps, wherein each step comprises adjusting each of the subset of control parameters dependent upon a product of a learning rate and the partial derivative of the response value with respect to the control parameter, and wherein the gradient descent steps are performed at intervals of at least a day, a week, or a month.
4. A system as claimed in claim 2 or 3 wherein the performance control subsystem is configured to implement the gradient descent as a sequence of gradient descent steps, wherein each step comprises adjusting each of the subset of control parameters dependent upon a product of a learning rate and the partial derivative of the response value with respect to the control parameter, and wherein the gradient descent steps are performed conditionally dependent upon a statistical significance of the step.
5. A system as claimed in claim 1 wherein the at least one input data feed is configured to receive a set of response values relating to the response of the hierarchical system to adjustment of the set of control parameters, each response value characterizing the response of a different aspect of the hierarchical system, wherein the model determination subsystem is configured to determine model data defining a set of models of the hierarchical system, one for each response value, each model having a respective set of model parameters, and wherein the performance control subsystem is configured to apply an optimizer to adjust the control parameters to apply a set of constraints to the set of response values to maintain or optimize the hierarchical system, wherein the optimizer is configured to apply convex optimization to a set of predicted response values, wherein the predicted response values are response values predicted by the set of models.
6. A system as claimed in claim 5 wherein the performance control subsystem is further configured to apply the optimizer to adjust the control parameters to maximize or minimize an objective function of the set of response values by determining, from the objective function, a modified objective function dependent upon the set of predicted response values and applying the convex optimization to the modified objective function.
7. A system as claimed in claim 5 or 6 wherein the set of constraints comprises stochastic constraints, where a stochastic constraint is a constraint to be satisfied with a probability greater than $1-\alpha$ where $\alpha \leq 0.5$ or $\alpha \leq 0.1$.
8. A system as claimed in claim 5, 6 or 7 wherein the performance control subsystem is configured to enhance the set of constraints by including a set of slack variables to provide an enhanced set of constraints, wherein each of the slack variables has a constraint of being equal to or greater than zero, wherein in the enhanced set of constraints each of the constraints of the set of constraints to the set of response values is modified to a constraint on the response value adjusted by a respective one of the slack variables, and wherein the performance control subsystem is configured to apply the optimizer to apply the enhanced set of constraints to the set of response values to maintain or optimize the hierarchical system.
9. A system as claimed in claim 8 when dependent upon claim 6 wherein the performance control subsystem is configured to modify the objective function to add a term dependent upon the set of slack variables multiplied by one or more respective weights, wherein the or each weight determines a relative priority of the set of constraints and the objective function in maintaining or optimizing the hierarchical system.
10. A system as claimed in any one of claims 5-9 wherein the performance control subsystem is further configured to apply the optimizer to adjust the control parameters subject to one or more parameter constraints on the control parameters.
11. A system as claimed in any preceding claim wherein the model parameters define a model linear in the control parameters, and wherein the model determination subsystem is configured to determine updated model data after the performance control subsystem adjusts the subset of control parameters.
12. A system as claimed in claim 11 wherein the system is configured to delay after adjusting the subset of control parameters before the model determination subsystem determines updated model data, to allow the hierarchical system to settle.
13. A system as claimed in any preceding claim configured to alternate between operation of the model determination subsystem and the performance control subsystem, wherein a period of the alternation is at least a week.
14. A system as claimed in any preceding claim wherein the performance control subsystem comprises a user interface configured to enable a user to adjust the subset of control parameters to provide user-adjusted control parameters, wherein the model determination subsystem is configured to determine a predicted response value from the user-adjusted control parameters and a confidence interval for the predicted response value, and wherein the user interface is configured to provide the predicted response value and the confidence interval in response to user adjustment of the subset of control parameters and, in response to a user command, to output the user-adjusted control parameters to the hierarchical system.
15. A system as claimed in claim 14 wherein the model parameters define a model linear in the control parameters, and wherein the confidence interval includes a term which represents deviation of the model from linearity.
16. A method of maintaining or optimizing a hierarchical system of networked elements, the method comprising: determining model data defining a model of the hierarchical system, the model data comprising data defining model parameters representing a change in a response of the hierarchical system to changes in the control parameters, the model parameters comprising a set of partial derivatives of a response value relating to the response with respect to each of the set of control parameters, wherein determining the model data comprises, for each control parameter of a set of control parameters in turn, one at a time: outputting control data to vary the control parameter, inputting data representing a change in the response value to determine a change in the response of the hierarchical system to variation of the control parameter, and determining a model parameter for the control parameter, the model parameter representing a partial derivative of the response value with respect to the control parameter; and adjusting a subset of one or more of the set of control parameters to move the response value towards a target value to maintain or optimize the hierarchical system.
17. A system or method as recited in any preceding claim wherein the hierarchical system of networked elements comprises a client-server computer system with at least one server computer system and multiple client computer systems.
18. A system or method as claimed in claim 17 wherein the control parameters control the operation of the client computer systems, and wherein the model determination subsystem is configured to vary the control parameters of a subset of the client computer systems, wherein the subset is less than 10% of the total number of client computer systems.
19. A system or method as claimed in claim 17 or 18 wherein the server computer system is configured to serve data to the client computer systems, wherein the control parameters include one or more control parameters characterizing the data, and wherein the response value characterizes data traffic from the client computer systems to the at least one server computer system.
20. A computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to implement the system of any one of claims 1-15, the method of claim 16, or the system or method of any one of claims 17-19.
PCT/US2019/035571 2019-06-05 2019-06-05 Network optimization systems WO2020246973A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2019/035571 WO2020246973A1 (en) 2019-06-05 2019-06-05 Network optimization systems

Publications (1)

Publication Number Publication Date
WO2020246973A1 true WO2020246973A1 (en) 2020-12-10

Family

ID=67002407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/035571 WO2020246973A1 (en) 2019-06-05 2019-06-05 Network optimization systems

Country Status (1)

Country Link
WO (1) WO2020246973A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023089356A1 (en) * 2021-11-17 2023-05-25 Telefonaktiebolaget Lm Ericsson (Publ) Network attribute analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040049295A1 (en) * 2002-09-11 2004-03-11 Wilhelm Wojsznis Constraint and limit feasibility handling in a process control system optimizer
EP3112961A1 (en) * 2015-06-30 2017-01-04 Mitsubishi Hitachi Power Systems, Ltd. Control parameter optimizing system and operation control optimizing apparatus equipped therewith

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEBASTIAN RUDER: "An overview of gradient descent optimization algorithms", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 15 September 2016 (2016-09-15), XP080726940 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19732827

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19732827

Country of ref document: EP

Kind code of ref document: A1