WO2011128921A1 - Automated service time estimation method for IT system resources - Google Patents


Info

Publication number
WO2011128921A1
Authority
WO
WIPO (PCT)
Prior art keywords
clusters
cluster
regression
procedure
points
Prior art date
Application number
PCT/IT2010/000164
Other languages
French (fr)
Inventor
Politecnico Di Milano
Paolo Cremonesi
Kanika Dhyani
Stefano Visconti
Original Assignee
Neptuny S.R.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neptuny S.R.L. filed Critical Neptuny S.R.L.
Priority to PCT/IT2010/000164 priority Critical patent/WO2011128921A1/en
Priority to PCT/IB2011/051648 priority patent/WO2012020328A1/en
Publication of WO2011128921A1 publication Critical patent/WO2011128921A1/en
Priority to US13/650,767 priority patent/US9350627B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5009Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5019Ensuring fulfilment of SLA
    • H04L41/5025Ensuring fulfilment of SLA by proactively reacting to service quality change, e.g. by reconfiguration after service quality degradation or upgrade
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/70Admission control; Resource allocation
    • H04L47/83Admission control; Resource allocation based on usage prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/02Protocol performance

Definitions

  • Local search is performed by reassigning sample points to the closest cluster (in terms of distance from the regression line), then computing the regression lines and repeating the same procedure until no points need to be moved and reassigned to a closer cluster.
  • Perturbation (also called the "split" strategy) is performed by applying p times the following procedure: take a random cluster and assign one of its points to another cluster.
  • globular clusters shall be detected and removed, since "square" or "globular" clusters are not significant for the purpose of estimating the service time. This can be done according to two techniques.
  • a first technique transforms the points of each cluster so that the regression line corresponds to the abscissa axis (i.e. the workload axis) of the plot. The distance of the transformed points to the abscissa axis is then computed, and the q-quantile of the distribution of the points on the x axis and on the y axis is considered: if it is smaller than a predetermined threshold, the corresponding cluster points can be removed.
  • a second technique computes the confidence interval of the regression line: if it is above a predetermined threshold (or if the sign of the slope is affected), the corresponding cluster can be removed.
  • a refinement procedure is performed to reduce the number of significant clusters; this step is carried out by removing or merging clusters and re-assigning points to other clusters on the basis of a distance function, thereby reducing the number of clusters needed.
  • This procedure is run both during the central part of the method (as seen above), on sub-clusters right after the clusterwise regression step - so as to reduce the number of sub-clusters overestimated by VNS - and on all clusters at the end of the estimation procedure - so as to merge clusters generated by the same linear model but separated by DBSCAN because they are centred around different zones.
  • when the merging conditions (described below) are satisfied, the pair of clusters can be merged; a new regression line is then computed and this procedure is started again, otherwise the procedure is stopped.
  • Each cluster is evaluated: the points of one cluster are assigned to the other clusters and it is checked which cluster suffers the biggest increase in delta.
  • the procedure then finds the cluster that, when removed (i.e. having its points assigned to other clusters), gives origin to the smallest maximum increase in delta (since the regression lines can change considerably, a few steps of local search are also performed). If the delta increase is below a predetermined threshold, the cluster is actually removed and the procedure is repeated; otherwise the procedure is stopped.
  • more formally, the refinement procedure can be seen as follows. Let d(i,j) be the orthogonal distance of point j from the regression line of cluster Ci. The set of distances {d(i,j), j ∈ Ci} can be considered a random sample from an unknown distribution; let δp(Ci) be the p-percentile of this sample. A point j is considered an inlier w.r.t. a cluster Ci if d(i,j) < 1.5 δ0.9(Ci).
  • If the size of the merged cluster Ci ∪ Cj is less than Tp points, remove both Ci and Cj, assign their points to the closest cluster and go to the next pair of clusters.
  • the first part of the procedure deals with the removal of clusters that fit outliers from other clusters. This situation is frequent when overestimating the number of clusters.
  • the second part of the procedure tackles the cases in which multiple regression lines fit the same cluster. This is also a common scenario.
  • the first one prevents a large cluster from being merged with a small cluster which lies far away from its regression line, by requiring that at least a certain amount of points of the smaller cluster be inliers in the merged cluster.
  • the second condition is based on the correlation of residuals with the workload and preserves small clusters that are "attached" to big clusters but have a significantly different slope.
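The inlier criterion described above (a point is an inlier with respect to a cluster when its orthogonal distance from the cluster's regression line is below 1.5 times the 90th percentile of the cluster's own distances) can be sketched as follows; the cluster data, the slope and the test points are illustrative assumptions:

```python
import numpy as np

# Sketch of the refinement inlier test: d(i,j) < 1.5 * delta_0.9(Ci),
# with d(i,j) the orthogonal distance from point j to the regression
# line of cluster Ci.
def orth_dist(points, slope, intercept):
    # distance from (x, y) to the line y = slope*x + intercept
    return np.abs(slope * points[:, 0] - points[:, 1] + intercept) \
        / np.hypot(slope, 1.0)

def is_inlier(point, cluster_pts, slope, intercept):
    d_cluster = orth_dist(cluster_pts, slope, intercept)
    threshold = 1.5 * np.percentile(d_cluster, 90)
    return orth_dist(point[None, :], slope, intercept)[0] < threshold

# Hypothetical cluster around the line U = 0.3 * X with small noise.
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
cluster = np.column_stack([x, 0.3 * x + rng.normal(0, 0.1, 100)])
print(is_inlier(np.array([5.0, 1.55]), cluster, 0.3, 0.0))  # near the line
print(is_inlier(np.array([5.0, 5.00]), cluster, 0.3, 0.0))  # far away
```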


Abstract

A method for upgrading or allocating resources in an IT system is disclosed. The method comprises the steps of collecting a dataset by sampling utilization versus workload of a resource in the IT system and then analyzing said dataset to obtain the service time through a clusterwise regression procedure, said service time being used to trigger the upgrade or allocation of said resources, wherein the method further comprises the following steps: (i) normalize the collected dataset, (ii) scatter data when utilization has been rounded, (iii) partition the data into density based clusters through the DBSCAN procedure, (iv) discard clusters with less than z% of the total number of observations, (v) in each cluster, perform clusterwise regression and obtain linear sub-clusters in a pre-defined number, (vi) reduce the sub-clusters by applying a refinement procedure, removing sub-clusters that fit outliers and merging pairs of clusters that fit the same model, (vii) update the clusters with the reduced sub-clusters, (viii) remove globular clusters, (ix) reduce the number of clusters with the refinement procedure, and (x) de-normalize the results.

Description

Automated Service Time Estimation Method for IT System Resources
Field of the invention
The present invention relates to a service time estimation method for IT system resources. In particular, it relates to an estimation method based on DBSCAN methodology.
Background
As known, queuing network models are a powerful framework to study and predict the performance of computer systems, i.e. for capacity planning of the system. However, their parameterization is often a challenging task and cannot be performed entirely automatically. The problem of estimating the parameters of queuing network models has been undertaken in a number of works in the prior art, in connection with IT systems and communication networks.
One of the most critical parameters is the service time of the system, which is the mean time required to process one request when no other requests are being processed by the system. Indeed, service time estimation is a building block in queuing network modelling, as diagrammatically shown in fig. 1A.
To parameterize a queuing network model, service time must be provided for each combination of service station and workload class. Unfortunately, service time measurements are rarely available in real systems and obtaining them might require invasive techniques such as benchmarking, load testing, profiling, application instrumentation or kernel instrumentation. On the other hand, aggregate measurements such as the workload and the utilization are usually available.
According to the utilization law, the service time can be estimated from workload (= throughput of the system) and utilization using simple statistical techniques such as least squares regression. However, anomalous or discontinuous behaviour can occur during the observation period. For instance, hardware and software may be upgraded or subject to failure, reducing or increasing the service time, and certain background tasks can affect the residual utilization. The system therefore has multiple working zones, each corresponding to a different regression model, which shall be correctly detected and taken into consideration. This task, according to the prior art, cannot be efficiently performed automatically.
Two examples of poor detection of regression models are shown in figs. 1B and 1C: here a single regression line does not effectively and correctly represent the behaviour of sampled data from two IT systems.
The problem of simultaneously identifying the clustering of linearly related samples and the regression lines is known in the literature as clusterwise linear regression (CWLR) or regression-wise clustering and is a particular case of model based clustering. This problem finds applications in areas like control systems, neural networks and medicine.
This problem has already been addressed using different techniques, but they usually require some degree of manual intervention: i.e. human intelligence is required to detect at least the number of clusters within the dataset points and to supply the correct values of some parameters to the chosen algorithm.
An object of the present invention is hence to supply an enhanced method for estimating these regression models and correctly classifying observation samples according to the regression model that generated them, so as to correctly plan capacity and upgrading of the system.
In other words, given n observations of workload versus utilization of an IT system, it is required to identify the number k of significant clusters, the corresponding regression lines (service time and residual utilization), cluster membership and outliers. Based on this identification, estimation of the IT system behaviour over a wide range of workload and utilization can be inferred, so that automatic upgrading or allocation of hardware/software resources can be performed in the system.
Summary of the invention
The above object can be obtained through a method as defined in its essential terms in the attached claims.
In particular, a new method is provided that combines density based clustering, clusterwise regression and a refinement procedure. While service time estimation according to the prior art considered the functional regression model, in which errors only affect the independent variable (the utilization), the method of the invention is based on the structural regression model, in which there is no distinction between dependent and independent variables. While it makes sense to consider the workload a controlled variable, using the structural model for regression is less prone to underestimating the service time when the model assumptions are not met. This method yields more accurate results than existing methods in many real-world scenarios.
Moreover, it shall be noted that according to the prior art, service time estimation is based on standard regression (executed on the vertical distance, i.e. along the ordinate axis): utilization is considered the independent variable and the workload is assumed to be error-free; if this assumption does not hold, the estimator is biased and inconsistent. By contrast, according to the invention an orthogonal regression has been chosen, which proved to yield the best results on most performance data sets. This approach proved to be effective also because aggregate measurements are often used for workload and utilization: for example, if observation is done on a web server to get page visits vs CPU utilization,
- not all pages count the same in terms of CPU utilization,
- even if there is no error in the CPU utilization measurements, the data will not be perfectly fit by a straight line,
and this is due to different mixtures of page accesses during different observation periods.
According to the method of the invention, it has been chosen to accept an overestimation of the number of clusters, so as to rely on a fully automatic procedure, and then to reduce the number of clusters to the correct one through the refinement procedure.
Brief description of the drawings
Further features and advantages of the system according to the invention will in any case be more evident from the following detailed description of some preferred embodiments of the same, given by way of example and illustrated in the accompanying drawings, wherein:
fig. 1A is a diagram showing the concept of utilization law and service time in an IT system;
figs. 1B and 1C are exemplary plots of regression lines obtained according to the prior art;
fig. 2 represents the conversion of a dataset with rounded utilization into a plot of scattered data;
fig. 3 is a flow chart showing the main steps of the method of the invention;
figs. 4 and 5 are exemplary plots of a dataset after applying DBSCAN and VNS;
figs. 6A-6C are plots of clusters upon applying the refinement procedure;
fig. 7 is an exemplary plot of a dataset where three critical clusters are identified; and
fig. 8 shows plots of different datasets illustrating the difference between cluster removal and cluster merging under the refinement procedure.
Detailed description of a preferred embodiment of the invention
The utilization law states that U = XS, where X is the workload of the system, S is the service time and U is the utilization. According to the utilization law, when no requests are entering the system, utilization should be zero. This is not always the case, due to batch processes, operating system activities and non-modelled workload classes. Therefore, there is a residual utilization present. If we represent residual utilization with the constant term R, the utilization law becomes U = XS + R.
In other terms, the utilization law is the equation of a straight line, where the service time is the slope of the regression line and the residual utilization (due to non-modelled work) is the intercept of the regression line.
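The regression implied by the utilization law can be sketched with a simple least-squares fit; the workload range, the service time S = 0.02 s and the residual utilization R = 0.05 below are illustrative assumptions, not data from the patent:

```python
import numpy as np

# Synthetic sample (hypothetical values): workload X in requests/s and
# utilization U generated from the utilization law U = X*S + R plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(5, 40, size=200)
U = 0.02 * X + 0.05 + rng.normal(0, 0.005, size=200)

# Least-squares fit: the slope estimates the service time S and the
# intercept estimates the residual utilization R.
S_hat, R_hat = np.polyfit(X, U, 1)
print(f"service time ~ {S_hat:.4f} s, residual utilization ~ {R_hat:.4f}")
```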
During an observation period, hardware and software upgrades may occur, causing a change in the service time. At the same time, background activities can affect the residual utilization. Therefore, the data is generated by k > 1 linear models:
U = XS1 + R1
U = XS2 + R2
...
U = XSk + Rk
According to the invention, the error-in-variables (EV) model is assumed: if (X1*, U1*), (X2*, U2*), ..., (Xn*, Un*) are the real values generated by the model, the observations (Xi, Ui) are defined as Xi = Xi* + ηi and Ui = Xi*S + R + εi, where ηi and εi are random variables representing the error. The choice of the EV model is motivated later on in the specification. Given the set of observation samples (affected by hardware/software upgrades, by background or batch activities and by outliers), the task is to simultaneously estimate the number of models k that generated the data, the model parameters (Sj, Rj) for j ∈ {1, ..., k} and a partition of the data C1, ..., Ck, where Cj ⊆ {1, ..., n}, |Cj| ≥ 2, Ci ∩ Cj = ∅ for Ci ≠ Cj and C1 ∪ ... ∪ Ck = {1, 2, ..., n}, such that the observations in cluster Cj were generated by the model with parameters (Sj, Rj).
In other words, it is required to simultaneously estimate the regression lines (clusters) and cluster membership, problem which is known in literature as clusterwise regression problem.
A real dataset is given by sampling utilization versus workload in an IT system (for example a CPU of a computer). Said dataset is analyzed to obtain the proper service time, to be later used to trigger upgrading or allocation of hardware resources in the system. To this purpose, the following steps are performed on the dataset according to the method of the invention:
1. Normalize data
2. Scatter data if utilization has been rounded
3. Find density based clusters (DBSCAN)
4. Discard clusters with less than z% of the total number of observations
5. In each cluster, perform clusterwise regression and obtain sub-clusters
- reduce sub-clusters with refinement procedure
- update cluster list with the sub-clusters
6. Remove "globular clusters"
7. Reduce clusters with refinement procedure
8. Post-processing: shared points and outliers
9. De-normalize results (Renormalize regression coefficients)
The method proposed according to the invention will be called RECRA (Refinement Enhanced Clusterwise Regression Algorithm). The general principle of this method is to obtain an initial partition of the data into clusters of arbitrary shape using a density based clustering algorithm. In the next step, each cluster is split into multiple linear subclusters by applying a CWLR algorithm. The number of subclusters is fixed a priori and should be an overestimate. A refinement procedure then removes the subclusters that fit outliers and merges pairs of clusters that fit the same model. In the next step the clusters are replaced by their subclusters and the refinement procedure is run on all the clusters, merging the ones that were split by the density based clustering algorithm (see fig. 3).
1. Normalize data
Data are normalized so as not to introduce further errors.
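As a sketch of this step, together with the final de-normalization of the regression coefficients (step 9), the example below assumes a simple min-max scaling of both axes; this scaling, the data and the line parameters are illustrative assumptions, since the specification does not detail the normalization used:

```python
import numpy as np

# Sketch: min-max normalize a (workload, utilization) dataset, fit a
# line in normalized space, then map the line back to raw units.
def normalize(data):
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), (lo, hi)

def denormalize_line(slope, intercept, lo, hi):
    # map a regression line fitted in normalized space to original units
    sx, sy = hi[0] - lo[0], hi[1] - lo[1]
    raw_slope = slope * sy / sx
    raw_intercept = intercept * sy + lo[1] - raw_slope * lo[0]
    return raw_slope, raw_intercept

rng = np.random.default_rng(1)
X = rng.uniform(10, 50, 100)
U = 0.03 * X + 0.1 + rng.normal(0, 0.01, 100)
data = np.column_stack([X, U])

norm, (lo, hi) = normalize(data)
s_n, r_n = np.polyfit(norm[:, 0], norm[:, 1], 1)
s, r = denormalize_line(s_n, r_n, lo, hi)
print(s, r)   # recovers roughly S = 0.03 and R = 0.1
```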
2. Scatter data
When utilization data have been rounded, scattering of the data is required to prevent the existence of clusters of perfectly collinear points. For example, as seen in fig. 2, the integer CPU utilization has been rounded (left plot) and the value U is then scattered using uniform [-0.5, 0.5] noise (right plot): collinear sample points, due to the sampling methodology, are thus hidden so as to prevent the false detection of collinear clusters.
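The scattering step can be sketched as follows, with hypothetical rounded utilization samples:

```python
import numpy as np

# Sketch of step 2: utilization recorded as rounded integer percentages
# is jittered with uniform [-0.5, 0.5] noise so that no cluster is made
# of perfectly collinear (vertically stacked) points.
rng = np.random.default_rng(2)
U_rounded = np.array([37, 37, 38, 37, 38, 39, 38], dtype=float)
U_scattered = U_rounded + rng.uniform(-0.5, 0.5, size=U_rounded.shape)
print(U_scattered.round(2))
```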
3. DBSCAN (density based clustering) application
An initial clustering partition is obtained through DBSCAN (Ester M., Kriegel H.P., Sander J., and Xu X. "A density-based algorithm for discovering clusters in large spatial databases with noise"), which is a well known clustering algorithm that relies on a density-based notion of clusters. Density-based clustering algorithms can successfully identify clusters of arbitrary shapes. The DBSCAN method requires two parameters: the minimum number of points and ε, the size of the neighbourhood. The ε parameter is important to achieve a good clustering: a small value for this parameter leads to many clusters, while a larger value leads to fewer clusters. According to the prior art, it is suggested to visually inspect the curve of the sorted distances of the points to their k-th neighbour (sorted k-distance) and choose the knee point of this curve as ε. According to the invention, since the method shall be performed automatically, the 95-percentile of the sorted k-distance is picked instead.
The solution of picking the 0.95-quantile of the sorted k-distance works well on typical datasets sampled from IT systems; in any case, even when it does not work properly, the method of the invention provides subsequent steps which adjust the result. In fact, if ε is too big with respect to the theoretically correct value, fewer clusters than desired are obtained and the clusterwise regression step will split them; if it is too small, more clusters than desired are obtained and the refinement procedure will merge them.
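The automatic choice of ε can be sketched as follows; the brute-force distance matrix, the value of k and the sample data are illustrative simplifications:

```python
import numpy as np

# Sketch: compute each point's distance to its k-th nearest neighbour
# and take the 95th percentile as eps, instead of visually picking the
# knee of the sorted k-distance curve.
def eps_from_kdist(points, k=4, q=95):
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)               # column 0 is the distance to the point itself
    kdist = d[:, k]              # distance to the k-th nearest neighbour
    return np.percentile(kdist, q)

rng = np.random.default_rng(3)
pts = rng.normal(size=(200, 2))  # hypothetical normalized samples
print(eps_from_kdist(pts))
```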
Applying density based clustering at this stage of the method has two advantages:
- it reduces the complexity of the problem undertaken by the clusterwise regression technique (estimating regression lines in two small clusters is much easier than finding regression lines in two big clusters, since the scope of the search is restricted);
- it prevents too many regression lines from being produced on the same cluster.
The latter often happens when one of the clusters is very "thick" with respect to the others: many regression lines will be used to minimize the error in the dense cluster and only one or a few regression lines will be used to fit the other clusters, causing two or more clusters to be fitted by the same regression line.
In some cases this density based clustering step might separate the data produced by the same regression model into two clusters. This usually happens when the observations produced by the same regression model are centred around two different workload values. Unless the clusters are extremely sparse, these cases can be effectively addressed in the following refinement step.
4. Discarding clusters
Clusters having less than z% of the total number of observations are discarded as not significant.
5. Clusterwise regression and refinement
During this step, a clusterwise regression algorithm is applied, with an overestimate of the number of clusters. Various algorithms can be used for the clusterwise regression step. According to a preferred embodiment of the invention, the VNS algorithm proposed in "Caporossi G. and Hansen P., Variable neighbourhood search for least squares clusterwise regression. Les Cahiers du GERAD, G-2005-61" is used. This method uses a variable neighbourhood search (VNS) as a meta-heuristic to solve the least squares clusterwise linear regression (CWLR) problem; in particular, it is based on ordinary least squares regression. This way of performing regression is non-robust and requires the choice of an independent variable. Service time estimation in the prior art has considered the utilization as the independent variable, but if this assumption does not hold, the estimator is biased and inconsistent. Orthogonal regression, on the other hand, is based on an error-in-variables (EV) model, in which both variables considered are subject to error. Computational experiments made by the applicant have shown that orthogonal regression yields the best results on many performance data sets. This is understandable, since it is often convenient to choose aggregate measurements to represent the workload. For example, in the context of web applications, the workload is often measured as the number of hits on the web server, making no distinction among different pages, despite the fact that different dynamic pages can have well-distinguished levels of CPU load. It is easy to see why, even if we assume the error in the measurement of utilization to be zero, the data will not be perfectly fit by a straight line, due to different mixtures of page accesses during different observation periods. The approximation done by choosing aggregate measurements for workload is taken into account by the EV model, but not by regular regression models.
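The contrast discussed above can be illustrated by comparing ordinary least squares with orthogonal regression, computed here as the first principal component of the centred data; the slope, noise levels and sample sizes are illustrative assumptions:

```python
import numpy as np

# Comparison sketch: OLS treats the workload as error-free, while
# orthogonal regression (major axis of the centred point cloud) allows
# errors in both variables, as in the errors-in-variables model.
def ols_slope(x, y):
    return np.polyfit(x, y, 1)[0]

def orthogonal_slope(x, y):
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # eigenvector of the covariance matrix with the largest eigenvalue
    w, v = np.linalg.eigh(np.cov(X.T))
    major = v[:, np.argmax(w)]
    return major[1] / major[0]

rng = np.random.default_rng(4)
x_true = rng.uniform(0, 10, 500)
y = 0.5 * x_true + rng.normal(0, 0.3, 500)   # error in utilization
x = x_true + rng.normal(0, 0.8, 500)          # error in workload too
print(ols_slope(x, y), orthogonal_slope(x, y))
```

With errors on both axes, the OLS slope is attenuated below the generating value, while the orthogonal estimate is typically closer to it.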
It is worth pointing out that, in cases in which the assumption of having errors in both variables does not hold, regular regression techniques would provide better results. Because of this observation, according to the invention a modified VNS is used, employing a regression method which is robust and based on the errors-in-variables model, thus measuring orthogonal distances between the points and the regression lines. A preferred method is based on the methodology proposed in "Fekri M. and Ruiz-Gazen A., Robust weighted orthogonal regression in the errors-in-variables model, Journal of Multivariate Analysis, 88:89-108, 2004", which describes a way of obtaining robust estimators for the orthogonal regression line (equivalent major axis or first principal component) from robust estimators of location and scatter. The MCD estimator (Rousseeuw P.J., Least median of squares regression. Journal of the American Statistical Association, 79:871-881, 1984) is used for location and scatter; it only takes into account the h out of n observations whose covariance matrix has the minimum determinant (thus removing the effect of outliers). Preferably a fast version of this estimator (based on Rousseeuw P.J. and van Driessen K., A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212-223, 1998) is used, and to ensure the performance of VNS a high value of h shall be set. The one-step re-weighted estimates are computed using Huber's discrepancy function (see Huber P.J., "Robust regression: asymptotics, conjectures and Monte Carlo", The Annals of Statistics, 1:799-821, 1973).
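A strongly simplified illustration of the one-step re-weighting idea follows; it is not the method of the preferred embodiment. The FAST-MCD location and scatter estimator is replaced here by a crude median-based scale of the orthogonal residuals, and a single Huber-style down-weighting step is applied before refitting.

```python
import math

def weighted_orthogonal_fit(pts, w):
    # Slope/intercept of the major principal axis of the weighted
    # covariance matrix. Assumes the weighted cross-covariance is nonzero.
    tw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, pts)) / tw
    my = sum(wi * y for wi, (_, y) in zip(w, pts)) / tw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pts))
    syy = sum(wi * (y - my) ** 2 for wi, (_, y) in zip(w, pts))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pts))
    a = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return a, my - a * mx

def one_step_reweighted_fit(pts, c=1.345):
    # Plain orthogonal fit, then one Huber-style re-weighting step that
    # down-weights points with large orthogonal residuals. The scale
    # estimate (median residual) is a stand-in for the MCD-based scatter.
    a, b = weighted_orthogonal_fit(pts, [1.0] * len(pts))
    res = [abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts]
    scale = sorted(res)[len(res) // 2] or 1.0
    w = [min(1.0, c * scale / r) if r > 0 else 1.0 for r in res]
    return weighted_orthogonal_fit(pts, w)
```

With a single gross outlier among collinear points, the re-weighted slope moves toward the true one.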
The VNS step of the method works in the following manner. Given the number of clusters K, the K clusters whose regression lines provide the best fit of the data are to be found. The following two steps are iterated until a convergence condition is met:
(i) local search:
- if the error is smaller than the previous best, the result is saved and the perturbation intensity is set as p = 1;
- else, the perturbation intensity is set as p = p % (K - 1) + 1;
(ii) perturbation of the solution.
Local search is performed by reassigning sample points to the closest cluster (in terms of distance from its regression line), then recomputing the regression lines and repeating the same procedure until no point needs to be moved and reassigned to a closer cluster.
Perturbation (also called the "split" strategy) is performed by applying p times the following procedure:
- take a random cluster and assign one of its points to another cluster;
- take another random cluster, split it in two randomly, and perform a local search.
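The local search loop described above can be sketched as follows; this is an illustration only, with ordinary least squares standing in for the robust orthogonal fit and the perturbation step omitted.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); a stand-in for the robust
    # orthogonal fit used by the actual method.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def dist(line, p):
    # Orthogonal distance from point p to the line y = a*x + b.
    a, b = line
    x, y = p
    return abs(a * x - y + b) / math.sqrt(a * a + 1)

def local_search(points, assign, k):
    # Reassign every point to the cluster whose regression line is
    # closest, refit the lines, and repeat until no point moves.
    # A cluster that empties is simply dropped (sketch-level behaviour).
    while True:
        lines = {}
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                lines[j] = fit(members)
        new = [min(lines, key=lambda j: dist(lines[j], p)) for p in points]
        if new == assign:
            return assign, lines
        assign = new
```

Starting from a slightly wrong assignment over two linear groups, the loop converges to the correct partition.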
A typical result of this clusterwise regression procedure is shown in figs. 4 and 5, where five sub-clusters are identified in a given dataset.
6. Removing globular clusters
Additionally, globular clusters shall be detected and removed, since "square" or "globular" clusters are not at all significant for the purpose of estimating the service time. This can be done according to two techniques.
A first mode transforms the points of each cluster in such a way that the regression line corresponds to the abscissa axis (i.e. the workload axis) of the plot. Then the distance of the transformed points from the abscissa axis is computed, and the q-quantile of the distribution of points on the x axis and on the y axis is considered: if it is smaller than a predetermined threshold, the corresponding cluster points can be removed. A second mode computes the confidence interval of the regression line: if it is above a predetermined threshold (or even if the sign of the slope is affected), then the corresponding cluster can be removed.
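A possible sketch of the first mode is given below; it is illustrative only. The rotation is expressed through signed projections orthogonal to and along the regression line, and the ratio threshold of 0.5 is a hypothetical choice, not taken from the patent.

```python
import math

def quantile(vals, q):
    # Simple empirical q-quantile (nearest-rank style).
    s = sorted(vals)
    return s[min(len(s) - 1, int(q * len(s)))]

def is_globular(pts, a, b, q=0.9, ratio_threshold=0.5):
    # Project each point orthogonally to and along the regression line
    # y = a*x + b (equivalent to rotating the line onto the x axis),
    # then compare the q-quantile spreads: a "square" cluster has
    # comparable spread in both directions.
    norm = math.sqrt(a * a + 1)
    perp = [abs(y - a * x - b) / norm for x, y in pts]       # off-line spread
    along = [(x + a * (y - b)) / norm for x, y in pts]       # on-line position
    ma = sum(along) / len(along)
    spread_along = quantile([abs(t - ma) for t in along], q)
    spread_perp = quantile(perp, q)
    return spread_perp > ratio_threshold * spread_along
```

An elongated, line-like cluster is kept, while a square grid of points is flagged as globular.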
7. Refinement
A refinement procedure is performed to reduce the number of significant clusters; this step is carried out by removing or merging clusters and re-assigning points to other clusters on the basis of some distance function, thereby reducing the number of clusters needed.
This procedure is run both during the central part of the method (as seen above), on sub-clusters right after the clusterwise regression step - so as to reduce the number of sub-clusters overestimated by VNS - and on all clusters at the end of the estimation procedure - so as to merge clusters generated by the same linear model but separated by DBSCAN because they are centred around different zones.
Applying the refinement in two phases reduces the number of pairs of clusters to be evaluated and also improves the chances that the correct pairs of clusters are merged.
The refinement procedure is performed according to the following steps. It is assumed that 'delta' is the z-quantile of the orthogonal distances from the points of a cluster to its regression line; a suggested value is z = 0.9. Delta is computed for each cluster, and then the pair that, when merged, gives origin to the cluster with the smallest increase in cluster delta is found. The increase can be evaluated as:
- Increase over the sum of deltas (see example of fig.6A),
- Increase over the max delta (see example of fig. 6B),
- Increase over the max delta multiplied by the increase in the number of points (see example of fig. 6C).
In general, by merging a big cluster with a small cluster the increase is expected to be small, while by merging two big clusters the increase can be big.
If the increase of delta is below a predetermined threshold, the pair of clusters can be merged: a new regression line is computed and then this procedure is started again; otherwise the procedure is stopped.
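The first merge criterion (increase over the sum of deltas, as in fig. 6A) can be sketched as follows, purely for illustration; ordinary least squares stands in for the robust orthogonal fit of the actual method.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def delta(pts, line, z=0.9):
    # z-quantile of the orthogonal distances from the points to the line.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts)
    return d[min(len(d) - 1, int(z * len(d)))]

def best_merge(clusters):
    # Try every pair; the score is the increase of the merged cluster's
    # delta over the sum of the two original deltas (criterion of fig. 6A).
    deltas = [delta(c, fit(c)) for c in clusters]
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            merged = clusters[i] + clusters[j]
            inc = delta(merged, fit(merged)) - (deltas[i] + deltas[j])
            if best is None or inc < best[2]:
                best = (i, j, inc)
    return best
```

Two fragments of the same line are chosen for merging ahead of a cluster lying on a different line.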
A typical situation which can be solved by a variant of the refinement procedure is shown in fig. 7, where no pair can be merged without causing a large increase in delta. This variant is applied every time the delta increase is too big.
Each cluster is evaluated. The points of one cluster are assigned to the other clusters, and it is checked which cluster suffers the biggest delta increase. The procedure then finds the cluster that, when removed (i.e. having its points assigned to other clusters), gives origin to the smallest maximum increase in delta (since the regression lines can change considerably, a few steps of local search are also performed). If the delta increase is below a predetermined threshold, the cluster is actually removed and the procedure is repeated. Otherwise the procedure is stopped.
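The removal variant can be sketched as follows, again as an illustration only: ordinary least squares replaces the robust fit, and a full refit of the receiving clusters replaces the few local-search steps.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def odist(line, p):
    a, b = line
    x, y = p
    return abs(a * x - y + b) / math.sqrt(a * a + 1)

def delta(pts, line, z=0.9):
    d = sorted(odist(line, p) for p in pts)
    return d[min(len(d) - 1, int(z * len(d)))]

def removal_candidate(clusters):
    # For each cluster: tentatively remove it, reassign its points to the
    # closest remaining regression line, refit, and record the largest
    # delta increase suffered by any receiving cluster. Return
    # (index, increase) for the cluster whose removal is cheapest.
    base = [delta(c, fit(c)) for c in clusters]
    best = None
    for r in range(len(clusters)):
        keep = [i for i in range(len(clusters)) if i != r]
        rest = {i: list(clusters[i]) for i in keep}
        lines = {i: fit(rest[i]) for i in keep}
        for p in clusters[r]:
            j = min(keep, key=lambda i: odist(lines[i], p))
            rest[j].append(p)
        worst = max(delta(rest[i], fit(rest[i])) - base[i] for i in keep)
        if best is None or worst < best[1]:
            best = (r, worst)
    return best
```

A cluster redundant with another linear cluster is the cheapest to remove; a cluster on its own line is not.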
From a computational point of view, the refinement procedure can be described as follows.
Given a cluster Ci, the associated regression line defined by the coefficients (Ri, Si), and a point (Xj, Uj), let d(i,j) be the orthogonal distance of the point from the regression line. For each cluster Ci, the distances d(i,j) for j = 1, ..., |Ci| can be considered a random sample from an unknown distribution. We call δp(Ci) the p-percentile of this sample. A point j is considered an inlier w.r.t. a cluster Ci if d(i,j) < 1.5 δ0.9(Ci).
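The percentile and the inlier rule above can be expressed directly; this short sketch is for illustration only.

```python
import math

def delta_p(cluster, line, p=0.9):
    # p-percentile of the orthogonal distances d(i, j) within a cluster.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in cluster)
    return d[min(len(d) - 1, int(p * len(d)))]

def is_inlier(point, line, d09):
    # A point is an inlier w.r.t. a cluster if its orthogonal distance to
    # the cluster's regression line is below 1.5 * delta_0.9.
    a, b = line
    x, y = point
    return abs(a * x - y + b) / math.sqrt(a * a + 1) < 1.5 * d09
```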
The refinement procedure works as follows.
1. For each cluster Ci, from the smallest (in terms of number of points) to the largest one:
(a) If more than a certain percentage Ti of its points are inliers w.r.t. other clusters, or if fewer than Tp points are not inliers w.r.t. other clusters, remove the cluster, reassign its points to the closest cluster and perform a local search.
2. Repeat
(a) For each pair of clusters Ci, Cj:
i. Merge the two clusters into a temporary cluster Ci,j.
ii. Remove from Ci,j any point that is an inlier w.r.t. some cluster Cs with s ≠ i and s ≠ j.
iii. Compute the regression line of Ci,j, δ0.9(Ci,j) and δ0.95(Ci,j).
iv. Let Csmall be the smallest cluster among Ci and Cj.
v. If more than a certain percentage To of the points of Csmall are outliers w.r.t. Ci,j, go to the next pair of clusters.
vi. Compute the correlation Rix (Rjx) between the workload and the residuals of the points in Ci,j ∩ Ci (Ci,j ∩ Cj).
vii. If |Rix| > TR or |Rjx| > TR, go to the next pair of clusters.
viii. If the size of Ci,j is less than Tp points, remove both Ci and Cj, assign their points to the closest cluster and go to the next pair of clusters.
ix. Compute S0.9(i, j) = δ0.9(Ci,j) / (δ0.9(Ci) + δ0.9(Cj)).
x. Compute S0.95(i, j) = δ0.95(Ci,j) / (δ0.95(Ci) + δ0.95(Cj)).
xi. If either S0.9(i, j) < Tδ or S0.95(i, j) < Tδ, mark the pair as a candidate for merging. Store Ci,j, S0.9(i, j) and S0.95(i, j).
(b) If at least one pair is marked as a candidate for merging, select the pair of clusters Ci, Cj for which S0.9(i, j) + S0.95(i, j) is minimum and merge the two clusters. Points of Ci or Cj that do not belong to Ci,j are assigned to the closest cluster. If no pair is marked as a candidate for merging, exit from the refinement procedure.
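For illustration, the normalized merge scores Sq(i, j) = δq(Ci,j) / (δq(Ci) + δq(Cj)) can be computed as follows; small values mean the merged cluster is about as tight as its parts, so the merge is cheap. Ordinary least squares again stands in for the robust orthogonal fit.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def delta_q(pts, line, q):
    # q-quantile of the orthogonal distances to the line.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts)
    return d[min(len(d) - 1, int(q * len(d)))]

def merge_scores(ci, cj):
    # S_q(i, j) = delta_q(C_ij) / (delta_q(C_i) + delta_q(C_j)) for
    # q = 0.9 and q = 0.95. Assumes neither cluster fits its line exactly
    # (nonzero deltas in the denominator).
    cij = ci + cj
    lij = fit(cij)
    li, lj = fit(ci), fit(cj)
    s09 = delta_q(cij, lij, 0.9) / (delta_q(ci, li, 0.9) + delta_q(cj, lj, 0.9))
    s095 = delta_q(cij, lij, 0.95) / (delta_q(ci, li, 0.95) + delta_q(cj, lj, 0.95))
    return s09, s095
```

Two noisy fragments of the same line produce scores well below 1, marking the pair as a merge candidate for any reasonable threshold Tδ.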
Summarizing the above refinement procedure, the first part deals with the removal of clusters that fit outliers from other clusters. This situation is frequent when overestimating the number of clusters. The second part tackles the cases in which multiple regression lines fit the same cluster. This is also a common scenario.
The detection of such cases is based on the δ0.9 and δ0.95 values of the merged cluster and those of the clusters being merged. A decrease, or even a small increase, in these values suggests that the clusters are not well separated and should be merged. Two different values are used to improve the robustness of the approach. Considering only this criterion is safe only when the two clusters being merged have similar sizes.
To avoid merging clusters that should not be merged, two further conditions are verified. The first one prevents a large cluster from being merged with a small cluster which lies far away from its regression line, by requiring that at least a certain amount of points of the smallest cluster be inliers in the merged cluster. The second condition is based on the correlation of the residuals with the workload and preserves small clusters that are "attached" to big clusters but have a significantly different slope.
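The residual-workload correlation used by the second condition can be illustrated as a plain Pearson correlation between the workload values and the signed residuals with respect to the merged line:

```python
import math

def residual_workload_correlation(pts, a, b):
    # Pearson correlation between the workload (x) and the signed
    # residuals w.r.t. the line y = a*x + b. A high |R| indicates points
    # whose own slope differs from that of the fitted line. Assumes the
    # residuals are not all identical (nonzero variance).
    xs = [x for x, _ in pts]
    rs = [y - (a * x + b) for x, y in pts]
    n = len(pts)
    mx, mr = sum(xs) / n, sum(rs) / n
    cov = sum((x - mx) * (r - mr) for x, r in zip(xs, rs))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sr = math.sqrt(sum((r - mr) ** 2 for r in rs))
    return cov / (sx * sr)
```

Points on y = 2x measured against a line of slope 1 have residuals that grow linearly with the workload, giving a correlation of 1.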
Examples of merging and removal of clusters are shown in fig. 8.
8. Post-processing: shared points and outliers
9. De-normalize results (Renormalize regression coefficients).
While there has been illustrated and described what are presently considered to be example embodiments, it will be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from claimed subject matter. Additionally, many modifications may be made to adapt a particular situation to the teachings of claimed subject matter without departing from the central concept described herein. Therefore, it is intended that claimed subject matter not be limited to the particular embodiments disclosed, but that such claimed subject matter may also include all embodiments falling within the scope of the appended claims, and equivalents thereof.

Claims

1. Method for upgrading or allocating resources in an IT system, comprising the steps of collecting a dataset by sampling utilization versus workload of a resource in the IT system and then analyzing said dataset to obtain service time through a clusterwise regression procedure, said service time being used to trigger the upgrade or allocation of said resources, characterized in that the method comprises the following steps:
(i) normalize collected dataset,
(ii) scatter data when utilization has been rounded,
(iii) provide for partition of data to find density based clusters through DBSCAN procedure,
(iv) discard clusters with fewer than z% of the total number of observations,
(v) in each cluster, perform clusterwise regression and obtain linear sub-clusters in a pre-defined number,
(vi) reduce sub-clusters applying refinement procedure, removing subclusters that fit to outliers and merging pairs of clusters that fit the same model,
(vii) update clusters with the reduced sub-clusters,
(viii) remove globular clusters,
(ix) reduce the number of clusters with the refinement procedure, and
(x) de-normalize results.
2. Method as in claim 1), wherein said refinement procedure comprises a merging step, wherein, assuming that 'delta' is the z-quantile of the orthogonal distances from the points of a cluster to its regression line and z = 0.9, delta is computed for each cluster and then the pair that, when merged, gives origin to the cluster with the smallest increase in cluster delta is found, then
if the increase of delta is below a predetermined threshold, the pair of clusters can be merged and a new regression line is computed and then this procedure is started again; otherwise this step is ended.
3. Method as in claim 2), wherein said refinement procedure provides that,
given a cluster Ci, the associated regression line defined by the coefficients (Ri, Si), and a point (Xj, Uj), let d(i,j) be the orthogonal distance of the point from the regression line, the following steps are performed:
- for each cluster Ci the distances d(i,j) for j = 1, ..., |Ci| are computed and considered a random sample from an unknown distribution, and
- assuming δp(Ci) is the p-percentile of said sample, a point j is considered an inlier w.r.t. a cluster if d(i,j) < 1.5 δ0.9(Ci), then
- if more than a certain percentage Ti of the points of the cluster are inliers w.r.t. other clusters, or if fewer than Tp points are not inliers w.r.t. other clusters, the cluster is removed, its points reassigned to the closest cluster and a local search is performed.
4. Method as in claim 1) or 2), wherein said refinement procedure provides
assigning the points of one cluster to other clusters and checking which cluster suffers the biggest delta increase, then finding the cluster that, when having all its points assigned to other clusters, gives origin to the smallest maximum increase in delta, then
if delta increase is below a predetermined threshold, said cluster is actually removed.
5. A computer-readable medium bearing a program product loadable into an internal memory of a digital computer, comprising software portions for performing the steps of any one of the preceding claims when said product is run on a computer.
PCT/IT2010/000164 2010-04-15 2010-04-15 Automated service time estimation method for it system resources WO2011128921A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources
US13/650,767 US9350627B2 (en) 2010-04-15 2012-10-12 Automated service time estimation method for IT system resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources

Publications (1)

Publication Number Publication Date
WO2011128921A1 true WO2011128921A1 (en) 2011-10-20

Family

ID=43302177

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources

Country Status (1)

Country Link
WO (2) WO2011128921A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096631A (en) * 2016-06-02 2016-11-09 上海世脉信息科技有限公司 A kind of recurrent population's Classification and Identification based on the big data of mobile phone analyze method
CN108256560A (en) * 2017-12-27 2018-07-06 同济大学 A kind of park recognition methods based on space-time cluster
CN108769101A (en) * 2018-04-03 2018-11-06 北京奇艺世纪科技有限公司 A kind of information processing method, client and system
CN110389873A (en) * 2018-04-17 2019-10-29 北京京东尚科信息技术有限公司 A kind of method and apparatus of determining server resource service condition

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167092B (en) * 2014-07-30 2016-09-21 北京市交通信息中心 A kind of method determining center, on-board and off-board hot spot region of hiring a car and device
CN109000645A (en) * 2018-04-26 2018-12-14 西南电子技术研究所(中国电子科技集团公司第十研究所) Complex environment target classics track extracting method
CN117112871B (en) * 2023-10-19 2024-01-05 南京华飞数据技术有限公司 Data real-time efficient fusion processing method based on FCM clustering algorithm model

Non-Patent Citations (13)

* Cited by examiner, † Cited by third party
Title
CASALE G ET AL: "Robust Workload Estimation in Queueing Network Performance Models", PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008. PDP 2008. 16TH EUROMICRO CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 13 February 2008 (2008-02-13), pages 183 - 187, XP031233612, ISBN: 978-0-7695-3089-5 *
ESTER M ET AL: "A density-based algorithm for discovering clusters in large spatial databases with noise", PROCEEDINGS. INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY ANDDATA MINING, XX, XX, 1 January 1996 (1996-01-01), pages 226 - 231, XP002355949 *
ESTER M.; KRIEGEL H.P.; SANDER J.; XU X., A DENSITY-BASED ALGORITHM FOR DISCOVERING CLUSTERS IN LARGE SPATIAL DATABASES WITH NOISE
FEKRI M.; RUIZ-GAZEN A., ROBUST WEIGHTED ORTHOGONAL REGRESSION IN THE ERRORS-IN-VARIABLES MODEL, vol. 88, 2004, pages 89 - 108
G. CAPOROSSI, P. HANSEN: "Variable neighborhood search for least squares clusterwise regression", December 2007 (2007-12-01), XP002614130, Retrieved from the Internet <URL:http://www.gerad.ca/fichiers/cahiers/G-2005-61.pdf> [retrieved on 20101214] *
HUBER P.J.: "Robust regression: asymptotics, conjectures and monte carlo", THE ANNALS OF STATISTICS, vol. 1, 1973, pages 799 - 821
M. FEKRI, A. RUIZ-GAZEN: "Robust weighted orthogonal regression in the errors-in-variables model", JOURNAL OF MULTIVARIATE ANALYSIS, vol. 88, no. 1, 1 January 2004 (2004-01-01), pages 89 - 108, XP002614131, DOI: http://dx.doi.org/10.1016/S0047-259X(03)00057-5 *
PAOLO CREMONESI, KANIKA DHYANI, ANDREA SANSOTTERA: "Service Time Estimation with a Refinement Enhanced Hybrid Clustering Algorithm - Whitepaper February 2010", February 2010 (2010-02-01), XP002614129, Retrieved from the Internet <URL:http://www.neptuny.com/files/Service_time_estimation_with_a_refinement_enhanced_hybrid_clustering_algorithm.pdf> [retrieved on 20101213] *
PETER J. HUBER: "Robust Regression: Asymptotics, Conjectures and Monte Carlo", 1973, XP002614133, Retrieved from the Internet <URL:http://www.projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1176342503> [retrieved on 20101214], DOI: doi:10.1214/aos/1176342503 *
ROUSEEUW P.J.: "Least median of squares regression", JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, vol. 79, 1984, pages 871 - 881
ROUSSEEUW P J ET AL: "Fast Algorithm for the Minimum Covariance Determinant Estimator", 15 December 1998 (1998-12-15), INTERNET CITATION, XP002614132, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.5870&rep=rep1&type=pdf> [retrieved on 20101214] *
ROUSSEEUW P J: "LEAST MEDIAN OF SQUARES REGRESSION", JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, AMERICAN STATISTICAL ASSOCIATION, NEW YORK, US, vol. 79, no. 388, 1 December 1984 (1984-12-01), pages 871 - 880, XP008024952, ISSN: 0162-1459 *
ROUSSEEUW P.J.; VAN DRIESSEN K. A: "fast algorithm for the minimum covariance determinant estimator", TECHNOMETRICS, vol. 41, 1998, pages 212 - 223


Also Published As

Publication number Publication date
WO2012020328A1 (en) 2012-02-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10740023; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10740023; Country of ref document: EP; Kind code of ref document: A1)