WO2011128921A1 - Automated service time estimation method for IT system resources - Google Patents


Info

Publication number
WO2011128921A1
Authority
WO
WIPO (PCT)
Prior art keywords
clusters
cluster
regression
procedure
points
Prior art date
Application number
PCT/IT2010/000164
Other languages
French (fr)
Inventor
Politecnico Di Milano
Paolo Cremonesi
Kanika Dhyani
Stefano Visconti
Original Assignee
Neptuny S.R.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neptuny S.R.L. filed Critical Neptuny S.R.L.
Priority to PCT/IT2010/000164 priority Critical patent/WO2011128921A1/en
Priority to PCT/IB2011/051648 priority patent/WO2012020328A1/en
Publication of WO2011128921A1 publication Critical patent/WO2011128921A1/en
Priority to US13/650,767 priority patent/US9350627B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5009Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5019Ensuring fulfilment of SLA
    • H04L41/5025Ensuring fulfilment of SLA by proactively reacting to service quality change, e.g. by reconfiguration after service quality degradation or upgrade
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/70Admission control; Resource allocation
    • H04L47/83Admission control; Resource allocation based on usage prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/02Protocol performance

Definitions

  • Local search is performed by reassigning sample points to the closest cluster (in terms of distance from the regression line), then computing the regression lines and repeating the same procedure until no points need to be moved and reassigned to a closer cluster.
  • Perturbation (also called the "split" strategy) is performed by applying p times the following procedure: take a random cluster and assign one of its points to another cluster.
  • globular clusters shall be detected and removed, since "square" or "globular" clusters are not significant for the purpose of estimating the service time. This can be done according to two techniques.
  • a first technique transforms the points of each cluster so that the regression line corresponds to the abscissa axis (i.e. the workload axis) of the plot. The distance of the transformed points to the abscissa axis is then computed, and the q-quantile of the distribution of the points on the x axis and on the y axis is considered: if it is smaller than a predetermined threshold, the corresponding cluster points can be removed.
  • a second technique computes the confidence interval of the regression line: if it is above a predetermined threshold (or if the sign of the slope is affected), the corresponding cluster can be removed.
  • a refinement procedure is performed to reduce the number of significant clusters; this step is carried out by removing or merging clusters and re-assigning points to other clusters on the basis of a distance function, thereby reducing the number of clusters needed.
  • This procedure is run both during the central part of the method (as seen above), on sub-clusters right after the clusterwise regression step - so as to reduce the number of sub-clusters overestimated by VNS - and on all clusters at the end of the estimation procedure - so as to merge clusters generated by the same linear model but separated by DBSCAN because they are centred around different zones.
  • when the merging conditions (described below) are satisfied, the pair of clusters can be merged; a new regression line is then computed and this procedure is started again, otherwise the procedure is stopped.
  • Each cluster is evaluated: the points of one cluster are assigned to the other clusters and it is checked which cluster suffers the biggest increase in delta.
  • the procedure then finds the cluster that, when removed (i.e. having its points assigned to other clusters), gives origin to the smallest maximum increase in delta (since the regression lines can change considerably, a few steps of local search are also performed). If the delta increase is below a predetermined threshold, the cluster is actually removed and the procedure is repeated; otherwise the procedure is stopped.
  • more formally, the refinement procedure can be seen as follows. Let d(i,j) be the orthogonal distance of point j from the regression line of cluster Ci. The set of distances {d(i,j), j ∈ Ci} can be considered a random sample from an unknown distribution; let δp(Ci) be the p-percentile of this sample. A point j is considered an inlier w.r.t. a cluster Ci if d(i,j) < 1.5 δ0.9(Ci).
  • If the size of the merged cluster Ci ∪ Cj is less than Tp points, remove both Ci and Cj, assign their points to the closest cluster and go to the next pair of clusters.
  • the first part of the procedure deals with the removal of clusters that fit outliers from other clusters. This situation is frequent when overestimating the number of clusters.
  • the second part of the procedure tackles the cases in which multiple regression lines fit the same cluster. This is also a common scenario.
  • the first one prevents a large cluster from being merged with a small cluster which lies far away from its regression line, by requiring that at least a certain amount of points of the smaller cluster be inliers in the merged cluster.
  • the second condition is based on the correlation of residuals with the workload and preserves small clusters that are "attached" to big clusters but have a significantly different slope.
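The inlier criterion described above (a point is an inlier with respect to a cluster when its orthogonal distance from the cluster's regression line is below 1.5 times the 90th percentile of the cluster's own distances) can be sketched as follows; the cluster data, the slope and the test points are illustrative assumptions:

```python
import numpy as np

# Sketch of the refinement inlier test: d(i,j) < 1.5 * delta_0.9(Ci),
# with d(i,j) the orthogonal distance from point j to the regression
# line of cluster Ci.
def orth_dist(points, slope, intercept):
    # distance from (x, y) to the line y = slope*x + intercept
    return np.abs(slope * points[:, 0] - points[:, 1] + intercept) \
        / np.hypot(slope, 1.0)

def is_inlier(point, cluster_pts, slope, intercept):
    d_cluster = orth_dist(cluster_pts, slope, intercept)
    threshold = 1.5 * np.percentile(d_cluster, 90)
    return orth_dist(point[None, :], slope, intercept)[0] < threshold

# Hypothetical cluster around the line U = 0.3 * X with small noise.
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
cluster = np.column_stack([x, 0.3 * x + rng.normal(0, 0.1, 100)])
print(is_inlier(np.array([5.0, 1.55]), cluster, 0.3, 0.0))  # near the line
print(is_inlier(np.array([5.0, 5.00]), cluster, 0.3, 0.0))  # far away
```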


Abstract

A method for upgrading or allocating resources in an IT system is disclosed. The method comprises the steps of collecting a dataset by sampling utilization versus workload of a resource in the IT system and then analyzing said dataset to obtain the service time through a clusterwise regression procedure, said service time being used to trigger the upgrade or allocation of said resources, wherein the method further comprises the following steps: (i) normalize the collected dataset, (ii) scatter data when utilization has been rounded, (iii) partition the data into density based clusters through the DBSCAN procedure, (iv) discard clusters with less than z% of the total number of observations, (v) in each cluster, perform clusterwise regression and obtain linear sub-clusters in a pre-defined number, (vi) reduce the sub-clusters by applying a refinement procedure, removing sub-clusters that fit outliers and merging pairs of clusters that fit the same model, (vii) update the clusters with the reduced sub-clusters, (viii) remove globular clusters, (ix) reduce the number of clusters with the refinement procedure, and (x) de-normalize the results.

Description

Automated Service Time Estimation Method for IT System Resources
Field of the invention
The present invention relates to a service time estimation method for IT system resources. In particular, it relates to an estimation method based on DBSCAN methodology.
Background
As known, queuing network models are a powerful framework to study and predict the performance of computer systems, i.e. for capacity planning of the system. However, their parameterization is often a challenging task and cannot be performed entirely automatically. The problem of estimating the parameters of queuing network models has been undertaken in a number of works in the prior art, in connection with IT systems and communication networks.
One of the most critical parameters is the service time of the system, which is the mean time required to process one request when no other requests are being processed by the system. Indeed, service time estimation is a building block in queuing network modelling, as diagrammatically shown in fig. 1A.
To parameterize a queuing network model, service time must be provided for each combination of service station and workload class. Unfortunately, service time measurements are rarely available in real systems and obtaining them might require invasive techniques such as benchmarking, load testing, profiling, application instrumentation or kernel instrumentation. On the other hand, aggregate measurements such as the workload and the utilization are usually available.
According to the utilization law, the service time can be estimated from workload (= throughput of the system) and utilization using simple statistical techniques such as least squares regression. However, anomalous or discontinuous behaviour can occur during the observation period. For instance, hardware and software may be upgraded or subject to failure, reducing or increasing the service time, and certain background tasks can affect the residual utilization. The system therefore has multiple working zones, each corresponding to a different regression model, which shall be correctly detected and taken into consideration. This task, according to the prior art, cannot be efficiently performed automatically.
Two examples of poor detection of regression models are shown in figs. 1B and 1C: here a single regression line does not effectively and correctly represent the behaviour of sampled data from two IT systems.
The problem of simultaneously identifying the clustering of linearly related samples and the regression lines is known in the literature as clusterwise linear regression (CWLR) or regression-wise clustering and is a particular case of model based clustering. This problem finds applications in areas like control systems, neural networks and medicine.
This problem has already been addressed using different techniques, but they usually require some degree of manual intervention: i.e. human intelligence is required to detect at least the number of clusters within the dataset points and to supply the correct values of some parameters to the chosen algorithm.
An object of the present invention is hence to supply an enhanced method for estimating these regression models and correctly classifying observation samples according to the regression model that generated them, so as to correctly plan capacity and upgrading of the system.
In other words, given n observations of workload versus utilization of an IT system, it is required to identify the number k of significant clusters, the corresponding regression lines (service time and residual utilization), cluster membership and outliers. Based on this identification, estimation of the IT system behaviour over a wide range of workload and utilization can be inferred, so that automatic upgrading or allocation of hardware/software resources can be performed in the system.
Summary of the invention
The above object can be obtained through a method as defined in its essential terms in the attached claims.
In particular, a new method is provided that combines density based clustering, clusterwise regression and a refinement procedure. While service time estimation according to the prior art considered the functional regression model, in which errors only affect the independent variable (the utilization), the method of the invention is based on the structural regression model, in which there is no distinction between dependent and independent variables. While it makes sense to consider the workload a controlled variable, using the structural model for regression is less prone to underestimating the service time when the model assumptions are not met. This method yields more accurate results than existing methods in many real-world scenarios.
Moreover, it shall be noted that according to the prior art, service time estimation is based on standard regression (executed on the vertical distance, i.e. along the ordinate axis): utilization is considered the independent variable and the workload is assumed to be error-free; if this assumption does not hold, the estimator is biased and inconsistent. By contrast, according to the invention an orthogonal regression has been chosen, which proved to yield the best results on most performance data sets. This approach proved to be effective also because aggregate measurements are often used for workload and utilization: for example, if observation is done on a web server to get page visits vs CPU utilization,
- not all pages count the same in terms of CPU utilization,
- even if there is no error in the CPU utilization measurements, the data will not be perfectly fit by a straight line,
and this is due to different mixtures of page accesses during different observation periods.
According to the method of the invention, it has been chosen to accept an overestimation of the number of clusters, so as to rely on a fully automatic procedure, and then to reduce the number of clusters to the correct one through the refinement procedure.
Brief description of the drawings
Further features and advantages of the system according to the invention will in any case be more evident from the following detailed description of some preferred embodiments of the same, given by way of example and illustrated in the accompanying drawings, wherein:
fig. 1A is a diagram showing the concept of utilization law and service time in an IT system;
figs. 1B and 1C are exemplary plots of regression lines obtained according to the prior art;
fig. 2 represents the conversion of a dataset with rounded utilization into a plot of scattered data;
fig. 3 is a flow chart showing the main steps of the method of the invention;
figs. 4 and 5 are exemplary plots of a dataset after applying DBSCAN and VNS;
figs. 6A-6C are plots of clusters upon applying the refinement procedure;
fig. 7 is an exemplary plot of a dataset where three critical clusters are identified; and
fig. 8 shows plots of different datasets illustrating the difference between cluster removal and cluster merging under the refinement procedure.
Detailed description of a preferred embodiment of the invention
The utilization law states that U = XS, where X is the workload of the system, S is the service time and U is the utilization. According to the utilization law, when no requests are entering the system, utilization should be zero. This is not always the case, due to batch processes, operating system activities and non-modelled workload classes. Therefore, there is a residual utilization present. If we represent residual utilization with the constant term R, the utilization law becomes U = XS + R.
In other terms, the utilization law is the equation of a straight line, where the service time is the slope of the regression line and the residual utilization (due to non-modelled work) is the intercept of the regression line.
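The regression implied by the utilization law can be sketched with a simple least-squares fit; the workload range, the service time S = 0.02 s and the residual utilization R = 0.05 below are illustrative assumptions, not data from the patent:

```python
import numpy as np

# Synthetic sample (hypothetical values): workload X in requests/s and
# utilization U generated from the utilization law U = X*S + R plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(5, 40, size=200)
U = 0.02 * X + 0.05 + rng.normal(0, 0.005, size=200)

# Least-squares fit: the slope estimates the service time S and the
# intercept estimates the residual utilization R.
S_hat, R_hat = np.polyfit(X, U, 1)
print(f"service time ~ {S_hat:.4f} s, residual utilization ~ {R_hat:.4f}")
```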
During an observation period, hardware and software upgrades may occur, causing a change in the service time. At the same time, background activities can affect the residual utilization. Therefore, the data is generated by k > 1 linear models:
U = XS1 + R1
U = XS2 + R2
...
U = XSk + Rk
According to the invention, the error-in-variables (EV) model is assumed: if (X1*, U1*), (X2*, U2*), ..., (Xn*, Un*) are the real values generated by the model, the observations (Xi, Ui) are defined as Xi = Xi* + ηi and Ui = Xi*S + R + εi, where ηi and εi are random variables representing the error. The choice of the EV model is motivated later on in the specification. Given the set of observation samples (affected by hardware/software upgrades, by background or batch activities and by outliers), the task is to simultaneously estimate the number of models k that generated the data, the model parameters (Sj, Rj) for j ∈ {1, ..., k} and a partition of the data C1, ..., Ck, where Cj ⊆ {1, ..., n}, |Cj| ≥ 2, Ci ∩ Cj = ∅ for Ci ≠ Cj and C1 ∪ ... ∪ Ck = {1, 2, ..., n}, such that the observations in cluster Cj were generated by the model with parameters (Sj, Rj).
In other words, it is required to simultaneously estimate the regression lines (clusters) and cluster membership, problem which is known in literature as clusterwise regression problem.
A real dataset is given by sampling utilization versus workload in an IT system (for example a CPU of a computer). Said dataset is analyzed to obtain the proper service time, to be later used to trigger upgrading or allocation of hardware resources in the system. To this purpose, the following steps are performed on the dataset according to the method of the invention:
1. Normalize data
2. Scatter data if utilization has been rounded
3. Find density based clusters (DBSCAN)
4. Discard clusters with less than z% of the total number of observations
5. In each cluster, perform clusterwise regression and obtain sub-clusters
- reduce sub-clusters with refinement procedure
- update cluster list with the sub-clusters
6. Remove "globular clusters"
7. Reduce clusters with refinement procedure
8. Post-processing: shared points and outliers
9. De-normalize results (Renormalize regression coefficients)
The method proposed according to the invention will be called RECRA (Refinement Enhanced Clusterwise Regression Algorithm). The general principle of this method is to obtain an initial partition of the data into clusters of arbitrary shape using a density based clustering algorithm. In the next step, each cluster is split into multiple linear subclusters by applying a CWLR algorithm. The number of subclusters is fixed a priori and should be an overestimate. A refinement procedure then removes the subclusters that fit outliers and merges pairs of clusters that fit the same model. In the next step the clusters are replaced by their subclusters and the refinement procedure is run on all the clusters, merging the ones that were split by the density based clustering algorithm (see fig. 3).
1. Normalize data
Data are normalized so as not to introduce further errors.
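As a sketch of this step, together with the final de-normalization of the regression coefficients (step 9), the example below assumes a simple min-max scaling of both axes; this scaling, the data and the line parameters are illustrative assumptions, since the specification does not detail the normalization used:

```python
import numpy as np

# Sketch: min-max normalize a (workload, utilization) dataset, fit a
# line in normalized space, then map the line back to raw units.
def normalize(data):
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), (lo, hi)

def denormalize_line(slope, intercept, lo, hi):
    # map a regression line fitted in normalized space to original units
    sx, sy = hi[0] - lo[0], hi[1] - lo[1]
    raw_slope = slope * sy / sx
    raw_intercept = intercept * sy + lo[1] - raw_slope * lo[0]
    return raw_slope, raw_intercept

rng = np.random.default_rng(1)
X = rng.uniform(10, 50, 100)
U = 0.03 * X + 0.1 + rng.normal(0, 0.01, 100)
data = np.column_stack([X, U])

norm, (lo, hi) = normalize(data)
s_n, r_n = np.polyfit(norm[:, 0], norm[:, 1], 1)
s, r = denormalize_line(s_n, r_n, lo, hi)
print(s, r)   # recovers roughly S = 0.03 and R = 0.1
```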
2. Scatter data
When utilization data have been rounded, scattering of the data is required to prevent the existence of clusters of perfectly collinear points. For example, as seen in fig. 2, the integer CPU utilization has been rounded (left plot) and the value U is then scattered using uniform [-0.5, 0.5] noise (right plot): collinear sample points, due to the sampling methodology, are thus hidden so as to prevent the false detection of collinear clusters.
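The scattering step can be sketched as follows, with hypothetical rounded utilization samples:

```python
import numpy as np

# Sketch of step 2: utilization recorded as rounded integer percentages
# is jittered with uniform [-0.5, 0.5] noise so that no cluster is made
# of perfectly collinear (vertically stacked) points.
rng = np.random.default_rng(2)
U_rounded = np.array([37, 37, 38, 37, 38, 39, 38], dtype=float)
U_scattered = U_rounded + rng.uniform(-0.5, 0.5, size=U_rounded.shape)
print(U_scattered.round(2))
```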
3. DBSCAN (density based clustering) application
An initial clustering partition is obtained through DBSCAN (Ester M., Kriegel H.P., Sander J., and Xu X. "A density-based algorithm for discovering clusters in large spatial databases with noise"), which is a well known clustering algorithm that relies on a density-based notion of clusters. Density-based clustering algorithms can successfully identify clusters of arbitrary shapes. The DBSCAN method requires two parameters: the minimum number of points and ε, the size of the neighbourhood. The ε parameter is important to achieve a good clustering: a small value for this parameter leads to many clusters, while a larger value leads to fewer clusters. According to the prior art, it is suggested to visually inspect the curve of the sorted distances of the points to their k-th neighbour (sorted k-distance) and choose the knee point of this curve as ε. According to the invention, since the method shall be performed automatically, the 95-percentile of the sorted k-distance is picked instead.
The solution of picking the 0.95-quantile of the sorted k-distance works well on typical datasets sampled from IT systems; in any case, even when it does not work properly, the method of the invention provides subsequent steps which adjust the result. In fact, if ε is too big with respect to the theoretically correct value, fewer clusters than desired are obtained and the clusterwise regression step will split them; if it is too small, more clusters than desired are obtained and the refinement procedure will merge them.
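The automatic choice of ε can be sketched as follows; the brute-force distance matrix, the value of k and the sample data are illustrative simplifications:

```python
import numpy as np

# Sketch: compute each point's distance to its k-th nearest neighbour
# and take the 95th percentile as eps, instead of visually picking the
# knee of the sorted k-distance curve.
def eps_from_kdist(points, k=4, q=95):
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)               # column 0 is the distance to the point itself
    kdist = d[:, k]              # distance to the k-th nearest neighbour
    return np.percentile(kdist, q)

rng = np.random.default_rng(3)
pts = rng.normal(size=(200, 2))  # hypothetical normalized samples
print(eps_from_kdist(pts))
```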
Applying density based clustering at this stage of the method has two advantages:
- it reduces the complexity of the problem undertaken by the clusterwise regression technique (estimating regression lines in two small clusters is much easier than finding regression lines in two big clusters, since the scope of the search is restricted);
- it prevents too many regression lines from being produced on the same cluster.
The latter often happens when one of the clusters is very "thick" with respect to the others: many regression lines will be used to minimize the error in the dense cluster and only one or a few regression lines will be used to fit the other clusters, causing two or more clusters to be fitted by the same regression line.
In some cases this density based clustering step might separate the data produced by the same regression model into two clusters. This usually happens when the observations produced by the same regression model are centred around two different workload values. Unless the clusters are extremely sparse, these cases can be effectively addressed in the following refinement step.
4. Discarding clusters
Clusters having less than z% of the total number of observations are discarded as not significant.
5. Clusterwise regression and refinement
During this step, a clusterwise regression algorithm is applied, with an overestimate of the number of clusters. Various algorithms can be used for the clusterwise regression step. According to a preferred embodiment of the invention, the VNS algorithm proposed in "Caporossi G. and Hansen P., Variable neighbourhood search for least squares clusterwise regression. Les Cahiers du GERAD, G-2005-61" is used. This method uses a variable neighbourhood search (VNS) as a meta-heuristic to solve the least squares clusterwise linear regression (CWLR) problem; in particular, it is based on ordinary least squares regression. This way of performing regression is non-robust and requires the choice of an independent variable. Service time estimation in the prior art has considered the utilization as the independent variable, but if this assumption does not hold, the estimator is biased and inconsistent. Orthogonal regression, on the other hand, is based on an error-in-variables (EV) model, in which both variables considered are subject to error. Computational experiments made by the applicant have shown that orthogonal regression yields the best results on many performance data sets. This is understandable, since it is often convenient to choose aggregate measurements to represent the workload. For example, in the context of web applications, the workload is often measured as the number of hits on the web server, making no distinction among different pages, despite the fact that different dynamic pages can have well-distinguished levels of CPU load. It is easy to see why, even if we assume the error in the measurement of utilization to be zero, the data will not be perfectly fit by a straight line, due to different mixtures of page accesses during different observation periods. The approximation done by choosing aggregate measurements for workload is taken into account by the EV model, but not by regular regression models.
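The contrast discussed above can be illustrated by comparing ordinary least squares with orthogonal regression, computed here as the first principal component of the centred data; the slope, noise levels and sample sizes are illustrative assumptions:

```python
import numpy as np

# Comparison sketch: OLS treats the workload as error-free, while
# orthogonal regression (major axis of the centred point cloud) allows
# errors in both variables, as in the errors-in-variables model.
def ols_slope(x, y):
    return np.polyfit(x, y, 1)[0]

def orthogonal_slope(x, y):
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # eigenvector of the covariance matrix with the largest eigenvalue
    w, v = np.linalg.eigh(np.cov(X.T))
    major = v[:, np.argmax(w)]
    return major[1] / major[0]

rng = np.random.default_rng(4)
x_true = rng.uniform(0, 10, 500)
y = 0.5 * x_true + rng.normal(0, 0.3, 500)   # error in utilization
x = x_true + rng.normal(0, 0.8, 500)          # error in workload too
print(ols_slope(x, y), orthogonal_slope(x, y))
```

With errors on both axes, the OLS slope is attenuated below the generating value, while the orthogonal estimate is typically closer to it.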
It is worth pointing out that, in cases in which the assumption of having errors in both variables does not hold, regular regression techniques would provide better results. Because of this observation, according to the invention a modified VNS is used, employing a regression method which is robust and based on the errors-in-variables model, thus measuring orthogonal distances between the points and the regression lines. A preferred method is based on the methodology proposed in "Fekri M. and Ruiz-Gazen A., Robust weighted orthogonal regression in the errors-in-variables model, Journal of Multivariate Analysis, 88:89-108, 2004", which describes a way of obtaining robust estimators for the orthogonal regression line (equivalent major axis or first principal component) from robust estimators of location and scatter. The MCD estimator (Rousseeuw P.J., Least median of squares regression. Journal of the American Statistical Association, 79:871-881, 1984) is used for location and scatter; it only takes into account the h out of n observations whose covariance matrix has the minimum determinant (thus removing the effect of outliers). Preferably a fast version of this estimator (based on Rousseeuw P.J. and van Driessen K., A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212-223, 1998) is used, and to ensure the performance of VNS a high value of h shall be set. The one-step re-weighted estimates are computed using Huber's discrepancy function (see Huber P.J., "Robust regression: asymptotics, conjectures and Monte Carlo", The Annals of Statistics, 1:799-821, 1973).
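A strongly simplified illustration of the one-step re-weighting idea follows; it is not the method of the preferred embodiment. The FAST-MCD location and scatter estimator is replaced here by a crude median-based scale of the orthogonal residuals, and a single Huber-style down-weighting step is applied before refitting.

```python
import math

def weighted_orthogonal_fit(pts, w):
    # Slope/intercept of the major principal axis of the weighted
    # covariance matrix. Assumes the weighted cross-covariance is nonzero.
    tw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, pts)) / tw
    my = sum(wi * y for wi, (_, y) in zip(w, pts)) / tw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pts))
    syy = sum(wi * (y - my) ** 2 for wi, (_, y) in zip(w, pts))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pts))
    a = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return a, my - a * mx

def one_step_reweighted_fit(pts, c=1.345):
    # Plain orthogonal fit, then one Huber-style re-weighting step that
    # down-weights points with large orthogonal residuals. The scale
    # estimate (median residual) is a stand-in for the MCD-based scatter.
    a, b = weighted_orthogonal_fit(pts, [1.0] * len(pts))
    res = [abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts]
    scale = sorted(res)[len(res) // 2] or 1.0
    w = [min(1.0, c * scale / r) if r > 0 else 1.0 for r in res]
    return weighted_orthogonal_fit(pts, w)
```

With a single gross outlier among collinear points, the re-weighted slope moves toward the true one.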
The VNS step of the method works in the following manner. Given the number of clusters K, the K clusters whose regression lines provide the best fit of the data are to be found. The following two steps are iterated until a convergence condition is met:
(i) local search:
- if the error is smaller than the previous best, the result is saved and the perturbation intensity is set as p = 1;
- else, the perturbation intensity is set as p = p % (K - 1) + 1;
(ii) perturbation of the solution.
Local search is performed by reassigning sample points to the closest cluster (in terms of distance from its regression line), then recomputing the regression lines and repeating the same procedure until no point needs to be moved and reassigned to a closer cluster.
Perturbation (also called the "split" strategy) is performed by applying p times the following procedure:
- take a random cluster and assign one of its points to another cluster;
- take another random cluster, split it in two randomly, and perform a local search.
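The local search loop described above can be sketched as follows; this is an illustration only, with ordinary least squares standing in for the robust orthogonal fit and the perturbation step omitted.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); a stand-in for the robust
    # orthogonal fit used by the actual method.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def dist(line, p):
    # Orthogonal distance from point p to the line y = a*x + b.
    a, b = line
    x, y = p
    return abs(a * x - y + b) / math.sqrt(a * a + 1)

def local_search(points, assign, k):
    # Reassign every point to the cluster whose regression line is
    # closest, refit the lines, and repeat until no point moves.
    # A cluster that empties is simply dropped (sketch-level behaviour).
    while True:
        lines = {}
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                lines[j] = fit(members)
        new = [min(lines, key=lambda j: dist(lines[j], p)) for p in points]
        if new == assign:
            return assign, lines
        assign = new
```

Starting from a slightly wrong assignment over two linear groups, the loop converges to the correct partition.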
A typical result of this clusterwise regression procedure is shown in figs. 4 and 5, where five sub-clusters are identified in a given dataset.
6. Removing globular clusters
Additionally, globular clusters shall be detected and removed, since "square" or "globular" clusters are not at all significant for the purpose of estimating the service time. This can be done according to two techniques.
A first mode transforms the points of each cluster in such a way that the regression line corresponds to the abscissa axis (i.e. the workload axis) of the plot. Then the distance of the transformed points from the abscissa axis is computed, and the q-quantile of the distribution of points on the x axis and on the y axis is considered: if it is smaller than a predetermined threshold, the corresponding cluster points can be removed. A second mode computes the confidence interval of the regression line: if it is above a predetermined threshold (or even if the sign of the slope is affected), then the corresponding cluster can be removed.
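A possible sketch of the first mode is given below; it is illustrative only. The rotation is expressed through signed projections orthogonal to and along the regression line, and the ratio threshold of 0.5 is a hypothetical choice, not taken from the patent.

```python
import math

def quantile(vals, q):
    # Simple empirical q-quantile (nearest-rank style).
    s = sorted(vals)
    return s[min(len(s) - 1, int(q * len(s)))]

def is_globular(pts, a, b, q=0.9, ratio_threshold=0.5):
    # Project each point orthogonally to and along the regression line
    # y = a*x + b (equivalent to rotating the line onto the x axis),
    # then compare the q-quantile spreads: a "square" cluster has
    # comparable spread in both directions.
    norm = math.sqrt(a * a + 1)
    perp = [abs(y - a * x - b) / norm for x, y in pts]       # off-line spread
    along = [(x + a * (y - b)) / norm for x, y in pts]       # on-line position
    ma = sum(along) / len(along)
    spread_along = quantile([abs(t - ma) for t in along], q)
    spread_perp = quantile(perp, q)
    return spread_perp > ratio_threshold * spread_along
```

An elongated, line-like cluster is kept, while a square grid of points is flagged as globular.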
7. Refinement
A refinement procedure is performed to reduce the number of significant clusters; this step is carried out by removing or merging clusters and re-assigning points to other clusters on the basis of some distance function, thereby reducing the number of clusters needed.
This procedure is run both during the central part of the method (as seen above), on sub-clusters right after the clusterwise regression step - so as to reduce the number of sub-clusters overestimated by VNS - and on all clusters at the end of the estimation procedure - so as to merge clusters generated by the same linear model but separated by DBSCAN because they are centred around different zones.
Applying the refinement in two phases reduces the number of pairs of clusters to be evaluated and also improves the chances that the correct pairs of clusters are merged.
The refinement procedure is performed according to the following steps. It is assumed that 'delta' is the z-quantile of the orthogonal distances from the points of a cluster to its regression line; a suggested value is z = 0.9. Delta is computed for each cluster, and then the pair that, when merged, gives origin to the cluster with the smallest increase in cluster delta is found. The increase can be evaluated as:
- Increase over the sum of deltas (see example of fig.6A),
- Increase over the max delta (see example of fig. 6B),
- Increase over the max delta multiplied by the increase in the number of points (see example of fig. 6C).
In general, by merging a big cluster with a small cluster the increase is expected to be small, while by merging two big clusters the increase can be big.
If the increase of delta is below a predetermined threshold, the pair of clusters can be merged: a new regression line is computed and then this procedure is started again; otherwise the procedure is stopped.
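The first merge criterion (increase over the sum of deltas, as in fig. 6A) can be sketched as follows, purely for illustration; ordinary least squares stands in for the robust orthogonal fit of the actual method.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def delta(pts, line, z=0.9):
    # z-quantile of the orthogonal distances from the points to the line.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts)
    return d[min(len(d) - 1, int(z * len(d)))]

def best_merge(clusters):
    # Try every pair; the score is the increase of the merged cluster's
    # delta over the sum of the two original deltas (criterion of fig. 6A).
    deltas = [delta(c, fit(c)) for c in clusters]
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            merged = clusters[i] + clusters[j]
            inc = delta(merged, fit(merged)) - (deltas[i] + deltas[j])
            if best is None or inc < best[2]:
                best = (i, j, inc)
    return best
```

Two fragments of the same line are chosen for merging ahead of a cluster lying on a different line.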
A typical situation which can be solved by a variant of the refinement procedure is shown in fig. 7, where no pair can be merged without causing a large increase in delta. This variant is applied every time the delta increase is too big.
Each cluster is evaluated. The points of one cluster are assigned to the other clusters, and it is checked which cluster suffers the biggest delta increase. The procedure then finds the cluster that, when removed (i.e. having its points assigned to other clusters), gives origin to the smallest maximum increase in delta (since the regression lines can change considerably, a few steps of local search are also performed). If the delta increase is below a predetermined threshold, the cluster is actually removed and the procedure is repeated. Otherwise the procedure is stopped.
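The removal variant can be sketched as follows, again as an illustration only: ordinary least squares replaces the robust fit, and a full refit of the receiving clusters replaces the few local-search steps.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def odist(line, p):
    a, b = line
    x, y = p
    return abs(a * x - y + b) / math.sqrt(a * a + 1)

def delta(pts, line, z=0.9):
    d = sorted(odist(line, p) for p in pts)
    return d[min(len(d) - 1, int(z * len(d)))]

def removal_candidate(clusters):
    # For each cluster: tentatively remove it, reassign its points to the
    # closest remaining regression line, refit, and record the largest
    # delta increase suffered by any receiving cluster. Return
    # (index, increase) for the cluster whose removal is cheapest.
    base = [delta(c, fit(c)) for c in clusters]
    best = None
    for r in range(len(clusters)):
        keep = [i for i in range(len(clusters)) if i != r]
        rest = {i: list(clusters[i]) for i in keep}
        lines = {i: fit(rest[i]) for i in keep}
        for p in clusters[r]:
            j = min(keep, key=lambda i: odist(lines[i], p))
            rest[j].append(p)
        worst = max(delta(rest[i], fit(rest[i])) - base[i] for i in keep)
        if best is None or worst < best[1]:
            best = (r, worst)
    return best
```

A cluster redundant with another linear cluster is the cheapest to remove; a cluster on its own line is not.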
From a computational point of view, the refinement procedure can be described as follows.
Given a cluster Ci, the associated regression line defined by the coefficients (Ri, Si), and a point (Xj, Uj), let d(i,j) be the orthogonal distance of the point from the regression line. For each cluster Ci, the distances d(i,j) for j = 1, ..., |Ci| can be considered a random sample from an unknown distribution. We call δp(Ci) the p-percentile of this sample. A point j is considered an inlier w.r.t. a cluster Ci if d(i,j) < 1.5 δ0.9(Ci).
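The percentile and the inlier rule above can be expressed directly; this short sketch is for illustration only.

```python
import math

def delta_p(cluster, line, p=0.9):
    # p-percentile of the orthogonal distances d(i, j) within a cluster.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in cluster)
    return d[min(len(d) - 1, int(p * len(d)))]

def is_inlier(point, line, d09):
    # A point is an inlier w.r.t. a cluster if its orthogonal distance to
    # the cluster's regression line is below 1.5 * delta_0.9.
    a, b = line
    x, y = point
    return abs(a * x - y + b) / math.sqrt(a * a + 1) < 1.5 * d09
```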
The refinement procedure works as follows.
1. For each cluster Ci, from the smallest (in terms of number of points) to the largest one:
(a) If more than a certain percentage Ti of its points are inliers w.r.t. other clusters, or if fewer than Tp points are not inliers w.r.t. other clusters, remove the cluster, reassign its points to the closest cluster and perform a local search.
2. Repeat
(a) For each pair of clusters Ci, Cj:
i. Merge the two clusters into a temporary cluster Ci,j.
ii. Remove from Ci,j any point that is an inlier w.r.t. some cluster Cs with s ≠ i and s ≠ j.
iii. Compute the regression line of Ci,j, δ0.9(Ci,j) and δ0.95(Ci,j).
iv. Let Csmall be the smallest cluster among Ci and Cj.
v. If more than a certain percentage To of the points of Csmall are outliers w.r.t. Ci,j, go to the next pair of clusters.
vi. Compute the correlation Rix (Rjx) between the workload and the residuals of the points in Ci,j ∩ Ci (Ci,j ∩ Cj).
vii. If |Rix| > TR or |Rjx| > TR, go to the next pair of clusters.
viii. If the size of Ci,j is less than Tp points, remove both Ci and Cj, assign their points to the closest cluster and go to the next pair of clusters.
ix. Compute S0.9(i, j) = δ0.9(Ci,j) / (δ0.9(Ci) + δ0.9(Cj)).
x. Compute S0.95(i, j) = δ0.95(Ci,j) / (δ0.95(Ci) + δ0.95(Cj)).
xi. If either S0.9(i, j) < Tδ or S0.95(i, j) < Tδ, mark the pair as a candidate for merging. Store Ci,j, S0.9(i, j) and S0.95(i, j).
(b) If at least one pair is marked as a candidate for merging, select the pair of clusters Ci, Cj for which S0.9(i, j) + S0.95(i, j) is minimum and merge the two clusters. Points of Ci or Cj that do not belong to Ci,j are assigned to the closest cluster. If no pair is marked as a candidate for merging, exit from the refinement procedure.
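For illustration, the normalized merge scores Sq(i, j) = δq(Ci,j) / (δq(Ci) + δq(Cj)) can be computed as follows; small values mean the merged cluster is about as tight as its parts, so the merge is cheap. Ordinary least squares again stands in for the robust orthogonal fit.

```python
import math

def fit(pts):
    # Ordinary least squares line (a, b); stand-in for the robust fit.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def delta_q(pts, line, q):
    # q-quantile of the orthogonal distances to the line.
    a, b = line
    d = sorted(abs(a * x - y + b) / math.sqrt(a * a + 1) for x, y in pts)
    return d[min(len(d) - 1, int(q * len(d)))]

def merge_scores(ci, cj):
    # S_q(i, j) = delta_q(C_ij) / (delta_q(C_i) + delta_q(C_j)) for
    # q = 0.9 and q = 0.95. Assumes neither cluster fits its line exactly
    # (nonzero deltas in the denominator).
    cij = ci + cj
    lij = fit(cij)
    li, lj = fit(ci), fit(cj)
    s09 = delta_q(cij, lij, 0.9) / (delta_q(ci, li, 0.9) + delta_q(cj, lj, 0.9))
    s095 = delta_q(cij, lij, 0.95) / (delta_q(ci, li, 0.95) + delta_q(cj, lj, 0.95))
    return s09, s095
```

Two noisy fragments of the same line produce scores well below 1, marking the pair as a merge candidate for any reasonable threshold Tδ.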
Summarizing the above refinement procedure, the first part deals with the removal of clusters that fit outliers from other clusters. This situation is frequent when overestimating the number of clusters. The second part tackles the cases in which multiple regression lines fit the same cluster. This is also a common scenario.
The detection of such cases is based on the δ0.9 and δ0.95 values of the merged cluster and those of the clusters being merged. A decrease, or even a small increase, in these values suggests that the clusters are not well separated and should be merged. Two different values are used to improve the robustness of the approach. Considering only this criterion is safe only when the two clusters being merged have similar sizes.
To avoid merging clusters that should not be merged, two further conditions are verified. The first one prevents a large cluster from being merged with a small cluster which lies far away from its regression line, by requiring that at least a certain amount of points of the smallest cluster be inliers in the merged cluster. The second condition is based on the correlation of the residuals with the workload and preserves small clusters that are "attached" to big clusters but have a significantly different slope.
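The residual-workload correlation used by the second condition can be illustrated as a plain Pearson correlation between the workload values and the signed residuals with respect to the merged line:

```python
import math

def residual_workload_correlation(pts, a, b):
    # Pearson correlation between the workload (x) and the signed
    # residuals w.r.t. the line y = a*x + b. A high |R| indicates points
    # whose own slope differs from that of the fitted line. Assumes the
    # residuals are not all identical (nonzero variance).
    xs = [x for x, _ in pts]
    rs = [y - (a * x + b) for x, y in pts]
    n = len(pts)
    mx, mr = sum(xs) / n, sum(rs) / n
    cov = sum((x - mx) * (r - mr) for x, r in zip(xs, rs))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sr = math.sqrt(sum((r - mr) ** 2 for r in rs))
    return cov / (sx * sr)
```

Points on y = 2x measured against a line of slope 1 have residuals that grow linearly with the workload, giving a correlation of 1.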
Examples of merging and removal of clusters are shown in fig. 8.
8. Post-processing: shared points and outliers
9. De-normalize results (Renormalize regression coefficients).
While there has been illustrated and described what are presently considered to be example embodiments, it will be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from claimed subject matter. Additionally, many modifications may be made to adapt a particular situation to the teachings of claimed subject matter without departing from the central concept described herein. Therefore, it is intended that claimed subject matter not be limited to the particular embodiments disclosed, but that such claimed subject matter may also include all embodiments falling within the scope of the appended claims, and equivalents thereof.

Claims

1. Method for upgrading or allocating resources in an IT system, comprising the steps of collecting a dataset by sampling utilization versus workload of a resource in the IT system and then analyzing said dataset to obtain service time through a clusterwise regression procedure, said service time being used to trigger the upgrade or allocation of said resources, characterized in that the method comprises the following steps:
(i) normalize collected dataset,
(ii) scatter data when utilization has been rounded,
(iii) provide for partition of data to find density based clusters through DBSCAN procedure,
(iv) discard clusters with fewer than z% of the total number of observations,
(v) in each cluster, perform clusterwise regression and obtain linear sub-clusters in a pre-defined number,
(vi) reduce sub-clusters applying refinement procedure, removing subclusters that fit to outliers and merging pairs of clusters that fit the same model,
(vii) update clusters with the reduced sub-clusters,
(viii) remove globular clusters,
(ix) reduce the number of clusters with the refinement procedure, and
(x) de-normalize results.
2. Method as in claim 1), wherein said refinement procedure comprises a merging step, wherein, assuming that 'delta' is the z-quantile of the orthogonal distances from the points of a cluster to its regression line and z = 0.9, delta is computed for each cluster and then the pair that, when merged, gives origin to the cluster with the smallest increase in cluster delta is found, then
if the increase of delta is below a predetermined threshold, the pair of clusters can be merged and a new regression line is computed and then this procedure is started again; otherwise this step is ended.
3. Method as in claim 2), wherein said refinement procedure provides that,
given a cluster Ci, the associated regression line defined by the coefficients (Ri, Si), and a point (Xj, Uj), let d(i,j) be the orthogonal distance of the point from the regression line, the following steps are performed:
- for each cluster Ci the distances d(i,j) for j = 1, ..., |Ci| are computed and considered a random sample from an unknown distribution, and
- assuming δp(Ci) is the p-percentile of said sample, a point j is considered an inlier w.r.t. a cluster if d(i,j) < 1.5 δ0.9(Ci), then
- if more than a certain percentage Ti of the points of the cluster are inliers w.r.t. other clusters, or if fewer than Tp points are not inliers w.r.t. other clusters, the cluster is removed, its points reassigned to the closest cluster and a local search is performed.
4. Method as in claim 1) or 2), wherein said refinement procedure provides
assigning the points of one cluster to other clusters and checking which cluster suffers the biggest delta increase, then finding the cluster that, when having all its points assigned to other clusters, gives origin to the smallest maximum increase in delta, then
if delta increase is below a predetermined threshold, said cluster is actually removed.
5. A computer-readable medium bearing a program product loadable into an internal memory of a digital computer, comprising software portions for performing the steps of any one of the preceding claims when said product is run on a computer.
PCT/IT2010/000164 2010-04-15 2010-04-15 Automated service time estimation method for it system resources WO2011128921A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources
US13/650,767 US9350627B2 (en) 2010-04-15 2012-10-12 Automated service time estimation method for IT system resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources

Publications (1)

Publication Number Publication Date
WO2011128921A1 true WO2011128921A1 (en) 2011-10-20

Family

ID=43302177

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/IT2010/000164 WO2011128921A1 (en) 2010-04-15 2010-04-15 Automated service time estimation method for it system resources
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/051648 WO2012020328A1 (en) 2010-04-15 2011-04-15 Automated service time estimation method for it system resources

Country Status (1)

Country Link
WO (2) WO2011128921A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096631A (en) * 2016-06-02 2016-11-09 上海世脉信息科技有限公司 A kind of recurrent population's Classification and Identification based on the big data of mobile phone analyze method
CN108256560A (en) * 2017-12-27 2018-07-06 同济大学 A kind of park recognition methods based on space-time cluster
CN108769101A (en) * 2018-04-03 2018-11-06 北京奇艺世纪科技有限公司 A kind of information processing method, client and system
CN110389873A (en) * 2018-04-17 2019-10-29 北京京东尚科信息技术有限公司 A kind of method and apparatus of determining server resource service condition

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167092B (en) * 2014-07-30 2016-09-21 北京市交通信息中心 A kind of method determining center, on-board and off-board hot spot region of hiring a car and device
CN109000645A (en) * 2018-04-26 2018-12-14 西南电子技术研究所(中国电子科技集团公司第十研究所) Complex environment target classics track extracting method
CN117112871B (en) * 2023-10-19 2024-01-05 南京华飞数据技术有限公司 Data real-time efficient fusion processing method based on FCM clustering algorithm model

Non-Patent Citations (13)

* Cited by examiner, † Cited by third party
Title
CASALE G ET AL: "Robust Workload Estimation in Queueing Network Performance Models", PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008. PDP 2008. 16TH EUROMICRO CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 13 February 2008 (2008-02-13), pages 183 - 187, XP031233612, ISBN: 978-0-7695-3089-5 *
ESTER M ET AL: "A density-based algorithm for discovering clusters in large spatial databases with noise", PROCEEDINGS. INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY ANDDATA MINING, XX, XX, 1 January 1996 (1996-01-01), pages 226 - 231, XP002355949 *
ESTER M.; KRIEGEL H.P.; SANDER J.; XU X., A DENSITY-BASED ALGORITHM FOR DISCOVERING CLUSTERS IN LARGE SPATIAL DATABASES WITH NOISE
FEKRI M.; RUIZ-GAZEN A., ROBUST WEIGHTED ORTHOGONAL REGRESSION IN THE ERRORS-IN-VARIABLES MODEL, vol. 88, 2004, pages 89 - 108
G. CAPOROSSI, P. HANSEN: "Variable neighborhood search for least squares clusterwise regression", December 2007 (2007-12-01), XP002614130, Retrieved from the Internet <URL:http://www.gerad.ca/fichiers/cahiers/G-2005-61.pdf> [retrieved on 20101214] *
HUBER P.J.: "Robust regression: asymptotics, conjectures and monte carlo", THE ANNALS OF STATISTICS, vol. 1, 1973, pages 799 - 821
M. FEKRI, A. RUIZ-GAZEN: "Robust weighted orthogonal regression in the errors-in-variables model", JOURNAL OF MULTIVARIATE ANALYSIS, vol. 88, no. 1, 1 January 2004 (2004-01-01), pages 89 - 108, XP002614131, DOI: http://dx.doi.org/10.1016/S0047-259X(03)00057-5 *
PAOLO CREMONESI, KANIKA DHYANI, ANDREA SANSOTTERA: "Service Time Estimation with a Refinement Enhanced Hybrid Clustering Algorithm - Whitepaper February 2010", February 2010 (2010-02-01), XP002614129, Retrieved from the Internet <URL:http://www.neptuny.com/files/Service_time_estimation_with_a_refinement_enhanced_hybrid_clustering_algorithm.pdf> [retrieved on 20101213] *
PETER J. HUBER: "Robust Regression: Asymptotics, Conjectures and Monte Carlo", 1973, XP002614133, Retrieved from the Internet <URL:http://www.projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1176342503> [retrieved on 20101214], DOI: doi:10.1214/aos/1176342503 *
ROUSEEUW P.J.: "Least median of squares regression", JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, vol. 79, 1984, pages 871 - 881
ROUSSEEUW P J ET AL: "Fast Algorithm for the Minimum Covariance Determinant Estimator", 15 December 1998 (1998-12-15), INTERNET CITATION, XP002614132, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.5870&rep=rep1&type=pdf> [retrieved on 20101214] *
ROUSSEEUW P J: "LEAST MEDIAN OF SQUARES REGRESSION", JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, AMERICAN STATISTICAL ASSOCIATION, NEW YORK, US, vol. 79, no. 388, 1 December 1984 (1984-12-01), pages 871 - 880, XP008024952, ISSN: 0162-1459 *
ROUSSEEUW P.J.; VAN DRIESSEN K. A: "fast algorithm for the minimum covariance determinant estimator", TECHNOMETRICS, vol. 41, 1998, pages 212 - 223


Also Published As

Publication number Publication date
WO2012020328A1 (en) 2012-02-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10740023; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10740023; Country of ref document: EP; Kind code of ref document: A1)