US20160105327A9 - Automated upgrading method for capacity of IT system resources - Google Patents
- Publication number: US20160105327A9
- Application number: US 13/650,827
- Authority: US (United States)
- Prior art keywords: dataset, data, series, prediction, time
- Prior art date
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/046—Forward inferencing; Production systems
- G06N5/047—Pattern matching networks; Rete networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/147—Network analysis or design for predicting network behaviour
Definitions
- The modelling of S(t) produces an analytical expression for the serial correlation of the series with past values and noise sequences.
- The forecast generated from this information is particularly relevant for a short prediction horizon (the first future samples), while samples far in the future are less affected by the ARMA contribution.
- H is the desired forecasting horizon. This procedure pays particular attention to the last period of the data (which is, as discussed previously, a very important portion of the dataset).
- The attention of the forecast is focused on Z(n−T+j (mod T)), which basically considers only the last period of the series, joining it with the prediction of the stationary component of the data, to replicate seasonality over time.
- The parameter T includes all relevant seasonalities of the data and allows multiple periodic components to be handled without any additional information.
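- As an illustration of this reconstruction step, a minimal Python sketch is given below; it adds the forecast of the stationary component to the matching sample of the last observed period, Z(n−T+(j mod T)). The function and argument names are illustrative assumptions, and the trend contribution would be re-added separately.

```python
import numpy as np

def seasonal_forecast(z, stationary_pred, T):
    """Replicate seasonality over the horizon: the j-th future sample reuses
    the corresponding sample of the last observed period of Z(t) and adds the
    ARMA forecast of the stationary component (sketch, not the exact
    procedure of the method)."""
    z = np.asarray(z, dtype=float)
    last_period = z[len(z) - T:]              # samples Z(n-T+1) .. Z(n)
    return np.array([last_period[j % T] + stationary_pred[j]
                     for j in range(len(stationary_pred))])
```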
- The method according to the invention further triggers a proper procedure, depending on the specific hardware employed and monitored, to upgrade said hardware, either by allocating to the system some unused shared resources of another system or by issuing an alarm for the IT manager to start a manual upgrading procedure.
- For example, the computed prediction based on the historical usage data versus time of the hard disk gives an indication that at time t the hard-disk capacity will be used up to 99%.
- The process is hence set so that, at time t−n, before reaching complete usage of said resource, more disk space capacity is allocated to that IT system.
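- A minimal sketch of such a triggering rule is shown below; the function name, the 99% default threshold and the lead time n are illustrative assumptions, not values prescribed by the method.

```python
def upgrade_lead_time(forecast, capacity, usage_threshold=0.99, lead=1):
    """Return the sample index at which an upgrade should be triggered:
    find the first forecasted time t at which utilisation reaches the
    threshold and act `lead` samples earlier (time t - n)."""
    for t, value in enumerate(forecast):
        if value >= usage_threshold * capacity:
            return max(t - lead, 0)   # allocate more capacity at t - n
    return None                       # no upgrade needed inside the horizon
```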
- Prediction bands are computed in addition to the estimated predicted values of the dataset. These prediction bands are calculated as a function of the forecasted value and the chosen confidence on the error of prediction: they represent the region (i.e. upper and lower bounds) in which the prediction lies with a certain probability.
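- A simple way to obtain such bands, assuming Gaussian forecast errors with a per-step standard deviation sigma (an assumption, since the error model is not stated here), is sketched below.

```python
import numpy as np
from scipy.stats import norm

def prediction_bands(forecast, sigma, confidence=0.75):
    """Symmetric prediction bands around the forecast for a given
    confidence level (e.g. 0.75, as used for FIG. 3)."""
    z = norm.ppf(0.5 + confidence / 2.0)      # two-sided quantile
    forecast = np.asarray(forecast, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return forecast - z * sigma, forecast + z * sigma
```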
- The accuracy of the prediction method is evaluated on data coming from IT metrics, roughly categorized into two subsets: workload and performance data.
- The first subset includes datasets representing raw business data, directly taken from the IT activities: for example Business Drivers, which monitor user-based metrics (number of requests, logins, orders, etc.), Technical Proxies, which indirectly measure business performance (rates, number of hits, volumes of data involved in transactions, etc.), and Disk Load, describing the load addressed to memory devices.
- The second subset is composed of time series which can be workload data after the processing applied by queuing network models, or performance series coming directly from IT architecture devices (CPUs, storage systems, databases, etc.).
- The accuracy of the method according to the invention has been assessed through visual judgement of cross-validation results, comparing them with those obtained from other popular forecasting methods, like Robust Linear (RL) and Holt-Winters (HW). In both cases, the method according to the invention has been judged superior.
- In FIG. 2 a real dataset is shown, obtained by monitoring the number of bytes of active memory of a computing machine.
- The method of the invention has been found able to automatically detect and recognize the missing values (dashed ovals in the figure), which shall be filled as explained above, the level discontinuities (rectangular shapes), which shall be adjusted for the correct automatic analysis, and the outliers (dashed circles), which are properly handled in the pre-processing stage.
- the dataset shows a seasonal component of period 24 (number of hours in a day) while the aggregate series is dominated by a seasonality of period 7 (number of days in the week).
- the hourly time series tends to replicate its behaviour every week.
- FIG. 3 shows the initial series, together with its final prediction (bold line) for 408 samples (17 days), along with the prediction bands (dashed line), computed for a 75% confidence.
- The performance of the algorithm is evaluated through three performance indices, where ŷ(j) represents the prediction of sample y(j) and N is the length of the portion of the series used to test the prediction.
- The Root Mean Squared Error (RMSE) is defined as RMSE = sqrt((1/N) · Σ_{j=1..N} (y(j) − ŷ(j))^2).
- A Mean Absolute Percentage Error (MAPE) can also be calculated; it is defined as MAPE = (100/N) · Σ_{j=1..N} |y(j) − ŷ(j)| / |y(j)|.
- The quantity σ_x = sqrt(σ1^2/n1 + σ2^2/n2) is the standard error used in the statistical test comparing the mean results of two algorithms, discussed below.
- The first column indicates the type of parameter shown (μ is the mean and σ is the standard deviation); columns 2 to 4 show the MAPE values for Box and Jenkins, Robust Linear and Holt-Winters, while columns 5 to 7 and 8 to 10 illustrate, for the three algorithms respectively, the RMSE and the ER. Every table shows the results grouped with respect to the type of metric that the series is monitoring. For each group, the mean and the standard deviation of the results vector are displayed, together with the output of the previously discussed test. In the "test" row, the symbol '+' indicates that the null hypothesis has been rejected in favor of the automated Box and Jenkins algorithm, while the other symbol stands for the acceptance of the null hypothesis. The results of these tests confirm the accuracy of our algorithm in predicting IT time series, with respect to the considered performance indices. The μ value for MAPE is never over 40%, which in the literature is considered a reasonable threshold for the goodness of the forecast.
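- For reference, a straightforward implementation of the two error indices reported in the tables is sketched below (the error-rate index ER is omitted, since its definition is not reproduced here).

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error between observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))
```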
- FIG. 4 shows the cross-validation performed on a time series monitoring the number of events occurring in a web server on a daily basis.
- Datasets of this type show a clear lower bound, which is properly detected by the trampoline identifier, thereby avoiding infeasible situations due to a possible negative trend in the data.
- The instance illustrated in FIG. 4 shows a clear trampoline base represented by the value 0, as datasets monitoring events cannot have negative values (which Holt-Winters instead incorrectly predicts).
- Our automated prediction method detects all basic seasonal components in the data, so that the forecast fits the periodic behavior of the series appropriately (unlike the Robust Linear algorithm, which is unaware of seasonalities).
- FIG. 5 represents the cross-validation test on an hourly sampled storage time series, representing the disk memory used by a machine. Analyzing the data closely shows that there is a double seasonality, arising from daily and weekly fluctuations of the memory occupation over the trend of the series. Capturing this characteristic behavior leads to a more informed prediction, which correctly follows the recurrence of local and global peaks in the data. For this specific time series, the improvement in terms of prediction accuracy of this algorithm with respect to its counterparts is considerable: it reduces the MAPE by 35% and the RMSE by 26% against Robust Linear, and by 80% and 75% respectively against Holt-Winters. Further, none of the predicted values is considered incorrect (with 95% confidence), and accordingly the ER for the automated Box and Jenkins algorithm is 0.
Abstract
Description
- The present invention relates to a prediction method for capacity planning of IT system resources. In particular, it relates to an automated prediction method.
- As known, capacity planning is crucial for the efficient management of available resources. In the context of IT infrastructure, it is the science of estimating the space, computer hardware, software and connection resources that will be needed at some time in the future. The aim of a capacity planner is therefore to find the most cost-efficient solution by determining appropriate tradeoffs, so that the needed technical capacity/resources can be added in time to meet the predicted demand, while making sure that the resources do not go unused for long periods of time. In other words, it is required to upgrade or update some hardware at the right point in time, so as to cope with the demand without anticipating it too much, and so as to budget upgrading costs correctly.
- Time series analysis is a vital process for predicting how major aspects of various economic and social processes evolve over time. It has long been extensively applied to predicting the growth of key business activities, for instance the rise and fall of stock prices, and to determining market trends, amongst others. Due to the growing need to optimize IT infrastructure so as to offer better services while minimizing the cost of maintaining and buying the infrastructure, there is a growing necessity of developing advanced methods that automatically trigger hardware upgrading or add-on processes.
- Time series analysis applied to an IT infrastructure is based on collecting or sampling data related to signals issued by monitored hardware, so as to build the historical behaviour and hence estimate the future points of the model. This analysis, projected in time, is apt to supply specific information for establishing when and how said hardware or software resource will require upgrading or substitution. Upgrading of a certain resource in the IT infrastructure, for example one intended for a specific task, may also occur as an automatic re-allocation of resources (for example memory banks, disk space, CPU, . . . ) from another system provisionally allocated to a different task: in such a case the entire upgrading process can be carried out in a completely automatic mode.
- The same analysis supplies information about the occurrence of events, errors on prediction bands, and the point in time when given hardware changes should be made or when the given infrastructure will break down.
- As an example, the following can be reported: based on the past behaviour of entities like the number of accesses to, or transactions in, a web site, a time series analysis can help minimize user response time by predicting future hardware requests. This constitutes a simple capacity planning situation in a demand-supply scenario, where the capacity planner needs to determine a balance between how much hardware infrastructure has to be installed, on the basis of the expected number of users, and minimizing the loss of profit due to slow web access.
- One of the algorithms most widely employed in the field of time series prediction is the well-known Box and Jenkins prediction algorithm (see, for example, G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco, Calif.: Holden-Day, 1976 and J. G. Caldwell, (2007, February). Mathematical forecasting using the Box-Jenkins methodology, available online at www.foundationwebsite.org.); this framework is able to operate reasonably well in any condition, regardless of the specific domain wherein it is used. Typically, to tune this algorithm to supply good results for a specific application field, a certain amount of manual intervention is required to select a number of tuning parameters based on visual observation of the historical behaviour of the specific acquired time series. Of course, this way of proceeding, as such, is not suitable for completely automating the upgrading process.
- An object of the present invention is hence that of supplying a method for hardware upgrading based on robust time series prediction in the domain of capacity planning of business and workload performance metrics in IT infrastructure, like business drivers, technical proxies, CPU, memory utilization, etc. To achieve this goal, it is desired to develop a completely automated time series prediction method. Having an automated method for performance data has two-fold advantages: (i) due to the large volumes of data with constantly changing physical characteristics which need to be regularly analyzed, an automation of reading the data, updating the internal parameters and performing a thorough, extensive analysis is imperative; (ii) human intervention in the time series prediction process always has some drawbacks, as capacity planners are engineers who generally lack the deep mathematical and statistical knowledge that time series forecasting experts have.
- The above object is obtained through a method as defined in its essential characteristics in the attached claims.
- In particular, the method specified relies on a forecasting algorithm based on the Box and Jenkins prediction algorithm with added functionalities which, on the basis of proper identification of characteristic properties of the data set, is able to boost the accuracy of prediction and of the hardware upgrading process.
- The algorithm is completely automated and is designed for an unskilled capacity planner requiring no prior knowledge in this area and no manual intervention. To achieve this end, apart from all the other phases of the algorithm, the main core of the algorithm comprising the Box and Jenkins prediction algorithm has also been completely automated.
- This algorithm is well suited and tailored to time series coming from the workload and performance domains in IT systems, since they have a lot of internal behaviour, like long-range trends, long-term and short-term seasonalities and dynamics that evolve independently of each other, representing the different physical contributions to the final structure of the data.
- For this specific domain of data, the method of the invention has a clear edge over other popular forecasting methods like Robust Linear regression (P. J. Rousseeuw and A. M. LeRoy. Linear regression and outlier detection. Hoboken: Wiley, 2003), which can only capture long term trends without giving any further insight on smaller granularity data; Holt-Winters (P. S. Kalekar, "Time series forecasting using Holt-Winters exponential smoothing", December 2004), which provides a prediction based on the trend and seasonality in the data but is not robust to anomalies; the Random Walk algorithm (N. Guillotin-Plantard and R. Schott, Dynamic random walks: theory and applications. Oxford, UK: Elsevier, 2006), especially used for stock forecasting, which is suited only for a short-range perspective as it predicts on the basis of the last observation and does not take the general trend into account; and the Moving Average set of algorithms (P. J. Brockwell and R. A. Davis, Time series: theory and methods. 2nd ed. New York: Springer, 1991), which assume a relation between the short and long term perspective by defining a user threshold and generally work well only if seasonalities in the data are regular and cyclic.
- Further features and advantages of the system according to the invention will in any case be more evident from the following detailed description of some preferred embodiments of the same, given by way of example and illustrated in the accompanying drawings and tables, wherein:
- FIG. 1 is a block diagram showing the main steps of the prediction method according to the invention;
- FIG. 2 is an exemplary time series representing the active memory of an IT machine;
- FIG. 3 is a plot showing forecast and prediction bands of a test series;
- FIG. 4 is a cross-validation on a workload series showing a comparison amongst different algorithms;
- FIG. 5 is a cross-validation on a performance series showing a comparison amongst different algorithms;
- FIG. 6 (Table 1) is a table of results for workload data showing a comparison amongst different algorithms;
- FIG. 7 (Table 2) is a table of results for performance data showing a comparison amongst different algorithms.
- The Box and Jenkins approach is a complex forecasting method which has been known since 1976. This framework is based on the assumption that each time series x(t) can be modelled as follows:
- x(t) = f(T(t); P(t); S(t)) (1)
- where T(t) represents the trend, P(t) the periodic components and S(t) a stationary process.
- A major pitfall of this algorithm is that it requires, as mentioned above, substantial manual work and a deep statistical knowledge. For instance, this algorithm is based on determining the parameters of the ARMA model, which cannot be trivially inferred from the data. Up till now, this was done by iterating over p and q, leading to a high number of possible combinations; it is hence highly demanding in terms of computing resources and does not allow a solution to be obtained in real time.
- Due to the peculiarity of IT systems data, in addition to completely automating the prediction process, several components have been added to the Box and Jenkins algorithm to give an informed, intelligent prediction. These procedures identify some characteristic behaviours of the series, which are known as features and are selectively used according to the invention. A list of 35 important features has already been suggested in the art, which can be broadly classified as domain knowledge, functional form, context knowledge, causal forces, trend, seasonality, uncertainty and instability. While some of the features listed cannot be automated, some others are not suited for the performance and workload data. After an in-depth analysis, according to the invention a subset of 20 features has been chosen and then merged into 6 main characteristics to be detected. Those characteristics have been split and allotted to two different treatment stages of the process: some of them to a pre-processing step, and some others to the other sections.
- In particular, the method according to the invention, after having collected (over time) a dataset of hardware performance signals coming from a monitored IT system, relies on a treatment process of said dataset which is divided into 5 main phases (see FIG. 1): (a) pre-processing of the data, (b) identification of the trend, (c) seasonal components analysis, (d) ARMA modelling and (e) final prediction of the time series.
- As the name suggests, in this stage the algorithm prepares the dataset for further analysis. This preamble is very crucial to the accuracy of the final prediction of the dataset, as solving anomalies in the data leads to a cleaner exploration of the series structure. In the following, some of the features that strongly characterize each time series and that are used to compute an informed pre-processing of the data are described.
- (a.1) NA values
- In real applications, i.e. in IT systems resources, many series contain missing values (the abbreviation "NA" stands for "Not Available"). These lacks of information are caused by data missing during collection (the machine in charge of the acquisition is down for some reason) or by the inconsistency of the data with the domain of the metric. According to the invention, each missing observation is replaced by the median of the k closest samples (the default is k=5): the median, in fact, maintains the general behaviour of the series and in addition is not affected by extreme values (unlike the mean), thereby not distorting the model components (trend and seasonality).
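- A minimal Python sketch of this imputation step is given below; the neighbour-selection details are an assumption, since the text only states that the median of the k nearest samples is used.

```python
import numpy as np

def fill_missing(series, k=5):
    """Replace each NA with the median of the k closest available samples
    (sketch of the described pre-processing step)."""
    y = np.asarray(series, dtype=float)
    filled = y.copy()
    missing = np.where(np.isnan(y))[0]
    available = np.where(~np.isnan(y))[0]
    for i in missing:
        # pick the k available samples whose index is closest to i
        nearest = available[np.argsort(np.abs(available - i))[:k]]
        filled[i] = np.median(y[nearest])
    return filled
```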
- A level discontinuity is a change in the level of the series, which usually appears in the form of a step. These jumps can be caused by occasional hardware upgrades or by changes in the physical structure of the monitored IT applications. To detect level discontinuities, a set of candidate points is created using the Kullback-Leibler divergence (see W. N. Venables, D. M. Smith, (December 2008). An introduction to R. Available at http://www.r-project.org/): for each point of the time series, the deviation between its backward and forward window is computed. The resulting vector is filtered through a significance threshold and the remaining points constitute the change points in the series, which are the required level discontinuities. To filter further level jumps, the algorithm provides another method, which works in two steps: (i) an alternative list of candidate jump points is created: considering the second difference of the series, the procedure retains the samples in which it is greater than half (by default) of the standard deviation of the dataset; (ii) a seasonal analysis is applied to this vector of candidates, to prevent peaks at known periodic lags from being considered as level discontinuities. Thanks to this 2-step filter, only jumps of considerable entity and concrete meaning are selected and adjusted.
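- A possible implementation of the backward/forward window comparison is sketched below; the two windows are summarized as Gaussians and compared with a symmetrised Kullback-Leibler divergence, which is an assumption, since the exact divergence estimator is not specified in the text.

```python
import numpy as np

def level_change_candidates(y, window=24, threshold=3.0):
    """Flag candidate level discontinuities by comparing, for each point,
    its backward and forward windows; points whose deviation exceeds a
    significance threshold are kept (window and threshold are illustrative)."""
    def kl_gauss(m1, s1, m2, s2):
        return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

    y = np.asarray(y, dtype=float)
    scores = np.zeros(len(y))
    for t in range(window, len(y) - window):
        back, fwd = y[t - window:t], y[t:t + window]
        m1, s1 = back.mean(), back.std() + 1e-9
        m2, s2 = fwd.mean(), fwd.std() + 1e-9
        scores[t] = kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1)
    return np.where(scores > threshold)[0]
```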
- Outliers are values that deviate substantially from the behaviour of the series. Their identification is crucial for the goodness of the final forecast of the data, as the presence of strange samples in the time series can lead to big errors in the determination of the parameters of the prediction model. In addition to these points, there are samples which appear as anomalous points, as their behaviour differs from that of the data. However, in the real world these points have a physical significance and are known as events. Events normally represent some hardware upgrades or changes in the system which occur either in isolation or at periodic intervals. Both outliers and events are obtained from the vector of change points computed using the Kullback-Leibler function. Change points are calculated using a boxplot analysis conducted on all intervals; the values in these intervals above the whisker point are generally considered anomalous. To distinguish events from outliers, this list of unlabeled points is then refined by the event detection box, which identifies seasonalities in this vector (if existing) by automatically detecting the starting point of the seasonal sequence and producing a list of events in the data. All the other points are labelled as outliers. The event detection procedure ensures that events are considered as seasonal elements and properly dealt with by the dedicated section of the algorithm.
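- The boxplot-based flagging of anomalous samples can be sketched as follows; the conventional 1.5·IQR whisker is an assumption, and the interval-by-interval refinement and the event/outlier separation are handled elsewhere in the algorithm.

```python
import numpy as np

def boxplot_anomalies(y, whisker=1.5):
    """Return the indices of samples lying above the upper boxplot whisker
    (Q3 + whisker*IQR), which are generally considered anomalous."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    upper = q3 + whisker * (q3 - q1)
    return np.where(y > upper)[0]
```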
- Statisticians agree that predicting a time series with a highly corrupted last period can introduce such large conceptual mistakes that the whole forecasting process may be totally compromised. Suitably handling anomalies in the last part of the series is therefore really important for the final prediction of the data. Hence, according to the invention, the last P samples (P is the granularity of the data) of the dataset are left out, while a P-step-ahead forecast (using the estimation method itself) is computed. This produces the estimated value for the last period, which is judged to be unusual or not. If it is, the real values are substituted by the just-computed prediction. Then, the last sample of the obtained data series is ignored, and its unusuality is judged through a procedure mirroring the one above. This filter of the latest samples (i.e. the most significant samples of the trailing edge) is very important, as dealing with series having a stable last portion leads to a more accurate prediction of the data.
- Other than these widely known features, the prediction method of the invention has been equipped with a set of alternative identifiers, which are useful to adequately characterize aspects of the specific performance and workload data of IT systems. In particular, IT time series are not allowed to take negative values, as they represent percentages or natural metrics, which by definition cannot be negative. Because of that, the prediction shall be limited to prevent the forecast from reaching negative values.
- Further, it is detected whether the series constitutes a utilization dataset, so that a lower and an upper bound can be imposed. In some real situations, the series shows a non-trivial lower (or upper) bound: this limit must be correctly detected and considered, as letting forecasts go under that boundary causes infeasible situations in terms of the physical meaning of the data. To identify the bound, the mode of the series is calculated: if it is the lower (or upper) bound of the data and has a sufficiently high frequency, then it is considered the base of the data, which is labelled as a trampoline series.
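- The detection of such a "trampoline" base can be sketched as below; the frequency threshold is an illustrative assumption.

```python
import numpy as np

def detect_trampoline(y, min_freq=0.05):
    """Return the base of a trampoline series: the mode of the data, if it
    coincides with the minimum (or maximum) and occurs with sufficiently
    high frequency; otherwise return None."""
    y = np.asarray(y, dtype=float)
    values, counts = np.unique(y, return_counts=True)
    mode_value = values[np.argmax(counts)]
    freq = counts.max() / len(y)
    if freq >= min_freq and mode_value in (y.min(), y.max()):
        return mode_value
    return None
```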
- Once the time series has been cleaned from all anomalous behaviours, and can therefore be considered a pure and meaningful expression of the process underlying the data, it shall be treated according to the three steps illustrated in FIG. 1.
- The trend part of the data is, in most cases, the most relevant one, as it dominates the whole series. For this reason, performing a good identification of the general direction of the data usually leads to a good final prediction of the series. To accomplish this job, the coefficient of determination (R2) technique is suggested to be used (see, for explanations, L. Huang and J. Chen, "Analysis of variance, coefficient of determination and F-test for local polynomial regression", The Annals of Statistics, vol. 36, no. 5, pp. 2085-2109, October 2008; E. R. Dougherty, S. Kim and Y. Chen, "Coefficient of determination in nonlinear signal processing", Signal Processing, vol. 80, no. 10, pp. 2219-2235, October 2000).
- The fixed set of possible curves is composed of polynomial functions (linear, quadratic and cubic) and a non-linear one (exponential). Initially a heuristic test to detect possible exponential behaviour is computed on the series Y(t) (which is the output of the pre-processing procedure): the natural logarithm of Y is taken to obtain the slope of the resulting fitting line, which is useful for the further analysis. Supposing, in fact, that Y(t) ≈ T(t) (a condition satisfied in almost all real situations),
- if T(t) = a*e^(b*t), then
- log Y(t) = log(a*e^(b*t)) = log a + log e^(b*t) = log a + b*t
- Once the function is chosen, it is straightforward to find the analytical expression of the best-fitting line (all computer programs for numerical calculations provide a built-in function to fit generic analytical models to a dataset). Sometimes, unfortunately, real world data are very “dirty”. The coefficient of determination could be biased by some random circumstance in the data, that could deviate the output of the R2 test (especially if samples are not numerous). To prevent unexpected and undesirable situations of bad adaptation to the data, a threshold is put on the trend test. It, represents the value over which the maximum R2 rate must stay, to be significant. This filter is put to avoid the overfitting of the model, that could rely too much on the original data (which can be corrupted, instead, by some disturbing random factor) with respect to future samples.
- In most of the real-world situations in IT systems, every aspect of the data is highly influenced by time. All time series have a granularity, which is the interval at which data are collected. Data, for example, can be gathered hourly, weekly or yearly; they can even be collected at a certain granularity and then be processed to obtain different time intervals. Often, datasets show time-dependent correlations, like events or usual realizations as well, that tend to be periodic in their appearances. IT data, in this sense, are very significant, as they are particularly expressive of this crucial aspect of datasets.
- According to the invention there is provided an entire process block that deals with the seasonal traits in the data. It is divided into 3 parts: the first two handle respectively the original detrended dataset (Z(t)) and its aggregation with respect to the basic seasonality, while the third one uses the information acquired from the previous two to prepare the series for the next steps of the process.
- To detect if seasonality is a relevant component of the series, an analysis of the Auto Correlation Function (ACF) over the dataset is computed. The procedure for the identification of seasonal components detects sufficiently high peaks (local minima or maxima greater than a threshold) in the ACF, which represent the periods of the relevant seasonal components in the data. This test highlights regular behaviours of the series, which usually denote specific qualities of the process underlying the data (a sketch of this peak-detection step is given after the list of approaches below). The process can handle the ACF-test output with the following 3 different approaches.
- 1. Granularity-based. A seasonal component in the original data is considered relevant only if its period is equal to the time interval. This approach allows the discovery of regular dependencies, which rely on correlations that have a concrete physical meaning connected to the elapsing of time. This is the default option of the algorithm;
- 2. Greedy. This approach instead supposes that every period labelled as significant by the Auto Correlation Function analysis is acceptable and potentially relevant. The algorithm chooses the highest peak returned by the seasonality test and assumes the corresponding period as the one driving the series;
- 3. Custom. Finally, it is possible to leave to the user the choice of inputting a set of feasible periods to the automatic process. The procedure chooses the lowest period which is both in this set and among the relevant periods returned by the ACF-test. This option has been added to give the user the possibility of customizing (if needed) the analysis and also to enable the algorithm to manage peculiar time series that can rarely appear in real applications. A data collection, for instance, can be conducted from Monday to Friday and stopped in the week-end (due to the closing of the offices, for example). In this case, a possibly relevant seasonality would have period 5 (rather than 7, as in usual weekly dependencies), and thanks to this selection the algorithm can properly handle it. Obviously, this procedure could return no period, to express that seasonality is not important in the evolution of the data.
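- The peak-detection step of the ACF test can be sketched as follows; the significance threshold and the maximum lag are illustrative assumptions, not values taken from the method.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def seasonal_periods(z, max_lag=200, threshold=0.3):
    """Return candidate seasonal periods as sufficiently high local maxima
    of the autocorrelation function of the detrended series Z(t)."""
    z = np.asarray(z, dtype=float)
    nlags = min(max_lag, len(z) // 2)
    rho = acf(z, nlags=nlags, fft=True)
    periods = []
    for lag in range(2, len(rho) - 1):
        is_local_max = rho[lag] > rho[lag - 1] and rho[lag] > rho[lag + 1]
        if is_local_max and rho[lag] > threshold:
            periods.append(lag)
    return periods   # an empty list means seasonality is not considered relevant
```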
- Once the original series has been dealt with, a deeper analysis is made on the dataset. The data vector is divided into K groups, each composed of T elements, where T is the value returned by the previous processing (if seasonality has not been considered relevant in the data, then it is set to the time interval P; if the last group has Ť elements and Ť<T, it is filled by adding T−Ť samples equal to the mean of the series). Hence, the so-called aggregate series is obtained, where each sample Z′(j) (j ∈ {1, 2, . . . , K}) is defined as
- Z′(j) = Σ_{i=0..T−1} Z((j−1)·T + i)
- At this point, the ACF-test can be applied to this series, to find possible significant seasonal components in the data (T′). This additional periodic examination allows the discovery of specific patterns, which are connected to "double seasonalities" in the structure of the metric that is modelling the data. The method applied for this procedure is the greedy one, which does not forbid any period from being considered significant by the algorithm. This choice takes into account the complicated dynamics of performance datasets. IT aggregated time series, in fact, do not have fixed periodic patterns but can show regularities at any sample lag.
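- The construction of the aggregate series described above can be sketched as follows, padding an incomplete last block with the mean of the series.

```python
import numpy as np

def aggregate_series(z, T):
    """Build the aggregate series Z'(j) by summing the detrended series
    Z(t) over consecutive blocks of T samples."""
    z = np.asarray(z, dtype=float)
    remainder = len(z) % T
    if remainder:
        padding = np.full(T - remainder, z.mean())
        z = np.concatenate([z, padding])
    return z.reshape(-1, T).sum(axis=1)
```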
- Once the aggregated series has also been fully analyzed, Z(t) can be taken into consideration again and the information obtained in the two previous sections can be used. Seasonal differencing is a form of adjustment whose aim is to remove the seasonality explicitly from the dataset. In general, given a time series X(t) (of length n) and the season Δ, the difference series is obtained as follows:
-
S(j)=X(j+Δ)−X(j) j ∈ {1,2, . . . ,n−Δ} (2) - The season parameter can vary, according to the possible results of the previous inspection of the data:
-
- if the aggregated series showed relevant seasonality (independently of the original series' result), then
-
Δ=T*T′; -
- if the aggregated data did not reflect periodic regularities, while the original data did, then
-
Δ=T; -
- otherwise, if neither the original series nor the aggregated one showed seasonality, then seasonal differencing cannot be applied, as periodicity is not considered relevant in the structure of the data.
- The obtained series S(t) is accordingly deprived of all its seasonal components, and the parameter Δ is kept for the subsequent reapplication of the seasonality at the final prediction stage.
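- A minimal sketch of the seasonal differencing of equation (2) and of the season-selection rules listed above (assuming T holds the period used for the aggregation, a flag records whether the original series was actually found seasonal, and T_agg is the period found on the aggregate series, or None):

```python
import numpy as np

def choose_season(T, original_seasonal, T_agg):
    """Pick the season parameter delta according to the two ACF analyses."""
    if T_agg is not None:
        return T * T_agg            # aggregated series is seasonal: delta = T * T'
    if original_seasonal:
        return T                    # only the original series is seasonal: delta = T
    return None                     # no relevant periodicity: differencing is skipped

def seasonal_difference(x, delta):
    """S(j) = X(j + delta) - X(j) for j = 1..n-delta."""
    x = np.asarray(x, dtype=float)
    return x[delta:] - x[:-delta]
```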
- The dataset resulting from all previous procedures is a stationary series S(t), which can be modelled as an ARMA(p,q) process (see, for example, G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco, Calif.: Holden-Day, 1976). Therefore, the investigations to be carried out on the dataset are the detection of the order of the model and its identification.
- There are several different approaches to handle this portion of the Box and Jenkins analysis (see, for example, Y. Lu and S. M. AbouRizk, “Automated Box-Jenkins forecasting modelling”, Automation in Construction, vol. 18, pp. 547-558, November 2008), and the details of the processes to be used will not be described here, since they are well known in the field and do not specifically form part of the present invention.
- According to a preferred embodiment of the invention, with the aim of tightly reducing the computational time of this procedure, a process tool has been constructed which is able to accurately identify the most correct orders of the AR and MA portions of the ARMA process, starting from reasonable considerations on the structure of the data. First, the procedure agrees with the one from Lu and AbouRizk about the bounds on the orders, as p and q components greater than 3 would not bring substantial improvements to the modelling goodness and would only complicate the abstraction of the data. This process tool then executes an accurate inspection of the ACF (autocorrelation function) and PACF (partial autocorrelation function), which basically applies rules described in the document S. Bittanti, “Identificazione dei modelli e sistemi adattativi,” Bologna, Italy: Pitagora editrice, 2005.
- This procedure relies on the knowledge acquired directly from the serial correlation among data samples (ACF and PACF), which is strongly meaningful for the behaviour of the data and, in addition, can be computed in a very short time.
- After the p and q components have been identified correctly, the only thing left is the computation of the values of the model parameters. This is done by the machine, which uses a traditional MLE (Maximum Likelihood Estimation) method to estimate the coefficients that best model the given time series S(t). Once the ARMA model has been built, the system is ready for the final prediction stage of the procedure.
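- A hedged sketch of this modelling stage is given below. For brevity it replaces the ACF/PACF inspection rules described above with a plain AIC grid search over the bounded orders p, q ≤ 3, and assumes statsmodels' ARIMA class (with d=0) as the maximum-likelihood fitter; it is not the tool of the preferred embodiment.

```python
from statsmodels.tsa.arima.model import ARIMA

def fit_arma(s, max_order=3):
    """Fit an ARMA(p, q) model to the stationary series s with p, q <= 3."""
    best = None
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            if p == 0 and q == 0:
                continue
            try:
                res = ARIMA(s, order=(p, 0, q)).fit()   # MLE estimation of the coefficients
            except Exception:
                continue                                 # skip orders that fail to converge
            if best is None or res.aic < best.aic:
                best = res
    return best   # best.forecast(steps=H) then yields the ARMA prediction of S(t)
```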
- At this stage, an abstraction of every component of the time series has been produced: hence every obtained model can be used and extended to any future collected data, to produce a forecast of the dataset. The prediction of the time series is computed in the inverse order with respect to the identification, applying results from the least relevant component to the dominant one.
- The modelling of S(t) makes it possible to produce an analytical expression for the serial correlation of the series with past values and noise sequences. The forecast generated from this information is particularly relevant for a short prediction horizon (the first future samples), while distant unknown samples are less affected by the ARMA contribution.
- Expression (2) explained how to obtain the difference series without the seasonal components. Now seasonality must be reapplied to build the desired dataset Ž(t). Known samples are trivially reacquired with the following:
-
Ž(j)=Z(j) j ∈ {1,2, . . . ,T} (3) -
Ž(j+T)=Z(j)+S(j) j ∈ {1,2, . . . ,n−T} (4) - Future samples, instead, are obtained, thanks to the ARMA forecast, using:
-
Ž(n+j)=Z(n−T+j (mod T))+Ŝ(n−T+j) j ∈ {1,2, . . . ,H} (5) - where H is the desired forecasting horizon. This procedure puts particular care on the last period of the data (which is, as discussed previously, a very important portion of the dataset). In equation (5), in effect, the attention of the forecast is focused on Z(n−T+j (mod T)), which basically considers only the last period of the series, joining it with the prediction of the stationary component of the data to replicate seasonality over time. Note that the parameter T includes all interesting seasonalities of the data and allows multiple periodic components to be handled without any additional information.
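- The following sketch applies equations (3)-(5) to rebuild the forecast from the difference series, assuming z is the detrended series Z(t) of length n, s the difference series S, s_hat the ARMA forecast of S beyond sample n−T, T the overall season and H the horizon (0-based indexing shifts the formulas by one):

```python
import numpy as np

def rebuild_forecast(z, s, s_hat, T, H):
    n = len(z)
    z_out = np.empty(n + H)
    z_out[:T] = z[:T]                           # (3): first period copied as-is
    z_out[T:n] = z[:n - T] + s[:n - T]          # (4): remaining known samples
    for j in range(1, H + 1):                   # (5): last period plus predicted S
        z_out[n + j - 1] = z[n - T + (j % T) - 1] + s_hat[j - 1]
    return z_out
```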
- Finally, the identified regression curve of the data is reapplied. In most performance data series, trend is the most relevant component of the data. Therefore, to perform effective capacity planning, the general behaviour of a time series is crucial, and its proper detection and application become the most critical section of any forecasting procedure. That is why so much attention is paid to the trend identification section of this procedure. Finally, after the detected features have been considered (reapplication of level discontinuities, outliers, etc.), the definitive prediction is computed.
- Based on this computed prediction, the method according to the invention further triggers a proper procedure, depending on the specific hardware employed and monitored, to upgrade said hardware, either by allocating to the system some unemployed shared resources of another system or by issuing an alarm for the IT manager to start a manual upgrading procedure.
- As an example, if the prediction method is based on a dataset representing the storage space usage of a hard disk, the computed prediction, based on the historical usage data versus time of the hard disk, gives an indication that at time t the hard-disk capacity will be 99% used. The process is hence set so that, at time t−n, before reaching complete usage of said resource, more disk space capacity is allocated to that IT system.
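- Purely as an illustration of this triggering logic (the names and the one-sample lead are assumptions, not part of the claimed method), an upgrade action could be scheduled as follows:

```python
def find_upgrade_time(predicted_usage, threshold=0.99, lead_n=1):
    """Return the time index t - n at which extra capacity should be allocated."""
    for t, usage in enumerate(predicted_usage):
        if usage >= threshold:
            return max(t - lead_n, 0)
    return None   # predicted usage never reaches the threshold within the horizon
```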
- According to a preferred embodiment of the invention, to provide a more robust estimation of unknown forecasted samples, prediction bands are computed in addition to the estimated predicted values of the dataset. These prediction bands are calculated as a function of the forecasted value and the chosen confidence on the prediction error: they represent the region (i.e. upper and lower bounds) in which the prediction lies with a certain probability.
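- A minimal sketch of such bands, assuming they are centred on the forecast and scaled by the normal quantile of the chosen confidence and by the standard deviation of the in-sample prediction error (an assumption, since the exact formula is not given here):

```python
import numpy as np
from scipy.stats import norm

def prediction_bands(forecast, error_std, confidence=0.75):
    """Upper and lower bounds containing the prediction with the given probability."""
    q = norm.ppf(0.5 + confidence / 2.0)        # two-sided normal quantile
    forecast = np.asarray(forecast, dtype=float)
    return forecast - q * error_std, forecast + q * error_std
```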
- The accuracy of the prediction method is evaluated on data coming from IT metrics, roughly categorized into two subsets: workload and performance data. The first one includes datasets representing raw business data, directly taken from the IT activities: for example Business Drivers, which monitor user-based metrics (number of requests, logins, orders, etc.), Technical Proxies, which indirectly measure business performance (rates, number of hits, volumes of data involved in transactions, etc.), and Disk Load, describing the load addressed to memory devices. The second one is composed of time series which can be workload data after the processing applied by queuing network models, or performance series coming directly from IT architecture devices (CPUs, storage systems, databases, etc.).
- The accuracy of the method according to the invention has been assessed through visual judgement of cross-validation, comparing the results with those obtained from other popular forecasting methods, such as Robust Linear (RL) and Holt-Winters (HW). In both cases, the method according to the invention has been judged superior.
- In FIG. 2 a real dataset is shown, obtained by monitoring the number of bytes of active memory of a computing machine. The method of the invention has been found able to automatically detect and recognize the missing values (dashed ovals in the figure), which shall be filled as explained above, the level discontinuities (rectangular shapes), which shall be adjusted for the correct automatic analysis, and the outliers (dashed circles), which are properly considered in the pre-processing stage. In particular, some of the outliers were not detected as anomalous points, but correctly recognized, through the event detection procedure, as usual periodic peaks due to seasonality; trend analysis on the cleaned series fits an upward linear trend to the series. The seasonal detection, instead, discovers a double seasonality in the data. The dataset, in fact, shows a seasonal component of period 24 (number of hours in a day) while the aggregate series is dominated by a seasonality of period 7 (number of days in the week). Hence, the hourly time series tends to replicate its behaviour every week. This aspect of the dataset is handled by the seasonal differencing, which combines the ARMA prediction (p=q=3) and the identified trend to make an appropriate forecast. FIG. 3 shows the initial series, together with its final prediction (bold line) for 408 samples (17 days), along with the prediction bands (dashed line), computed for a 75% confidence.
- While there has been illustrated and described what are presently considered to be example embodiments, it will be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from claimed subject matter. Additionally, many modifications may be made to adapt a particular situation to the teachings of claimed subject matter without departing from the central concept described herein. Therefore, it is intended that claimed subject matter not be limited to the particular embodiments disclosed, but that such claimed subject matter may also include all embodiments falling within the scope of the appended claims, and equivalents thereof.
- Evaluation and Example of Comparison with Other Algorithms
- To demonstrate the efficiency of the automated prediction algorithm, which intelligently uses the underlying behavior of the time series, we present extensive cross-validation results against other algorithms. Our algorithm is compared with two different forecasting methods widely used in common data analysis applications: Robust Linear (RL) and Holt-Winters (HW). The first one is very popular in time series forecasting because it is easy to understand and robust with respect to outliers and highly variable behaviors. Robust Linear, however, extracts a very basic model from the data, and so does not take the seasonal dynamics of the series into consideration. Holt-Winters, instead, is an exponential smoothing method which handles both trend and seasonality behaviors. The major drawback of this forecasting algorithm is the lack of an informed periodic analysis: the data is just smoothed and the seasonality of the last portion of the dataset is replicated over time, without knowing whether the periodic component is relevant and ignoring multiple seasonalities. Finally, both these methods lack the ARMA analysis, which is important in detecting the serial correlation in the time series.
- The performance of the algorithm is evaluated through three performance indices, defined below.
- Let ŷ(j) represent the prediction of sample y(j) and N the length of the portion of the series used to test the prediction. Firstly, to have a quantitative indicator of the accuracy of our prediction algorithm, we use the most basic performance index: the Root Mean Squared Error (RMSE), which computes the variation of the forecast with respect to the real data, defined as
-
RMSE=√((1/N) Σ_{j=1}^{N} (y(j)−ŷ(j))²)
- To obtain an absolute indicator of the goodness of the prediction, a Mean Absolute Percentage Error (MAPE) can be calculated; it is defined as follows:
-
MAPE=(100/N) Σ_{j=1}^{N} |y(j)−ŷ(j)|/|y(j)|
- Finally, the third index we use to evaluate the performance of our prediction algorithm computes a test on each predicted sample separately, calculating the deviation of the forecasted value from the real one. Error(j) is computed as follows:
-
Error(j)=0 if |y(j)−ŷ(j)|≤q(1−α)·σ, Error(j)=1 otherwise, j ∈ {1,2, . . . ,N}
- where σ is the standard deviation of the dataset y and q(1−α) is the quantile of a normal distribution. Error is therefore a vector with as many zeroes as the number of samples correctly predicted, with the chosen confidence of (1−α). To obtain an absolute indicator of the accuracy of the forecast, we compute the Error Ratio (ER):
-
ER=(1/N) Σ_{j=1}^{N} Error(j)
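- The three indices can be computed together as in the sketch below (illustrative names; the 0/1 form of Error(j) follows the description above):

```python
import numpy as np
from scipy.stats import norm

def performance_indices(y, y_hat, alpha=0.05):
    """Return RMSE, MAPE and ER for a validation segment of length N."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    N = len(y)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    mape = 100.0 * np.mean(np.abs(y - y_hat) / np.abs(y))
    q = norm.ppf(1.0 - alpha)                                # quantile of the normal distribution
    error = (np.abs(y - y_hat) > q * y.std()).astype(int)    # Error(j): 1 if mispredicted
    return rmse, mape, error.sum() / N                       # ER: share of mispredicted samples
```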
- We set the confidence level for the ER index to 95%. For each considered time series, the last third of the data vector is left out of the prediction and used to validate the computed forecast. For each subset, a hypothesis test for the difference between means is computed: we suppose that, for every comparison, the two sampled populations (with mean μ and standard deviation σ) are normally distributed. Considering two distributions with sample parameters μ1, μ2, σ1, σ2 and lengths n1, n2, we formulate the hypotheses:
-
H0: |μ1−μ2|=0 -
H1: |μ1−μ2|>0 - We consider μx=|μ1−μ2| and
-
σx=√(σ1²/n1+σ2²/n2)
- Then we calculate the z-score
-
z=μx/σx
and the significance threshold t(1−α), which is the quantile of the t-distribution with min{n1,n2}−1 degrees of freedom. If the z-score is greater than the threshold, then the two distributions are different and the null hypothesis is rejected. In Tables 1 and 2, results for different types of workload and performance data are displayed. We do not report the computational time taken by the execution of the algorithm, as it is almost always less than a few seconds, which is reasonable for the purposes of this study. The first column indicates the type of parameter shown (μ is the mean and σ is the standard deviation); columns 2 to 4 show the MAPE values for Box and Jenkins, Robust Linear and Holt-Winters, while columns 5 to 7 and 8 to 10 illustrate, for the three algorithms respectively, the RMSE and the ER. Every table shows the results grouped with respect to the type of metric that the series is monitoring. For each group, the mean and the standard deviation of the results vector are displayed, together with the output of the previously discussed test. In the “test” row, the symbol ‘+’ indicates that the null hypothesis has been rejected in favor of the automated Box and Jenkins algorithm, while the other symbol stands for the acceptance of the null hypothesis. The results of these tests attest to the accuracy of our algorithm in predicting IT time series, with respect to the considered performance indices. The μ value for MAPE is never over 40%, which in the literature is considered a reasonable threshold for the goodness of the forecast.
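- A sketch of this difference-between-means test, under the stated normality assumption (sample statistics and the t-quantile are computed exactly as described above):

```python
import numpy as np
from scipy.stats import t as t_dist

def means_differ(x1, x2, alpha=0.05):
    """Return True if the null hypothesis of equal means is rejected."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    mu_x = abs(x1.mean() - x2.mean())
    sigma_x = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
    z = mu_x / sigma_x
    threshold = t_dist.ppf(1.0 - alpha, df=min(len(x1), len(x2)) - 1)
    return z > threshold
```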
- Further, the μ value for the Error Ratio never surpasses 0.2, which means that the share of incorrectly predicted samples stays within a reasonable 20%. The results of the null hypothesis tests show that our algorithm is never statistically outperformed by its counterparts; in particular, it is significantly better in 43% of the considered subsets of data. Moreover, amongst all the types of data metrics, the algorithm performs best on Business Driver (Events) and Storage data. We present two visual examples of cross-validation which further support our algorithm.
-
FIG. 4 shows the cross-validation performed on a time series monitoring the number of events occurring in a web server on a daily basis. This example clearly illustrates the suitability of our method over the other algorithms for this type of time series. In general, datasets of this type show a clear lower bound, which is properly detected by the trampoline identifier, thereby avoiding infeasible situations due to a possible negative trend in the data. The instance illustrated in FIG. 4 shows a clear trampoline base represented by the value 0, as datasets monitoring events cannot have negative values (which Holt-Winters instead incorrectly predicts). Furthermore, our automated prediction method detects all basic seasonal components in the data, so that the forecast fits the periodic behavior of the series appropriately (unlike the Robust Linear algorithm, which is unaware of seasonalities). Our second example, shown in FIG. 2, represents the cross-validation test on an hourly sampled storage time series, representing the disk memory used by a machine. Analyzing the data closely shows that there is a double seasonality, arising from daily and weekly fluctuations of the memory occupation over the trend of the series. Capturing this characteristic behavior leads to a more informed prediction, which correctly follows the recurrence of local and global peaks in the data. For this specific time series, the improvement in prediction accuracy of this algorithm with respect to its counterparts is considerable: it reduces the MAPE by 35% and the RMSE by 26% against Robust Linear, and by 80% and 75% respectively against Holt-Winters. Further, none of the predicted values is considered incorrect (with confidence 95%) and accordingly the ER for the automated Box and Jenkins algorithm is 0.
Claims (2)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IT2010/000165 WO2011128922A1 (en) | 2010-04-15 | 2010-04-15 | Automated upgrading method for capacity of it system resources |
ITPCT/IT2010/000165 | 2010-04-15 | ||
WOPCT/IT2010/000165 | 2010-04-15 | ||
PCT/IB2011/051650 WO2012020329A1 (en) | 2010-04-15 | 2011-04-15 | Automated upgrading method for capacity of it system resources |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2011/051650 Continuation WO2012020329A1 (en) | 2010-04-15 | 2011-04-15 | Automated upgrading method for capacity of it system resources |
Publications (3)
Publication Number | Publication Date |
---|---|
US20130041644A1 US20130041644A1 (en) | 2013-02-14 |
US20160105327A9 true US20160105327A9 (en) | 2016-04-14 |
US9356846B2 US9356846B2 (en) | 2016-05-31 |
Family
ID=43414796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/650,827 Active 2031-12-18 US9356846B2 (en) | 2010-04-15 | 2012-10-12 | Automated upgrading method for capacity of IT system resources |
Country Status (3)
Country | Link |
---|---|
US (1) | US9356846B2 (en) |
EP (1) | EP2558938A1 (en) |
WO (2) | WO2011128922A1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170249649A1 (en) * | 2016-02-29 | 2017-08-31 | Oracle International Corporation | Systems and methods for trending patterns within time-series data |
US10331802B2 (en) | 2016-02-29 | 2019-06-25 | Oracle International Corporation | System for detecting and characterizing seasons |
US10635563B2 (en) | 2016-08-04 | 2020-04-28 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US10699211B2 (en) | 2016-02-29 | 2020-06-30 | Oracle International Corporation | Supervised method for classifying seasonal patterns |
US10817803B2 (en) | 2017-06-02 | 2020-10-27 | Oracle International Corporation | Data driven methods and systems for what if analysis |
US10855548B2 (en) * | 2019-02-15 | 2020-12-01 | Oracle International Corporation | Systems and methods for automatically detecting, summarizing, and responding to anomalies |
US10885461B2 (en) | 2016-02-29 | 2021-01-05 | Oracle International Corporation | Unsupervised method for classifying seasonal patterns |
EP3767467A1 (en) * | 2019-07-15 | 2021-01-20 | Bull SAS | Method and device for determining a performance index value for prediction of anomalies in a computer infrastructure from performance indicator values |
US10915830B2 (en) | 2017-02-24 | 2021-02-09 | Oracle International Corporation | Multiscale method for predictive alerting |
US10949436B2 (en) | 2017-02-24 | 2021-03-16 | Oracle International Corporation | Optimization for scalable analytics using time series models |
US10963346B2 (en) | 2018-06-05 | 2021-03-30 | Oracle International Corporation | Scalable methods and systems for approximating statistical distributions |
US10970186B2 (en) | 2016-05-16 | 2021-04-06 | Oracle International Corporation | Correlation-based analytic for time-series data |
US10997517B2 (en) | 2018-06-05 | 2021-05-04 | Oracle International Corporation | Methods and systems for aggregating distribution approximations |
US11082439B2 (en) | 2016-08-04 | 2021-08-03 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US11138090B2 (en) | 2018-10-23 | 2021-10-05 | Oracle International Corporation | Systems and methods for forecasting time series with variable seasonality |
US11263566B2 (en) * | 2016-06-20 | 2022-03-01 | Oracle International Corporation | Seasonality validation and determination of patterns |
US20220343099A1 (en) * | 2021-04-21 | 2022-10-27 | Tata Consultancy Services Limited | Method and system for identification of agro-phenological zones and updation of agro-phenological zones |
US11533326B2 (en) | 2019-05-01 | 2022-12-20 | Oracle International Corporation | Systems and methods for multivariate anomaly detection in software monitoring |
US11537940B2 (en) | 2019-05-13 | 2022-12-27 | Oracle International Corporation | Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests |
US11836526B1 (en) * | 2017-09-15 | 2023-12-05 | Splunk Inc. | Processing data streams received from instrumented software using incremental finite window double exponential smoothing |
US11887015B2 (en) | 2019-09-13 | 2024-01-30 | Oracle International Corporation | Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems |
US11900282B2 (en) * | 2020-01-21 | 2024-02-13 | Hcl Technologies Limited | Building time series based prediction / forecast model for a telecommunication network |
US12001926B2 (en) | 2018-10-23 | 2024-06-04 | Oracle International Corporation | Systems and methods for detecting long term seasons |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140222997A1 (en) * | 2013-02-05 | 2014-08-07 | Cisco Technology, Inc. | Hidden markov model based architecture to monitor network node activities and predict relevant periods |
CN103218516A (en) * | 2013-03-22 | 2013-07-24 | 南京航空航天大学 | Clustered and retrogressed single-step predication method of airport noises |
US9218570B2 (en) * | 2013-05-29 | 2015-12-22 | International Business Machines Corporation | Determining an anomalous state of a system at a future point in time |
US9652354B2 (en) * | 2014-03-18 | 2017-05-16 | Microsoft Technology Licensing, Llc. | Unsupervised anomaly detection for arbitrary time series |
JP2015184818A (en) * | 2014-03-20 | 2015-10-22 | 株式会社東芝 | Server, model application propriety determination method and computer program |
US9996444B2 (en) * | 2014-06-25 | 2018-06-12 | Vmware, Inc. | Automated methods and systems for calculating hard thresholds |
CN104200099A (en) * | 2014-09-01 | 2014-12-10 | 山东科技大学 | Mine water inflow calculating method based on hydrogeological account |
WO2017058045A1 (en) * | 2015-09-29 | 2017-04-06 | Emc Corporation | Dynamic storage tiering based on predicted workloads |
US10572836B2 (en) * | 2015-10-15 | 2020-02-25 | International Business Machines Corporation | Automatic time interval metadata determination for business intelligence and predictive analytics |
US10417111B2 (en) | 2016-05-09 | 2019-09-17 | Oracle International Corporation | Correlation of stack segment intensity in emergent relationships |
US11016730B2 (en) | 2016-07-28 | 2021-05-25 | International Business Machines Corporation | Transforming a transactional data set to generate forecasting and prediction insights |
US9582781B1 (en) | 2016-09-01 | 2017-02-28 | PagerDuty, Inc. | Real-time adaptive operations performance management system using event clusters and trained models |
US10515323B2 (en) * | 2016-09-12 | 2019-12-24 | PagerDuty, Inc. | Operations command console |
US10411969B2 (en) * | 2016-10-03 | 2019-09-10 | Microsoft Technology Licensing, Llc | Backend resource costs for online service offerings |
US10061677B2 (en) * | 2016-11-16 | 2018-08-28 | Anodot Ltd. | Fast automated detection of seasonal patterns in time series data without prior knowledge of seasonal periodicity |
CN107018039B (en) * | 2016-12-16 | 2020-04-14 | 阿里巴巴集团控股有限公司 | Method and device for testing performance bottleneck of server cluster |
JP6714536B2 (en) * | 2017-04-24 | 2020-06-24 | 日本電信電話株式会社 | Time series data display system, time series data display method and program |
CN107122299A (en) * | 2017-04-25 | 2017-09-01 | 丹露成都网络技术有限公司 | A kind of automated software code detection method |
CN107391213A (en) * | 2017-08-29 | 2017-11-24 | 郑州云海信息技术有限公司 | A kind of method, apparatus of memory apparatus system upgrading and a kind of upgrade-system |
US10635565B2 (en) | 2017-10-04 | 2020-04-28 | Servicenow, Inc. | Systems and methods for robust anomaly detection |
EP3477559A1 (en) * | 2017-10-31 | 2019-05-01 | Tata Consultancy Services Limited | Method and system for multi-core processing based time series management with pattern detection based forecasting |
US11442958B2 (en) | 2019-04-25 | 2022-09-13 | EMC IP Holding Company LLC | Data distribution in continuous replication systems |
US10949116B2 (en) | 2019-07-30 | 2021-03-16 | EMC IP Holding Company LLC | Storage resource capacity prediction utilizing a plurality of time series forecasting models |
CN111045907B (en) * | 2019-12-12 | 2020-10-09 | 苏州博纳讯动软件有限公司 | System capacity prediction method based on traffic |
US11973779B2 (en) * | 2021-05-11 | 2024-04-30 | Bank Of America Corporation | Detecting data exfiltration and compromised user accounts in a computing network |
CN113238714A (en) * | 2021-05-28 | 2021-08-10 | 广东好太太智能家居有限公司 | Disk capacity prediction method and system based on historical monitoring data and storage medium |
US11983721B2 (en) * | 2021-10-29 | 2024-05-14 | Paypal, Inc. | Computer software architecture for execution efficiency |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2471013C (en) * | 2001-12-19 | 2011-07-26 | David Helsper | Method and system for analyzing and predicting the behavior of systems |
AU2003202356A1 (en) * | 2002-02-07 | 2003-09-02 | Thinkdynamics Inc. | Method and system for managing resources in a data center |
US20050038729A1 (en) * | 2003-08-13 | 2005-02-17 | Gofaser Technology Company | Method and system for monitoring volume information in stock market |
EP1739558A1 (en) * | 2005-06-29 | 2007-01-03 | International Business Machines Corporation | Method, computer program and device to automatically predict performance shortages of databases |
US20080033991A1 (en) * | 2006-08-03 | 2008-02-07 | Jayanta Basak | Prediction of future performance of a dbms |
-
2010
- 2010-04-15 WO PCT/IT2010/000165 patent/WO2011128922A1/en active Application Filing
-
2011
- 2011-04-15 WO PCT/IB2011/051650 patent/WO2012020329A1/en active Application Filing
- 2011-04-15 EP EP11722549A patent/EP2558938A1/en not_active Ceased
-
2012
- 2012-10-12 US US13/650,827 patent/US9356846B2/en active Active
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11080906B2 (en) | 2016-02-29 | 2021-08-03 | Oracle International Corporation | Method for creating period profile for time-series data with recurrent patterns |
US11670020B2 (en) | 2016-02-29 | 2023-06-06 | Oracle International Corporation | Seasonal aware method for forecasting and capacity planning |
US10970891B2 (en) | 2016-02-29 | 2021-04-06 | Oracle International Corporation | Systems and methods for detecting and accommodating state changes in modelling |
US11928760B2 (en) | 2016-02-29 | 2024-03-12 | Oracle International Corporation | Systems and methods for detecting and accommodating state changes in modelling |
US10692255B2 (en) | 2016-02-29 | 2020-06-23 | Oracle International Corporation | Method for creating period profile for time-series data with recurrent patterns |
US10699211B2 (en) | 2016-02-29 | 2020-06-30 | Oracle International Corporation | Supervised method for classifying seasonal patterns |
US11836162B2 (en) | 2016-02-29 | 2023-12-05 | Oracle International Corporation | Unsupervised method for classifying seasonal patterns |
US10331802B2 (en) | 2016-02-29 | 2019-06-25 | Oracle International Corporation | System for detecting and characterizing seasons |
US10867421B2 (en) * | 2016-02-29 | 2020-12-15 | Oracle International Corporation | Seasonal aware method for forecasting and capacity planning |
US10885461B2 (en) | 2016-02-29 | 2021-01-05 | Oracle International Corporation | Unsupervised method for classifying seasonal patterns |
US20170249649A1 (en) * | 2016-02-29 | 2017-08-31 | Oracle International Corporation | Systems and methods for trending patterns within time-series data |
US11232133B2 (en) | 2016-02-29 | 2022-01-25 | Oracle International Corporation | System for detecting and characterizing seasons |
US11113852B2 (en) * | 2016-02-29 | 2021-09-07 | Oracle International Corporation | Systems and methods for trending patterns within time-series data |
US10127695B2 (en) | 2016-02-29 | 2018-11-13 | Oracle International Corporation | Method for creating period profile for time-series data with recurrent patterns |
US10970186B2 (en) | 2016-05-16 | 2021-04-06 | Oracle International Corporation | Correlation-based analytic for time-series data |
US11263566B2 (en) * | 2016-06-20 | 2022-03-01 | Oracle International Corporation | Seasonality validation and determination of patterns |
US10635563B2 (en) | 2016-08-04 | 2020-04-28 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US11082439B2 (en) | 2016-08-04 | 2021-08-03 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US10949436B2 (en) | 2017-02-24 | 2021-03-16 | Oracle International Corporation | Optimization for scalable analytics using time series models |
US10915830B2 (en) | 2017-02-24 | 2021-02-09 | Oracle International Corporation | Multiscale method for predictive alerting |
US10817803B2 (en) | 2017-06-02 | 2020-10-27 | Oracle International Corporation | Data driven methods and systems for what if analysis |
US11836526B1 (en) * | 2017-09-15 | 2023-12-05 | Splunk Inc. | Processing data streams received from instrumented software using incremental finite window double exponential smoothing |
US10963346B2 (en) | 2018-06-05 | 2021-03-30 | Oracle International Corporation | Scalable methods and systems for approximating statistical distributions |
US10997517B2 (en) | 2018-06-05 | 2021-05-04 | Oracle International Corporation | Methods and systems for aggregating distribution approximations |
US11138090B2 (en) | 2018-10-23 | 2021-10-05 | Oracle International Corporation | Systems and methods for forecasting time series with variable seasonality |
US12001926B2 (en) | 2018-10-23 | 2024-06-04 | Oracle International Corporation | Systems and methods for detecting long term seasons |
US10855548B2 (en) * | 2019-02-15 | 2020-12-01 | Oracle International Corporation | Systems and methods for automatically detecting, summarizing, and responding to anomalies |
US11533326B2 (en) | 2019-05-01 | 2022-12-20 | Oracle International Corporation | Systems and methods for multivariate anomaly detection in software monitoring |
US11949703B2 (en) | 2019-05-01 | 2024-04-02 | Oracle International Corporation | Systems and methods for multivariate anomaly detection in software monitoring |
US11537940B2 (en) | 2019-05-13 | 2022-12-27 | Oracle International Corporation | Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests |
US11403164B2 (en) | 2019-07-15 | 2022-08-02 | Bull Sas | Method and device for determining a performance indicator value for predicting anomalies in a computing infrastructure from values of performance indicators |
EP3767467A1 (en) * | 2019-07-15 | 2021-01-20 | Bull SAS | Method and device for determining a performance index value for prediction of anomalies in a computer infrastructure from performance indicator values |
FR3098938A1 (en) * | 2019-07-15 | 2021-01-22 | Bull Sas | Method and device for determining an anomaly prediction performance index value in an IT infrastructure from performance indicator values |
US11887015B2 (en) | 2019-09-13 | 2024-01-30 | Oracle International Corporation | Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems |
US11900282B2 (en) * | 2020-01-21 | 2024-02-13 | Hcl Technologies Limited | Building time series based prediction / forecast model for a telecommunication network |
US20220343099A1 (en) * | 2021-04-21 | 2022-10-27 | Tata Consultancy Services Limited | Method and system for identification of agro-phenological zones and updation of agro-phenological zones |
Also Published As
Publication number | Publication date |
---|---|
US9356846B2 (en) | 2016-05-31 |
WO2012020329A1 (en) | 2012-02-16 |
US20130041644A1 (en) | 2013-02-14 |
EP2558938A1 (en) | 2013-02-20 |
WO2011128922A1 (en) | 2011-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9356846B2 (en) | Automated upgrading method for capacity of IT system resources | |
US9280436B2 (en) | Modeling a computing entity | |
US7039559B2 (en) | Methods and apparatus for performing adaptive and robust prediction | |
US7865389B2 (en) | Analyzing time series data that exhibits seasonal effects | |
US7251589B1 (en) | Computer-implemented system and method for generating forecasts | |
US8712950B2 (en) | Resource capacity monitoring and reporting | |
US8150861B2 (en) | Technique for implementing database queries for data streams using a curved fitting based approach | |
US20070083650A1 (en) | Prediction of service level compliance in it infrastructures | |
US7725575B2 (en) | Unexpected demand detection system and unexpected demand detection program | |
US20080221974A1 (en) | Lazy Evaluation of Bulk Forecasts | |
US10817046B2 (en) | Power saving through automated power scheduling of virtual machines | |
US20070250630A1 (en) | Method and a system of generating and evaluating potential resource allocations for an application | |
CA2772866A1 (en) | Benefit-based earned value management system | |
US8887161B2 (en) | System and method for estimating combined workloads of systems with uncorrelated and non-deterministic workload patterns | |
US8781869B2 (en) | Determining estimation variance associated with project planning | |
US10225337B2 (en) | Modeling and forecasting reserve capacity for overbooked clusters | |
US20090018813A1 (en) | Using quantitative models for predictive sla management | |
JP5118438B2 (en) | Improvement of computer network | |
CN116107854A (en) | Method, system, equipment and medium for predicting operation maintenance index of computer | |
US11556451B2 (en) | Method for analyzing the resource consumption of a computing infrastructure, alert and sizing | |
CN112418534B (en) | Method and device for predicting quantity of collected parts, electronic equipment and computer readable storage medium | |
CN103455858B (en) | Service-oriented system quality dynamic early-warning method | |
Wombacher et al. | Estimating the Processing Time of Process Instances in Semi-structured Processes--A Case Study | |
Czarnul et al. | BeesyBees—agent-based, adaptive & learning workflow execution module for BeesyCluster | |
Wickboldt et al. | Computer-generated comprehensive risk assessment for IT project management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: CAPLAN SOFTWARE DEVELOPMENT S.R.L., ITALY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CREMONESI, PAOLO;DHYANI, KANIKA;VISCONTI, STEFANO;SIGNING DATES FROM 20130905 TO 20140130;REEL/FRAME:032295/0309 |
|
AS | Assignment |
Owner name: BMC SOFTWARE, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CAPLAN SOFTWARE DEVELOPMENT S.R.L.;REEL/FRAME:038332/0475 Effective date: 20151210 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:043514/0845 Effective date: 20170727 Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLAT Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:043514/0845 Effective date: 20170727 |
|
AS | Assignment |
Owner name: CREDIT SUISSE, AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:047185/0744 Effective date: 20181002 Owner name: CREDIT SUISSE, AG, CAYMAN ISLANDS BRANCH, AS COLLA Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:047185/0744 Effective date: 20181002 |
|
AS | Assignment |
Owner name: BMC ACQUISITION L.L.C., TEXAS Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468 Effective date: 20181002 Owner name: BMC SOFTWARE, INC., TEXAS Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468 Effective date: 20181002 Owner name: BLADELOGIC, INC., TEXAS Free format text: RELEASE OF PATENTS;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:047198/0468 Effective date: 20181002 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:052844/0646 Effective date: 20200601 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:052854/0139 Effective date: 20200601 |
|
AS | Assignment |
Owner name: ALTER DOMUS (US) LLC, ILLINOIS Free format text: GRANT OF SECOND LIEN SECURITY INTEREST IN PATENT RIGHTS;ASSIGNORS:BMC SOFTWARE, INC.;BLADELOGIC, INC.;REEL/FRAME:057683/0582 Effective date: 20210930 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
AS | Assignment |
Owner name: BLADELOGIC, INC., TEXAS Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:ALTER DOMUS (US) LLC;REEL/FRAME:066567/0283 Effective date: 20240131 Owner name: BMC SOFTWARE, INC., TEXAS Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:ALTER DOMUS (US) LLC;REEL/FRAME:066567/0283 Effective date: 20240131 |
|
AS | Assignment |
Owner name: GOLDMAN SACHS BANK USA, AS SUCCESSOR COLLATERAL AGENT, NEW YORK Free format text: OMNIBUS ASSIGNMENT OF SECURITY INTERESTS IN PATENT COLLATERAL;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS RESIGNING COLLATERAL AGENT;REEL/FRAME:066729/0889 Effective date: 20240229 |
|
AS | Assignment |
Owner name: BLADELOGIC, INC., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (052854/0139);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:068339/0617 Effective date: 20240731 Owner name: BMC SOFTWARE, INC., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (052854/0139);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:068339/0617 Effective date: 20240731 Owner name: BLADELOGIC, INC., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (052844/0646);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:068339/0408 Effective date: 20240731 Owner name: BMC SOFTWARE, INC., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (052844/0646);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:068339/0408 Effective date: 20240731 |