WO2017164779A1 - Method and module for managing samples relating to performance of a first service hosted by a data center - Google Patents

Method and module for managing samples relating to performance of a first service hosted by a data center

Info

Publication number
WO2017164779A1
Authority
WO
WIPO (PCT)
Prior art keywords
configuration
service
data center
sample
performance
Prior art date
Application number
PCT/SE2016/050232
Other languages
French (fr)
Inventor
Yue Lu
Jawwad AHMED
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/SE2016/050232 priority Critical patent/WO2017164779A1/en
Publication of WO2017164779A1 publication Critical patent/WO2017164779A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/65Updates

Definitions

  • Embodiments herein relate to data centers, such as physical or virtual data centers, computer networks or the like.
  • a Sample Managing Module and a method therein for managing samples relating to performance of a first service hosted by a data center are disclosed.
  • a corresponding computer program and a carrier therefor are also disclosed.
  • a data center may host a plurality of services.
  • the plurality of services may be provided by one or more software applications.
  • a particular type of services is referred to as Distributed and Dependent Services (DDS).
  • the DDS may be executed in a single or multiple data centers, such as a single or multiple clouds. DDS are dependent on each other due to so called connecting entities, e.g. queues.
  • a few typical examples include distributed Video on Demand (VoD) services, distributed tenant services, services with Service Level Agreements (SLAs) executed to control networked industrial robots, connected cars and heavy vehicles.
  • the update of the data center can refer to updates of the DDS executing on the data center or to updates of a hardware infrastructure on which the data center relies, or to both.
  • At least some known solutions for change impact analysis apply statistical methods and probability theory in a conventional way, e.g. by applying a comparison using simple descriptive statistics or statistical hypothesis test, to raw data relating to an original system and a changed system.
  • the raw data can be service completion time and/or energy consumption of the data center.
  • An object may thus be to enable improved change impact analysis relating to performance of services hosted by a data center.
  • According to an aspect, the object is achieved by a method, performed by a Sample Managing Module, for managing samples relating to performance of a first service hosted by a data center.
  • the data center is operable in a first configuration and a second configuration.
  • the data center when operated in the first configuration, hosts a second service.
  • the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services.
  • the data center when operated in the second configuration, hosts a third service.
  • the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services.
  • the Sample Managing Module creates, for operation of the data center in each of the first and second configurations, a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center is operated in the first and second configurations, respectively.
  • a Sample Managing Module configured for managing samples relating to performance of a first service hosted by a data center.
  • the data center is operable in a first configuration and a second configuration.
  • the data center when operated in the first configuration, hosts a second service.
  • the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services.
  • the data center when operated in the second configuration, hosts a third service.
  • the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services.
  • the Sample Managing Module is configured for, for operation of the data center in each of the first and second configurations: creating a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center is operated in the first and second configurations, respectively.
  • the object is achieved by a computer program and a carrier therefor corresponding to the aspects above.
  • Since the Sample Managing Module generates the respective statistical measure based on the respective samples for said respective number of observation periods, any dependency between samples from one particular observation period is removed. In this manner, the respective samples relating to the performance are collected from different observation periods. Since the dependency between the samples is avoided, the created respective set of statistical measures is also independent and identically distributed.
  • the respective samples relating to the performance may be an execution time of the first service and/or a power consumption relating to execution of the first service.
  • improvement of change impact analysis pertaining to execution time and/or power consumption may be achieved according to some embodiments.
  • Figure 1 is a schematic overview of an exemplifying data center in which embodiments herein may be implemented
  • Figure 2 is a flowchart illustrating embodiments of the method performed by the Sample Managing Module
  • Figure 3 is another flowchart illustrating a particular embodiment of the method performed by the Sample Managing Module
  • Figure 4 is a block diagram illustrating embodiments of the Sample Managing Module.
  • an analysis dataset is Independent and Identically Distributed (IID).
  • the analysis dataset refers to a dataset to be analyzed using change impact analysis.
  • the known solution applies statistical methods and probability theory in a conventional way, e.g. using a statistical hypothesis test and a Markovian model.
  • Hence, applying the known solution, with its statistical methods and probability theory used in the conventional way, to the analysis dataset is theoretically invalid. As a result, the known solution may achieve inaccurate results in many cases.
  • the known solution assumes that the analysis dataset is IID independently of the chosen statistical model and/or probability theory. Moreover, the known solution assumes that a Markovian model describes performance change of the observed service when the data center is updated. Thus, the analysis dataset is again assumed to be IID. Moreover, use of the Markovian model is computationally expensive, due to e.g. convolution of probability distributions, as compared to the statistical hypothesis test. Disadvantageously, the use of the Markovian model increases requirements concerning computational capacity required to execute the known solution. Markovian models are described in e.g. "Applied Probability and Queues", 2nd ed., by Asmussen, S., published in 2003 by Springer, New York.
  • the parametric hypothesis tests make assumptions about an underlying distribution of a population from which a sample is drawn, and which is investigated. Typically, the population is assumed to conform to a normal distribution.
  • the parametric hypothesis tests include ANalysis Of VAriance (ANOVA) tests, Chi-Square Test, Contingency Tables, Fisher-test (F-test), Student's-test (t-test), z-test and the like.
  • Non-parametric tests, or distribution-free tests make no assumption about the underlying distribution of the population.
  • the non-parametric tests can be applied for both non-normally distributed populations, e.g. unknown distribution of population, and normally distributed populations, e.g. known distribution of population.
  • the non-parametric tests are more accurate for non-normally distributed populations than for normally distributed populations.
  • non-parametric tests do not require the analysis dataset to be large, e.g. hundreds of samples. Instead, with non-parametric tests, the analysis dataset may comprise less than a hundred samples, e.g. tens of samples or even less.
  • the non-parametric tests include two-sample Kolmogorov-Smirnov (KS)-test, Mann- Whitney U-Test, Anderson-Darling-test (AD-test), sign scores test and the like.
  • The Mann-Whitney U-Test, also known as the Wilcoxon Rank-Sum test, will be referred to as the WRS-test.
  • FIG. 1 depicts an exemplifying data center 100 in which embodiments herein may be implemented.
  • the data center 100 is a so called Virtual Data Center (VDC).
  • the data center 100 may be a cloud computer system, a data system, a cloud data system, a computer network, a hardware system, a computer system platform, a hardware platform, a disaggregated hardware system, a server system, a cloud server system, one or more cloud environments or the like.
  • the data center 100 hosts one or more distributed and dependent services, such as a first service 101 and at least one other service.
  • the first service 101 is dependent on at least one other service.
  • the data center hosts, as mentioned, distributed and dependent services.
  • a second service 102 and a third service 103 are illustrated with dashed-lines, since they may not always be hosted by the data center 100 at the same time as explained in more detail below.
  • the first, second and third services 101, 102, 103 may be different instances of one application, or they may be instances of respective applications, or a combination thereof.
  • One of the distributed and dependent services, hosted by the data center 100, may be dependent on another one of the DDS in that a resource 104 is operated by at least a pair of distributed and dependent services.
  • When the resource 104 is operated, it may mean that load on the resource 104, busyness of the resource 104, or contents of the resource 104 is altered.
  • the resource 104 may be a queue, whose length is changed when distributed and dependent services operate on it.
  • the resource 104 may sometimes be referred to as 'connecting entity'.
  • the queue may comprise a set of messages, which are processed by the pair of distributed and dependent services. When a message is processed, it will be removed from the queue.
  • a first type of DDS retrieve, or consume, information and/or functions provided by the resource 104.
  • a second type of DDS push, or produce, information and/or functions provided by the resource 104.
  • a certain DDS may be of both the first type and the second type. This means that the term "operated” may mean that the resource 104 is shared 120, 121, 122 between DDS, i.e. operated means one or both of "consume” and "produce”.
  • the resource 104 may be a software application that provides a few components, or functions, that are shared between the DDS.
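As an illustration only (the class and method names below are assumptions for the sketch, not taken from the disclosure), the shared resource 104 may be modelled as a message queue that producer-type and consumer-type DDS operate on:

```python
from collections import deque

class SharedQueue:
    """Illustrative sketch of the resource 104: a queue of messages
    shared between distributed and dependent services."""

    def __init__(self):
        self._messages = deque()

    def produce(self, message):
        # A producer-type DDS pushes a message onto the resource.
        self._messages.append(message)

    def consume(self):
        # A consumer-type DDS takes a message; processing removes it
        # from the queue, changing the length other services observe.
        return self._messages.popleft()

    def __len__(self):
        return len(self._messages)

resource = SharedQueue()
resource.produce("job-1")   # e.g. the second service produces
resource.produce("job-2")
resource.consume()          # e.g. the first service consumes "job-1"
# len(resource) is now 1: the first service's behaviour depends on how
# many messages the other services have left in the shared queue.
```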
  • Figure 1 illustrates a Sample Managing Module (SMM) 110
  • the data center 100 is operable in a first configuration and a second configuration.
  • the data center 100 hosts the second service 102, i.e. the above mentioned at least one other service.
  • the first service is dependent on the second service in that the performance of the first service depends on the resource 104 operated by the first and second services 101, 102.
  • the data center 100 hosts the third service 103 in addition to the first service 101.
  • the data center 100 hosts only the first and third services 101, 103, i.e. the data center 100 does not host the second service 102.
  • the first service is dependent on the third service in that the performance of the first service depends on the resource, but consequently the first service is no longer dependent on the second service.
  • the resource is thus operated by only the first and third services according to the first example of the second configuration.
  • the data center 100 hosts the first and third services 101, 103, in addition to the second service 102.
  • the first service is dependent on the third service in that the performance of the first service depends on the resource, and - as in the first configuration - the first service is still dependent on the second service.
  • the resource is thus operated by the first, second and third services according to the second example of the second configuration.
  • the replacement may be realized by a software update of the second service, where the updated version of the second service is referred to as the third service.
  • the replacement may be realized by removing the second service, i.e. shutting down the second service, and implementing the third service, i.e. starting up the third service, wherein the second and third services are completely different, in that they provide different services, i.e. perform different tasks.
  • the addition of the third service 103 is foreseen when the first configuration is considered to be an original configuration, i.e. before an update is performed, and the second configuration is considered to be an updated configuration, i.e. after the update has been performed.
  • the removal of the third service 103 is foreseen when the first configuration is considered to be the updated configuration and the second configuration is considered to be the original configuration.
  • the embodiments herein provide a way of obtaining, or creating, a dataset that is IID.
  • an IID-dataset is given by a respective set of statistical measures for operation of the data center 100 in each of the first and second configurations.
  • This part may be seen as a pre-processing, before performing an actual change impact analysis, e.g. a data processing.
  • the data processing may be performed as described according to some embodiments herein or according to known manners.
  • the pre-processing is described in actions A010 to A070.
  • the change impact analysis is about identifying the potential consequences of a change, or estimating what needs to be modified to accomplish a change.
  • With the data processing, it is aimed to perform a change impact analysis of performance of DDS, in terms of identifying a statistically significant difference as a consequence of a change, or estimating a result of a change statistically, regarding e.g. execution time of a service or energy consumption relating to a service. Execution time of the service may be a time period for completing the service.
  • the samples relating to performance are dependent on each other.
  • the number of messages may be changed by other services periodically/sporadically/aperiodically.
  • the samples relating to performance may be execution time of the observed service and/or energy consumption relating to the observed service.
  • the service completion time for the observed service is calculated as h * t, in which h is the number of messages stored in the queue, e.g. the resource 104, and t is an execution time of either a message processing operation or another type of execution operation depending on a message taken from the queue.
  • the service completion time of the observed service does change.
  • the measured service completion times are no longer independent of each other.
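A minimal sketch of this dependence, with assumed numbers (the queue lengths, per-message time and function names are illustrative only, not values from the disclosure):

```python
def completion_time(h: int, t: float) -> float:
    """Service completion time as described in the text: h messages
    stored in the queue, each taking t time units to process."""
    return h * t

# Other services alter the queue periodically/sporadically, so the
# queue length h evolves over one observation period; consecutive
# completion-time samples are tied to the same evolving queue state
# and are therefore not independent of each other.
queue_lengths = [10, 12, 11, 15, 14]   # h over one observation period
t = 0.5                                 # assumed time per message
samples = [completion_time(h, t) for h in queue_lengths]
# samples -> [5.0, 6.0, 5.5, 7.5, 7.0]
```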
  • samples within the observation period may be non-normally distributed, e.g. distributed in a multimodal manner.
  • an assumption regarding the underlying population from which samples are drawn, i.e. assuming normality, often does not hold.
  • the Markovian model is inappropriate.
  • the data processing may thus need to be adapted accordingly as described in more detail below.
  • In Figure 2, a schematic flowchart of exemplifying methods in the SMM 110 is shown. Accordingly, the SMM 110 performs a method for managing samples relating to performance of the first service 101 hosted by the data center 100.
  • the data center 100 is operable in a first configuration and a second configuration.
  • the data center 100 when operated in the first configuration, hosts a second service.
  • the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services.
  • the data center 100 when operated in the second configuration, hosts a third service.
  • the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services.
  • the pre-processing may be performed by one or more of actions A010 to A070.
  • action A010 is performed.
  • the SMM 110 creates a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures.
  • the respective set of statistical measures is obtained by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods. This means that the respective samples are collected from each of said respective number of observation periods.
  • Said respective samples relate to the performance of the first service when the data center 100 is operated in the first and second configurations, respectively.
  • the respective samples may comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center 100 in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center 100 in the second configuration.
  • a sample among the respective samples may relate to one or more metrics relating to the performance of the first service.
  • This means that the sample for one observation period among the respective number of observation periods may include one or more metrics, such as execution time, energy consumption or the like as disclosed herein.
  • the respective set of statistical measures may comprise a first set of statistical measures, when operating the data center 100 in the first configuration, and a second set of statistical measures, when operating the data center 100 in the second configuration.
  • the respective number of observation periods may comprise a first number of observation periods when operating the data center 100 in the first configuration and a second number of observation periods when operating the data center 100 in the second configuration.
  • the respective statistical measure may comprise a respective first configuration statistical measure, when operating the data center 100 in the first configuration, and a respective second configuration statistical measure, when operating the data center 100 in the second configuration.
  • Action A010, i.e. the creating of the respective set of statistical measures, may comprise, when operating the data center 100 in the first configuration, actions A020, A030 and A040.
  • In action A020, the SMM 110 may obtain the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods.
  • In action A030, the SMM 110 may generate, for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample.
  • In action A040, the SMM 110 may form the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures.
  • Action A010, i.e. the creating of the respective set of statistical measures, may comprise, when operating the data center 100 in the second configuration, actions A050, A060 and A070.
  • In action A050, the SMM 110 may obtain the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods.
  • In action A060, the SMM 110 may generate, for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample.
  • In action A070, the SMM 110 may form the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
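As an illustrative sketch only (function and variable names are assumptions, not from the disclosure), actions A020 to A070 amount to collapsing the mutually dependent raw samples of each observation period into a single statistical measure, one per period and per configuration:

```python
import statistics

def create_measure_set(observation_periods, measure=statistics.median):
    """Pre-processing sketch (action A010): generate one statistical
    measure per observation period, so the resulting set contains one
    value per period instead of the dependent raw samples."""
    return [measure(samples) for samples in observation_periods]

# Toy raw samples per observation period for one configuration.
first_config_periods = [
    [5.0, 6.0, 5.5],   # observation period 1
    [7.5, 7.0, 7.2],   # observation period 2
    [6.1, 6.3, 6.0],   # observation period 3
]
first_set = create_measure_set(first_config_periods)
# -> [5.5, 7.2, 6.1]: one median per period; the values come from
# different observation periods, removing the within-period dependency.
```

The second configuration's set is formed the same way, and the two sets then feed the hypothesis tests of the data processing.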
  • the data processing is described in actions A080 to A110.
  • the data processing may be split into three parts, i.e. a data analytics part as in actions A080 and A090, a weighted sum model part as in action A100 and a conclusion part as in action A110.
  • known parametric hypothesis tests may be applied as an alternative to the data processing described below.
  • the SMM 110 may apply at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures.
  • the three non-parametric hypothesis tests may comprise:
  • a first test managing detection of a vertical difference between the first and second sets of statistical measures
  • a second test managing a median difference between the first and second sets of statistical measures
  • a third test managing a tail difference between the first and second sets of statistical measures.
  • the first test may be a two-sample Kolmogorov-Smirnov test, i.e. the two-sample KS test is applied to detect the vertical difference between two populations.
  • the second test may be a WRS-test, applied to test the median difference between two populations.
  • the third test may be an Anderson-Darling-test, applied to test the tail difference between two populations. Note that a common assumption of such non-parametric statistics, i.e. IID-datasets, has been achieved by the proposed pre-processing as described above.
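For illustration, the "vertical difference" detected by the first test is the largest vertical distance between the two empirical distribution functions. Below is a minimal pure-Python sketch of the two-sample KS statistic only (the WRS and AD tests, and the p-value computation, are omitted; all names and numbers are assumptions):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical
    distance between the empirical distribution functions of a and b."""
    xs = sorted(set(a) | set(b))

    def ecdf(sample, x):
        # Empirical distribution function: fraction of values <= x.
        return sum(1 for v in sample if v <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in xs)

# Toy per-period statistical measures for the two configurations.
original = [5.5, 7.2, 6.1, 5.9, 6.4]
changed  = [8.0, 9.1, 8.6, 8.3, 9.4]   # clearly shifted population
d = ks_statistic(original, changed)
# d is 1.0 here: the samples do not overlap, so at some point one ECDF
# has reached 1.0 while the other is still 0.0.
```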
  • the first hypothesis such as a null hypothesis, may assume that the difference is statistically insignificant
  • the second hypothesis such as an alternative hypothesis, may assume that the difference is statistically significant
  • the above mentioned three non-parametric hypothesis tests will be applied to verify whether the two datasets, i.e. the respective first configuration statistical measure and the respective second configuration statistical measure, are from a same underlying population or not, with the first and second hypotheses as follows:
  • Ha: From a viewpoint of completion time and/or energy consumption of services of interest, the difference caused by a change is statistically significant, by using a specific non-parametric statistical hypothesis test.
  • Each of said three non-parametric hypothesis tests may be applied using a respective significance level or a common significance level.
  • the common significance level is applied for all three non-parametric hypothesis tests, e.g. 0.05, which is a common value.
  • If different significance levels are adopted for different hypothesis tests, then complexity will increase, since a confidence level of the total score has to be recalculated/unified. For example, Score A given by the two-sample KS test at significance level 0.05 times weight A, Score B given by the WRS test at 0.01 times weight B, and Score C obtained by the AD test at 0.32 times weight C cannot be added together directly, i.e. without compensating for the difference in significance level.
  • the SMM 110 may, for each of said three non-parametric hypothesis tests, compute a respective score.
  • the respective score may be set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis, and the respective score may be set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores may comprise the respective score for each of said three non-parametric hypothesis tests.
  • the SMM 110 may calculate a total score as a weighted sum of the set of scores. In this manner, a combined joint effort of the three non-parametric hypothesis tests is achieved.
  • weights of the weighted sum may be equal to 1 divided by a number of non-parametric hypothesis tests, i.e. 3 in case of three non-parametric hypothesis tests. In this manner, the different non-parametric hypothesis tests are given an equal impact on the total score, i.e. none of the non-parametric hypothesis tests is more important than another test. This means that a sum of the weights for terms of the weighted sum is equal to 1.
  • one or more of the weights may be different from the other weights. In this manner, the non-parametric hypothesis tests may be given different importance, i.e. impact on the total score.
  • the SMM 110 may conclude that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant.
  • the threshold value may be 2/3.
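The scoring of actions A100 and A110 can be sketched as follows (function names are assumptions; the equal weights of 1/3 and the threshold of 2/3 follow the text):

```python
def total_score(scores, weights=None):
    """Weighted sum of per-test scores (action A100). With no weights
    given, each of the n tests gets weight 1/n, so the weights sum to 1
    and every test has equal impact on the total score."""
    if weights is None:
        weights = [1 / len(scores)] * len(scores)
    return sum(s * w for s, w in zip(scores, weights))

def significant(scores, threshold=2/3):
    """Conclude a statistically significant difference (action A110)
    when the total score is greater than the threshold."""
    return total_score(scores) > threshold

# Each score is 1 when that test rejects the null hypothesis, else 0.
# With scores [1, 1, 0], the equally weighted total is exactly 2/3,
# which is not greater than the threshold, so no significance is
# concluded; all three tests must reject for a significant result.
```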
  • In this manner, a method for performing change impact analysis of a DDS regarding its performance data, including the data processing, is provided.
  • the embodiments herein are highly scalable and less computationally expensive than existing methods.
  • the Markovian model may be demanding in terms of computational capacity. Since the embodiments herein avoid the use of the Markovian model and apply non-parametric hypothesis tests, requirements in terms of computational capacity will be less.
  • the creation of the respective sets of statistical measures for the first and second configurations may be performed in parallel, thus allowing for a scalable solution.
  • the embodiments herein are easy to integrate with any existing measurement-based data collection solution.
  • the integration may be straightforward because the creation, e.g. as in action A010, of respective sets of statistical measures may be applied to the respective samples, e.g. raw performance data already present in existing system log files and the like.
  • Figure 3 shows a further flowchart of an exemplifying method performed by the SMM 110.
  • the SMM 110 constructs an IID-dataset of performance data of a DDS being observed, containing k data from k observation periods, for both the original and the changed systems, respectively.
  • In each of the k observation periods, there are m raw performance data collected in the period, and each of such m performance data is recorded between the instant when the service gets started and the instant when it is finished.
  • a specific descriptive statistic of such m data is chosen, e.g. the minimum, lower quartile, second quartile (median), third quartile, maximum, or mean of such m raw performance data.
  • the specific descriptive statistic is an example of the statistical measure.
  • a random selection of the descriptive statistics may be applied, e.g. using a uniform random variable to give all the descriptive statistics an equal opportunity to be chosen for the next observation period. For instance, in one observation period, the minimum of the m raw performance data may be selected, while in the next observation period any of such descriptive statistics, as returned by the random number obtained from a uniform random variable, may be selected.
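A sketch of this uniform random selection (the helper names and the simple lower-quartile rule are assumptions for the illustration):

```python
import random
import statistics

def lower_quartile(xs):
    # Simple lower-quartile helper: median of the lower half
    # (assumed convention for this sketch).
    s = sorted(xs)
    return statistics.median(s[: len(s) // 2])

# Candidate descriptive statistics named in the text.
CANDIDATES = [min, lower_quartile, statistics.median, max, statistics.mean]

def pick_measure(rng: random.Random):
    """Uniform random choice, so every descriptive statistic has an
    equal opportunity to be chosen for the next observation period."""
    return rng.choice(CANDIDATES)

rng = random.Random(0)          # fixed seed for reproducibility
measure = pick_measure(rng)     # statistic chosen for this period
value = measure([5.0, 6.0, 5.5, 7.5])   # apply it to the m raw data
```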
  • the m raw performance data may be the respective sample relating to the performance of the first service when operating the data center 100 in the first and/or second configuration.
  • This action corresponds to action A010 when including actions A020, A030, A040, A050, A060 and A070.
  • k_original may be an example of the respective first configuration sample when operating the data center 100 in the first configuration.
  • k_changed may be an example of the respective second configuration sample when operating the data center 100 in the second configuration.
  • Alternatively, k_original and k_changed may relate to the second configuration sample and first configuration sample, respectively.
  • Alpha is the chosen level of significance, which is for instance 0.05, a typical value that, based on preliminary assessments, provides appropriate results. Accordingly, the confidence level is (1-Alpha)*100%, e.g. 95%.
  • the variances and the shape of the samples should also be verified by using e.g. Levene's test. If such a test cannot be passed, then the IID samples will be reconstructed by repeating the first step of the data pre-processing component.
  • This action is similar to one or more of actions A080 and A090.
  • the SMM 110 may apply a weighted sum model
  • This action is similar to action A100.
  • the SMM 110 may check if Sfinal-score is not less than a Threshold, e.g. 2/3.
  • This action is similar to a part of action A110.
  • This action is similar to a part of action A110.
  • condition of action 304 is not fulfilled, conclude CO.
  • This action is similar to a part of action A1 10.
  • the method has been evaluated using a use case of DDS in an example of simulated networked industrial robots, in which the execution time and period of the DDS are changed at runtime. The DDS which controls the actuators of the robots has been observed.
  • Results have shown that the solution according to some embodiments herein can identify most cases, i.e. with 87.5% (7/8) accuracy.
  • the SMM 110 may comprise a processing module 401, such as a means for performing the methods described herein.
  • the means may be embodied in the form of one or more hardware modules and/or one or more software modules.
  • the SMM 110 may further comprise a memory 402.
  • the memory may comprise, such as contain or store, instructions, e.g. in the form of a computer program 403, which may comprise computer readable code units.
  • the SMM 110 and/or the processing module 401 comprises a processing circuit 404 as an exemplifying hardware module.
  • the processing module 401 may be embodied in the form of, or 'realized by', the processing circuit 404.
  • the instructions may be executable by the processing circuit 404, whereby the SMM 110 is operative to perform the methods of Figure 2.
  • the instructions, when executed by the SMM 110 and/or the processing circuit 404, may cause the SMM 110 to perform the method according to Figure 2.
  • Figure 4 further illustrates a carrier 405, or program carrier, which comprises the computer program 403 as described directly above.
  • the processing module 401 comprises an Input/Output module 406, which may be exemplified by a receiving module and/or a sending module as described below when applicable.
  • the SMM 110 and/or the processing module 401 may comprise one or more of a pre-processing module 407, a data processing module 408, a creating module 410, an obtaining module 420, a generating module 430, a forming module 440, an applying module 450, a computing module 460, a calculating module 470 and a concluding module 480 as exemplifying hardware modules.
  • one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
  • the pre-processing module 407 may comprise one or more of the creating module 410, the obtaining module 420, the generating module 430 and the forming module 440.
  • the data processing module 408 may comprise one or more of the applying module 450, the computing module 460, the calculating module 470 and the concluding module 480.
  • the SMM 110 is configured for managing samples relating to performance of a first service hosted by a data center 100.
  • the data center 100 is operable in a first configuration and a second configuration, wherein the data center 100, when operated in the first configuration, hosts a second service, wherein the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services, wherein the data center 100, when operated in the second configuration, hosts a third service, wherein the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services.
  • the SMM 110 and/or the processing module 401 and/or the creating module 410 is configured for, for operation of the data center 100 in each of the first and second configurations, creating a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center 100 is operated in the first and second configurations, respectively.
  • the respective set of statistical measures may comprise a first set of statistical measures, when operating the data center 100 in the first configuration, and a second set of statistical measures, when operating the data center 100 in the second configuration.
  • the respective number of observation periods may comprise a first number of observation periods when operating the data center 100 in the first configuration and a second number of observation periods when operating the data center 100 in the second configuration.
  • the respective samples may comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center 100 in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center 100 in the second configuration.
  • the respective statistical measure may comprise a respective first configuration statistical measure, when operating the data center 100 in the first configuration, and a respective second configuration statistical measure, when operating the data center 100 in the second configuration.
  • the SMM 110 and/or the processing module 401 and/or the obtaining module 420 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by obtaining the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods.
  • the SMM 110 and/or the processing module 401 and/or the generating module 430 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by generating, for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample.
  • the SMM 110 and/or the processing module 401 and/or the forming module 440 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by forming the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures.
  • the SMM 110 and/or the processing module 401 and/or the obtaining module 420, or another obtaining module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by obtaining the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods.
  • the SMM 110 and/or the processing module 401 and/or the generating module 430, or another generating module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by generating, for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample, and
  • the SMM 110 and/or the processing module 401 and/or the forming module 440, or another forming module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by forming the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
  • a sample among the respective samples may relate to one or more metrics relating to the performance of the first service.
  • the SMM 110 and/or the processing module 401 and/or the applying module 450 may be configured for applying at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures.
  • the SMM 110 and/or the processing module 401 and/or the computing module 460 may be configured for, for each of said three non-parametric hypothesis tests, computing a respective score.
  • the respective score is set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis and the respective score is set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores may comprise the respective score for each of said three non-parametric hypothesis tests.
  • the SMM 110 and/or the processing module 401 and/or the calculating module 470 may be configured for calculating a total score as a weighted sum of the set of scores.
  • the SMM 110 and/or the processing module 401 and/or the concluding module 480 may be configured for concluding that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant.
  • the first hypothesis, such as a null hypothesis, may assume that the difference is statistically insignificant.
  • the second hypothesis, such as an alternative hypothesis, may assume that the difference is statistically significant.
  • Each of said three non-parametric hypothesis tests may be applied using a respective significance level or a common significance level.
  • the three non-parametric hypothesis tests may comprise:
  • a third test managing a tail difference between the first and second sets of statistical measures.
  • the data center 100, when operated in the second configuration, may host the second and third services.
  • the first service is dependent on the second and third services in that the performance of the first service depends on the resource being operated by the first, second and third services.
  • Weights of the weighted sum may be equal to 1 divided by the number of non-parametric hypothesis tests, such as three.
  • the threshold value may be 2/3.
  • the term “module” may refer to one or more functional modules, each of which may be implemented as one or more hardware modules and/or one or more software modules and/or a combined software/hardware module in a node. In some examples, the module may represent a functional unit realized as software and/or hardware of the node.
  • the term “program carrier”, or “carrier” may refer to one of an electronic signal, an optical signal, a radio signal, and a computer readable medium. In some examples, the program carrier may exclude transitory, propagating signals, such as the electronic, optical and/or radio signal. Thus, in these examples, the carrier may be a non-transitory carrier, such as a non-transitory computer readable medium.
  • processing module may include one or more hardware modules, one or more software modules or a combination thereof. Any such module, be it a hardware, software or a combined hardware-software module, may be a determining means, estimating means, capturing means, associating means, comparing means, identification means, selecting means, receiving means, sending means or the like as disclosed herein.
  • the expression “means” may be a module
  • software module may refer to a software application, a Dynamic Link Library (DLL), a software component, a software object, an object according to the Component Object Model (COM), a software function, a software engine, an executable binary software file or the like.
  • DLL Dynamic Link Library
  • COM Component Object Model
  • processing circuit may refer to a processing unit, a processor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or the like.
  • the processing circuit or the like may comprise one or more processor kernels.
  • the expression “configured to/for” may mean that a processing circuit is configured to, such as adapted to or operative to, by means of software configuration and/or hardware configuration, perform one or more of the actions described herein.
  • action may refer to an action, a step, an operation, a response, a reaction, an activity or the like. It shall be noted that an action herein may be split into two or more sub-actions as applicable. Moreover, also as applicable, it shall be noted that two or more of the actions described herein may be merged into a single action.
  • the term “memory” may refer to a hard disk, a magnetic storage medium, a portable computer diskette or disc, flash memory, random access memory (RAM) or the like. Furthermore, the term “memory” may refer to an internal register memory of a processor or the like. As used herein, the term “computer readable medium” may be a Universal Serial Bus (USB) memory, a DVD-disc, a Blu-ray disc, a software module that is received as a stream of data, a Flash memory, a hard drive, a memory card, such as a MemoryStick, a Multimedia Card (MMC), Secure Digital (SD) card, etc.
  • USB Universal Serial Bus
  • MMC Multimedia Card
  • SD Secure Digital
  • aforementioned examples of computer readable medium may be provided as one or more computer program products.
  • computer readable code units may be text of a computer program, parts of or an entire binary file representing a computer program in a compiled format or anything there between.
  • number and/or value may be any kind of digit, such as binary, real, imaginary or rational number or the like. Moreover, “number” and/or “value” may be one or more characters, such as a letter or a string of letters. “Number” and/or “value” may also be represented by a string of bits, i.e. zeros and/or ones.
  • a set of may refer to one or more of something.
  • a set of devices may refer to one or more devices
  • a set of parameters may refer to one or more parameters or the like according to the embodiments herein.
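The scoring and decision logic enumerated above (a score of 0 or 1 per non-parametric test, weights of 1 divided by the number of tests, and a threshold of e.g. 2/3) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name and tuple return value are my own choices, and the p-values are assumed to come from tests such as the KS-, WRS- and AD-tests discussed herein. The "not less than" comparison follows action 304 above.

```python
def conclude_change(p_values, alpha=0.05, threshold=2.0 / 3.0):
    """Weighted-score decision over several non-parametric hypothesis tests.

    p_values: one p-value per test (e.g. KS-test, WRS-test, AD-test) applied
    to the first and second sets of statistical measures.
    A test scores 1 when it rejects the null hypothesis (p < alpha), i.e.
    indicates a statistically significant difference, and 0 otherwise.
    Each score is weighted by 1 divided by the number of tests; the change
    is concluded significant when the total score is not less than the
    threshold, e.g. 2/3.
    """
    scores = [1 if p < alpha else 0 for p in p_values]
    total = sum(scores) / len(scores)  # weighted sum with weights 1/N
    return total >= threshold, total
```

With three tests and a threshold of 2/3, at least two of the three tests must reject the null hypothesis for the difference to be concluded statistically significant.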


Abstract

A method and a Sample Managing Module, SMM, (110) for managing samples relating to performance of a first service hosted by a data center (100). The data center (100), when operated in a first configuration, hosts a second service. The first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services. The data center (100), when operated in a second configuration, hosts a third service. The first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services. The SMM (110) creates, for operation of the data center (100) in each of the first and second configurations, a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods. Said respective samples relate to the performance of the first service when the data center (100) is operated in the first and second configurations, respectively.

Description

METHOD AND MODULE FOR MANAGING SAMPLES RELATING TO
PERFORMANCE OF A FIRST SERVICE HOSTED BY A DATA CENTER
TECHNICAL FIELD
Embodiments herein relate to data centers, such as physical or virtual data centers, computer networks or the like. In particular, a Sample Managing Module and a method therein for managing samples relating to performance of a first service hosted by a data center are disclosed. A corresponding computer program and a carrier therefor are also disclosed.
BACKGROUND
Generally, a data center may host a plurality of services. The plurality of services may be provided by one or more software applications. A particular type of service is referred to as Distributed and Dependent Services (DDS). The DDS may be executed in a single data center or in multiple data centers, such as a single cloud or multiple clouds. DDS are dependent on each other due to so-called connecting entities, e.g. queues. A few typical examples include distributed Video on Demand (VoD) services, distributed tenant services, and services with Service Level Agreements (SLAs) executed to control networked industrial robots, connected cars and heavy vehicles.
When performing an update to a data center hosting a number of DDS, it is of interest to identify any potential consequences of the update. It may also, or alternatively, be of interest to estimate what needs to be modified to accomplish the update, while achieving a certain performance level. This is often referred to as change impact analysis. The update of the data center can refer to updates of the DDS executing on the data center, to updates of a hardware infrastructure on which the data center relies, or to both.
At least some known solutions for change impact analysis apply statistical methods and probability theory in a conventional way, e.g. by applying a comparison using simple descriptive statistics or statistical hypothesis test, to raw data relating to an original system and a changed system. The raw data can be service completion time and/or energy consumption of the data center.
Due to the complexity of change impact analysis of DDS, the known solutions cannot always provide reliable and accurate results.
SUMMARY
An object may thus be to enable improved change impact analysis relating to DDS.
According to an aspect, the object is achieved by a method, performed by a Sample Managing Module, for managing samples relating to performance of a first service hosted by a data center. The data center is operable in a first configuration and a second configuration. The data center, when operated in the first configuration, hosts a second service. The first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services. The data center, when operated in the second configuration, hosts a third service. The first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services. The Sample Managing Module creates, for operation of the data center in each of the first and second configurations, a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center is operated in the first and second configurations, respectively.
According to another aspect, the object is achieved by a Sample Managing Module configured for managing samples relating to performance of a first service hosted by a data center. The data center is operable in a first configuration and a second configuration. The data center, when operated in the first configuration, hosts a second service. The first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services. The data center, when operated in the second configuration, hosts a third service. The first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services. The Sample Managing Module is configured for, for operation of the data center in each of the first and second configurations: creating a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center is operated in the first and second configurations, respectively.
According to further aspects, the object is achieved by a computer program and a carrier therefor corresponding to the aspects above.
Thanks to the Sample Managing Module generating the respective statistical measure based on the respective samples for said respective number of observation periods, any dependency between samples from one particular observation period is removed. In this manner, the respective samples relating to the performance are collected from different observation periods. Since dependency between the samples is avoided, the created respective set of statistical measures is also independent and identically distributed within the respective set of statistical measures.
As will be further explained in the detailed description below, the creation of the respective statistical measure, being independent and identically distributed, enables improved change impact analysis, using existing solutions or using at least one particular embodiment herein.
As an example, the respective samples relating to the performance may be an execution time of the first service and/or a power consumption relating to execution of the first service. Thus, improvement of change impact analysis pertaining to execution time and/or power consumption may be achieved according to some embodiments.
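As a concrete illustration of the creation step summarized above, the following sketch builds one statistical measure per observation period from the m raw performance samples (e.g. execution times) of that period, choosing the descriptive statistic uniformly at random so that every statistic has an equal chance of selection. All names are illustrative assumptions and do not appear in the embodiments.

```python
import random
import statistics

# Descriptive statistics that may be chosen for an observation period
# (an assumed, illustrative set).
STATISTICS = [min, max, statistics.mean, statistics.median]

def create_measure_set(raw_samples_per_period, rng=random):
    """One statistical measure per observation period.

    raw_samples_per_period: list of lists; each inner list holds the m raw
    performance samples (e.g. execution times in seconds) of one observation
    period. Because each measure summarizes a whole period, dependencies
    between raw samples within a period do not carry over into the set.
    """
    measures = []
    for raw in raw_samples_per_period:
        stat = rng.choice(STATISTICS)  # uniform random selection of statistic
        measures.append(float(stat(raw)))
    return measures
```

Each returned value is bounded by the minimum and maximum of its period's raw samples, regardless of which statistic the uniform random variable selects.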
BRIEF DESCRIPTION OF THE DRAWINGS
The various aspects of embodiments disclosed herein, including particular features and advantages thereof, will be readily understood from the following detailed description and the accompanying drawings, in which:
Figure 1 is a schematic overview of an exemplifying data center in which embodiments herein may be implemented,
Figure 2 is a flowchart illustrating embodiments of the method performed by the Sample Managing Module,
Figure 3 is another flowchart illustrating a particular embodiment of the method performed by the Sample Managing Module, and
Figure 4 is a block diagram illustrating embodiments of the Sample Managing Module.
DETAILED DESCRIPTION
In order to better appreciate the embodiments herein, it will here be explained how the present inventors have realized what problems are associated with prior solutions.
With a known solution for change impact analysis concerning services in a data center, it is assumed that an analysis dataset is Independent and Identically Distributed (IID). The analysis dataset refers to a dataset to be analyzed using change impact analysis. Under this assumption, the known solution applies statistical methods and probability theory in a conventional way, e.g. using a statistical hypothesis test and a Markovian model.
However, dependencies between DDS, caused for instance by messages stored in a queue of a connecting entity, makes this assumption invalid. A reason is that a service completion time, e.g. for an observed service, directly depends on a number of messages stored in the queue. Clearly, when dealing with DDS, the number of messages in the queue may be changed by other services during execution of the observed service. Accordingly, a sample relating to performance of the observed service is dependent on one or more prior samples that have been collected, or recorded. Consequently, the analysis dataset is non-IID.
Therefore, the known solution, which applies statistical methods and probability theory in a conventional way to the analysis dataset, is theoretically invalid. As a result, the known solution may achieve inaccurate results in many cases.
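One simple way to see that such an analysis dataset violates the IID assumption is to estimate its lag-1 autocorrelation: a value far from zero indicates that consecutive samples depend on each other, as described for the queue-dependent service completion times above. This diagnostic is my own illustrative addition, not part of the known solution or of the embodiments.

```python
import statistics

def lag1_autocorrelation(samples):
    """Sample autocorrelation at lag 1.

    A value near zero is consistent with independent samples; a value far
    from zero (positive or negative) suggests consecutive samples are
    dependent, so the dataset is likely non-IID.
    """
    mean = statistics.fmean(samples)
    num = sum((a - mean) * (b - mean) for a, b in zip(samples, samples[1:]))
    den = sum((s - mean) ** 2 for s in samples)
    return num / den
```

For instance, a steadily growing completion-time series (as when a queue keeps filling up) shows strong positive lag-1 autocorrelation.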
It has further been observed that the known solution makes the assumption that the analysis dataset is IID, independently of the chosen statistical model and/or probability theory. Moreover, the known solution assumes that a Markovian model describes the performance change of the observed service when the data center is updated. Thus, again the analysis dataset is assumed to be IID. Moreover, use of the Markovian model is computationally expensive, due to e.g. convolution of probability distributions, as compared to the statistical hypothesis test. Disadvantageously, the use of the Markovian model increases the requirements concerning computational capacity needed to execute the known solution. Markovian models are described in e.g. "Applied Probability and Queues", 2nd ed., by Asmussen, S., published in 2003 by Springer, New York.
Furthermore, concerning statistical hypothesis tests, there are parametric hypothesis tests and non-parametric hypothesis tests.
The parametric hypothesis tests make assumptions about an underlying distribution of a population from which a sample is drawn, and which is investigated. Typically, the population is assumed to conform to a normal distribution. The parametric hypothesis tests include ANalysis Of VAriance (ANOVA) tests, the Chi-Square test, contingency tables, the Fisher test (F-test), the Student's t-test, the z-test and the like.
Non-parametric tests, or distribution-free tests, make no assumption about the underlying distribution of the population. Thus, the non-parametric tests can be applied for both non-normally distributed populations, e.g. unknown distribution of population, and normally distributed populations, e.g. known distribution of population. However, the non-parametric tests are more accurate for non-normally distributed populations than for normally distributed populations. Moreover, non-parametric tests do not require the analysis dataset to be large, e.g. hundreds of samples. Instead, with non-parametric tests, the analysis dataset may be less than hundred, e.g. tens of samples or even less. The non-parametric tests include two-sample Kolmogorov-Smirnov (KS)-test, Mann- Whitney U-Test, Anderson-Darling-test (AD-test), sign scores test and the like.
Hereinafter, the Mann-Whitney U-Test, aka Wilcoxon Rank-Sum test, will be referred to as WRS-test.
However, non-parametric tests often yield a higher so-called p-value than the parametric tests for a given analysis dataset. A higher p-value means a less accurate result of the hypothesis test.
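For reference, the two-sample KS statistic mentioned above is simply the largest vertical distance between the two empirical distribution functions, which is what makes the test distribution-free. The sketch below, an illustrative addition of mine, computes only the statistic; a full test would also derive a p-value, e.g. from the asymptotic KS distribution.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples. No assumption is made about the underlying population
    distribution, so the statistic applies to non-normally distributed
    populations as well.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(x <= v for x in a) / len(a)  # empirical CDF of sample_a at v
        fb = sum(x <= v for x in b) / len(b)  # empirical CDF of sample_b at v
        d = max(d, abs(fa - fb))
    return d
```

Identical samples yield 0.0, fully separated samples yield 1.0, and intermediate overlap yields a value in between.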
Throughout the following description, similar reference numerals have been used to denote similar features, such as nodes, actions, steps, modules, circuits, parts, items, elements, units or the like, when applicable. In the Figures, features that appear in some embodiments are indicated by dashed lines.

Figure 1 depicts an exemplifying data center 100 in which embodiments herein may be implemented. In this example, the data center 100 is a so-called Virtual Data Center (VDC).
In other examples, the data center 100 may be a cloud computer system, a data system, a cloud data system, a computer network, a hardware system, a computer system platform, a hardware platform, a disaggregated hardware system, a server system, a cloud server system, one or more cloud environments or the like.
The data center 100 hosts one or more distributed and dependent services, such as a first service 101 and at least one other service. The first service 101 is dependent on at least one other service. Thus, the data center hosts, as mentioned, distributed and dependent services. In Figure 1, a second service 102 and a third service 103 are illustrated with dashed lines, since they may not always be hosted by the data center 100 at the same time, as explained in more detail below. The first, second and third services 101, 102, 103 may be different instances of one application, or they may be instances of respective applications, or a combination thereof.
One of the distributed and dependent services, hosted by the data center 100, may be dependent on another one of the DDS in that a resource 104 is operated by at least a pair of distributed and dependent services. When the resource 104 is operated, it may mean that the load on the resource 104, the busyness of the resource 104 and/or the contents of the resource 104 is altered. As an example, the resource 104 may be a queue, whose length is changed when distributed and dependent services operate on it. The resource 104 may sometimes be referred to as 'connecting entity'. The queue may comprise a set of messages, which are processed by the pair of distributed and dependent services. When a message is processed, it will be removed from the queue.
It shall be understood that a first type of DDS retrieve, or consume, information and/or functions provided by the resource 104. Moreover, a second type of DDS push, or produce, information and/or functions provided by the resource 104. A certain DDS may be of both the first type and the second type. This means that the term "operated" may mean that the resource 104 is shared 120, 121, 122 between DDS, i.e. operated means one or both of "consume" and "produce".
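To illustrate why such sharing creates dependent performance samples, consider the following toy model of a shared queue. All names and cost parameters are hypothetical: the completion time of the consuming service grows with the number of messages that producing services have queued since its last run, so consecutive samples of that completion time depend on the behaviour of the other services.

```python
from collections import deque

class SharedQueue:
    """Toy connecting entity shared between DDS (illustrative only)."""

    def __init__(self):
        self.messages = deque()

    def produce(self, msg):
        """Second type of DDS: pushes a message to the shared queue."""
        self.messages.append(msg)

    def consume_all(self, per_msg_cost=0.1, base_cost=1.0):
        """First type of DDS: drains the queue and returns a modeled
        completion time that grows with the number of queued messages."""
        n = len(self.messages)
        self.messages.clear()
        return base_cost + per_msg_cost * n
```

A run that follows a burst of production takes longer than a run against an empty queue, which is exactly the cross-service dependency discussed above.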
In other examples, the resource 104 may be a software application that provides a few components, or functions, that are shared between the DDS.

Moreover, Figure 1 illustrates a Sample Managing Module (SMM) 110 according to embodiments herein, as described below.
The data center 100 is operable in a first configuration and a second configuration.
According to the first configuration, the data center 100 hosts the second service 102, i.e. the above mentioned at least one other service. This means that the data center 100 hosts both the first and second services 101, 102 in the first configuration. The first service is dependent on the second service in that the performance of the first service depends on the resource 104 operated by the first and second services 101, 102.
According to the second configuration, the data center 100 hosts the third service 103 in addition to the first service 101.
In a first example of the second configuration, the data center 100 hosts only the first and third services 101, 103, i.e. the data center 100 does not host the second service 102. The first service is dependent on the third service in that the performance of the first service depends on the resource, but consequently the first service is no longer dependent on the second service. The resource is thus operated by only the first and third services according to the first example of the second configuration.
In a second example of the second configuration, the data center 100 hosts the first and third services 101, 103 in addition to the second service 102. The first service is dependent on the third service in that the performance of the first service depends on the resource, and - as in the first configuration - the first service is still dependent on the second service. The resource is thus operated by the first, second and third services according to the second example of the second configuration.
When the data center 100 is updated, its configuration may be changed.
For example, when the data center 100 is updated by changing its configuration from the first configuration to the first example of the second configuration, a
replacement of the second service is foreseen. The replacement may be realized by a software update of the second service, where the updated version of the second service is referred to as the third service. Alternatively, or even additionally, the replacement may be realized by removing the second service, i.e. shutting down the second service, and implementing the third service, i.e. starting up the third service, wherein the second and third services are completely different, in that they provide different services, i.e. perform different tasks.
As another example, when the data center 100 is updated by changing its configuration from the first configuration to the second example of the second configuration, an addition or a removal of the third service 103 is foreseen.
The addition of the third service 103 is foreseen when the first configuration is considered to be an original configuration, i.e. before an update is performed, and the second configuration is considered to be an updated configuration, i.e. after the update has been performed.
The removal of the third service 103 is foreseen when the first configuration is considered to be the updated configuration and the second configuration is considered to be the original configuration.
Before proceeding with a description of exemplifying methods performed by the SMM 110, an overview will be provided and some terms will be discussed.
As discussed above, a problem with the existing solution is that the analysis dataset is non-IID. Thus, in order to enable improved change impact analysis, the embodiments herein provide a way of obtaining, or creating, a dataset that is IID.
In the description below, an IID-dataset is given by a respective set of statistical measures for operation of the data center 100 in each of the first and second
configurations. This part may be seen as a pre-processing, before performing an actual change impact analysis, e.g. a data processing. As mentioned above, the data processing may be performed as described according to some embodiments herein or according to known manners.
The pre-processing is described in actions A010 to A070.
An exemplifying data processing is described in actions A080 to A110.
With the data processing the actual change impact analysis may be performed. The change impact analysis is about identifying the potential consequences of a change, or estimating what needs to be modified to accomplish a change. With the data processing, it is aimed to perform a change impact analysis of performance of DDS, in terms of identifying a statistically significant difference as a consequence of a change, or statistically estimating a result of a change, regarding e.g. execution time of a service or energy consumption relating to a service. Execution time of the service may be a time period for completing the service.
As mentioned above, it has been understood that samples relating to performance within an observation period for e.g. the first service 101 are non-IID. For example, due to the fact that a state or status of resources, e.g. the number of messages stored in the queue, can be changed at runtime, the samples relating to performance are dependent on each other. The number of messages may be changed by other services periodically, sporadically or aperiodically. Again, the samples relating to performance may be execution time of the observed service and/or energy consumption relating to the observed service.
For example, assume that during one specific observation period there are 11 DDS hosted by the data center 100, and that the service completion time for the observed service is calculated as h * t, in which h is the number of messages stored in the queue, e.g. the resource 104, and t is an execution time of either a message processing operation or another type of execution operation depending on a message taken from the queue. When other services send messages to the queue, the service completion time of the observed service changes. Clearly, when running such models for an observation period, the measured service completion times are no longer independent of each other. Moreover, it has been observed that samples within the observation period may be non-normally distributed, e.g. distributed in a multimodal manner. Hence, an assumption regarding an underlying population from which samples are drawn, i.e. assuming normality, often does not hold. Thus, the Markovian model is inappropriate. The data processing may thus need to be adapted accordingly, as described in more detail below.
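The queue-driven dependence described above can be illustrated with a small simulation. The sketch below is not part of the embodiments: the queue dynamics, the parameter values and all function names are illustrative assumptions. Because consecutive samples share the queue state h, the lag-1 autocorrelation of the completion times h * t is far from zero, which is exactly the non-IID behavior the pre-processing is meant to remove.

```python
import random

def observe_completion_times(n_samples=1000, t=0.01, seed=7):
    """Simulate completion times h * t; h is the shared queue length."""
    random.seed(seed)
    h = 50                      # initial number of messages in the queue
    samples = []
    for _ in range(n_samples):
        # other services sporadically add or remove messages at runtime
        h = max(1, h + random.choice([-2, -1, 0, 1, 2]))
        samples.append(h * t)   # completion time of the observed service
    return samples

def lag1_autocorrelation(xs):
    """Lag-1 sample autocorrelation; near 0 for IID data."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

samples = observe_completion_times()
print(lag1_autocorrelation(samples))  # typically close to 1: dependent samples
```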
In Figure 2 a schematic flowchart of exemplifying methods in the SMM 110 is shown. Accordingly, the SMM 110 performs a method for managing samples relating to performance of the first service 101 hosted by the data center 100.
As mentioned, the data center 100 is operable in a first configuration and a second configuration. The data center 100, when operated in the first configuration, hosts a second service. The first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services. The data center 100, when operated in the second configuration, hosts a third service. The first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services.
One or more of the following actions may be performed in any suitable order.
As mentioned, the pre-processing may be performed by one or more of actions A010 to A070.
Action A010
For operation of the data center 100 in each of the first and second
configurations, action A010 is performed. Thus, the SMM 110 creates a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures.
The respective set of statistical measures is obtained by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods. This means that the respective samples are collected from each of said respective number of observation periods.
Said respective samples relate to the performance of the first service when the data center 100 is operated in the first and second configurations, respectively.
The respective samples may comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center 100 in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center 100 in the second configuration.
A sample among the respective samples may relate to one or more metrics relating to the performance of the first service. This means that the sample for one observation period among the respective number of observation periods may include one or more metrics, such as execution time, energy consumption or the like as disclosed herein.
The respective set of statistical measures may comprise a first set of statistical measures, when operating the data center 100 in the first configuration, and a second set of statistical measures, when operating the data center 100 in the second configuration.
The respective number of observation periods may comprise a first number of observation periods when operating the data center 100 in the first configuration and a second number of observation periods when operating the data center 100 in the second configuration.
The respective statistical measure may comprise a respective first configuration statistical measure, when operating the data center 100 in the first configuration, and a respective second configuration statistical measure, when operating the data center 100 in the second configuration.
Action A010, i.e. the creating of the respective set of statistical measures, may comprise, when operating the data center 100 in the first configuration, actions A020, A030 and A040.
Action A020
The SMM 110 may obtain the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods.

Action A030
The SMM 110 may generate, for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample.

Action A040
The SMM 110 may form the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures.
Action A010, i.e. the creating of the respective set of statistical measures, may comprise, when operating the data center 100 in the second configuration, actions A050, A060 and A070.

Action A050
The SMM 110 may obtain the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods.
Action A060
The SMM 110 may generate, for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample.
Action A070
The SMM 110 may form the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
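The pre-processing of actions A010 to A070 can be sketched as follows. This is a minimal illustration, assuming execution-time samples and the median as the chosen statistical measure; all data values and function names are invented for the example. Each observation period's m raw samples collapse into one statistical measure, giving one set per configuration.

```python
import statistics

def per_period_measures(periods, measure=statistics.median):
    """periods: list of observation periods, each a list of m raw samples.

    Returns one statistical measure per period, forming the IID-dataset."""
    return [measure(raw_samples) for raw_samples in periods]

# raw execution-time samples (seconds) from 3 observation periods each,
# for the first and second configurations respectively
first_config_periods = [[0.50, 0.52, 0.49], [0.51, 0.53, 0.50], [0.48, 0.52, 0.51]]
second_config_periods = [[0.61, 0.63, 0.60], [0.62, 0.60, 0.64], [0.59, 0.61, 0.63]]

first_set = per_period_measures(first_config_periods)    # first set of measures
second_set = per_period_measures(second_config_periods)  # second set of measures
print(first_set, second_set)
```

The two resulting lists correspond to the first and second sets of statistical measures that the later hypothesis tests compare.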
Now, as mentioned, the data processing is described in actions A080 to A110. The data processing may be split into three parts, i.e. a data analytics part as in actions A080 and A090, a weighted sum model part as in action A100 and a conclusion part as in action A110. As an alternative to the data processing described below, known parametric hypothesis tests may be applied.
Thanks to the data processing, accuracy of results from change impact analysis may be improved by the adoption of a few non-parametric tests combined in a joint effort.
Action A080
The SMM 110 may apply at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures.
The three non-parametric hypothesis tests may comprise:
a first test managing detection of a vertical difference between the first and second sets of statistical measures,
a second test managing a median difference between the first and second sets of statistical measures, and
a third test managing a tail difference between the first and second sets of statistical measures.
As an example, the first test may be a two-sample Kolmogorov-Smirnov (KS) test, i.e. the two-sample KS test is applied to detect the vertical difference between two populations, the second test may be a Wilcoxon Rank-Sum (WRS) test, to test the median difference between two populations, and the third test may be an Anderson-Darling (AD) test, to test the tail difference between two populations. Note that a common assumption of such non-parametric statistics, i.e. IID-datasets, has been achieved by the proposed pre-processing as described above.
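The first (vertical-difference) test can be sketched from its textbook definition: the two-sample KS statistic is the maximum vertical distance between the two empirical distribution functions, compared against an asymptotic critical value. This is an illustrative from-scratch sketch, not the embodiments' code; in practice a library routine such as scipy.stats.ks_2samp would typically be used, and the WRS and AD tests would be scored in the same zero-or-one fashion. The coefficient 1.358 is the standard asymptotic critical-value coefficient for significance level 0.05.

```python
import math

def ks_statistic(xs, ys):
    """Maximum vertical distance between the empirical CDFs of xs and ys."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in sorted(set(xs + ys)):
        cdf_x = sum(1 for x in xs if x <= v) / len(xs)
        cdf_y = sum(1 for y in ys if y <= v) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d

def ks_score(xs, ys, alpha_coefficient=1.358):
    """Score 1 (reject H0, accept Ha) if D exceeds the asymptotic
    critical value; 1.358 corresponds to significance level 0.05."""
    n, m = len(xs), len(ys)
    critical = alpha_coefficient * math.sqrt((n + m) / (n * m))
    return 1 if ks_statistic(xs, ys) > critical else 0
```

Two well-separated sets of statistical measures yield score 1 (a statistically significant vertical difference), while comparing a set against itself yields score 0.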
The first hypothesis, such as a null hypothesis, may assume that the difference is statistically insignificant, and the second hypothesis, such as an alternative hypothesis, may assume that the difference is statistically significant.
In more detail, the above mentioned three non-parametric hypothesis tests will be applied to verify whether the two datasets, i.e. the respective first configuration statistical measure and the respective second configuration statistical measure, are from a same underlying population or not, with the first and second hypotheses as follows:
Ho: From a viewpoint of completion time and/or energy consumption of services of interest, the difference caused by a change is not statistically significant, by using a specific non-parametric statistical hypothesis test.
Ha: From a viewpoint of completion time and/or energy consumption of services of interest, the difference caused by a change is statistically significant, by using a specific non-parametric statistical hypothesis test.
Each of said three non-parametric hypothesis tests may be applied using a respective significance level or a common significance level.
Typically, the common significance level, e.g. 0.05 which is a common value, is applied for all three non-parametric hypothesis tests. If, in practice, different significance levels are adopted for different hypothesis tests, then complexity will increase, since the confidence level of the total score has to be recalculated/unified. For example, Score A given by the two-sample KS test at 0.05 times weight A, Score B given by the WRS test at 0.01 times weight B, and Score C given by the AD test at 0.32 times weight C cannot be added together directly, i.e. without compensating for the difference in significance level.
Action A090
The SMM 110 may, for each of said three non-parametric hypothesis tests, compute a respective score.
The respective score may be set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis and the respective score may be set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores may comprise the respective score for each of said three non-parametric hypothesis tests.
This means that when there is a statistically significant difference, meaning that the alternative hypothesis Ha should be accepted, the test result is given score one; otherwise score zero is given for not rejecting Ho. Since results are obtained from a few analysis methods with different focuses, i.e. 1) detecting the vertical difference between the cumulative distribution functions (the two-sample KS test), 2) the medians (the WRS test) and 3) the tail (the AD test) of the two analysis datasets, a comprehensive assessment of any possible differences is provided.

Action A100
The SMM 110 may calculate a total score as a weighted sum of the set of scores. In this manner, a combined joint effort of the three non-parametric hypothesis tests is achieved.
In some examples, weights of the weighted sum may be equal to 1 divided by a number of non-parametric hypothesis tests, i.e. 3 in case of three non-parametric hypothesis tests. In this manner, the different non-parametric hypothesis tests are given an equal impact on the total score, i.e. none of the non-parametric hypothesis tests is more important than another test. This means that a sum of the weights for terms of the weighted sum is equal to 1.
In other examples, one or more of the weights may be different from the other weights. In this manner, the non-parametric hypothesis tests may be given different importance, i.e. impact on the total score.
Action A110

The SMM 110 may conclude that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant. The threshold value may be 2/3.
Expressed somewhat differently, if the total score is not smaller than a certain threshold, e.g. 2/3, there are at least two non-parametric hypothesis tests confirming acceptance of Ha, and then conclusion Ca is chosen. Otherwise, conclusion Co is drawn. The conclusions Co and Ca are as follows:
Co: From the viewpoint of completion time and/or energy consumption of services of interest, the difference caused by a change is not statistically significant.
Ca: From the viewpoint of completion time and/or energy consumption of services of interest, the difference caused by a change is statistically significant.
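The scoring and conclusion of actions A090 to A110 can be sketched as follows, assuming the three tests, the equal weights of 1/3 and the 2/3 threshold mentioned in the text; the function name is an illustrative assumption. A total of at least 2/3 means at least two of the three tests rejected Ho.

```python
def conclude(scores, weights=None, threshold=2 / 3):
    """scores: 0/1 outcome per non-parametric test, e.g. [KS, WRS, AD].

    Returns "Ca" (significant difference) or "Co" (not significant)."""
    if weights is None:
        weights = [1 / len(scores)] * len(scores)  # equal impact per test
    total = sum(s * w for s, w in zip(scores, weights))
    return "Ca" if total >= threshold else "Co"

print(conclude([1, 1, 0]))  # two of three tests reject Ho -> Ca
print(conclude([1, 0, 0]))  # only one test rejects Ho -> Co
```

Note that the weights sum to 1, so no single test dominates; unequal weights would give the tests different importance, as described above.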
According to embodiments, including the data processing, a method for performing change impact analysis of a DDS regarding its performance data, e.g.
service completion time and energy consumption, is provided. In this manner, a lightweight yet accurate change impact analysis of distributed and dependent services is provided. A further advantage is that embodiments herein are highly scalable and computationally less expensive than existing methods. As mentioned above, the Markovian model may be demanding in terms of computational capacity. Since the embodiments herein avoid the use of the Markovian model and apply non-parametric hypothesis tests, requirements in terms of computational capacity will be lower. Moreover, the creation of the respective sets of statistical measures for the first and second configurations may be performed in parallel, thus allowing for a scalable solution.
Yet another advantage is that the embodiments herein are easy to integrate with any existing data collection solution using measurements. The integration may come easily due to the fact that the creation, e.g. as in action A010, of respective sets of statistical measures may be applied to the respective samples, e.g. raw performance data present in, for example, already existing system log-files and the like.

Figure 3 shows a further flowchart of an exemplifying method performed by the SMM 110.
One or more of the following actions may be performed.
Action 301
The SMM 110 constructs an IID-dataset of performance data of a DDS being observed, containing k data from k observation periods, for both the original and the changed systems, respectively.
In each of k observation periods, there are m raw performance data collected in the period, and each of such m performance data is recorded between the instants when the service gets started and when it is finished.
A specific descriptive statistic of such m data is chosen, e.g. the minimum, lower quartile, second quartile (median), third quartile, maximum, or mean of such m raw performance data. The specific descriptive statistic is an example of the statistical measure.
A random selection of the descriptive statistic may be applied, e.g. using a uniform random variable to give all the descriptive statistics an equal opportunity to be chosen for the next observation period. For instance, in one observation period the minimum of the m raw performance data may be selected, while in the next observation period any of the descriptive statistics, as returned by the random number obtained from a uniform random variable, may be selected. The m raw performance data may be the respective sample relating to the performance of the first service when operating the data center 100 in the first and/or second configuration.
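The random selection described above can be sketched as drawing one descriptive statistic uniformly at random for each observation period and applying it to that period's m raw samples. All names and data values below are illustrative assumptions, and the quartile helpers follow Python's default exclusive quantile method rather than any specific method mandated by the text.

```python
import random
import statistics

def lower_quartile(xs):
    return statistics.quantiles(xs, n=4)[0]

def upper_quartile(xs):
    return statistics.quantiles(xs, n=4)[2]

# the candidate descriptive statistics named in the text
DESCRIPTIVE_STATISTICS = [
    min, lower_quartile, statistics.median, upper_quartile, max, statistics.fmean,
]

def randomized_measures(periods, rng=random):
    """One uniformly chosen descriptive statistic per observation period."""
    return [rng.choice(DESCRIPTIVE_STATISTICS)(raw) for raw in periods]

periods = [[0.50, 0.52, 0.49, 0.55], [0.51, 0.53, 0.50, 0.54]]
print(randomized_measures(periods))  # one measure per period, statistic random
```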
This action is similar to action A010. Moreover, this action is similar to action A010 when including actions A020, A030, A040, A050, A060 and A070.
Action 302
For each non-parametric hypothesis test, the SMM 110 may determine Rn = (k_original, k_changed, Alpha).
If Ho cannot be rejected by using one non-parametric hypothesis test, the test is given score zero; otherwise score one is given for the acceptance of the alternative hypothesis Ha. k_original may be an example of the respective first configuration sample when operating the data center 100 in the first configuration. k_changed may be an example of the respective second configuration sample when operating the data center 100 in the second configuration. Alternatively, k_original and k_changed may relate to the second configuration sample and first configuration sample, respectively.
Alpha is the chosen level of significance, for instance 0.05, a typical value that, based on preliminary assessments, provides appropriate results. Accordingly, the confidence level is (1-Alpha)*100%, e.g. 95%.
Note that for the WRS test for the median, the variances and the shape of the samples should also be verified by using e.g. Levene's test. If such a test cannot be passed, then the IID samples will be reconstructed by repeating the first step of the data pre-processing component.
Optionally, add the result Rn to the hypothesis test table R.
HTT results -> <R_twosample-KS, R_WRS, R_twosample-AD>
This action is similar to one or more of actions A080 and A090.
Action 303
The SMM 1 10 may apply a weighted sum model
S_final-score = R_twosample-KS * C1 + R_WRS * C2 + R_twosample-AD * C3
This action is similar to action A100.
Action 304
In some examples, the SMM 110 may check if S_final-score is not less than a threshold, e.g. 2/3.
This action is similar to a part of action A110.
Action 305
When action 304 has been performed, conclude Ca when S_final-score is not smaller than the threshold, e.g. 2/3.
This action is similar to a part of action A110.
Action 306
Otherwise, if the condition of action 304 is not fulfilled, conclude Co.
This action is similar to a part of action A110.

The method has been evaluated by using a use case of DDS in an example of simulated networked industrial robots, in which the execution time and period of the DDS are changed at runtime. The DDS which controls the actuators of the robots has been observed.
Results have shown that the solution according to some embodiments herein can identify most cases, i.e. with 87.5% (i.e. 7/8) accuracy.
With reference to Figure 4, a schematic block diagram of embodiments of the SMM 110 of Figure 1 is shown.
The SMM 110 may comprise a processing module 401, such as a means for performing the methods described herein. The means may be embodied in the form of one or more hardware modules and/or one or more software modules.
The SMM 110 may further comprise a memory 402. The memory may comprise, such as contain or store, instructions, e.g. in the form of a computer program 403, which may comprise computer readable code units.
According to some embodiments herein, the SMM 110 and/or the processing module 401 comprises a processing circuit 404 as an exemplifying hardware module. Accordingly, the processing module 401 may be embodied in the form of, or 'realized by', the processing circuit 404. The instructions may be executable by the processing circuit 404, whereby the SMM 110 is operative to perform the methods of Figure 2. As another example, the instructions, when executed by the SMM 110 and/or the processing circuit 404, may cause the SMM 110 to perform the method according to Figure 2.
Figure 4 further illustrates a carrier 405, or program carrier, which comprises the computer program 403 as described directly above. In some embodiments, the processing module 401 comprises an Input/Output module 406, which may be exemplified by a receiving module and/or a sending module as described below when applicable. In further embodiments, the SMM 110 and/or the processing module 401 may comprise one or more of a pre-processing module 407, a data processing module 408, a creating module 410, an obtaining module 420, a generating module 430, a forming module 440, an applying module 450, a computing module 460, a calculating module 470 and a concluding module 480 as exemplifying hardware modules. In other examples, one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
As shown in the Figure, the pre-processing module 407 may comprise one or more of the creating module 410, the obtaining module 420, the generating module 430 and the forming module 440. Moreover, the data processing module 408 may comprise one or more of the applying module 450, the computing module 460, the calculating module 470 and the concluding module 480.
Accordingly, the SMM 110 is configured for managing samples relating to performance of a first service hosted by a data center 100.
As mentioned, the data center 100 is operable in a first configuration and a second configuration, wherein the data center 100, when operated in the first configuration, hosts a second service, wherein the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services, wherein the data center 100, when operated in the second configuration, hosts a third service, wherein the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services. Therefore, according to the various embodiments described above, the SMM 110 and/or the processing module 401 and/or the creating module 410 is configured for, for operation of the data center 100 in each of the first and second configurations, creating a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center 100 is operated in the first and second configurations, respectively. Again, the respective set of statistical measures may comprise a first set of statistical measures, when operating the data center 100 in the first configuration, and a second set of statistical measures, when operating the data center 100 in the second configuration.
The respective number of observation periods may comprise a first number of observation periods when operating the data center 100 in the first configuration and a second number of observation periods when operating the data center 100 in the second configuration.
The respective samples may comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center 100 in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center 100 in the second configuration.
The respective statistical measure may comprise a respective first configuration statistical measure, when operating the data center 100 in the first configuration, and a respective second configuration statistical measure, when operating the data center 100 in the second configuration.

The SMM 110 and/or the processing module 401 and/or the obtaining module 420 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by obtaining the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods.
The SMM 110 and/or the processing module 401 and/or the generating module 430 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by generating, for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample.
The SMM 110 and/or the processing module 401 and/or the forming module 440 may be configured for creating the respective set of statistical measures, when operating the data center 100 in the first configuration, by forming the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures.
Furthermore, the SMM 110 and/or the processing module 401 and/or the obtaining module 420, or another obtaining module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by obtaining the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods.
The SMM 110 and/or the processing module 401 and/or the generating module 430, or another generating module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by generating, for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample, and
The SMM 110 and/or the processing module 401 and/or the forming module 440, or another forming module (not shown), may be configured for creating the respective set of statistical measures, when operating the data center 100 in the second configuration, by forming the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
A sample among the respective samples may relate to one or more metrics relating to the performance of the first service.
The SMM 110 and/or the processing module 401 and/or the applying module 450 may be configured for applying at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures.
Furthermore, the SMM 110 and/or the processing module 401 and/or the computing module 460 may be configured for, for each of said three non-parametric hypothesis tests, computing a respective score. The respective score is set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis and the respective score is set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores may comprise the respective score for each of said three non-parametric hypothesis tests.
Moreover, the SMM 110 and/or the processing module 401 and/or the calculating module 470 may be configured for calculating a total score as a weighted sum of the set of scores.
The SMM 110 and/or the processing module 401 and/or the concluding module 480 may be configured for concluding that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant.
The first hypothesis, such as a null hypothesis, may assume that the difference is statistically insignificant, and the second hypothesis, such as an alternative hypothesis, may assume that the difference is statistically significant.
Each of said three non-parametric hypothesis tests may be applied using a respective significance level or a common significance level.
The three non-parametric hypothesis tests may comprise:
a first test managing detection of a vertical difference between the first and second sets of statistical measures,
a second test managing a median difference between the first and second sets of statistical measures, and
a third test managing a tail difference between the first and second sets of statistical measures.
The data center 100, when operated in the second configuration, may host the second and third services. The first service is dependent on the second and third services in that the performance of the first service depends on the resource being operated by the first, second and third services.
Weights of the weighted sum may be equal to 1 divided by the number of non-parametric hypothesis tests, such as three. The threshold value may be 2/3.
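As an illustration only, and not part of the claimed method, the scoring scheme described above can be sketched in Python. The choice of concrete tests is an assumption on our part: the two-sample Kolmogorov-Smirnov test stands in for the "vertical difference" test, the Mann-Whitney U test for the median-difference test, and a second Kolmogorov-Smirnov test on the upper tails for the tail-difference test; the function name, the 90th-percentile tail cut and the significance level are likewise illustrative.

```python
import numpy as np
from scipy import stats

def total_score(first_set, second_set, alpha=0.05, threshold=2 / 3):
    """Score three non-parametric two-sample tests (0 = first hypothesis
    kept, 1 = rejected), weight each score by 1/3 and conclude that the
    difference is statistically significant when the weighted sum is
    greater than the threshold (2/3)."""
    first = np.asarray(first_set, dtype=float)
    second = np.asarray(second_set, dtype=float)

    # Assumed first test: two-sample Kolmogorov-Smirnov, sensitive to a
    # "vertical" (distributional) difference between the empirical CDFs.
    p_vertical = stats.ks_2samp(first, second).pvalue

    # Assumed second test: Mann-Whitney U, sensitive to a median shift.
    p_median = stats.mannwhitneyu(first, second,
                                  alternative="two-sided").pvalue

    # Assumed third test: compare only the upper tails, i.e. the values
    # above the 90th percentile of the pooled data.
    cut = np.percentile(np.concatenate([first, second]), 90)
    tail_1, tail_2 = first[first >= cut], second[second >= cut]
    p_tail = (stats.ks_2samp(tail_1, tail_2).pvalue
              if len(tail_1) > 0 and len(tail_2) > 0 else 1.0)

    # A test scores 1 when it rejects the first (null) hypothesis at
    # significance level alpha, 0 otherwise.
    scores = [1 if p < alpha else 0 for p in (p_vertical, p_median, p_tail)]
    weights = [1 / len(scores)] * len(scores)  # 1 / number of tests
    total = sum(w * s for w, s in zip(weights, scores))
    return total, total > threshold
```

Note that with equal weights of 1/3 and a threshold of 2/3, the total score exceeds the threshold only when all three tests reject, so a single spurious rejection never flags a configuration change on its own; other weight choices would trade this conservatism for sensitivity.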
As used herein, the term "module" may refer to one or more functional modules, each of which may be implemented as one or more hardware modules and/or one or more software modules and/or a combined software/hardware module in a node. In some examples, the module may represent a functional unit realized as software and/or hardware of the node. As used herein, the term "program carrier", or "carrier", may refer to one of an electronic signal, an optical signal, a radio signal, and a computer readable medium. In some examples, the program carrier may exclude transitory, propagating signals, such as the electronic, optical and/or radio signal. Thus, in these examples, the carrier may be a non-transitory carrier, such as a non-transitory computer readable medium.
As used herein, the term "processing module" may include one or more hardware modules, one or more software modules or a combination thereof. Any such module, be it a hardware, software or a combined hardware-software module, may be a determining means, estimating means, capturing means, associating means, comparing means, identification means, selecting means, receiving means, sending means or the like as disclosed herein. As an example, the expression "means" may be a module corresponding to the modules listed above in conjunction with the Figures.
As used herein, the term "software module" may refer to a software application, a Dynamic Link Library (DLL), a software component, a software object, an object according to the Component Object Model (COM), a software function, a software engine, an executable binary software file or the like.
As used herein, the term "processing circuit" may refer to a processing unit, a processor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or the like. The processing circuit or the like may comprise one or more processor kernels.
As used herein, the expression "configured to/for" may mean that a processing circuit is configured to, such as adapted to or operative to, by means of software configuration and/or hardware configuration, perform one or more of the actions described herein.
As used herein, the term "action" may refer to an action, a step, an operation, a response, a reaction, an activity or the like. It shall be noted that an action herein may be split into two or more sub-actions as applicable. Moreover, also as applicable, it shall be noted that two or more of the actions described herein may be merged into a single action.
As used herein, the term "memory" may refer to a hard disk, a magnetic storage medium, a portable computer diskette or disc, flash memory, random access memory (RAM) or the like. Furthermore, the term "memory" may refer to an internal register memory of a processor or the like. As used herein, the term "computer readable medium" may be a Universal Serial Bus (USB) memory, a DVD-disc, a Blu-ray disc, a software module that is received as a stream of data, a Flash memory, a hard drive, a memory card, such as a MemoryStick, a Multimedia Card (MMC), a Secure Digital (SD) card, etc. One or more of the aforementioned examples of computer readable medium may be provided as one or more computer program products.
As used herein, the term "computer readable code units" may be text of a computer program, parts of or an entire binary file representing a computer program in a compiled format or anything there between.
As used herein, the terms "number" and/or "value" may be any kind of digit, such as binary, real, imaginary or rational number or the like. Moreover, "number" and/or "value" may be one or more characters, such as a letter or a string of letters. "Number" and/or "value" may also be represented by a string of bits, i.e. zeros and/or ones.
As used herein, the term "set of" may refer to one or more of something. E.g. a set of devices may refer to one or more devices, a set of parameters may refer to one or more parameters or the like according to the embodiments herein.
As used herein, the expression "in some embodiments" has been used to indicate that the features of the embodiment described may be combined with any other embodiment disclosed herein.
Further, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. If used herein, the common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation. The common abbreviation "etc.", which derives from the Latin expression "et cetera" meaning "and other things" or "and so on" may have been used herein to indicate that further features, similar to the ones that have just been enumerated, exist.
Even though embodiments of the various aspects have been described, many different alterations, modifications and the like thereof will become apparent for those skilled in the art. The described embodiments are therefore not intended to limit the scope of the present disclosure.

Claims

1. A method, performed by a Sample Managing Module (110), for managing samples relating to performance of a first service hosted by a data center (100), wherein the data center (100) is operable in a first configuration and a second configuration, wherein the data center (100), when operated in the first configuration, hosts a second service, wherein the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services, wherein the data center (100), when operated in the second configuration, hosts a third service, wherein the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services, wherein the method comprises, for operation of the data center (100) in each of the first and second configurations:
creating (A010) a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center (100) is operated in the first and second configurations, respectively.
2. The method according to claim 1, wherein the respective set of statistical measures comprises a first set of statistical measures, when operating the data center (100) in the first configuration, and a second set of statistical measures, when operating the data center (100) in the second configuration.
3. The method according to claim 2,
wherein the respective number of observation periods comprises a first number of observation periods when operating the data center (100) in the first configuration and a second number of observation periods when operating the data center (100) in the second configuration,
wherein the respective samples comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center (100) in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center (100) in the second configuration,
wherein the respective statistical measure comprises a respective first configuration statistical measure, when operating the data center (100) in the first configuration, and a respective second configuration statistical measure, when operating the data center (100) in the second configuration,
wherein the creating of the respective set of statistical measures comprises, when operating the data center (100) in the first configuration:
obtaining (A020) the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods,
generating (A030), for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample,
forming (A040) the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures, and
wherein the creating of the respective set of statistical measures comprises, when operating the data center (100) in the second configuration:
obtaining (A050) the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods,
generating (A060), for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample, and
forming (A070) the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
4. The method according to any one of claims 1-3, wherein a sample among the respective samples relates to one or more metrics relating to the performance of the first service.
5. The method according to any one of claims 2-4, when dependent on claim 2, wherein the method comprises:
applying (A080) at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures,
for each of said three non-parametric hypothesis tests, computing (A090) a respective score, wherein the respective score is set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis and the respective score is set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores comprises the respective score for each of said three non-parametric hypothesis tests,
calculating (A100) a total score as a weighted sum of the set of scores, and concluding (A110) that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant.
6. The method according to claim 5, wherein the first hypothesis assumes that the difference is statistically insignificant, and the second hypothesis assumes that the difference is statistically significant.
7. The method according to any one of claims 5-6, wherein each of said three non-parametric hypothesis tests is applied using a respective significance level or a common significance level.
8. The method according to any one of claims 5-7, wherein the three non-parametric hypothesis tests comprise:
a first test managing detection of a vertical difference between the first and second sets of statistical measures,
a second test managing a median difference between the first and second sets of statistical measures, and a third test managing a tail difference between the first and second sets of statistical measures.
9. The method according to any one of the preceding claims, wherein the data center (100), when operated in the second configuration, hosts the second and third services, wherein the first service is dependent on the second and third services in that the performance of the first service depends on the resource being operated by the first, second and third services.
10. The method according to any one of claims 5-9, when dependent on claim 5, wherein weights of the weighted sum are equal to 1 divided by a number of non-parametric hypothesis tests.
11. The method according to any one of claims 5-10, wherein the threshold value is 2/3.
12. A Sample Managing Module (110) configured for managing samples relating to performance of a first service hosted by a data center (100), wherein the data center (100) is operable in a first configuration and a second configuration, wherein the data center (100), when operated in the first configuration, hosts a second service, wherein the first service is dependent on the second service in that the performance of the first service depends on a resource operated by the first and second services, wherein the data center (100), when operated in the second configuration, hosts a third service, wherein the first service is dependent on the third service in that the performance of the first service depends on the resource being operated by the first and third services, wherein the Sample Managing Module (110) is configured for, for operation of the data center (100) in each of the first and second configurations: creating a respective set of statistical measures, being independent and identically distributed within the respective set of statistical measures, by generating, from a respective number of observation periods, a respective statistical measure based on respective samples for said respective number of observation periods, wherein said respective samples relate to the performance of the first service when the data center (100) is operated in the first and second configurations, respectively.
13. The Sample Managing Module (110) according to claim 12, wherein the respective set of statistical measures comprises a first set of statistical measures, when operating the data center (100) in the first configuration, and a second set of statistical measures, when operating the data center (100) in the second configuration.
14. The Sample Managing Module (110) according to claim 13,
wherein the respective number of observation periods comprises a first number of observation periods when operating the data center (100) in the first configuration and a second number of observation periods when operating the data center (100) in the second configuration,
wherein the respective samples comprise a respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods, when operating the data center (100) in the first configuration, and a respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods, when operating the data center (100) in the second configuration,
wherein the respective statistical measure comprises a respective first configuration statistical measure, when operating the data center (100) in the first configuration, and a respective second configuration statistical measure, when operating the data center (100) in the second configuration,
wherein the Sample Managing Module (110) is configured for creating the respective set of statistical measures, when operating the data center (100) in the first configuration, by:
obtaining the respective first configuration sample relating to the performance of the first service for each one of the first number of observation periods,
generating, for each observation period of the first number of observation periods, the respective first configuration statistical measure based on the respective first configuration sample,
forming the first set of statistical measures by including the respective first configuration statistical measure for said each observation period of the first number of observation periods, wherein each respective first configuration statistical measure is independent and identically distributed within the first set of statistical measures, and wherein the Sample Managing Module (110) is configured for creating the respective set of statistical measures, when operating the data center (100) in the second configuration, by:
obtaining the respective second configuration sample relating to the performance of the first service for each one of the second number of observation periods,
generating, for each observation period of the second number of observation periods, the respective second configuration statistical measure based on the respective second configuration sample, and
forming the second set of statistical measures by including the respective second configuration statistical measure for said each observation period of the second number of observation periods, wherein each respective second configuration statistical measure is independent and identically distributed within the second set of statistical measures.
15. The Sample Managing Module (110) according to any one of claims 12-14, wherein a sample among the respective samples relates to one or more metrics relating to the performance of the first service.
16. The Sample Managing Module (110) according to any one of claims 13-15, when dependent on claim 13, wherein the Sample Managing Module (110) is configured for:
applying at least three non-parametric hypothesis tests for evaluating a first hypothesis and a second hypothesis concerning the first and second sets of statistical measures,
for each of said three non-parametric hypothesis tests, computing a respective score, wherein the respective score is set to zero when said each of said three non-parametric hypothesis tests does not reject the first hypothesis and the respective score is set to one otherwise to indicate acceptance of the second hypothesis, wherein a set of scores comprises the respective score for each of said three non-parametric hypothesis tests,
calculating a total score as a weighted sum of the set of scores, and concluding that a difference between the first and second sets of statistical measures is statistically significant when the total score is greater than a threshold value for identifying the difference as statistically significant.
17. The Sample Managing Module (110) according to claim 16, wherein the first hypothesis assumes that the difference is statistically insignificant, and the second hypothesis assumes that the difference is statistically significant.
18. The Sample Managing Module (110) according to any one of claims 16-17, wherein each of said three non-parametric hypothesis tests is applied using a respective significance level or a common significance level.
19. The Sample Managing Module (110) according to any one of claims 16-18, wherein the three non-parametric hypothesis tests comprise:
a first test managing detection of a vertical difference between the first and second sets of statistical measures,
a second test managing a median difference between the first and second sets of statistical measures, and
a third test managing a tail difference between the first and second sets of statistical measures.
20. The Sample Managing Module (110) according to any one of claims 12-19, wherein the data center (100), when operated in the second configuration, hosts the second and third services, wherein the first service is dependent on the second and third services in that the performance of the first service depends on the resource being operated by the first, second and third services.
21. The Sample Managing Module (110) according to any one of claims 16-20, when dependent on claim 16, wherein weights of the weighted sum are equal to 1 divided by a number of non-parametric hypothesis tests, such as three.
22. The Sample Managing Module (110) according to any one of claims 16-21, wherein the threshold value is 2/3.
23. A computer program (403), comprising computer readable code units which, when executed on a Sample Managing Module (110), cause the Sample Managing Module (110) to perform the method according to any one of claims 1-11.
24. A carrier (405) comprising the computer program according to the preceding claim, wherein the carrier (405) is one of an electronic signal, an optical signal, a radio signal and a computer readable medium.
PCT/SE2016/050232 2016-03-21 2016-03-21 Method and module for managing samples relating to performance of a first service hosted by a data center WO2017164779A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2016/050232 WO2017164779A1 (en) 2016-03-21 2016-03-21 Method and module for managing samples relating to performance of a first service hosted by a data center


Publications (1)

Publication Number Publication Date
WO2017164779A1 (en) 2017-09-28

Family

ID=55858875



Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015130199A1 (en) * 2014-02-25 2015-09-03 Telefonaktiebolaget L M Ericsson (Publ) Method, computer program and node for management of resources


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ASMUSSEN, S: "Applied Probability andQueues", 2003, SPRINGER, NEW YORK
SHARMA BIKASH ET AL: "CloudPD: Problem determination and diagnosis in shared dynamic clouds", 2013 43RD ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), IEEE, 24 June 2013 (2013-06-24), pages 1 - 12, XP032473580, ISSN: 1530-0889, ISBN: 978-1-4673-6471-3, [retrieved on 20130806], DOI: 10.1109/DSN.2013.6575298 *
SIDATH HANDURUKANDE ET AL: "Magneto approach to QoS monitoring", INTEGRATED NETWORK MANAGEMENT (IM), 2011 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON, IEEE, 23 May 2011 (2011-05-23), pages 209 - 216, XP032035481, ISBN: 978-1-4244-9219-0, DOI: 10.1109/INM.2011.5990693 *
YANGGRATOKE RERNGVIT ET AL: "Predicting service metrics for cluster-based services using real-time analytics", 2015 11TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), IFIP, 9 November 2015 (2015-11-09), pages 135 - 143, XP032838991, DOI: 10.1109/CNSM.2015.7367349 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861287A (en) * 2021-03-05 2021-05-28 重庆大学 Robot lightweight effect evaluation method
CN112861287B (en) * 2021-03-05 2022-09-16 重庆大学 Robot lightweight effect evaluation method


Legal Events

Date Code Title Description
NENP Non-entry into the national phase — Ref country code: DE
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 16719151; Country of ref document: EP; Kind code of ref document: A1
122 Ep: pct application non-entry in european phase — Ref document number: 16719151; Country of ref document: EP; Kind code of ref document: A1