WO2019028648A1 - Processing performance data for machine learning - Google Patents

Processing performance data for machine learning

Info

Publication number
WO2019028648A1
WO2019028648A1 PCT/CN2017/096358 CN2017096358W WO2019028648A1 WO 2019028648 A1 WO2019028648 A1 WO 2019028648A1 CN 2017096358 W CN2017096358 W CN 2017096358W WO 2019028648 A1 WO2019028648 A1 WO 2019028648A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
interpolation
performance data
system performance
performance
Prior art date
Application number
PCT/CN2017/096358
Other languages
French (fr)
Inventor
Chow Kingsum
Wanyi ZHU
Chengdong LI
Puyuan WU
Original Assignee
Alibaba Group Holding Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to PCT/CN2017/096358
Publication of WO2019028648A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems

Definitions

  • Cloud service providers and data centers usually process large amounts of computer system performance data to improve data center operating efficiency, for example in software application migration and capacity planning.
  • Approaches to implement trained machine learning to solve large scale performance optimization include manual examination of relations among variables.
  • FIG. 1 illustrates an example system for processing performance data of a computing system.
  • FIG. 2 illustrates an example operation environment of a system of FIG. 1 for processing performance data of a computing system.
  • FIG. 3 illustrates an example operation process of the system of FIG. 1.
  • FIG. 4 illustrates another example operation process of the system of FIG. 1.
  • FIG. 5 illustrates another example operation process of the system of FIG. 1.
  • System 10 may include one or more memories 100 that store computer-executable instructions which, when executed by one or more processing units, configure the processing units and the related computing device(s) to implement a data receiving unit 110, a naming convention unit 120, a data enhancing unit 130, a unification unit 140 and a machine learning unit 150.
  • System 10 may also include one or more processing units (PU) 160, interfacing units 170, communication units 180 and/or other components 190.
  • Although system 10 and the units thereof, e.g., data receiving unit 110, naming convention unit 120, data enhancing unit 130, unification unit 140 and machine learning unit 150, are illustrated as a single system, this does not necessarily mean that all components of system 10 are physically or functionally located within a single computing system. All the units illustrated in system 10 of FIG. 1 may be located on separate computing systems that communicate and function together in a distributed computing environment.
  • FIG. 2 illustrates an example operation environment 200 of system 10. Note that FIG. 2 illustrates operation environment 200 in a distributed computing scheme, which is not necessary for the implementation. Some or all of the communication shown in FIG. 2 may be implemented through data bus (i.e., local communication) within a single computing system.
  • units of system 10 may communicate with one another through a network 210.
  • One or more units of system 10 may further communicate with data source (s) 230 (230-1 to 230-N) and/or target computing system (s) 240 (240-1 to 240-M) through a network 220.
  • Each data source 230 may acquire/collect performance data of one or more target computing systems 240 in one or more system performance domains.
  • a system performance domain may be any domain that is relevant to the performance of a target computing system 240 and/or relevant to an analysis thereof.
  • a system performance domain may include a CPU usage, throughput rate, system bandwidth, system response time, etc.
  • System 10 and/or components thereof, e.g., data receiving unit 110, may be configured to communicate with data source 230 to receive the system performance data acquired and/or processed by data source 230.
  • data source 230 may engage different mechanisms to acquire data and/or process acquired data of target computing system (s) 240 because the data from different data sources may include different formats such as attribute name and measurement units.
  • the sampling frequency of data sources 230 may also vary, which is reflected in the data entry points of the dataset acquired.
  • the acquired data may be initially processed locally in different data sources 230 in a distributed computing scheme within system 10.
  • the locally and initially processed system performance data may be received by system 10, through data communication via network 220, for further processing.
  • Such a tiered processing of the acquired system performance data in the distributed computing environment may save communication bandwidth, improve capacity allocation, and enable architecture flexibility and adaptability, which are all technical advantages.
  • Although FIG. 2 illustrates network 210 and network 220 with different reference numerals, this does not necessarily mean that network 210 and network 220 are two separate network systems.
  • Each of networks 210 and 220 may include one or more network systems for data communication, which may overlap and/or be the same physical network systems.
  • data receiving unit 110 may be generally configured to receive system performance data from data sources 230.
  • Data source selection unit 112 may be configured to select multiple data sources 230 from the pool of data sources 230 from which to receive performance data. Any criteria may be used in the selection of data sources 230 and all are included in the disclosure. For example, selection criteria may be based partially on the quality of the performance data acquired by a data source 230. As another example, selection criteria may be based on the data acquisition approach used by a data source 230 in acquiring performance data of target computing system(s) 240. As a further example, selection criteria may be based on whether the performance data from the multiple data sources 230 are sufficiently consistent in some parameters, e.g., sampling frequency in data acquisition, for further processing. In an example implementation, the selection may be made based partially on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with the performance data collected by data sources 230.
  • Naming convention unit 120 may be generally configured to generate an attribute name and a measurement unit for a system performance domain and convert the received performance data in the performance domain using the standardized attribute name and measurement unit.
  • attribute name generation unit 122 is configured to generate the attribute name and measurement unit
  • converting unit 126 is configured to perform the conversion of received performance data using the generated attribute name and measurement unit (s) .
  • attribute name generation unit 122 may generate a unified naming convention for a system performance domain, which may include attribute name and measurement unit. Therefore, system performance data in the system performance domain of, e.g., network bandwidth, that are received from different data sources 230 may be converted under the unified naming convention including the attribute name and the measurement unit.
  • a name convention may be:
  • an example name convention may be network_bandwidth_kB_s, which indicates that the units of measurements for network bandwidth are kilobytes per second.
  • Measurement unit generation unit 124 of attribute name generation unit 122 may be configured to generate a specific measurement unit for a scenario.
  • the measurement unit may also be packets per second and the name convention may be network_bandwidth_pack_s.
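
The naming convention described above (an attribute name that carries its measurement unit) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the record layout, and the factor table are assumptions, and 1 kB is taken to be 1000 bytes.

```python
# Illustrative only: the factor table, record layout and function names are
# assumptions, not part of the source. 1 kB is taken to be 1000 bytes here.
TO_KB_PER_S = {"B_s": 0.001, "kB_s": 1.0, "MB_s": 1000.0}


def attribute_name(domain: str, unit_suffix: str) -> str:
    """Join a performance domain and a unit suffix into one attribute name."""
    return f"{domain}_{unit_suffix}"


def convert_to_kb_per_s(record: dict, domain: str = "network_bandwidth") -> dict:
    """Rename and rescale one raw record to the unified kB/s convention."""
    factor = TO_KB_PER_S[record["unit"]]
    return {
        "timestamp": record["timestamp"],
        attribute_name(domain, "kB_s"): record["value"] * factor,
    }


# A source that reports megabytes per second is converted to kB/s.
raw = {"timestamp": 0, "value": 2.0, "unit": "MB_s"}
converted = convert_to_kb_per_s(raw)
```

Records from different data sources then share one key, `network_bandwidth_kB_s`, regardless of the unit each source reported.
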
  • Measurement unit generation unit 124 may also be configured to generate a measurement unit used for data unification. For example, still with the illustrative example of network bandwidth, in a case that data receiving unit 110 receives network bandwidth data in both kilobytes per second format and packet per second format, a new measurement unit of kilobytes per packet may be generated and consequently a new name convention of:
  • Data enhancing unit 130 may be generally configured to enhance the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
  • mapping unit 132 may be configured to map the converted system performance data (e.g., converted by converting unit 126) with a unified time scale.
  • the performance data originally acquired by data source 230 may include different time scales due to the different acquisition procedures/approaches, e.g., different data sampling frequency/interval, sampling points mismatch, etc. Conversion of the performance data using the attribute name and measurement units may not completely resolve the issue of inconsistent time scales among performance data in the same performance domain.
  • the converted data may be mapped with a unified time scale.
  • Data entries that do not fit into the unified time scale may be removed from further operations (but may not need to be deleted and may be maintained for other analytical purposes) .
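
A minimal sketch of the mapping step, assuming integer timestamps (e.g., seconds) and a regularly spaced unified time scale. The function name and the data layout are hypothetical. Entries that do not fall on the scale are set aside rather than deleted, as the text describes.

```python
def map_to_time_scale(samples, start, step, count):
    """Map (timestamp, value) samples onto a regular time scale.

    Returns (mapped, unmapped): `mapped` has one entry per point of the
    scale, with None marking a missing value; `unmapped` keeps the samples
    that do not fit the scale so they remain available for other analysis.
    """
    grid = {start + i * step: None for i in range(count)}
    unmapped = []
    for timestamp, value in samples:
        if timestamp in grid:
            grid[timestamp] = value
        else:
            unmapped.append((timestamp, value))
    return list(grid.items()), unmapped


samples = [(0, 1.0), (7, 2.0), (10, 3.0)]
mapped, unmapped = map_to_time_scale(samples, start=0, step=5, count=3)
```

Here the sample at timestamp 7 does not fit the 5-unit scale and is returned separately, while the scale point 5 is left as a missing value for the later interpolation operation.
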
  • Interpolation scheme selection unit 134 may be configured to select an interpolation scheme based on different performance domains. For performance data in different performance domains, different interpolation schemes may be selected to determine an estimated value to impute for a missing value.
  • the interpolation scheme may include a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  • a cubic spline interpolation scheme may be selected as a default interpolation scheme.
  • a cubic spline interpolation scheme generally tends to be more stable than polynomial interpolation and is less prone to wild oscillations between the tabulated points.
  • an arithmetic mean substitution interpolation scheme may be selected for an interpolation operation.
  • a Kalman filter interpolation scheme may be selected for the interpolation operation.
  • a Zero interpolation scheme may be selected for the interpolation operation.
  • interpolation scheme selection unit 134 may select multiple interpolation schemes as candidates for a performance domain. The selected multiple candidates may be used in an interpolation operation with an order of priority.
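
Domain-based scheme selection with an order of priority might look like the sketch below. The pairing of CPU frequency data with arithmetic mean substitution and of event log data with zero interpolation follows the clauses of this publication; the table layout, the secondary candidate for CPU frequency, and all names are assumptions. Only the two simplest schemes are implemented here.

```python
def zero_interpolation(values):
    """Replace each missing value (None) with zero."""
    return [0.0 if v is None else v for v in values]


def mean_substitution(values):
    """Replace each missing value (None) with the mean of observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average; leave the gaps in place
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Candidate schemes per performance domain type, highest priority first.
# The second candidate for "cpu_frequency" is an assumption for illustration.
CANDIDATES = {
    "cpu_frequency": ["arithmetic_mean", "zero"],
    "event_log": ["zero"],
}
DEFAULT_CANDIDATES = ["cubic_spline"]  # named as the default in the text


def select_schemes(domain_type):
    """Return the ordered candidate scheme names for a domain type."""
    return CANDIDATES.get(domain_type, DEFAULT_CANDIDATES)
```

For example, `select_schemes("cpu_frequency")[0]` is `"arithmetic_mean"`, and `mean_substitution([2.0, None, 4.0])` fills the gap with 3.0.
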
  • Interpolation operation unit 136 may be configured to perform an interpolation operation to obtain an estimated value for a missing value using an interpolation scheme (s) selected by the interpolation scheme selection unit 134. In a case that multiple interpolation schemes are selected, interpolation operation unit 136 may first use an interpolation scheme that has a higher priority in the order.
  • Confidence interval determination unit 138 may be configured to determine whether a value estimated by the operation of interpolation operation unit 136 is within a confidence interval.
  • the confidence interval may be determined dynamically by confidence interval determination unit 138 for a performance domain and/or a performance data of the performance domain, or may be received, e.g., through user inputs.
  • Unification unit 140 may be configured to merge converted performance data in one or more system performance domains with a unified time scale to generate a unified dataset.
  • unified time scale generation unit 142 may be configured to generate a unified time scale for one or more performance domains.
  • the unified time scale (s) may be used for the generation of a unified dataset and/or may be used for other operations.
  • a unified time scale generated by unified time scale generation unit 142 may be used in the operation of mapping unit 132 in mapping the converted performance data.
  • unified time scale generation unit 142 may generate a single unified time scale for all the performance data enhanced and merged into the unified dataset. That is, the same unified time scale may be used in the mapping operation of mapping unit 132 and in the unification operation. In another example, one or more time scales are generated separately for the data enhancing operation(s) and another time scale is generated separately for the dataset unification operation.
  • Unified dataset generation unit 144 may be configured to merge multiple converted performance data for one or more system performance domains to generate a unified dataset using a unified time scale. In the merging, data items (values) that do not map into the unified time scale may not enter the unified dataset, but may be kept separately for potential other analytical uses.
  • unified dataset generation unit 144 may be configured to merge the converted performance data after the enhancing operation by data enhancing unit 130.
  • unification unit 140 may be configured to merge converted performance data without the enhancing operation (s) .
  • Machine learning unit 150 may be configured to learn a target computing system using the unified dataset (s) , wherein analysis unit 152 may be configured to analyze/cause to analyze the unified dataset to determine the performance status of the target computing system, and control unit 154 may be configured to control/cause to control the target computing system 240 based on a result of the analysis. For example, based on a result of the analysis, a configuration of the target computing system 240, e.g., operation parameters and/or computing capacity allocation, may be adjusted.
  • data source selection unit 112 may select multiple data sources 230 from the pool of data sources 230 from which to receive performance data. Any criteria may be used in the selection of data sources 230 and all are included in the disclosure. In an example implementation, the selection may be made based on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with the performance data collected by data sources 230.
  • data receiving unit 110 may receive performance data of a target computing system (s) 240 from the selected multiple data sources 230. Any approach may be adopted in the data receiving and all are included in the disclosure. Although FIG. 1 illustrates that data source 230 are separate from data receiving unit 110, the disclosure is not limited to such a specific example. Data receiving unit 110 may physically reside together with and/or function together with one or more data source (s) 230. As such, the receiving performance data from multiple data sources 230 includes the scenario (s) that data receiving unit 110 causes/controls data source 230 to acquire system performance data from target computing system (s) 240.
  • attribute name generation unit 122 may generate an attribute name for a system performance domain.
  • measurement unit generation unit 124 may generate a measurement unit for the system performance domain associated with the generated attribute name.
  • attribute name and measurement unit are generated together, e.g., an attribute name includes measurement unit.
  • an example attribute name may be network_bandwidth_pack_s, wherein “network bandwidth” is attribute name and “packet per second” is the measurement unit. Accordingly, operations 330 and 340 may be one operation and/or may be operated together.
  • a measurement unit may be determined separately from the attribute name for different scenarios.
  • an attribute name may be relatively stable for a system performance domain, but the measurement unit may be updated based on different scenarios of, e.g., different performance analysis requirements, different data source 230, etc.
  • measurement unit generation unit 124 may select a measurement unit that tends to minimize data value missing in the following data conversion for multiple data sources. For example, measurement unit generation unit 124 may evaluate the sampling interval/frequency adopted by each of the selected data sources 230 in collecting performance data of a target computing system (s) 240 and select a measurement unit that tends to minimize the detrimental effect (s) of different sampling frequencies/intervals of different data sources 230 in the subsequent conversion operation (s) . Measurement unit generation unit 124 may also evaluate the mismatching in time scales among system performance data obtained from different data sources 230, and may select a measurement unit that tends to minimize the detrimental effect of such mismatched time scales in the subsequent conversion operations.
  • measurement unit generation unit 124 may receive, e.g., from unified time scale generation unit 142, a unified time scale configured to be used for one or more of the data enhancing operation or the data unification operation. Measurement unit generation unit 124 may select a measurement unit based on the unified time scale to, e.g., minimize missing values.
  • converting unit 126 may convert the received system performance data from multiple data sources 230 using the determined attribute name and measurement unit.
  • data enhancing unit 130 may enhance the converted performance data with an interpolation operation using an interpolation scheme based on a type of the performance domain.
  • the interpolation scheme may include a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  • a cubic spline interpolation scheme may be selected as a default imputation scheme.
  • a cubic spline interpolation scheme generally tends to be more stable than polynomial interpolation and is less prone to wild oscillations between the tabulated points.
  • an arithmetic mean substitution interpolation scheme may be selected for an interpolation operation.
  • a Kalman filter interpolation scheme may be selected for the interpolation operation.
  • a Zero interpolation scheme may be selected for the interpolation operation.
  • interpolation scheme selection unit 134 may select multiple interpolation schemes as candidates for a performance domain.
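
The text names a Kalman filter interpolation scheme without specifying a model. One simple possibility, shown purely as an assumption, is a one-dimensional filter with a random-walk state model: observed values update the state estimate, and a missing value is imputed with the predicted state. The noise variances and the function name are illustrative choices.

```python
def kalman_impute(values, process_var=1e-2, measurement_var=1e-1):
    """Fill missing values (None) using a 1-D random-walk Kalman filter.

    Observed values are kept as-is and used to update the state estimate;
    each missing value is replaced by the predicted state at that step.
    Leading gaps before the first observation are left as None.
    """
    estimate = None
    variance = measurement_var
    filled = []
    for z in values:
        if estimate is None:
            if z is not None:
                estimate = z  # initialise the state from the first observation
            filled.append(z)
            continue
        # Predict: the state is unchanged, its uncertainty grows.
        variance += process_var
        if z is None:
            filled.append(estimate)  # impute with the predicted state
        else:
            # Update: blend prediction and observation by the Kalman gain.
            gain = variance / (variance + measurement_var)
            estimate += gain * (z - estimate)
            variance *= 1.0 - gain
            filled.append(z)
    return filled
```

With a constant signal such as `[10.0, None, 10.0]` the gap is filled with 10.0; with a changing signal the imputed value lies between the earlier observations, weighted toward the more recent ones.
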
  • FIG. 4 illustrates an example operation flow 400 including example details of operation 360 of FIG. 3.
  • mapping unit 132 of data enhancing unit 130 may map the converted system performance data (e.g., converted by converting unit 126) with a unified time scale.
  • the unified time scale may be received from unified time scale generation unit 142 and/or may be received from other sources, e.g., through a user input.
  • data entries that do not fit into the unified time scale may be removed from the further enhancing operations (but may not need to be deleted) .
  • interpolation scheme selection unit 134 may select an interpolation scheme based on a type of the performance domain. For performance data in different performance domains, different interpolation schemes may be selected for estimating a missing value.
  • confidence interval determination unit 138 may determine a confidence interval for an estimated value to impute into the converted performance data to replace a missing value for a system performance domain.
  • the confidence interval may be determined dynamically by confidence interval determination unit 138 for a performance domain and/or a performance data of the performance domain, or may be received, e.g., through user inputs.
  • interpolation operation unit 136 may perform an interpolation operation to obtain an estimated value for a missing value using an interpolation scheme (s) selected by the interpolation scheme selection unit 134.
  • different interpolation schemes may be used in the interpolation operation, corresponding to the interpolation scheme selection operation of operation 410.
  • confidence interval determination unit 138 may determine whether an estimated value determined by the operation of interpolation operation unit 136 is within the confidence interval. If an estimated value is within the confidence interval, the operation flow proceeds to example operation 460, where the estimated value will be imputed into the system performance data to substitute for a missing value. If the estimated value is not within the confidence interval, the operation flow may revert back to example operation 420, where another interpolation scheme may be selected for the interpolation operation.
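
The check-and-retry flow described above can be sketched as a loop over candidate schemes in priority order. The helper names and the estimator signature are hypothetical, and the behaviour when every candidate falls outside the interval (leaving the value missing) is an assumption, since the text does not state it.

```python
def zero_estimate(values, index):
    """Zero interpolation: the estimate is always zero."""
    return 0.0


def mean_estimate(values, index):
    """Arithmetic mean substitution over the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    return sum(observed) / len(observed)


def impute_with_fallback(values, index, schemes, interval):
    """Try each scheme in priority order until the estimate for
    values[index] falls inside the confidence interval.

    Returns (new_values, scheme_name); scheme_name is None and the value
    stays missing if no candidate produces an acceptable estimate.
    """
    low, high = interval
    for scheme in schemes:
        estimate = scheme(values, index)
        if low <= estimate <= high:
            result = list(values)
            result[index] = estimate  # impute the accepted estimate
            return result, scheme.__name__
    return list(values), None


data = [4.0, None, 6.0]
filled, used = impute_with_fallback(
    data, 1, [zero_estimate, mean_estimate], interval=(3.0, 7.0)
)
```

In this example the zero estimate is rejected because it lies outside (3.0, 7.0), so the flow falls back to mean substitution and imputes 5.0.
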
  • unification unit 140 may merge converted performance data in one or more system performance domains with a unified time scale to generate a unified dataset.
  • FIG. 5 illustrates an example operation flow 500 of example details of operation 370.
  • unified time scale generation unit 142 generates a unified time scale for one or more performance domains.
  • unified time scale generation unit 142 may generate a single unified time scale for all the performance data enhanced and to be merged into a unified dataset.
  • unified time scale generation unit 142 may select a time scale that minimizes data value missing in the mapping of data (converted data and/or enhanced data) with the unified time scale.
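
One way to read "a time scale that minimizes data value missing" is to count, for each candidate sampling step, how many points of the resulting scale have no sample in each source, and to pick the step with the lowest total. The sketch below does exactly that under the assumption of integer timestamps; the function names and the candidate list are hypothetical.

```python
def count_missing(timestamps, start, step, end):
    """Count points of the scale [start, end] with no matching sample."""
    observed = set(timestamps)
    return sum(1 for t in range(start, end + 1, step) if t not in observed)


def select_time_scale(sources, candidate_steps, start, end):
    """Pick the candidate step with the fewest missing values overall."""
    def total_missing(step):
        return sum(count_missing(ts, start, step, end) for ts in sources)
    return min(candidate_steps, key=total_missing)


source_a = list(range(0, 61, 10))  # one sample every 10 time units
source_b = list(range(0, 61, 20))  # one sample every 20 time units
best_step = select_time_scale([source_a, source_b], [5, 10, 20], 0, 60)
```

A raw missing-value count favours coarser scales, which also exclude more of the finer-grained samples, so a practical selection would probably balance the two effects; the source does not say how.
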
  • unified dataset generation unit 144 may merge multiple converted performance data for one or more system performance domains using the unified time scale to generate a unified dataset.
  • data items (values) that do not map into the unified time scale may not enter the unified dataset, but may be kept separately for potential other uses.
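
A minimal sketch of the merge, assuming each converted series is a mapping from timestamp to value keyed by its unified attribute name. The attribute name `cpu_usage_pct`, the row layout, and the function name are hypothetical. Values off the unified time scale are returned separately instead of being discarded.

```python
def merge_unified(series_by_attribute, time_scale):
    """Merge per-attribute series into rows aligned on the time scale.

    Returns (unified, leftover): `unified` has one row per point of the
    time scale, with None where an attribute has no value; `leftover`
    holds the values whose timestamps are not on the time scale.
    """
    on_scale = set(time_scale)
    unified = []
    for t in time_scale:
        row = {"timestamp": t}
        for name, series in series_by_attribute.items():
            row[name] = series.get(t)
        unified.append(row)
    leftover = {}
    for name, series in series_by_attribute.items():
        extra = {t: v for t, v in series.items() if t not in on_scale}
        if extra:
            leftover[name] = extra
    return unified, leftover


series = {
    "cpu_usage_pct": {0: 10.0, 5: 20.0, 7: 99.0},
    "network_bandwidth_kB_s": {0: 1.0},
}
unified, leftover = merge_unified(series, time_scale=[0, 5])
```

The resulting rows share one time axis across domains, which is the shape a downstream analysis or machine learning step would typically expect.
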
  • unified dataset generation unit 144 may merge the converted performance data after the enhancing operation by data enhancing unit 130.
  • unification unit 140 may merge converted performance data without the enhancing operation.
  • analysis unit 152 may analyze/cause to analyze the unified dataset to determine the performance status of the target computing system. Any existing and/or future developed performance analysis may be used and all are included in the disclosure.
  • control unit 154 may control/cause to control the target computing system 240 based on a result of the analysis. For example, based on a result of the analysis, a configuration of an associated target computing system 240, e.g., operation parameters and/or computing resource allocation, may be adjusted.
  • FIGS. 3-5 can be implemented in hardware, software, or a combination thereof.
  • the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations.
  • computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • hardware components perform one or more of the operations.
  • Such hardware components may include or be incorporated into processors, application-specific integrated circuits (ASICs) , programmable circuits such as field programmable gate arrays (FPGAs) , or in other ways.
  • the memory may include computer readable media such as a volatile memory, a Random Access Memory (RAM) , and/or non-volatile memory, e.g., Read-Only Memory (ROM) or flash RAM, and so on.
  • the memory is an example of a computer readable medium.
  • Computer readable media include non-volatile, volatile, mobile and non-mobile media, and can implement information storage through any method or technology.
  • the information may be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, Phase-change RAMs (PRAMs), Static RAMs (SRAMs), Dynamic RAMs (DRAMs), other types of RAMs, ROMs, Electrically Erasable Programmable Read-Only Memories (EEPROMs), flash memories or other memory technologies, Compact Disk Read-Only Memories (CD-ROMs), Digital Versatile Discs (DVDs) or other optical memories, cassettes, cassette and disk memories or other magnetic memory devices, or any other non-transmission media that can be used for storing information accessible to the computation device.
  • the computer readable media exclude transitory media, such as modulated data signals and carriers.
  • the terms "include", "comprise", or any variants thereof are intended to cover a non-exclusive inclusion, such that a process, a method, a product, or a device that includes a series of elements not only includes such elements but also includes other elements not specified expressly, or may further include inherent elements of the process, method, product, or device. In the absence of more restrictions, an element limited by "include a/an" does not exclude other same elements existing in the process, method, product, or device that includes the element.
  • Clause 1 a computer implemented method, comprising: receiving performance data of a computing system in a system performance domain from multiple sources; generating an attribute name and a measurement unit for the system performance domain; converting the performance data received from the sources using the attribute name and the measurement unit; and enhancing the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
  • Clause 2 the method of clause 1, wherein the generating the measurement unit for the system performance domain includes evaluating a sampling interval in collecting performance data from each of the multiple sources.
  • Clause 3 the method of clause 1, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes missing data values in the converting.
  • Clause 4 the method of clause 1, wherein the generating the measurement unit for the system performance domain includes evaluating mismatching in time scales among system performance data obtained from two or more different sources.
  • Clause 5 the method of clause 4, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes the mismatching in time scales among the system performance data from the two or more different sources.
  • Clause 6 the method of clause 1, further comprising selecting the multiple sources from a pool of system performance data sources, the selecting being made based at least partially on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with performance data collected in each system performance data source in the pool.
  • Clause 7 the method of clause 1, wherein the receiving the performance data includes receiving the performance data on multiple system performance domains, and the enhancing the converted performance data with the interpolation operation includes using a first interpolation scheme for converted performance data in a first system performance domain and a second different interpolation scheme for converted performance data in a second different system performance domain.
  • Clause 8 the method of clause 1, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  • Clause 9 the method of clause 1, on a condition that the system performance domain includes CPU frequency data, an arithmetic mean substitution interpolation scheme is used for the interpolation operation.
  • Clause 10 the method of clause 1, on a condition that the system performance domain includes instantaneous performance data, a Kalman filter interpolation scheme is used for the interpolation operation.
  • Clause 11 the method of clause 1, on a condition that the system performance domain includes event log data, a zero interpolation scheme is used for the interpolation operation.
  • Clause 12 the method of clause 1, further comprising generating a unified time scale for one or more system performance domains.
  • Clause 13 the method of clause 12, further comprising merging converted performance data in the one or more system performance domains with the unified time scale to generate a unified dataset.
  • Clause 14 the method of clause 13, further comprising analyzing a performance of the computing system using the unified dataset.
  • Clause 15 the method of clause 1, wherein the enhancing the converted performance data with the interpolation operation includes using an interpolation scheme to impute a missing value of the converted performance data with an estimated value within a confidence interval.
  • Clause 16 the method of clause 1, wherein the enhancing the converted performance data with the interpolation operation includes mapping the converted performance data with a unified time scale.
  • Clause 17 a method, comprising: receiving multiple datasets; providing a unified time scale; mapping the multiple datasets with the unified time scale; enhancing a mapped dataset in the multiple datasets with an interpolation scheme based on a type of the dataset to generate an enhanced dataset; and merging the mapped multiple datasets including the enhanced dataset into a unified dataset.
  • Clause 18 the method of clause 17, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  • Clause 19 the method of clause 1, wherein the providing the unified time scale includes selecting a time scale that minimizes data value missing in the mapping.
  • a computing device comprising: one or more processors; memory; and a plurality of programming instructions stored on the memory and executable by the one or more processors to implement: a data receiving unit operable to receive performance data of a computing system in a system performance domain from multiple sources; an attribute name generation unit operable to generate an attribute name and a measurement unit for the system performance domain; a converting unit operable to convert the performance data obtained from the multiple sources using the attribute name and the measurement unit; and a data enhancing unit operable to enhance the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Methods and computing devices for processing system performance data for machine learning are provided. System performance data in system performance domains are received and converted under naming conventions each including an attribute name and a measurement unit. The converted system performance data are enhanced using interpolation and merged into a unified dataset with a unified time scale.

Description

PROCESSING PERFORMANCE DATA FOR MACHINE LEARNING
BACKGROUND
Cloud service providers and data centers usually process large amounts of computer system performance data to improve data center operation efficiency in tasks such as software application migration and capacity planning. Approaches to applying trained machine learning models to large-scale performance optimization include manual examination of relations among variables.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
FIG. 1 illustrates an example system for processing performance data of a computing system.
FIG. 2 illustrates an example operation environment of a system of FIG. 1 for processing performance data of a computing system.
FIG. 3 illustrates an example operation process of the system of FIG. 1.
FIG. 4 illustrates another example operation process of the system of FIG. 1.
FIG. 5 illustrates another example operation process of the system of FIG. 1.
DETAILED DESCRIPTION
The disclosure provides a solution to process performance data of a computing system. In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific configurations or examples, in which like numerals represent like elements throughout the several figures.
1. Example Devices
Referring to FIG. 1, an example system 10 for processing performance data of a computing system is illustrated. As shown in FIG. 1, system 10 may include one or more memories 100 which store computer executable instructions, which when executed by one or more processing units, configure the processing units and the related computing device(s) to implement a data receiving unit 110, a naming convention unit 120, a data enhancing unit 130, a unification unit 140 and a machine learning unit 150.
Data receiving unit 110 may further include a data source selection unit 112; naming convention unit 120 may further include an attribute name generation unit 122 (including a measurement unit generation unit 124) and a converting unit 126; data enhancing unit 130 may further include a mapping unit 132, an interpolation scheme selection unit 134, an interpolation operation unit 136, and a confidence interval determination unit 138; unification unit 140 may further include a unified time scale generation unit 142 and a unified dataset generation unit 144; and machine learning unit 150 may further include an analysis unit 152 and a control unit 154.
System 10 may also include one or more processing units (PU) 160, interfacing units 170, communication units 180 and/or other components 190.
It should be appreciated that although system 10 and the units thereof, e.g., data receiving unit 110, naming convention unit 120, data enhancing unit 130, unification unit 140 and machine learning unit 150 are illustrated as a single system, it does not necessarily mean that all components of system 10 are physically or functionally located within a single computing system. All the units as illustrated in system 10 of FIG. 1 may be located on separate computing systems communicating and functioning together in a distributed computing environment.
FIG. 2 illustrates an example operation environment 200 of system 10. Note that FIG. 2 illustrates operation environment 200 in a distributed computing scheme, which is not necessary for the implementation. Some or all of the communication shown in FIG. 2 may be implemented through a data bus (i.e., local communication) within a single computing system.
As shown in FIG. 2, units of system 10 may communicate with one another through a network 210. One or more units of system 10 may further communicate with data source(s) 230 (230-1 to 230-N) and/or target computing system(s) 240 (240-1 to 240-M) through a network 220. Each data source 230 may acquire/collect performance data of one or more target computing systems 240 in one or more system performance domains. A system performance domain may be any domain that is relevant to the performance of a target computing system 240 and/or relevant to an analysis thereof. For example, a system performance domain may include a CPU usage, throughput rate, system bandwidth, system response time, etc.
System 10 and/or components thereof, e.g., data receiving unit 110, may be configured to communicate with data source 230 to receive the system performance data acquired and/or processed by data source 230. It is understood that different data sources 230 may engage different mechanisms to acquire data and/or process acquired data of target computing system(s) 240, so that the data from different data sources may be in different formats, e.g., with different attribute names and measurement units. The sampling frequency of data sources 230 may also vary, which is reflected in the data entry points of the dataset acquired.
In an example, as illustrated in FIG. 2, the acquired data may be initially processed locally in different data sources 230 in a distributed computing scheme within system 10. The locally and initially processed system performance data may be received by system 10, through data communication via network 220, for further processing. Such a tiered processing of the acquired system performance data in the distributed computing environment may save communication bandwidth, improve capacity allocation, and enable architecture flexibility and adaptability, which are all technical advantages.
It should be appreciated that although FIG. 2 illustrates network 210 and network 220 with different reference numerals, it does not necessarily mean that network 210 and network 220 are two separate network systems. Each of networks 210 and 220 may include one or more network systems for data communication, which may overlap and/or be the same physical network systems.
In operation, data receiving unit 110 may be generally configured to receive system performance data from data sources 230. Data source selection unit 112 may be configured to select multiple data sources 230, from the pool of all data sources 230, from which to receive performance data. Any criteria may be used in the selection of data source 230 and all are included in the disclosure. For example, selection criteria may be partially based on the quality of the performance data acquired by a data source 230. For another example, selection criteria may be based on a data acquisition approach used by a data source 230 in acquiring performance data of target computing system(s) 240. For another example, selection criteria may be based on whether the performance data from the multiple data sources 230 are sufficiently consistent in some parameters, e.g., sampling frequency in data acquisition, for further processing. In an example implementation, the selection may be made based partially on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with the performance data collected by data sources 230.
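By way of a non-limiting sketch, the selection of data sources from a pool based on sampling interval and measurement unit might look as follows. The `DataSource` fields, the source names, and the threshold values are hypothetical and are introduced only for illustration; the disclosure does not prescribe any particular criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    # All fields are hypothetical descriptors of a data source.
    name: str
    sampling_interval_s: float  # seconds between consecutive samples
    unit: str                   # measurement unit reported by the source


def select_sources(pool, max_interval_s, allowed_units):
    """Keep sources that sample often enough and report in an accepted unit."""
    return [
        s for s in pool
        if s.sampling_interval_s <= max_interval_s and s.unit in allowed_units
    ]


pool = [
    DataSource("agent_a", 1.0, "kB_s"),
    DataSource("agent_b", 60.0, "kB_s"),   # sampled too rarely for this example
    DataSource("agent_c", 5.0, "pack_s"),
]
selected = select_sources(pool, max_interval_s=10.0, allowed_units={"kB_s", "pack_s"})
```

Here the second source is excluded only because its assumed sampling interval exceeds the assumed threshold; other criteria, such as data quality, could be added as further predicates.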
Naming convention unit 120 may be generally configured to generate an attribute name and a measurement unit for a system performance domain and convert the received performance data in the performance domain using the standardized attribute name and measurement unit. In an example, attribute name generation unit 122 is configured to generate the attribute name and measurement unit, and converting unit 126 is configured to perform the conversion of received performance data using the generated attribute name and measurement unit(s).
For example, for each system performance domain, e.g., a network bandwidth, attribute name generation unit 122 may generate a unified naming convention, which may include an attribute name and a measurement unit. Therefore, system performance data in the system performance domain of, e.g., network bandwidth, that are received from different data sources 230 may be converted under the unified naming convention including the attribute name and the measurement unit. For example, a naming convention may be:
<attribute_name>_<unit_of_measurement>. As an illustrative example, with respect to the system performance domain of network bandwidth, an example naming convention may be network_bandwidth_kB_s, which indicates that the unit of measurement for network bandwidth is kilobytes per second.
In an example, depending on different system analysis scenarios, e.g., the selected sources 230 and their data acquiring approaches, the units of measurement for an attribute name may vary. Measurement unit generation unit 124 of attribute name generation unit 122 may be configured to generate a specific measurement unit for a scenario. As an illustrative example of network bandwidth, the measurement unit may also be packets per second and the naming convention may be network_bandwidth_pack_s.
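As a minimal sketch of the naming convention described above, the following builds a name of the form <attribute_name>_<unit_of_measurement> and converts records from several sources into it. The source field names, the unit labels `B_s` and `MB_s`, and the decimal conversion factors (1 kB = 1000 B) are assumptions made for illustration and do not appear in the disclosure.

```python
# Hypothetical table: bytes per second represented by one unit of each label.
BYTES_PER_UNIT = {"B_s": 1, "kB_s": 1000, "MB_s": 1_000_000}


def convention_name(attribute_name, unit):
    """Build a name of the form <attribute_name>_<unit_of_measurement>."""
    return f"{attribute_name}_{unit}"


def convert(records, attribute_name, target_unit="kB_s"):
    """Convert (source_field, source_unit, value) records to the unified name and unit."""
    name = convention_name(attribute_name, target_unit)
    per_target = BYTES_PER_UNIT[target_unit]
    return [
        {name: value * BYTES_PER_UNIT[unit] / per_target}
        for _source_field, unit, value in records
    ]


# Two sources report the same domain under different names and units.
records = [("net_bw", "B_s", 2000.0), ("bandwidth", "MB_s", 1.5)]
converted = convert(records, "network_bandwidth")
```

After conversion, both records carry the single attribute name network_bandwidth_kB_s, so values from different sources become directly comparable.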
Measurement unit generation unit 124 may also be configured to generate a measurement unit used for data unification. For example, still with the illustrative example of network bandwidth, in a case that data receiving unit 110 receives network bandwidth data in both kilobytes per second format and packets per second format, a new measurement unit of kilobytes per packet may be generated and consequently a new naming convention of:
<new_name>_kB_pack.
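The derived unit follows from dividing the two rates: (kB/s) / (packets/s) = kB/packet. A short sketch, in which the function name and the handling of a zero packet rate are assumptions, is:

```python
def kb_per_packet(kb_per_s, packets_per_s):
    """Return kilobytes per packet, or None when no packets were observed."""
    if packets_per_s == 0:
        return None  # the ratio is undefined; treat it as a missing value
    return kb_per_s / packets_per_s


# 1500 kB/s carried by 1000 packets/s corresponds to 1.5 kB per packet.
ratio = kb_per_packet(1500.0, 1000.0)
```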
Data enhancing unit 130 may be generally configured to enhance the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
Within data enhancing unit 130, mapping unit 132 may be configured to map the converted system performance data (e.g., converted by converting unit 126) with a unified time scale. The performance data originally acquired by data source 230 may include different time scales due to the different acquisition procedures/approaches, e.g., different data sampling frequency/interval, sampling points mismatch, etc. Conversion of the performance data using the attribute name and measurement units may not completely resolve the issue of inconsistent time scales among performance data in the same performance domain.
In the mapping operation, the converted data may be mapped with a unified time scale. Data entries that do not fit into the unified time scale may be removed from further operations (but may not need to be deleted and may be maintained for other analytical purposes) .
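A minimal sketch of such a mapping operation is given below, assuming that the unified time scale is a regular grid defined by a start time, a step, and a number of points, and that a sample fits the grid when it lies within a tolerance of a grid point. These parameters and the tolerance rule are illustrative assumptions; entries that do not fit are returned separately rather than deleted.

```python
def map_to_time_scale(samples, start, step, count, tolerance):
    """Map (timestamp, value) samples onto a regular grid.

    Returns (mapped, unmapped): `mapped` has one slot per grid point, with None
    marking a missing value; `unmapped` keeps entries that did not fit the grid.
    """
    mapped = [None] * count
    unmapped = []
    for ts, value in samples:
        idx = round((ts - start) / step)  # nearest grid point
        fits = (
            0 <= idx < count
            and abs(ts - (start + idx * step)) <= tolerance
            and mapped[idx] is None       # first sample wins for a grid point
        )
        if fits:
            mapped[idx] = value
        else:
            unmapped.append((ts, value))
    return mapped, unmapped


samples = [(0.0, 10.0), (5.1, 12.0), (7.4, 99.0), (15.0, 14.0)]
mapped, unmapped = map_to_time_scale(samples, start=0.0, step=5.0, count=4, tolerance=0.5)
```

In this example the grid point at 10.0 receives no sample and is left as None, which is the kind of missing value that the interpolation operation described next would estimate.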
Interpolation scheme selection unit 134 may be configured to select an interpolation scheme based on different performance domains. For performance data in different performance domains, different interpolation schemes may be selected to determine an estimated value to impute for a missing value. The interpolation scheme may include a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation. For example, a cubic spline interpolation scheme may be selected as a default interpolation scheme. A cubic spline interpolation scheme generally tends to be more stable than polynomial interpolation and is less prone to wild oscillations between the tabulated points.
Besides or instead of the default scheme, in the case that the system performance domain includes CPU frequency data, an arithmetic mean substitution interpolation scheme may be selected for an interpolation operation. In the case that the system performance domain includes instantaneous performance data, a Kalman filter interpolation scheme may be selected for the interpolation operation. In the case that the system performance domain includes event log data, a Zero interpolation scheme may be selected for the interpolation operation. In an example, interpolation scheme selection unit 134 may select multiple interpolation schemes as candidates for a performance domain. The selected multiple candidates may be used in an interpolation operation with an order of priority.
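The selection described above can be sketched as a lookup from a domain type to an ordered list of candidate schemes. The domain type labels and the particular fallback ordering within each list are assumptions made for illustration; only the pairing of each data type with its first-listed scheme, and cubic spline as the default, come from the description.

```python
DEFAULT_SCHEME = "cubic_spline"

# Candidate schemes per domain type, highest priority first.
# The fallback order after the first entry is an illustrative assumption.
SCHEMES_BY_DOMAIN_TYPE = {
    "cpu_frequency": ["arithmetic_mean", "cubic_spline"],
    "instantaneous": ["kalman_filter", "cubic_spline"],
    "event_log": ["zero"],
}


def select_schemes(domain_type):
    """Return candidate interpolation schemes for a domain type, in priority order."""
    return SCHEMES_BY_DOMAIN_TYPE.get(domain_type, [DEFAULT_SCHEME])
```

For a domain type without a specific entry, for example a hypothetical "network_bandwidth" type, the lookup falls back to the default cubic spline scheme.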
Interpolation operation unit 136 may be configured to perform an interpolation operation to obtain an estimated value for a missing value using an interpolation scheme (s) selected by the interpolation scheme selection unit 134. In a case that multiple interpolation schemes are selected, interpolation operation unit 136 may first use an interpolation scheme that has a higher priority in the order.
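Two of the simpler schemes can be sketched as follows. The disclosure does not define the schemes in detail, so this sketch assumes that zero interpolation fills each missing value with zero and that arithmetic mean substitution fills each missing value with the mean of the observed values; missing values are represented by None.

```python
from statistics import mean


def zero_interpolation(series):
    """Assumed behavior: replace each missing value (None) with 0.0."""
    return [0.0 if v is None else v for v in series]


def mean_substitution(series):
    """Assumed behavior: replace each missing value with the mean of observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        return list(series)  # nothing to estimate from; leave the series unchanged
    fill = mean(observed)
    return [fill if v is None else v for v in series]
```

Under these assumptions, zero interpolation suits event log data, where the absence of an entry plausibly means that no event occurred, while mean substitution suits slowly varying quantities.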
Confidence interval determination unit 138 may be configured to determine whether a value estimated by the operation of interpolation operation unit 136 is within a confidence interval. The confidence interval may be determined dynamically by confidence interval determination unit 138 for a performance domain and/or performance data of the performance domain, or may be received, e.g., through user inputs.
Unification unit 140 may be configured to merge converted performance data in one or more system performance domains with a unified time scale to generate a unified dataset. Specifically, for example, unified time scale generation unit 142 may be configured to generate a unified time scale for one or more performance domains. The unified time scale(s) may be used for the generation of a unified dataset and/or may be used for other operations. For example, a unified time scale generated by unified time scale generation unit 142 may be used in the operation of mapping unit 132 in mapping the converted performance data.
In an example, unified time scale generation unit 142 may generate a single unified time scale for all the performance data enhanced and merged into the unified dataset. That is, the same unified time scale may be used in the mapping operation of mapping unit 132 and may be used in the operation of unification. In another example, one or more time scales are generated separately for the data enhancing operation(s) and another time scale is generated separately for the dataset unification operation.
Unified dataset generation unit 144 may be configured to merge multiple converted performance data for one or more system performance domains to generate a unified dataset using a unified time scale. In the merging, data items (values) that do not map into the unified time scale may not enter the unified dataset, but may be kept separately for potential other analytical uses.
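As an illustrative sketch of the merging, assume each converted attribute has already been mapped to a list of values aligned with the unified time scale, with None marking values still missing. The row layout and the attribute name cpu_usage_percent are hypothetical.

```python
def merge_unified(time_scale, mapped_by_attribute):
    """Merge per-attribute value lists into one row per point of the time scale."""
    for name, values in mapped_by_attribute.items():
        if len(values) != len(time_scale):
            raise ValueError(f"{name} is not aligned with the unified time scale")
    rows = []
    for i, ts in enumerate(time_scale):
        row = {"timestamp": ts}
        for name, values in mapped_by_attribute.items():
            row[name] = values[i]
        rows.append(row)
    return rows


time_scale = [0, 5, 10]
unified = merge_unified(
    time_scale,
    {
        "cpu_usage_percent": [10.0, 12.0, 11.0],
        "network_bandwidth_kB_s": [100.0, None, 120.0],
    },
)
```

Because only values already mapped onto the time scale are passed in, entries that did not fit the time scale never enter the unified dataset, consistent with the description above.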
In an example, unified dataset generation unit 144 may be configured to merge the converted performance data after the enhancing operation by data enhancing unit 130. However, the scope of the disclosure is not limited to this example and unification unit 140 may be configured to merge converted performance data without the enhancing operation(s).
Machine learning unit 150 may be configured to learn a target computing system using the unified dataset(s), wherein analysis unit 152 may be configured to analyze/cause to analyze the unified dataset to determine the performance status of the target computing system, and control unit 154 may be configured to control/cause to control the target computing system 240 based on a result of the analysis. For example, based on a result of the analysis, a configuration of the target computing system 240, e.g., operation parameters and/or computing capacity allocation, may be adjusted.
2. Example Processes
Referring to FIG. 3, an example operation process 300 of system 10 of FIG. 1 is illustrated. In example operation 310, data source selection unit 112 may select multiple data sources 230, from the pool of all data sources 230, from which to receive performance data. Any criteria may be used in the selection of data source 230 and all are included in the disclosure. In an example implementation, the selection may be made based on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with the performance data collected by data sources 230.
In example operation 320, data receiving unit 110 may receive performance data of target computing system(s) 240 from the selected multiple data sources 230. Any approach may be adopted in the data receiving and all are included in the disclosure. Although FIG. 1 illustrates that data sources 230 are separate from data receiving unit 110, the disclosure is not limited to such a specific example. Data receiving unit 110 may physically reside together with and/or function together with one or more data source(s) 230. As such, receiving performance data from multiple data sources 230 includes the scenario(s) in which data receiving unit 110 causes/controls data source 230 to acquire system performance data from target computing system(s) 240.
In example operation 330, attribute name generation unit 122 may generate an attribute name for a system performance domain.
In example operation 340, measurement unit generation unit 124 may generate a measurement unit for the system performance domain associated with the generated attribute name. In an example, the attribute name and the measurement unit are generated together, e.g., an attribute name includes the measurement unit. For an illustrative example performance domain of network bandwidth, an example attribute name may be network_bandwidth_pack_s, wherein “network bandwidth” is the attribute name and “packets per second” is the measurement unit. Accordingly, operations 330 and 340 may be one operation and/or may be performed together.
In another example, a measurement unit may be determined separately from the attribute name for different scenarios. For example, an attribute name may be relatively stable for a system performance domain, but the measurement unit may be updated based on different scenarios of, e.g., different performance analysis requirements, different data source 230, etc.
In an example, in generating the measurement unit for a system performance domain, measurement unit generation unit 124 may select a measurement unit that tends to minimize missing data values in the following data conversion for multiple data sources. For example, measurement unit generation unit 124 may evaluate the sampling interval/frequency adopted by each of the selected data sources 230 in collecting performance data of target computing system(s) 240 and select a measurement unit that tends to minimize the detrimental effect(s) of different sampling frequencies/intervals of different data sources 230 in the subsequent conversion operation(s). Measurement unit generation unit 124 may also evaluate the mismatching in time scales among system performance data obtained from different data sources 230, and may select a measurement unit that tends to minimize the detrimental effect of such mismatched time scales in the subsequent conversion operations.
In an example, measurement unit generation unit 124 may receive, e.g., from unified time scale generation unit 142, a unified time scale configured to be used for one or more of the data enhancing operation or the data unification operation. Measurement unit generation unit 124 may select a measurement unit based on the unified time scale to, e.g., minimize missing values.
In example operation 350, converting unit 126 may convert the received system performance data from multiple data sources 230 using the determined attribute name and measurement unit.
In example operation 360, data enhancing unit 130 may enhance the converted performance data with an interpolation operation using an interpolation scheme based on a type of the performance domain. Any existing and/or future developed interpolation scheme(s) may be used and all are included in the disclosure. The interpolation scheme may include a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation. For example, a cubic spline interpolation scheme may be selected as a default imputation scheme. A cubic spline interpolation scheme generally tends to be more stable than polynomial interpolation and is less prone to wild oscillations between the tabulated points.
Besides or instead of the default scheme, in a case that the system performance domain includes CPU frequency data, an arithmetic mean substitution interpolation scheme may be selected for an interpolation operation. In a case that the system performance domain includes instantaneous performance data, a Kalman filter interpolation scheme may be selected for the interpolation operation. In a case that the system performance domain includes event log data, a Zero interpolation scheme may be selected for the interpolation operation. In an example, interpolation scheme selection unit 134 may select multiple interpolation schemes as candidates for a performance domain.
FIG. 4 illustrates an example operation flow 400 including example details of operation 360 of FIG. 3.
Referring now to FIG. 4, in example operation 410, mapping unit 132 of data enhancing unit 130 may map the converted system performance data (e.g., converted by converting unit 126) with a unified time scale. The unified time scale may be received from unified time scale generation unit 142 and/or may be received from other sources, e.g., through a user input. In the mapping operation, data entries that do not fit into the unified time scale may be removed from the further enhancing operations (but may not need to be deleted).
In example operation 420, interpolation scheme selection unit 134 may select an interpolation scheme based on a type of the performance domain. For performance data in different performance domains, different interpolation schemes may be selected for estimating a missing value.
In example operation 430, confidence interval determination unit 138 may determine a confidence interval for an estimated value to impute into the converted performance data to replace a missing value for a system performance domain. The confidence interval may be determined dynamically by confidence interval determination unit 138 for a performance domain and/or performance data of the performance domain, or may be received, e.g., through user inputs.
In example operation 440, interpolation operation unit 136 may perform an interpolation operation to obtain an estimated value for a missing value using the interpolation scheme(s) selected by the interpolation scheme selection unit 134. In an example, for converted system performance data of different system performance domains, different interpolation schemes may be used in the interpolation operation, corresponding to the interpolation scheme selection operation of operation 420.
In example operation 450, confidence interval determination unit 138 may determine whether an estimated value determined by the operation of interpolation operation unit 136 is within the confidence interval. If an estimated value is within the confidence interval, the operation flow proceeds to example operation 460, where the estimated value will be imputed into the system performance data to substitute for a missing value. If the estimated value is not within the confidence interval, the operation flow may revert back to example operation 420, where another interpolation scheme may be selected for the interpolation operation.
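The flow of operations 420 through 460 can be sketched as a loop over candidate schemes in priority order. In this sketch the confidence interval is simply supplied as a (low, high) pair, the two estimators are simplified stand-ins for arithmetic mean substitution and zero interpolation, and all function names are hypothetical; how the interval is actually determined is left open by the description.

```python
def mean_estimate(series, index):
    """Arithmetic mean of observed values (assumes at least one is present)."""
    observed = [v for v in series if v is not None]
    return sum(observed) / len(observed)


def zero_estimate(series, index):
    """Assumed zero interpolation: the estimate is always 0.0."""
    return 0.0


def impute_with_fallback(series, index, schemes, interval):
    """Try schemes in priority order; impute the first estimate inside the interval.

    Returns (series_copy, scheme_name); scheme_name is None, and the value is
    left missing, when no candidate estimate falls within the interval.
    """
    low, high = interval
    for name, estimate_fn in schemes:          # operations 420 and 440
        estimate = estimate_fn(series, index)
        if low <= estimate <= high:            # operation 450
            filled = list(series)
            filled[index] = estimate           # operation 460
            return filled, name
    return list(series), None


schemes = [("arithmetic_mean", mean_estimate), ("zero", zero_estimate)]
series = [40.0, None, 60.0]
first = impute_with_fallback(series, 1, schemes, (45.0, 55.0))
second = impute_with_fallback(series, 1, schemes, (-1.0, 10.0))
third = impute_with_fallback(series, 1, schemes, (1.0, 10.0))
```

In the first call the mean estimate of 50.0 is accepted; in the second it is rejected and the lower-priority zero estimate is used; in the third no estimate is accepted and the value remains missing.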
Referring back to FIG. 3, in example operation 370, unification unit 140 may merge converted performance data in one or more system performance domains with a unified time scale to generate a unified dataset. FIG. 5 illustrates an example operation flow 500 of example details of operation 370.
Referring now to FIG. 5, in example operation 510, unified time scale generation unit 142 generates a unified time scale for one or more performance domains. In an example, unified time scale generation unit 142 may generate a single unified time scale for all the performance data enhanced and to be merged into a unified dataset. In an example, unified time scale generation unit 142 may select a time scale that minimizes missing data values in the mapping of data (converted data and/or enhanced data) with the unified time scale.
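One way to sketch the selection of a time scale that minimizes missing data values is to score each candidate step by the fraction of grid points left empty across all datasets and pick the lowest score. The regular-grid model, the tolerance rule, and the scoring function are assumptions made for illustration.

```python
def count_missing(timestamps, start, step, count, tolerance):
    """Number of grid points that receive no sample from `timestamps`."""
    filled = set()
    for ts in timestamps:
        idx = round((ts - start) / step)
        if 0 <= idx < count and abs(ts - (start + idx * step)) <= tolerance:
            filled.add(idx)
    return count - len(filled)


def pick_time_scale(datasets, start, end, candidate_steps, tolerance):
    """Pick the candidate step with the lowest fraction of missing grid points."""
    def missing_fraction(step):
        count = int((end - start) // step) + 1
        missing = sum(
            count_missing(ts, start, step, count, tolerance) for ts in datasets
        )
        return missing / (count * len(datasets))
    return min(candidate_steps, key=missing_fraction)


# One source samples every 10 s, another every 20 s.
datasets = [[0, 10, 20, 30], [0, 20]]
best_step = pick_time_scale(datasets, start=0, end=30, candidate_steps=[5, 10], tolerance=0)
```

With these inputs a 10 s step leaves a quarter of the grid points empty, against more than half for a 5 s step, so the 10 s step is chosen. A coarser step generally reduces missing values at the cost of time resolution, so a practical selection would likely restrict the candidate steps to resolutions acceptable for the analysis.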
In example operation 520, unified dataset generation unit 144 may merge multiple converted performance data for one or more system performance domains using the unified time scale to generate a unified dataset. In the merging, data items (values) that do not map into the unified time scale may not enter the unified dataset, but may be kept separately for potential other uses.
In an example, unified dataset generation unit 144 may merge the converted performance data after the enhancing operation by data enhancing unit 130. However, the scope of the disclosure is not limited to this example and unification unit 140 may merge converted performance data without the enhancing operation.
Referring back to FIG. 3, in example operation 380, analysis unit 152 may analyze/cause to analyze the unified dataset to determine the performance status of the target computing system. Any existing and/or future developed performance analysis may be used and all are included in the disclosure.
In example operation 390, control unit 154 may control/cause to control the target computing system 240 based on a result of the analysis. For example, based on a result of the analysis, a configuration of an associated target computing system 240, e.g., operation parameters and/or computing resource allocation, may be adjusted.
The processes described above in association with FIGS. 3-5 can be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. In other embodiments, hardware components perform one or more of the operations. Such hardware components may include or be incorporated into processors, application-specific integrated circuits (ASICs), programmable circuits such as field programmable gate arrays (FPGAs), or in other ways. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
The memory may include computer readable media such as a volatile memory, a Random Access Memory (RAM), and/or non-volatile memory, e.g., Read-Only Memory (ROM) or flash RAM, and so on. The memory is an example of a computer readable medium.
Computer readable media include non-volatile, volatile, mobile and non-mobile media, and can implement information storage through any method or technology. The information may be computer readable instructions, data structures, program modules or other data. Examples of storage media of a computer include, but are not limited to, Phase-change RAMs (PRAMs), Static RAMs (SRAMs), Dynamic RAMs (DRAMs), other types of RAMs, ROMs, Electrically Erasable Programmable Read-Only Memories (EEPROMs), flash memories or other memory technologies, Compact Disk Read-Only Memories (CD-ROMs), Digital Versatile Discs (DVDs) or other optical memories, cassettes, cassette and disk memories or other magnetic memory devices, or any other non-transmission media that can be used for storing information accessible to the computation device. According to the definitions herein, the computer readable media exclude transitory media, such as modulated data signals and carriers.
It should be further noted that the terms "include", "comprise", or any variants thereof are intended to cover a non-exclusive inclusion, such that a process, a method, a product, or a device that includes a series of elements not only includes such elements but also includes other elements not specified expressly, or may further include inherent elements of the process, method, product, or device. In the absence of more restrictions, an element limited by "include a/an…" does not exclude other same elements existing in the process, method, product, or device that includes the element.
Described above are merely the examples of the present application, which are not used to limit the present application. For those skilled in the art, the present application may have various alterations and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
The disclosure may be appreciated with the following clauses:
Clause 1: a computer implemented method, comprising: receiving performance data of a computing system in a system performance domain from multiple sources; generating an attribute name and a measurement unit for the system performance domain; converting the performance data received from the sources using the attribute name and the measurement unit; and enhancing the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
Clause 2: the method of clause 1, wherein the generating the measurement unit for the system performance domain includes evaluating a sampling interval in collecting performance data from each of the multiple sources.
Clause 3: the method of clause 1, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes missing data values in the converting.
Clause 4: the method of clause 1, wherein the generating the measurement unit for the system performance domain includes evaluating mismatching in time scales among system performance data obtained from two or more different sources.
Clause 5: the method of clause 4, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes the mismatching in time scales among the system performance data from the two or more different sources.
Clause 6: the method of clause 1, further comprising selecting the multiple sources from a pool of system performance data sources, the selecting being made based at least partially on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with performance data collected in each system performance data source in the pool.
Clause 7: the method of clause 1, wherein the receiving the performance data includes receiving the performance data on multiple system performance domains, and the enhancing the converted performance data with the interpolation operation includes using a first interpolation scheme for converted performance data in a first system performance domain and a second different interpolation scheme for converted performance data in a second different system performance domain.
Clause 8: the method of clause 1, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
Clause 9: the method of clause 1, on a condition that the system performance domain includes CPU frequency data, an arithmetic mean substitution interpolation scheme is used for the interpolation operation.
Clause 10: the method of clause 1, on a condition that the system performance domain includes instantaneous performance data, a Kalman filter interpolation scheme is used for the interpolation operation.
Clause 11: the method of clause 1, on a condition that the system performance domain includes event log data, a zero interpolation scheme is used for the interpolation operation.
Clause 12: the method of clause 1, further comprising generating a unified timescale for one or more system performance domains.
Clause 13: the method of clause 12, further comprising merging converted performance data in the one or more system performance domains with the unified timescale to generate a unified dataset.
Clause 14: the method of clause 13, further comprising analyzing a performance of the computing system using the unified dataset.
Clause 15: the method of clause 1, wherein the enhancing the converted performance data with the interpolation operation includes using an interpolation scheme to impute a missing value of the converted performance data with an estimated value within a confidence interval.
Clause 16: the method of clause 1, wherein the enhancing the converted performance data with the interpolation operation includes mapping the converted performance data with a unified time scale.
Clause 17: a method, comprising: receiving multiple datasets; providing a unified time scale; mapping the multiple datasets with the unified time scale; enhancing a mapped dataset in the multiple datasets with an interpolation scheme based on a type of the dataset to generate an enhanced dataset; and merging the mapped multiple datasets including the enhanced dataset into a unified dataset.
Clause 18: the method of clause 17, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
Clause 19: the method of clause 1, wherein the providing the unified time scale includes selecting a time scale that minimizes data value missing in the mapping.
Clause 20: a computing device, comprising: one or more processors; memory; and a plurality of programming instructions stored on the memory and executable by the one or more processors to implement: a data receiving unit operable to receive performance data of a computing system in a system performance domain from multiple sources, an attribute name generation unit operable to generate an attribute name and a measurement unit for the system performance domain, a converting unit operable to convert the performance data obtained from the multiple sources using the attribute name and the measurement unit; and a data enhancing unit operable to enhance the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
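
The selection of an interpolation scheme by domain type, as recited in clauses 7 to 11, might be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the names `zero_fill`, `mean_fill`, `SCHEME_BY_DOMAIN` and `enhance`, the domain-type strings, and the use of `None` to mark a missing sample are all assumptions made for this example. Only zero interpolation and arithmetic mean substitution are shown; cubic spline and Kalman filter interpolation are omitted.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

# A series of samples on a fixed time scale; None marks a missing sample
# (an assumption of this sketch, not something the clauses specify).
Series = List[Optional[float]]


def zero_fill(values: Series) -> List[float]:
    """Zero interpolation: replace each missing sample with 0."""
    return [0.0 if v is None else v for v in values]


def mean_fill(values: Series) -> List[float]:
    """Arithmetic mean substitution: replace each missing sample
    with the mean of the observed samples."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean without observed samples")
    m = mean(observed)
    return [m if v is None else v for v in values]


# Hypothetical registry mapping a domain type to its scheme, following
# clause 9 (CPU frequency data) and clause 11 (event log data).
SCHEME_BY_DOMAIN: Dict[str, Callable[[Series], List[float]]] = {
    "cpu_frequency": mean_fill,
    "event_log": zero_fill,
}


def enhance(domain_type: str, values: Series) -> List[float]:
    """Fill missing samples using the scheme registered for the domain type."""
    try:
        scheme = SCHEME_BY_DOMAIN[domain_type]
    except KeyError:
        raise ValueError(f"no scheme registered for {domain_type!r}") from None
    return scheme(values)


cpu_mhz = enhance("cpu_frequency", [2400.0, None, 2600.0])
events = enhance("event_log", [1.0, None, 3.0])
print(cpu_mhz)  # [2400.0, 2500.0, 2600.0]
print(events)   # [1.0, 0.0, 3.0]
```

Keeping the schemes in a lookup table means a further scheme could be registered for another domain type without changing `enhance`.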

Claims (20)

  1. A computer implemented method, comprising:
    receiving performance data of a computing system in a system performance domain from multiple sources;
    generating an attribute name and a measurement unit for the system performance domain;
    converting the performance data received from the sources using the attribute name and the measurement unit; and
    enhancing the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
  2. The method of claim 1, wherein the generating the measurement unit for the system performance domain includes evaluating a sampling interval in collecting performance data from each of the multiple sources.
  3. The method of claim 1, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes missing data values in the converting.
  4. The method of claim 1, wherein the generating the measurement unit for the system performance domain includes evaluating mismatching in time scales among system performance data obtained from two or more different sources of the multiple sources.
  5. The method of claim 4, wherein the generating the measurement unit for the system performance domain includes selecting a measurement unit that minimizes the mismatching in time scales among the system performance data from the two or more different sources of the multiple sources.
  6. The method of claim 1, further comprising selecting the multiple sources from a pool of system performance data sources, the selecting being made based at least partially on at least one of a frequency, a time interval, a data metric, and a measurement unit associated with performance data collected in each system performance data source in the pool.
  7. The method of claim 1, wherein the receiving the performance data includes receiving the performance data on multiple system performance domains, and the enhancing the converted performance data with the interpolation operation includes using a first interpolation scheme for converted performance data in a first system performance domain and a second different interpolation scheme for converted performance data in a second different system performance domain.
  8. The method of claim 1, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  9. The method of claim 1, on a condition that the system performance domain includes CPU frequency data, an arithmetic mean substitution interpolation scheme is used for the interpolation operation.
  10. The method of claim 1, on a condition that the system performance domain includes instantaneous performance data, a Kalman filter interpolation scheme is used for the interpolation operation.
  11. The method of claim 1, on a condition that the system performance domain includes event log data, a zero interpolation scheme is used for the interpolation operation.
  12. The method of claim 1, further comprising generating a unified timescale for one or more system performance domains.
  13. The method of claim 12, further comprising merging converted performance data in the one or more system performance domains with the unified timescale to generate a unified dataset.
  14. The method of claim 13, further comprising analyzing a performance of the computing system using the unified dataset.
  15. The method of claim 1, wherein the enhancing the converted performance data with the interpolation operation includes using an interpolation scheme to impute a missing value of the converted performance data with an estimated value within a confidence interval.
  16. The method of claim 1, wherein the enhancing the converted performance data with the interpolation operation includes mapping the converted performance data with a unified time scale.
  17. A method, comprising:
    receiving multiple datasets;
    providing a unified time scale;
    mapping the multiple datasets with the unified time scale;
    enhancing a mapped dataset in the multiple datasets with an interpolation scheme based on a type of the dataset to generate an enhanced dataset; and
    merging the mapped multiple datasets including the enhanced dataset into a unified dataset.
  18. The method of claim 17, wherein the interpolation scheme includes at least one of a zero interpolation, an arithmetic mean substitution, a cubic spline interpolation or a Kalman filter interpolation.
  19. The method of claim 1, wherein the providing the unified time scale includes selecting a time scale that minimizes data value missing in the mapping.
  20. A computing device, comprising:
    one or more processors;
    memory; and
    a plurality of programming instructions stored on the memory and executable by the one or more processors to implement:
    a data receiving unit operable to receive performance data of a computing system in a system performance domain from multiple sources,
    an attribute name generation unit operable to generate an attribute name and a measurement unit for the system performance domain,
    a converting unit operable to convert the performance data obtained from the multiple sources using the attribute name and the measurement unit; and
    a data enhancing unit operable to enhance the converted performance data with an interpolation operation using an interpolation scheme determined based on a type of the performance domain.
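
The mapping and merging steps of claim 17, together with the time-scale selection of claim 19, might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the helper names (`unified_scale`, `map_to_scale`, `count_missing`, `merge`), the integer-second timestamps, the sample values, and the candidate step sizes are all hypothetical. The enhancing step is left out here; slots that remain `None` after mapping are the ones an interpolation scheme would fill before merging.

```python
from typing import Dict, List, Optional

# One dataset: timestamp in seconds -> measured value (assumed layout).
Samples = Dict[int, float]


def unified_scale(start: int, end: int, step: int) -> List[int]:
    """Build a unified time scale of evenly spaced timestamps."""
    return list(range(start, end + 1, step))


def map_to_scale(samples: Samples, scale: List[int]) -> List[Optional[float]]:
    """Map one dataset onto the scale; None marks a slot with no sample."""
    return [samples.get(t) for t in scale]


def count_missing(datasets: Dict[str, Samples], scale: List[int]) -> int:
    """Count empty slots across all datasets after mapping onto the scale."""
    return sum(
        v is None
        for samples in datasets.values()
        for v in map_to_scale(samples, scale)
    )


def merge(datasets: Dict[str, Samples], scale: List[int]) -> List[Dict[str, object]]:
    """Merge the mapped datasets into one row per timestamp of the scale."""
    columns = {name: map_to_scale(s, scale) for name, s in datasets.items()}
    return [
        {"t": t, **{name: col[i] for name, col in columns.items()}}
        for i, t in enumerate(scale)
    ]


# Hypothetical data: CPU utilisation sampled every 2 s, memory every 1 s.
datasets = {
    "cpu_util": {0: 10.0, 2: 30.0, 4: 50.0},
    "mem_used": {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0},
}

# Pick the candidate step that leaves the fewest empty slots.
best_step = min(
    [1, 2], key=lambda step: count_missing(datasets, unified_scale(0, 4, step))
)
merged = merge(datasets, unified_scale(0, 4, best_step))
print(best_step)  # 2
print(merged[1])  # {'t': 2, 'cpu_util': 30.0, 'mem_used': 3.0}
```

Minimizing the count of empty slots alone tends to favor coarser scales, which discard samples from the faster source (here the memory readings at 1 s and 3 s), so a practical selection would likely weigh both effects.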

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/096358 WO2019028648A1 (en) 2017-08-08 2017-08-08 Processing performance data for machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/096358 WO2019028648A1 (en) 2017-08-08 2017-08-08 Processing performance data for machine learning

Publications (1)

Publication Number Publication Date
WO2019028648A1 (en) 2019-02-14

Family

ID=65273193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/096358 WO2019028648A1 (en) 2017-08-08 2017-08-08 Processing performance data for machine learning

Country Status (1)

Country Link
WO (1) WO2019028648A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060074823A1 (en) * 2004-09-14 2006-04-06 Heumann John M Methods and apparatus for detecting temporal process variation and for managing and predicting performance of automatic classifiers
CN101533366A (en) * 2009-03-09 2009-09-16 浪潮电子信息产业股份有限公司 Method for acquiring and analyzing performance data of server
CN101790092A (en) * 2010-03-15 2010-07-28 河海大学常州校区 Intelligent filter designing method based on image block encoding information
EP2960797A1 (en) * 2014-06-27 2015-12-30 Intel Corporation Identification of software phases using machine learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111744A (en) * 2021-03-29 2021-07-13 华南理工大学 Vein identification method based on time domain short-time and long-time feature fusion
CN113111744B (en) * 2021-03-29 2023-02-14 华南理工大学 Vein identification method based on time domain short-time and long-time feature fusion

Similar Documents

Publication Publication Date Title
US9413858B2 (en) Real-time compressive data collection for cloud monitoring
US9378053B2 (en) Generating map task output with version information during map task execution and executing reduce tasks using the output including version information
Baruffa et al. Comparison of MongoDB and Cassandra databases for spectrum monitoring as-a-service
US10225375B2 (en) Networked device management data collection
CN111459944B (en) MR data storage method, device, server and storage medium
US10108689B2 (en) Workload discovery using real-time analysis of input streams
CN109933515B (en) Regression test case set optimization method and automatic optimization device
US20190286491A1 (en) System and method for analyzing and associating elements of a computer system by shared characteristics
US9141251B2 (en) Techniques for guided access to an external distributed file system from a database management system
US11611502B2 (en) Network latency measurement and analysis system
US20170083820A1 (en) Posterior probabilistic model for bucketing records
US20210382775A1 (en) Systems and methods for classifying and predicting the cause of information technology incidents using machine learning
US9836506B2 (en) Dynamic query optimization with pilot runs
US11507563B2 (en) Unsupervised anomaly detection
Vig et al. Test effort estimation and prediction of traditional and rapid release models using machine learning algorithms
WO2019028648A1 (en) Processing performance data for machine learning
CN108256046A (en) The implementation method of the unified access path of big data processing frame source data
CN111897864A (en) Expert database data extraction method and system based on Internet AI outbound
CN115150285B (en) Network topology relation determining method, network system, device and storage medium
US11164349B2 (en) Visualizing a time series relation
Li et al. Kano: Efficient container network policy verification
Burlov et al. Adaptive accessibility management in geographic information systems using fog computing
US11822446B2 (en) Automated testing methods for condition analysis and exploration
CN110998539B (en) Performance impact analysis of system updates
US11436243B2 (en) Data harvester

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17920807; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 17920807; Country of ref document: EP; Kind code of ref document: A1)