CN113986704A - TS-Decomposition-based data center high-frequency fault time domain early warning method and system - Google Patents


Publication number
CN113986704A
Authority
CN
China
Prior art keywords
data
time sequence
time
trend
sequence data
Prior art date
Legal status
Pending
Application number
CN202111255316.1A
Other languages
Chinese (zh)
Inventor
张剑波
姚孟隆
吴梓杭
董峻铎
王红平
王彤
�田�浩
Current Assignee
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202111255316.1A
Publication of CN113986704A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Abstract

The invention discloses a TS-Decomposition-based time domain early warning method and system for high-frequency faults of a data center. The method comprises the following steps: acquiring time sequence monitoring data in a data center scene and performing data cleaning to obtain monitoring values of historical time sequence data; analyzing the numerical influence factors of the time sequence data, performing time sequence decomposition on the time sequence data based on the TS-Decomposition algorithm, and calculating the numerical influence factors from the monitoring values of the historical time sequence data; establishing a time sequence prediction model from the calculated numerical influence factors, and predicting the trend of the time sequence data to obtain predicted values of future time sequence data; performing mixed calculation on the monitoring values of the historical time sequence data and the predicted values of the future time sequence data, and determining the fault time domain of the time sequence data; and issuing time domain early warnings for high-frequency faults of the data center. By mining the association relationships within the time sequence data, the method predicts the trend of the time sequence data, analyzes the intervals in which faults are most frequently distributed, and predicts fault tendencies before faults occur.

Description

TS-Decomposition-based data center high-frequency fault time domain early warning method and system
Technical Field
The invention belongs to the field of intelligent computer operation and maintenance, and particularly relates to a data center high-frequency fault time domain early warning method and system based on TS-Decomposition (Time Sequence Decomposition).
Background
In recent years, with the rapid development of 5G technology, fields such as cloud computing and the Internet of Things have developed at an unprecedented pace. The data center is responsible for managing, organizing and analyzing various cloud computing resources, terminal resources and Internet of Things equipment resources, and ensures the stable operation of thousands of devices. With the explosive growth of data and the increase in computing nodes, the traditional operation and maintenance mode of the data center can hardly meet current operation and maintenance requirements. At present, the data center of an enterprise or a higher-education institution generally runs on the order of ten thousand data nodes, computing resources, sensors, terminals and the like, and relevant monitoring data are generated continuously. How to extract effective information from massive operation and maintenance monitoring data and combine it with specific machine learning algorithms for data mining, trend prediction, fault warning and the like on monitoring indexes has become an important subject of intelligent data center operation and maintenance.
In massive operation and maintenance data, time sequence data exhibit a certain degree of autocorrelation and periodicity, and the equipment corresponding to the time sequence data often shows a tendency toward failure before the failure occurs. Taking CPU temperature as an example, before the CPU temperature reaches a damaging value, it often stays at a high temperature threshold for a sustained period; for external devices, periodic factors such as the alternation of day and night and the change of seasons also influence the CPU temperature to a certain degree. The influence of these factors is reflected in the numerical changes of the time sequence data and can serve as a basis for operation and maintenance analysis.
In the traditional operation and maintenance process, fault handling usually begins only after a fault occurs: professional operation and maintenance personnel locate the fault using the fault information and then troubleshoot and resolve the faults one by one. At the present stage, cloud computing resources, terminals and sensors are often not independent of one another, and problems such as correlation between faults, large volumes of fault alarm information, and non-real-time alarms all make data center operation and maintenance difficult. How to use existing machine learning algorithms to mine the fault characteristics in time sequence data, predict the trend of the time sequence data, predict the high-frequency fault time domain in advance through a specific index library, and allocate operation and maintenance personnel and resources reasonably is an important problem to be solved by current intelligent data center operation and maintenance.
Disclosure of Invention
In view of the above, the invention provides a data center high-frequency fault time domain early warning method and system based on TS-Decomposition, to solve the problem that existing data center operation and maintenance cannot effectively predict the high-frequency fault time domain.
The invention discloses a TS-Decomposition-based time domain early warning method for high-frequency faults of a data center, which comprises the following steps:
acquiring time sequence monitoring data under a data center scene and performing data cleaning to obtain a monitoring value of historical time sequence data;
analyzing the numerical influence factors of the time sequence data, performing time sequence Decomposition on the time sequence data based on a TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data;
establishing a time sequence prediction model according to the numerical influence factor calculation result;
predicting the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data;
according to different application scenes, performing mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data, and judging the fault time domain of the time sequence data;
and carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
Preferably, the acquiring and data cleaning of the time sequence monitoring data in the data center scene includes:
establishing a time sequence monitoring data index library in a data center scene, screening periodic or interval numerical time sequence data from time sequence monitoring data in the data center scene according to the time sequence monitoring data index library, and establishing a time sequence data warehouse;
and performing data cleaning on the time series data in the time series data warehouse, wherein the data cleaning comprises but is not limited to outlier elimination, null filling and data interpolation smoothing.
Preferably, analyzing the numerical influence factors of the time sequence data, performing time sequence decomposition on the time sequence data based on the TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data specifically includes:
the types of the historical time sequence data after data cleaning comprise periodic time sequence data and interval time sequence data; for periodic time sequence data, the numerical influence factors include a long-term trend factor $T_t$, a cyclic variation factor $C_t$, and an irregular variation factor $I_t$, and the multiplicative-model decomposition of the time sequence data based on the TS-Decomposition algorithm is:
$$X_t = T_t \times C_t \times I_t$$
wherein $X_t$ is the full variation of the time series, i.e., the known value of the time sequence data itself; $T_t$ is the long-term trend factor, $C_t$ is the cyclic variation factor, and $I_t$ is the irregular variation factor;
for interval time sequence data, the numerical influence factors comprise the long-term trend factor and the irregular variation factor;
performing the time sequence decomposition based on the TS-Decomposition algorithm and calculating the numerical influence factors according to the monitoring values of the historical time sequence data specifically comprises the following steps:
preprocessing the historical time sequence data according to the type of the time sequence data to eliminate the influence of the cyclic variation factor $C_t$ on the historical time sequence data values;
fitting a trend function by a trend reasoning method and analyzing it to obtain the long-term trend factor $T_t$, then calculating the irregular variation factor $I_t$ from the preprocessing result and the obtained long-term trend factor, wherein $T_t$ and $X_t$ have the same dimensions, and $C_t$ and $I_t$ are ratios.
Preferably, preprocessing the historical time sequence data according to its type to eliminate the influence of the cyclic variation factor on the time sequence data values specifically includes:
a moving average is applied to the periodic time sequence data to eliminate the influence of the cyclic variation factor on the time sequence data values, the moving average formula being:
Figure BDA0003323909490000031
wherein $X_t$ is the full variation of the time series, i.e., the known value of the time series itself; $C_t$ is the cyclic variation; $D'$ is the result of the moving average processing of the periodic time sequence data;
interval trend time sequence data is regarded as a small-range interval of periodic data; it contains no cyclic variation factor, so no cyclic variation factor needs to be eliminated;
the preprocessed time sequence data, with the cyclic variation factor eliminated, is recorded as the moving average series D.
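The moving average formula itself is only available as an image in this text, so the following is a minimal sketch of one common choice, a centered moving average whose window equals the period length. The function name `centered_moving_average`, the use of `None` for undefined edge positions, and the half-weighting of the end points for even periods are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Optional


def centered_moving_average(x: List[float], period: int) -> List[Optional[float]]:
    """Average each point over one full period so cyclic variation cancels out.

    Positions too close to either end, where a full window is unavailable,
    are returned as None.
    """
    if period < 2 or period > len(x):
        raise ValueError("period must be between 2 and len(x)")
    n = len(x)
    half = period // 2
    out: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        if period % 2 == 1:
            # Odd period: the window holds exactly `period` points.
            out[t] = sum(window) / period
        else:
            # Even period: the window holds period + 1 points, so the two
            # end points get half weight (the usual "2 x period" average).
            out[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return out


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0] * 4          # purely cyclic data, period 3
    print(centered_moving_average(series, 3))
```

On a purely cyclic series the smoothed values are constant, which is the property the preprocessing step relies on.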
Preferably, fitting a trend function by a trend reasoning method, analyzing it to obtain the long-term trend factor, and calculating the irregular variation factor from the preprocessing result and the obtained long-term trend factor specifically includes:
analyzing, through observation and reasoning, the trend of the moving average series D obtained by preprocessing, and selecting a trend prediction model; the trend prediction model includes a linear trend prediction model and an exponential trend prediction model;
fitting the parameters of the trend prediction model by the least squares method based on the moving average series D obtained by preprocessing to obtain a trend prediction function, and taking the obtained trend prediction function as the long-term trend factor $T_t$;
calculating the random fluctuation (irregular variation) factor from the moving average series D and the calculated long-term trend factor $T_t$:
Figure BDA0003323909490000041
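As a rough illustration of the fitting step, the sketch below fits a linear trend by ordinary least squares and an exponential trend by least squares on the logarithm of the data, then takes the irregular factor as the ratio of the moving average series to the fitted trend (consistent with the relation $D = T_t \times I_t$ stated in the description). The exponential form $a e^{bx}$, the log-transform fitting shortcut, and all function names are assumptions; the patent's own exponential formula is not reproduced in this text.

```python
import math
from typing import List, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_exponential(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a*exp(b*x) by least squares on ln(y); requires every y > 0."""
    slope, intercept = fit_linear(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


def irregular_factors(d: Sequence[float], trend: Sequence[float]) -> List[float]:
    """Irregular factor as the ratio D / T at each time point."""
    return [dv / tv for dv, tv in zip(d, trend)]


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    d = [10.0, 12.1, 13.9, 16.0, 18.0]      # hypothetical moving average series
    a, b = fit_linear(xs, d)
    trend = [a * x + b for x in xs]
    print(a, b)
    print(irregular_factors(d, trend))
```

The irregular factors come out close to 1 when the chosen trend model describes the smoothed series well, which gives a quick check on the model selection.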
Preferably, establishing a time sequence prediction model according to the numerical influence factor calculation result, and predicting the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data specifically includes:
according to the calculated long-term trend factor $T_t$, cyclic variation factor $C_t$, and irregular variation factor $I_t$ of the historical time sequence data, inferring the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ at time step $t+1$ of the same future period/interval;
obtaining the time sequence prediction model from the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ of the same future period/interval:
$$X_{t+1} = T_{t+1} \times C_{t+1} \times I_{t+1}$$
predicting the value $X_{t+1}$ of the future time sequence data according to the time sequence prediction model.
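The text does not say how the three future factors are inferred, so the sketch below makes explicit assumptions: the trend is extrapolated from a fitted trend function, the cyclic factor is the average ratio of observed value to smoothed value at the same phase of the period, and the irregular factor defaults to a neutral 1.0. The names `seasonal_indices` and `forecast` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def seasonal_indices(
    x: Sequence[float], d: Sequence[Optional[float]], period: int
) -> List[float]:
    """Average ratio X/D per phase of the period (assumed estimator for C)."""
    sums = [0.0] * period
    counts = [0] * period
    for t, (xv, dv) in enumerate(zip(x, d)):
        if dv is None:            # smoothed value undefined near the edges
            continue
        sums[t % period] += xv / dv
        counts[t % period] += 1
    if 0 in counts:
        raise ValueError("every phase needs at least one smoothed value")
    return [s / c for s, c in zip(sums, counts)]


def forecast(
    trend_fn: Callable[[int], float],
    indices: Sequence[float],
    start: int,
    steps: int,
    irregular: float = 1.0,
) -> List[float]:
    """Multiplicative forecast X = T * C * I for time steps start .. start+steps-1."""
    period = len(indices)
    return [
        trend_fn(t) * indices[t % period] * irregular
        for t in range(start, start + steps)
    ]


if __name__ == "__main__":
    x = [50.0, 100.0, 150.0, 50.0, 100.0, 150.0]
    d = [None, 100.0, 100.0, 100.0, 100.0, None]
    c = seasonal_indices(x, d, period=3)
    print(c)
    print(forecast(lambda t: 2 * t + 100, c, start=6, steps=3))
```

Setting the irregular factor to 1.0 treats random fluctuation as unpredictable; a deployment might instead use a recent average of the observed irregular factors.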
Preferably, for different application scenarios, performing mixed calculation according to the monitoring values of the historical time sequence data and the predicted values of the future time sequence data, and determining the fault time domain of the time sequence data specifically includes:
the monitoring values of the historical time sequence data are recorded as a series A {a1, a2, a3, ..., an}, and the predicted values of the future time sequence data are recorded as a series B {b1, b2, b3, ..., bn}; different mixing strategies are adopted for different application scenarios to perform mixed calculation on series A and series B, giving a mixed series $C_n$; the mixed series $C_n$ is compared with the lower and upper thresholds of the corresponding index in each scenario, and the fault time domain of the time sequence data is obtained from the comparison result;
performing mixed calculation on series A and series B with different mixing strategies for different application scenarios to obtain the mixed series $C_n$ specifically comprises the following:
for periodic time sequence data whose numerical fluctuation is smaller than a first preset threshold, an overlap method is adopted, and the mixed series $C_n$ is:
$$C_n = A_{n1} + A_{n2} + \dots + A_{nk} + B_n$$
where $k$ is the number of small cycles of the periodic data, $n$ is the number of values contained in one cycle, $A_{nk}$ represents the monitoring values of the historical time sequence data of the $k$-th small period, and $B_n$ represents the predicted values of the future time sequence data;
for periodic time sequence data whose numerical fluctuation is larger than a second preset threshold, an average value method is adopted, and the mixed series $C_n$ is:
Figure BDA0003323909490000051
$A_{nm}$ represents the monitoring values of the historical time sequence data of the $m$-th small period;
for time-sensitive periodic time sequence data, a weighting method is adopted, and the mixed series $C_n$ is:
Figure BDA0003323909490000052
wherein $k$ is the number of small periods of the periodic data, $n$ is the number of values contained in one period, and $\lambda$ is a weighting index with value range [0, 1];
for periodic time sequence data with a stable trend, a peak value method is adopted, and the mixed series $C_n$ is:
$$C_n = \max(A_{n1} \parallel A_{n2} \parallel \dots \parallel A_{nk} \parallel B_n)$$
The resulting series $C_n$ consists of the maximum value at each individual time node across the small periods.
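The two mixing strategies whose formulas are written out in the text, the overlap method and the peak value method, can be sketched as element-wise operations over the historical cycles and the predicted cycle. The function names and the list-of-lists layout (one inner list per small period) are assumptions; the average value and weighting methods are left out because their formulas are not reproduced in this text.

```python
from typing import List, Sequence


def _check_lengths(history: Sequence[Sequence[float]], prediction: Sequence[float]) -> None:
    """Every historical cycle must have the same length n as the prediction."""
    if any(len(cycle) != len(prediction) for cycle in history):
        raise ValueError("all cycles must have the same length as the prediction")


def overlap_mix(history: Sequence[Sequence[float]], prediction: Sequence[float]) -> List[float]:
    """Overlap method: C_n = A_n1 + A_n2 + ... + A_nk + B_n at each time node."""
    _check_lengths(history, prediction)
    return [
        sum(cycle[i] for cycle in history) + prediction[i]
        for i in range(len(prediction))
    ]


def peak_mix(history: Sequence[Sequence[float]], prediction: Sequence[float]) -> List[float]:
    """Peak value method: maximum over all cycles and the prediction per time node."""
    _check_lengths(history, prediction)
    return [
        max([cycle[i] for cycle in history] + [prediction[i]])
        for i in range(len(prediction))
    ]


if __name__ == "__main__":
    history = [[1, 5, 2], [3, 1, 2]]   # k = 2 small periods, n = 3 values each
    prediction = [2, 2, 9]
    print(overlap_mix(history, prediction))
    print(peak_mix(history, prediction))
```

Both functions return a series of the same length as one period, so the result can be compared directly against per-index thresholds.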
In a second aspect of the present invention, a data center high-frequency fault time domain early warning system based on TS-Decomposition is disclosed, the system comprising:
a data cleaning module: acquiring time sequence monitoring data in a data center scene, and performing data screening and data cleaning to obtain monitoring values of historical time sequence data;
a data processing module: analyzing the numerical influence factors of the historical time sequence data, performing time sequence Decomposition on the historical time sequence data based on a TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data;
a trend prediction module: establishing a trend prediction model according to the numerical influence factor calculation result; predicting the future trend according to the trend prediction model to obtain a predicted value of the future time sequence data;
a time domain judging module: according to different application scenes, performing mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data, and judging the fault time domain of the time sequence data;
a fault early warning module: and carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
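The comparison carried out by the time domain judging module can be sketched as a scan that collects contiguous runs of values falling outside the lower and upper thresholds. The function name `fault_time_domains`, the inclusive index-pair output, and the sample thresholds are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def fault_time_domains(
    mixed: Sequence[float], lower: float, upper: float
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where values leave [lower, upper]."""
    domains: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(mixed):
        out_of_range = value < lower or value > upper
        if out_of_range and start is None:
            start = i                       # a new fault interval begins
        elif not out_of_range and start is not None:
            domains.append((start, i - 1))  # the interval ended at the previous node
            start = None
    if start is not None:                   # interval still open at the end
        domains.append((start, len(mixed) - 1))
    return domains


if __name__ == "__main__":
    mixed = [40, 42, 81, 85, 60, 55, 90, 95]   # hypothetical mixed series
    print(fault_time_domains(mixed, lower=10, upper=80))   # [(2, 3), (6, 7)]
```

Each returned pair marks a candidate high-frequency fault time domain that the fault early warning module could forward to an alarm platform.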
In a third aspect of the present invention, an electronic device is disclosed, comprising: at least one processor, at least one memory, a communication interface, and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the memory stores program instructions executable by the processor which are invoked by the processor to implement the method of the first aspect of the invention.
In a fourth aspect of the invention, a computer-readable storage medium is disclosed, which stores computer instructions for causing a computer to implement the method according to the first aspect of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1) The method relies on time sequence data monitoring in a data center scene: historical time sequence data with periodic or interval characteristics are extracted from the data center operation and maintenance data warehouse and cleaned as raw data; the numerical influence factors of the historical time sequence data are analyzed; the historical time sequence data are decomposed based on a time sequence decomposition algorithm; and the numerical influence factors are calculated from the monitoring values of the historical time sequence data. Trend reasoning is performed while calculating the influence factors to obtain a long-term trend prediction model; a time sequence prediction model is then established from the calculated numerical influence factors and used to predict future trends, giving predicted values of future time sequence data. By mining the association relationships within the time sequence data, the method analyzes the intervals in which faults are most frequently distributed, predicts fault tendencies before faults occur, and improves operation and maintenance efficiency.
2) The method and the device perform mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data aiming at different time sequence numerical value types, judge the fault time domain of the time sequence data, realize high-frequency fault time domain analysis of the time sequence data of the data center under different application scenes, provide reference for operation and maintenance resource allocation, and can realize high-frequency fault time domain early warning by combining an operation and maintenance index library and an alarm distribution platform.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a TS-Decomposition-based time domain early warning method for a high-frequency fault of a data center according to the present invention;
FIG. 2 is a flow chart of the time series data trend prediction of the present invention;
FIG. 3 is a flow chart of trend reasoning in accordance with the present invention;
FIG. 4 is a flow chart of a high frequency fault time domain data acquisition method of the present invention;
FIG. 5 is an example diagram of a high-frequency fault time domain under a traffic monitoring index according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of high-frequency fault time domain early warning distribution according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Referring to FIG. 1, in order to mine the association relationships in the time sequence data and analyze the high-frequency fault time domain (the interval in which faults are most frequently distributed) in the data, the invention provides a data center high-frequency fault time domain early warning method based on TS-Decomposition, which includes:
S1, acquiring time sequence monitoring data in a data center scene, and performing data screening and data cleaning to obtain monitoring values of historical time sequence data;
A time sequence monitoring data index library is established for the data center scene; using Extract-Transform-Load (ETL) technology, periodic or interval numerical time sequence data are screened from the time sequence monitoring data according to the index library, and a new time sequence data warehouse is established.
Specifically, the hardware equipment of the data center mainly comprises computers, sensors and terminal equipment. The computers mainly comprise computing nodes and storage nodes; the sensor equipment mainly comprises ambient temperature sensors, humidity sensors and lightning sensors. A series of operation and maintenance related monitoring and service software is deployed on the data center hardware, including but not limited to middleware such as SuperVisor, Redis, Nginx, Prometheus, Elasticsearch and Kibana. The computer hardware layer monitoring data mainly comprise hardware monitoring information of the computing nodes, such as CPU temperature, CPU occupancy rate, CPU frequency, memory utilization rate, disk I/O and disk utilization rate; the sensor layer mainly comprises monitoring information such as the ambient temperature and humidity reported by the sensors. The computer software layer mainly comprises execution information of different software, such as Redis write rate, MySQL space occupancy rate, Kibana concurrent access volume and SuperVisor service restart count. The time sequence monitoring data index library is established from these indexes, and the time sequence data indexes suitable for further analysis are screened out based on it. A usable time sequence data index must meet three conditions:
(1) Time continuity. The data must be continuous in the time dimension and cannot come from a software service that restarts intermittently, such as the Redis write rate.
(2) Numerical type. The data must be of a monitorable numerical type and cannot be Boolean or state values.
(3) Autocorrelation. The data changes in the time dimension must exhibit a certain degree of autocorrelation, and numerical anomalies caused by scheduled tasks or manual operations must be excluded.
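The three screening conditions can be approximated with simple checks, sketched below. The concrete tests (a maximum allowed gap between timestamps, a lag-1 autocorrelation estimate) and the default limits `max_gap=60` and `min_autocorr=0.5` are assumptions for illustration; the text does not specify how each condition is measured.

```python
from typing import Sequence


def is_numeric_series(values: Sequence[object]) -> bool:
    """Condition (2): numeric values only; Booleans are rejected explicitly."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def is_time_continuous(timestamps: Sequence[float], max_gap: float) -> bool:
    """Condition (1): no gap between consecutive samples exceeds max_gap."""
    return all(b - a <= max_gap for a, b in zip(timestamps, timestamps[1:]))


def lag1_autocorrelation(values: Sequence[float]) -> float:
    """Condition (3): sample autocorrelation at lag 1 (0.0 for constant data)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


def is_usable_index(
    timestamps: Sequence[float],
    values: Sequence[object],
    max_gap: float = 60,
    min_autocorr: float = 0.5,
) -> bool:
    """Apply all three screening conditions to one candidate index."""
    return (
        is_numeric_series(values)
        and is_time_continuous(timestamps, max_gap)
        and lag1_autocorrelation(values) >= min_autocorr
    )


if __name__ == "__main__":
    ts = [60 * i for i in range(10)]
    print(is_usable_index(ts, list(range(1, 11))))      # steadily rising metric
    print(is_usable_index(ts, [1, -1] * 5))             # alternating noise
```

A steadily rising metric passes all three checks, while an alternating series fails the autocorrelation condition.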
Each screened time sequence data index is cleaned using methods such as outlier elimination, null filling, and data interpolation smoothing, forming a custom data center time sequence data warehouse that stores the cleaned monitoring values of the historical time sequence data and serves as input for the subsequent steps.
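A minimal sketch of the three cleaning operations follows, assuming that outliers are identified by a fixed valid range, that nulls are filled by linear interpolation, and that smoothing is a short moving average. These specific choices, the function names, and the sample CPU temperature readings are illustrative assumptions rather than the patent's stated procedure.

```python
from typing import List, Optional, Sequence


def remove_outliers(
    values: Sequence[Optional[float]], low: float, high: float
) -> List[Optional[float]]:
    """Replace readings outside [low, high] with None so they can be refilled."""
    return [v if v is not None and low <= v <= high else None for v in values]


def fill_nulls(values: Sequence[Optional[float]]) -> List[float]:
    """Linear interpolation for interior gaps; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no valid values")
    out = list(values)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Moving average that shrinks the window at the edges."""
    half = window // 2
    result = []
    for i in range(len(values)):
        segment = values[max(0, i - half) : i + half + 1]
        result.append(sum(segment) / len(segment))
    return result


if __name__ == "__main__":
    raw = [50.0, None, 54.0, 500.0, 58.0]     # hypothetical CPU temperatures
    cleaned = smooth(fill_nulls(remove_outliers(raw, low=0.0, high=120.0)))
    print(cleaned)                            # [51.0, 52.0, 54.0, 56.0, 57.0]
```

Running outlier removal before null filling lets both kinds of bad readings be repaired by the same interpolation pass.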
S2, analyzing the numerical influence factors of the time sequence data, carrying out time sequence Decomposition on the time sequence data based on a TS-Decomposition algorithm, and carrying out numerical influence factor calculation according to the monitoring values of the historical time sequence data;
Referring to the flow chart of FIG. 2, the time sequence data to be analyzed is extracted from the custom data center time sequence data warehouse and recorded as a sequence A. The cleaned historical time sequence data generally falls into two types: periodic numerical time sequence data and interval numerical time sequence data without periodicity. Parameters are set, including start time, end time, time interval and number of prediction steps. Time sequence decomposition is then performed on the time sequence data based on the TS-Decomposition algorithm, and the trend of the time sequence data is predicted in combination with trend reasoning to obtain predicted values of the future time sequence data. Specifically, step S2 includes the following sub-steps:
S21, analyzing and determining the numerical influence factors of the time sequence data;
The historical time sequence data comprises periodic time sequence data and interval time sequence data. For periodic time sequence data, the numerical influence factors include a long-term trend factor $T_t$, a cyclic variation factor $C_t$, and an irregular variation factor $I_t$, and the multiplicative-model decomposition of the time sequence data based on the TS-Decomposition algorithm is: $X_t = T_t \times C_t \times I_t$
wherein $X_t$ is the full variation of the time series, i.e., the known value of the time sequence data itself; $T_t$ is the long-term trend factor, $C_t$ is the cyclic variation factor, and $I_t$ is the irregular variation factor;
for interval time series data, the numerical influencing factors comprise: long-term trend factors, irregular variation factors.
After analyzing and determining the numerical influence factors of the time sequence data, performing time sequence Decomposition based on a TS-Decomposition algorithm, and then performing numerical influence factor calculation according to the monitoring values of the historical time sequence data, specifically comprising:
S22, preprocessing the historical time sequence data according to the type of the time sequence data, and eliminating the influence of the cyclic variation factor on the values of the historical time sequence data;
Specifically, a moving average is applied to the periodic time sequence data to eliminate the influence of the cyclic variation factor on the time sequence data values, the moving average formula being:
Figure BDA0003323909490000091
where $X_t$ is the full variation of the time series, i.e., the known value of the time series itself; $C_t$ is the cyclic variation; and $D'$ is the result of applying the moving average to the periodic time series data;
Interval trend time series data is regarded as a small-range interval of periodic data. It contains no cyclic variation factor, so none needs to be eliminated; because such data is not periodic and its random component is small, it can be treated as stationary.
The preprocessed historical time series data, with the cyclic variation factor eliminated, is recorded as the moving average sequence $D$, so that $D = T_t \times I_t$.
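The moving-average formula itself is given only as an image in the source, so the following is a minimal sketch under a stated assumption: a centered moving average whose window spans exactly one cycle, which is a common way to cancel a repeating cyclic factor. The function name `centered_moving_average` and the parameter `period` are hypothetical and are not taken from the original disclosure.

```python
def centered_moving_average(values, period):
    """Smooth `values` with a window of one full cycle (assumed scheme).

    Returns a list of the same length as `values`. Positions where the
    window does not fit (near both ends) are set to None.
    """
    n = len(values)
    if period < 1 or period > n:
        raise ValueError("period must be between 1 and len(values)")
    half = period // 2
    smoothed = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 1:
            # Odd period: the window holds exactly `period` points.
            smoothed[i] = sum(window) / period
        else:
            # Even period: the window holds period + 1 points, so the two
            # end points are weighted by 1/2 (a "2 x m" centered average).
            total = sum(window) - 0.5 * (window[0] + window[-1])
            smoothed[i] = total / period
    return smoothed


# Constant trend of 10 multiplied by a cycle of (0.5, 1.0, 1.5).
series = [5, 10, 15, 5, 10, 15]
print(centered_moving_average(series, 3))
# -> [None, 10.0, 10.0, 10.0, 10.0, None]
```

In this toy series the cyclic factors average to 1 over one cycle, so the smoothed values recover the trend level of 10 wherever the window fits.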
S23, fitting a trend function by trend reasoning and analyzing it to obtain the long-term trend factor $T_t$; then calculating the irregular variation factor $I_t$ from the preprocessing result and the obtained long-term trend factor. Here $T_t$ has the same dimension as $X_t$, while $C_t$ and $I_t$ are ratios.
Since $D = T_t \times I_t$, the existing sequence already contains a set of irregular variation factors and long-term trend factors, and the long-term trend factor $T_t$ is obtained through trend reasoning analysis. Trend reasoning methods include linear trend extrapolation, the curve trend method, function model reasoning (exponential, growth curve, envelope curve), and the like.
Specifically, referring to the trend reasoning flow chart of Fig. 3, the trend of the moving average sequence $D$ obtained by preprocessing is first analyzed through observation and reasoning, and a trend prediction model is selected. The trend prediction models include the linear trend prediction model $T_t = ax + b$ and the exponential trend prediction model
Figure BDA0003323909490000101
where $x$ is a time point in the time series $t$; other prediction models, such as quadratic fitting, curve fitting, and multivariate fitting, may also be used.
Data is then collected; the moving average sequence $D$ obtained by preprocessing is the data required for the trend reasoning of the invention.
The parameters of the trend prediction model are fitted by the least squares method to obtain a trend prediction function, which is taken as the long-term trend factor $T_t$. With this trend prediction function, trend analysis and reasoning can be carried out; for example, extrapolating from the historical trend gives the trend of the time series for the same period/interval in the future.
From the moving average sequence $D$ and the calculated long-term trend factor $T_t$, the random fluctuation factor is calculated as
$I_t = D / T_t$
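The least-squares fit and the division that follows from $D = T_t \times I_t$ can be sketched as below. This is a minimal illustration, assuming the linear model $T_t = ax + b$ with $x$ taken as the index 0, 1, 2, ... of each point in $D$; the function names `fit_linear_trend` and `irregular_factors` are hypothetical. The exponential model is not covered because its formula appears only as an image in the source.

```python
def fit_linear_trend(d):
    """Ordinary least-squares fit of T(x) = a*x + b to the sequence d.

    The x values are assumed to be the indices 0, 1, 2, ... of d.
    Returns the pair (a, b).
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_d = sum(d) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_d) for x, y in enumerate(d))
    a = sxy / sxx
    b = mean_d - a * mean_x
    return a, b


def irregular_factors(d, a, b):
    """Irregular factor per point, I = D / T, following D = T x I.

    Assumes the fitted trend a*x + b is nonzero at every index.
    """
    return [y / (a * x + b) for x, y in enumerate(d)]


d = [2.1, 3.9, 6.0, 8.1, 9.9]          # illustrative moving average sequence
a, b = fit_linear_trend(d)
print(a, b)
print(irregular_factors(d, a, b))       # values close to 1 mean little noise
```

Because $I_t$ is a ratio, values near 1 indicate that the fitted trend explains the smoothed data well.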
S3, establishing a time sequence prediction model according to the numerical value influence factor calculation result, and predicting the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data;
Based on the calculated long-term trend factor $T_t$, cyclic variation factor $C_t$, and irregular variation factor $I_t$ of the historical time series data, trend extrapolation is performed to obtain the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ for the time series $t+1$ of the same period/interval in the future. In general, the long-term trend factor follows the function in the prediction model and changes over time along the trend of the future time series of the same period/interval, while the cyclic variation factor $C_{t+1}$ and the irregular variation factor $I_{t+1}$ repeat their variation in the future time series of the same period/interval.
From the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ for the future time series of the same period/interval, the time series prediction model is obtained:
$X_{t+1} = T_{t+1} \times C_{t+1} \times I_{t+1}$
The predicted value $X_{t+1}$ of the future time series data is obtained from the time series prediction model and recorded as sequence B.
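As a rough illustration of the prediction model $X_{t+1} = T_{t+1} \times C_{t+1} \times I_{t+1}$, the sketch below extrapolates a linear trend and reuses the cyclic and irregular factors of one cycle. Several points are assumptions and not part of the source: the trend is linear with parameters `a` and `b`, the cycle is aligned so that position `x % period` selects the factor, and the irregular factors of the last observed cycle are reused unchanged. The function name `forecast` is hypothetical.

```python
def forecast(a, b, cyclic, irregular, start, steps):
    """Predict X = T * C * I for x = start, ..., start + steps - 1.

    a, b      : parameters of the assumed linear trend T(x) = a*x + b
    cyclic    : one cyclic factor per position in the cycle
    irregular : one irregular factor per position, reused from history
    """
    period = len(cyclic)
    if period == 0 or len(irregular) != period:
        raise ValueError("cyclic and irregular must share one non-zero length")
    predicted = []
    for x in range(start, start + steps):
        pos = x % period                      # position inside the cycle
        trend = a * x + b                     # extrapolated long-term trend
        predicted.append(trend * cyclic[pos] * irregular[pos])
    return predicted


# Trend T(x) = x + 10, a two-point cycle, and no irregular variation.
sequence_b = forecast(1.0, 10.0, [0.5, 1.5], [1.0, 1.0], start=4, steps=2)
print(sequence_b)   # -> [7.0, 22.5]
```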
S4, performing mixed calculation according to the monitoring value of the historical time sequence data and the predicted value of the future time sequence data aiming at different time sequence value types, and judging the fault time domain of the time sequence data;
Specifically, the monitoring values of the historical time series data are recorded as sequence $A = \{a_1, a_2, a_3, \dots, a_n\}$ and the predicted values of the future time series data as sequence $B = \{b_1, b_2, b_3, \dots, b_n\}$. Depending on the application scenario, a different mixing strategy is used to combine sequence A and sequence B into a mixed sequence $C_n$. The mixed sequence $C_n$ is compared with the lower and upper thresholds of the corresponding index in each scenario, and the fault time domain of the time series data is obtained from the comparison result;
Referring to the flow chart of the method for acquiring high-frequency fault time domain data in Fig. 4, for different time series value types, sequence A and sequence B are combined using different mixing strategies to obtain the mixed sequence $C_n$, specifically as follows:
(1) For periodic time series data whose numerical fluctuation is smaller than a first preset threshold, i.e., the fluctuation is small, the superposition method is adopted, and the mixed sequence, recorded as $C_n$, is:
$C_n = A_{n1} + A_{n2} + \dots + A_{nk} + B_n$
where $k$ is the number of small cycles of the periodic data, $n$ is the number of values contained in one cycle, $A_{nk}$ is the monitoring value of the historical time series data in the $k$-th small cycle, and $B_n$ is the predicted value of the future time series data.
(2) For periodic time series data whose numerical fluctuation is larger than a second preset threshold, i.e., the fluctuation is large, the averaging method is adopted, and the mixed sequence, recorded as $C_n$, is:
Figure BDA0003323909490000111
$A_{nm}$ is the monitoring value of the $m$-th periodic time series data, and $B_n$ is the predicted value of the future time series data.
(3) For time-sensitive periodic time series data, for example operation and maintenance data such as hard disk capacity, which follows a certain trend over time, or network traffic, which changes relatively quickly over time, the weighting method is adopted, and the mixed sequence, recorded as $C_n$, is:
Figure BDA0003323909490000121
where $k$ is the number of small cycles of the periodic data, $n$ is the number of values contained in one cycle, and $\lambda$ is the weighting index, with value range $[0, 1]$;
(4) For periodic time series data whose numerical fluctuation stays within a preset interval, i.e., the values follow a stable trend, the peak method is adopted, and the mixed sequence, recorded as $C_n$, is:
$C_n = \max(A_{n1} \,\|\, A_{n2} \,\|\, \dots \,\|\, A_{nk} \,\|\, B_n)$
The resulting sequence $C_n$ consists of the maximum value at each single time node across the small cycles.
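Of the four mixing strategies, the superposition and peak methods have their formulas written out in the text, so they can be sketched directly. The averaging and weighting formulas appear only as images in the source and are therefore not implemented here. The sketch assumes the history is stored as a list of `k` small cycles, each with the same number of values as the predicted sequence; the function names are hypothetical.

```python
def _check_lengths(history_cycles, predicted):
    """Every small cycle must have one value per time node in `predicted`."""
    for cycle in history_cycles:
        if len(cycle) != len(predicted):
            raise ValueError("each cycle must match the predicted length")


def mix_superposition(history_cycles, predicted):
    """C_n = A_n1 + A_n2 + ... + A_nk + B_n, computed per time node."""
    _check_lengths(history_cycles, predicted)
    return [sum(column) for column in zip(*history_cycles, predicted)]


def mix_peak(history_cycles, predicted):
    """C_n = max over A_n1, ..., A_nk, B_n, computed per time node."""
    _check_lengths(history_cycles, predicted)
    return [max(column) for column in zip(*history_cycles, predicted)]


history = [[1, 5, 2],      # small cycle 1 (sequence A)
           [3, 1, 2]]      # small cycle 2 (sequence A)
predicted = [2, 2, 9]      # sequence B

print(mix_superposition(history, predicted))   # -> [6, 8, 13]
print(mix_peak(history, predicted))            # -> [3, 5, 9]
```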
Fig. 5 is a diagram illustrating an example of a high-frequency failure time domain under a traffic monitoring index according to an embodiment.
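The comparison of the mixed sequence with the lower and upper thresholds can be illustrated as follows. The source states only that the fault time domain is obtained from the comparison result; grouping consecutive out-of-range points into index intervals, and treating values equal to a threshold as normal, are assumptions made for this sketch. The function name `fault_time_domains` is hypothetical.

```python
def fault_time_domains(mixed, lower, upper):
    """Return inclusive (start, end) index pairs where `mixed` leaves
    the range [lower, upper] for one or more consecutive time nodes."""
    domains = []
    start = None
    for i, value in enumerate(mixed):
        abnormal = value < lower or value > upper
        if abnormal and start is None:
            start = i                          # an abnormal run begins
        elif not abnormal and start is not None:
            domains.append((start, i - 1))     # the run ended at i - 1
            start = None
    if start is not None:                      # run reaches the last node
        domains.append((start, len(mixed) - 1))
    return domains


mixed_sequence = [1, 9, 9, 2, 0, 3]
print(fault_time_domains(mixed_sequence, lower=1, upper=5))
# -> [(1, 2), (4, 4)]
```

Each returned pair can then be mapped back to wall-clock time using the start time and time interval configured for the sequence.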
And S5, carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
The invention realizes the time domain early warning of the high-frequency fault of the data center based on the operation and maintenance fault index and the user-defined fault analysis index of the data center by relying on the alarm distribution infrastructure of the data center and combining the operation and maintenance resource distribution and the actual scene.
Based on the custom data center time series data warehouse, a corresponding custom high-frequency fault time domain data warehouse is designed. It must be consistent with the custom data center time series data warehouse: the time series it contains, the time series intervals, the value units, and the number of data nodes must all match. Each extracted time series corresponds to one high-frequency fault time domain interval sequence; the only difference between the two warehouses is that the data in the high-frequency fault time domain data warehouse is computed from the data in the custom data center time series data warehouse.
Referring to the high-frequency fault time domain early warning distribution diagram of Fig. 5, the invention designs a high-frequency fault time domain early warning database for data center time series data. After a high-frequency fault time domain is obtained, monitoring service software monitors the early warning database in real time, captures abnormal time domain values, and works with the data center's own operation and maintenance early warning system to deliver high-frequency fault time domain early warnings. The fault analysis indexes in the operation and maintenance monitoring index library are mainly of two types:
1. Basic time series data indexes, such as CPU utilization and CPU kernel occupancy.
2. Custom indexes: alarm indexes defined for a service scenario, for example an alarm when hard disk occupancy reaches 70%.
The monitoring service software is a general term for a set of service programs that, driven by the operation and maintenance monitoring indexes, scan the data center's high-frequency fault time domain early warning database in real time. After obtaining the monitoring indexes, the service software scans the early warning database through a timed task; once abnormal time domain data is detected, it pulls the abnormal value records and the corresponding time interval records and distributes them to the data center's operation and maintenance alarm system. The data center operation and maintenance early warning system is the data center's own early warning service. On receiving a fault early warning, it notifies the responsible operation and maintenance personnel just as it would for a basic operation and maintenance fault. Operation and maintenance personnel can then allocate resources according to the high-frequency fault time domain early warning information: more personnel are assigned to high-frequency fault time domains so that possible faults are checked in advance, and resources are saved in low-frequency fault time domains, reducing operation and maintenance cost.
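One pass of such a timed scan might look like the sketch below. Everything in it is illustrative: the record layout `WarningRecord`, the in-memory list standing in for the early warning database, the `thresholds` mapping, and the `dispatch` callback standing in for the operation and maintenance alarm system are all hypothetical and are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class WarningRecord:
    index_name: str   # monitoring index, e.g. "disk_occupancy"
    start: str        # start of the time interval
    end: str          # end of the time interval
    value: float      # mixed-sequence value for that interval


def scan_once(records, thresholds, dispatch):
    """Scan the records once and dispatch every abnormal one.

    thresholds maps an index name to a (lower, upper) pair; records whose
    index has no configured thresholds are skipped. Returns the number of
    records dispatched.
    """
    sent = 0
    for record in records:
        bounds = thresholds.get(record.index_name)
        if bounds is None:
            continue
        lower, upper = bounds
        if record.value < lower or record.value > upper:
            dispatch(record)       # hand over to the alarm system
            sent += 1
    return sent


alerts = []
records = [
    WarningRecord("disk_occupancy", "09:00", "10:00", 0.82),
    WarningRecord("disk_occupancy", "10:00", "11:00", 0.40),
]
count = scan_once(records, {"disk_occupancy": (0.0, 0.7)}, alerts.append)
print(count, [r.start for r in alerts])   # -> 1 ['09:00']
```

In a real deployment `scan_once` would be invoked by whatever scheduler runs the timed task, with `records` read from the early warning database.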
Corresponding to the method embodiment, the invention also provides a TS-Decomposition-based data center high-frequency fault time domain early warning system, which comprises:
a data cleaning module: acquiring time series monitoring data in a data center scenario, and performing data screening and data cleaning to obtain the monitoring values of the historical time series data;
a data processing module: analyzing the numerical influence factors of the historical time sequence data, performing time sequence Decomposition on the historical time sequence data based on a TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data;
a trend prediction module: establishing a time sequence prediction model according to the numerical influence factor calculation result; predicting the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data;
a time domain judging module: performing mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data aiming at different time sequence value types, and judging the fault time domain of the time sequence data;
a fault early warning module: and carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
The method embodiments and system embodiments above correspond to each other; the system embodiments are described only briefly, and the method embodiments may be consulted for details.
The invention mines the association relationships in time series data through a time series decomposition algorithm and predicts the future trend of the data. For different time series value types, it performs mixed calculation on the monitoring values of historical time series data and the predicted values of future time series data, determines the fault time domain of the time series data, and analyzes the intervals in which faults are distributed with high frequency, so that fault trends are predicted before faults occur and operation and maintenance efficiency is improved. The invention realizes high-frequency fault time domain analysis of data center time series data under different application scenarios, provides a reference for operation and maintenance resource allocation, and, combined with an operation and maintenance index library and an alarm distribution platform, can realize high-frequency fault time domain early warning.
The present invention also discloses an electronic device, comprising: at least one processor, at least one memory, a communication interface, and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the memory stores program instructions executable by the processor, which invokes the program instructions to implement the methods of the invention described above.
The invention also discloses a computer-readable storage medium which stores computer instructions for causing a computer to implement all or part of the steps of the method of the embodiments of the invention. The storage medium includes: a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, an optical disk, or the like.
The above-described system embodiments are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, i.e. may be distributed over a plurality of network units. Without creative labor, a person skilled in the art can select some or all of the modules according to actual needs to achieve the purpose of the solution of the embodiment.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A data center high-frequency fault time domain early warning method based on TS-Decomposition is characterized by comprising the following steps:
acquiring time sequence monitoring data under a data center scene and performing data cleaning to obtain a monitoring value of historical time sequence data;
analyzing the numerical influence factors of the time sequence data, performing time sequence Decomposition on the time sequence data based on a TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data;
establishing a time sequence prediction model according to the numerical value influence factor calculation result, and performing time sequence data trend prediction according to the time sequence prediction model to obtain a predicted value of future time sequence data;
performing mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data aiming at different time sequence value types, and judging the fault time domain of the time sequence data;
and carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
2. The TS-Decomposition-based time domain early warning method for the high-frequency fault of the data center according to claim 1, wherein the acquiring of the time sequence monitoring data and the data cleaning in the data center scene comprises:
establishing a time sequence monitoring data index library in a data center scene, screening periodic or interval numerical time sequence data from time sequence monitoring data in the data center scene according to the time sequence monitoring data index library, and establishing a time sequence data warehouse;
and performing data cleaning on historical time series data in the time series data warehouse, wherein the data cleaning comprises but is not limited to outlier elimination, null filling and data interpolation smoothing.
3. The TS-Decomposition-based time domain early warning method for high-frequency faults of a data center according to claim 1, wherein the analyzing the numerical influence factors of the time series data, performing time series decomposition on the time series data based on the TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time series data specifically comprises:
the historical time series data after data cleaning comprises periodic time series data and interval time series data; for periodic time series data, the numerical influence factors include: a long-term trend factor $T_t$, a cyclic variation factor $C_t$, and an irregular variation factor $I_t$, and the multiplicative model decomposition of the time series data based on the TS-Decomposition algorithm is:
$X_t = T_t \times C_t \times I_t$
wherein $X_t$ is the full variation of the time series, i.e., the known value of the time series data itself; $T_t$ is the long-term trend factor, $C_t$ is the cyclic variation factor, and $I_t$ is the irregular variation factor;
for interval time series data, the numerical influencing factors comprise: long-term trend factors, irregular variation factors;
preprocessing the historical time series data according to the type of the time series data to eliminate the influence of the cyclic variation factor $C_t$ on the historical time series data values;
fitting a trend function by trend reasoning to obtain the long-term trend factor $T_t$, and calculating the irregular variation factor $I_t$ according to the preprocessing result and the obtained long-term trend factor, wherein $T_t$ and $X_t$ have the same dimension, and $C_t$ and $I_t$ are ratios.
4. The TS-Decomposition-based time domain early warning method for high-frequency faults of a data center according to claim 3, wherein the preprocessing is performed according to the type of historical time series data, and the elimination of the influence of cyclic variation factors on the time series data specifically comprises:
applying a moving average to the periodic time series data to eliminate the influence of the cyclic variation factor on the time series data values, wherein the moving average formula is as follows:
Figure FDA0003323909480000021
wherein $X_t$ is the full variation of the time series, i.e., the known value of the time series itself; $C_t$ is the cyclic variation; and $D'$ is the result of applying the moving average to the periodic time series data;
aiming at the interval trend time sequence data, the interval trend time sequence data is regarded as a small range interval of the periodic data, and the interval trend time sequence data does not contain a cyclic variation factor and does not need to eliminate the cyclic variation factor;
and recording the preprocessed historical time sequence data for eliminating the cyclic variation factors as a moving average number sequence D.
5. The TS-Decomposition-based time domain early warning method for high-frequency faults of a data center according to claim 4, wherein a trend function is fitted by adopting a trend reasoning method, long-term trend factors are obtained through analysis, and the calculation of irregular variation factors according to the preprocessing result and the obtained long-term trend factors specifically comprises the following steps:
analyzing the trend of the moving average number series D obtained by preprocessing through observation and reasoning, and selecting a trend prediction model; the trend prediction model comprises a linear trend prediction model and an exponential trend prediction model;
fitting the parameters of the trend prediction model by the least squares method based on the moving average sequence $D$ obtained by preprocessing, to obtain a trend prediction function, and taking the obtained trend prediction function as the long-term trend factor $T_t$;
calculating the random fluctuation factor from the moving average sequence $D$ and the calculated long-term trend factor $T_t$:
$I_t = D / T_t$
6. The time domain early warning method for the high-frequency fault of the data center based on the TS-Decomposition according to claim 5, wherein the establishing of the time sequence prediction model according to the numerical influence factor calculation result, and the predicting of the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data specifically include:
performing trend extrapolation according to the calculated long-term trend factor $T_t$, cyclic variation factor $C_t$, and irregular variation factor $I_t$ of the historical time series data, to obtain the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ for the time series $t+1$ of the same period/interval in the future;
obtaining the time series prediction model according to the long-term trend factor $T_{t+1}$, cyclic variation factor $C_{t+1}$, and irregular variation factor $I_{t+1}$ for the future time series of the same period/interval:
$X_{t+1} = T_{t+1} \times C_{t+1} \times I_{t+1}$
predicting the predicted value $X_{t+1}$ of the future time series data according to the time series prediction model.
7. The TS-Decomposition-based time domain early warning method for high-frequency faults of a data center according to claim 1, wherein the step of performing hybrid calculation according to monitoring values of historical time series data and predicted values of future time series data for different time series value types to judge the fault time domain of the time series data specifically comprises the steps of:
the monitoring values of the historical time series data are recorded as sequence $A = \{a_1, a_2, a_3, \dots, a_n\}$, the predicted values of the future time series data are recorded as sequence $B = \{b_1, b_2, b_3, \dots, b_n\}$, sequence A and sequence B are combined using different mixing strategies for different application scenarios to obtain a mixed sequence $C_n$, the mixed sequence $C_n$ is compared with the lower and upper thresholds of the corresponding index in the different scenarios, and the fault time domain of the time series data is obtained from the comparison result;
the combining of sequence A and sequence B using different mixing strategies for different application scenarios to obtain the mixed sequence $C_n$ specifically comprises:
1) for periodic time series data whose numerical fluctuation is smaller than a first preset threshold, a superposition method is adopted, and the mixed sequence, recorded as $C_n$, is:
$C_n = A_{n1} + A_{n2} + \dots + A_{nk} + B_n$
wherein $k$ is the number of small cycles of the periodic data, and $n$ is the number of values contained in one cycle; $A_{nk}$ is the monitoring value of the historical time series data in the $k$-th small cycle, and $B_n$ is the predicted value of the future time series data;
2) for periodic time series data whose numerical fluctuation is larger than a second preset threshold, an averaging method is adopted, and the mixed sequence, recorded as $C_n$, is:
Figure FDA0003323909480000041
$A_{nm}$ is the monitoring value of the $m$-th periodic time series data;
3) for time-sensitive periodic time series data, a weighting method is adopted, and the mixed sequence, recorded as $C_n$, is:
Figure FDA0003323909480000042
wherein $k$ is the number of small cycles of the periodic data, $n$ is the number of values contained in one cycle, and $\lambda$ is the weighting index, with value range $[0, 1]$;
4) for periodic time series data whose numerical fluctuation stays within a preset interval, a peak method is adopted, and the mixed sequence, recorded as $C_n$, is:
$C_n = \max(A_{n1} \,\|\, A_{n2} \,\|\, \dots \,\|\, A_{nk} \,\|\, B_n)$
the resulting sequence $C_n$ consists of the maximum value at each single time node across the small cycles.
8. A high-frequency failure time domain early warning system of a data center based on TS-Decomposition is characterized by comprising:
a data cleaning module: acquiring time series monitoring data in a data center scenario, and performing data screening and data cleaning to obtain the monitoring values of the historical time series data;
a data processing module: analyzing the numerical influence factors of the historical time sequence data, performing time sequence Decomposition on the historical time sequence data based on a TS-Decomposition algorithm, and calculating the numerical influence factors according to the monitoring values of the historical time sequence data;
a trend prediction module: establishing a time sequence prediction model according to the numerical influence factor calculation result; predicting the future trend according to the time sequence prediction model to obtain the predicted value of the future time sequence data;
a time domain judging module: performing mixed calculation according to the monitoring value of historical time sequence data and the predicted value of future time sequence data aiming at different time sequence value types, and judging the fault time domain of the time sequence data;
a fault early warning module: and carrying out high-frequency fault time domain early warning on the data center according to the fault time domain of the time sequence data.
9. An electronic device, comprising: at least one processor, at least one memory, a communication interface, and a bus;
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to implement the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a computer to implement the method of any one of claims 1 to 7.
CN202111255316.1A 2021-10-27 2021-10-27 TS-Decomposition-based data center high-frequency fault time domain early warning method and system Pending CN113986704A (en)

Publications (1)

Publication Number Publication Date
CN113986704A (en) 2022-01-28

Family

ID=79742504

Country Status (1)

Country Link
CN (1) CN113986704A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615134A (en) * 2022-05-10 2022-06-10 北京华创方舟科技集团有限公司 IT intelligent operation and maintenance monitoring system and operation and maintenance method
CN114615134B (en) * 2022-05-10 2022-08-05 北京华创方舟科技集团有限公司 IT intelligent operation and maintenance monitoring system and operation and maintenance method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination