CN115545104A - KPI (Key Performance Indicator) anomaly detection method, system and medium based on functional data analysis
- Publication number
- CN115545104A CN202211209980.7A
- Authority
- CN
- China
- Prior art keywords
- kpi
- dynamic function
- function curve
- data
- time sequence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Complex Calculations (AREA)
Abstract
The invention provides a KPI (Key Performance Indicator) anomaly detection method, system and medium based on functional data analysis. The method comprises the following steps: fitting discrete raw KPI time-series data into a dynamic function curve; extracting features of the dynamic function curve; inputting the extracted features into a classification model, which performs anomaly recognition on them; and obtaining the recognition result, output by the classification model, of whether the raw KPI time-series data is abnormal. To address the low detection precision and efficiency of prior-art KPI anomaly detection methods, and their inability to handle multi-scenario or multi-dimensional anomaly detection, the invention converts the raw KPI data into a dynamic function curve and analyzes it as functional data. This gives the method stronger universality, reduces the workload of collecting and cleaning operation-and-maintenance sample data, and improves detection precision and efficiency.
Description
Technical Field
The invention belongs to the technical field of KPI (Key Performance Indicator) anomaly detection, and particularly relates to a KPI anomaly detection method, system and medium based on functional data analysis.
Background
At present, the development of network information technology is driving the evolution of enterprise IT systems. As information systems grow more complex, traditional manual operation and maintenance can no longer meet the intelligent management requirements of enterprises, and intelligent operation and maintenance technology has emerged in response. Intelligent operation and maintenance covers many key scenarios and technologies, including the monitoring, analysis and decision-making of large-scale distributed systems. KPI (Key Performance Indicator) anomaly detection is a core underlying technology for the intelligent operation and maintenance of Internet services, and most intelligent operation and maintenance scenarios depend on its results.
KPI data is a particular kind of time-series data, obtained by periodic sampling, in the format (timestamp, value). Depending on the scenario and application requirements, KPI anomaly detection falls into two categories: single-dimensional and multi-dimensional. Single-dimensional KPI anomaly detection looks for abnormal behavior (such as sudden increases or jitter) at the level of an individual indicator of interest, and is currently the main research direction. Multi-dimensional KPI anomaly detection instead looks for anomalies at the level of an entity of interest (e.g., a server, a spacecraft or other industrial equipment). An abnormal event in an entity usually makes several indicators abnormal at once, and those indicators are correlated, which makes multi-dimensional KPI anomaly detection more challenging. Existing anomaly detection models can be roughly divided into two categories: traditional methods and machine-learning-based methods. However, owing to diverse periodicity, concept drift, the scarcity of labeled abnormal samples and similar problems, some current models achieve low precision in practice.
From the above analysis, the problems and defects of the prior art are as follows: existing KPI anomaly detection methods have low detection precision and efficiency, and cannot be applied to multi-scenario or multi-dimensional KPI anomaly detection.
Disclosure of Invention
The invention aims to remedy the defects described in the background, and provides a KPI (Key Performance Indicator) anomaly detection method based on functional data analysis that improves detection precision and efficiency.
The technical solution adopted by the invention is a KPI anomaly detection method based on functional data analysis, comprising the following steps:
fitting discrete raw KPI time-series data into a dynamic function curve;
extracting features of the dynamic function curve;
inputting the extracted features of the dynamic function curve into a classification model, and performing anomaly recognition on the features based on the classification model;
and obtaining the recognition result, output by the classification model, of whether the raw KPI time-series data is abnormal.
In the above technical solution, the process of fitting discrete raw KPI time-series data into a dynamic function curve comprises: computing the squared deviation between the raw KPI time-series data and the fitted dynamic function curve and adding a smoothness penalty term, to form a penalized sum of squared errors; solving for the basis function coefficients of the dynamic function curve corresponding to the raw KPI time-series data by minimizing this sum of squared errors; and computing the fitted dynamic function curve from the basis function coefficients and the basis functions, where either a Fourier basis or a B-spline basis may be chosen according to the type of data.
In the above technical solution, the penalized sum of squared errors and the basis expansion of the curve are expressed as:

$$\mathrm{sse}_{\lambda}(i) = \sum_{q=1}^{n}\left[y_{iq} - x_i(t_q)\right]^2 + \lambda \int \left[D^{m} x_i(s)\right]^2 ds$$

$$x_i(t) = \sum_{k=1}^{K} c_{ik}\,\phi_k(t)$$

wherein $\mathrm{sse}$ denotes the sum of squared errors; $n$ denotes the number of time-series values; $y_{iq}$ denotes the value of the $q$-th sample point of the $i$-th raw KPI time series; $x_i(t_q)$ denotes the value at the $q$-th time point of the dynamic function curve corresponding to the $i$-th raw KPI time series; $t$ denotes a time point; $\lambda$ denotes the empirically preset coefficient of the smoothness penalty; $x_i(s)$ denotes the smooth function curve corresponding to the $i$-th raw KPI time series; $D^{m}$ denotes the $m$-th derivative of $x_i(t)$; and the integral over $s$ is the integral of the square of the $m$-th derivative of the function curve;
$x_i(t)$ denotes the dynamic function curve corresponding to the $i$-th raw KPI time series; $K$ denotes the number of basis functions; $c_{ik}$ denotes the $k$-th basis function coefficient of the dynamic function curve corresponding to the $i$-th raw KPI time series; and $\phi_k(t)$ denotes the $k$-th basis function.
In the above technical solution, the features of the dynamic function curve are extracted using the FLPP functional feature extraction method.
In the above technical solution, the process of extracting the features of the dynamic function curve using the FLPP functional feature extraction method comprises:
computing a similarity matrix from the dynamic function curves corresponding to the respective discrete KPI time series;
computing, from the similarity matrix, the total weighted distance between all dynamic function curves in the low-dimensional space, to obtain the FLPP objective function;
converting the FLPP objective function into an eigenvalue decomposition problem;
arranging the eigenvalues obtained from the decomposition in ascending order, taking the eigenvectors corresponding to the smallest eigenvalues as the coefficients of the projection functions, and computing the projection feature functions from these coefficients;
and mapping each dynamic function curve from the original dimension into the low-dimensional space through the projection feature functions, to form the dimension-reduced features of each dynamic function curve.
In the above technical solution, the FLPP objective function and the similarity measure are expressed as:

$$F = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(y_i - y_j\right)^2 s_{ij}, \qquad y_i = \langle a, x_i(t)\rangle$$

$$s_{ij} = \begin{cases}\exp\!\left(-\dfrac{\lVert c_i - c_j\rVert^2}{p}\right), & c_i \in N_o(c_j)\\[2mm] 0, & \text{otherwise}\end{cases}$$

wherein $F$ denotes the total weighted distance between all dynamic function curves in the low-dimensional space; $y_i$ and $y_j$ denote the low-dimensional representations of the dynamic function curves corresponding to the $i$-th and $j$-th raw KPI time series; $s_{ij}$ denotes the similarity measure between the dynamic function curves $x_i(t)$ and $x_j(t)$ in the original dimension; $n$ denotes the number of raw KPI time series; $c_i$ and $c_j$ denote the basis function coefficients of $x_i(t)$ and $x_j(t)$; $c_i \in N_o(c_j)$ indicates that the coefficients of $x_i(t)$ are among the $o$ nearest neighbors of the coefficients of $x_j(t)$; and $p$ is the heat-kernel parameter;
$a(t)$ denotes the projection function; $x_i(t)$ and $x_j(t)$ denote the dynamic function curves corresponding to the $i$-th and $j$-th raw KPI time series; $\langle a, x_i(t)\rangle$ denotes the projection of $x_i(t)$ under the projection function $a(t)$; and $\langle a, x_j(t)\rangle$ denotes the projection of $x_j(t)$ under the projection function $a(t)$.
In the above technical solution, the process of inputting the extracted features of the dynamic function curve into the classification model comprises:
determining whether the extracted features of the dynamic function curve are linearly separable or only non-linearly separable;
if the extracted features of the dynamic function curve are linearly separable, inputting the features into the classification model directly;
and if the extracted features of the dynamic function curve are only non-linearly separable, converting them, through kernel-function inner product computation, into features that are linearly separable in a high-dimensional space, and inputting those features into the classification model.
In the above technical solution, the construction process of the classification model comprises:
acquiring a plurality of KPI time series that contain no abnormal data;
fitting each KPI time series into a dynamic function curve and extracting the features of the curve, to form a training data set;
and training the classification model on the training data set, which creates the decision boundary of the classification model and yields the trained model parameters.
The invention also provides a KPI anomaly detection system based on functional data analysis, comprising a dynamic function curve fitting module, a dynamic function curve feature extraction module and a KPI data anomaly identification module;
the dynamic function curve fitting module is used for acquiring raw KPI time-series data and fitting it into a dynamic function curve;
the dynamic function curve feature extraction module is used for extracting the features of the dynamic function curve using the FLPP-based functional feature extraction method;
and the KPI data anomaly identification module is used for identifying anomalies in the KPI data with the classification model, based on the extracted features.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the steps of the KPI anomaly detection method based on functional data analysis described in the above technical solution.
The beneficial effects of the invention are as follows. Compared with traditional data analysis (such as multivariate statistical analysis), the KPI anomaly detection method based on functional data analysis converts the raw KPI data into a dynamic function curve and analyzes it as functional data, which gives it stronger universality. Fitting the raw KPI data with a dynamic function greatly reduces the workload of collecting and cleaning operation-and-maintenance sample data, and at the same time makes it possible to identify the deeper dynamic evolution patterns in the KPI data and to analyze the essential features of complex data. Extracting the features of the dynamic function curve identifies these patterns further and analyzes the essential features in greater depth. The classification model then classifies the KPI data as abnormal or normal based on the features of the dynamic function curve, which keeps the identification process efficient.
The method fits the raw KPI data using basis functions and a penalty coefficient to obtain a dynamic function curve; it handles both aperiodic and periodic data quickly and fits local characteristics well. When the feature dimension D is much larger than the number of samples N, it is difficult to obtain correct eigenvalues and eigenvectors by eigendecomposition. The invention instead uses methods specific to functional data, whose computation depends only on the number of basis functions (m) and the number of samples (n). The gap between m and n is far smaller, so the eigenvalues and eigenfunctions are simpler to compute and yield a more reasonable and intuitive interpretation of the data.
The method extracts the features of the dynamic function curve with the FLPP functional feature extraction method, which reduces the amount of computation and preserves, in the low-dimensional function space, the neighborhood relations of the high-dimensional function space, thereby safeguarding the accuracy of the data. It can thus identify the deeper dynamic evolution patterns in KPI data and analyze the essential features of complex data. Compared with dimension-reduced KPI samples obtained by traditional methods, the FLPP features are more detailed, intuitive and reasonable, which makes more efficient KPI anomaly detection possible.
The invention provides corresponding processing both for extracted features that are linearly separable and for those that are only non-linearly separable, so that the classification model is more adaptable and feature extraction is finer-grained. The classification model adopts a linearly separable method, which is simple to compute, highly interpretable and has few parameters.
Drawings
FIG. 1 is a schematic diagram of a KPI anomaly detection method based on functional data analysis according to an embodiment of the present invention;
FIG. 2 is a flow chart of a KPI anomaly detection method based on functional data analysis according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a KPI anomaly detection system based on functional data analysis according to an embodiment of the present invention;
FIG. 4 compares two-dimensional visualizations of low-noise simulated data produced by the FLPP used in the invention and by the prior art, according to an embodiment of the present invention;
FIG. 5 compares two-dimensional visualizations of high-noise simulated data produced by the FLPP used in the invention and by the prior art, according to an embodiment of the present invention;
FIG. 6 is a performance graph of the FLPP model validated on the public ECG data set, according to an embodiment of the present invention;
FIG. 7 is a performance graph of the FLPP model validated on the public FaceAll data set, according to an embodiment of the present invention.
Detailed Description
To make the invention clearly understood, it is described in further detail below with reference to the drawings and specific embodiments, which are not intended to limit the invention.
As shown in FIG. 1, a KPI anomaly detection method based on functional data analysis (FDA) provided in an embodiment of the present invention comprises:
S1, fitting discrete raw KPI time-series data into a dynamic function curve;
S2, extracting the features of the dynamic function curve;
S3, inputting the extracted features of the dynamic function curve into a classification model, and performing anomaly recognition on the features based on the classification model;
S4, obtaining the recognition result, output by the classification model, of whether the raw KPI time-series data is abnormal.
In step S1, the raw KPI time-series data is first acquired, and the discrete raw KPI time-series data is then fitted into a dynamic function curve.
In step S2, the features of each dynamic function curve are extracted using the FLPP-based functional feature extraction method.
In step S3, the extracted features of each dynamic function curve are input into the classification model, which identifies from them whether the corresponding raw KPI time series is abnormal, thereby identifying the abnormal KPI data within the raw discrete KPI data.
As shown in FIG. 2, the KPI anomaly detection method based on functional data analysis provided in the embodiment of the present invention specifically comprises the following steps:
reading historical KPI data, which is discrete raw KPI time-series data;
fitting into dynamic function curves: fitting each raw KPI time series into a dynamic function curve;
functional local projection: extracting the features of the dynamic function curves by functional local projection;
support vector machine anomaly detection: inputting the extracted features of the dynamic function curves into a classification model, and performing anomaly recognition on the features based on the classification model;
and obtaining the recognition result, output by the classification model, of whether the raw KPI time-series data is abnormal.
The discrete raw KPI time-series data comprises a plurality of raw KPI time series. The $n$ raw KPI time series are denoted $x_1, x_2, x_3, \ldots, x_n$, and each KPI time series $x_i$ is a vector of KPI values at different time points.
The process of fitting any raw KPI time series into a dynamic function curve, provided by the embodiment of the invention, comprises the following steps:
computing the squared deviation between the raw KPI time-series data and the fitted dynamic function curve and adding a smoothness penalty term, to form a penalized sum of squared errors; solving for the basis function coefficients of the dynamic function curve corresponding to the raw KPI time-series data by minimizing this sum of squared errors; and computing the fitted dynamic function curve from the basis function coefficients and the basis functions.
The penalized sum of squared errors and the basis expansion of the curve are expressed as follows:

$$\mathrm{sse}_{\lambda}(i) = \sum_{q=1}^{n}\left[y_{iq} - x_i(t_q)\right]^2 + \lambda \int \left[D^{m} x_i(s)\right]^2 ds$$

$$x_i(t) = \sum_{k=1}^{K} c_{ik}\,\phi_k(t)$$

wherein $\mathrm{sse}$ denotes the sum of squared errors (Sum of Squares for Error); $n$ denotes the number of time-series values; $y_{iq}$ denotes the value of the $q$-th sample point of the $i$-th raw KPI time series; $x_i(t_q)$ denotes the value at the $q$-th time point of the dynamic function curve corresponding to the $i$-th raw KPI time series; $t$ denotes a time point; $\lambda$ denotes the empirically preset coefficient of the smoothness penalty, by default searched over the range $10^{-8}$ to $1$ in exponential increments, taking the $\lambda$ that gives the smallest sse; $x_i(s)$ denotes the smooth function curve corresponding to the $i$-th raw KPI time series; $D^{m}$ denotes the $m$-th derivative of $x_i(t)$; and the integral over $s$ is the integral of the square of the $m$-th derivative of the function curve;
$x_i(t)$ denotes the dynamic function curve corresponding to the $i$-th raw KPI time series; $K$ denotes the number of basis functions; $c_{ik}$ denotes the $k$-th basis function coefficient of the dynamic function curve corresponding to the $i$-th raw KPI time series; and $\phi_k(t)$ denotes the $k$-th basis function, with a B-spline or Fourier basis selected according to the characteristics of the data.
Substituting the expression for $x_i(t)$ into $\mathrm{sse}_{\lambda}(i)$ gives the following formula:

$$\mathrm{sse}_{\lambda}(i) = \sum_{q=1}^{n}\left[y_{iq} - \sum_{k=1}^{K} c_{ik}\,\phi_k(t_q)\right]^2 + \lambda \int \left[D^{m}\sum_{k=1}^{K} c_{ik}\,\phi_k(s)\right]^2 ds$$

This formula expresses the error between the fitted dynamic function curve and the raw KPI data. The basis function coefficients $c_{ik}$ of the fitted curve are obtained by minimizing this error with the least squares method. Finally, the dynamic function curve $x_i(t)$ is computed from the coefficients $c_{ik}$. This process converts the original finite-dimensional discrete data into a continuous, infinite-dimensional curve, so the method can be applied to more complex and varied KPI anomaly detection, and the model is more universal than traditional data analysis.
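The fitting step above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patented implementation: it assumes a Fourier basis with a known period, a second-derivative penalty (m = 2), and a fixed λ; the function names `fourier_basis`, `fourier_roughness` and `fit_curve` and the synthetic series are hypothetical. With these choices the minimizer of the penalized sum of squared errors has the closed form c = (ΦᵀΦ + λR)⁻¹Φᵀy, where Φ holds the basis functions evaluated at the sample times and R holds the integrals of products of their second derivatives.

```python
import numpy as np

def fourier_basis(t, n_basis, period):
    """Evaluate [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...] at times t."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)          # shape (len(t), n_basis)

def fourier_roughness(n_basis, period):
    """R[k, l] = integral over one period of D^2 phi_k(s) * D^2 phi_l(s) ds."""
    omega = 2.0 * np.pi / period
    diag = [0.0]                          # the constant term has zero curvature
    for k in range(1, (n_basis - 1) // 2 + 1):
        # sin and cos of the same frequency share the same penalty value
        diag += [(k * omega) ** 4 * period / 2.0] * 2
    return np.diag(diag)                  # diagonal: the basis is orthogonal

def fit_curve(t, y, n_basis=7, lam=1e-4, period=1.0):
    """Minimize ||y - Phi c||^2 + lam * c^T R c; return (c, fitted values)."""
    if n_basis % 2 == 0:
        raise ValueError("n_basis must be odd for this Fourier basis")
    Phi = fourier_basis(t, n_basis, period)
    R = fourier_roughness(n_basis, period)
    rhs = Phi.T @ np.asarray(y, dtype=float)
    c = np.linalg.solve(Phi.T @ Phi + lam * R, rhs)
    return c, Phi @ c

# Hypothetical KPI series: one noisy cycle sampled at 200 time points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
clean = np.sin(2.0 * np.pi * t)
y = clean + 0.1 * rng.standard_normal(t.size)
coef, fitted = fit_curve(t, y, n_basis=7, lam=1e-4, period=1.0)
```

A larger `lam` shrinks the higher-frequency coefficients more strongly, so the fitted curve becomes smoother; a B-spline basis would follow the same pattern with a different `Phi` and `R`.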
After the discrete data has been converted into functional data, feature extraction is carried out with methods specific to functional data; common models include functional principal component analysis (FPCA) and functional locality preserving projection (FLPP). The main idea of FPCA is to find, on the basis of the functional data, a projection feature function that maximizes the variance of the data after dimensionality reduction, and then to map the data from the original dimension into a low-dimensional space through that function. Extracting the principal component features of functional data with FPCA removes redundant information and thus achieves dimensionality reduction. The main idea of FLPP is to use the similarity graph of the original space to preserve the local manifold structure of the data in the low-dimensional space; subject to preserving that structure, it searches for the projection feature function with the maximum local variance after dimensionality reduction, and then maps the data from the original dimension into the low-dimensional space through that function for feature extraction. Redundant information is removed while local structure is retained, which achieves dimensionality reduction. FPCA-based functional feature extraction focuses on the global features of KPI samples, captures local relationships poorly and is sensitive to outliers, whereas FLPP-based functional feature extraction focuses on local features among samples, so FLPP performs better in KPI anomaly detection.
The FLPP-based functional feature extraction method uses the similarity graph of the original function space to preserve the manifold structure of the data in the low-dimensional function space, with the similarity matrix computed from the discrete sample points.
The process of extracting the features of the dynamic function curve using the FLPP-based functional feature extraction method in this embodiment comprises:
computing a similarity matrix from the dynamic function curves corresponding to the respective discrete KPI time series; the aim is to preserve, in the low-dimensional space, the similarity of the original space, using the similarity graph of the original space to preserve the local manifold structure of the data;
computing, from the similarity matrix, the total weighted distance between all dynamic function curves in the low-dimensional space, to obtain the FLPP objective function; converting the FLPP objective function into an eigenvalue decomposition problem; arranging the eigenvalues obtained from the decomposition in ascending order, taking the eigenvectors corresponding to the smallest eigenvalues as the coefficients of the projection functions, and computing the projection feature functions from these coefficients; that is, subject to preserving the local manifold structure, determining the projection feature function with the largest local variance of the data after dimensionality reduction;
and mapping each dynamic function curve from the original dimension into the low-dimensional space through the projection feature functions, to form the dimension-reduced features of each dynamic function curve.
The specific process of extracting the features of the dynamic function curve using the FLPP-based functional feature extraction method in this embodiment comprises:
(1) The similarity matrix over the dynamic function curves corresponding to the respective discrete KPI time series is computed with the following formula:

$$s_{ij} = \begin{cases}\exp\!\left(-\dfrac{\lVert c_i - c_j\rVert^2}{p}\right), & c_i \in N_o(c_j)\\[2mm] 0, & \text{otherwise}\end{cases}$$

wherein $s_{ij}$ denotes the similarity measure between the dynamic function curves $x_i(t)$ and $x_j(t)$ corresponding to the $i$-th and $j$-th raw KPI time series in the original space, and is computed from the basis function coefficients of the curves; $c_i$ denotes the basis function coefficients of $x_i(t)$; $c_j$ denotes the basis function coefficients of $x_j(t)$; $c_i \in N_o(c_j)$ indicates that the coefficients of $x_i(t)$ are among the $o$ nearest neighbors of the coefficients of $x_j(t)$; and $p$ is the heat-kernel parameter, ranging from 0 to 1 with a default of 1.
(2) The total weighted distance between all dynamic function curves in the low-dimensional space is computed with the following formula, which gives the FLPP objective function:

$$F = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(y_i - y_j\right)^2 s_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(\langle a, x_i(t)\rangle - \langle a, x_j(t)\rangle\right)^2 s_{ij}$$

wherein $F$ denotes the total weighted distance between all dynamic function curves in the low-dimensional space; $y_i$ and $y_j$ denote the low-dimensional representations of the dynamic function curves corresponding to the $i$-th and $j$-th raw KPI time series; $n$ denotes the number of raw KPI time series; $a(t)$ denotes the projection function; $x_i(t)$ and $x_j(t)$ denote the dynamic function curves corresponding to the $i$-th and $j$-th raw KPI time series; $\langle a, x_i(t)\rangle$ denotes the projection of $x_i(t)$ under the projection function $a(t)$; and $\langle a, x_j(t)\rangle$ denotes the projection of $x_j(t)$ under the projection function $a(t)$.
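The objective function is simplified to:

$$f = \tfrac{1}{2}F = \langle a, X\rangle\, L\, \langle a, X\rangle^{T} = a^{T} W C L C^{T} W a$$

wherein $f$ denotes the simplified objective function; $L$ denotes the Laplacian matrix corresponding to the similarity matrix formed by the $s_{ij}$; $X$ denotes the function matrix formed by the $n$ dynamic function curves $x_i(t)$ in the original space; $W$ denotes the matrix of inner products between the basis functions $\phi(t)$; $C$ denotes the basis function coefficient matrix; and $a$ denotes the vector of basis function coefficients of the projection function $a(t)$.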
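(3) Using the Lagrange multiplier method, the simplified FLPP objective function is converted into an eigenvalue decomposition problem:

$$W C L C^{T} W a = \lambda\, W C D C^{T} W a$$

wherein $D$ is the diagonal matrix with $D_{ii} = \sum_{j} s_{ij}$, so that $L = D - S$ for the similarity matrix $S = (s_{ij})$; $\lambda$ here denotes an eigenvalue, not the penalty coefficient used in curve fitting.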
The eigenvalues obtained from the decomposition are arranged in ascending order; the eigenvectors corresponding to the first $d$ eigenvalues are the coefficients of the projection functions $[a_1(t), \ldots, a_d(t)]$. The projection function coefficients are substituted into the following formula to compute each projection feature function $a(t)$:

$$a(t) = \sum_{l=1}^{K} a_l\,\phi_l(t)$$

wherein $a_l$ denotes the $l$-th projection function coefficient and $\phi_l(t)$ denotes the $l$-th basis function.
(4) The inner product of each projection feature function $a(t)$ with the function matrix $X$ formed by the $n$ dynamic function curves $x_i(t)$ in the original space is computed, which yields the dimension-reduced features of the $n$ dynamic function curves $x_i(t)$.
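Steps (1) to (4) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the coefficient matrix is stored with one curve per row (the transpose of the C used in the eigen-equation), the neighbor graph is symmetrized, the heat kernel is taken as exp(−‖c_i − c_j‖²/p), the basis is assumed orthonormal in the demo so that W is the identity, and a tiny ridge is added before the Cholesky factorization for numerical safety. The names `heat_kernel_similarity` and `flpp_project` and the synthetic coefficients are hypothetical.

```python
import numpy as np

def heat_kernel_similarity(C, n_neighbors=3, p=1.0):
    """Step (1): C is (n_curves, K); returns a symmetric (n, n) similarity matrix."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    d2 = np.sum((C[:, None, :] - C[None, :, :]) ** 2, axis=2)  # squared distances
    S = np.zeros((n, n))
    for j in range(n):
        order = [i for i in np.argsort(d2[:, j]) if i != j]
        nbrs = order[:n_neighbors]                 # nearest neighbors of curve j
        S[nbrs, j] = np.exp(-d2[nbrs, j] / p)      # heat-kernel weight
    # Assumption: keep an edge if either curve is a neighbor of the other.
    return np.maximum(S, S.T)

def flpp_project(C, W, S, d=2, ridge=1e-10):
    """Steps (2)-(4): solve W C L C^T W a = lam W C D C^T W a, project the curves."""
    C = np.asarray(C, dtype=float)                 # (n, K), one curve per row
    W = np.asarray(W, dtype=float)                 # (K, K) basis inner products
    S = np.asarray(S, dtype=float)
    Dg = np.diag(S.sum(axis=1))                    # degree matrix D
    Lap = Dg - S                                   # graph Laplacian L
    CW = C @ W                                     # row i is (W c_i)^T
    A = CW.T @ Lap @ CW
    B = CW.T @ Dg @ CW
    B = B + ridge * np.trace(B) / B.shape[0] * np.eye(B.shape[0])
    G = np.linalg.cholesky(B)                      # B = G G^T
    M = np.linalg.solve(G, np.linalg.solve(G, A).T).T   # G^{-1} A G^{-T}
    vals, vecs = np.linalg.eigh((M + M.T) / 2.0)   # eigenvalues in ascending order
    coefs = np.linalg.solve(G.T, vecs[:, :d])      # coefficients of a_1(t)..a_d(t)
    feats = CW @ coefs                             # y_i = <a, x_i> = a^T W c_i
    return vals[:d], coefs, feats

# Hypothetical coefficients: two groups of 10 curves, K = 5 basis functions.
rng = np.random.default_rng(0)
C = np.vstack([rng.normal(0.0, 0.3, (10, 5)), rng.normal(2.0, 0.3, (10, 5))])
W = np.eye(5)                                      # orthonormal basis assumed
S = heat_kernel_similarity(C, n_neighbors=4, p=1.0)
eigvals, proj_coefs, features = flpp_project(C, W, S, d=2)
```

Each row of `features` is the dimension-reduced representation of one curve and is what would be passed to the classification model. Working with the K-by-K matrices `A` and `B` keeps the cost tied to the number of basis functions rather than to the number of raw sample points.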
FLPP-based feature extraction from the dynamic function curves of KPI data can identify the deeper dynamic evolution patterns in the KPI data and analyze the essential features of complex data. Compared with dimension-reduced KPI sample features obtained by traditional methods, the resulting features are more detailed, intuitive and reasonable, which makes more efficient KPI anomaly detection possible.
Based on the salient features extracted by the functional data analysis model, anomalies in the KPI data can subsequently be identified efficiently with classification models such as a support vector machine.
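The link between the FLPP objective and the eigen-equation of step (3) can be checked with a short derivation. It is a sketch under assumptions that the text does not state explicitly: the similarity matrix $S = (s_{ij})$ is symmetric, $D$ is the diagonal matrix with $D_{ii} = \sum_j s_{ij}$, $L = D - S$, $C$ is the $K \times n$ matrix whose $i$-th column is $c_i$, and the minimization is carried out under the usual locality-preserving-projection normalization $a^{T} W C D C^{T} W a = 1$.

```latex
\begin{aligned}
y_i &= \langle a, x_i\rangle = \int a(t)\,x_i(t)\,dt
     = \sum_{l=1}^{K}\sum_{k=1}^{K} a_l\,c_{ik}\int \phi_l(t)\,\phi_k(t)\,dt
     = a^{T} W c_i, \\
\tfrac{1}{2}\sum_{i,j}\left(y_i - y_j\right)^2 s_{ij}
    &= \sum_{i} y_i^{2} D_{ii} - \sum_{i,j} y_i\,y_j\,s_{ij}
     = y^{T}(D - S)\,y = y^{T} L\,y, \qquad y = C^{T} W a, \\
    &= a^{T} W C L C^{T} W a, \\
\frac{\partial}{\partial a}\Big[a^{T} W C L C^{T} W a
    - \lambda\left(a^{T} W C D C^{T} W a - 1\right)\Big] = 0
    &\;\Longrightarrow\; W C L C^{T} W a = \lambda\, W C D C^{T} W a .
\end{aligned}
```

Because the objective is minimized, the projection coefficients are taken from the eigenvectors with the smallest eigenvalues, which matches the ascending ordering used in the text.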
The method adopts an FLPP functional feature extraction method to extract the features of the dynamic function curve, reduces the calculated amount of data, and maintains the adjacent relation of a high-dimensional function space in a low-dimensional function space so as to ensure the accuracy of the data, thereby not only identifying the dynamic evolution rule of deeper levels in KPI data and analyzing the essential features under complex data. Compared with the KPI sample obtained by the traditional method after dimensionality reduction, the FLPP has more detailed, visual and reasonable characteristics, and provides possibility for more efficient KPI abnormity detection.
In this embodiment, the process of inputting the extracted features of the dynamic function curve into the classification model includes:
judging whether the extracted features of the dynamic function curve are linearly separable or nonlinearly separable;
if the extracted features of the dynamic function curve are linearly separable, inputting the features into the classification model;
if the extracted features of the dynamic function curve are nonlinearly separable, introducing a kernel function, the essential idea being to map the low-dimensional dynamic function curve features into a high-dimensional space through the kernel function so that they become linearly separable in the high-dimensional space; through the kernel function inner product calculation, the low-dimensional dynamic function curve features are converted into features that are linearly separable in the high-dimensional space and are input into the classification model, the classification model is obtained by training, and KPI abnormal data is identified using the linearly separable method.
The invention provides corresponding data processing methods for linearly separable and nonlinearly separable extracted features respectively, so that the classification model is more adaptable and the feature extraction is more detailed.
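The kernel-function idea described above can be illustrated with a small sketch. The text does not name a specific kernel, so a degree-2 polynomial kernel is assumed here; the two concentric circles stand in for hypothetical two-dimensional curve features, and all names and the hand-picked hyperplane (w, b) are illustrative only. The sketch shows that the kernel value equals an inner product in a higher-dimensional space, and that features which no straight line can separate in two dimensions are separated by a plane after the mapping.

```python
import math


def poly2_kernel(u, v):
    """Degree-2 homogeneous polynomial kernel k(u, v) = (u . v)^2 (assumed kernel)."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2


def feature_map(u):
    """Explicit map phi into 3-D with phi(u) . phi(v) == poly2_kernel(u, v)."""
    return (u[0] ** 2, math.sqrt(2.0) * u[0] * u[1], u[1] ** 2)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def circle(radius, count):
    """Evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count)) for k in range(count)]


inner = circle(1.0, 12)   # hypothetical features of normal curves
outer = circle(3.0, 12)   # hypothetical features of abnormal curves

# In the mapped space w . phi(u) = u1^2 + u2^2 (the squared radius), so the
# plane w . z + b = 0 with the hand-picked w and b below separates the circles.
w, b = (1.0, 0.0, 1.0), -4.0


def decide(u):
    """Return 1 (normal) or 0 (abnormal), following the output convention of the text."""
    return 1 if dot(w, feature_map(u)) + b < 0 else 0


print([decide(u) for u in inner[:3]], [decide(u) for u in outer[:3]])
```

In practice the explicit map is never formed; a kernelized classifier only evaluates k(u, v), which is the point of the inner product calculation mentioned above.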
The classification model of this embodiment classifies the low-dimensional dynamic function curve features using the linearly separable method, and the construction process of the classification model includes:
obtaining a plurality of pieces of KPI time sequence data that do not contain abnormal data;
fitting each piece of KPI time sequence data into a dynamic function curve and extracting the features of the curve to form a training data set;
training the classification model with the training data set, creating the decision boundary of the classification model, and obtaining the parameters of the classification model after training.
The classification model of the invention adopts the linearly separable method, which is simple to compute, highly interpretable, and has few parameters, making it computationally efficient.
The classification model of this embodiment adopts the Support Vector Machine (SVM) algorithm, which is widely used in the field of machine learning and achieves good results. Its basic idea is to find the separating hyperplane that correctly divides the data and has the maximum geometric margin, which guarantees good classification and prediction capability for new, unseen instances. The support vector machine solves a convex quadratic programming problem under constraints; by Lagrange duality, the problem is transformed into an optimization problem over the dual variables, and the optimal solution of the original problem is obtained by solving the dual problem equivalent to it.
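The notion of geometric margin can be made concrete with a short sketch. The four labelled points, the two candidate hyperplanes and all names below are hypothetical; the code only compares the margins of given hyperplanes and does not train a support vector machine.

```python
import math


def geometric_margin(w, b, samples):
    """Smallest signed distance z * (w . h + b) / ||w|| over labelled samples
    (h, z) with z in {+1, -1}; it is positive only if every sample is on its correct side."""
    norm = math.sqrt(sum(c * c for c in w))
    return min(z * (sum(wc * hc for wc, hc in zip(w, h)) + b) / norm
               for h, z in samples)


# Hypothetical two-dimensional features with class labels.
samples = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((0.0, 0.0), -1), ((-1.0, 1.0), -1)]

# Two candidate separating hyperplanes (w, b); both classify all four points correctly.
candidates = {"vertical": ((1.0, 0.0), -1.0), "diagonal": ((1.0, 1.0), -1.5)}

margins = {name: geometric_margin(w, b, samples) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Because the closest points of the two classes, (2, 0) and (0, 0), are 2 apart, no hyperplane can have a margin above 1 on this data, so the "vertical" candidate x1 = 1 is the maximum-margin choice here.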
According to Lagrange duality, the dual problem of the original problem is a max-min problem:
For the inequality constraint on the low-dimensional features of each dynamic function curve, a Lagrange multiplier α is introduced and the Lagrangian function is defined:
wherein L(w, b, α) represents the loss function with respect to the parameters w, b and α; α represents the Lagrange multiplier constraint term, and the solution is obtained iteratively by the Lagrange multiplier method; w represents the normal vector of the separating hyperplane; b represents the displacement; z_i represents the hyperplane function of the i-th low-dimensional dynamic function curve feature; and h_i represents the i-th low-dimensional dynamic function curve feature. After training, the parameters are obtained, the low-dimensional dynamic function curve features are classified, and whether they are abnormal is detected. The output result of the classification model is 0 or 1, wherein 0 represents abnormal and 1 represents normal.
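For orientation, the standard hard-margin form of the problem, the Lagrangian function and the dual problem can be written in the notation above as follows. This is the textbook formulation and is offered as an assumption, not as a reproduction of the formulas of the invention; in particular it assumes that z_i takes the value +1 or -1 as the class indicator of the feature h_i, and N (the number of training features) is a symbol introduced here for illustration.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad \text{s.t.}\quad z_i\,(w^{\top}h_i + b) - 1 \ge 0,\qquad i = 1,\dots,N,\\[4pt]
&L(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_{i=1}^{N} \alpha_i\,\bigl[\,z_i\,(w^{\top}h_i + b) - 1\,\bigr],
  \qquad \alpha_i \ge 0,\\[4pt]
&\max_{\alpha}\ \min_{w,\,b}\ L(w, b, \alpha).
\end{aligned}
```

Under this reading, a new feature h would be assigned by the sign of w^T h + b, which is then mapped to the outputs 1 (normal) and 0 (abnormal).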
As shown in FIG. 3, the KPI anomaly detection system based on functional data analysis provided by the invention includes:
the dynamic function curve fitting module 1, used for acquiring original KPI time sequence data and fitting the discrete original KPI time sequence data into a dynamic function curve;
the dynamic function curve feature extraction module 2, used for extracting the features of the dynamic function curve by the FLPP-based functional feature extraction method;
and the KPI data anomaly identification module 3, used for identifying anomalies in the KPI data with the classification model based on the extracted features.
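The data flow between the three modules can be sketched as a simple composition. The class name, the method names and the three stand-in functions below are hypothetical and deliberately trivial: a moving average stands in for basis-function curve fitting, a value range stands in for FLPP features, and a fixed threshold stands in for the trained classification model. Only the chaining of the modules reflects the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class KpiAnomalyDetectionSystem:
    fit_curve: Callable[[Sequence[float]], List[float]]      # module 1: curve fitting
    extract_features: Callable[[List[float]], List[float]]   # module 2: feature extraction
    classify: Callable[[List[float]], int]                   # module 3: 1 = normal, 0 = abnormal

    def detect(self, raw_kpi: Sequence[float]) -> int:
        curve = self.fit_curve(raw_kpi)
        features = self.extract_features(curve)
        return self.classify(features)


def moving_average(series: Sequence[float]) -> List[float]:
    """Stand-in smoother: mean over each point and its immediate neighbours."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def value_range(curve: List[float]) -> List[float]:
    """Stand-in feature: spread of the smoothed curve."""
    return [max(curve) - min(curve)]


def threshold_classifier(features: List[float]) -> int:
    """Stand-in classifier with a hypothetical threshold of 1.0."""
    return 1 if features[0] < 1.0 else 0


system = KpiAnomalyDetectionSystem(moving_average, value_range, threshold_classifier)
print(system.detect([1.0, 1.1, 0.9, 1.0, 1.0]), system.detect([1.0, 1.0, 10.0, 1.0, 1.0]))
```

Passing the modules in as callables keeps each stage replaceable, so the stand-ins could be swapped for actual curve fitting, FLPP extraction and a trained classifier without changing `detect`.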
The technical solution of the present invention is further illustrated by the following specific examples.
The invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the steps of the KPI anomaly detection method based on functional data analysis according to the above technical solution.
An application embodiment of the invention provides a computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the KPI anomaly detection method based on functional data analysis.
An application embodiment of the invention provides an information data processing terminal, which is used for executing the KPI anomaly detection method based on functional data analysis.
The embodiments of the invention have achieved positive effects during research, development and use, and have clear advantages over the prior art; these are described below with reference to the data and figures obtained during testing.
To compare the performance of different algorithms, 1200 simulated data samples are randomly generated based on the following functions (parameters sampled from a Gaussian distribution, 300 sample points per class, feature dimension 200), and are reduced to a 2-dimensional space for visualization using traditional Principal Component Analysis (PCA), Locality Preserving Projection (LPP), Functional Principal Component Analysis (FPCA) and Functional Locality Preserving Projection (FLPP).
FIGS. 4 and 5 compare the two-dimensional visualizations of the low-noise and high-noise simulated data respectively. It can be seen that the FLPP used in the invention has better feature dimension reduction performance and performs especially well on tasks with more noise points, so it is well suited to feature extraction from high-dimensional, noisy KPI data.
In addition, experiments on the ECG and FaceAll public data sets also verify the performance of the FLPP model: dimension reduction based on PCA, LPP, FPCA and FLPP is performed on both data sets, and the low-dimensional features are input into KNN for classification. The classification accuracy of KNN under different dimensions is shown in FIGS. 6 and 7, and the results show that the FLPP used in the invention performs better. FIG. 6 is the performance graph of the FLPP model verified on the ECG public data set; FIG. 7 is the performance graph of the FLPP model verified on the FaceAll public data set.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portions may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
Matters not described in detail in this specification are well known to those skilled in the art.
Claims (10)
1. A KPI anomaly detection method based on functional data analysis, characterized by comprising the following steps:
fitting the discrete original KPI time sequence data into a dynamic function curve;
extracting the characteristics of the dynamic function curve;
inputting the extracted characteristics of the dynamic function curve into a classification model, and performing anomaly identification on the characteristics based on the classification model;
and acquiring the identification result, output by the classification model, of whether the original KPI time sequence data is abnormal.
2. A KPI anomaly detection method based on functional data analysis according to claim 1, characterized in that: the process of fitting the discrete original KPI time sequence data into a dynamic function curve comprises: calculating the squared deviation between the original KPI time sequence data and the fitted dynamic function curve, together with a smoothness penalty term, as the sum of squared errors of the original KPI time sequence data and the experimental predicted value; solving for the basis function coefficients of the dynamic function curve corresponding to the original KPI time sequence data by minimizing the sum of squared errors; and calculating the fitted dynamic function curve of the original KPI time sequence data according to the basis function coefficients and the basis functions.
3. A KPI anomaly detection method based on functional data analysis according to claim 2, characterized in that: the expression of the sum of squared errors of the original KPI time sequence data and the experimental predicted value is as follows:
wherein SSE represents the sum of squared errors; n represents the number of time sequence values; y_iq represents the time sequence value of the q-th sample point of the i-th original KPI time sequence data; x_i(t_q) represents the value of the dynamic function curve corresponding to the i-th original KPI time sequence data at the q-th time sequence point; t represents a time sequence point; λ represents the experimental predicted value; x_i(s) represents the smooth function curve corresponding to the i-th original KPI time sequence data; D^m represents the m-th derivative of x_i(t); ds represents the integration of the square of the m-th derivative of the function curve;
x_i(t) represents the dynamic function curve corresponding to the i-th original KPI time sequence data; K represents the number of basis functions; c_{i,k} represents the k-th basis function coefficient of the dynamic function curve corresponding to the i-th original KPI time sequence data; φ_k(t) represents the k-th basis function.
4. A KPI anomaly detection method based on functional data analysis according to claim 1, characterized in that: the characteristics of the dynamic function curve are extracted by the FLPP functional feature extraction method.
5. A KPI anomaly detection method based on functional data analysis according to claim 4, characterized in that: the process of extracting the characteristics of the dynamic function curve by the FLPP functional feature extraction method comprises the following steps:
calculating a similarity matrix measure according to the dynamic function curves respectively corresponding to the discrete KPI time sequence data;
solving the total weight value among the dynamic function curves in the low-dimensional space according to the similarity matrix measure to obtain the FLPP objective function;
converting the FLPP objective function into an eigenvalue decomposition problem;
arranging the eigenvalues obtained by the decomposition from small to large, taking the eigenvectors corresponding to the leading eigenvalues as the coefficients of the projection functions, and calculating the projection characteristic function based on the coefficients of the projection functions;
and mapping each dynamic function curve in the original dimension into the low-dimensional space through the projection characteristic function to form the dimension-reduced characteristics corresponding to each dynamic function curve.
6. A KPI anomaly detection method based on functional data analysis according to claim 5, characterized in that: the expression of the FLPP objective function is as follows:
f represents the total weight value among all dynamic function curves in the low-dimensional space; y_i represents the dynamic function curve corresponding to the i-th original KPI time sequence data in the low-dimensional space; y_j represents the dynamic function curve corresponding to the j-th original KPI time sequence data in the low-dimensional space; s_ij represents the similarity measure between the dynamic function curves x_i(t) and x_j(t) corresponding to the i-th and j-th original KPI time sequence data in the original dimension; n represents the number of original KPI time sequence data; c_i represents the basis function coefficients of x_i(t); c_j represents the basis function coefficients of x_j(t); c_i ∈ N_o(c_j) indicates that the basis function coefficients of x_i(t) belong to the o nearest neighbors of the basis function coefficients of x_j(t); p is the heat kernel parameter;
a(t) represents the projection function; x_i(t) represents the dynamic function curve corresponding to the i-th original KPI time sequence data; x_j(t) represents the dynamic function curve corresponding to the j-th original KPI time sequence data; <a, x_i(t)> represents the projection of x_i(t) under the projection function a(t); <a, x_j(t)> represents the projection of x_j(t) under the projection function a(t).
7. A KPI anomaly detection method based on functional data analysis according to claim 1, characterized in that: the process of inputting the extracted characteristics of the dynamic function curve into the classification model comprises the following steps:
judging whether the extracted characteristics of the dynamic function curve are linearly separable or nonlinearly separable;
if the extracted characteristics of the dynamic function curve are linearly separable, inputting the characteristics into the classification model;
and if the extracted characteristics of the dynamic function curve are nonlinearly separable, converting the characteristics of the dynamic function curve into characteristics that are linearly separable in a high-dimensional space through kernel function inner product calculation, and then inputting them into the classification model.
8. A KPI anomaly detection method based on functional data analysis according to claim 7, characterized by: the construction process of the classification model comprises the following steps:
acquiring a plurality of pieces of KPI time sequence data that do not contain abnormal data;
fitting each KPI time sequence data to a dynamic function curve and extracting the characteristics of the curve to form a training data set;
training the classification model by adopting a training data set, creating a decision boundary of the classification model, and obtaining parameters of the classification model after training.
9. A KPI anomaly detection system based on functional data analysis, characterized by comprising: a dynamic function curve fitting module, a dynamic function curve feature extraction module and a KPI data anomaly identification module;
the dynamic function curve fitting module is used for acquiring original KPI time sequence data and fitting the original KPI time sequence data into a dynamic function curve;
the dynamic function curve feature extraction module is used for extracting the features of the dynamic function curve by the FLPP-based functional feature extraction method;
and the KPI data anomaly identification module is used for identifying anomalies in the KPI data with the classification model based on the extracted features.
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the method steps of a KPI anomaly detection method based on functional data analysis according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211209980.7A CN115545104A (en) | 2022-09-30 | 2022-09-30 | KPI (Key Performance indicator) anomaly detection method, system and medium based on functional data analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115545104A (en) | 2022-12-30 |
Family
ID=84731102
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211209980.7A Pending CN115545104A (en) | 2022-09-30 | 2022-09-30 | KPI (Key Performance indicator) anomaly detection method, system and medium based on functional data analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115545104A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118229112A (en) * | 2024-04-07 | 2024-06-21 | 南京审计大学 | Audit platform supervision system and method based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||