CN115269241A - Method, device and storage medium for carrying out anomaly detection on periodic data - Google Patents

Method, device and storage medium for carrying out anomaly detection on periodic data

Info

Publication number
CN115269241A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210855545.5A
Other languages
Chinese (zh)
Inventor
巩光乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210855545.5A
Publication of CN115269241A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application relates to the technical field of computers and discloses a method, a device and a storage medium for carrying out anomaly detection on periodic data. The method comprises: determining a data difference value based on predicted data corresponding to a current period and measured data corresponding to the current period; and, if the data difference value exceeds an anomaly difference threshold, judging that the measured data corresponding to the current period is anomalous. The predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model. The anomaly detection model is trained on the measured data of at least one historical period, the measured period characteristics corresponding to the at least one historical period, and the predicted data corresponding to the at least one historical period, where the historical period is earlier than the at least one previous period. The anomaly difference threshold is determined based on the predicted data and the measured data corresponding to the at least one previous period. Anomalies are thereby detected more reliably, and the accuracy of alarms is improved.

Description

Method, device and storage medium for carrying out anomaly detection on periodic data
Technical Field
The application relates to the technical field of computers, and provides a method and a device for carrying out anomaly detection on periodic data and a storage medium.
Background
At present, as edge computing services continue to grow in scale, the number of machines in edge clusters keeps growing as well. An important problem in edge computing is how to monitor the machines of each edge cluster and give early warning of anomalies.
However, monitoring these clusters through their monitoring indicators, and detecting anomalies in those indicators, is a complicated problem. On the one hand, the clusters are located in different regions and have different network environments and hardware configurations. On the other hand, the services carried by the clusters also differ. As a result, anomalies are hard to detect when issuing early warnings for an indicator, and no uniform threshold can be formulated to handle them. In addition, some indicators are clearly periodic, and handling them with a hard threshold greatly increases false alarms. Indicator monitoring and anomaly detection in current edge computing scenarios therefore have the following problems:
(1) A uniform threshold cannot be established for all clusters, and conventional methods cannot automatically adjust the threshold to the conditions of each cluster.
(2) Existing methods cannot handle periodic data well: neither a fixed threshold nor the existing dynamic adjustment methods adapt the threshold adequately, so false alarms increase.
Disclosure of Invention
The embodiments of the present application provide a method, a device and a storage medium for carrying out anomaly detection on periodic data, which are used to improve detection efficiency and accuracy.
The specific technical scheme provided by the application is as follows:
In a first aspect, an embodiment of the present application provides a method for performing anomaly detection on periodic data, including:
determining a data difference value based on the predicted data corresponding to the current period and the measured data corresponding to the current period, wherein the predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into the anomaly detection model, and the anomaly detection model is obtained by training based on measured data of at least one historical period, measured period characteristics corresponding to the at least one historical period and predicted data corresponding to the at least one historical period, the historical period being earlier than the at least one previous period;
and if the data difference value exceeds an anomaly difference threshold, judging that the measured data corresponding to the current period is anomalous, wherein the anomaly difference threshold is determined based on the predicted data and the measured data corresponding to the at least one previous period.
Optionally, the anomaly detection model is trained by:
taking the measured data of at least one historical period and the measured period characteristics corresponding to at least one historical period as input parameters of a lasso regression model, and taking the predicted data corresponding to at least one historical period as output parameters of the lasso regression model;
and training the lasso regression model based on the input parameters and the output parameters, and taking the trained lasso regression model as the anomaly detection model.
Optionally, the measured data corresponding to the current period is determined by:
determining a current starting time and a current ending time corresponding to a current period;
determining a current sliding window corresponding to the current period based on the current starting time and the current ending time;
intercepting an original data stream based on a current sliding window, and determining a plurality of intercepted original data as actually measured data corresponding to a current period, wherein the original data stream is a set of state data of an edge cluster in the operation process;
the measured data of at least one previous cycle is determined by:
determining a previous starting time and a previous ending time corresponding to at least one previous period;
determining a previous sliding window corresponding to at least one previous period based on a previous starting time and a previous ending time;
and intercepting the original data stream based on a previous sliding window, and determining a plurality of intercepted original data as measured data corresponding to at least one previous period.
Optionally, the measured period characteristic is determined by:
extracting, at each sampling moment preset in the current period, a corresponding target characteristic from the measured data corresponding to the current period;
and determining the measured period characteristic based on the respective target characteristics.
Optionally, the data difference value is determined by:
performing the following for each sampling moment within the current period: taking the difference between the predicted data corresponding to the sampling moment and the measured data corresponding to the sampling moment to obtain a preselected difference value;
summing the preselected difference values of all sampling moments, and taking the average of the summed result;
and determining the average value as the data difference value.
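The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `data_difference` and the sample values are hypothetical, and taking the absolute value of each per-moment difference is an assumption, since the text only says the two values are "differenced".

```python
from statistics import fmean


def data_difference(predicted, measured):
    """Average the per-sampling-moment differences of one period."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need two non-empty sequences of equal length")
    # Preselected difference value for each sampling moment.
    # Assumption: the magnitude of the difference is used.
    preselected = [abs(p - m) for p, m in zip(predicted, measured)]
    # Sum the preselected differences and divide by their count.
    return fmean(preselected)


# Hypothetical values for four sampling moments of the current period.
predicted = [10.0, 12.0, 15.0, 11.0]
measured = [11.0, 12.0, 19.0, 10.0]
difference = data_difference(predicted, measured)
```

With a signed difference instead of a magnitude, positive and negative deviations would cancel; which of the two the application intends is not stated.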
Optionally, the anomaly difference threshold is determined by:
inputting measured data corresponding to at least one historical period and measured period characteristics corresponding to at least one historical period into an anomaly detection model to obtain predicted data corresponding to at least one historical period;
and determining the anomaly difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period.
Optionally, determining the anomaly difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period includes:
performing the following for each preset previous sampling moment within the at least one historical period: taking the difference between the predicted data corresponding to the previous sampling moment and the measured data corresponding to the previous sampling moment to obtain a preselected previous difference value;
summing the preselected previous difference values of all previous sampling moments, obtaining a previous average value from the summed result, and determining the previous average value as the anomaly difference threshold.
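A minimal Python sketch of the threshold and the resulting decision follows, under stated assumptions. The helper names (`mean_abs_difference`, `anomaly_threshold`, `is_anomalous`) and all sample values are hypothetical, and the use of absolute differences is an assumption not fixed by the text. The threshold is simply the average difference observed over an earlier period, so it adapts to how well the model fits each cluster.

```python
from statistics import fmean


def mean_abs_difference(predicted, measured):
    """Average of per-sampling-moment differences (magnitude assumed)."""
    return fmean([abs(p - m) for p, m in zip(predicted, measured)])


def anomaly_threshold(earlier_predicted, earlier_measured):
    """The earlier period's average difference serves as the threshold."""
    return mean_abs_difference(earlier_predicted, earlier_measured)


def is_anomalous(cur_predicted, cur_measured, threshold):
    """Flag the current period when its data difference exceeds the threshold."""
    return mean_abs_difference(cur_predicted, cur_measured) > threshold


# Hypothetical earlier period: the model was off by 1, 0 and 1.
threshold = anomaly_threshold([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])

# Hypothetical current periods: one with a large deviation, one close to prediction.
spike = is_anomalous([10.0, 12.0, 14.0], [10.0, 16.0, 20.0], threshold)
quiet = is_anomalous([10.0, 12.0, 14.0], [10.0, 12.0, 14.5], threshold)
```

Because the threshold equals an average error, a period only slightly worse than the reference period is already flagged; a deployment might scale the threshold by a margin, but the text does not describe one.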
In a second aspect, an embodiment of the present application further provides an apparatus for performing anomaly detection on periodic data, including:
a difference determining unit, configured to determine a data difference based on the predicted data corresponding to the current period and the measured data corresponding to the current period; the prediction data corresponding to the current period is obtained by inputting the actually measured data and the actually measured period characteristics of at least one previous period into the anomaly detection model; the anomaly detection model is obtained by training based on measured data of at least one historical period, measured period characteristics corresponding to the at least one historical period and predicted data corresponding to the at least one historical period, wherein the historical period is earlier than at least one previous period;
and an anomaly determining unit, configured to determine that the measured data corresponding to the current period is anomalous if the data difference value exceeds an anomaly difference threshold, wherein the anomaly difference threshold is determined based on the predicted data and the measured data corresponding to the at least one previous period.
Optionally, the anomaly detection model is trained by:
taking the measured data of at least one historical period and the measured period characteristics corresponding to at least one historical period as input parameters of a lasso regression model, and taking the predicted data corresponding to at least one historical period as output parameters of the lasso regression model;
and training the lasso regression model based on the input parameters and the output parameters, and taking the trained lasso regression model as the anomaly detection model.
Optionally, the measured data corresponding to the current period is determined by:
determining a current starting time and a current ending time corresponding to a current period;
determining a current sliding window corresponding to the current period based on the current starting time and the current ending time;
intercepting an original data stream based on a current sliding window, and determining a plurality of intercepted original data as actually measured data corresponding to a current period, wherein the original data stream is a set of state data of an edge cluster in the operation process;
the measured data of at least one previous cycle is determined by:
determining a previous starting time and a previous ending time corresponding to at least one previous period;
determining a previous sliding window corresponding to at least one previous period based on a previous starting time and a previous ending time;
and intercepting the original data stream based on a previous sliding window, and determining a plurality of intercepted original data as measured data corresponding to at least one previous period.
Optionally, the measured period characteristic is determined by:
extracting corresponding target characteristics from actually measured data corresponding to the current period at each sampling time preset in the current period;
and determining the measured period characteristic based on the respective target characteristics.
Optionally, the data difference is determined by:
performing the following for each sampling moment within the current period: taking the difference between the predicted data corresponding to the sampling moment and the measured data corresponding to the sampling moment to obtain a preselected difference value;
summing the preselected difference values of all sampling moments, and taking the average of the summed result;
and determining the average value as the data difference value.
Optionally, the anomaly difference threshold is determined by:
inputting measured data corresponding to at least one historical period and measured period characteristics corresponding to the at least one historical period into the anomaly detection model to obtain predicted data corresponding to the at least one historical period;
and determining the anomaly difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period.
Optionally, determining the anomaly difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period includes:
performing the following for each preset previous sampling moment within the at least one historical period: taking the difference between the predicted data corresponding to the previous sampling moment and the measured data corresponding to the previous sampling moment to obtain a preselected previous difference value;
summing the preselected previous difference values of all previous sampling moments, obtaining a previous average value from the summed result, and determining the previous average value as the anomaly difference threshold.
In a third aspect, an embodiment of the present application provides an intelligent terminal, including:
a memory for storing executable instructions;
a processor for reading and executing the executable instructions stored in the memory to implement the method of any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions that, when executed by a processor, enable the processor to perform the method of any one of the first aspect.
The beneficial effects of the present application are as follows:
To sum up, the embodiments of the present application provide a method, a device and a storage medium for performing anomaly detection on periodic data. The method includes: determining a data difference value based on predicted data corresponding to a current period and measured data corresponding to the current period; and, if the data difference value exceeds an anomaly difference threshold, judging that the measured data corresponding to the current period is anomalous. The predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model. The anomaly detection model is trained on the measured data of at least one historical period, the measured period characteristics corresponding to the at least one historical period, and the predicted data corresponding to the at least one historical period, where the historical period is earlier than the at least one previous period. The anomaly difference threshold is determined based on the predicted data and the measured data corresponding to the at least one previous period. Anomalies are thereby detected more reliably, and the accuracy of alarms is improved.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of a system for anomaly detection of periodic data according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of anomaly detection on periodic data according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a logic architecture of an apparatus for anomaly detection of periodic data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the physical architecture of an intelligent terminal according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, in the embodiment of the present application, the system includes at least one intelligent terminal, and the intelligent terminal is configured to detect state data of an edge cluster (including multiple connected servers) to determine whether a machine in the edge cluster is abnormal. Details are described below.
Referring to FIG. 2, in the embodiment of the present application, a specific process for performing anomaly detection on periodic data is as follows:
Step 201: Determine a data difference value based on the predicted data corresponding to the current period and the measured data corresponding to the current period. The predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model. The anomaly detection model is obtained by training based on the measured data of at least one historical period, the measured period characteristics corresponding to the at least one historical period and the predicted data corresponding to the at least one historical period, wherein the historical period is earlier than the at least one previous period.
In implementation, time is divided in advance into periods. As the machines in the edge cluster run, different state data are obtained in each period; these state data are called measured data. The length of a period is not specifically limited, and the periods may have the same or different durations. Because an edge cluster contains a very large number of machines, the amount of measured data acquired in each period is also very large.
In order to determine whether the measured data of each period is abnormal, in the embodiment of the present application, the measured data of each period and the measured period characteristics are respectively input into the abnormality detection model to obtain the predicted data corresponding to each period, and whether the measured data is abnormal is determined according to the difference between the predicted data and the measured data corresponding to each period. For convenience of detailed description, in the embodiments of the present application, the predicted data corresponding to the current period and the measured data corresponding to the current period are used for detailed description.
First, the anomaly detection model is introduced. The anomaly detection model is obtained by training a lasso regression model, as follows:
(1) Take the measured data of at least one historical period and the measured period characteristics corresponding to the at least one historical period as input parameters of the lasso regression model, and take the predicted data corresponding to the at least one historical period as output parameters of the lasso regression model.
So that the trained model follows the development pattern of the state data as closely as possible, in implementation the measured data of the historical period and the corresponding measured period characteristics are used as input parameters of the lasso regression model, and the predicted data corresponding to the historical period is used as the output parameter. Together they form the training basis.
To keep the training computation simple, a single historical period may be used. Preferably, the measured data of a historical period close to the current time and the corresponding measured period characteristics are selected, so that the trained anomaly detection model is more effective. Alternatively, multiple historical periods may be used, that is, the measured data of multiple historical periods and the corresponding measured period characteristics are selected as input parameters, so that the trained anomaly detection model is more stable.
It should be noted that the historical period here is earlier than the at least one previous period. In other words, the data used to train the lasso regression model predates the previous period whose data is input into the model to predict the current period.
Correspondingly, the output parameter of the lasso regression model is the predicted data corresponding to the historical period. Because prediction uses current data to infer data for a span of time in the future, the predicted data corresponding to the at least one historical period is in fact inferred from the measured data of that historical period and its measured period characteristics. The measured data of the historical period and the predicted data corresponding to the historical period therefore do not refer to exactly the same points in time.
(2) Train the lasso regression model based on the input parameters and the output parameters, and take the trained lasso regression model as the anomaly detection model.
After the input parameters and the output parameters of the lasso regression model are determined, the lasso regression model is trained, that is, its internal parameters are adjusted. The trained lasso regression model is then used as the anomaly detection model.
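To make the training step concrete, the following is a small, dependency-free Python sketch of lasso regression fitted by cyclic coordinate descent. It is an illustration under assumptions, not the applicant's implementation: the application does not name a solver, a penalty weight or a feature layout, so `fit_lasso`, `alpha`, `n_iter` and the toy data are all hypothetical. Each input row stands for one training example built from a historical period (here a measured value plus one period characteristic), and each target stands for the value the model should predict.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; this realises the L1 penalty."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def fit_lasso(rows, targets, alpha=0.1, n_iter=200):
    """Minimise (1/2n)*sum((y - X.w - b)^2) + alpha*sum(|w|) by coordinate descent."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(n_iter):
        # The intercept is not penalised: set it to the mean residual.
        bias = sum(t - sum(w * x for w, x in zip(weights, r))
                   for r, t in zip(rows, targets)) / n
        for j in range(d):
            rho = z = 0.0
            for r, t in zip(rows, targets):
                full_pred = bias + sum(w * x for w, x in zip(weights, r))
                # Residual with feature j's own contribution added back.
                partial_residual = t - full_pred + weights[j] * r[j]
                rho += r[j] * partial_residual
                z += r[j] * r[j]
            weights[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return weights, bias


def predict(weights, bias, row):
    """Predicted value for one input row."""
    return bias + sum(w * x for w, x in zip(weights, row))


# Hypothetical training set: column 0 is a measured value, column 1 a period
# characteristic that carries no information about the target.
rows = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
targets = [3.0, 5.0, 7.0, 9.0]
weights, bias = fit_lasso(rows, targets, alpha=0.1)
forecast = predict(weights, bias, [5.0, 1.0])
```

In this toy data the L1 penalty drives the weight of the uninformative second column to exactly zero and slightly shrinks the first weight, which is the characteristic behaviour of lasso regression. In practice a library implementation would normally be used instead of a hand-written solver.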
After the anomaly detection model to be used is determined, the input data, namely the measured data corresponding to the current period, is determined in the following manner:
1) Determine the current starting time and the current ending time corresponding to the current period.
Because the number of machines included in the edge cluster is huge, in order to determine the measured data corresponding to the current period, in implementation the current starting time and the current ending time corresponding to the current period are determined in the time domain, that is, the time length corresponding to the current period is determined.
2) Determine a current sliding window corresponding to the current period based on the current starting time and the current ending time.
After the time length corresponding to the current period is determined from the current starting time and the current ending time, a sliding window of the same length is selected as the current sliding window corresponding to the current period.
3) Intercept the original data stream based on the current sliding window, and determine the plurality of intercepted original data as the measured data corresponding to the current period, wherein the original data stream is the set of state data of the edge cluster during operation.
After the current sliding window is determined, the original data stream is intercepted with the current sliding window, wherein the original data stream is the set of all state data generated over time by the machines in the edge cluster. Intercepting the original data stream with the current sliding window yields the plurality of original data for the span of time corresponding to the current sliding window.
Likewise, the measured data of the at least one previous period is determined by:
1] Determining a previous start time and a previous end time corresponding to the at least one previous period.
Because the edge cluster contains a very large number of machines, the measured data corresponding to the at least one previous period is delimited in the time domain: in an implementation, the previous start time and the previous end time corresponding to the at least one previous period are determined, which fixes the duration of that period.
2] Determining a previous sliding window corresponding to the at least one previous period based on the previous start time and the previous end time.
Correspondingly, after the duration of the at least one previous period has been determined from the previous start time and the previous end time, a sliding window of the same duration is selected as the previous sliding window corresponding to the at least one previous period.
3] Intercepting the original data stream based on the previous sliding window, and determining the intercepted items of original data as the measured data corresponding to the at least one previous period.
After the previous sliding window is determined, the original data stream is intercepted with it. As before, the original data stream is the set of all state data generated over time by each machine in the edge cluster. Intercepting the stream with the previous sliding window yields the items of original data that fall within the time span covered by that window.
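The two interception procedures above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the application: the `StateRecord` type, the `intercept` and `previous_windows` helpers, the half-open interval convention, and the assumption that previous periods have the same duration as the current period and immediately precede it are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StateRecord:
    """One item of state data from the edge cluster (hypothetical layout)."""
    timestamp: float   # time of the record, in seconds (assumed unit)
    machine_id: str    # machine that produced the record
    value: float       # the monitored state value


def intercept(stream: List[StateRecord], start: float, end: float) -> List[StateRecord]:
    """Return the records whose timestamp lies in the window [start, end)."""
    return [record for record in stream if start <= record.timestamp < end]


def previous_windows(current_start: float, current_end: float,
                     count: int) -> List[Tuple[float, float]]:
    """Return `count` earlier windows of the same duration, oldest first.

    Assumption: previous periods are back-to-back with the current one.
    """
    duration = current_end - current_start
    return [(current_start - k * duration, current_end - k * duration)
            for k in range(count, 0, -1)]
```

With a stream covering times 0 to 29 and a current window of [20, 30), `previous_windows(20.0, 30.0, 2)` gives the windows [0, 10) and [10, 20), and each call to `intercept` returns the ten records of the corresponding period.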
After the measured data corresponding to the current period is determined, the other input, namely the measured period characteristics, needs to be determined. The measured period characteristics are determined in the following way:
[1] At each sampling time preset within the current period, extracting a corresponding target feature from the measured data corresponding to the current period.
Because different types of edge clusters may have different measured period characteristics, in an implementation a plurality of sampling times are set within the current period in advance, and the corresponding target features are extracted from the measured data corresponding to the current period at those preset sampling times. The specific extraction means are not described in detail here.
[2] Determining the measured period characteristics based on the target features.
After the target features are extracted from the measured data corresponding to the current period, they can be refined by classification, summarization, and similar means to determine the measured period characteristics. In practice, the measured period characteristics are fed into the anomaly detection model as the other input.
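Since the text leaves the extraction and refinement means open, the following Python sketch shows only one possible reading of steps [1] and [2]. Everything specific in it is an assumption made for illustration: the target feature is taken to be the mean of the machine values at a sampling time, the refinement is a simple summary (mean, minimum, maximum), and the names `target_features` and `period_characteristics` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def target_features(measured: Dict[float, List[float]],
                    sampling_times: Sequence[float]) -> List[float]:
    """Step [1]: one target feature per preset sampling time.

    `measured` maps a sampling time to the values reported by the machines
    at that time; the feature chosen here (their mean) is an assumption.
    """
    return [mean(measured[t]) for t in sampling_times]


def period_characteristics(targets: Sequence[float]) -> Dict[str, float]:
    """Step [2]: refine the target features by summarizing them (assumed)."""
    return {"mean": mean(targets), "min": min(targets), "max": max(targets)}
```

The resulting dictionary stands in for the measured period characteristics that are passed to the anomaly detection model together with the measured data.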
After the measured data and the measured period characteristics of the at least one previous period are input into the anomaly detection model, the predicted data corresponding to the current period is obtained, and the data difference value is then determined from the predicted data and the measured data.
Specifically, the data difference value is determined by:
(I) Performing the following operation for each sampling time within the current period: subtracting the predicted data corresponding to the sampling time from the measured data corresponding to the sampling time to obtain a preselected difference value.
The volume of measured data and predicted data is enormous, and in an implementation the two must be compared at the same time instants. Therefore, at each sampling time within the current period, the predicted data and the measured data corresponding to that sampling time are obtained and their difference is taken to give a preselected difference value, so that a plurality of sampling times yields a corresponding plurality of preselected difference values.
(II) Superimposing the preselected difference values of all sampling times, and taking an average based on the superimposed result.
Because different periods may contain different numbers of sampling times, and in order to treat all periods uniformly, in an implementation the preselected difference values of the sampling times are superimposed, that is, summed, and the sum is then averaged.
(III) Determining the average value as the data difference value.
In an implementation, the average value obtained in the above steps is taken as the data difference value; in other words, the data difference value is the average of the differences between the predicted data and the measured data over the period.
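Steps (I) to (III) amount to a mean of per-sample differences, which can be sketched as below. The function name `data_difference` is hypothetical, and the sketch uses the signed difference (measured minus predicted) exactly as the text states it; whether an absolute value should be taken is not specified and is therefore not assumed here.

```python
from typing import Sequence


def data_difference(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the per-sampling-time differences for one period."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("need one measured and one predicted value per sampling time")
    # (I) preselected difference value at each sampling time
    preselected = [m - p for m, p in zip(measured, predicted)]
    # (II) superimpose (sum) the preselected values, then average
    # (III) the average is the data difference value
    return sum(preselected) / len(preselected)
```

Dividing by the number of sampling times is what makes periods with different numbers of samples comparable.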
Step 202: If the data difference value exceeds an abnormal difference threshold, judging that the measured data corresponding to the current period is abnormal, wherein the abnormal difference threshold is determined based on the predicted data and the measured data corresponding to at least one previous period.
In an implementation, after the data difference value is determined, it is used to judge whether the measured data corresponding to the current period is abnormal. The criterion against which the data difference value is measured is the abnormal difference threshold.
In view of the correlation between the measured data, in practice the abnormal difference threshold is determined from historical periods earlier than the at least one previous period, in the following way:
I) Inputting the measured data corresponding to at least one historical period and the measured period characteristics corresponding to the at least one historical period into the anomaly detection model to obtain the predicted data corresponding to the at least one historical period.
In an implementation, to obtain the abnormal difference threshold, the predicted data corresponding to the historical period is acquired first, as follows: the corresponding measured period characteristics are extracted from the historical period, and the measured data corresponding to the historical period together with those measured period characteristics are input into the anomaly detection model to obtain the predicted data corresponding to the historical period.
To make the abnormal difference threshold more accurate, one or more historical periods may be used.
II) Determining the abnormal difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period.
In an implementation, once the predicted data and the measured data corresponding to the historical period have been obtained, the abnormal difference threshold can be determined.
Specifically, determining the abnormal difference threshold includes:
One] Performing the following operation for each preset previous sampling time within the at least one historical period: taking the difference between the predicted data corresponding to the previous sampling time and the measured data corresponding to the previous sampling time to obtain a preselected previous difference value.
Similarly, in practice the measured data and the predicted data must be compared at the same time instants. At each previous sampling time within the at least one historical period, the predicted data and the measured data corresponding to that previous sampling time are obtained and their difference is taken to give a preselected previous difference value, so that a plurality of previous sampling times yields a corresponding plurality of preselected previous difference values.
Two] Superimposing the preselected previous difference values of the previous sampling times, obtaining a previous average value based on the superimposed previous result, and determining the previous average value as the abnormal difference threshold.
Similarly, because different periods may contain different numbers of previous sampling times, and in order to treat all periods uniformly, in an implementation the preselected previous difference values of the previous sampling times are superimposed, that is, summed, and the sum is then averaged; the resulting previous average value is determined as the abnormal difference threshold.
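The threshold computation and the comparison of Step 202 can be sketched together as follows. The names `abnormal_difference_threshold` and `is_abnormal` are hypothetical, pooling all previous sampling times of all historical periods into a single average is one reading of the text, and the differences are again signed (measured minus predicted, an assumed sign convention, since the text only says that a difference is taken).

```python
from typing import Sequence, Tuple

# One historical period: (measured values, predicted values), aligned by
# previous sampling time. This pairing is an assumed data layout.
Period = Tuple[Sequence[float], Sequence[float]]


def abnormal_difference_threshold(history: Sequence[Period]) -> float:
    """Average of the preselected previous difference values."""
    previous = [m - p
                for measured, predicted in history
                for m, p in zip(measured, predicted)]
    if not previous:
        raise ValueError("at least one previous sampling time is required")
    return sum(previous) / len(previous)


def is_abnormal(data_difference_value: float, threshold: float) -> bool:
    """Step 202: abnormal if the data difference value exceeds the threshold."""
    return data_difference_value > threshold
```

For two historical periods with differences 1, 2 and 1, 1, the threshold is 1.25, so a current-period data difference value of 2.0 would be flagged while 1.0 would not.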
Based on the same inventive concept, referring to fig. 3, an embodiment of the present application provides an apparatus for performing anomaly detection on periodic data, including:
a difference determining unit 301, configured to determine a data difference value based on the predicted data corresponding to the current period and the measured data corresponding to the current period; the predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into the anomaly detection model; the anomaly detection model is obtained by training based on measured data of at least one historical period, measured period characteristics corresponding to the at least one historical period and predicted data corresponding to the at least one historical period, wherein the historical period is earlier than the at least one previous period;
an anomaly determination unit 302, configured to determine that the measured data corresponding to the current cycle is abnormal if the data difference exceeds an anomaly difference threshold, where the anomaly difference threshold is determined based on the predicted data and the measured data corresponding to at least one previous cycle.
Optionally, the anomaly detection model is trained by:
taking the measured data of at least one historical period and the measured period characteristics corresponding to at least one historical period as input parameters of a lasso regression model, and taking the predicted data corresponding to at least one historical period as output parameters of the lasso regression model;
and training the lasso regression model based on the input parameters and the output parameters, and taking the trained lasso regression model as the anomaly detection model.
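For illustration only, the training step can be sketched with a small coordinate-descent lasso solver in plain Python. The application does not specify a solver or hyperparameters, so the objective used here, (1/(2n))·Σ(y − Xw − b)² + alpha·Σ|w|, the parameters `alpha` and `sweeps`, and the function names are all assumptions of this sketch. Each row of `X` stands for the input parameters of one training example (measured data plus measured period characteristics) and `y` for the corresponding output parameter.

```python
from typing import List, Sequence, Tuple


def soft_threshold(value: float, alpha: float) -> float:
    """Shrink value toward zero by alpha; values in [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def fit_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
              alpha: float = 0.1, sweeps: int = 200) -> Tuple[List[float], float]:
    """Fit weights w and intercept b by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = sum(y) / n
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = b + sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # the intercept is not penalized: set it to the mean residual
        b = sum(y[i] - sum(X[i][k] * w[k] for k in range(d)) for i in range(n)) / n
    return w, b


def predict(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[float]:
    """Predicted data for each input row."""
    return [b + sum(xk * wk for xk, wk in zip(row, w)) for row in X]
```

The L1 penalty drives the weights of uninformative inputs to exactly zero, which is the usual reason for choosing lasso over ordinary least squares when many candidate features are available.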
Optionally, the measured data corresponding to the current period is determined by:
determining a current starting time and a current ending time corresponding to a current period;
determining a current sliding window corresponding to the current period based on the current starting time and the current ending time;
intercepting an original data stream based on a current sliding window, and determining a plurality of intercepted original data as measured data corresponding to a current period, wherein the original data stream is a set of state data of an edge cluster in an operation process;
the measured data of at least one previous period is determined by:
determining a previous starting time and a previous ending time corresponding to at least one previous period;
determining a previous sliding window corresponding to at least one previous period based on a previous starting time and a previous ending time;
and intercepting the original data stream based on a previous sliding window, and determining a plurality of intercepted original data as measured data corresponding to at least one previous period.
Optionally, the measured period characteristic is determined by:
extracting corresponding target characteristics from actually measured data corresponding to the current period at each sampling time preset in the current period;
and determining the measured period characteristics based on the target characteristics.
Optionally, the data difference value is determined by:
for each sampling instant within the current cycle, performing the following: the prediction data corresponding to the sampling moment and the actual measurement data corresponding to the sampling moment are subjected to difference to obtain a preselected difference value;
overlapping the preselected difference values of all sampling moments, and taking an average value based on the overlapped result;
the average value is determined as the data difference value.
Optionally, the abnormal difference threshold is determined by:
inputting measured data corresponding to at least one historical period and measured period characteristics corresponding to at least one historical period into the anomaly detection model to obtain predicted data corresponding to at least one historical period;
and determining the abnormal difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period.
Optionally, determining the abnormal difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period includes:
performing the following operation for each preset previous sampling time in at least one historical period: taking the difference between the predicted data corresponding to the previous sampling time and the measured data corresponding to the previous sampling time to obtain a preselected previous difference value;
superimposing the preselected previous difference values of the previous sampling times, obtaining a previous average value based on the superimposed previous result, and determining the previous average value as the abnormal difference threshold.
Based on the same inventive concept, referring to fig. 4, an embodiment of the present application provides an intelligent terminal, including: a memory 401 for storing executable instructions; a processor 402 for reading and executing executable instructions stored in the memory and performing any of the methods of the first aspect described above.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor, enable the processor to perform the method of any one of the first aspect.
To sum up, the embodiments of the present application provide a method, an apparatus, and a storage medium for performing anomaly detection on periodic data. The method includes: determining a data difference value based on the predicted data corresponding to a current period and the measured data corresponding to the current period, and judging that the measured data corresponding to the current period is abnormal if the data difference value exceeds an abnormal difference threshold. The predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model. The anomaly detection model is obtained by training based on the measured data of at least one historical period, the measured period characteristics corresponding to the at least one historical period, and the predicted data corresponding to the at least one historical period, where the historical period is earlier than the at least one previous period. The abnormal difference threshold is determined based on the predicted data and the measured data corresponding to the at least one previous period. In this way, anomalies are detected more reliably and the accuracy of alarms is improved.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of anomaly detection for periodic data, the method comprising:
determining a data difference value based on the predicted data corresponding to the current period and the measured data corresponding to the current period; the predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model; the anomaly detection model is obtained by training based on measured data of at least one historical period, measured period characteristics corresponding to the at least one historical period and predicted data corresponding to the at least one historical period, wherein the historical period is earlier than the at least one previous period;
and if the data difference value exceeds an abnormal difference threshold value, judging that the measured data corresponding to the current period is abnormal, wherein the abnormal difference threshold value is determined based on the predicted data and the measured data corresponding to at least one previous period.
2. The method of claim 1, wherein the anomaly detection model is trained by:
taking the measured data of the at least one historical period and the measured period characteristics corresponding to the at least one historical period as input parameters of a lasso regression model, and taking the predicted data corresponding to the at least one historical period as output parameters of the lasso regression model;
and training the lasso regression model based on the input parameters and the output parameters, and taking the trained lasso regression model as the anomaly detection model.
3. The method of claim 1, wherein the measured data corresponding to the current period is determined by:
determining a current starting time and a current ending time corresponding to the current period;
determining a current sliding window corresponding to the current period based on the current starting time and the current ending time;
intercepting an original data stream based on the current sliding window, and determining a plurality of intercepted original data as measured data corresponding to the current period, wherein the original data stream is a set of state data of an edge cluster in the operation process;
the measured data of the at least one previous period is determined by:
determining a previous start time and a previous end time corresponding to the at least one previous period;
determining a previous sliding window corresponding to the at least one previous period based on the previous start time and the previous end time;
and intercepting the original data stream based on the previous sliding window, and determining a plurality of intercepted original data as measured data corresponding to the at least one previous period.
4. The method of claim 1, wherein the measured period characteristic is determined by:
extracting corresponding target features from the measured data corresponding to the current period at each sampling time preset in the current period;
determining the measured period characteristics based on each of the target characteristics.
5. The method of claim 4, wherein the data difference value is determined by:
performing the following for each of the sampling instants within the current period: the prediction data corresponding to the sampling moment and the actual measurement data corresponding to the sampling moment are subjected to difference to obtain a preselected difference value;
superposing the preselected difference values of the sampling moments, and averaging based on superposed results;
determining the average as the data difference.
6. The method of any of claims 1 to 4, wherein the abnormal difference threshold is determined by:
inputting the measured data corresponding to the at least one historical period and the measured period characteristics corresponding to the at least one historical period into the anomaly detection model to obtain predicted data corresponding to the at least one historical period;
and determining the abnormal difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period.
7. The method of claim 6, wherein determining the abnormal difference threshold based on the predicted data corresponding to the at least one historical period and the measured data corresponding to the at least one historical period comprises:
performing the following for each preset previous sampling time within the at least one historical period: taking the difference between the predicted data corresponding to the previous sampling time and the measured data corresponding to the previous sampling time to obtain a preselected previous difference value;
superimposing the preselected previous difference values for each of the previous sampling times, deriving a previous average value based on a superimposed previous result, and determining the previous average value as the abnormal difference threshold.
8. An apparatus for anomaly detection of periodic data, comprising:
a difference determining unit, configured to determine a data difference value based on the predicted data corresponding to the current period and the measured data corresponding to the current period; the predicted data corresponding to the current period is obtained by inputting the measured data and the measured period characteristics of at least one previous period into an anomaly detection model; the anomaly detection model is obtained by training based on measured data of at least one historical period, measured period characteristics corresponding to the at least one historical period and predicted data corresponding to the at least one historical period, wherein the historical period is earlier than the at least one previous period;
an anomaly determination unit, configured to determine that the measured data corresponding to the current period is abnormal if the data difference value exceeds an abnormal difference threshold, wherein the abnormal difference threshold is determined based on the predicted data and the measured data corresponding to at least one previous period.
9. An intelligent terminal, comprising:
a memory for storing executable instructions;
a processor for reading and executing executable instructions stored in the memory to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor, enable the processor to perform the method of any of claims 1-7.
CN202210855545.5A 2022-07-20 2022-07-20 Method, device and storage medium for carrying out anomaly detection on periodic data Pending CN115269241A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210855545.5A CN115269241A (en) 2022-07-20 2022-07-20 Method, device and storage medium for carrying out anomaly detection on periodic data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210855545.5A CN115269241A (en) 2022-07-20 2022-07-20 Method, device and storage medium for carrying out anomaly detection on periodic data

Publications (1)

Publication Number Publication Date
CN115269241A true CN115269241A (en) 2022-11-01

Family

ID=83767420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210855545.5A Pending CN115269241A (en) 2022-07-20 2022-07-20 Method, device and storage medium for carrying out anomaly detection on periodic data

Country Status (1)

Country Link
CN (1) CN115269241A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859209A (en) * 2023-02-08 2023-03-28 烟台市福山区动物疫病预防控制中心 Animal husbandry poultry breeding abnormity identification method based on feed consumption data

Similar Documents

Publication Publication Date Title
CN105787248B (en) The abnormal sensing and forecasting system and method for analysis based on time series data
CN102713862B (en) Error cause extraction device, failure cause extracting method and program recorded medium
KR101748122B1 (en) Method for calculating an error rate of alarm
KR101848193B1 (en) Prediction method of disk capacity, equipment, facilities and non-volatile computer storage media
KR102067344B1 (en) Apparatus and Method for Detecting Abnormal Vibration Data
CN112149510A (en) Non-invasive load detection method
CN112532643B (en) Flow anomaly detection method, system, terminal and medium based on deep learning
CN106598822B (en) A kind of abnormal deviation data examination method and device for Capacity Assessment
JP2000181526A (en) Device and method for estimating/predicting plant state
CN115269241A (en) Method, device and storage medium for carrying out anomaly detection on periodic data
CN115800272A (en) Power grid fault analysis method, system, terminal and medium based on topology identification
CN112565187A (en) Power grid attack detection method, system, equipment and medium based on logistic regression
CN112926636A (en) Method and device for detecting abnormal temperature of traction converter cabinet body
EP2132609A1 (en) Machine condition monitoring using discontinuity detection
CN111161097A (en) Method and device for detecting switch event based on event detection algorithm of hypothesis test
CN113746862A (en) Abnormal flow detection method, device and equipment based on machine learning
JP2020166407A (en) Model generation device, abnormality occurrence prediction device, abnormality occurrence prediction model generation method and abnormality occurrence prediction method
CN117370818A (en) Intelligent diagnosis method and intelligent environment-friendly system for water supply and drainage pipe network based on artificial intelligence
CN115495274B (en) Exception handling method based on time sequence data, network equipment and readable storage medium
JPH08314530A (en) Fault prediction device
CN111125195A (en) Data anomaly detection method and device
CN116108376A (en) Monitoring system and method for preventing electricity stealing, electronic equipment and medium
US20220253051A1 (en) Method and assistance system for detecting an abnormal behaviour of a device
CN114330569A (en) Method and equipment for detecting fan unit component fault and storage medium
US11131985B2 (en) Noise generation cause estimation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination