CN113420800B - Data anomaly detection method and device - Google Patents

Info

Publication number
CN113420800B
Authority
CN
China
Prior art keywords
current moment
factor
flow
local
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110657018.9A
Other languages
Chinese (zh)
Other versions
CN113420800A (en)
Inventor
尉书宾
杨校林
何群辉
李菁菁
胡颖
赵毅
邓鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN202110657018.9A
Publication of CN113420800A
Application granted
Publication of CN113420800B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the invention relates to a data anomaly detection method and device. The method comprises the following steps: obtaining an observed value of the flow at the current moment; calculating the residual at the current moment from the observed value and the predicted value of the flow at the current moment; establishing a time-series sliding window; placing the residual corresponding to the current moment into the time-series sliding window in time order to obtain a residual sequence; processing the residual sequence to obtain a flow anomaly factor corresponding to the current moment; and determining that a flow anomaly has occurred when the anomaly factor exceeds a preset threshold. The method combines the time-series sliding window with the observed and predicted flow values at the current moment to produce an anomaly factor for the current moment; if the anomaly factor exceeds the set threshold, the data acquisition source is recorded as abnormal at that moment. The method and device can detect whether data is abnormal using only a small amount of data, while keeping the running time short.

Description

Data anomaly detection method and device
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for detecting data anomalies.
Background
Community safety bears directly on people's lives and property; only by reducing risk in communities can people's interests be protected in practice, so that people can live and work in peace. Communities deploy many devices that collect risk data, such as audio cameras, temperature and humidity sensors, and visitor access systems. The data these devices generate can serve as input for anomaly detection on the community's risk data acquisition sources: a model is designed from the data each source produces and is then used to detect anomalies in that source.
To date, researchers have proposed many outlier detection algorithms, which fall roughly into four categories: statistical model-based, distance-based, density-based, and deviation-based. With the rise of artificial intelligence and pattern recognition, more and more novel algorithms have been proposed and put into practice. Algorithms based on statistical models require prior knowledge of the data set, such as its distribution, but in most cases the distribution is unknown. To overcome this defect, researchers introduced distance-based anomaly detection algorithms, which adapt well to high-dimensional data sets without knowing the data distribution in advance; however, distance-based algorithms can only find global anomalies and cannot find local ones. The prior art adopts a local-density-based outlier detection algorithm, LOF (Local Outlier Factor), which calculates a local anomaly factor from the local reachability density of each object and of its k-nearest neighbors; but LOF considers only the k-nearest neighbors of a data object and cannot adapt to data distributed across multiple clusters. The prior art also provides INFLO (Influenced Outlierness), a local outlier algorithm based on reverse k-nearest neighbors, which considers both the k-nearest neighbors and the reverse k-nearest neighbors of an object when calculating its local outlier factor, thereby avoiding misjudgment of objects located at cluster edges.
Both LOF and INFLO are unsupervised algorithms and do not need a large amount of labeled data. However, the LOF calculation ignores the reverse k-nearest neighbors of a data object, and the computational complexity of INFLO is too high.
Disclosure of Invention
In order to solve the above problems, the present application proposes a method and apparatus that can discover most anomalies in data using only a small amount of labeled data, while keeping the running time short.
In a first aspect, the present application provides a data anomaly detection method, including:
obtaining an observation value of the flow at the current moment;
calculating residual errors of the current moment corresponding to the observed value and the predicted value of the current moment flow;
establishing a time sequence sliding window;
the residual error corresponding to the current moment is put into the time sequence sliding window corresponding to the current moment according to the time sequence, and a residual error sequence is obtained;
processing the residual sequence to obtain a flow anomaly factor corresponding to the current moment;
and determining that the flow abnormality occurs when the abnormality factor exceeds a preset threshold.
Preferably, the processing the residual sequence to obtain the flow anomaly factor corresponding to the current time includes:
processing the residual sequence to obtain transformation data;
identifying residual errors corresponding to the current moment in the transformation data as an object p, acquiring a local centroid factor of the object p, and determining whether the local centroid factor is larger than 1;
when the local centroid factor is not more than 1, acquiring reverse k-nearest neighbor of the object p;
and if the reverse k-neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring the local abnormal factor of the object p as a flow abnormal factor.
Preferably, the obtaining the local centroid factor of the object p includes determining by the following formula:
Figure BDA0003113473270000021
wherein lcf characterizes the local centroid factor, o characterizes the nearest object to the object p, k-dist(p) characterizes the k-nearest-neighbor distance of the object p, and NN_k(p) characterizes the number of k-nearest neighbors of the object p.
Preferably, the obtaining the local anomaly factor of the object includes determining by the following formula:
Figure BDA0003113473270000031
wherein lof characterizes the local anomaly factor, den(p) characterizes the local density of the object p, IS_k(p) characterizes the influence domain of the object p, den_avg(IS_k(p)) characterizes the average local density of the data objects within the influence domain, RNN_k(p) represents the number of reverse k-nearest neighbors of the object p, q represents any residual object within the window, and N represents the number of residual objects.
Preferably, said den (p) is determined by the following formula:
Figure BDA0003113473270000032
wherein
Figure BDA0003113473270000033
characterizes the reciprocal of the k-nearest-neighbor distance of the object p, and e^(-k-dist(p)) characterizes the exponential with base e whose exponent is the negative of the k-nearest-neighbor distance of the object p.
Preferably, the data anomaly detection method further includes:
inputting the historical flow data before the current moment into a long short-term memory (LSTM) network model for prediction, and taking the prediction result output by the LSTM network model as the predicted value of the flow at the current moment.
In a second aspect, the present application provides a data anomaly detection device, including:
the observation value acquisition module is used for acquiring the observation value of the flow at the current moment;
the calculation module is used for calculating residual errors of the current moment corresponding to the observed value and the predicted value of the current moment flow;
the window module is used for establishing a time sequence sliding window;
the first determining module is used for placing the residual error corresponding to the current moment into the time sequence sliding window corresponding to the current moment according to the time sequence to obtain a residual error sequence;
the processing module is used for processing the residual sequence to obtain a flow abnormal factor corresponding to the current moment;
and the second determining module is used for determining that the flow abnormality occurs when the abnormality factor exceeds a preset threshold.
Preferably, the processing the residual sequence to obtain the flow anomaly factor corresponding to the current time includes:
processing the residual sequence to obtain transformation data;
identifying residual errors corresponding to the current moment in the transformation data as an object p, acquiring a local centroid factor of the object p, and determining whether the local centroid factor is larger than 1;
when the local centroid factor is not more than 1, acquiring reverse k-nearest neighbor of the object p;
and if the reverse k-neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring the local abnormal factor of the object p as a flow abnormal factor.
Preferably, the obtaining the local centroid factor of the object p includes determining by the following formula:
Figure BDA0003113473270000041
wherein lcf characterizes the local centroid factor, o characterizes the nearest object to the object p, k-dist(p) characterizes the k-nearest-neighbor distance of the object p, and NN_k(p) characterizes the number of k-nearest neighbors of the object p.
Preferably, the data anomaly detection device further includes:
the prediction value acquisition module is used for inputting the historical flow data before the current moment into a long short-term memory (LSTM) network model for prediction, and taking the prediction result output by the LSTM network model as the predicted value of the flow at the current moment.
The invention provides a data anomaly detection method, which combines a time sequence sliding window with a flow observation value and a flow prediction value at the current moment to give an anomaly factor at the current moment, and records that a data acquisition source at the current moment is abnormal if the anomaly factor exceeds a set threshold value. Only a small amount of data is used to find out whether the data is abnormal or not, so that a shorter operation time is ensured.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present description, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an application schematic diagram of the technical solution provided in the embodiment of the present application;
Fig. 2 is a schematic diagram of a method process of the technical solution provided in the embodiments of the present application;
Fig. 3 is a schematic diagram of a time sliding window according to the technical solution provided in the embodiment of the present application;
Fig. 4 is a schematic diagram of a calculation process of a flow anomaly factor in the technical solution provided in the embodiment of the present application;
Fig. 5 is a schematic device diagram of the technical solution provided in the embodiment of the present application;
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical scheme provided by the invention is further described in detail below with reference to the accompanying drawings and the embodiments.
Fig. 1 is an application schematic diagram of the technical solution provided in the embodiment of the present application. As shown in fig. 1, once flow data that needs to be checked for anomalies has been acquired, the flow anomaly detection method of the present application can issue an early warning when an anomaly occurs in the flow. In this application, the description mainly takes the flow at a designated location as an example; it is easy to understand that this flow may be equivalently replaced by the flow in various other possible scenarios.
Fig. 2 is a schematic diagram of a method process of the technical solution provided in the embodiment of the present application. As shown in fig. 2, the present application provides a flow anomaly detection method, including:
Step 201: obtaining the observed value of the flow at the current moment.
That is, the actual value of the flow at the current moment is obtained in a specific way.
Step 202: calculating the residual at the current moment from the observed value and the predicted value of the flow at the current moment.
In some possible embodiments, the obtaining of the flow prediction value may include:
inputting the historical flow data before the current moment into a Long Short-Term Memory (LSTM) network model for prediction, and taking the prediction result output by the LSTM network model as the predicted value of the flow at the current moment.
Specifically, an LSTM network model is established, the data before the current moment is input into the LSTM model, and the predicted value of the people flow at the current moment is obtained from the LSTM model.
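As a minimal sketch of Step 202, the residual can be computed from an observed value and a prediction. The patent obtains the prediction from an LSTM model; to stay self-contained, the sketch below swaps in a hypothetical moving-average predictor as a stand-in, and it assumes the residual is defined as observed minus predicted (the text does not state the sign convention). All function names are illustrative.

```python
from typing import Callable, Sequence


def moving_average_predictor(history: Sequence[float], lookback: int = 3) -> float:
    """Hypothetical stand-in for the LSTM model: mean of the last `lookback` values."""
    if not history:
        raise ValueError("history must contain at least one value")
    recent = history[-lookback:]
    return sum(recent) / len(recent)


def residual_at_current_moment(
    history: Sequence[float],
    observed: float,
    predict: Callable[[Sequence[float]], float] = moving_average_predictor,
) -> float:
    """Assumed convention: residual = observed flow - predicted flow."""
    predicted = predict(history)
    return observed - predicted


# Flow values before the current moment, then the observation at the current moment.
history = [10.0, 12.0, 14.0]
residual = residual_at_current_moment(history, observed=20.0)
print(residual)
```

Any callable that maps the history to a single predicted value, including a trained LSTM wrapper, could be passed as `predict` in place of the stand-in.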
Step 203: a time series sliding window is established.
Specifically, a time sequence sliding window is firstly established, and the flow data stream is processed through the established time sequence sliding window.
Fig. 3 is a schematic diagram of a time sliding window according to the technical solution provided in the embodiments of the present application. Referring to fig. 3, the time series sliding window slides once along the time series direction at fixed time intervals, and the step size is unchanged. Illustratively, the time series sliding window has a size of 200, sliding every 5 minutes, with a step size of 1.
Step 204: placing the residual corresponding to the current moment into the time-series sliding window corresponding to the current moment in time order to obtain a residual sequence.
It will be appreciated that flow observations may be collected periodically. Correspondingly, the sliding window also contains the residuals corresponding to the moments before the current moment.
Specifically, the residual at the current moment, obtained from the observed value and the predicted value of the people flow, is placed into the time-series sliding window in time order to obtain a residual sequence. Since the size of the sliding window is fixed at 200, the residual that entered the window earliest must be deleted each time a new one is added, so that the window size stays unchanged.
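The fixed-size window described above can be sketched with a bounded double-ended queue, which discards the oldest entry automatically once the window is full. The window size of 200 comes from the example in the text; the helper name `make_window` and the small demonstration size are illustrative assumptions.

```python
from collections import deque
from typing import Deque

WINDOW_SIZE = 200  # window size used in the example in the text


def make_window(size: int = WINDOW_SIZE) -> Deque[float]:
    """Create an empty sliding window that holds at most `size` residuals."""
    return deque(maxlen=size)


# Demonstration with a tiny window so the eviction is easy to see.
window = make_window(size=3)
for residual in [0.5, -1.2, 0.3, 2.0]:
    window.append(residual)  # the oldest residual is dropped once the window is full

residual_sequence = list(window)
print(residual_sequence)
```

With `maxlen` set, each append is a constant-time operation, so advancing the window by one step does not require copying the remaining residuals.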
Step 205: processing the residual sequence to obtain a flow anomaly factor corresponding to the current moment.
Specifically, the residual sequence in the window is obtained, and the flow anomaly factor corresponding to the current moment is calculated from it.
Fig. 4 is a schematic diagram of a calculation process of the flow anomaly factor in the technical solution provided in the embodiment of the present application. Referring to fig. 4, in some possible embodiments, the step of processing the residual sequence to obtain the flow anomaly factor corresponding to the current time may include:
Step 2051: performing a z-score transformation on the residual sequence to obtain transformation data.
Because the residual values of the flow are large, the residuals need to be preprocessed: a z-score transformation is applied to the residual sequence in the sliding window to obtain the preprocessed transformation data.
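A z-score transformation subtracts the window mean from each residual and divides by the window standard deviation. The sketch below assumes the population standard deviation and returns zeros for a constant window; the text does not specify either choice, and the function name is illustrative.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def z_score_transform(residuals: Sequence[float]) -> List[float]:
    """Standardize residuals to zero mean and unit (population) standard deviation."""
    mean = fmean(residuals)
    std = pstdev(residuals)
    if std == 0.0:
        # Assumption: a constant window carries no deviation, so map it to zeros.
        return [0.0 for _ in residuals]
    return [(r - mean) / std for r in residuals]


# This sample has mean 5 and population standard deviation 2.
transformed = z_score_transform([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(transformed)
```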
The transformed data can be processed with the static local outlier rapid detection algorithm (Faster Influenced Outlierness, FINFLO) provided by this application. The total amount of data in the time-series sliding window is kept unchanged: old data is continuously deleted from the window as new data is added, and the outlier factor of the data object in the current window is calculated according to the FINFLO algorithm.
For the specific process of the algorithm, please refer to the following steps.
Step 2052: identifying the residual corresponding to the current moment in the transformation data as an object p, acquiring the local centroid factor (Local Centroid Factor, LCF) of the object p, and determining whether the local centroid factor is greater than 1. The local centroid factor characterizes the local degree of normality of the object.
Specifically, when the object p is placed in a coordinate system, it becomes a point.
In some possible embodiments, obtaining the local centroid factor of the object p includes determining it by the following formula:
Figure BDA0003113473270000071
wherein lcf characterizes the local centroid factor, NN_k(p) represents the number of k-nearest neighbors of the object p, and o represents the nearest object to the object p; the distance between o and p is called the k-nearest-neighbor distance of p and is denoted k-dist(p). Object o satisfies the following conditions:
Condition 1: there exist at least NN_k objects o' ∈ D \ {p} such that d(p, o') ≤ d(p, o), where o' characterizes any one of the objects within the window and d characterizes the distance.
Condition 2: there exist at least NN_k - 1 objects o' ∈ D \ {p} such that d(p, o') < d(p, o), where o' characterizes any one of the objects within the window and d characterizes the distance.
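To make the k-nearest-neighbor distance concrete, the sketch below computes k-dist(p) and the k-nearest neighbors of one object in a window of one-dimensional values. It assumes the k-distance definition commonly used in the LOF literature (at least k other objects no farther than o, and at most k - 1 strictly closer), which is an interpretation rather than a quotation of the conditions above, and it uses absolute difference as the distance d. The function name and sample window are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_distance_and_neighbors(
    data: Sequence[float], p_index: int, k: int
) -> Tuple[float, List[int]]:
    """Return (k-dist(p), indices of the k-nearest neighbors of p).

    Ties at the k-distance are all kept, so the neighbor list may hold more than k items.
    """
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    p = data[p_index]
    # Distances from p to every other object, sorted from nearest to farthest.
    distances = sorted(
        (abs(p - value), i) for i, value in enumerate(data) if i != p_index
    )
    k_dist = distances[k - 1][0]
    neighbors = [i for dist, i in distances if dist <= k_dist]
    return k_dist, neighbors


window = [0.0, 1.0, 2.0, 3.0, 10.0]
k_dist, neighbors = k_distance_and_neighbors(window, p_index=4, k=2)
print(k_dist, neighbors)
```

The isolated value 10.0 has a much larger k-distance than the clustered values, which is the property that density-based factors build on.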
Step 2053: when the local centroid factor is not greater than 1, acquiring the reverse k-nearest neighbors of the object p.
Specifically, when the local centroid factor is not greater than 1, the reverse k-nearest neighbors of the object p are acquired; when the local centroid factor is greater than 1, the object p is determined to be a normal factor.
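The reverse k-nearest neighbors of p are the objects that count p among their own k-nearest neighbors. The brute-force sketch below illustrates that definition on one-dimensional values with absolute difference as the distance; it is not the FINFLO procedure itself, and the function names and sample window are assumptions made for illustration.

```python
from typing import List, Sequence


def k_nearest_neighbors(data: Sequence[float], index: int, k: int) -> List[int]:
    """Indices of the k-nearest neighbors of data[index]; ties at the k-distance are kept."""
    p = data[index]
    distances = sorted(
        (abs(p - value), i) for i, value in enumerate(data) if i != index
    )
    k_dist = distances[k - 1][0]
    return [i for dist, i in distances if dist <= k_dist]


def reverse_k_nearest_neighbors(
    data: Sequence[float], p_index: int, k: int
) -> List[int]:
    """Indices q such that p is one of the k-nearest neighbors of q."""
    return [
        q
        for q in range(len(data))
        if q != p_index and p_index in k_nearest_neighbors(data, q, k)
    ]


window = [0.0, 1.0, 2.0, 3.0, 10.0]
rnn_of_outlier = reverse_k_nearest_neighbors(window, p_index=4, k=2)
rnn_of_inlier = reverse_k_nearest_neighbors(window, p_index=1, k=2)
print(rnn_of_outlier, rnn_of_inlier)
```

In this sample no object treats the isolated value 10.0 as a near neighbor, while the clustered value 1.0 is a near neighbor of three objects. The brute-force search recomputes neighbors for every object, which is affordable for a window of a few hundred residuals.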
Step 2054: if the reverse k-neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring the local anomaly factor of the object p.
In some possible embodiments, obtaining local anomaly factors (Local Outlier Factor, LOF) of the object includes determining by:
Figure BDA0003113473270000081
wherein lof characterizes the local anomaly factor, den(p) characterizes the local density (Local Outlier Density, LOD) of the object p, IS_k(p) characterizes the influence domain of the object p, den_avg(IS_k(p)) characterizes the average local density of the data objects within the influence domain, RNN_k(p) characterizes the number of reverse k-nearest neighbors of the object p, q characterizes any factor (residual) within the window, and N represents the number of factors.
In some possible embodiments, den (p) is determined by the following formula:
Figure BDA0003113473270000082
wherein
Figure BDA0003113473270000083
characterizes the reciprocal of the k-nearest-neighbor distance of the object p, and e^(-k-dist(p)) characterizes the exponential with base e whose exponent is the negative of the k-nearest-neighbor distance of the object p.
Specifically, when the reverse k-nearest neighbor of the object p is larger than the average extraction difference value of all factors in the window, obtaining the local abnormal factor of the object p; when the inverse k-nearest neighbor of the object p is not greater than the average extraction difference of all factors in the window, the object p is determined to be a normal factor.
The pseudocode of the FINFLO algorithm is as follows:
Figure BDA0003113473270000084
Figure BDA0003113473270000091
Step 206: determining that a flow anomaly has occurred when the anomaly factor exceeds a preset threshold.
In some possible embodiments, when the anomaly factor exceeds the preset threshold, an alarm is sent first, and whether abnormal flow has actually occurred is then determined manually or by other means.
Specifically, if at time t the anomaly factor of the data in the window exceeds the preset threshold, abnormal flow is not declared directly; instead, alarm information is sent to the manager, who determines whether abnormal flow has occurred.
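The threshold check with a manual confirmation step might be sketched as follows. The `Alert` record, the function name, and the threshold value are hypothetical; the text specifies only that an alarm is raised when the anomaly factor exceeds a preset threshold and that a manager then decides whether the flow is actually abnormal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alarm record sent to a manager for confirmation."""
    timestamp: str
    anomaly_factor: float
    threshold: float


def check_anomaly_factor(
    timestamp: str, anomaly_factor: float, threshold: float
) -> Optional[Alert]:
    """Return an Alert when the factor exceeds the threshold, otherwise None.

    The alert flags a suspected anomaly only; confirmation is left to the manager.
    """
    if anomaly_factor > threshold:
        return Alert(timestamp, anomaly_factor, threshold)
    return None


alert = check_anomaly_factor("t", 2.7, threshold=1.5)
no_alert = check_anomaly_factor("t+1", 1.1, threshold=1.5)
print(alert, no_alert)
```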
The pseudocode of the overall flow anomaly detection algorithm is as follows:
Figure BDA0003113473270000092
Figure BDA0003113473270000101
Fig. 5 is a schematic device diagram of the technical solution provided in the embodiment of the present application. As shown in fig. 5, a flow anomaly detection device includes:
An observation value obtaining module 401, configured to obtain an observed value of the flow at the current moment.
A calculating module 403, configured to calculate a residual error at the current time corresponding to the observed value and the predicted value of the flow at the current time.
In some possible embodiments, the flow anomaly detection device further includes:
A predicted value acquisition module, configured to input the historical flow data before the current moment into the long short-term memory (LSTM) network model and take the prediction result of the LSTM network model as the predicted value of the flow at the current moment.
A window module 404 for establishing a time series sliding window.
The first determining module 405 puts the residual error corresponding to the current time into the time sequence sliding window corresponding to the current time according to the time sequence, so as to obtain a residual error sequence.
And a processing module 406, configured to process the residual sequence to obtain a flow anomaly factor corresponding to the current time.
In some possible embodiments, processing the residual sequence to obtain the flow anomaly factor corresponding to the current time includes:
processing the residual sequence to obtain transformation data;
identifying residual errors corresponding to the current moment in the transformation data as an object p, acquiring a local centroid factor of the object p, and determining whether the local centroid factor is larger than 1;
obtaining a local centroid factor of the object p, comprising determining by:
Figure BDA0003113473270000111
wherein lcf characterizes the local centroid factor, o characterizes the nearest object to the object p, k-dist(p) characterizes the k-nearest-neighbor distance of the object p, and NN_k(p) characterizes the number of k-nearest neighbors of the object p.
When the local centroid factor is not more than 1, acquiring reverse k-nearest neighbor of the object p;
and if the reverse k-neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring the local abnormal factor of the object p as a flow abnormal factor.
Obtaining local anomaly factors of the object, including determining by the following formula:
Figure BDA0003113473270000112
wherein lof characterizes the local anomaly factor, den(p) characterizes the local density, IS_k(p) characterizes the influence domain of the object p, den_avg(IS_k(p)) characterizes the average local density of the data objects within the influence domain, RNN_k(p) represents the number of reverse k-nearest neighbors of the object p, q represents any factor within the window, and N represents the number of factors. den(p) is determined by the following formula:
Figure BDA0003113473270000113
wherein
Figure BDA0003113473270000114
characterizes the reciprocal of the k-nearest-neighbor distance of the object p, and e^(-k-dist(p)) characterizes the exponential with base e whose exponent is the negative of the k-nearest-neighbor distance of the object p.
A second determining module 407 is configured to determine that a flow abnormality occurs when the abnormality factor exceeds a preset threshold.
The invention provides a new risk data anomaly detection method, which consists of a static local outlier detection algorithm (the FINFLO algorithm), a time-series sliding window, and an LSTM prediction model for the monitored data. With the prediction model as the baseline, the model gives the predicted flow at the next moment T; an anomaly factor for moment T is then obtained by combining the sliding window with the observed and predicted flow values at moment T, and if the anomaly factor exceeds the set threshold, the data acquisition source is recorded as abnormal at moment T. The method has low computational complexity, keeps the running time short, and can discover most abnormal data with only a small amount of data.
Fig. 6 shows a schematic structural diagram of a computer device provided in an embodiment of the present specification, where the computer device may include: processor 610, memory 620, input/output interface 630, communication interface 640, and bus 650. Wherein processor 610, memory 620, input/output interface 630, and communication interface 640 enable communication connections among each other within the device via bus 650. The computer device may be used to perform the method shown in fig. 2 described above.
The processor 610 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 620 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), a static storage device, a dynamic storage device, or the like. Memory 620 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, relevant program codes are stored in memory 620 and invoked for execution by processor 610.
The input/output interface 630 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 640 is used to connect a communication module (not shown in the figure) to enable communication interaction between the present device and other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 650 includes a path to transfer information between components of the device (e.g., processor 610, memory 620, input/output interface 630, and communication interface 640).
It should be noted that although the above device only shows the processor 610, the memory 620, the input/output interface 630, the communication interface 640, and the bus 650, in the implementation, the device may further include other components necessary for achieving normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention. It is not intended to limit the scope of the invention or to restrict the invention to the particular embodiments described; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (6)

1. A data anomaly detection method, comprising:
obtaining an observation value of the flow at the current moment;
calculating the residual error at the current moment between the observed value and a predicted value of the flow at the current moment;
establishing a time sequence sliding window;
placing the residual error corresponding to the current moment into the time sequence sliding window corresponding to the current moment in chronological order to obtain a residual sequence;
processing the residual sequence to obtain a flow anomaly factor corresponding to the current moment;
determining that flow abnormality occurs when the abnormality factor exceeds a preset threshold;
the processing the residual sequence to obtain the flow anomaly factor corresponding to the current moment comprises the following steps:
processing the residual sequence to obtain transformation data;
identifying residual errors corresponding to the current moment in the transformation data as an object p, acquiring a local centroid factor of the object p, and determining whether the local centroid factor is larger than 1;
when the local centroid factor is not more than 1, acquiring reverse k-nearest neighbor of the object p;
if the reverse k-nearest neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring a local abnormal factor of the object p as a flow abnormal factor;
the obtaining the local centroid factor of the object p includes determining by the following formula:
Figure FDA0004149449230000011
wherein lcf characterizes the local centroid factor, o characterizes the object nearest to the object p, k-dist(p) characterizes the k-nearest-neighbor distance of the object p, and NN_k(p) characterizes the number of k-nearest neighbors of the object p.
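The outer steps of claim 1 (residual, sliding window, anomaly factor, threshold) can be illustrated with the following minimal Python sketch. It is not the claimed implementation: the class name `ResidualWindowDetector`, the parameter names, and in particular the scoring function `mean_abs_deviation_score` are hypothetical stand-ins, since the lcf and lof formulas are given only as formula images and are not reproduced here.

```python
from collections import deque
from typing import Callable, Deque, List


def mean_abs_deviation_score(window: List[float]) -> float:
    """Stand-in anomaly factor (NOT the claimed lcf/lof formulas).

    Scores the newest residual by its distance from the window mean,
    divided by the mean absolute deviation of the window.
    """
    mean = sum(window) / len(window)
    mad = sum(abs(r - mean) for r in window) / len(window)
    if mad == 0.0:
        return 0.0
    return abs(window[-1] - mean) / mad


class ResidualWindowDetector:
    """Keeps the most recent residuals in time order and thresholds a score."""

    def __init__(
        self,
        window_size: int,
        threshold: float,
        score_fn: Callable[[List[float]], float] = mean_abs_deviation_score,
    ) -> None:
        # deque(maxlen=...) drops the oldest residual automatically,
        # which gives the sliding-window behaviour.
        self.window: Deque[float] = deque(maxlen=window_size)
        self.threshold = threshold
        self.score_fn = score_fn

    def update(self, observed: float, predicted: float) -> bool:
        """Add one observation; return True if an anomaly is flagged."""
        residual = observed - predicted
        self.window.append(residual)
        if len(self.window) < (self.window.maxlen or 0):
            return False  # assumption: no decision until the window is full
        factor = self.score_fn(list(self.window))
        return factor > self.threshold
```

A caller would invoke `update(observed, predicted)` once per time step; swapping `score_fn` for a function implementing the claimed factors leaves the rest of the loop unchanged.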
2. The method of claim 1, wherein the obtaining the local anomaly factor for the object comprises determining by the formula:
Figure FDA0004149449230000021
wherein lof characterizes the local anomaly factor, den(p) characterizes the local density of the object p, IS_k(p) characterizes the influence domain of the data object p, den_avg(IS_k(p)) characterizes the average local density of the data objects within the influence domain, RNN_k(p) characterizes the number of reverse k-nearest neighbors of the object p, q characterizes any residual object within the window, and N characterizes the number of residual objects.
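The reverse k-nearest-neighbor relation referred to above can be sketched in Python as follows. This is an illustrative sketch only, assuming one-dimensional residuals compared by absolute difference and ties broken by index; the function names are hypothetical and the claimed lof formula itself is not reproduced.

```python
from typing import Dict, List, Set


def k_nearest_neighbors(values: List[float], k: int) -> Dict[int, Set[int]]:
    """Map each index to the indices of its k nearest other values."""
    neighbors: Dict[int, Set[int]] = {}
    for i, v in enumerate(values):
        # Sort the other indices by absolute distance, then by index for ties.
        others = sorted(
            (j for j in range(len(values)) if j != i),
            key=lambda j: (abs(values[j] - v), j),
        )
        neighbors[i] = set(others[:k])
    return neighbors


def reverse_k_nearest_neighbors(values: List[float], k: int) -> Dict[int, Set[int]]:
    """Map each index p to the indices q that count p among their k nearest."""
    knn = k_nearest_neighbors(values, k)
    reverse: Dict[int, Set[int]] = {i: set() for i in range(len(values))}
    for q, nbrs in knn.items():
        for p in nbrs:
            reverse[p].add(q)
    return reverse
```

In this sketch a residual far from all others tends to have few or no reverse neighbors, because no other point counts it among its own k nearest.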
3. The method according to claim 2, wherein den (p) is determined by the following formula:
Figure FDA0004149449230000022
wherein
Figure FDA0004149449230000023
characterizes the inverse of the k-nearest-neighbor distance of the object p, and e^(-k-dist(p)) characterizes an exponential with base e whose exponent is the negative of the k-nearest-neighbor distance of the object p.
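The two quantities named in claim 3 can be computed as in the short Python sketch below. How they are combined into den(p) is defined by the formula image and is deliberately not reconstructed here; the function names, the one-dimensional absolute-difference distance, and the handling of a zero distance are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def k_distance(values: List[float], index: int, k: int) -> float:
    """Distance from values[index] to its k-th nearest other value."""
    distances = sorted(
        abs(v - values[index]) for j, v in enumerate(values) if j != index
    )
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and the number of other values")
    return distances[k - 1]


def density_terms(values: List[float], index: int, k: int) -> Tuple[float, float]:
    """Return (1 / k-dist(p), exp(-k-dist(p))) for the object at `index`."""
    d = k_distance(values, index, k)
    # Assumption: a zero k-distance is reported as an infinite inverse.
    inverse = math.inf if d == 0.0 else 1.0 / d
    return inverse, math.exp(-d)
```

Both terms shrink as the k-nearest-neighbor distance grows, so an isolated residual yields smaller values than one inside a tight cluster.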
4. A method according to any one of claims 1-3, characterized in that the method further comprises:
inputting the historical flow data before the current moment into a long short-term memory (LSTM) network model for prediction processing, and taking the prediction result obtained by the LSTM network model as the predicted value of the flow at the current moment.
5. A data anomaly detection device, comprising:
the observation value acquisition module is used for acquiring the observation value of the flow at the current moment;
the calculation module is used for calculating the residual error at the current moment between the observed value and a predicted value of the flow at the current moment;
the window module is used for establishing a time sequence sliding window;
the first determining module is used for placing the residual error corresponding to the current moment into the time sequence sliding window corresponding to the current moment according to the time sequence to obtain a residual error sequence;
the processing module is used for processing the residual sequence to obtain a flow abnormal factor corresponding to the current moment;
the second determining module is used for determining that flow abnormality occurs when the abnormality factor exceeds a preset threshold;
the processing the residual sequence to obtain the flow anomaly factor corresponding to the current moment comprises the following steps:
processing the residual sequence to obtain transformation data;
identifying residual errors corresponding to the current moment in the transformation data as an object p, acquiring a local centroid factor of the object p, and determining whether the local centroid factor is larger than 1;
when the local centroid factor is not more than 1, acquiring reverse k-nearest neighbor of the object p;
if the reverse k-nearest neighbor of the object p is larger than the average extraction difference value of all factors in the window, acquiring a local abnormal factor of the object p as a flow abnormal factor;
the obtaining the local centroid factor of the object p includes determining by the following formula:
Figure FDA0004149449230000031
wherein lcf characterizes the local centroid factor, o characterizes the object nearest to the object p, k-dist(p) characterizes the k-nearest-neighbor distance of the object p, and NN_k(p) characterizes the number of k-nearest neighbors of the object p.
6. The data anomaly detection apparatus of claim 5, wherein the apparatus further comprises:
the predicted value acquisition module is used for inputting the historical flow data before the current moment into a long short-term memory (LSTM) network model for prediction processing, and taking the prediction result obtained by the LSTM network model as the predicted value of the flow at the current moment.
CN202110657018.9A 2021-06-11 2021-06-11 Data anomaly detection method and device Active CN113420800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110657018.9A CN113420800B (en) 2021-06-11 2021-06-11 Data anomaly detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110657018.9A CN113420800B (en) 2021-06-11 2021-06-11 Data anomaly detection method and device

Publications (2)

Publication Number Publication Date
CN113420800A CN113420800A (en) 2021-09-21
CN113420800B true CN113420800B (en) 2023-06-02

Family

ID=77788496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110657018.9A Active CN113420800B (en) 2021-06-11 2021-06-11 Data anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN113420800B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113878214B (en) * 2021-12-08 2022-03-25 苏芯物联技术(南京)有限公司 Welding quality real-time detection method and system based on LSTM and residual distribution
CN114020598B (en) * 2022-01-05 2022-04-19 云智慧(北京)科技有限公司 Method, device and equipment for detecting abnormity of time series data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537034B (en) * 2014-12-22 2017-11-10 国家电网公司 The Condition Monitoring Data cleaning method of power transmission and transforming equipment based on time series analysis
CN106685750B (en) * 2015-11-11 2019-12-24 华为技术有限公司 System anomaly detection method and device
CN107092582B (en) * 2017-03-31 2021-04-27 江苏方天电力技术有限公司 Online abnormal value detection and confidence evaluation method based on residual posterior
CN110008080B (en) * 2018-12-25 2023-08-11 创新先进技术有限公司 Business index anomaly detection method and device based on time sequence and electronic equipment
CN109886833B (en) * 2019-01-21 2023-01-17 广东电网有限责任公司信息中心 Deep learning method for smart grid server flow anomaly detection

Also Published As

Publication number Publication date
CN113420800A (en) 2021-09-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant