CN114595134A - Abnormal data detection method and device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN114595134A
Authority
CN
China
Prior art keywords
data sequence
maintenance data
current moment
value
training sample
Legal status
Pending
Application number
CN202210230052.2A
Other languages
Chinese (zh)
Inventor
杨熠
Current Assignee
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN202210230052.2A
Publication of CN114595134A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application provides a method and an apparatus for detecting abnormal data, an electronic device, and a computer storage medium. The method comprises: continuously acquiring an operation and maintenance data sequence at the current moment; for each operation and maintenance data sequence at the current moment, inputting the sequence into a prediction model, which outputs a predicted value of the data in the sequence for the next minute, where the prediction model is obtained by training a target coding and decoding model with a training sample set, and the target coding and decoding model is a coding and decoding model with an added attention mechanism; generating a dynamic baseline from the next-minute predicted values of the data in the operation and maintenance data sequence at the current moment; and finally, if it is determined according to the dynamic baseline that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment, generating alarm information. In this way, abnormal data can be detected accurately and efficiently.

Description

Abnormal data detection method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting abnormal data, an electronic device, and a computer storage medium.
Background
At present, performance data of an application system, such as transaction volume and response time, is an important part of monitoring, and this performance data is broken down along different dimensions, such as province/city and server.
With many systems and many dimensions, the persons responsible for operation and maintenance need a full picture of the systems they are responsible for. Thresholds are usually set manually, based on experience, from basic statistics of historical data such as the maximum and minimum period-over-period (ring) ratios; thresholds set in this way are not necessarily accurate, and configuring them at scale consumes a great deal of labor.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for detecting abnormal data, an electronic device, and a computer storage medium, which are used to accurately and efficiently detect abnormal data.
The first aspect of the present application provides a method for detecting abnormal data, including:
continuously acquiring an operation and maintenance data sequence at the current moment;
for each operation and maintenance data sequence at the current moment, inputting the sequence into a prediction model, and using the prediction model to output a predicted value of the data in the operation and maintenance data sequence at the current moment for the next minute; the prediction model is obtained by training a target coding and decoding model with a training sample set; the target coding and decoding model is a coding and decoding model with an added attention mechanism; the training sample set includes at least one training sample data sequence;
generating a dynamic baseline according to the predicted values of the data in the operation and maintenance data sequence at the current moment in the next minute; wherein the dynamic baseline comprises a real value and a predicted value of the operation and maintenance data sequence every minute;
judging whether abnormal data exist in the target time period in the operation and maintenance data sequence at the current moment or not according to the dynamic baseline;
and if abnormal data exist in the target time period in the operation and maintenance data sequence at the current moment, generating alarm information.
Optionally, the method for constructing the prediction model includes:
constructing a training sample set, wherein the training sample set comprises at least one training sample data sequence, and each training sample data sequence consists of N sampling points sampled at a minute-level sampling rate;
inputting the first N-1 sampling points of the training sample data sequence into a target coding and decoding model to obtain a predicted value of the Nth sampling point of the training sample data sequence;
and continuously adjusting the parameters of the target coding and decoding model according to the error between the predicted value and the true value of the Nth sampling point in the training sample data sequence, until the error between the predicted value produced by the adjusted model and the true value of the Nth sampling point satisfies a preset convergence condition, and determining the adjusted target coding and decoding model as the prediction model.
Optionally, the determining, according to the dynamic baseline, whether there is abnormal data in the target time period in the operation and maintenance data sequence at the current time includes:
determining the mean square error between a true value and a predicted value in the operation and maintenance data sequence at the current moment in a target time period;
and if the mean square error is larger than a reference standard, determining that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment.
Optionally, before determining the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period, the method further includes:
for the index of each category in the operation and maintenance data sequence, assigning the weight corresponding to that category to obtain a weighted operation and maintenance data sequence;
the determining of the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period includes:
determining the mean square error between a real value and a predicted value in the weighted operation and maintenance data sequence at the current moment in a target time period;
the determining, if the mean square error is greater than a reference standard, that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment includes:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the weighted operation and maintenance data sequence at the current moment.
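A minimal sketch of the weighted variant, assuming each category's index values (for example transaction volume and response time) are multiplied by a per-category weight before the mean square error over the target time period is taken. The helper names `weighted_mse` and `is_abnormal`, the dictionary layout, and the example weights and reference standard are hypothetical; the application does not specify how the weights are chosen.

```python
from typing import Dict, Sequence


def weighted_mse(
    true_by_category: Dict[str, Sequence[float]],
    pred_by_category: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Mean square error over all weighted points in one target time period.

    Hypothetical helper: every true and predicted value of a category is
    scaled by that category's weight, then the squared errors are averaged.
    """
    total = 0.0
    count = 0
    for category, weight in weights.items():
        xs = true_by_category[category]
        ys = pred_by_category[category]
        if len(xs) != len(ys):
            raise ValueError(f"length mismatch for category {category!r}")
        for x, y in zip(xs, ys):
            total += (weight * x - weight * y) ** 2
            count += 1
    if count == 0:
        raise ValueError("no data points in the target time period")
    return total / count


def is_abnormal(mse: float, reference_standard: float) -> bool:
    """Abnormal data is reported when the MSE exceeds the reference standard."""
    return mse > reference_standard


# Example: two minutes of two hypothetical indices; response time weighted 2x.
true_vals = {"transactions": [100.0, 102.0], "response_ms": [5.0, 6.0]}
pred_vals = {"transactions": [101.0, 102.0], "response_ms": [5.0, 4.0]}
mse = weighted_mse(true_vals, pred_vals, {"transactions": 1.0, "response_ms": 2.0})
print(mse, is_abnormal(mse, reference_standard=3.0))  # 4.25 True
```

Because the true and predicted values of a category are scaled by the same weight, that category's squared error is effectively multiplied by the square of its weight.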
Optionally, before determining the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period, the method further includes:
normalizing the true value and the predicted value in the operation and maintenance data sequence at the current moment to obtain the normalized true value and the normalized predicted value in the operation and maintenance data sequence at the current moment;
the determining of the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period includes:
determining the mean square error between a real value and a predicted value in the normalized operation and maintenance data sequence at the current moment in a target time period;
the determining, if the mean square error is greater than a reference standard, that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment includes:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the normalized operation and maintenance data sequence at the current moment.
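Normalization puts indices of very different magnitudes on a comparable scale so that one reference standard can be applied across them. A minimal sketch, assuming min-max normalization with the range taken jointly from the true and predicted values of the window; the application does not name the normalization method, so this choice and the helper names are assumptions.

```python
from typing import List, Sequence, Tuple


def minmax_normalize(
    true_vals: Sequence[float], pred_vals: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Scale true and predicted values to [0, 1] using one shared range."""
    lo = min(min(true_vals), min(pred_vals))
    hi = max(max(true_vals), max(pred_vals))
    span = hi - lo
    if span == 0:
        # Constant window: every value maps to 0, so the MSE is 0.
        return [0.0] * len(true_vals), [0.0] * len(pred_vals)
    return (
        [(v - lo) / span for v in true_vals],
        [(v - lo) / span for v in pred_vals],
    )


def mse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain mean square error of two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# The same relative deviation gives the same normalized MSE at any scale.
small = mse(*minmax_normalize([100, 200, 300], [100, 250, 300]))
large = mse(*minmax_normalize([1e5, 2e5, 3e5], [1e5, 2.5e5, 3e5]))
print(round(small, 6), round(large, 6))  # 0.020833 0.020833
```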
Optionally, after the alarm information is generated upon determining that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment, the method further includes:
calculating, according to the historical data in the dynamic baseline, the historical mean value of each sampling point and the mean value of the historical proportion values;
taking the historical mean value of the sampling point and the mean value of the historical proportion values, each plus or minus 5 times the corresponding standard deviation, as the upper and lower bounds of the mean value;
judging whether any sampling point exceeds the upper and lower bounds within a first preset time after the alarm information is generated;
and if a sampling point exceeding the upper and lower bounds exists within the first preset time, confirming that the alarm information is valid.
Optionally, after the alarm information is confirmed to be valid because sampling points exceeding the upper and lower bounds exist within the first preset time, the method further includes:
taking the mean value of the data over a second preset time before the current moment in the operation and maintenance data sequence at the current moment, plus or minus 3 times its standard deviation, as the upper and lower bounds of the predicted value;
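A minimal sketch of the confirmation step, assuming the historical values of one sampling point (for example the same minute on previous days) are available as a list and that "standard deviation" means the population standard deviation. The same bound calculation would apply to the historical proportion values. The helper names and the example data are hypothetical.

```python
import statistics
from typing import Iterable, Sequence, Tuple


def sigma_bounds(history: Sequence[float], k: float = 5.0) -> Tuple[float, float]:
    """Return (mean - k*std, mean + k*std) of the historical values."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std


def alarm_is_valid(
    values_after_alarm: Iterable[float], lower: float, upper: float
) -> bool:
    """Confirm the alarm if any sampling point in the first preset time
    after the alarm falls outside the bounds."""
    return any(v < lower or v > upper for v in values_after_alarm)


history = [10.0, 12.0, 8.0, 10.0, 10.0]  # mean 10, population std ~1.265
lower, upper = sigma_bounds(history, k=5.0)
print(alarm_is_valid([11.0, 25.0, 12.0], lower, upper))  # True: 25 > upper
print(alarm_is_valid([11.0, 12.0], lower, upper))        # False
```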
judging whether the actual value at the current moment exceeds the upper and lower bounds of the predicted value;
and if the actual value of the current moment exceeds the upper and lower bounds of the predicted value, generating an alarm prompt.
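One possible reading of this step, sketched under assumptions: the bounds are the mean of the data over the second preset time (here the last few minutes) plus or minus three population standard deviations, and the actual value at the current moment is compared against them. The window length, the helper name `needs_prompt`, and the sample values are hypothetical.

```python
import statistics
from typing import Sequence


def needs_prompt(recent: Sequence[float], actual_now: float, k: float = 3.0) -> bool:
    """True if actual_now lies outside mean(recent) +/- k * std(recent)."""
    if len(recent) < 2:
        raise ValueError("need at least two recent values for a spread estimate")
    mean = statistics.fmean(recent)
    std = statistics.pstdev(recent)
    return not (mean - k * std <= actual_now <= mean + k * std)


recent = [50.0, 52.0, 48.0, 50.0, 50.0]  # e.g. a second preset time of 5 minutes
print(needs_prompt(recent, 60.0))  # True: 60 > 50 + 3 * 1.265
print(needs_prompt(recent, 51.0))  # False
```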
A second aspect of the present application provides an apparatus for detecting abnormal data, including:
the acquisition unit is used for continuously acquiring the operation and maintenance data sequence at the current moment;
the prediction unit is used for, for each operation and maintenance data sequence at the current moment, inputting the sequence into a prediction model, which outputs a predicted value of the data in the sequence for the next minute; the prediction model is obtained by training a target coding and decoding model with a training sample set; the target coding and decoding model is a coding and decoding model with an added attention mechanism; the training sample set includes at least one training sample data sequence;
the first generation unit is used for generating a dynamic baseline according to the predicted values of the data in the operation and maintenance data sequence at the current moment in the next minute; wherein the dynamic baseline comprises a real value and a predicted value of the operation and maintenance data sequence every minute;
the first judgment unit is used for judging whether abnormal data exist in the operation and maintenance data sequence at the current moment in a target time period according to the dynamic baseline;
and the second generating unit is used for generating alarm information if the first judging unit judges that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment.
Optionally, the unit for constructing the prediction model includes:
the construction unit is used for constructing a training sample set, wherein the training sample set comprises at least one training sample data sequence, and each training sample data sequence consists of N sampling points sampled at a minute-level sampling rate;
the input unit is used for inputting the first N-1 sampling points of the training sample data sequence into the target coding and decoding model to obtain a predicted value of the Nth sampling point of the training sample data sequence;
and the adjusting unit is used for continuously adjusting parameters in the target coding and decoding model according to the error between the predicted value of the Nth sampling point in the training sample data sequence and the true value of the Nth sampling point in the training sample data sequence until the error between the predicted value of the Nth sampling point in the training sample data sequence and the true value of the Nth sampling point in the training sample data sequence after adjustment meets a preset convergence condition, and determining the adjusted target coding and decoding model as a prediction model.
Optionally, the first judgment unit includes:
the first determining unit is used for determining the mean square error between a true value and a predicted value in the operation and maintenance data sequence at the current moment in a target time period;
and a second determining unit, configured to determine that abnormal data exists in the target time period in the operation and maintenance data sequence at the current time if the mean square error is greater than a reference standard.
Optionally, the apparatus for detecting abnormal data further includes:
the weighting unit is used for giving a weight corresponding to each category to the index of each category in the operation and maintenance data sequence to obtain the weighted operation and maintenance data sequence;
wherein the first determining unit is configured to:
determining the mean square error between a real value and a predicted value in the weighted operation and maintenance data sequence at the current moment in a target time period;
wherein the second determination unit is configured to:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the weighted operation and maintenance data sequence at the current moment.
Optionally, the apparatus for detecting abnormal data further includes:
the normalization unit is used for normalizing the actual value and the predicted value in the operation and maintenance data sequence at the current moment to obtain the normalized actual value and the normalized predicted value in the operation and maintenance data sequence at the current moment;
wherein the first determining unit is configured to:
determining the mean square error between a real value and a predicted value in the normalized operation and maintenance data sequence at the current moment in a target time period;
wherein the second determination unit is configured to:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the normalized operation and maintenance data sequence at the current moment.
Optionally, the apparatus for detecting abnormal data further includes:
the calculating unit is used for calculating the historical mean value of each sampling point and the mean value of the historical proportion value according to the historical data in the dynamic baseline;
the third determining unit is used for taking the historical mean value of the sampling point and the mean value of the historical proportion values, each plus or minus 5 times the corresponding standard deviation, as the upper and lower bounds of the mean value;
the second judgment unit is used for judging whether sampling points exceeding the upper and lower boundaries exist within first preset time after the alarm information is generated;
and the confirming unit is used for confirming that the alarm information is valid if the second judging unit judges that the sampling points exceeding the upper and lower boundaries exist within the first preset time.
Optionally, the apparatus for detecting abnormal data further includes:
a fourth determining unit, configured to take the mean value of the data over a second preset time before the current moment in the operation and maintenance data sequence at the current moment, plus or minus 3 times its standard deviation, as the upper and lower bounds of the predicted value;
the third judgment unit is used for judging whether the actual value at the current moment exceeds the upper and lower bounds of the predicted value;
and the third generating unit is used for generating an alarm prompt if the third judgment unit determines that the actual value at the current moment exceeds the upper and lower bounds of the predicted value.
A third aspect of the present application provides an electronic device comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of detecting anomalous data as described in any one of the first aspects.
A fourth aspect of the present application provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for detecting abnormal data according to any one of the first aspects.
As can be seen from the above aspects, the present application provides a method and an apparatus for detecting abnormal data, an electronic device, and a computer storage medium. The method for detecting abnormal data includes: continuously acquiring an operation and maintenance data sequence at the current moment; for each operation and maintenance data sequence at the current moment, inputting the sequence into a prediction model, which outputs a predicted value of the data in the sequence for the next minute, where the prediction model is obtained by training a target coding and decoding model with a training sample set, the target coding and decoding model is a coding and decoding model with an added attention mechanism, and the training sample set includes at least one training sample data sequence; generating a dynamic baseline from the next-minute predicted values, the dynamic baseline comprising the true value and the predicted value of the operation and maintenance data sequence for every minute; judging, according to the dynamic baseline, whether abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment; and, if so, generating alarm information. In this way, abnormal data can be detected accurately and efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a specific flowchart of a method for detecting abnormal data according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a method for constructing a prediction model according to another embodiment of the present application;
Fig. 3 is a schematic diagram of a network construction of a prediction model according to another embodiment of the present application;
Fig. 4 is a flowchart of a method for detecting abnormal data according to another embodiment of the present application;
Fig. 5 is a flowchart of a method for detecting abnormal data according to another embodiment of the present application;
Fig. 6 is a flowchart of a method for detecting abnormal data according to another embodiment of the present application;
Fig. 7 is a schematic diagram of an apparatus for detecting abnormal data according to another embodiment of the present application;
Fig. 8 is a schematic view of an electronic device implementing a method for detecting abnormal data according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in this application are only used to distinguish different devices, modules, or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules, or units. Moreover, the terms "include", "comprise", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiment of the application provides a method for detecting abnormal data, as shown in fig. 1, specifically including the following steps:
S101, continuously acquiring the operation and maintenance data sequence at the current moment.
S102, for each operation and maintenance data sequence at the current moment, inputting the sequence into the prediction model, and using the prediction model to output the predicted value of the data in that sequence for the next minute.
The prediction model is obtained by training a target coding and decoding model with a training sample set; the target coding and decoding model is an Encoder-Decoder model with an added Attention mechanism; the training sample set includes at least one training sample data sequence.
It should be noted that the conventional Encoder-Decoder model feeds a fixed-length time sequence into the Encoder, which outputs a fixed-length vector. The Encoder maps the input to a latent representation, and the Decoder decodes that latent representation to restore the original form. In theory, the Encoder-Decoder structure is a form of lossy compression and therefore has a noise-reduction effect, which gives it the ability to filter out abnormal points or abnormal sequences.
However, the Encoder-Decoder model has certain limitations. The biggest is that the only connection between the Encoder and the Decoder is a fixed-length vector; that is, the Encoder must compress the information of the whole sequence into one fixed-length vector. Such a vector clearly cannot fully represent the information of the whole sequence: information carried by the earliest inputs is diluted or overwritten by later inputs, and this becomes more serious as the input sequence grows longer. If the Decoder does not have sufficient information about the input sequence at the start of decoding, the accuracy of decoding naturally drops.
Therefore, the application introduces an Attention mechanism. Its basic idea is to break the traditional Encoder-Decoder structure's dependence on an internal fixed-length vector: when the Encoder encodes the input sequence, the hidden state vector at every time step is retained, and when the Decoder decodes, all of this information (the hidden state vectors) is fully used, rather than only the last hidden state vector output by the Encoder.
The Attention mechanism selectively focuses on part of a large amount of available information according to the current state. It retains the intermediate results produced when the Encoder encodes the input sequence, trains the model to selectively learn from these inputs, and associates the output sequence with them at output time, so that the model attends to the relevant parts of the input when producing each output instead of relying on a single fixed vector representation as in the conventional Encoder-Decoder.
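To make the idea concrete, the sketch below computes one attention step: the Decoder's current state is scored against every retained Encoder hidden state, the scores are turned into weights with a softmax, and the weighted sum of hidden states becomes the context vector. Dot-product scoring is an assumption (one common choice); the application does not state which scoring function it uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    decoder_state: Sequence[float], encoder_states: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step.

    encoder_states holds the hidden state kept for every input time step,
    instead of only the final fixed-length vector.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * hs[j] for w, hs in zip(weights, encoder_states)) for j in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([10.0, 0.0], encoder_states)
print([round(w, 3) for w in weights])  # [0.5, 0.0, 0.5]: focus on steps 1 and 3
```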
Optionally, in another embodiment of the present application, an implementation manner of the method for constructing the prediction model, as shown in fig. 2, includes:
S201, constructing a training sample set.
The training sample set includes at least one training sample data sequence; each training sample data sequence consists of N sampling points sampled at a minute-level sampling rate.
S202, inputting the first N-1 sampling points in the training sample data sequence to a target coding and decoding model, and outputting to obtain a predicted value of the Nth sampling point in the training sample data sequence.
S203, judging whether the error between the predicted value of the Nth sampling point in the training sample data sequence and the true value of the Nth sampling point in the training sample data sequence meets a preset convergence condition or not.
Specifically, if the error between the predicted value and the true value of the Nth sampling point in the training sample data sequence satisfies the preset convergence condition, step S204 is executed; otherwise, step S205 is executed, after which training continues from step S202 with the adjusted parameters.
S204, determining the target coding and decoding model as the prediction model.
S205, adjusting parameters in the target coding and decoding model.
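The loop S202 to S205 can be sketched as follows. The model here is deliberately a toy stand-in (it predicts the Nth point as the (N-1)th point plus a learned drift, fitted by gradient descent on the mean square error) so that the control flow is visible; in the application the model is the attention-based coding and decoding model, and the convergence condition `loss <= tol` with a `max_iters` cap is an assumption.

```python
from typing import List, Sequence, Tuple

Window = Tuple[Sequence[float], float]  # (first N-1 points, Nth point)


def train_until_converged(
    windows: List[Window], lr: float = 0.25, tol: float = 1e-6, max_iters: int = 1000
) -> Tuple[float, float]:
    """Return (learned drift, final loss) of a toy one-parameter predictor."""
    drift = 0.0  # the only model parameter
    loss = float("inf")
    for _ in range(max_iters):
        # S202: predict the Nth point from the first N-1 points of each window.
        errors = [(inputs[-1] + drift) - target for inputs, target in windows]
        loss = sum(e * e for e in errors) / len(errors)
        # S203: check the preset convergence condition.
        if loss <= tol:
            break  # S204: the current parameters define the prediction model.
        # S205: adjust the parameter along the gradient of the MSE, then repeat.
        grad = 2.0 * sum(errors) / len(errors)
        drift -= lr * grad
    return drift, loss


series = [2.0 * i for i in range(20)]  # steadily rising index, +2 per minute
n = 5
windows = [(series[i:i + n - 1], series[i + n - 1]) for i in range(len(series) - n + 1)]
drift, loss = train_until_converged(windows)
print(round(drift, 2), loss <= 1e-6)  # 2.0 True
```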
Fig. 3 is a schematic diagram of the network construction of the prediction model. The following example is based on Fig. 3:
First, three months of historical data are selected for training. At a minute-level sampling rate, i.e., one monitoring-index value per minute and 1440 per day, this yields 1440 × 30 × 3 = 129,600 time-ordered monitoring-index data points. The data are grouped into windows of 50 points: points 1 to 50 form one group, points 2 to 51 form the next, and so on. The first 49 points of each group are input into an LSTM network; lossy filtering of the data is achieved by adding an Encoder-Decoder and a corresponding attention mechanism between the layers of the LSTM network, and this data loss enhances the training effect. Finally, a predicted value is obtained and compared with the actual 50th data point of the group to judge the prediction accuracy. The model is formed through repeated (cyclic) training.
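The grouping described above is a sliding window with a stride of one point. A minimal sketch, with the hypothetical helper `make_windows`: each window yields the first 49 points as model input and the 50th point as the target, so a series of length L gives L - 49 windows (129,600 - 49 = 129,551 for three 30-day months).

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int = 50
) -> List[Tuple[List[float], float]]:
    """Split a minute-level series into overlapping (inputs, target) pairs.

    Window k covers points k .. k+window-1: the first window-1 points are the
    model input and the last point is the value to predict.
    """
    if window < 2:
        raise ValueError("window must contain at least one input and one target")
    return [
        (list(series[i : i + window - 1]), series[i + window - 1])
        for i in range(len(series) - window + 1)
    ]


minutes_per_day = 1440
total_points = minutes_per_day * 30 * 3  # three months of 30 days
print(total_points, total_points - 50 + 1)  # 129600 129551

pairs = make_windows(list(range(1, 61)), window=50)  # 60 toy points
print(len(pairs), pairs[0][1], pairs[-1][1])  # 11 50 60
```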
S103, generating a dynamic baseline according to the predicted values of the data in the operation and maintenance data sequence at the current moment for the next minute.
Wherein the dynamic baseline comprises the real value and the predicted value of each minute of the operation and maintenance data sequence.
Continuing the above example, in practical application the time-series index data of the most recent 49 minutes are input into the prediction model, which immediately yields the data for the next minute. Repeating this prediction continuously and in real time on the latest data obtained each minute produces a dynamic baseline composed of the predicted data and the real data.
S104, judging, according to the dynamic baseline, whether abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment.
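The rolling prediction that produces the dynamic baseline can be sketched like this, assuming a model that takes the last 49 real values and returns one prediction. `dynamic_baseline` and `mean_predictor` are hypothetical names, and `mean_predictor` is only a placeholder standing in for the trained prediction model so that the sketch runs on its own.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Sequence, Tuple

INPUT_LEN = 49  # minutes of history fed to the model, per the example above


def dynamic_baseline(
    stream: Iterable[float],
    predict: Callable[[Sequence[float]], float],
) -> List[Tuple[float, Optional[float]]]:
    """Pair each minute's real value with the prediction made for that minute.

    The prediction for minute t uses the 49 real values before t; minutes
    without 49 minutes of history get no prediction (None).
    """
    history: Deque[float] = deque(maxlen=INPUT_LEN)
    baseline: List[Tuple[float, Optional[float]]] = []
    for real in stream:
        predicted = predict(list(history)) if len(history) == INPUT_LEN else None
        baseline.append((real, predicted))
        history.append(real)  # the oldest value drops out automatically
    return baseline


def mean_predictor(window: Sequence[float]) -> float:
    # Hypothetical stand-in for the trained model: the mean of the input window.
    return sum(window) / len(window)
```

Each (real, predicted) pair corresponds to one minute of the dynamic baseline; swapping `mean_predictor` for the trained model leaves the rolling logic unchanged.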
Specifically, if it is determined that abnormal data exists in the operation and maintenance data sequence at the current time within the target time period, step S105 is executed.
S105, generating alarm information.
Optionally, in another embodiment of the present application, an implementation manner of step S104, as shown in fig. 4, includes:
S401, determining the mean square error between the true values and the predicted values in the operation and maintenance data sequence at the current moment within the target time period.
The target time period may be preset and modified by a technician or another authorized person, for example 5 minutes, 3 minutes, or 10 minutes, which is not limited herein.
Specifically, the mean square error between the true values and the predicted values in the operation and maintenance data sequence at the current moment can be calculated with the following formula:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2$$
where x_i is the true value of the i-th minute, y_i is the predicted value of the i-th minute, and n is the length of the target time period in minutes.
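As a quick check of the formula, a direct Python transcription (the function name `window_mse` is hypothetical):

```python
from typing import Sequence


def window_mse(true_values: Sequence[float], predicted_values: Sequence[float]) -> float:
    """Mean square error over the n minutes of the target time period."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("need equally long, non-empty sequences")
    n = len(true_values)
    return sum((x - y) ** 2 for x, y in zip(true_values, predicted_values)) / n


# 5-minute target period: squared errors 0, 1, 4, 0, 0 -> MSE = 5 / 5
print(window_mse([10, 11, 12, 13, 14], [10, 12, 10, 13, 14]))  # 1.0
```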
S402, if the mean square error is greater than the reference standard, determining that abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment.
The reference standard is obtained during the earlier training of the prediction model: the mean square error is calculated on the training set for every 5-minute period, and the largest of these values is selected as the reference standard.
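A sketch of how the reference standard and the S402 comparison might be computed, assuming the training-set windows are consecutive, non-overlapping 5-minute blocks (the text says "every 5 minutes" without specifying overlap). All names are hypothetical.

```python
from typing import List, Sequence

PERIOD = 5  # minutes per window, as stated for the reference standard


def window_mse(true_values: Sequence[float], predicted_values: Sequence[float]) -> float:
    """Mean square error of one window."""
    return sum((x - y) ** 2 for x, y in zip(true_values, predicted_values)) / len(true_values)


def reference_standard(
    train_true: Sequence[float], train_pred: Sequence[float], period: int = PERIOD
) -> float:
    """Largest per-window MSE over consecutive, non-overlapping windows of the training set."""
    mses: List[float] = []
    for start in range(0, len(train_true) - period + 1, period):
        mses.append(window_mse(train_true[start:start + period], train_pred[start:start + period]))
    return max(mses)


def is_abnormal(
    window_true: Sequence[float], window_pred: Sequence[float], reference: float
) -> bool:
    # S402: abnormal data exist when the window MSE exceeds the reference standard.
    return window_mse(window_true, window_pred) > reference
```

Because the threshold is the worst training-window MSE, a live window is flagged only when it fits the baseline worse than any 5-minute window the model saw during training.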
In actual operation and maintenance, different types of indexes tolerate deviations above and below the prediction to different degrees. Take the transaction amount index: when the actual value exceeds the predicted value, an alarm is generated, but it often turns out to be a false alarm caused by a momentary surge of transactions, a holiday, or the like. When the transaction amount is lower than the predicted value, however, this usually indicates a genuine anomaly. Therefore, in another embodiment of the present application, before step S401 is performed, the method further includes:
for the indexes of each category in the operation and maintenance data sequence, assigning to those indexes the weights corresponding to the category, to obtain a weighted operation and maintenance data sequence.
The upper and lower weighting coefficients for different indexes may be as shown in Table 1.
Index type | Upper weight (actual > predicted) | Lower weight (actual < predicted)
Transaction amount | 0.6 | 1
Success rate | 0.5 | 1
Response time | 1 | 0.5
TABLE 1
It should be noted that the weight coefficients may be adjusted according to the actual situation. For example, suppose the person responsible for a system considers a rise in transaction amount to be normal business that should not trigger an alarm, yet still wants an alarm if the rise is particularly sharp. Then, whenever the actual value minus the predicted value is greater than 0, the deviation is multiplied by the upper weight: an increase of 10000 transactions is counted as only 6000, which ensures that this situation does not produce a false alarm.
Correspondingly, step S401 determines the mean square error between the true values and the predicted values in the weighted operation and maintenance data sequence at the current moment within the target time period, and step S402 determines that abnormal data exist within the target time period in the weighted operation and maintenance data sequence at the current moment if the mean square error is greater than the reference standard.
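The asymmetric weighting can be sketched as below. It reads the transaction amount example as scaling the deviation (actual minus predicted) by the upper weight when the deviation is positive and by the lower weight otherwise, then taking the mean square of the weighted deviations; this reading, the dictionary keys and the function names are assumptions. The weights are those of Table 1.

```python
from typing import Dict, Sequence, Tuple

# Weights from Table 1 as (upper weight, lower weight); keys are hypothetical.
WEIGHTS: Dict[str, Tuple[float, float]] = {
    "transaction_amount": (0.6, 1.0),
    "success_rate": (0.5, 1.0),
    "response_time": (1.0, 0.5),
}


def weighted_deviation(actual: float, predicted: float, index_type: str) -> float:
    """Scale the deviation by the upper weight if actual > predicted, else by the lower weight."""
    upper, lower = WEIGHTS[index_type]
    diff = actual - predicted
    return diff * (upper if diff > 0 else lower)


def weighted_mse(actual: Sequence[float], predicted: Sequence[float], index_type: str) -> float:
    """Mean square of the weighted deviations over the target time period."""
    devs = [weighted_deviation(a, p, index_type) for a, p in zip(actual, predicted)]
    return sum(d * d for d in devs) / len(devs)


# A rise of 10000 transactions above the prediction is counted as 6000.
print(weighted_deviation(60000, 50000, "transaction_amount"))  # 6000.0
```

For response time the weights are reversed, so a slowdown (actual above predicted) counts in full while an unexpectedly fast response is discounted.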
For the transaction amount index, when the per-minute transaction amount is large, the calculated mean square error fluctuates strongly overall, which hinders comparison. The data therefore need to be normalized by scaling them into the range 0 to 1. Accordingly, in another embodiment of the present application, before step S401 is performed, the method further includes:
normalizing the true values and the predicted values in the operation and maintenance data sequence at the current moment, to obtain normalized true values and normalized predicted values.
Specifically, the following normalization processing method may be adopted:
$$f(x_i) = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$
where x_i is the true value to be normalized. Because alarms must be timely, the scheme requires an anomaly to be detected within the target time period (e.g., 5 minutes); for time-series performance data with one sampling point per minute, the five points i = 1, 2, 3, 4, 5 therefore need to be normalized. max(x) and min(x) are, respectively, the maximum and minimum taken over both the actual values and the predicted values within those 5 minutes, and f(x_i) is the normalized result of the actual value x_i.
Correspondingly, step S401 determines the mean square error between the normalized true values and the normalized predicted values in the operation and maintenance data sequence at the current moment within the target time period, and step S402 determines that abnormal data exist within the target time period in the normalized operation and maintenance data sequence at the current moment if the mean square error is greater than the reference standard.
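A sketch of the normalization for one target time period, with min(x) and max(x) taken jointly over the actual and predicted values as described above. The name `normalize_window` is hypothetical, and the handling of a flat window (max(x) = min(x), where the formula would divide by zero) is an assumption the text does not cover.

```python
from typing import List, Sequence, Tuple


def normalize_window(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Min-max scale the real and predicted values of one window into [0, 1].

    min(x) and max(x) are taken over the real and predicted values together,
    so both series share one scale.
    """
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    span = hi - lo
    if span == 0:
        # Assumption: a flat window (all values equal) maps to zeros to avoid division by zero.
        return [0.0] * len(actual), [0.0] * len(predicted)
    return [(v - lo) / span for v in actual], [(v - lo) / span for v in predicted]
```

Sharing one min and max keeps the actual and predicted series comparable, so the mean square error computed afterwards reflects their relative gap rather than the raw transaction volume.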
Optionally, in another embodiment of the present application, an implementation manner after step S105, as shown in fig. 5, further includes:
S501, calculating, according to the historical data in the dynamic baseline, the historical mean value of each sampling point and the mean value of the historical proportion values.
S502, taking the historical mean value of the sampling point and the mean value of the historical proportion values, each plus and minus 5 times the standard deviation, as the upper and lower bounds around the mean value.
S503, judging whether any sampling point falls outside the upper and lower bounds within a first preset time after the alarm information is generated.
Specifically, if it is determined that such sampling points exist within the first preset time, step S504 is executed.
S504, confirming that the alarm information is valid.
Specifically, the historical mean value and the proportion value are used to re-check whether the alarm lies within a reasonable range, which filters out a further portion of false alarms.
When data values are low, false alarms can still occur even if the predicted values and actual values do not fluctuate much overall. Therefore, in another embodiment of the present application, an implementation after step S504, as shown in Fig. 6, further includes:
S601, taking the mean of the data in a second preset time before the current moment in the operation and maintenance data sequence at the current moment, plus and minus 3 times the standard deviation, as the upper and lower bounds of the predicted value.
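Steps S501 to S504 might be sketched as below, assuming that the history of a sampling point means the values recorded at that same point in earlier periods of the dynamic baseline, and using the population standard deviation; the text specifies neither. The same `history_bounds` helper could be applied to the series of historical proportion values. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

K_HISTORY = 5  # S502: bounds are the historical mean +/- 5 standard deviations


def history_bounds(history: Sequence[float], k: float = K_HISTORY) -> Tuple[float, float]:
    """Lower/upper bound from a sampling point's history: mean -/+ k * population std."""
    m, s = mean(history), pstdev(history)
    return m - k * s, m + k * s


def alarm_confirmed(
    after_alarm: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> bool:
    """S503/S504: the alarm is valid if any sample in the first preset time
    after the alarm falls outside its own historical bounds."""
    return any(not (lo <= v <= hi) for v, (lo, hi) in zip(after_alarm, bounds))
```

The wide 5-sigma band means only values far outside the point's historical range confirm the alarm, which is what filters out the remaining false alarms.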
S602, judging whether the actual value at the current moment exceeds the upper and lower bounds of the predicted value.
Specifically, if it is determined that the actual value at the current time exceeds the upper and lower bounds of the predicted value, step S603 is executed.
S603, generating an alarm prompt.
In summary, the present application provides a method for detecting abnormal data, including: continuously acquiring the operation and maintenance data sequence at the current moment; for each such sequence, inputting it into a prediction model, which outputs the predicted value of the data in the sequence for the next minute, where the prediction model is obtained by training a target coding and decoding model on a training sample set, the target coding and decoding model is a coding and decoding model with an added attention mechanism, and the training sample set includes at least one training sample data sequence; generating a dynamic baseline according to the predicted next-minute values, where the dynamic baseline includes the real value and the predicted value of the operation and maintenance data sequence for each minute; and finally, judging according to the dynamic baseline whether abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment, and generating alarm information if they do. Abnormal data can thus be detected accurately and efficiently.
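Steps S601 and S602 reduce to a 3-sigma band over the recent window. A sketch, assuming the population standard deviation and a hypothetical function name `exceeds_recent_band`:

```python
from statistics import mean, pstdev
from typing import Sequence

K_RECENT = 3  # S601: recent mean +/- 3 standard deviations


def exceeds_recent_band(recent: Sequence[float], current: float, k: float = K_RECENT) -> bool:
    """S601-S602: build the band from the data in the second preset time before
    the current moment, then check whether the current actual value leaves it."""
    m, s = mean(recent), pstdev(recent)
    return not (m - k * s <= current <= m + k * s)
```

A True result corresponds to S603 (generating an alarm prompt). Because the band is built from the recent data themselves, small absolute deviations on low-valued series stay inside it and do not raise alarms.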
Another embodiment of the present application provides a device for detecting abnormal data, as shown in fig. 7, specifically including:
an obtaining unit 701, configured to continuously obtain an operation and maintenance data sequence at a current time.
a prediction unit 702, configured to, for each operation and maintenance data sequence at the current moment, input the sequence into the prediction model, which outputs the predicted value of the data in that sequence for the next minute.
The prediction model is obtained by training a target coding and decoding model on a training sample set; the target coding and decoding model is a coding and decoding model with an added attention mechanism; the training sample set includes at least one training sample data sequence.
Optionally, in another embodiment of the present application, the units for constructing the prediction model include:
a construction unit, configured to construct a training sample set.
The training sample set includes at least one training sample data sequence; each training sample data sequence consists of N sampling points taken at a minute-level sampling rate.
an input unit, configured to input the first N-1 sampling points of the training sample data sequence into the target coding and decoding model, which outputs a predicted value of the Nth sampling point of the training sample data sequence.
an adjusting unit, configured to continuously adjust the parameters of the target coding and decoding model according to the error between the predicted value and the true value of the Nth sampling point in the training sample data sequence, until that error, after adjustment, satisfies a preset convergence condition, and to determine the adjusted target coding and decoding model as the prediction model.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 2, which is not described herein again.
The first generating unit 703 is configured to generate a dynamic baseline according to the predicted next-minute values of the data in all operation and maintenance data sequences at the current moment.
Wherein the dynamic baseline comprises the real value and the predicted value of each minute of the operation and maintenance data sequence.
A first judging unit 704, configured to judge, according to the dynamic baseline, whether abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment.
Optionally, in another embodiment of the present application, the first judging unit 704 includes:
a first determining unit, configured to determine the mean square error between the true values and the predicted values in the operation and maintenance data sequence at the current moment within the target time period.
a second determining unit, configured to determine that abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment if the mean square error is greater than the reference standard.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 4, which is not described herein again.
Optionally, in another embodiment of the present application, an implementation manner of the apparatus for detecting abnormal data further includes:
a weighting unit, configured to assign, to the indexes of each category in the operation and maintenance data sequence, the weights corresponding to that category, to obtain a weighted operation and maintenance data sequence;
wherein the first determining unit is configured to:
determining the mean square error between a real value and a predicted value in the weighted operation and maintenance data sequence at the current moment in a target time period;
wherein the second determination unit is configured to:
if the mean square error is greater than the reference standard, determine that abnormal data exist within the target time period in the weighted operation and maintenance data sequence at the current moment.
For the specific working processes of the units disclosed in the above embodiments of the present application, reference may be made to the corresponding method embodiments, which are not described in detail herein.
Optionally, in another embodiment of the present application, an implementation manner of the apparatus for detecting abnormal data further includes:
a normalization unit, configured to normalize the actual values and the predicted values in the operation and maintenance data sequence at the current moment, to obtain normalized actual values and normalized predicted values;
wherein the first determining unit is configured to:
determining the mean square error between a real value and a predicted value in the normalized operation and maintenance data sequence at the current moment in a target time period;
wherein the second determination unit is configured to:
if the mean square error is greater than the reference standard, determine that abnormal data exist within the target time period in the normalized operation and maintenance data sequence at the current moment.
For the specific working processes of the units disclosed in the above embodiments of the present application, reference may be made to the corresponding method embodiments, which are not described in detail herein.
The second generating unit 705 is configured to generate alarm information if the first judging unit 704 judges that abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 1, which is not described herein again.
Optionally, in another embodiment of the present application, an implementation manner of the apparatus for detecting abnormal data further includes:
a calculating unit, configured to calculate, according to the historical data in the dynamic baseline, the historical mean value of each sampling point and the mean value of the historical proportion values.
a third determining unit, configured to take the historical mean value of the sampling point and the mean value of the historical proportion values, each plus and minus 5 times the standard deviation, as the upper and lower bounds around the mean value.
a second judging unit, configured to judge whether any sampling point falls outside the upper and lower bounds within the first preset time after the alarm information is generated.
a confirming unit, configured to confirm that the alarm information is valid if the second judging unit judges that such sampling points exist within the first preset time.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 5, which is not described herein again.
Optionally, in another embodiment of the present application, an implementation manner of the apparatus for detecting abnormal data further includes:
a fourth determining unit, configured to take the mean of the data in the second preset time before the current moment in the operation and maintenance data sequence at the current moment, plus and minus 3 times the standard deviation, as the upper and lower bounds of the predicted value.
a third judging unit, configured to judge whether the actual value at the current moment exceeds the upper and lower bounds of the predicted value.
a third generating unit, configured to generate an alarm prompt if the third judging unit judges that the actual value at the current moment exceeds the upper and lower bounds of the predicted value.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 6, which is not described herein again.
As can be seen from the above, the present application provides an apparatus for detecting abnormal data: the obtaining unit 701 continuously obtains the operation and maintenance data sequence at the current moment; for each such sequence, the prediction unit 702 inputs it into the prediction model, which outputs the predicted value of the data in the sequence for the next minute, where the prediction model is obtained by training a target coding and decoding model on a training sample set, the target coding and decoding model is a coding and decoding model with an added attention mechanism, and the training sample set includes at least one training sample data sequence; the first generating unit 703 generates a dynamic baseline according to the predicted next-minute values of all the operation and maintenance data sequences at the current moment, where the dynamic baseline includes the real value and the predicted value of the operation and maintenance data sequence for each minute; finally, the first judging unit 704 judges, according to the dynamic baseline, whether abnormal data exist within the target time period in the operation and maintenance data sequence at the current moment, and if so, the second generating unit 705 generates alarm information. Abnormal data can thus be detected accurately and efficiently.
Another embodiment of the present application provides an electronic device, as shown in fig. 8, including:
one or more processors 801.
A storage device 802 on which one or more programs are stored.
The one or more programs, when executed by the one or more processors 801, cause the one or more processors 801 to implement a method of detecting anomalous data as described in any of the above embodiments.
Another embodiment of the present application provides a computer storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for detecting abnormal data according to any one of the above embodiments.
In the above embodiments disclosed in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present disclosure may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part. If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a live broadcast device, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art can make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting anomalous data, comprising:
continuously acquiring an operation and maintenance data sequence at the current moment;
for each operation and maintenance data sequence at the current moment, inputting the operation and maintenance data sequence at the current moment into a prediction model, the prediction model outputting a predicted value of the data in the operation and maintenance data sequence at the current moment for the next minute; the prediction model is obtained by training a target coding and decoding model through a training sample set; the target coding and decoding model is a coding and decoding model added with an attention mechanism; the training sample set comprises: at least one training sample data sequence;
generating a dynamic baseline according to the predicted values of the data in the operation and maintenance data sequence at the current moment in the next minute; wherein the dynamic baseline comprises a real value and a predicted value of the operation and maintenance data sequence every minute;
judging whether abnormal data exist in the operation and maintenance data sequence at the current moment in a target time period or not according to the dynamic baseline;
and if abnormal data exist in the target time period in the operation and maintenance data sequence at the current moment, generating alarm information.
2. The detection method according to claim 1, wherein the construction method of the prediction model comprises:
constructing a training sample set; wherein the training sample set comprises: at least one training sample data sequence; each training sample data sequence consists of N sampling points taken at a minute-level sampling rate;
inputting the first N-1 sampling points in the training sample data sequence to a target coding and decoding model, and outputting to obtain a predicted value of the Nth sampling point in the training sample data sequence;
and continuously adjusting parameters in the target coding and decoding model according to the error between the predicted value of the Nth sampling point in the training sample data sequence and the true value of the Nth sampling point in the training sample data sequence until the error between the predicted value of the Nth sampling point in the adjusted training sample data sequence and the true value of the Nth sampling point in the training sample data sequence meets a preset convergence condition, and determining the adjusted target coding and decoding model as a prediction model.
3. The detection method according to claim 1, wherein the determining whether abnormal data exists in the operation and maintenance data sequence at the current time within a target time period according to the dynamic baseline includes:
determining the mean square error between a true value and a predicted value in the operation and maintenance data sequence at the current moment in a target time period;
and if the mean square error is larger than a reference standard, determining that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment.
4. The detection method according to claim 3, wherein before determining the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time within the target time period, the method further comprises:
for the indexes of each category in the operation and maintenance data sequence, assigning to the indexes of the category the weights corresponding to the category, to obtain a weighted operation and maintenance data sequence;
the determining of the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period includes:
determining the mean square error between a real value and a predicted value in the weighted operation and maintenance data sequence at the current moment in a target time period;
the determining, if the mean square error is greater than a reference standard, that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment includes:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the weighted operation and maintenance data sequence at the current moment.
5. The detection method according to claim 3, wherein before determining the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time within the target time period, the method further comprises:
normalizing the real value and the predicted value in the operation and maintenance data sequence at the current moment to obtain the normalized real value and the normalized predicted value in the operation and maintenance data sequence at the current moment;
the determining of the mean square error between the real value and the predicted value in the operation and maintenance data sequence at the current time in the target time period includes:
determining the mean square error between a real value and a predicted value in the normalized operation and maintenance data sequence at the current moment in a target time period;
the determining, if the mean square error is greater than a reference standard, that abnormal data exists in the target time period in the operation and maintenance data sequence at the current moment includes:
and if the mean square error is larger than the reference standard, determining that abnormal data exists in the target time period in the normalized operation and maintenance data sequence at the current moment.
6. The detection method according to claim 1, wherein after generating the alarm information if it is determined that abnormal data exists in the target time period in the operation and maintenance data sequence at the current time, the method further comprises:
according to historical data in the dynamic baseline, calculating the historical average value of each sampling point and the average value of the historical proportional values;
taking the historical mean value of the sampling point and the mean value of the historical proportional values, each plus and minus 5 times the standard deviation, as the upper and lower bounds around the mean value;
judging whether any sampling point falls outside the upper and lower bounds within a first preset time after the alarm information is generated;
and if it is judged that sampling points fall outside the upper and lower bounds within the first preset time, confirming that the alarm information is valid.
7. The detection method according to claim 6, wherein after confirming that the alarm information is valid if sampling points fall outside the upper and lower bounds within the first preset time, the method further comprises:
taking the mean of the data in a second preset time before the current moment in the operation and maintenance data sequence at the current moment, plus and minus 3 times the standard deviation, as upper and lower bounds of the predicted value;
judging whether the actual value at the current moment exceeds the upper and lower bounds of the predicted value;
and if the actual value of the current moment exceeds the upper and lower bounds of the predicted value, generating an alarm prompt.
8. An apparatus for detecting abnormal data, comprising:
an acquisition unit, configured to continuously acquire the operation and maintenance data sequence at the current moment;
a prediction unit, configured to, for each operation and maintenance data sequence at the current moment, input the sequence into a prediction model, which outputs a predicted value for the next minute of data in that sequence; wherein the prediction model is obtained by training a target encoder-decoder model on a training sample set, the target encoder-decoder model is an encoder-decoder model with an added attention mechanism, and the training sample set comprises at least one training sample data sequence;
a first generation unit, configured to generate a dynamic baseline according to the predicted values for the next minute of data in the operation and maintenance data sequence at the current moment; wherein the dynamic baseline comprises the real value and the predicted value of the operation and maintenance data sequence for each minute;
a first judgment unit, configured to judge, according to the dynamic baseline, whether abnormal data exist in a target time period in the operation and maintenance data sequence at the current moment;
and a second generation unit, configured to generate alarm information if the first judgment unit judges that abnormal data exist in the target time period in the operation and maintenance data sequence at the current moment.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the abnormal data detection method according to any one of claims 1 to 7.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the abnormal data detection method according to any one of claims 1 to 7.
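The alarm confirmation step of claim 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the bounds are read as the historical mean of the sampling points plus and minus 5 population standard deviations, the claim's "historical proportion value" term is not modelled, and the function name `confirm_alarm` and its parameters are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def confirm_alarm(history: Sequence[float],
                  window_after_alarm: Sequence[float],
                  k: float = 5.0) -> bool:
    """Return True if the alarm information is confirmed as valid.

    Assumed reading of claim 6: the bounds are the historical mean of the
    sampling points plus/minus k (here 5) population standard deviations.
    The "historical proportion value" term is NOT modelled.
    `window_after_alarm` holds the sampling points collected during the
    first preset time after the alarm was generated.
    """
    mu = mean(history)
    sigma = pstdev(history)
    lower, upper = mu - k * sigma, mu + k * sigma
    # Valid if at least one sampling point in the window leaves [lower, upper].
    return any(x < lower or x > upper for x in window_after_alarm)


history = [10, 10, 10, 10, 12, 8]        # illustrative data, mean 10
print(confirm_alarm(history, [10, 11]))  # False: 11 stays inside about [4.2, 15.8]
print(confirm_alarm(history, [10, 30]))  # True: 30 exceeds the upper bound
```

The wide 5-sigma band means only a pronounced excursion after the alarm confirms it, which filters out alarms caused by ordinary fluctuation.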
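The follow-up check of claim 7 can be sketched in the same way. Again this is an illustration under assumptions: the bounds of the predicted value are read as the mean of the data over the second preset time before the current moment plus and minus 3 population standard deviations, and `needs_prompt` and its parameters are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Sequence


def needs_prompt(recent_window: Sequence[float], actual: float, k: float = 3.0) -> bool:
    """Return True if the actual value at the current moment should trigger an alarm prompt.

    Assumed reading of claim 7: `recent_window` is the data from the second
    preset time before the current moment, and the bounds of the predicted
    value are its mean plus/minus k (here 3) population standard deviations.
    """
    mu = mean(recent_window)
    sigma = pstdev(recent_window)
    lower, upper = mu - k * sigma, mu + k * sigma
    return actual < lower or actual > upper


window = [10, 12, 8, 10]          # illustrative data: mean 10, sigma about 1.41
print(needs_prompt(window, 14))   # False: 14 lies inside about [5.76, 14.24]
print(needs_prompt(window, 15))   # True: 15 exceeds the upper bound
```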
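The units of the apparatus in claim 8 can be mapped onto a small class, sketched below under loudly stated assumptions. The trained attention-based encoder-decoder is replaced by a hypothetical `naive_predictor` (the mean of the last five values), and the rule for judging abnormal data against the dynamic baseline (a relative deviation above a hypothetical `tolerance` of 0.2 between a minute's real and predicted value) is an illustrative choice that the claim does not specify. All class, method, and parameter names are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple


def naive_predictor(sequence: Sequence[float], n: int = 5) -> float:
    """Hypothetical stand-in for the trained attention-based encoder-decoder:
    predicts the next minute as the mean of the last n values."""
    return mean(sequence[-n:])


class AbnormalDataDetector:
    """Maps the units of claim 8 onto methods (a sketch, not the patented design)."""

    def __init__(self, predict: Callable[[Sequence[float]], float] = naive_predictor,
                 tolerance: float = 0.2, target_minutes: int = 1):
        self.predict = predict                # prediction unit (swapped-in model)
        self.tolerance = tolerance            # hypothetical relative-deviation threshold
        self.target_minutes = target_minutes  # hypothetical "target time period"
        # Dynamic baseline: one (real value, predicted value) pair per minute.
        self.baseline: List[Tuple[float, Optional[float]]] = []
        self._next_prediction: Optional[float] = None

    def acquire(self, sequence: Sequence[float]) -> Optional[str]:
        """Acquisition unit: call once per minute with the sequence up to the current moment."""
        actual = sequence[-1]
        # First generation unit: extend the baseline with this minute's real
        # value and the value predicted for it one minute earlier.
        self.baseline.append((actual, self._next_prediction))
        # Prediction unit: forecast the next minute from the current sequence.
        self._next_prediction = self.predict(sequence)
        # First judgment unit, then second generation unit (alarm information).
        if self.has_abnormal_data():
            return f"alarm: real value {actual} deviates from the dynamic baseline"
        return None

    def has_abnormal_data(self) -> bool:
        """Hypothetical rule: a minute is abnormal if |real - predicted| > tolerance * |predicted|."""
        window = self.baseline[-self.target_minutes:]
        return any(pred is not None and abs(real - pred) > self.tolerance * abs(pred)
                   for real, pred in window)


detector = AbnormalDataDetector()
data = [100, 101, 99, 100, 102, 180]  # illustrative per-minute values
alarms = [detector.acquire(data[:i + 1]) for i in range(len(data))]
print(alarms[-1])  # alarm: real value 180 deviates from the dynamic baseline
```

Feeding the sequence one minute at a time, the first five minutes stay within the tolerance of their predictions, while the jump to 180 in the sixth minute is flagged. Pairing the prediction made at minute t with the real value observed at minute t+1 is what lets the baseline move with the data instead of relying on a fixed threshold.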
CN202210230052.2A 2022-03-10 2022-03-10 Abnormal data detection method and device, electronic equipment and computer storage medium Pending CN114595134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210230052.2A CN114595134A (en) 2022-03-10 2022-03-10 Abnormal data detection method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210230052.2A CN114595134A (en) 2022-03-10 2022-03-10 Abnormal data detection method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114595134A (en) 2022-06-07

Family

ID=81809343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210230052.2A Pending CN114595134A (en) 2022-03-10 2022-03-10 Abnormal data detection method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114595134A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074215A (en) * 2022-12-30 2023-05-05 中国联合网络通信集团有限公司 Network quality detection method, device, equipment and storage medium
CN116074215B (en) * 2022-12-30 2024-04-19 中国联合网络通信集团有限公司 Network quality detection method, device, equipment and storage medium
CN116486313A (en) * 2023-06-25 2023-07-25 安元科技股份有限公司 Video analysis system and method suitable for scenes
CN116486313B (en) * 2023-06-25 2023-08-29 安元科技股份有限公司 Video analysis system and method suitable for scenes

Similar Documents

Publication Publication Date Title
CN109815084B (en) Abnormity identification method and device, electronic equipment and storage medium
CN114595134A (en) Abnormal data detection method and device, electronic equipment and computer storage medium
US11169514B2 (en) Unsupervised anomaly detection, diagnosis, and correction in multivariate time series data
CN109325692B (en) Real-time data analysis method and device for water pipe network
CN110766059A (en) Transformer fault prediction method, device and equipment
CN107911396A (en) Log in method for detecting abnormality and system
CN111030992B (en) Detection method, server and computer readable storage medium
CN112039903A (en) Network security situation assessment method based on deep self-coding neural network model
CN110852906B (en) Method and system for identifying electricity stealing suspicion based on high-dimensional random matrix
CN116990479A (en) Water quality monitoring method, system, equipment and medium based on Zigbee technology
CN116677569A (en) Fan tower stress monitoring and early warning method and system
CN113408722B (en) Situation assessment factor extraction method based on layer-by-layer loss compensation depth self-encoder
CN113591078B (en) Industrial control intrusion detection system and method based on convolutional neural network architecture optimization
CN117097541A (en) API service attack detection method, device, equipment and storage medium
CN116362593A (en) Construction method, evaluation method and device of river and lake ecological safety evaluation model
CN116126807A (en) Log analysis method and related device
CN116011281A (en) Method and device for analyzing seismic vulnerability of electric power facility
CN115688961A (en) Power equipment fault prediction method and system based on deep learning
CN115345343A (en) Method and device for predicting turbidity of water supply pipe network
CN113347014B (en) Industrial control system situation combination prediction method based on time sequence
CN114915845A (en) System and method for predicting IPTV user declaration
KR102137201B1 (en) Apparatus for evaluating the performance of an operating program before the construction of a water treatment system and apparatus for generating a virtual signal in the water treatment system
Qi et al. A combined prediction method of industrial internet security situation based on time series
CN116703207A (en) Thermal power plant safety monitoring method and system based on artificial intelligence
CN116962080B (en) Alarm filtering method, system and medium based on network node risk assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination