CN114020811A - Data anomaly detection method and device and electronic equipment - Google Patents

Info

Publication number
CN114020811A
Authority
CN
China
Prior art keywords
data
detection
sample
neighborhood
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111321740.1A
Other languages
Chinese (zh)
Inventor
张为欢
王培君
管虹翔
梁广会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111321740.1A priority Critical patent/CN114020811A/en
Publication of CN114020811A publication Critical patent/CN114020811A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a data anomaly detection method, a data anomaly detection device, and electronic equipment. The detection method comprises: receiving product operation data; sending the product operation data to a target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs, the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, and the loss value rate indicates the ratio of the number of samples the model misclassifies to the total number of samples; analyzing the product operation data with the anomaly detection system to obtain a detection result; and sending the abnormal data in the detection result to an alarm system. The invention addresses the technical problem in the related art that anomaly detection systems have low detection accuracy.

Description

Data anomaly detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a data anomaly detection method and device and electronic equipment.
Background
In the financial technology industry, most enterprises have begun to build alarm systems based on artificial intelligence algorithms to identify abnormal data so that the relevant personnel can handle it quickly. In the related art, a mature artificial intelligence algorithm is introduced, and a model trained on historical data is used to detect abnormal data in production and raise an alarm. However, existing models are obtained by training and iterating on offline data from an earlier period. As the financial business field develops rapidly, the training samples contain more and more positive and negative samples that are hard to distinguish, so the actual detection rate of the existing models falls and detection accuracy drops.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a data anomaly detection method, a data anomaly detection device, and electronic equipment, which at least solve the technical problem in the related art that anomaly detection systems have low detection accuracy.
According to an aspect of an embodiment of the present invention, there is provided a data anomaly detection method, including: receiving product operation data; sending the product operation data to a target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs, the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, and the loss value rate indicates the ratio of the number of samples the model misclassifies to the total number of samples; analyzing the product operation data with the anomaly detection system to obtain a detection result; and sending the abnormal data in the detection result to an alarm system.
Optionally, before receiving the product operation data, the detection method further includes: obtaining product operation data in a first preset time period in a historical process to obtain historical sample data; receiving product error data of product operation data input by external terminal equipment, wherein the product error data indicate sample data which causes a loss value rate to be greater than a preset probability threshold value in a model training process; and inputting the historical sample data and the product error data into a data detection model so as to carry out iterative training on the data detection model.
Optionally, the step of inputting the historical sample data and the product error data into a data detection model to perform iterative training on the data detection model includes: performing anomaly detection on the historical sample data by adopting a preset normal distribution algorithm to obtain a negative sample set and a positive sample set; and analyzing the sample data in the positive sample set by adopting a local abnormal factor algorithm to determine wrong data in the positive sample set and finish model training.
Optionally, the step of performing anomaly detection on the historical sample data by using a preset normal distribution algorithm to obtain a negative sample set and a positive sample set includes: taking each index to be detected as a reference, extracting sample data corresponding to each index to be detected in the historical sample data; calculating the sample mean value of all the extracted sample data to obtain an index detection mean value; performing sample variance calculation on all the extracted sample data to obtain an index detection variance value; calculating a normal distribution area based on the index detection mean value and the index detection variance value; and classifying the historical sample data in the normal distribution area into a positive sample set, and classifying the historical sample data which does not fall in the normal distribution area into the negative sample set.
Optionally, the step of analyzing the sample data in the positive sample set by using a local abnormal factor algorithm to determine the error data in the positive sample set includes: combining the index data corresponding to each index to be detected and the corresponding data time in the historical sample data to obtain a plurality of time sequence data; determining surrounding neighborhood points in the target time sequence data by taking each time sequence data as a center to obtain a neighborhood set; calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time sequence data; calculating the density mean value of the local reachable densities of all other neighborhood points in the neighborhood set, and calculating the density ratio between the density mean value and the local reachable density of the target time sequence data; and if the density ratio is larger than a preset ratio threshold, determining that the target time sequence data is wrong data in the positive sample set.
Optionally, the step of determining a neighborhood point around the target time series data by taking each time series data as a center to obtain a neighborhood set includes: acquiring a vector distance value between the target time sequence data and other time sequence data; sequencing all vector distance values to obtain a sequencing result; and classifying the time sequence data with the vector distance value smaller than or equal to a preset distance threshold value into the neighborhood set based on the sorting result.
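The neighborhood step above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `neighborhood_set`, the `(time, value)` tuple layout, and the choice of the k-distance (the distance to the k-th nearest point) as the "preset distance threshold" are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (data time, index value); this layout is an assumption


def neighborhood_set(
    target: Point, others: Sequence[Point], k: int
) -> Tuple[List[Point], float]:
    """Return the neighborhood set of `target` and the distance threshold used."""
    if not 1 <= k <= len(others):
        raise ValueError("k must be between 1 and the number of other points")
    # Vector distance from the target to every other point, sorted ascending.
    ranked = sorted((math.dist(target, other), other) for other in others)
    # Assumed threshold: the k-distance, i.e. the distance to the k-th nearest point.
    threshold = ranked[k - 1][0]
    # Keep every point at or below the threshold; ties can push the size past k.
    neighbors = [point for distance, point in ranked if distance <= threshold]
    return neighbors, threshold


target = (0.0, 0.0)
others = [(3.0, 0.0), (1.0, 0.0), (0.0, 2.0), (2.0, 0.0), (5.0, 5.0)]
neighbors, threshold = neighborhood_set(target, others, k=2)
print(neighbors, threshold)
```

With `k=2` the threshold is 2.0, and two points tie at that distance, so the neighborhood holds three points. In practice the time axis and the index axis would probably need to be scaled to comparable ranges before a Euclidean distance is meaningful; the sketch does not do that.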
Optionally, the step of calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time series data includes: if the actual distance between another neighborhood point in the neighborhood set and the target time series data is less than or equal to the k-distance, determining the reachable distance between that neighborhood point and the target time series data to be the preset distance threshold; if the actual distance between another neighborhood point in the neighborhood set and the target time series data is greater than the k-distance, determining the reachable distance between that neighborhood point and the target time series data to be the actual distance; and calculating the reciprocal of the reachable distance between the target time series data and the other neighborhood points in the neighborhood set to obtain the local reachable density of the target time series data.
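As a rough sketch of the two rules above: the reachable distance is the k-distance when the actual distance does not exceed it, and the actual distance otherwise. The text only says "the reciprocal of the reachable distance"; the code below takes the reciprocal of the mean reachable distance over the neighborhood, which is the usual LOF convention and is an assumption here. Both function names are hypothetical.

```python
from typing import Sequence


def reachable_distance(actual_distance: float, k_distance: float) -> float:
    """k-distance if the point lies within it, otherwise the actual distance."""
    return k_distance if actual_distance <= k_distance else actual_distance


def local_reachable_density(reach_distances: Sequence[float]) -> float:
    """Reciprocal of the mean reachable distance (averaging is an assumption)."""
    if not reach_distances:
        raise ValueError("the neighborhood set must not be empty")
    total = sum(reach_distances)
    if total == 0:
        raise ValueError("all reachable distances are zero (duplicate points)")
    return len(reach_distances) / total


# Actual distances from the target to three neighbors, and each neighbor's k-distance.
actual = [1.0, 2.0, 4.0]
k_distances = [2.0, 2.0, 2.0]
reach = [reachable_distance(a, kd) for a, kd in zip(actual, k_distances)]
density = local_reachable_density(reach)
print(reach, density)
```

Here the first neighbor is closer than its k-distance, so its reachable distance is raised to 2.0; the mean reachable distance is 8/3 and the density is 0.375.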
Optionally, after sending the abnormal data in the detection result to the alarm system, the detection method further includes: receiving a data confirmation result fed back by the alarm system; if the data confirmation result indicates that the detection result is real, classifying the abnormal data into a positive sample library; and if the data confirmation result indicates that the detection result is false, classifying the abnormal data into a negative sample library.
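The feedback routing described above could look like the following sketch. The record layout, the in-memory lists standing in for the sample libraries, and the name `file_confirmation` are all hypothetical; the routing rule itself (confirmed detections go to the positive sample library, false ones to the negative sample library) follows the text.

```python
from typing import Dict, List


def file_confirmation(
    abnormal: Dict,
    is_real: bool,
    positive_library: List[Dict],
    negative_library: List[Dict],
) -> None:
    """File one alarm confirmation result into the matching sample library."""
    if is_real:
        positive_library.append(abnormal)   # detection result confirmed as real
    else:
        negative_library.append(abnormal)   # detection result confirmed as false


positive_library: List[Dict] = []
negative_library: List[Dict] = []
file_confirmation({"index": "success_rate", "value": 0.42}, True,
                  positive_library, negative_library)
file_confirmation({"index": "success_rate", "value": 0.97}, False,
                  positive_library, negative_library)
```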
According to another aspect of the embodiments of the present invention, there is also provided a data anomaly detection apparatus, including: a receiving unit, configured to receive product operation data; a first sending unit, configured to send the product operation data to a target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs, the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, and the loss value rate indicates the ratio of the number of samples the model misclassifies to the total number of samples; an analysis unit, configured to analyze the product operation data with the anomaly detection system to obtain a detection result; and a second sending unit, configured to send the abnormal data in the detection result to the alarm system.
Optionally, the detection apparatus further comprises: a first acquisition module, configured to acquire, before the product operation data is received, product operation data from a first preset time period in the historical process to obtain historical sample data; a first receiving module, configured to receive product error data of the product operation data input by an external terminal device, wherein the product error data indicates the sample data that causes the loss value rate to exceed the preset probability threshold during model training; and a first training module, configured to input the historical sample data and the product error data into the data detection model so as to iteratively train the data detection model.
Optionally, the first training module comprises: the first detection submodule is used for carrying out abnormal detection on the historical sample data by adopting a preset normal distribution algorithm to obtain a negative sample set and a positive sample set; and the first analysis submodule is used for analyzing the sample data in the positive sample set by adopting a local abnormal factor algorithm so as to determine the error data in the positive sample set and finish model training.
Optionally, the first detection submodule includes: the first extraction submodule is used for extracting sample data corresponding to each index to be detected from the historical sample data by taking each index to be detected as a reference; the first calculation submodule is used for calculating the sample mean value of all the extracted sample data to obtain an index detection mean value; the second calculation submodule is used for performing sample variance calculation on all the extracted sample data to obtain an index detection variance value; the third calculation submodule is used for calculating a normal distribution area based on the index detection mean value and the index detection variance value; and the first classification submodule is used for classifying the historical sample data in the normal distribution area into a positive sample set, and classifying the historical sample data which does not fall in the normal distribution area into the negative sample set.
Optionally, the first analysis sub-module comprises: the first combination sub-module is used for combining the index data corresponding to each index to be detected and the corresponding data time in the historical sample data to obtain a plurality of time sequence data; the first determining submodule is used for determining surrounding neighborhood points of the target time sequence data by taking each time sequence data as a center to obtain a neighborhood set; the fourth calculation submodule is used for calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time sequence data; the fifth calculation submodule is used for calculating the density mean value of the local reachable densities of all other neighborhood points in the neighborhood set and calculating the density ratio between the density mean value and the local reachable density of the target time sequence data; and the second determining submodule is used for determining that the target time sequence data is wrong data in the positive sample set if the density ratio is larger than a preset ratio threshold.
Optionally, the first determining sub-module includes: the first acquisition submodule is used for acquiring vector distance values between the target time sequence data and other time sequence data; the first sequencing submodule is used for sequencing all the vector distance values to obtain a sequencing result; and the second classification submodule is used for classifying the time sequence data of which the vector distance value is less than or equal to a preset distance threshold value into the neighborhood set based on the sequencing result.
Optionally, the fourth calculation submodule includes: a third determining submodule, configured to determine, if the actual distance between another neighborhood point in the neighborhood set and the target time series data is less than or equal to the k-distance, that the reachable distance between that neighborhood point and the target time series data is the preset distance threshold; a fourth determining submodule, configured to determine, if the actual distance between another neighborhood point in the neighborhood set and the target time series data is greater than the k-distance, that the reachable distance between that neighborhood point and the target time series data is the actual distance; and a sixth calculation submodule, configured to calculate the reciprocal of the reachable distance between the target time series data and the other neighborhood points in the neighborhood set to obtain the local reachable density of the target time series data.
Optionally, the detection apparatus further comprises: a second receiving module, configured to receive the data confirmation result fed back by the alarm system after the abnormal data in the detection result is sent to the alarm system; a first classification module, configured to classify the abnormal data into a positive sample library if the data confirmation result indicates that the detection result is real; and a second classification module, configured to classify the abnormal data into a negative sample library if the data confirmation result indicates that the detection result is false.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the data anomaly detection methods described above via execution of the executable instructions.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the above data anomaly detection methods.
In the embodiments of the invention, product operation data is received and sent to a target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs; the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, the loss value rate indicating the ratio of the number of samples the model misclassifies to the total number of samples; the anomaly detection system analyzes the product operation data to obtain a detection result; and the abnormal data in the detection result is sent to an alarm system. By performing hard example mining on the real-time product operation data (i.e., mining the sample data with large loss values during model training) and retraining on that data, the detection rate of the anomaly detection system can be improved, so that abnormal data is found more accurately in real time. This solves the technical problem in the related art that anomaly detection systems have low detection accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative data anomaly detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative method for increasing the accuracy of an artificial intelligence alarm system in accordance with embodiments of the present invention;
FIG. 3 is a flow diagram of an alternative hard case mining according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative data anomaly detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To facilitate understanding of the invention by those skilled in the art, some terms or nouns referred to in the embodiments of the invention are explained below:
Hard example mining: mining the sample data that causes a high loss value rate during model training (i.e., sample data that the model misclassifies with high probability) and retraining on that sample data.
Hard-to-classify positive (negative) samples: positive (negative) samples that are misclassified as negative (positive) samples; they are the positive (negative) samples with the highest loss during training.
LOF algorithm: Local Outlier Factor, referred to in this document as the local abnormal factor algorithm. It compares the local density of a given data point with that of its neighborhood points; a data point is considered an anomalous sample when its local density is significantly lower than the local density of its neighborhood points.
The following embodiments of the invention can be applied to various systems and applications that detect abnormal data, or to any scenario in which abnormal data needs to be detected. The data involved can be data running in actual production, for example data related to various financial products (such as fund products or bond products).
Example one
In accordance with an embodiment of the present invention, an embodiment of a data anomaly detection method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
FIG. 1 is a flow chart of an alternative data anomaly detection method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102: receive product operation data.
Step S104: send the product operation data to a target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs, the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, and the loss value rate indicates the ratio of the number of samples the model misclassifies to the total number of samples.
Step S106: analyze the product operation data with the anomaly detection system to obtain a detection result.
Step S108: send the abnormal data in the detection result to an alarm system.
Through the above steps, product operation data can be received and sent to the target park, wherein the target park is connected to an anomaly detection system in which a data detection model runs; the data detection model mines, during model training, the sample data that causes the loss value rate to exceed a preset probability threshold and retrains on that sample data, the loss value rate indicating the ratio of the number of samples the model misclassifies to the total number of samples; the anomaly detection system analyzes the product operation data to obtain a detection result; and the abnormal data in the detection result is sent to an alarm system. In the embodiment of the invention, hard example mining is performed on the real-time product operation data (the data detection model running in the anomaly detection system performs the anomaly detection, and the sample data with large loss values during model training is mined) and the model is retrained on that data. This can improve the detection rate of the anomaly detection system, so that abnormal data is found more accurately in real time, which solves the technical problem in the related art that anomaly detection systems have low detection accuracy.
The following will explain the embodiments of the present invention in detail with reference to the above steps.
In an embodiment of the present invention, before receiving the product operation data, the detection method further includes: obtaining product operation data in a first preset time period in a historical process to obtain historical sample data; receiving product error data of product operation data input by external terminal equipment, wherein the product error data indicate sample data which causes a loss value rate to be greater than a preset probability threshold value in a model training process; and inputting the historical sample data and the product error data into the data detection model so as to carry out iterative training on the data detection model.
In the embodiment of the present invention, historical sample data (i.e., product operation data from a first preset time period in the historical process) and hard-to-distinguish sample data (i.e., product error data) may be input to the model (i.e., the data detection model) for iterative training. The product error data is obtained by hard example mining on real-time production data and consists of the sample data that causes the loss value rate to exceed the preset probability threshold (e.g., sixty percent; the threshold is determined by the actual situation and is not limited herein).
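The loss value rate defined above is a simple ratio, and a small sketch may make the threshold check concrete. The function name, the 0/1 label encoding, and the sample batch are assumptions for illustration; the sixty percent threshold is the example value given in the text.

```python
from typing import Sequence

PRESET_PROBABILITY_THRESHOLD = 0.6  # the "sixty percent" example from the text


def loss_value_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Ratio of misclassified samples to the total number of samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Hypothetical batch: 1 = abnormal, 0 = normal.
predicted = [1, 0, 0, 1, 0]
actual = [1, 1, 1, 0, 1]
rate = loss_value_rate(predicted, actual)
needs_retraining = rate > PRESET_PROBABILITY_THRESHOLD
print(rate, needs_retraining)
```

Four of the five predictions are wrong, so the rate is 0.8, which exceeds the threshold; under the scheme described here, samples like these would be treated as product error data and fed back for retraining.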
Optionally, the step of inputting the historical sample data and the product error data into the data detection model to perform iterative training on the data detection model includes: performing anomaly detection on historical sample data by adopting a preset normal distribution algorithm to obtain a negative sample set and a positive sample set; and analyzing the sample data in the positive sample set by adopting a local abnormal factor algorithm to determine wrong data in the positive sample set and finish model training.
In the embodiment of the invention, the historical sample data can be mined in two passes to determine the product error data and complete model training. First, the 3-sigma algorithm (i.e., the preset normal distribution algorithm) performs anomaly detection on the historical sample data to obtain a negative sample set and a positive sample set. The historical sample data must follow a normal distribution; if it does not, a log calculation can be used to convert it into a normal distribution. Then the LOF algorithm (i.e., the local abnormal factor algorithm) examines the sample data in the positive sample set to determine the misclassified data and complete model training.
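The log conversion mentioned above can be illustrated with a short sketch. It assumes strictly positive data with a long right tail (for example, transaction time) and uses a hand-written skewness measure to show the effect; the helper names are hypothetical, and a log transform reduces skew without guaranteeing normality.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def log_transform(values: Sequence[float]) -> List[float]:
    """Natural-log transform; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]


def skewness(values: Sequence[float]) -> float:
    """Population skewness: mean cubed z-score (0 for a symmetric sample)."""
    mu, sigma = mean(values), pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)


# Hypothetical transaction times with a long right tail.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
transformed = log_transform(raw)
print(skewness(raw), skewness(transformed))
```

The raw sample is clearly right-skewed, while the transformed sample is evenly spaced and has skewness of approximately zero, which makes a symmetric mean-and-deviation band more reasonable.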
Optionally, the step of performing anomaly detection on the historical sample data by using a preset normal distribution algorithm to obtain a negative sample set and a positive sample set includes: taking each index to be detected as a reference, extracting sample data corresponding to each index to be detected in historical sample data; calculating the sample mean value of all the extracted sample data to obtain an index detection mean value; performing sample variance calculation on all the extracted sample data to obtain an index detection variance value; calculating a normal distribution area based on the index detection mean value and the index detection variance value; and classifying the historical sample data in the normal distribution area into a positive sample set, and classifying the historical sample data which does not fall in the normal distribution area into a negative sample set.
In the embodiment of the present invention, the 3-sigma algorithm (i.e., the preset normal distribution algorithm) is used to perform anomaly detection on the historical sample data. For each index to be detected (e.g., success rate or transaction time), the sample mean of its sample data is calculated to obtain the index detection mean μ, and the sample variance is calculated to obtain the index detection variance value, from which σ is taken (the 3-sigma rule uses σ as the standard deviation, i.e., the square root of the variance). Sample data outside the 3-sigma range (i.e., outside the normal distribution region (μ - 3σ, μ + 3σ)) forms the negative sample set, and sample data inside the 3-sigma range forms the positive sample set.
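A minimal sketch of the 3-sigma split for a single index follows. It assumes σ is the sample standard deviation and treats the region (μ - 3σ, μ + 3σ) as an open interval; the function name and the success-rate values are made up for the example.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def three_sigma_split(
    values: Sequence[float],
) -> Tuple[List[float], List[float], Tuple[float, float]]:
    """Split one index's samples into (positive set, negative set, region)."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed meaning of sigma)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    positive = [v for v in values if low < v < high]        # inside the region
    negative = [v for v in values if not low < v < high]    # outside the region
    return positive, negative, (low, high)


# Hypothetical success-rate samples: 29 ordinary values and one sharp drop.
success_rates = [0.99] * 15 + [0.98] * 14 + [0.50]
positive, negative, region = three_sigma_split(success_rates)
print(negative, region)
```

Note that the outlier itself inflates σ, so with only a handful of samples a single extreme value can never fall outside the 3-sigma region; the example uses 30 samples for that reason.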
Optionally, the step of analyzing the sample data in the positive sample set by using a local abnormal factor algorithm to determine the misclassified data in the positive sample set includes: combining the index data corresponding to each index to be detected and the corresponding data time in the historical sample data to obtain a plurality of time sequence data; determining surrounding neighborhood points in the target time sequence data by taking each time sequence data as a center to obtain a neighborhood set; calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time sequence data; calculating the density mean value of the local reachable density of all other neighborhood points in the neighborhood set, and calculating the density ratio between the density mean value and the local reachable density of the target time sequence data; and if the density ratio is larger than a preset ratio threshold, determining that the target time sequence data is wrong data in the positive sample set.
In the embodiment of the present invention, an LOF algorithm may be used to detect the sample data in the positive sample set. The LOF algorithm requires two-dimensional data (i.e., time series data), so the index data corresponding to each index to be detected (e.g., success rate) is combined with the corresponding data time (e.g., the success rate or transaction time consumption is combined with the time point at which it occurs) to obtain time series data. Taking each time series data as a center, the neighborhood points (e.g., a point O inside the neighborhood) around the target time series data (denoted as point P) are determined to obtain a neighborhood set, denoted N_k(P), which represents the k-th distance neighborhood, i.e., all points whose distance to point P is less than or equal to the k-th distance. Then the reachable distances of the other neighborhood points in the neighborhood set are calculated (e.g., the k-th reachable distance from point O to point P is the larger of the k-th distance of point O and the actual distance between point O and point P), together with the local reachable density of the target time series data (i.e., the reciprocal of the reachable distance from point P to the other neighborhood points). Finally, the density ratio between the density mean value of the local reachable densities of all other neighborhood points in the neighborhood set and the local reachable density of the target time series data is calculated by the following formula, i.e., the mean of the local reachable densities of the points in the neighborhood divided by the local reachable density of point P.
$$\mathrm{LOF}_k(P) = \frac{\sum_{O \in N_k(P)} \frac{\rho_k(O)}{\rho_k(P)}}{|N_k(P)|}$$
where ρ_k(O) represents the local reachable density of point O, ρ_k(P) represents the local reachable density of point P, |N_k(P)| represents the number of points in the neighborhood, and LOF_k(P) represents the density ratio.
If the density ratio is greater than a preset ratio threshold (for example, 1), the target time series data is determined to be misclassified data in the positive sample set. If the density ratio is close to the preset ratio threshold, the density of point P is almost the same as that of its neighborhood points, and point P may belong to the same cluster as its neighborhood. If the density ratio is less than the preset ratio threshold, the density of point P is higher than that of its neighborhood points, which indicates that point P is a dense point.
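The density ratio described above can be sketched in plain Python as follows. This is a brute-force illustration under stated assumptions, not the embodiment's code: the name `lof_scores` is hypothetical, the local reachable density is taken as the reciprocal of the mean reachable distance (the usual LOF definition), and the input is assumed to contain no duplicated points that would make a reachable distance zero.

```python
import math


def lof_scores(points, k):
    """Return the LOF density ratio of every 2-D point (O(n^2) brute force)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-th distance and k-th distance neighborhood (ties included) per point.
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])

    # Local reachable density: reciprocal of the mean reachable distance,
    # where the reachable distance is max(k-th distance of O, d(P, O)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # Density ratio: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical (time, value) pairs: a tight cluster and one isolated point.
series = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
ratios = lof_scores(series, k=2)
flagged = [p for p, r in zip(series, ratios) if r > 1.0]
```

In this made-up input the four clustered points score a ratio of about 1, while the isolated point scores well above 1 and is the only one flagged. On real data, points inside a cluster often score slightly above 1, so the preset ratio threshold may need tuning.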
Optionally, the step of determining, with each time series data as a center, the neighborhood points around the target time series data to obtain a neighborhood set includes: acquiring the vector distance values between the target time series data and the other time series data; sorting all vector distance values to obtain a sorting result; and classifying, based on the sorting result, the time series data whose vector distance value is less than or equal to a preset distance threshold into the neighborhood set.
In the embodiment of the present invention, the vector distance values between the target time series data and the other time series data may be calculated and sorted to obtain a sorting result. For example, to calculate the k-th distance of the target time series data (e.g., point P), the distances between point P and the other points (e.g., other time series data) are sorted from small to large, and the k-th value is the k-th distance. The time series data whose vector distance value is less than or equal to a preset distance threshold (when a k-th distance neighborhood is calculated, the preset distance threshold is the k-th distance) are then classified into the neighborhood set. For example, the k-th distance neighborhood, i.e., the set of all points whose distance to point P is less than or equal to the k-th distance, is denoted N_k(P).
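A minimal sketch of the sorting and thresholding step follows, assuming Euclidean distance between (time, value) pairs; the function name `k_distance_neighborhood` and the sample points are hypothetical.

```python
import math


def k_distance_neighborhood(target, others, k):
    """Return (k-th distance, neighborhood set) of `target` among `others`."""
    # Sort the other points by their distance to the target, small to large.
    ranked = sorted(others, key=lambda q: math.dist(target, q))
    k_distance = math.dist(target, ranked[k - 1])
    # Keep every point whose distance does not exceed the k-th distance.
    neighborhood = [q for q in ranked if math.dist(target, q) <= k_distance]
    return k_distance, neighborhood


kd, nk = k_distance_neighborhood((0, 0), [(1, 0), (0, 2), (2, 0), (5, 5)], k=2)
```

Because the comparison is "less than or equal to", points tied at the k-th distance are all kept, so the neighborhood set can hold more than k points; here two points share the distance 2.0 and the set has three members.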
Optionally, the step of calculating the reachable distance of the other neighborhood points in the neighborhood set and the local reachable density of the target time series data includes: if the actual distance between another neighborhood point in the neighborhood set and the target time series data is less than or equal to the k-th distance, determining the reachable distance between that neighborhood point and the target time series data as the preset distance threshold; if the actual distance between another neighborhood point in the neighborhood set and the target time series data is greater than the k-th distance, determining the reachable distance between that neighborhood point and the target time series data as the actual distance; and calculating the reciprocal of the reachable distance between the target time series data and the other neighborhood points in the neighborhood set to obtain the local reachable density of the target time series data.
In an embodiment of the present invention, the reachable distance may be calculated by the following formula, where point P represents the target time series data, point O represents another neighborhood point in the neighborhood set, d_k(O) represents the k-th distance of point O, d(P, O) represents the actual distance between point O and point P, and d_k(P, O) is the reachable distance.
$$d_k(P, O) = \max\{d_k(O),\ d(P, O)\}$$
That is, if the actual distance between another neighborhood point in the neighborhood set and point P (the target time series data) is less than or equal to the k-th distance, the reachable distance is the k-th distance (i.e., the preset distance threshold); otherwise, the reachable distance is the actual distance between that neighborhood point and the target time series data.
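As a worked illustration of this rule, with made-up distances that are not taken from the embodiment, consider two neighborhood points O_1 and O_2 of point P, each with a k-th distance of 2.0:

$$d_k(P, O_1) = \max\{d_k(O_1),\ d(P, O_1)\} = \max\{2.0,\ 1.5\} = 2.0$$
$$d_k(P, O_2) = \max\{d_k(O_2),\ d(P, O_2)\} = \max\{2.0,\ 3.1\} = 3.1$$

The actual distance from O_1 to P does not exceed the k-th distance, so the reachable distance is raised to the k-th distance, while O_2 lies farther away and keeps its actual distance.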
Then, the local reachable density can be calculated by the following formula, where point O represents a point in the neighborhood and point P represents the target time series data; the local reachable density of point P is the reciprocal of the reachable distance between point P and the other neighborhood points.
$$\rho_k(P) = \frac{1}{\frac{1}{|N_k(P)|}\sum_{O \in N_k(P)} d_k(P, O)}$$
After the model has been trained and iteratively updated, it can be used directly to detect abnormal data. By performing hard-example mining on real-time data and iterating the model, production anomalies can be found more accurately and in real time.
Step S102, product operation data is received.
In an embodiment of the present invention, the product operation data is real-time production data in production, for example, data about funds, data about bonds, and the like.
Step S104, the product operation data is sent to a target park, wherein an anomaly detection system is connected to the target park and a data detection model runs in the anomaly detection system; during model training, the data detection model mines the sample data that causes the loss value rate to be greater than a preset probability threshold and is retrained on that sample data, and the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data.
In the embodiment of the invention, after historical sample data from the production stock is used for basic model training to obtain an initial model (i.e., the data detection model), the data detection model can be put into the anomaly detection system built in production. During model training, the data detection model mines the sample data that causes the loss value rate to be greater than a preset probability threshold (for example, sixty percent) and is retrained on that sample data, where the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data. The anomaly detection system may be divided into different parks (e.g., parks A and B), and different parks may perform different functions; the system can be set up as dual-active across two parks to relieve the system pressure on a single park. The product operation data can therefore be sent to the corresponding target park according to its type.
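One possible reading of the loss value rate check is sketched below. The text does not say over which group of samples the rate is computed, so this sketch assumes it is computed per batch and that the misclassified samples of any batch exceeding the threshold are kept for retraining; `mine_hard_samples`, `predict`, and the sample data are all hypothetical.

```python
def mine_hard_samples(batches, predict, threshold=0.6):
    """Collect misclassified samples from batches whose loss value rate
    (misclassified count / batch size) exceeds `threshold`.

    Assumption: the rate is evaluated per batch of (features, label) pairs.
    """
    hard = []
    for batch in batches:
        if not batch:
            continue  # an empty batch has no defined rate
        wrong = [(x, y) for x, y in batch if predict(x) != y]
        loss_value_rate = len(wrong) / len(batch)
        if loss_value_rate > threshold:
            hard.extend(wrong)
    return hard


# Hypothetical classifier: flags a value as abnormal when it exceeds 0.5.
def predict(x):
    return x > 0.5


batches = [
    [(0.9, True), (0.8, True), (0.1, False)],                  # rate 0.0
    [(0.9, False), (0.8, False), (0.7, False), (0.1, False)],  # rate 0.75
]
hard_samples = mine_hard_samples(batches, predict)
```

Only the second batch exceeds the sixty percent threshold, so its three misclassified samples are the ones returned for retraining.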
In the embodiment of the invention, each park can correspond to one anomaly detection system, and the parks can run in parallel, i.e., two parks run in dual-active mode and three parks run in triple-active mode. The advantages of running multiple parks in parallel are as follows: the system pressure is reduced and the data processing speed is improved, and if one of the parks fails, the multi-active (e.g., dual-active) function can switch to another anomaly detection system (i.e., the anomaly detection system corresponding to another park is selected) so that data processing continues.
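The routing and failover behavior described above can be sketched as follows. The park names, the data-type-to-park table, and the function `pick_park` are illustrative assumptions; the embodiment does not specify how park health is tracked.

```python
def pick_park(data_type, routing, healthy_parks):
    """Return the park that should process `data_type`.

    The preferred park comes from the routing table; if it is down,
    fall back to another healthy park (multi-active switch-over).
    """
    preferred = routing[data_type]
    if preferred in healthy_parks:
        return preferred
    fallback = sorted(healthy_parks)  # deterministic choice among survivors
    if not fallback:
        raise RuntimeError("no healthy park available")
    return fallback[0]


routing = {"fund": "A", "bond": "B"}              # hypothetical type-to-park table
normal = pick_park("fund", routing, {"A", "B"})   # park A is up
failover = pick_park("fund", routing, {"B"})      # park A is down
```

When both parks are healthy the fund data goes to park A; when park A is removed from the healthy set, the same call falls back to park B.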
Step S106, the product operation data is analyzed by the anomaly detection system to obtain a detection result.
In the embodiment of the invention, the product operation data can be detected by the anomaly detection system, whether the data is abnormal is preliminarily determined according to the detection result, and the data is stored in the sample library.
Step S108, the abnormal data in the detection result is sent to an alarm system.
Optionally, after sending the abnormal data in the detection result to the alarm system, the detection method further includes: receiving a data confirmation result fed back by the alarm system; if the data confirmation result indicates that the detection result is real, classifying the abnormal data into a positive sample library; and if the data confirmation result indicates that the detection result is false, classifying the abnormal data into a negative sample library.
In the embodiment of the invention, the anomaly alarm system detects in real time whether a new suspected anomaly exists, displays the anomaly, and notifies the relevant personnel. After receiving the notification, the relevant personnel confirm, according to the actual situation, whether the suspected anomaly is a real anomaly and record the confirmation in the anomaly alarm system (i.e., the data confirmation result fed back by the alarm system is received). After the confirmation, data that is truly abnormal enters the positive sample library (i.e., if the data confirmation result indicates that the detection result is real, the abnormal data is classified into the positive sample library), and data that is not abnormal enters the negative sample library (i.e., if the data confirmation result indicates that the detection result is false, the abnormal data is classified into the negative sample library). The alarm notification system captures the content of the positive sample library and sends the real alarm data to the relevant personnel quickly and in real time.
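The feedback loop above can be sketched with a small in-memory structure. The class name `SampleLibraries`, its `record` method, and the record fields are hypothetical; a production system would presumably persist the libraries rather than hold them in lists.

```python
from dataclasses import dataclass, field


@dataclass
class SampleLibraries:
    positive: list = field(default_factory=list)  # confirmed real anomalies
    negative: list = field(default_factory=list)  # suspected anomalies found false

    def record(self, abnormal_data, confirmed_real):
        """File one alarm according to the data confirmation result."""
        target = self.positive if confirmed_real else self.negative
        target.append(abnormal_data)


libs = SampleLibraries()
libs.record({"index": "success_rate", "value": 0.20}, confirmed_real=True)
libs.record({"index": "success_rate", "value": 0.97}, confirmed_real=False)
real_alarms = libs.positive  # what the alarm notification system would send
```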
The embodiment of the invention provides a method for improving the accuracy of an artificial intelligence alarm system. The method not only solves the problem that detection accuracy decreases when an artificial intelligence model is updated and trained in production using traditional methods, but also finds production anomalies more accurately and in real time by performing hard-example mining on real-time data and iterating the model. It is suitable for improving the accuracy of alarm systems based on artificial intelligence models in production, is compatible with various artificial intelligence algorithms, and has good expandability.
Example two
FIG. 2 is a schematic diagram of an alternative method for improving the accuracy of an artificial intelligence alarm system according to an embodiment of the present invention, including the following steps:
Step 1: confirm the trained model. Historical sample data from the production stock is used for basic model training to obtain an initial model, and the model is put into the anomaly detection system built in production. The anomaly detection system can be divided into different parks (for example, parks A and B), and different parks can perform different functions; that is, the system can be set up as dual-active across two parks to relieve the system pressure on a single park.
Step 2: production data is accessed to the corresponding park in real time, the initial model detects the data through the anomaly detection system, whether the data is abnormal is preliminarily determined according to the result output by the model, and the data is stored in a sample library.
Step 3: alarm detection. The anomaly alarm system detects in real time whether a new suspected anomaly exists, displays the anomaly, and notifies development and operation and maintenance personnel to confirm whether the anomaly is real.
Step 4: after receiving the notification, the development and operation and maintenance personnel confirm, according to the actual situation, whether the suspected anomaly is a real anomaly and record the confirmation in the anomaly alarm system. After the confirmation, data that is truly abnormal enters the positive sample library, and data that is not abnormal enters the negative sample library.
The alarm notification system captures the content of the positive sample library and sends the real alarm data to the relevant personnel quickly and in real time.
Step 5: model update iteration. Historical sample data (i.e., obtained from the sample library) and the hard samples mined from production real-time data by hard-example mining are input into the model together for iterative training, so as to update the model.
Fig. 3 is a flowchart of optional hard-example mining according to an embodiment of the present invention. As shown in Fig. 3, in order to improve the accuracy of hard-example mining, the embodiment of the present invention may perform mining with two algorithms. First, 3-sigma anomaly detection can be performed on the real-time data (the real-time data is required to follow a normal distribution; if it does not, a logarithmic transformation can be used to convert it to a normal distribution). A sample mean calculation is performed on the sample data of the index to be detected (such as success rate or transaction time consumption) to obtain the index detection mean value μ, and a sample variance calculation is then performed to obtain the index detection variance value σ. Sample data that is not within the 3-sigma range (μ-3σ, μ+3σ) is classified into the hard negative sample set, and sample data that is within the 3-sigma range is classified into the positive sample set.
If the 3-sigma detection assigns the real-time data to the hard negative sample set (i.e., the real-time data is abnormal), the real-time data is stored in the hard sample library. If the 3-sigma detection assigns the real-time data to the positive sample set (i.e., the real-time data is normal), detection continues with the LOF algorithm (the LOF algorithm requires two-dimensional data, i.e., an index value such as success rate or transaction time consumption can be combined with the time point at which it occurs to form two-dimensional time series data).
First, the k-th distance is calculated: the distances between point P (i.e., the selected target real-time data point) and the other points are sorted from small to large, and the k-th value is the k-th distance. Second, the k-th distance neighborhood set is calculated: all points whose distance to point P is less than or equal to the k-th distance, k points in total, are denoted N_k(P). Third, the reachable distance is calculated: if the distance to point P is less than or equal to the k-th distance, the reachable distance is the k-th distance; otherwise it is the actual distance, as shown in the following formula:
$$d_k(P, O) = \max\{d_k(O),\ d(P, O)\}$$
Fourth, the local reachable density is calculated (where point O is a point in the neighborhood) as the reciprocal of the reachable distance between point P and the other neighborhood points:
$$\rho_k(P) = \frac{1}{\frac{1}{|N_k(P)|}\sum_{O \in N_k(P)} d_k(P, O)}$$
Fifth, the local outlier factor is calculated by the following formula, i.e., the mean of the local reachable densities of the points in the neighborhood divided by the local reachable density of point P:
$$\mathrm{LOF}_k(P) = \frac{\sum_{O \in N_k(P)} \frac{\rho_k(O)}{\rho_k(P)}}{|N_k(P)|}$$
LOF_k(P) represents the ratio of the mean local reachable density of the other points in the neighborhood N_k(P) of P to the local reachable density of P. If the ratio is close to 1, the density of P is almost the same as that of its neighborhood points, and P may belong to the same cluster as its neighborhood. If the ratio is less than 1, the density of P is higher than that of its neighborhood points, and P is a dense point. If the ratio is greater than 1, the density of P is lower than that of its neighborhood points, and P is an abnormal point; that is, real-time data whose ratio is greater than 1 enters the hard negative sample library.
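The two-stage flow of Fig. 3 can be sketched as a single routing function. This is an illustration under stated assumptions: `mine_hard_negatives` is a hypothetical name, σ is taken as the sample standard deviation, and the LOF stage is passed in as a callable (`lof_scores`, returning one density ratio per point) so that any LOF implementation can be plugged in.

```python
import math
from statistics import mean, stdev


def mine_hard_negatives(series, lof_scores, k=2, ratio_threshold=1.0,
                        log_transform=False):
    """Return the (time, value) points that enter the hard negative library.

    Stage 1: 3-sigma on the values (optionally log-transformed first).
    Stage 2: points that pass stage 1 are scored by `lof_scores(points, k)`,
             and those with a ratio above `ratio_threshold` are also kept.
    """
    values = [math.log(v) if log_transform else v for _, v in series]
    mu, sigma = mean(values), stdev(values)
    low, high = mu - 3 * sigma, mu + 3 * sigma

    hard, survivors = [], []
    for point, v in zip(series, values):
        (survivors if low < v < high else hard).append(point)

    ratios = lof_scores(survivors, k)
    hard.extend(p for p, r in zip(survivors, ratios) if r > ratio_threshold)
    return hard


# Stand-in scorer for demonstration only: marks the value 50.0 as sparse.
def stub_lof(points, k):
    return [2.0 if v == 50.0 else 1.0 for _, v in points]


series = [(t, 100.0) for t in range(29)] + [(29, 50.0), (30, 0.0)]
hard_library = mine_hard_negatives(series, stub_lof)
```

In this made-up series the reading 0.0 is caught by the 3-sigma stage, while the reading 50.0 passes 3-sigma and is caught only by the second stage, which is the case the second algorithm is meant to cover.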
The embodiment of the invention has the following beneficial effects:
(1) by performing hard-example mining on real-time data and iterating the model, production anomalies can be found more accurately and in real time;
(2) the artificial intelligence model trained by the present application is suitable for the production alarm systems of various products, is compatible with various artificial intelligence algorithms, and has good expandability.
Example three
The data anomaly detection device provided in the present embodiment includes a plurality of implementation units, each implementation unit corresponding to each implementation step in the first embodiment.
Fig. 4 is a schematic diagram of an alternative data anomaly detection apparatus according to an embodiment of the present invention, as shown in fig. 4, the detection apparatus may include: a receiving unit 40, a first transmitting unit 42, an analyzing unit 44, a second transmitting unit 46, wherein,
a receiving unit 40 for receiving product operation data;
the first sending unit 42 is configured to send the product operation data to a target park, wherein an anomaly detection system is connected to the target park, a data detection model runs in the anomaly detection system, the data detection model mines, during model training, the sample data that causes the loss value rate to be greater than a preset probability threshold and is retrained on that sample data, and the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data;
the analysis unit 44 is used for analyzing the product operation data by adopting an anomaly detection system to obtain a detection result;
and a second sending unit 46, configured to send the abnormal data in the detection result to the alarm system.
The detection device can receive product operation data through the receiving unit 40 and send the product operation data to the target park through the first sending unit 42, wherein an anomaly detection system is connected to the target park, a data detection model runs in the anomaly detection system, the data detection model mines, during model training, the sample data that causes the loss value rate to be greater than a preset probability threshold and is retrained on that sample data, and the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data. The analysis unit 44 analyzes the product operation data by using the anomaly detection system to obtain a detection result, and the second sending unit 46 sends the abnormal data in the detection result to the alarm system. In the embodiment of the invention, performing hard-example mining on the real-time product operation data and retraining on the mined data can improve the detection rate of the anomaly detection system, so that abnormal data can be found more accurately and in real time, which solves the technical problem in the related art that the detection accuracy of the anomaly detection system is low.
Optionally, the detection device further includes: the first acquisition module, used for acquiring, before the product operation data is received, product operation data in a first preset time period in a historical process to obtain historical sample data; the first receiving module, used for receiving product error data of the product operation data input by an external terminal device, wherein the product error data indicates sample data that causes the loss value rate to be greater than the preset probability threshold in the model training process; and the first training module, used for inputting the historical sample data and the product error data into the data detection model so as to carry out iterative training on the data detection model.
Optionally, the first training module includes: the first detection submodule is used for carrying out anomaly detection on historical sample data by adopting a preset normal distribution algorithm to obtain a negative sample set and a positive sample set; and the first analysis submodule is used for analyzing the sample data in the positive sample set by adopting a local outlier factor algorithm so as to determine misclassified data in the positive sample set and finish model training.
Optionally, the first detection submodule includes: the first extraction submodule is used for extracting sample data corresponding to each index to be detected from historical sample data by taking each index to be detected as a reference; the first calculation submodule is used for calculating the sample mean value of all the extracted sample data to obtain an index detection mean value; the second calculation submodule is used for performing sample variance calculation on all the extracted sample data to obtain an index detection variance value; the third calculation submodule is used for calculating a normal distribution area based on the index detection mean value and the index detection variance value; and the first classification submodule is used for classifying the historical sample data in the normal distribution area into a positive sample set and classifying the historical sample data which does not fall in the normal distribution area into a negative sample set.
Optionally, the first analysis submodule includes: the first combination submodule is used for combining the index data corresponding to each index to be detected and the corresponding data time in the historical sample data to obtain a plurality of time series data; the first determining submodule is used for determining, with each time series data as a center, the neighborhood points around the target time series data to obtain a neighborhood set; the fourth calculation submodule is used for calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time series data; the fifth calculation submodule is used for calculating the density mean value of the local reachable densities of all other neighborhood points in the neighborhood set and calculating the density ratio between the density mean value and the local reachable density of the target time series data; and the second determining submodule is used for determining that the target time series data is misclassified data in the positive sample set if the density ratio is larger than the preset ratio threshold.
Optionally, the first determining submodule includes: the first acquisition submodule is used for acquiring the vector distance values between the target time series data and other time series data; the first sorting submodule is used for sorting all the vector distance values to obtain a sorting result; and the second classification submodule is used for classifying the time series data whose vector distance value is less than or equal to a preset distance threshold into a neighborhood set based on the sorting result.
Optionally, the fourth calculation submodule includes: the third determining submodule is used for determining the reachable distance between another neighborhood point and the target time series data as the preset distance threshold if the actual distance between that neighborhood point in the neighborhood set and the target time series data is less than or equal to the k-th distance; the fourth determining submodule is used for determining the reachable distance between another neighborhood point and the target time series data as the actual distance if the actual distance between that neighborhood point in the neighborhood set and the target time series data is greater than the k-th distance; and the sixth calculation submodule is used for calculating the reciprocal of the reachable distance between the target time series data and other neighborhood points in the neighborhood set to obtain the local reachable density of the target time series data.
Optionally, the detection device further includes: the second receiving module, used for receiving a data confirmation result fed back by the alarm system after the abnormal data in the detection result is sent to the alarm system; the first classification module, used for classifying the abnormal data into a positive sample library if the data confirmation result indicates that the detection result is real; and the second classification module, used for classifying the abnormal data into the negative sample library if the data confirmation result indicates that the detection result is false.
The above-mentioned detection device may further include a processor and a memory, the above-mentioned receiving unit 40, the first transmitting unit 42, the analyzing unit 44, the second transmitting unit 46, etc. are all stored in the memory as program units, and the processor executes the above-mentioned program units stored in the memory to implement the corresponding functions.
The processor comprises a kernel, and the kernel calls a corresponding program unit from the memory. The kernel can be set to be one or more, and abnormal data in the detection result is sent to the alarm system by adjusting the kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: receiving product operation data; sending the product operation data to a target park, wherein an anomaly detection system is connected to the target park, a data detection model runs in the anomaly detection system, the data detection model mines, during model training, the sample data that causes the loss value rate to be greater than a preset probability threshold and is retrained on that sample data, and the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data; analyzing the product operation data by using the anomaly detection system to obtain a detection result; and sending abnormal data in the detection result to an alarm system.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions for the processor; wherein the processor is configured to perform any of the data anomaly detection methods described above via execution of executable instructions.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the above data anomaly detection methods.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (11)

1. A data anomaly detection method is characterized by comprising the following steps:
receiving product operation data;
sending the product operation data to a target park, wherein an anomaly detection system is connected to the target park, a data detection model runs in the anomaly detection system, the data detection model mines, in a model training process, sample data that causes a loss value rate to be greater than a preset probability threshold and is retrained on the sample data, and the loss value rate is used for indicating a ratio of the amount of sample data misclassified by the model to the total amount of data;
analyzing the product operation data by adopting the anomaly detection system to obtain a detection result;
and sending abnormal data in the detection result to an alarm system.
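For illustration only, the loss value rate defined in claim 1 can be sketched in Python as the share of misclassified samples, with the misclassified samples collected for retraining once that share exceeds the threshold. The sample layout, the function names, and the toy classifier are assumptions made for this sketch and do not come from the claims.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample layout: (feature vector, expected label).
Sample = Tuple[Sequence[float], int]


def loss_value_rate(
    predict: Callable[[Sequence[float]], int], samples: List[Sample]
) -> Tuple[float, List[Sample]]:
    """Return (misclassified count / total count, misclassified samples)."""
    wrong = [s for s in samples if predict(s[0]) != s[1]]
    rate = len(wrong) / len(samples) if samples else 0.0
    return rate, wrong


def mine_retraining_samples(
    predict: Callable[[Sequence[float]], int],
    samples: List[Sample],
    probability_threshold: float,
) -> List[Sample]:
    """Return the misclassified samples only if the rate exceeds the threshold."""
    rate, wrong = loss_value_rate(predict, samples)
    return wrong if rate > probability_threshold else []


def toy_predict(features: Sequence[float]) -> int:
    """Toy classifier (assumption): label 1 when the first feature exceeds 0.5."""
    return int(features[0] > 0.5)


toy_samples = [([0.9], 1), ([0.1], 0), ([0.7], 0), ([0.2], 1)]
hard_samples = mine_retraining_samples(toy_predict, toy_samples, 0.3)
```

Here two of the four toy samples are misclassified, so the rate is 0.5 and both are returned for retraining when the threshold is 0.3.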
2. The detection method according to claim 1, wherein before receiving the product operation data, the detection method further comprises:
obtaining product operation data in a first preset time period in a historical process to obtain historical sample data;
receiving product error data, for the product operation data, input by an external terminal device, wherein the product error data indicates sample data that causes the loss value rate to be greater than the preset probability threshold during model training;
and inputting the historical sample data and the product error data into the data detection model so as to carry out iterative training on the data detection model.
3. The detection method according to claim 2, wherein the step of inputting the historical sample data and the product error data into the data detection model for iterative training of the data detection model comprises:
performing anomaly detection on the historical sample data by adopting a preset normal distribution algorithm to obtain a negative sample set and a positive sample set;
and analyzing the sample data in the positive sample set by adopting a local outlier factor (LOF) algorithm to determine wrong data in the positive sample set, thereby completing model training.
4. The detection method according to claim 3, wherein the step of performing anomaly detection on the historical sample data by using a preset normal distribution algorithm to obtain a negative sample set and a positive sample set comprises:
taking each index to be detected as a reference, extracting sample data corresponding to each index to be detected in the historical sample data;
calculating the sample mean value of all the extracted sample data to obtain an index detection mean value;
performing sample variance calculation on all the extracted sample data to obtain an index detection variance value;
calculating a normal distribution area based on the index detection mean value and the index detection variance value;
and classifying the historical sample data in the normal distribution area into a positive sample set, and classifying the historical sample data which does not fall in the normal distribution area into the negative sample set.
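The steps of claim 4 can be sketched, for illustration only, for the values of a single index to be detected. The claim does not state how wide the normal distribution area is, so the bound of mean plus or minus `n_sigma` standard deviations, the default of 3, and the function name are assumptions of this sketch.

```python
import statistics
from typing import List, Sequence, Tuple


def split_by_normal_region(
    values: Sequence[float], n_sigma: float = 3.0
) -> Tuple[List[float], List[float]]:
    """Split one index's samples into (positive set, negative set).

    The normal distribution area is assumed to be
    [mean - n_sigma * std, mean + n_sigma * std].
    """
    mean = statistics.mean(values)          # index detection mean value
    variance = statistics.variance(values)  # sample variance (n - 1 denominator)
    std = variance ** 0.5
    low, high = mean - n_sigma * std, mean + n_sigma * std
    positive = [v for v in values if low <= v <= high]
    negative = [v for v in values if not (low <= v <= high)]
    return positive, negative


readings = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0, 10.0, 50.0]
positive_set, negative_set = split_by_normal_region(readings, n_sigma=2.0)
```

With these toy readings the mean is 14 and the sample standard deviation is about 12.7, so at two standard deviations only the value 50.0 falls outside the area and is placed in the negative sample set. `statistics.variance` needs at least two values.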
5. The detection method according to claim 3, wherein the step of analyzing the sample data in the positive sample set by using a local outlier factor (LOF) algorithm to determine the wrong data in the positive sample set comprises:
combining the index data corresponding to each index to be detected and the corresponding data time in the historical sample data to obtain a plurality of time sequence data;
taking each time sequence data in turn as target time sequence data, and determining the neighborhood points surrounding the target time sequence data to obtain a neighborhood set;
calculating the reachable distance of other neighborhood points in the neighborhood set and the local reachable density of the target time sequence data;
calculating the density mean value of the local reachable densities of all other neighborhood points in the neighborhood set, and calculating the density ratio between the density mean value and the local reachable density of the target time sequence data;
and if the density ratio is larger than a preset ratio threshold, determining that the target time sequence data is wrong data in the positive sample set.
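The final comparison in claim 5 can be sketched, for illustration only, as follows. The local reachable densities are taken as already computed, and the function names and the default ratio threshold of 1.5 are assumptions of this sketch.

```python
from typing import Sequence


def density_ratio(target_density: float, neighbor_densities: Sequence[float]) -> float:
    """Mean local reachable density of the neighbors divided by the target's."""
    density_mean = sum(neighbor_densities) / len(neighbor_densities)
    return density_mean / target_density


def is_wrong_data(
    target_density: float,
    neighbor_densities: Sequence[float],
    ratio_threshold: float = 1.5,  # assumed value; the claim only says "preset"
) -> bool:
    """Flag the target when its neighbors are much denser than it is."""
    return density_ratio(target_density, neighbor_densities) > ratio_threshold


sparse_point_flagged = is_wrong_data(0.5, [2.0, 2.0])
typical_point_flagged = is_wrong_data(2.0, [2.0, 2.1])
```

A ratio close to 1 means the target is about as dense as its neighbors, while a large ratio means it sits in a much sparser region, which is the usual reading of a local outlier factor.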
6. The detection method according to claim 5, wherein the step of determining a neighborhood point around the target time series data by centering on each time series data to obtain a neighborhood set comprises:
acquiring a vector distance value between the target time sequence data and other time sequence data;
sequencing all vector distance values to obtain a sequencing result;
and classifying the time sequence data with the vector distance value smaller than or equal to a preset distance threshold value into the neighborhood set based on the sorting result.
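For illustration only, the neighborhood construction of claim 6 can be sketched with Euclidean distance standing in for the vector distance value. The choice of Euclidean distance, the tuple representation of a point, and the function name are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # hypothetical encoding, e.g. (time, index value)


def neighborhood_set(
    target: Point, others: Sequence[Point], distance_threshold: float
) -> List[Point]:
    """Return the points within distance_threshold of target, nearest first."""
    # Vector distance between the target and every other point.
    distances = [(math.dist(target, p), p) for p in others]
    # Sort by distance only, so ties never compare the points themselves.
    distances.sort(key=lambda pair: pair[0])
    return [p for d, p in distances if d <= distance_threshold]


neighbors = neighborhood_set(
    (0.0, 0.0),
    [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0), (6.0, 8.0)],
    distance_threshold=5.0,
)
```

In the toy call the distances are 5, 1, 2, and 10, so the three points at distance 5 or less are kept and returned in ascending order of distance.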
7. The detection method according to claim 5, wherein the step of calculating the reachable distances of other neighborhood points in the neighborhood set and the local reachable density of the target time series data comprises:
if the actual distance between other neighborhood points in the neighborhood set and the target time sequence data is less than or equal to the k-distance, determining the reachable distance between the other neighborhood points and the target time sequence data to be a preset distance threshold;
if the actual distance between other neighborhood points in the neighborhood set and the target time sequence data is greater than the k-distance, determining the reachable distance between the other neighborhood points and the target time sequence data as the actual distance;
and calculating the reciprocal of the reachable distance between the target time sequence data and other neighborhood points in the neighborhood set to obtain the local reachable density of the target time sequence data.
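The two branches and the reciprocal in claim 7 can be sketched, for illustration only, as below. Two assumptions are made: the preset distance threshold is taken to equal the k-distance, and the reachable distances to the several neighborhood points are averaged before taking the reciprocal, as in the standard local outlier factor formulation. The claim itself does not state how the distances are combined.

```python
from typing import Sequence


def reachable_distance(actual_distance: float, k_distance: float) -> float:
    """Use the k-distance for close neighbors, otherwise the actual distance."""
    if actual_distance <= k_distance:
        return k_distance
    return actual_distance


def local_reachable_density(
    actual_distances: Sequence[float], k_distance: float
) -> float:
    """Reciprocal of the mean reachable distance to the neighborhood points."""
    reach = [reachable_distance(d, k_distance) for d in actual_distances]
    return len(reach) / sum(reach)


density = local_reachable_density([1.0, 3.0], k_distance=2.0)
```

The two branches are equivalent to `max(k_distance, actual_distance)`. In the toy call the reachable distances are 2.0 and 3.0, so the density is 2 / 5 = 0.4. The sketch expects a positive `k_distance` and at least one neighbor.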
8. The detection method according to claim 1, wherein after sending the abnormal data in the detection result to an alarm system, the detection method further comprises:
receiving a data confirmation result fed back by the alarm system;
if the data confirmation result indicates that the detection result is real, classifying the abnormal data into a positive sample library;
and if the data confirmation result indicates that the detection result is false, classifying the abnormal data into a negative sample library.
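For illustration only, the feedback routing of claim 8 can be sketched with plain lists standing in for the two sample libraries. The record layout, the boolean flag, and the function name are assumptions of this sketch.

```python
from typing import Dict, List

Record = Dict[str, float]  # hypothetical record layout


def route_confirmed_anomaly(
    record: Record,
    detection_is_real: bool,
    positive_library: List[Record],
    negative_library: List[Record],
) -> None:
    """Store the record according to the confirmation fed back by the alarm system."""
    if detection_is_real:
        positive_library.append(record)  # detection result confirmed as real
    else:
        negative_library.append(record)  # detection result confirmed as false


positive_library: List[Record] = []
negative_library: List[Record] = []
route_confirmed_anomaly({"value": 50.0}, True, positive_library, negative_library)
route_confirmed_anomaly({"value": 12.0}, False, positive_library, negative_library)
```

Both libraries can then feed the next round of iterative training.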
9. A data abnormality detection apparatus, characterized by comprising:
the receiving unit is used for receiving product operation data;
the first sending unit is used for sending the product operation data to a target park, wherein the target park is connected to an anomaly detection system, a data detection model runs in the anomaly detection system, the data detection model mines, during model training, sample data that causes a loss value rate to be greater than a preset probability threshold and is retrained on that sample data, and the loss value rate indicates the ratio of the amount of sample data misclassified by the model to the total amount of data;
the analysis unit is used for analyzing the product operation data by adopting the anomaly detection system to obtain a detection result;
and the second sending unit is used for sending the abnormal data in the detection result to the alarm system.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data anomaly detection method of any one of claims 1 to 8 via execution of the executable instructions.
11. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program runs, the computer-readable storage medium controls a device to execute the data anomaly detection method according to any one of claims 1 to 8.
CN202111321740.1A 2021-11-09 2021-11-09 Data anomaly detection method and device and electronic equipment Pending CN114020811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111321740.1A CN114020811A (en) 2021-11-09 2021-11-09 Data anomaly detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111321740.1A CN114020811A (en) 2021-11-09 2021-11-09 Data anomaly detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114020811A true CN114020811A (en) 2022-02-08

Family

ID=80062694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111321740.1A Pending CN114020811A (en) 2021-11-09 2021-11-09 Data anomaly detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114020811A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115941807A (en) * 2022-12-22 2023-04-07 陕西通信规划设计研究院有限公司 Efficient data compression method for park security system
CN115941807B (en) * 2022-12-22 2024-02-23 陕西通信规划设计研究院有限公司 Efficient data compression method for park security system
CN117034178A (en) * 2023-10-08 2023-11-10 北京日光旭升精细化工技术研究所 Online monitoring system for detergent production equipment
CN117034178B (en) * 2023-10-08 2024-01-12 北京日光旭升精细化工技术研究所 Online monitoring system for detergent production equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination