CN109359138A - Anomaly detection method and device based on kernel density estimation - Google Patents

Anomaly detection method and device based on kernel density estimation

Info

Publication number
CN109359138A
Authority
CN
China
Prior art keywords
probability
offset
characterizes
standard value
eigenvector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811219917.5A
Other languages
Chinese (zh)
Inventor
段强
李锐
于治楼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201811219917.5A
Publication of CN109359138A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention provides an anomaly detection method and device based on kernel density estimation. The method comprises: obtaining in advance at least three feature vectors that have undergone data processing; determining the density estimate corresponding to each feature vector; determining the probability density function of the at least three feature vectors according to the density estimates; obtaining the probability of occurrence of each feature vector according to the probability density function; determining the offset corresponding to each probability; standardizing each offset to obtain the corresponding standard value; and determining whether each feature vector is abnormal according to each standard value and a preset threshold. The scheme is broadly applicable.

Description

Anomaly detection method and device based on kernel density estimation
Technical field
The present invention relates to the field of data detection technology, and in particular to an anomaly detection method and device based on kernel density estimation.
Background
With the development of information technology, the era of big data has arrived. In fields such as finance, network security, and the internet, anomaly detection algorithms can learn from large amounts of historical data to distinguish normal data from abnormal data, and thus give early warning of abnormal problems.
At present, commonly used density-based anomaly detection algorithms include the local outlier factor (Local Outlier Factor, LOF) algorithm and its variants, such as the simplified-LOF algorithm, the LDF algorithm, and the LOOP algorithm.
However, these algorithms are suited to finding sparsely distributed points, i.e. outliers, in particular data-set scenarios, and therefore are not broadly applicable.
Summary of the invention
The embodiments of the invention provide an anomaly detection method and device based on kernel density estimation that are broadly applicable.
In a first aspect, an embodiment of the invention provides an anomaly detection method based on kernel density estimation, in which at least three feature vectors that have undergone data processing are obtained in advance, the method further comprising:
determining the density estimate corresponding to each feature vector;
determining the probability density function of the at least three feature vectors according to the density estimates;
obtaining the probability of occurrence of each feature vector according to the probability density function;
determining the offset corresponding to each probability;
standardizing each offset to obtain the corresponding standard value;
determining whether each feature vector is abnormal according to each standard value and a preset threshold.
Preferably,
determining the density estimate corresponding to each feature vector comprises:
determining the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of nearest neighbors of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
Preferably,
the kernel function comprises:
the kernel function determined according to the following second formula:
or,
the kernel function determined according to the following third formula:
where π denotes the circle constant and e denotes the natural constant.
Preferably,
determining the probability density function of the at least three feature vectors according to the density estimates comprises:
summing the density estimates to obtain the probability density function of the at least three feature vectors;
and then,
obtaining the probability of occurrence of each feature vector according to the probability density function comprises:
for each feature vector, substituting the feature vector into the probability density function to obtain the probability that the feature vector occurs.
Preferably,
determining the offset corresponding to each probability comprises:
determining the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities.
Preferably,
standardizing each offset to obtain the corresponding standard value comprises:
determining the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value.
Preferably,
determining whether each feature vector is abnormal according to each standard value and the preset threshold comprises:
for each standard value, determining whether the standard value is greater than the preset threshold;
if so, determining that the feature vector corresponding to the standard value is abnormal.
In a second aspect, an embodiment of the invention provides an anomaly detection device based on kernel density estimation, comprising:
a data acquisition unit, configured to obtain in advance at least three feature vectors that have undergone data processing;
a calculation processing unit, configured to: determine the density estimate corresponding to each feature vector obtained by the data acquisition unit; determine the probability density function of the at least three feature vectors according to the density estimates; obtain the probability of occurrence of each feature vector according to the probability density function; determine the offset corresponding to each probability; and standardize each offset to obtain the corresponding standard value;
an anomaly detection unit, configured to determine whether each feature vector is abnormal according to each standard value obtained by the calculation processing unit and a preset threshold.
Preferably,
the calculation processing unit is configured to determine the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of nearest neighbors of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
Preferably,
the kernel function comprises:
the kernel function determined according to the following second formula:
or,
the kernel function determined according to the following third formula:
where π denotes the circle constant and e denotes the natural constant.
Preferably,
the calculation processing unit is configured to sum the density estimates to obtain the probability density function of the at least three feature vectors, and, for each feature vector, to substitute the feature vector into the probability density function to obtain the probability that the feature vector occurs.
Preferably,
the calculation processing unit is configured to determine the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities.
Preferably,
the calculation processing unit is configured to determine the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value.
Preferably,
the anomaly detection unit is configured to determine, for each standard value, whether the standard value is greater than the preset threshold;
if so, to determine that the feature vector corresponding to the standard value is abnormal.
In the embodiments of the invention, determining the density estimate corresponding to each feature vector obtained through data processing makes it possible to determine the probability density function of the feature vectors as a whole. From this probability density function, the probability of occurrence of each feature vector can be obtained. The offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because there is no need to search for sparsely distributed points according to a specific data-set scenario, the scheme is broadly applicable.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an anomaly detection method based on kernel density estimation provided by an embodiment of the invention;
Fig. 2 is a flowchart of another anomaly detection method based on kernel density estimation provided by an embodiment of the invention;
Fig. 3 is a schematic structural diagram of an anomaly detection device based on kernel density estimation provided by an embodiment of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides an anomaly detection method based on kernel density estimation, comprising:
Step 101: obtain in advance at least three feature vectors that have undergone data processing;
Step 102: determine the density estimate corresponding to each feature vector;
Step 103: determine the probability density function of the at least three feature vectors according to the density estimates;
Step 104: obtain the probability of occurrence of each feature vector according to the probability density function;
Step 105: determine the offset corresponding to each probability;
Step 106: standardize each offset to obtain the corresponding standard value;
Step 107: determine whether each feature vector is abnormal according to each standard value and a preset threshold.
In the embodiments of the invention, determining the density estimate corresponding to each feature vector obtained through data processing makes it possible to determine the probability density function of the feature vectors as a whole. From this probability density function, the probability of occurrence of each feature vector can be obtained. The offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because there is no need to search for sparsely distributed points according to a specific data-set scenario, the scheme is broadly applicable.
It should be noted that the anomaly detection method based on kernel density estimation provided by the invention can be applied in a variety of scenarios, such as detecting abnormal equipment states in industrial production, detecting abnormal operations in financial analysis, and identifying accident-prone sections in road traffic. It can also meet the need to detect various kinds of unconventional data, including outlier detection and cluster-center detection.
In an embodiment of the invention, determining the density estimate corresponding to each feature vector comprises:
determining the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of nearest neighbors of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
In the embodiments of the invention, determining the k nearest neighbors of the i-th feature vector makes it possible to determine the distance between the i-th feature vector and any feature vector p among those k neighbors, and hence the value of the kernel function with the given bandwidth. The kernel values are then summed and the sum is divided by the number of neighbors, which gives the density estimate corresponding to each feature vector.
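The first formula itself is not reproduced in this text. Read literally, the preceding description (kernel values summed over the k nearest neighbors, then divided by the number of neighbors) suggests the following form. The set notation kNN(i), for the k nearest neighbors of the i-th feature vector, is introduced here only for illustration and is an assumption:

```latex
D_i = \frac{1}{k} \sum_{p \in \mathrm{kNN}(i)} H_h\bigl(d(i, p)\bigr)
```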
In an embodiment of the invention, the kernel function comprises:
the kernel function determined according to the following second formula:
or,
the kernel function determined according to the following third formula:
where π denotes the circle constant and e denotes the natural constant.
In the embodiments of the invention, the kernel function with the given preset bandwidth can be determined by the above second formula or third formula, and the distance between the i-th feature vector and any feature vector p among its k nearest neighbors can be the Mahalanobis distance, the Euclidean distance, the Manhattan distance, or the reachability distance.
In an embodiment of the invention, determining the probability density function of the at least three feature vectors according to the density estimates comprises:
summing the density estimates to obtain the probability density function of the at least three feature vectors;
and then,
obtaining the probability of occurrence of each feature vector according to the probability density function comprises:
for each feature vector, substituting the feature vector into the probability density function to obtain the probability that the feature vector occurs.
In the embodiments of the invention, summing the density estimates of the feature vectors gives the probability density function of all the feature vectors. Substituting each feature vector into this probability density function then gives the value corresponding to that feature vector on the curve described by the probability density function, i.e. the probability that the feature vector occurs.
In an embodiment of the invention, determining the offset corresponding to each probability comprises:
determining the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities.
In the embodiments of the invention, after the probability of occurrence of each feature vector has been determined, each probability also needs to be standardized. The standardized value can be regarded as the offset corresponding to the probability, i.e. the degree to which the probability of occurrence of a given feature vector deviates from the probabilities of occurrence of the feature vectors as a whole. It is obtained by taking the difference between the probability of occurrence of the given feature vector and the mean of the probabilities of occurrence of all feature vectors, and dividing that difference by the standard deviation of those probabilities. In summary, compared with algorithms such as LOF, which compare against an average density and may therefore be affected by extreme values, the above formula can provide a more robust comparison result.
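The fourth formula is not reproduced in this text. The preceding description (difference from the mean, divided by the standard deviation) corresponds to the usual z-score, which can be written as below. The symbols \bar{X} and \sigma_X are introduced here for illustration; whether the original standard deviation divides by n or by n - 1 cannot be recovered from the text, and n is assumed:

```latex
Z_i = \frac{X_i - \bar{X}}{\sigma_X},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j,
\qquad
\sigma_X = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \bigl(X_j - \bar{X}\bigr)^2}
```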
In an embodiment of the invention, standardizing each offset to obtain the corresponding standard value comprises:
determining the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value.
In the embodiments of the invention, after the offset corresponding to the probability of occurrence of each feature vector has been determined, each offset also needs to be normalized: the difference between the offset and the smallest offset is divided by the difference between the largest offset and the smallest offset. This yields standard values scaled to the interval [0, 1] (i.e. the normalized value of each offset), which improves the accuracy of the computation.
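The fifth formula is not reproduced in this text. The preceding description is that of min-max normalization, so under that reading, and using only the symbols already defined, it would take the form:

```latex
B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}
```

With this form, the smallest offset maps to 0 and the largest to 1, which matches the stated [0, 1] interval.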
In an embodiment of the invention, determining whether each feature vector is abnormal according to each standard value and the preset threshold comprises:
for each standard value, determining whether the standard value is greater than the preset threshold;
if so, determining that the feature vector corresponding to the standard value is abnormal.
In the embodiments of the invention, each standard value is compared with the preset threshold to determine whether the corresponding feature vector is abnormal. When a standard value is greater than the threshold, the standard value is abnormal, so the corresponding feature vector can be determined to be abnormal. From an abnormal feature vector, the user can determine that at least one raw data record corresponding to that feature vector is abnormal, which achieves the purpose of anomaly detection.
To illustrate the technical solution and advantages of the invention more clearly, an anomaly detection method based on kernel density estimation provided by an embodiment of the invention is described in detail below. It can specifically include the following steps:
Step 201: obtain at least three feature vectors that have undergone data processing.
Specifically, data processing operations such as deduplication, removal of null values, filling of missing values, and encoding are applied to at least three collected data records, so that at least three valuable and meaningful feature vectors can be extracted and derived from a large amount of disordered, hard-to-understand data.
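The data processing named in Step 201 can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing used in the application: the function name `preprocess` is hypothetical, records are assumed to be equal-length lists whose missing fields are `None`, string categories are encoded as integer codes, and missing values are filled with the column mean.

```python
from statistics import mean

def preprocess(records):
    """Turn raw records (equal-length lists) into numeric feature vectors."""
    # Remove null records: drop rows in which every field is missing.
    rows = [r for r in records if any(v is not None for v in r)]
    # Deduplicate while preserving the original order.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(list(r))
    if not unique:
        return []
    n_cols = len(unique[0])
    # Encode: map the string categories of each column to integer codes
    # (a simple ordinal encoding, assumed here for illustration).
    for c in range(n_cols):
        cats = sorted({r[c] for r in unique if isinstance(r[c], str)})
        codes = {cat: i for i, cat in enumerate(cats)}
        for r in unique:
            if isinstance(r[c], str):
                r[c] = codes[r[c]]
    # Fill missing values with the mean of the values present in the column.
    for c in range(n_cols):
        present = [r[c] for r in unique if r[c] is not None]
        fill = mean(present) if present else 0.0
        for r in unique:
            if r[c] is None:
                r[c] = fill
    return [[float(v) for v in r] for r in unique]
```

The order of the operations and the choice of fill value are design decisions that the text leaves open; other choices (for example median filling or one-hot encoding) would fit the description equally well.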
Step 202: determine the density estimate corresponding to each feature vector according to the distances between the feature vector and each of its nearest neighbors and the number of nearest neighbors.
Specifically, the density estimate corresponding to each feature vector is determined according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of nearest neighbors of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
The kernel function with the given preset bandwidth can be obtained according to the following second formula or third formula:
Second formula:
Third formula:
where π denotes the circle constant and e denotes the natural constant 2.71828.
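Step 202 can be sketched as follows. This is an illustration under stated assumptions, not the formulas of the application: the second and third formulas are not reproduced in this text, so a Gaussian kernel is assumed (a common choice that is at least consistent with the mention of π and e), Euclidean distance is assumed for d(i, p), and the names `euclidean`, `gaussian_kernel`, and `knn_density` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_kernel(d, h):
    """Assumed kernel: Gaussian evaluated at distance d with bandwidth h."""
    return math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def knn_density(vectors, k, h, dist=euclidean):
    """Average the kernel values over the k nearest neighbors of each vector."""
    if not 1 <= k <= len(vectors) - 1:
        raise ValueError("k must be between 1 and the number of other vectors")
    densities = []
    for i, v in enumerate(vectors):
        # Distances from vector i to every other vector, nearest first.
        dists = sorted(dist(v, w) for j, w in enumerate(vectors) if j != i)
        densities.append(sum(gaussian_kernel(d, h) for d in dists[:k]) / k)
    return densities
```

A feature vector that lies far from all others receives a much smaller density estimate than vectors in a tight group, because its kernel values decay with distance. Passing a different `dist` function would accommodate the other distance measures the description mentions.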
Step 203: sum the density estimates to obtain the probability density function of the at least three feature vectors.
Specifically, the probability density function of the feature vectors as a whole is obtained by summing the density estimates.
Step 204: for each feature vector, substitute the feature vector into the probability density function to obtain the probability that the feature vector occurs.
Specifically, substituting each feature vector into the obtained probability density function gives the value corresponding to that feature vector, i.e. the probability that the feature vector occurs.
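The text does not spell out how the summed density estimates become a function into which a feature vector can be substituted. One possible reading, assumed here purely for illustration, treats the sum of the density estimates as a normalizing constant, so that the probability for the i-th feature vector is its density estimate divided by that sum. The function name `occurrence_probabilities` is hypothetical.

```python
def occurrence_probabilities(densities):
    """Normalize density estimates so that they sum to 1 (assumed reading)."""
    total = sum(densities)
    if total <= 0:
        raise ValueError("density estimates must have a positive sum")
    return [d / total for d in densities]
```

Under this reading, the probabilities preserve the ordering of the density estimates, so a sparsely surrounded feature vector still receives the smallest value.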
Step 205: determine the offset corresponding to each probability according to the number of probabilities.
Specifically, the offset corresponding to each probability is determined according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities.
Specifically, determining the offset corresponding to each probability requires standardizing each probability. The standardized value can be regarded as the offset corresponding to the probability: it is the difference between the probability and the mean of the probabilities, divided by the standard deviation of the probabilities.
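The standardization in Step 205 can be sketched as follows. The function name `offsets` is hypothetical, and the population standard deviation (division by n) is an assumption, since the fourth formula is not reproduced in this text.

```python
from statistics import mean, pstdev

def offsets(probabilities):
    """Z-score of each probability: (X_i - mean) / standard deviation."""
    mu = mean(probabilities)
    sigma = pstdev(probabilities)  # population standard deviation (assumed)
    if sigma == 0:
        # All probabilities are equal, so none deviates from the mean.
        return [0.0 for _ in probabilities]
    return [(x - mu) / sigma for x in probabilities]
```

The guard for a zero standard deviation is a design choice made here to avoid division by zero; the text does not address that case.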
Step 206: for each offset, divide the difference between the offset and the smallest offset by the difference between the largest offset and the smallest offset to obtain the standard value corresponding to the offset.
Specifically, the standard value corresponding to each offset is determined according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value.
Each offset is normalized to obtain standard values scaled to the interval [0, 1] (i.e. the normalized value of each offset), which improves the accuracy of the computation.
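The normalization in Step 206 can be sketched as min-max scaling. The function name `standard_values` is hypothetical, and returning 0.0 for every entry when all offsets are equal is an assumption made here to avoid division by zero.

```python
def standard_values(offsets):
    """Min-max normalize offsets into the interval [0, 1]."""
    z_min, z_max = min(offsets), max(offsets)
    if z_max == z_min:
        # Degenerate case: every offset is identical (handling assumed).
        return [0.0 for _ in offsets]
    return [(z - z_min) / (z_max - z_min) for z in offsets]
```

For example, the offsets -1.0, 0.0, and 3.0 span a range of 4.0, so they map to 0.0, 0.25, and 1.0 respectively.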
Step 207: for each standard value, determine whether the standard value is greater than the preset threshold; if so, determine that the feature vector corresponding to the standard value is abnormal.
Specifically, each standard value is compared with the preset threshold to determine whether the corresponding feature vector is abnormal. When a standard value is greater than the threshold, the standard value is abnormal, so the corresponding feature vector can be determined to be abnormal.
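The comparison in Step 207 can be sketched as follows. The function name `flag_abnormal` is hypothetical, and the text does not give a value for the preset threshold, so any value passed in is the caller's assumption.

```python
def flag_abnormal(standard_values, threshold):
    """Return one flag per standard value: True when it exceeds the threshold."""
    return [b > threshold for b in standard_values]

def abnormal_indices(standard_values, threshold):
    """Indices of the feature vectors whose standard value exceeds the threshold."""
    return [i for i, b in enumerate(standard_values) if b > threshold]
```

The comparison is strict, following the wording "greater than"; a standard value exactly equal to the threshold is not flagged.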
As shown in Fig. 3, the invention provides an anomaly detection device based on kernel density estimation, comprising:
a data acquisition unit 301, configured to obtain in advance at least three feature vectors that have undergone data processing;
a calculation processing unit 302, configured to: determine the density estimate corresponding to each feature vector obtained by the data acquisition unit 301; determine the probability density function of the at least three feature vectors according to the density estimates; obtain the probability of occurrence of each feature vector according to the probability density function; determine the offset corresponding to each probability; and standardize each offset to obtain the corresponding standard value;
an anomaly detection unit 303, configured to determine whether each feature vector is abnormal according to each standard value obtained by the calculation processing unit 302 and a preset threshold.
In the embodiments of the invention, the calculation processing unit determines the density estimate corresponding to each feature vector that the data acquisition unit has obtained through data processing, which makes it possible to determine the probability density function of the feature vectors as a whole. From this probability density function, the probability of occurrence of each feature vector can be obtained. The offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, the anomaly detection unit compares each standard value obtained by the calculation processing unit with the preset threshold to determine whether the feature vector is abnormal. Because there is no need to search for sparsely distributed points according to a specific data-set scenario, the scheme is broadly applicable.
In an embodiment of the invention, the calculation processing unit is configured to determine the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of nearest neighbors of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
In an embodiment of the invention, the kernel function comprises:
the kernel function determined according to the following second formula:
or,
the kernel function determined according to the following third formula:
where π denotes the circle constant and e denotes the natural constant.
In an embodiment of the invention, the calculation processing unit is configured to sum the density estimates to obtain the probability density function of the at least three feature vectors, and, for each feature vector, to substitute the feature vector into the probability density function to obtain the probability that the feature vector occurs.
In an embodiment of the invention, the calculation processing unit is configured to determine the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities.
In an embodiment of the invention, the calculation processing unit is configured to determine the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value.
In an embodiment of the invention, the anomaly detection unit is configured to determine, for each standard value, whether the standard value is greater than the preset threshold;
if so, to determine that the feature vector corresponding to the standard value is abnormal.
The embodiments of the invention have at least the following beneficial effects:
1. In an embodiment of the invention, determining the density estimate corresponding to each feature vector obtained through data processing makes it possible to determine the probability density function of the feature vectors as a whole. From this probability density function, the probability of occurrence of each feature vector can be obtained. The offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because there is no need to search for sparsely distributed points according to a specific data-set scenario, the scheme is broadly applicable.
2. In an embodiment of the invention, determining the k nearest neighbors of the i-th feature vector makes it possible to determine the distance between the i-th feature vector and any feature vector p among those k neighbors, and hence the value of the kernel function with the given bandwidth. The kernel values are then summed and the sum is divided by the number of neighbors, which gives the density estimate corresponding to each feature vector.
3. In an embodiment of the invention, the kernel function with the given preset bandwidth can be determined by the above second formula or third formula, and the distance between the i-th feature vector and any feature vector p among its k nearest neighbors can be the Mahalanobis distance, the Euclidean distance, the Manhattan distance, or the reachability distance.
It should be noted that, in this document, such as first and second etc relational terms are used merely to an entity Or operation is distinguished with another entity or operation, is existed without necessarily requiring or implying between these entities or operation Any actual relationship or order.Moreover, the terms "include", "comprise" or its any other variant be intended to it is non- It is exclusive to include, so that the process, method, article or equipment for including a series of elements not only includes those elements, It but also including other elements that are not explicitly listed, or further include solid by this process, method, article or equipment Some elements.In the absence of more restrictions, the element limited by sentence " including one ", is not arranged Except there is also other identical factors in the process, method, article or apparatus that includes the element.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention. It is intended merely to illustrate the technical solutions of the invention and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. An anomaly detection method based on kernel density estimation, characterized in that at least three feature vectors are obtained in advance through data processing, the method further comprising:
determining the density estimate corresponding to each feature vector;
determining, according to each density estimate, the probability density function of the at least three feature vectors;
obtaining, according to the probability density function, the probability of occurrence of each feature vector;
determining the offset corresponding to each probability;
standardizing each offset to obtain a corresponding standard value;
determining, according to each standard value and a preset threshold, whether each feature vector is abnormal.
2. The method according to claim 1, characterized in that
determining the density estimate corresponding to each feature vector comprises:
determining the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, and H_h(d(i, p)) denotes the kernel function of given preset bandwidth, in which d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
3. The method according to claim 2, characterized in that
the kernel function is determined as follows:
the kernel function is determined according to the following second formula:
or,
the kernel function is determined according to the following third formula:
where π denotes the constant pi and e denotes the natural constant.
4. The method according to claim 1, characterized in that
determining, according to each density estimate, the probability density function of the at least three feature vectors comprises:
summing the density estimates to obtain the probability density function of the at least three feature vectors;
and
obtaining, according to the probability density function, the probability of occurrence of each feature vector comprises:
for each feature vector, substituting the feature vector into the probability density function to obtain the probability of occurrence of the feature vector.
5. The method according to any one of claims 1 to 4, characterized in that
determining the offset corresponding to each probability comprises:
determining the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities;
and/or
standardizing each offset to obtain a corresponding standard value comprises:
determining the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value;
and/or
determining, according to each standard value and a preset threshold, whether each feature vector is abnormal comprises:
for each standard value, determining whether the standard value is greater than the preset threshold;
if so, determining that the feature vector corresponding to the standard value is abnormal.
6. An anomaly detection device based on kernel density estimation, characterized by comprising:
a data acquisition unit, configured to obtain in advance at least three feature vectors that have undergone data processing;
a calculation processing unit, configured to determine the density estimate corresponding to each feature vector obtained by the data acquisition unit; determine, according to each density estimate, the probability density function of the at least three feature vectors; obtain, according to the probability density function, the probability of occurrence of each feature vector; determine the offset corresponding to each probability; and standardize each offset to obtain a corresponding standard value;
an anomaly detection unit, configured to determine, according to each standard value obtained by the calculation processing unit and a preset threshold, whether each feature vector is abnormal.
7. The device according to claim 6, characterized in that
the calculation processing unit is configured to determine the density estimate corresponding to each feature vector according to the following first formula:
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, and H_h(d(i, p)) denotes the kernel function of given preset bandwidth, in which d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector.
8. The device according to claim 7, characterized in that
the kernel function is determined as follows:
the kernel function is determined according to the following second formula:
or,
the kernel function is determined according to the following third formula:
where π denotes the constant pi and e denotes the natural constant.
9. The device according to claim 6, characterized in that
the calculation processing unit is configured to sum the density estimates to obtain the probability density function of the at least three feature vectors, and, for each feature vector, to substitute the feature vector into the probability density function to obtain the probability of occurrence of the feature vector.
10. The device according to any one of claims 6 to 9, characterized in that
the calculation processing unit is configured to determine the offset corresponding to each probability according to the following fourth formula:
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, and n denotes the number of probabilities;
and/or
the calculation processing unit is configured to determine the standard value corresponding to each offset according to the following fifth formula:
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the offset with the largest value, and Z_min denotes the offset with the smallest value;
and/or
the anomaly detection unit is configured to determine, for each standard value, whether the standard value is greater than the preset threshold;
if so, to determine that the feature vector corresponding to the standard value is abnormal.
CN201811219917.5A 2018-10-19 2018-10-19 A kind of method for detecting abnormality and device based on Density Estimator Pending CN109359138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811219917.5A CN109359138A (en) 2018-10-19 2018-10-19 A kind of method for detecting abnormality and device based on Density Estimator

Publications (1)

Publication Number Publication Date
CN109359138A true CN109359138A (en) 2019-02-19

Family

ID=65345921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811219917.5A Pending CN109359138A (en) 2018-10-19 2018-10-19 A kind of method for detecting abnormality and device based on Density Estimator

Country Status (1)

Country Link
CN (1) CN109359138A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115708A (en) * 1998-03-04 2000-09-05 Microsoft Corporation Method for refining the initial conditions for clustering with applications to small and large database clustering
CN103916896A (en) * 2014-03-26 2014-07-09 浙江农林大学 Anomaly detection method based on multi-dimensional Epanechnikov kernel density estimation
CN105721199A (en) * 2016-01-18 2016-06-29 中国石油大学(华东) Real-time cloud service bottleneck detection method based on kernel density estimation and fuzzy inference system
CN106789885A (en) * 2016-11-17 2017-05-31 国家电网公司 User's unusual checking analysis method under a kind of big data environment
CN107092582A (en) * 2017-03-31 2017-08-25 江苏方天电力技术有限公司 One kind is based on the posterior exceptional value on-line checking of residual error and method for evaluating confidence

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110098983A (en) * 2019-05-28 2019-08-06 上海优扬新媒信息技术有限公司 A kind of detection method and device of abnormal flow
CN110098983B (en) * 2019-05-28 2021-06-04 上海优扬新媒信息技术有限公司 Abnormal flow detection method and device
CN110806733A (en) * 2019-10-30 2020-02-18 中国神华能源股份有限公司国华电力分公司 Thermal power plant equipment monitoring method and device and electronic equipment
CN110806733B (en) * 2019-10-30 2021-09-21 中国神华能源股份有限公司国华电力分公司 Thermal power plant equipment monitoring method and device and electronic equipment
CN111683102A (en) * 2020-06-17 2020-09-18 绿盟科技集团股份有限公司 FTP behavior data processing method, and method and device for identifying abnormal FTP behavior
CN111683102B (en) * 2020-06-17 2022-12-06 绿盟科技集团股份有限公司 FTP behavior data processing method, and method and device for identifying abnormal FTP behavior
CN112232719A (en) * 2020-12-11 2021-01-15 北京基调网络股份有限公司 Index quantitative scoring method, computer equipment and storage medium
CN114896024A (en) * 2022-03-28 2022-08-12 同方威视技术股份有限公司 Method and device for detecting running state of virtual machine based on kernel density estimation
CN114896024B (en) * 2022-03-28 2022-11-22 同方威视技术股份有限公司 Method and device for detecting running state of virtual machine based on kernel density estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190219