CN109359138A - Anomaly detection method and device based on kernel density estimation
- Publication number: CN109359138A
- Application number: CN201811219917.5A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention provides an anomaly detection method and device based on kernel density estimation. The method comprises: obtaining in advance at least three feature vectors that have undergone data processing; determining the density estimate corresponding to each feature vector; determining, according to each density estimate, the probability density function of the at least three feature vectors; obtaining, according to the probability density function, the probability of occurrence of each feature vector; determining the offset corresponding to each probability; standardizing each offset to obtain a corresponding standard value; and determining, according to each standard value and a preset threshold, whether each feature vector is abnormal. The scheme has broad adaptability.
Description
Technical field
The present invention relates to the technical field of data detection, and in particular to an anomaly detection method and device based on kernel density estimation.
Background
With the development of information technology, the era of big data has arrived. In fields such as finance, network security and the Internet, anomaly detection algorithms learn from large amounts of historical data to distinguish normal data from abnormal data, so that early warnings can be issued for abnormal conditions.
At present, commonly used density-based anomaly detection algorithms include the Local Outlier Factor (LOF) algorithm and its variants, such as the Simplified-LOF, LDF and LoOP algorithms.
However, these algorithms are suited to finding sparsely distributed points, i.e. outliers, in specific data-set scenarios, and therefore do not have broad adaptability.
Summary of the invention
The embodiments of the present invention provide an anomaly detection method and device based on kernel density estimation that have broad adaptability.
In a first aspect, an embodiment of the present invention provides an anomaly detection method based on kernel density estimation, in which at least three feature vectors that have undergone data processing are obtained in advance, the method further comprising:
determining the density estimate corresponding to each feature vector;
determining, according to each density estimate, the probability density function of the at least three feature vectors;
obtaining, according to the probability density function, the probability of occurrence of each feature vector;
determining the offset corresponding to each probability;
standardizing each offset to obtain a corresponding standard value; and
determining, according to each standard value and a preset threshold, whether each feature vector is abnormal.
Preferably,
determining the density estimate corresponding to each feature vector comprises:
determining the density estimate corresponding to each feature vector according to the following first formula:
$$D_i = \frac{1}{k} \sum_{p=1}^{k} H_h\big(d(i, p)\big)$$
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector among its neighbor points.
Preferably,
the kernel function is determined according to the following second formula:
(second formula: not reproduced in this text)
or according to the following third formula:
(third formula: not reproduced in this text)
where π denotes the circle constant pi and e denotes the natural constant.
Preferably,
determining the probability density function of the at least three feature vectors according to each density estimate comprises:
summing the density estimates to obtain the probability density function of the at least three feature vectors;
in which case,
obtaining the probability of occurrence of each feature vector according to the probability density function comprises:
for each feature vector, substituting the feature vector into the probability density function to obtain the probability of occurrence of that feature vector.
Preferably,
determining the offset corresponding to each probability comprises:
determining the offset corresponding to each probability according to the following fourth formula:
$$Z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, n denotes the number of probabilities, \bar{X} denotes the mean of the n probabilities and σ denotes their standard deviation.
Preferably,
standardizing each offset to obtain a corresponding standard value comprises:
determining the standard value corresponding to each offset according to the following fifth formula:
$$B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}$$
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the numerically largest offset and Z_min denotes the numerically smallest offset.
Preferably,
determining whether each feature vector is abnormal according to each standard value and the preset threshold comprises:
for each standard value, determining whether the standard value is greater than the preset threshold;
if so, determining that the feature vector corresponding to the standard value is abnormal.
In a second aspect, an embodiment of the present invention provides an anomaly detection device based on kernel density estimation, comprising:
a data acquisition unit, configured to obtain in advance at least three feature vectors that have undergone data processing;
a calculation processing unit, configured to determine the density estimate corresponding to each feature vector obtained by the data acquisition unit; determine, according to each density estimate, the probability density function of the at least three feature vectors; obtain, according to the probability density function, the probability of occurrence of each feature vector; determine the offset corresponding to each probability; and standardize each offset to obtain a corresponding standard value; and
an anomaly detection unit, configured to determine, according to each standard value obtained by the calculation processing unit and a preset threshold, whether each feature vector is abnormal.
Preferably,
the calculation processing unit is configured to determine the density estimate corresponding to each feature vector according to the following first formula:
$$D_i = \frac{1}{k} \sum_{p=1}^{k} H_h\big(d(i, p)\big)$$
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector among its neighbor points.
Preferably,
the kernel function is determined according to the following second formula:
(second formula: not reproduced in this text)
or according to the following third formula:
(third formula: not reproduced in this text)
where π denotes the circle constant pi and e denotes the natural constant.
Preferably,
the calculation processing unit is configured to sum the density estimates to obtain the probability density function of the at least three feature vectors, and, for each feature vector, to substitute the feature vector into the probability density function to obtain the probability of occurrence of that feature vector.
Preferably,
the calculation processing unit is configured to determine the offset corresponding to each probability according to the following fourth formula:
$$Z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, n denotes the number of probabilities, \bar{X} denotes the mean of the n probabilities and σ denotes their standard deviation.
Preferably,
the calculation processing unit is configured to determine the standard value corresponding to each offset according to the following fifth formula:
$$B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}$$
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the numerically largest offset and Z_min denotes the numerically smallest offset.
Preferably,
the anomaly detection unit is configured to determine, for each standard value, whether the standard value is greater than the preset threshold;
if so, to determine that the feature vector corresponding to the standard value is abnormal.
In the embodiments of the present invention, the density estimate corresponding to each feature vector obtained through data processing is determined, from which the probability density function of the feature vectors as a whole can be determined. The probability of occurrence of each feature vector is obtained from this probability density function; the offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because sparsely distributed points do not have to be sought according to a specific data-set scenario, the scheme has broad adaptability.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an anomaly detection method based on kernel density estimation provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another anomaly detection method based on kernel density estimation provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an anomaly detection device based on kernel density estimation provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an anomaly detection method based on kernel density estimation, comprising:
Step 101: obtain in advance at least three feature vectors that have undergone data processing;
Step 102: determine the density estimate corresponding to each feature vector;
Step 103: determine, according to each density estimate, the probability density function of the at least three feature vectors;
Step 104: obtain, according to the probability density function, the probability of occurrence of each feature vector;
Step 105: determine the offset corresponding to each probability;
Step 106: standardize each offset to obtain a corresponding standard value;
Step 107: determine, according to each standard value and a preset threshold, whether each feature vector is abnormal.
In this embodiment, the density estimate corresponding to each feature vector obtained through data processing is determined, from which the probability density function of the feature vectors as a whole can be determined. The probability of occurrence of each feature vector is obtained from this probability density function; the offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because sparsely distributed points do not have to be sought according to a specific data-set scenario, the scheme has broad adaptability.
It should be noted that the anomaly detection method based on kernel density estimation provided by the present invention can be applied in many scenarios, such as detecting abnormal equipment states in industrial production, detecting abnormal operations in financial analysis, and identifying accident-prone road sections in road traffic. It can also meet the need to detect various kinds of unconventional data, including outlier detection and cluster-center detection.
In an embodiment of the present invention, determining the density estimate corresponding to each feature vector comprises:
determining the density estimate corresponding to each feature vector according to the following first formula:
$$D_i = \frac{1}{k} \sum_{p=1}^{k} H_h\big(d(i, p)\big)$$
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector among its neighbor points.
In this embodiment, once the k neighbor points of the i-th feature vector have been determined, the distance between the i-th feature vector and any feature vector p among the k neighbor points can be determined, and from it the value of the kernel function with the given bandwidth. The kernel function values are then summed and the sum is divided by the number of neighbor points, which yields the density estimate corresponding to each feature vector.
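As a rough illustration of the neighbor-averaged density described above, the following Python sketch computes D_i for a small set of vectors. The Gaussian kernel, the Euclidean distance and the names `knn_density`, `gaussian_kernel` and `h` are assumptions made for this example; the kernel formulas of the patent are not reproduced in this text.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_kernel(d, h=1.0):
    """Gaussian kernel of bandwidth h (an assumed choice of kernel)."""
    return math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def knn_density(vectors, k, h=1.0):
    """Average kernel value over the k nearest neighbors of each vector."""
    if not 1 <= k < len(vectors):
        raise ValueError("k must satisfy 1 <= k < number of vectors")
    densities = []
    for i, v in enumerate(vectors):
        # Distances from vector i to every other vector, nearest first.
        dists = sorted(euclidean(v, w) for j, w in enumerate(vectors) if j != i)
        # Sum the kernel values of the k nearest neighbors, then divide by k.
        densities.append(sum(gaussian_kernel(d, h) for d in dists[:k]) / k)
    return densities

vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(knn_density(vectors, k=2))
```

In this toy data set the isolated vector (5.0, 5.0) receives a much smaller density estimate than the three vectors that lie close together.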
In an embodiment of the present invention, the kernel function is determined according to the following second formula:
(second formula: not reproduced in this text)
or according to the following third formula:
(third formula: not reproduced in this text)
where π denotes the circle constant pi and e denotes the natural constant.
In this embodiment, the kernel function with the given preset bandwidth can be determined by the above second formula or third formula, and the distance between the i-th feature vector and any feature vector p among the k neighbor points may be the Mahalanobis distance, the Euclidean distance, the Manhattan distance or the reachability distance.
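Three of the distance measures named above can be sketched in a few lines of Python. The function names are hypothetical, and the reachability distance follows the usual LOF definition (the larger of the k-distance of p and the direct distance), which is an assumption about what the text intends. The Mahalanobis distance is omitted because it additionally needs the inverse covariance matrix of the data.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_distance(p, vectors, k, dist=euclidean):
    """Distance from vectors[p] to its k-th nearest neighbor."""
    dists = sorted(dist(vectors[p], w) for j, w in enumerate(vectors) if j != p)
    return dists[k - 1]

def reachability_distance(i, p, vectors, k, dist=euclidean):
    """Reachability distance of vectors[i] from vectors[p] (LOF definition)."""
    return max(k_distance(p, vectors, k, dist), dist(vectors[i], vectors[p]))

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(euclidean(points[0], points[2]))           # 3.0
print(manhattan((0.0, 0.0), (3.0, 4.0)))         # 7.0
print(reachability_distance(0, 1, points, k=2))  # 2.0
```

Any of these functions could be passed as the distance d(i, p) to a density routine; which one is appropriate depends on the scale and correlation of the features.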
In an embodiment of the present invention, determining the probability density function of the at least three feature vectors according to each density estimate comprises:
summing the density estimates to obtain the probability density function of the at least three feature vectors;
in which case,
obtaining the probability of occurrence of each feature vector according to the probability density function comprises:
for each feature vector, substituting the feature vector into the probability density function to obtain the probability of occurrence of that feature vector.
In this embodiment, the density estimates of the feature vectors are summed to obtain the probability density function of the feature vectors as a whole. Each feature vector is then substituted into this probability density function to obtain its value on the curve represented by the probability density function, i.e. the probability of occurrence of that feature vector.
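The text does not spell out the functional form of the summed density. One possible reading, shown here only as a sketch, is the textbook kernel density estimate: a function built by averaging one kernel per sample vector, into which each vector is then substituted. The Gaussian kernel, the Euclidean distance and the names `make_pdf` and `h` are assumptions, and the resulting values are relative density values rather than calibrated probabilities.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_kernel(d, h=1.0):
    # Assumed kernel; the patent's own kernel formulas are not available here.
    return math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def make_pdf(vectors, h=1.0):
    """Return f(x): the mean of the kernels centred on each sample vector."""
    n = len(vectors)

    def pdf(x):
        return sum(gaussian_kernel(euclidean(x, v), h) for v in vectors) / n

    return pdf

vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
pdf = make_pdf(vectors)
# Substitute every feature vector into the estimated density function.
probabilities = [pdf(v) for v in vectors]
print(probabilities)
```

Returning a closure keeps the "build the function once, substitute many vectors" structure of the two steps visible in the code.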
In an embodiment of the present invention, determining the offset corresponding to each probability comprises:
determining the offset corresponding to each probability according to the following fourth formula:
$$Z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, n denotes the number of probabilities, \bar{X} denotes the mean of the n probabilities and σ denotes their standard deviation.
In this embodiment, after the probability of occurrence of each feature vector has been determined, each probability also needs to be standardized; the value obtained after standardization can be regarded as the offset corresponding to the probability. The offset measures the degree to which the probability of occurrence of any one feature vector deviates from the probabilities of occurrence of the feature vectors as a whole: the mean of the probabilities of all feature vectors is subtracted from the probability of the given feature vector, and the difference is divided by the standard deviation of those probabilities. In summary, whereas algorithms such as LOF compare against an average density and may therefore be affected by extreme values, the above formula provides a more robust comparison.
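The offset computation described above is an ordinary z-score and can be sketched as follows. The function name `offsets` is hypothetical, and the use of the population standard deviation (dividing by n rather than n − 1) is an assumption, since the text only says "standard deviation".

```python
import statistics

def offsets(probabilities):
    """Z_i = (X_i - mean) / standard deviation, for each probability X_i."""
    mean = statistics.fmean(probabilities)
    # Population standard deviation; the sample version is equally plausible.
    std = statistics.pstdev(probabilities)
    if std == 0:
        raise ValueError("all probabilities are equal; offsets are undefined")
    return [(x - mean) / std for x in probabilities]

print(offsets([1.0, 2.0, 3.0]))
```

A probability below the mean yields a negative offset and one above the mean a positive offset; the offsets always sum to zero.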
In an embodiment of the present invention, standardizing each offset to obtain a corresponding standard value comprises:
determining the standard value corresponding to each offset according to the following fifth formula:
$$B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}$$
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the numerically largest offset and Z_min denotes the numerically smallest offset.
In this embodiment, after the offset corresponding to the probability of occurrence of each feature vector has been determined, each offset also needs to be normalized: the difference between the offset and the numerically smallest offset is divided by the difference between the numerically largest offset and the numerically smallest offset. This scales each offset into the interval [0, 1] and yields the normalized standard value, which improves the accuracy of the subsequent operations.
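The normalization described above is min-max scaling. A minimal sketch, with the hypothetical function name `standard_values`, might look like this:

```python
def standard_values(offsets):
    """B_i = (Z_i - Z_min) / (Z_max - Z_min), scaled into [0, 1]."""
    z_min, z_max = min(offsets), max(offsets)
    if z_max == z_min:
        raise ValueError("all offsets are equal; the scaling is undefined")
    return [(z - z_min) / (z_max - z_min) for z in offsets]

print(standard_values([-1.0, 0.0, 3.0]))  # [0.0, 0.25, 1.0]
```

The smallest offset always maps to 0 and the largest to 1, so a single preset threshold can be reused across data sets whose raw offsets have different ranges.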
In an embodiment of the present invention, determining whether each feature vector is abnormal according to each standard value and the preset threshold comprises:
for each standard value, determining whether the standard value is greater than the preset threshold;
if so, determining that the feature vector corresponding to the standard value is abnormal.
In this embodiment, each standard value is compared with the preset threshold to determine whether the corresponding feature vector is abnormal. When a standard value is greater than the threshold, the standard value is abnormal, so the corresponding feature vector can be determined to be abnormal. From an abnormal feature vector, the user can determine that at least one item of raw data corresponding to that feature vector is abnormal, which achieves the purpose of anomaly detection.
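The comparison step can be sketched as below. The function name `flag_abnormal` and the threshold value 0.9 are illustrative assumptions; the text does not state how the preset threshold is chosen.

```python
def flag_abnormal(standard_values, threshold):
    """Return one flag per standard value: True when it exceeds the threshold."""
    return [b > threshold for b in standard_values]

values = [0.0, 0.25, 1.0]
flags = flag_abnormal(values, threshold=0.9)
print(flags)  # [False, False, True]

# Indices of flagged vectors, usable to look up the raw records behind them.
abnormal_indices = [i for i, flagged in enumerate(flags) if flagged]
print(abnormal_indices)  # [2]
```

Keeping the flags in the same order as the feature vectors makes it straightforward to trace a flagged vector back to the raw data it was derived from.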
In order to explain the technical solution and advantages of the present invention more clearly, an anomaly detection method based on kernel density estimation provided by an embodiment of the present invention is described in detail below. The method may specifically comprise the following steps:
Step 201: obtain at least three feature vectors that have undergone data processing.
Specifically, data processing operations such as deduplication, removal of null values, filling of missing values and encoding are performed on at least three collected data records, so that at least three valuable and meaningful feature vectors can be extracted and derived from a large amount of disordered, hard-to-interpret data.
Step 202: determine the density estimate corresponding to each feature vector according to the distances between the feature vector and each of its neighbor points and the number of neighbor points.
Specifically, the density estimate corresponding to each feature vector is determined according to the following first formula:
$$D_i = \frac{1}{k} \sum_{p=1}^{k} H_h\big(d(i, p)\big)$$
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector among its neighbor points.
The kernel function with the given preset bandwidth can be obtained according to the following second formula or third formula:
Second formula: (not reproduced in this text)
Third formula: (not reproduced in this text)
where π denotes the circle constant pi and e denotes the natural constant, approximately 2.71828.
Step 203: sum the density estimates to obtain the probability density function of the at least three feature vectors.
Specifically, the probability density function of the feature vectors as a whole is obtained by summing the individual density estimates.
Step 204: for each feature vector, substitute the feature vector into the probability density function to obtain the probability of occurrence of the feature vector.
Specifically, substituting each feature vector into the obtained probability density function yields the value corresponding to that feature vector, i.e. the probability of its occurrence.
Step 205: determine the offset corresponding to each probability according to the number of probabilities.
Specifically, the offset corresponding to each probability is determined according to the following fourth formula:
$$Z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, n denotes the number of probabilities, \bar{X} denotes the mean of the n probabilities and σ denotes their standard deviation.
Specifically, to determine the offset corresponding to each probability, each probability is standardized, and the standardized value can be regarded as the offset corresponding to the probability: the mean of the probabilities is subtracted from the probability, and the difference is divided by the standard deviation of the probabilities.
Step 206: for each offset, divide the difference between the offset and the numerically smallest offset by the difference between the numerically largest offset and the numerically smallest offset, to obtain the standard value corresponding to the offset.
Specifically, the standard value corresponding to each offset is determined according to the following fifth formula:
$$B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}$$
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the numerically largest offset and Z_min denotes the numerically smallest offset.
Each offset is normalized in this way, which scales it into the interval [0, 1] and yields the normalized standard value, improving the accuracy of the subsequent operations.
Step 207: for each standard value, determine whether the standard value is greater than the preset threshold; if so, determine that the feature vector corresponding to the standard value is abnormal.
Specifically, each standard value is compared with the preset threshold to determine whether the corresponding feature vector is abnormal. When a standard value is greater than the threshold, the standard value is abnormal, so the corresponding feature vector can be determined to be abnormal.
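The data processing operations named in step 201 (deduplication, removal of null values, filling of missing values and encoding) can be sketched as follows. The record layout, the field names, mean imputation and integer encoding are all assumptions chosen for illustration; the text does not prescribe particular techniques.

```python
def preprocess(records, numeric_fields, categorical_fields):
    """Turn raw records (dicts) into numeric feature vectors."""
    fields = numeric_fields + categorical_fields
    # 1. Remove exact duplicates while keeping the original order.
    seen, unique = set(), []
    for r in records:
        key = tuple(r.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Drop records in which every field is missing.
    rows = [r for r in unique if any(r.get(f) is not None for f in fields)]
    # 3. Fill missing numeric values with the mean of the observed values.
    means = {}
    for f in numeric_fields:
        observed = [r[f] for r in rows if r.get(f) is not None]
        means[f] = sum(observed) / len(observed) if observed else 0.0
    # 4. Encode each categorical value as an integer code.
    codes = {f: {} for f in categorical_fields}
    vectors = []
    for r in rows:
        vec = [float(r[f]) if r.get(f) is not None else means[f]
               for f in numeric_fields]
        for f in categorical_fields:
            vec.append(float(codes[f].setdefault(r.get(f), len(codes[f]))))
        vectors.append(tuple(vec))
    return vectors

records = [
    {"amount": 10.0, "channel": "web"},
    {"amount": 10.0, "channel": "web"},  # duplicate
    {"amount": None, "channel": "app"},  # missing numeric value
    {"amount": None, "channel": None},   # entirely empty
    {"amount": 30.0, "channel": "web"},
]
print(preprocess(records, ["amount"], ["channel"]))
# [(10.0, 0.0), (20.0, 1.0), (30.0, 0.0)]
```

The resulting tuples are the kind of feature vectors that steps 202 to 207 operate on; at least three of them are needed.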
As shown in Fig. 3, the present invention provides an anomaly detection device based on kernel density estimation, comprising:
a data acquisition unit 301, configured to obtain in advance at least three feature vectors that have undergone data processing;
a calculation processing unit 302, configured to determine the density estimate corresponding to each feature vector obtained by the data acquisition unit 301; determine, according to each density estimate, the probability density function of the at least three feature vectors; obtain, according to the probability density function, the probability of occurrence of each feature vector; determine the offset corresponding to each probability; and standardize each offset to obtain a corresponding standard value;
an anomaly detection unit 303, configured to determine, according to each standard value obtained by the calculation processing unit 302 and a preset threshold, whether each feature vector is abnormal.
In this embodiment, the calculation processing unit determines the density estimate corresponding to each feature vector obtained by the data acquisition unit through data processing, from which the probability density function of the feature vectors as a whole can be determined. The probability of occurrence of each feature vector is obtained from this probability density function; the offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, the anomaly detection unit compares each standard value obtained by the calculation processing unit with the preset threshold to determine whether the feature vector is abnormal. Because sparsely distributed points do not have to be sought according to a specific data-set scenario, the device has broad adaptability.
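The division of work among the three units can be sketched as three small Python classes. This is only an illustration under stated assumptions: the class and method names are hypothetical, the density is computed as a plain Gaussian kernel density estimate with Euclidean distance, and the population standard deviation is used for the offset.

```python
import math
import statistics

class DataAcquisitionUnit:
    """Unit 301: supplies feature vectors that were processed beforehand."""
    def __init__(self, vectors):
        if len(vectors) < 3:
            raise ValueError("at least three feature vectors are required")
        self.vectors = [tuple(v) for v in vectors]

    def get(self):
        return self.vectors

class CalculationProcessingUnit:
    """Unit 302: density -> probability -> offset -> standard value."""
    def __init__(self, h=1.0):
        self.h = h  # kernel bandwidth (assumed parameter)

    def _kernel(self, d):
        # Gaussian kernel, assumed for illustration.
        return math.exp(-0.5 * (d / self.h) ** 2) / (self.h * math.sqrt(2 * math.pi))

    def standard_values(self, vectors):
        n = len(vectors)
        probs = [sum(self._kernel(math.dist(x, v)) for v in vectors) / n
                 for x in vectors]
        mean, std = statistics.fmean(probs), statistics.pstdev(probs)
        if std == 0:
            raise ValueError("all probabilities are equal")
        z = [(p - mean) / std for p in probs]           # offsets
        z_min, z_max = min(z), max(z)
        return [(v - z_min) / (z_max - z_min) for v in z]  # scaled to [0, 1]

class AnomalyDetectionUnit:
    """Unit 303: compares each standard value with a preset threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def detect(self, standard_values):
        return [b > self.threshold for b in standard_values]

acquisition = DataAcquisitionUnit([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)])
values = CalculationProcessingUnit(h=1.0).standard_values(acquisition.get())
print(AnomalyDetectionUnit(threshold=0.5).detect(values))
# [True, True, True, False]
```

In this sketch a sparsely located vector has a below-average probability, hence a negative offset and a standard value near 0, while vectors in dense regions approach 1. The "greater than threshold" rule therefore selects the high end of the scale here; which end is of interest in practice presumably depends on the detection task, since the text mentions both outlier detection and cluster-center detection.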
In an embodiment of the present invention, the calculation processing unit is configured to determine the density estimate corresponding to each feature vector according to the following first formula:
$$D_i = \frac{1}{k} \sum_{p=1}^{k} H_h\big(d(i, p)\big)$$
where D_i denotes the density estimate corresponding to the i-th feature vector, k denotes the number of neighbor points of the i-th feature vector, H_h(d(i, p)) denotes the kernel function with a given preset bandwidth, and d(i, p) denotes the distance between the i-th feature vector and the p-th feature vector among its neighbor points.
In an embodiment of the present invention, the kernel function is determined according to the following second formula:
(second formula: not reproduced in this text)
or according to the following third formula:
(third formula: not reproduced in this text)
where π denotes the circle constant pi and e denotes the natural constant.
In an embodiment of the present invention, the calculation processing unit is configured to sum the density estimates to obtain the probability density function of the at least three feature vectors, and, for each feature vector, to substitute the feature vector into the probability density function to obtain the probability of occurrence of that feature vector.
In an embodiment of the present invention, the calculation processing unit is configured to determine the offset corresponding to each probability according to the following fourth formula:
$$Z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$
where Z_i denotes the offset corresponding to the i-th probability, X_i denotes the i-th probability, n denotes the number of probabilities, \bar{X} denotes the mean of the n probabilities and σ denotes their standard deviation.
In an embodiment of the present invention, the calculation processing unit is configured to determine the standard value corresponding to each offset according to the following fifth formula:
$$B_i = \frac{Z_i - Z_{\min}}{Z_{\max} - Z_{\min}}$$
where B_i denotes the standard value corresponding to the i-th offset, Z_i denotes the i-th offset, Z_max denotes the numerically largest offset and Z_min denotes the numerically smallest offset.
In an embodiment of the present invention, the anomaly detection unit is configured to determine, for each standard value, whether the standard value is greater than the preset threshold;
if so, to determine that the feature vector corresponding to the standard value is abnormal.
The embodiments of the present invention have at least the following beneficial effects:
1. In an embodiment of the present invention, the density estimate corresponding to each feature vector obtained through data processing is determined, from which the probability density function of the feature vectors as a whole can be determined. The probability of occurrence of each feature vector is obtained from this probability density function; the offset corresponding to each probability is then determined and standardized to obtain the corresponding standard value. Finally, each standard value is compared with the preset threshold to determine whether the feature vector is abnormal. Because sparsely distributed points do not have to be sought according to a specific data-set scenario, the scheme has broad adaptability.
2. In an embodiment of the present invention, once the k neighbor points of the i-th feature vector have been determined, the distance between the i-th feature vector and any feature vector p among the k neighbor points can be determined, and from it the value of the kernel function with the given bandwidth. The kernel function values are then summed and the sum is divided by the number of neighbor points, which yields the density estimate corresponding to each feature vector.
3. In an embodiment of the present invention, the kernel function with the given preset bandwidth can be determined by the above second formula or third formula, and the distance between the i-th feature vector and any feature vector p among the k neighbor points may be the Mahalanobis distance, the Euclidean distance, the Manhattan distance or the reachability distance.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Unless further limited, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
Finally, it should be noted that the above are only preferred embodiments of the present invention, intended merely to illustrate its technical solution and not to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within its scope of protection.
Claims (10)
1. An anomaly detection method based on kernel density estimation, characterized by obtaining in advance
at least three feature vectors that have undergone data processing, and further comprising:
Determining a density estimate corresponding to each feature vector;
Determining, according to each density estimate, a probability density function of the at least three feature vectors;
Obtaining, according to the probability density function, the probability of occurrence of each feature vector;
Determining an offset corresponding to each probability;
Standardizing each offset to obtain a corresponding standard value;
Determining, according to each standard value and a preset threshold, whether each feature vector is abnormal.
2. The method according to claim 1, characterized in that
Determining the density estimate corresponding to each feature vector comprises:
Determining the density estimate corresponding to each feature vector according to the following first formula:
Wherein Di denotes the density estimate corresponding to the i-th feature vector, k denotes the number of
neighbor points of the i-th feature vector, and Hh(d(i, p)) denotes the kernel function with a given
preset bandwidth, where d(i, p) denotes the distance between the i-th feature vector and the p-th
feature vector.
3. The method according to claim 2, characterized in that
The kernel function is determined as follows:
According to the following second formula, the kernel function is determined:
Or,
According to the following third formula, the kernel function is determined:
Wherein π denotes pi (the ratio of a circle's circumference to its diameter) and e denotes the natural constant.
4. The method according to claim 1, characterized in that
Determining, according to each density estimate, the probability density function of the at least three feature vectors comprises:
Summing the density estimates to obtain the probability density function of the at least three feature vectors;
And,
Obtaining, according to the probability density function, the probability of occurrence of each feature vector comprises:
For each feature vector, substituting the feature vector into the probability density function to obtain
the probability of occurrence of that feature vector.
5. The method according to any one of claims 1 to 4, characterized in that
Determining the offset corresponding to each probability comprises:
Determining the offset corresponding to each probability according to the following fourth formula:
Wherein Zi denotes the offset corresponding to the i-th probability, Xi denotes the i-th probability, and n
denotes the number of probabilities;
And/or
Standardizing each offset to obtain a corresponding standard value comprises:
Determining the standard value corresponding to each offset according to the following fifth formula:
Wherein Bi denotes the standard value corresponding to the i-th offset, Zi denotes the i-th offset, Zmax
denotes the offset with the largest value, and Zmin denotes the offset with the smallest value;
And/or
Determining, according to each standard value and the preset threshold, whether each feature vector is abnormal comprises:
For each standard value, determining whether the standard value is greater than the preset threshold;
If so, determining that the feature vector corresponding to the standard value is abnormal.
6. An anomaly detection device based on kernel density estimation, characterized by comprising:
A data acquisition unit, configured to obtain in advance at least three feature vectors that have undergone data processing;
A calculation processing unit, configured to: determine a density estimate corresponding to each feature
vector obtained by the data acquisition unit; determine, according to each density estimate, a
probability density function of the at least three feature vectors; obtain, according to the probability
density function, the probability of occurrence of each feature vector; determine an offset
corresponding to each probability; and standardize each offset to obtain a corresponding standard value;
An anomaly detection unit, configured to determine, according to each standard value obtained by the
calculation processing unit and a preset threshold, whether each feature vector is abnormal.
7. The device according to claim 6, characterized in that
The calculation processing unit is configured to determine the density estimate corresponding to each
feature vector according to the following first formula:
Wherein Di denotes the density estimate corresponding to the i-th feature vector, k denotes the number of
neighbor points of the i-th feature vector, and Hh(d(i, p)) denotes the kernel function with a given
preset bandwidth, where d(i, p) denotes the distance between the i-th feature vector and the p-th
feature vector.
8. The device according to claim 7, characterized in that
The kernel function is determined as follows:
According to the following second formula, the kernel function is determined:
Or,
According to the following third formula, the kernel function is determined:
Wherein π denotes pi (the ratio of a circle's circumference to its diameter) and e denotes the natural constant.
9. The device according to claim 6, characterized in that
The calculation processing unit is configured to sum the density estimates to obtain the probability
density function of the at least three feature vectors, and, for each feature vector, to substitute the
feature vector into the probability density function to obtain the probability of occurrence of that
feature vector.
10. The device according to any one of claims 6 to 9, characterized in that
The calculation processing unit is configured to determine the offset corresponding to each probability
according to the following fourth formula:
Wherein Zi denotes the offset corresponding to the i-th probability, Xi denotes the i-th probability, and n
denotes the number of probabilities;
And/or
The calculation processing unit is configured to determine the standard value corresponding to each offset
according to the following fifth formula:
Wherein Bi denotes the standard value corresponding to the i-th offset, Zi denotes the i-th offset, Zmax
denotes the offset with the largest value, and Zmin denotes the offset with the smallest value;
And/or
The anomaly detection unit is configured to determine, for each standard value, whether the standard value
is greater than the preset threshold;
If so, to determine that the feature vector corresponding to the standard value is abnormal.
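The last three steps recited in the claims (offset, standardization, threshold comparison) can be sketched in Python as follows. The fourth and fifth formulas are not reproduced in this text, so both forms below are assumptions: the offset is taken to be the deviation of each probability below the mean, divided by the population standard deviation, and the standard value is taken to be a min-max scaling built from Zi, Zmax, and Zmin. The function names `offsets`, `standardize`, and `detect` are hypothetical.

```python
import statistics


def offsets(probabilities):
    """Assumed offset: how far each probability lies below the mean, in standard deviations."""
    mean = statistics.fmean(probabilities)
    std = statistics.pstdev(probabilities)
    if std == 0:
        # All probabilities are equal, so no value deviates from the mean.
        return [0.0 for _ in probabilities]
    return [(mean - x) / std for x in probabilities]


def standardize(zs):
    """Assumed standardization: min-max scaling of the offsets into [0, 1]."""
    z_min, z_max = min(zs), max(zs)
    if z_max == z_min:
        return [0.0 for _ in zs]
    return [(z - z_min) / (z_max - z_min) for z in zs]


def detect(probabilities, threshold):
    """Flag a feature vector as abnormal when its standard value exceeds the threshold."""
    standard_values = standardize(offsets(probabilities))
    return [b > threshold for b in standard_values]


if __name__ == "__main__":
    probs = [0.40, 0.38, 0.41, 0.02]
    print(detect(probs, threshold=0.8))
```

Under these assumptions, a low probability of occurrence produces a large offset and therefore a standard value close to 1, which matches the claimed rule that a standard value greater than the preset threshold marks the feature vector as abnormal.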
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811219917.5A CN109359138A (en) | 2018-10-19 | 2018-10-19 | A kind of method for detecting abnormality and device based on Density Estimator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109359138A true CN109359138A (en) | 2019-02-19 |
Family
ID=65345921
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811219917.5A | Pending | CN109359138A (en) | 2018-10-19 | 2018-10-19 | A kind of method for detecting abnormality and device based on Density Estimator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109359138A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110098983A (en) * | 2019-05-28 | 2019-08-06 | 上海优扬新媒信息技术有限公司 | A kind of detection method and device of abnormal flow |
CN110806733A (en) * | 2019-10-30 | 2020-02-18 | 中国神华能源股份有限公司国华电力分公司 | Thermal power plant equipment monitoring method and device and electronic equipment |
CN111683102A (en) * | 2020-06-17 | 2020-09-18 | 绿盟科技集团股份有限公司 | FTP behavior data processing method, and method and device for identifying abnormal FTP behavior |
CN112232719A (en) * | 2020-12-11 | 2021-01-15 | 北京基调网络股份有限公司 | Index quantitative scoring method, computer equipment and storage medium |
CN114896024A (en) * | 2022-03-28 | 2022-08-12 | 同方威视技术股份有限公司 | Method and device for detecting running state of virtual machine based on kernel density estimation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6115708A (en) * | 1998-03-04 | 2000-09-05 | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering |
CN103916896A (en) * | 2014-03-26 | 2014-07-09 | 浙江农林大学 | Anomaly detection method based on multi-dimensional Epanechnikov kernel density estimation |
CN105721199A (en) * | 2016-01-18 | 2016-06-29 | 中国石油大学(华东) | Real-time cloud service bottleneck detection method based on kernel density estimation and fuzzy inference system |
CN106789885A (en) * | 2016-11-17 | 2017-05-31 | 国家电网公司 | User's unusual checking analysis method under a kind of big data environment |
CN107092582A (en) * | 2017-03-31 | 2017-08-25 | 江苏方天电力技术有限公司 | One kind is based on the posterior exceptional value on-line checking of residual error and method for evaluating confidence |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110098983A (en) * | 2019-05-28 | 2019-08-06 | 上海优扬新媒信息技术有限公司 | A kind of detection method and device of abnormal flow |
CN110098983B (en) * | 2019-05-28 | 2021-06-04 | 上海优扬新媒信息技术有限公司 | Abnormal flow detection method and device |
CN110806733A (en) * | 2019-10-30 | 2020-02-18 | 中国神华能源股份有限公司国华电力分公司 | Thermal power plant equipment monitoring method and device and electronic equipment |
CN110806733B (en) * | 2019-10-30 | 2021-09-21 | 中国神华能源股份有限公司国华电力分公司 | Thermal power plant equipment monitoring method and device and electronic equipment |
CN111683102A (en) * | 2020-06-17 | 2020-09-18 | 绿盟科技集团股份有限公司 | FTP behavior data processing method, and method and device for identifying abnormal FTP behavior |
CN111683102B (en) * | 2020-06-17 | 2022-12-06 | 绿盟科技集团股份有限公司 | FTP behavior data processing method, and method and device for identifying abnormal FTP behavior |
CN112232719A (en) * | 2020-12-11 | 2021-01-15 | 北京基调网络股份有限公司 | Index quantitative scoring method, computer equipment and storage medium |
CN114896024A (en) * | 2022-03-28 | 2022-08-12 | 同方威视技术股份有限公司 | Method and device for detecting running state of virtual machine based on kernel density estimation |
CN114896024B (en) * | 2022-03-28 | 2022-11-22 | 同方威视技术股份有限公司 | Method and device for detecting running state of virtual machine based on kernel density estimation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109359138A (en) | A kind of method for detecting abnormality and device based on Density Estimator | |
CN110995508B (en) | KPI mutation-based adaptive unsupervised online network anomaly detection method | |
US7613668B2 (en) | Anomaly detection system and a method of teaching it | |
CN110895526A (en) | Method for correcting data abnormity in atmosphere monitoring system | |
CN107679734A (en) | It is a kind of to be used for the method and system without label data classification prediction | |
CN111811567B (en) | Equipment detection method based on curve inflection point comparison and related device | |
US7716152B2 (en) | Use of sequential nearest neighbor clustering for instance selection in machine condition monitoring | |
CN113344133B (en) | Method and system for detecting abnormal fluctuation of time sequence behaviors | |
CN112788066A (en) | Abnormal flow detection method and system for Internet of things equipment and storage medium | |
CN112258689B (en) | Ship data processing method and device and ship data quality management platform | |
CN108647737A (en) | A kind of auto-adaptive time sequence variation detection method and device based on cluster | |
Weiß | Continuously monitoring categorical processes | |
CN116066343A (en) | Intelligent early warning method and system for fault model of oil delivery pump unit | |
CN117591836B (en) | Pipeline detection data analysis method and related device | |
CN109584232A (en) | Equipment use state on-line monitoring method, system and terminal based on image recognition | |
Tang et al. | Traffic outlier detection by density-based bounded local outlier factors | |
CN113723861A (en) | Abnormal electricity consumption behavior detection method and device, computer equipment and storage medium | |
CN112949714A (en) | Fault possibility estimation method based on random forest | |
CN108268901A (en) | A kind of algorithm that environmental monitoring abnormal data is found based on dynamic time warping distance | |
CN113987243A (en) | Image file gathering method, image file gathering device and computer readable storage medium | |
CN110046651B (en) | Pipeline state identification method based on monitoring data multi-attribute feature fusion | |
CN117150233B (en) | Power grid abnormal data management method, system, equipment and medium | |
CN108268467B (en) | Attribute-based abnormal data detection method and device | |
CN106960183A (en) | A kind of image pedestrian's detection algorithm that decision tree is lifted based on gradient | |
CN115830341A (en) | Camera offset detection method based on feature point matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190219 |