CN107545273A - A kind of local outlier detection method based on density - Google Patents
- Publication number: CN107545273A
- Application number: CN201710559390.XA
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Complex Calculations (AREA)
Abstract
The invention provides a density-based local outlier detection method that takes full account of how dispersed an object and its neighborhood objects are. Compared with traditional algorithms, the outlier values obtained by the invention are more sensitive to the degree of abnormality in scattered data sets, and the detection results are more accurate. The method comprises the following steps: (1) normalize the attributes of the data set; (2) search for the k data objects nearest to each object under test; (3) calculate the mean of the distances between the object and its neighborhood objects, recorded as the k-neighborhood distance of the object; (4) globally normalize the k-neighborhood distances of the data; (5) calculate the variance of the distances between the object and its neighborhood objects, recorded as the k-neighborhood variance of the object; (6) globally normalize the k-neighborhood variances of the data; (7) calculate the neighborhood dispersion of the object; (8) calculate the neighborhood density of the object; (9) calculate the local outlier factor of the object; (10) determine the objects with the largest outlier values to be the outliers.
Description
Technical field
The present invention relates to a density-based local outlier detection method and belongs to the field of computer science and technology.
Background art
Anomaly detection is one of the basic tasks of data mining; its purpose is to eliminate noise or to discover potentially meaningful knowledge. Since the 1980s, research on anomaly detection has gone through several cycles of rise and decline. In recent years, the development of information technology, the pull of practical demand in application fields, and the rapid development of sensor technology have made it easy to acquire large amounts of data. As a result, anomaly detection has once again become an active branch of information science. It has attracted wide attention in data streams, data mining, machine learning, statistics and other fields, and has broad application prospects. Today, anomaly detection is widely used in intrusion detection, fraud detection, industrial damage detection, health care monitoring, and so on.
The earliest outlier detection algorithms operated on the whole data set, and their result was a set of global outliers. In many practical applications, however, the data obtained are incomplete, and users are often concerned only with local instability. In contrast to global outliers, anomalous objects identified with respect to a local region are called local outliers. Since the local outlier factor (LOF) was proposed, many local outlier detection methods have appeared. Detecting local outliers requires solving two sub-problems: determining the local neighborhood, and comparing an object with its neighborhood. Existing algorithms improve and extend the LOF algorithm from different angles and achieve good detection results on data sets with certain specific distributions. However, existing local outlier detection algorithms consider only the degree to which an object deviates from its neighborhood objects as a whole and ignore the overall dispersion among them, so their precision is seriously affected when they are applied to scattered data sets.
Summary of the invention
To solve the above problem, the present invention defines the k-neighborhood dispersion of an object using the expectation and variance of the distances between the object and its neighborhood data, redefines the local outlier factor in terms of the k-neighborhood dispersion, and proposes a new local outlier detection method. Compared with conventional methods, the outlier values obtained by this method are more sensitive to the degree of abnormality in scattered data sets, and the detection results are more accurate.
Specifically, the invention provides a density-based local outlier detection method, the method comprising:
Step 1, normalize each attribute of the data set;
Step 2, search for the k objects nearest to each object;
Step 3, calculate the mean of the distances between each object and its neighborhood objects, recorded as the k-neighborhood distance of the object;
Step 4, globally normalize the k-neighborhood distance of each object;
Step 5, calculate the variance of the distances between each object and its neighborhood objects, recorded as the k-neighborhood variance of the object;
Step 6, globally normalize the k-neighborhood variance of each object;
Step 7, calculate the k-neighborhood dispersion of each object;
Step 8, calculate the k-neighborhood density of each object;
Step 9, calculate the local outlier factor of each object;
Step 10, determine the outlier objects of the data set;
Wherein, the attribute normalization of step 1 can be expressed as: [equation not reproduced]
where $a_{ji}$ denotes the value of the i-th dimension of the j-th data object in the data set.
Wherein, k in step 2 is a threshold given in advance.
Wherein, the neighborhood dispersion $N_{k\text{-disp}}(o)$ of step 7 can be expressed as:
$$N_{k\text{-disp}}(o) = \left(Nn_{k\text{-adist}}(o) + 1\right)^{N_{k\text{-adist}}(o) \cdot Nn_{k\text{-vari}}(o)}$$
where $Nn_{k\text{-adist}}(o)$ denotes the globally normalized k-neighborhood distance of the object, $Nn_{k\text{-vari}}(o)$ denotes the globally normalized k-neighborhood variance of the object, and $N_{k\text{-adist}}(o)$ denotes the k-neighborhood distance of the object.
Wherein, the neighborhood density $N_{k\text{-dens}}(o)$ of step 8 can be expressed as:
$$N_{k\text{-dens}}(o) = \frac{N_{k\text{-disp}}(o)}{N_{k\text{-adist}}(o)}$$
where $N_{k\text{-dens}}(o)$ denotes the neighborhood density of the object and $N_{k\text{-adist}}(o)$ denotes the k-neighborhood distance of the object. When object o coincides with all of its neighborhood objects, $N_{k\text{-adist}}(o)$ is zero and the quotient is undefined. To keep $N_{k\text{-dens}}(o)$ meaningful while ensuring that the k-neighborhood density of o is the largest, $N_{k\text{-dens}}(o)$ is in that case directly set to a value slightly larger than the neighborhood density of every other object in the data set.
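As a small worked illustration of the dispersion and density expressions stated in claims 2 and 3, the block below plugs in hypothetical values (chosen only to make the arithmetic easy, not taken from the patent): $Nn_{k\text{-adist}}(o) = 0.5$, $Nn_{k\text{-vari}}(o) = 0.25$ and $N_{k\text{-adist}}(o) = 2$.

```latex
\begin{aligned}
N_{k\text{-disp}}(o) &= \left(Nn_{k\text{-adist}}(o) + 1\right)^{N_{k\text{-adist}}(o)\cdot Nn_{k\text{-vari}}(o)}
  = (0.5 + 1)^{2 \times 0.25} = 1.5^{0.5} \approx 1.2247,\\
N_{k\text{-dens}}(o) &= \frac{N_{k\text{-disp}}(o)}{N_{k\text{-adist}}(o)}
  = \frac{1.2247}{2} \approx 0.6124.
\end{aligned}
```

Because the base is at least 1 and the exponent is non-negative, the dispersion is never below 1. For a fixed mean distance, a larger normalized variance therefore yields a larger computed density value.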
Wherein, the determination of the outlier objects in step 10 includes:
Step 101, sort the data set in descending order of the objects' outlier values;
Step 102, take the $m_{outl}$ objects with the largest outlier values as the outliers of the data set, where $m_{outl}$ is a threshold given in advance.
The beneficial effect of the present invention is as follows: the invention takes into account the dispersion of an object and its neighborhood objects, defines the k-neighborhood dispersion of the object using the expectation and variance of the distances between the object and its neighborhood objects, and redefines the local outlier factor using the k-neighborhood dispersion. It thereby takes full account of the distribution of the data set. The outlier values obtained by the algorithm are more sensitive to the degree of abnormality of data objects in scattered data sets, and the detection results are more accurate.
Brief description of the drawings
Fig. 1 is a flow chart of the density-based local outlier detection method of the present invention.
Detailed description of the embodiments
To make the object, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
Assume the data set $O = \{o_1, o_2, \ldots, o_n\}$ consists of n objects, and each object $o_i = \{a_1, a_2, \ldots, a_m\}$ ($1 \le i \le n$) is m-dimensional data. The main idea is to take into account the distribution of an object and its neighborhood objects and to use the expectation and variance of the distances between them to express the local outlier factor of the object. The outlier coefficient obtained by the new algorithm represents more accurately the degree to which an object is an outlier within its local range, which improves the accuracy and the scope of application of the outlier detection algorithm.
Each step is described in detail below for the data set O and an arbitrary object o in it:
Step 1, for any dimension i of the data set, normalize $A_i = [a_{1i}, a_{2i}, \ldots, a_{ni}]$ ($1 \le i \le m$);
Further, the normalization of the values in any dimension can be expressed as: [equation not reproduced]
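The normalization formula itself is not reproduced in this text, so the following is only a minimal sketch under the assumption that step 1 uses min-max scaling of each attribute to [0, 1]. The function name `normalize_attributes` and the handling of constant columns are illustrative choices, not part of the patent.

```python
import numpy as np

def normalize_attributes(data):
    """Min-max scale each column (attribute) of an (n, m) array to [0, 1].

    Assumption: the patent's step-1 formula is not available here, so
    min-max scaling stands in for it.
    """
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # A constant column has zero range; divide by 1 so it maps to 0
    # instead of producing a division by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / safe_range

# Three objects (rows) with two attributes (columns).
X = np.array([[1.0, 10.0], [2.0, 10.0], [4.0, 30.0]])
X_norm = normalize_attributes(X)
```

Scaling every attribute to a common range keeps attributes with large numeric ranges from dominating the Euclidean distances computed in step 2.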
Step 2, determine the set $N_k(o)$ of the k data objects in data set O nearest to object o, with $|N_k(o)| = k$;
Step 21, calculate the Euclidean distance between object o and every other object;
Step 22, sort the objects in data set O other than o by distance in ascending order and take the first k objects as the k-neighborhood of object o;
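Steps 21 and 22 can be sketched as a brute-force search. This is an illustrative implementation, not the patent's own code: the function name `k_nearest_neighbors` is hypothetical, ties are broken by object index, and the full pairwise difference array makes it suitable only for small data sets with k < n.

```python
import numpy as np

def k_nearest_neighbors(data, k):
    """Return (indices, distances) of the k nearest objects for every object."""
    data = np.asarray(data, dtype=float)
    # Step 21: pairwise Euclidean distances between all objects.
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # An object is not its own neighbor, so push the diagonal to infinity.
    np.fill_diagonal(dist, np.inf)
    # Step 22: sort ascending by distance and keep the first k objects.
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    nn_dist = np.take_along_axis(dist, idx, axis=1)
    return idx, nn_dist

# Four one-dimensional objects; the last one lies far from the rest.
points = np.array([[0.0], [1.0], [3.0], [10.0]])
idx, nn_dist = k_nearest_neighbors(points, k=2)
```

For larger data sets a spatial index would normally replace the dense distance matrix, but the result per object is the same k-neighborhood.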
Step 3, calculate the mean of the distances between object o and its k-neighborhood objects, denoted $N_{k\text{-adist}}(o)$;
Further, $N_{k\text{-adist}}(o)$ can be expressed as: [equation not reproduced]
Step 4, globally normalize $N_{k\text{-adist}}(o)$, and denote the result $Nn_{k\text{-adist}}(o)$;
Further, $Nn_{k\text{-adist}}(o)$ can be expressed as: [equation not reproduced]
Step 5, calculate the variance of the distances between object o and its k-neighborhood objects, denoted $N_{k\text{-vari}}(o)$;
Further, $N_{k\text{-vari}}(o)$ can be expressed as: [equation not reproduced]
Step 6, globally normalize $N_{k\text{-vari}}(o)$, and denote the result $Nn_{k\text{-vari}}(o)$;
Further, $Nn_{k\text{-vari}}(o)$ can be expressed as: [equation not reproduced]
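Steps 3 to 6 can be sketched as follows. Since the exact formulas are not reproduced in this text, two assumptions are made and flagged in the code: the variance is the population variance (dividing by k), and "global normalization" is min-max scaling over all objects. The function names are hypothetical.

```python
import numpy as np

def neighborhood_stats(nn_dist):
    """Mean and variance of each object's k neighbor distances.

    nn_dist is an (n, k) array: row i holds the distances from object i
    to its k nearest neighbors.
    """
    nn_dist = np.asarray(nn_dist, dtype=float)
    adist = nn_dist.mean(axis=1)   # step 3: k-neighborhood distance
    # Step 5: k-neighborhood variance. Assumption: population variance
    # (divide by k); the patent's exact formula is not available here.
    vari = nn_dist.var(axis=1)
    return adist, vari

def global_normalize(values):
    """Steps 4 and 6. Assumption: min-max scaling over the whole data set."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

# Neighbor distances for four objects with k = 2.
nn_dist = np.array([[1.0, 3.0], [1.0, 2.0], [2.0, 3.0], [7.0, 9.0]])
adist, vari = neighborhood_stats(nn_dist)
n_adist = global_normalize(adist)
n_vari = global_normalize(vari)
```

The normalized quantities lie in [0, 1], which keeps the base of the dispersion expression in step 7 within [1, 2].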
Step 7, calculate the k-neighborhood dispersion of object o, denoted $N_{k\text{-disp}}(o)$;
Further, $N_{k\text{-disp}}(o)$ can be expressed as:
$$N_{k\text{-disp}}(o) = \left(Nn_{k\text{-adist}}(o) + 1\right)^{N_{k\text{-adist}}(o) \cdot Nn_{k\text{-vari}}(o)}$$
Step 8, calculate the k-neighborhood density of object o, denoted $N_{k\text{-dens}}(o)$;
Further, $N_{k\text{-dens}}(o)$ can be expressed as:
$$N_{k\text{-dens}}(o) = \frac{N_{k\text{-disp}}(o)}{N_{k\text{-adist}}(o)}$$
When object o coincides with all of its neighborhood objects, $N_{k\text{-adist}}(o)$ is zero and the quotient is undefined. To keep $N_{k\text{-dens}}(o)$ meaningful while ensuring that the k-neighborhood density of o is the largest, $N_{k\text{-dens}}(o)$ is in that case directly set to a value slightly larger than the neighborhood density of every other object in the data set.
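The dispersion and density expressions stated in claims 2 and 3, together with the special case for coincident objects, can be sketched as below. The function name, the `margin` parameter that encodes "slightly larger", and the fallback used when every object is coincident are all illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def dispersion_and_density(adist, n_adist, n_vari, margin=1e-6):
    """Compute N_k-disp and N_k-dens for every object.

    adist   : k-neighborhood distances (mean neighbor distance)
    n_adist : globally normalized k-neighborhood distances
    n_vari  : globally normalized k-neighborhood variances
    margin  : hypothetical amount by which a coincident object's density
              exceeds the largest regular density
    """
    adist = np.asarray(adist, dtype=float)
    n_adist = np.asarray(n_adist, dtype=float)
    n_vari = np.asarray(n_vari, dtype=float)

    # Claim 2: disp = (Nn_adist + 1) ** (N_adist * Nn_vari)
    disp = (n_adist + 1.0) ** (adist * n_vari)

    # Claim 3: dens = disp / N_adist, undefined when N_adist == 0.
    coincident = adist == 0.0
    dens = np.empty_like(adist)
    dens[~coincident] = disp[~coincident] / adist[~coincident]
    if coincident.any():
        # Coincident objects get a density slightly above all others.
        # The fallback of 1.0 when every object is coincident is an assumption.
        top = dens[~coincident].max() if (~coincident).any() else 1.0
        dens[coincident] = top + margin
    return disp, dens

# Hypothetical inputs; the second object coincides with its neighbors.
disp, dens = dispersion_and_density(
    adist=[2.0, 0.0, 4.0], n_adist=[0.5, 0.0, 1.0], n_vari=[0.25, 0.0, 0.5]
)
```

Handling the zero-distance case explicitly avoids a division by zero while preserving the stated intent that such an object has the largest density in the data set.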
Step 9, calculate the k-neighborhood outlier degree of object o, denoted VLDC(o);
Further, VLDC(o) can be expressed as: [equation not reproduced]
Step 10, determine the outlier objects of the data set;
Step 101, sort the data set in descending order of outlier coefficient;
Step 102, take the $m_{outl}$ objects with the largest outlier coefficients as the outliers of the data set, where $m_{outl}$ is a threshold given in advance;
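Steps 101 and 102 amount to a descending sort followed by a cut at the given threshold. The sketch below assumes the outlier coefficients have already been computed; the scores shown are hypothetical, and the function name `top_outliers` is illustrative.

```python
import numpy as np

def top_outliers(scores, m_outl):
    """Return the indices of the m_outl objects with the largest scores."""
    scores = np.asarray(scores, dtype=float)
    # Step 101: sort in descending order of outlier coefficient
    # (a stable sort keeps the original order among equal scores).
    order = np.argsort(-scores, kind="stable")
    # Step 102: keep the first m_outl objects.
    return order[:m_outl]

# Hypothetical outlier coefficients for four objects.
scores = np.array([0.9, 3.2, 1.1, 2.5])
outlier_idx = top_outliers(scores, m_outl=2)
```

Because the method ranks objects rather than applying a fixed score cutoff, the number of reported outliers is controlled entirely by the threshold given in advance.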
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.
Claims (3)
- 1. A density-based local outlier detection method, characterized by comprising the following steps: Step 1, normalize each attribute of the data set; Step 2, search for the k objects nearest to each object; Step 3, calculate the mean of the distances between each object and its neighborhood objects, recorded as the k-neighborhood distance of the object; Step 4, globally normalize the k-neighborhood distance of each object; Step 5, calculate the variance of the distances between each object and its neighborhood objects, recorded as the k-neighborhood variance of the object; Step 6, globally normalize the k-neighborhood variance of each object; Step 7, calculate the k-neighborhood dispersion of each object; Step 8, calculate the k-neighborhood density of each object; Step 9, calculate the local outlier factor of each object; Step 10, determine the outlier objects of the data set.
- 2. The method according to claim 1, wherein the neighborhood dispersion $N_{k\text{-disp}}(o)$ of step 7 can be expressed as: $$N_{k\text{-disp}}(o) = \left(Nn_{k\text{-adist}}(o) + 1\right)^{N_{k\text{-adist}}(o) \cdot Nn_{k\text{-vari}}(o)}$$ where $Nn_{k\text{-adist}}(o)$ denotes the globally normalized k-neighborhood distance of the object, $Nn_{k\text{-vari}}(o)$ denotes the globally normalized k-neighborhood variance of the object, and $N_{k\text{-adist}}(o)$ denotes the k-neighborhood distance of the object.
- 3. The method according to claim 1, wherein the neighborhood density $N_{k\text{-dens}}(o)$ can be expressed as: $$N_{k\text{-dens}}(o) = \frac{N_{k\text{-disp}}(o)}{N_{k\text{-adist}}(o)}$$ where $N_{k\text{-dens}}(o)$ denotes the neighborhood density of the object and $N_{k\text{-adist}}(o)$ denotes the k-neighborhood distance of the object. When object o coincides with all of its neighborhood objects, in order to keep $N_{k\text{-dens}}(o)$ meaningful while ensuring that the k-neighborhood density of o is the largest, $N_{k\text{-dens}}(o)$ is directly set to a value slightly larger than the neighborhood density of every other object in the data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710559390.XA CN107545273A (en) | 2017-07-06 | 2017-07-06 | A kind of local outlier detection method based on density |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107545273A true CN107545273A (en) | 2018-01-05 |
Family
ID=60971124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710559390.XA Pending CN107545273A (en) | 2017-07-06 | 2017-07-06 | A kind of local outlier detection method based on density |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107545273A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180105 |