CN109508733A - An anomaly detection method based on distribution-probability similarity measurement - Google Patents

An anomaly detection method based on distribution-probability similarity measurement

Info

Publication number
CN109508733A
Authority
CN
China
Prior art keywords
data
sample
training
node
test point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811233705.2A
Other languages
Chinese (zh)
Inventor
高欣
井潇
何杨
查森
纪维佳
任昺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201811233705.2A
Publication of CN109508733A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Abstract

An embodiment of the present invention provides an anomaly detection method based on a distribution-probability similarity measure, comprising: drawing multiple subsets of the normal sample data by repeated random sampling, recording the random isolation process of each subset in a full binary tree, and setting the backtracking threshold depth according to a deviation ratio; for each tree, locating the external leaf node into which the test point falls, tracing back from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which the similarity of the test point is measured; and, taking the test point and a point of the training data set as endpoints, computing in each attribute dimension the probability that the remaining data points fall between the two endpoints, combining these probabilities with the Minkowski distance to compute the dissimilarity between the test point and every point of the data set, and thereby obtaining the anomaly score of the test point. The technical solution provided by the embodiment can effectively address anomaly detection when the training data set contains no anomalous data, as well as local anomaly detection.

Description

An anomaly detection method based on distribution-probability similarity measurement
[Technical Field]
The present invention relates to anomaly detection methods in the field of machine learning, and more particularly to an anomaly detection method based on a distribution-probability similarity measure.
[Background]
When machine learning methods are applied to anomaly detection, several problems arise: no anomalous data are available for training, anomalies may be local, and the dimensions of the data differ widely in scale and distribution. Solving unsupervised local anomaly detection with a suitable algorithm, so as to improve the model's ability to distinguish normal samples from anomalous samples as a whole, is a current research hotspot. Conventional approaches to anomaly detection fall into three types. The first is statistical: it judges whether a point is anomalous from the probability with which each dimension of the data point, or each combination of dimensions, occurs. The second is distance-based: it judges whether a test point is anomalous by computing how close it is, locally or globally, to the normal data. The third is based on the data distribution: it judges anomalies by clustering or by computing relative density.
However, these methods rest on certain assumptions, for example that anomalous points lie far from the clusters of normal data, or that the density of anomalous data is far lower than that of normal data. In real applications, anomalous data may be highly concentrated, may lie close to the normal clusters, or may even be enclosed inside a normal cluster. The assumptions above are overly idealized; in practice they often fail or hold only rarely, which makes the detection performance of the model unstable. In addition, clustered anomalies appear in events such as disease outbreaks and network attacks. When such anomalies occur, a large amount of anomalous data concentrates in one or more regions, that is, the anomalies are concentrated and densely distributed, and density-based algorithms cannot judge this situation correctly. Yet concentrated anomalies have great research value: scattered anomalous points indicate that such anomalies occur with low probability, whereas a region of concentrated anomalies indicates a relatively high probability of occurrence, and detecting it reduces losses as far as possible. In a network attack, for example, detecting clustered anomalies in time and identifying the attack pattern gives network operations staff effective information support and keeps the system from being compromised.
In summary, anomaly detection currently faces the following difficulties: anomalous data are hard to collect; anomalous data lying close to the normal data produce local anomalies, and most existing algorithms judge the degree of anomaly from either the global or the local data distribution alone and cannot consider both together; anomalous data may be densely distributed, with density similar to or even higher than that of the normal data, so that algorithms based on relative density easily judge dense anomalous clusters to be normal; the value ranges of the attributes differ widely, so that traditional distances such as the Euclidean distance weight the dimensions very unequally, while prior normalization or standardization alters the original distribution of the data; anomalous data may lie inside the normal data, with overlapping ranges; and the points of a labeled training set are not all concentrated at high density in a single region, so training data carrying redundant information must be screened out to keep subsequent processing free of interference.
[Summary of the Invention]
In view of this, an embodiment of the present invention provides an anomaly detection method based on a distribution-probability similarity measure, so as to improve the model's overall ability to distinguish positive and negative samples.
The anomaly detection method based on a distribution-probability similarity measure provided by the embodiment of the present invention comprises:
drawing multiple subsets of the normal sample data by repeated random sampling, recording the random isolation process of each subset in a full binary tree, and setting the backtracking threshold depth according to a deviation ratio;
for each tree, according to the external leaf node into which the test point falls and the threshold depth, tracing back from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which the similarity of the test point is measured;
taking the test point and a point of the training data set as endpoints, computing in each attribute dimension the probability that the remaining data points fall between the two endpoints, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and every point of the data set, to obtain the anomaly score of the test point.
In the above method, the subsets of the normal sample data are drawn, the random isolation process of each subset is recorded in a full binary tree, and the backtracking threshold depth is set according to the deviation ratio as follows. Several training subsets X_all are obtained from the training data set D by random sampling. Each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and is chosen according to the actual situation. Each sample has n dimensions, i.e., the i-th sample is $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$. A dimension and an isolation threshold are selected at random, the threshold being a random value between the minimum and the maximum of the subset in that dimension. The split is iterated until one of the following three conditions holds: (1) each isolated subspace contains only one sample; (2) all samples in the subspace have the same value in the selected dimension; (3) the iteration limit is reached. The process is recorded in a tree structure that forms a full binary tree: every node has either zero or two child nodes, leaf nodes store the samples of each isolated subspace, and internal nodes store the isolation dimension and the corresponding threshold. Dt, the depth threshold used when tracing back through a tree to search for neighboring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees:

$$E(h(x)) = \frac{1}{t}\sum_{i=1}^{t} l_i(x)$$

where E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen according to the actual situation, and l_i(x) is the path depth of x in the i-th tree.
Setting Dt requires a deviation ratio r, 0 ≤ r ≤ 1, i.e., the proportion of data in the normal training data set D that deviates relatively from the normal data distribution. The value of r is chosen by weighing the dispersion of the data distribution against the performance requirements placed on the model in practice. Given r, the minimum among the top ((1 − r) × 100)% of the training samples' mean depths is selected as the depth threshold Dt for tracing back to search for local training points in a tree.
In the above method, the training data against which the similarity of the test point is measured are extracted as follows. The test point is fed into every tree. If the test sample falls into a node below depth Dt, the method traces back upward from the node containing the test point to its ancestor node at depth Dt and extracts all training samples Ltd under that ancestor node as the data used to compute the degree of anomaly of the test sample; the data extracted from all trees are merged to form the training samples for the next step. If the test sample falls above depth Dt, all data in Ltd are used as the training samples for the next step: such a test sample lies far from the distribution of the normal data as a whole and has few local training data nearby, so all training samples under Dt are extracted as the training samples for the next step, in order to further verify the degree of anomaly of the test data.
In the above method, the anomaly score of the test point is obtained as follows. First, R_i(x, y) is defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd. Let S be the subspace occupied by all data of Ltd, and S_i the range of S in the i-th dimension. If $y_i \notin S_i$, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S). Num_i(x, y, d, S) is a Boolean indicating whether the i-th dimension value d_i lies within R_i(x, y), where d ranges over the samples of Ltd other than x. M_i(x, y | Ltd, S) is the number of training points of Ltd lying between x and y in the i-th dimension:

$$M_i(x, y \mid Ltd, S) = \sum_{d \in Ltd,\ d \neq x} I\big(d_i \in R_i(x, y)\big)$$

where I(·) is the indicator function, equal to 1 if the condition in brackets is true and 0 otherwise. The ratio of M_i(x, y | Ltd, S) to the size of Ltd is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed as:

$$D'(x, y) = \left(\sum_{i=1}^{n}\left(\frac{M_i(x, y \mid Ltd, S)}{|Ltd|}\right)^{p}\right)^{1/p}$$

where p is the exponent of the Minkowski distance. Finally, the anomaly score p(y) of test point y is:

$$p(y) = \sum_{x \in Ltd} D'(x, y)$$

where p(y) is the sum of the dissimilarities between test point y and all points of Ltd. The test points are ranked by p(y); the larger p(y), the higher the degree of anomaly.
[Brief Description of the Drawings]
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Evidently, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the anomaly detection method based on a distribution-probability similarity measure provided by an embodiment of the present invention.
[Detailed Description]
For a better understanding of the technical solution of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides an anomaly detection method based on a distribution-probability similarity measure. Referring to Fig. 1, which is a flow diagram of the method provided by the embodiment, the method comprises the following steps:
Step 101: draw multiple subsets of the normal sample data by repeated random sampling, record the random isolation process of each subset in a full binary tree, and set the backtracking threshold depth according to a deviation ratio.
Specifically, several training subsets X_all are obtained from the training data set D by random sampling. Each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and is chosen according to the actual situation. Each sample has n dimensions, i.e., the i-th sample is $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$. A dimension and an isolation threshold are selected at random, the threshold being a random value between the minimum and the maximum of the subset in that dimension. The split is iterated until one of the following three conditions holds: (1) each isolated subspace contains only one sample; (2) all samples in the subspace have the same value in the selected dimension; (3) the iteration limit is reached. The process is recorded in a tree structure that forms a full binary tree: every node has either zero or two child nodes, leaf nodes store the samples of each isolated subspace, and internal nodes store the isolation dimension and the corresponding threshold. Dt, the depth threshold used when tracing back through a tree to search for neighboring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees:

$$E(h(x)) = \frac{1}{t}\sum_{i=1}^{t} l_i(x)$$

where E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen according to the actual situation, and l_i(x) is the path depth of x in the i-th tree.
Setting Dt requires a deviation ratio r, 0 ≤ r ≤ 1, i.e., the proportion of data in the normal training data set D that deviates relatively from the normal data distribution. The value of r is chosen by weighing the dispersion of the data distribution against the performance requirements placed on the model in practice. Given r, the minimum among the top ((1 − r) × 100)% of the training samples' mean depths is selected as the depth threshold Dt for tracing back to search for local training points in a tree.
Algorithm 1 and Algorithm 2 are the pseudocode for the isolation process and the depth-threshold setting of step 101:
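The pseudocode of Algorithms 1 and 2 does not appear in this text. As a stand-in, the following is a minimal Python sketch of step 101 as it is described in prose here, not the patent's own listing. All names (`Node`, `build_tree`, `path_depth`, `depth_threshold`) are hypothetical. Two details are assumptions: the iteration limit is modelled as a maximum depth, and the "top ((1 − r) × 100)%" rule is implemented by sorting the mean depths in descending order and rounding the cut-off count.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Sample = Sequence[float]


@dataclass
class Node:
    depth: int
    samples: Optional[List[Sample]] = None   # stored on leaves only
    dim: Optional[int] = None                # stored on internal nodes only
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(samples, rng, max_depth, depth=0):
    """Record random isolation of `samples` as a full binary tree."""
    # Stop: at most one sample left, or the (assumed) depth limit reached.
    if len(samples) <= 1 or depth >= max_depth:
        return Node(depth, samples=list(samples))
    dim = rng.randrange(len(samples[0]))
    lo = min(s[dim] for s in samples)
    hi = max(s[dim] for s in samples)
    # Stop: every sample has the same value in the chosen dimension.
    if lo == hi:
        return Node(depth, samples=list(samples))
    thr = rng.uniform(lo, hi)
    left = [s for s in samples if s[dim] < thr]
    right = [s for s in samples if s[dim] >= thr]
    if not left:  # threshold drawn exactly at the minimum: keep as a leaf
        return Node(depth, samples=list(samples))
    return Node(depth, dim=dim, threshold=thr,
                left=build_tree(left, rng, max_depth, depth + 1),
                right=build_tree(right, rng, max_depth, depth + 1))


def path_depth(tree, x):
    """l_i(x): depth of the leaf that x reaches in one tree."""
    node = tree
    while node.left is not None:
        node = node.left if x[node.dim] < node.threshold else node.right
    return node.depth


def depth_threshold(trees, train, r):
    """Dt: minimum among the top (1 - r) * 100% of mean depths E(h(x))."""
    t = len(trees)
    means = sorted((sum(path_depth(tr, x) for tr in trees) / t for x in train),
                   reverse=True)
    keep = max(1, int(round((1 - r) * len(means))))
    return means[keep - 1]
```

In this sketch Dt comes out as a mean depth and may be fractional; how it is rounded before being used as a tree depth is left open here.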
Step 102: for each tree, according to the external leaf node into which the test point falls and the threshold depth, trace back from that leaf node to its ancestor node at the threshold depth, and extract all data under that node as the training data against which the similarity of the test point is measured.
Specifically, the test point is fed into every tree. If the test sample falls into a node below depth Dt, the method traces back upward from the node containing the test point to its ancestor node at depth Dt and extracts all training samples Ltd under that ancestor node as the data used to compute the degree of anomaly of the test sample; the data extracted from all trees are merged to form the training samples for the next step. If the test sample falls above depth Dt, all data in Ltd are used as the training samples for the next step: such a test sample lies far from the distribution of the normal data as a whole and has few local training data nearby, so all training samples under Dt are extracted as the training samples for the next step, in order to further verify the degree of anomaly of the test data.
Algorithm 3 and Algorithm 4 are the pseudocode for extracting the local training data in step 102:
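The pseudocode of Algorithms 3 and 4 does not appear in this text either. The sketch below illustrates the backtracking idea of step 102 under stated assumptions; it is not the patent's listing. The `Node` class and the functions `find_leaf`, `ancestor_at`, `collect`, and `local_training_set` are hypothetical names, and nodes are assumed to keep a `parent` link. The treatment of a test point whose leaf is shallower than Dt is described only loosely in the text, so here such a leaf simply contributes its own samples, which is an assumption.

```python
class Node:
    """Tree node with a parent link; leaves hold samples, internal nodes a split."""

    def __init__(self, depth, samples=None, dim=None, threshold=None,
                 left=None, right=None):
        self.depth, self.samples = depth, samples
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def find_leaf(root, y):
    """Route test point y down to the external leaf it falls into."""
    node = root
    while node.left is not None:
        node = node.left if y[node.dim] < node.threshold else node.right
    return node


def ancestor_at(leaf, dt):
    """Trace back from the leaf to its ancestor at depth dt.

    Assumption: a leaf already at or above depth dt is returned unchanged.
    """
    node = leaf
    while node.depth > dt and node.parent is not None:
        node = node.parent
    return node


def collect(node):
    """All training samples stored in the leaves under `node`."""
    if node.left is None:
        return list(node.samples)
    return collect(node.left) + collect(node.right)


def local_training_set(trees, y, dt):
    """Ltd: union over all trees of the samples under the depth-dt ancestor."""
    ltd = set()
    for root in trees:
        ltd.update(collect(ancestor_at(find_leaf(root, y), dt)))
    return sorted(ltd)
```

Merging through a set removes samples that several trees return more than once; whether duplicates should instead be kept as weights is not stated in the text.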
Step 103: taking the test point and a point of the training data set as endpoints, compute in each attribute dimension the probability that the remaining data points fall between the two endpoints, and compute, in combination with the Minkowski distance, the dissimilarity between the test point and every point of the data set, to obtain the anomaly score of the test point.
Specifically, R_i(x, y) is first defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd. Let S be the subspace occupied by all data of Ltd, and S_i the range of S in the i-th dimension. If $y_i \notin S_i$, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S). Num_i(x, y, d, S) is a Boolean indicating whether the i-th dimension value d_i lies within R_i(x, y), where d ranges over the samples of Ltd other than x. M_i(x, y | Ltd, S) is the number of training points of Ltd lying between x and y in the i-th dimension:

$$M_i(x, y \mid Ltd, S) = \sum_{d \in Ltd,\ d \neq x} I\big(d_i \in R_i(x, y)\big)$$

where I(·) is the indicator function, equal to 1 if the condition in brackets is true and 0 otherwise. The ratio of M_i(x, y | Ltd, S) to the size of Ltd is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed as:

$$D'(x, y) = \left(\sum_{i=1}^{n}\left(\frac{M_i(x, y \mid Ltd, S)}{|Ltd|}\right)^{p}\right)^{1/p}$$

where p is the exponent of the Minkowski distance. Finally, the anomaly score p(y) of test point y is:

$$p(y) = \sum_{x \in Ltd} D'(x, y)$$

where p(y) is the sum of the dissimilarities between test point y and all points of Ltd. The test points are ranked by p(y); the larger p(y), the higher the degree of anomaly.
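A small Python sketch may make the scoring of step 103 concrete. It follows one reading of the description: the per-dimension dissimilarity is the count M_i divided by the size of Ltd, the dimensions are combined with a Minkowski-type norm of exponent p, and the score sums over all x in Ltd. The function names (`between_count`, `dissimilarity`, `anomaly_score`), the closed-interval convention for "between", and the default p = 2 are assumptions, not taken from the patent.

```python
def between_count(ltd, j, y, i):
    """M_i: points of Ltd other than x = ltd[j] lying between x_i and y_i."""
    x = ltd[j]
    lo_s = min(d[i] for d in ltd)
    hi_s = max(d[i] for d in ltd)
    # If y_i lies outside the range S_i of Ltd, use the extreme of S_i.
    y_i = min(max(y[i], lo_s), hi_s)
    lo, hi = min(x[i], y_i), max(x[i], y_i)
    return sum(1 for k, d in enumerate(ltd) if k != j and lo <= d[i] <= hi)


def dissimilarity(ltd, j, y, p=2.0):
    """D'(x, y): Minkowski-style combination of the per-dimension ratios."""
    n = len(y)
    total = sum((between_count(ltd, j, y, i) / len(ltd)) ** p for i in range(n))
    return total ** (1.0 / p)


def anomaly_score(y, ltd, p=2.0):
    """p(y): sum of D'(x, y) over every x in the local training set Ltd."""
    return sum(dissimilarity(ltd, j, y, p) for j in range(len(ltd)))


ltd = [(0.0,), (1.0,), (2.0,), (3.0,)]
near = anomaly_score((1.5,), ltd)    # test point inside the training range
far = anomaly_score((10.0,), ltd)    # test point far outside it
print(near, far)
```

Because the measure counts intervening training points rather than raw coordinate differences, rescaling a dimension leaves the score unchanged, which matches the motivation given in the background about attributes with very different value ranges.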
Table 1 shows the preprocessing of the data sets, i.e., how the classes of each data set were divided into normal data and anomalous data, when the anomaly detection method based on a distribution-probability similarity measure provided by the embodiment of the present invention was applied to anomaly detection tasks on 10 public data sets.
Table 1
Table 2 shows the comparative AUC results (AUC being the probability that a randomly selected positive sample is ranked above a randomly selected negative sample) obtained when the method provided by the embodiment was applied to the anomaly detection tasks on the 10 public data sets. The comparison methods are eight methods of the kind used for this type of imbalanced classification problem: KNN, iForest, SCiForest, iNNE, ALSH, L1SH, L2SH, and KLSH. Table 2 shows that the proposed method, DPSM, clearly improves AUC over the comparison methods on the public data sets. The improvement is especially large in the first seven groups of experiments; the proposed method reaches the highest level in eight of the groups and is close to the highest level in the remaining two. The method proposed by the embodiment of the present invention thus achieves a certain breakthrough in anomaly detection.
Table 2
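The AUC definition quoted above, the probability that a randomly selected positive sample is ranked above a randomly selected negative one, can be computed directly by comparing every positive/negative pair. The sketch below is illustrative only: the function name `pairwise_auc` is hypothetical, anomalies are assumed to be the positive class, ties are counted as one half (a common convention that the text does not state), and the scores are made-up numbers rather than results from the patent.

```python
def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive ranks higher."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # assumed tie convention
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up anomaly scores: two anomalous points, two normal points.
print(pairwise_auc([0.9, 0.8], [0.1, 0.8]))
```

The double loop is quadratic in the number of samples; it is written this way only to mirror the definition.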
In conclusion the embodiment of the present invention has the advantages that
In the technical solution that the present invention is implemented, multiple stochastical sampling obtains multiple subsets of normal sample data, with complete two Fork tree construction saves the random isolation processes of each subset, and the threshold depth of backtracking delimited according to drift ratio;According to test point External leaf node position and the threshold depth for falling in each tree, the ancestors that leaf node where it traces back to threshold depth save Point extracts training data of all data as measurement and test point similarity under the node;With test point and training dataset Certain interior point is endpoint, probability of the remaining data points appearance between this two o'clock is calculated separately in each attribute dimensions, in conjunction with Min Shi Distance calculates the dissimilar degree of all the points in test point and data set, obtains the exceptional value of the point.According to embodiments of the present invention The technical solution of offer can effectively solve local anomaly test problems, can be using original training data according to where test point The distribution of regional area normal data obtains its intensity of anomaly, improve the local anomaly of abnormality detection model detectability and its Overall target.
The foregoing is merely illustrative of the preferred embodiments of the present invention, is not intended to limit the invention, all in essence of the invention Within mind and principle, any modification, equivalent substitution, improvement and etc. done be should be included within the scope of the present invention.

Claims (4)

1. An anomaly detection method based on a distribution-probability similarity measure, characterized in that the method comprises the steps of:
(1) drawing multiple subsets of the normal sample data by repeated random sampling, recording the random isolation process of each subset in a full binary tree, and setting the backtracking threshold depth according to a deviation ratio;
(2) for each tree, according to the external leaf node into which the test point falls and the threshold depth, tracing back from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which the similarity of the test point is measured;
(3) taking the test point and a point of the training data set as endpoints, computing in each attribute dimension the probability that the remaining data points fall between the two endpoints, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and every point of the data set, to obtain the anomaly score of the test point.
2. The method according to claim 1, characterized in that drawing multiple subsets of the normal sample data by repeated random sampling, recording the random isolation process of each subset in a full binary tree, and setting the backtracking threshold depth according to a deviation ratio is specifically as follows: several training subsets X_all are obtained from the training data set D by random sampling; each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and is chosen according to the actual situation; each sample has n dimensions, i.e., the i-th sample is $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$; a dimension and an isolation threshold are selected at random, the threshold being a random value between the minimum and the maximum of the subset in that dimension; the split is iterated until one of the following three conditions holds: (1) each isolated subspace contains only one sample; (2) all samples in the subspace have the same value in the selected dimension; (3) the iteration limit is reached; the process is recorded in a tree structure that forms a full binary tree, in which every node has either zero or two child nodes, leaf nodes store the samples of each isolated subspace, and internal nodes store the isolation dimension and the corresponding threshold; Dt, the depth threshold used when tracing back through a tree to search for neighboring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees:

$$E(h(x)) = \frac{1}{t}\sum_{i=1}^{t} l_i(x)$$

where E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen according to the actual situation, and l_i(x) is the path depth of x in the i-th tree;
setting Dt requires a deviation ratio r, 0 ≤ r ≤ 1, i.e., the proportion of data in the normal training data set D that deviates relatively from the normal data distribution; the value of r is chosen by weighing the dispersion of the data distribution against the performance requirements placed on the model in practice; given r, the minimum among the top ((1 − r) × 100)% of the training samples' mean depths is selected as the depth threshold Dt for tracing back to search for local training points in a tree.
3. The method according to claim 1, characterized in that, for each tree, tracing back from the external leaf node into which the test point falls to its ancestor node at the threshold depth and extracting all data under that node as the training data against which the similarity of the test point is measured is specifically as follows: the test point is fed into every tree; if the test sample falls into a node below depth Dt, the method traces back upward from the node containing the test point to its ancestor node at depth Dt and extracts all training samples Ltd under that ancestor node as the data used to compute the degree of anomaly of the test sample, and the data extracted from all trees are merged to form the training samples for the next step; if the test sample falls above depth Dt, all data in Ltd are used as the training samples for the next step: such a test sample lies far from the distribution of the normal data as a whole and has few local training data nearby, so all training samples under Dt are extracted as the training samples for the next step, in order to further verify the degree of anomaly of the test data.
4. The method according to claim 1, characterized in that taking the test point and a point of the training data set as endpoints, computing in each attribute dimension the probability that the remaining data points fall between the two endpoints, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and every point of the data set to obtain the anomaly score of the test point is specifically as follows: first, R_i(x, y) is defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd; let S be the subspace occupied by all data of Ltd, and S_i the range of S in the i-th dimension; if $y_i \notin S_i$, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S); Num_i(x, y, d, S) is a Boolean indicating whether the i-th dimension value d_i lies within R_i(x, y), where d ranges over the samples of Ltd other than y; M_i(x, y | Ltd, S) is the number of training points of Ltd lying between x and y in the i-th dimension:

$$M_i(x, y \mid Ltd, S) = \sum_{d \in Ltd} I\big(d_i \in R_i(x, y)\big)$$

where I(·) is the indicator function, equal to 1 if the condition in brackets is true and 0 otherwise; the ratio of M_i(x, y | Ltd, S) to the size of Ltd is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed as:

$$D'(x, y) = \left(\sum_{i=1}^{n}\left(\frac{M_i(x, y \mid Ltd, S)}{|Ltd|}\right)^{p}\right)^{1/p}$$

where p is the exponent of the Minkowski distance; finally, the anomaly score p(y) of test point y is:

$$p(y) = \sum_{x \in Ltd} D'(x, y)$$

where p(y) is the sum of the dissimilarities between test point y and all points of Ltd; the test points are ranked by p(y), and the larger p(y), the higher the degree of anomaly.
CN201811233705.2A 2018-10-23 2018-10-23 An anomaly detection method based on distribution-probability similarity measurement Pending CN109508733A (en)

Publications (1)

Publication Number Publication Date
CN109508733A (en) 2019-03-22

Family

ID=65745932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811233705.2A Pending CN109508733A (en) 2018-10-23 2018-10-23 An anomaly detection method based on distribution-probability similarity measurement

Country Status (1)

Country Link
CN (1) CN109508733A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040064029A1 (en) * 2002-09-30 2004-04-01 The Government Of The Usa As Represented By The Secretary Of The Dept. Of Health & Human Services Computer-aided classification of anomalies in anatomical structures
CN102664961A (en) * 2012-05-04 2012-09-12 北京邮电大学 Method for anomaly detection in MapReduce environment
CN103400152A (en) * 2013-08-20 2013-11-20 哈尔滨工业大学 High sliding window data stream anomaly detection method based on layered clustering
CN103473540A (en) * 2013-09-11 2013-12-25 天津工业大学 Vehicle track incremental modeling and on-line abnormity detection method of intelligent traffic system
CN104317681A (en) * 2014-09-02 2015-01-28 上海交通大学 Behavioral abnormality automatic detection method and behavioral abnormality automatic detection system aiming at computer system
WO2015167562A1 (en) * 2014-04-30 2015-11-05 Hewlett-Packard Development Company, L.P. Using local memory nodes of a multicore machine to process a search query
US20150332523A1 (en) * 2014-05-19 2015-11-19 EpiSys Science, Inc. Method and apparatus for biologically inspired autonomous infrastructure monitoring
CN106503086A (en) * 2016-10-11 2017-03-15 成都云麒麟软件有限公司 The detection method of distributed local outlier
CN106598822A (en) * 2015-10-15 2017-04-26 华为技术有限公司 Abnormal data detection method and device applied to capacity estimation
CN107426207A (en) * 2017-07-21 2017-12-01 哈尔滨工程大学 A kind of network intrusions method for detecting abnormality based on SA iForest
CN107657288A (en) * 2017-10-26 2018-02-02 国网冀北电力有限公司 A kind of power scheduling flow data method for detecting abnormality based on isolated forest algorithm
CN108333314A (en) * 2018-04-02 2018-07-27 深圳凯达通光电科技有限公司 A kind of air pollution intelligent monitor system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NISHAD P ET AL: "Anomaly detection for IGBTs using Mahalanobis distance", 《MICROELECTRONICS RELIABILITY》 *
董国宾等: "基于RFID路径数据的异常路径检测", 《计算机应用研究》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110266680B (en) * 2019-06-17 2021-08-24 辽宁大学 Industrial communication anomaly detection method based on dual similarity measurement
CN110266680A (en) * 2019-06-17 2019-09-20 辽宁大学 A kind of industrial communication method for detecting abnormality based on dual similarity measurement
CN110377828B (en) * 2019-07-22 2023-05-26 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN110377828A (en) * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN110781433B (en) * 2019-10-11 2023-06-02 腾讯科技(深圳)有限公司 Data type determining method and device, storage medium and electronic device
CN110781433A (en) * 2019-10-11 2020-02-11 腾讯科技(深圳)有限公司 Data type determination method and device, storage medium and electronic device
CN111639680A (en) * 2020-05-09 2020-09-08 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111639680B (en) * 2020-05-09 2022-08-09 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111784966A (en) * 2020-06-15 2020-10-16 武汉烽火众智数字技术有限责任公司 Personnel management and control method and system based on machine learning
CN112085053B (en) * 2020-07-30 2022-08-26 山东浪潮科学研究院有限公司 Data drift discrimination method and device based on nearest neighbor method
CN112085053A (en) * 2020-07-30 2020-12-15 济南浪潮高新科技投资发展有限公司 Data drift discrimination method and device based on nearest neighbor method
CN112181706A (en) * 2020-10-23 2021-01-05 北京邮电大学 Power dispatching data anomaly detection method based on logarithmic interval isolation
CN112181706B (en) * 2020-10-23 2023-09-22 北京邮电大学 Power dispatching data anomaly detection method based on logarithmic interval isolation
WO2022134578A1 (en) * 2020-12-22 2022-06-30 深圳壹账通智能科技有限公司 Method and apparatus for determining answer sequence
CN113204542A (en) * 2021-04-22 2021-08-03 武汉大学 Abnormal electricity sample cleaning and behavior recognition method
CN113204542B (en) * 2021-04-22 2023-08-22 武汉大学 Abnormal electricity consumption sample cleaning and behavior recognition method

Similar Documents

Publication Publication Date Title
CN109508733A (en) A kind of method for detecting abnormality based on distribution probability measuring similarity
US20200374720A1 (en) Method for Detecting Abnormal Data in Sensor Network
CN111833172A (en) Consumption credit fraud detection method and system based on isolated forest
US20080306715A1 (en) Detecting Method Over Network Intrusion
Arbin et al. Comparative analysis between k-means and k-medoids for statistical clustering
CN111562108A (en) Rolling bearing intelligent fault diagnosis method based on CNN and FCMC
CN103473540B (en) The modeling of intelligent transportation system track of vehicle increment type and online method for detecting abnormality
CN113887616A (en) Real-time abnormity detection system and method for EPG (electronic program guide) connection number
CN109039503A (en) A kind of frequency spectrum sensing method, device, equipment and computer readable storage medium
CN107579846B (en) Cloud computing fault data detection method and system
CN112735097A (en) Regional landslide early warning method and system
CN106792883A (en) Sensor network abnormal deviation data examination method and system
CN113516228B (en) Network anomaly detection method based on deep neural network
CN110297207A (en) Method for diagnosing faults, system and the electronic device of intelligent electric meter
CN101738998A (en) System and method for monitoring industrial process based on local discriminatory analysis
CN111445147A (en) Generative confrontation network model evaluation method for mechanical fault diagnosis
CN111950645A (en) Method for improving class imbalance classification performance by improving random forest
CN112756759A (en) Spot welding robot workstation fault judgment method
CN114707571A (en) Credit data anomaly detection method based on enhanced isolation forest
CN108508297A (en) A kind of fault arc detection method based on change coefficient and SVM
CN113822336A (en) Cloud hard disk fault prediction method, device and system and readable storage medium
CN114756420A (en) Fault prediction method and related device
CN106991171A (en) Topic based on Intelligent campus information service platform finds method
CN110472188A (en) A kind of abnormal patterns detection method of facing sensing data
CN106683263B (en) The defect management method and system of valuable bills

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20190322