CN104504233A - Method for abnormal recognition based on random sampling of multi-dimensional vector entropies - Google Patents


Info

Publication number
CN104504233A
Authority
CN
China
Prior art keywords
point
vector
sample point
entropy
space
Legal status
Granted
Application number
CN201410646085.0A
Other languages
Chinese (zh)
Other versions
CN104504233B (en)
Inventor
张玉超
邓波
彭甫阳
李海龙
李冬红
齐超
Current Assignee
Beijing System Engineering Research Institute
Original Assignee
Beijing System Engineering Research Institute
Priority date
2014-11-14
Filing date
2014-11-14
Publication date
2015-04-08
2014-11-14: Application filed by Beijing System Engineering Research Institute
2014-11-14: Priority to CN201410646085.0A
2015-04-08: Publication of CN104504233A
Application granted
2017-06-06: Publication of CN104504233B
Current legal status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides a method for abnormal recognition based on random sampling of multi-dimensional vector entropies. The method comprises the following steps: I, selecting sampled points from the sample points in a sample space Ω to generate a subsample space ω; II, determining the multi-dimensional vector entropies of the sample points; III, repeating the above steps and determining a fusion result of the multi-dimensional vector entropies of the sample points; IV, determining the degree of anomaly of the sample points; and V, determining the abnormal points. By fusing the results obtained from repeated random sampling, the method addresses the large sample sizes and high dimensionality that anomaly recognition faces in large-scale data. It reduces the time complexity of anomaly recognition, improves recognition accuracy, and scales well.

Description

Anomaly recognition method based on random sampling of multi-dimensional vector entropy
Technical field
The present invention relates to the field of anomaly recognition. More specifically, it relates to an anomaly recognition method based on random sampling of multi-dimensional vector entropy.
Background art
Anomaly recognition means finding, within a set of related data, isolated points or outliers that lie far from the bulk of the data. These outliers belong neither to any cluster nor to the background noise, and they are often produced by an entirely different mechanism. Anomaly recognition is an important data mining and analysis technique. It is now widely used in telecommunication fraud detection, credit card misuse detection, loan approval, drug research, medical analysis, consumer behavior analysis, weather forecasting, customer segmentation in finance, and network intrusion detection.
Prior-art anomaly recognition methods fall into four main kinds: statistics-based, distance-based, density-based, and clustering-based methods. Each is introduced below.
(1) Statistics-based anomaly recognition
Statistical methods are model-based. A model is built for the data, and each object is evaluated by how well it fits the model. Outlier detection has been studied extensively in statistics since the 1980s. Typically, the user models the data points with some statistical distribution and then uses the assumed model to decide, from where a point falls in the distribution, whether it is anomalous. For example, if a data set is assumed to be normally distributed, any data object whose deviation from the mean reaches or exceeds three standard deviations can be called an outlier. A series of statistics-based anomaly recognition methods can be derived from this rule.
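The 3-sigma rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes one-dimensional data and uses the population standard deviation. The function name `three_sigma_outliers` and the sample data are hypothetical.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return indices of values whose distance from the mean is at least k standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, x in enumerate(values) if abs(x - mu) >= k * sigma]

data = [10.0] * 20 + [10.5, 9.5, 30.0]
print(three_sigma_outliers(data))  # [22]
```

The sketch only works because the data are assumed to be roughly normal. That is the prerequisite criticized in the next paragraph.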
Statistics-based methods often have the following shortcomings. First, they presuppose knowing which distribution the data set follows. If this judgment is wrong, for example when the data are actually heavy-tailed, the recognition result suffers. Second, they can only handle a single variable: each recognition is confined to one indicator and cannot combine several, so high-dimensional data cannot be analyzed.
(2) Distance-based anomaly recognition
Distance-based methods hold that an object is anomalous if it lies far from most other points. This approach is more general and easier to use than statistical methods, because computing the distances between points in a data set is easier than determining its statistical distribution. The anomaly score of an object is usually given by its distance to its K nearest neighbors, and the mean distance to the K nearest neighbors is a common choice. The method is generally sensitive to the choice of K. If K is too small, the anomaly score may be inaccurate; if K is too large, normal points may also be identified as anomalous.
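The following minimal sketch implements the K-nearest-neighbor score described above as a brute-force computation. It assumes Euclidean distance and k ≤ n - 1; the function name is hypothetical. The pairwise distance loop makes the O(n²) cost mentioned in the next paragraph explicit.

```python
import math

def knn_scores(points, k):
    """Anomaly score = mean Euclidean distance to the k nearest neighbours (brute force, O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        # distances from point i to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_scores(pts, k=2)
print(max(range(len(pts)), key=scores.__getitem__))  # 4: the isolated point
```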
Distance-based methods often have the following shortcomings. First, their time complexity is often O(n²), which makes them hard to apply to large data sets. Second, they are sensitive to parameter choices, which can easily affect the final result. Third, because they use a global threshold, they cannot handle data sets that contain regions of different densities.
(3) Density-based anomaly recognition
From the density-based point of view, outliers are objects in low-density regions. The anomaly score of an object is usually the inverse of the density around it. Density-based outlier detection is closely related to distance-based anomaly recognition, because density is usually defined through nearest-neighbor distances. A common definition takes density to be the inverse of the mean distance to the K nearest neighbors. These methods cannot identify outliers correctly when the data contain regions of different densities, so local density detection techniques were later developed to judge outliers.
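The sketch below makes the density definition above concrete: density is the inverse of the mean distance to the k nearest neighbors. It also adds one illustrative form of a local technique, the ratio of a point's density to the average density of its neighbors. That ratio is an assumption chosen for illustration. It is not necessarily the specific local density technique the text refers to. The function names are hypothetical, and the sketch assumes there are no duplicate points.

```python
import math

def neighbours(points, i, k):
    """(distance, index) pairs of the k nearest neighbours of points[i], brute force."""
    pairs = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return pairs[:k]

def density(points, i, k):
    """Density = inverse of the mean distance to the k nearest neighbours."""
    mean_d = sum(d for d, _ in neighbours(points, i, k)) / k
    return math.inf if mean_d == 0 else 1.0 / mean_d

def relative_density(points, i, k):
    """Density of point i divided by the mean density of its k nearest neighbours.
    Values well below 1 suggest a local outlier."""
    own = density(points, i, k)
    nbr = [density(points, j, k) for _, j in neighbours(points, i, k)]
    return own / (sum(nbr) / k)

dense = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1)]
sparse = [(10, 10), (10, 12), (12, 10), (12, 12)]
pts = dense + sparse + [(5, 5)]
rd = [relative_density(pts, i, 2) for i in range(len(pts))]
```

The points of the sparse cluster have low absolute density but a relative density of 1. Only the point at (5, 5) stands out, which illustrates why local techniques help when densities differ between regions.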
Density-based methods often have the following shortcomings. First, like distance-based methods, they have high time complexity, which makes large data sets very hard to process. Second, they are sensitive to parameter choices, which can likewise easily affect the final result.
(4) Clustering-based anomaly recognition
An object is a clustering-based outlier if it does not strongly belong to any cluster. When clustering is used to find outliers, the usual approach is to discard small clusters that lie far from the other clusters. This approach can be combined with any clustering technique. However, it needs thresholds for the minimum cluster size and for the distance between a small cluster and the other clusters, so it is extremely sensitive to the choice of the number of clusters. If a small cluster is itself highly cohesive, a clustering-based method cannot detect its points as outliers. On the other hand, this approach can find outliers with clustering techniques of linear or near-linear complexity, so its time complexity is low.
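The small-cluster rule described above can be sketched as follows. The sketch assumes that cluster labels have already been produced by some clustering algorithm, and it measures the distance between clusters by the distance between their centroids. The parameter names `min_size` and `min_gap` are hypothetical stand-ins for the two thresholds mentioned in the text.

```python
import math
from collections import defaultdict

def small_far_clusters(points, labels, min_size, min_gap):
    """Return indices of points in clusters that are both small (< min_size members)
    and far (centroid more than min_gap from every other centroid)."""
    members = defaultdict(list)
    for p, c in zip(points, labels):
        members[c].append(p)
    # centroid of each cluster = component-wise mean of its members
    centroids = {c: tuple(sum(x) / len(ps) for x in zip(*ps)) for c, ps in members.items()}
    flagged = set()
    for c, ps in members.items():
        gaps = [math.dist(centroids[c], centroids[o]) for o in centroids if o != c]
        if len(ps) < min_size and gaps and min(gaps) > min_gap:
            flagged.add(c)
    return [i for i, c in enumerate(labels) if c in flagged]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (20, 20)]
print(small_far_clusters(pts, [0, 0, 0, 0, 1], min_size=2, min_gap=5.0))  # [4]
```

Both thresholds, and the labels themselves, directly determine the output. That is the parameter sensitivity discussed in the next paragraph.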
Clustering-based methods often have the following shortcomings. First, the quality of the clusters produced by the clustering algorithm strongly affects the quality of the outliers it yields. Second, the resulting set of outliers and their scores may depend heavily on the number of clusters used and on the presence of outliers in the data. Both factors make anomaly recognition harder.
In summary, statistics-based anomaly recognition is mainly limited to scientific data statistics. The main reason is that the distribution of the data must be known in advance, which restricts its range of application. Distance-based methods, unlike statistics-based ones, do not require the user to have any domain knowledge, and distance is closer to the essential cause of outliers. Density-based anomaly recognition is an extension of the distance-based approach and is more effective at identifying local outliers, and local anomaly recognition better matches real-world applications. Clustering-based techniques depend on the clustering quality and the time cost of the underlying clustering algorithm.
However, as sample data volumes grow, anomaly recognition faces greater challenges. The four methods above all suffer from large time overhead and limited applicability in high-dimensional spaces. Prior-art methods are designed for small data sets, with time complexity often O(n²) or O(n³), and on large-scale data this overhead may be unacceptable. In addition, higher dimensionality brings the "curse of dimensionality": data become increasingly sparse in the space they occupy, and the distances between sample points become almost equal. As a result, many distance-based and density-based parameters lose their meaning. An efficient and accurate outlier detection method is therefore needed.
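The claim that distances become almost equal in high dimensions can be checked numerically with the relative contrast (max - min) / min of the distances from a query point to a data set. The sketch below assumes uniformly distributed random data in the unit hypercube. The function name and parameters are illustrative.

```python
import math
import random

def relative_contrast(dim, n=200, seed=0):
    """(max - min) / min of distances from a random query to n uniform random points in [0,1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    d = [math.dist(query, p) for p in pts]
    return (max(d) - min(d)) / min(d)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the contrast shrinks toward zero, so "nearest" and "farthest" neighbors become hard to tell apart.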
Summary of the invention
To overcome the above shortcomings of the prior art, the present invention provides an anomaly recognition method based on random sampling of multi-dimensional vector entropy.
The technical solution adopted to achieve this object is as follows.
An anomaly recognition method based on random sampling of multi-dimensional vector entropy, improved in that the method comprises the following steps:
I. choosing sampled points from the sample points in sample space Ω to generate a subsample space ω;
II. determining the multi-dimensional vector entropy of the sample points;
III. repeating the above steps and determining the fused result of the multi-dimensional vector entropies of the sample points;
IV. determining the degree of anomaly of the sample points;
V. determining the abnormal points.
Further, in step I, the number N of sample points in the sample space is determined.
A sampling method is used to select a specified number of the sample points as sampled points, and the subsample space ω is generated from these sampled points.
Further, the sampling method is random sampling.
Further, step II comprises the following steps:
S201: determining the vectors formed between the sample points of sample space Ω and the sampled points of subsample space ω;
S202: determining the multi-dimensional vector entropy between each sample point and each sampled point.
Further, the multi-dimensional vector entropy Φ(A) between each sample point and the sampled points is determined by the following formula:
where A is any sample point, and Φ(A) is the multi-dimensional vector entropy of sample point A;
the vector entropy of each vector defined below is a term of the formula;
each such vector starts at sample point A and ends at one of the sampled points;
the modulus (length) of each such vector also appears in the formula;
d is the dimension of the sample points;
the j-th dimension attribute of a vector is its j-th component; if this value is negative, its absolute value is used when computing the multi-dimensional vector entropy;
N is the number of sampled points in subsample space ω, and n is the number of sample points in sample space Ω.
Further, step III comprises the following steps:
S301: the steps are repeated K times, so that K multi-dimensional vector entropies are obtained for each sample point, with 10 ≤ K ≤ 20;
S302: the K multi-dimensional vector entropies are fused with a mean-value fusion strategy, i.e., the mean of a sample point's multi-dimensional vector entropies is taken as its fused value.
Further, step I is repeated K times, and the K resulting subsample spaces ω satisfy the following requirement:
the union of the K subsample spaces ω is the whole sample space Ω.
Further, in step IV, the fused values of the sample points are sorted, and the degree of anomaly of each sample point in sample space Ω is determined from its fused value.
Further, the higher the fused value, the higher the degree of anomaly of the sample point; the lower the fused value, the lower the degree of anomaly.
Further, in step V, each sample point is judged to be an abnormal point or a normal point by comparing its fused multi-dimensional vector entropy value with a threshold:
if the fused value is greater than or equal to the threshold, the sample point is judged to be an abnormal point; otherwise it is a normal point.
Compared with the prior art, the present invention has the following beneficial effects:
1. The method uses a random sampling strategy to generate multiple subsample spaces, which reduces the time complexity of anomaly recognition.
2. The method builds multi-dimensional vectors between sample points and sampled points and recognizes anomalies by computing the entropy distribution of these vectors, which addresses the curse of dimensionality in high-dimensional spaces.
3. The multi-dimensional vector entropy computations for the different random samples are relatively independent of one another, which makes the method easier to scale.
4. The method uses a fusion strategy: it combines the multi-dimensional vector entropy results of multiple random samples and characterizes the degree of anomaly by their mean, which increases the diversity of the entropy computation for each sample.
5. The method provides a quantitative degree of anomaly for every sample point, and ranking these values helps distinguish the degrees of anomaly of different sample points.
Brief description of the drawings
Fig. 1 is a flowchart of the anomaly recognition method based on random sampling of multi-dimensional vector entropy in this embodiment;
Fig. 2 is a schematic diagram of determining subsample spaces from the sample space in this embodiment;
Fig. 3 is a schematic diagram of fusing the multi-dimensional vector entropies of sample points in this embodiment;
Fig. 4 is a schematic diagram of a scenario in this embodiment in which normal points lie on the surface of an ellipsoid and abnormal points follow a normal distribution occupying a volume larger than the ellipsoid;
Fig. 5 is a schematic diagram of a scenario in this embodiment in which normal points lie inside an ellipsoid and abnormal points follow a normal distribution occupying a volume larger than the ellipsoid;
Fig. 6 shows the distribution of the multi-dimensional vector entropies of all points in Fig. 4 in this embodiment;
Fig. 7 shows the distribution of the multi-dimensional vector entropies of all points in Fig. 5 in this embodiment.
Detailed description of embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
To recognize abnormal data efficiently and accurately in high-dimensional, large-scale data sets, this embodiment provides an anomaly recognition method based on random sampling of multi-dimensional vector entropy.
Fig. 1 is the flowchart of the method in this embodiment. The method can determine the degree of anomaly of every sample point in the sample space and comprises the following steps:
Step one: choose sampled points from the sample points of the sample space.
Step two: determine the multi-dimensional vector entropy of the sample points in the sample space.
Step three: repeat steps one and two K times and determine the fused result of the multi-dimensional vector entropies of the sample points.
Step four: determine the degree of anomaly of the sample points.
In step one, sample space Ω contains N sample points. A sampling method selects a specified number of them as sampled points, which form subsample space ω.
In this embodiment, random sampling is used.
In step two, the multi-dimensional vector entropy of the sample points is determined in two stages.
First, the vectors formed between the sample points of sample space Ω and the sampled points of subsample space ω are determined. Then, the multi-dimensional vector entropy of each sample point with respect to the sampled points is determined.
Determining the multi-dimensional vector entropy of every sample point works as follows.
For any sample point A, the multi-dimensional vector entropy Φ(A) of the vectors formed between A and the sampled points in the sample space is determined as follows:
where each vector starts at a sample point and ends at a sampled point, and
its modulus is the length of the vector;
N is the number of sampled points in the subsample space, and n is the number of sample points in the sample space;
d is the dimension of the sample points in the sample space;
the j-th dimension attribute of a vector is its j-th component; if this value is negative, its absolute value is used when computing the multi-dimensional vector entropy;
and the per-vector term is the multi-dimensional vector entropy of that vector.
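The formula itself is not reproduced in this text, so the following is only a hedged sketch of one plausible reading of the definitions above. It makes two assumptions. First, the vector entropy of a vector is taken to be the base-2 Shannon entropy of its absolute components, normalized to sum to 1. Second, Φ(A) is taken to be the mean of these entropies over the sampled points. The normalization, the logarithm base, and the averaging are all assumptions, and the sketch does not use the vector modulus. The function names are hypothetical, and the output is not expected to match the value Φ(A) = 2.158 reported in the first application example.

```python
import math

def vector_entropy(v):
    """ASSUMED form: base-2 Shannon entropy of the absolute components of v,
    normalised so that they sum to 1 (negative components use their absolute value)."""
    mags = [abs(x) for x in v]
    total = sum(mags)
    if total == 0:
        return 0.0  # zero vector (A coincides with a sampled point): no spread
    return -sum((m / total) * math.log2(m / total) for m in mags if m > 0)

def multi_dim_vector_entropy(a, sampled):
    """ASSUMED aggregation: mean vector entropy of the vectors that start at
    sample point a and end at each sampled point s."""
    vectors = [[s_j - a_j for s_j, a_j in zip(s, a)] for s in sampled]
    return sum(vector_entropy(v) for v in vectors) / len(vectors)

A = [3, 1, 2, 1, 2]
S = [[2, 3, 1, 2, 3], [4, 3, 2, 2, 1], [3, 2, 1, 1, 2]]
print(round(multi_dim_vector_entropy(A, S), 4))  # 1.7245 under these assumptions
```

Under this reading, a vector whose absolute components are spread evenly over all d dimensions has the maximum entropy log2(d), and a vector pointing along a single axis has entropy 0.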
In step three, steps one and two are repeated K times to determine the fused result of the multi-dimensional vector entropies of the sample points. Specifically:
First, steps one and two are executed to compute the multi-dimensional vector entropies of the sample points, and the method checks whether K repetitions have been reached. If so, it proceeds to step four; otherwise it returns to step one. In this way, K multi-dimensional vector entropies are obtained for each sample point, with 10 ≤ K ≤ 20.
Repeating step one K times yields K subsample spaces. Note that the K subsample spaces ω must satisfy the following requirement:
the union of the K subsample spaces ω is the whole sample space Ω.
Then, the K multi-dimensional vector entropies are fused with a mean-value fusion strategy: the mean of each sample point's multi-dimensional vector entropies is taken as its fused value.
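The repeat-and-fuse loop of step three can be sketched as follows. The text does not say how the union requirement is guaranteed. As one possible way to meet it, this sketch shuffles the sample indices and splits them into K disjoint blocks, which with K = 10 gives subsamples of about 10% each. The per-round score is passed in as a callback, `score_fn`, which stands in for the multi-dimensional vector entropy; that callback and the function names are hypothetical.

```python
import random
from statistics import mean

def disjoint_subsamples(n, k, seed=None):
    """Shuffle indices 0..n-1 and split them into k disjoint blocks whose union is all n indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[r::k] for r in range(k)]

def fused_scores(points, k, score_fn, seed=None):
    """Repeat the sampling and scoring k times and fuse each point's k scores by their mean.
    score_fn(point, sampled_points) -> float is a hypothetical per-round score."""
    rounds = []
    for block in disjoint_subsamples(len(points), k, seed):
        sampled = [points[i] for i in block]
        rounds.append([score_fn(p, sampled) for p in points])
    return [mean(r[i] for r in rounds) for i in range(len(points))]

pts = [(float(i),) for i in range(30)]
blocks = disjoint_subsamples(30, 10, seed=1)
sc = fused_scores(pts, 10, lambda p, s: mean(abs(p[0] - q[0]) for q in s), seed=1)
```

Disjoint blocks are only one design choice. Independent random samples would also work, but they satisfy the union requirement only with some probability.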
In step four, the degree of anomaly of the sample points in the sample space is determined.
The fused multi-dimensional vector entropy values of the sample points are sorted, and the degree of anomaly of each sample point in the sample space is determined from its fused value.
The higher the fused value, the higher the degree of anomaly of the sample point; the lower the fused value, the lower its degree of anomaly.
After steps one to four have determined the degree of anomaly of every sample point in the sample space, step five identifies the abnormal points from these degrees of anomaly. Each sample point is judged to be an abnormal point or a normal point by comparing its fused multi-dimensional vector entropy value with a threshold.
If the fused value is greater than or equal to the threshold, the sample point is judged to be an abnormal point; if it is less than the threshold, the sample point is judged to be a normal point.
Two methods are provided for determining the threshold on the fused multi-dimensional vector entropy value:
1. The threshold is set according to the outlier detection requirements of the particular method or system.
Different methods or systems have different requirements, so different threshold values are chosen. In general, the higher the accuracy requirement, the smaller the threshold.
2. The threshold is derived from the fused multi-dimensional vector entropy values of all sample points in the sample space: the overall mean of all fused values is computed and used as the criterion.
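Threshold method 2 above can be sketched as follows. Points whose fused value is at least the overall mean are flagged, following the "greater than or equal to" convention stated earlier. The function name is hypothetical.

```python
from statistics import mean

def flag_by_mean_threshold(fused):
    """Use the overall mean of all fused values as the threshold and
    return (threshold, indices of points with fused value >= threshold)."""
    threshold = mean(fused)
    return threshold, [i for i, v in enumerate(fused) if v >= threshold]

t, flagged = flag_by_mean_threshold([10, 5, 2])
print(round(t, 3), flagged)  # 5.667 [0]
```

Because of the "greater than or equal to" convention, a data set in which every fused value is equal would have every point flagged. A fixed threshold (method 1) avoids that edge case.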
Application Example 1
Suppose that in a sample space the data set is D = {a1, a2, ..., a30}, with sample size N = 30 and attribute dimension d = 5. The method of the present invention is used to determine the abnormal points, as follows.
Step one: determine the sampled points in the sample space.
10% of the sample points in the data set are chosen at random as sampled points, i.e., the subsample space contains 3 sampled points. It is written D* = {s1, s2, s3}, where s1, s2, and s3 are the three sampled points.
Let s1 = {2, 3, 1, 2, 3}, s2 = {4, 3, 2, 2, 1}, and s3 = {3, 2, 1, 1, 2}.
Step two: determine the multi-dimensional vector entropy of the sample points. This comprises two sub-steps:
S201: determine the vectors between each sample point in the sample space and each sampled point in the subsample space.
In this embodiment, first take any sample point A = {3, 1, 2, 1, 2} in the sample space and determine the vector that starts at A and ends at sampled point s1 = {2, 3, 1, 2, 3}.
Then determine the vectors from A to s2 = {4, 3, 2, 2, 1} and to s3 = {3, 2, 1, 1, 2}.
Finally, the vectors from every sample point in the sample space to every sampled point in the subsample space are determined in the same way.
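The vectors in S201 are component-wise differences: end point minus start point. The sketch below computes them for sample point A and the three sampled points of this example, together with their moduli. The variable names are illustrative.

```python
import math

A = (3, 1, 2, 1, 2)
sampled = {"s1": (2, 3, 1, 2, 3), "s2": (4, 3, 2, 2, 1), "s3": (3, 2, 1, 1, 2)}

# Vector with start point A and end point s: component-wise s - A
vectors = {name: tuple(s_j - a_j for s_j, a_j in zip(s, A)) for name, s in sampled.items()}
for name, v in vectors.items():
    print(name, v, "modulus =", round(math.sqrt(sum(x * x for x in v)), 3))
# s1 (-1, 2, -1, 1, 1) modulus = 2.828
# s2 (1, 2, 0, 1, -1) modulus = 2.646
# s3 (0, 1, -1, 0, 0) modulus = 1.414
```

The negative components are the ones that, per the text, are replaced by their absolute values before the entropy is computed.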
S202: determine the vector distribution relation Φ between each sample point and each sampled point. This relation serves as the multi-dimensional vector entropy of the sample point with respect to the sampled points.
Taking the above vectors as an example, the multi-dimensional vector entropy of sample point A with respect to sampled point s1 is determined by the following formula:
Here the j-th component of the vector (j = 1, 2, ..., 5) is its j-th dimension attribute. If this value is negative, its absolute value is used when computing the multi-dimensional vector entropy.
The calculation gives the multi-dimensional vector entropy of sample point A as Φ(A) = 2.158.
Step three: repeat the above steps and determine the fused result of the multi-dimensional vector entropies of the sample points. Specifically:
First, steps one and two are repeated K times, with 10 ≤ K ≤ 20; in this embodiment, K = 10.
Repeating 10 times yields 10 subsample spaces, and these 10 subsample spaces ω satisfy the following requirement:
the union of the 10 subsample spaces ω is the whole sample space Ω.
Then, the 10 multi-dimensional vector entropies are fused with the mean-value fusion strategy: the mean of the sample point's multi-dimensional vector entropies is taken as its fused value.
Suppose that for sample point A = {3, 1, 2, 1, 2} the multi-dimensional vector entropies computed in the 10 rounds are {3, 1, 2, 1, 2, 3, 4, 3, 2, 2}. Their mean is 23/10 = 2.3, and this mean characterizes the degree of anomaly of sample point A.
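The mean-value fusion in this example is a plain arithmetic mean of the per-round values. The list below reproduces the ten values assumed in the example.

```python
from statistics import mean

rounds = [3, 1, 2, 1, 2, 3, 4, 3, 2, 2]  # the 10 per-round values assumed in the example
fused = mean(rounds)
print(fused)  # 2.3
```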
Step four: determine the degree of anomaly of the sample points.
The fused multi-dimensional vector entropy values of the sample points are sorted, and the degree of anomaly of each sample point in the sample space is determined from its fused value.
The higher the fused value, the higher the degree of anomaly of the sample point; the lower the fused value, the lower its degree of anomaly.
Step five: determine the abnormal points.
The abnormal points of the sample space are identified by comparing each fused value with a threshold. If a sample point's fused multi-dimensional vector entropy value is greater than or equal to the threshold, the point is judged to be abnormal; otherwise it is judged to be normal.
For example, suppose the fused values of 3 points are 10, 5, and 2. If the user sets the threshold to 6, only the first point is abnormal. If the threshold is 3, the first and second points are abnormal. If the threshold is 1, all points are abnormal.
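Steps four and five reduce to a sort and a comparison. The sketch below reproduces the three-point example above, using 0-based indices. The function name is hypothetical.

```python
def rank_and_flag(fused, threshold):
    """Step four: order points by fused value, most anomalous first.
    Step five: flag points whose fused value is >= threshold."""
    ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    flagged = [i for i in ranking if fused[i] >= threshold]
    return ranking, flagged

fused = [10, 5, 2]
for t in (6, 3, 1):
    print(t, rank_and_flag(fused, t)[1])
# 6 [0]
# 3 [0, 1]
# 1 [0, 1, 2]
```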
Application Example 2
Figs. 4 and 5 show two scenarios. In Fig. 4, normal points lie on the surface of an ellipsoid and abnormal points follow a normal distribution occupying a volume larger than the ellipsoid. In Fig. 5, normal points lie inside an ellipsoid and abnormal points follow a normal distribution occupying a volume larger than the ellipsoid.
Figs. 4 and 5 simulate ellipsoidal data sets in three-dimensional space under these different scenarios. Plus signs denote normal points and squares denote abnormal points. Each data set contains 200 normal points and 20 abnormal points, 220 data points in total.
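Data sets of the kind described for Figs. 4 and 5 can be generated along the following lines. The semi-axes, the standard deviation of the abnormal points, and the seed are hypothetical, since the text does not give them. On-surface points are produced by stretching uniform points on the unit sphere, so they lie exactly on the ellipsoid but are not spread uniformly over its area.

```python
import math
import random

def ellipsoid_dataset(inside=False, n_normal=200, n_abnormal=20,
                      axes=(3.0, 2.0, 1.0), sigma=4.0, seed=0):
    """Normal points on (inside=False) or in (inside=True) an ellipsoid with the given
    semi-axes; abnormal points from an isotropic normal distribution with std sigma."""
    rng = random.Random(seed)
    normal = []
    for _ in range(n_normal):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(x * x for x in g))  # normalised Gaussian: uniform direction
        # inside=True: radius u**(1/3) gives a uniform point in the unit ball
        scale = rng.random() ** (1 / 3) if inside else 1.0
        normal.append(tuple(a * scale * x / r for a, x in zip(axes, g)))
    abnormal = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in range(n_abnormal)]
    return normal, abnormal

surface, outliers = ellipsoid_dataset()
filled, _ = ellipsoid_dataset(inside=True)
```

A sigma larger than the semi-axes spreads the abnormal points over a volume larger than the ellipsoid, as in the figures.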
The method of the present invention is then used for outlier detection. First, 10% of the points in the ellipsoidal data set of each scenario, i.e., 22 points, are selected at random to form the corresponding subsample space.
Then, for each scenario, all sample points are traversed and the multi-dimensional vector entropy of each is computed. Finally, the distribution of the multi-dimensional vector entropies in each scenario is plotted.
Fig. 6 shows the distribution of the multi-dimensional vector entropies of all points in Fig. 4, and Fig. 7 shows the same for Fig. 5. Both figures show that the multi-dimensional vector entropy values of abnormal points are clearly higher than those of normal points.
Finally, the above embodiments only illustrate the technical solution of the present application and do not limit its scope of protection. Although the present application has been described in detail with reference to these embodiments, those of ordinary skill in the art should understand the following: after reading the present application, they may still make various changes, modifications, or equivalent substitutions to the embodiments, and all such changes, modifications, or equivalent substitutions fall within the scope of the pending claims.

Claims (10)

1. An anomaly recognition method based on random sampling of multi-dimensional vector entropy, characterized in that the method comprises the following steps:
I. choosing sampled points from the sample points in sample space Ω to generate a subsample space ω;
II. determining the multi-dimensional vector entropy of the sample points;
III. repeating the above steps and determining the fused result of the multi-dimensional vector entropies of the sample points;
IV. determining the degree of anomaly of the sample points;
V. determining the abnormal points.
2. The anomaly recognition method based on random sampling of multi-dimensional vector entropy according to claim 1, characterized in that, in step I, the number N of sample points in the sample space is determined;
a sampling method is used to select a specified number of the sample points as sampled points, and the subsample space ω is generated from these sampled points.
3. The anomaly recognition method based on random sampling of multi-dimensional vector entropy according to claim 2, characterized in that the sampling method is random sampling.
4. The anomaly recognition method based on random sampling of multi-dimensional vector entropy according to claim 1, characterized in that step II comprises the following steps:
S201: determining the vectors formed between the sample points of sample space Ω and the sampled points of subsample space ω;
S202: determining the multi-dimensional vector entropy between each sample point and each sampled point.
5. The anomaly recognition method based on random sampling of multi-dimensional vector entropy according to claim 4, characterized in that the multi-dimensional vector entropy Φ(A) between each sample point and the sampled points is determined by the following formula:
where A is any sample point, and Φ(A) is the multi-dimensional vector entropy of sample point A;
the vector entropy of each vector defined below is a term of the formula;
each such vector starts at sample point A and ends at one of the sampled points;
the modulus (length) of each such vector also appears in the formula;
d is the dimension of the sample points;
the j-th dimension attribute of a vector is its j-th component; if this value is negative, its absolute value is used when computing the multi-dimensional vector entropy;
N is the number of sampled points in subsample space ω, and n is the number of sample points in sample space Ω.
6. a kind of abnormality recognition method based on multi-C vector entropy stochastic sampling as claimed in claim 1, is characterized in that: comprise the following steps in described Step II I:
S301, multiplicity are K time, all obtain K multi-C vector entropy for each described sample point; The scope of K is 10≤K≤20;
S302, employing mean value convergence strategy merge described K multi-C vector entropy, determine the fusion value of mean value as described sample point of the multi-C vector entropy of described sample point.
7. The method for abnormal recognition based on random sampling of multi-dimensional vector entropies as claimed in claim 6, characterized in that Step I is repeated K times, and the K resulting sub-sample spaces ω satisfy the following requirement:
the union of the K sub-sample spaces ω is the entire sample space Ω.
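Claim 7 requires only that the K sub-sample spaces together cover Ω; it does not say how to achieve this. One way to guarantee coverage is assumed here and is not taken from the source:

1. Shuffle the indices of Ω.
2. Deal them round-robin into K groups.
3. Top up each group with random extra indices until it holds N points.

This requires K·N ≥ n. The function `covering_subsamples` is hypothetical. The trade-off is that the sub-samples are no longer fully independent uniform draws.

```python
import random
from typing import List


def covering_subsamples(n: int, n_sampling: int, k: int,
                        seed: int = 0) -> List[List[int]]:
    """Return K index sets of size N whose union is range(n) (claim 7)."""
    if not 0 < n_sampling <= n:
        raise ValueError("need 0 < N <= n")
    if k * n_sampling < n:
        raise ValueError("K * N must be at least n for the union to cover Omega")
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    result: List[List[int]] = []
    for g in range(k):
        # Round-robin deal: every index lands in exactly one group, and each
        # group holds at most ceil(n / K) <= N indices.
        chosen = set(order[g::k])
        rest = [i for i in range(n) if i not in chosen]
        # ASSUMPTION: fill the remaining slots uniformly at random.
        extra = rng.sample(rest, n_sampling - len(chosen))
        result.append(sorted(chosen.union(extra)))
    return result
```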
8. The method for abnormal recognition based on random sampling of multi-dimensional vector entropies as claimed in claim 1, characterized in that in Step IV, the fusion values of the sample points are sorted, and the degree of abnormality of each sample point in the sample space Ω is determined from its fusion value.
9. The method for abnormal recognition based on random sampling of multi-dimensional vector entropies as claimed in claim 8, characterized in that the higher the fusion value of a sample point, the higher its degree of abnormality, and the lower the fusion value, the lower its degree of abnormality.
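Under claims 8 and 9, ranking amounts to sorting the sample points by fusion value in descending order, so that the most abnormal points come first. Below is a minimal sketch; the helper `rank_by_anomaly` is hypothetical and returns indices into the list of fusion values.

```python
from typing import List, Sequence


def rank_by_anomaly(fusion_values: Sequence[float]) -> List[int]:
    """Indices of sample points, highest fusion value (most abnormal) first.
    Python's sort is stable, so tied points keep their original order."""
    return sorted(range(len(fusion_values)),
                  key=lambda i: fusion_values[i], reverse=True)
```

For example, `rank_by_anomaly([0.2, 1.5, 0.7])` returns `[1, 2, 0]`.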
10. The method for abnormal recognition based on random sampling of multi-dimensional vector entropies as claimed in claim 1, characterized in that in Step V, each sample point is judged to be an abnormal point or a normal point by comparing the fusion value of its multi-dimensional vector entropy with a threshold;
if the fusion value is greater than or equal to the threshold, the sample point is judged to be an abnormal point; otherwise, it is a normal point.
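Claim 10's decision rule is a single comparison per point. The claim does not say how the threshold is chosen, so this hypothetical `label_points` helper leaves it as a caller-supplied parameter. Note that a value exactly equal to the threshold counts as abnormal.

```python
from typing import List, Sequence


def label_points(fusion_values: Sequence[float], threshold: float) -> List[str]:
    """Claim 10: fusion value >= threshold -> abnormal point, else normal point.
    How the threshold is chosen is not specified in the source."""
    return ["abnormal" if v >= threshold else "normal" for v in fusion_values]
```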

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410646085.0A CN104504233B (en) 2014-11-14 2014-11-14 Method for abnormal recognition based on random sampling of multi-dimensional vector entropies

Publications (2)

Publication Number Publication Date
CN104504233A true CN104504233A (en) 2015-04-08
CN104504233B CN104504233B (en) 2017-06-06

Family

ID=52945630

Country Status (1)

Country Link
CN (1) CN104504233B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWM391701U (en) * 2009-10-30 2010-11-01 Wei-Lieh Hsu A traffic detection system
CN103281293A (en) * 2013-03-22 2013-09-04 南京江宁台湾农民创业园发展有限公司 Network flow rate abnormity detection method based on multi-dimension layering relative entropy
US20130322690A1 (en) * 2012-06-04 2013-12-05 Electronics And Telecommunications Research Institute Situation recognition apparatus and method using object energy information
CN103441982A (en) * 2013-06-24 2013-12-11 杭州师范大学 Intrusion alarm analyzing method based on relative entropy
CN103577835A (en) * 2013-08-02 2014-02-12 中国科学技术大学苏州研究院 Method using multi-dimensional feature vectors to detect IP ID covert channel
CN104123544A (en) * 2014-07-23 2014-10-29 通号通信信息集团有限公司 Video analysis based abnormal behavior detection method and system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
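A. Patcha et al.: "An overview of anomaly detection techniques: Existing solutions and latest technological trends", Computer Networks *
Xingwang Zhao et al.: "A simple and effective outlier detection algorithm for categorical data", International Journal of Machine Learning and Cybernetics *
Zhang Chunsheng et al.: "Simple outlier detection algorithm based on vertical and horizontal distances and its application", Journal of Inner Mongolia University for Nationalities (Natural Science Edition) *
Wang Hailong: "Large-scale network traffic anomaly detection based on information entropy", Computer Engineering *
Zheng Liming et al.: "Research on backbone network traffic anomaly detection based on multi-dimensional entropy classification", Journal of Computer Research and Development *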

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170606

Termination date: 20191114