CN104933080B - Method and device for determining abnormal data - Google Patents


Info

Publication number
CN104933080B
Authority
CN
China
Prior art keywords
doubtful
data
abnormal
data unit
unit
Prior art date
Application number
CN201410108593.3A
Other languages
Chinese (zh)
Other versions
CN104933080A (en)
Inventor
颜海涛
Original Assignee
中国移动通信集团湖北有限公司
Priority date
Filing date
Publication date
Application filed by 中国移动通信集团湖北有限公司
Priority to CN201410108593.3A
Publication of CN104933080A
Application granted
Publication of CN104933080B

Abstract

The invention discloses a method for determining abnormal data, comprising: dividing a multidimensional data set, according to the traversal result obtained by traversing each of its dimensions, into N minimum data units having the same dimensionality as the multidimensional data set, and calculating the spatial distance value corresponding to every minimum data unit; determining a suspected abnormal data set according to the spatial distance values; selecting a suspected abnormal minimum data unit from the suspected abnormal data set, combining, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset, calculating the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset, and thereby determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit. The invention also discloses a device for determining abnormal data.

Description

Method and device for determining abnormal data
Technical field
The present invention relates to multidimensional data analysis technology, and in particular to a method and device for determining abnormal data.
Background
In data analysis and data mining, multidimensional data analysis is a very important aspect; it can uncover existing problems or potential business opportunities in complex multidimensional data.
In the prior art, there are three ways to analyze a multidimensional data set: first, reducing the dimensionality of the multidimensional data set and analyzing it with algorithms such as decision trees; second, analyzing the multidimensional data set with complex simulation algorithms such as neural networks; third, analyzing the multidimensional data set according to expert experience. However, these methods have the following problems:
(a) The existing analysis process is complex and time-consuming, and may even require complex data modeling of the multidimensional data set with external tools;
(b) Technical staff need some grounding in statistics or data analysis, so the demands on their technical level are high;
(c) The prior art lacks a mechanism for carrying business staff's experience and control methods through the process from data to presentation; as a result, the output data is merely numeric information without business information, non-technical staff cannot understand it, and the output data is poorly visualized;
(d) The existing analysis of a multidimensional data set focuses on finding the data in the set that follow general rules and fitting them, so that multidimensional data sets in other similar scenarios can be analyzed and predicted; this process often neglects the discovery of abnormal data.
Summary of the invention
To solve the existing technical problems, embodiments of the present invention provide a method and device for determining abnormal data that can accurately locate abnormal data.
To achieve the above objective, the technical solution of the present invention is implemented as follows. The present invention provides a method for determining abnormal data, comprising:
dividing a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and calculating the spatial distance value corresponding to every minimum data unit;
determining a suspected abnormal data set according to the spatial distance values;
selecting a suspected abnormal minimum data unit from the suspected abnormal data set; combining, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset; calculating the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset; and comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit;
wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set.
Further, before the dividing according to the traversal result obtained by traversing each dimension of the multidimensional data set, the method further comprises:
inputting a multidimensional data set and a control rule governing the multidimensional data set, and converting the multidimensional data set into a data object to be processed according to the control rule.
Further, determining the suspected abnormal data set according to the spatial distance values comprises:
fitting all the spatial distance values according to the normal distribution law, selecting as suspected abnormal data the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the spatial distance values, and taking the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, calculating the spatial distance difference of the suspected abnormal data unit in the suspected abnormal data subset comprises:
calculating the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset, and calculating the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, after determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit, the method further comprises:
displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
The present invention also provides a device for determining abnormal data, comprising:
a computing unit, configured to divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and to calculate the spatial distance value corresponding to every minimum data unit, wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set;
a determination unit, configured to determine a suspected abnormal data set according to the spatial distance values;
a recursive unit, configured to select a suspected abnormal minimum data unit from the suspected abnormal data set, to combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset, and to calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset;
a comparing unit, configured to compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
Further, the device further comprises:
an input unit, configured to input a multidimensional data set and a control rule governing the multidimensional data set;
a converting unit, configured to convert the multidimensional data set into a data object to be processed according to the control rule.
Further, the determination unit comprises:
a fitting subunit, configured to fit all the spatial distance values according to the normal distribution law;
a first selection subunit, configured to select, as suspected abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the spatial distance values;
a determination subunit, configured to determine the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, the recursive unit comprises: a second selection subunit, configured to select a suspected abnormal minimum data unit from the suspected abnormal data set;
a combination subunit, configured to combine, by the dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset;
a first computation subunit, configured to calculate the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset;
a second computation subunit, configured to calculate the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, the device further comprises:
a display unit, configured to display an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
Compared with conventional methods, the method and device for determining abnormal data provided by the embodiments of the present invention avoid reducing the dimensionality of the multidimensional data set. Because dimensionality reduction of a multidimensional data set loses information, the embodiments of the present invention locate abnormal data accurately without losing information from the multidimensional data set.
The embodiments of the present invention do not need to normalize continuous or discrete dimensions; abnormal data is determined across all dimensions through dimension-combination recursion and extension combination. Moreover, the embodiments convert the multidimensional data set to be processed into metadata according to the control rule, establishing the correspondence between the data in the multidimensional data set and the control rule. The abnormal data determined is therefore more accurate and carries more business information, which makes it easier for technical staff to understand.
The embodiments of the present invention can automatically determine the data distribution characteristics of the multidimensional data set and then determine the abnormal data; the analysis process is simple and fast and requires no external tools such as data modeling. Moreover, the abnormal data determined carries information such as the dimension range, the data size, and the degree of abnormality of the abnormal data (the degree of abnormality being determined, for example, by the spatial distance value), which makes it easy for technical staff to understand, gives good visualization, and places low demands on technical staff.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for determining abnormal data according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the device for determining abnormal data according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the determination unit in the device for determining abnormal data according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the recursive unit in the device for determining abnormal data according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a specific implementation of the method for determining abnormal data according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a schematic flowchart of the method for determining abnormal data according to an embodiment of the present invention. As shown in Fig. 1, the method for determining abnormal data comprises:
Step 101: divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and calculate the spatial distance value corresponding to every minimum data unit;
Step 102: determine a suspected abnormal data set according to the spatial distance values;
Step 103: select a suspected abnormal minimum data unit from the suspected abnormal data set; combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset; calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset; and compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit;
wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set.
Further, before step 101, the method further comprises:
inputting a multidimensional data set and a control rule governing the multidimensional data set, and converting the multidimensional data set into a data object to be processed according to the control rule.
Further, determining the suspected abnormal data set according to the spatial distance values comprises:
fitting all the spatial distance values according to the normal distribution law, selecting as suspected abnormal data the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the spatial distance values, and taking the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Here, X may be any positive integer greater than or equal to 1.
Further, calculating the spatial distance difference of the suspected abnormal data unit in the suspected abnormal data subset comprises:
calculating the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset, and calculating the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Here, suppose {M, N} is a suspected abnormal data unit in the suspected abnormal data subset; the outer spatial distance value OutDistince{M, N} of the suspected abnormal data unit {M, N} is then:
where $M_i$ is a minimum data unit adjacent to M, and $Y_1$ is the number of minimum data units adjacent to M; $N_j$ is a minimum data unit adjacent to N, and $Y_2$ is the number of minimum data units adjacent to N;
The inner spatial distance value InnerDistince{M, N} of the suspected abnormal data unit {M, N} is:
The spatial distance difference Distince{M, N} is:
where G is the dimensionality of the multidimensional data set.
Here, the denominator in the formula for the inner spatial distance value of a suspected abnormal data unit depends on the number of suspected abnormal minimum data units located in the boundary part of the suspected abnormal data subset corresponding to that suspected abnormal data unit.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
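The comparison rule above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_unit` and its parameter names are hypothetical, and both inputs are assumed to have been computed already by the preceding steps.

```python
def classify_unit(distance_difference: float, unit_distance: float) -> str:
    """Apply the comparison rule stated in the text.

    distance_difference: spatial distance difference of the suspected abnormal data unit
    unit_distance: spatial distance value of the suspected abnormal minimum data unit
    (both names are hypothetical; the values come from earlier steps)
    """
    # Strictly greater means abnormal; equal or smaller means normal.
    return "abnormal" if distance_difference > unit_distance else "normal"


print(classify_unit(12.5, 8.0))  # abnormal
print(classify_unit(8.0, 8.0))   # normal
```

Note that equality falls on the "normal" side, because the text treats "not greater than" as normal.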
Further, after determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit, the method further comprises:
displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
To implement the above method, an embodiment of the present invention further provides a device for determining abnormal data, comprising:
a computing unit 21, configured to divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and to calculate the spatial distance value corresponding to every minimum data unit, wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set;
a determination unit 22, configured to determine a suspected abnormal data set according to the spatial distance values;
a recursive unit 23, configured to select a suspected abnormal minimum data unit from the suspected abnormal data set, to combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset, and to calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset;
a comparing unit 24, configured to compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
Further, the device further comprises:
an input unit 25, configured to input a multidimensional data set and a control rule governing the multidimensional data set;
a converting unit 26, configured to convert the multidimensional data set into a data object to be processed according to the control rule.
Further, as shown in Fig. 3, the determination unit 22 comprises:
a fitting subunit 221, configured to fit all the spatial distance values according to the normal distribution law;
a first selection subunit 222, configured to select, as suspected abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the spatial distance values;
a determination subunit 223, configured to determine the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, as shown in Fig. 4, the recursive unit 23 comprises:
a second selection subunit 231, configured to select a suspected abnormal minimum data unit from the suspected abnormal data set;
a combination subunit 232, configured to combine, by the dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset;
a first computation subunit 233, configured to calculate the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset;
a second computation subunit 234, configured to calculate the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, the device further comprises:
a display unit 27, configured to display an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
Here, the computing unit, determination unit, recursive unit, comparing unit, and converting unit may run on a computer and may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) located on the computer; the input unit may feed input to the computer through an input device; and the display unit may display the abnormal data through a display device.
Embodiment 1
Fig. 5 is a schematic flowchart of a specific implementation of the method for determining abnormal data according to an embodiment of the present invention. The method is described in detail below, taking a three-dimensional data set as an example. The dimensions of the three-dimensional data set are dimension A, dimension B, and dimension C, where dimension A represents time, dimension B represents region, dimension C represents age, and the numeric value represents the number of customers. As shown in Fig. 5, the method for determining abnormal data according to the embodiment of the present invention comprises:
Step 501: input a three-dimensional data set and a control rule governing the three-dimensional data set, and convert the three-dimensional data set into a data object to be run according to the control rule;
The key to this method embodiment is to sort out the data rules of the multidimensional data set clearly: according to the control rule, business information such as the values and the type of each dimension in the multidimensional data set is determined, the multidimensional data set is converted into metadata to be run according to that business information, and the correspondence between the data in the multidimensional data set and the control rule is thereby established.
Here, the control rule is metadata about the data in the multidimensional data set. It can be used to check whether the data in the multidimensional data set conform to the dimension type (for example, discrete or continuous), the data range (for example, age between 10 and 90), the logical relationships between dimensions (for example, the degree of association between dimension A and dimension B), and so on.
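As a rough illustration of how a control rule might be held as metadata and used to check records, consider the following Python sketch. The text does not specify any data structure, so the `DimensionRule` class, its fields, and the sample rules are all assumptions made for this example; logical relationships between dimensions are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionRule:
    """Hypothetical control rule for a single dimension."""
    kind: str                      # "discrete" or "continuous" (descriptive metadata)
    allowed: Optional[set] = None  # permitted values, if the rule enumerates them
    low: Optional[float] = None    # inclusive lower bound, if any
    high: Optional[float] = None   # inclusive upper bound, if any

    def accepts(self, value) -> bool:
        if self.allowed is not None and value not in self.allowed:
            return False
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def violations(record: dict, rules: dict) -> list:
    """Return the names of the dimensions whose value breaks its rule."""
    return [name for name, rule in rules.items() if not rule.accepts(record[name])]


# Sample rules: region must be a known value, age must lie between 10 and 90.
rules = {
    "region": DimensionRule(kind="discrete", allowed={"Wuhan", "Yichang"}),
    "age": DimensionRule(kind="discrete", low=10, high=90),
}

print(violations({"region": "Wuhan", "age": 35}, rules))   # []
print(violations({"region": "Wuhan", "age": 120}, rules))  # ['age']
```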
Step 502: traverse each dimension of the three-dimensional data set to obtain the dimension traversal result of the three-dimensional data set;
For example, dimension A of the three-dimensional data set is traversed to obtain the traversal result of dimension A. Specifically, if the dimension values of dimension A are continuous variables, all dimension values corresponding to dimension A are traversed. Suppose dimension A has 13 dimension values; the traversal result of dimension A, output in traversal order, is: $A_1$ = January 2012, $A_2$ = February 2012, ..., $A_{13}$ = January 2013;
Dimension B is traversed to obtain the traversal result of dimension B. Specifically, if the dimension values of dimension B are discrete variables, all dimension values corresponding to dimension B are traversed starting from the first. Suppose dimension B has 16 dimension values; the traversal result of dimension B, output in traversal order, is: $B_1$ = Wuhan, $B_2$ = Yichang, ..., $B_{16}$ = Yibin;
Dimension C is traversed in the same way as dimension A and dimension B to obtain the corresponding traversal result. Suppose the dimension values of dimension C are discrete variables and there are 67 of them; the traversal result of dimension C, output in traversal order, is: $C_1$ = 13, $C_2$ = 15, ..., $C_{67}$ = 55.
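The traversal in step 502 amounts to listing the distinct values of each dimension in a fixed order. A minimal Python sketch follows, assuming the data set arrives as a list of records keyed by dimension name; the record layout, the function name `traverse_dimensions`, and the sample values are hypothetical.

```python
def traverse_dimensions(records, dimension_names):
    """Collect the distinct values of each dimension in order of first appearance."""
    values = {name: [] for name in dimension_names}
    seen = {name: set() for name in dimension_names}
    for record in records:
        for name in dimension_names:
            value = record[name]
            if value not in seen[name]:
                seen[name].add(value)
                values[name].append(value)
    return values


# Tiny sample: A = time, B = region, C = age; "customers" is the measured value.
records = [
    {"A": "2012-01", "B": "Wuhan", "C": 13, "customers": 40},
    {"A": "2012-01", "B": "Yichang", "C": 15, "customers": 25},
    {"A": "2012-02", "B": "Wuhan", "C": 13, "customers": 42},
]

result = traverse_dimensions(records, ["A", "B", "C"])
print(result["A"])  # ['2012-01', '2012-02']
print(result["B"])  # ['Wuhan', 'Yichang']
print(result["C"])  # [13, 15]
```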
Step 503: permute and combine the traversal results obtained in step 502, divide the three-dimensional data set into N three-dimensional minimum data units, and calculate the spatial distance value corresponding to each minimum data unit;
In this embodiment of the present invention, the permutation and combination yields 13 × 16 × 67 cases; accordingly, the three-dimensional data set is divided into N = 13 × 16 × 67 three-dimensional minimum data units;
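The count N and the enumeration of minimum data units can be checked with a short Python sketch. The variable names are hypothetical; the index triples (a, b, c) are 1-based to match the notation used in the text.

```python
from itertools import product
from math import prod

# Number of dimension values per dimension, as given in the embodiment.
sizes = {"A": 13, "B": 16, "C": 67}

# N is the product of the numbers of dimension values.
n_units = prod(sizes.values())

# Each minimum data unit is identified by one index per dimension (1-based).
units = list(product(*(range(1, n + 1) for n in sizes.values())))

print(n_units)               # 13936
print(len(units))            # 13936
print(units[0], units[-1])   # (1, 1, 1) (13, 16, 67)
```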
The specific steps include:
(a) Divide the three-dimensional data set into N = 13 × 16 × 67 three-dimensional minimum data units $M_{abc}$, where $M_{abc} = \{A_a, B_b, C_c \mid 1 \le a \le 13,\ 1 \le b \le 16,\ 1 \le c \le 67\}$; $M_{abc}$ thus represents the number of customers when dimension A takes its a-th value, dimension B its b-th value, and dimension C its c-th value;
(b) Calculate the spatial distance values of all minimum data units;
Here, each minimum data unit has at most Y adjacent minimum data units around it, where Y depends on the dimensionality of the multidimensional data set; specifically, Y = (dimensionality - 1) × dimensionality. For the three-dimensional data set of this embodiment, Y = (3 - 1) × 3 = 6. A minimum data unit in the middle part has adjacent minimum data units above and below it and all around it, 6 in total, whereas a minimum data unit in the boundary part has somewhat fewer adjacent minimum data units, fewer than 6. The spatial distance value Distince between each minimum data unit and all minimum data units adjacent to it is calculated; the spatial distance value Distince($M_{abc}$) of $M_{abc}$ is:
where $M_i$ is a minimum data unit adjacent to $M_{abc}$, and Y is the number of minimum data units adjacent to $M_{abc}$.
Further, step 503 is described in detail taking a cuboid as an example. The length, width, and height of the cuboid correspond respectively to the number of dimension values of dimension A, of dimension B, and of dimension C of the three-dimensional data set, so the volume of the cuboid is 13 × 16 × 67. The cuboid is divided into 13 × 16 × 67 small cubes of volume 1 × 1 × 1; each small cube is called a unit, and each unit is a minimum data unit;
A small cube in the middle part of the cuboid has adjacent small cubes above and below, to the left and right, and in front and behind; that is, a minimum data unit in the middle part has 6 adjacent minimum data units. A small cube in the boundary part of the cuboid has fewer than 6 adjacent small cubes; that is, a minimum data unit in the boundary part has fewer than 6 adjacent minimum data units;
Specifically, when $M_{abc}$ is a minimum data unit in the middle part of the three-dimensional data set, the minimum data units adjacent to $M_{abc}$ are $M_{(a-1)bc}$ and $M_{(a+1)bc}$ on dimension A, $M_{a(b-1)c}$ and $M_{a(b+1)c}$ on dimension B, and $M_{ab(c-1)}$ and $M_{ab(c+1)}$ on dimension C. The number of minimum data units adjacent to $M_{abc}$ is therefore 6, and the spatial distance value Distince($M_{abc}$) of $M_{abc}$ is:
Here, when $M_{abc}$ is in the boundary part, it has fewer than 6 adjacent minimum data units; for example, when the minimum data units adjacent to $M_{abc}$ are $M_{(a+1)bc}$, $M_{a(b+1)c}$, and $M_{ab(c+1)}$, the spatial distance value Distince($M_{abc}$) of $M_{abc}$ is:
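The neighbour counts described in step 503 can be reproduced with a small Python sketch. The formula for the spatial distance value is not reproduced in this text, so the `space_distance` function below uses an ASSUMED stand-in (the mean absolute difference between a unit's value and its adjacent units' values, divided by the neighbour count Y) purely for illustration; it should not be read as the patented formula. Indices here are 0-based, and all function names are hypothetical.

```python
from itertools import product


def neighbors(index, shape):
    """Yield indices one step away from `index` along exactly one dimension."""
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            pos = index[axis] + step
            if 0 <= pos < size:  # units on the boundary have fewer neighbours
                yield index[:axis] + (pos,) + index[axis + 1:]


def space_distance(values, index, shape):
    """ASSUMED formula: mean absolute difference to the adjacent units."""
    adjacent = list(neighbors(index, shape))
    return sum(abs(values[index] - values[n]) for n in adjacent) / len(adjacent)


shape = (3, 3, 3)
values = {idx: 10 for idx in product(*(range(s) for s in shape))}
values[(1, 1, 1)] = 70  # one unusual customer count in the middle unit

print(len(list(neighbors((1, 1, 1), shape))))    # 6 (middle part)
print(len(list(neighbors((0, 0, 0), shape))))    # 3 (corner of the boundary)
print(space_distance(values, (1, 1, 1), shape))  # 60.0
print(space_distance(values, (0, 1, 1), shape))  # 12.0
```

Under this assumed formula the unusual middle unit gets a much larger distance value than its neighbours, which is the property the later steps rely on.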
Step 504: determine a suspected abnormal data set in the three-dimensional data set according to the spatial distance values;
Because the spatial distance values of all minimum data units in a multidimensional data set follow the normal distribution law, the spatial distance values of all minimum data units in the three-dimensional data set of this embodiment follow the normal distribution law. On this basis, the spatial distance values of all minimum data units in the three-dimensional data set are fitted according to the normal distribution law; the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the spatial distance values are selected as suspected abnormal data, and the set of minimum data units corresponding to the suspected abnormal data is the suspected abnormal data set. Here X is a positive integer greater than or equal to 1 and may be the largest integer less than or equal to N × 0.01.
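Step 504 can be sketched as follows. The text does not spell out how distance from the fitted distribution is measured, so this example ASSUMES that each spatial distance value is ranked by its absolute deviation from the fitted mean, expressed in units of the fitted standard deviation, and that fitting means taking the sample mean and population standard deviation. X is taken as the largest integer not exceeding 1% of N, with a floor of 1. The function name and the sample data are hypothetical.

```python
import statistics


def suspected_units(distances, percent=1):
    """Return the keys of the X units deviating most from the fitted normal distribution.

    distances: mapping from unit identifier to spatial distance value
    percent:   X is the largest integer <= N * percent / 100, but at least 1
    """
    data = list(distances.values())
    mean = statistics.fmean(data)   # fitted mean
    std = statistics.pstdev(data)   # fitted standard deviation
    if std == 0:
        return []                   # all values identical: nothing stands out
    x = max(1, len(data) * percent // 100)
    # ASSUMPTION: rank by absolute deviation measured in standard deviations.
    ranked = sorted(distances, key=lambda k: abs(distances[k] - mean) / std, reverse=True)
    return ranked[:x]


# 200 units, two of which have unusually large spatial distance values.
distances = {i: 1.0 for i in range(200)}
distances[17] = 9.0
distances[42] = 5.0

print(suspected_units(distances))  # [17, 42]
```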
Step 505:The doubtful abnormal minimum data list of arbitrary selection one is concentrated in the doubtful abnormal data that step 504 obtains Member combines recursive method according to dimension, in each dimension, gradually to adjacent with the doubtful abnormal minimum data unit Minimum data unit, which extend combining, obtains doubtful abnormal data subset, and calculate doubting in the doubtful abnormal data subset Like abnormal data unit exterior space distance value and inner space distance value and calculate the exterior space distance value and interior The difference of portion's space length value obtains space length difference, the space length difference and the doubtful exception chosen The size of the space length value of minimum data unit judges that the doubtful abnormal data unit in the doubtful abnormal data subset is No is abnormal data unit.
Specifically, assume that the suspected abnormal minimum data unit M_{a1b1c1} is chosen from the suspected abnormal data set and that M_{a1b1c1} has 6 adjacent minimum data units. The first extension of M_{a1b1c1} on dimension A proceeds as follows:
On dimension A, the minimum data units adjacent to M_{a1b1c1} are M_{(a1-1)b1c1} and M_{(a1+1)b1c1}. First, M_{a1b1c1}, M_{(a1-1)b1c1} and M_{(a1+1)b1c1} are combined pairwise to obtain the newest suspected abnormal data subset A, namely {M_{a1b1c1}, M_{(a1-1)b1c1}}, {M_{a1b1c1}, M_{(a1+1)b1c1}} and {M_{(a1-1)b1c1}, M_{(a1+1)b1c1}}. One element of subset A is arbitrarily chosen as a suspected abnormal data unit; for example, {M_{a1b1c1}, M_{(a1-1)b1c1}} is chosen, which has 10 adjacent minimum data units in total. The space distance value between {M_{a1b1c1}, M_{(a1-1)b1c1}} and its adjacent minimum data units is then calculated;
Here, the space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is called the exterior space distance value OutDistince{M_{a1b1c1}, M_{(a1-1)b1c1}}. The exterior space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated by the following formula:
Wherein, M_i is a minimum data unit adjacent to M_{a1b1c1}, and Y1 is the number of minimum data units adjacent to M_{a1b1c1}; M_j is a minimum data unit adjacent to M_{(a1-1)b1c1}, and Y2 is the number of minimum data units adjacent to M_{(a1-1)b1c1}.
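The pairwise combinations on dimension A and the neighbor counts mentioned above can be checked with a short sketch. The 0-based index tuples, the 5 × 5 × 5 grid, and the helper names `face_neighbors` and `group_neighbors` are assumptions made for illustration; adjacency is taken to mean differing by one step along exactly one dimension.

```python
from itertools import combinations

def face_neighbors(cell, shape):
    """Cells that differ from `cell` by one step along exactly one dimension."""
    result = []
    for axis in range(len(shape)):
        for step in (-1, 1):
            idx = list(cell)
            idx[axis] += step
            if 0 <= idx[axis] < shape[axis]:  # drop positions outside the grid
                result.append(tuple(idx))
    return result

def group_neighbors(cells, shape):
    """Cells adjacent to any member of the group, excluding the group itself."""
    group = set(cells)
    return {n for c in group for n in face_neighbors(c, shape)} - group

shape = (5, 5, 5)                       # hypothetical three-dimensional data set
center = (2, 2, 2)                      # plays the role of M_{a1b1c1}
along_a = [(1, 2, 2), (3, 2, 2)]        # its two neighbors on dimension A
subset_a = list(combinations([center] + along_a, 2))  # the three candidate pairs
pair = subset_a[0]                      # {M_{a1b1c1}, M_{(a1-1)b1c1}}
pair_neighbor_count = len(group_neighbors(pair, shape))
corner_neighbor_count = len(face_neighbors((0, 0, 0), shape))
```

For an interior pair each member contributes five neighbors besides its partner, which gives the 10 adjacent units stated above; a corner cell keeps only three neighbors, matching the boundary case.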
The inner space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated by the following formula:
From the exterior space distance value and the inner space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}}, the space distance difference Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated by the following formula:
Further, whether the suspected abnormal data unit {M_{a1b1c1}, M_{(a1-1)b1c1}} in the suspected abnormal data subset A is an abnormal data unit is judged as follows:
Compare the space distance difference Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} with the space distance value Distince{M_{a1b1c1}} of M_{a1b1c1}. If Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} > Distince{M_{a1b1c1}}, then {M_{a1b1c1}, M_{(a1-1)b1c1}} is an abnormal data unit, that is, M_{a1b1c1} and M_{(a1-1)b1c1} are abnormal minimum data units;
Otherwise, {M_{a1b1c1}, M_{(a1-1)b1c1}} is a normal data unit, that is, M_{a1b1c1} and M_{(a1-1)b1c1} are normal minimum data units.
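The judgment rule can be written as a small predicate. This is a sketch only: the exterior and inner space distance values are assumed to have been computed already, the difference is assumed to be exterior minus inner, and the function names and sample numbers are hypothetical.

```python
def space_distance_difference(outer, inner):
    """Difference of the exterior and inner space distance values.

    Assumption: the difference is taken as exterior minus inner.
    """
    return outer - inner

def is_abnormal_unit(outer, inner, base_value):
    """Apply the comparison rule to one suspected abnormal data unit.

    base_value is the space distance value of the originally chosen
    suspected abnormal minimum data unit, i.e. Distince{M_{a1b1c1}}.
    The unit is abnormal only if the difference is strictly greater.
    """
    return space_distance_difference(outer, inner) > base_value

# Hypothetical numbers, chosen only to exercise both branches.
abnormal = is_abnormal_unit(outer=9.0, inner=1.0, base_value=6.0)
normal = is_abnormal_unit(outer=7.0, inner=1.0, base_value=6.0)
```

The second call returns False because a difference equal to the base value does not satisfy the strict inequality, so the unit is treated as normal.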
Following the extension calculation described above, on dimension A, the exterior space distance value, inner space distance value and space distance difference of every suspected abnormal data unit in the suspected abnormal data subset A are calculated to determine whether each suspected abnormal data unit is an abnormal data unit, thereby obtaining the abnormal data subset A on dimension A;
Likewise, following the same extension calculation, M_{a1b1c1} is extended on dimension B and on dimension C, yielding the abnormal data subset B on dimension B and the abnormal data subset C on dimension C. The set of abnormal minimum data units belonging to abnormal data subset A, abnormal data subset B and abnormal data subset C is taken as the abnormal data subset.
Step 506: Following the dimension-combination recursion method of step 505, perform the combination recursion on every suspected abnormal minimum data unit in the suspected abnormal data set, and determine the set of abnormal minimum data units in all the resulting abnormal data subsets as the abnormal data set.
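The loop structure of steps 505 and 506 can be sketched as below. This is a simplified illustration and not the patented procedure: it performs only the first extension on each dimension, and the `judge` callback is a hypothetical stand-in for the comparison of the space distance difference with the space distance value of the chosen unit.

```python
from itertools import combinations

def detect_abnormal_set(suspected, shape, judge):
    """Collect abnormal minimum data units over all suspected cells and dimensions.

    suspected: iterable of cell index tuples (the suspected abnormal data set).
    shape:     number of dimension values per dimension.
    judge:     hypothetical callback (unit, cell) -> bool that decides whether
               a candidate pair is an abnormal data unit.
    """
    abnormal = set()
    for cell in suspected:
        for axis in range(len(shape)):          # dimensions A, B, C, ...
            members = [cell]
            for step in (-1, 1):                # neighbors along this dimension
                idx = list(cell)
                idx[axis] += step
                if 0 <= idx[axis] < shape[axis]:
                    members.append(tuple(idx))
            for unit in combinations(members, 2):  # suspected abnormal data units
                if judge(unit, cell):
                    abnormal.update(unit)       # both members become abnormal
    return abnormal

# Toy data: two neighboring cells carry unusually high values.
values = {(a, b, c): 1.0 for a in range(3) for b in range(3) for c in range(3)}
values[(1, 1, 1)] = 9.0
values[(0, 1, 1)] = 8.0

def toy_judge(unit, cell):
    # Placeholder rule for the demo only: every member of the pair is high.
    return all(values[u] > 5.0 for u in unit)

abnormal_set = detect_abnormal_set([(1, 1, 1)], (3, 3, 3), toy_judge)
```

Accumulating into a single set gives the union over dimensions and over all suspected units, which corresponds to merging the per-dimension abnormal data subsets into the final abnormal data set.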
Step 507: Display the determined abnormal data set.
Here, the embodiment of the present invention determines all the abnormal data in the multidimensional data set, and the determined abnormal data includes the dimension range, data size and degree of abnormality of the abnormal data (for example, the degree of abnormality determined from the space distance value). The result is therefore easy for technicians to understand and highly visual.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Claims (11)

  1. A method of determining abnormal data, characterized in that the method comprises:
    according to the traversal result obtained by traversing each dimension of a multidimensional data set, dividing the multidimensional data set into N minimum data units having the same dimensionality as the multidimensional data set, and calculating the space distance values corresponding to all the minimum data units;
    determining a suspected abnormal data set according to the space distance values;
    choosing one suspected abnormal minimum data unit from the suspected abnormal data set; according to a dimension-combination recursion method, combining the suspected abnormal minimum data unit and the minimum data units adjacent to the suspected abnormal minimum data unit into a suspected abnormal data subset; calculating the space distance difference of a suspected abnormal data unit in the suspected abnormal data subset; and comparing the space distance difference with the space distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit;
    wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set; and calculating the space distance difference of the suspected abnormal data unit in the suspected abnormal data subset comprises: calculating the exterior space distance value and the inner space distance value of the suspected abnormal data unit in the suspected abnormal data subset, and calculating the space distance difference from the exterior space distance value and the inner space distance value.
  2. The method according to claim 1, characterized in that, before the traversal result is obtained by traversing each dimension of the multidimensional data set, the method further comprises:
    inputting the multidimensional data set and a control rule for controlling the multidimensional data set, and converting the multidimensional data set into a data object to be processed according to the control rule.
  3. The method according to claim 1, characterized in that determining the suspected abnormal data set according to the space distance values comprises:
    fitting all the space distance values to a normal distribution, and choosing, as suspected abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the space distance values, the set of minimum data units corresponding to the suspected abnormal data being the suspected abnormal data set.
  4. The method according to claim 1, characterized in that comparing the space distance difference with the space distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
    if the space distance difference is greater than the space distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
    if the space distance difference is not greater than the space distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
  5. The method according to claim 1, characterized in that, after determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit, the method further comprises:
    displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
  6. A device for determining abnormal data, characterized in that the device comprises:
    a computing unit, configured to divide a multidimensional data set into N minimum data units having the same dimensionality as the multidimensional data set according to the traversal result obtained by traversing each dimension of the multidimensional data set, and to calculate the space distance values corresponding to all the minimum data units; wherein N is the product of the numbers of dimension values of the dimensions in the multidimensional data set;
    a determination unit, configured to determine a suspected abnormal data set according to the space distance values;
    a recursive unit, configured to choose one suspected abnormal minimum data unit from the suspected abnormal data set; to combine, according to a dimension-combination recursion method, the suspected abnormal minimum data unit and the minimum data units adjacent to the suspected abnormal minimum data unit into a suspected abnormal data subset; to calculate the exterior space distance value and the inner space distance value of a suspected abnormal data unit in the suspected abnormal data subset; and to calculate the space distance difference from the exterior space distance value and the inner space distance value;
    a comparing unit, configured to compare the space distance difference with the space distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
  7. The device according to claim 6, characterized in that the device further comprises:
    an input unit, configured to input the multidimensional data set and a control rule for controlling the multidimensional data set;
    a converting unit, configured to convert the multidimensional data set into a data object to be processed according to the control rule.
  8. The device according to claim 6, characterized in that the determination unit comprises:
    a fitting subunit, configured to fit all the space distance values to a normal distribution;
    a first choosing subunit, configured to choose, as suspected abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution obtained by fitting the space distance values;
    a determination subunit, configured to determine the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
  9. The device according to claim 6, characterized in that the recursive unit comprises:
    a second choosing subunit, configured to choose one suspected abnormal minimum data unit from the suspected abnormal data set;
    a combining subunit, configured to combine, according to the dimension-combination recursion method, the suspected abnormal minimum data unit and the minimum data units adjacent to the suspected abnormal minimum data unit into a suspected abnormal data subset;
    a first computation subunit, configured to calculate the exterior space distance value and the inner space distance value of the suspected abnormal data unit in the suspected abnormal data subset;
    a second computation subunit, configured to calculate the space distance difference from the exterior space distance value and the inner space distance value.
  10. The device according to claim 6, characterized in that comparing the space distance difference with the space distance value of the suspected abnormal minimum data unit to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
    if the space distance difference is greater than the space distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
    if the space distance difference is not greater than the space distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
  11. The device according to claim 6, characterized in that the device further comprises:
    a display unit, configured to display an abnormal data set; the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
CN201410108593.3A 2014-03-21 2014-03-21 A kind of method and device of determining abnormal data CN104933080B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410108593.3A CN104933080B (en) 2014-03-21 2014-03-21 A kind of method and device of determining abnormal data

Publications (2)

Publication Number Publication Date
CN104933080A CN104933080A (en) 2015-09-23
CN104933080B true CN104933080B (en) 2018-06-26

Family

ID=54120248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410108593.3A CN104933080B (en) 2014-03-21 2014-03-21 A kind of method and device of determining abnormal data

Country Status (1)

Country Link
CN (1) CN104933080B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107566192B (en) * 2017-10-18 2019-09-20 中国联合网络通信集团有限公司 A kind of abnormal flow processing method and Network Management Equipment
CN109035021B (en) * 2018-07-17 2020-06-09 阿里巴巴集团控股有限公司 Method, device and equipment for monitoring transaction index

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533407A (en) * 2009-04-10 2009-09-16 中国科学院软件研究所 Method for detecting exceptional data in ETL flow
CN102339288A (en) * 2010-07-21 2012-02-01 中国移动通信集团辽宁有限公司 Method and device for detecting abnormal data of data warehouse
CN103198147A (en) * 2013-04-19 2013-07-10 上海岩土工程勘察设计研究院有限公司 Method for distinguishing and processing abnormal automatized monitoring data

Also Published As

Publication number Publication date
CN104933080A (en) 2015-09-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant