CN104933080B - Method and device for determining abnormal data
Publication number: CN104933080B
Application number: CN201410108593.3A
Authority: CN (China)
Abstract
The invention discloses a method for determining abnormal data, comprising: dividing a multidimensional data set, according to the traversal result obtained by traversing each of its dimensions, into N minimum data units having the same dimensionality as the multidimensional data set, and calculating the spatial distance value corresponding to every minimum data unit; determining a suspected abnormal data set according to the spatial distance values; selecting one suspected abnormal minimum data unit from the suspected abnormal data set, combining it, by a dimension-combination recursion method, with the minimum data units adjacent to it into a suspected abnormal data subset, calculating the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset, and thereby determining whether that suspected abnormal data unit is an abnormal data unit. The invention also discloses a device for determining abnormal data.
Description
Technical field
The present invention relates to multidimensional data analysis technology, and more particularly to a method and device for determining abnormal data.
Background art
In data analysis and data mining, multidimensional data analysis plays a very important role: it can uncover existing problems or potential business opportunities in complex multidimensional data.
In the prior art, there are three ways to analyze a multidimensional data set: first, reducing the dimensionality of the data set and analyzing it with algorithms such as decision trees; second, analyzing the data set with complex simulation algorithms such as neural networks; third, analyzing the data set on the basis of expert experience. These methods, however, have the following problems:
(a) The existing analysis process is complicated and time-consuming, and may even require external tools to build complex data models of the multidimensional data set;
(b) Technical staff need a certain grounding in statistics, data analysis and the like, so the demands on their technical level are high;
(c) The prior art lacks a mechanism for carrying the experience and control methods of business staff through from the data to the presentation process, so the output data is merely plain numerical information without business information, cannot be understood by non-specialists, and is poorly visualized;
(d) Existing analysis of a multidimensional data set focuses on finding the data that follows general rules, fitting that data, and then analyzing and predicting other multidimensional data sets in similar scenarios; this process often overlooks the discovery of abnormal data.
Summary of the invention
To solve the existing technical problems, embodiments of the present invention provide a method and device for determining abnormal data that can locate abnormal data accurately.
To achieve the above object, the technical solution of the present invention is realized as follows. The present invention provides a method for determining abnormal data, comprising:
dividing a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and calculating the spatial distance value corresponding to every minimum data unit;
determining a suspected abnormal data set according to the spatial distance values;
selecting one suspected abnormal minimum data unit from the suspected abnormal data set; combining, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset; calculating the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset; comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit; and determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit;
wherein N is the product of the numbers of dimension values of the individual dimensions of the multidimensional data set.
Further, before the multidimensional data set is divided according to the traversal result obtained by traversing each of its dimensions, the method further comprises:
inputting the multidimensional data set and the control rules that govern it, and converting the multidimensional data set into a data object to be processed according to the control rules.
Further, determining the suspected abnormal data set according to the spatial distance values comprises:
fitting all the spatial distance values to a normal distribution, selecting as suspected abnormal data the data corresponding to the X points that lie farthest from the standard deviation of the fitted normal distribution, and taking the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, calculating the spatial distance difference of the suspected abnormal data unit in the suspected abnormal data subset comprises:
calculating the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset, and calculating the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, after determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit, the method further comprises:
displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
The present invention also provides a device for determining abnormal data, comprising:
a computing unit, configured to divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and to calculate the spatial distance value corresponding to every minimum data unit, wherein N is the product of the numbers of dimension values of the individual dimensions of the multidimensional data set;
a determination unit, configured to determine a suspected abnormal data set according to the spatial distance values;
a recursion unit, configured to select one suspected abnormal minimum data unit from the suspected abnormal data set, to combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset, and to calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset;
a comparing unit, configured to compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
Further, the device further comprises:
an input unit, configured to input the multidimensional data set and the control rules that govern it;
a converting unit, configured to convert the multidimensional data set into a data object to be processed according to the control rules.
Further, the determination unit comprises:
a fitting subunit, configured to fit all the spatial distance values to a normal distribution;
a first selection subunit, configured to select as suspected abnormal data the data corresponding to the X points that lie farthest from the standard deviation of the fitted normal distribution;
a determination subunit, configured to determine the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, the recursion unit comprises:
a second selection subunit, configured to select one suspected abnormal minimum data unit from the suspected abnormal data set;
a combining subunit, configured to combine, by the dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset;
a first calculation subunit, configured to calculate the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset;
a second calculation subunit, configured to calculate the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, the device further comprises:
a display unit, configured to display an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
Compared with conventional methods, the method and device for determining abnormal data provided by the embodiments of the present invention avoid reducing the dimensionality of the multidimensional data set. Because dimensionality reduction loses information, the embodiments locate abnormal data accurately without any loss of information from the multidimensional data set.
The embodiments do not need to normalize continuous or discrete dimensions; abnormal data is determined across all dimensions through dimension-combination recursion and extension combination. Moreover, the embodiments convert the multidimensional data set to be processed into metadata according to the control rules and establish the correspondence between the data in the multidimensional data set and the control rules. The abnormal data determined is therefore more accurate and can carry more business information, which makes it easier for technical staff to understand.
The embodiments can automatically determine the data distribution characteristics of the multidimensional data set and then determine the abnormal data; the analysis process is simple and fast and needs no external tools such as data modeling. In addition, the abnormal data determined carries information such as its dimensional range, data size and degree of abnormality (the degree of abnormality being determined, for example, from the spatial distance value), so it is easy for technical staff to understand, highly visual, and places low demands on technical staff.
Description of the drawings
Fig. 1 is a flow diagram of the method for determining abnormal data according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the device for determining abnormal data according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the determination unit in the device for determining abnormal data according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the recursion unit in the device for determining abnormal data according to an embodiment of the present invention;
Fig. 5 is a flow diagram of a specific implementation of the method for determining abnormal data according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a flow diagram of the method for determining abnormal data according to an embodiment of the present invention. As shown in Fig. 1, a method for determining abnormal data comprises:
Step 101: Divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and calculate the spatial distance value corresponding to every minimum data unit;
Step 102: Determine a suspected abnormal data set according to the spatial distance values;
Step 103: Select one suspected abnormal minimum data unit from the suspected abnormal data set; combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset; calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset; compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit; and determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit;
wherein N is the product of the numbers of dimension values of the individual dimensions of the multidimensional data set.
Further, before step 101, the method further comprises:
inputting the multidimensional data set and the control rules that govern it, and converting the multidimensional data set into a data object to be processed according to the control rules.
Further, determining the suspected abnormal data set according to the spatial distance values comprises:
fitting all the spatial distance values to a normal distribution, selecting as suspected abnormal data the data corresponding to the X points that lie farthest from the standard deviation of the fitted normal distribution, and taking the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Here, X can be any positive integer greater than or equal to 1.
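The selection of the X farthest points can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not define precisely how distance from the fitted distribution is measured, so the sketch assumes it is the absolute deviation from the fitted mean in units of the fitted standard deviation. The function name `select_suspected` and the sample values are hypothetical.

```python
import statistics

def select_suspected(distances, x):
    """Return the keys of the x units that deviate most from the fitted normal.

    ASSUMPTION: deviation is |value - mean| / standard deviation of the
    fitted normal distribution; the source text does not spell this out.
    """
    values = list(distances.values())
    mu = statistics.fmean(values)        # mean of the fitted normal
    sigma = statistics.pstdev(values)    # standard deviation of the fitted normal
    if sigma == 0:
        return []                        # all values identical: nothing stands out
    ranked = sorted(distances,
                    key=lambda k: abs(distances[k] - mu) / sigma,
                    reverse=True)
    return ranked[:x]

# Hypothetical spatial distance values keyed by minimum data unit
distances = {"u1": 1.0, "u2": 1.1, "u3": 0.9, "u4": 5.0, "u5": 1.05}
suspected = select_suspected(distances, x=1)
```

With these sample values, `u4` is the single unit returned, because its distance value lies farthest from the mean.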
Further, calculating the spatial distance difference of the suspected abnormal data unit in the suspected abnormal data subset comprises:
calculating the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset, and calculating the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Here, suppose {M, N} is a suspected abnormal data unit in the suspected abnormal data subset. The outer spatial distance value OutDistince{M, N} of the suspected abnormal data unit {M, N} is:
(formula omitted in the source text)
where M_{i} is a minimum data unit neighboring M, Y_{1} is the number of minimum data units neighboring M, N_{j} is a minimum data unit neighboring N, and Y_{2} is the number of minimum data units neighboring N.
The inner spatial distance value InnerDistince{M, N} of the suspected abnormal data unit {M, N} is:
(formula omitted in the source text)
The spatial distance difference Distince{M, N} is:
(formula omitted in the source text)
where G is the number of dimensions of the multidimensional data set.
Here, the denominator in the formula for the inner spatial distance value of a suspected abnormal data unit depends on the number of suspected abnormal minimum data units that lie on the boundary of the suspected abnormal data subset corresponding to that suspected abnormal data unit.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
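The comparison rule itself is a strict greater-than test, which a short sketch makes explicit. The function name `classify_unit` and the numeric inputs are hypothetical; only the comparison comes from the text.

```python
def classify_unit(distance_difference, unit_distance):
    """Apply the comparison rule: strictly greater means abnormal.

    distance_difference: spatial distance difference of the suspected unit
    unit_distance: spatial distance value of the suspected minimum data unit
    """
    return "abnormal" if distance_difference > unit_distance else "normal"

# Hypothetical values: equality falls on the "normal" side of the rule
results = [classify_unit(4.2, 3.0), classify_unit(3.0, 3.0), classify_unit(1.5, 3.0)]
```

Note that a difference exactly equal to the spatial distance value is classified as normal, since the rule requires "greater than".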
Further, after determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit, the method further comprises:
displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
To implement the above method, an embodiment of the present invention also provides a device for determining abnormal data, comprising:
a computing unit 21, configured to divide a multidimensional data set, according to the traversal result obtained by traversing each dimension of the multidimensional data set, into N minimum data units having the same dimensionality as the multidimensional data set, and to calculate the spatial distance value corresponding to every minimum data unit, wherein N is the product of the numbers of dimension values of the individual dimensions of the multidimensional data set;
a determination unit 22, configured to determine a suspected abnormal data set according to the spatial distance values;
a recursion unit 23, configured to select one suspected abnormal minimum data unit from the suspected abnormal data set, to combine, by a dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset, and to calculate the spatial distance difference of a suspected abnormal data unit in the suspected abnormal data subset;
a comparing unit 24, configured to compare the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and to determine whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
Further, the device further comprises:
an input unit 25, configured to input the multidimensional data set and the control rules that govern it;
a converting unit 26, configured to convert the multidimensional data set into a data object to be processed according to the control rules.
Further, as shown in Fig. 3, the determination unit 22 comprises:
a fitting subunit 221, configured to fit all the spatial distance values to a normal distribution;
a first selection subunit 222, configured to select as suspected abnormal data the data corresponding to the X points that lie farthest from the standard deviation of the fitted normal distribution;
a determination subunit 223, configured to determine the set of minimum data units corresponding to the suspected abnormal data as the suspected abnormal data set.
Further, as shown in Fig. 4, the recursion unit 23 comprises:
a second selection subunit 231, configured to select one suspected abnormal minimum data unit from the suspected abnormal data set;
a combining subunit 232, configured to combine, by the dimension-combination recursion method, the suspected abnormal minimum data unit with the minimum data units adjacent to it into a suspected abnormal data subset;
a first calculation subunit 233, configured to calculate the outer spatial distance value and the inner spatial distance value of the suspected abnormal data unit in the suspected abnormal data subset;
a second calculation subunit 234, configured to calculate the spatial distance difference from the outer spatial distance value and the inner spatial distance value.
Further, comparing the spatial distance difference with the spatial distance value of the suspected abnormal minimum data unit and determining whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit comprises:
if the spatial distance difference is greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is an abnormal data unit;
if the spatial distance difference is not greater than the spatial distance value of the suspected abnormal minimum data unit, the suspected abnormal data unit is a normal data unit.
Further, the device further comprises:
a display unit 27, configured to display an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all suspected abnormal minimum data units in the suspected abnormal data set.
Here, the computing unit, determination unit, recursion unit, comparing unit and converting unit can run on a computer and can be implemented by a central processing unit (CPU), microprocessor (MPU), digital signal processor (DSP) or field-programmable gate array (FPGA) located on the computer; the input unit can provide input to the computer through an input device; the display unit can display the abnormal data through a display device.
Embodiment 1
Fig. 5 is a flow diagram of a specific implementation of the method for determining abnormal data according to an embodiment of the present invention. The method is described in detail below using a three-dimensional data set as an example. The dimensions of the three-dimensional data set are dimension A, dimension B and dimension C, where dimension A represents time, dimension B represents region, dimension C represents age, and the numerical value represents the number of customers. As shown in Fig. 5, the method for determining abnormal data according to the embodiment of the present invention comprises:
Step 501: Input a three-dimensional data set and the control rules that govern it, and convert the three-dimensional data set into a data object to be run according to the control rules.
The key to this method embodiment is to sort out the data rules of the multidimensional data set clearly; that is, to determine from the control rules business information such as the values of each dimension and the type of each dimension, to convert the multidimensional data set into metadata to be run according to that business information, and thereby to establish the correspondence between the data in the multidimensional data set and the control rules.
Here, the control rules are metadata information about the data in the multidimensional data set. They can be used to check whether the data conforms to the dimension type (for example, discrete or continuous), the data range (for example, age between 10 and 90), the logical relationships between dimensions (for example, the degree of association between dimension A and dimension B), and so on.
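One way to picture a control rule is as a small metadata record that validates dimension values. The sketch below is an assumption for illustration only: the class name `ControlRule`, its fields and the sample rules are hypothetical and do not come from the patent, which does not specify a data structure. Logical relationships between dimensions are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class ControlRule:
    """Hypothetical metadata describing one dimension of the data set."""
    name: str
    kind: str              # "discrete" or "continuous"
    allowed: tuple = ()    # permitted values for a discrete dimension
    low: float = None      # lower bound for a continuous dimension
    high: float = None     # upper bound for a continuous dimension

    def accepts(self, value):
        """Check a single dimension value against this rule."""
        if self.kind == "discrete":
            return value in self.allowed
        return self.low <= value <= self.high

# Illustrative rules: an age range of 10 to 90 and a short region list
age_rule = ControlRule("age", "continuous", low=10, high=90)
region_rule = ControlRule("region", "discrete", allowed=("Wuhan", "Yichang", "Yibin"))
```

Records that fail `accepts` could be rejected or flagged before the data set is converted; how the patent handles such records is not stated.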
Step 502: Traverse each dimension of the three-dimensional data set to obtain the dimension traversal results of the three-dimensional data set.
For example, dimension A of the three-dimensional data set is traversed to obtain the traversal result of dimension A. Specifically, if the dimension values of dimension A are a continuous variable, all dimension values of dimension A are traversed. Suppose dimension A has 13 dimension values; the traversal result of dimension A, output in traversal order, is: A_{1} = January 2012, A_{2} = February 2012, ..., A_{13} = January 2013.
Dimension B is traversed to obtain the traversal result of dimension B. Specifically, if the dimension values of dimension B are a discrete variable, all dimension values of dimension B are traversed starting from the first. Suppose dimension B has 16 dimension values; the traversal result of dimension B, output in traversal order, is: B_{1} = Wuhan, B_{2} = Yichang, ..., B_{16} = Yibin.
Dimension C is traversed in the same way as dimension A and dimension B to obtain the traversal result of dimension C. Suppose the dimension values of dimension C are a discrete variable and there are 67 of them; the traversal result of dimension C, output in traversal order, is: C_{1} = 13, C_{2} = 15, ..., C_{67} = 55.
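The traversal of step 502 can be sketched as producing an ordered list of values per dimension. The helpers `traverse_months` and `traverse_discrete` are hypothetical names, and representing months as (year, month) tuples is an assumption made for the example.

```python
def traverse_months(start_year, start_month, count):
    """Return `count` consecutive months as (year, month) tuples, in order."""
    result = []
    year, month = start_year, start_month
    for _ in range(count):
        result.append((year, month))
        month += 1
        if month > 12:                 # roll over into the next year
            year, month = year + 1, 1
    return result

def traverse_discrete(values):
    """Return the distinct values of a discrete dimension in first-seen order."""
    return list(dict.fromkeys(values))

# Dimension A: 13 months starting from January 2012, as in the embodiment
dim_a = traverse_months(2012, 1, 13)
# Dimension B: a shortened, illustrative region column with a repeated value
dim_b = traverse_discrete(["Wuhan", "Yichang", "Wuhan", "Yibin"])
```

Here `dim_a` starts at January 2012 and ends at January 2013, matching the 13 values listed for dimension A.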
Step 503: Permute and combine the traversal results obtained in step 502, divide the three-dimensional data set into N three-dimensional minimum data units, and calculate the spatial distance value of each minimum data unit.
In this embodiment of the present invention, the permutation and combination yields 13 × 16 × 67 cases; accordingly, the three-dimensional data set is divided into N = 13 × 16 × 67 three-dimensional minimum data units.
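The division into N minimum data units is a Cartesian product of the per-dimension traversal results. A minimal sketch, using placeholder labels such as `A1` in place of the real dimension values:

```python
from itertools import product

# Placeholder labels standing in for the traversal results of step 502
dim_a = [f"A{i}" for i in range(1, 14)]   # 13 time values
dim_b = [f"B{i}" for i in range(1, 17)]   # 16 region values
dim_c = [f"C{i}" for i in range(1, 68)]   # 67 age values

# Every combination of one value per dimension is one minimum data unit
cells = list(product(dim_a, dim_b, dim_c))
n = len(cells)   # equals 13 * 16 * 67
```

For this embodiment the product contains 13 × 16 × 67 = 13936 minimum data units.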
The specific steps are:
(a) Divide the three-dimensional data set into N = 13 × 16 × 67 three-dimensional minimum data units M_{abc}, where M_{abc} = {dimension A_{a}, dimension B_{b}, dimension C_{c} | 1 ≤ a ≤ 13, 1 ≤ b ≤ 16, 1 ≤ c ≤ 67}; M_{abc} thus represents the number of customers when dimension A is a, dimension B is b and dimension C is c.
(b) Calculate the spatial distance values of all minimum data units.
Here, each minimum data unit has at most Y neighboring minimum data units, where Y depends on the number of dimensions of the multidimensional data set: Y = (number of dimensions - 1) × number of dimensions. For the three-dimensional data set of this embodiment, Y = (3 - 1) × 3 = 6. A minimum data unit in the middle part has adjacent minimum data units above and below, to the left and right, and in front and behind, 6 in total, whereas a minimum data unit on the boundary has fewer than 6 adjacent minimum data units. The spatial distance value Distince between each minimum data unit and all the minimum data units adjacent to it is calculated. The spatial distance value Distince(M_{abc}) of M_{abc} is:
(formula omitted in the source text)
where M_{i} is a minimum data unit adjacent to M_{abc} and Y is the number of minimum data units adjacent to M_{abc}.
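Because the formula for Distince(M_{abc}) is not reproduced in this text, the sketch below substitutes an assumed one: the root-mean-square difference between a unit's value and the values of its adjacent units. That choice, and the names `neighbors` and `spatial_distance`, are assumptions for illustration and should not be read as the patent's formula.

```python
import math

def neighbors(index, shape):
    """Yield the indices one step away from `index` along a single dimension."""
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            pos = index[axis] + step
            if 0 <= pos < size:        # skip positions outside the data set
                yield index[:axis] + (pos,) + index[axis + 1:]

def spatial_distance(values, index, shape):
    """ASSUMED formula: root-mean-square difference to the adjacent units."""
    adjacent = list(neighbors(index, shape))
    total = sum((values[index] - values[n]) ** 2 for n in adjacent)
    return math.sqrt(total / len(adjacent))

# Toy 3 x 3 x 3 data set: every unit holds 10 customers except the center
shape = (3, 3, 3)
values = {(a, b, c): 10.0 for a in range(3) for b in range(3) for c in range(3)}
values[(1, 1, 1)] = 40.0
center = spatial_distance(values, (1, 1, 1), shape)   # 6 neighbors, each differs by 30
corner = spatial_distance(values, (0, 0, 0), shape)   # 3 neighbors, all equal
```

Under this assumed formula the center unit gets a distance value of 30.0 and the corner unit 0.0; dividing by the actual number of adjacent units keeps boundary units comparable with interior ones.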
Step 503 is described in further detail below using a cuboid as an example. The length, width and height of the cuboid correspond respectively to the number of dimension values of dimension A, dimension B and dimension C of the three-dimensional data set, so the volume of the cuboid is 13 × 16 × 67. The cuboid is divided into 13 × 16 × 67 small cubes of volume 1 × 1 × 1; each small cube is one unit, called a minimum data unit.
A small cube in the middle part of the cuboid has adjacent small cubes above and below, to the left and right, and in front and behind; that is, a minimum data unit in the middle part has 6 adjacent minimum data units. A small cube on the boundary of the cuboid has fewer than 6 adjacent small cubes; that is, a minimum data unit on the boundary has fewer than 6 adjacent minimum data units.
Specifically, when M_{abc} is a minimum data unit in the middle part of the three-dimensional data set, the minimum data units adjacent to M_{abc} are M_{(a-1)bc} and M_{(a+1)bc} on dimension A, M_{a(b-1)c} and M_{a(b+1)c} on dimension B, and M_{ab(c-1)} and M_{ab(c+1)} on dimension C. The number of minimum data units adjacent to M_{abc} is therefore 6, and the spatial distance value Distince(M_{abc}) of M_{abc} is:
(formula omitted in the source text)
When M_{abc} is on the boundary, it has fewer than 6 adjacent minimum data units. For example, when the minimum data units adjacent to M_{abc} are M_{(a+1)bc}, M_{a(b+1)c} and M_{ab(c+1)}, the spatial distance value Distince(M_{abc}) of M_{abc} is:
(formula omitted in the source text)
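The neighbor counts described for the cuboid can be checked by counting, for every small cube, how many of its six possible neighbors fall inside the 13 × 16 × 67 cuboid. The helper name `neighbor_count` and the zero-based indexing are choices made for this sketch.

```python
from collections import Counter
from itertools import product

shape = (13, 16, 67)

def neighbor_count(index, shape):
    """Count in-range neighbors: one per direction that stays inside the cuboid."""
    return sum((i > 0) + (i < size - 1) for i, size in zip(index, shape))

# Histogram: number of adjacent units -> how many minimum data units have it
histogram = Counter(
    neighbor_count(index, shape)
    for index in product(*(range(size) for size in shape))
)
```

The histogram shows that the 11 × 14 × 65 interior cubes have 6 neighbors each, while the 8 corner cubes, such as the one whose neighbors are M_{(a+1)bc}, M_{a(b+1)c} and M_{ab(c+1)}, have only 3.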
Step 504: Determine the suspected abnormal data set within the three-dimensional data set according to the spatial distance values.
Because the spatial distance values of all minimum data units in a multidimensional data set follow a normal distribution, the spatial distance values of all minimum data units in the three-dimensional data set of this embodiment also follow a normal distribution. On this principle, the spatial distance values of all minimum data units in the three-dimensional data set are fitted to a normal distribution, the data corresponding to the X points that lie farthest from the standard deviation of the fitted normal distribution are selected as suspected abnormal data, and the set of minimum data units corresponding to the suspected abnormal data is taken as the suspected abnormal data set. Here X is a positive integer greater than or equal to 1 and may be the largest integer less than or equal to N × 0.01.
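As a worked example of the suggested choice of X for this embodiment, the largest integer not exceeding N × 0.01 can be computed with integer division. The `max(1, ...)` guard is an addition in this sketch to respect the stated lower bound of 1 for small data sets.

```python
n = 13 * 16 * 67          # number of minimum data units in the embodiment
x = max(1, n // 100)      # largest integer <= N * 0.01, but never below 1
```

For N = 13936 this gives X = 139, so the 139 units deviating most from the fitted distribution would form the suspected abnormal data set.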
Step 505: Arbitrarily select one suspected abnormal minimum data unit from the suspected abnormal data set obtained in step 504. Using the dimension-combination recursion method, extend it step by step in each dimension by combining it with the adjacent minimum data units to obtain a suspected abnormal data subset. Calculate the outer spatial distance value and the inner spatial distance value of a suspected abnormal data unit in the suspected abnormal data subset, take the difference between the outer spatial distance value and the inner spatial distance value as the spatial distance difference, compare the spatial distance difference with the spatial distance value of the selected suspected abnormal minimum data unit, and judge whether the suspected abnormal data unit in the suspected abnormal data subset is an abnormal data unit.
Specifically, assume that the doubtful abnormal minimum data unit M_{a1b1c1} in the doubtful abnormal data set is chosen, and that there are 6 minimum data units adjacent to M_{a1b1c1}. The first extension of M_{a1b1c1} on dimension A proceeds as follows:
On dimension A, the minimum data units neighboring M_{a1b1c1} are M_{(a1-1)b1c1} and M_{(a1+1)b1c1}. First, M_{a1b1c1}, M_{(a1-1)b1c1} and M_{(a1+1)b1c1} are combined pairwise to obtain the latest doubtful abnormal data subset A, namely {M_{a1b1c1}, M_{(a1-1)b1c1}}, {M_{a1b1c1}, M_{(a1+1)b1c1}} and {M_{(a1-1)b1c1}, M_{(a1+1)b1c1}}. Any one of the combinations in doubtful abnormal data subset A is chosen as a doubtful abnormal data unit. For example, {M_{a1b1c1}, M_{(a1-1)b1c1}} is chosen as a doubtful abnormal data unit; there are 10 minimum data units adjacent to {M_{a1b1c1}, M_{(a1-1)b1c1}} in total, and the space distance value between {M_{a1b1c1}, M_{(a1-1)b1c1}} and its neighboring minimum data units is calculated;
Here, the space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is called the exterior space distance value OutDistince{M_{a1b1c1}, M_{(a1-1)b1c1}}. The exterior space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated by the formula:
where M_{i} is a minimum data unit neighboring M_{a1b1c1}, and Y_{1} is the number of minimum data units neighboring M_{a1b1c1}; M_{j} is a minimum data unit neighboring M_{(a1-1)b1c1}, and Y_{2} is the number of minimum data units neighboring M_{(a1-1)b1c1}.
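The pairwise combination on dimension A and the count of 10 adjacent units described above can be checked with a short Python sketch. The data set size, the coordinates standing in for M_{a1b1c1}, and the helper names `face_neighbors` and `outside_neighbors` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def face_neighbors(index, shape):
    """Indices differing from `index` by one step along a single dimension."""
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            pos = index[axis] + step
            if 0 <= pos < size:
                neighbors.append(index[:axis] + (pos,) + index[axis + 1:])
    return neighbors


def outside_neighbors(group, shape):
    """Units adjacent to any member of `group` but not in the group itself."""
    members = set(group)
    return {n for m in members for n in face_neighbors(m, shape)} - members


shape = (5, 5, 5)    # hypothetical data set size
center = (2, 2, 2)   # plays the role of M_{a1b1c1}
axis = 0             # dimension A

# The chosen unit together with its two neighbors along dimension A.
line = [center] + [n for n in face_neighbors(center, shape)
                   if n[axis] != center[axis]]
candidates = list(combinations(line, 2))  # the three pairings on dimension A

pair = (center, (1, 2, 2))                # {M_{a1b1c1}, M_{(a1-1)b1c1}}
adjacent = outside_neighbors(pair, shape)
```

Each of the two units has 6 adjacent units, one of which is the other member of the pair, so 5 + 5 = 10 distinct units remain outside the pair.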
The inner space distance value of {M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated by the formula:
The space distance difference Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} of {M_{a1b1c1}, M_{(a1-1)b1c1}} is calculated from its exterior space distance value and inner space distance value by the formula:
Further, whether the doubtful abnormal data unit {M_{a1b1c1}, M_{(a1-1)b1c1}} in doubtful abnormal data subset A is an abnormal data unit is judged as follows:
Compare the space distance difference Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} of {M_{a1b1c1}, M_{(a1-1)b1c1}} with the space distance value Distince{M_{a1b1c1}} of M_{a1b1c1}. If Distince{M_{a1b1c1}, M_{(a1-1)b1c1}} > Distince{M_{a1b1c1}}, then {M_{a1b1c1}, M_{(a1-1)b1c1}} is an abnormal data unit, that is, M_{a1b1c1} and M_{(a1-1)b1c1} are abnormal minimum data units;
Otherwise, {M_{a1b1c1}, M_{(a1-1)b1c1}} is a normal data unit, that is, M_{a1b1c1} and M_{(a1-1)b1c1} are normal minimum data units.
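The judgment above reduces to a single comparison, sketched below in Python. The exterior and inner space distance values are taken as given inputs because their formulas are not reproduced in this text; the function name `judge_unit`, the subtraction order (exterior minus inner) and the sample numbers are assumptions for illustration.

```python
def judge_unit(exterior, inner, base_distance):
    """Return (difference, is_abnormal) for one doubtful abnormal data unit.

    exterior      -- exterior space distance value of the combined unit
    inner         -- inner space distance value of the combined unit
    base_distance -- space distance value of the chosen minimum data unit
    """
    difference = exterior - inner  # assumed order: exterior minus inner
    # Strictly greater means abnormal; equal or smaller means normal.
    return difference, difference > base_distance


abnormal_case = judge_unit(8.0, 1.5, 5.0)
normal_case = judge_unit(6.0, 1.0, 5.0)
```

When the unit is judged abnormal, every minimum data unit it contains is treated as abnormal; otherwise every one of them is treated as normal.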
Following the extension calculation described above, on dimension A, the exterior space distance value, inner space distance value and space distance difference of every doubtful abnormal data unit in doubtful abnormal data subset A are calculated to determine whether each doubtful abnormal data unit is an abnormal data unit, thereby obtaining abnormal data subset A on dimension A;
At the same time, the same extension calculation is applied to M_{a1b1c1} on dimension B and dimension C, yielding abnormal data subset B and abnormal data subset C on dimension B and dimension C respectively. The set of abnormal minimum data units contained in abnormal data subset A, abnormal data subset B and abnormal data subset C is taken as the abnormal data subset.
Step 506: Using the dimension-combination recursion method of step 505, perform combination recursion on all doubtful abnormal minimum data units in the doubtful abnormal data set, and determine the set of abnormal minimum data units in all the resulting abnormal data subsets as the abnormal data set.
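The aggregation in step 506 can be pictured as a union over the results of step 505. The Python sketch below is only an outline: `expand` stands for whatever procedure implements step 505 for one doubtful abnormal minimum data unit, and `toy_expand` is a hypothetical lookup table used solely so that the example runs.

```python
def collect_abnormal(doubtful_units, expand):
    """Union of the abnormal minimum data units found for every doubtful unit.

    `expand(unit)` is expected to return the abnormal data units (each a set
    of minimum data unit indices) obtained by the step 505 procedure.
    """
    abnormal = set()
    for unit in doubtful_units:
        for data_unit in expand(unit):
            abnormal.update(data_unit)
    return abnormal


def toy_expand(unit):
    """Hypothetical stand-in for step 505: a fixed lookup table."""
    table = {
        (2, 2, 2): [{(2, 2, 2), (1, 2, 2)}],  # one abnormal unit found
        (4, 0, 1): [],                        # nothing abnormal found
    }
    return table.get(unit, [])


abnormal_set = collect_abnormal([(2, 2, 2), (4, 0, 1)], toy_expand)
```

Using a set for the result means a minimum data unit reached from several doubtful units is recorded only once.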
Step 507: Display the determined abnormal data set.
Here, since the embodiment of the present invention determines all of the abnormal data in the multidimensional data set, and the determined abnormal data includes the dimensional range, data size and degree of abnormality of the abnormal data (for example, the degree of abnormality determined from the space distance value), the result is highly visual and easy for technical staff to understand.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention.
Claims (11)
1. A method for determining abnormal data, characterized in that the method comprises: dividing a multidimensional data set into N minimum data units having the same dimensions as the multidimensional data set according to a traversal result obtained by traversing each dimension of the multidimensional data set, and calculating the space distance values corresponding to all the minimum data units; determining a doubtful abnormal data set according to the space distance values; choosing one doubtful abnormal minimum data unit from the doubtful abnormal data set, combining, according to a dimension-combination recursion method, the doubtful abnormal minimum data unit and the minimum data units adjacent to the doubtful abnormal minimum data unit into a doubtful abnormal data subset, calculating the space distance difference of a doubtful abnormal data unit in the doubtful abnormal data subset, comparing the space distance difference with the space distance value of the doubtful abnormal minimum data unit, and determining whether the doubtful abnormal data unit in the doubtful abnormal data subset is an abnormal data unit; wherein N is the product of the numbers of dimension values of the dimensions of the multidimensional data set; and calculating the space distance difference of the doubtful abnormal data unit in the doubtful abnormal data subset comprises: calculating the exterior space distance value and the inner space distance value of the doubtful abnormal data unit in the doubtful abnormal data subset, and calculating the space distance difference according to the exterior space distance value and the inner space distance value.
2. The method according to claim 1, characterized in that, before the traversal result is obtained by traversing each dimension of the multidimensional data set, the method further comprises: inputting the multidimensional data set and a control rule for controlling the multidimensional data set, and converting the multidimensional data set into a data object to be processed according to the control rule.
3. The method according to claim 1, characterized in that determining the doubtful abnormal data set according to the space distance values comprises: fitting all the space distance values to a normal distribution, choosing, as doubtful abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution fitted from the space distance values, and taking the set of minimum data units corresponding to the doubtful abnormal data as the doubtful abnormal data set.
4. The method according to claim 1, characterized in that comparing the space distance difference with the space distance value of the doubtful abnormal minimum data unit and determining whether the doubtful abnormal data unit in the doubtful abnormal data subset is an abnormal data unit comprises: if the space distance difference is greater than the space distance value of the doubtful abnormal minimum data unit, the doubtful abnormal data unit is an abnormal data unit; if the space distance difference is not greater than the space distance value of the doubtful abnormal minimum data unit, the doubtful abnormal data unit is a normal data unit.
5. The method according to claim 1, characterized in that, after determining whether the doubtful abnormal data unit in the doubtful abnormal data subset is an abnormal data unit, the method further comprises: displaying an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all doubtful abnormal minimum data units in the doubtful abnormal data set.
6. A device for determining abnormal data, characterized in that the device comprises: a computing unit, configured to divide a multidimensional data set into N minimum data units having the same dimensions as the multidimensional data set according to a traversal result obtained by traversing each dimension of the multidimensional data set, and to calculate the space distance values corresponding to all the minimum data units, wherein N is the product of the numbers of dimension values of the dimensions of the multidimensional data set; a determination unit, configured to determine a doubtful abnormal data set according to the space distance values; a recursion unit, configured to choose one doubtful abnormal minimum data unit from the doubtful abnormal data set, to combine, according to a dimension-combination recursion method, the doubtful abnormal minimum data unit and the minimum data units adjacent to the doubtful abnormal minimum data unit into a doubtful abnormal data subset, to calculate the exterior space distance value and the inner space distance value of a doubtful abnormal data unit in the doubtful abnormal data subset, and to calculate a space distance difference according to the exterior space distance value and the inner space distance value; and a comparison unit, configured to compare the space distance difference with the space distance value of the doubtful abnormal minimum data unit and to determine whether the doubtful abnormal data unit in the doubtful abnormal data subset is an abnormal data unit.
7. The device according to claim 6, characterized in that the device further comprises: an input unit, configured to input the multidimensional data set and a control rule for controlling the multidimensional data set; and a conversion unit, configured to convert the multidimensional data set into a data object to be processed according to the control rule.
8. The device according to claim 6, characterized in that the determination unit comprises: a fitting subunit, configured to fit all the space distance values to a normal distribution; a first choosing subunit, configured to choose, as doubtful abnormal data, the data corresponding to the X points farthest from the standard deviation of the normal distribution fitted from the space distance values; and a determination subunit, configured to determine the set of minimum data units corresponding to the doubtful abnormal data as the doubtful abnormal data set.
9. The device according to claim 6, characterized in that the recursion unit comprises: a second choosing subunit, configured to choose one doubtful abnormal minimum data unit from the doubtful abnormal data set; a combination subunit, configured to combine, according to the dimension-combination recursion method, the doubtful abnormal minimum data unit and the minimum data units adjacent to the doubtful abnormal minimum data unit into a doubtful abnormal data subset; a first computation subunit, configured to calculate the exterior space distance value and the inner space distance value of the doubtful abnormal data unit in the doubtful abnormal data subset; and a second computation subunit, configured to calculate the space distance difference according to the exterior space distance value and the inner space distance value.
10. The device according to claim 6, characterized in that comparing the space distance difference with the space distance value of the doubtful abnormal minimum data unit and determining whether the doubtful abnormal data unit in the doubtful abnormal data subset is an abnormal data unit comprises: if the space distance difference is greater than the space distance value of the doubtful abnormal minimum data unit, the doubtful abnormal data unit is an abnormal data unit; if the space distance difference is not greater than the space distance value of the doubtful abnormal minimum data unit, the doubtful abnormal data unit is a normal data unit.
11. The device according to claim 6, characterized in that the device further comprises: a display unit, configured to display an abnormal data set, the abnormal data set being the set of abnormal minimum data units determined by applying the dimension-combination recursion method to all doubtful abnormal minimum data units in the doubtful abnormal data set.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201410108593.3A CN104933080B (en)  2014-03-21  2014-03-21  A kind of method and device of determining abnormal data 
Publications (2)
Publication Number  Publication Date 

CN104933080A CN104933080A (en)  2015-09-23 
CN104933080B CN104933080B (en)  2018-06-26 
Family
ID=54120248
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201410108593.3A CN104933080B (en)  2014-03-21  2014-03-21  A kind of method and device of determining abnormal data 
Country Status (1)
Country  Link 

CN (1)  CN104933080B (en) 
Families Citing this family (2)
Publication number  Priority date  Publication date  Assignee  Title 

CN107566192B (en) *  2017-10-18  2019-09-20  中国联合网络通信集团有限公司  A kind of abnormal flow processing method and Network Management Equipment 
CN109035021B (en) *  2018-07-17  2020-06-09  阿里巴巴集团控股有限公司  Method, device and equipment for monitoring transaction index 
Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN101533407A (en) *  2009-04-10  2009-09-16  中国科学院软件研究所  Method for detecting exceptional data in ETL flow 
CN102339288A (en) *  2010-07-21  2012-02-01  中国移动通信集团辽宁有限公司  Method and device for detecting abnormal data of data warehouse 
CN103198147A (en) *  2013-04-19  2013-07-10  上海岩土工程勘察设计研究院有限公司  Method for distinguishing and processing abnormal automatized monitoring data 

2014
 2014-03-21 CN CN201410108593.3A patent/CN104933080B/en active IP Right Grant
Also Published As
Publication number  Publication date 

CN104933080A (en)  2015-09-23 
Legal Events
Date  Code  Title  Description 

C06  Publication  
PB01  Publication  
C10  Entry into substantive examination  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant  
GR01  Patent grant 