Detailed Description of Embodiments
To enable those skilled in the art to better understand the technical solutions of this application, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the scope of protection of this application.
The idea behind the embodiments of this specification is to cluster the detected data by numerical value. Clustering divides the detected data into one or more data groups such that the values within each data group are close to one another, and abnormal-data detection is then performed with the data group as the unit. Because the values inside each data group differ only slightly, detection is not distorted by the extreme values of the data or by order-of-magnitude differences between data, which improves the accuracy of abnormal-data detection.
The method provided by the embodiments of this specification can be applied to a service server or to a terminal device; the embodiments of this specification do not limit this.
Fig. 1 is a first flowchart of the abnormal-data detection method provided by the embodiments of this specification. The method shown in Fig. 1 includes at least the following steps:
Step 102: cluster the set of data to be detected according to the value of each datum in the set, obtaining at least one data group, where the values of the data to be detected within the same data group satisfy a preset clustering condition.
The method provided by the embodiments of this specification is executed by an abnormal-data detection apparatus.
The preset clustering condition can be set according to the values of the data to be detected; it is designed so that data with relatively close values are grouped into one class. For example, in a specific implementation, the condition may be that the difference between the values of any two data to be detected is less than or equal to a preset threshold. Alternatively, depending on the application scenario, the condition may be that the difference between the value of a datum clustered into a class and the statistical mean of that class is less than or equal to a set multiple of the class's standard deviation. The embodiments of this specification do not restrict the specific content of the preset clustering condition.
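As a rough illustration, the two example conditions can be expressed as Python predicates that decide whether a candidate value may join an existing group. This is a sketch, not the claimed implementation: the function names are hypothetical, the population standard deviation is assumed, and the second condition is evaluated on the group with the candidate already added, which is one reading of the text.

```python
from statistics import mean, pstdev


def fits_pairwise_threshold(x: float, group: list[float], threshold: float) -> bool:
    """First example condition: x differs from every member of the group
    by at most `threshold`."""
    return all(abs(x - v) <= threshold for v in group)


def fits_sigma_multiple(x: float, group: list[float], m: float) -> bool:
    """Second example condition: after x is tentatively added, its distance
    from the combined group's mean is at most m times the combined group's
    population standard deviation."""
    combined = group + [x]
    return abs(x - mean(combined)) <= m * pstdev(combined)


print(fits_pairwise_threshold(2, [1, 2], threshold=1))  # True
print(fits_sigma_multiple(2, [1], m=1.4))               # True: 0.5 <= 1.4 * 0.5
print(fits_sigma_multiple(20, [1, 2, 2, 2], m=1.4))     # False
```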
Note that, regardless of how the preset clustering condition is set, the values of the data within each data group obtained by clustering in step 102 differ only slightly.
In addition, in the embodiments of this specification, clustering the set of data to be detected may yield one data group or several.
Step 104: determine the group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group.
In a specific implementation, the global feature information of all data groups includes at least the number of data groups obtained by clustering, and may include other information as well, which the embodiments of this specification do not enumerate. The individual feature information of each data group may include the statistical mean of the data in that group, the maximum or minimum value among those data, and so on; other individual features are also possible and are not enumerated here.
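A minimal sketch of the group features follows, assuming groups are plain lists of numbers. The dictionary layout and key names are hypothetical; only the number of groups (global feature) and the per-group mean, maximum and minimum (individual features) come from the text.

```python
from statistics import mean


def group_features(groups: list[list[float]]) -> dict:
    """Collect one global feature (number of groups) and, per group,
    the individual features named in the text (mean, max, min)."""
    return {
        "group_count": len(groups),
        "individual": [
            {"mean": mean(g), "max": max(g), "min": min(g)} for g in groups
        ],
    }


features = group_features([[1, 2, 2, 2, 1, 0], [20, 13, 13], [5, 5]])
print(features["group_count"])           # 3
print(features["individual"][1]["max"])  # 20
```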
Step 106: detect abnormal data groups among the data groups according to set detection rules, based on the global feature information and/or the individual feature information.
In the embodiments of this specification, abnormal data groups may be detected according to the set detection rules based on the global feature information alone, based on the individual feature information alone, or based on both the global and the individual feature information.
In the embodiments of this specification, the data in each data group determined to be abnormal are identified as abnormal data in the set of data to be detected.
For example, if clustering the set of data to be detected yields data group 1, data group 2 and data group 3, and data group 2 is detected as an abnormal data group, then all data in data group 2 are identified as abnormal data.
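Mapping detected abnormal groups back to abnormal data is a simple flattening step. In this sketch the function name, the zero-based group indices (data groups 1, 2, 3 become indices 0, 1, 2) and the sample values are illustrative assumptions.

```python
def abnormal_data(groups: list[list[float]], abnormal: set[int]) -> list[float]:
    """Return every value belonging to a group whose index is in `abnormal`,
    keeping group order and within-group order."""
    return [x for i, g in enumerate(groups) if i in abnormal for x in g]


# Index 1 here corresponds to "data group 2" in the text.
print(abnormal_data([[1, 2], [20, 13, 13], [5, 5]], {1}))  # [20, 13, 13]
```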
In the abnormal-data detection method provided by the embodiments of this specification, data with close values in the set of data to be detected are first clustered into data groups according to their values, and abnormal-data detection is then performed with the data group as the unit. This avoids the influence of the maximum and minimum values in the set, and of order-of-magnitude differences between data, and thus improves the accuracy of abnormal-data detection.
To make the abnormal-data detection method provided by the embodiments of this specification easier to understand, the specific implementation of each of the above steps is described in detail below.
Specifically, in step 102, clustering the set of data to be detected according to the value of each datum to obtain at least one data group includes the following steps (1) to (4):
Step (1): for the datum read from the set of data to be detected for the N-th time, combine that datum in turn with each of the at least one initial data group obtained by clustering so far, where N is a positive integer greater than 1;
Step (2): determine the difference between the datum and the statistical mean of the data in each combined initial data group, and the standard deviation of the data in each combined initial data group;
Step (3): based on the difference and the standard deviation, determine the initial data group to which the datum belongs;
Step (4): add the datum read for the N-th time to the initial data group to which it belongs, thereby obtaining the at least one data group.
In the embodiments of this specification, when the datum read for the N-th time is clustered, if several initial data groups currently exist, the datum is combined with each of them; if only one initial data group currently exists, the datum only needs to be combined with that group. Here, combining the datum read for the N-th time with an initial data group means tentatively placing the datum into that group as one of its members, so each combined initial data group includes the datum read for the N-th time.
In step (3), determining the initial data group to which the datum belongs based on the difference and the standard deviation specifically includes: determining the initial data group for which the difference is less than or equal to a set multiple of the standard deviation as the initial data group to which the datum belongs.
The set multiple is a preset value whose specific value can be configured according to the actual application scenario; the embodiments of this specification do not limit it.
In a specific implementation, when the datum read for the N-th time is clustered and several initial data groups currently exist, the datum is first combined with one of them, and the statistical mean and standard deviation of the combined group are calculated. It is then judged whether the difference between the datum's value and that statistical mean is less than or equal to the set multiple of the standard deviation. If so, the datum is determined to belong to that initial data group and is placed into it, and there is no need to check whether it belongs to any of the other groups. If not, the initial data group to which the datum belongs is sought among the remaining groups in turn.
This process is illustrated below with a specific example.
For example, suppose that when the datum read for the N-th time is clustered, the existing initial data groups are initial data group 1, initial data group 2 and initial data group 3. To determine the group to which the datum belongs, it is first combined with initial data group 1, and it is judged whether the combination satisfies the following formula:

$$|x_N - \mu_{N1}| \le m\,\sigma_{N1}$$
In this formula, $x_N$ denotes the value of the datum read for the N-th time, $\mu_{N1}$ denotes the statistical mean of the initial data group obtained by combining that datum with initial data group 1, $\sigma_{N1}$ denotes the standard deviation of that combined group, and $m$ is the set multiple.
If the combination of the datum read for the N-th time with initial data group 1 satisfies the formula, the datum is considered to belong to initial data group 1 and is placed directly into that group. If it does not, the datum is considered not to belong to initial data group 1; it is then combined with initial data group 2 and the same check is performed.
If the datum read for the N-th time is found to belong to none of initial data groups 1, 2 and 3, it forms a new initial data group on its own.
The sequential check described above is only one specific way of determining the initial data group to which the datum read for the N-th time belongs. In the embodiments of this specification, it can also be determined in other ways, as illustrated below.
Again take the case where, when the datum read for the N-th time is clustered, the existing initial data groups are initial data group 1, initial data group 2 and initial data group 3.
The datum read for the N-th time is combined with initial data group 1, initial data group 2 and initial data group 3 respectively, and the ratio corresponding to each combined group is calculated with the following formula:

$$f(x_{Ni}) = \frac{|x_N - \mu_{Ni}|}{\sigma_{Ni}}$$
In this formula, $x_N$ denotes the value of the datum read for the N-th time, $\mu_{Ni}$ denotes the statistical mean of the initial data group obtained by combining that datum with initial data group $i$, $\sigma_{Ni}$ denotes the standard deviation of that combined group, and $f(x_{Ni})$ denotes the ratio corresponding to the $i$-th combined initial data group.
The ratio of each combined initial data group is compared with the set multiple, and an initial data group whose ratio is less than or equal to the set multiple is determined as the initial data group to which the datum read for the N-th time belongs.
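The ratio-based variant might be sketched as below. The text does not say what happens when several combined groups satisfy f(x_Ni) ≤ m; this sketch assumes the datum joins the qualifying group with the smallest ratio and, as in the sequential method, starts a new group when none qualifies. Those choices, the function name and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def assign_by_ratio(x: float, groups: list[list[float]], m: float) -> list[list[float]]:
    """Combine x with every initial group, compute f = |x - mu| / sigma for
    each combined group, and place x in a qualifying group (f <= m)."""
    ratios = []
    for g in groups:
        combined = g + [x]
        sigma = pstdev(combined)
        # sigma is 0 only if every combined value (x included) is equal,
        # in which case x sits exactly on the mean: treat the ratio as 0.
        ratios.append(0.0 if sigma == 0 else abs(x - mean(combined)) / sigma)
    result = [list(g) for g in groups]  # leave the caller's groups untouched
    qualifying = [i for i, f in enumerate(ratios) if f <= m]
    if qualifying:
        # Assumption: when several groups qualify, pick the smallest ratio.
        result[min(qualifying, key=lambda i: ratios[i])].append(x)
    else:
        result.append([x])  # no group qualifies: x starts a new initial group
    return result


print(assign_by_ratio(13, [[1, 2, 2, 2], [20]], m=1.4))  # [[1, 2, 2, 2], [20, 13]]
print(assign_by_ratio(5, [[5]], m=1.4))                  # [[5, 5]]
```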
Once the initial data group to which a datum belongs has been determined, the datum is added directly to that group. After every datum has been added to the initial data group to which it belongs, the at least one data group corresponding to the set of data to be detected is obtained.
When the data in the set of data to be detected are clustered, no initial data group yet exists for the datum read first from the set. For this case, in the embodiments of this specification, clustering the set according to the value of each datum in step 102 further includes:
determining the datum read first from the set of data to be detected as an initial data group.
In the embodiments of this specification, the datum read first from the set forms an initial data group on its own. When the second datum is read from the set, it can then be combined with that initial data group to check whether it belongs to that group, and so on.
For example, if the first datum read from the set of data to be detected is 28, then 28 alone forms an initial data group.
To make the clustering of the set of data to be detected easier to understand, it is illustrated below with a specific embodiment. Fig. 2 is a flowchart of the method, provided by the embodiments of this specification, for clustering the set of data to be detected. The clustering method shown in Fig. 2 includes at least the following steps:
Step 202: take the datum read first from the set of data to be detected as an initial data group.
The datum read first may be any datum in the set.
Step 204: read another datum from the set and combine it with the initial data group.
Step 206: calculate the difference between the datum just read and the statistical mean of the combined initial data group, and calculate the standard deviation of the data in the combined group.
Step 208: judge whether the difference is less than or equal to the set multiple of the standard deviation; if so, execute step 210; otherwise, execute step 212.
Step 210: add the datum just read to the initial data group.
Step 212: make the datum just read an initial data group of its own.
Step 214: check whether the set still contains data that have not been clustered; if so, return to step 204; otherwise, end.
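The Fig. 2 loop, extended to several initial data groups checked in creation order as described in steps (1) to (4), could look like the following Python sketch. It assumes the population standard deviation (which matches the 0.5 obtained for [1, 2] in the worked example), and it tests the inequality |x − μ| ≤ m·σ rather than the ratio so that a combined group whose values are all equal (σ = 0) needs no special case. The function name is hypothetical.

```python
from statistics import mean, pstdev


def cluster(data: list[float], m: float) -> list[list[float]]:
    """Sequential clustering sketch following Fig. 2 and steps (1)-(4)."""
    groups: list[list[float]] = []
    for x in data:
        for g in groups:
            combined = g + [x]  # tentatively combine x with this group
            # Join the first group where |x - mean| <= m * std after combining.
            if abs(x - mean(combined)) <= m * pstdev(combined):
                g.append(x)
                break
        else:
            # No existing group fits (always the case for the first datum):
            # x becomes a new initial data group of its own.
            groups.append([x])
    return groups


print(cluster([1, 2, 2, 2, 20, 13, 13, 5, 5, 1], m=1.4))
# [[1, 2, 2, 2, 1], [20, 13, 13], [5, 5]]
```

Python's for-else covers step 202: with no groups yet, the inner loop never runs, so the first value starts a group of its own. As in the method described above, no group count or group size has to be fixed in advance; only the set multiple m is a parameter.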
Clustering the data in the set of data to be detected with the method provided by the embodiments of this specification does not require parameters such as the number of data groups or the number of data in each group to be set in advance, which makes the method simple and convenient to use.
Besides the clustering method provided by the embodiments of this specification, other clustering methods can also be used to cluster the data to be detected: for example, data whose values differ from one another by no more than a set threshold may be clustered directly into one class, or the data may be clustered with the K-means algorithm. The embodiments of this specification do not enumerate them all.
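One possible reading of the threshold-based alternative is sketched below: sort the values and start a new group wherever the gap to the previous value exceeds the threshold. This is a swapped-in technique (single-linkage clustering on a line), not the method of this specification, and the function name is hypothetical. Adjacent members of a group differ by at most the threshold, but the ends of a long chain may differ by more; a stricter reading would require every pairwise difference to stay within it. K-means, also mentioned above, would instead need the number of clusters to be chosen in advance.

```python
def cluster_by_gap(data: list[float], threshold: float) -> list[list[float]]:
    """Sort the values and split wherever the gap between neighbours
    exceeds `threshold` (single-linkage on a line)."""
    groups: list[list[float]] = []
    for x in sorted(data):
        if groups and x - groups[-1][-1] <= threshold:
            groups[-1].append(x)  # close enough to the previous value
        else:
            groups.append([x])    # gap too large (or first value): new group
    return groups


print(cluster_by_gap([1, 2, 2, 2, 20, 13, 13, 5, 5, 1, 0], threshold=2))
# [[0, 1, 1, 2, 2, 2], [5, 5], [13, 13], [20]]
```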
Fig. 3 is a second flowchart of the abnormal-data detection method provided by the embodiments of this specification. The method shown in Fig. 3 includes at least the following steps:
Step 302: determine the datum read first from the set of data to be detected as an initial data group.
Step 304: combine the datum read for the N-th time from the set with each of the at least one initial data group that currently exists, where N is a positive integer greater than 1.
Step 306: determine the difference between the datum read for the N-th time and the statistical mean of the data in each combined initial data group, and determine the standard deviation of the data in each combined initial data group.
Step 308: based on the difference and the standard deviation, determine the initial data group to which the datum belongs, thereby obtaining the at least one data group corresponding to the set of data to be detected.
Step 310: determine the group features of the data groups obtained by clustering, where the group features include the global feature information of all data groups obtained by clustering and the individual feature information corresponding to each data group.
Step 312: detect abnormal data groups among the data groups according to set detection rules, based on the global feature information and/or the individual feature information.
Step 314: identify the data in each abnormal data group as abnormal data in the set of data to be detected.
In a specific implementation, the set detection rules in step 106 include a first set detection rule and a second set detection rule. Correspondingly, detecting abnormal data groups among the data groups according to the set detection rules, based on the global feature information and/or the individual feature information, specifically includes the following Step 1 and Step 2:
Step 1: based on the global feature information or the individual feature information, check all data groups obtained by clustering according to the first set detection rule, to screen out the non-abnormal data groups among them, where the first set detection rule is determined based on the global feature information of all data groups or on the individual feature information corresponding to each data group;
Step 2: based on both the global feature information and the individual feature information, determine according to the second set detection rule the abnormal data groups among the data groups remaining after the non-abnormal data groups have been screened out, where the second set detection rule is determined based on the global feature information of all data groups together with the individual feature information corresponding to each data group.
In Step 1, if the first set detection rule is determined based on the global feature information of all data groups, all data groups obtained by clustering are checked against it using the global feature information; if it is determined based on the individual feature information of the data groups, they are checked against it using the individual feature information.
In the embodiments of this specification, abnormal data groups are thus detected in two steps. Step 1 first screens out the obviously non-abnormal data groups, and the remaining data groups are then checked in Step 2. This two-round detection improves detection accuracy, and screening out the obviously non-abnormal data groups first also reduces the workload of the subsequent detection.
In a specific implementation, the global feature information of all data groups includes the number of data groups obtained by clustering, and the individual feature information corresponding to each data group includes the statistical mean of the data in that data group.
The first set detection rule includes one or more of the following: the number of data groups is less than or equal to a first set threshold; the statistical mean of the data in a data group is less than or equal to a second set threshold; or the statistical mean of the data in a data group is greater than or equal to a third set threshold.
The second set detection rule includes: the number of data groups is greater than or equal to a fourth set threshold, and the statistical mean corresponding to a data group meets a set requirement.
For example, in a specific implementation, the first set detection rule may be that if clustering yields exactly one data group, no abnormal data exist among the data to be detected; or that when the statistical mean of a data group is less than the second set threshold, that data group is not an abnormal data group.
The specific value of the second set threshold can be configured according to the actual business scenario; the embodiments of this specification do not limit it.
In the second set detection rule, the statistical mean of a data group meeting the set requirement may mean, for example, that the group has the largest or the smallest statistical mean, or that its statistical mean ranks in the top N% among all data groups.
When the first set detection rule is that the number of data groups is less than or equal to the first set threshold and this condition holds, it can be determined that the set of data to be detected contains no abnormal data. In that case there is no need to perform detection according to the second set detection rule, and abnormal-data detection is achieved using the global feature information alone.
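The two-stage detection might be sketched as follows, using the example thresholds given later in this section (one group means no anomaly, a mean below 5 marks a group as non-abnormal, and with at least three groups the group with the largest mean is abnormal). The parameter names are hypothetical, the optional third-threshold rule (mean greater than or equal to a third threshold) is omitted, and "largest mean" is just one of the set requirements the text lists.

```python
from statistics import mean


def detect_abnormal_groups(
    groups: list[list[float]],
    first_threshold: int = 1,     # hypothetical: <= this many groups -> no anomaly
    second_threshold: float = 5,  # hypothetical: mean below this -> non-abnormal
    fourth_threshold: int = 3,    # hypothetical: second rule needs this many groups
) -> set[int]:
    """Two-stage detection sketch returning indices of abnormal groups."""
    # Stage 1, global feature: with very few groups nothing is abnormal,
    # and the second rule need not be evaluated at all.
    if len(groups) <= first_threshold:
        return set()
    means = [mean(g) for g in groups]
    # Stage 1, individual feature: screen out groups with a small mean.
    remaining = [i for i, mu in enumerate(means) if mu >= second_threshold]
    # Stage 2, global + individual features: if there are enough groups,
    # the remaining group with the largest mean is flagged as abnormal.
    if len(groups) >= fourth_threshold and remaining:
        return {max(remaining, key=lambda i: means[i])}
    return set()


print(detect_abnormal_groups([[1, 2, 2, 2, 1, 0], [20, 13, 13], [5, 5]]))  # {1}
```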
Fig. 4 is a third flowchart of the abnormal-data detection method provided by the embodiments of this specification. The method shown in Fig. 4 includes at least the following steps:
Step 402: determine the datum read first from the set of data to be detected as an initial data group.
Step 404: combine the datum read for the N-th time from the set with each of the at least one initial data group that currently exists, where N is a positive integer greater than 1.
Step 406: determine the difference between the datum read for the N-th time and the statistical mean of the data in each combined initial data group, and determine the standard deviation of the data in each combined initial data group.
Step 408: based on the difference and the standard deviation, determine the initial data group to which the datum belongs, thereby obtaining multiple data groups corresponding to the set of data to be detected.
Once the initial data group to which a datum belongs has been determined, the datum is added directly to that group; after every datum has been added to its initial data group, the data groups corresponding to the set of data to be detected are obtained.
Step 410: determine the group features of the data groups obtained by clustering, where the group features include the global feature information of all data groups obtained by clustering and the individual feature information corresponding to each data group.
Step 412: based on the global feature information or the individual feature information, check the data groups obtained by clustering according to the first set detection rule, to screen out the non-abnormal data groups among them, where the first set detection rule is determined based on the global feature information of all data groups or on the individual feature information corresponding to each data group.
Step 414: based on the global feature information and the individual feature information, determine according to the second set detection rule the abnormal data groups among those remaining after the non-abnormal data groups have been screened out, where the second set detection rule is determined based on the global feature information of all data groups together with the individual feature information corresponding to each data group.
Step 416: identify the data in each abnormal data group as abnormal data in the set of data to be detected.
To make the abnormal-data detection method provided by the embodiments of this specification easier to understand, it is illustrated below with an example.
For example, in one specific embodiment, the set of data to be detected is [1, 2, 2, 2, 20, 13, 13, 5, 5, 1, 0]. The set is first clustered as follows:
A datum is read from the set for the first time; suppose it is 1. The value 1 alone forms an initial data group, denoted [1]. Another datum is read from the set; suppose it is 2. Combining 2 with the initial data group [1] gives [1, 2], whose statistical mean is 1.5 and whose standard deviation is 0.5, so f = (2 − 1.5)/0.5 = 1. Assuming the set multiple is 1.4, since 1 is less than 1.4, 2 is determined to belong to the initial data group [1], giving the initial data group [1, 2].
Another datum is read from the set; suppose it is 2. Combining it with the initial data group [1, 2] gives [1, 2, 2], whose statistical mean is 5/3 and whose standard deviation is √2/3 ≈ 0.471, so f ≈ 0.7. Since 0.7 is less than 1.4, 2 is determined to belong to the initial data group [1, 2], giving the initial data group [1, 2, 2].
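The arithmetic of these two steps can be reproduced with Python's statistics module. The example's value of 0.5 for [1, 2] indicates the population standard deviation (statistics.pstdev); the sample standard deviation would give about 0.707 instead. The helper name is hypothetical.

```python
from statistics import mean, pstdev


def deviation_ratio(x: float, group: list[float]) -> float:
    """f = (x - mean) / std of the group after x is combined into it,
    using the population standard deviation."""
    combined = group + [x]
    return (x - mean(combined)) / pstdev(combined)


print(deviation_ratio(2, [1]))               # 1.0
print(round(pstdev([1, 2, 2]), 3))           # 0.471
print(round(deviation_ratio(2, [1, 2]), 1))  # 0.7
```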
Each datum in the set is clustered in turn in the same way; any datum that does not belong to an existing initial data group forms a new initial data group on its own.
In this example, calculation shows that the next datum read, 2, belongs to the initial data group [1, 2, 2], giving [1, 2, 2, 2]; the following datum, 20, does not belong to [1, 2, 2, 2], so 20 forms a new initial data group, denoted [20]. At this point the initial data groups [1, 2, 2, 2] and [20] exist.
When the datum 13 is read next, it is first checked whether 13 belongs to the initial data group [1, 2, 2, 2]. If it does, 13 is added to [1, 2, 2, 2]. If it does not, it is checked whether 13 belongs to the initial data group [20]; if so, 13 is added to [20]; if 13 does not belong to [20] either, it forms a new initial data group on its own.
Clustering the whole set in this way yields three data groups, denoted data group 1, data group 2 and data group 3: data group 1 is [1, 2, 2, 2, 1, 0], data group 2 is [20, 13, 13], and data group 3 is [5, 5]. These data groups are shown schematically in Fig. 5.
The group features of these data groups are as follows: clustering yields 3 data groups; the statistical mean of data group 1 is about 1.3, that of data group 2 is about 15.3, and that of data group 3 is 5.
For this set, the first set detection rule is: if clustering yields exactly one data group, the set of data to be detected contains no abnormal data; if the statistical mean of a data group is less than 5, that data group is a non-abnormal data group.
Based on the first set detection rule, data group 1 is determined to be a non-abnormal data group, but it cannot yet be determined whether data group 2 and data group 3 are abnormal data groups.
For this set, the second set detection rule is: when the number of data groups obtained by clustering is greater than or equal to 3, the data group with the largest statistical mean is determined to be an abnormal data group.
In this example, clustering yields 3 data groups and the data group with the largest statistical mean is data group 2, so data group 2 is determined to be an abnormal data group.
In summary, data group 2 [20, 13, 13] is an abnormal data group, so the data to be tested 20, 13 and 13 are the abnormal data in the above set of data to be tested.
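The two example rules can be sketched in Python as follows. This is a minimal illustration rather than the embodiment's implementation: it assumes the statistical average is the arithmetic mean, the helper name `detect_abnormal_groups` is hypothetical, and when neither rule decides, the sketch simply flags nothing (also an assumption).

```python
from statistics import mean


def detect_abnormal_groups(groups):
    """Apply the two example rules to already-clustered data groups (hypothetical helper)."""
    # Global feature information: the number of data groups obtained by clustering.
    group_count = len(groups)

    # First example rule, part 1: a single group means the set has no abnormal data.
    if group_count == 1:
        return []

    # First example rule, part 2: groups whose average is below 5 are non-abnormal,
    # so only the other groups remain candidates.
    candidates = [g for g in groups if mean(g) >= 5]

    # Second example rule: with 3 or more groups, the candidate with the largest
    # average is the abnormal data group.
    if group_count >= 3 and candidates:
        return [max(candidates, key=mean)]

    # Neither rule decides: flag nothing (assumption of this sketch).
    return []


groups = [[1, 2, 2, 2, 1, 0], [20, 13, 13], [5, 5]]
abnormal_groups = detect_abnormal_groups(groups)
# The data in an abnormal data group are the abnormal data of the set.
abnormal_data = [x for g in abnormal_groups for x in g]
print(abnormal_groups)  # [[20, 13, 13]]
print(abnormal_data)    # [20, 13, 13]
```

Screening first and only then taking the maximum mirrors the order in the text: the first rule removes data group 1, and the second rule picks data group 2 from what remains.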
With the method for detecting abnormal data provided by the embodiments of this specification, the set of data to be tested is first clustered according to the value of each data to be tested to obtain at least one data group, and abnormal data is then detected with the data group as the unit. Because clustering gathers data to be tested whose values differ little into the same data group, and detection is performed per data group, the influence of extreme maximum or minimum values in the set of data to be tested, and of order-of-magnitude differences between data, is avoided, which improves the accuracy of abnormal data detection.
Corresponding to the method provided by the embodiments of this specification and based on the same idea, the embodiments of this specification further provide an apparatus for detecting abnormal data, which is configured to execute the method for detecting abnormal data provided by the embodiments of this specification. Fig. 6 is a schematic diagram of the module composition of the apparatus for detecting abnormal data provided by the embodiments of this specification. The apparatus shown in Fig. 6 includes:
A clustering module 602, configured to cluster the set of data to be tested according to the value of each data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
A first determining module 604, configured to determine group characteristics of the at least one data group, where the group characteristics include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
A detection module 606, configured to detect an abnormal data group among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
Optionally, the apparatus provided by the embodiments of this specification further includes:
A second determining module, configured to determine the data to be tested in the detected abnormal data group as abnormal data in the set of data to be tested.
Optionally, the preset detection rule includes a first preset detection rule and a second preset detection rule;
Correspondingly, the detection module 606 includes:
A first detection unit, configured to detect all data groups obtained by clustering according to the first preset detection rule, based on the global feature information or the individual feature information, so as to screen out the non-abnormal data groups among all the data groups, where the first preset detection rule is determined based on the global feature information of all data groups or on the individual feature information corresponding to each data group;
A second detection unit, configured to determine, according to the second preset detection rule and based on the global feature information and the individual feature information, the abnormal data group among the data groups remaining after the non-abnormal data groups are screened out, where the second preset detection rule is determined based on the global feature information of all data groups and the individual feature information corresponding to each data group.
Optionally, the global feature information of all data groups includes the number of data groups obtained by clustering;
The individual feature information of each data group includes the statistical average of the data to be tested in the data group;
The first preset detection rule includes any one of the following rules:
The number of data groups is less than or equal to a first set threshold; the statistical average of the data to be tested in a data group is less than or equal to a second set threshold; or the statistical average of the data to be tested in a data group is greater than or equal to a third set threshold;
The second preset detection rule includes the following rule:
The number of data groups is greater than or equal to a fourth set threshold, and the statistical average corresponding to the data group meets a set requirement.
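A generalized sketch of this two-stage screening with the four set thresholds might look like the following. Everything named here is hypothetical: the `Thresholds` container, the function `two_stage_detect`, and the `requirement` callable, which stands in for the unspecified "set requirement". The usage picks values that mirror the earlier worked example (the strict "less than 5" there is approximated by a second threshold of 4.9, and "largest average" is used as the requirement).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Sequence

Group = List[float]


@dataclass
class Thresholds:
    """Hypothetical container for the first to fourth set thresholds."""
    t1: int               # group count <= t1: no group is abnormal
    t2: Optional[float]   # group average <= t2: group is non-abnormal (None = unused)
    t3: Optional[float]   # group average >= t3: group is non-abnormal (None = unused)
    t4: int               # the second rule applies only when group count >= t4


def two_stage_detect(
    groups: Sequence[Group],
    th: Thresholds,
    requirement: Callable[[float, List[float]], bool],
) -> List[Group]:
    count = len(groups)                   # global feature information
    averages = [mean(g) for g in groups]  # individual feature information

    # Stage 1 (first preset detection rule): screen out non-abnormal groups.
    if count <= th.t1:
        return []
    remaining = [
        (g, m)
        for g, m in zip(groups, averages)
        if not (th.t2 is not None and m <= th.t2)
        and not (th.t3 is not None and m >= th.t3)
    ]

    # Stage 2 (second preset detection rule): decide among the remaining groups.
    if count < th.t4 or not remaining:
        return []
    remaining_avgs = [m for _, m in remaining]
    return [list(g) for g, m in remaining if requirement(m, remaining_avgs)]


def is_largest(m: float, avgs: List[float]) -> bool:
    """Hypothetical 'set requirement': the group has the largest remaining average."""
    return m == max(avgs)


th = Thresholds(t1=1, t2=4.9, t3=None, t4=3)
groups = [[1, 2, 2, 2, 1, 0], [20, 13, 13], [5, 5]]
print(two_stage_detect(groups, th, is_largest))  # [[20, 13, 13]]
```

Keeping the thresholds and the requirement as parameters reflects that the text only fixes the shape of the rules, not their values.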
Optionally, the clustering module 602 includes:
A combining unit, configured to, for the data to be tested read for the N-th time from the set of data to be tested, combine the data to be tested with each of the at least one primary data group obtained by the current clustering, where N is a positive integer greater than 1;
A first determining unit, configured to determine, for each combined primary data group, the difference between the data to be tested and the statistical average of the data to be tested in that combined group, and the standard deviation of the data to be tested in that combined group;
A second determining unit, configured to determine, based on the differences and the standard deviations, the primary data group to which the data to be tested belongs;
An adding unit, configured to add the data to be tested read for the N-th time to the primary data group to which it belongs, so as to obtain the at least one data group.
Optionally, the second determining unit is specifically configured to:
Determine a primary data group for which the difference is less than or equal to a set multiple of the standard deviation as the primary data group to which the data to be tested belongs.
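Written as a formula, and assuming the standard deviation is the population standard deviation of the combined group (the text does not say which estimator is meant), the membership test for the datum x read for the N-th time against a primary data group G, with set multiple k, reads:

```latex
G' = G \cup \{x\},\qquad
\bar{x}_{G'} = \frac{1}{|G'|}\sum_{y \in G'} y,\qquad
\sigma_{G'} = \sqrt{\frac{1}{|G'|}\sum_{y \in G'}\left(y - \bar{x}_{G'}\right)^{2}},\qquad
\text{assign } x \text{ to } G \text{ if } \left|x - \bar{x}_{G'}\right| \le k\,\sigma_{G'}
```

For instance, with G = [1, 2, 2, 2, 1] and x = 0 (an illustrative order, not necessarily the reading order used in the embodiment), the combined group has mean 4/3, so |x - mean| = 4/3 ≈ 1.33 and σ = √5/3 ≈ 0.745; x therefore joins G exactly when k ≥ 4/√5 ≈ 1.79.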
Optionally, the clustering module 602 further includes:
A third determining unit, configured to determine the data to be tested read for the first time from the set of data to be tested as a primary data group.
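Putting the combining, determining and adding units together, the clustering could be sketched as below. This illustrates the described steps under stated assumptions and is not the embodiment's code: the function name `incremental_cluster`, the default multiple `k=1.5`, the use of the population standard deviation, and the tie-break (when several primary data groups qualify, pick the one with the smallest difference) are all hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def incremental_cluster(values: Sequence[float], k: float = 1.5) -> List[List[float]]:
    """Cluster values one by one into primary data groups (hypothetical sketch)."""
    groups: List[List[float]] = []
    for x in values:
        # Third determining unit: the first datum read forms its own primary data group.
        if not groups:
            groups.append([x])
            continue

        best_group = None
        best_diff = None
        for group in groups:
            # Combining unit: tentatively add x to this primary data group.
            combined = group + [x]
            # First determining unit: difference from the combined average and the
            # population standard deviation of the combined group.
            diff = abs(x - mean(combined))
            std = pstdev(combined)
            # Second determining unit: the group qualifies if diff <= k * std;
            # among qualifying groups, keep the one with the smallest difference.
            if diff <= k * std and (best_diff is None or diff < best_diff):
                best_group, best_diff = group, diff

        # Adding unit: join the chosen group, or start a new primary data group.
        if best_group is None:
            groups.append([x])
        else:
            best_group.append(x)
    return groups


print(incremental_cluster([10, 10, 10, 10, 50, 50]))  # [[10, 10, 10, 10], [50, 50]]
```

Two properties of the rule as stated are worth keeping in mind. For a single-element group [a], the pair [a, x] always has |x - mean| equal to its population standard deviation, so with k ≥ 1 every new value qualifies for any singleton group; and the result depends on the order in which data are read. The surrounding text does not say how the embodiment handles these cases.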
The apparatus for detecting abnormal data in the embodiments of this specification can also execute the method executed by the apparatus for detecting abnormal data in Figs. 1 to 5, and implement the functions of the apparatus for detecting abnormal data in the embodiments shown in Figs. 1 to 5; details are not repeated here.
With the apparatus for detecting abnormal data provided by the embodiments of this specification, the set of data to be tested is first clustered according to the value of each data to be tested to obtain at least one data group, and abnormal data is then detected with the data group as the unit. Because clustering gathers data to be tested whose values differ little into the same data group, and detection is performed per data group, the influence of extreme maximum or minimum values in the set of data to be tested, and of order-of-magnitude differences between data, is avoided, which improves the accuracy of abnormal data detection.
Further, based on the method shown in Figs. 1 to 5, the embodiments of this specification further provide a device for detecting abnormal data, as shown in Fig. 7.
The device for detecting abnormal data may vary considerably with configuration or performance. It may include one or more processors 701 and a memory 702, and the memory 702 may store one or more application programs or data. The memory 702 may provide transient or persistent storage. An application program stored in the memory 702 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the device for detecting abnormal data. Further, the processor 701 may be configured to communicate with the memory 702 and to execute, on the device for detecting abnormal data, the series of computer-executable instructions in the memory 702. The device for detecting abnormal data may further include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input/output interfaces 705, one or more keyboards 706, and the like.
In a specific embodiment, the device for detecting abnormal data includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the device for detecting abnormal data; the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for performing the following:
Clustering the set of data to be tested according to the value of each data to be tested in the set to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
Determining group characteristics of the at least one data group, where the group characteristics include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
Detecting an abnormal data group among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
Optionally, when the computer-executable instructions are executed, after the abnormal data group is detected according to the preset detection rule based on the group characteristics, the following step may further be performed:
Determining the data to be tested in the detected abnormal data group as abnormal data in the set of data to be tested.
Optionally, when the computer-executable instructions are executed, the preset detection rule includes a first preset detection rule and a second preset detection rule;
Detecting the abnormal data group among the data groups according to the preset detection rule, based on the global feature information and/or the individual feature information, includes:
Detecting all data groups obtained by clustering according to the first preset detection rule, based on the global feature information or the individual feature information, and screening out the non-abnormal data groups among all the data groups, where the first preset detection rule is determined based on the global feature information of all data groups or on the individual feature information corresponding to each data group;
Determining, according to the second preset detection rule and based on the global feature information and the individual feature information, the abnormal data group among the data groups remaining after the non-abnormal data groups are screened out, where the second preset detection rule is determined based on the global feature information of all data groups and the individual feature information corresponding to each data group.
Optionally, when the computer-executable instructions are executed, the global feature information of all data groups includes the number of data groups obtained by clustering;
The individual feature information of each data group includes the statistical average of the data to be tested in the data group;
The first preset detection rule includes any one of the following rules:
The number of data groups is less than or equal to a first set threshold; the statistical average of the data to be tested in a data group is less than or equal to a second set threshold; or the statistical average of the data to be tested in a data group is greater than or equal to a third set threshold;
The second preset detection rule includes the following rule:
The number of data groups is greater than or equal to a fourth set threshold, and the statistical average corresponding to the data group meets a set requirement.
Optionally, when the computer-executable instructions are executed, clustering the set of data to be tested according to the value of each data to be tested in the set to obtain at least one data group includes:
For the data to be tested read for the N-th time from the set of data to be tested, combining the data to be tested with each of the at least one primary data group obtained by the current clustering, where N is a positive integer greater than 1;
Determining, for each combined primary data group, the difference between the data to be tested and the statistical average of the data to be tested in that combined group, and the standard deviation of the data to be tested in that combined group;
Determining, based on the differences and the standard deviations, the primary data group to which the data to be tested belongs;
Adding the data to be tested read for the N-th time to the primary data group to which it belongs, so as to obtain the at least one data group.
Optionally, when the computer-executable instructions are executed, determining, based on the differences and the standard deviations, the primary data group to which the data to be tested belongs includes:
Determining a primary data group for which the difference is less than or equal to a set multiple of the standard deviation as the primary data group to which the data to be tested belongs.
Optionally, when the computer-executable instructions are executed, clustering the set of data to be tested according to the value of each data to be tested in the set further includes:
For the data to be tested read for the first time from the set of data to be tested, determining the data to be tested as a primary data group.
With the device for detecting abnormal data provided by the embodiments of this specification, the set of data to be tested is first clustered according to the value of each data to be tested to obtain at least one data group, and abnormal data is then detected with the data group as the unit. Because clustering gathers data to be tested whose values differ little into the same data group, and detection is performed per data group, the influence of extreme maximum or minimum values in the set of data to be tested, and of order-of-magnitude differences between data, is avoided, which improves the accuracy of abnormal data detection.
Further, based on the method shown in Figs. 1 to 5, the embodiments of this specification further provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and when the computer-executable instructions stored in the storage medium are executed by a processor, the following process can be implemented:
Clustering the set of data to be tested according to the value of each data to be tested in the set to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
Determining group characteristics of the at least one data group, where the group characteristics include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
Detecting an abnormal data group among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, after the abnormal data group is detected according to the preset detection rule based on the group characteristics, the following step may further be performed:
Determining the data to be tested in the detected abnormal data group as abnormal data in the set of data to be tested.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, the preset detection rule includes a first preset detection rule and a second preset detection rule;
Detecting the abnormal data group among the data groups according to the preset detection rule, based on the global feature information and/or the individual feature information, includes:
Detecting all data groups obtained by clustering according to the first preset detection rule, based on the global feature information or the individual feature information, and screening out the non-abnormal data groups among all the data groups, where the first preset detection rule is determined based on the global feature information of all data groups or on the individual feature information corresponding to each data group;
Determining, according to the second preset detection rule and based on the global feature information and the individual feature information, the abnormal data group among the data groups remaining after the non-abnormal data groups are screened out, where the second preset detection rule is determined based on the global feature information of all data groups and the individual feature information corresponding to each data group.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, the global feature information of all data groups includes the number of data groups obtained by clustering;
The individual feature information of each data group includes the statistical average of the data to be tested in the data group;
The first preset detection rule includes any one of the following rules:
The number of data groups is less than or equal to a first set threshold; the statistical average of the data to be tested in a data group is less than or equal to a second set threshold; or the statistical average of the data to be tested in a data group is greater than or equal to a third set threshold;
The second preset detection rule includes the following rule:
The number of data groups is greater than or equal to a fourth set threshold, and the statistical average corresponding to the data group meets a set requirement.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, clustering the set of data to be tested according to the value of each data to be tested in the set to obtain at least one data group includes:
For the data to be tested read for the N-th time from the set of data to be tested, combining the data to be tested with each of the at least one primary data group obtained by the current clustering, where N is a positive integer greater than 1;
Determining, for each combined primary data group, the difference between the data to be tested and the statistical average of the data to be tested in that combined group, and the standard deviation of the data to be tested in that combined group;
Determining, based on the differences and the standard deviations, the primary data group to which the data to be tested belongs;
Adding the data to be tested read for the N-th time to the primary data group to which it belongs, so as to obtain the at least one data group.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, determining, based on the differences and the standard deviations, the primary data group to which the data to be tested belongs includes:
Determining a primary data group for which the difference is less than or equal to a set multiple of the standard deviation as the primary data group to which the data to be tested belongs.
Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, clustering the set of data to be tested according to the value of each data to be tested in the set further includes:
For the data to be tested read for the first time from the set of data to be tested, determining the data to be tested as a primary data group.
When the computer-executable instructions stored in the storage medium provided by the embodiments of this specification are executed by a processor, the set of data to be tested is first clustered according to the value of each data to be tested to obtain at least one data group, and abnormal data is then detected with the data group as the unit. Because clustering gathers data to be tested whose values differ little into the same data group, and detection is performed per data group, the influence of extreme maximum or minimum values in the set of data to be tested, and of order-of-magnitude differences between data, is avoided, which improves the accuracy of abnormal data detection.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor or switch) or a software improvement (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is such an integrated circuit, whose logic function is determined by the user's programming of the device. Designers program on their own to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code before compilation must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, such a controller may be regarded as a hardware component, and the apparatuses included in it for implementing various functions may also be regarded as structures within the hardware component. Or even, the apparatuses for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For ease of description, the above apparatus is described by dividing it into various units by function. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and a memory.
The memory may include a non-persistent memory, a random access memory (RAM) and/or a non-volatile memory in a computer-readable medium, such as a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for relevant parts, refer to the description of the method embodiment.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.