CN110059712A - The detection method and device of abnormal data - Google Patents


Info

Publication number
CN110059712A
Authority
CN
China
Prior art keywords
data
tested
data group
group
feature information
Legal status
Pending
Application number
CN201910130668.0A
Other languages
Chinese (zh)
Inventor
王娜
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910130668.0A
Publication of CN110059712A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

Embodiments of this specification provide a method and device for detecting abnormal data. The method comprises: clustering a set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition; determining group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group; and detecting abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.

Description

The detection method and device of abnormal data
Technical field
This application relates to the technical field of data processing, and in particular to a method and device for detecting abnormal data.
Background
With the rapid development of Internet technology, more and more business can be handled online. As online services develop, abnormal data in business volume often need to be detected in order to monitor the business. Abnormal data detection means filtering out the abnormal data in business data, where the business data may be, for example, merchant turnover or customer complaint volume.
However, for business over a period of time, the business volume may be unusually high or low at some moment, so that the set of business-volume values for that period contains maxima or minima, or even values of different orders of magnitude. All of these reduce the accuracy of abnormal data detection.
Therefore, a scheme is needed that addresses the impact of maxima or minima in a value set, and of order-of-magnitude differences between values, on abnormal data detection.
Summary of the invention
Embodiments of this specification aim to provide a method and device for detecting abnormal data. When abnormal data are detected, the set of data to be tested is first clustered according to the value of each item of data to obtain at least one data group, and abnormal data are then detected with the data group as the unit. In these embodiments, clustering gathers data to be tested whose values are close to one another into one data group, and detection is performed per data group. This avoids the influence of maximum or minimum values, and of order-of-magnitude differences between data, in the set of data to be tested, and thereby improves the accuracy of abnormal data detection.
To solve the above technical problem, the embodiments of this specification are implemented as follows:
An embodiment of this specification provides a method for detecting abnormal data, comprising:
clustering a set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
determining group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
detecting abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
An embodiment of this specification further provides a device for detecting abnormal data, comprising:
a clustering module, configured to cluster a set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
a first determining module, configured to determine group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
a detection module, configured to detect abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
An embodiment of this specification further provides a device for detecting abnormal data, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
cluster a set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
determine group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
detect abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
An embodiment of this specification further provides a storage medium for storing computer-executable instructions that, when executed, implement the following process:
clustering a set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition;
determining group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group;
detecting abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
With the technical solution of these embodiments, when abnormal data are detected, the set of data to be tested is first clustered according to the value of each item of data to obtain at least one data group, and abnormal data are then detected with the data group as the unit. Clustering gathers data to be tested whose values are close to one another into one data group, and detecting per data group avoids the influence of maximum or minimum values, and of order-of-magnitude differences between data, in the set, thereby improving the accuracy of abnormal data detection.
Brief description of the drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments described in this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first flowchart of the method for detecting abnormal data provided by an embodiment of this specification;
Fig. 2 is a flowchart of clustering the data to be tested in the method for detecting abnormal data provided by an embodiment of this specification;
Fig. 3 is a second flowchart of the method for detecting abnormal data provided by an embodiment of this specification;
Fig. 4 is a third flowchart of the method for detecting abnormal data provided by an embodiment of this specification;
Fig. 5 is a schematic diagram of the data groups obtained by clustering in the method for detecting abnormal data provided by an embodiment of this specification;
Fig. 6 is a schematic diagram of the module composition of the device for detecting abnormal data provided by an embodiment of this specification;
Fig. 7 is a schematic structural diagram of the device for detecting abnormal data provided by an embodiment of this specification.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
The idea of these embodiments is to cluster the multiple items of data to be tested according to their values, dividing them into one or more data groups such that the values of the data within each group are close to one another, and then to detect abnormal data with the data group as the unit. Because the values within each data group are close, the influence of extreme values, and of order-of-magnitude differences between data, is avoided, which improves the accuracy of abnormal data detection.
Specifically, the method provided by these embodiments can be applied to a service server or to a terminal device; the embodiments do not limit this.
Fig. 1 is a first flowchart of the method for detecting abnormal data provided by an embodiment of this specification. The method shown in Fig. 1 includes at least the following steps:
Step 102: cluster the set of data to be tested according to the value of each item of data to be tested in the set, to obtain at least one data group, where the values of the data to be tested in the same data group satisfy a preset clustering condition.
The method provided by these embodiments is executed by a device for detecting abnormal data.
The preset clustering condition can be set according to the values of the data to be tested, following the design idea of gathering data to be tested with close values into one class. For example, in a specific implementation, the condition may be that the differences between the values of the data to be tested are less than or equal to a preset threshold; or, depending on the application scenario, that the difference between the value of each item of data gathered into a class and the mean of that class is less than or equal to a set multiple of the standard deviation of that class. The embodiments do not limit the specific content of the preset clustering condition.
It should be noted that, however the preset clustering condition is set, the values of the data to be tested within each data group obtained by clustering in step 102 are close to one another.
In addition, clustering the set of data to be tested may yield one data group or several.
Step 104: determine the group features of the at least one data group, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group.
In a specific implementation, the global feature information of all data groups obtained by clustering includes at least the number of data groups, and may include other information that is not enumerated here. The individual feature information corresponding to each data group may include the mean of the data to be tested in that group, the maximum or minimum value of that data, and so on; other individual feature information is also possible and is not enumerated here.
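As an illustration of step 104, the following Python sketch computes the group features named above: the number of data groups as global feature information, and the mean, maximum and minimum of each group as individual feature information. The function name and dictionary keys are hypothetical, and the text leaves the exact set of features open.

```python
from statistics import mean


def group_features(groups):
    """Return (global_info, individual_info) for a list of data groups."""
    # Global feature information: at least the number of data groups.
    global_info = {"group_count": len(groups)}
    # Individual feature information: mean, maximum and minimum of each group.
    individual_info = [
        {"mean": mean(g), "max": max(g), "min": min(g)} for g in groups
    ]
    return global_info, individual_info


global_info, individual_info = group_features([[10, 10, 12, 11], [500]])
print(global_info)         # {'group_count': 2}
print(individual_info[0])  # {'mean': 10.75, 'max': 12, 'min': 10}
```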
Step 106: detect abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
In these embodiments, abnormal data groups may be detected according to the preset detection rule based only on the global feature information, based only on the individual feature information, or based on both at the same time.
In these embodiments, the data to be tested in the data groups detected as abnormal are determined to be the abnormal data in the set of data to be tested.
For example, if clustering the set of data to be tested yields data group 1, data group 2 and data group 3, and data group 2 is detected as an abnormal data group, then all data to be tested in data group 2 are determined to be abnormal data.
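The mapping from abnormal data groups back to abnormal data in this example can be sketched as follows. The function `abnormal_data`, the dictionary layout (group number to list of values) and the values themselves are hypothetical illustrations.

```python
def abnormal_data(groups, abnormal_group_ids):
    """Return every item of data in the groups detected as abnormal."""
    flagged = []
    for gid in sorted(abnormal_group_ids):  # sorted for a deterministic order
        flagged.extend(groups[gid])
    return flagged


# Hypothetical data groups 1, 2 and 3; data group 2 was detected as abnormal.
groups = {1: [10, 12, 11], 2: [480, 500], 3: [9, 13]}
print(abnormal_data(groups, {2}))  # [480, 500]
```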
In the method provided by these embodiments, data to be tested whose values are close are gathered into one data group according to their values, and abnormal data are then detected with the data group as the unit. This avoids the influence of maximum or minimum values, and of order-of-magnitude differences between data, in the set of data to be tested, and so improves the accuracy of abnormal data detection.
To make the method easier to understand, the specific implementation of each of the above steps is described in detail below.
Specifically, in step 102, clustering the set of data to be tested according to the value of each item of data to obtain at least one data group includes the following steps (1), (2), (3) and (4):
Step (1): for the data to be tested read from the set for the N-th time, combine it in turn with each of the at least one initial data group obtained by clustering so far, where N is a positive integer greater than 1;
Step (2): determine the difference between the data to be tested and the mean of the data in each combined initial data group, and the standard deviation of the data in each combined initial data group;
Step (3): based on the differences and standard deviations, determine the initial data group to which the data to be tested belongs;
Step (4): add the data to be tested read for the N-th time to the initial data group to which it belongs, to obtain the at least one data group.
In these embodiments, when the data to be tested read from the set for the N-th time is clustered, if multiple initial data groups currently exist, that data is combined with each of them; if only one initial data group exists, it is combined only with that group. Note that combining the data read for the N-th time with an initial data group means placing it into that group as one of its members, so the combined initial data group includes the data read for the N-th time.
In step (3), determining the initial data group to which the data to be tested belongs based on the difference and standard deviation specifically includes:
determining an initial data group for which the difference is less than or equal to a set multiple of the standard deviation as the initial data group to which the data to be tested belongs.
The set multiple is a preset value that can be configured according to the application scenario; the embodiments do not limit its specific value.
In a specific implementation, when the data read for the N-th time is clustered and multiple initial data groups currently exist, the data is first combined with one of them, and the mean and standard deviation of the combined group are calculated. If the absolute difference between the value of the data and that mean is less than or equal to the set multiple of the standard deviation, the data is determined to belong to that initial data group and is classified into it; there is then no need to check whether it belongs to any other group. Otherwise, the remaining initial data groups are checked one by one in the same way to find the group to which the data belongs.
The above process is illustrated below with a specific example.
For example, suppose that, when the data to be tested read from the set for the N-th time is clustered, the existing initial data groups are initial data groups 1, 2 and 3. To determine which initial data group this data belongs to, it is first combined with initial data group 1, and it is checked whether the combination satisfies the following formula:
$$|x_N - \mu_{N1}| \le m\,\sigma_{N1}$$
where $x_N$ is the value of the data to be tested read for the N-th time, $\mu_{N1}$ is the mean of the initial data group obtained by combining that data with initial data group 1, $\sigma_{N1}$ is the standard deviation of that combined group, and $m$ is the set multiple.
If the formula holds after the data read for the N-th time is combined with initial data group 1, the data is considered to belong to initial data group 1 and is classified into it directly. If the formula does not hold, the data is considered not to belong to initial data group 1; it is then combined with initial data group 2 and the same check is performed.
If the data read for the N-th time is found to belong to none of initial data groups 1, 2 and 3, it forms a new initial data group on its own.
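The sequential check described above, in which the groups are tried one by one and the data forms its own group if none matches, can be sketched in Python. This is a minimal sketch, not the patent's implementation: it assumes the population standard deviation (`statistics.pstdev`), since the text does not say which convention is used, and the names `belongs_to` and `assign_first_fit` are hypothetical.

```python
from statistics import mean, pstdev


def belongs_to(x, group, m):
    """Check |x_N - mu| <= m * sigma for `group` combined with x."""
    combined = group + [x]      # tentatively place x in the group
    mu = mean(combined)         # mean of the combined group
    sigma = pstdev(combined)    # population standard deviation (assumption)
    return abs(x - mu) <= m * sigma


def assign_first_fit(x, groups, m):
    """Try groups in order; join the first that matches, else start a new one."""
    for group in groups:
        if belongs_to(x, group, m):
            group.append(x)
            return
    groups.append([x])  # x belongs to no existing group


# Hypothetical example: two existing initial data groups, set multiple m = 1.5.
groups = [[10, 10, 12], [500, 520, 510]]
for x in [11, 505, 5000]:
    assign_first_fit(x, groups, m=1.5)
print(groups)  # [[10, 10, 12, 11], [500, 520, 510, 505], [5000]]
```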
Of course, the above is only one specific way of determining the initial data group to which the data read for the N-th time belongs. Besides checking the groups one by one, this can also be done in other ways in these embodiments, as illustrated below.
Again, suppose the data to be tested read from the set for the N-th time is being clustered and the existing initial data groups are initial data groups 1, 2 and 3.
The data to be tested read for the N-th time is combined with each of initial data groups 1, 2 and 3, and the ratio corresponding to each combined initial data group is calculated with the following formula:
$$f(x_{Ni}) = \frac{|x_N - \mu_{Ni}|}{\sigma_{Ni}}$$
where $x_N$ is the value of the data to be tested read for the N-th time, $\mu_{Ni}$ is the mean of the initial data group obtained by combining that data with initial data group $i$, $\sigma_{Ni}$ is the standard deviation of that combined group, and $f(x_{Ni})$ is the ratio corresponding to the $i$-th combined initial data group.
The ratio corresponding to each combined initial data group is compared with the set multiple, and an initial data group whose ratio is less than or equal to the set multiple is determined as the initial data group to which the data read for the N-th time belongs.
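The ratio-based variant can be sketched in the same way. This is a minimal sketch under assumptions: it uses the population standard deviation, and because the text does not say what happens when several groups have a ratio at or below the set multiple, it picks the group with the smallest ratio. That tie-breaking rule and the function names are hypothetical.

```python
from statistics import mean, pstdev


def ratio(x, group):
    """f(x_Ni) = |x_N - mu_Ni| / sigma_Ni for `group` combined with x."""
    combined = group + [x]
    mu = mean(combined)
    sigma = pstdev(combined)  # population standard deviation (assumption)
    # sigma == 0 only when every value, including x, is identical, so x == mu.
    return abs(x - mu) / sigma if sigma > 0 else 0.0


def assign_by_ratio(x, groups, m):
    """Compute every group's ratio; join the qualifying group with the smallest."""
    ratios = [ratio(x, g) for g in groups]
    candidates = [i for i, r in enumerate(ratios) if r <= m]
    if candidates:
        best = min(candidates, key=lambda i: ratios[i])  # tie-break: assumption
        groups[best].append(x)
        return best
    groups.append([x])  # no group qualifies: x starts a new group
    return len(groups) - 1


groups = [[10, 10, 12, 11], [500]]
print(assign_by_ratio(11, groups, m=1.5))   # 0
print(assign_by_ratio(480, groups, m=1.5))  # 1
```

In the first call both groups qualify (ratios of roughly 0.27 and 1), and the sketch assigns 11 to group 0 because its ratio is smaller.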
Once the initial data group to which an item of data to be tested belongs is determined, the data is added directly to that group. After every item of data to be tested has been added to the initial data group to which it belongs, the at least one data group corresponding to the set of data to be tested is obtained.
Of course, when the data in the set of data to be tested are clustered, no initial data group exists yet for the data read from the set for the first time. For this case, in these embodiments, clustering the set in step 102 according to the value of each item of data further includes:
for the data to be tested read from the set for the first time, determining that data as an initial data group.
In these embodiments, the data read from the set for the first time forms an initial data group on its own. When data is read from the set for the second time, it can be combined with this initial data group to check whether it belongs to the group, and so on.
For example, if the first data read from the set of data to be tested is 28, then 28 by itself forms an initial data group.
To make the clustering process easier to understand, a specific embodiment is described below. Fig. 2 is a flowchart of clustering the set of data to be tested provided by an embodiment of this specification; the clustering method shown in Fig. 2 includes at least the following steps:
Step 202: take the data to be tested read from the set for the first time as an initial data group.
The data read for the first time may be any item of data in the set of data to be tested.
Step 204: read another item of data to be tested from the set, and combine it with the initial data group.
Step 206: calculate the difference between the data read this time and the mean of the combined initial data group, and calculate the standard deviation of the combined initial data group.
Step 208: determine whether the difference is less than or equal to the set multiple of the standard deviation; if so, perform step 210; otherwise, perform step 212.
Step 210: add the data read this time to the initial data group.
Step 212: take the data read this time as a separate initial data group.
Step 214: check whether the set contains data to be tested that has not yet been clustered; if so, return to step 204; otherwise, end.
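The loop of steps 202 to 214 can be sketched as follows. When several initial data groups already exist, this sketch tries them in order and stops at the first match, as in the sequential check described earlier. It is a minimal sketch rather than the patent's implementation; the population standard deviation and the function name `cluster` are assumptions.

```python
from statistics import mean, pstdev


def cluster(values, m):
    """Sequential clustering following steps 202 to 214 (a sketch)."""
    groups = []
    for x in values:                        # steps 204 and 214: read until none left
        if not groups:
            groups.append([x])              # step 202: first value seeds a group
            continue
        for group in groups:
            combined = group + [x]
            mu, sigma = mean(combined), pstdev(combined)  # step 206
            if abs(x - mu) <= m * sigma:    # step 208
                group.append(x)             # step 210
                break
        else:
            groups.append([x])              # step 212: start a new group
    return groups


print(cluster([10, 10, 12, 11, 500], m=1.5))  # [[10, 10, 12, 11], [500]]
```

With the population standard deviation, a group of two values always gives a ratio of exactly 1, since each value lies one standard deviation from their mean. For any m greater than 1, the second value read therefore joins the first group whatever its value, so the choice of m and of the standard-deviation convention strongly affects the result; the text leaves both open.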
Clustering the data in the set with the method provided by these embodiments does not require parameters such as the number of data groups, or the number of data items in each group, to be set in advance, which makes it simple and convenient to use.
Of course, besides the clustering method provided by these embodiments, other clustering methods can also be used, for example directly grouping into one class data to be tested whose values differ from each other by no more than a set threshold, or clustering the data with the K-means algorithm; these are not enumerated here.
Fig. 3 is a second flowchart of the method for detecting abnormal data provided by an embodiment of this specification. The method shown in Fig. 3 includes at least the following steps:
Step 302: determine the data to be tested read from the set for the first time as an initial data group.
Step 304: for the data to be tested read from the set for the N-th time, combine it with each of the at least one initial data group that currently exists, where N is a positive integer greater than 1.
Step 306: determine the difference between the data read for the N-th time and the mean of the data in each combined initial data group, and determine the standard deviation of the data in each combined initial data group.
Step 308: based on the differences and standard deviations, determine the initial data group to which the data to be tested belongs, to obtain the at least one data group corresponding to the set of data to be tested.
Step 310: determine the group features of the data groups obtained by clustering, where the group features include global feature information of all data groups obtained by clustering and individual feature information corresponding to each data group.
Step 312: detect abnormal data groups among the data groups according to the preset detection rule, based on the global feature information and/or the individual feature information.
Step 314: determine the data to be tested in the abnormal data groups as the abnormal data in the set of data to be tested.
In the specific implementation, the setting detected rule in above-mentioned steps 106 includes that the first setting detected rule and second is set Detected rule is determined, correspondingly, based on above-mentioned global feature information and/or personal feature information according in setting detected rule detection The abnormal data group in data group is stated, specifically comprises the following steps one and step 2;
Step 1: based on above-mentioned global feature information or personal feature information according to the first setting detected rule to clustering To all data groups detected, to screen out the non-abnormal data group in above-mentioned data group;Wherein, the first setting inspection The gauge then global feature information based on all data groups or based on personal feature information corresponding to each data group It determines;
Step 2: being screened out based on above-mentioned global feature information and personal feature information according to the second setting detected rule determination Abnormal data group after Diao Fei abnormal data group in remaining data group;Wherein, the second setting detected rule is based on institute There is personal feature information corresponding to the global feature information and each data group of data group to be determined.
Certainly, in step 1, if above-mentioned first sets detected rule as the global feature based on all data groups Information is determined, then is carried out according to the first setting detected rule to all data groups that cluster obtains based on global feature information Detection;It is determined if above-mentioned first sets detected rule by the personal feature information of group based on the data, then based on a Body characteristics information is detected according to all data groups that the first setting detected rule obtains cluster.
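The two-step detection described above can be sketched as a small function. This is a minimal illustration that assumes each preset detection rule is supplied as a Python callable. The names `two_stage_detect`, `screen` and `decide` are hypothetical and do not come from the specification.

```python
from typing import Callable, List, Sequence

Group = List[float]


def two_stage_detect(
    groups: Sequence[Group],
    screen: Callable[[Sequence[Group], Group], bool],
    decide: Callable[[Sequence[Group], Sequence[Group], Group], bool],
) -> List[Group]:
    """Return the data groups judged abnormal (hypothetical helper).

    screen(all_groups, g): first preset rule; True means g is clearly non-abnormal.
    decide(all_groups, remaining, g): second preset rule; True means g is abnormal.
    """
    # Step 1: drop the obviously non-abnormal groups.
    remaining = [g for g in groups if not screen(groups, g)]
    # Step 2: apply the second rule only to the groups that survived Step 1.
    return [g for g in remaining if decide(groups, remaining, g)]
```

Because `screen` receives all the groups as well as the current one, it can express a rule based on global feature information (such as the group count) or one based on individual feature information (such as a group's mean).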
In the embodiments of this specification, abnormal data groups are detected in two steps. Step 1 screens out the obviously non-abnormal data groups, and Step 2 examines only the data groups that remain. The two-round detection improves accuracy. Screening out the obviously non-abnormal data groups first also reduces the workload of the later detection.
In a specific implementation, the global feature information of all the data groups includes the number of data groups obtained by clustering. The individual feature information of each data group includes the mean of the data to be tested in that data group.
The first preset detection rule includes one or more of the following:
the number of data groups is less than or equal to a first preset threshold; the mean of the data to be tested in a data group is less than or equal to a second preset threshold; or the mean of the data to be tested in a data group is greater than or equal to a third preset threshold.
The second preset detection rule includes one or more of the following:
the number of data groups is greater than or equal to a fourth preset threshold, and the mean of a data group meets a preset requirement.
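A minimal sketch of the two preset rules follows. It assumes the thresholds are plain numbers and that an unset threshold simply disables its clause. The names `first_rule_non_abnormal`, `second_rule_abnormal`, `t1` to `t4` and `requirement` are hypothetical. The "preset requirement" on a group's mean is passed in as a callable because the specification leaves its form open.

```python
from statistics import fmean
from typing import Callable, List, Optional, Sequence

Group = List[float]


def first_rule_non_abnormal(
    groups: Sequence[Group],
    g: Group,
    t1: Optional[float] = None,  # first preset threshold: on the group count
    t2: Optional[float] = None,  # second preset threshold: mean at or below it is non-abnormal
    t3: Optional[float] = None,  # third preset threshold: mean at or above it is non-abnormal
) -> bool:
    """True if g is screened out as non-abnormal; unset thresholds are skipped."""
    if t1 is not None and len(groups) <= t1:
        return True  # too few groups: the whole set is treated as normal
    m = fmean(g)
    if t2 is not None and m <= t2:
        return True
    if t3 is not None and m >= t3:
        return True
    return False


def second_rule_abnormal(
    groups: Sequence[Group],
    g: Group,
    t4: float,  # fourth preset threshold: minimum group count
    requirement: Callable[[Sequence[Group], Group], bool],  # condition on g's mean
) -> bool:
    """True if there are at least t4 groups and g's mean meets the requirement."""
    return len(groups) >= t4 and requirement(groups, g)
```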
For example, in a specific implementation, the first preset detection rule may be that the data to be tested contain no abnormal data if clustering yields exactly one data group. Alternatively, it may be that a data group is not an abnormal data group if its mean is less than the second preset threshold.
The specific value of the second preset threshold can be configured for the actual business scenario. The embodiments of this specification do not limit it.
In the second preset detection rule, "the mean of a data group meets a preset requirement" may mean, for example, that the data group has the largest mean, has the smallest mean, or ranks within the top N% of all the data groups by mean.
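The example forms of the "preset requirement" listed above could be written as the following predicates. This sketch rests on two assumptions. First, "top N%" is read as ranking groups by mean in descending order while always keeping at least one group. Second, groups with tied means are treated alike. The function names are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence

Group = List[float]


def has_largest_mean(groups: Sequence[Group], g: Group) -> bool:
    return fmean(g) == max(fmean(h) for h in groups)


def has_smallest_mean(groups: Sequence[Group], g: Group) -> bool:
    return fmean(g) == min(fmean(h) for h in groups)


def in_top_percent(groups: Sequence[Group], g: Group, n_percent: float) -> bool:
    """True if g's mean ranks within the top n_percent of groups (descending)."""
    ranked = sorted((fmean(h) for h in groups), reverse=True)
    # Keep at least one group and never index past the end of the ranking.
    cutoff_rank = min(len(ranked), max(1, math.ceil(len(ranked) * n_percent / 100)))
    return fmean(g) >= ranked[cutoff_rank - 1]
```

Every group whose mean equals the cutoff value passes, so ties at the boundary are resolved in favour of flagging.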
Suppose the first preset detection rule is that the number of data groups is less than or equal to the first preset threshold. When that condition holds, it can be determined that the set of data to be tested contains no abnormal data. Detection according to the second preset detection rule is then unnecessary, and abnormal data detection is accomplished using the global feature information alone.
Fig. 4 is the third flowchart of the abnormal data detection method provided by the embodiments of this specification. The method shown in Fig. 4 includes at least the following steps:
Step 402: Determine the datum read first from the set of data to be tested as a primary data group.
Step 404: For the datum read for the N-th time from the set of data to be tested, combine it with each currently existing primary data group, where N is a positive integer greater than 1.
Step 406: For each combined primary data group, determine the difference between the datum read for the N-th time and the mean of the data in that group. Also determine the standard deviation of the data in that group.
Step 408: Based on the differences and the standard deviations, determine the primary data group to which the datum belongs. This yields multiple data groups corresponding to the set of data to be tested.
Once the primary data group to which a datum belongs has been determined, the datum is added directly to that group. After every datum has been added to its primary data group, at least one data group corresponding to the set of data to be tested is obtained.
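Steps 402 to 408 can be sketched in Python as below. This is a minimal illustration, not the specification's implementation, and it relies on several assumptions the text does not fix:

- The difference is taken as an absolute value.
- The standard deviation is the population (1/n) form. This is the form that reproduces the 0.5 quoted for [1, 2] in the example that follows.
- Existing groups are tried in creation order, and the first group that accepts the datum wins, as in the walk-through for the datum 13.
- The test is written as |x - mean| <= k * std so that a zero standard deviation (for example, [5, 5]) does not cause a division by zero.

The function name `cluster` and the default multiple k = 1.4 are illustrative.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def cluster(values: Sequence[float], k: float = 1.4) -> List[List[float]]:
    """Sequential clustering sketch: k is the preset multiple of the std deviation."""
    groups: List[List[float]] = []
    for x in values:
        for g in groups:  # try existing primary data groups in creation order
            combined = g + [x]  # step 404: combine the datum with the group
            diff = abs(x - fmean(combined))  # step 406: difference from the mean
            if diff <= k * pstdev(combined):  # step 408: within k population stds
                g.append(x)  # first accepting group wins
                break
        else:
            groups.append([x])  # step 402 / no group accepts x: start a new group
    return groups
```

On the prefix [1, 2, 2, 2, 20, 13, 13], this sketch returns [[1, 2, 2, 2], [20, 13, 13]], matching the walk-through. On the full example set, it returns four groups: [[1, 2, 2, 2, 1], [20, 13, 13], [5, 5], [0]]. The final 0 has a ratio of about 1.79 against [1, 2, 2, 2, 1] and about 1.41 against [5, 5]. Both exceed 1.4, so 0 starts its own group. The three groups reported for that example therefore presumably rely on a slightly different acceptance rule or multiple.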
Step 410: Determine the group features of the data groups obtained by clustering. The group features include global feature information of all the data groups obtained by clustering and individual feature information of each data group.
Step 412: Based on the global feature information or the individual feature information, examine the data groups obtained by clustering against the first preset detection rule to screen out the non-abnormal data groups. The first preset detection rule is determined either from the global feature information of all the data groups or from the individual feature information of each data group.
Step 414: Based on the global feature information and the individual feature information, apply the second preset detection rule to the data groups that remain after the non-abnormal data groups are screened out, and determine which of them are abnormal. The second preset detection rule is determined from both the global feature information of all the data groups and the individual feature information of each data group.
Step 416: Determine the data to be tested in the abnormal data groups as the abnormal data in the set of data to be tested.
The following example illustrates the abnormal data detection method provided by the embodiments of this specification.
In one specific embodiment, the set of data to be tested is [1, 2, 2, 2, 20, 13, 13, 5, 5, 1, 0]. The set is first clustered as follows.
The first datum read from the set is 1. It forms a primary data group on its own, denoted [1]. The next datum read is 2. Combining 2 with the primary data group [1] gives [1, 2], whose mean is 1.5 and whose standard deviation is 0.5, so f = (2 - 1.5)/0.5 = 1. Assume the preset multiple is 1.4. Since 1 is less than 1.4, 2 belongs to the primary data group [1], which becomes [1, 2].
The next datum read from the set is also 2. Combining it with the primary data group [1, 2] gives [1, 2, 2], whose mean is 5/3 and whose standard deviation is √2/3 ≈ 0.47, so f = (2 - 5/3)/(√2/3) ≈ 0.7. Since 0.7 is less than 1.4, 2 belongs to the primary data group [1, 2], which becomes [1, 2, 2].
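The acceptance test used in this example can be written out as follows. Here y_1, ..., y_n are the values of the combined group (including the new datum x), and k is the preset multiple. Two points are inferences rather than statements from the text. The 1/n (population) form of the standard deviation is the form that reproduces the 0.5 quoted for [1, 2]. The absolute value in the numerator is an assumption.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \mu)^2}, \qquad
f = \frac{\lvert x - \mu \rvert}{\sigma} \le k

% Worked check for x = 2 joining [1, 2], i.e. the combined group [1, 2, 2]:
\mu = \tfrac{5}{3}, \qquad
\sigma = \sqrt{\tfrac{1}{3}\left[\left(1-\tfrac{5}{3}\right)^2 + 2\left(2-\tfrac{5}{3}\right)^2\right]}
       = \sqrt{\tfrac{2}{9}} = \tfrac{\sqrt{2}}{3}, \qquad
f = \frac{1/3}{\sqrt{2}/3} = \frac{1}{\sqrt{2}} \approx 0.71 \le 1.4
```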
Each remaining datum in the set is clustered in the same way. If a datum does not belong to any existing primary data group, it forms a new primary data group on its own.
Continuing the example, calculation shows that the next datum, 2, belongs to the primary data group [1, 2, 2], which becomes [1, 2, 2, 2]. The following datum, 20, does not belong to [1, 2, 2, 2], so 20 forms a primary data group on its own, denoted [20]. The current primary data groups are therefore [1, 2, 2, 2] and [20].
When 13 is read next, the first check is whether 13 belongs to the primary data group [1, 2, 2, 2]. If it does, 13 is added to [1, 2, 2, 2]. If it does not, the next check is whether 13 belongs to the primary data group [20]. If it does, 13 is added to [20]. If 13 belongs to neither group, it forms a new primary data group on its own.
Clustering the set in this way yields three data groups, denoted data group 1, data group 2 and data group 3. Data group 1 is [1, 2, 2, 2, 1, 0], data group 2 is [20, 13, 13] and data group 3 is [5, 5]. Data groups 1, 2 and 3 are shown schematically in Fig. 5.
The group features of the clustered data groups are as follows: the number of data groups is 3; the mean of data group 1 is about 1.3 (8/6); the mean of data group 2 is about 15.3 (46/3); and the mean of data group 3 is 5.
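Computing these group features is straightforward. The sketch below assumes the global feature information is just the group count and the individual feature information is each group's arithmetic mean. The names `GroupFeatures` and `group_features` are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Sequence


@dataclass
class GroupFeatures:
    group_count: int  # global feature information: number of data groups
    group_means: List[float]  # individual feature information: one mean per group


def group_features(groups: Sequence[Sequence[float]]) -> GroupFeatures:
    return GroupFeatures(len(groups), [fmean(g) for g in groups])
```

Applied to the three data groups listed above, it gives a count of 3 and means of 4/3, 46/3 and 5.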
For this set of data to be tested, the first preset detection rule has two parts. If clustering yields exactly one data group, the set contains no abnormal data. If the mean of a data group is less than 5, that data group is non-abnormal.
Under this first preset detection rule, data group 1 is non-abnormal, but it cannot yet be determined whether data groups 2 and 3 are abnormal.
For this set of data to be tested, the second preset detection rule is: when clustering yields 3 or more data groups, the data group with the largest mean is an abnormal data group.
In this example, clustering yields 3 data groups, and data group 2 has the largest mean. Data group 2 is therefore an abnormal data group.
In summary, data group 2, [20, 13, 13], is an abnormal data group, so the data 20, 13 and 13 are the abnormal data in the set of data to be tested.
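The sketch below puts the example together. It clusters the set and then applies the example's two preset rules: a group mean below 5 marks a group as non-abnormal, and with at least 3 groups the largest-mean group is abnormal. It reuses the same clustering assumptions: a population standard deviation, first-fit assignment in creation order, and multiple k = 1.4. The names `detect_abnormal`, `normal_below` and `min_groups` are hypothetical. Under these assumptions, clustering produces four groups rather than three, because the final 0 forms its own group. That group's mean of 0 is screened out by the first rule, so the detected abnormal data are the same: 20, 13 and 13.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def cluster(values: Sequence[float], k: float = 1.4) -> List[List[float]]:
    groups: List[List[float]] = []
    for x in values:
        for g in groups:  # first existing group that accepts x wins
            combined = g + [x]
            if abs(x - fmean(combined)) <= k * pstdev(combined):
                g.append(x)
                break
        else:
            groups.append([x])  # x starts a new primary data group
    return groups


def detect_abnormal(values: Sequence[float], k: float = 1.4,
                    normal_below: float = 5.0, min_groups: int = 3) -> List[float]:
    groups = cluster(values, k)
    if len(groups) == 1:  # first rule: a single group means no abnormal data
        return []
    means = [fmean(g) for g in groups]
    # First rule: a group whose mean is below `normal_below` is non-abnormal.
    remaining = [i for i, m in enumerate(means) if m >= normal_below]
    # Second rule: only with at least `min_groups` groups is anything flagged.
    if len(groups) < min_groups:
        return []
    top = max(range(len(groups)), key=lambda i: means[i])  # largest-mean group
    return [x for i in remaining if i == top for x in groups[i]]
```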
The abnormal data detection method provided by the embodiments of this specification first clusters the set of data to be tested according to the value of each datum, obtaining at least one data group. Abnormal data are then detected group by group. Clustering gathers data of similar values into the same data group. Detecting on a per-group basis then avoids distortion from extreme maximum or minimum values in the set, or from differences in order of magnitude between data. This improves the accuracy of abnormal data detection.
Corresponding to the method provided by the embodiments of this specification, and based on the same idea, the embodiments of this specification also provide an abnormal data detection apparatus for performing that method. Fig. 6 is a schematic diagram of the module composition of this apparatus. The apparatus shown in Fig. 6 includes:
a clustering module 602, configured to cluster the set of data to be tested according to the value of each datum in the set to obtain at least one data group, where the values of the data in the same data group satisfy a preset clustering condition;
a first determining module 604, configured to determine the group features of the at least one data group, where the group features include global feature information of all the data groups obtained by clustering and individual feature information of each data group;
a detection module 606, configured to detect abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
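One way to mirror the module split of Fig. 6 in code is sketched below. The class and method names are hypothetical. The clustering and rule logic are injected as callables rather than fixed, and each method name only indicates which module it corresponds to.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Dict, List, Sequence

Group = List[float]
Features = Dict[str, object]


@dataclass
class AbnormalDataDetector:
    # Clustering module 602: values -> data groups.
    cluster_fn: Callable[[Sequence[float]], List[Group]]
    # Preset detection rule used by module 606: (features, groups) -> abnormal indices.
    rule_fn: Callable[[Features, List[Group]], List[int]]

    def determine_features(self, groups: List[Group]) -> Features:
        """First determining module 604: global and individual feature information."""
        return {"group_count": len(groups), "group_means": [fmean(g) for g in groups]}

    def detect(self, groups: List[Group]) -> List[Group]:
        """Detection module 606: apply the preset rule to the group features."""
        features = self.determine_features(groups)
        return [groups[i] for i in self.rule_fn(features, groups)]

    def run(self, values: Sequence[float]) -> List[float]:
        """Cluster, detect, then flatten to abnormal data (second determining module)."""
        groups = self.cluster_fn(values)
        return [x for g in self.detect(groups) for x in g]
```

Injecting the clustering and rule functions keeps the module boundaries of Fig. 6 visible. It also leaves the concrete clustering condition and detection rules configurable.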
Optionally, the apparatus provided by the embodiments of this specification further includes:
a second determining module, configured to determine the data to be tested in the detected abnormal data groups as the abnormal data in the set of data to be tested.
Optionally, the preset detection rule includes a first preset detection rule and a second preset detection rule;
correspondingly, the detection module 606 includes:
a first detection unit, configured to examine all the data groups obtained by clustering against the first preset detection rule, based on the global feature information or the individual feature information, so as to screen out the non-abnormal data groups among them, where the first preset detection rule is determined from the global feature information of all the data groups or from the individual feature information of each data group;
a second detection unit, configured to apply the second preset detection rule, based on the global feature information and the individual feature information, to the data groups that remain after the non-abnormal data groups are screened out, so as to determine the abnormal data groups among them, where the second preset detection rule is determined from both the global feature information of all the data groups and the individual feature information of each data group.
Optionally, the global feature information of all the data groups includes the number of data groups obtained by clustering;
the individual feature information of each data group includes the mean of the data to be tested in that data group;
the first preset detection rule includes any one of the following:
the number of data groups is less than or equal to a first preset threshold; the mean of the data to be tested in a data group is less than or equal to a second preset threshold; or the mean of the data to be tested in a data group is greater than or equal to a third preset threshold;
the second preset detection rule includes any one of the following:
the number of data groups is greater than or equal to a fourth preset threshold, and the mean of a data group meets a preset requirement.
Optionally, the clustering module 602 includes:
a combining unit, configured to combine the datum read for the N-th time from the set of data to be tested with each of the at least one primary data group obtained by clustering so far, where N is a positive integer greater than 1;
a first determining unit, configured to determine, for each combined primary data group, the difference between the datum and the mean of the data in that group, and the standard deviation of the data in that group;
a second determining unit, configured to determine, based on the differences and the standard deviations, the primary data group to which the datum belongs;
an adding unit, configured to add the datum read for the N-th time to the primary data group to which it belongs, so as to obtain at least one data group.
Optionally, the second determining unit is specifically configured to:
determine a primary data group for which the difference is less than or equal to a preset multiple of the standard deviation as the primary data group to which the datum belongs.
Optionally, the clustering module 602 further includes:
a third determining unit, configured to determine the datum read first from the set of data to be tested as a primary data group.
The abnormal data detection apparatus of the embodiments of this specification can also perform the method performed by the abnormal data detection apparatus in Figs. 1 to 5 and implement the functions of that apparatus in the embodiments shown in Figs. 1 to 5. Details are not repeated here.
The abnormal data detection apparatus provided by the embodiments of this specification first clusters the set of data to be tested according to the value of each datum, obtaining at least one data group. Abnormal data are then detected group by group. Clustering gathers data of similar values into the same data group. Detecting on a per-group basis then avoids distortion from extreme maximum or minimum values in the set, or from differences in order of magnitude between data. This improves the accuracy of abnormal data detection.
Further, based on the methods shown in Figs. 1 to 5, the embodiments of this specification also provide an abnormal data detection device, as shown in Fig. 7.
The abnormal data detection device may vary considerably in configuration or performance. It may include one or more processors 701 and a memory 702. The memory 702 may store one or more applications or data, and may be transient or persistent storage. An application stored in the memory 702 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the abnormal data detection device. Further, the processor 701 may be configured to communicate with the memory 702 and to execute, on the device, the series of computer-executable instructions in the memory 702. The device may also include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input/output interfaces 705, one or more keyboards 706, and so on.
In a specific embodiment, the abnormal data detection device includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the device. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for performing the following:
clustering the set of data to be tested according to the value of each datum in the set to obtain at least one data group, where the values of the data in the same data group satisfy a preset clustering condition;
determining the group features of the at least one data group, where the group features include global feature information of all the data groups obtained by clustering and individual feature information of each data group;
detecting abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
Optionally, when the computer-executable instructions are executed, the following step may also be performed after the abnormal data groups have been detected according to the preset detection rule:
determining the data to be tested in the detected abnormal data groups as the abnormal data in the set of data to be tested.
Optionally, when the computer-executable instructions are executed, the preset detection rule includes a first preset detection rule and a second preset detection rule;
detecting abnormal data groups among the data groups according to the preset detection rule, based on the global feature information and/or the individual feature information, includes:
examining all the data groups obtained by clustering against the first preset detection rule, based on the global feature information or the individual feature information, and screening out the non-abnormal data groups among them, where the first preset detection rule is determined from the global feature information of all the data groups or from the individual feature information of each data group;
applying the second preset detection rule, based on the global feature information and the individual feature information, to the data groups that remain after the non-abnormal data groups are screened out, so as to determine the abnormal data groups among them, where the second preset detection rule is determined from both the global feature information of all the data groups and the individual feature information of each data group.
Optionally, when the computer-executable instructions are executed, the global feature information of all the data groups includes the number of data groups obtained by clustering;
the individual feature information of each data group includes the mean of the data to be tested in that data group;
the first preset detection rule includes any one of the following:
the number of data groups is less than or equal to a first preset threshold; the mean of the data to be tested in a data group is less than or equal to a second preset threshold; or the mean of the data to be tested in a data group is greater than or equal to a third preset threshold;
the second preset detection rule includes any one of the following:
the number of data groups is greater than or equal to a fourth preset threshold, and the mean of a data group meets a preset requirement.
Optionally, when the computer-executable instructions are executed, clustering the set of data to be tested according to the value of each datum in the set to obtain at least one data group includes:
for the datum read for the N-th time from the set of data to be tested, combining it with each of the at least one primary data group obtained by clustering so far, where N is a positive integer greater than 1;
determining, for each combined primary data group, the difference between the datum and the mean of the data in that group, and the standard deviation of the data in that group;
determining, based on the differences and the standard deviations, the primary data group to which the datum belongs;
adding the datum read for the N-th time to the primary data group to which it belongs, so as to obtain at least one data group.
Optionally, when the computer-executable instructions are executed, determining, based on the differences and the standard deviations, the primary data group to which the datum belongs includes:
determining a primary data group for which the difference is less than or equal to a preset multiple of the standard deviation as the primary data group to which the datum belongs.
Optionally, when the computer-executable instructions are executed, clustering the set of data to be tested according to the value of each datum in the set further includes:
for the datum read first from the set of data to be tested, determining that datum as a primary data group.
The abnormal data detection device provided by the embodiments of this specification first clusters the set of data to be tested according to the value of each datum, obtaining at least one data group. Abnormal data are then detected group by group. Clustering gathers data of similar values into the same data group. Detecting on a per-group basis then avoids distortion from extreme maximum or minimum values in the set, or from differences in order of magnitude between data. This improves the accuracy of abnormal data detection.
Further, based on the methods shown in Figs. 1 to 5, the embodiments of this specification also provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk or the like. When executed by a processor, the computer-executable instructions stored on the storage medium implement the following process:
clustering the set of data to be tested according to the value of each datum in the set to obtain at least one data group, where the values of the data in the same data group satisfy a preset clustering condition;
determining the group features of the at least one data group, where the group features include global feature information of all the data groups obtained by clustering and individual feature information of each data group;
detecting abnormal data groups among the data groups according to a preset detection rule, based on the global feature information and/or the individual feature information.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, the following step may also be performed after the abnormal data groups have been detected according to the preset detection rule:
determining the data to be tested in the detected abnormal data groups as the abnormal data in the set of data to be tested.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, the preset detection rule includes a first preset detection rule and a second preset detection rule;
detecting abnormal data groups among the data groups according to the preset detection rule, based on the global feature information and/or the individual feature information, includes:
examining all the data groups obtained by clustering against the first preset detection rule, based on the global feature information or the individual feature information, and screening out the non-abnormal data groups among them, where the first preset detection rule is determined from the global feature information of all the data groups or from the individual feature information of each data group;
applying the second preset detection rule, based on the global feature information and the individual feature information, to the data groups that remain after the non-abnormal data groups are screened out, so as to determine the abnormal data groups among them, where the second preset detection rule is determined from both the global feature information of all the data groups and the individual feature information of each data group.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, the global feature information of all the data groups includes the number of data groups obtained by clustering;
the individual feature information of each data group includes the mean of the data to be tested in that data group;
the first preset detection rule includes any one of the following:
the number of data groups is less than or equal to a first preset threshold; the mean of the data to be tested in a data group is less than or equal to a second preset threshold; or the mean of the data to be tested in a data group is greater than or equal to a third preset threshold;
the second preset detection rule includes any one of the following:
the number of data groups is greater than or equal to a fourth preset threshold, and the mean of a data group meets a preset requirement.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, clustering the set of data to be tested according to the value of each datum in the set to obtain at least one data group includes:
for the datum read for the N-th time from the set of data to be tested, combining it with each of the at least one primary data group obtained by clustering so far, where N is a positive integer greater than 1;
determining, for each combined primary data group, the difference between the datum and the mean of the data in that group, and the standard deviation of the data in that group;
determining, based on the differences and the standard deviations, the primary data group to which the datum belongs;
adding the datum read for the N-th time to the primary data group to which it belongs, so as to obtain at least one data group.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, determining, based on the differences and the standard deviations, the primary data group to which the datum belongs includes:
determining a primary data group for which the difference is less than or equal to a preset multiple of the standard deviation as the primary data group to which the datum belongs.
Optionally, when the computer-executable instructions stored on the storage medium are executed by a processor, clustering the set of data to be tested according to the value of each datum in the set further includes:
for the datum read first from the set of data to be tested, determining that datum as a primary data group.
When the computer-executable instructions stored on the storage medium provided by the embodiments of this specification are executed by a processor, they first cluster the set of data to be tested according to the value of each datum, obtaining at least one data group. Abnormal data are then detected group by group. Clustering gathers data of similar values into the same data group. Detecting on a per-group basis then avoids distortion from extreme maximum or minimum values in the set, or from differences in order of magnitude between data. This improves the accuracy of abnormal data detection.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system themselves to "integrate" it onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, the means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For ease of description, the above device is described by dividing it into various units by function. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that, the terms "include", "comprise" or its any other variant are intended to nonexcludability It include so that the process, method, commodity or the equipment that include a series of elements not only include those elements, but also to wrap Include other elements that are not explicitly listed, or further include for this process, method, commodity or equipment intrinsic want Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including described want There is also other identical elements in the process, method of element, commodity or equipment.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
The above descriptions are merely examples of this application and are not intended to limit it. Those skilled in the art may make various modifications and changes to this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (15)

1. A method for detecting abnormal data, comprising:
clustering a set of data to be tested according to the value of each data item to be tested in the set, to obtain at least one data group, wherein the values of the data to be tested in the same data group satisfy a preset clustering condition;
determining group features of the at least one data group, wherein the group features include overall feature information of all data groups obtained by the clustering and individual feature information corresponding to each data group; and
detecting abnormal data groups among the data groups according to a preset detection rule, based on the overall feature information and/or the individual feature information.
2. The method according to claim 1, wherein after detecting the abnormal data groups according to the preset detection rule based on the group features, the method further comprises:
determining the data to be tested in the detected abnormal data groups as abnormal data in the set of data to be tested.
3. The method according to claim 1 or 2, wherein the preset detection rule includes a first preset detection rule and a second preset detection rule;
detecting the abnormal data groups among the data groups according to the preset detection rule, based on the overall feature information and/or the individual feature information, comprises:
detecting, according to the first preset detection rule and based on the overall feature information or the individual feature information, all data groups obtained by the clustering, to screen out non-abnormal data groups among them, wherein the first preset detection rule is determined based on the overall feature information of all data groups or on the individual feature information corresponding to each data group; and
determining, according to the second preset detection rule and based on the overall feature information and the individual feature information, the abnormal data groups among the data groups remaining after the non-abnormal data groups are screened out, wherein the second preset detection rule is determined based on the overall feature information of all data groups and the individual feature information corresponding to each data group.
4. The method according to claim 3, wherein the overall feature information of all data groups includes the number of data groups obtained by the clustering;
the individual feature information of each data group includes the statistical mean of the data to be tested in that data group;
the first preset detection rule includes any one of the following rules:
the number of data groups is less than or equal to a first set threshold; the statistical mean of the data to be tested in the data group is less than or equal to a second set threshold; or the statistical mean of the data to be tested in the data group is greater than or equal to a third set threshold;
the second preset detection rule includes any one of the following rules:
the number of data groups is greater than or equal to a fourth set threshold and the statistical mean corresponding to the data group meets a set requirement.
5. The method according to claim 1, wherein clustering the set of data to be tested according to the value of each data item to be tested in the set, to obtain at least one data group, comprises:
for the data to be tested read from the set for the N-th time, combining the data to be tested in turn with each of the at least one initial data group obtained by the clustering so far, where N is a positive integer greater than 1;
determining the difference between the data to be tested and the statistical mean of the data to be tested in each combined initial data group, and the standard deviation of the data to be tested in each combined initial data group;
determining, based on the difference and the standard deviation, the initial data group to which the data to be tested belongs; and
adding the data to be tested read for the N-th time to the initial data group to which it belongs, to obtain at least one data group.
6. The method according to claim 5, wherein determining, based on the difference and the standard deviation, the initial data group to which the data to be tested belongs comprises:
determining, as the initial data group to which the data to be tested belongs, an initial data group for which the difference is less than or equal to a set multiple of the standard deviation.
7. The method according to claim 5 or 6, wherein clustering the set of data to be tested according to the value of each data item to be tested in the set further comprises:
for the data to be tested read from the set for the first time, determining the data to be tested as one initial data group.
8. A device for detecting abnormal data, comprising:
a clustering module, configured to cluster a set of data to be tested according to the value of each data item to be tested in the set, to obtain at least one data group, wherein the values of the data to be tested in the same data group satisfy a preset clustering condition;
a first determining module, configured to determine group features of the at least one data group, wherein the group features include overall feature information of all data groups obtained by the clustering and individual feature information corresponding to each data group; and
a detection module, configured to detect abnormal data groups among the data groups according to a preset detection rule, based on the overall feature information and/or the individual feature information.
9. The device according to claim 8, further comprising:
a second determining module, configured to determine the data to be tested in the detected abnormal data groups as abnormal data in the set of data to be tested.
10. The device according to claim 8 or 9, wherein the preset detection rule includes a first preset detection rule and a second preset detection rule;
the detection module comprises:
a first detection unit, configured to detect, according to the first preset detection rule and based on the overall feature information or the individual feature information, all data groups obtained by the clustering, to screen out non-abnormal data groups among them, wherein the first preset detection rule is determined based on the overall feature information of all data groups or on the individual feature information corresponding to each data group; and
a second detection unit, configured to determine, according to the second preset detection rule and based on the overall feature information and the individual feature information, the abnormal data groups among the data groups remaining after the non-abnormal data groups are screened out, wherein the second preset detection rule is determined based on the overall feature information of all data groups and the individual feature information corresponding to each data group.
11. The device according to claim 8, wherein the clustering module comprises:
a combining unit, configured to, for the data to be tested read from the set for the N-th time, combine the data to be tested in turn with each of the at least one initial data group obtained by the clustering so far, where N is a positive integer greater than 1;
a first determination unit, configured to determine the difference between the data to be tested and the statistical mean of the data to be tested in each combined initial data group, and the standard deviation of the data to be tested in each combined initial data group;
a second determination unit, configured to determine, based on the difference and the standard deviation, the initial data group to which the data to be tested belongs; and
an adding unit, configured to add the data to be tested read for the N-th time to the initial data group to which it belongs, to obtain at least one data group.
12. The device according to claim 11, wherein the second determination unit is specifically configured to:
determine, as the initial data group to which the data to be tested belongs, an initial data group for which the difference is less than or equal to a set multiple of the standard deviation.
13. The device according to claim 11 or 12, wherein the clustering module further comprises:
a third determination unit, configured to, for the data to be tested read from the set for the first time, determine the data to be tested as one initial data group.
14. A device for detecting abnormal data, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
cluster a set of data to be tested according to the value of each data item to be tested in the set, to obtain at least one data group, wherein the values of the data to be tested in the same data group satisfy a preset clustering condition;
determine group features of the at least one data group, wherein the group features include overall feature information of all data groups obtained by the clustering and individual feature information corresponding to each data group; and
detect abnormal data groups among the data groups according to a preset detection rule, based on the overall feature information and the individual feature information.
15. A storage medium for storing computer-executable instructions that, when executed, implement the following process:
clustering a set of data to be tested according to the value of each data item to be tested in the set, to obtain at least one data group, wherein the values of the data to be tested in the same data group satisfy a preset clustering condition;
determining group features of the at least one data group, wherein the group features include overall feature information of all data groups obtained by the clustering and individual feature information corresponding to each data group; and
detecting abnormal data groups among the data groups according to a preset detection rule, based on the overall feature information and/or the individual feature information.
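Claims 1 to 4 name the thresholds and the "set requirement" but leave their values open. Under that caveat, the two-stage detection might be sketched in Python as follows; it takes groups such as those produced by the clustering step as plain lists. The names `Rules` and `detect_abnormal`, every threshold value, and the reading of the first rule as "any configured condition screens a group out" are assumptions for illustration, not the claimed implementation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional


def _any_mean(m: float) -> bool:
    return True  # placeholder for the unspecified "set requirement"


@dataclass
class Rules:
    # Hypothetical thresholds; the claims name them but give no values.
    t1_max_groups: Optional[int] = None    # 1st rule: group count <= t1 -> no anomaly
    t2_low_mean: Optional[float] = None    # 1st rule: group mean <= t2 -> non-abnormal
    t3_high_mean: Optional[float] = None   # 1st rule: group mean >= t3 -> non-abnormal
    t4_min_groups: int = 2                 # 2nd rule: group count must be >= t4
    mean_requirement: Callable[[float], bool] = _any_mean  # 2nd rule: "set requirement"


def detect_abnormal(groups: List[List[float]], rules: Rules) -> List[float]:
    """Return the values in groups judged abnormal (claim 2)."""
    count = len(groups)                    # overall feature: number of groups
    means = [mean(g) for g in groups]      # individual feature: mean of each group

    # Stage 1 (first rule): screen out non-abnormal groups.
    if rules.t1_max_groups is not None and count <= rules.t1_max_groups:
        return []
    remaining = []
    for group, m in zip(groups, means):
        if rules.t2_low_mean is not None and m <= rules.t2_low_mean:
            continue
        if rules.t3_high_mean is not None and m >= rules.t3_high_mean:
            continue
        remaining.append((group, m))

    # Stage 2 (second rule): count >= t4 AND the group mean meets the requirement.
    abnormal = []
    if count >= rules.t4_min_groups:
        for group, m in remaining:
            if rules.mean_requirement(m):
                abnormal.extend(group)     # members of abnormal groups are abnormal data
    return abnormal
```

With the groups `[[10, 11, 10, 10], [1000, 1005]]` and `Rules(t2_low_mean=100.0)`, the low-valued group is screened out in stage 1 and `[1000, 1005]` is returned. Whether a low or a high group mean marks a group as normal depends on what the data represent, which is why both directions are left configurable in this sketch.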
CN201910130668.0A 2019-02-21 2019-02-21 The detection method and device of abnormal data Pending CN110059712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910130668.0A CN110059712A (en) 2019-02-21 2019-02-21 The detection method and device of abnormal data

Publications (1)

Publication Number Publication Date
CN110059712A (en) 2019-07-26

Family

ID=67315990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910130668.0A Pending CN110059712A (en) 2019-02-21 2019-02-21 The detection method and device of abnormal data

Country Status (1)

Country Link
CN (1) CN110059712A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247954A (en) * 2017-06-16 2017-10-13 山东省计算中心(国家超级计算济南中心) A kind of image outlier detection method based on deep neural network
CN108206813A (en) * 2016-12-19 2018-06-26 中国移动通信集团山西有限公司 Method for auditing safely, device and server based on k means clustering algorithms
CN108681493A (en) * 2018-05-29 2018-10-19 深圳乐信软件技术有限公司 Data exception detection method, device, server and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN110602101A (en) * 2019-09-16 2019-12-20 北京三快在线科技有限公司 Method, device, equipment and storage medium for determining network abnormal group
CN113537363A (en) * 2021-07-20 2021-10-22 北京奇艺世纪科技有限公司 Abnormal object detection method and device, electronic equipment and storage medium
CN113537363B (en) * 2021-07-20 2023-12-15 北京奇艺世纪科技有限公司 Abnormal object detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
TWI818999B (en) Predictive model training method and device for new scenarios
TW201928841A (en) Method, apparatus, and device for training risk control model and risk control
CN107679700A (en) Business flow processing method, apparatus and server
CN108596410B (en) Automatic wind control event processing method and device
CN110033130A (en) The monitoring method and device of abnormal traffic
CN109274597B (en) Control method, device and equipment for special service line
CN109086961A (en) A kind of Information Risk monitoring method and device
CN110443618A (en) The generation method and device of air control strategy
CN107066519A (en) A kind of task detection method and device
US20140071170A1 (en) Non-uniformly scaling a map for emphasizing areas of interest
CN108734304A (en) A kind of training method of data model, device and computer equipment
CN110059712A (en) The detection method and device of abnormal data
CN110334013A (en) Test method, device and the electronic equipment of decision engine
CN109597678A (en) Task processing method and device
CN107038127A (en) Application system and its buffer control method and device
CN109144715A (en) A kind of method, server and the equipment of resource optimization and update
CN108804563A (en) A kind of data mask method, device and equipment
CN109492401A (en) A kind of content vector risk checking method, device, equipment and medium
WO2021120845A1 (en) Homogeneous risk unit feature set generation method, apparatus and device, and medium
CN102254569B (en) Quad-data rate (QDR) controller and realization method thereof
CN109039695B (en) Service fault processing method, device and equipment
CN110245166A (en) Verification of data method and device
CN109582388A (en) One parameter configuration method, device and equipment
CN107679547A (en) A kind of data processing method for being directed to two disaggregated models, device and electronic equipment
CN109903165B (en) Model merging method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced New Technologies Co., Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advantageous New Technologies Co., Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advantageous New Technologies Co., Ltd.

Address before: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.