CN104679970A - Data detection method and device - Google Patents


Info

Publication number
CN104679970A
Authority
CN
China
Prior art keywords
traffic data
historical traffic
day
time range
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310629648.0A
Other languages
Chinese (zh)
Other versions
CN104679970B (en)
Inventor
杨承继
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Autonavi Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Autonavi Software Co Ltd
Priority to CN201310629648.0A
Publication of CN104679970A
Application granted
Publication of CN104679970B
Active
Anticipated expiration

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The application discloses a data detection method and device. The method includes: obtaining target data, which comprises the historical traffic data of a target road for each day of a preset statistical period; screening out, according to preset typical day types, the historical traffic data whose release dates match those typical day types; performing a first anomaly detection on the historical traffic data of the same typical day type within the same preset statistical time range to obtain a first anomaly detection result; performing a second anomaly detection, day by day, on the historical traffic data whose release dates match the typical day types to obtain a second anomaly detection result; and determining the first and second anomaly detection results as the abnormal data detection result of the target data. The method and device detect abnormal data in the historical traffic data of the target road, so that the historical traffic data used for typicality analysis truly reflects the road's traffic conditions, which improves the accuracy of the analysis result.

Description

Data detection method and device
Technical field
The application relates to the field of data detection technology, and in particular to a data detection method and device.
Background
With the development and widespread use of intelligent transportation systems, urban traffic guidance applications are becoming increasingly intelligent and dynamic. An intelligent transportation system obtains the current traffic data of a city at a fixed interval (for example, every 2 or 5 minutes) and publishes it promptly, so that users can learn the current traffic conditions of their city in time. Because current traffic data is published so frequently, a large amount of historical traffic data accumulates. Analyzing this historical traffic data at different granularities and along different dimensions reveals the traffic patterns of urban roads, and thereby provides an important basis for filling in and predicting urban traffic information.
At present, typicality analysis is performed directly on all of a city's historical traffic data to obtain the traffic information of each road for each statistical time range of each typical day type.
In practice, various factors (such as weather or traffic accidents) cause part of a city's historical traffic data to be abnormal, and such abnormal data may not truly reflect the traffic conditions of the road. Existing technical solutions cannot detect this abnormal data. Consequently, the traffic information of each road for each statistical time range of each typical day type, obtained by performing typicality analysis directly on all of a city's historical traffic data, is inaccurate.
Summary of the invention
To address the above technical problem in the prior art, the application provides a data detection method and device that detect the abnormal data in urban historical traffic data before the traffic information of each road for each statistical time range of each typical day type is analyzed from that data. This ensures that the historical traffic data used for typicality analysis reflects road traffic conditions reasonably truthfully, which improves the accuracy of the analysis result.
The application provides a data detection method, comprising:
obtaining target data, the target data comprising the historical traffic data of a target road for each day of a preset statistical period;
screening out from the target data, according to a preset typical day type, the historical traffic data whose release date matches the typical day type;
performing a first anomaly detection on the historical traffic data that has the same typical day type and falls within the same preset statistical time range, to obtain a first anomaly detection result;
performing a second anomaly detection on the historical traffic data of each day whose release date matches the typical day type, to obtain a second anomaly detection result;
determining the first anomaly detection result and the second anomaly detection result as the abnormal data detection result of the target data.
Preferably, in the above method, performing the first anomaly detection on the historical traffic data that has the same typical day type and falls within the same preset statistical time range, to obtain the first anomaly detection result, comprises:
determining the U statistic and the rejection-region critical value of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range;
judging whether the U statistic is greater than its rejection-region critical value; if so, determining that the historical traffic data item is abnormal; otherwise, determining that it is normal.
Preferably, in the above method, performing the second anomaly detection on the historical traffic data of each day whose release date matches the typical day type, to obtain the second anomaly detection result, comprises:
performing the following steps on the historical traffic data of each day whose release date matches the typical day type:
dividing the day's historical traffic data according to release time to obtain historical traffic data sequences;
determining the U statistic and the rejection-region critical value of each historical traffic data item in each historical traffic data sequence;
judging whether the U statistic is greater than its rejection-region critical value; if so, determining that the historical traffic data item is abnormal; otherwise, determining that it is normal.
Preferably, in the above method, dividing the day's historical traffic data according to release time to obtain historical traffic data sequences comprises:
placing the items of the day's historical traffic data whose release times fall within the same release time period into the same historical traffic data subsequence;
starting from the first historical traffic data subsequence, obtaining in turn the mean $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$ and the standard deviation $\sigma = \sqrt{\sum_{i=1}^{n}(x_i-\mu)^2/(n-1)}$ of each pair of adjacent historical traffic data subsequences, where $x_i$ is the value of the i-th historical traffic data item in the subsequence and n is the number of historical traffic data values in the subsequence;
judging whether the means μ and the standard deviations σ of the two adjacent historical traffic data subsequences are respectively equal; if so, merging the two subsequences into one historical traffic data sequence; otherwise, treating each of the two subsequences as a separate historical traffic data sequence.
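The subsequence merging described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merge_subsequences` and the tolerance `tol` are hypothetical, the comparison is made between the running merged sequence and the next subsequence (the text does not say how a merged sequence is compared with its successor), and "equal" is interpreted as equal within a small tolerance because exact floating-point equality is rarely practical.

```python
import math
from statistics import mean, stdev


def merge_subsequences(subsequences, tol=1e-9):
    """Merge adjacent subsequences whose mean and sample standard deviation match.

    Assumption: each subsequence holds at least two values (stdev needs two),
    and equality is tested within the hypothetical tolerance `tol`.
    """
    if not subsequences:
        return []
    sequences = [list(subsequences[0])]
    for sub in subsequences[1:]:
        last = sequences[-1]
        same_mean = math.isclose(mean(last), mean(sub), abs_tol=tol)
        same_sigma = math.isclose(stdev(last), stdev(sub), abs_tol=tol)
        if same_mean and same_sigma:
            last.extend(sub)             # merge into the current sequence
        else:
            sequences.append(list(sub))  # start a new sequence
    return sequences


# Two subsequences with identical mean and sigma merge; the third stays separate.
print(merge_subsequences([[10, 12], [12, 10], [30, 34]]))
```

In practice a looser tolerance would probably be needed, since two real traffic subsequences almost never share exactly the same mean and deviation.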
Preferably, in the above method, determining the U statistic of the historical traffic data comprises:
determining the U statistic of the historical traffic data according to the U-statistic formula [formula not reproduced in the source text];
where U is the U statistic of the historical traffic data item; $x_i$ is the value of the i-th historical traffic data item; n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range, or the number of historical traffic data items in the data sequence; and $\sigma = \sqrt{\sum_{i=1}^{n}(x_i-\mu)^2/(n-1)}$;
and determining the rejection-region critical value of the historical traffic data comprises:
determining the rejection-region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, where α is a preset significance level.
Preferably, in the above method, after the first anomaly detection result and the second anomaly detection result are determined as the abnormal data detection result of the target data, the method further comprises:
obtaining the mean μ of the normal historical traffic data that has the same typical day type and falls within the same preset statistical time range, using $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$;
where n is the number of values of normal historical traffic data that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th normal historical traffic data item;
determining the mean μ as the traffic data statistical value of that preset statistical time range for the typical days belonging to that typical day type.
Preferably, in the above method, after the mean μ is determined as the traffic data statistical value of the preset statistical time range for the typical days belonging to the typical day type, the method further comprises:
for each typical day, judging whether a preset statistical time range of the typical day lacks a traffic data statistical value;
when a preset statistical time range of the typical day lacks a traffic data statistical value, filling in the traffic data statistical value of that statistical time range according to the traffic data statistical values of the preceding and following statistical time ranges.
Preferably, in the above method, after the mean μ is determined as the traffic data statistical value of the preset statistical time range for the typical days belonging to the typical day type, the method further comprises:
for each typical day, judging whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range;
when the traffic data statistical value of a preset statistical time range of the typical day exceeds the preset threshold range, determining that the traffic data statistical value of that statistical time range is a sudden-change value, and smoothing it according to the traffic data statistical values of the preceding and following statistical time ranges.
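The gap-filling and smoothing steps above can be sketched as follows. The text only says that the preceding and following statistical time ranges are used; taking their arithmetic mean is an assumption made here for illustration, and the function names `fill_missing` and `smooth_sudden_changes` as well as the use of `None` to mark a missing statistical value are hypothetical.

```python
def fill_missing(stats):
    """Fill a missing value (None) with the mean of its two neighbours.

    Assumption: averaging the neighbours is one possible filling rule;
    the source does not specify the rule.
    """
    filled = list(stats)
    for i in range(1, len(stats) - 1):
        prev_v, next_v = stats[i - 1], stats[i + 1]
        if stats[i] is None and prev_v is not None and next_v is not None:
            filled[i] = (prev_v + next_v) / 2
    return filled


def smooth_sudden_changes(stats, low, high):
    """Replace values outside [low, high] with the mean of their neighbours.

    Assumption: [low, high] stands for the preset threshold range, and
    neighbour averaging stands in for the unspecified smoothing rule.
    """
    smoothed = list(stats)
    for i in range(1, len(stats) - 1):
        value, prev_v, next_v = stats[i], stats[i - 1], stats[i + 1]
        if None in (value, prev_v, next_v):
            continue  # nothing to compare or nothing to smooth with
        if not low <= value <= high:
            smoothed[i] = (prev_v + next_v) / 2
    return smoothed


print(fill_missing([40.0, None, 50.0, None]))
print(smooth_sudden_changes([42.0, 45.0, 120.0, 47.0], 0.0, 80.0))
```

Values at the first and last position have only one neighbour, so this sketch leaves them unchanged.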
The application also provides a data detection device, comprising:
a data acquisition module, configured to obtain target data, the target data comprising the historical traffic data of a target road for each day of a preset statistical period;
a data screening module, configured to screen out from the target data, according to a preset typical day type, the historical traffic data whose release date matches the typical day type;
a first detection module, configured to perform a first anomaly detection on the historical traffic data that has the same typical day type and falls within the same preset statistical time range, to obtain a first anomaly detection result;
a second detection module, configured to perform a second anomaly detection on the historical traffic data of each day whose release date matches the typical day type, to obtain a second anomaly detection result;
a result determination module, configured to determine the first anomaly detection result and the second anomaly detection result as the abnormal data detection result of the target data.
Preferably, in the above device, the first detection module comprises:
a first statistics submodule, configured to determine the U statistic and the rejection-region critical value of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range;
a first result generation submodule, configured to judge whether the U statistic is greater than its rejection-region critical value; if so, to determine that the historical traffic data item is abnormal; otherwise, to determine that it is normal.
Preferably, in the above device, the second detection module comprises:
a retrieval submodule, configured to divide, for the historical traffic data of each day whose release date matches the typical day type, the day's historical traffic data according to release time to obtain historical traffic data sequences;
a second statistics submodule, configured to determine the U statistic and the rejection-region critical value of each historical traffic data item in each historical traffic data sequence;
a second result generation submodule, configured to judge whether the U statistic is greater than its rejection-region critical value; if so, to determine that the historical traffic data item is abnormal; otherwise, to determine that it is normal.
Preferably, in the above device, the retrieval submodule comprises:
a subsequence division unit, configured to place the items of the day's historical traffic data whose release times fall within the same release time period into the same historical traffic data subsequence;
a subsequence mean acquisition unit, configured to obtain in turn, starting from the first historical traffic data subsequence, the mean $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$ and the standard deviation $\sigma = \sqrt{\sum_{i=1}^{n}(x_i-\mu)^2/(n-1)}$ of each pair of adjacent historical traffic data subsequences, where $x_i$ is the value of the i-th historical traffic data item in the subsequence and n is the number of historical traffic data values in the subsequence;
a sequence determination unit, configured to judge whether the means μ and the standard deviations σ of the two adjacent historical traffic data subsequences are respectively equal; if so, to merge the two subsequences into one historical traffic data sequence; otherwise, to treat each of the two subsequences as a separate historical traffic data sequence.
Preferably, in the above device, the first statistics submodule or the second statistics submodule comprises:
a U statistic determination unit, configured to determine the U statistic of the historical traffic data according to the U-statistic formula [formula not reproduced in the source text];
where U is the U statistic of the historical traffic data item; $x_i$ is the value of the i-th historical traffic data item; n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range, or the number of historical traffic data items in the data sequence; and $\sigma = \sqrt{\sum_{i=1}^{n}(x_i-\mu)^2/(n-1)}$;
a critical value determination unit, configured to determine the rejection-region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, where α is a preset significance level.
Preferably, the above device further comprises:
a data mean acquisition module, configured to obtain the mean μ of the normal historical traffic data that has the same typical day type and falls within the same preset statistical time range, using $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$;
where n is the number of values of normal historical traffic data that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th normal historical traffic data item;
a statistical value determination module, configured to determine the mean μ as the traffic data statistical value of that preset statistical time range for the typical days belonging to that typical day type.
Preferably, the above device further comprises:
a missing-value judging module, configured to judge, for each typical day, whether a preset statistical time range of the typical day lacks a traffic data statistical value and, if so, to trigger the statistical value filling module;
a statistical value filling module, configured to fill in the traffic data statistical value of that statistical time range according to the traffic data statistical values of the preceding and following statistical time ranges.
Preferably, the above device further comprises:
a range judging module, configured to judge, for each typical day, whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range and, if so, to trigger the statistical value smoothing module;
a statistical value smoothing module, configured to determine that the traffic data statistical value of that statistical time range is a sudden-change value, and to smooth it according to the traffic data statistical values of the preceding and following statistical time ranges.
As can be seen from the above scheme, with the data detection method and device provided by the application, after the target data of a target road is obtained (the target data being the historical traffic data of each day in the preset statistical period), a longitudinal first anomaly detection is performed on the historical traffic data in the target data that has the same typical day type and falls within the same preset statistical time range, yielding a first anomaly detection result. A second anomaly detection is also performed on the historical traffic data of each day in the target data whose release date matches the typical day type, yielding a second anomaly detection result. The first and second anomaly detection results are then determined as the abnormal data detection result for the target road. In this way, before the traffic information of each road for each statistical time range of each typical day is analyzed from historical traffic data, the application detects the abnormal data in the road's historical traffic data, ensuring that the historical traffic data used for typicality analysis reflects road traffic conditions reasonably truthfully, which improves the accuracy of the analysis result.
Brief description of the drawings
To explain the technical solutions in the embodiments of the application more clearly, the drawings needed to describe the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a data detection method provided by Embodiment one of the application;
Fig. 2 is a partial flowchart of Embodiment one of the application;
Fig. 3 is a partial flowchart of a data detection method provided by Embodiment two of the application;
Fig. 4 is a partial flowchart of a data detection method provided by Embodiment three of the application;
Fig. 5 is a partial flowchart of a data detection method provided by Embodiment four of the application;
Fig. 6 is a partial flowchart of a data detection method provided by Embodiment five of the application;
Fig. 7 is a partial flowchart of a data detection method provided by Embodiment six of the application;
Fig. 8 is a structural diagram of a data detection device provided by Embodiment seven of the application;
Fig. 9 is a partial structural diagram of a data detection device provided by Embodiment eight of the application;
Fig. 10 is another partial structural diagram of a data detection device provided by Embodiment eight of the application;
Fig. 11 is a partial structural diagram of a data detection device provided by Embodiment nine of the application;
Fig. 12 is another partial structural diagram of a data detection device provided by Embodiment nine of the application;
Fig. 13 is a structural diagram of a data detection device provided by Embodiment ten of the application;
Fig. 14 is a structural diagram of a data detection device provided by Embodiment eleven of the application;
Fig. 15 is a structural diagram of a data detection device provided by Embodiment twelve of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the application without creative effort fall within the scope of protection of the application.
Embodiment one
Referring to Fig. 1, which is a flowchart of Embodiment one of a data detection method provided by the application, the method may comprise the following steps:
Step 101: obtain target data.
The target data comprises the historical traffic data of a target road for each day of a preset statistical period; that is, the target data is the data on which anomaly detection is to be performed.
The preset statistical period can be set by the inspector; for example, it can be set empirically to one year, one month, and so on.
It should be noted that the data processing systems that currently output historical traffic data usually publish one traffic data item (such as the travel time a vehicle needs to traverse the target road, the travel speed of vehicles on the target road, the road occupancy, or the number of vehicles on the road) at fixed intervals (such as every 2 or 5 minutes). The target data can be all the traffic data accumulated within the preset statistical period, or the traffic data obtained by sampling that accumulated data at regular time points.
Step 102: according to a preset typical day type, screen out from the target data the historical traffic data whose release date matches the typical day type.
Step 103: perform a first anomaly detection on the historical traffic data that has the same typical day type and falls within the same preset statistical time range, to obtain a first anomaly detection result.
The first anomaly detection in step 103 is a longitudinal anomaly detection of the target data. For example, if the statistical period is one month, the typical day type is Monday, and the preset statistical time ranges are 7:00-8:00, 12:00-13:00 and 18:00-19:00, then step 103 performs the first anomaly detection separately on the 7:00-8:00 historical traffic data of the month's four Mondays, the 12:00-13:00 historical traffic data of the four Mondays, and the 18:00-19:00 historical traffic data of the four Mondays.
Step 104: perform a second anomaly detection on the historical traffic data of each day whose release date matches the typical day type, to obtain a second anomaly detection result.
The second anomaly detection in step 104 is a horizontal anomaly detection of the target data: anomaly detection is performed on the historical traffic data contained in each typical day of the statistical period, yielding the horizontal second anomaly detection result. For example, if the statistical period is one month and the typical day type is Monday, step 104 performs anomaly detection separately on the historical traffic data of each of the four Mondays.
Step 105: determine the first anomaly detection result and the second anomaly detection result as the abnormal data detection result of the target data.
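The difference between the longitudinal grouping of step 103 and the horizontal grouping of step 104 can be illustrated with a short sketch. It assumes, for illustration only, that each record is a `(datetime, value)` pair, that the typical day type is simply the weekday, and that statistical time ranges are given as `(start_hour, end_hour)` pairs; the function name `group_for_detection` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_for_detection(records, time_ranges):
    """Group records for the two detection passes.

    longitudinal: (weekday, time range) -> values collected across all matching days
    horizontal:   calendar date         -> values released on that single day
    """
    longitudinal = defaultdict(list)
    horizontal = defaultdict(list)
    for ts, value in records:
        horizontal[ts.date()].append(value)
        for start, end in time_ranges:
            if start <= ts.hour < end:
                longitudinal[(ts.weekday(), (start, end))].append(value)
    return longitudinal, horizontal


records = [
    (datetime(2013, 11, 4, 7, 30), 35.0),   # a Monday, 7:00-8:00
    (datetime(2013, 11, 11, 7, 30), 38.0),  # the next Monday, 7:00-8:00
    (datetime(2013, 11, 4, 12, 10), 50.0),  # first Monday, 12:00-13:00
]
lon, hor = group_for_detection(records, [(7, 8), (12, 13), (18, 19)])
print(lon[(0, (7, 8))])                    # Monday 7:00-8:00 across days
print(hor[datetime(2013, 11, 4).date()])   # everything released on 2013-11-04
```

Each longitudinal group would then be passed to the first anomaly detection, and each horizontal group (after being divided into sequences by release time) to the second.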
In the embodiment of the present invention, typical case's day type and statistical time range all can be pre-set by user, as typical case's day type can be chosen to be Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, vacation on May Day, 11 vacations, the Spring Festival etc.Statistical time range can be set to peak period on and off duty, as 7:00-9:00,12:00-14:00,18:00-19:00 etc. in the middle of one day according to daily life custom.
Preferably, for improving the validity that typical case chooses day further, and realize choosing typical day of robotization, the embodiment of the present invention provides a kind of typical case's day type choosing method, the method is as preliminary election typical case's day using the every day in measurement period, preliminary election typical case is carried out combination of two day, for each combination execution method flow as shown in Figure 2, comprises the following steps:
Step 201: utilize the related coefficient of the historical traffic data that two the preliminary election typical cases obtaining present combination comprise day;
Wherein, x tbe t the historical traffic data of preliminary election typical case's day, y tfor t the historical traffic data of another preliminary election typical case's day, the number of the historical traffic data that n comprises day for preliminary election typical case, ρ xyit is the related coefficient of the historical traffic data of two preliminary election typical case's days.
Step 202: judge whether described related coefficient is greater than default first threshold values, if so, performs step 203, if not then performing step 204.
Step 203: using described two preliminary elections typical case's day as one typical day type.
Step 204: using described two preliminary elections typical case's day as one typical day type.
According to this, typical day type of described measurement period is obtained.
To describe the selection of typical days in the embodiment of the invention more clearly and in more detail, a specific example follows. Suppose the statistical period is one week and the preselected typical days are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. The preselected typical days are combined in pairs, and the correlation coefficient calculated for each pair is shown in Table 1 below:
Table 1 Correlation coefficients
Correlation  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Monday       1       0.6716   0.7524     0.7135    0.6398  0.1249    0.2203
Tuesday      0.6716  1        0.7478     0.7759    0.7079  0.3323    0.3084
Wednesday    0.7524  0.7478   1          0.8214    0.7871  0.3714    0.3987
Thursday     0.7135  0.7759   0.8214     1         0.7727  0.3844    0.3739
Friday       0.6398  0.7079   0.7871     0.7727    1       0.4569    0.3962
Saturday     0.1249  0.3323   0.3714     0.3844    0.4569  1         0.5244
Sunday       0.2203  0.3084   0.3987     0.3739    0.3962  0.5244    1
When the first threshold is set to 0.8, only the pair Wednesday and Thursday (correlation coefficient 0.8214) exceeds it, so Wednesday and Thursday are taken as one typical day type. From Table 1, the typical day types are therefore: Monday; Tuesday; Wednesday and Thursday; Friday; Saturday; Sunday.
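The grouping in this example can be reproduced with a short sketch that takes the coefficients of Table 1 as given. The function name `typical_day_types` is hypothetical, and merging pairs transitively (if A pairs with B and B with C, all three form one type) is an assumption, since the text only describes the decision for a single pair.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Correlation coefficients copied from Table 1.
CORR = [
    [1, 0.6716, 0.7524, 0.7135, 0.6398, 0.1249, 0.2203],
    [0.6716, 1, 0.7478, 0.7759, 0.7079, 0.3323, 0.3084],
    [0.7524, 0.7478, 1, 0.8214, 0.7871, 0.3714, 0.3987],
    [0.7135, 0.7759, 0.8214, 1, 0.7727, 0.3844, 0.3739],
    [0.6398, 0.7079, 0.7871, 0.7727, 1, 0.4569, 0.3962],
    [0.1249, 0.3323, 0.3714, 0.3844, 0.4569, 1, 0.5244],
    [0.2203, 0.3084, 0.3987, 0.3739, 0.3962, 0.5244, 1],
]


def typical_day_types(days, corr, threshold):
    """Group days whose pairwise correlation exceeds the threshold."""
    parent = list(range(len(days)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(days)):
        for j in range(i + 1, len(days)):
            if corr[i][j] > threshold:      # step 202
                parent[find(j)] = find(i)   # step 203: same typical day type
    groups = {}
    for i, day in enumerate(days):
        groups.setdefault(find(i), []).append(day)
    return list(groups.values())


print(typical_day_types(DAYS, CORR, 0.8))
```

With the threshold 0.8 this yields six typical day types, with Wednesday and Thursday sharing one.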
Preferably, to automate the determination of statistical time ranges, the embodiment of the invention also provides a way of obtaining them, as follows:
In order of release time from earliest to latest, obtain the N historical traffic data items released in one day. Starting from the first item, calculate the standard deviation of the first two items. If the standard deviation is greater than or equal to a preset threshold, combine the first two items into one historical traffic data combination. If it is less than the preset threshold, add the third item and calculate the standard deviation of the first three items; if that standard deviation is greater than or equal to the preset threshold, combine the first three items into one combination; if it is less, add the fourth item and calculate the standard deviation of the first four items; and so on, until the standard deviation after adding the m-th item is greater than or equal to the preset threshold, at which point these m items are combined into one historical traffic data combination. Then take the (m+1)-th and (m+2)-th items and continue in the same way to determine the next combination, until all historical traffic data items of each day in the target data have been selected and processed. Finally, the acquisition times of the historical traffic data items in each combination are merged into a time period, and each time period is determined as a statistical time range.
For example, for a target road (a single road) whose traffic data release cycle is assumed to be 5 minutes, 288 traffic data items are released in one day (a typical day): 24 hours is 24 × 60 = 1440 minutes, and 1440 minutes / 5 minutes = 288 items. Calculate the standard deviation σ of the first m traffic data items; if σ ≤ ε, add the next item and recalculate σ, until σ > ε. If n sample points have taken part in the calculation at that moment, the time period formed by the release time points of these n items is recorded as one statistical time range. The next round of judgment starts from the (n+1)-th item, and the process continues until all 288 items have been analyzed.
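The segmentation procedure above can be sketched as follows, following the rule of the example (keep adding items while σ ≤ ε, close the segment once σ > ε). The function name `split_time_ranges` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`, denominator n - 1) is an assumption, since this passage does not say which form of standard deviation is meant.

```python
from statistics import stdev


def split_time_ranges(values, eps):
    """Split one day's values into segments of consecutive items.

    Returns (start, end) index pairs with `end` exclusive. A segment grows
    while its standard deviation stays <= eps and is closed as soon as the
    deviation exceeds eps or the data runs out.
    """
    segments = []
    start = 0
    while start < len(values):
        end = min(start + 2, len(values))  # begin with the first two items
        while end < len(values) and stdev(values[start:end]) <= eps:
            end += 1                       # add the next item and re-check
        segments.append((start, end))
        start = end                        # next round starts at item n + 1
    return segments


speeds = [30, 30, 31, 30, 60, 61, 60, 20]
print(split_time_ranges(speeds, eps=2.0))
```

With a 5-minute release cycle, index i corresponds to the release time i × 5 minutes after midnight, so each `(start, end)` pair maps directly to a statistical time range.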
As can be seen from the above scheme, with the data detection method provided by the application, after the target data of a target road is obtained (the target data being the historical traffic data of each day in the preset statistical period), a longitudinal first anomaly detection is performed on the historical traffic data in the target data that has the same typical day type and falls within the same preset statistical time range, yielding a first anomaly detection result. A second anomaly detection is also performed on the historical traffic data of each day in the target data whose release date matches the typical day type, yielding a second anomaly detection result. The first and second anomaly detection results are then determined as the abnormal data detection result for the target road. In this way, before the traffic information of each road for each statistical time range of each typical day is analyzed from historical traffic data, Embodiment one of the application detects the abnormal data in the road's historical traffic data, ensuring that the historical traffic data used for typicality analysis reflects road traffic conditions reasonably truthfully, which improves the accuracy of the analysis result.
Embodiment two
Compared with the data detection method provided by Embodiment one, the data detection method provided by Embodiment two of the present invention refines step 103 of the flowchart shown in FIG. 1. Referring to FIG. 3, which is a flowchart of step 103, step 103 may include the following steps:
Step 301: determine the U statistic of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range, and determine its rejection-region critical value.
For example, the U statistic and the rejection-region critical value are determined for each historical traffic data item of the target road that falls within the statistical time range 7:00-8:00 on all Mondays in the measurement period.
The U statistic of each historical traffic data item in step 301 may be determined as follows:
The U statistic of each historical traffic data item is obtained from its value $x_i$ together with $\mu$ and $\sigma$.
Here, U is the U statistic of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range, $x_i$ is the value of the i-th such historical traffic data item, and n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range. $\mu$ is obtained by $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$, and $\sigma$ is obtained by $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / (n - 1)}$.
The rejection-region critical value of each historical traffic data item in step 301 may be obtained as follows:
According to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, the rejection-region critical value $\mu_{\alpha/2}$ is determined, where α is a preset significance level. For example, the significance level α is determined first (α is generally set to 0.05); the preset Gaussian distribution table is then searched according to U and α to find the rejection-region critical value $\mu_{\alpha/2}$ for which $P(U > \mu_{\alpha/2}) = \alpha$ holds.
Step 302: judge whether the U statistic is greater than its rejection-region critical value; if so, perform step 303; otherwise, perform step 304.
Step 303: determine that the historical traffic data item is abnormal.
Step 304: determine that the historical traffic data item is normal.
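Steps 301 to 304 can be illustrated with the following Python sketch. It rests on assumptions that the text above does not confirm: the U statistic is taken in the standardized form |x_i - μ| / σ, the critical value is computed from the standard normal quantile at 1 - α/2 rather than looked up in a preset table, and the function name `detect_abnormal` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def detect_abnormal(values, alpha=0.05):
    """Flag each value as abnormal (True) or normal (False).

    Assumption: U_i = |x_i - mu| / sigma, compared against the
    two-sided standard normal critical value for significance alpha.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, divisor n - 1
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    if sigma == 0:
        # All values identical: nothing can be rejected.
        return [False] * len(values)
    return [abs(x - mu) / sigma > critical for x in values]


# Speeds (km/h) for one statistical time range across twelve same-type days.
speeds = [40, 41, 39, 40, 42, 38, 40, 41, 39, 40, 41, 5]
print(detect_abnormal(speeds))
```

Note that with very few samples (for example, four Mondays in a month) a standardized statistic of this form cannot exceed (n - 1)/√n, so a longer measurement period may be needed before any value can be rejected at α = 0.05.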
Embodiment three
Embodiment three differs from Embodiments one and two in that it refines step 104 of the flow shown in FIG. 1. Referring to FIG. 4, which is a flowchart of step 104, in step 104 the following steps are performed on the historical traffic data of each day whose release date matches the typical day type (if the typical day type is Monday, the following steps are performed on the historical traffic data of each Monday in the measurement period):
Step 401: divide the historical traffic data of the same day according to release time to obtain historical traffic data sequences.
In a specific implementation, step 401 may be realized as follows:
First, among the historical traffic data of the same day, the items whose release times fall within the same release time section are assigned to the same historical traffic data subsequence. For example, each release time section may be set to half an hour or one hour, so that one day is divided into 48 or 24 release time sections. The historical traffic data released at 7:00-7:30 forms one historical traffic data subsequence, the historical traffic data released at 7:30-8:00 forms another historical traffic data subsequence, and so on.
Secondly, starting from the first historical traffic data subsequence, the mean μ and variance σ of the historical traffic data of each pair of adjacent historical traffic data subsequences are obtained in turn (that is, starting with the first historical traffic data subsequence, the mean μ and variance σ of the historical traffic data of the j-th and (j+1)-th historical traffic data subsequences are obtained in turn, where j takes odd values starting from 1), where $x_i$ is the value of the i-th historical traffic data item in a historical traffic data subsequence and n is the number of historical traffic data items in that subsequence. For example, if the historical traffic data of the same day comprises historical traffic data 1, historical traffic data 2, historical traffic data 3, historical traffic data 4, ..., historical traffic data 2k-1, and historical traffic data 2k, then historical traffic data 1 and 2 are combined into historical traffic data combination 1, historical traffic data 3 and 4 are combined into historical traffic data combination 2, and historical traffic data 2k-1 and 2k are combined into historical traffic data combination k, and the mean μ and variance σ of each historical traffic data combination are obtained in turn.
Finally, it is judged whether the means μ and the variances σ of the two adjacent historical traffic data subsequences are respectively equal (that is, whether the two adjacent subsequences have equal means μ and equal variances σ). If so, the two adjacent historical traffic data subsequences are merged into one historical traffic data sequence; otherwise, each of the two adjacent historical traffic data subsequences is taken as a historical traffic data sequence.
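The three parts of step 401 can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the application: records are (minute-of-day, value) pairs, subsequences are paired as (1, 2), (3, 4), and so on, equality of mean and variance is tested within a small tolerance rather than exactly, every subsequence holds at least two values, and the function names `bucket_by_release_time` and `merge_adjacent` are hypothetical.

```python
from math import isclose
from statistics import mean, variance


def bucket_by_release_time(records, slot_minutes=30):
    """Group (minute_of_day, value) records into subsequences by time slot."""
    slots = {}
    for minute, value in records:
        slots.setdefault(minute // slot_minutes, []).append(value)
    # Return the subsequences in chronological order, skipping empty slots.
    return [slots[key] for key in sorted(slots)]


def merge_adjacent(subsequences, tol=1e-9):
    """Merge each pair (1,2), (3,4), ... whose mean and variance both match."""
    sequences = []
    for j in range(0, len(subsequences) - 1, 2):
        a, b = subsequences[j], subsequences[j + 1]
        same_mean = isclose(mean(a), mean(b), abs_tol=tol)
        same_var = isclose(variance(a), variance(b), abs_tol=tol)
        if same_mean and same_var:
            sequences.append(a + b)      # one merged sequence
        else:
            sequences.extend([a, b])     # each stays a sequence of its own
    if len(subsequences) % 2 == 1:
        sequences.append(subsequences[-1])  # unpaired last subsequence
    return sequences


subs = [[30, 32, 34], [34, 32, 30], [60, 62, 64], [10, 10, 10]]
print(merge_adjacent(subs))
```

Exact floating-point equality is rarely met by measured traffic values, which is why a tolerance (or, alternatively, a statistical test of equal mean and variance) is assumed here.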
Step 402: determine the U statistic of each historical traffic data item in the historical traffic data sequence and its rejection-region critical value.
The U statistic of each historical traffic data item in step 402 may be determined as follows:
The U statistic of each historical traffic data item is obtained from its value $x_i$ together with $\mu$ and $\sigma$.
Here, U is the U statistic of each historical traffic data item in the historical traffic data sequence, $x_i$ is the value of the i-th historical traffic data item, and n is the number of historical traffic data items in the historical traffic data sequence. $\mu$ is obtained by $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$, and $\sigma$ is obtained by $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / (n - 1)}$.
The rejection-region critical value of each historical traffic data item in step 402 may be determined as follows:
According to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, the rejection-region critical value $\mu_{\alpha/2}$ is determined, where α is a preset significance level and $\{U > \mu_{\alpha/2}\}$ is a small-probability event. For example, the significance level α is determined first (α is generally set to 0.05); the preset Gaussian distribution table is then searched according to U and α to find the rejection-region critical value $\mu_{\alpha/2}$ for which $P(U > \mu_{\alpha/2}) = \alpha$ holds.
Step 403: judge whether the U statistic is greater than its rejection-region critical value; if so, perform step 404; otherwise, perform step 405.
Step 404: determine that the historical traffic data item is abnormal.
Step 405: determine that the historical traffic data item is normal.
Embodiment four
Compared with Embodiments one, two, and three, the technical scheme provided by Embodiment four of the present invention further includes steps 106 and 107 after step 105. Referring to FIG. 5, which is a flowchart of steps 106 and 107:
Step 106: according to the normal historical traffic data that has the same typical day type and falls within the same preset statistical time range, obtain the mean $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$.
Here, n is the number of normal historical traffic data items that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th such normal historical traffic data item.
It should be noted that, in step 106, the historical traffic data that has the same typical day type and falls within the same preset statistical time range may first be marked as normal or abnormal according to the abnormal data detection result; the normal historical traffic data is then obtained, and its mean μ is calculated.
Step 107: determine the mean μ as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type.
Here, the traffic data statistical value is a characteristic statistic of the historical traffic data that has the same typical day type and falls within the same preset statistical time range.
Preferably, so that there is a basis for filling in traffic data when, during later real-time release of traffic data, a statistical time range of a typical day of a road lacks traffic data, the embodiment of the present invention takes the mean μ as one of the characteristic statistics of the preset statistical time range of the typical day belonging to the typical day type. In addition, any one or more of the following characteristic statistics of that preset statistical time range are calculated from the target data: the sample size k, the variance σ, and the confidence value. Here, the sample size k is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range; the mean is $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$; the variance is $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / (n - 1)}$; and the confidence may be taken as, for example, n/k, where $x_i$ is the i-th normal historical traffic data item that has the same typical day type and falls within the same statistical time range, and n is the number of normal historical traffic data items that have the same typical day type and fall within the same preset statistical time range.
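The characteristic statistics of steps 106 and 107 can be sketched in Python as follows. This is an illustration under assumptions: the function name `time_range_statistics` and the dictionary keys are hypothetical, σ is computed with the divisor n - 1, and the confidence is taken as n/k, which is the only form the text above reproduces.

```python
from statistics import mean, stdev


def time_range_statistics(values, is_abnormal):
    """Summarize one statistical time range of one typical day type.

    values      -- all historical values in the range (k items)
    is_abnormal -- parallel list of flags from the abnormality detection
    """
    normal = [x for x, bad in zip(values, is_abnormal) if not bad]
    k = len(values)   # sample size: all historical items
    n = len(normal)   # items that passed abnormality detection
    return {
        "sample_size": k,
        "mean": mean(normal) if n else None,       # traffic data statistical value
        "sigma": stdev(normal) if n > 1 else None,
        "confidence": n / k if k else None,        # assumed form n/k
    }


stats = time_range_statistics([40, 42, 38, 5], [False, False, False, True])
print(stats)
```

A statistical time range whose `mean` is `None` has no usable data, which is the situation the filling step of a later embodiment is meant to handle.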
Embodiment five
Compared with Embodiment four, the technical scheme provided by Embodiment five further includes steps 108 and 109 after step 107. Referring to FIG. 6, which is a flowchart of steps 108 and 109:
Step 108: for each typical day, judge whether a preset statistical time range of the typical day lacks a traffic data statistical value; if so, perform step 109; if not, end the flow.
Step 109: fill in the traffic data statistical value of the statistical time range according to the traffic data statistical values of the statistical time ranges immediately before and after it.
In the embodiment of the present invention, a statistical time range lacking a traffic data statistical value may be in state 1 or state 2 below. To improve the validity of data filling, step 109 is performed when the statistical time range lacking the traffic data statistical value is in state 1, and is not performed when it is in state 2:
State 1: the statistical time range lacking the traffic data statistical value is neither the first statistical time range nor the last statistical time range of the typical day.
State 2: the statistical time range lacking the traffic data statistical value is the first statistical time range or the last statistical time range of the typical day.
Step 109 may use univariate linear regression to fill in the traffic data statistical value of the statistical time range in which data is missing.
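Steps 108 and 109 can be illustrated by the following Python sketch. It assumes that missing statistical values are represented by `None`, that only the two neighbouring statistical time ranges are used, and that a gap is left unfilled when either neighbour is itself missing; the function name `fill_missing` is hypothetical. With only two points, the fitted regression line passes through both, so the filled value is the midpoint of the neighbours.

```python
def fill_missing(stat_values):
    """Fill interior gaps (None) from the previous and next time ranges.

    The first and last time ranges (state 2) are never filled.
    """
    filled = list(stat_values)
    for i in range(1, len(stat_values) - 1):
        prev_v, next_v = stat_values[i - 1], stat_values[i + 1]
        if stat_values[i] is None and prev_v is not None and next_v is not None:
            # Line through (i-1, prev_v) and (i+1, next_v), evaluated at i.
            slope = (next_v - prev_v) / 2
            filled[i] = prev_v + slope
    return filled


print(fill_missing([None, 30.0, None, 40.0, None]))
```

A regression over more than two neighbouring statistical time ranges would be an equally plausible reading of the text; the two-point form is chosen here only because the step names the previous and the next time range.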
Embodiment six
Compared with Embodiment four, the technical scheme provided by Embodiment six further includes steps 110 to 112 after step 107; compared with Embodiment five, it further includes steps 110 to 112 after step 109. Referring to FIG. 7, which is a flowchart of steps 110 to 112:
Step 110: for each typical day, judge whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range; if so, perform step 111; if not, end the flow.
The threshold range may be preset by the user.
Step 111: determine that the traffic data statistical value of the statistical time range is a mutation value, and smooth the traffic data statistical value of the statistical time range according to the traffic data statistical values of the statistical time ranges immediately before and after it.
In the embodiment of the present invention, a statistical time range whose traffic data statistical value is a mutation value may be in state 1 or state 2 below. Preferably, to improve the validity of data smoothing, step 111 is performed when the statistical time range whose traffic data statistical value is a mutation value is in state 1, and is not performed when it is in state 2:
State 1: the statistical time range whose traffic data statistical value is a mutation value is neither the first statistical time range nor the last statistical time range of the typical day.
State 2: the statistical time range whose traffic data statistical value is a mutation value is the first statistical time range or the last statistical time range of the typical day.
In step 111, the traffic data statistical value of the statistical time range is smoothed by median filtering, according to the traffic data statistical values of the preceding and following statistical time ranges. Median filtering is a nonlinear signal processing technique, based on order statistics theory, that can effectively suppress noise. Its basic principle is to replace the value of a point in a digital sequence with the median of the values in a neighborhood of that point, thereby eliminating isolated noise points. The median is the value in the middle position after all data in the neighborhood are sorted; if the number of values is even, the arithmetic mean of the two middle values is taken.
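Steps 110 and 111 can be sketched in Python as follows. The sketch assumes a neighborhood of three values (the previous time range, the time range itself, and the next one) and a threshold range given as the closed interval [low, high]; the function name `smooth_mutations` and these parameter names are hypothetical and do not come from the application.

```python
from statistics import median


def smooth_mutations(stat_values, low, high):
    """Replace interior mutation values with a 3-point median.

    A value is a mutation value when it lies outside [low, high].
    The first and last time ranges (state 2) are left unchanged.
    """
    smoothed = list(stat_values)
    for i in range(1, len(stat_values) - 1):
        if not (low <= stat_values[i] <= high):
            # Median of previous, current, and next statistical values.
            smoothed[i] = median(stat_values[i - 1 : i + 2])
    return smoothed


print(smooth_mutations([30, 32, 120, 34, 31], low=0, high=80))
```

If the neighborhood were instead taken as only the two neighbouring values, the even-count rule described above would apply and the result would be their arithmetic mean.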
Embodiment seven
Referring to FIG. 8, which is a structural diagram of a data detection device provided by Embodiment seven of the present application, the data detection device may include:
Data acquisition module 801, configured to obtain target data.
The target data comprises the historical traffic data of a target road for each day in a preset measurement period; that is, the target data is the data on which abnormality detection is to be performed.
The preset measurement period may be set by the inspector; for example, it may be set empirically to one year or one month.
Data screening module 802, configured to screen out from the target data, according to a preset typical day type, the historical traffic data whose release date matches the typical day type.
First detection module 803, configured to perform a first abnormality detection on the historical traffic data that has the same typical day type and falls within the same preset statistical time range, to obtain a first abnormality detection result.
The first abnormality detection performed by first detection module 803 is a longitudinal abnormal data detection on the target data. For example, if the measurement period is one month, the typical day type is Monday, and the preset statistical time ranges are 7:00-8:00, 12:00-13:00, and 18:00-19:00, then first detection module 803 performs the first abnormality detection separately on the 7:00-8:00 historical traffic data of the four Mondays of that month, on the 12:00-13:00 historical traffic data of the four Mondays, and on the 18:00-19:00 historical traffic data of the four Mondays.
Second detection module 804, configured to perform a second abnormality detection on the historical traffic data of each day whose release date matches the typical day type, to obtain a second abnormality detection result.
The second abnormality detection performed by second detection module 804 is a horizontal abnormal data detection on the target data. For example, abnormal data detection is performed on the historical traffic data contained in each typical day in the measurement period, yielding a horizontal second abnormality detection result. If the measurement period is one month and the typical day type is Monday, second detection module 804 performs data detection separately on the historical traffic data of each of the four Mondays.
Result determination module 805, configured to determine the first abnormality detection result and the second abnormality detection result as the abnormal data detection result of the target data.
As can be seen from the above scheme, in the data detection device provided by Embodiment seven of the present application, after the target data of a target road is obtained (the target data being the historical traffic data of each day in the preset measurement period), a first, longitudinal abnormality detection is performed on the historical traffic data in the target data that has the same typical day type and falls within the same preset statistical time range, yielding a first abnormality detection result. A second abnormality detection is also performed, day by day, on the historical traffic data in the target data whose release date matches the typical day type, yielding a second abnormality detection result. The first abnormality detection result and the second abnormality detection result are then determined as the abnormal data detection result for the target road. Before the traffic information of each statistical time range of a typical day is analyzed for each road from historical traffic data, Embodiment seven of the present application provides a data detection method and device that detect the abnormal data in the historical traffic data of the road, so as to ensure that the historical traffic data used for typicality analysis reflects road traffic conditions fairly truthfully, thereby improving the accuracy of the analysis result.
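The way modules 802, 803, and 804 partition the target data can be sketched in Python as follows. The sketch assumes that records are (timestamp, value) pairs, that the typical day type is a weekday (Monday is 0 in Python's `datetime.weekday()`), and that statistical time ranges are whole-hour intervals; the function names are hypothetical and only the grouping, not the detection itself, is shown.

```python
from collections import defaultdict
from datetime import datetime


def screen_by_day_type(records, weekday):
    """Module 802: keep records released on the given weekday."""
    return [(ts, v) for ts, v in records if ts.weekday() == weekday]


def longitudinal_groups(records, time_ranges):
    """Module 803 input: same time range, pooled across all typical days."""
    groups = defaultdict(list)
    for ts, v in records:
        for start_hour, end_hour in time_ranges:
            if start_hour <= ts.hour < end_hour:
                groups[(start_hour, end_hour)].append(v)
    return dict(groups)


def horizontal_groups(records):
    """Module 804 input: all records of one typical day, day by day."""
    groups = defaultdict(list)
    for ts, v in records:
        groups[ts.date()].append(v)
    return dict(groups)


records = [
    (datetime(2013, 11, 4, 7, 5), 40),    # Monday
    (datetime(2013, 11, 5, 7, 5), 41),    # Tuesday, screened out
    (datetime(2013, 11, 11, 7, 30), 38),  # Monday
    (datetime(2013, 11, 11, 12, 10), 55), # Monday
]
mondays = screen_by_day_type(records, weekday=0)
print(longitudinal_groups(mondays, [(7, 8), (12, 13), (18, 19)]))
print(horizontal_groups(mondays))
```

Each longitudinal group and each horizontal group would then be passed to the respective abnormality detection, and module 805 would combine the two sets of results.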
Embodiment eight
Compared with the data detection device provided by Embodiment seven, the data detection device provided by Embodiment eight of the present application refines the specific structure of first detection module 803. Referring to FIG. 9, which is a structural diagram of first detection module 803 of the present application, first detection module 803 may include:
First statistics submodule 831, configured to determine the U statistic of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range, and to determine its rejection-region critical value.
For example, the U statistic and the rejection-region critical value are determined for each historical traffic data item of the target road that falls within the statistical time range 7:00-8:00 on all Mondays in the measurement period.
First statistics submodule 831 may be implemented with the structure shown in FIG. 10, in which first statistics submodule 831 may include:
U statistic determining unit 1001, configured to determine the U statistic of the historical traffic data item from its value $x_i$ together with $\mu$ and $\sigma$.
Here, U is the U statistic of the historical traffic data item, $x_i$ is the value of the i-th historical traffic data item, and n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range, where $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$ and $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / (n - 1)}$.
Critical value determining unit 1002, configured to determine the rejection-region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, where α is a preset significance level.
For example, critical value determining unit 1002 first determines the significance level α (α is generally set to 0.05), and then searches the preset Gaussian distribution table according to U and α to find the rejection-region critical value $\mu_{\alpha/2}$ for which $P(U > \mu_{\alpha/2}) = \alpha$ holds.
First result generation submodule 832, configured to judge whether the U statistic is greater than its rejection-region critical value; if so, to determine that the historical traffic data item is abnormal; and if not, to determine that the historical traffic data item is normal.
Embodiment nine
Compared with Embodiments seven and eight, the data detection device provided by Embodiment nine refines the structure of second detection module 804. Referring to FIG. 11, which is a structural diagram of second detection module 804 of the present application, second detection module 804 may include:
Retrieval submodule 841, configured to divide, among the historical traffic data of each day whose release date matches the typical day type, the historical traffic data of the same day according to release time, to obtain historical traffic data sequences.
Retrieval submodule 841 may be implemented with the structure shown in FIG. 12, in which retrieval submodule 841 may include:
Subsequence division unit 1201, configured to assign, among the historical traffic data of the same day, the items whose release times fall within the same release time section to the same historical traffic data subsequence.
For example, each release time section may be set to half an hour or one hour, so that one day is divided into 48 or 24 release time sections. The historical traffic data released at 7:00-7:30 forms one historical traffic data subsequence, the historical traffic data released at 7:30-8:00 forms another historical traffic data subsequence, and so on.
Subsequence average acquiring unit 1202, configured to obtain in turn, starting from the first historical traffic data subsequence, the mean μ and variance σ of the historical traffic data of each pair of adjacent historical traffic data subsequences (that is, starting with the first historical traffic data subsequence, the mean μ and variance σ of the historical traffic data of the j-th and (j+1)-th historical traffic data subsequences are obtained in turn, where j takes odd values starting from 1), where $x_i$ is the value of the i-th historical traffic data item in a historical traffic data subsequence and n is the number of historical traffic data items in that subsequence.
Sequence determination unit 1203, configured to judge whether the means μ and the variances σ of the two adjacent historical traffic data subsequences are respectively equal (that is, whether the two adjacent subsequences have equal means μ and equal variances σ); if so, to merge the two adjacent historical traffic data subsequences into one historical traffic data sequence; otherwise, to take each of the two adjacent historical traffic data subsequences as a historical traffic data sequence.
Second statistics submodule 842, configured to determine the U statistic of each historical traffic data item in the historical traffic data sequence and its rejection-region critical value.
Second statistics submodule 842 may be implemented with the structure shown in FIG. 10, in which second statistics submodule 842 may include:
U statistic determining unit 1001, configured to determine the U statistic of the historical traffic data item from its value $x_i$ together with $\mu$ and $\sigma$.
Here, U is the U statistic of the historical traffic data item, $x_i$ is the value of the i-th historical traffic data item, and n is the number of historical traffic data items in the historical traffic data sequence, where $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$ and $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / (n - 1)}$.
Critical value determining unit 1002, configured to determine the rejection-region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, where α is a preset significance level.
For example, critical value determining unit 1002 first determines the significance level α (α is generally set to 0.05), and then searches the preset Gaussian distribution table according to U and α to find the rejection-region critical value $\mu_{\alpha/2}$ for which $P(U > \mu_{\alpha/2}) = \alpha$ holds.
Second result generation submodule 843, configured to judge whether the U statistic is greater than its rejection-region critical value; if so, to determine that the historical traffic data item is abnormal; and if not, to determine that the historical traffic data item is normal.
Embodiment ten
Compared with the data detection devices provided by Embodiments seven to nine, the data detection device provided by Embodiment ten of the present invention further includes data mean acquisition module 806 and statistical value determination module 807. Referring to FIG. 13, which is a structural diagram of Embodiment ten of a data detection device provided by the present application, the device may further include:
Data mean acquisition module 806, configured to obtain the mean $\mu = \left(\sum_{i=1}^{n} x_i\right)/n$ according to the normal historical traffic data that has the same typical day type and falls within the same preset statistical time range.
Here, n is the number of normal historical traffic data items that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th such normal historical traffic data item.
Statistical value determination module 807, configured to determine the mean μ as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type.
Embodiment eleven
Compared with Embodiment ten, the data detection device provided by Embodiment eleven further includes missing-data judgment module 808 and statistical value filling module 809. Referring to FIG. 14, which is a structural diagram of a data detection device provided by Embodiment eleven, the device may further include:
Missing-data judgment module 808, configured to judge, for each typical day, whether a preset statistical time range of the typical day lacks a traffic data statistical value and, if so, to trigger statistical value filling module 809;
Statistical value filling module 809, configured to fill in the traffic data statistical value of the statistical time range according to the traffic data statistical values of the statistical time ranges immediately before and after it.
In the embodiment of the present invention, a statistical time range lacking a traffic data statistical value may be in state 1 or state 2 below. To improve the validity of data filling, statistical value filling module 809 is triggered when the statistical time range lacking the traffic data statistical value is in state 1, and is not triggered when it is in state 2:
State 1: the statistical time range lacking the traffic data statistical value is neither the first statistical time range nor the last statistical time range of the typical day.
State 2: the statistical time range lacking the traffic data statistical value is the first statistical time range or the last statistical time range of the typical day.
Statistical value filling module 809 may use univariate linear regression to fill in the traffic data statistical value of the statistical time range in which data is missing.
Embodiment twelve
Compared with Embodiments ten and eleven, the data detection device provided by Embodiment twelve further includes range judgment module 810 and statistical value smoothing module 811. Referring to FIG. 15, which is a structural diagram of a data detection device provided by Embodiment twelve of the present application, the device may further include, on the basis of the device shown in FIG. 13 or FIG. 14, range judgment module 810 and statistical value smoothing module 811:
Range judgment module 810, configured to judge, for each typical day, whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range and, if so, to trigger statistical value smoothing module 811;
Statistical value smoothing module 811, configured to determine that the traffic data statistical value of the statistical time range is a mutation value, and to smooth the traffic data statistical value of the statistical time range according to the traffic data statistical values of the statistical time ranges immediately before and after it.
In the embodiment of the present invention, a statistical time range whose traffic data statistical value is a mutation value may be in state 1 or state 2 below. Preferably, to improve the validity of data smoothing, statistical value smoothing module 811 is triggered when the statistical time range whose traffic data statistical value is a mutation value is in state 1, and is not triggered when it is in state 2:
State 1: the statistical time range whose traffic data statistical value is a mutation value is neither the first statistical time range nor the last statistical time range of the typical day.
State 2: the statistical time range whose traffic data statistical value is a mutation value is the first statistical time range or the last statistical time range of the typical day.
Statistical value smoothing module 811 smooths the traffic data statistical value of the statistical time range by median filtering, according to the traffic data statistical values of the preceding and following statistical time ranges. Median filtering is a nonlinear signal processing technique, based on order statistics theory, that can effectively suppress noise. Its basic principle is to replace the value of a point in a digital sequence with the median of the values in a neighborhood of that point, thereby eliminating isolated noise points. The median is the value in the middle position after all data in the neighborhood are sorted; if the number of values is even, the arithmetic mean of the two middle values is taken.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The data detection method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this description should not be construed as limiting the present application.

Claims (16)

1. A data detection method, characterized by comprising:
obtaining target data, the target data comprising historical traffic data of a target road for each day in a preset statistical period;
screening out from the target data, according to preset typical day types, the historical traffic data whose release date conforms to a typical day type;
performing first anomaly detection on the historical traffic data that have the same typical day type and fall within the same preset statistical time range, to obtain a first anomaly detection result;
performing second anomaly detection, day by day, on the historical traffic data whose release date conforms to a typical day type, to obtain a second anomaly detection result;
determining the first anomaly detection result and the second anomaly detection result as the abnormal data detection result of the target data.
2. The method according to claim 1, characterized in that performing the first anomaly detection on the historical traffic data that have the same typical day type and fall within the same preset statistical time range, to obtain the first anomaly detection result, comprises:
determining the U statistic, and its rejection region critical value, of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range;
judging whether the U statistic is greater than its rejection region critical value; if so, determining that the historical traffic data item is abnormal; otherwise, determining that the historical traffic data item is normal.
3. The method according to claim 1, characterized in that performing the second anomaly detection, day by day, on the historical traffic data whose release date conforms to a typical day type, to obtain the second anomaly detection result, comprises:
performing the following steps on the historical traffic data of each day whose release date conforms to a typical day type:
dividing the historical traffic data of the same day according to release time, to obtain historical traffic data sequences;
determining the U statistic, and its rejection region critical value, of each historical traffic data item in a historical traffic data sequence;
judging whether the U statistic is greater than its rejection region critical value; if so, determining that the historical traffic data item is abnormal; otherwise, determining that the historical traffic data item is normal.
4. The method according to claim 3, characterized in that dividing the historical traffic data of the same day according to release time, to obtain historical traffic data sequences, comprises:
dividing, among the historical traffic data of the same day, the historical traffic data whose release time falls within the same release time period into the same historical traffic data subsequence;
starting from the first historical traffic data subsequence, successively obtaining the historical traffic data mean μ and variance σ of each two adjacent historical traffic data subsequences, wherein $x_i$ is the value of the i-th historical traffic data item in the historical traffic data subsequence, and n is the number of historical traffic data values in the historical traffic data subsequence;
judging whether the means μ and the variances σ of the two adjacent historical traffic data subsequences are respectively equal; if so, merging the two historical traffic data subsequences into one historical traffic data sequence; otherwise, taking each of the two historical traffic data subsequences as a historical traffic data sequence.
5. The method according to claim 2 or 3, characterized in that determining the U statistic of the historical traffic data comprises:
determining the U statistic of the historical traffic data according to a formula in which:
U is the U statistic of the historical traffic data, $x_i$ is the value of the i-th historical traffic data item, and n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range, or n is the number of historical traffic data items in the data sequence, where $\sigma = \sum_{1}^{n} (x_i - \mu) / (n - 1)$;
and wherein determining the rejection region critical value of the historical traffic data comprises:
determining the rejection region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, wherein α is a preset significance level.
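The test described in claims 2, 3 and 5 can be illustrated with the following Python sketch. The formula for U is not reproduced in this text, so the sketch assumes U = |x_i − μ| / σ, takes σ to be the usual sample standard deviation with divisor n − 1, and assumes that the preset distribution table is the standard normal table, so that the critical value is the upper α/2 quantile. The function name `detect_anomalies`, the default α = 0.05 and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def detect_anomalies(values, alpha=0.05):
    """Return one boolean per value: True if the value is flagged abnormal."""
    if len(values) < 2:
        return [False] * len(values)
    mu = mean(values)
    sigma = stdev(values)  # assumed: sample standard deviation (n - 1)
    if sigma == 0:
        return [False] * len(values)
    # Assumed: critical value read from the standard normal distribution.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Assumed form of the U statistic: absolute standardized deviation.
    return [abs(x - mu) / sigma > critical for x in values]


# Hypothetical speeds for one statistical time range across ten days
# of the same typical day type; the last value is an outlier.
speeds = [40, 41, 39, 42, 40, 41, 39, 40, 41, 5]
flags = detect_anomalies(speeds)
```

With these values only the last item exceeds the critical value; the same function could be applied to each historical traffic data sequence of a single day for the second detection.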
6. The method according to claim 1, characterized in that, after the first anomaly detection result and the second anomaly detection result are determined as the abnormal data detection result of the target data, the method further comprises:
obtaining, according to a formula, the mean μ of the normal historical traffic data that have the same typical day type and fall within the same preset statistical time range;
wherein n is the number of values of the normal historical traffic data that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th normal historical traffic data item;
determining the mean μ as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type.
7. The method according to claim 6, characterized in that, after the mean μ is determined as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type, the method further comprises:
for each typical day, judging whether a preset statistical time range of the typical day lacks a traffic data statistical value;
when a preset statistical time range of the typical day lacks a traffic data statistical value, filling in the traffic data statistical value of that statistical time range according to the traffic data statistical values of the previous statistical time range and the next statistical time range.
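The filling step of claim 7 can be sketched in Python as follows. The claim does not state how the two neighboring values are combined, so the sketch assumes their arithmetic mean; missing values are represented by `None`, and a gap is left unfilled when either neighbor is itself missing or absent. The function name `fill_missing` and the sample values are hypothetical.

```python
def fill_missing(stats):
    """Fill None entries from the previous and next statistical values.

    Assumption: the fill value is the arithmetic mean of both neighbors.
    """
    filled = list(stats)
    for i, value in enumerate(stats):
        if value is None and 0 < i < len(stats) - 1:
            prev_value, next_value = stats[i - 1], stats[i + 1]
            if prev_value is not None and next_value is not None:
                filled[i] = (prev_value + next_value) / 2
    return filled


# Hypothetical statistical values for four consecutive time ranges.
day_stats = [40.0, None, 44.0, None]
completed = fill_missing(day_stats)
```

The second time range is filled from its two neighbors, while the last one stays empty because it has no following time range.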
8. The method according to claim 6, characterized in that, after the mean μ is determined as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type, the method further comprises:
for each typical day, judging whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range;
when the traffic data statistical value of a preset statistical time range of the typical day exceeds the preset threshold range, determining that the traffic data statistical value of that statistical time range is a sudden-change value, and smoothing the traffic data statistical value of that statistical time range according to the traffic data statistical values of its previous statistical time range and next statistical time range.
9. A data detection device, characterized by comprising:
a data acquisition module, configured to obtain target data, the target data comprising historical traffic data of a target road for each day in a preset statistical period;
a data screening module, configured to screen out from the target data, according to preset typical day types, the historical traffic data whose release date conforms to a typical day type;
a first detection module, configured to perform first anomaly detection on the historical traffic data that have the same typical day type and fall within the same preset statistical time range, to obtain a first anomaly detection result;
a second detection module, configured to perform second anomaly detection, day by day, on the historical traffic data whose release date conforms to a typical day type, to obtain a second anomaly detection result;
a result determination module, configured to determine the first anomaly detection result and the second anomaly detection result as the abnormal data detection result of the target data.
10. The device according to claim 9, characterized in that the first detection module comprises:
a first statistics submodule, configured to determine the U statistic, and its rejection region critical value, of each historical traffic data item that has the same typical day type and falls within the same preset statistical time range;
a first result generation submodule, configured to judge whether the U statistic is greater than its rejection region critical value; if so, to determine that the historical traffic data item is abnormal; otherwise, to determine that the historical traffic data item is normal.
11. The device according to claim 9, characterized in that the second detection module comprises:
a retrieval submodule, configured to divide, for the historical traffic data of each day whose release date conforms to a typical day type, the historical traffic data of the same day according to release time, to obtain historical traffic data sequences;
a second statistics submodule, configured to determine the U statistic, and its rejection region critical value, of each historical traffic data item in a historical traffic data sequence;
a second result generation submodule, configured to judge whether the U statistic is greater than its rejection region critical value; if so, to determine that the historical traffic data item is abnormal; otherwise, to determine that the historical traffic data item is normal.
12. The device according to claim 11, characterized in that the retrieval submodule comprises:
a subsequence division unit, configured to divide, among the historical traffic data of the same day, the historical traffic data whose release time falls within the same release time period into the same historical traffic data subsequence;
a subsequence mean acquisition unit, configured to obtain successively, starting from the first historical traffic data subsequence, the historical traffic data mean μ and variance σ of each two adjacent historical traffic data subsequences, wherein $x_i$ is the value of the i-th historical traffic data item in the historical traffic data subsequence, and n is the number of historical traffic data values in the historical traffic data subsequence;
a sequence determination unit, configured to judge whether the means μ and the variances σ of the two adjacent historical traffic data subsequences are respectively equal; if so, to merge the two historical traffic data subsequences into one historical traffic data sequence; otherwise, to take each of the two historical traffic data subsequences as a historical traffic data sequence.
13. The device according to claim 10 or 11, characterized in that the first statistics submodule or the second statistics submodule comprises:
a U statistic determination unit, configured to determine the U statistic of the historical traffic data according to a formula in which:
U is the U statistic of the historical traffic data, $x_i$ is the value of the i-th historical traffic data item, and n is the number of historical traffic data items that have the same typical day type and fall within the same preset statistical time range, or n is the number of historical traffic data items in the data sequence, where $\sigma = \sum_{1}^{n} (x_i - \mu) / (n - 1)$;
a critical value determination unit, configured to determine the rejection region critical value $\mu_{\alpha/2}$ according to $P(U > \mu_{\alpha/2}) = \alpha$ and a preset distribution table, wherein α is a preset significance level.
14. The device according to claim 9, characterized by further comprising:
a data mean acquisition module, configured to obtain, according to a formula, the mean μ of the normal historical traffic data that have the same typical day type and fall within the same preset statistical time range;
wherein n is the number of values of the normal historical traffic data that have the same typical day type and fall within the same preset statistical time range, and $x_i$ is the value of the i-th normal historical traffic data item;
a statistical value determination module, configured to determine the mean μ as the traffic data statistical value of the preset statistical time range of the typical day belonging to the typical day type.
15. The device according to claim 14, characterized by further comprising:
a missing-value judgment module, configured to judge, for each typical day, whether a preset statistical time range of the typical day lacks a traffic data statistical value, and if so, to trigger a statistical value filling module;
the statistical value filling module, configured to fill in the traffic data statistical value of that statistical time range according to the traffic data statistical values of the previous statistical time range and the next statistical time range.
16. The device according to claim 14, characterized by further comprising:
a range judgment module, configured to judge, for each typical day, whether the traffic data statistical value of a preset statistical time range of the typical day exceeds a preset threshold range, and if so, to trigger a statistical value smoothing module;
the statistical value smoothing module, configured to determine that the traffic data statistical value of that statistical time range is a sudden-change value, and to smooth the traffic data statistical value of that statistical time range according to the traffic data statistical values of its previous statistical time range and next statistical time range.
CN201310629648.0A 2013-11-29 2013-11-29 Data detection method and device Active CN104679970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310629648.0A CN104679970B (en) 2013-11-29 2013-11-29 Data detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310629648.0A CN104679970B (en) 2013-11-29 2013-11-29 Data detection method and device

Publications (2)

Publication Number Publication Date
CN104679970A true CN104679970A (en) 2015-06-03
CN104679970B CN104679970B (en) 2018-11-09

Family

ID=53315008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310629648.0A Active CN104679970B (en) 2013-11-29 2013-11-29 Data detection method and device

Country Status (1)

Country Link
CN (1) CN104679970B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101694747A (en) * 2009-08-25 2010-04-14 北京世纪高通科技有限公司 Method and device for indentifying abnormal vehicle speed
CN101794345A (en) * 2009-12-30 2010-08-04 北京世纪高通科技有限公司 Data processing method and device
CN101814112A (en) * 2010-01-11 2010-08-25 北京世纪高通科技有限公司 Method and device for processing data
CN101950477A (en) * 2010-08-23 2011-01-19 北京世纪高通科技有限公司 Method and device for processing traffic information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
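Chen Shuyan et al.: "Identification of outlying traffic data for intelligent transportation systems", Journal of Southeast University (Natural Science Edition) *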

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718715A (en) * 2015-12-23 2016-06-29 华为技术有限公司 Anomaly detection method and device
CN106295683A (en) * 2016-08-01 2017-01-04 上海理工大学 A kind of outlier detection method of time series data based on sharpness
CN106452931A (en) * 2016-12-27 2017-02-22 中国建设银行股份有限公司 Monitoring index, domain value discovery method, domain value adjusting method and automatic monitoring system
CN106452931B (en) * 2016-12-27 2019-09-17 中国建设银行股份有限公司 Monitor control index and thresholding discovery method, thresholding method of adjustment and automatic monitored control system
CN108880841A (en) * 2017-05-11 2018-11-23 上海宏时数据系统有限公司 A kind of threshold values setting, abnormality detection system and the method for service monitoring system
CN108520430A (en) * 2018-03-23 2018-09-11 西安艾润物联网技术服务有限责任公司 Car park payment exception analysis method, equipment and computer readable storage medium
CN108961761A (en) * 2018-08-14 2018-12-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109270898A (en) * 2018-08-30 2019-01-25 大连理工大学 A kind of building energy consumption data collector with quality of data diagnosis and repair function
CN115576502A (en) * 2022-12-07 2023-01-06 苏州浪潮智能科技有限公司 Data storage method and device, electronic equipment and storage medium
WO2024119746A1 (en) * 2022-12-07 2024-06-13 苏州元脑智能科技有限公司 Data storage method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN104679970B (en) 2018-11-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200518

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 102200, No. 8, No., Changsheng Road, Changping District science and Technology Park, Beijing, China. 1-5

Patentee before: AUTONAVI SOFTWARE Co.,Ltd.