CN100535955C - Method for recognizing outlier traffic data - Google Patents

Publication number
CN100535955C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2008100247009A
Other languages
Chinese (zh)
Other versions
CN101246645A (en)
Inventor
陈淑燕
王炜
瞿高峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CNB2008100247009A priority Critical patent/CN100535955C/en
Publication of CN101246645A publication Critical patent/CN101246645A/en
Application granted granted Critical
Publication of CN100535955C publication Critical patent/CN100535955C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention provides a method for identifying outlier traffic data. The method first collects traffic data and computes the average local outlier factor of the data. It then judges outliers by one of the following two criteria: the m data points with the largest average local outlier factor are outliers, or the data points whose average local outlier factor exceeds a preset threshold are outliers. Finally, the identified outliers are deleted or corrected by a filtering method, or the hidden information contained in the outliers is analyzed. The method effectively detects outliers both on the boundary and in the interior of the data, and it performs better than statistics-based outlier detection.

Description

A method for identifying outlier traffic data
Technical field
The present invention proposes a method for identifying outlier traffic data. It relates to quality control of the traffic data collected by intelligent transportation systems and belongs to the field of intelligent information processing within intelligent transportation systems.
Background art
Traffic data play an important role in intelligent transportation systems (ITS), and one of the core ITS technologies is the real-time estimation and prediction of traffic parameters. Because of sampling distortion, measurement error, sudden traffic incidents, and other possible factors, collected traffic data sets usually contain samples that do not follow the general behavior of the data model. These anomalous points are outliers. When the collected traffic data are used for modeling, these anomalous points are not representative and prevent the system from being modeled and described effectively. To improve the accuracy and reliability of dynamic information and to ensure that traffic models perform well, abnormal data must first be identified and handled accordingly.
At present, outliers in traffic engineering are identified mainly with statistical methods. These methods are computationally simple, but they require the distribution of the data to be known in advance, which is often difficult, and real data often do not fit any idealized mathematical distribution. In addition, most statistics-based outlier detection algorithms are suited only to univariate numerical data and have difficulty with high-dimensional, periodic, or categorical data, which limits their application.
To overcome the shortcomings of the above methods, the present invention proposes identifying abnormal traffic data with a density-based outlier mining algorithm.
Summary of the invention
Technical problem: Abnormal traffic data blur the key features of the model being built, so that the model cannot reflect the essence of the real system. The invention provides a density-based method for identifying abnormal traffic data. The method effectively detects outliers both on the boundary and in the interior of the data, and it performs better than statistics-based outlier detection.
Technical solution: The method of the invention first computes the average local outlier factor of the data and then judges outliers by one of the following two criteria: the m data points with the highest average local outlier factor are outliers, or the data points whose average local outlier factor exceeds a given threshold are outliers.
The average local outlier factor is computed as follows. For a given natural number k, compute the k-local outlier factor of each data point. Change k by a fixed step and repeat the computation. Then average the results to obtain the average local outlier factor of each data point:
$$\mathrm{lof}(p) = \frac{\sum_{k} \mathrm{lof}_k(p)}{\frac{k_2 - k_1}{s} + 1} \qquad (1)$$
where $k_1$ and $k_2$ are the lower and upper bounds of k, respectively, $k_1$ is a natural number not less than 10, $s$ is the step size, and $\mathrm{lof}_k(p)$ is the k-local outlier factor of an arbitrary data point p.
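As a quick check of the denominator in equation (1), the values used in Embodiment 1 (k from 20 to 150 in steps of 10) can be substituted. This worked example assumes that $k_2 - k_1$ is an exact multiple of $s$, so that the denominator equals the number of k values actually used:

$$\frac{k_2 - k_1}{s} + 1 = \frac{150 - 20}{10} + 1 = 14, \qquad \mathrm{lof}(p) = \frac{1}{14} \sum_{k \in \{20, 30, \dots, 150\}} \mathrm{lof}_k(p)$$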
The k-local outlier factor of a data point p is the ratio of the mean k-local reachability density of all data points in the k-neighborhood of p to the k-local reachability density of p, that is:
$$\mathrm{lof}_k(p) = \frac{\sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|} \qquad (2)$$
where k is a natural number, $N_k(p)$ is the k-neighborhood of p, and $|N_k(p)|$ is the number of elements in that neighborhood. The k-neighborhood of p consists of all data points whose distance from p is not greater than the k-distance of p, and the k-distance of p is the distance between p and its k-th nearest data point. $\mathrm{lrd}_k(p)$ is the k-local reachability density of p, o is any data point in the k-neighborhood of p, and $\mathrm{lrd}_k(o)$ is the k-local reachability density of o.
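The definitions of the k-distance and the k-neighborhood can be illustrated with a minimal Python sketch. The function names `k_distance` and `k_neighborhood`, the use of Euclidean distance, and the toy points are assumptions made for illustration; the patent does not prescribe a distance measure.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point."""
    others = sorted(dist(points[i], q) for j, q in enumerate(points) if j != i)
    return others[k - 1]

def k_neighborhood(points, i, k):
    """Indices of all other points no farther from points[i] than its k-distance."""
    kd = k_distance(points, i, k)
    return [j for j, q in enumerate(points) if j != i and dist(points[i], q) <= kd]

# Toy one-dimensional data (hypothetical values).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
print(k_distance(points, 1, 1))      # distance to the nearest other point
print(k_neighborhood(points, 1, 1))  # two points tie at that distance
```

Because the neighborhood is defined by a distance bound rather than a count, ties can make it contain more than k points, which is why the formulas divide by $|N_k(p)|$ rather than by k.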
The k-local reachability density of a data point p is the reciprocal of the average reachability distance from p to the points in its k-distance neighborhood:
$$\mathrm{lrd}_k(p) = 1 \Big/ \frac{\sum_{o \in N_k(p)} \mathrm{reach\_disp}_k(p, o)}{|N_k(p)|} \qquad (3)$$
where $\mathrm{reach\_disp}_k(p, o)$ is the reachability distance from p to a data point o in its k-neighborhood. The reachability distance of p with respect to o is the larger of the k-distance of o and the distance between p and o:
$$\mathrm{reach\_disp}_k(p, o) = \max\{\mathrm{k\_distance}(o),\ d(p, o)\} \qquad (4)$$
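Equations (2) to (4) can be combined into a short pure-Python sketch of the k-local outlier factor. The function name `lof_k`, the Euclidean distance, and the toy points are assumptions for illustration, not part of the patent. The sketch also assumes that no point has k or more exact duplicates, since the mean reachability distance would then be zero.

```python
from math import dist  # Euclidean distance, Python 3.8+

def lof_k(points, k):
    """Return the k-local outlier factor of every point in `points`."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # k-distance: distance to the k-th nearest other point
    kdist = [sorted(d[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    # k-neighborhood: all other points within the k-distance (ties included)
    nbrs = [[j for j in range(n) if j != i and d[i][j] <= kdist[i]] for i in range(n)]
    # equations (3) and (4): inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        mean_reach = sum(max(kdist[o], d[i][o]) for o in nbrs[i]) / len(nbrs[i])
        lrd.append(1.0 / mean_reach)
    # equation (2): mean ratio of the neighbors' density to the point's own density
    return [sum(lrd[o] / lrd[i] for o in nbrs[i]) / len(nbrs[i]) for i in range(n)]

# Four points on a unit square plus one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_k(points, k=2)
print([round(s, 3) for s in scores])
```

In this toy set the four corner points have a factor of 1, meaning they are as dense as their neighbors, while the isolated point scores well above 1.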
Beneficial effects: In this method, the degree to which a point is an outlier depends on the points around it. This embodies the notion of "local", which distinguishes the method from earlier definitions of outliers and is also its advantage. In addition, judging outliers by the mean of the local outlier factor makes the detection result more stable, so that it does not change sharply as the parameter k varies. A density-based outlier mining algorithm can find local outliers that other methods miss, which is its distinctive feature and gives it good application value.
Description of drawings
Fig. 1 is a flow chart of the steps of the invention, in which $k_{Min}$ is the minimum value of k, $k_{Max}$ is the maximum value of k, and $k_{Step}$ is the step by which k changes.
Fig. 2 shows the relationship between traffic flow arrival rate and density, together with the outlier data.
Fig. 3 shows the road surface evenness test data and the outlier data.
Embodiment
The embodiments of the invention are described in detail below with reference to the accompanying drawings. The steps are as follows:
1. Use the data acquisition equipment of the intelligent transportation system, such as vehicle detection loops, video detectors, moving vehicles, radar, and ultrasonic detectors, to obtain traffic data such as vehicle speed, traffic volume, occupancy, and travel time. Let the collected data set be D.
2. Compute the k-local reachability density of each data point in D.
Given a natural number k, compute the k-distance of each data point p, k_distance(p), which is the distance d(p, o) between p and its k-th nearest neighbor o ∈ D.
The k-neighborhood of p is defined as
$$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le \mathrm{k\_distance}(p) \,\} \qquad (1)$$
The reachability distance of p with respect to o is
$$\mathrm{reach\_disp}_k(p, o) = \max\{\mathrm{k\_distance}(o),\ d(p, o)\} \qquad (2)$$
The k-local reachability density of p is the reciprocal of the average reachability distance from p to the points in its k-distance neighborhood:
$$\mathrm{lrd}_k(p) = 1 \Big/ \frac{\sum_{o \in N_k(p)} \mathrm{reach\_disp}_k(p, o)}{|N_k(p)|} \qquad (3)$$
3. Then compute the k-local outlier factor of the data.
The k-local outlier factor of p is defined as
$$\mathrm{lof}_k(p) = \frac{\sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|} \qquad (4)$$
4. Change k by a fixed step and repeat steps 2 and 3 to compute the k-local outlier factor of each data point for every k. The local outlier factor expresses the degree to which a data point is an outlier: the larger it is, the more likely the point is an outlier.
5. Compute the average local outlier factor of each data point to eliminate the influence of the parameter k on the detection result.
6. Judge outliers based on the average local outlier factor. Either of the following two criteria can be used: the m data points with the highest average local outlier factor are outliers, or all data points whose average local outlier factor exceeds a given threshold are outliers.
7. Delete the identified outliers or correct them with a filtering technique, or analyze these outliers to obtain the hidden information they contain.
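Steps 5 and 6 can be sketched as follows, assuming the per-k factors from steps 2 to 4 are already available. The function names and the dictionary layout `scores_by_k` (k mapped to one factor per data point) are hypothetical, and the numbers are made-up illustrative values rather than results from the patent.

```python
def average_lof(scores_by_k):
    """Step 5: average each point's factor over all k values."""
    runs = list(scores_by_k.values())
    return [sum(column) / len(runs) for column in zip(*runs)]

def top_m_outliers(avg, m):
    """Step 6, first criterion: indices of the m highest average factors."""
    return sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)[:m]

def threshold_outliers(avg, threshold):
    """Step 6, second criterion: indices whose average factor exceeds the threshold."""
    return [i for i, value in enumerate(avg) if value > threshold]

# Illustrative factors for four data points at k = 10, 20, 30 (made-up values).
scores_by_k = {
    10: [1.0, 1.1, 3.0, 0.9],
    20: [1.0, 1.3, 2.0, 1.1],
    30: [1.0, 1.2, 2.5, 1.0],
}
avg = average_lof(scores_by_k)
print(top_m_outliers(avg, m=1))
print(threshold_outliers(avg, threshold=1.8))
```

The first criterion always returns exactly m points, while the second returns however many points exceed the threshold, so the choice depends on whether the expected number of outliers or an acceptable score is easier to specify.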
Embodiment 1: Traffic flow modeling
Highway traffic flow is usually described by average speed, arrival rate, and density. The relationship between arrival rate and density can be plotted and is called the fundamental diagram of traffic flow. Faults in detection or transmission equipment and sudden traffic incidents can all cause abnormal changes in traffic flow data. Whether they arise from sampling errors or from abnormal traffic events, outliers blur the features of the model so that it cannot truly reflect the inherent behavior of the system. Therefore, before a model is built, the outliers need to be found and removed to reduce their influence and to improve the accuracy and reliability of the model.
A total of 709 traffic flow samples were collected at a site on the Nanjing airport expressway with a sampling period of 1 minute, with the aim of building a model of the relationship between arrival rate and density. The density-based detection method LOF is used to look for unusual samples: set k = 20 and compute the local outlier factor of all samples, then increase k in steps of 10 and recompute the local outlier factor of all samples until k = 150. Next, compute the average local outlier factor of all samples and take the 12 samples with the highest average local outlier factor as the outliers. Fig. 2 shows the fundamental diagram of arrival rate versus density for this traffic flow, in which the circled points are the outliers. As can be seen, outliers on the boundary and in the interior are both detected effectively.
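The procedure of this embodiment (k from 20 to 150 in steps of 10, then the 12 highest averages) can be sketched end to end in pure Python. The data below are synthetic stand-ins, not the 709 expressway samples, and the function names, the Euclidean distance, and the two planted anomalies are assumptions for illustration only.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def lof_for_k(d, k):
    """k-local outlier factor of every point, given the distance matrix d."""
    n = len(d)
    kdist = [sorted(d[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    nbrs = [[j for j in range(n) if j != i and d[i][j] <= kdist[i]] for i in range(n)]
    # local reachability density = neighborhood size / sum of reachability distances
    lrd = [len(nbrs[i]) / sum(max(kdist[o], d[i][o]) for o in nbrs[i]) for i in range(n)]
    return [sum(lrd[o] for o in nbrs[i]) / (lrd[i] * len(nbrs[i])) for i in range(n)]

def average_lof(points, k1, k2, s):
    """Average the k-local outlier factor over k = k1, k1 + s, ..., k2."""
    d = [[dist(p, q) for q in points] for p in points]
    ks = range(k1, k2 + 1, s)
    totals = [0.0] * len(points)
    for k in ks:
        for i, value in enumerate(lof_for_k(d, k)):
            totals[i] += value
    return [t / len(ks) for t in totals]

# Synthetic two-dimensional samples: one cluster plus two planted anomalies.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points += [(8.0, 8.0), (-7.0, 6.0)]  # indices 200 and 201

avg = average_lof(points, k1=20, k2=150, s=10)
top12 = sorted(range(len(points)), key=lambda i: avg[i], reverse=True)[:12]
print(top12)
```

With real measurements whose variables have very different ranges, the distance would be dominated by the variable with the largest range, so some rescaling before computing distances may be worth considering; the patent text does not address this point.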
Delete the outliers identified above, then use the data set without them to build the highway traffic flow model.
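The clean-up step can be sketched as below. Deletion follows the text directly. For the filtering alternative mentioned in step 7, the patent does not name a specific filter, so the median replacement shown here is only one assumed choice, and the function names, window size, and sample values are hypothetical.

```python
from statistics import median

def drop_outliers(samples, outlier_idx):
    """Delete the flagged samples and keep the rest in their original order."""
    flagged = set(outlier_idx)
    return [x for i, x in enumerate(samples) if i not in flagged]

def median_correct(series, outlier_idx, half_window=2):
    """Replace each flagged value by the median of nearby unflagged values."""
    flagged = set(outlier_idx)
    corrected = list(series)
    for i in flagged:
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = [series[j] for j in range(lo, hi) if j not in flagged]
        if window:  # leave the value unchanged if every neighbor is flagged
            corrected[i] = median(window)
    return corrected

series = [10, 11, 95, 12, 13]  # made-up readings; index 2 was flagged
print(drop_outliers(series, [2]))
print(median_correct(series, [2]))
```

Deletion suits model fitting on unordered samples, whereas replacement keeps a time series evenly spaced.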
Embodiment 2: Road surface evenness testing
Surface evenness is an important indicator of road surface performance. It reflects not only the ride comfort of the road surface but also, indirectly, its state of health. The International Roughness Index (IRI) is widely adopted around the world. It is defined as the ratio of the total displacement of a standard vehicle body suspension (in m) to the distance traveled (in km), and its unit is m/km. The data set contains 8000 IRI samples, collected once every meter with a surface evenness test vehicle imported from Australia.
The density-based detection method LOF is used to look for unusual samples. Starting from an initial value of k = 50 and increasing k in steps of 10, compute the local outlier factor of all samples, and then obtain the average local outlier factor of all samples. Here, all samples whose average local outlier factor is greater than 1.8 are assumed to be outliers. This finds the 28 points with the strongest degree of outlierness, shown in Fig. 3, in which the circled points are the outliers.
Compared with the other measurement points, these outliers indicate that the road surface at those locations is uneven or more seriously damaged; they may also be sampling errors or noise. Each detected anomaly requires manual review, with further analysis of the road surface, the detection equipment, and other conditions at the time, to determine its cause correctly.

Claims (4)

1. A method for identifying outlier traffic data, characterized in that the method first collects traffic data, computes the average local outlier factor of the data, and then judges outliers by one of the following two criteria: the m data points with the highest average local outlier factor are outliers, or the data points whose average local outlier factor exceeds a given threshold are outliers; finally, the identified outliers are deleted or corrected by a filtering method, or the hidden information contained in the outliers is analyzed.
2. The method for identifying outlier traffic data according to claim 1, characterized in that the average local outlier factor of the data is computed as follows: for a given natural number k, compute the k-local outlier factor of each data point; change k by a fixed step and repeat the computation; then average the results to obtain the average local outlier factor of each data point:
$$\mathrm{lof}(p) = \frac{\sum_{k} \mathrm{lof}_k(p)}{\frac{k_2 - k_1}{s} + 1} \qquad (1)$$
where $k_1$ and $k_2$ are the lower and upper bounds of k, respectively, $k_1$ is a natural number not less than 10, $s$ is the step size, and $\mathrm{lof}_k(p)$ is the k-local outlier factor of an arbitrary data point p.
3. The method for identifying outlier traffic data according to claim 2, characterized in that the k-local outlier factor of each data point p is the ratio of the mean k-local reachability density of all data points in the k-neighborhood of p to the k-local reachability density of p, that is:
$$\mathrm{lof}_k(p) = \frac{\sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|} \qquad (2)$$
where k is a natural number, $N_k(p)$ is the k-neighborhood of p, and $|N_k(p)|$ is the number of elements in that neighborhood; the k-neighborhood of p consists of all data points whose distance from p is not greater than the k-distance of p, and the k-distance of p is the distance between p and its k-th nearest data point; $\mathrm{lrd}_k(p)$ is the k-local reachability density of p, o is any data point in the k-neighborhood of p, and $\mathrm{lrd}_k(o)$ is the k-local reachability density of o.
4. The method for identifying outlier traffic data according to claim 3, characterized in that the k-local reachability density of any data point p is the reciprocal of the average reachability distance from p to the points in its k-distance neighborhood:
$$\mathrm{lrd}_k(p) = 1 \Big/ \frac{\sum_{o \in N_k(p)} \mathrm{reach\_disp}_k(p, o)}{|N_k(p)|} \qquad (3)$$
where $\mathrm{reach\_disp}_k(p, o)$ is the reachability distance from p to any data point o in its k-neighborhood, whose value is the larger of the k-distance of o and the distance between p and o, that is:
$$\mathrm{reach\_disp}_k(p, o) = \max\{\mathrm{k\_distance}(o),\ d(p, o)\} \qquad (4)$$
CNB2008100247009A 2008-04-01 2008-04-01 Method for recognizing outlier traffic data Expired - Fee Related CN100535955C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2008100247009A CN100535955C (en) 2008-04-01 2008-04-01 Method for recognizing outlier traffic data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2008100247009A CN100535955C (en) 2008-04-01 2008-04-01 Method for recognizing outlier traffic data

Publications (2)

Publication Number Publication Date
CN101246645A CN101246645A (en) 2008-08-20
CN100535955C true CN100535955C (en) 2009-09-02

Family

ID=39947074

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2008100247009A Expired - Fee Related CN100535955C (en) 2008-04-01 2008-04-01 Method for recognizing outlier traffic data

Country Status (1)

Country Link
CN (1) CN100535955C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504901A (en) * 2014-12-29 2015-04-08 浙江银江研究院有限公司 Multidimensional data based detecting method of traffic abnormal spots

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866551B (en) * 2010-06-02 2012-05-09 北京世纪高通科技有限公司 Processing method and processing device of traffic flow information
CN101950483B (en) * 2010-09-15 2013-03-20 青岛海信网络科技股份有限公司 Repairing method and device for traffic data fault
CN104317908B (en) * 2014-10-28 2018-08-17 河南师范大学 Outlier detection method based on three decisions and distance
CN104376078A (en) * 2014-11-14 2015-02-25 南京大学 Abnormal data detection method based on knowledge entropy
CN104462802A (en) * 2014-11-26 2015-03-25 浪潮电子信息产业股份有限公司 Method for analyzing outlier data in large-scale data
CN104408116A (en) * 2014-11-26 2015-03-11 浪潮电子信息产业股份有限公司 Method for detecting outlier data from large-scale high dimensional data based on genetic algorithm
CN104951893A (en) * 2015-06-24 2015-09-30 银江股份有限公司 Urban-traffic-oriented method for evaluating road police alarm handling efficiency of traffic polices
CN106649339A (en) * 2015-10-30 2017-05-10 北大方正集团有限公司 Method and device for mining outlier
CN106910334B (en) * 2015-12-22 2019-12-24 阿里巴巴集团控股有限公司 Method and device for predicting road section conditions based on big data
CN107146409B (en) * 2017-06-01 2019-11-19 东方网力科技股份有限公司 The identification of equipment detection time exception and true time difference evaluation method in road network
CN107941537B (en) * 2017-10-25 2019-08-27 南京航空航天大学 A kind of mechanical equipment health state evaluation method
CN109086291B (en) * 2018-06-09 2022-07-12 西安电子科技大学 Parallel anomaly detection method and system based on MapReduce
CN109308395B (en) * 2018-09-30 2022-12-02 西安电子科技大学 Wafer-level space measurement parameter anomaly identification method based on LOF-KNN algorithm
CN109814022A (en) * 2019-01-02 2019-05-28 浙江大学 A kind of chip aging test data processing method
CN110207827B (en) * 2019-05-23 2020-05-08 浙江大学 Electrical equipment temperature real-time early warning method based on abnormal factor extraction
CN116612641B (en) * 2023-07-19 2023-09-22 天津中德应用技术大学 Vehicle queue control data processing method based on intelligent network connection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于离群指数的时序数据离群挖掘. 郑斌详,席裕庚,杜秀华.自动化学报,第30卷第1期. 2004
基于离群指数的时序数据离群挖掘. 郑斌详,席裕庚,杜秀华.自动化学报,第30卷第1期. 2004 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504901A (en) * 2014-12-29 2015-04-08 浙江银江研究院有限公司 Multidimensional data based detecting method of traffic abnormal spots
CN104504901B (en) * 2014-12-29 2016-06-08 浙江银江研究院有限公司 A kind of traffic abnormity point detecting method based on multidimensional data

Also Published As

Publication number Publication date
CN101246645A (en) 2008-08-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090902

Termination date: 20120401