WO2015039693A1 - Method and system for data quality assessment - Google Patents

Method and system for data quality assessment

Info

Publication number
WO2015039693A1
Authority
WO
WIPO (PCT)
Prior art keywords
quality
data
indicator
sample
traffic data
Prior art date
Application number
PCT/EP2013/069551
Other languages
French (fr)
Inventor
Francesco ALESIANI
Mahsa FAIZRAHNEMOON
Original Assignee
Nec Europe Ltd.
Priority date
Filing date
Publication date
Application filed by Nec Europe Ltd.
Priority to PCT/EP2013/069551
Publication of WO2015039693A1

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/0116 Measuring and analyzing of parameters relative to traffic conditions based on the source of data from roadside infrastructure, e.g. beacons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/02 Computing arrangements based on specific mathematical models using fuzzy logic
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data

Definitions

  • the present invention relates to a method and system for assessing the quality of ITS related traffic data.
  • ITS: Intelligent Transport Systems
  • the performance of these systems is dependent on the information they use and on its reliability.
  • ITS systems have evolved from singly managed systems to open market systems, where information provision and use are implemented by different actors.
  • Data quality can be assessed against different criteria (for instance accuracy, completeness, ...) since the end-user may have multiple expectations.
  • Some quality criteria can be considered at the design/implementation phase of a road traffic measuring system and for its nominal behavior, while others shall be evaluated at runtime or on a regular basis.
  • some quality criteria may be evaluated against a reference system, either in a closed testing facility or by performing a dedicated onsite campaign.
  • such a system comprises computation means for determining samples of traffic data collected for a particular type of traffic measure, and for defining, for a particular sample of traffic data under analysis, a set of quality indicators I_x that assess said sample with respect to different aspects, and
  • analysis tools for each of said quality indicators I_x, wherein said analysis tools are configured to calculate, for each of said quality indicators I_x, a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample.
  • a mechanism is identified that allows evaluating the quality of traffic data provided by a traffic measurement system or a data provider to be used in the context of ITS systems.
  • Traffic data quality is relevant for safety reasons and for exploitation aspects, and providing a data quality assessment together with the actual data improves the value of the data itself.
  • the present invention addresses the problem by defining a set of quality indicators that allow evaluating different aspects of the traffic data.
  • Embodiments of the invention allow evaluating, with predictive capabilities, the current reliability of each sensor of a traffic measurement system and its trend, by exploiting information on the previous measures of the same sensor, the characteristics of the underlying physical phenomenon and/or the relationship with other measurement sites.
  • embodiments of the invention address the online computation of confidence and/or reliability quality indicators of the provided data based on some consistency or rate of variance tests.
  • embodiments of the invention use subsampling techniques and compute some structure in the data that can then be used to measure consistency in the data on different road sections or in different time intervals.
  • Embodiments of the invention allow measurements in particular with respect to the following aspects:
  • Validity: the period of measurement or validity of the provided data
  • Timeliness: the delay introduced between the physical phenomenon and the availability of the data
  • Coverage: the spatial availability of the data or the portion of the road that is monitored
  • Accessibility: the level at which the data is made available
  • Fig. 1 is a schematic view illustrating the basic building blocks of a singly managed road traffic measuring system
  • Fig. 2 is a schematic view illustrating the basic building blocks of a road traffic measuring system according to an open data market scenario
  • Fig. 3 is a schematic view illustrating an indicator function block
  • Fig. 4 is a quality indicator block diagram in accordance with an embodiment of the present invention
  • Fig. 5 is a schematic view illustrating a scenario of section measures
  • Fig. 6 is a schematic view illustrating a scenario of corridor measures
  • Fig. 7 is a schematic view illustrating a scenario of network measures
  • Fig. 8 is a diagram illustrating extraction of correlation variables from correlation function shape
  • Fig. 9 is a schematic view illustrating fuzzy systems used in connection with corridor related quality indicators
  • Fig. 10 is a schematic view illustrating membership functions employed in connection with the fuzzy systems of Fig. 9
  • Fig. 11 is a schematic view illustrating a first configuration of combined fuzzy systems used in connection with corridor related quality indicators
  • Fig. 12 is a schematic view illustrating a second configuration of combined fuzzy systems used in connection with corridor related quality indicators
  • Fig. 13 is a schematic view illustrating three membership functions of a first type used in connection with corridor related quality indicators
  • Fig. 14 is a schematic view illustrating three membership functions of a second type used in connection with corridor related quality indicators
  • Fig. 15 is a schematic view illustrating three membership functions of a third type used in connection with corridor related quality indicators
  • Fig. 16 is a schematic view illustrating the working principle of a statistical tool
  • Fig. 17 is a schematic view illustrating the processing for determining a fluctuation quality indicator for a time interval
  • Fig. 18 is a diagram illustrating the process of reference mask creation for evaluating a time interval quality indicator
  • Fig. 19 is a diagram illustrating the process of estimation of an unsymmetrical probability distribution
  • Fig. 20 is a diagram illustrating fundamental diagram estimation
  • Fig. 21 is a schematic view illustrating the processing for determining a time interval quality indicator for a section
  • Fig. 22 is a schematic view illustrating the processing for determining a time window consistency quality indicator for a section
  • Fig. 23 is a diagram schematically illustrating alarm activation with hysteresis
  • Fig. 24 is a diagram illustrating the performance of quality indicator
  • Fig. 25 is a diagram illustrating the performance of quality indicator I_B
  • Fig. 26 is a diagram illustrating the performance of quality indicator I_C
  • Fig. 27 is a diagram illustrating correlation between consecutive sections
  • Fig. 28 is a diagram illustrating different fuzzy system quality indicators
  • Fig. 29 is a diagram illustrating fundamental diagram estimation with two linear functions
  • Fig. 30 is a diagram illustrating fundamental diagram approximation with a polynomial function
  • Fig. 31 is a diagram illustrating the performance of a fundamental diagram check for a test day
  • Fig. 32 is a diagram illustrating the performance of a fundamental diagram check for a training day.
  • the present invention addresses the problem of computing multiple different indicators of the quality of road traffic data (resulting either from the direct measurement of traffic data or from elaborated traffic data), wherein each of the applied indicators evaluates the data according to different aspects and attitudes.
  • An indicator is a real number that provides information about the reliability or an assessment of the quality of the collected data.
  • quality indicators are assumed to be normalized to the interval between 0 and 1, i.e. each indicator provides a value in [0, 1], wherein a higher value of the indicator means higher quality of data.
  • different kinds of normalization are also possible.
  • a data quality assessment apparatus is provided that takes as input traffic data collected over a long period and that provides as a result the quality indicator values.
  • Fig. 4 is a quality indicator block diagram related to an embodiment of the invention, in which a total number of 6 quality indicators is being determined (together with two intermediate quality indicators, as will be explained in more detail below).
  • the quality indicators are associated with different analysis tools, e.g. statistical tools or fuzzy logic.
  • the traffic data is organized or grouped in a certain way that is specific for each of the quality indicators.
  • Corridor related quality indicators (I_G and I_H in Fig. 4):
  • a first embodiment deals with the analysis of the data quality based on the observation of two or more road sections. It is noted that, generally, data taken from different measurement sites can be considered within data quality assessment procedures. For instance, measurement configurations can be classified into the following categories:
  • Section measures: this kind of measure refers to a point along the road, as illustrated by way of example in Fig. 5 (measurement point "A").
  • Corridor measures: this kind of measure refers to multiple measures that are spatially related. For instance, in highway scenarios there exist two configurations, with or without merging (entering/exiting) flows. In other scenarios (non-highway roads) there may be multiple, sparse merging areas, as illustrated in Fig. 6 (measurement points "A" and "B").
  • Network measures: this kind of measure is the most complex configuration and may include multiple merging points and traffic signal control systems, as illustrated in Fig. 7 (measurement points "A"-"E").
  • the indicator in the case described here is based on the correlation of two road sections, wherein the correlation is defined as
  • x refers to the measurements taken by a first measurement point A: x = y_A(t, d)
  • y refers to the measurements taken by a second (neighboring) measurement point B: y = y_B(t + τ, d)
  • τ is the delay between the two sections.
  • the day whose data is under consideration is represented by d, and n is the length of the vector of data, i.e. n gives the number of measurements contained within the data sample that is being analyzed.
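  • A minimal sketch of the delayed correlation between two measurement points follows. The correlation formula itself is not reproduced in this text, so the sketch assumes a Pearson correlation coefficient; the function names, the toy flow values and the range of candidate delays are illustrative assumptions, not taken from the patent.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed form)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def delayed_correlation(series_a, series_b, start, n, delays):
    """Correlate x = y_A(t) with y = y_B(t + tau) for each candidate delay."""
    x = series_a[start:start + n]
    result = {}
    for tau in delays:
        lo = start + tau
        if lo < 0 or lo + n > len(series_b):
            continue  # skip delays that fall outside the recorded data
        result[tau] = pearson(x, series_b[lo:lo + n])
    return result


# Toy data: section B sees the pattern of section A three time steps later.
flow_a = [10, 12, 15, 30, 45, 50, 42, 30, 20, 14, 11, 10, 10, 11, 12, 13]
flow_b = [0, 0, 0] + flow_a[:-3]
corr = delayed_correlation(flow_a, flow_b, start=3, n=8, delays=range(-3, 4))
best_delay = max(corr, key=corr.get)
```

  • The delay with the highest correlation can then be compared with a measured travel time or speed between the two sections.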
  • the measurements x and y, taken by the two measurement points A and B, respectively, can relate to any specific traffic measure type. Specifically, the following types of traffic measure can be considered:
  • Flow or volume: the total number of vehicles passing a point on the road over a given interval of time.
  • the road itself can be a lane or the whole carriageway. Since flow is directional information, the two directions of movement on a road are normally distinguished
  • Speed: the distribution or the mean of the speed of vehicles passing a defined section of a road in a specific time interval
  • Occupancy: the percentage of time a roadway detection zone is occupied by vehicles, or during which the sensor detects the presence of a vehicle
  • Travel time: the distribution or mean of the measured travel time that vehicles take to traverse a specific section of a road
  • Density: the distribution or mean of the number of vehicles in a specific section of a road
  • Delay: the difference between the travel time in free flow or at the maximum allowed speed and the actual travel time
  • Queue length: the length or number of vehicles with speed under a specific threshold, which indicates that the vehicles are waiting and cannot proceed further.
  • Information presented can be also classified by type of vehicle, where type of vehicle may refer to the length, weight class or any other characteristics of the vehicle, including the type of use.
  • the delay variable can be used to infer information on the traffic state and it can be compared with the measure of travel time or speed between the two sections.
  • the indicator of the sections is defined using fuzzy logic.
  • fuzzy logic allows intervals of values to be related to linguistic rules.
  • the fuzzy logic system is defined by a set of fuzzy membership functions for the input and output and a set of rules to pass from the input to the output. Three fuzzy systems are used; two of them are shown in Fig. 9.
  • the inputs of the fuzzy system (A) are c_max, α_max, and A_max of the current data, that is, of the data sample currently being analyzed. A bigger area, a tighter angle and a higher correlation yield higher quality.
  • the second fuzzy system (B) is developed to extract the quality of specific data compared to historical data.
  • the input to the fuzzy system (B) is therefore the difference (Δ) of the correlation variables.
  • This fuzzy system determines if the quality of the specific data is higher or lower than the average data by assigning a value between 0 and 1.
  • An embodiment of the qualitative shape of the membership functions that can be used for the fuzzy systems is illustrated in Fig. 10.
  • a last fuzzy system (C) is developed in such a way that it allows integrating partial indicators, as outputted either by fuzzy system (A) or (B).
  • Fig. 11 illustrates a fuzzy system configuration, in which fuzzy system (C) receives as first input the indicator associated with the current data and as second input the indicator associated with the historical data, both generated by fuzzy system (A). The resulting output is a quality indicator termed I_G (see Fig. 4 for reference).
  • Fig. 12 illustrates a fuzzy system configuration, in which fuzzy system (C) receives as first input the indicator of the difference of the data generated by fuzzy system (B), and as second input the indicator associated with the historical data, generated by fuzzy system (A). The resulting output is a quality indicator termed I_H.
  • Indicator function: three membership functions of type A, where (a, b, c) = (0, 0.5, 1)
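  • A rough sketch of how a combining fuzzy system such as (C) could merge two partial indicators is given here. The actual membership shapes (Figs. 13-15) and rule base are not reproduced in this text, so everything in the sketch is an assumption: triangular membership functions centered at 0, 0.5 and 1, a hypothetical conservative rule base (the output label is the lower of the two input labels), and a weighted average for defuzzification.

```python
# Hypothetical centers of the three membership functions (low, medium, high).
CENTERS = {"low": 0.0, "medium": 0.5, "high": 1.0}
ORDER = ["low", "medium", "high"]


def triangular(x, left, peak, right):
    """Triangular membership function (assumed shape)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x):
    """Degree of membership of x in each linguistic label."""
    x = min(max(x, 0.0), 1.0)  # indicators are normalized to [0, 1]
    return {
        "low": triangular(x, -0.5, 0.0, 0.5),
        "medium": triangular(x, 0.0, 0.5, 1.0),
        "high": triangular(x, 0.5, 1.0, 1.5),
    }


def combine(indicator_a, indicator_b):
    """Merge two partial indicators with a hypothetical conservative rule base."""
    mu_a, mu_b = fuzzify(indicator_a), fuzzify(indicator_b)
    numerator = denominator = 0.0
    for label_a in ORDER:
        for label_b in ORDER:
            strength = min(mu_a[label_a], mu_b[label_b])  # rule firing strength
            output = min(label_a, label_b, key=ORDER.index)  # lower label wins
            numerator += strength * CENTERS[output]
            denominator += strength
    return numerator / denominator


combined = combine(1.0, 0.5)
```

  • With this rule base the combined indicator never exceeds the weaker of the two inputs' labels, which is one possible design choice among many.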
  • Time interval quality indicator (IB in Fig. 4):
  • a statistical tool as employed herein is configured to derive two basic outputs: the mean and standard deviation of the means of resamples of the input data. The procedure is as follows:
  • the statistical tool can be seen as a module that, given an array of data, generates its mean and the standard deviation of the mean.
  • the inputs to the statistical tool are the means of each basic time step's data across the days in a given time interval, and the outputs are the mean and the standard deviation of the means of the resamples of the input.
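  • The statistical tool described above can be sketched as follows. The resampling scheme is not spelled out in this text, so the sketch assumes resampling with replacement at the original sample size (a bootstrap); the function name, the seed parameter and the sample values are illustrative. The default of 1000 iterations matches the number of iterations reported for the figures.

```python
import random
from statistics import mean, pstdev


def statistical_tool(data, iterations=1000, seed=0):
    """Return the mean and standard deviation of the means of resamples.

    Assumption: each resample draws len(data) values with replacement.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(data)
    resample_means = [mean(rng.choices(data, k=n)) for _ in range(iterations)]
    return mean(resample_means), pstdev(resample_means)


# Example: per-day means of one basic time step (made-up values).
mu, sigma = statistical_tool([10, 12, 11, 13, 9, 10, 12, 11])
```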
  • the fluctuation is then defined based on the mean and the standard deviation output by the statistical tool. When the value of the fluctuation decreases, the trend of change of the data increases.
  • the fluctuation could be linked to an error in the measure or related to any actual physical phenomenon.
  • a reference mask is created.
  • a mask can be developed based on the fluctuation values of some other sections or based on the history of the same section.
  • An embodiment of reference mask creation is illustrated in Fig. 18.
  • the distribution of the mask is not symmetric and is found by estimating the standard deviation of the right and the left side of the diagram. This approach is described hereinafter in connection with an unsymmetrical probability distribution, as shown in Fig. 19, by applying either a maximum likelihood approximation or a sigma search.
  • the final indicator (termed I_B in Fig. 4) is defined based on the difference between the mask and the indicator of the current time interval.
  • the indicator is then where the probability distribution Pr of the mask has been computed based on the data samples. The probability is given by ) where
  • the fundamental diagram represents the relationship between the density (in terms of vehicle per unit of length) and the flow (in terms of vehicle per unit of time).
  • the fundamental diagram depends on the specific road segment and its characteristics, as for example the maximal speed and the number of lanes.
  • the fundamental diagram is defined as two polynomials (indicated by the solid lines), separated by the critical density K_c. Each part of the diagram is characterized by a standard deviation (indicated by the dashed lines).
  • One way to derive the fundamental diagram from the density and flow data is by applying the following procedure (it is noted that, depending on the data available, some processing may be needed to derive the density from the occupancy measure):
  • the fundamental diagram is approximated with two polynomial functions separated by the critical density.
  • One example is to have two linear approximations.
  • the critical density is first defined as the density corresponding to the highest flow.
  • on each side of K_c, the approximation is derived using regression.
  • An iterative process can then start by changing the value of K_c by a positive and a negative amount and then computing the approximation error.
  • the value of K_c is updated towards the value that minimizes the approximation error.
  • the value of the standard deviation is computed for each half interval. These values are used to define the interval of confidence.
  • Fig. 20 shows an example fundamental diagram with its two linear approximations.
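  • The estimation procedure above can be sketched for the case of two linear approximations. This is a minimal sketch, not the patented implementation: the least-squares fit, the fixed step size for moving K_c, the stopping rule and the toy (density, flow) samples are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(points):
    """Least-squares line: flow = slope * density + intercept."""
    k_mean = mean(k for k, _ in points)
    q_mean = mean(q for _, q in points)
    sxx = sum((k - k_mean) ** 2 for k, _ in points)
    if sxx == 0:
        return 0.0, q_mean
    slope = sum((k - k_mean) * (q - q_mean) for k, q in points) / sxx
    return slope, q_mean - slope * k_mean


def residuals(points, line):
    slope, intercept = line
    return [q - (slope * k + intercept) for k, q in points]


def fit_two_lines(points, k_c):
    """Fit one line on each side of k_c; return (error, lines, sigmas)."""
    left = [p for p in points if p[0] <= k_c]
    right = [p for p in points if p[0] > k_c]
    if len(left) < 2 or len(right) < 2:
        return None  # not enough points to fit both halves
    lines = (fit_line(left), fit_line(right))
    res = (residuals(left, lines[0]), residuals(right, lines[1]))
    error = sum(r * r for half in res for r in half)
    return error, lines, (pstdev(res[0]), pstdev(res[1]))


def estimate_fundamental_diagram(points, step=1.0, max_iter=100):
    """Start K_c at the density of the highest flow, then refine it."""
    k_c = max(points, key=lambda p: p[1])[0]
    best = fit_two_lines(points, k_c)
    if best is None:
        raise ValueError("not enough points on both sides of the initial K_c")
    for _ in range(max_iter):
        improved = False
        for candidate in (k_c - step, k_c + step):
            fit = fit_two_lines(points, candidate)
            if fit is not None and fit[0] < best[0]:
                best, k_c, improved = fit, candidate, True
        if not improved:
            break  # neither direction reduces the approximation error
    _, lines, sigmas = best
    return k_c, lines, sigmas


# Toy samples: free flow rises with density, congested flow falls.
samples = [(k, 2 * k) for k in range(5, 31, 5)]
samples += [(k, 90 - k) for k in range(35, 61, 5)]
k_c, lines, sigmas = estimate_fundamental_diagram(samples)
```

  • The two standard deviations returned for the half intervals are the values that would define the interval of confidence around each line.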
  • the current measure being under analysis is compared to the diagram and a quality indicator L is generated based on the weighted distance between the diagram and the measure as follows:
  • Further quality indicators (I_A and I_C in Fig. 4):
  • the first step is to extract the quality of the data of a single section.
  • the quality of a single basic time step and a time interval can be extracted by the quality indicators described above.
  • the quality of a specific data point with respect to the data in a window can also be considered.
  • the indicator of a basic time step may be defined based on the output of the statistical tool.
  • the input of the statistical tool is the data of a single time step
  • the outputs are the mean and the standard deviation of the means of resamples of the input.
  • the quality indicator is defined as follows:
  • the quality indicator has a value close to one which represents high quality.
  • Still another quality indicator could be based on a time window consistency check.
  • Such a time window quality indicator measures the quality of a specific data point, d, compared to other data in a neighborhood (window) around it.
  • the input of the statistical tool is the data of the window, excluding the data point under consideration, and the outputs are the mean and standard deviation of the resamples of the input.
  • the output leads us to the distribution of the data and the quality indicator is defined as:
  • another quality indicator can be created by projecting the traffic measure, for example the flow, to the following or preceding sensor and verifying its value with respect to the section statistics.
  • the projection is computed using a model for the evolution of the traffic flow in time and space.
  • the simplest model is constant speed propagation.
  • An alternative method is to use multi-modal constant speed propagation, where the traffic is divided into classes of different speeds and the flow or number of vehicles for each class is propagated according to the specific class velocity.
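  • Constant speed propagation and its multi-class variant can be sketched as a time shift of the upstream series. The function names, units, the rounding of the travel time to whole time steps and the zero fill before the first vehicles arrive are assumptions of this sketch.

```python
def project_flow(upstream_flow, distance_m, speed_mps, step_s):
    """Shift an upstream flow series by the travel time to the next sensor.

    Assumption: steps before the first vehicles arrive are filled with 0.0.
    """
    travel_steps = round(distance_m / (speed_mps * step_s))
    delay = min(travel_steps, len(upstream_flow))
    kept = upstream_flow[:len(upstream_flow) - delay]
    return [0.0] * delay + [float(v) for v in kept]


def project_multiclass(class_flows, distance_m, class_speeds_mps, step_s):
    """Propagate each vehicle class at its own speed and sum the results."""
    projected = [
        project_flow(flow, distance_m, speed, step_s)
        for flow, speed in zip(class_flows, class_speeds_mps)
    ]
    return [sum(values) for values in zip(*projected)]


# Two classes over 1500 m with 60 s time steps (made-up values).
cars = [10, 20, 30, 40]    # propagated at 25 m/s, one step of delay
trucks = [1, 2, 3, 4]      # propagated at 12.5 m/s, two steps of delay
single = project_flow(cars, 1500, 25.0, 60)
total = project_multiclass([cars, trucks], 1500, [25.0, 12.5], 60)
```

  • The projected series can then be checked against the statistics of the downstream section.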
  • hysteresis thresholds may be defined that allow an alarm to be raised only if the corresponding quality indicator drops below a predefined lower threshold for a predefined, sufficiently long time period, and the alarm is closed only if the quality indicator stays over the threshold for a sufficiently long time period, as illustrated in Fig. 23.
  • an alarm probability can be defined based on the percentage of time the corresponding quality indicator is below a specific threshold.
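  • The alarm logic with hysteresis and the alarm probability can be sketched as follows. The use of separate lower and upper thresholds, the same minimum duration for raising and closing, and the example values are assumptions made for illustration.

```python
def hysteresis_alarm(values, lower, upper, min_steps):
    """Return the alarm state after each quality indicator value.

    The alarm is raised after min_steps consecutive values below `lower`
    and closed after min_steps consecutive values above `upper`.
    """
    alarm = False
    run = 0  # length of the current run that argues for a state change
    states = []
    for value in values:
        if not alarm:
            run = run + 1 if value < lower else 0
            if run >= min_steps:
                alarm, run = True, 0
        else:
            run = run + 1 if value > upper else 0
            if run >= min_steps:
                alarm, run = False, 0
        states.append(alarm)
    return states


def alarm_probability(values, threshold):
    """Fraction of time steps with the indicator below the threshold."""
    return sum(value < threshold for value in values) / len(values)


quality = [0.9, 0.2, 0.9, 0.2, 0.2, 0.2, 0.5, 0.9, 0.9, 0.9]
states = hysteresis_alarm(quality, lower=0.3, upper=0.7, min_steps=3)
probability = alarm_probability(quality, threshold=0.3)
```

  • In this example the single low value at the second step does not raise the alarm, and the intermediate value 0.5 does not close it.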
  • Indicators for the whole period up to the current time step: typically the period of observation is restarted at midnight of the previous day, but it can also be a window of 24 hours
  • indicators defined in this invention include time step, interval, correlation based and fundamental diagram consistency checks
  • Parameters of the computation, such as the size of the time window, the presence and size of the pre-processing, and the size of the time interval.
  • Figs. 24-32 show the quality of flow extracted by the quality indicators defined above.
  • the number of iterations of the statistical tool is 1000 for all figures.
  • In Fig. 24, the quality of each minute's data along the days of one week is shown for two different sections. The quality is extracted according to the indicator. The figure clearly shows that the amount of unavailable data affects the quality of the data.
  • the time interval data quality indicator is represented in Fig. 25 for the same sections shown in Fig. 21.
  • the value of the fluctuation quality indicator I_F is plotted in the two diagrams in the second row.
  • the value of the quality indicator I_B is shown in the top row.
  • Each time interval is 20 minutes.
  • a mask is defined based on 5 other sections and the quality of the section is extracted here.
  • At the top of Fig. 26, the quality of the data of the third day of the week with respect to the other days of the week is depicted. The quality is calculated based on quality indicator I_C. The size of the window is 5, since the data is not available for the weekend. The flow of the third day is shown in the second row and the average flow of the other days of the week is represented at the bottom. The quality of the data is extracted for each minute and can be seen at the top of Fig. 26.
  • Fig. 27 shows the correlation between two sections in a row.
  • the vector of the first section is one hour of data, 10:00 - 11:00 am, and the time delay is assumed to be in [-10, 10] minutes.
  • the correlation is computed for the last day of the week and for the average data of the week.
  • In Fig. 28, the quality of the data for each 10-minute interval is plotted.
  • the upper diagram depicts the quality of the data of the last day of the week and the average data of the week based on the fuzzy logic A. It can be seen that in some parts the last day has higher or lower quality compared to the average data.
  • the lower diagram shows the quality of the last day compared to the average of the week, which is extracted based on the fuzzy logic B.
  • Figs. 29 and 30 show two different approximations of the fundamental diagram. These diagrams are then used to check the consistency of new data.
  • Fig. 31 shows the fundamental diagram estimated on one training day
  • Fig. 32 shows its use on the following test day. The error is reported. For the analyzed section the error level is almost the same between the two days.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Automation & Control Theory (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

A method for assessing the quality of ITS related traffic data is disclosed that comprises: determining samples of traffic data collected for a particular type of traffic measure; for a particular sample of traffic data under analysis, defining a set of quality indicators I_x that assess said sample with respect to different aspects; and, for each of said quality indicators I_x, calculating a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample by applying different analysis tools for each of said quality indicators I_x. Furthermore, a corresponding system for assessing the quality of ITS related traffic data is described.

Description

METHOD AND SYSTEM FOR DATA QUALITY ASSESSMENT
The present invention relates to a method and system for assessing the quality of ITS related traffic data.
Intelligent Transport Systems (ITS) are widely used to improve the utilization and increase the safety of transport systems. The performance of these systems is dependent on the information they use and on its reliability. ITS systems have evolved from singly managed systems to open market systems, where information provision and use are implemented by different actors.
In either case, when the system is closely managed (as illustrated in Fig. 1, where the concept is illustrated for a single measurement point "A" that collects traffic data for a particular section of a road) or when the information is exchanged by different entities (as illustrated in Fig. 2), the quality of the data shall be assessed in order to provide a reliable system. Generally speaking, "quality" defines the extent to which the provided quantity meets end-user requirements.
Data quality can be assessed against different criteria (for instance accuracy, completeness, ...) since the end-user may have multiple expectations. Some quality criteria can be considered at the design/implementation phase of a road traffic measuring system and for its nominal behavior, while others shall be evaluated at runtime or on a regular basis. Moreover, some quality criteria may be evaluated against a reference system, either in a closed testing facility or by performing a dedicated onsite campaign.
In modern ITS systems, accuracy is increasingly important and the performance of the system is directly connected to the quality of the provided data. A degradation of the quality of the data is not typically followed by a graceful degradation of the ITS system, since small errors may induce large effects. An indication of the reliability of the provided data is crucial in this case. Further, during operation of a road traffic measuring system it may happen that sensors of the measurement points or some other parts of the whole system do not function at their nominal levels. For the above reason, it is important to have other online technology available. In this regard, a manual check of the status of the sensors is a typical approach which, however, is lengthy and costly. Further, it does not scale with the size of the system and it may hinder the use of more advanced ITS systems.
In C. Chen, J. Kwon, J. Rice, A. Skabardonis, P. Varaiya, "Detecting Errors and Imputing Missing Data for Single Loop Surveillance Systems", 82nd Annual Meeting Transportation Research Board, January 2003, Washington, D.C., it is proposed to provide online indicators that include checks for some unrealistic conditions such as, for instance, non-zero measures with zero occupancy, non-zero flow with zero density, or samples with high occupancy, especially with non-zero flow. However, for many applications the accuracy and comprehensiveness of this solution are not sufficient.
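The kind of plausibility check described for the prior art can be sketched as a few simple rules. This is only an illustration of the conditions named above, not the cited authors' algorithm; the function name, the flag texts and the high-occupancy threshold of 0.35 are hypothetical.

```python
def plausibility_flags(flow, occupancy, density, high_occupancy=0.35):
    """Return the unrealistic conditions found in one sample.

    The threshold `high_occupancy` is a hypothetical value.
    """
    flags = []
    if occupancy == 0 and flow > 0:
        flags.append("non-zero flow with zero occupancy")
    if density == 0 and flow > 0:
        flags.append("non-zero flow with zero density")
    if occupancy > high_occupancy:
        flags.append("high occupancy")
    return flags


flags = plausibility_flags(flow=12, occupancy=0.0, density=3.0)
```

Checks of this kind look at each sample in isolation, which is the limitation the quality indicators of the present document are meant to go beyond.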
In view of the above it is an objective of the present invention to improve and further develop a method and a system for assessing the quality of ITS related traffic data in such a way that an enhancement in terms of comprehensiveness and predictive quality is achieved.
In accordance with the invention, the aforementioned object is accomplished by a method comprising the features of claim 1. According to this claim such a method comprises
determining samples of traffic data collected for a particular type of traffic measure,
for a particular sample of traffic data under analysis, defining a set of quality indicators I_x that assess said sample with respect to different aspects,
for each of said quality indicators I_x, calculating a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample by means of applying different analysis tools for each of said quality indicators I_x.
Furthermore, the above mentioned objective is accomplished by a system comprising the features of claim 15. According to this claim such a system comprises computation means for determining samples of traffic data collected for a particular type of traffic measure, and for defining, for a particular sample of traffic data under analysis, a set of quality indicators I_x that assess said sample with respect to different aspects, and
a number of different analysis tools for each of said quality indicators I_x, wherein said analysis tools are configured to calculate, for each of said quality indicators I_x, a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample.
Preferred embodiments of the invention are specified in the dependent claims.
According to the invention a mechanism is identified that allows evaluating the quality of traffic data provided by a traffic measurement system or a data provider to be used in the context of ITS systems. Traffic data quality is relevant for safety reasons and for exploitation aspects, and providing a data quality assessment together with the actual data improves the value of the data itself. The present invention addresses the problem by defining a set of quality indicators that allow evaluating different aspects of the traffic data. Embodiments of the invention allow evaluating, with predictive capabilities, the current reliability of each sensor of a traffic measurement system and its trend, by exploiting information on the previous measures of the same sensor, the characteristics of the underlying physical phenomenon and/or the relationship with other measurement sites. Furthermore, embodiments of the invention address the online computation of confidence and/or reliability quality indicators of the provided data based on some consistency or rate of variance tests. Still further, embodiments of the invention use subsampling techniques and compute some structure in the data that can then be used to measure consistency in the data on different road sections or in different time intervals.
By defining a set of different quality indicators that assess data with respect to different aspects and attitudes, comprehensive measurements of the data quality for ITS systems are possible, resulting in the ability to identify more complex failure patterns, to effectively plan maintenance activities for the individual sensors of the traffic measurement system and, generally, to implement more reliable ITS systems. Embodiments of the invention allow measurements in particular with respect to the following aspects:
Accuracy (which is the main focus of the present invention): how close the current measure is to the real underlying physical phenomenon
Completeness: the presence of missing data
Validity: the period of measurement or validity of the provided data
Timeliness: the delay introduced between the physical phenomenon and the availability of the data
Coverage: the spatial availability of the data or the portion of the road that is monitored
Accessibility: the level at which the data is made available
There are several ways how to design and further develop the teaching of the present invention in an advantageous way. To this end it is to be referred to the patent claims subordinate to patent claims 1 and 15 on the one hand and to the following explanation of preferred embodiments of the invention by way of example, illustrated by the drawing on the other hand. In connection with the explanation of the preferred embodiments of the invention by the aid of the drawing, generally preferred embodiments and further developments of the teaching will be explained. In the drawing
Fig. 1 is a schematic view illustrating the basic building blocks of a single manned road traffic measuring system,
Fig. 2 is a schematic view illustrating the basic building blocks of a road traffic measuring system according to an open data market scenario,
Fig. 3 is a schematic view illustrating an indicator function block,
Fig. 4 is a quality indicator block diagram in accordance with an embodiment of the present invention,
Fig. 5 is a schematic view illustrating a scenario of section measures,
Fig. 6 is a schematic view illustrating a scenario of corridor measures,
Fig. 7 is a schematic view illustrating a scenario of network measures,
Fig. 8 is a diagram illustrating extraction of correlation variables from correlation function shape,
Fig. 9 is a schematic view illustrating fuzzy systems used in connection with corridor related quality indicators,
Fig. 10 is a schematic view illustrating membership functions employed in connection with the fuzzy systems of Fig. 9,
Fig. 11 is a schematic view illustrating a first configuration of combined fuzzy systems used in connection with corridor related quality indicators,
Fig. 12 is a schematic view illustrating a second configuration of combined fuzzy systems used in connection with corridor related quality indicators,
Fig. 13 is a schematic view illustrating three membership functions of a first type used in connection with corridor related quality indicators,
Fig. 14 is a schematic view illustrating three membership functions of a second type used in connection with corridor related quality indicators,
Fig. 15 is a schematic view illustrating three membership functions of a third type used in connection with corridor related quality indicators,
Fig. 16 is a schematic view illustrating the working principle of a statistical tool,
Fig. 17 is a schematic view illustrating the processing for determining a fluctuation quality indicator for a time interval,
Fig. 18 is a diagram illustrating the process of reference mask creation for evaluating a time interval quality indicator,
Fig. 19 is a diagram illustrating the process of estimation of an unsymmetrical probability distribution,
Fig. 20 is a diagram illustrating fundamental diagram estimation,
Fig. 21 is a schematic view illustrating the processing for determining a time interval quality indicator for a section,
Fig. 22 is a schematic view illustrating the processing for determining a time window consistency quality indicator for a section,
Fig. 23 is a diagram schematically illustrating alarm activation with hysteresis,
Fig. 24 is a diagram illustrating the performance of quality indicator IA,
Fig. 25 is a diagram illustrating the performance of quality indicator IB,
Fig. 26 is a diagram illustrating the performance of quality indicator IC,
Fig. 27 is a diagram illustrating correlation between consecutive sections,
Fig. 28 is a diagram illustrating different fuzzy system quality indicators,
Fig. 29 is a diagram illustrating fundamental diagram estimation with two linear functions,
Fig. 30 is a diagram illustrating fundamental diagram approximation with a polynomial function,
Fig. 31 is a diagram illustrating the performance of a fundamental diagram check for a test day, and
Fig. 32 is a diagram illustrating the performance of a fundamental diagram check for a training day.
In the following description of preferred embodiments of the present invention, it is important to note that a single quality indicator cannot capture the overall quality of ITS related traffic data. For this reason, in accordance with embodiments of the invention a number of different quality indicators are combined to comprehensively evaluate the quality of traffic data. In other words, the present invention addresses the problem of computing multiple different indicators of the quality of road traffic data (resulting either from the direct measurement of traffic data or from elaborated traffic data), wherein each of the applied indicators evaluates the data according to different aspects and attitudes.
An indicator is a real number that provides information about the reliability or an assessment of the quality of the collected data. Hereinafter, quality indicators are assumed to be normalized to the interval between 0 and 1, i.e. each indicator provides a value in [0, 1], wherein a higher value of the indicator means higher quality of data. As will be appreciated by those skilled in the art, different kinds of normalization are also possible. As shown in Fig. 3, in order to calculate quality indicator values a data quality assessment apparatus is provided that takes as input traffic data collected over a long period and that provides as a result the quality indicator values.
In accordance with the present invention a set of different quality indicators is defined that allows evaluating different aspects of the traffic data. Fig. 4 is a quality indicator block diagram related to an embodiment of the invention, in which a total number of 6 quality indicators is being determined (together with two intermediate quality indicators, as will be explained in more detail below). As can be obtained from Fig. 4, the quality indicators are associated with different analysis tools, e.g. statistical tools or fuzzy logic. Furthermore, as indicated in the left column of the diagram, before the respective analysis tool is applied, the traffic data is organized or grouped in a certain way that is specific for each of the quality indicators.
It is noted that in some cases it may prove to be useful to add a pre-processing step in the system in order to avoid problems related to measures taken over a time period too short for performing reliable analysis, especially in single manned operated systems. According to a preferred embodiment, data is then filtered with nearby samples using an averaging filter as follows:
$$y(t) = \sum_{\tau=-K_1}^{K_2} \alpha_\tau \, x(t-\tau)$$
where α_τ, K_1, K_2 are configuration parameters.
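A minimal sketch of such an averaging pre-filter is given below. It assumes that the output at time t is a weighted sum of the input samples at offsets τ from -K1 to K2 with weights α_τ. The function name `smooth`, the dictionary layout of the weights and the renormalisation at the edges of the series are illustrative assumptions, not part of the described embodiment.

```python
def smooth(x, alpha, k1, k2):
    """Weighted averaging filter y(t) = sum over tau in [-k1, k2] of alpha[tau] * x(t - tau).

    alpha maps each offset tau to its weight. Offsets that fall outside the
    series are skipped and the remaining weights are renormalised; this edge
    handling is an assumption of this sketch.
    """
    y = []
    for t in range(len(x)):
        acc, wsum = 0.0, 0.0
        for tau in range(-k1, k2 + 1):
            i = t - tau
            if 0 <= i < len(x):
                acc += alpha[tau] * x[i]
                wsum += alpha[tau]
        y.append(acc / wsum if wsum else x[t])
    return y


# Uniform weights 1/(1 + k) over the current sample and the k previous ones.
k = 2
uniform = {tau: 1.0 / (1 + k) for tau in range(0, k + 1)}
flow = [10, 12, 50, 11, 9, 10]  # hypothetical per-minute flow values
smoothed = smooth(flow, uniform, 0, k)
```

With uniform weights the filter reduces to a plain moving average, so the isolated spike in the hypothetical series is spread over the following samples.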
The different ways of data organization/preprocessing and analysis tool application will be explained hereinafter in detail for each of the quality indicators shown in Fig. 4. As will be appreciated by those skilled in the art, other quality indicators than those defined in connection with the embodiment of Fig. 4 can be defined.
Corridor related quality indicators (IG and IH in Fig. 4):
A first embodiment deals with the analysis of the data quality based on the observation of two or more road sections. It is noted that, generally, data taken from different measure sites can be considered within data quality assessment procedures. For instance, measure configuration can be classified in the following categories:
1) Section measures: This kind of measure refers to a point along the road, as illustrated by way of example in Fig. 5 (measurement point "A").
2) Corridor measures: This kind of measure refers to multiple measures that are spatially related. For instance, in highway scenarios there exist two configurations, with or without merging (entering/exit) flows. For other scenarios (non-highway roads) there may be multiple and sparse merging areas, as illustrated in Fig. 6 (measurement points "A" and "B").
3) Network measures: This kind of measure is the most complex configuration and may include multiple merging points and traffic signal control systems, as illustrated in Fig. 7 (measurement points "A"-"E").
The indicator in the case described here is based on the correlation of two road sections, wherein the correlation is defined as
Figure imgf000010_0001
In the above formula t denotes the basic time step at which measurements are taken, e.g. every 30 seconds, x refers to the measurements taken by a first measurement point A, while y refers to the measurements taken by a second (neighboring) measurement point B. Specifically, x = y_A(t, d) and y = y_B(t + τ, d), wherein τ is the delay between the two sections. The day whose data is under consideration is represented by d, and n is the length of the vector of data, i.e. n gives the number of measurements contained within the data sample that is being analyzed.
It is noted that the measurements x and y, respectively, taken by the two measurement points A and B, respectively, can be related to any specific traffic measures type. Specifically, the following types of traffic measure can be considered:
1) Flow or volume: the total number of vehicles passing a point on the road over a given interval of time. The road itself can be a lane or the whole carriageway. Since the flow is directional information, a distinction is normally made between the two directions of movement of a road.
2) Speed: the distribution or the mean of the speed of vehicles passing a defined section of a road in a specific time interval
3) Occupancy: the percentage of time a roadway detection zone is occupied by vehicles or during which the sensor detects the presence of a vehicle
4) Travel time: the distribution or mean of the measured travel time that vehicles take to traverse a specific section of a road
5) Density: the distribution or mean of the number of vehicles in a specific section of a road
6) Delay: the difference between the travel time in free flow or at a maximum allowed speed and the actual travel time
7) Queue length: the length or number of vehicles with a speed under a specific threshold, indicating that the vehicles are waiting and cannot proceed further.
Information presented can be also classified by type of vehicle, where type of vehicle may refer to the length, weight class or any other characteristics of the vehicle, including the type of use.
Turning back to the calculation of the corridor related quality indicators, based on the correlation of two sections A and B, the following information is derived, as schematically shown in Fig. 8:
• The maximum correlation value c_MAX
• The closest local (i.e. neighboring) minimum correlation values
• The delay at the maximum correlation value τ_MAX
• The angle at the maximum α_MAX
• The area defined by the maximum A_MAX
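The extraction of these variables can be sketched as follows. Since the correlation formula is only available as a figure, the sketch assumes a Pearson correlation coefficient between the measurements of point A at time t and those of point B at time t + τ. The angle and the area at the maximum are defined graphically in Fig. 8 and are therefore not computed here. All function names and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(a, b, max_delay):
    """Correlate a(t) with b(t + tau) for every tau in [-max_delay, max_delay]."""
    profile = {}
    for tau in range(-max_delay, max_delay + 1):
        pairs = [(a[t], b[t + tau]) for t in range(len(a)) if 0 <= t + tau < len(b)]
        xs, ys = zip(*pairs)
        profile[tau] = pearson(xs, ys)
    return profile


def peak_features(profile):
    """Maximum correlation, its delay, and the closest local minimum on each side."""
    taus = sorted(profile)
    tau_max = max(taus, key=lambda t: profile[t])
    i = taus.index(tau_max)
    left = i
    while left > 0 and profile[taus[left - 1]] < profile[taus[left]]:
        left -= 1  # walk downhill to the left
    right = i
    while right < len(taus) - 1 and profile[taus[right + 1]] < profile[taus[right]]:
        right += 1  # walk downhill to the right
    return {
        "c_max": profile[tau_max],
        "tau_max": tau_max,
        "left_min": (taus[left], profile[taus[left]]),
        "right_min": (taus[right], profile[taus[right]]),
    }


# Hypothetical data: section B sees the same flow pattern three time steps later.
section_a = [0, 0, 0, 1, 3, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0]
section_b = [0, 0, 0] + section_a[:-3]
features = peak_features(correlation_profile(section_a, section_b, 5))
```

In this example the delay at the maximum recovers the three-step shift between the two sections, which is the quantity that can be compared with the measured travel time.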
When data in the two sections is reliable, that is when the indicator of the sections is over a predefined threshold, the delay variable can be used to infer information on the traffic state and it can be compared with the measure of travel time or speed between the two sections. In the present embodiment the indicator of the sections is defined using fuzzy logic. Fuzzy logic allows relating intervals of values to linguistic rules. The fuzzy logic system is defined by a set of fuzzy membership functions for the input and output and a set of rules to pass from the input to the output. Three fuzzy systems are used; two of them are shown in Fig. 9. The inputs of the fuzzy system (A) are c_MAX, α_MAX, and A_MAX of the current data, that is of the data sample currently being analyzed. A bigger area, a tighter angle and a higher correlation yield a higher quality.
The second fuzzy system (B) is developed to extract the quality of specific data compared to historical data. The input to the fuzzy system (B) is therefore the difference (Δ) of the correlation variables. This fuzzy system determines if the quality of the specific data is higher or lower than the average data by assigning a value between 0 and 1.
An embodiment of the qualitative shape of the membership functions that can be used for the fuzzy systems is illustrated in Fig. 10.
A last fuzzy system (C) is developed in such a way that it allows integrating partial indicators, as outputted either by fuzzy system (A) or (B). Fig. 11 illustrates a fuzzy system configuration, in which fuzzy system (C) receives as first input the indicator associated with the current data and as second input the indicator associated with the historical data, both generated by fuzzy system (A). The resulting output is a quality indicator termed IG (see Fig. 4 for reference). Fig. 12, on the other hand, illustrates a fuzzy system configuration, in which fuzzy system (C) receives as first input the indicator of the difference of the data generated by fuzzy system (B), and as second input the indicator associated with the historical data, generated by fuzzy system (A). The resulting output is a quality indicator termed IH.
Hereinafter, an implementation of the entire fuzzy system according to a specific embodiment will be described. However, it is to be understood that the exact definition can be changed depending on the actual requirements, e.g. in terms of sensitivity of the final indicator.
Fuzzy Logic System A:
• Input
o Max Correlation, three membership functions of type A as illustrated in Fig. 13, where a, b, c are equal to (0, 0.5, 1)
o Area, three membership functions of type A, where a, b, c are equal to (0, 10, 20)
o Cosine of the angle, three membership functions of type A, where a, b, c are equal to (-1, -0.5, 0)
• Output
o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)
The rule set that is applied in fuzzy logic system A and that relates the logical link between inputs to the output is described in the following table:
Figure imgf000013_0001
It is noted that in the above table the conditions are combined with AND. So the first line of the table is read as: IF INPUT(1) = set 1 AND INPUT(2) = set 1 AND INPUT(3) = set 1 THEN OUTPUT = set 1.
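The evaluation of a single AND rule can be sketched as follows. The shapes of the type A membership functions are shown only in Fig. 13, so the sketch assumes a left shoulder, a triangle and a right shoulder anchored at a, b and c, numbered set 1 to set 3 in that order, and it assumes the minimum operator for AND. Both choices, the function names and the input values are assumptions of this sketch; only the rule quoted above is taken from the description.

```python
def memberships_type_a(v, a, b, c):
    """Degrees of membership of v in three sets anchored at a, b and c.

    Assumed shapes: set 1 falls linearly from a to b, set 2 is a triangle
    peaking at b, set 3 rises linearly from b to c.
    """
    v = min(max(v, a), c)  # clamp the input to the universe [a, c]
    if v <= b:
        up = (v - a) / (b - a)
        return (1.0 - up, up, 0.0)
    up = (v - b) / (c - b)
    return (0.0, 1.0 - up, up)


def rule_strength(degrees):
    """AND of the rule conditions, taken here as the minimum of the degrees."""
    return min(degrees)


# Hypothetical inputs of fuzzy logic system A with the parameters listed above.
corr = memberships_type_a(0.1, 0.0, 0.5, 1.0)        # max correlation
area = memberships_type_a(2.0, 0.0, 10.0, 20.0)      # area
cos_angle = memberships_type_a(-0.9, -1.0, -0.5, 0.0)  # cosine of the angle

# Firing strength of: IF INPUT(1) = set 1 AND INPUT(2) = set 1 AND INPUT(3) = set 1
strength = rule_strength([corr[0], area[0], cos_angle[0]])
```

A complete system would evaluate every row of the rule table in this way and combine the fired output sets into a single indicator value; that aggregation step is not shown because the rule table is not reproduced in the text.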
Fuzzy Logic System B:
• Input
o Delta Max Correlation, three membership functions of type B as illustrated in Fig. 14, where a, b, c, d, e are equal to (-1, -0.5, 0, 0.5, 1)
o Delta Area, three membership functions of type C as illustrated in Fig. 15, where a, b, c, d, e are equal to (-20, -10, 0, 10, 20)
o Delta Cosine of the angle, three membership functions of type B, where a, b, c, d, e are equal to (-2, -1, 0, 1, 2)
• Output
o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)
The rules being applied in fuzzy logic system B are summarized in the following table:
Figure imgf000014_0001
Fuzzy Logic System C:
• Input
o Indicator on current state, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)
o Indicator on historical data, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)
• Output
o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)
The rules being applied in fuzzy logic system C are summarized in the following table:
Figure imgf000015_0001
Time interval quality indicator (IB in Fig. 4):
In order to define the quality indicator of data of a time interval, it is determined how fast the data is changing by using a statistical tool and defining a formula that quantifies the fluctuation trend of the data. Generally, as shown in Fig. 16, a statistical tool as employed herein is configured to derive two basic outputs: the mean and standard deviation of the means of resamples of the input data. The procedure is as follows:
1) A sample of a population is selected and resampled
2) The means of the resampled data are generated
3) The process is repeated for a given number of times
4) The mean and standard deviation of the generated means is computed
The statistical tool can be seen as a module that, given an array of data, generates its mean and the standard deviation of the mean.
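The four steps above can be sketched as a small resampling routine. The sketch assumes resampling with replacement and resamples of the same size as the input, which the description does not state explicitly; the function name, the seed parameter and the sample data are illustrative only.

```python
import random
import statistics


def statistical_tool(data, iterations=1000, seed=0):
    """Return the mean and standard deviation of the means of resamples of data.

    Each resample draws len(data) values from data with replacement
    (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    means = []
    for _ in range(iterations):
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.mean(resample))
    return statistics.mean(means), statistics.stdev(means)


# Hypothetical flow values of one basic time step observed on five days.
mean_of_means, std_of_means = statistical_tool([42, 40, 45, 39, 44])
```

A small standard deviation of the means indicates that the input values agree with each other, which is the property the indicators derived from this tool rely on.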
As shown in Fig. 17, in the specific embodiment the inputs to the statistical tool are the means of each basic time step data along the days in a given time interval, and the outputs (μ_{b,μ} and σ_{b,μ}) are the mean and the standard deviation of the means of the resamples of the input. The fluctuation formula is then defined based on μ_{b,μ} and σ_{b,μ}. When the value of the fluctuation decreases, the rate of change of the data increases.
The fluctuation could be linked to an error in the measure or related to any actual physical phenomenon. In a preferred embodiment, in order to distinguish these two cases, a reference mask is created. A mask can be developed based on the fluctuation values of some other sections or based on the history of the same section.
An embodiment of reference mask creation is illustrated in Fig. 18. The distribution of the mask is not symmetric and is found by estimating the standard deviation of the right and the left side of the diagram. This approach is described hereinafter in connection with an unsymmetrical probability distribution, as shown in Fig. 19, by applying either a maximum likelihood approximation or a sigma search.
To approximate the distribution of the data, an asymmetric probability distribution is used. This distribution is defined by the mean value and the standard deviations on the left and on the right of the mean value. In order to compute the three variables the two standard deviations are related by the skewness r as follows:
The problem can be written as
Figure imgf000016_0001
where the standard deviation, the mean and the skewness are computed iteratively. Proper scaling factor of the single term can be added and defined in order to best fit the distribution. x_ and x+ are the values of the input histogram
Figure imgf000016_0002
Another way is proposed to find σ_L and σ_R when the distribution is not symmetric. By dividing the density function by its maximum value, one obtains a function whose values lie between zero and one. When x = μ + σ_R or x = μ - σ_L, the value of this function is e^(-1/2). Therefore, looking for points whose second coordinate values are
Figure imgf000017_0001
leads to σ_L and σ_R. The distribution is:
Figure imgf000017_0002
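The sigma search can be sketched as follows, assuming that each side of the distribution is Gaussian-shaped, so that the density divided by its maximum equals e^(-1/2) one standard deviation away from the mode. The function name, the linear interpolation between histogram bins and the synthetic test density are assumptions of this sketch.

```python
import math


def sigma_search(xs, density):
    """Estimate the mode and the left/right standard deviations of a histogram.

    xs are the bin centres in increasing order and density the matching values.
    The density is normalised by its maximum and the points where it crosses
    exp(-1/2) on each side of the mode are located by linear interpolation.
    """
    level = math.exp(-0.5)
    peak = max(density)
    norm = [d / peak for d in density]
    i_mode = norm.index(1.0)
    mu = xs[i_mode]

    def crossing(indices):
        prev = i_mode
        for i in indices:
            if norm[i] <= level:
                frac = (norm[prev] - level) / (norm[prev] - norm[i])
                return xs[prev] + frac * (xs[i] - xs[prev])
            prev = i
        return xs[prev]  # level never reached: fall back to the outermost bin

    x_left = crossing(range(i_mode - 1, -1, -1))
    x_right = crossing(range(i_mode + 1, len(xs)))
    return mu, mu - x_left, x_right - mu


# Synthetic asymmetric density with mode 0, left sigma 1 and right sigma 2.
grid = [i / 10 for i in range(-50, 101)]
dens = [math.exp(-0.5 * (x / (1.0 if x < 0 else 2.0)) ** 2) for x in grid]
mu, sigma_left, sigma_right = sigma_search(grid, dens)
```

On real histograms the bin counts are noisy, so smoothing the histogram before the search may be needed; this is not covered by the sketch.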
The final indicator (termed IB in Fig. 4) is defined based on the difference between the mask and the indicator of the current time interval. The indicator is then
Figure imgf000017_0003
where the probability distribution Pr of the mask has been computed based on the data samples. The probability is given by
Figure imgf000017_0004
where
Figure imgf000017_0005
Fundamental Diagram consistency quality indicator (IL in Fig. 4):
Another indicator is defined based on the fundamental diagram (FD). The fundamental diagram represents the relationship between the density (in terms of vehicle per unit of length) and the flow (in terms of vehicle per unit of time). The fundamental diagram depends on the specific road segment and its characteristics, as for example the maximal speed and the number of lanes.
As shown in Fig. 20, the fundamental diagram is defined as two polynomials (indicated by the solid lines), separated by the critical density Kc. Each part of the diagram is characterized by a standard deviation (indicated by the dashed lines). One way to derive the fundamental diagram from the density and flow data is by applying the following procedure (it is noted that, depending on the data available, some processing may be needed to derive the density from the occupancy measure):
The fundamental diagram is approximated with two polynomial functions divided by the critical density. One example is to have two linear approximations. First, an initial value of the critical density is generated: the critical density is initially defined as the density corresponding to the highest flow. For each half interval defined by Kc, the approximation is derived using regression. An iterative process can then start by changing the value of Kc by a positive and a negative amount and computing the approximation error for each. The value of Kc is updated towards the value that minimizes the approximation error. Finally, the value of the standard deviation is computed for each half interval. These values are used to define the interval of confidence. Fig. 20 shows an example fundamental diagram with its two linear approximations.
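The procedure can be sketched for the case of two linear approximations as follows. The sketch uses ordinary least squares on each half interval and the sum of squared residuals as approximation error; the function names, the step size, the stopping rule and the synthetic data are assumptions of this sketch.

```python
def fit_line(points):
    """Least-squares line q = a + b * k through (density, flow) points."""
    n = len(points)
    mk = sum(k for k, _ in points) / n
    mq = sum(q for _, q in points) / n
    skk = sum((k - mk) ** 2 for k, _ in points)
    b = sum((k - mk) * (q - mq) for k, q in points) / skk if skk else 0.0
    a = mq - b * mk
    return a, b, [q - (a + b * k) for k, q in points]


def split_fit(points, kc):
    """Fit one line on each side of kc; return (error, left fit, right fit)."""
    left = [p for p in points if p[0] <= kc]
    right = [p for p in points if p[0] > kc]
    if len(left) < 2 or len(right) < 2:
        return None  # not enough points for a regression on one side
    lfit, rfit = fit_line(left), fit_line(right)
    sse = sum(r * r for r in lfit[2]) + sum(r * r for r in rfit[2])
    return sse, lfit, rfit


def estimate_fd(points, step=1.0, max_iter=100):
    """Estimate the critical density and the two linear branches."""
    kc = max(points, key=lambda p: p[1])[0]  # density at the highest flow
    best = split_fit(points, kc)
    if best is None:
        raise ValueError("not enough points on one side of the critical density")
    for _ in range(max_iter):
        trials = [(split_fit(points, c), c) for c in (kc - step, kc + step)]
        trials = [(f, c) for f, c in trials if f is not None and f[0] < best[0]]
        if not trials:
            break  # neither direction reduces the approximation error
        best, kc = min(trials, key=lambda fc: fc[0][0])
    _, lfit, rfit = best

    def std(residuals):
        return (sum(r * r for r in residuals) / len(residuals)) ** 0.5

    return {"kc": kc, "left": lfit[:2], "right": rfit[:2],
            "sigma_left": std(lfit[2]), "sigma_right": std(rfit[2])}


# Synthetic triangular diagram: flow rises with slope 2 up to density 30, then falls with slope -1.
samples = [(k, 2.0 * k if k <= 30 else 90.0 - k) for k in range(0, 85, 5)]
fd = estimate_fd(samples, step=5.0)
```

The two standard deviations returned at the end play the role of the confidence interval on each side of the critical density.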
Once the fundamental diagram is estimated from past data, the current measure under analysis is compared to the diagram and a quality indicator IL is generated based on the weighted distance d between the diagram and the measure as follows:
$$I_L = 2\Pr(d) = 2\,\psi(d \mid \mu, \sigma_{RIGHT/LEFT})$$
where σ_RIGHT/LEFT is the standard deviation of the error as estimated by the procedure either on the left or right side of the critical density.
Further quality indicators (IA and IC in Fig. 4):
The first step is to extract the quality of the data of a single section. The quality of a single basic time step and a time interval can be extracted by the quality indicators described above. The quality of a specific data with respect to the data in a window can also be considered.
For instance, the indicator of a basic time step may be defined based on the output of the statistical tool. As shown in Fig. 21, the input of the statistical tool is the data of a single time step. The outputs are μ_b and σ_b, which are the mean and standard deviation of the means of resamples of the input. The quality indicator is defined as follows:
When the quality of the data and the quality of the mean is high, it means that the value of the standard deviation is low. Therefore the quality indicator has a value close to one which represents high quality.
Still another quality indicator could be based on a time window consistency check. Such time window quality indicator measures the quality of a specific data, d, compared to other data in a neighborhood (window) around it. As shown in Fig. 22, the input of the statistical tool is the data of the window, excluding the data under consideration, and the outputs are the mean and standard deviation of the resamples of input. The output leads us to the distribution of the data and the quality indicator is defined as:
Figure imgf000019_0001
Similar to the above time window quality indicator for a single section, another quality indicator can be created by considering the projection of the traffic measure, for example the flow, to the following or preceding sensor and verifying its value with respect to the section statistics. The projection is computed using a model for the evolution of the traffic flow in time and space. The simplest model is constant speed propagation. An alternative method is to use multi-modal constant speed propagation, where the traffic is divided into classes of different speed and the flow or number of vehicles for each class is propagated according to the specific class velocity.
Furthermore, based on the time related quality indicators described above, more compact quality indicators can be defined. For instance, according to one embodiment hysteresis thresholds may be defined that allow raising an alarm only if the corresponding quality indicator drops below a predefined lower threshold for a predefined time period that is sufficiently long, and the alarm is closed only if the quality indicator is over the threshold for a sufficiently long time period, as illustrated in Fig. 23. Still further, an alarm probability can be defined based on the percentage of time the corresponding quality indicator is below a specific threshold.
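A minimal sketch of the alarm activation with hysteresis and of the alarm probability is given below. The parameter names `low`, `high` and `hold`, the use of a count of consecutive samples as the required time period and the sample values are assumptions of this sketch.

```python
def alarm_with_hysteresis(values, low, high, hold):
    """Return the alarm state after each indicator sample.

    The alarm is raised once the indicator has stayed below `low` for `hold`
    consecutive samples, and closed once it has stayed above `high` for
    `hold` consecutive samples.
    """
    state, below, above = False, 0, 0
    states = []
    for v in values:
        below = below + 1 if v < low else 0
        above = above + 1 if v > high else 0
        if not state and below >= hold:
            state = True
        elif state and above >= hold:
            state = False
        states.append(state)
    return states


def alarm_probability(values, threshold):
    """Fraction of samples in which the indicator is below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical indicator values: short dips do not raise the alarm, a long dip does.
indicator = [0.9, 0.3, 0.9, 0.3, 0.3, 0.3, 0.9, 0.3, 0.9, 0.9, 0.9]
states = alarm_with_hysteresis(indicator, low=0.4, high=0.6, hold=3)
probability = alarm_probability(indicator, 0.4)
```

In the example the single low sample at the second position is ignored, the alarm is raised at the third consecutive low sample, and it is closed only after three consecutive samples above the upper threshold.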
In an open data market scenario, actors shall exchange information on the quality of the data and the method used for its computation. Based on the above quality indicator calculation, the information that may be exchanged among different entities together with the quality indication can include, without being limited thereto:
1) Detailed time step indicators (indicator that refers to the current time step)
2) Indicators for the last closed time interval (time interval consists of several time steps)
3) Indicators for the whole period up to the current time step: typically the period of observation is restarted at the midnight of the previous day, but can also be a window of 24 hours
4) Method of computation: indicators defined in this invention include time step, interval, correlation based and fundamental diagram consistency check
5) Parameters of the computation, such as the size of the time window, the presence and size of the pre-processing, and the size of the time interval.
Performance
In order to verify the defined quality indicators, data from a highway system is used. The area of study is a road stretch where the length of each section is 500 meters. The road includes 7 lanes. Data is available for every minute of each lane for every day of a week excluding the weekend. Before computing the quality indicators, the data per lane is aggregated for each direction. When data is not available it is substituted with an invalid value (negative value). Figs. 24-32 show the quality of flow extracted by the quality indicators defined above. The number of iterations of the statistical tool is 1000 for all figures. In the verification, the input data has been pre-processed by a filter of k=10 samples in order to smooth the data; k changes according to the amount of available data. In general, the coefficients of the smoothing filter are 1/(1 + k).
In Fig. 24, the quality of each minute of data along the days of one week is shown for two different sections. The quality is extracted according to the indicator IA. The figure clearly shows that the amount of unavailable data affects the quality of the data.
The time interval data quality indicator is represented in Fig. 25 for the same sections shown in Fig. 21. The value of the fluctuation quality indicator IF is plotted in the two diagrams in the second row. The value of the quality indicator IB is shown in the top row. Each time interval is 20 minutes. A mask is defined based on 5 other sections and the quality of the section is extracted here.
At the top of Fig. 26 the quality of the data of the third day of the week with respect to the other days of the week is depicted. The quality is calculated based on quality indicator IC. The size of the window is 5, since the data is not available for the weekend. The flow of the third day is shown in the second row and the average flow of the other days of the week is represented at the bottom. The quality of the data is extracted for each minute, as can be seen at the top of Fig. 26.
Fig. 27 shows the correlation between two sections in a row. The vector of the first section is one hour of data, 10:00 - 11:00 am, and the time delay is assumed to be in [-10, 10] minutes. The correlation is computed for the last day of the week and for the average data of the week.
In Fig. 28, the quality of the data for each 10 minutes is plotted. The upper diagram depicts the quality of the data of the last day of the week and of the average data of the week based on the fuzzy logic A. It can be seen that in some parts the last day has higher or lower quality compared to the average data. The lower diagram shows the quality of the last day compared to the average of the week, which is extracted based on the fuzzy logic B.
Figs. 29 and 30 show two different approximations of the fundamental diagram. These diagrams are then used to check the consistency of new data.
With regard to the fundamental diagrams, Fig. 31 shows the fundamental diagram estimated in one training day, while Fig. 32 shows its use in the following test day. The error is reported. For the analyzed section the error level is almost the same between the two days.
Many modifications and other embodiments of the invention set forth herein will come to mind to one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. Method for assessing the quality of ITS related traffic data, comprising: determining samples of traffic data collected for a particular type of traffic measure,
for a particular sample of traffic data under analysis, defining a set of quality indicators Ix that assess said sample with respect to different aspects,
for each of said quality indicators Ix, calculating a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample by means of applying different analyses tools for each of said quality indicators Ix.
2. Method according to claim 1, wherein said set of quality indicators Ix includes an indicator - corridor correlation indicator - that assesses the data quality for two or more sections by analyzing the correlation of traffic data obtained by measurement points or areas located at said sections.
3. Method according to claim 2, wherein said corridor correlation indicator value is calculated based on logical fuzzy rule sets which compare the sample of traffic data under analysis with historical data samples based on variables derived from the correlation function shape.
4. Method according to claim 3, wherein said variables derived from the correlation function shape include the maximum correlation value and the closest local minimum correlation values, the delay at the maximum correlation value and, based on these values, an angle and an area at the maximum correlation.
5. Method according to any of claims 1 to 4, wherein said set of quality indicators Ix includes an indicator - time interval indicator - that assesses the fluctuation trend of traffic data in a single measurement point or area.
6. Method according to claim 5, wherein said time interval indicator determination includes the step of applying random subsampling of said traffic data and the step of computing the fluctuation trend by measuring the variation of the means of said subsamples.
7. Method according to claim 5 or 6, wherein said time interval indicator determination includes the step of identifying anomalous deviations by using the underlying estimated probability distribution of a reference mask that analyses the data on a time interval.
8. Method according to any of claims 1 to 7, wherein said set of quality indicators lx includes an indicator - fundamental diagram indicator - that assesses data quality based on fundamental diagram profiles by determining deviations of a sample of traffic data under analysis with respect to an estimated historical fundamental diagram.
9. Method according to any of claims 1 to 8, wherein said set of quality indicators lx includes an indicator that assesses traffic data quality in a single measurement point or area by identifying anomalous deviations of an historical profile of a single time instant by using the underlying estimated probability distribution.
10. Method according to any of claims 1 to 9, wherein said set of quality indicators lx includes an indicator that assesses traffic data quality in a specific time period with respect to temporally close data (window) by evaluating the deviation to the underlying estimated probability distribution.
11. Method according to any of claims 1 to 10, wherein a probability distribution for the quantification of the quality of data is evaluated based on the histogram of the samples.
12. Method according to any of claims 1 to 11, wherein a probability distribution for the quantification of the quality of data is evaluated via the solution of a maximum likelihood problem.
13. Method according to any of claims 1 to 12, wherein a quality indicator is derived by combining a quality indicator determined for historical data and a quality indicator determined for a current data sample based on the correlation of two or more sections.
14. Method according to any of claims 1 to 13, wherein a compact quality indicator is defined based on higher granularity quality indicators.
15. System for assessing the quality of ITS related traffic data, comprising: computation means for determining samples of traffic data collected for a particular type of traffic measure, and for defining, for a particular sample of traffic data under analysis, a set of quality indicators lx that assess said sample with respect to different aspects, and
a number of different analyses tools for each of said quality indicators lx, wherein said analyses tools are configured to calculate, for each of said quality indicators lx, a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample.
16. System according to claim 15, wherein at least one of said analyses tools comprises a fuzzy system that is configured to be applied with a set of fuzzy membership functions on correlation parameters derived from said traffic data.
17. System according to claim 15 or 16, wherein one fuzzy system is configured to extract the quality of specific data compared to historical data by receiving as input the differences of correlation variables derived from the respective data.
18. System according to any of claims 15 or 17, wherein at least one of said analyses tools comprises a statistical tool that is configured to quantify the fluctuation trend of the input data.
19. System according to claim 18, wherein said statistical tool is configured to derive as output the mean and standard deviation of the means of resamples of the input data.
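
As an illustration of the correlation-shape variables recited in claim 4, the following is a minimal Python sketch, not the applicant's implementation. It computes a lagged Pearson correlation between the series of two road sections and extracts the maximum correlation value, the delay at that maximum and the closest local minima on either side. The function names, the use of Pearson correlation and the `max_lag` parameter are assumptions made for the example; the angle and area at the maximum are omitted because the claims do not define how they are computed.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_function(upstream, downstream, max_lag):
    """Map each lag in [-max_lag, max_lag] to corr(upstream[t], downstream[t + lag])."""
    n = min(len(upstream), len(downstream))
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = upstream[:n - lag], downstream[lag:n]
        else:
            x, y = upstream[-lag:n], downstream[:n + lag]
        corr[lag] = pearson(x, y)
    return corr

def shape_variables(corr):
    """Extract maximum, delay at maximum and the closest minima from a correlation function."""
    lags = sorted(corr)
    values = [corr[lag] for lag in lags]
    i_max = max(range(len(values)), key=values.__getitem__)
    # Walk downhill on each side of the peak; the walk stops at the first
    # local minimum, or at the edge of the lag range if none is found.
    left = i_max
    while left > 0 and values[left - 1] < values[left]:
        left -= 1
    right = i_max
    while right < len(values) - 1 and values[right + 1] < values[right]:
        right += 1
    return {
        "max_corr": values[i_max],
        "delay": lags[i_max],
        "left_min_lag": lags[left],
        "left_min": values[left],
        "right_min_lag": lags[right],
        "right_min": values[right],
    }
```

Under these assumptions, the differences between such variables for a current sample and for historical samples could serve as inputs to the fuzzy rule sets mentioned in claims 3 and 17.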
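
As an illustration of the resampling step recited in claims 6 and 19, the following is a minimal Python sketch, not the applicant's implementation. It draws random subsamples of a traffic measurement series and reports the mean and standard deviation of the subsample means as a measure of the fluctuation trend. The function name `fluctuation_trend`, the number of subsamples, the default subsample size and the fixed seed are assumptions made for the example.

```python
import random
import statistics

def fluctuation_trend(samples, n_subsamples=200, subsample_size=None, seed=0):
    """Return (mean, standard deviation) of the means of random subsamples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)  # fixed seed (assumption) so results are repeatable
    # Default subsample size: half of the data (assumption, not from the claims).
    k = subsample_size or max(1, len(samples) // 2)
    # Each subsample is drawn without replacement from the input data.
    means = [statistics.fmean(rng.sample(samples, k)) for _ in range(n_subsamples)]
    return statistics.fmean(means), statistics.pstdev(means)
```

A larger standard deviation of the subsample means indicates stronger fluctuation of the data at the measurement point; how that value is mapped to a quality indicator is left open here.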
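
As an illustration of the histogram-based probability distribution recited in claim 11, the following is a minimal Python sketch, not the applicant's implementation. It estimates a distribution from historical samples by means of a normalized histogram and scores a new value by the relative frequency of its bin. The names `Histogram`, `build_histogram` and `histogram_quality`, the number of bins and the scaling of the score to the range 0 to 1 are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Histogram:
    lo: float   # smallest historical value
    hi: float   # largest historical value
    pmf: list   # relative frequency of each equal-width bin

def build_histogram(history, n_bins=10):
    """Estimate a discrete probability distribution from historical samples."""
    lo, hi = min(history), max(history)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in history:
        # The largest value falls on the upper edge, so clamp it into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return Histogram(lo, hi, [c / len(history) for c in counts])

def histogram_quality(value, hist):
    """Score in [0, 1]: bin probability of `value` relative to the most likely bin."""
    if value < hist.lo or value > hist.hi:
        return 0.0  # outside the historical range: treated as anomalous
    n_bins = len(hist.pmf)
    width = (hist.hi - hist.lo) / n_bins or 1.0
    idx = min(int((value - hist.lo) / width), n_bins - 1)
    return hist.pmf[idx] / max(hist.pmf)
```

For example, with historical speeds clustered around 60, a new reading of 60 would receive the highest score and a reading far outside the historical range would receive a score of zero.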

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2013/069551 WO2015039693A1 (en) 2013-09-20 2013-09-20 Method and system for data quality assessment

Publications (1)

Publication Number Publication Date
WO2015039693A1 (en) 2015-03-26

Family

ID=49474372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/069551 WO2015039693A1 (en) 2013-09-20 2013-09-20 Method and system for data quality assessment

Country Status (1)

Country Link
WO (1) WO2015039693A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339478A (en) * 2020-02-28 2020-06-26 大连大学 Weather data quality evaluation method based on improved fuzzy analytic hierarchy process
CN114466393A (en) * 2022-04-13 2022-05-10 深圳市永达电子信息股份有限公司 Rail transit vehicle-ground communication potential risk monitoring method and system
CN116824867A (en) * 2023-08-30 2023-09-29 山东华夏高科信息股份有限公司 Multi-source highway facility data signal optimization collection processing method
GB2619325A (en) * 2022-05-31 2023-12-06 Canon Kk Perception service test mode in intelligent transport systems
CN118133048A (en) * 2024-05-07 2024-06-04 临沂大学 College student physique test data acquisition method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102004039283A1 (en) * 2004-08-13 2006-03-02 Daimlerchrysler Ag Forecasting journey time in road network, by taking into account time-space associations and/or patterns when selecting proportion of measured data as predicted parameter
US20090080973A1 (en) * 2007-09-25 2009-03-26 Traffic.Com, Inc. Estimation of Actual Conditions of a Roadway Segment by Weighting Roadway Condition Data with the Quality of the Roadway Condition Data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BATTELLE: "TRAFFIC DATA QUALITY MEASUREMENT", 15 September 2004 (2004-09-15), XP055121929, Retrieved from the Internet <URL:http://isddc.dot.gov/OLPFiles/FHWA/013402.pdf> [retrieved on 20140605] *
C. CHEN; J. KWON; J. RICE; A. SKABARDONIS; P. VARAIYA: "Detecting Errors and Imputing Missing Data for Single Loop Surveillance Systems", 82ND ANNUAL MEETING TRANSPORTATION RESEARCH BOARD, January 2003 (2003-01-01)
NAN DING ET AL: "Distributed Algorithm for Traffic Data Collection and Data Quality Analysis Based on Wireless Sensor Networks", INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, vol. 11, no. 1, 1 January 2011 (2011-01-01), pages 1 - 9, XP055121886, ISSN: 1550-1329, DOI: 10.1155/2011/717208 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339478A (en) * 2020-02-28 2020-06-26 大连大学 Weather data quality evaluation method based on improved fuzzy analytic hierarchy process
CN111339478B (en) * 2020-02-28 2023-06-09 大连大学 Meteorological data quality assessment method based on improved fuzzy analytic hierarchy process
CN114466393A (en) * 2022-04-13 2022-05-10 深圳市永达电子信息股份有限公司 Rail transit vehicle-ground communication potential risk monitoring method and system
CN114466393B (en) * 2022-04-13 2022-07-12 深圳市永达电子信息股份有限公司 Rail transit vehicle-ground communication potential risk monitoring method and system
GB2619325A (en) * 2022-05-31 2023-12-06 Canon Kk Perception service test mode in intelligent transport systems
CN116824867A (en) * 2023-08-30 2023-09-29 山东华夏高科信息股份有限公司 Multi-source highway facility data signal optimization collection processing method
CN116824867B (en) * 2023-08-30 2023-11-17 山东华夏高科信息股份有限公司 Multi-source highway facility data signal optimization collection processing method
CN118133048A (en) * 2024-05-07 2024-06-04 临沂大学 College student physique test data acquisition method and system
CN118133048B (en) * 2024-05-07 2024-07-09 临沂大学 College student physique test data acquisition method and system

Similar Documents

Publication Publication Date Title
Sarmadi et al. Bridge health monitoring in environmental variability by new clustering and threshold estimation methods
Dervilis et al. On robust regression analysis as a means of exploring environmental and operational conditions for SHM data
CN109186813A (en) A kind of temperature sensor self-checking unit and method
Wang et al. Bayesian modeling of external corrosion in underground pipelines based on the integration of Markov chain Monte Carlo techniques and clustered inspection data
US9146800B2 (en) Method for detecting anomalies in a time series data with trajectory and stochastic components
CN111460392B (en) Magnetic suspension train and suspension system fault detection method and system thereof
WO2015039693A1 (en) Method and system for data quality assessment
CN105931458B (en) A kind of method of road traffic flow detection device reliability assessment
KR20190065015A (en) Support method for responding to stream disaster, and support system for responding to stream disaster
Quiñones-Grueiro et al. An unsupervised approach to leak detection and location in water distribution networks
Kim et al. Long-term bridge health monitoring and performance assessment based on a Bayesian approach
CN116308305B (en) Bridge health monitoring data management system
Zheng et al. Travel time reliability for urban networks: modelling and empirics
Moghaddam et al. Evaluating the performance of algorithms for the detection of travel time outliers
Chen et al. A combination model for evaluating deformation regional characteristics of arch dams using time series clustering and residual correction
Mollineaux et al. Structural health monitoring of progressive damage
Scalabrin et al. A Bayesian forecasting and anomaly detection framework for vehicular monitoring networks
He et al. Link dynamic vehicle count estimation based on travel time distribution using license plate recognition data
Widhalm et al. Identifying faulty traffic detectors with Floating Car Data
Canepa et al. A dual model/artificial neural network framework for privacy analysis in traffic monitoring systems
Kachroo et al. Model-based methodology for validation of traffic flow detectors by minimizing human bias in video data processing
Du et al. Fault-tolerant control of variable speed limits for freeway work zone with recurrent sensor faults
KR101939446B1 (en) Method and system for determining homogeneity of traffic condition between point detector and section detector data
Richardson et al. Network stratification method by travel time variation
TWI484353B (en) Methods and Systems for Calculating Random Errors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13780078

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13780078

Country of ref document: EP

Kind code of ref document: A1