WO2015039693A1

WO2015039693A1 - Method and system for data quality assessment

Info

Publication number: WO2015039693A1
Application number: PCT/EP2013/069551
Authority: WO
Inventors: Francesco ALESIANI; Mahsa FAIZRAHNEMOON
Original assignee: Nec Europe Ltd.
Priority date: 2013-09-20
Filing date: 2013-09-20
Publication date: 2015-03-26

Abstract

A method for assessing the quality of ITS related traffic data is disclosed that comprises determining samples of traffic data collected for a particular type of traffic measure; for a particular sample of traffic data under analysis, defining a set of quality indicators I_x that assess said sample with respect to different aspects; and, for each of said quality indicators I_x, calculating a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample by means of applying different analysis tools for each of said quality indicators I_x. Furthermore, a corresponding system for assessing the quality of ITS related traffic data is described.

Description

METHOD AND SYSTEM FOR DATA QUALITY ASSESSMENT

The present invention relates to a method and system for assessing the quality of ITS related traffic data.

Intelligent Transport Systems (ITS) are widely used to improve the utilization and increase the safety of transport systems. The performance of these systems depends on the information they use and on its reliability. ITS have evolved from single-operator systems to open-market systems, where information provision and use are implemented by different actors.

In either case, whether the system is closely managed (as illustrated in Fig. 1, where the concept is illustrated for a single measurement point "A" that collects traffic data for a particular section of a road) or the information is exchanged by different entities (as illustrated in Fig. 2), the quality of the data shall be assessed in order to provide a reliable system. Generally speaking, "quality" defines the extent to which the provided quantity meets end-user requirements.

Data quality can be assessed against different criteria (for instance accuracy, completeness, ...) since the end-user may have multiple expectations. Some quality criteria can be considered at the design/implementation phase of a road traffic measuring system and for its nominal behavior, while others shall be evaluated at runtime or on a regular basis. Moreover, some quality criteria may be evaluated against a reference system, either in a closed testing facility or by performing a dedicated onsite campaign.

In modern ITS, accuracy is increasingly important and the performance of the system is directly connected to the quality of the provided data. A degradation of the quality of the data is typically not followed by a graceful degradation of the ITS, since small errors may induce large effects. An indication of the reliability of the provided data is crucial in this case. Further, during operation of a road traffic measuring system it may happen that sensors of the measurement points or some other parts of the whole system do not function at their nominal levels. For the above reasons, it is important to have an online assessment technology available. In this regard, a manual check of the status of the sensors is a typical approach which, however, is lengthy and costly. Further, it does not scale with the size of the system and it may hinder the use of more advanced ITS.

In: C. Chen, J. Kwon, J. Rice, A. Skabardonis, P. Varaiya, "Detecting Errors and Imputing Missing Data for Single Loop Surveillance Systems", 82nd Annual Meeting Transportation Research Board, January 2003, Washington, D.C., it is proposed to provide online indicators that include checks for some unrealistic conditions like, for instance, nonzero measures with zero occupancy, nonzero flow with zero density, or samples with high occupancy, especially with nonzero flow. However, for many applications the accuracy and comprehensiveness of this solution are not sufficient.

In view of the above it is an objective of the present invention to improve and further develop a method and a system for assessing the quality of ITS related traffic data in such a way that an enhancement in terms of comprehensiveness and predictive quality is achieved.

In accordance with the invention, the aforementioned object is accomplished by a method comprising the features of claim 1. According to this claim such a method comprises

determining samples of traffic data collected for a particular type of traffic measure,

for a particular sample of traffic data under analysis, defining a set of quality indicators I_x that assess said sample with respect to different aspects,

for each of said quality indicators I_x, calculating a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample by means of applying different analysis tools for each of said quality indicators I_x.

Furthermore, the above mentioned objective is accomplished by a system comprising the features of claim 15. According to this claim such a system comprises computation means for determining samples of traffic data collected for a particular type of traffic measure, and for defining, for a particular sample of traffic data under analysis, a set of quality indicators I_x that assess said sample with respect to different aspects, and

a number of different analysis tools for each of said quality indicators I_x, wherein said analysis tools are configured to calculate, for each of said quality indicators I_x, a quality indicator value by evaluating the consistency and/or deviation of said sample with respect to a spatial or temporal neighbor sample.

Preferred embodiments of the invention are specified in the dependent claims.

According to the invention a mechanism is identified that allows evaluating the quality of traffic data provided by a traffic measurement system or a data provider to be used in the context of ITS systems. Traffic data quality is relevant for safety reasons and for exploitation aspects, and providing data quality associated with the actual data allows improving value of the data itself. The present invention addresses the problem by defining a set of quality indicators that allow evaluating different aspects of the traffic data. Embodiments of the invention allow evaluating, with predictive capabilities, the current and trend of the reliability of each sensor of a traffic measurement system by exploiting information on the previous measure of the same sensor, the underlying physical phenomenon characteristics and/or the relationship with other measurement sites. Furthermore, embodiments of the invention address the online computation of confidence and/or reliability quality indicators of the provided data based on some consistency or rate of variance tests. Still further, embodiments of the invention use subsampling techniques and compute some structure in the data that can then be used to measure consistency in the data on different road sections or in different time intervals.

By defining a set of different quality indicators that assess data with respect to different aspects and attitudes, comprehensive measurements of the data quality for ITS systems are possible, resulting in the ability to identify more complex failure patterns, to effectively plan maintenance activities for the individual sensors of the traffic measurement system and, generally, to implement more reliable ITS systems. Embodiments of the invention allow measurements in particular with respect to the following aspects:

Accuracy (which is the main focus of the present invention): how close the current measure is to the real underlying physical phenomenon

Completeness: the presence of missing data

Validity: the period of measurement or validity of the provided data

Timeliness: the delay introduced between the physical phenomenon and the availability of the data

Coverage: the spatial availability of the data or the portion of the road that is monitored

Accessibility: the level at which the data is made available

There are several ways how to design and further develop the teaching of the present invention in an advantageous way. To this end it is to be referred to the patent claims subordinate to patent claims 1 and 15 on the one hand and to the following explanation of preferred embodiments of the invention by way of example, illustrated by the drawing on the other hand. In connection with the explanation of the preferred embodiments of the invention by the aid of the drawing, generally preferred embodiments and further developments of the teaching will be explained. In the drawing

Fig. 1 is a schematic view illustrating the basic building blocks of a single-operator road traffic measuring system,

Fig. 2 is a schematic view illustrating the basic building blocks of a road traffic measuring system according to an open data market scenario,

Fig. 3 is a schematic view illustrating an indicator function block,

Fig. 4 is a quality indicator block diagram in accordance with an embodiment of the present invention,

Fig. 5 is a schematic view illustrating a scenario of section measures,

Fig. 6 is a schematic view illustrating a scenario of corridor measures,

Fig. 7 is a schematic view illustrating a scenario of network measures,

Fig. 8 is a diagram illustrating extraction of correlation variables from correlation function shape,

Fig. 9 is a schematic view illustrating fuzzy systems used in connection with corridor related quality indicators,

Fig. 10 is a schematic view illustrating membership functions employed in connection with the fuzzy systems of Fig. 9,

Fig. 11 is a schematic view illustrating a first configuration of combined fuzzy systems used in connection with corridor related quality indicators,

Fig. 12 is a schematic view illustrating a second configuration of combined fuzzy systems used in connection with corridor related quality indicators,

Fig. 13 is a schematic view illustrating three membership functions of a first type used in connection with corridor related quality indicators,

Fig. 14 is a schematic view illustrating three membership functions of a second type used in connection with corridor related quality indicators,

Fig. 15 is a schematic view illustrating three membership functions of a third type used in connection with corridor related quality indicators,

Fig. 16 is a schematic view illustrating the working principle of a statistical tool,

Fig. 17 is a schematic view illustrating the processing for determining a fluctuation quality indicator for a time interval,

Fig. 18 is a diagram illustrating the process of reference mask creation for evaluating a time interval quality indicator,

Fig. 19 is a diagram illustrating the process of estimation of an unsymmetrical probability distribution,

Fig. 20 is a diagram illustrating fundamental diagram estimation,

Fig. 21 is a schematic view illustrating the processing for determining a time interval quality indicator for a section,

Fig. 22 is a schematic view illustrating the processing for determining a time window consistency quality indicator for a section,

Fig. 23 is a diagram schematically illustrating alarm activation with hysteresis,

Fig. 24 is a diagram illustrating the performance of quality indicator I_A,

Fig. 25 is a diagram illustrating the performance of quality indicator I_B,

Fig. 26 is a diagram illustrating the performance of quality indicator I_C,

Fig. 27 is a diagram illustrating correlation between consecutive sections,

Fig. 28 is a diagram illustrating different fuzzy system quality indicators,

Fig. 29 is a diagram illustrating fundamental diagram estimation with two linear functions,

Fig. 30 is a diagram illustrating fundamental diagram approximation with a polynomial function,

Fig. 31 is a diagram illustrating the performance of a fundamental diagram check for a test day, and

Fig. 32 is a diagram illustrating the performance of a fundamental diagram check for a training day.

In the following description of preferred embodiments of the present invention, it is a key aspect to note that a single quality indicator alone cannot capture the overall quality of ITS related traffic data. For this reason, in accordance with embodiments of the invention a number of different quality indicators are combined to comprehensively evaluate the quality of traffic data. In other words, the present invention addresses the problem of computing multiple different indicators of the quality of road traffic data (resulting either from the direct measurement of traffic data or from elaborated traffic data), wherein each of the applied indicators evaluates the data according to different aspects and attitudes.

An indicator is a real number that provides information about the reliability or an assessment of the quality of the collected data. Hereinafter, quality indicators are assumed to be normalized to the interval between 0 and 1, i.e. each indicator provides a value in [0, 1], wherein a higher value of the indicator means higher quality of data. As will be appreciated by those skilled in the art, different kinds of normalization are also possible. As shown in Fig. 3, in order to calculate quality indicator values a data quality assessment apparatus is provided that takes as input traffic data collected over a long period and that provides as a result the quality indicator values.

In accordance with the present invention a set of different quality indicators is defined that allows evaluating different aspects of the traffic data. Fig. 4 is a quality indicator block diagram related to an embodiment of the invention, in which a total of 6 quality indicators is determined (together with two intermediate quality indicators, as will be explained in more detail below). As can be seen from Fig. 4, the quality indicators are associated with different analysis tools, e.g. statistical tools or fuzzy logic. Furthermore, as indicated in the left column of the diagram, before the respective analysis tool is applied, the traffic data is organized or grouped in a way that is specific to each of the quality indicators.

It is noted that in some cases it may prove useful to add a pre-processing step to the system in order to avoid problems related to measures taken over a time period too short for performing reliable analysis, especially in single-operator systems. According to a preferred embodiment, data is then filtered with nearby samples using an averaging filter as follows:

$\hat{y}(t) = \sum_{\tau=-K_1}^{K_2} \alpha_\tau \, x(t - \tau)$

where α_τ, K_1, K_2 are configuration parameters.
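The averaging filter can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: it assumes that the sum runs over offsets τ from -K_1 to K_2, and the function name `smooth` and the clamping of indices at the borders of the series are choices made here for the example only.

```python
import numpy as np

def smooth(x, alpha, k1, k2):
    """Weighted moving average: y(t) = sum over tau in [-k1, k2] of alpha[tau] * x(t - tau).

    alpha holds k1 + k2 + 1 weights, ordered from tau = -k1 to tau = k2.
    Border handling (index clamping) is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if alpha.size != k1 + k2 + 1:
        raise ValueError("alpha must contain k1 + k2 + 1 weights")
    n = x.size
    y = np.zeros(n)
    for t in range(n):
        for j, tau in enumerate(range(-k1, k2 + 1)):
            idx = min(max(t - tau, 0), n - 1)  # clamp to the valid index range
            y[t] += alpha[j] * x[idx]
    return y

# Uniform weights over a window of three samples.
smoothed = smooth([0.0, 3.0, 0.0], [1 / 3, 1 / 3, 1 / 3], k1=1, k2=1)
```

With uniform weights the filter reduces to a plain moving average; other weight profiles can be passed through `alpha`.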

The different ways of data organization/preprocessing and analysis tool application will be explained hereinafter in detail for each of the quality indicators shown in Fig. 4. As will be appreciated by those skilled in the art, other quality indicators than those defined in connection with the embodiment of Fig. 4 can be defined.

Corridor related quality indicators (I_G and I_H in Fig. 4):

A first embodiment deals with the analysis of the data quality based on the observation of two or more road sections. It is noted that, generally, data taken from different measure sites can be considered within data quality assessment procedures. For instance, measure configuration can be classified in the following categories:

1) Section measures: This kind of measure refers to a point along the road, as illustrated by way of example in Fig. 5 (measurement point "A").

2) Corridor measures: This kind of measure refers to multiple measures that are spatially related. For instance, in highway scenarios there exist two configurations, with or without merging (entering/exiting) flows. For other scenarios (non-highway roads) there may be multiple, sparse merging areas, as illustrated in Fig. 6 (measurement points "A" and "B").

3) Network measures: This kind of measure is the most complex configuration and may include multiple merging points and traffic signal control systems, as illustrated in Fig. 7 (measurement points "A"-"E").

The indicator in the case described here is based on the correlation of two road sections, wherein the correlation is defined as

In the above formula, t denotes the basic time step at which measurements are taken, e.g. every 30 seconds; x refers to the measurements taken by a first measurement point A, while y refers to the measurements taken by a second (neighboring) measurement point B. Specifically, x = y_A(t, d) and y = y_B(t + τ, d), wherein τ is the delay between the two sections. The day whose data is under consideration is represented by d, and n is the length of the vector of data, i.e. n gives the number of measurements contained within the data sample that is being analyzed.
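As an illustration of how a correlation between two sections can be evaluated as a function of the delay τ, the following Python sketch may be helpful. It assumes a Pearson (normalized) correlation coefficient, which is one common choice and is not necessarily the exact definition used in the embodiment; the function names `delayed_correlation` and `correlation_curve` are hypothetical.

```python
import numpy as np

def delayed_correlation(x_a, y_b, tau):
    """Pearson correlation between x(t) = x_a[t] and y(t) = y_b[t + tau], for tau >= 0.

    Assumption: Pearson normalization; the source formula may differ.
    """
    x_a = np.asarray(x_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    n = min(x_a.size, y_b.size) - tau  # number of overlapping measurements
    if n < 2:
        raise ValueError("not enough overlapping samples for this delay")
    x = x_a[:n]
    y = y_b[tau:tau + n]
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)

def correlation_curve(x_a, y_b, max_tau):
    """Correlation for every delay 0..max_tau (in basic time steps)."""
    return np.array([delayed_correlation(x_a, y_b, tau) for tau in range(max_tau + 1)])

# Synthetic example: section B sees the same signal as section A, 3 steps later.
rng = np.random.default_rng(0)
base = rng.normal(size=110)
y_b = base[:100]
x_a = base[3:103]
curve = correlation_curve(x_a, y_b, max_tau=10)
```

In this synthetic case the curve peaks at a delay of 3 time steps, which is the shift built into the data.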

It is noted that the measurements x and y, respectively, taken by the two measurement points A and B, respectively, can be related to any specific traffic measures type. Specifically, the following types of traffic measure can be considered:

1) Flow or volume: the total number of vehicles passing a point on the road over a given interval of time. The road itself can be a lane or the whole carriageway. Since the flow is directional information, it is normally differentiated between the two directions of movement of a road

2) Speed: the distribution or the mean of the speed of vehicles passing a defined section of a road in a specific time interval

3) Occupancy: the percentage of time a roadway detection zone is occupied by vehicles or the sensor detects the presence of a vehicle

4) Travel time: the distribution or mean of the measured travel time that vehicles take to traverse a specific section of a road

5) Density: the distribution or mean of the number of vehicles in a specific section of a road

6) Delay: the difference between the travel time in free flow or at a maximum allowed speed and the actual travel time

7) Queue length: the length or number of vehicles with speed under a specific threshold, indicating that the vehicles are waiting and cannot proceed further.

Information presented can be also classified by type of vehicle, where type of vehicle may refer to the length, weight class or any other characteristics of the vehicle, including the type of use.

Turning back to the calculation of the corridor related quality indicators, based on the correlation of two sections A and B, the following information is derived, as schematically shown in Fig. 8:

• The maximum correlation value c_MAX

• The closest local (i.e. neighboring) minimum correlation values

• The delay at the maximum correlation value τ_MAX

• The angle at the maximum α_MAX

• The area defined by the maximum A_MAX
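The quantities listed above can be extracted from a sampled correlation curve along the lines of the following Python sketch. The geometric definitions are assumptions made for illustration, since they depend on Fig. 8: the angle is taken at the peak between the segments leading to the two neighboring minima (with the delay axis measured in time steps), and the area is taken between the curve and the straight line joining the two minima. The function name `peak_features` is hypothetical.

```python
import numpy as np

def peak_features(c):
    """Extract peak descriptors from a correlation curve sampled at delays 0, 1, 2, ..."""
    c = np.asarray(c, dtype=float)
    i_max = int(np.argmax(c))
    # Walk downhill on each side of the peak to the closest local minimum.
    left = i_max
    while left > 0 and c[left - 1] < c[left]:
        left -= 1
    right = i_max
    while right < c.size - 1 and c[right + 1] < c[right]:
        right += 1
    # Assumed angle definition: angle at the peak between the two segments
    # that connect the peak with the neighboring minima.
    v_l = np.array([left - i_max, c[left] - c[i_max]])
    v_r = np.array([right - i_max, c[right] - c[i_max]])
    norm = np.linalg.norm(v_l) * np.linalg.norm(v_r)
    cos_angle = float(v_l @ v_r / norm) if norm > 0 else 1.0
    # Assumed area definition: area between the curve and the chord joining
    # the two minima (trapezoidal rule with unit delay step).
    if right > left:
        baseline = np.interp(np.arange(left, right + 1), [left, right], [c[left], c[right]])
    else:
        baseline = c[left:right + 1]
    diff = c[left:right + 1] - baseline
    area = float(np.sum((diff[1:] + diff[:-1]) / 2.0))
    return {
        "c_max": float(c[i_max]),
        "tau_max": i_max,
        "minima": (left, right),
        "cos_angle": cos_angle,
        "area": area,
    }

features = peak_features([0.0, 0.5, 1.0, 0.5, 0.0])
```

Note that the angle computed this way depends on how the delay axis is scaled relative to the correlation axis, so any thresholds on it would have to be tuned for the chosen scaling.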

When data in the two sections is reliable, that is when the indicator of the sections is over a predefined threshold, the delay variable can be used to infer information on the traffic state and it can be compared with the measure of travel time or speed between the two sections. In the present embodiment the indicator of the sections is defined using fuzzy logic. Fuzzy logic allows intervals of values to be related to linguistic rules. The fuzzy logic system is defined by a set of fuzzy membership functions for the input and output and a set of rules to pass from the input to the output. Three fuzzy systems are used; two of them are shown in Fig. 9. The inputs of the fuzzy system (A) are c_MAX, α_MAX, and A_MAX of the current data, that is of the data sample currently being analyzed. A bigger area, a tighter angle and a higher correlation yield higher quality.

The second fuzzy system (B) is developed to extract the quality of specific data compared to historical data. The input to the fuzzy system (B) is therefore the difference (Δ) of the correlation variables. This fuzzy system determines whether the quality of the specific data is higher or lower than that of the average data by assigning a value between 0 and 1.

An embodiment of the qualitative shape of the membership functions that can be used for the fuzzy systems is illustrated in Fig. 10.

A last fuzzy system (C) is developed in such a way that it allows integrating partial indicators, as output either by fuzzy system (A) or (B). Fig. 11 illustrates a fuzzy system configuration in which fuzzy system (C) receives as first input the indicator associated with the current data and as second input the indicator associated with the historical data, both generated by fuzzy system (A). The resulting output is a quality indicator termed I_G (see Fig. 4 for reference). Fig. 12, on the other hand, illustrates a fuzzy system configuration in which fuzzy system (C) receives as first input the indicator of the difference of the data, generated by fuzzy system (B), and as second input the indicator associated with the historical data, generated by fuzzy system (A). The resulting output is a quality indicator termed I_H.

Hereinafter, an implementation of the entire fuzzy system according to a specific embodiment will be described. However, it is to be understood that the exact definition can be changed depending on the actual requirements, e.g. in terms of sensitivity of the final indicator.

Fuzzy Logic System A:

• Input

o Max Correlation, three membership functions of type A as illustrated in Fig. 13, where a, b, c are equal to (0, 0.5, 1)

o Area, three membership functions of type A, where a, b, c are equal to (0, 10, 20)

o Cosine of the angle, three membership functions of type A, where a, b, c are equal to (-1, -0.5, 0)

• Output

o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)

The rule set that is applied in fuzzy logic system A and that relates the logical link between inputs to the output is described in the following table:

It is noted that in the above table the conditions are combined with AND. So the first line of the table is read as: IF INPUT(1) = set 1 AND INPUT(2) = set 1 AND INPUT(3) = set 1 THEN OUTPUT = set 1.
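To illustrate how such AND-combined rules can be evaluated, the following Python sketch implements a small fuzzy system with the input ranges of fuzzy logic system A. Several parts are assumptions made only for the example: the membership functions are taken to be triangular with peaks at a, b and c, the AND is realized with the minimum operator, the output is defuzzified as a weighted average of the output set centers, and `HYPOTHETICAL_RULES` is a placeholder rule table (output set equal to the rounded mean of the input set indices), not the rule table of the embodiment.

```python
import itertools

def memberships(x, a, b, c):
    """Degrees of membership of x in three sets peaking at a, b and c.

    Assumption: triangular shapes; the actual shapes are those of Fig. 13.
    """
    x = min(max(x, a), c)  # saturate outside [a, c]
    low = max(0.0, (b - x) / (b - a))
    high = max(0.0, (x - b) / (c - b))
    mid = 1.0 - low - high
    return (low, mid, high)

# Placeholder rules, indexed by the set index (0, 1, 2) of each input.
# Only the pattern "all inputs in set 1 -> output set 1" is taken from the text.
HYPOTHETICAL_RULES = {
    combo: round(sum(combo) / 3)
    for combo in itertools.product(range(3), repeat=3)
}

def fuzzy_indicator(c_max, area, cos_angle, rules=HYPOTHETICAL_RULES):
    """Evaluate all rules with AND = min and return an indicator in [0, 1]."""
    inputs = [
        memberships(c_max, 0.0, 0.5, 1.0),
        memberships(area, 0.0, 10.0, 20.0),
        memberships(cos_angle, -1.0, -0.5, 0.0),
    ]
    out_centers = (0.0, 0.5, 1.0)  # centers of the three output sets
    num = 0.0
    den = 0.0
    for combo, out_set in rules.items():
        strength = min(inputs[i][s] for i, s in enumerate(combo))  # AND of the conditions
        num += strength * out_centers[out_set]
        den += strength
    return num / den if den > 0 else 0.0

best = fuzzy_indicator(1.0, 20.0, 0.0)
worst = fuzzy_indicator(0.0, 0.0, -1.0)
middle = fuzzy_indicator(0.5, 10.0, -0.5)
```

Replacing `HYPOTHETICAL_RULES` with the actual rule table, and the triangular shapes with the membership functions of Figs. 13 to 15, would adapt the sketch to a concrete configuration.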

Fuzzy Logic System B:

• Input

o Delta Max Correlation, three membership functions of type B as illustrated in Fig. 14, where a, b, c, d, e are equal to (-1, -0.5, 0, 0.5, 1)

o Delta Area, three membership functions of type C as illustrated in Fig. 15, where a, b, c, d, e are equal to (-20, -10, 0, 10, 20)

o Delta Cosine of the angle, three membership functions of type B, where a, b, c, d, e are equal to (-2, -1, 0, 1, 2)

• Output

o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)

The rules being applied in fuzzy logic system B are summarized in the following table:

Fuzzy Logic System C:

• Input

o Indicator on current state, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)

o Indicator on historical data, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)

• Output

o Indicator function, three membership functions of type A, where a, b, c are equal to (0, 0.5, 1)

The rules being applied in fuzzy logic system C are summarized in the following table:

Time interval quality indicator (I_B in Fig. 4):

In order to define the quality indicator of the data of a time interval, it is determined how fast the data is changing by using a statistical tool and defining a formula that quantifies the fluctuation trend of the data. Generally, as shown in Fig. 16, a statistical tool as employed herein is configured to derive two basic outputs: the mean and the standard deviation of the means of resamples of the input data. The procedure is as follows:

1) A sample of a population is selected and resampled

2) The means of the resampled data are generated

3) The process is repeated for a given number of times

4) The mean and standard deviation of the generated means is computed

The statistical tool can be seen as a module that, given an array of data, generates its mean and the standard deviation of the mean.
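The four steps above correspond to a bootstrap-style resampling procedure, which can be sketched in Python as follows. The sketch assumes resampling with replacement and resamples of the same size as the input; the function name `statistical_tool` and its parameters are chosen here for illustration.

```python
import numpy as np

def statistical_tool(data, n_iter=1000, rng=None):
    """Return the mean and the standard deviation of the means of resamples of `data`.

    Assumptions of this sketch: resampling with replacement, resample size
    equal to the input size.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    # Each row of `idx` selects one resample of the input array.
    idx = rng.integers(0, data.size, size=(n_iter, data.size))
    means = data[idx].mean(axis=1)
    return float(means.mean()), float(means.std(ddof=1))

mu, sigma = statistical_tool(np.arange(100), n_iter=1000, rng=np.random.default_rng(42))
```

For an input without any variation the returned standard deviation is zero, and it grows as the input data becomes more scattered.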

As shown in Fig. 17, in the specific embodiment the inputs to the statistical tool are the means of each basic time step's data across the days in a given time interval, and the outputs (μ_{b,μ} and σ_{b,μ}) are the mean and the standard deviation of the means of the resamples of the input. The fluctuation formula is then defined based on μ_{b,μ} and σ_{b,μ}. When the value of the fluctuation decreases, the rate of change of the data increases.

The fluctuation could be linked to an error in the measure or related to any actual physical phenomenon. In a preferred embodiment, in order to distinguish these two cases, a reference mask is created. A mask can be developed based on the fluctuation values of some other sections or based on the history of the same section.

An embodiment of reference mask creation is illustrated in Fig. 18. The distribution of the mask is not symmetric and is found by estimating the standard deviation of the right and the left side of the diagram. This approach is described hereinafter in connection with an unsymmetrical probability distribution, as shown in Fig. 19, by applying either a maximum likelihood approximation or a sigma search.

To approximate the distribution of the data, an asymmetric probability distribution is used. This distribution is defined by the mean value and the standard deviations on the left and on the right of the mean value. In order to compute the three variables, the two standard deviations are related by the skewness r as follows:

The problem can be written as

where the standard deviation, the mean and the skewness are computed iteratively. A proper scaling factor for each term can be added and defined in order to best fit the distribution. x₋ and x₊ are the values of the input histogram.

Another way is proposed to find σ_l and σ_r when the distribution is not symmetric. By dividing the density function by its maximum value, one obtains a function whose values lie between zero and one. When x = μ + σ_r or x = μ - σ_l, the value of this function is $e^{-1/2}$. Therefore, looking for points whose second coordinate values are

$e^{-1/2} \approx 0.607$

leads to σ_l and σ_r. The distribution is:
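The sigma search described above can be sketched in Python as follows. The sketch assumes a two-piece Gaussian shape (a Gaussian with different standard deviations on the left and on the right of the mode), takes the mode of the sampled density as the mean value μ, and uses linear interpolation between samples; the function names `sigma_search` and `_crossing` are hypothetical.

```python
import numpy as np

def _crossing(x, f, level, start, step):
    """Walk from index `start` in direction `step` and return the abscissa at which
    the normalized density f drops below `level` (linear interpolation)."""
    i = start
    while 0 <= i + step < len(x) and f[i + step] >= level:
        i += step
    j = i + step
    if not (0 <= j < len(x)):
        return x[i]  # the density never drops below the level within the sampled range
    w = (f[i] - level) / (f[i] - f[j])
    return x[i] + w * (x[j] - x[i])

def sigma_search(x, density):
    """Estimate (mu, sigma_left, sigma_right) from a sampled, unimodal density."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(density, dtype=float)
    f = f / f.max()          # normalized values lie between zero and one
    m = int(np.argmax(f))
    mu = x[m]                # assumption: the mode is used as mu
    level = np.exp(-0.5)     # value of a normalized Gaussian one sigma away from mu
    sigma_left = mu - _crossing(x, f, level, m, -1)
    sigma_right = _crossing(x, f, level, m, +1) - mu
    return mu, sigma_left, sigma_right

# Synthetic two-piece Gaussian with sigma_left = 1 and sigma_right = 2.
grid = np.linspace(-6.0, 10.0, 1601)
shape = np.where(grid < 0, np.exp(-grid ** 2 / 2.0), np.exp(-grid ** 2 / 8.0))
mu_hat, sl_hat, sr_hat = sigma_search(grid, shape)
```

On noisy histograms the crossing points can be unstable, so some smoothing of the histogram before the search may be advisable.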

The final indicator (termed I_B in Fig. 4) is defined based on the difference between the mask and the indicator of the current time interval. The indicator is then

where the probability distribution Pr of the mask has been computed based on the data samples. The probability is given by

) where

Fundamental Diagram consistency quality indicator (I_L in Fig. 4):

Another indicator is defined based on the fundamental diagram (FD). The fundamental diagram represents the relationship between the density (in terms of vehicle per unit of length) and the flow (in terms of vehicle per unit of time). The fundamental diagram depends on the specific road segment and its characteristics, as for example the maximal speed and the number of lanes.

As shown in Fig. 20, the fundamental diagram is defined as two polynomials (indicated by the solid lines), separated by the critical density K_c. Each part of the diagram is characterized by a standard deviation (indicated by the dashed lines). One way to derive the fundamental diagram from the density and flow data is by applying the following procedure (it is noted that, depending on the data available, some processing may be needed to derive the density from the occupancy measure):

The fundamental diagram is approximated with two polynomial functions divided by the critical density. One example is to have two linear approximations. First, an initial value of the critical density is generated: the critical density is initially defined as the density corresponding to the highest flow. For each semi-interval defined by K_c, the approximation is derived using regression. An iterative process can then start by changing the value of K_c by a positive and a negative amount and then computing the approximation error. The value of K_c is updated towards the value that minimizes the approximation error. Finally, the value of the standard deviation is computed for each half interval. These values are used to define the interval of confidence. Fig. 20 shows an example fundamental diagram with its two linear approximations.
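The estimation procedure with two linear approximations can be sketched in Python as follows. The function name `fit_fundamental_diagram`, the fixed search step `step`, the iteration limit and the use of the sum of squared residuals as approximation error are assumptions made for this illustration.

```python
import numpy as np

def fit_fundamental_diagram(density, flow, step, n_iter=100):
    """Fit two straight lines to (density, flow) data, split at a critical density k_c."""
    k = np.asarray(density, dtype=float)
    q = np.asarray(flow, dtype=float)

    def fit_side(mask):
        coef = np.polyfit(k[mask], q[mask], 1)       # linear regression on one side
        return coef, q[mask] - np.polyval(coef, k[mask])

    def total_error(kc):
        left = k <= kc
        if left.sum() < 2 or (~left).sum() < 2:
            return np.inf                            # not enough points on one side
        return float(sum((fit_side(m)[1] ** 2).sum() for m in (left, ~left)))

    kc = k[int(np.argmax(q))]                        # initial guess: density at highest flow
    for _ in range(n_iter):
        candidates = [kc, kc - step, kc + step]      # current value is preferred on ties
        errors = [total_error(c) for c in candidates]
        best = candidates[int(np.argmin(errors))]
        if best == kc:
            break
        kc = best
    if not np.isfinite(total_error(kc)):
        raise ValueError("could not find a split with at least two points on each side")

    left = k <= kc
    coef_l, res_l = fit_side(left)
    coef_r, res_r = fit_side(~left)
    return {
        "k_c": float(kc),
        "left": coef_l,                 # (slope, intercept) below the critical density
        "right": coef_r,                # (slope, intercept) above the critical density
        "sigma_left": float(res_l.std()),
        "sigma_right": float(res_r.std()),
    }

# Synthetic triangular diagram with critical density 30 and maximum flow 60.
dens = np.linspace(0.0, 100.0, 101)
flw = np.where(dens <= 30.0, 2.0 * dens, 60.0 - (60.0 / 70.0) * (dens - 30.0))
fd = fit_fundamental_diagram(dens, flw, step=1.0)
```

The two returned standard deviations play the role of the confidence interval widths on the left and right side of the critical density.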

Once the fundamental diagram is estimated from past data, the current measure under analysis is compared to the diagram and a quality indicator I_L is generated based on the weighted distance d between the diagram and the measure as follows:

$I_L = 2\,\Pr(d) = 2\,\psi(d \mid \mu, \sigma_{RIGHT/LEFT})$

where σ_RIGHT/LEFT is the standard deviation of the error as estimated by the procedure on either the left or the right side of the critical density.

Further quality indicators (I_A and I_C in Fig. 4):

The first step is to extract the quality of the data of a single section. The quality of a single basic time step and a time interval can be extracted by the quality indicators described above. The quality of a specific data with respect to the data in a window can also be considered.

For instance, the indicator of a basic time step may be defined based on the output of the statistical tool. As shown in Fig. 21, the input of the statistical tool is the data of a single time step. The outputs are μ_b and σ_b, which are the mean and standard deviation of the means of the resamples of the input. The quality indicator is defined as follows:

When the quality of the data and the quality of the mean is high, it means that the value of the standard deviation is low. Therefore the quality indicator has a value close to one which represents high quality.

Still another quality indicator could be based on a time window consistency check. Such a time window quality indicator measures the quality of a specific data point, d, compared to other data in a neighborhood (window) around it. As shown in Fig. 22, the input of the statistical tool is the data of the window, excluding the data under consideration, and the outputs are the mean and standard deviation of the resamples of the input. The output leads to the distribution of the data, and the quality indicator is defined as:

Similar to the above time window quality indicator for a single section, another quality indicator can be created by considering the projection of the traffic measure, for example the flow, to the following or preceding sensor and verifying its value with respect to the section statistics. The projection is computed using a model for the evolution of the traffic flow in time and space. The simplest model is constant speed propagation. An alternative method is to use multi-modal constant speed propagation, where the traffic is divided into classes of different speed and the flow or number of vehicles for each class is propagated according to the specific class velocity.

Furthermore, based on the time related quality indicators described above, more compact quality indicators can be defined. For instance, according to one embodiment hysteresis thresholds may be defined that allow raising an alarm only if the corresponding quality indicator drops below a predefined lower threshold for a sufficiently long predefined time period, and the alarm is closed only if the quality indicator is over the threshold for a sufficiently long time period, as illustrated in Fig. 23. Still further, an alarm probability can be defined based on the percentage of time the corresponding quality indicator is below a specific threshold.
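The alarm activation with hysteresis and the alarm probability can be sketched in Python as follows. The sketch assumes two distinct thresholds (`low` for raising and `high` for closing the alarm) and a common minimum duration `min_steps`, expressed in time steps; these parameter names and the function names are hypothetical.

```python
def hysteresis_alarm(indicator, low, high, min_steps):
    """Return the alarm state per time step.

    The alarm is raised after `min_steps` consecutive values below `low`
    and closed after `min_steps` consecutive values above `high`.
    """
    alarm = False
    run = 0          # length of the current run of qualifying values
    states = []
    for value in indicator:
        if not alarm:
            run = run + 1 if value < low else 0
            if run >= min_steps:
                alarm, run = True, 0
        else:
            run = run + 1 if value > high else 0
            if run >= min_steps:
                alarm, run = False, 0
        states.append(alarm)
    return states

def alarm_probability(indicator, threshold):
    """Fraction of time steps in which the indicator is below `threshold`."""
    values = list(indicator)
    return sum(v < threshold for v in values) / len(values)

series = [0.9, 0.2, 0.2, 0.9, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
states = hysteresis_alarm(series, low=0.3, high=0.7, min_steps=3)
prob = alarm_probability(series, threshold=0.3)
```

In the example series the short dip of two time steps does not trigger the alarm, whereas the dip of three time steps does; the alarm is closed again after three consecutive high values.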

In an open data market scenario, actors shall exchange information on the quality of the data and the method used for its computation. Based on the above quality indicator calculation, possible information that may be exchanged among different entities with quality indication can include, without being limited thereto:

1) Detailed time step indicators (indicator that refers to the current time step)

2) Indicators for the last closed time interval (time interval consists of several time steps)

3) Indicators for the whole period up to the current time step: typically the period of observation is restarted at the midnight of the previous day, but can also be a window of 24 hours

4) Method of computation: indicators defined in this invention include time step, interval, correlation based and fundamental diagram consistency check

5) Parameters of the computation, such as the size of the time window, the presence and size of the pre-processing, and the size of the time interval.
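As an illustration only, the items listed above could be grouped into a single record that is exchanged between entities. The class name, field names and example values below are hypothetical; the description does not prescribe any particular message format.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class QualityReport:
    """Hypothetical container for exchanged quality information."""
    section_id: str
    measure: str                                     # e.g. "flow"
    time_step_indicator: Optional[float] = None      # item 1
    last_interval_indicator: Optional[float] = None  # item 2
    period_indicator: Optional[float] = None         # item 3
    method: str = "time_step"                        # item 4
    parameters: Dict[str, float] = field(default_factory=dict)  # item 5


report = QualityReport(
    section_id="A",
    measure="flow",
    time_step_indicator=0.92,
    last_interval_indicator=0.88,
    period_indicator=0.90,
    method="interval",
    parameters={"time_interval_min": 20, "preprocessing_k": 10},
)
payload = asdict(report)  # plain dictionary, ready for serialization
```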

Performance

In order to verify the defined quality indicators, data from a highway system is used. The area of study is a road stretch in which each section is 500 meters long. The road includes 7 lanes. Data is available for every minute, for each lane, for every day of a week excluding the weekend. Before computing the quality indicators, the data per lane is aggregated for each direction. When data is not available, it is substituted with an invalid value (a negative value). Figs. 24-32 show the quality of flow extracted by the quality indicators defined above. The number of iterations of the statistical tool is 1000 for all figures. In the verification, the input data has been pre-processed by a filter of k=10 samples in order to smooth the data; k changes according to the number of available data. In general, the coefficients of the smoothing filter are 1/(1 + k).
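One possible reading of this pre-processing step is a moving average over the current sample and up to k preceding samples, with equal weights 1/(1 + k) and with k reduced when fewer valid samples are available. The sketch below follows that reading; the trailing window, the handling of invalid samples and the function name are assumptions, not details given in the description.

```python
from typing import List


def smooth(samples: List[float], k: int = 10,
           invalid: float = -1.0) -> List[float]:
    """Average each valid sample with up to k valid preceding samples."""
    smoothed = []
    for i, x in enumerate(samples):
        if x < 0:
            # Unavailable data stays marked as invalid (negative value).
            smoothed.append(invalid)
            continue
        # Only valid neighbours count, so the effective k may shrink.
        neighbours = [v for v in samples[max(0, i - k):i] if v >= 0]
        weight = 1.0 / (1 + len(neighbours))
        smoothed.append(weight * (x + sum(neighbours)))
    return smoothed


# Hypothetical per-minute flow with one missing sample.
result = smooth([10.0, 20.0, -1.0, 30.0], k=2)
```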

In Fig. 24, the quality of each minute data along the days of one week for two different sections is shown. The quality is extracted according to the indicator . The figure clearly shows that the amount of unavailable data affects the quality of the data.

The time interval data quality indicator is represented in Fig. 25 for the same sections shown in Fig. 21. The value of the fluctuation quality indicator I_F is plotted in the two diagrams in the second row. The value of the quality indicator I_B is shown in the top row. Each time interval is 20 minutes. A mask is defined based on 5 other sections and the quality of the section is extracted here.
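The statistical tool behind the fluctuation indicator is characterized in this document as applying random subsampling and reporting the mean and standard deviation of the subsample means. A minimal sketch of such a tool is given below; the subsample size (half of the valid samples), the fixed seed and the function name are assumptions made for illustration.

```python
import random
import statistics
from typing import List, Optional, Tuple


def resample_means(data: List[float], iterations: int = 1000,
                   subsample_size: Optional[int] = None,
                   seed: int = 0) -> Tuple[float, float]:
    """Return mean and standard deviation of random subsample means."""
    valid = [x for x in data if x >= 0]  # negative values mark missing data
    if not valid:
        raise ValueError("no valid samples in the time interval")
    size = subsample_size or max(1, len(valid) // 2)
    rng = random.Random(seed)
    # Draw `iterations` subsamples without replacement and average each.
    means = [statistics.fmean(rng.sample(valid, size))
             for _ in range(iterations)]
    return statistics.fmean(means), statistics.pstdev(means)


# Hypothetical 20-minute interval of per-minute flow values.
steady = [5.0] * 20
noisy = [float(v) for v in (1, 9, 2, 8, 3, 7, 4, 6, 5, 5,
                            1, 9, 2, 8, 3, 7, 4, 6, 5, 5)]
steady_mean, steady_std = resample_means(steady)
noisy_mean, noisy_std = resample_means(noisy)
```

A small standard deviation of the subsample means indicates a stable interval, while a large one points to strong fluctuation.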

At the top of Fig. 26, the quality of the data of the third day of the week with respect to the other days of the week is depicted. The quality is calculated based on quality indicator I_C. The size of the window is 5, since the data is not available for the weekend. The flow of the third day is shown in the second row and the average flow of the other days of the week is represented at the bottom. The quality of the data is extracted for each minute.

Fig. 27 shows the correlation between two consecutive sections. The vector of the first section is one hour of data, 10:00 - 11:00 am, and the time delay is assumed to be in [-10, 10] minutes. The correlation is computed for the last day of the week and for the average data of the week.
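A correlation function over a range of delays, as used for Fig. 27, might be computed as in the following sketch. It assumes per-minute samples, a Pearson correlation coefficient as the correlation measure, and a reference window taken from the first section; the function names and the synthetic data are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def delay_correlation(first: List[float], second: List[float],
                      start: int, length: int,
                      max_delay: int = 10) -> Dict[int, float]:
    """Correlate a window of `first` with delayed windows of `second`."""
    reference = first[start:start + length]
    result = {}
    for delay in range(-max_delay, max_delay + 1):
        lo = start + delay
        if lo < 0 or lo + length > len(second):
            continue  # delayed window falls outside the available data
        result[delay] = pearson(reference, second[lo:lo + length])
    return result


# Synthetic example: a traffic pulse reaches the second section 3 minutes later.
first = [0.0] * 100
first[30:35] = [1.0, 3.0, 5.0, 3.0, 1.0]
second = [0.0] * 3 + first[:-3]
correlation = delay_correlation(first, second, start=20, length=30)
best_delay = max(correlation, key=correlation.get)
```

The delay at the maximum and the shape of the function around it are the kind of variables that the fuzzy rule sets take as input.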

In Fig. 28, the quality of the data for each 10 minutes is plotted. The upper diagram depicts the quality of the data of the last day of the week and of the average data of the week based on the fuzzy logic A. It can be seen that in some parts the last day has higher or lower quality compared to the average data. The lower diagram shows the quality of the last day compared to the average of the week, which is extracted based on the fuzzy logic B.

Figs. 29 and 30 show two different approximations of the fundamental diagram. These diagrams are then used to check the consistency of new data.

With regard to the fundamental diagrams, Fig. 31 shows the fundamental diagram estimated on one training day, while Fig. 32 shows its use on the following test day. The error is reported. For the analyzed section the error level is almost the same between the two days.

Many modifications and other embodiments of the invention set forth herein will come to the mind of one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. Method for assessing the quality of ITS related traffic data, comprising: determining samples of traffic data collected for a particular type of traffic measure,

2. Method according to claim 1, wherein said set of quality indicators l_x includes an indicator - corridor correlation indicator - that assesses the data quality for two or more sections by analyzing the correlation of traffic data obtained by measurement points or areas located at said sections.

3. Method according to claim 2, wherein said corridor correlation indicator value is calculated based on logical fuzzy rule sets which compare the sample of traffic data under analysis with historical data samples based on variables derived from the correlation function shape.

4. Method according to claim 3, wherein said variables derived from the correlation function shape include the maximum correlation value and the closest local minimum correlation values, the delay at the maximum correlation value and, based on these values, an angle and an area at the maximum correlation.

5. Method according to any of claims 1 to 4, wherein said set of quality indicators l_x includes an indicator - time interval indicator - that assesses the fluctuation trend of traffic data in a single measurement point or area.

6. Method according to claim 5, wherein said time interval indicator determination includes the step of applying random subsampling of said traffic data and the step of computing the fluctuation trend by measuring the variation of the means of said subsamples.

7. Method according to claim 5 or 6, wherein said time interval indicator determination includes the step of identifying anomalous deviations by using the underlying estimated probability distribution of a reference mask that analyses the data on a time interval.

8. Method according to any of claims 1 to 7, wherein said set of quality indicators l_x includes an indicator - fundamental diagram indicator - that assesses data quality based on fundamental diagram profiles by determining deviations of a sample of traffic data under analysis with respect to an estimated historical fundamental diagram.

9. Method according to any of claims 1 to 8, wherein said set of quality indicators l_x includes an indicator that assesses traffic data quality in a single measurement point or area by identifying anomalous deviations of an historical profile of a single time instant by using the underlying estimated probability distribution.

10. Method according to any of claims 1 to 9, wherein said set of quality indicators l_x includes an indicator that assesses traffic data quality in a specific time period with respect to temporally closed data (window) by evaluating the deviation to the underlying estimated probability distribution.

11. Method according to any of claims 1 to 10, wherein a probability distribution for the quantification of the quality of data is evaluated based on the histogram of the samples.

12. Method according to any of claims 1 to 11, wherein a probability distribution for the quantification of the quality of data is evaluated via the solution of a maximum likelihood problem.

13. Method according to any of claims 1 to 12, wherein a quality indicator is derived by combining a quality indicator determined for historical data and a quality indicator determined for a current data sample based on the correlation of two or more sections.

14. Method according to any of claims 1 to 13, wherein a compact quality indicator is defined based on higher granularity quality indicators.

15. System for assessing the quality of ITS related traffic data, comprising: computation means for determining samples of traffic data collected for a particular type of traffic measure, and for defining, for a particular sample of traffic data under analysis, a set of quality indicators l_x that assess said sample with respect to different aspects, and

16. System according to claim 15, wherein at least one of said analyses tools comprises a fuzzy system that is configured to be applied with a set of fuzzy membership functions on correlation parameters derived from said traffic data.

17. System according to claim 15 or 16, wherein one fuzzy system is configured to extract the quality of specific data compared to historical data by receiving as input the differences of correlation variables derived from the respective data.

18. System according to any of claims 15 or 17, wherein at least one of said analyses tools comprises a statistical tool that is configured to quantify the fluctuation trend of the input data.

19. System according to claim 18, wherein said statistical tool is configured to derive as output the mean and standard deviation of the means of resamples of the input data.