CN111612783B - Data quality assessment method and system

Data quality assessment method and system

Info

Publication number
CN111612783B
CN111612783B
Authority
CN
China
Prior art keywords
data set
data
quality
quality requirement
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010472680.2A
Other languages
Chinese (zh)
Other versions
CN111612783A (en)
Inventor
李安然
张兰
李向阳
谢筠庭
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority to CN202010472680.2A
Publication of CN111612783A
Application granted
Publication of CN111612783B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a data quality evaluation method and system. The method comprises the following steps: evaluating task-independent internal characteristics of the data in a data set to obtain a data set meeting the minimum internal quality requirement; performing feature extraction on each data item in the qualifying data set and in a sample data set to obtain a feature vector for each item; performing context quality assessment on the feature vector of each data item in the qualifying data set and the sample data set to obtain a quality assessment result; and ranking the quality assessment results. When evaluating data quality, the invention jointly considers task-independent internal quality, task-dependent context quality and the demands of large-scale data quality evaluation, thereby effectively improving the comprehensiveness, accuracy and efficiency of data quality evaluation.

Description

Data quality assessment method and system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data quality evaluation method and system.
Background
Today, with the rapid development of mobile networks, sensor networks and crowd-sensing technologies, a wide variety of data is being generated in large quantities. At the same time, a large number of data-based information services are emerging, and the quality of the data plays a vital role in them. 1) High-quality data can provide sufficient and accurate information to accomplish a particular task, such as training a high-quality machine learning model or helping a smart city system make informed decisions. 2) Many services provide the data itself as a product to users on demand, for example crowd-sourcing services; for these services, the quality of the data determines user satisfaction. 3) High data quality helps optimize system resource utilization: limited resources (e.g., bandwidth, storage and computing resources) should be preferentially allocated to high-quality data to ensure system performance and quality of service. Taking a crowd-sensing application as an example, where many participants upload images from their mobile phones, effective data quality assessment, especially of large image sets, can noticeably raise the quality of the uploaded images and avoid the bandwidth wasted on transmitting low-quality images.
Data quality assessment has attracted the attention of researchers; however, existing assessment methods suffer from the following drawbacks when faced with specific tasks and large amounts of data. First, existing work mostly focuses on the inherent quality of data, while the important context quality is ignored: with the same data, one task may perform well while another performs poorly. For example, a high-quality image data set for training face recognition may be a poor-quality data set for an object detection task. Second, existing work mostly targets single data units (such as one picture or one text) when evaluating data quality and lacks a method for evaluating the overall quality of a data set. If the overall quality of the data set is obtained simply from statistics over the quality of individual data units, such as the minimum or average quality of all data units, the influence of the relationships between data units on the quality of the data set is ignored. Finally, although many dimensions of data quality have been proposed, it remains a challenge to fuse these dimensions into a comprehensive overall quality result.
Therefore, how to evaluate the quality of data more comprehensively and accurately is a problem to be solved urgently.
Disclosure of Invention
In view of the above, the present invention provides a data quality evaluation method which, when evaluating data quality, jointly considers task-independent internal quality, task-dependent context quality and the demands of large-scale data quality evaluation, thereby effectively improving the comprehensiveness, accuracy and efficiency of data quality evaluation.
The invention provides a data quality assessment method, which comprises the following steps:
evaluating task-independent internal characteristics of the data in a data set to obtain a data set meeting the minimum internal quality requirement;
performing feature extraction on each data item in the data set meeting the minimum internal quality requirement and in the sample data set to obtain a feature vector for each item;
performing context quality assessment on the feature vector of each data item in the data set meeting the minimum internal quality requirement and the sample data set to obtain a quality assessment result;
and ranking the quality assessment results by quality.
Preferably, evaluating the task-independent internal characteristics of the data to obtain a data set meeting the minimum internal quality requirement comprises:
evaluating the degree to which the data set is correct, dependable and error-free by a pattern matching method to obtain an accuracy quantized value;
evaluating the data acquisition and storage precision of the data set to obtain a precision quantized value;
evaluating the unbiased degree of the data set to obtain an objectivity quantized value;
evaluating the trustworthiness of the data sources of the data set to obtain a reliability quantized value;
and obtaining a data set meeting the minimum internal quality requirement based on the accuracy, precision, objectivity and reliability quantized values and the corresponding minimum quality requirements for accuracy, precision, objectivity and reliability.
Preferably, the feature extraction is performed on each data in the data set and the sample data set meeting the minimum intrinsic quality requirement, so as to obtain a feature vector of each data, which includes:
and extracting the eighth layer of features from each picture data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a VGG-16 model as feature vectors of the picture data.
Preferably, the feature extraction is performed on each data in the data set and the sample data set meeting the minimum intrinsic quality requirement, so as to obtain a feature vector of each data, which includes:
and extracting the penultimate layer of features from each text data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a BERT model as feature vectors of the text data.
Preferably, performing context quality assessment on the feature vector of each data item in the data set meeting the minimum internal quality requirement and the sample data set to obtain a quality assessment result comprises:
calculating, with a method based on locality-sensitive hashing, the ratio of the number of similar point pairs between the data set meeting the minimum internal quality requirement and the sample data set to the data set size, to obtain a task relevance evaluation result;
calculating, with a method based on locality-sensitive hashing, the average distance between the data set meeting the minimum internal quality requirement and the sample data set, to obtain a content diversity evaluation result;
calculating the ratio of the number of non-empty data items in the data set meeting the minimum internal quality requirement and the sample data set to the total amount of data, to obtain an integrity evaluation result;
evaluating whether the data volume in the data set meeting the minimum internal quality requirement and the sample data set meets the requirements of the given task, to obtain a data-volume appropriateness evaluation result;
and evaluating whether the service periods of the data set meeting the minimum internal quality requirement and the sample data set meet the requirements of the given task, to obtain a timeliness evaluation result.
A data quality assessment system, comprising:
the intrinsic quality evaluation module, used for evaluating the task-independent internal characteristics of the data in the data set to obtain a data set meeting the minimum intrinsic quality requirement;
the feature extraction module is used for carrying out feature extraction on each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain a feature vector of each data;
the context quality evaluation module is used for performing context quality evaluation on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain a quality evaluation result;
and the quality sorting module is used for sorting the quality of the quality evaluation result.
Preferably, the intrinsic quality assessment module comprises:
the accuracy evaluation unit, used for evaluating the degree to which the data set is correct, dependable and error-free by a pattern matching method to obtain an accuracy quantized value;
the precision evaluation unit, used for evaluating the data acquisition and storage precision of the data set to obtain a precision quantized value;
the objectivity evaluation unit, used for evaluating the unbiased degree of the data set to obtain an objectivity quantized value;
the reliability evaluation unit, used for evaluating the trustworthiness of the data sources of the data set to obtain a reliability quantized value;
and the determining unit, used for obtaining a data set meeting the minimum internal quality requirement based on the accuracy, precision, objectivity and reliability quantized values and the corresponding minimum quality requirement values for accuracy, precision, objectivity and reliability.
Preferably, the feature extraction module is specifically configured to:
and extracting the eighth layer of features from each picture data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a VGG-16 model as feature vectors of the picture data.
Preferably, the feature extraction module is specifically configured to:
and extracting the penultimate layer of features from each text data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a BERT model as feature vectors of the text data.
Preferably, the context quality assessment module comprises:
the task relevance evaluation unit, used for calculating, with a method based on locality-sensitive hashing, the ratio of the number of similar point pairs between the data set meeting the minimum internal quality requirement and the sample data set to the data set size, obtaining a task relevance evaluation result;
the content diversity evaluation unit, used for calculating, with a method based on locality-sensitive hashing, the average distance between the data set meeting the minimum internal quality requirement and the sample data set, obtaining a content diversity evaluation result;
the integrity evaluation unit, used for calculating the ratio of the number of non-empty data items in the data set meeting the minimum internal quality requirement and the sample data set to the total amount of data, obtaining an integrity evaluation result;
the data-volume appropriateness evaluation unit, used for evaluating whether the data volume in the data set meeting the minimum internal quality requirement and the sample data set meets the requirements of the given task, obtaining a data-volume appropriateness evaluation result;
and the timeliness evaluation unit, used for evaluating whether the service periods of the data set meeting the minimum internal quality requirement and the sample data set meet the requirements of the given task, obtaining a timeliness evaluation result.
In summary, the invention discloses a data quality evaluation method. When data quality needs to be evaluated, the task-independent internal characteristics of the data in a data set are first evaluated to obtain a data set meeting the minimum internal quality requirement; feature extraction is then performed on each data item in the qualifying data set and in the sample data set to obtain a feature vector for each item; context quality assessment is performed on these feature vectors to obtain a quality assessment result, and the quality assessment results are ranked by quality. When evaluating data quality, the invention jointly considers task-independent internal quality, task-dependent context quality and the demands of large-scale data quality evaluation, thereby effectively improving the comprehensiveness, accuracy and efficiency of data quality evaluation.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method of embodiment 1 of a data quality assessment method of the present disclosure;
FIG. 2 is a flow chart of a method of embodiment 2 of a data quality assessment method of the present disclosure;
FIG. 3 is a schematic diagram of a data quality evaluation system according to an embodiment 1 of the present disclosure;
fig. 4 is a schematic structural diagram of an embodiment 2 of a data quality evaluation system according to the present disclosure.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a method flowchart of an embodiment 1 of a data quality evaluation method disclosed in the present invention may include:
s101, evaluating internal characteristics of data irrelevant to tasks on the data set to obtain the data set meeting the minimum internal quality requirement;
when the data quality needs to be evaluated, the accuracy, the precision, the objectivity and the reliability are firstly in four dimensionsThe data set is evaluated for internal features of the data that are not related to the task. For data set D, the quantitative values for the four dimensions of accuracy, precision, objectivity and reliability are respectively Minimum quality requirements for four dimensions of accuracy, precision, objectivity and reliability are θ respectively c ,θ p ,θ o ,θ r . The data set D must meet the minimum intrinsic quality requirementInferior datasets that do not meet the lowest intrinsic quality requirement R will be placed directly at the bottom of the ordered list without further evaluation.
S102, extracting the characteristics of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain the characteristic vector of each data;
for the data set M meeting the minimum internal quality requirement R, the quality evaluator extracts the characteristics of each data in the data set M and the sample data set S to obtain the characteristic vector of each data.
S103, carrying out context quality assessment on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain a quality assessment result;
Context quality assessment measures the extent to which a data set suits a given task. In this embodiment, the data consumer expresses the task's need for data by providing a small sample data set S. Context quality assessment is then performed on the feature vector of each data item in the data set meeting the minimum internal quality requirement and the sample data set, in the five dimensions of task relevance, content diversity, integrity, data-volume appropriateness and timeliness, to obtain a quality assessment result.
S104, quality sorting is carried out on the quality evaluation results.
Finally, using a rank aggregation method that minimizes the Kendall tau distance, a best overall quality ranking of the data sets is computed from the quality-assessment result rankings of the multiple input data sets; a data set ranked higher has higher data quality for the given task.
In summary, in the above embodiment, when data quality needs to be evaluated, the task-independent internal characteristics of the data in the data set are first evaluated to obtain a data set meeting the minimum internal quality requirement; feature extraction is then performed on each data item in the qualifying data set and in the sample data set to obtain a feature vector for each item; context quality assessment is performed on these feature vectors to obtain a quality assessment result, and the quality assessment results are ranked by quality. The invention jointly considers task-independent internal quality, task-dependent context quality and the demands of large-scale data quality evaluation, thereby effectively improving the comprehensiveness, accuracy and efficiency of data quality evaluation.
As shown in fig. 2, a method flowchart of an embodiment 2 of a data quality evaluation method disclosed in the present invention may include:
s201, evaluating the correctness, reliability and error-free degree of a data set by a pattern matching method to obtain an accuracy quantized value;
when the data quality needs to be evaluated, the data set is firstly evaluated for the internal characteristics of the data irrelevant to tasks from four dimensions of accuracy, precision, objectivity and reliability.
Specifically, the degree to which the data are correct, dependable and error-free is evaluated using a pattern matching method to obtain an accuracy quantized value. For example, for text data, its spelling and grammar are checked for correctness.
S202, evaluating the data acquisition and storage precision of the data set to obtain a precision quantized value;
Meanwhile, the data acquisition and storage precision of the data set is evaluated to obtain a precision quantized value. For example, a pre-trained convolutional neural network is used to estimate the precision of an image, including the JPEG compression rate of the picture, its degree of blurring, and so on.
S203, evaluating the unbiased degree of the data set to obtain an objectivity quantification value;
meanwhile, the unbiased degree of the data set is evaluated by adopting a method for checking history objective records and questionnaire investigation, so as to obtain an objectivity quantification value
S204, evaluating the trusted degree of the data source of the data set to obtain a reliability quantification value;
meanwhile, the method for checking the history objective record and questionnaire is adopted to evaluate the trusted degree of the data source of the data set, and a reliability quantification value is obtained
S205, obtaining a data set meeting the minimum internal quality requirement based on the accuracy, precision, objectivity and reliability quantized values and the corresponding minimum quality requirements;
Then, according to the accuracy quantized value, the precision quantized value, the objectivity quantized value, the reliability quantized value, and the minimum quality requirement values θ_c, θ_p, θ_o and θ_r for accuracy, precision, objectivity and reliability, the data sets meeting the minimum internal quality requirement R are obtained.
S206, extracting the characteristics of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain the characteristic vector of each data;
for the data set M meeting the minimum internal quality requirement R, the quality evaluator extracts the characteristics of each data in the data set M and the sample data set S to obtain the characteristic vector of each data.
Specifically, for picture data, the eighth-layer features are extracted with a VGG-16 model as the feature vector of the picture; for text data, the penultimate-layer features are extracted with the BERT (Bidirectional Encoder Representations from Transformers) model as the feature representation of the text.
S207, calculating, with a method based on locality-sensitive hashing, the ratio of the number of similar point pairs between the data set meeting the minimum internal quality requirement and the sample data set to the data set size, to obtain a task relevance evaluation result;
the context quality assessment dataset is adapted to the extent of a given task. In this embodiment, the data consumer expresses the task' S need for data by providing a small sample data set S. And then carrying out context quality assessment on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement from five dimensions of task relativity, content diversity, integrity, appropriateness of data volume and timeliness to obtain a quality assessment result.
A method based on locality-sensitive hashing is used to calculate the ratio of the number X(M, S) of similar point pairs between the data set M and the sample data set S to |D|, and this ratio is used to approximate the task relevance. Specifically:
The feature vectors are hashed with locality-sensitive hashing, so that with high probability similar data points are mapped into the same bucket and dissimilar data points are mapped into different buckets.
For data points d_i ∈ M and d_j ∈ S that fall into the same bucket, the distance Dis(d_i, d_j) is computed (e.g., the Euclidean distance or cosine distance of their feature vectors); when the distance is less than a threshold δ, the data points d_i and d_j are considered a similar point pair.
X(M, S)/|D|, where X(M, S) is the number of similar point pairs, is computed and used as the approximate value of task relevance.
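The similar-pair computation of step S207 can be sketched with random-hyperplane (sign-based) locality-sensitive hashing. Everything below is a simplified illustration: the single-table bucket construction, the Euclidean distance choice, and normalizing by |M|·|S| instead of |D| are assumptions, not the patent's exact scheme.

```python
import math
import random

def lsh_bucket(vec, hyperplanes):
    # The sign pattern of the vector w.r.t. each random hyperplane is the bucket key.
    return tuple(1 if sum(h * x for h, x in zip(hp, vec)) >= 0 else 0
                 for hp in hyperplanes)

def task_relevance(M, S, n_planes=8, delta=1.0, seed=0):
    """Fraction of (m, s) pairs that share an LSH bucket and lie closer than delta.
    Normalizing by |M|*|S| (rather than the patent's |D|) is an assumption."""
    rng = random.Random(seed)
    dim = len(M[0])
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for s in S:                        # index the sample data set by bucket
        buckets.setdefault(lsh_bucket(s, hyperplanes), []).append(s)
    similar = 0
    for m in M:                        # count similar point pairs bucket-locally
        for s in buckets.get(lsh_bucket(m, hyperplanes), []):
            if math.dist(m, s) < delta:    # Euclidean distance; cosine also possible
                similar += 1
    return similar / (len(M) * len(S))
```

With M identical to S, every point collides with its own copy at distance 0, so the score is positive; far-apart, dissimilar sets score near zero because they rarely share buckets.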
S208, calculating, with a method based on locality-sensitive hashing, the average distance between the data set meeting the minimum internal quality requirement and the sample data set, to obtain a content diversity evaluation result;
Meanwhile, the average distance between the data set M and the sample data set S is calculated with a method based on locality-sensitive-hashing sampling, and this distance is used to approximate the content diversity. Specifically:
The feature vectors are hashed with locality-sensitive hashing, so that with high probability similar data points are mapped into the same bucket and dissimilar data points are mapped into different buckets.
The data in all buckets are uniformly sampled to obtain a set G; the pairwise distances Dis(d_i, d_j) for d_i, d_j ∈ G are computed, and the content diversity is approximated by their mean (the higher the sampling rate, the closer the computed mean is to the true content diversity).
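Step S208 can be illustrated with a simplified sketch in which the bucket-wise LSH sampling is replaced by plain uniform sampling over the feature vectors (an assumption made only to keep the example short):

```python
import math
import random

def content_diversity(data, sample_rate=0.5, seed=0):
    """Mean pairwise distance over a uniform sample of the feature vectors.
    Plain uniform sampling stands in for the patent's bucket-wise LSH sampling."""
    rng = random.Random(seed)
    k = max(2, int(len(data) * sample_rate))   # always sample at least one pair
    G = rng.sample(data, k)                    # the sampled set G
    dists = [math.dist(G[i], G[j])
             for i in range(len(G)) for j in range(i + 1, len(G))]
    return sum(dists) / len(dists)             # higher sample_rate -> closer to true mean
```

A data set of identical vectors has diversity 0; spreading the vectors apart raises the mean pairwise distance and thus the diversity score.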
S209, calculating the ratio of the number of non-empty data in the data set and the sample data set meeting the minimum internal quality requirement to the total data amount to obtain an integrity evaluation result;
and meanwhile, calculating the ratio of the number of non-null data in the data set and the sample data set meeting the minimum internal quality requirement to the total data amount to obtain an integrity evaluation result.
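The integrity ratio of step S209 reduces to counting non-empty items. A minimal sketch follows, in which what counts as "empty" (None, empty string, empty container) is an illustrative assumption:

```python
def completeness(records):
    """Ratio of non-empty records to the total number of records.
    Treating None and empty strings/containers as 'empty' is an assumption."""
    if not records:
        return 0.0
    non_empty = sum(1 for r in records if r not in (None, "", [], {}))
    return non_empty / len(records)
```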
S210, evaluating whether the data volume in the data set meeting the minimum internal quality requirement and the sample data set meets the requirements of the given task, to obtain a data-volume appropriateness evaluation result;
Meanwhile, whether the data volume in the data set meeting the minimum internal quality requirement and the sample data set meets the requirements of the given task is evaluated, to obtain a data-volume appropriateness evaluation result.
S211, evaluating whether the service periods of the data set meeting the minimum internal quality requirement and the sample data set meet the requirements of the given task, to obtain a timeliness evaluation result;
Meanwhile, whether the service periods of the data set meeting the minimum internal quality requirement and the sample data set meet the requirements of the given task is evaluated, to obtain a timeliness evaluation result.
S212, quality sorting is carried out on the quality evaluation results.
Finally, using a rank aggregation method that minimizes the Kendall tau distance, a best overall quality ranking of the data sets is computed from the quality-assessment result rankings of the multiple input data sets; a data set ranked higher has higher data quality for the given task.
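The final ranking step can be illustrated by brute-force Kemeny rank aggregation, which returns the candidate ranking minimizing the total Kendall tau distance to all input rankings. The patent does not specify how the aggregation is implemented, so the exhaustive search below is only a small-scale illustration (feasible for a handful of data sets only):

```python
from itertools import combinations, permutations

def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)

def aggregate_rankings(rankings):
    """Exhaustive Kemeny aggregation: the candidate ranking minimizing the total
    Kendall tau distance to all input rankings (brute force over permutations)."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(kendall_tau_distance(list(cand), r)
                                    for r in rankings))
```

For example, if two of three per-dimension rankings agree on the order of data sets A, B, C, the aggregated ranking follows the majority order.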
In summary, in the data quality evaluation method disclosed by the invention, the evaluation indices are comprehensive, jointly considering task-independent internal quality and task-dependent context quality; the evaluation process is efficient and suitable for large-scale data collections; the evaluation method is highly general and applicable to various types of data; and the evaluation results are interpretable.
As shown in fig. 3, a schematic structural diagram of an embodiment 1 of a data quality evaluation system disclosed in the present invention may include:
the intrinsic quality evaluation module 301 is configured to evaluate the internal features of the data set that are not related to the task, so as to obtain a data set that meets the minimum intrinsic quality requirement;
When the data quality needs to be evaluated, the data set is first evaluated for the task-independent internal characteristics of the data from four dimensions of accuracy, precision, objectivity and reliability. For a data set D, a quantized value is computed for each of the four dimensions of accuracy, precision, objectivity and reliability, and the minimum quality requirements for the four dimensions are θ_c, θ_p, θ_o and θ_r respectively. The data set D must meet the minimum intrinsic quality requirement R, namely each of its four quantized values must reach the corresponding minimum quality requirement. Inferior data sets that do not meet the lowest intrinsic quality requirement R are placed directly at the bottom of the ordered list without further evaluation.
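One plausible reading of the gate R can be sketched in Python as follows (the dictionary layout and the threshold values are illustrative assumptions, not the patent's concrete representation):

```python
def meets_min_intrinsic_quality(q, theta):
    """Data set D passes the minimum intrinsic quality requirement R only
    if every per-dimension quantized value reaches its threshold.
    q, theta: dicts keyed by the four intrinsic-quality dimensions."""
    dims = ("accuracy", "precision", "objectivity", "reliability")
    return all(q[d] >= theta[d] for d in dims)

# Illustrative thresholds theta_c, theta_p, theta_o, theta_r:
theta = {"accuracy": 0.8, "precision": 0.7, "objectivity": 0.6, "reliability": 0.6}
passing = meets_min_intrinsic_quality(
    {"accuracy": 0.9, "precision": 0.8, "objectivity": 0.7, "reliability": 0.9}, theta)
failing = meets_min_intrinsic_quality(
    {"accuracy": 0.9, "precision": 0.5, "objectivity": 0.7, "reliability": 0.9}, theta)
```

A data set for which the check fails would be placed at the bottom of the ordered list without further context-quality evaluation.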
The feature extraction module 302 is configured to perform feature extraction on each data in the data set and the sample data set that meet the minimum intrinsic quality requirement, so as to obtain a feature vector of each data;
For the data set M meeting the minimum internal quality requirement R, the quality evaluator performs feature extraction on each data in the data set M and the sample data set S to obtain the feature vector of each data.
A context quality evaluation module 303, configured to perform context quality evaluation on the feature vector of each data in the data set and the sample data set that meet the minimum intrinsic quality requirement, to obtain a quality evaluation result;
Context quality measures the degree to which a data set suits a given task. In this embodiment, the data consumer expresses the task's need for data by providing a small sample data set S. Context quality assessment is then carried out on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement from five dimensions of task relevance, content diversity, integrity, fitness of data volume and timeliness, to obtain a quality evaluation result.
The quality sorting module 304 is configured to sort the quality of the quality evaluation result.
Finally, a rank-aggregation quality ordering method is used: given the multiple quality evaluation result sequences of the input data sets, the data set ordering that minimizes the total Kendall tau distance to those sequences is computed, and a data set ranked higher in this best-quality ordering has higher data quality on the given task.
In summary, in the above embodiment, when data quality needs to be evaluated, the data set is first evaluated for the task-independent internal characteristics of the data, to obtain the data sets meeting the minimum internal quality requirement; then, feature extraction is performed on each data in the data sets and the sample data set meeting the minimum internal quality requirement, to obtain the feature vector of each data; context quality assessment is carried out on these feature vectors to obtain quality evaluation results, and the quality evaluation results are quality-sorted. When evaluating data quality, the invention comprehensively considers the task-independent internal quality, the task-dependent context quality and the demands of large-scale data quality evaluation, thereby effectively improving the comprehensiveness, accuracy and efficiency of data quality evaluation.
As shown in fig. 4, which is a schematic structural diagram of embodiment 2 of the data quality evaluation system disclosed in the present invention, the system may include:
an accuracy evaluation unit 401, configured to evaluate the correctness, reliability and error-free degree of the data set by a pattern matching method, to obtain an accuracy quantized value;
when the data quality needs to be evaluated, the data set is firstly evaluated for the internal characteristics of the data irrelevant to tasks from four dimensions of accuracy, precision, objectivity and reliability.
Specifically, the correctness, reliability and error-free degree of the data are evaluated by a pattern matching method to obtain an accuracy quantized value. For example, for text data, its spelling and grammar are evaluated for correctness.
A precision evaluation unit 402, configured to evaluate the data acquisition and storage precision of the data set, to obtain a precision quantized value;
Meanwhile, the data acquisition and storage precision of the data set is evaluated to obtain a precision quantized value. For example, a pre-trained convolutional neural network is used to estimate the quality of an image, including the JPEG compression rate of the picture, the degree of blurring, and so on.
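The patent performs this estimate with a pre-trained convolutional neural network; as a hedged stand-in, the classical variance-of-Laplacian sharpness score below illustrates the kind of blur quantification involved (the function name and toy arrays are illustrative, not the patent's method):

```python
import numpy as np

def laplacian_blur_score(gray):
    """Variance of the discrete Laplacian of a grayscale image:
    higher means sharper, near zero means heavily blurred.
    gray: 2-D float array of pixel intensities."""
    lap = (
        -4 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]
        + gray[1:-1, :-2] + gray[1:-1, 2:]
    )
    return float(lap.var())

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))       # high-frequency content everywhere
blurry = np.full((64, 64), 0.5)    # no detail at all
```

A learned model additionally captures compression artifacts that this purely local statistic misses.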
An objectivity evaluation unit 403, configured to evaluate the unbiased degree of the data set, so as to obtain an objectivity quantification value;
Meanwhile, the unbiased degree of the data set is evaluated by checking historical objective records and by questionnaire surveys, to obtain an objectivity quantified value.
A dependability evaluation unit 404, configured to evaluate the degree of trust of the data source of the data set, and obtain a quantized reliability value;
Meanwhile, the trusted degree of the data source of the data set is evaluated by checking historical objective records and by questionnaire surveys, to obtain a reliability quantified value.
A determining unit 405, configured to obtain the data sets meeting the minimum intrinsic quality requirement based on the accuracy quantized value, the precision quantized value, the objectivity quantized value, the reliability quantized value and the minimum quality requirements for accuracy, precision, objectivity and reliability;
Then, based on the accuracy quantized value, the precision quantized value, the objectivity quantized value and the reliability quantized value, together with the accuracy minimum quality requirement value θ_c, the precision minimum quality requirement value θ_p, the objectivity minimum quality requirement value θ_o and the reliability minimum quality requirement value θ_r, the data sets meeting the minimum intrinsic quality requirement R are obtained.
A feature extraction module 406, configured to perform feature extraction on each data in the data set and the sample data set that meet the minimum intrinsic quality requirement, so as to obtain a feature vector of each data;
For the data set M meeting the minimum internal quality requirement R, the quality evaluator performs feature extraction on each data in the data set M and the sample data set S to obtain the feature vector of each data.
Specifically, for picture data, the eighth-layer features are extracted with a VGG-16 model as the feature vector of the picture; for text data, the penultimate-layer features are extracted with the BERT (Bidirectional Encoder Representations from Transformers) model as the feature expression of the text.
A task relevance evaluation unit 407, configured to calculate, by a method based on locality-sensitive hashing, the proportion of similar point pairs (point pairs whose distance is below a threshold) between the data set meeting the minimum internal quality requirement and the sample data set, to obtain a task relevance evaluation result;
the context quality assessment dataset is adapted to the extent of a given task. In this embodiment, the data consumer expresses the task' S need for data by providing a small sample data set S. And then carrying out context quality assessment on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement from five dimensions of task relativity, content diversity, integrity, appropriateness of data volume and timeliness to obtain a quality assessment result.
A method based on locality-sensitive hashing is adopted to calculate the ratio of the number X(M, S) of similar point pairs between the data set M and the sample data set S to |D|, and this ratio is used to approximate the value of task relevance. Specifically:
The feature vectors are hashed with a locality-sensitive hash so that, with high probability, similar data points are mapped into the same bucket and dissimilar data points into different buckets.
The distance Dis(d_i, d_j) between data points d_i ∈ M and d_j ∈ S in the same bucket is calculated (e.g., based on the Euclidean distance or cosine distance of the feature vectors); when the distance is less than the threshold δ, the data points d_i and d_j are considered a similar point pair.
X(M, S)/|D| is then calculated, where X(M, S) is the number of similar point pairs, and this ratio is used to approximate the value of task relevance.
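A minimal sketch of these three steps, assuming random-hyperplane LSH over the feature vectors (the function names, number of hyperplanes, and the choice of |M| as the normalizer are illustrative assumptions, since the patent leaves the LSH family and the denominator |D| unspecified):

```python
import numpy as np

def lsh_buckets(vectors, n_planes=8, seed=0):
    """Random-hyperplane LSH: the sign pattern of each vector against
    n_planes shared random hyperplanes is its bucket key.  A fixed seed
    means every call uses the same hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, vectors.shape[1]))
    bits = vectors @ planes.T > 0
    return [tuple(row) for row in bits]

def task_relevance(M, S, delta=0.5, n_planes=8):
    """Count similar pairs (d_i in M, d_j in S) that share a bucket and
    lie within Euclidean distance delta, normalized by |M| (an assumed
    reading of the |D| denominator)."""
    keys_m = lsh_buckets(M, n_planes)
    keys_s = lsh_buckets(S, n_planes)
    x = sum(
        1
        for i, km in enumerate(keys_m)
        for j, ks in enumerate(keys_s)
        if km == ks and np.linalg.norm(M[i] - S[j]) < delta
    )
    return x / len(M)

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 16))
rel_same = task_relevance(S, S)         # every point matches itself
rel_far = task_relevance(S, S + 100.0)  # shifted copies are never similar
```

Bucketing first means distances are only computed for candidate pairs sharing a key, which is what makes the evaluation scale to large collections.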
A content diversity evaluation unit 408, configured to calculate an average distance between the data set and the sample data set that meets the minimum internal quality requirement by using a method based on local sensitive hash, so as to obtain a content diversity evaluation result;
Meanwhile, the average distance between the data set M and the sample data set S is calculated by a method based on locality-sensitive-hash sampling, and this distance is used to approximate the value of content diversity. Specifically:
The feature vectors are hashed with a locality-sensitive hash so that, with high probability, similar data points are mapped into the same bucket and dissimilar data points into different buckets.
The data in all buckets are uniformly sampled to obtain a set G, and the distances Dis(d_i, d_j) of all data point pairs d_i, d_j ∈ G are calculated; content diversity is approximated by their mean (the higher the sampling rate, the closer the calculated mean is to the true content diversity).
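A hedged sketch of this diversity estimate; for brevity the uniform sample G is drawn directly from the data rather than from LSH buckets, so the bucketing step is elided (the function name and sample size are illustrative):

```python
import numpy as np

def content_diversity(vectors, sample_size=32, seed=0):
    """Mean pairwise Euclidean distance over a uniform sample G of the
    data, used as the content-diversity approximation; a higher sampling
    rate brings the mean closer to the true diversity."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(vectors), size=min(sample_size, len(vectors)), replace=False)
    G = vectors[idx]
    dists = [
        np.linalg.norm(G[i] - G[j])
        for i in range(len(G))
        for j in range(i + 1, len(G))
    ]
    return float(np.mean(dists)) if dists else 0.0

tight = content_diversity(np.zeros((10, 4)))   # identical points: no diversity
spread = content_diversity(np.eye(8))          # mutually distant points
```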
An integrity evaluation unit 409, configured to calculate a ratio of the number of non-null data in the data set and the sample data set that satisfy the minimum internal quality requirement to the total data amount, to obtain an integrity evaluation result;
and meanwhile, calculating the ratio of the number of non-null data in the data set and the sample data set meeting the minimum internal quality requirement to the total data amount to obtain an integrity evaluation result.
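The integrity ratio is straightforward; a minimal sketch treating each record as a field dictionary with None marking a missing value (the record layout is an illustrative assumption):

```python
def integrity(records):
    """Ratio of non-null field values to the total number of field
    values across all records (None models a missing entry)."""
    total = sum(len(r) for r in records)
    non_null = sum(v is not None for r in records for v in r.values())
    return non_null / total if total else 0.0

score = integrity([
    {"id": 1, "label": "cat"},
    {"id": 2, "label": None},  # one of four field values missing
])
```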
A fitness evaluation unit 410, configured to evaluate whether the data volume in the data set and the sample data set that meet the minimum intrinsic quality requirement meets the requirement of the given task, and obtain a fitness evaluation result of the data volume;
Meanwhile, whether the data volume in the data set and the sample data set meeting the minimum intrinsic quality requirement meets the requirement of the given task is evaluated, and a fitness evaluation result of the data volume is obtained.
A timeliness evaluation unit 411, configured to evaluate whether a service cycle of a data set and a sample data set that meet a minimum intrinsic quality requirement meets a requirement of a given task, to obtain a timeliness evaluation result;
Meanwhile, whether the service cycle of the data set and the sample data set meeting the minimum internal quality requirement meets the requirement of the given task is evaluated, and a timeliness evaluation result is obtained.
A quality ranking module 412, configured to rank the quality evaluation results.
Finally, a rank-aggregation quality ordering method is used: given the multiple quality evaluation result sequences of the input data sets, the data set ordering that minimizes the total Kendall tau distance to those sequences is computed, and a data set ranked higher in this best-quality ordering has higher data quality on the given task.
In summary, in the data quality evaluation method disclosed by the invention, the evaluation indexes are comprehensive, taking both the task-independent intrinsic quality and the task-dependent context quality into account; the evaluation process is efficient and suitable for large-scale data collections; the evaluation method is highly universal and applicable to various types of data; and the evaluation results are interpretable.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and for the relevant points reference may be made to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method of data quality assessment, comprising:
evaluating the correctness, reliability and error-free degree of the data set by a pattern matching method to obtain an accuracy quantized value;
evaluating the data acquisition and storage precision of the data set to obtain a precision quantization value;
evaluating the unbiased degree of the data set to obtain an objectivity quantification value;
evaluating the trusted degree of the data source of the data set to obtain a reliability quantification value;
obtaining a data set meeting the minimum internal quality requirement based on the accuracy quantized value, the precision quantized value, the objectivity quantized value, the reliability quantized value and the minimum quality requirements for accuracy, precision, objectivity and reliability;
extracting the characteristics of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain the characteristic vector of each data;
calculating, by a method based on locality-sensitive hashing, the proportion of similar point pairs, i.e. point pairs whose distance is below a threshold, between the data set meeting the minimum internal quality requirement and the sample data set, to obtain a task relevance evaluation result;
calculating the average distance between the data set meeting the minimum internal quality requirement and the sample data set by adopting a method based on local sensitive hash to obtain a content diversity evaluation result;
calculating the ratio of the number of non-empty data in the data set meeting the minimum internal quality requirement and the sample data set to the total data amount to obtain an integrity evaluation result;
evaluating whether the data volume in the data set and the sample data set meeting the minimum internal quality requirement meets the requirement of a given task or not, and obtaining a suitable degree evaluation result of the data volume;
evaluating whether the service cycle of the data set and the sample data set meeting the minimum internal quality requirement meets the requirement of a given task or not, and obtaining a timeliness evaluation result;
and carrying out quality sorting on the quality evaluation results.
2. The method of claim 1, wherein the feature extraction of each data in the data set and the sample data set meeting the minimum intrinsic quality requirement to obtain a feature vector for each data comprises:
and extracting the eighth layer of features from each picture data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a VGG-16 model as feature vectors of the picture data.
3. The method of claim 1, wherein the feature extraction of each data in the data set and the sample data set meeting the minimum intrinsic quality requirement to obtain a feature vector for each data comprises:
and extracting the penultimate layer of features from each text data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a BERT model as feature vectors of the text data.
4. A data quality assessment system, comprising:
the intrinsic quality evaluation module is used for evaluating the internal characteristics of the data irrelevant to the task on the data set to obtain the data set meeting the minimum intrinsic quality requirement;
the feature extraction module is used for carrying out feature extraction on each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain a feature vector of each data;
the context quality evaluation module is used for performing context quality evaluation on the feature vector of each data in the data set and the sample data set meeting the minimum internal quality requirement to obtain a quality evaluation result;
the quality sorting module is used for sorting the quality of the quality evaluation result;
the intrinsic quality assessment module comprises:
the accuracy evaluation unit is used for evaluating the correctness, reliability and error-free degree of the data set through a pattern matching method to obtain an accuracy quantized value;
the precision evaluation unit is used for evaluating the data acquisition and storage precision of the data set to obtain a precision quantized value;
the objectivity evaluation unit is used for evaluating the unbiased degree of the data set to obtain an objectivity quantification value;
the dependability evaluation unit is used for evaluating the trusted degree of the data source of the data set to obtain a dependability quantized value;
a determining unit, configured to obtain a data set meeting the minimum intrinsic quality requirement based on the accuracy quantized value, the precision quantized value, the objectivity quantized value, the reliability quantized value and the minimum quality requirements for accuracy, precision, objectivity and reliability;
the context quality assessment module comprises:
the task relevance evaluation unit is used for calculating, by a method based on locality-sensitive hashing, the proportion of similar point pairs, i.e. point pairs whose distance is below a threshold, between the data set meeting the minimum internal quality requirement and the sample data set, to obtain a task relevance evaluation result;
the content diversity evaluation unit is used for calculating the average distance between the data set meeting the minimum internal quality requirement and the sample data set by adopting a method based on local sensitive hash to obtain a content diversity evaluation result;
the integrity evaluation unit is used for calculating the ratio of the number of non-empty data in the data set meeting the minimum internal quality requirement and the sample data set to the total data amount to obtain an integrity evaluation result;
the data volume fitness evaluation unit is used for evaluating whether the data volume in the data set meeting the minimum internal quality requirement and the sample data set meets the requirement of a given task or not, and obtaining a data volume fitness evaluation result;
and the timeliness evaluation unit is used for evaluating whether the service cycle of the data set meeting the minimum internal quality requirement and the sample data set meets the requirement of a given task or not, so as to obtain a timeliness evaluation result.
5. The system of claim 4, wherein the feature extraction module is specifically configured to:
and extracting the eighth layer of features from each picture data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a VGG-16 model as feature vectors of the picture data.
6. The system of claim 4, wherein the feature extraction module is specifically configured to:
and extracting the penultimate layer of features from each text data in the data set and the sample data set meeting the minimum intrinsic quality requirement by using a BERT model as feature vectors of the text data.
CN202010472680.2A 2020-05-28 2020-05-28 Data quality assessment method and system Active CN111612783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010472680.2A CN111612783B (en) 2020-05-28 2020-05-28 Data quality assessment method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010472680.2A CN111612783B (en) 2020-05-28 2020-05-28 Data quality assessment method and system

Publications (2)

Publication Number Publication Date
CN111612783A CN111612783A (en) 2020-09-01
CN111612783B true CN111612783B (en) 2023-10-24

Family

ID=72200233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010472680.2A Active CN111612783B (en) 2020-05-28 2020-05-28 Data quality assessment method and system

Country Status (1)

Country Link
CN (1) CN111612783B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782731B (en) * 2022-05-19 2024-08-27 中科南京软件技术研究院 Image data set validity evaluation method, device, equipment and storage medium
CN117556268A (en) * 2022-07-31 2024-02-13 华为技术有限公司 Data quality measurement method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056287A (en) * 2016-06-03 2016-10-26 华东理工大学 Equipment and method for carrying out data quality evaluation on data set based on context
WO2017162835A1 (en) * 2016-03-24 2017-09-28 Universität Stuttgart Data compression by means of adaptive subsampling
CN109800812A (en) * 2019-01-24 2019-05-24 山东大学第二医院 CT image classification feature selection approach and system based on counterfeit filter
CN110728437A (en) * 2019-09-26 2020-01-24 华南师范大学 Quality evaluation method and system for open data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017162835A1 (en) * 2016-03-24 2017-09-28 Universität Stuttgart Data compression by means of adaptive subsampling
CN106056287A (en) * 2016-06-03 2016-10-26 华东理工大学 Equipment and method for carrying out data quality evaluation on data set based on context
CN109800812A (en) * 2019-01-24 2019-05-24 山东大学第二医院 CT image classification feature selection approach and system based on counterfeit filter
CN110728437A (en) * 2019-09-26 2020-01-24 华南师范大学 Quality evaluation method and system for open data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Quality Management of Education Network Security Data Based on the Analytic Hierarchy Process; Wang Mingzheng et al.; Netinfo Security; 2019-12-10 (No. 12); full text *

Also Published As

Publication number Publication date
CN111612783A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN109492772B (en) Method and device for generating information
US20220121906A1 (en) Task-aware neural network architecture search
CN111898578B (en) Crowd density acquisition method and device and electronic equipment
CN111159564A (en) Information recommendation method and device, storage medium and computer equipment
CN109033408B (en) Information pushing method and device, computer readable storage medium and electronic equipment
CN105187237B (en) The method and apparatus for searching associated user identifier
CN110245132B (en) Data anomaly detection method, device, computer readable storage medium and computer equipment
CN111612783B (en) Data quality assessment method and system
CN102402594A (en) Rich media individualized recommending method
CN113128305B (en) Portrait archive aggregation evaluation method and device, electronic equipment and storage medium
US20230004776A1 (en) Moderator for identifying deficient nodes in federated learning
CN111708942A (en) Multimedia resource pushing method, device, server and storage medium
CN110895706A (en) Method and device for acquiring target cluster number and computer system
WO2022017082A1 (en) Method and apparatus for detecting false transaction orders
CN115858911A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN113239879A (en) Federal model training and certificate detection method, device, equipment and medium
CN111291694A (en) Dish image identification method and device
CN113704566B (en) Identification number body identification method, storage medium and electronic equipment
CN111966851B (en) Image recognition method and system based on small number of samples
CN113407808A (en) Method and device for judging applicability of graph neural network model and computer equipment
CN114840742A (en) User portrait construction device, method and computer readable medium
CN112532692A (en) Information pushing method and device and storage medium
CN112182382A (en) Data processing method, electronic device, and medium
CN105872268B (en) A kind of call center user incoming call purpose prediction technique and device
CN113672783B (en) Feature processing method, model training method and media resource processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Li Xiangyang

Inventor after: Li Anran

Inventor after: Zhang Lan

Inventor after: Xie Junting

Inventor before: Li Anran

Inventor before: Zhang Lan

Inventor before: Li Xiangyang

Inventor before: Xie Junting
