CN116303387A - Scientific data center data quality assessment method

Info

Publication number: CN116303387A
Application number: CN202310167738.6A
Authority: CN
Inventors: 伍观娣; 陶玉柱; 李一凡
Original assignee: Guangdong Academy of Forestry
Current assignee: Guangdong Academy of Forestry
Priority date: 2023-02-27
Filing date: 2023-02-27
Publication date: 2023-06-23

Abstract

The invention discloses a data quality assessment method for a scientific data center, comprising the following steps: step one, importing data; step two, quality analysis; step three, formulating weighting rules; step four, quality assessment; step five, setting feedback thresholds; step six, collecting feedback; step seven, feedback correction; step eight, update iteration. In step two, the relevance of the data is determined according to the quality of the content associated with the data, and the repeatability of the data is analyzed according to similar data under the same background. By formulating evaluation rules with different emphases, the invention broadens the applicability of the evaluation method; by collecting user feedback to correct the evaluation results, it allows evaluation errors to be found in time.

Description

Scientific data center data quality assessment method

Technical Field

The invention relates to the technical field of data quality evaluation, in particular to a data quality evaluation method of a scientific data center.

Background

With the continuous development of science and technology, information spreads ever faster and system applications keep multiplying. Data processing plays an extremely important role in these applications, and the quality of the data determines whether an application can earn the trust of its users. Existing data quality evaluation methods have three shortcomings. First, they take multidimensional analysis of the data as the basis of evaluation, but determine data validity mainly from the content and quantity of the data without examining its source, so their accuracy leaves room for improvement. Second, when formulating an evaluation rule they consider the analysis results of all dimensions together and then evaluate uniformly; the result is comprehensive but cannot adapt to users' differing search preferences. Third, because they do not collect feedback from users, they cannot find and correct evaluation errors in time.

Disclosure of Invention

The invention aims to provide a data quality evaluation method for a scientific data center that solves the problems described in the Background above.

In order to achieve the above purpose, the present invention provides the following technical solutions: the data quality evaluation method of the scientific data center comprises the following steps: step one, importing data; step two, quality analysis; step three, formulating a weighting rule; step four, quality assessment; step five, setting a feedback threshold value; step six, collecting feedback; step seven, feedback correction; step eight, updating iteration;

in the first step, data in a scientific data center database is imported into an evaluation system;

in the second step, the data are analyzed separately along the dimensions of real-time property, validity, relevance, integrity, and repeatability;

in the third step, based on the dimensions of the second step, any one dimension is taken as the primary dimension to formulate an evaluation weighting rule;

in the fourth step, data quality is evaluated under each rule formulated in the third step, yielding different evaluation results that are used to rank and display the data according to different search preferences;

in the fifth step, feedback results are divided into positive feedback and negative feedback, and thresholds of different levels are set for each;

in the sixth step, after users search for and consult the data, their feedback is collected and summarized;

in the seventh step, the feedback result is incorporated into the evaluation weighting rule to correct the evaluation result;

in the eighth step, a data quality evaluation period is set; if corrections within the period cause the evaluated quality to fluctuate significantly, the data are re-evaluated immediately; otherwise, they are re-evaluated when the period ends.
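Steps three and four can be illustrated with a minimal Python sketch. The text does not give the weight values or the formula that combines the dimensions, so the weighted sum, the primary weight of 0.6, the score range of 0 to 1, and the names `make_rule`, `evaluate`, and `rank` are all assumptions made for illustration.

```python
# The dimensions named in step two (identifier names are hypothetical).
DIMENSIONS = ["real_time", "validity", "relevance", "integrity", "repeatability"]

def make_rule(primary, primary_weight=0.6):
    """Build a weighting rule that emphasizes one dimension (assumed scheme)."""
    rest = (1.0 - primary_weight) / (len(DIMENSIONS) - 1)
    return {d: (primary_weight if d == primary else rest) for d in DIMENSIONS}

def evaluate(scores, rule):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    return sum(rule[d] * scores[d] for d in DIMENSIONS)

def rank(datasets, rule):
    """Return dataset names sorted from highest to lowest evaluated quality."""
    return sorted(datasets, key=lambda name: evaluate(datasets[name], rule), reverse=True)

# One rule per dimension, so each search preference gets its own ranking.
RULES = {d: make_rule(d) for d in DIMENSIONS}
```

Calling `rank(datasets, RULES["validity"])` and `rank(datasets, RULES["real_time"])` on the same data can produce different orderings, which is the behavior step four describes.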

Preferably, in the second step, the real-time property of the data is analyzed according to the submission time of the data and whether the data has been updated.
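One way to turn these two inputs into a score is sketched below. The half-life decay, the update bonus, and the function name `real_time_score` are assumptions; the text only states which inputs are used, not how they are combined.

```python
from datetime import datetime, timedelta

def real_time_score(submitted_at, now, has_update,
                    half_life_days=365.0, update_bonus=0.2):
    """Score in [0, 1]: newer submissions score higher; an update adds a bonus.

    The half-life decay and the bonus value are illustrative assumptions.
    """
    age_days = max((now - submitted_at).total_seconds() / 86400.0, 0.0)
    base = 0.5 ** (age_days / half_life_days)
    return min(base + (update_bonus if has_update else 0.0), 1.0)

# Example: a record submitted one half-life ago, with and without an update.
now = datetime(2023, 2, 27)
old = now - timedelta(days=365)
without_update = real_time_score(old, now, has_update=False)
with_update = real_time_score(old, now, has_update=True)
```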

Preferably, in the second step, the validity of the data is analyzed according to the content, quantity, and source reliability of the data.
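A minimal sketch of a validity score that includes source reliability alongside content and quantity follows. The equal-weight average, the saturating quantity term, and the parameter `target_count` are assumptions chosen for illustration; the text does not specify them.

```python
def validity_score(content_score, record_count, source_reliability,
                   target_count=1000):
    """Average of content, quantity, and source-reliability scores.

    content_score and source_reliability are assumed to lie in [0, 1];
    the quantity term saturates at 1.0 once target_count records exist.
    """
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    quantity_score = min(record_count / target_count, 1.0)
    return (content_score + quantity_score + source_reliability) / 3.0
```

With content and quantity held fixed, a less reliable source lowers the score, which reflects the point of examining the source at all.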

Preferably, in the second step, the relevance of the data is determined according to the quality of the content associated with the data.
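As a hedged illustration, relevance could be taken as the mean quality of the associated items. The averaging and the zero score for data with no associated content are assumptions, as is the name `relevance_score`.

```python
def relevance_score(associated_quality):
    """Mean quality of associated content, each value assumed in [0, 1].

    Returns 0.0 when nothing is associated with the data (an assumption).
    """
    if not associated_quality:
        return 0.0
    return sum(associated_quality) / len(associated_quality)
```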

Preferably, in the second step, the integrity of the data covers the source background of the data, the acquisition process, and the final result.
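A simple way to check these three parts is to score the fraction that is present. The field names below and the equal weighting of the parts are hypothetical; the text names the parts but not how they are stored or scored.

```python
# Hypothetical field names for the three parts named in the text.
REQUIRED_PARTS = ("source_background", "acquisition_process", "final_result")

def integrity_score(record):
    """Fraction of the three required parts that are present and non-empty."""
    present = sum(1 for part in REQUIRED_PARTS if record.get(part))
    return present / len(REQUIRED_PARTS)
```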

Preferably, in the second step, the repeatability of the data is analyzed according to similar data in the same background.
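The sketch below measures how many records from the same background resemble a given record. Jaccard similarity over keyword sets, the 0.5 threshold, and the record layout are assumptions; the text does not define how similarity is measured, nor whether a high value should raise or lower the final score, so that interpretation is left to the caller.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_fraction(target, others, threshold=0.5):
    """Fraction of same-background records at least `threshold` similar to target."""
    peers = [o for o in others if o["background"] == target["background"]]
    if not peers:
        return 0.0
    hits = sum(1 for o in peers
               if jaccard(target["keywords"], o["keywords"]) >= threshold)
    return hits / len(peers)
```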

Preferably, in the fifth step, when thresholds are set for the feedback results, both positive feedback and negative feedback may be assigned a correction threshold and a re-evaluation threshold.

Preferably, in the seventh step, when the correction threshold is reached, the data evaluation is corrected upward or downward according to the feedback result; when the re-evaluation threshold is reached, that is, when the feedback result deviates greatly from the initial evaluation, the evaluation is recalculated and weighted in combination with the feedback.
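The two-level thresholds of step five and the correction of step seven can be sketched as follows. The text does not state what the thresholds measure; this sketch assumes they are counts of feedback items, and the threshold values, the correction step of 0.05, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    correction: int      # feedback count that triggers a small correction
    re_evaluation: int   # larger count that triggers a full re-evaluation

# Assumed values; positive and negative feedback get separate thresholds.
POSITIVE = Thresholds(correction=10, re_evaluation=50)
NEGATIVE = Thresholds(correction=5, re_evaluation=20)

def feedback_action(count, thresholds):
    """Classify a feedback count as 're-evaluate', 'correct', or 'none'."""
    if count >= thresholds.re_evaluation:
        return "re-evaluate"
    if count >= thresholds.correction:
        return "correct"
    return "none"

def apply_feedback(score, positive_count, negative_count, step=0.05):
    """Return (new_score, needs_re_evaluation) for one data item."""
    pos = feedback_action(positive_count, POSITIVE)
    neg = feedback_action(negative_count, NEGATIVE)
    if "re-evaluate" in (pos, neg):
        # Feedback deviates strongly: leave the score and recompute elsewhere.
        return score, True
    if pos == "correct":
        score += step
    if neg == "correct":
        score -= step
    return min(max(score, 0.0), 1.0), False
```

When `needs_re_evaluation` is true, the caller would rerun the full evaluation with the feedback folded into the weighting, rather than nudging the existing score.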

Compared with the prior art, the invention has the following beneficial effects: it analyzes data validity by also examining the data source, which improves the accuracy of the quality assessment; it broadens the applicability of the evaluation method by formulating evaluation rules with different emphases; and it corrects evaluation results by collecting user feedback, so that evaluation errors can be found in time.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1, an embodiment of the present invention is provided: the data quality evaluation method of the scientific data center comprises the following steps: step one, importing data; step two, quality analysis; step three, formulating a weighting rule; step four, quality assessment; step five, setting a feedback threshold value; step six, collecting feedback; step seven, feedback correction; step eight, updating iteration;

in the second step, the data are analyzed separately along the dimensions of real-time property, validity, relevance, integrity, and repeatability, wherein: the real-time property is analyzed according to the submission time of the data and whether the data has been updated; the validity is analyzed according to the content, quantity, and source reliability of the data; the relevance is determined according to the quality of the content associated with the data; the integrity covers the source background, acquisition process, and final result of the data; and the repeatability is analyzed according to similar data under the same background;

in the fifth step, feedback results are divided into positive feedback and negative feedback, thresholds of different levels are set for each, and both positive feedback and negative feedback may be assigned a correction threshold and a re-evaluation threshold;

in the seventh step, the feedback result is incorporated into the evaluation weighting rule to correct the evaluation result: when the correction threshold is reached, the data evaluation is corrected upward or downward according to the feedback result; when the re-evaluation threshold is reached, that is, when the feedback result deviates greatly from the initial evaluation, the evaluation is recalculated and weighted in combination with the feedback.
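The corrections of step seven feed into the update iteration of step eight, where re-evaluation is triggered either by a large fluctuation within the evaluation period or by the period ending. A minimal sketch of that decision follows; the 30-day period, the fluctuation limit of 0.2, and the function name are assumptions, since the text gives no concrete values.

```python
def should_re_evaluate_now(previous_score, corrected_score, days_since_last,
                           period_days=30, max_fluctuation=0.2):
    """True when a correction moved the score sharply or the period has ended."""
    fluctuated = abs(corrected_score - previous_score) > max_fluctuation
    return fluctuated or days_since_last >= period_days
```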

Based on the above, the advantages of the invention in use are as follows: the data source is examined alongside the traditional validity analysis, which improves the accuracy of data analysis; multiple different data quality evaluation rules are used, which avoids the inability to adapt to users' differing search preferences; and feedback thresholds are set and feedback results collected, with the data quality corrected or re-evaluated accordingly, so that the data quality evaluation keeps pace with users' perceptions.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims

1. The data quality evaluation method of the scientific data center comprises the following steps: step one, importing data; step two, quality analysis; step three, formulating a weighting rule; step four, quality assessment; step five, setting a feedback threshold value; step six, collecting feedback; step seven, feedback correction; step eight, updating iteration; the method is characterized in that:

2. The scientific data center data quality assessment method according to claim 1, characterized in that: in the second step, the real-time property of the data is analyzed according to the submission time of the data and whether the data has been updated.

3. The scientific data center data quality assessment method according to claim 1, characterized in that: in the second step, the validity of the data is analyzed according to the content, the quantity and the reliability of the data source.

4. The scientific data center data quality assessment method according to claim 1, characterized in that: in the second step, the relevance of the data is determined according to the quality of the content associated with the data.

5. The scientific data center data quality assessment method according to claim 1, characterized in that: in the second step, the integrity of the data covers the source background of the data, the acquisition process, and the final result.

6. The scientific data center data quality assessment method according to claim 1, characterized in that: in the second step, the repeatability of the data is analyzed according to similar data in the same background.

7. The scientific data center data quality assessment method according to claim 1, characterized in that: in the fifth step, when thresholds are set for the feedback results, both positive feedback and negative feedback may be assigned a correction threshold and a re-evaluation threshold.

8. The scientific data center data quality assessment method according to claim 1, characterized in that: in the seventh step, when the correction threshold is reached, the data evaluation is corrected upward or downward according to the feedback result; when the re-evaluation threshold is reached, that is, when the feedback result deviates greatly from the initial evaluation, the evaluation is recalculated and weighted in combination with the feedback.