CN109254959B - Data evaluation method and device, terminal equipment and readable storage medium - Google Patents


Info

Publication number
CN109254959B
Authority
CN
China
Prior art keywords
data
evaluation
information
data quality
opening
Legal status
Active
Application number
CN201810947681.0A
Other languages
Chinese (zh)
Other versions
CN109254959A (en)
Inventor
谢桂园
刘忻
魏文国
蔡君
Current Assignee
Guangzhou Bingo Software Co Ltd
Guangdong Polytechnic Normal University
Original Assignee
Guangzhou Bingo Software Co Ltd
Guangdong Polytechnic Normal University
Priority date
2018-08-17
Filing date
2018-08-17
Publication date
2022-04-08
Application filed by Guangzhou Bingo Software Co Ltd and Guangdong Polytechnic Normal University
Priority to CN201810947681.0A
Publication of CN109254959A
Application granted
Publication of CN109254959B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data evaluation method, a data evaluation device, terminal equipment and a readable storage medium. The method comprises: collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system; and inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting the corresponding evaluation reports. The method can analyze the data quality and existing problems of the entire data lake, identify the causes of those problems, rectify common data quality problems, and continuously promote the opening and sharing of the data lake.

Description

Data evaluation method and device, terminal equipment and readable storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a data evaluation method and device, terminal equipment and a readable storage medium.
Background
Once the data in a data lake is used at scale, users produce a wide range of data quality evaluations and problem reports. A data quality evaluation system built from this collected evaluation information is therefore needed to ensure the overall quality of the data.
In the course of researching and practicing the prior art, the inventors found that by collecting evaluation information from data users (institutions and organizations) and assessing data quality, including whether the data is timely, whether it is complete and how valuable it is, it becomes possible to analyze the data quality and existing problems of the entire data lake, identify the causes of those problems, rectify common data quality problems, and continuously promote the opening and sharing of the data lake.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a method for constructing a data evaluation system that can analyze the data quality and problems of the entire data lake, identify the causes of those problems, correct common data quality problems, and continuously promote the opening and sharing of the data lake.
To solve the above problem, an embodiment of the present invention provides a method for constructing a data evaluation system, which is suitable for execution on a computer device and comprises at least the following steps:
collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system;
and inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting the corresponding evaluation reports.
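The two steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: each submission is assumed to arrive as a dict carrying a hypothetical `function` key that names the evaluation function it came from, and the two pre-constructed models are passed in as plain callables, because the source does not specify their internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EvaluationBatch:
    """Evaluations collected from a data resource detail interface (hypothetical structure)."""
    quality: List[dict] = field(default_factory=list)   # data quality evaluation information
    openness: List[dict] = field(default_factory=list)  # data openness evaluation information


def collect(submissions: List[dict]) -> EvaluationBatch:
    """Step 1: route each submission by the evaluation function it came from."""
    batch = EvaluationBatch()
    for item in submissions:
        if item.get("function") == "quality":
            batch.quality.append(item)
        elif item.get("function") == "openness":
            batch.openness.append(item)
    return batch


def evaluate(batch: EvaluationBatch,
             quality_model: Callable[[List[dict]], dict],
             open_model: Callable[[List[dict]], dict]) -> Dict[str, dict]:
    """Step 2: feed each kind of information to its own pre-built model and collect the reports."""
    return {
        "quality_report": quality_model(batch.quality),
        "open_report": open_model(batch.openness),
    }


# Usage with trivial stand-in models that only count their inputs.
subs = [{"function": "quality", "stars": 4},
        {"function": "openness", "records_enough": False},
        {"function": "quality", "stars": 2}]
report = evaluate(collect(subs),
                  quality_model=lambda xs: {"count": len(xs)},
                  open_model=lambda xs: {"count": len(xs)})
print(report)  # {'quality_report': {'count': 2}, 'open_report': {'count': 1}}
```

Passing the models in as callables keeps collection and evaluation decoupled, mirroring the split between the two steps.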
Further, inputting the data quality evaluation information and the data openness evaluation information into the respective pre-constructed models for evaluation and analysis, and outputting the corresponding evaluation reports, specifically includes:
inputting the data quality evaluation information into the pre-constructed data quality evaluation model, analyzing the timeliness, integrity and consistency of the data, and generating a data quality evaluation report periodically at a set interval;
inputting the data openness evaluation information into the pre-constructed data openness evaluation model, analyzing the number of data resources, the number of data resource records, the storage size of the data resources and the update period of the data resources, and generating a data openness evaluation report periodically at a set interval;
and outputting a final evaluation report based on the data quality evaluation report and the data openness evaluation report, so that the client can correct the corresponding data quality and data openness problems according to the report.
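The source names timeliness, integrity and consistency as the dimensions the data quality evaluation model analyzes, but does not define how each is scored. The sketch below uses common ratio-style definitions as assumptions: timeliness as the share of records updated within a hypothetical `max_age`, integrity as the share of required fields that are filled, and consistency as the share of keys whose value in a data lake copy matches the source system.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def timeliness(updated_at: List[datetime], now: datetime, max_age: timedelta) -> float:
    """Share of records updated no more than max_age before `now` (assumed definition)."""
    if not updated_at:
        return 0.0
    fresh = sum(1 for t in updated_at if now - t <= max_age)
    return fresh / len(updated_at)


def integrity(rows: List[Dict[str, object]], required: List[str]) -> float:
    """Share of required cells that are present and not None or "" (assumed definition).

    A value such as 0 counts as filled; only missing keys, None and empty strings do not.
    """
    total = len(rows) * len(required)
    if total == 0:
        return 0.0
    filled = sum(1 for r in rows for f in required if r.get(f) not in (None, ""))
    return filled / total


def consistency(source: Dict[str, object], copy: Dict[str, object]) -> float:
    """Share of source keys whose value in the lake copy is identical (assumed definition)."""
    if not source:
        return 0.0
    same = sum(1 for k, v in source.items() if copy.get(k) == v)
    return same / len(source)


now = datetime(2018, 8, 17)
print(timeliness([datetime(2018, 8, 10), datetime(2018, 6, 1)], now, timedelta(days=30)),
      integrity([{"id": 1, "name": "a"}, {"id": 2, "name": ""}], ["id", "name"]),
      consistency({"a": 1, "b": 2}, {"a": 1, "b": 3}))  # 0.5 0.75 0.5
```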
Further, the data quality evaluation model evaluates and analyzes data consistency in the data quality evaluation information as follows: extracting the data requiring consistency evaluation from the data quality evaluation information to form a sample data table;
formatting the data in the sample data table, that is, extracting structured information from the large amount of unstructured information in the data quality evaluation information for further analysis and application, where the extracted structured information includes the fields with problems, and the problems include content inconsistency, data size inconsistency, quantity inconsistency and file body inconsistency;
and extracting, from the structured information, the greatest commonality among the causes of data inconsistency to obtain the root cause of the problem.
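The formatting step turns free-text evaluations into structured rows of (problem field, problem type). A minimal rule-based sketch follows; the keyword lists, the `field <name>` pattern and the problem-type labels are hypothetical, and a real system would need rules (or a trained extractor) suited to the language and phrasing of actual user evaluations.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical keyword rules, one entry per problem type named in the description.
PROBLEM_RULES = {
    "content_inconsistency": ["content differs", "wrong value", "content mismatch"],
    "size_inconsistency": ["size differs", "size mismatch", "bytes"],
    "quantity_inconsistency": ["count differs", "number mismatch", "missing rows"],
    "file_body_inconsistency": ["file body", "attachment differs", "corrupt file"],
}
# Hypothetical convention: the problem field is named as "field <name>" in the comment.
FIELD_PATTERN = re.compile(r"field\s+['\"]?(\w+)", re.IGNORECASE)


def to_structured(comment: str) -> List[Tuple[Optional[str], str]]:
    """Turn one free-text evaluation into (field, problem_type) rows; field is None if unnamed."""
    text = comment.lower()
    match = FIELD_PATTERN.search(comment)
    fld = match.group(1) if match else None
    return [(fld, ptype) for ptype, keywords in PROBLEM_RULES.items()
            if any(kw in text for kw in keywords)]


print(to_structured("Field 'amount' content differs from the source system"))
# [('amount', 'content_inconsistency')]
```

One comment can yield several rows, since a single evaluation may report more than one kind of inconsistency.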
Further, the data quality evaluation information is an overall evaluation of a data resource, including a qualitative star rating and a description of the evaluation; the data openness evaluation information is an evaluation of how well the opened content of a data resource satisfies user requirements, including the number of opened records and whether the field content is sufficient.
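A qualitative star rating becomes useful for reporting once it is aggregated per data resource. The sketch below assumes a 1 to 5 star scale (the source does not state the range) and returns the mean rating and the number of ratings for each hypothetical resource ID.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_stars(evals: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[float, int]]:
    """Mean star rating and rating count per resource; rejects ratings outside the assumed 1..5 scale."""
    sums: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for resource_id, stars in evals:
        if not 1 <= stars <= 5:
            raise ValueError(f"star rating out of range: {stars}")
        sums[resource_id] += stars
        counts[resource_id] += 1
    return {r: (sums[r] / counts[r], counts[r]) for r in sums}


print(aggregate_stars([("r1", 5), ("r1", 3), ("r2", 2)]))
# {'r1': (4.0, 2), 'r2': (2.0, 1)}
```

Keeping the count alongside the mean lets a report flag resources whose average rests on very few evaluations.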
An embodiment of the present invention also provides a data evaluation apparatus, including:
a data evaluation acquisition module, configured to collect data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system;
and a data evaluation analysis module, configured to input the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and to output the corresponding evaluation reports.
An embodiment of the present invention also provides a data evaluation apparatus, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements the data evaluation method according to any one of claims 1 to 4 when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where, when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the data evaluation method according to any one of claims 1 to 4.
The embodiments of the invention have the following beneficial effects:
the embodiments of the invention provide a data evaluation method, a data evaluation device, terminal equipment and a readable storage medium. The method comprises: collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system; and inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting the corresponding evaluation reports. The method can analyze the data quality and existing problems of the entire data lake, identify the causes of those problems, rectify common data quality problems, and continuously promote the opening and sharing of the data lake.
Drawings
Fig. 1 is a schematic flow chart of a data evaluation method according to the first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data evaluation apparatus according to the second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
First, the application scenarios of the present invention, such as data evaluation and analysis, are described.
Once the data in a data lake is used at scale, users provide evaluations of and feedback on a wide range of data quality problems; a data quality evaluation system built from the collected evaluation information is therefore needed to ensure the overall quality of the data.
The first embodiment of the present invention:
Please refer to Fig. 1.
As shown in Fig. 1, the data evaluation method provided by this embodiment is suitable for execution on a computer device and comprises at least the following steps:
S101: collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system;
S102: inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting the corresponding evaluation reports.
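Step S101 collects evaluations submitted through the data resource detail interface. Assuming, hypothetically, that each submission arrives as a JSON object whose `function` key names the evaluation function used, the records can be parsed into typed structures matching the two kinds of evaluation information in this embodiment. The field names (`resource_id`, `stars`, `description`, `records_enough`, `fields_enough`) are illustrative only, not taken from the source.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class QualityEvaluation:
    resource_id: str
    stars: int          # qualitative star rating
    description: str    # free-text evaluation content


@dataclass
class OpenEvaluation:
    resource_id: str
    records_enough: bool  # were enough records opened? (assumed yes/no form)
    fields_enough: bool   # was the opened field content sufficient?


def parse_submission(raw: str) -> Union[QualityEvaluation, OpenEvaluation]:
    """Parse one hypothetical JSON submission from the detail interface into a typed record."""
    msg = json.loads(raw)
    kind = msg["function"]
    if kind == "quality":
        return QualityEvaluation(msg["resource_id"], int(msg["stars"]), msg.get("description", ""))
    if kind == "openness":
        return OpenEvaluation(msg["resource_id"], bool(msg["records_enough"]), bool(msg["fields_enough"]))
    raise ValueError(f"unknown evaluation function: {kind}")


print(parse_submission('{"function": "quality", "resource_id": "r1", "stars": 4, "description": "late updates"}'))
```

Rejecting unknown `function` values at parse time keeps malformed submissions out of both models.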
For step S101, specifically:
the data quality evaluation information is an overall evaluation of a data resource, including a qualitative star rating and a description of the evaluation; the data openness evaluation information is an evaluation of how well the opened content of a data resource satisfies user requirements, including the number of opened records and whether the field content is sufficient.
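The data openness evaluation information records whether opened content meets users' needs. One way to summarize it, sketched below under the assumption that each evaluation carries two yes/no judgments (whether the opened record count and the opened fields were sufficient), is the per-resource share of evaluations reporting a shortfall. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OpenEvaluation:
    resource_id: str
    records_enough: bool  # assumed: user judged the opened record count sufficient
    fields_enough: bool   # assumed: user judged the opened field content sufficient


def dissatisfaction(evals: List[OpenEvaluation]) -> Dict[str, Dict[str, float]]:
    """Per resource, the share of evaluations reporting too few records or too few fields."""
    by_res: Dict[str, List[OpenEvaluation]] = {}
    for e in evals:
        by_res.setdefault(e.resource_id, []).append(e)
    return {
        r: {
            "records_short": sum(not e.records_enough for e in es) / len(es),
            "fields_short": sum(not e.fields_enough for e in es) / len(es),
        }
        for r, es in by_res.items()
    }


evals = [OpenEvaluation("r1", True, False),
         OpenEvaluation("r1", False, False),
         OpenEvaluation("r2", True, True)]
print(dissatisfaction(evals))
# {'r1': {'records_short': 0.5, 'fields_short': 1.0}, 'r2': {'records_short': 0.0, 'fields_short': 0.0}}
```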
For step S102, specifically:
inputting the data quality evaluation information into the pre-constructed data quality evaluation model, analyzing the timeliness, integrity and consistency of the data, and generating a data quality evaluation report periodically at a set interval;
inputting the data openness evaluation information into the pre-constructed data openness evaluation model, analyzing the number of data resources, the number of data resource records, the storage size of the data resources and the update period of the data resources, and generating a data openness evaluation report periodically at a set interval;
and outputting a final evaluation report based on the data quality evaluation report and the data openness evaluation report, so that the client can correct the corresponding data quality and data openness problems according to the report.
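The data openness evaluation model analyzes four quantities: the number of data resources, the number of records, the storage size and the update period. A minimal sketch, assuming each resource declares a hypothetical `update_period_days` and that "on schedule" means last updated within that period, computes these as catalog-level totals plus an on-schedule ratio. The resource names and sizes in the usage are made-up sample values.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class OpenResource:
    name: str
    records: int
    size_bytes: int
    update_period_days: int  # declared update cycle (assumed field)
    last_updated: date


def open_metrics(resources: List[OpenResource], today: date) -> dict:
    """Totals over opened resources plus the share updated within their declared period."""
    on_schedule = [r for r in resources
                   if today - r.last_updated <= timedelta(days=r.update_period_days)]
    return {
        "resource_count": len(resources),
        "record_count": sum(r.records for r in resources),
        "storage_bytes": sum(r.size_bytes for r in resources),
        "on_schedule_ratio": len(on_schedule) / len(resources) if resources else 0.0,
    }


catalog = [OpenResource("permits", 1200, 3_000_000, 30, date(2018, 8, 1)),
           OpenResource("buses", 500, 800_000, 7, date(2018, 7, 1))]
print(open_metrics(catalog, date(2018, 8, 17)))
# {'resource_count': 2, 'record_count': 1700, 'storage_bytes': 3800000, 'on_schedule_ratio': 0.5}
```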
Specifically, the data quality evaluation model evaluates and analyzes data consistency in the data quality evaluation information as follows: extracting the data requiring consistency evaluation from the data quality evaluation information to form a sample data table;
formatting the data in the sample data table, that is, extracting structured information from the large amount of unstructured information in the data quality evaluation information for further analysis and application, where the extracted structured information includes the fields with problems, and the problems include content inconsistency, data size inconsistency, quantity inconsistency and file body inconsistency;
and extracting, from the structured information, the greatest commonality among the causes of data inconsistency to obtain the root cause of the problem.
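The source describes extracting the greatest commonality among inconsistency records as the root cause, without specifying the algorithm. One plausible reading, sketched below as an assumption, is to count every attribute-value combination (up to a hypothetical `max_size`) across the structured problem rows and return the combination shared by the most rows, preferring the more specific combination on ties. The sample rows are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Combo = Tuple[Tuple[str, str], ...]


def max_commonality(rows: List[Dict[str, str]], max_size: int = 2) -> Tuple[Combo, int]:
    """Return the attribute-value combination shared by the most problem rows, with its count.

    Assumed reading of "greatest commonality": the most frequent combination of up to
    `max_size` attribute-value pairs. Ties go to the larger (more specific) combination.
    """
    counts: Counter = Counter()
    for row in rows:
        items = sorted(row.items())  # fixed order so equal combinations compare equal
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    if not counts:
        raise ValueError("no problem rows to analyze")
    return max(counts.items(), key=lambda kv: (kv[1], len(kv[0])))


rows = [
    {"source": "tax_bureau", "field": "amount", "problem": "content"},
    {"source": "tax_bureau", "field": "amount", "problem": "size"},
    {"source": "tax_bureau", "field": "date", "problem": "content"},
]
print(max_commonality(rows))  # ((('source', 'tax_bureau'),), 3)
```

Here every problem traces back to one source system, which is the kind of shared factor the root-cause step is meant to surface.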
The data evaluation method provided by this embodiment comprises: collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system; and inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting the corresponding evaluation reports. This embodiment can analyze the data quality and existing problems of the entire data lake, identify the causes of those problems, rectify common data quality problems, and continuously promote the opening and sharing of the data lake.
Second embodiment of the invention:
Please refer to Fig. 2.
As shown in fig. 2, the present embodiment further provides a data evaluation apparatus, including:
the data evaluation acquisition module 100 is configured to collect data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system;
specifically, the data quality evaluation information is an overall evaluation of a data resource, including a qualitative star rating and a description of the evaluation; the data openness evaluation information is an evaluation of how well the opened content of a data resource satisfies user requirements, including the number of opened records and whether the field content is sufficient.
And the data evaluation analysis module 200 is configured to input the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, perform evaluation and analysis, and output the corresponding evaluation reports.
Specifically, the data quality evaluation information is input into the pre-constructed data quality evaluation model, the timeliness, integrity and consistency of the data are analyzed, and a data quality evaluation report is generated periodically at a set interval;
the data openness evaluation information is input into the pre-constructed data openness evaluation model, the number of data resources, the number of data resource records, the storage size of the data resources and the update period of the data resources are analyzed, and a data openness evaluation report is generated periodically at a set interval;
and a final evaluation report is output based on the data quality evaluation report and the data openness evaluation report, so that the client can correct the corresponding data quality and data openness problems according to the report.
Specifically, the data quality evaluation model evaluates and analyzes data consistency in the data quality evaluation information as follows: extracting the data requiring consistency evaluation from the data quality evaluation information to form a sample data table;
formatting the data in the sample data table, that is, extracting structured information from the large amount of unstructured information in the data quality evaluation information for further analysis and application, where the extracted structured information includes the fields with problems, and the problems include content inconsistency, data size inconsistency, quantity inconsistency and file body inconsistency;
and extracting, from the structured information, the greatest commonality among the causes of data inconsistency to obtain the root cause of the problem.
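Both reports are generated periodically. A small sketch of that grouping, assuming monthly or ISO-week periods (the source does not fix the period length), maps each evaluation date to a period label and buckets evaluations by it; the function names and label formats are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def period_key(d: date, period: str) -> str:
    """Map a date to its reporting-period label; supports 'month' and ISO 'week'."""
    if period == "month":
        return f"{d.year}-{d.month:02d}"
    if period == "week":
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unsupported period: {period}")


def bucket(evals: List[Tuple[date, dict]], period: str) -> Dict[str, List[dict]]:
    """Group (date, evaluation) pairs by reporting period, preserving input order."""
    out: Dict[str, List[dict]] = defaultdict(list)
    for d, e in evals:
        out[period_key(d, period)].append(e)
    return dict(out)


print(period_key(date(2018, 8, 17), "week"))  # 2018-W33
```

Each bucket can then be fed to the relevant model to produce that period's report.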
The data evaluation device provided by this embodiment collects data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system, inputs the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputs the corresponding evaluation reports. This embodiment can analyze the data quality and existing problems of the entire data lake, identify the causes of those problems, rectify common data quality problems, and continuously promote the opening and sharing of the data lake.
An embodiment of the present invention also provides a data evaluation terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; the processor implements the data evaluation method described above when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where, when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the data evaluation method described above.
The foregoing describes preferred embodiments of the present invention. A person skilled in the art may make various changes and modifications without departing from the spirit of the invention, and such changes and modifications are intended to fall within the scope of the invention.
A person skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (5)

1. A data evaluation method, adapted to be executed in a computing device, characterized by comprising at least the steps of:
collecting data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system, wherein the data quality evaluation information is an overall evaluation of a data resource and comprises a qualitative star rating and a description of the evaluation, and the data openness evaluation information is an evaluation of how well the opened content of a data resource satisfies user requirements and comprises the number of opened records and whether the field content is sufficient;
inputting the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and outputting corresponding evaluation reports, which specifically comprises: inputting the data quality evaluation information into the pre-constructed data quality evaluation model, analyzing the timeliness, integrity and consistency of the data, and generating a data quality evaluation report periodically at a set interval; inputting the data openness evaluation information into the pre-constructed data openness evaluation model, analyzing the number of data resources, the number of data resource records, the storage size of the data resources and the update period of the data resources, and generating a data openness evaluation report periodically at a set interval; and outputting a final evaluation report based on the data quality evaluation report and the data openness evaluation report, so that the client can correct the corresponding data quality and data openness problems according to the final evaluation report.
2. The data evaluation method according to claim 1, wherein the data quality evaluation model evaluates and analyzes data consistency in the data quality evaluation information by: extracting the data requiring consistency evaluation from the data quality evaluation information to form a sample data table;
formatting the data in the sample data table, that is, extracting structured information from the large amount of unstructured information in the data quality evaluation information for further analysis and application, wherein the extracted structured information comprises the fields with problems, and the problems comprise content inconsistency, data size inconsistency, quantity inconsistency and file body inconsistency;
and extracting, from the structured information, the greatest commonality among the causes of data inconsistency to obtain the root cause of the problem.
3. A data evaluation apparatus, comprising:
a data evaluation acquisition module, configured to collect data quality evaluation information and data openness evaluation information from a data resource detail interface, based on a data quality evaluation function and a data openness evaluation function of a data evaluation system, wherein the data quality evaluation information is an overall evaluation of a data resource and comprises a qualitative star rating and a description of the evaluation, and the data openness evaluation information is an evaluation of how well the opened content of a data resource satisfies user requirements and comprises the number of opened records and whether the field content is sufficient;
a data evaluation analysis module, configured to input the data quality evaluation information and the data openness evaluation information into a pre-constructed data quality evaluation model and a pre-constructed data openness evaluation model, respectively, for evaluation and analysis, and to output corresponding evaluation reports, the module being specifically configured to: input the data quality evaluation information into the pre-constructed data quality evaluation model, analyze the timeliness, integrity and consistency of the data, and generate a data quality evaluation report periodically at a set interval; input the data openness evaluation information into the pre-constructed data openness evaluation model, analyze the number of data resources, the number of data resource records, the storage size of the data resources and the update period of the data resources, and generate a data openness evaluation report periodically at a set interval; and output a final evaluation report based on the data quality evaluation report and the data openness evaluation report, so that the client can correct the corresponding data quality and data openness problems according to the final evaluation report.
4. A data evaluation terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a data evaluation method according to any one of claims 1 to 2 when executing the computer program.
5. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform a data evaluation method according to any one of claims 1 to 2.
CN201810947681.0A 2018-08-17 2018-08-17 Data evaluation method and device, terminal equipment and readable storage medium Active CN109254959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810947681.0A CN109254959B (en) 2018-08-17 2018-08-17 Data evaluation method and device, terminal equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810947681.0A CN109254959B (en) 2018-08-17 2018-08-17 Data evaluation method and device, terminal equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN109254959A (en) 2019-01-22
CN109254959B 2022-04-08

Family

ID=65048873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810947681.0A Active CN109254959B (en) 2018-08-17 2018-08-17 Data evaluation method and device, terminal equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN109254959B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059083A (en) * 2019-04-24 2019-07-26 北京金堤科技有限公司 A kind of data evaluation method, apparatus and electronic equipment
CN110263229B (en) * 2019-06-27 2020-06-02 北京中油瑞飞信息技术有限责任公司 Data lake-based data management method and device
CN113052411A (en) * 2019-12-26 2021-06-29 北京邮电大学 Data product quality evaluation method and device
US11263103B2 (en) * 2020-07-31 2022-03-01 International Business Machines Corporation Efficient real-time data quality analysis
US11204851B1 (en) 2020-07-31 2021-12-21 International Business Machines Corporation Real-time data quality analysis
CN113268894B (en) * 2021-07-20 2022-07-05 国能信控互联技术有限公司 Thermal power production data management method and system based on data center station

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105049301A (en) * 2015-08-31 2015-11-11 北京奇虎科技有限公司 Method and device for providing comprehensive evaluation services of websites
CN107423911A (en) * 2017-08-02 2017-12-01 中国科学院上海高等研究院 Software Evaluating Degree of Success method/system, computer-readable recording medium and equipment
CN107545349A (en) * 2016-06-28 2018-01-05 国网天津市电力公司 A kind of Data Quality Analysis evaluation model towards electric power big data
CN107819824A (en) * 2017-10-09 2018-03-20 中国电子科技集团公司第二十八研究所 A kind of Urban Data opens and information service system and method for servicing
CN107977798A (en) * 2017-12-21 2018-05-01 中国计量大学 A kind of risk evaluating method of e-commerce product quality

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN101335618B (en) * 2008-07-09 2010-09-15 南京邮电大学 Method for evaluating and authorizing peer-to-peer network node by certificate
CN101621534B (en) * 2009-08-11 2012-07-04 阿坝师范高等专科学校 Semantic service automatic combination method facing to service system structure
CN102708149A (en) * 2012-04-01 2012-10-03 河海大学 Data quality management method and system
CN102681979B (en) * 2012-05-15 2015-04-22 北京师范大学 Content editing intelligent verifying method facing to open knowledge community
CN103970820B (en) * 2014-01-23 2017-03-08 河海大学 The method for visualizing of the open labeled data of Web multimedia resource and device
JP6569585B2 (en) * 2016-04-19 2019-09-04 株式会社デンソー Business evaluation system
CN105976120A (en) * 2016-05-17 2016-09-28 全球能源互联网研究院 Electric power operation monitoring data quality assessment system and method


Also Published As

Publication number Publication date
CN109254959A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109254959B (en) Data evaluation method and device, terminal equipment and readable storage medium
CN109062780B (en) Development method of automatic test case and terminal equipment
US11416768B2 (en) Feature processing method and feature processing system for machine learning
CN106844506A (en) The knowledge retrieval method and the automatic improving method of knowledge base of a kind of artificial intelligence dialogue
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN104346480B (en) information mining method and device
CN111651751B (en) Security event analysis report generation method and device, storage medium and equipment
CN112418779A (en) Online self-service interviewing method based on natural language understanding
CN116595859A (en) Audit model construction method, device, equipment and medium based on machine learning
CN114399205A (en) Procedural evaluation method, system and equipment suitable for project collaboration
CN104765823A (en) Method and device for collecting website data
CN113609427A (en) System data resource extraction method and system under condition of no interface
Rebuge et al. A process mining analysis on a virtual electronic patient record system
CN112598286A (en) Crowdsourcing user cheating behavior detection method and device and electronic equipment
CN112966076A (en) Intelligent question and answer generating method and device, computer equipment and storage medium
CN107688619A (en) A kind of daily record data processing method and processing device
CN106462614B (en) Information analysis system, information analysis method, and information analysis program
CN114969018B (en) Data monitoring method and system
CN107643968A (en) Crash log processing method and processing device
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
CN113377962B (en) Intelligent process simulation method based on image recognition and natural language processing
KR20130068633A (en) Apparatus and method for visualizing data
CN112069807A (en) Text data theme extraction method and device, computer equipment and storage medium
CN110175456A (en) Software action sampling method, relevant device and software systems
CN113918817B (en) Push model construction method, push model construction device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant after: GUANGDONG POLYTECHNIC NORMAL University

Applicant after: Guangzhou Pingao Software Co., Ltd

Address before: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant before: GUANGDONG POLYTECHNIC NORMAL University

Applicant before: Guangzhou Pingao Software Co., Ltd

GR01 Patent grant