CN111125075A - Data management method and system for non-computable region - Google Patents

Info

Publication number
CN111125075A
Authority
CN
China
Prior art keywords
data
area
metadata
computable
checking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911298485.6A
Other languages
Chinese (zh)
Inventor
李刚
刘浩宇
李野
顾强
赵宝国
杨光
季浩
何泽昊
董得龙
吕伟嘉
张兆杰
卢静雅
翟术然
乔亚男
陈娟
许迪
赵紫敬
孙虹
卫天超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority to CN201911298485.6A
Publication of CN111125075A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data management method and system for a non-computable area, characterized by comprising the following steps: (1) establishing a persistent original data area; (2) establishing a data integration area; (3) establishing a data summarization area; (4) building data quality management; (5) managing data integrity and judging data against rationality rules. The method sorts out the non-computable data in power consumption data and identifies abnormal data by outlier detection, according to the kinds of abnormality present. On this basis, and according to the needs of data mining, methods such as data elimination and linear-interpolation completion are applied in combination, and a corresponding cleaning measure is provided for each subclass of data. Qualified, high-quality power consumption data are thereby obtained, the data of the non-computable transformer areas of the free trade zone are governed, and overall data quality is greatly improved.

Description

Data management method and system for non-computable region
Technical Field
The invention belongs to the field of power data management, and particularly relates to a data management method and system for a non-computable area.
Background
Since 2009, State Grid Corporation of China has been building out its electricity consumption information acquisition system, which now covers about 450 million electric meters system-wide. After years of operation, the system has accumulated massive amounts of electricity data. Data analysis can mine useful information from it, such as the operating errors of electric energy meters and the consumption behavior patterns of users. This unlocks the potential of the accumulated data, can greatly reduce operating costs, and provides decision support for the grid company.
With the development and application of the automatic verification system for electric energy metering devices in the free trade zone, massive power consumption data have accumulated, and these data contain a wealth of power consumption information. However, the quality of the raw data is affected by its diversity, uncertainty and complexity, so the data actually collected are messy, with missing and abnormal values, and in many cases do not meet the input standards that data mining tools require for knowledge acquisition. The data of the non-computable areas of the free trade zone therefore need to be governed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a data management method and system for a non-computable region.
The technical problem addressed by the invention is solved by the following technical scheme:
A metadata-management-based data governance method for a non-computable area comprises the following steps:
(1) establishing a persistent original data area;
(2) establishing a data integration area, integrating the data of the original data area, establishing primary-foreign key relationships between data items, and strengthening the association among the various kinds of aggregated data;
(3) establishing a data summarization area for integrating the basic data of each energy type according to a unified analysis model, extracting and distributing the data, formulating data cleaning rules, and eliminating useless and invalid data;
(4) performing quality management on the cleaned data to form metadata describing the data, and checking the quality of the metadata;
(5) using the metadata to manage the integrity of the data it describes, and judging the rationality of the data in a graphical manner.
Data rationality means that business data reflects the business value of the data and is concentrated within a normal, reasonable interval. Data outside that interval is defined as unreasonable (abnormal) data and is removed before it participates in calculation, so that it does not distort the analysis results.
Moreover, the original data area is divided into a structured-data area and an unstructured-data area, and these two areas are further divided into files, data and equipment.
Moreover, checking the quality of the metadata comprises a metadata attribute filling rate check, an attribute validity check, a name duplication check and a relationship soundness check.
Moreover, data integrity management judges the integrity of the data along two dimensions: the volume of the data and the values taken by its fields. The number of records collected at the same acquisition frequency is counted to determine the data volume. Fields with null values in data tables of the same type are examined, and rules determine whether each null value is reasonable.
A data management system for a non-computable area comprises:
an original data area establishing module, used for establishing a persistent original data area;
a data integration area establishing module, used for integrating the data of the original data area so as to establish primary-foreign key relationships between data items;
a data summarization area establishing module, used for integrating the integrated basic data of each energy type according to a unified analysis model, extracting and distributing the data, formulating data cleaning rules, and eliminating useless and invalid data;
a metadata forming module, used for performing quality management on the cleaned data, forming metadata describing the data, and checking the quality of the metadata;
a data rationality judging module, used for managing the integrity of the described data by means of the metadata and judging data rationality in a graphical manner.
The invention has the advantages and positive effects that:
The metadata-management-based data management system for non-computable areas accurately summarizes the types of non-computable data and, on that basis, provides targeted measures for each data processing step. The method and system can supply qualified, high-quality data for analysis of the non-computable areas of the free trade zone, bring the data of those areas under governance, and greatly improve overall data quality. This lays a solid foundation for remote analysis of the operating errors of smart electric energy meters in low-voltage distribution areas, analysis of user power consumption behavior, and similar applications.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The embodiments of the invention are described in further detail below with reference to the following figures:
A metadata-management-based data governance method for a non-computable area comprises the following steps:
(1) Establishing a persistent original data area. The original data area is divided into a structured-data area and an unstructured-data area, and these two areas are further divided into files, data and equipment. The area's data model is built according to the original data model, so that data used for analysis flows in one direction only and is traceable, and the original data is kept safe by decoupling the area from the source business systems' databases.
The data inputs of the original data area are the original business systems and the data exchange area, and its data output is the data integration area.
(2) Data integration here is limited to data fusion at the data layer: intelligent synthesis of the data is added to the preprocessing stage to produce data that is more accurate, more complete and more reliable than any single information source for estimation and judgment, and the result is then stored in a data warehouse or passed to a data mining module. The data integration area integrates the data of the original data area. Extraction and integration are carried out with an ETL tool, and the user information, standard configuration and data analysis areas of the original business systems are integrated. The user area merges the acquired data to a uniform granularity, with statistics that include the maximum, minimum, average and probability values within each interval. The equipment ledger area stores the various types of equipment by category, integrates an equipment ledger model according to the cleaning rules, and associates it with the integrated user information to enable linked analysis and archiving of user information.
The data integration area provides service support to the advanced application center. Its data input is the original data area, and its data outputs are the data summarization area and the data exchange area.
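As a rough illustration of merging acquired data to a uniform granularity, the sketch below groups readings into fixed-width intervals and computes the maximum, minimum and average for each one. The 15-minute interval, the `(timestamp, value)` layout of a reading and the function name `summarize_by_interval` are assumptions made for this example; the description does not specify them, and the "probability value" statistic is omitted because its definition is not given.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def summarize_by_interval(readings, interval=timedelta(minutes=15)):
    """Group (timestamp, value) readings into fixed-width buckets and
    return {bucket_start: {"max": ..., "min": ..., "avg": ...}}."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in readings:
        # Floor the timestamp to the start of the bucket it falls in.
        offset = (ts - epoch) // interval
        buckets[epoch + offset * interval].append(value)
    return {
        start: {"max": max(vals), "min": min(vals), "avg": mean(vals)}
        for start, vals in sorted(buckets.items())
    }
```

Keeping the raw readings in the original data area and deriving these summaries in a separate step matches the one-way data flow described for the original data area.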
(3) The data summarization area builds on the two data levels of the original data area and the data integration area, integrating the dimensions of big data analysis scenarios, economic activity analysis and business characteristics to form a data model. This level integrates the basic data of each energy type according to a unified analysis model, extracts and distributes the data, formulates data cleaning rules, and eliminates useless and invalid data (for example, data with an abnormal value of 0, data showing a sudden increase or sudden decrease, and other data judged to be abnormal). The end result is a uniform data mining model that meets big data requirements. The data summarization area is divided into a fact data area, a dimension data area and a subject analysis area. The fact data area contains numerical data that can be aggregated and does not contain descriptive information about the data. The dimension data area mainly stores descriptive information about the fact data, standard calculation formulas and the like, and removes redundant data introduced when integrating the individual systems. The subject analysis area stores analytical data, in a form suited to big data mining, for use by advanced applications.
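The cleaning rules named above (zero values, sudden increases or decreases) can be sketched as follows, together with the linear-interpolation completion that the abstract mentions. This is a minimal sketch under stated assumptions, not the cleaning procedure of the invention: the ratio threshold `max_jump_ratio`, the comparison against the last accepted reading, and both function names are hypothetical.

```python
def interpolate_gaps(values):
    """Linearly interpolate runs of None that have valid values on both
    sides; leading and trailing gaps are left as None."""
    out = list(values)
    valid = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(valid, valid[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def clean_series(values, max_jump_ratio=3.0):
    """Mark readings that are missing, equal to 0, or that jump or drop
    sharply relative to the last accepted reading as invalid, then fill
    the resulting interior gaps by linear interpolation."""
    cleaned = []
    last_valid = None
    for v in values:
        invalid = v is None or v == 0
        if not invalid and last_valid is not None:
            ratio = v / last_valid
            # Sudden increase or decrease beyond the assumed threshold.
            invalid = ratio > max_jump_ratio or ratio < 1 / max_jump_ratio
        if invalid:
            cleaned.append(None)
        else:
            cleaned.append(v)
            last_valid = v
    return interpolate_gaps(cleaned)
```

One limitation of this simple rule is that it trusts the first non-zero reading; if that reading is itself an outlier, later valid readings would be rejected, so a production rule would likely compare against a rolling baseline instead.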
(4) Building data quality management: quality management and governance are carried out on the data that has been ingested. Quality verification is run on specified data, periodically or manually, according to preset verification rules, and the verification results and sample problem data are stored once verification completes. This mainly comprises data cleaning, data verification and data exception handling, and targets abnormal data such as 'bad data' and 'non-standard data' identified by comparison.
Data quality management mainly implements quality checking of the metadata, comprising the metadata attribute filling rate check, the attribute validity check, the name duplication check and the relationship soundness check. From the check results, the metadata management module can generate a detailed check report, which is used to investigate and resolve data quality problems and to establish a data quality assessment system and tracking mechanism, raising the value of the data. Relevant personnel can search and retrieve check reports, and a specified report can be exported to more readable document formats such as Excel or PPT.
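The four metadata checks can be illustrated with a small sketch. Everything concrete here is an assumption for the sake of the example: the required attribute set `REQUIRED_ATTRS`, the permitted type values `ALLOWED_TYPES`, the dict-based metadata entries and the function name `check_metadata` do not come from the description.

```python
from collections import Counter

REQUIRED_ATTRS = ("name", "type", "owner", "description")  # assumed attribute set
ALLOWED_TYPES = {"table", "column", "file", "device"}      # assumed valid values


def check_metadata(entries, relations):
    """Run four illustrative quality checks over metadata entries.

    entries:   list of dicts describing data objects
    relations: list of (parent_name, child_name) pairs
    """
    total = len(entries) * len(REQUIRED_ATTRS)
    filled = sum(1 for e in entries for a in REQUIRED_ATTRS if e.get(a))
    names = Counter(e.get("name") for e in entries if e.get("name"))
    known = set(names)
    return {
        # Share of required attributes that are actually filled in.
        "fill_rate": filled / total if total else 1.0,
        # Names of entries whose type is not one of the allowed values.
        "invalid_type": [e.get("name") for e in entries
                         if e.get("type") not in ALLOWED_TYPES],
        # Names used by more than one entry.
        "duplicate_names": sorted(n for n, c in names.items() if c > 1),
        # Relations that point at a name with no metadata entry.
        "dangling_relations": [(p, c) for p, c in relations
                               if p not in known or c not in known],
    }
```

The returned dict is the kind of structure a check report could be rendered from, whether as a table or as an exported document.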
(5) Data integrity management judges the integrity of the data along two dimensions: the volume of the data and the values taken by its fields. The number of records collected at the same acquisition frequency is counted to determine the data volume. Fields with null values in data tables of the same type are examined, and rules determine whether each null value is reasonable. A data integrity rule template is defined and applied to the data tables at each level of the data storage layer.
(6) Data rationality rules are judged in a graphical manner according to the condition of the business data. With each data table as the unit, a normal interval range is defined for the business data. With each data table as the unit, a limited value set is defined for the business data; a value subject to this restriction must be one of the limited values. With each data table as the unit, upper and lower limit values are defined for the business data; data that exceeds the upper limit or falls below the lower limit is treated as abnormal data.
Finally, only the valid data participates in calculation, and the quality of the source data is improved iteratively.
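The two integrity dimensions, record volume and field values, might be checked along the following lines. This is a sketch under assumptions: the row-as-dict layout, the `nullable_fields` rule for deciding when a null is reasonable, and the names `expected_count` and `integrity_report` are all hypothetical.

```python
from datetime import timedelta


def expected_count(period, frequency):
    """Number of readings expected in `period` at one reading per `frequency`."""
    return period // frequency


def integrity_report(rows, period, frequency, nullable_fields=()):
    """Judge integrity along two dimensions: record volume and field values.

    rows:            list of dicts for one data table covering `period`
    nullable_fields: fields where a null is considered reasonable (assumed rule)
    """
    expected = expected_count(period, frequency)
    unexpected_nulls = {}
    for row in rows:
        for field, value in row.items():
            if value is None and field not in nullable_fields:
                unexpected_nulls[field] = unexpected_nulls.get(field, 0) + 1
    return {
        "expected_rows": expected,
        "actual_rows": len(rows),
        "volume_complete": len(rows) >= expected,
        "unexpected_nulls": unexpected_nulls,
    }
```

For instance, one hour of data at a 15-minute acquisition frequency should yield four rows per meter, so a table holding only two rows would be reported as incomplete.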
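The per-table rationality rules in step (6) can be expressed as a small rule table, as sketched below. The rule format (a `"range"` interval and an `"allowed"` value set per field) and the function name `classify_rationality` are assumptions for illustration, and the sketch covers only rule evaluation, not the graphical presentation.

```python
def classify_rationality(rows, rules):
    """Split rows into (valid, abnormal) according to per-field rules.

    rules maps a field name to a dict with optional keys:
      "range":   (low, high) inclusive interval of normal values
      "allowed": set of permitted values
    """
    valid, abnormal = [], []
    for row in rows:
        ok = True
        for field, rule in rules.items():
            value = row.get(field)
            if "range" in rule:
                low, high = rule["range"]
                # A missing value or one outside the interval is abnormal.
                if value is None or not (low <= value <= high):
                    ok = False
            if "allowed" in rule and value not in rule["allowed"]:
                ok = False
        (valid if ok else abnormal).append(row)
    return valid, abnormal
```

Rows in the abnormal list would be excluded from calculation, while their counts per rule could feed a chart that shows where the source data most needs correction.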
A data management system for a non-computable area comprises:
an original data area establishing module, used for establishing a persistent original data area;
a data integration area establishing module, used for integrating the data of the original data area so as to establish primary-foreign key relationships between data items;
a data summarization area establishing module, used for integrating the integrated basic data of each energy type according to a unified analysis model, extracting and distributing the data, formulating data cleaning rules, and eliminating useless and invalid data;
a metadata forming module, used for performing quality management on the cleaned data, forming metadata describing the data, and checking the quality of the metadata;
a data rationality judging module, used for managing the integrity of the described data by means of the metadata and judging the data rationality rules in a graphical manner.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (8)

1. A data management method for a non-computable region, characterized by comprising the following steps:
(1) establishing an original data area;
(2) integrating the data of the original data area to establish primary-foreign key relationships between data items;
(3) integrating the basic data of each energy type, after the primary-foreign key relationships are established, according to a unified analysis model, extracting and distributing the data, formulating data cleaning rules, and cleaning the data;
(4) performing quality management on the cleaned data to form metadata describing the data, and checking the quality of the metadata;
(5) using the metadata to manage the integrity of the data it describes, and judging the rationality of the data in a graphical manner.
2. The data governance method for the non-computable region according to claim 1, wherein: the original data area is divided into a structured-data area and an unstructured-data area, and these two areas are further divided into files, data and equipment.
3. The data governance method for the non-computable region according to claim 1, wherein: checking the quality of the metadata comprises a metadata attribute filling rate check, an attribute validity check, a name duplication check and a relationship soundness check.
4. The data governance method for the non-computable region according to claim 1, wherein: managing the integrity of the described data comprises judging data integrity along two dimensions, namely the volume of the data and the values taken by its fields; counting the number of records collected at the same acquisition frequency to determine the data volume; and examining fields with null values in data tables of the same type and determining according to rules whether each null value is reasonable.
5. A data governance system for a non-computable area, characterized by comprising:
an original data area establishing module, used for establishing an original data area;
a data integration area establishing module, used for integrating the data of the original data area so as to establish primary-foreign key relationships between data items;
a data summarization area establishing module, used for integrating the integrated basic data of each energy type according to a unified analysis model, extracting and distributing the data, formulating data cleaning rules, and eliminating useless and invalid data;
a metadata forming module, used for performing quality management on the cleaned data, forming metadata describing the data, and checking the quality of the metadata;
a data rationality judging module, used for managing the integrity of the described data by means of the metadata and judging data rationality in a graphical manner.
6. The non-computable area data governance system according to claim 5, wherein: the original data area is divided into a structured-data area and an unstructured-data area, and these two areas are further divided into files, data and equipment.
7. The non-computable area data governance system according to claim 5, wherein: checking the quality of the metadata comprises a metadata attribute filling rate check, an attribute validity check, a name duplication check and a relationship soundness check.
8. The non-computable area data governance system according to claim 5, wherein: managing the integrity of the described data comprises judging data integrity along two dimensions, namely the volume of the data and the values taken by its fields; counting the number of records collected at the same acquisition frequency to determine the data volume; and examining fields with null values in data tables of the same type and determining according to rules whether each null value is reasonable.
CN201911298485.6A 2019-12-17 2019-12-17 Data management method and system for non-computable region Pending CN111125075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911298485.6A CN111125075A (en) 2019-12-17 2019-12-17 Data management method and system for non-computable region

Publications (1)

Publication Number Publication Date
CN111125075A true CN111125075A (en) 2020-05-08

Family

ID=70499158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911298485.6A Pending CN111125075A (en) 2019-12-17 2019-12-17 Data management method and system for non-computable region

Country Status (1)

Country Link
CN (1) CN111125075A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120330911A1 (en) * 2011-06-27 2012-12-27 International Business Machines Corporation Automatic generation of instantiation rules to determine quality of data migration
CN106354786A (en) * 2016-08-23 2017-01-25 冯村 Visual analysis method and system
CN108595563A (en) * 2018-04-13 2018-09-28 林秀丽 A kind of data quality management method and device
CN108766542A (en) * 2018-05-28 2018-11-06 镇江市第人民医院 A kind of data analysis processing method and system
CN110119395A (en) * 2019-05-27 2019-08-13 普元信息技术股份有限公司 The method that data standard and quality of data association process are realized based on metadata in big data improvement
CN110175167A (en) * 2019-05-10 2019-08-27 国网天津市电力公司电力科学研究院 A kind of data cleaning method and system suitable for low-voltage platform area electricity consumption data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汪先锋, 中国环境出版集团 *

Similar Documents

Publication Publication Date Title
CN106557991B (en) Voltage monitoring data platform
CN111815132B (en) Network security management information publishing method and system for power monitoring system
CN110457294B (en) Data processing method and device
CN106327055B (en) A kind of electricity expense control method and system based on big data technology
WO2022117126A1 (en) Verification method for electrical grid measurement data
CN113064866B (en) Power business data integration system
CN105405069B (en) Electricity purchase operation decision analysis and data processing method
CN104679646B (en) A kind of method and apparatus for detecting SQL code defect
CN110503570A (en) A kind of exception electricity consumption data detection method, system, equipment, storage medium
CN111552686B (en) Power data quality assessment method and device
CN105335822A (en) Smart power grid unified data model modeling method for big data analysis
CN112817958A (en) Electric power planning data acquisition method and device and intelligent terminal
CN113762735A (en) Data quality management system and method based on rule base
CN111178676A (en) Power distribution network project investment assessment method and system
CN108920110A (en) A kind of parallel processing big data storage system and method calculating mode based on memory
CN112381583A (en) Power consumption calculation method and device based on distributed memory calculation technology
CN115016902B (en) Industrial flow digital management system and method
CN116862137A (en) Charging pile load flexible scheduling method and device based on data fusion
CN111125075A (en) Data management method and system for non-computable region
CN114418237B (en) Distribution network power supply safety capability evaluation standard quantification method, system, equipment and medium
CN111127186A (en) Application method of customer credit rating evaluation system based on big data technology
CN114218216A (en) Resource management method, device, equipment and storage medium
CN110298585B (en) Hierarchical automatic auditing method for monitoring information of substation equipment
CN111260452A (en) Method and system for constructing tax big data model
CN117742975B (en) Resource data processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508