CN112241391A - Method and system for extracting, cleaning and integrating power data of power supply company - Google Patents

Method and system for extracting, cleaning and integrating power data of power supply company

Info

Publication number
CN112241391A
Authority
CN
China
Prior art keywords
data
power
power supply
cleaning
supply company
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011115601.9A
Other languages
Chinese (zh)
Inventor
胥威汀
徐浩
王海燕
汪伟
叶强
邓盈盈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Original Assignee
Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority to CN202011115601.9A
Publication of CN112241391A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for extracting, cleaning and integrating the power data of a power supply company. The method comprises the following steps in sequence: S1: data extraction, namely extracting power data from the production and operation monthly reports of county-level power supply companies; S2: data cleaning, namely verifying the quality of the extracted power data and cleaning out abnormal data; S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company. The system comprises a data extraction module, a data cleaning module and a data integration module. The invention addresses the low utilization of the data value in production and operation monthly reports in the prior art.

Description

Method and system for extracting, cleaning and integrating power data of power supply company
Technical Field
The invention relates to the technical field of power systems, in particular to a method and a system for extracting, cleaning and integrating power data of a power supply company.
Background
Electric power is a forerunner of economic development and the basis of local economic development. Power data directly reflect the vitality and characteristics of regional economic development. As the main unit of power supply, a power supply company is responsible for ensuring a safe and reliable regional energy supply and for serving regional economic development.
To support the balanced development of regional power grid companies and promote stable regional economic development, a data mining and analysis system for production and operation monthly reports needs to be constructed. Such a system would study in depth the data value in the production and operation monthly reports of county-level power supply companies and combine it with regional characteristic data. Through multi-dimensional mining and comprehensive analysis, it would guide county-level power supply companies to improve the quality and efficiency of power grid production and operation, identify production and operation problems, evaluate the current state of regional economic development and judge future development trends. It would also uncover problems and defects in the monthly reports of each county-level power supply company, improve the completeness and accuracy of the reports, and optimize the mode and dimensions of their comprehensive analysis. The results would provide decision support for provincial-level and prefecture-level power supply companies in optimizing resource allocation, assist the related improvement and remediation work of each county-level power supply company, and promote balanced and healthy regional development.
However, the prior art has the following defects: at present, provincial power supply companies do not have an accurate overall grasp of the data in the production and operation monthly reports of county-level power supply companies, and those reports differ from one county-level company to another in format, statistical dimensions and the like. As a result, provincial power supply companies cannot accurately grasp the power data of county-level power supply companies.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method and a system for extracting, cleaning and integrating the power data of a power supply company, which address the defect that provincial power supply companies cannot accurately grasp the power data of county-level power supply companies.
The technical scheme adopted by the invention to solve the above problem is as follows:
The method for extracting, cleaning and integrating the power data of a power supply company comprises the following steps in sequence:
S1: data extraction, namely extracting power data from the production and operation monthly reports of county-level power supply companies;
S2: data cleaning, namely verifying the quality of the extracted power data and cleaning out abnormal data;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these steps, the provincial power supply company extracts the power data from the production and operation monthly reports of the county-level power supply companies, cleans out abnormal data, and then integrates the cleaned data. This reconciles the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
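The three steps can be illustrated with a minimal sketch. It is not the patented implementation: the input layout (`RAW_REPORTS`), the indicator names and the rule that a value is "abnormal" when it does not parse as a number are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, already-parsed monthly reports; names and values are made up.
RAW_REPORTS = [
    {"county": "County A", "month": "2020-01",
     "rows": [("electricity_sales_mwh", "1200"), ("line_loss_rate_pct", "4.8")]},
    {"county": "County A", "month": "2020-02",
     "rows": [("electricity_sales_mwh", "n/a"), ("line_loss_rate_pct", "5.1")]},
]

def extract(reports):
    """S1: flatten every table row of every monthly report into one record."""
    for report in reports:
        for indicator, raw_value in report["rows"]:
            yield {"county": report["county"], "month": report["month"],
                   "indicator": indicator, "raw": raw_value}

def clean(records):
    """S2: keep records whose value parses as a number; set the rest aside."""
    valid, anomalies = [], []
    for record in records:
        try:
            valid.append({**record, "value": float(record["raw"])})
        except ValueError:
            anomalies.append(record)
    return valid, anomalies

def integrate(valid_records):
    """S3: merge all months into one data set keyed by (county, indicator)."""
    dataset = defaultdict(dict)
    for record in valid_records:
        key = (record["county"], record["indicator"])
        dataset[key][record["month"]] = record["value"]
    return dict(dataset)

valid, anomalies = clean(extract(RAW_REPORTS))
dataset = integrate(valid)
```

Keeping the anomalies in a separate list rather than discarding them leaves a trace of which county and month need follow-up, which fits the stated aim of improving report completeness.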
Preferably, the data extraction includes: sorting out the data structure and constructing a data extraction program.
Sorting out the data structure makes the extraction target clear, and the data extraction program enables automatic extraction, which improves extraction efficiency.
Preferably, sorting out the data structure includes: confirming the data crawling objects and converting the file format.
The data crawling objects are the data in the tables attached to the production and operation monthly report of each county-level power supply company, so that the source of the power data is determined according to the extraction requirement. File format conversion converts files of different formats (including doc, pdf, wps, rar and other file formats) into a uniform file format in batches, in preparation for file data extraction and mining.
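A minimal sketch of how a batch conversion could be planned is given below. It only decides what should happen to each file and performs no conversion; the `ACTIONS` mapping, the action names and the file names are assumptions for illustration, and the `.docx` target follows the preferred format named in this document.

```python
from pathlib import PurePosixPath

TARGET_SUFFIX = ".docx"  # preferred target format named in this document

# Hypothetical mapping from source suffix to the action a converter would take.
ACTIONS = {
    ".doc": "convert",
    ".wps": "convert",
    ".pdf": "convert",
    ".rar": "unpack_then_convert",  # an archive may contain several reports
    ".docx": "keep",
}

def plan_conversion(file_names):
    """Return (source, action, target) tuples for a batch of report files."""
    plan = []
    for name in file_names:
        path = PurePosixPath(name)
        action = ACTIONS.get(path.suffix.lower(), "manual_review")
        if action == "convert":
            target = str(path.with_suffix(TARGET_SUFFIX))
        elif action == "keep":
            target = name
        else:
            # Archives and unknown formats have no single target file.
            target = None
        plan.append((name, action, target))
    return plan

plan = plan_conversion(["reports/a.doc", "b.PDF", "c.rar", "d.docx", "e.jpg"])
```

Separating the plan from the conversion itself lets the batch be reviewed before any external converter is run, and unknown formats are routed to manual review instead of being skipped silently.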
Preferably, the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the 10 kV heavy-loss and negative-loss line and distribution area detail table, and the 10 kV heavy-load line and distribution area detail table.
The attached tables contain various key items of power data, and crawling the power data in the attached tables makes it possible to grasp the power information conveniently, comprehensively and accurately.
Preferably, the target format of the file format conversion is .docx.
The .docx format is convenient for identification, labeling and operation.
Preferably, the data cleaning addresses cases including: a table in the production and operation monthly report is in picture form, so that its data cannot be crawled; the form and data dimensions of a table in the production and operation monthly report are inconsistent with those of other reports; and a table in the production and operation monthly report has duplicated headers, so that useless fields are crawled.
These cases seriously affect the accuracy of power data extraction, so cleaning the power data in these cases greatly improves the accuracy of power data extraction.
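The three cases can be sketched as checks on one extracted table. This is an illustrative assumption about how such cleaning might look, not the method disclosed here: the function name `clean_table`, the issue labels, and the treatment of an empty extraction as a picture-form table are all hypothetical.

```python
def clean_table(header, rows):
    """Return (clean_rows, issues) for one extracted table.

    header: list of expected column names (hypothetical).
    rows:   list of lists of cell strings as crawled from the document.
    """
    issues = []
    if not rows:
        # Nothing could be crawled, e.g. the table was embedded as a picture.
        issues.append("no_data_extracted")
        return [], issues

    clean_rows = []
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row]
        if cells == header:
            # A repeated header row carries no data; drop it.
            issues.append(f"row {index}: duplicate header dropped")
            continue
        if len(cells) != len(header):
            # Column count differs from the expected layout.
            issues.append(f"row {index}: expected {len(header)} cells, got {len(cells)}")
            continue
        clean_rows.append(cells)
    return clean_rows, issues

header = ["line", "loss_rate_pct"]
rows = [
    ["line", "loss_rate_pct"],      # header crawled as data
    ["Line 1", " 4.8 "],
    ["line", "loss_rate_pct"],      # header repeated mid-table
    ["Line 2", "5.1", "extra"],     # inconsistent dimension
]
clean_rows, issues = clean_table(header, rows)
```

In this sketch a picture-form table is only flagged; recovering its contents would need a separate step (for example manual entry or optical character recognition), which this document does not describe.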
Preferably, the power data includes industry electricity consumption information, electricity sales volume, line loss rate, power supply line information, and distribution area and line details.
These indicators objectively reflect the power situation and provide guidance for monitoring the condition of county-level power supply companies.
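As a worked illustration of one of these indicators, the line loss rate is commonly computed as the share of supplied energy that is not sold. The sketch below uses that common definition; the 10% heavy-loss threshold and the function names are assumptions, since this document does not state how heavy-loss and negative-loss lines are identified.

```python
def line_loss_rate_pct(supplied_kwh, sold_kwh):
    """Line loss rate in percent: (supplied - sold) / supplied * 100."""
    if supplied_kwh <= 0:
        raise ValueError("supplied energy must be positive")
    return (supplied_kwh - sold_kwh) / supplied_kwh * 100.0

def classify_line(rate_pct, heavy_loss_threshold_pct=10.0):
    """Label a line by its loss rate; the threshold is a hypothetical value."""
    if rate_pct < 0:
        # More energy sold than supplied usually points to a metering or data error.
        return "negative_loss"
    if rate_pct >= heavy_loss_threshold_pct:
        return "heavy_loss"
    return "normal"

normal_rate = line_loss_rate_pct(1000.0, 950.0)
negative_rate = line_loss_rate_pct(1000.0, 1020.0)
heavy_rate = line_loss_rate_pct(1000.0, 800.0)
```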
The system for extracting, cleaning and integrating the power data of a power supply company comprises a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the production and operation monthly reports of county-level power supply companies;
the data cleaning module is used for verifying the quality of the extracted power data and cleaning out abnormal data;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these modules, the provincial power supply company extracts the power data from the production and operation monthly reports of the county-level power supply companies, cleans out abnormal data, and then integrates the cleaned data. This reconciles the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention reconciles the differences in format, statistical dimensions and the like among the production and operation monthly reports of county-level power supply companies, thereby improving the accuracy with which provincial power supply companies grasp the power data of county-level power supply companies;
(2) the invention makes the extraction target clear, and the data extraction program enables automatic extraction, thereby improving extraction efficiency;
(3) the invention helps determine the source of the power data according to the extraction requirement;
(4) the invention makes it possible to grasp the power information conveniently, comprehensively and accurately;
(5) the target format of the file format conversion is .docx, which is convenient for identification, labeling and operation;
(6) the invention greatly improves the accuracy of power data extraction;
(7) the invention objectively reflects the power situation and provides guidance for monitoring the condition of county-level power supply companies.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.
Example 1
The method for extracting, cleaning and integrating the power data of a power supply company comprises the following steps in sequence:
S1: data extraction, namely extracting power data from the production and operation monthly reports of county-level power supply companies;
S2: data cleaning, namely verifying the quality of the extracted power data and cleaning out abnormal data;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these steps, the provincial power supply company extracts the power data from the production and operation monthly reports of the county-level power supply companies, cleans out abnormal data, and then integrates the cleaned data. This reconciles the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
Preferably, the data extraction includes: sorting out the data structure and constructing a data extraction program.
Sorting out the data structure makes the extraction target clear, and the data extraction program enables automatic extraction, which improves extraction efficiency.
Example 2
To better illustrate the present invention, this embodiment is a further optimization of Example 1. It includes all the technical features of Example 1 and additionally includes the following technical features:
Preferably, sorting out the data structure includes: confirming the data crawling objects and converting the file format.
The data crawling objects are the data in the tables attached to the production and operation monthly report of each county-level power supply company, so that the source of the power data is determined according to the extraction requirement. File format conversion converts files of different formats (including doc, pdf, wps, rar and other file formats) into a uniform file format in batches, in preparation for file data extraction and mining.
Preferably, the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the 10 kV heavy-loss and negative-loss line and distribution area detail table, and the 10 kV heavy-load line and distribution area detail table.
The attached tables contain various key items of power data, and crawling the power data in the attached tables makes it possible to grasp the power information conveniently, comprehensively and accurately.
Preferably, the target format of the file format conversion is .docx.
The .docx format is convenient for identification, labeling and operation.
Preferably, the data cleaning addresses cases including: a table in the production and operation monthly report is in picture form, so that its data cannot be crawled; the form and data dimensions of a table in the production and operation monthly report are inconsistent with those of other reports; and a table in the production and operation monthly report has duplicated headers, so that useless fields are crawled.
These cases seriously affect the accuracy of power data extraction, so cleaning the power data in these cases greatly improves the accuracy of power data extraction.
Preferably, the power data includes industry electricity consumption information, electricity sales volume, line loss rate, power supply line information, and distribution area and line details.
These indicators objectively reflect the power situation and provide guidance for monitoring the condition of county-level power supply companies.
Example 3
The system for extracting, cleaning and integrating the power data of a power supply company comprises a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the production and operation monthly reports of county-level power supply companies;
the data cleaning module is used for verifying the quality of the extracted power data and cleaning out abnormal data;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these modules, the provincial power supply company extracts the power data from the production and operation monthly reports of the county-level power supply companies, cleans out abnormal data, and then integrates the cleaned data. This reconciles the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
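One possible way to arrange the three modules in software is sketched below, with each module as a small class and the system composing them. The class names mirror the module names in this example, but the `run` interface, the input layout and the required indicator names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

class DataExtractionModule:
    """Turns parsed monthly reports into flat records (hypothetical input layout)."""
    def run(self, reports):
        return [
            {"county": report["county"], "month": report["month"], **report["indicators"]}
            for report in reports
        ]

class DataCleaningModule:
    """Keeps records whose required indicators are all numeric."""
    required = ("electricity_sales_mwh", "line_loss_rate_pct")  # assumed names

    def run(self, records):
        return [
            record for record in records
            if all(isinstance(record.get(name), (int, float)) for name in self.required)
        ]

class DataIntegrationModule:
    """Orders the cleaned records of all months into one data set."""
    def run(self, records):
        return sorted(records, key=lambda record: (record["county"], record["month"]))

@dataclass
class PowerDataSystem:
    extraction: DataExtractionModule = field(default_factory=DataExtractionModule)
    cleaning: DataCleaningModule = field(default_factory=DataCleaningModule)
    integration: DataIntegrationModule = field(default_factory=DataIntegrationModule)

    def run(self, reports):
        # Extraction feeds cleaning, and cleaning feeds integration.
        return self.integration.run(self.cleaning.run(self.extraction.run(reports)))

reports = [
    {"county": "County B", "month": "2020-02",
     "indicators": {"electricity_sales_mwh": 900.0, "line_loss_rate_pct": 5.2}},
    {"county": "County A", "month": "2020-01",
     "indicators": {"electricity_sales_mwh": None, "line_loss_rate_pct": 4.1}},
    {"county": "County A", "month": "2020-02",
     "indicators": {"electricity_sales_mwh": 1100.0, "line_loss_rate_pct": 4.3}},
]
result = PowerDataSystem().run(reports)
```

Because each module is passed in as a field, any one of them could be replaced (for example with a cleaning module that applies different rules) without changing the other two.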
As described above, the present invention can be well implemented.
The foregoing is only a preferred embodiment of the present invention and does not limit the present invention in any way. Any simple modification, equivalent replacement or improvement made to the above embodiments within the spirit and principle of the present invention still falls within the protection scope of the present invention.

Claims (8)

1. A method for extracting, cleaning and integrating the power data of a power supply company, characterized by comprising the following steps in sequence:
S1: data extraction, namely extracting power data from the production and operation monthly reports of county-level power supply companies;
S2: data cleaning, namely verifying the quality of the extracted power data and cleaning out abnormal data;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
2. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 1, characterized in that the data extraction comprises: sorting out the data structure and constructing a data extraction program.
3. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 2, characterized in that sorting out the data structure comprises: confirming the data crawling objects and converting the file format.
4. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 3, characterized in that the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the 10 kV heavy-loss and negative-loss line and distribution area detail table, and the 10 kV heavy-load line and distribution area detail table.
5. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 3, characterized in that the target format of the file format conversion is .docx.
6. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 1, characterized in that the data cleaning addresses cases including: a table in the production and operation monthly report is in picture form, so that its data cannot be crawled; the form and data dimensions of a table in the production and operation monthly report are inconsistent with those of other reports; and a table in the production and operation monthly report has duplicated headers, so that useless fields are crawled.
7. The method for extracting, cleaning and integrating the power data of a power supply company according to claim 1, characterized in that the power data comprises industry electricity consumption information, electricity sales volume, line loss rate, power supply line information, and distribution area and line details.
8. A system for extracting, cleaning and integrating the power data of a power supply company, characterized by comprising a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the production and operation monthly reports of county-level power supply companies;
the data cleaning module is used for verifying the quality of the extracted power data and cleaning out abnormal data;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
CN202011115601.9A 2020-10-19 2020-10-19 Method and system for extracting, cleaning and integrating power data of power supply company Withdrawn CN112241391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011115601.9A CN112241391A (en) 2020-10-19 2020-10-19 Method and system for extracting, cleaning and integrating power data of power supply company

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011115601.9A CN112241391A (en) 2020-10-19 2020-10-19 Method and system for extracting, cleaning and integrating power data of power supply company

Publications (1)

Publication Number Publication Date
CN112241391A true CN112241391A (en) 2021-01-19

Family

ID=74169126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011115601.9A Withdrawn CN112241391A (en) 2020-10-19 2020-10-19 Method and system for extracting, cleaning and integrating power data of power supply company

Country Status (1)

Country Link
CN (1) CN112241391A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037560A (en) * 2021-11-12 2022-02-11 国网福建省电力有限公司 Company power supply comprehensive meter data acquisition method based on purchase and sale synchronization
CN114066215A (en) * 2021-11-12 2022-02-18 国网福建省电力有限公司 Company caliber electricity selling itemized access method based on purchasing and selling synchronization

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210119