CN107491515B - Intelligent power distribution and utilization data conversion method based on big data platform - Google Patents

Intelligent power distribution and utilization data conversion method based on big data platform

Info

Publication number
CN107491515B
CN107491515B CN201710686759.3A CN201710686759A CN107491515B CN 107491515 B CN107491515 B CN 107491515B CN 201710686759 A CN201710686759 A CN 201710686759A CN 107491515 B CN107491515 B CN 107491515B
Authority
CN
China
Prior art keywords
data
original
file
power distribution
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710686759.3A
Other languages
Chinese (zh)
Other versions
CN107491515A (en)
Inventor
周炜
陈海波
陆超杰
陈春霞
吴�琳
宋云翔
刘爱华
谭勇桂
邵嗣杨
郭乃网
苏运
田英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING NANRUI GROUP CO
Nari Technology Co Ltd
State Grid Shanghai Electric Power Co Ltd
NARI Nanjing Control System Co Ltd
Original Assignee
NANJING NANRUI GROUP CO
Nari Technology Co Ltd
State Grid Shanghai Electric Power Co Ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING NANRUI GROUP CO, Nari Technology Co Ltd, State Grid Shanghai Electric Power Co Ltd, NARI Nanjing Control System Co Ltd
Priority to CN201710686759.3A
Publication of CN107491515A
Application granted
Publication of CN107491515B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Water Supply & Treatment (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent power distribution and utilization data conversion method based on a big data platform, comprising the following steps: integrating the original data of each power distribution and utilization service system through data interfaces and storing it in a data cache region; migrating the original data of each service system from the data cache region to an original library, which provides the data basis for subsequent data cleaning and analysis; processing the original data by means of the big data platform and storing the processed data in an intermediate library under a new table structure, where it serves as the source data for subsequent data analysis; and migrating the analyzed data from the intermediate library to a result library and creating a global index. By processing data on the big data platform, the method converts raw data into processed data that meets the functional requirements of each intelligent power distribution and utilization service system, satisfies the service data requirements of each application function, optimizes the storage structure, and promotes the comprehensive intelligent and lean development of the intelligent power distribution and utilization service.

Description

Intelligent power distribution and utilization data conversion method based on big data platform
Technical Field
The invention relates to an intelligent power distribution and utilization data conversion method based on a big data platform, and belongs to the technical field of power automation.
Background
As construction of the intelligent power distribution and utilization grid deepens, the number of acquisition terminals has risen sharply and the acquisition frequency has increased greatly. The volume of power distribution and utilization data has grown from the TB level to the PB level, which poses challenges for the effective integration, efficient storage, and scalability of multi-source heterogeneous mass data. Meanwhile, the power distribution and utilization service is becoming more intelligent and lean, and its cross-service, cross-platform data analysis and processing capability needs further improvement. This places higher requirements on the efficiency of data storage and processing, on the accuracy and real-time performance of value mining, and on human-computer interaction and visualization. A big data platform oriented to power distribution and utilization business applications therefore needs to be established. Such a platform provides unified data storage and processing functions that effectively support those applications, so that the value of the data is fully exploited, business decisions rest on a more scientific basis, and the operating efficiency, crisis response capability, and public service level of power distribution and utilization are improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an intelligent power distribution and utilization data conversion method based on a big data platform. It solves the technical problem that, in the prior art, data acquired from source systems cannot be used directly because the data sources are numerous, the data formats are inconsistent, and the data volume is huge.
To solve this technical problem, the invention adopts the following technical scheme: an intelligent power distribution and utilization data conversion method based on a big data platform, comprising the following steps:
original data integration: integrating the original data of each power distribution and utilization service system through data interfaces, and storing the original data in a data cache region;
migration from the data cache region to the original library: migrating the original data of each service system from the data cache region to an original library to provide the data basis for subsequent data cleaning and analysis;
migration from the original library to the intermediate library: processing the original data by means of the big data platform, and storing the processed data in an intermediate library under a new table structure to serve as the source data for subsequent data analysis;
migration from the intermediate library to the result library: migrating the analyzed data from the intermediate library to the result library and creating a global index.
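The four steps above can be pictured as a small staged pipeline. The following is a minimal in-memory sketch in Python, not the patented implementation: the `Stores` class, the field names `meter_id`, `ts`, and `kwh`, and the row-key format are all hypothetical, and the only processing shown in step 3 is de-duplication.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for the four storage areas named in the method."""
    cache: list = field(default_factory=list)         # data cache region
    original: list = field(default_factory=list)      # original library
    intermediate: list = field(default_factory=list)  # intermediate library
    result: dict = field(default_factory=dict)        # result library, keyed by row key


def integrate(stores, source_rows):
    """Step 1: collect raw rows from the service systems into the cache region."""
    stores.cache.extend(source_rows)


def cache_to_original(stores):
    """Step 2: move raw rows from the cache region into the original library."""
    stores.original.extend(stores.cache)
    stores.cache.clear()


def original_to_intermediate(stores):
    """Step 3: process raw rows (here only de-duplication) into the intermediate library."""
    seen = set()
    for row in stores.original:
        key = (row["meter_id"], row["ts"])
        if key not in seen:
            seen.add(key)
            stores.intermediate.append(row)


def intermediate_to_result(stores):
    """Step 4: migrate processed rows to the result library under a row key."""
    for row in stores.intermediate:
        stores.result[f'{row["meter_id"]}#{row["ts"]}'] = row


def run_pipeline(source_rows):
    """Run the four stages in order and return the populated stores."""
    stores = Stores()
    integrate(stores, source_rows)
    cache_to_original(stores)
    original_to_intermediate(stores)
    intermediate_to_result(stores)
    return stores
```

Keeping each stage as a separate function mirrors the way the method separates the cache region, original library, intermediate library, and result library, so each stage can be replaced independently.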
The data interfaces include: an FTP interface, a database interface, a Web service interface, and a text data interface.
The original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse.
The original data in the data cache region is stored in the following formats: Oracle tables, unformatted txt files, formatted txt files, Excel files, and CIM/SVG files.
The specific method for migrating the original data of each service system from the data cache region to the original library is as follows:
for Oracle tables: first, writing a Sqoop script to import the data into HDFS files on the big data platform; second, creating an external table associated with the HDFS files, where the associated data supports SQL queries but not transactional operations such as updating and deleting; third, creating an Inceptor table that supports transactions, and inserting the data of the external table into this Inceptor transaction table for use;
for unformatted txt files: the data migration is implemented by a Java program, specifically: first, calling a file reading interface to read the files one by one and write them into the Hyperbase data warehouse; second, creating a Hyperbase external table to support SQL queries; third, creating an Inceptor table that supports transactions, and inserting the data of the Hyperbase external table into it for other interfaces to call;
for Excel or formatted txt files: first, uploading the formatted files via FTP to a specified directory on the big data platform; second, creating an Inceptor table according to the file and the specified separator, and loading the text file data under the specified path into that Inceptor table; third, inserting the data of that Inceptor table into a new Inceptor table that supports transaction processing;
for CIM/SVG files: first, developing a topology analysis program to parse the CIM/SVG files into txt files with a fixed format; second, handling the resulting files in the same way as formatted txt files; third, parsing the small amount of association relation data with a Java program.
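As an illustration of the Oracle-table route described above, the sketch below builds the three artifacts it needs: a Sqoop import command, an external-table definition over the imported files, and a transactional table plus the copy statement. It is a hedged sketch only. The SQL is written in Hive-style syntax on the assumption that Inceptor accepts a similar dialect, which should be checked against the platform documentation. The table names, column list, bucket count, JDBC URL, and HDFS path are hypothetical.

```python
def sqoop_import_command(jdbc_url, username, table, target_dir):
    """Build a Sqoop import command that copies one Oracle table into HDFS as text."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                              # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,
        "--fields-terminated-by", ",",
        "-m", "1",                         # a single mapper; tune for large tables
    ]


def external_table_ddl(name, columns, location):
    """Hive-style DDL for a read-only external table over the imported files."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{location}'"
    )


def transactional_table_ddl(name, columns, bucket_column, buckets=4):
    """Hive-style DDL for a bucketed ORC table with transactions enabled (assumed syntax)."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE TABLE {name} ({cols}) "
        f"CLUSTERED BY ({bucket_column}) INTO {buckets} BUCKETS "
        f"STORED AS ORC TBLPROPERTIES ('transactional'='true')"
    )


def copy_statement(source, target):
    """Copy every row of the external table into the transactional table."""
    return f"INSERT INTO TABLE {target} SELECT * FROM {source}"
```

The functions only produce strings, so they can be reviewed or logged before anything is run against a cluster.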
The processing of the original data by means of the big data platform comprises: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.
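To make three of these operations concrete, the following Python sketch shows de-duplication, association with a lookup table, and row-column transposition on small in-memory records. It only illustrates what the operations do. In the method itself they run on the big data platform, and the field names used here (`meter_id`, `quantity`, `value`, `feeder`) are hypothetical.

```python
from collections import defaultdict


def deduplicate(rows, key_fields):
    """Keep the first row seen for each combination of key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def associate(rows, lookup, key_field, new_field):
    """Attach lookup[row[key_field]] to each row as new_field (None when absent)."""
    return [{**row, new_field: lookup.get(row[key_field])} for row in rows]


def rows_to_columns(rows, id_field, name_field, value_field):
    """Transpose long rows (id, name, value) into one wide row per id."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[id_field]][row[name_field]] = row[value_field]
    return [{id_field: key, **values} for key, values in wide.items()]
```

For example, three de-duplicated readings for two meters become two wide rows, one per meter, with one column per measured quantity.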
The specific method for processing the original data by the big data platform comprises the following steps:
if the data is complete and free of abrupt changes, the original data needs no cleaning, and the original tables are associated, merged, de-duplicated, and row-column transposed using the distributed processing capability of the big data platform to form data with a new structure;
if the data has missing values and abrupt changes, a data cleaning program written in MATLAB corrects the original data and outputs a formatted text file, the text file is uploaded to the big data platform via FTP, and an Inceptor table mapped to the file is created, so that the data in the intermediate library supports transaction management.
The specific method for creating the global index is as follows:
compiling a script program to migrate the Inceptor data in the intermediate library to a Hyperbase data warehouse;
acquiring the index configuration file of the Hyperbase table, adding the index of the new field to the JSON file, and making the configuration take effect again.
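The index-configuration step can be sketched as a small JSON edit. The patent does not give the layout of the Hyperbase index configuration file, so the structure below (a top-level `indexes` list whose entries have `name` and `columns`) is purely an assumption for illustration, as are the table and index names. How the updated file is made to take effect again is platform-specific and not shown.

```python
import json


def add_index(config_text, index_name, columns):
    """Return the configuration text with one more index entry appended.

    The {"indexes": [{"name": ..., "columns": [...]}]} layout is hypothetical.
    """
    config = json.loads(config_text)
    indexes = config.setdefault("indexes", [])
    if any(entry["name"] == index_name for entry in indexes):
        raise ValueError(f"index {index_name!r} already exists")
    indexes.append({"name": index_name, "columns": list(columns)})
    return json.dumps(config, indent=2, sort_keys=True)
```

Rejecting a duplicate index name keeps repeated runs of the migration script from silently growing the configuration file.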
Compared with the prior art, the invention has the following beneficial effects:
Data is further cleaned, analyzed, and mined on the big data platform, converting raw data into processed data that meets the functional requirements of each intelligent power distribution and utilization service system. The service data requirements of each application function are met, the storage structure is optimized, the technical problem that original data integrated from the power distribution and utilization service systems cannot be used directly is solved, and the comprehensive intelligent and lean development of the intelligent power distribution and utilization service is promoted.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a data interface access mode;
FIG. 3 is a schematic diagram of the structure of migration from the data cache region to the original library;
FIG. 4 is a schematic diagram of a structure for migrating from an original library to an intermediate library;
FIG. 5 is a schematic diagram of the structure of migration from an intermediate library to a result library.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1, both the original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse. The invention provides an intelligent power distribution and utilization data conversion method based on a big data platform, which comprises the following steps:
original data integration: integrating original data of each power distribution and utilization service system through a data interface, and storing the original data into a data cache region;
According to the actual situation of the integrated systems, the data interfaces are divided into: an FTP interface, a database interface, a Web service interface, and a text data interface. Fig. 2 is a schematic diagram showing how each data interface is accessed.
Migration from data cache to original library: migrating the original data of each service system from the data cache region to an original library to provide a data basis for subsequent data cleaning and analysis;
As shown in fig. 3, the original data in the data cache region is stored in the following formats: Oracle tables, unformatted txt files, formatted txt files, Excel files, and CIM/SVG files.
For Oracle tables: first, writing a Sqoop script to import the data into HDFS files on the big data platform; second, creating an external table associated with the HDFS files, where the associated data supports SQL queries but not transactional operations such as updating and deleting; third, creating an Inceptor table that supports transactions, and inserting the data of the external table into this Inceptor transaction table for use;
for unformatted txt files: the amount of such data is usually not large, so a Java program is used to implement the data migration, specifically: first, calling a file reading interface to read the files one by one and write them into the Hyperbase data warehouse; second, creating a Hyperbase external table to support SQL queries; third, because the external table does not yet support transactional delete and modify operations, creating an Inceptor table that supports transactions, and inserting the data of the Hyperbase external table into it for other interfaces to call;
for Excel or formatted txt files: first, uploading the formatted files via FTP to a specified directory on the big data platform; second, creating an Inceptor table according to the file and the specified separator, an operation that quickly loads the text file data under the specified path into that Inceptor table; third, inserting the data of that Inceptor table, which does not support transaction processing, into a new Inceptor table that does;
for CIM/SVG files: first, developing a topology analysis program to parse the CIM/SVG files into txt files with a fixed format; second, handling the resulting files in the same way as formatted txt files; third, obtaining the association relation data between distribution transformer areas and outgoing lines by parsing with a Java program.
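The first CIM/SVG step, parsing a model file into fixed-format text, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the topology analysis program of the invention: it reads only the class name, `rdf:ID`, and `IdentifiedObject.name` of each top-level object. The sample document, the CIM namespace version, and the `|` separator are chosen for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CIM_NS = "http://iec.ch/TC57/2013/CIM-schema-cim16#"  # illustrative CIM version

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cim="{CIM_NS}">
  <cim:PowerTransformer rdf:ID="T1">
    <cim:IdentifiedObject.name>Transformer 1</cim:IdentifiedObject.name>
  </cim:PowerTransformer>
  <cim:ACLineSegment rdf:ID="L7">
    <cim:IdentifiedObject.name>Feeder 7</cim:IdentifiedObject.name>
  </cim:ACLineSegment>
</rdf:RDF>"""


def cim_to_lines(xml_text, separator="|"):
    """Flatten each top-level CIM object into 'class|id|name' text lines."""
    root = ET.fromstring(xml_text)
    name_tag = f"{{{CIM_NS}}}IdentifiedObject.name"
    lines = []
    for element in root:
        class_name = element.tag.split("}", 1)[1]     # strip the '{namespace}' prefix
        object_id = element.get(f"{{{RDF_NS}}}ID", "")
        name = ""
        for child in element:
            if child.tag == name_tag:
                name = child.text or ""
        lines.append(separator.join([class_name, object_id, name]))
    return lines
```

The resulting lines have a fixed column order and a single separator, so they can be loaded in the same way as any other formatted txt file.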
Migration from original library to intermediate library: processing the original data by means of a big data platform, and storing the processed data into an intermediate library according to a new table structure to serve as the original data of subsequent data analysis;
As shown in fig. 4, the processing of the original data by means of the big data platform comprises: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.
For original data of relatively high quality, for example data with good integrity and no abrupt changes, no data cleaning is needed. Operations such as association, merging, de-duplication, and row-column transposition can be performed on the original tables using the strong distributed processing capability of the big data platform, forming data with a new structure that is better suited to subsequent data analysis and use.
For original data of poor quality, for example data with many missing values or abrupt changes, a data cleaning program written in MATLAB is used to correct the original data. Because the processing result is a formatted text file, further operations are needed, such as uploading the file to the big data platform and creating an Inceptor table mapped to the file, so that the data in the intermediate library supports transaction management.
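The quality check and correction described above can be sketched in Python. The patent writes its cleaning program in MATLAB and does not disclose the correction algorithm, so everything here is an assumption: missing samples are represented as `None`, an abrupt change is defined as a jump larger than a hypothetical `max_jump` relative to the last accepted sample, and both are repaired by linear interpolation. A genuine sustained level shift would be treated as a run of spikes by this simple rule.

```python
def needs_cleaning(values, max_jump):
    """Return True if the series has a missing sample or a jump above max_jump."""
    previous = None
    for value in values:
        if value is None:
            return True
        if previous is not None and abs(value - previous) > max_jump:
            return True
        previous = value
    return False


def clean(values, max_jump):
    """Replace missing samples and abrupt jumps by linear interpolation."""
    # Pass 1: keep samples that are present and close to the last kept sample.
    kept = []
    last = None
    for value in values:
        if value is not None and (last is None or abs(value - last) <= max_jump):
            kept.append(value)
            last = value
        else:
            kept.append(None)
    valid = [i for i, v in enumerate(kept) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    # Pass 2: fill each gap from its nearest valid neighbours.
    cleaned = list(kept)
    for i, value in enumerate(kept):
        if value is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            cleaned[i] = kept[right]
        elif right is None:
            cleaned[i] = kept[left]
        else:
            weight = (i - left) / (right - left)
            cleaned[i] = kept[left] + weight * (kept[right] - kept[left])
    return cleaned
```

Series for which `needs_cleaning` returns False can skip this step and go straight to association, merging, de-duplication, and transposition.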
Migration from the intermediate library to the result library: the analyzed data is migrated from the intermediate library to a result library with higher query efficiency, and a global index is created, which increases the query response speed and improves the user experience, as shown in fig. 5.
The specific method for creating the global index is as follows: writing a script program to migrate the Inceptor data in the intermediate library to the Hyperbase data warehouse; acquiring the index configuration file of the Hyperbase table, adding the index of the new field to the JSON file, and making the configuration take effect again.
Data is placed in the Inceptor data warehouse for processing for two reasons: the warehouse supports transactions, so distributed data can be inserted, deleted, queried, and modified; and it supports operating on data through SQL to execute MapReduce tasks, which reduces the learning cost for developers. However, when the data volume is massive and frequent real-time interaction with the front end is required, the query efficiency of Inceptor (especially for fuzzy queries and range queries) is low and cannot meet the requirement. Therefore, all Inceptor tables in the intermediate library that need to interact with the front end are migrated to the Hyperbase database. There, in addition to the built-in index on the Hyperbase primary key, secondary indexes can be created on one or more commonly queried columns, which greatly increases the retrieval speed.
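Why a secondary index speeds up range queries can be shown with a small in-memory model. The sketch below is not how Hyperbase implements its indexes. It is a generic illustration in which rows are stored by primary key and a sorted list of (column value, primary key) pairs acts as the secondary index, so a range query locates its start by binary search instead of scanning every row. The class name and the `kwh` column are hypothetical.

```python
import bisect


class IndexedStore:
    """Key-value store with one secondary index on a chosen column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}    # primary key -> row
        self.index = []   # sorted list of (column value, primary key)

    def put(self, row_key, row):
        """Insert or overwrite a row and keep the secondary index in step."""
        if row_key in self.rows:
            old_entry = (self.rows[row_key][self.indexed_column], row_key)
            self.index.pop(bisect.bisect_left(self.index, old_entry))
        self.rows[row_key] = row
        bisect.insort(self.index, (row[self.indexed_column], row_key))

    def range_query(self, low, high):
        """Return rows whose indexed column lies in [low, high], in index order."""
        start = bisect.bisect_left(self.index, (low,))
        matches = []
        for value, row_key in self.index[start:]:
            if value > high:
                break
            matches.append(self.rows[row_key])
        return matches
```

The price of the faster lookup is extra work on every write, since each `put` must also update the index. This is one reason to index only the commonly queried columns.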
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. The intelligent power distribution and utilization data conversion method based on the big data platform is characterized by comprising the following steps:
original data integration: integrating original data of each power distribution and utilization service system through a data interface, and storing the original data into a data cache region;
migration from data cache to original library: migrating the original data of each service system from the data cache region to an original library to provide a data basis for subsequent data cleaning and analysis;
migration from original library to intermediate library: processing the original data by means of a big data platform, and storing the processed data into an intermediate library according to a new table structure to serve as the original data of subsequent data analysis;
migration from the intermediate library to the result library: migrating the analyzed data from the intermediate library to the result library, and creating a global index;
the original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse;
the original data in the data cache region is stored in the following formats: Oracle tables, unformatted txt files, formatted txt files, Excel files, and CIM/SVG files;
the specific method for migrating the original data of each service system from the data cache region to the original library is as follows:
for Oracle tables: first, writing a Sqoop script to import the data into HDFS files on the big data platform; second, creating an external table associated with the HDFS files, where the associated data supports SQL queries but does not support update and delete transaction processing; third, creating an Inceptor table that supports transactions, and inserting the data of the external table into this Inceptor transaction table for use;
for unformatted txt files: the data migration is implemented by a Java program, specifically: first, calling a file reading interface to read the files one by one and write them into the Hyperbase data warehouse; second, creating a Hyperbase external table to support SQL queries; third, creating an Inceptor table that supports transactions, and inserting the data of the Hyperbase external table into it for other interfaces to call;
for Excel or formatted txt files: first, uploading the formatted txt files via FTP to a specified directory on the big data platform; second, creating an Inceptor table according to the file and the specified separator, and loading the text file data under the specified path into that Inceptor table; third, inserting the data of that Inceptor table into a new Inceptor table that supports transaction processing;
for CIM/SVG files: first, developing a topology analysis program to parse the CIM/SVG files into txt files with a fixed format; second, handling the resulting files in the same way as formatted txt files; third, parsing the small amount of association relation data with a Java program.
2. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the data interface comprises: FTP interface, database interface, Webservice interface, and text data interface.
3. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the processing of the original data by the big data platform comprises: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.
4. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 3, wherein the big data platform processes the original data by a specific method comprising the following steps:
if the data is complete and free of abrupt changes, the original data needs no cleaning, and the original tables are associated, merged, de-duplicated, and row-column transposed using the distributed processing capability of the big data platform to form data with a new structure;
if the data has missing values and abrupt changes, a data cleaning program written in MATLAB corrects the original data and outputs a formatted txt file, the text file is uploaded to the big data platform via FTP, and an Inceptor table mapped to the file is created, so that the data in the intermediate library supports transaction management.
5. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the specific method for creating the global index is as follows:
writing a script program to migrate the Inceptor data in the intermediate library to the Hyperbase data warehouse;
acquiring the index configuration file of the Hyperbase table, adding the index of the new field to the JSON file, and making the configuration take effect again.
CN201710686759.3A 2017-08-11 2017-08-11 Intelligent power distribution and utilization data conversion method based on big data platform Active CN107491515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710686759.3A CN107491515B (en) 2017-08-11 2017-08-11 Intelligent power distribution and utilization data conversion method based on big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710686759.3A CN107491515B (en) 2017-08-11 2017-08-11 Intelligent power distribution and utilization data conversion method based on big data platform

Publications (2)

Publication Number Publication Date
CN107491515A CN107491515A (en) 2017-12-19
CN107491515B true CN107491515B (en) 2020-10-16

Family

ID=60645350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710686759.3A Active CN107491515B (en) 2017-08-11 2017-08-11 Intelligent power distribution and utilization data conversion method based on big data platform

Country Status (1)

Country Link
CN (1) CN107491515B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536833A (en) * 2018-04-12 2018-09-14 成都信息工程大学 A kind of distributed, database and its construction method towards big data
CN108959356A (en) * 2018-05-07 2018-12-07 国网上海市电力公司 Data mart establishment method for an intelligent power distribution and utilization big data application system
CN109522303B (en) * 2018-11-13 2021-06-15 深圳市思迪信息技术股份有限公司 Excel configuration-based data acquisition method and device and computer equipment
CN109299183A (en) * 2018-11-20 2019-02-01 北京锐安科技有限公司 A kind of data processing method, device, terminal device and storage medium
CN111580862A (en) * 2020-05-15 2020-08-25 中国邮政储蓄银行股份有限公司 Data migration method and device
CN111897863B (en) * 2020-07-31 2022-11-08 珠海市新德汇信息技术有限公司 Multi-source heterogeneous data fusion and convergence method
CN112995326A (en) * 2021-03-10 2021-06-18 中国电力科学研究院有限公司 Method and system for acquiring and uploading quality data of intelligent electric energy meter
CN113190543A (en) * 2021-05-24 2021-07-30 全球能源互联网研究院有限公司 Data cleaning method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593422A (en) * 2013-11-01 2014-02-19 国云科技股份有限公司 Virtual access management method of heterogeneous database
CN103902671A (en) * 2014-03-19 2014-07-02 北京科技大学 Dynamic integration method and system of multi-source heterogeneous data
CN105654730A (en) * 2015-12-31 2016-06-08 公安部交通管理科学研究所 Method for identifying fake-licensed car based on block port throughput big data analysis
CN106156165A (en) * 2015-04-16 2016-11-23 阿里巴巴集团控股有限公司 Method of data synchronization between heterogeneous data source and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
星环科技."电力行业敏捷BI大数据应用".《https://jz.docin.com/p-1297485356.html》.2015,1-39. *

Also Published As

Publication number Publication date
CN107491515A (en) 2017-12-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant