CN107491515B

CN107491515B - Intelligent power distribution and utilization data conversion method based on big data platform

Info

Publication number: CN107491515B
Application number: CN201710686759.3A
Authority: CN
Inventors: 周炜; 陈海波; 陆超杰; 陈春霞; 吴�琳; 宋云翔; 刘爱华; 谭勇桂; 邵嗣杨; 郭乃网; 苏运; 田英杰
Original assignee: NANJING NANRUI GROUP CO; Nari Technology Co Ltd; State Grid Shanghai Electric Power Co Ltd; NARI Nanjing Control System Co Ltd
Current assignee: NANJING NANRUI GROUP CO; Nari Technology Co Ltd; State Grid Shanghai Electric Power Co Ltd; NARI Nanjing Control System Co Ltd
Priority date: 2017-08-11
Filing date: 2017-08-11
Publication date: 2020-10-16
Anticipated expiration: 2037-08-11
Also published as: CN107491515A

Abstract

The invention discloses an intelligent power distribution and utilization data conversion method based on a big data platform, comprising the following steps: integrating the original data of each power distribution and utilization service system through data interfaces and storing it in a data cache region; migrating the original data of each service system from the data cache region to an original library, which provides the data basis for subsequent data cleaning and analysis; processing the original data on the big data platform and storing the processed data in an intermediate library under a new table structure, where it serves as the source data for subsequent data analysis; and migrating the analyzed data from the intermediate library to a result library and creating a global index. By processing the data on the big data platform, the method converts raw data into processed data that meets the functional requirements of each intelligent power distribution and utilization service system, satisfies the service data requirements of each application function, optimizes the storage structure, and promotes the comprehensively intelligent and lean development of intelligent power distribution and utilization services.

Description

Intelligent power distribution and utilization data conversion method based on big data platform

Technical Field

The invention relates to an intelligent power distribution and utilization data conversion method based on a big data platform, and belongs to the technical field of power automation.

Background

As construction of the intelligent power distribution and utilization grid continues, the number of acquisition terminals has risen sharply and the acquisition frequency has increased greatly. The volume of power distribution and utilization data has grown from the TB level to the PB level, which raises the challenges of effectively integrating, efficiently storing, and scalably managing massive multi-source heterogeneous data. Meanwhile, power distribution and utilization services are becoming increasingly intelligent and lean, and cross-service, cross-platform data analysis and processing capacity needs further improvement. This places higher demands on the efficiency of data storage and processing, on the accuracy and timeliness of value mining, and on human-computer interaction and visualization. A big data platform oriented to power distribution and utilization business applications therefore needs to be established. By providing unified data storage and processing functions, such a platform effectively supports big data business applications, extracts the full value of the data, puts business decisions on a more scientific basis, and improves operating efficiency, crisis response capability, and the level of public service.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an intelligent power distribution and utilization data conversion method based on a big data platform. It solves the technical problem that, in the prior art, data acquired from source systems cannot be used directly because the data sources are numerous, the data formats are inconsistent, and the data volume is huge.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows: the intelligent power distribution and utilization data conversion method based on the big data platform comprises the following steps:

original data integration: integrating original data of each power distribution and utilization service system through a data interface, and storing the original data into a data cache region;

migration from data cache to original library: migrating the original data of each service system from the data cache region to an original library to provide a data basis for subsequent data cleaning and analysis;

migration from original library to intermediate library: processing the original data by means of a big data platform, and storing the processed data into an intermediate library according to a new table structure to serve as the original data of subsequent data analysis;

migration from intermediate library to result library: migrating the analyzed data from the intermediate library to the result library and creating a global index.

The data interface includes: FTP interface, database interface, Webservice interface, and text data interface.

The original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse.

The original data is stored in the data cache region in the following formats: Oracle table, unformatted txt file, formatted txt file, Excel file, and cim/svg file.

The specific method for migrating the original data of each service system from the data cache region to the original library is as follows:

for the Oracle table: writing a Sqoop script and importing the data into an HDFS file on the big data platform; creating an external table associated with the HDFS file, where the associated data supports SQL queries but does not support transactional operations such as updating and deleting; creating an Inceptor table that supports transactions, and inserting the data from the external table into that transactional Inceptor table for use;

for unformatted txt files: the data migration is implemented with a Java program, specifically: calling a file reading interface to read the files one by one and writing them into a Hyperbase data warehouse; creating a Hyperbase external table to support SQL queries; creating an Inceptor table that supports transactions, and inserting the data from the Hyperbase external table into that table for other interfaces to call;

for Excel or formatted txt files: first, uploading the formatted file via FTP to a specified directory of the big data platform; second, creating an Inceptor table according to the file and the specified separator, and loading the text file data under the specified path into that Inceptor table; third, inserting the data in that Inceptor table into a new Inceptor table that supports transaction processing;

for cim/svg files: first, developing a topology analysis program to parse the cim/svg files into txt files with a fixed format; second, handling the result in the same way as a formatted text file; third, parsing a small amount of association data with a Java program.
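The per-format routes above can be summarized as a lookup from storage format to an ordered list of steps. The following Python sketch is illustrative only: the dictionary keys and the function name `migration_plan` are hypothetical, the step strings paraphrase the text, and nothing here calls a real platform API.

```python
# Illustrative only: keys and step wording are assumptions that paraphrase the text.
MIGRATION_ROUTES = {
    "oracle": [
        "import the table into an HDFS file with a Sqoop script",
        "create an external table over the HDFS file (SQL query only)",
        "create a transactional Inceptor table and insert the data into it",
    ],
    "unformatted_txt": [
        "read the files one by one with a Java program and write them to Hyperbase",
        "create a Hyperbase external table to support SQL queries",
        "create a transactional Inceptor table and insert the data into it",
    ],
    "formatted_txt": [
        "upload the file via FTP to a specified platform directory",
        "create an Inceptor table with the specified separator and load the file",
        "insert the data into a new transactional Inceptor table",
    ],
}
# Excel files follow the formatted-text route.
MIGRATION_ROUTES["excel"] = MIGRATION_ROUTES["formatted_txt"]
# cim/svg files are parsed to fixed-format text first, then follow the same route.
MIGRATION_ROUTES["cim_svg"] = (
    ["parse the cim/svg file into a fixed-format txt file"]
    + MIGRATION_ROUTES["formatted_txt"]
    + ["parse the remaining association data with a Java program"]
)


def migration_plan(storage_format: str) -> list:
    """Return the ordered migration steps for one cached storage format."""
    try:
        return list(MIGRATION_ROUTES[storage_format])
    except KeyError:
        raise ValueError(f"unsupported storage format: {storage_format!r}") from None
```

Keeping the routes in one table makes it visible that every format ends in a transactional Inceptor table, which is the common target of the migration.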

The processing of the original data by means of the big data platform comprises: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.

The specific method for processing the original data by the big data platform comprises the following steps:

for original data that is complete and free of abrupt changes, and therefore needs no data cleaning, the original tables are associated, merged, de-duplicated, and transposed between rows and columns using the distributed processing capability of the big data platform, producing data with a new structure;

for original data with missing values or abrupt changes, a data cleaning program written in MATLAB corrects the original data and produces a formatted text file; the text file is uploaded to the big data platform via FTP, and an Inceptor table mapped to the file is created, so that the data in the intermediate library supports transaction management.

The specific method for creating the global index is as follows:

writing a script program to migrate the Inceptor data in the intermediate library to a Hyperbase data warehouse;

obtaining the index configuration file of the Hyperbase table, adding an index for the new field to the JSON file, and making the configuration take effect again.
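The second step, adding an index entry for a new field to a JSON configuration file, can be sketched as follows. The source does not give the layout of the Hyperbase index configuration file, so the schema used here (an `indexes` list whose entries have `name` and `column` keys) and the function name `add_index` are assumptions made purely for illustration.

```python
import json


def add_index(config_text: str, field: str) -> str:
    """Add an index entry for `field` unless one already exists.

    The JSON layout ("indexes", "name", "column") is a hypothetical schema,
    not the actual Hyperbase configuration format.
    """
    config = json.loads(config_text)
    indexes = config.setdefault("indexes", [])
    if all(entry.get("column") != field for entry in indexes):
        indexes.append({"name": f"idx_{field}", "column": field})
    return json.dumps(config, indent=2, sort_keys=True)


original = '{"table": "meter_reading", "indexes": [{"name": "idx_feeder", "column": "feeder"}]}'
updated = add_index(original, "meter_id")
```

The function is idempotent, so re-running it for a field that is already indexed leaves the file unchanged. Re-applying the configuration on the platform is outside this sketch.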

Compared with the prior art, the invention has the following beneficial effects:

the data is further cleaned, analyzed, and mined on the big data platform, which converts raw data into processed data that meets the functional requirements of each intelligent power distribution and utilization service system and satisfies the service data requirements of each application function. The storage structure is optimized, the technical problem that original data integrated from the various power distribution and utilization service systems cannot be used directly is solved, and the comprehensively intelligent and lean development of intelligent power distribution and utilization services is promoted.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of a data interface access mode;

FIG. 3 is a schematic diagram of the structure of migration from the data cache region to the original library;

FIG. 4 is a schematic diagram of a structure for migrating from an original library to an intermediate library;

FIG. 5 is a schematic diagram of the structure of migration from an intermediate library to a result library.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

As shown in fig. 1, both the original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse. The invention provides an intelligent power distribution and utilization data conversion method based on a big data platform, which comprises the following steps:

depending on the systems being integrated, the data interfaces are divided into: FTP interface, database interface, Webservice interface, and text data interface. Fig. 2 shows how each data interface is accessed.

as shown in fig. 3, the original data is stored in the data cache region in the following formats: Oracle table, unformatted txt file, formatted txt file, Excel file, and cim/svg file.

for unformatted txt files: the volume of such data is usually small, so a Java program is used for the data migration, specifically: calling a file reading interface to read the files one by one and writing them into a Hyperbase data warehouse; creating a Hyperbase external table to support SQL queries; because the external table does not support transactional delete and modify operations, an Inceptor table that supports transactions is created, and the data in the Hyperbase external table is inserted into it for other interfaces to call;

for Excel or formatted txt files: first, the formatted file is uploaded via FTP to a specified directory of the big data platform; second, an Inceptor table is created according to the file and the specified separator, which quickly loads the text file data under the specified path into that Inceptor table; third, the data in this Inceptor table, which does not support transaction processing, is inserted into a new Inceptor table that does;

for cim/svg files: first, a topology analysis program is developed to parse the cim/svg files into txt files with a fixed format; second, the result is handled in the same way as a formatted text file; third, the association data between distribution station areas and outgoing lines is obtained by parsing with a Java program.
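The formatted-text route (plain-text table, load, transactional copy) can be illustrated with generated SQL. The sketch below emits statements in Apache Hive syntax; the source does not state the Inceptor dialect, so treating it as Hive-compatible is an assumption, and the function name, the `_staging` suffix, and the default bucket count are hypothetical.

```python
def staging_statements(table, columns, separator, hdfs_dir, bucket_column, buckets=4):
    """Build four Hive-style statements: staging table, load, transactional table, copy.

    Assumes Hive-compatible SQL; the actual Inceptor syntax may differ.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    staging = f"{table}_staging"
    return [
        # Plain-text table whose fields are split on the given separator.
        f"CREATE TABLE {staging} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{separator}' STORED AS TEXTFILE",
        # Load the uploaded text files from the platform directory.
        f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {staging}",
        # Bucketed ORC table flagged as transactional (Hive convention).
        f"CREATE TABLE {table} ({cols}) "
        f"CLUSTERED BY ({bucket_column}) INTO {buckets} BUCKETS "
        f"STORED AS ORC TBLPROPERTIES ('transactional'='true')",
        # Copy the non-transactional data into the transactional table.
        f"INSERT INTO TABLE {table} SELECT * FROM {staging}",
    ]


statements = staging_statements(
    "meter_reading",
    [("meter_id", "STRING"), ("reading", "DOUBLE")],
    ",",
    "/data/upload/meter",
    "meter_id",
)
```

Separating the staging table from the transactional table mirrors the text: the first table exists only to load delimited text quickly, and the second is the one other interfaces use.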

as shown in fig. 4, the processing of the original data by the big data platform includes: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.

For original data of relatively high quality, for example data that is complete and free of abrupt changes, no data cleaning is needed. Operations such as association, merging, de-duplication, and row-column transposition can be carried out on the original tables using the strong distributed processing capability of the big data platform, producing data with a new structure that is better suited to subsequent data analysis and use.
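As a small, single-machine illustration of de-duplication, row-column transposition, and association (the platform would run these as distributed jobs), the sketch below works on rows held as Python dictionaries. The field names `meter_id`, `hour`, `kwh`, and `feeder`, the sample values, and the helper names are all hypothetical.

```python
def deduplicate(rows, key_fields):
    """Keep the first row seen for each combination of key fields."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def pivot(rows, index_field, column_field, value_field):
    """Row-column transposition: one output row per index value."""
    wide = {}
    for row in rows:
        key = row[index_field]
        out = wide.setdefault(key, {index_field: key})
        out[row[column_field]] = row[value_field]
    return list(wide.values())


def associate(rows, lookup_rows, key_field):
    """Left-join each row with the lookup row that shares its key, if any."""
    lookup = {r[key_field]: r for r in lookup_rows}
    return [{**lookup.get(row[key_field], {}), **row} for row in rows]


readings = [
    {"meter_id": "M1", "hour": "h00", "kwh": 1.2},
    {"meter_id": "M1", "hour": "h01", "kwh": 1.5},
    {"meter_id": "M1", "hour": "h01", "kwh": 1.5},  # duplicate record
    {"meter_id": "M2", "hour": "h00", "kwh": 0.7},
]
meters = [{"meter_id": "M1", "feeder": "F10"}, {"meter_id": "M2", "feeder": "F11"}]

unique = deduplicate(readings, ["meter_id", "hour"])
wide = associate(pivot(unique, "meter_id", "hour", "kwh"), meters, "meter_id")
```

Here the long table of hourly readings becomes one row per meter with one column per hour, joined to the meter's feeder, which is the kind of new structure the paragraph describes.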

For original data of poor quality, for example data with many missing values or abrupt changes, a data cleaning program written in MATLAB is used to correct the original data. Because the result of this processing is a formatted text file, further operations are needed, such as uploading the file to the big data platform and creating an Inceptor table mapped to the file, so that the data in the intermediate library supports transaction management.
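The source does not describe the cleaning algorithm itself, and its program is written in MATLAB. The Python sketch below shows one plausible approach under stated assumptions: an isolated value that differs from both neighbours by more than a threshold is treated as an abrupt change, and missing values are filled by linear interpolation. The function names, the threshold, and the output line format are hypothetical.

```python
def mark_spikes(values, max_jump):
    """Return a copy in which isolated spikes are replaced by None.

    A point counts as a spike when it differs from both of its known
    neighbours by more than max_jump (an assumed rule).
    """
    marked = list(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if None in (prev, cur, nxt):
            continue
        if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
            marked[i] = None
    return marked


def fill_gaps(values):
    """Fill None entries by linear interpolation; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def clean_series(values, max_jump):
    """Remove spikes, then fill every gap."""
    return fill_gaps(mark_spikes(values, max_jump))


def to_formatted_line(meter_id, values, separator=","):
    """Render one cleaned series as a line of a formatted text file."""
    return separator.join([meter_id] + [f"{v:.2f}" for v in values])


cleaned = clean_series([10.0, None, 12.0, 90.0, 14.0], max_jump=20.0)
line = to_formatted_line("M1", cleaned)
```

The final helper reflects the point made in the text that the cleaning result is a formatted text file, which is then uploaded and mapped to an Inceptor table.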

Migration from intermediate library to result library: as shown in fig. 5, the analyzed data is migrated from the intermediate library to a result library with higher query efficiency, and a global index is created, which speeds up query responses and improves the user experience.

The specific method for creating the global index is as follows: writing a script program to migrate the Inceptor data in the intermediate library to a Hyperbase data warehouse; then obtaining the index configuration file of the Hyperbase table, adding an index for the new field to the JSON file, and making the configuration take effect again.

Data is placed in the Inceptor data warehouse for processing for two reasons. First, the data warehouse supports transactions and can insert, delete, query, and update distributed data. Second, it supports operating on data through SQL to execute MapReduce tasks, which reduces the learning cost for developers. However, when massive data must interact frequently and in real time with the front end, Inceptor's query efficiency (especially for fuzzy queries and range queries) is too low to meet the requirement. Therefore, all Inceptor tables in the intermediate library that need to interact with the front end are migrated to the Hyperbase database. There, in addition to the built-in index on the Hyperbase primary key, secondary indexes can be created on one or more commonly searched columns, which greatly increases retrieval speed.
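Why a secondary index speeds up retrieval on a non-key column can be shown with a small in-memory model. This is a conceptual sketch, not the Hyperbase implementation: the class `IndexedTable`, its methods, and the sample columns are hypothetical.

```python
from collections import defaultdict


class IndexedTable:
    """Rows stored by primary key, with optional secondary indexes on chosen columns."""

    def __init__(self, indexed_columns):
        self.rows = {}
        # One mapping per indexed column: column value -> set of row keys.
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def put(self, row_key, row):
        """Insert or overwrite a row and keep every secondary index in step."""
        old = self.rows.get(row_key)
        if old is not None:
            for col, index in self.indexes.items():
                index[old.get(col)].discard(row_key)
        self.rows[row_key] = row
        for col, index in self.indexes.items():
            index[row.get(col)].add(row_key)

    def find(self, column, value):
        """Use the secondary index when one exists, otherwise scan every row."""
        if column in self.indexes:
            return sorted(self.indexes[column].get(value, set()))
        return sorted(k for k, row in self.rows.items() if row.get(column) == value)


table = IndexedTable(["feeder"])
table.put("M1", {"feeder": "F10", "phase": "A"})
table.put("M2", {"feeder": "F11", "phase": "B"})
table.put("M3", {"feeder": "F10", "phase": "A"})
by_feeder = table.find("feeder", "F10")  # answered from the secondary index
by_phase = table.find("phase", "A")      # no index on this column: full scan
```

The indexed lookup touches only the matching row keys, while the unindexed lookup has to examine every row, which is the cost the migration to an indexed result library is meant to avoid.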

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims

1. The intelligent power distribution and utilization data conversion method based on the big data platform is characterized by comprising the following steps:

migration from intermediate repository to result repository: migrating the analyzed data from the intermediate library to a result library, and creating a global index;

the original library and the intermediate library are Inceptor data warehouses; the result library is a Hyperbase data warehouse;

the data storage format of the original data in the data cache region is as follows: an oracle table, a non-formatted txt file, a formatted txt file, an excel file and a cim/svg file;

for the Oracle table: writing a Sqoop script and importing the data into an HDFS file on the big data platform; creating an external table associated with the HDFS file, wherein the associated data supports SQL queries but does not support update and delete transaction processing; creating an Inceptor table that supports transactions, and inserting the data from the external table into that transactional Inceptor table for use;

for Excel or formatted txt files: first, uploading the formatted txt file via FTP to a specified directory of the big data platform; second, creating an Inceptor table according to the file and the specified separator, and loading the text file data under the specified path into that Inceptor table; third, inserting the data in that Inceptor table into a new Inceptor table that supports transaction processing;

for cim/svg files: firstly, developing a topology analysis program to analyze cim/svg files into txt files with fixed formats; other operation processing modes are the same as those of the formatted txt text file; and thirdly, a small amount of association relation data is analyzed and completed through a java program.

2. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the data interface comprises: FTP interface, database interface, Webservice interface, and text data interface.

3. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the processing of the original data by the big data platform comprises: merging, associating, de-duplicating, transposing rows and columns, and cleaning the original data.

4. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 3, wherein the big data platform processes the original data by a specific method comprising the following steps:

if the data has missing values or abrupt changes, a data cleaning program written in MATLAB corrects the original data to obtain a formatted txt file, the text file is uploaded to the big data platform via FTP, and an Inceptor table mapped to the file is created, so that the data in the intermediate library supports transaction management.

5. The intelligent power distribution and utilization data conversion method based on the big data platform as claimed in claim 1, wherein the specific method for creating the global index is as follows: