CN112988677A - Hadoop-based power data processing subsystem - Google Patents

Hadoop-based power data processing subsystem

Info

Publication number
CN112988677A
CN112988677A (application CN202110327944.XA)
Authority
CN
China
Prior art keywords
data
file
cleaning
request
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110327944.XA
Other languages
Chinese (zh)
Inventor
何森
苏浩辉
王奇
常安
陈彦州
肖耀辉
孙萌
郑文坚
张厚荣
赖光霖
崔曼帝
侯俊
张治然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maintenance and Test Center of Extra High Voltage Power Transmission Co
Original Assignee
Maintenance and Test Center of Extra High Voltage Power Transmission Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maintenance and Test Center of Extra High Voltage Power Transmission Co
Priority to CN202110327944.XA
Publication of CN112988677A
Legal status: Pending

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/10: File systems; File servers
                        • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
                            • G06F16/162: Delete operations
                            • G06F16/168: Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
                        • G06F16/18: File system types
                            • G06F16/182: Distributed file systems
                    • G06F16/20: Information retrieval of structured data, e.g. relational data
                        • G06F16/21: Design, administration or maintenance of databases
                            • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
                        • G06F16/24: Querying
                            • G06F16/245: Query processing
                                • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
                                    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases
                            • G06F16/248: Presentation of query results
                        • G06F16/28: Databases characterised by their database models, e.g. relational or object models
                            • G06F16/284: Relational databases
                            • G06F16/285: Clustering or classification
                • G06F18/00: Pattern recognition
                    • G06F18/20: Analysing
                        • G06F18/24: Classification techniques
                            • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
            • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                • G06Q50/06: Energy or water supply


Abstract

The invention discloses a Hadoop-based power data processing subsystem, which comprises: a view layer for interacting with a user, receiving the user's data requests and returning the corresponding results; a control layer for processing the user data processing requests forwarded by the view layer and returning results through logic calculation, calling data models of the model layer below as the objects of that calculation; and a model layer for providing the calculated data models to the control layer, together with an interface for accessing data model instance objects. With this system, a user can issue data processing requests through the view layer, and the control layer can quickly respond to the user's data operation instructions and call the calculated data models, so that data can be processed rapidly and the efficiency and accuracy of data processing are improved.

Description

Hadoop-based power data processing subsystem
Technical Field
The invention relates to electric power data processing, in particular to a Hadoop-based electric power data processing subsystem.
Background
Smart grids are the direction and trend of development in the power industry. A smart grid uses advanced information and communication technology, computer technology, control technology and other advanced technologies to coordinate the requirements and functions of all stakeholders in power generation, grid operation, terminal power consumption and the power market, improving the reliability, self-healing capability and stability of the system as much as possible while making each part of the system run efficiently and reducing cost and environmental impact. The final goal of the smart grid is to build a panoramic real-time system covering the whole production process of the power system, including power generation, transmission, transformation, distribution, consumption, dispatching and other links. The basis supporting the safe, self-healing, green, strong and reliable operation of the smart grid is panoramic real-time grid data acquisition, transmission and storage, together with rapid analysis of the accumulated mass of multi-source data.
Big data is a new concept that has received much attention in recent years. It refers to a technical system or architecture that extracts value economically by capturing, discovering and analyzing, at high speed, large amounts of data of complicated variety and source. In a broad sense, therefore, big data refers not only to the data itself but also to the theories, methods and techniques for processing and analyzing such data.
Big data was mainly applied in fields such as commerce and finance in its early stage and has gradually expanded to transportation, medical treatment, energy and other fields; the smart grid is regarded as one of the important technical fields for big data applications. On the one hand, with the rapid development of the smart grid, the large-scale deployment of smart meters and the wide application of sensing technology, the power industry generates a large amount of data with diverse structures and complex sources, and how to store and apply these data is a difficult problem for power companies. On the other hand, these data have great utilization value: they can not only raise the grid's own management and operation level to a new height, even fundamentally transforming it, but also provide more and better services for government departments, industries and the mass of users, and create conditions for power companies to expand many value-added services.
Disclosure of Invention
The invention aims to overcome at least one technical problem in the prior art and provides a Hadoop-based power data processing subsystem.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a Hadoop-based power data processing subsystem comprising:
the view layer is used for interacting with a user, receiving a data request of the user and returning a corresponding result;
the control layer is used for processing the user data processing requests forwarded by the view layer and returning results through logic calculation, calling data models of the model layer below as the objects of that calculation;
and the model layer is used for providing the calculated data models to the control layer, together with an interface for accessing data model instance objects.
Further, the view layer comprises a storage file view, a data fusion view and a data cleaning view;
the storage file view is used for operating the power grid data file by a user;
the data fusion view is used for a user to send a data fusion request, set a data fusion rule and check a data fusion result;
the data cleaning view is used for a user to send a data cleaning request, select a cleaning method and check a data cleaning result.
Furthermore, the control layer comprises a data storage module, a data fusion module and a data cleaning module;
the data storage module is used for operating the data file in the distributed file system;
the data fusion module is used for finishing format unification and data identification of data;
and the data cleaning module is used for cleaning and verifying basic data and missing value filling of the data subjected to data fusion.
Further, the data storage module being used for operating on data files in the distributed file system comprises: file uploading, file data viewing, file information modification, file downloading and file deletion;
File uploading and file downloading refer to interaction between the client and the HDFS: file uploading means uploading a power grid data file from the client to the HDFS, and file downloading means copying a data file from the HDFS to the local client. File data viewing, file information modification and file deletion are all completed directly on the HDFS. File data viewing means viewing the content of a target data file, and file information modification means viewing and modifying the description information of a data file, including the file name and the file's access rights.
Further, the data fusion module being used for completing format unification and data identification comprises the following functions: text file formatting, database file recovery, database file conversion and text numerical value conversion;
Text file formatting means uniformly formatting text data files with different separators and different statistical formats into orthogonal text data files with one data instance per row and one data attribute per column; the formatted objects are text data files collected directly from the power grid or text data files obtained by traversing database tables;
Database file recovery means restoring the database file that stores the power grid information into the corresponding database on the server, so that the background program can read the data tables in that database;
Database file conversion means extracting the power grid data from the database table by table and converting it into text data files, one table corresponding to one text data file;
Text numerical value conversion means converting the text content in a text data file into numerical values according to a certain rule, where the conversion rule is either a system default or user-defined.
Further, the data cleaning module performing basic data cleaning, and cleaning verification of the missing-value filling, on the fused data comprises:
two stages, basic cleaning and cleaning verification. The basic cleaning stage comprises three parts, repeated data cleaning, invalid data cleaning and incomplete data cleaning, and cleans the data through direct operations or simple calculations on the data file; the cleaning verification stage comprises model training and filling value verification, and is a cleaning check of the incomplete data cleaning carried out after relatively complex calculation such as machine learning;
In basic cleaning, repeated data cleaning means deleting repeated attributes and repeated instances in a data file; invalid data cleaning means deleting attributes that have no data, attributes that have values in only a few instances, or attributes whose values are identical across all instances; incomplete data cleaning means filling attributes with a small number of missing numerical values using a fixed value or a statistical value of that attribute over its category;
In cleaning verification, model training means modeling with an SVM algorithm, through machine learning, on the data instances without missing values left after repeated data cleaning and invalid data cleaning. The independent variable attributes of the model are selected by the user and the dependent variable attribute is the attribute with missing values; after the data are input into the model, the system automatically tunes the parameters during model training and returns the optimal model under the selected conditions. Filling value verification means verifying the filled values from incomplete data cleaning with the model returned by model training: the model established by the SVM algorithm divides the result range of the attribute with missing values into two intervals, and if a filled value falls in the interval predicted by the model the verification passes; otherwise the data instance is deleted.
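The basic cleaning stage above can be sketched as follows (a hypothetical in-memory version for illustration: rows are lists of numeric values, nulls mark missing values, and the missing-value fill uses the column mean as the "statistical value" -- none of this is the patent's actual implementation):

```java
import java.util.*;

// Hypothetical sketch of the basic cleaning stage: repeated data
// cleaning (duplicate instances dropped), invalid data cleaning
// (columns whose value never varies dropped), and incomplete data
// cleaning (nulls filled with the column mean).
public class BasicCleaner {

    public static List<List<Double>> clean(List<List<Double>> rows) {
        if (rows.isEmpty()) return rows;

        // 1) Repeated data cleaning: drop duplicate instances,
        //    preserving first-seen order.
        List<List<Double>> unique = new ArrayList<>(new LinkedHashSet<>(rows));

        // 2) Invalid data cleaning: keep only columns with more than
        //    one distinct value.
        int cols = unique.get(0).size();
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            Set<Double> distinct = new HashSet<>();
            for (List<Double> r : unique) distinct.add(r.get(c));
            if (distinct.size() > 1) keep.add(c);
        }

        // 3) Incomplete data cleaning: fill nulls with the column mean.
        List<List<Double>> out = new ArrayList<>();
        for (List<Double> r : unique) {
            List<Double> nr = new ArrayList<>();
            for (int c : keep) {
                Double v = r.get(c);
                nr.add(v != null ? v : columnMean(unique, c));
            }
            out.add(nr);
        }
        return out;
    }

    private static double columnMean(List<List<Double>> rows, int c) {
        double sum = 0; int n = 0;
        for (List<Double> r : rows) {
            Double v = r.get(c);
            if (v != null) { sum += v; n++; }
        }
        return n == 0 ? 0.0 : sum / n;
    }
}
```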
Further, the storage file view being used for the user to operate on power grid data files means that the user completes the file uploading and file viewing processes through the file storage view, comprising the following steps:
1) receiving a file uploading request of a user by the file storage view, and forwarding the request to the data access module;
2) the data access module calls a file uploading function and uploads the file to a file system of the server;
3) returning the storage position of the file to be uploaded in the server file system;
4) the data access module calls a file uploading function and issues an uploading request and the position of the file in the server to the HDFS DAO;
5) the HDFS DAO calls an API for uploading the files of the packaged HDFS, and copies the files from a server file system to the HDFS according to the file storage position and the file storage rule;
6)-8) return file upload information: if the upload succeeds, the file storage node is returned; if it fails, the failure reason is returned;
9) receiving a file information viewing request of a user by a file storage view, and forwarding the request to a data access module;
10) the data access module calls a file data viewing function and issues the request to the HDFS DAO;
11) the HDFS DAO calls an API for downloading the HDFS file according to the request content, and copies the file to a server file system;
12)-13) return the storage position of the file in the server file system;
14) the data access module reads the target file in the local file system;
15)-16) return the specific content of the file.
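The two-hop upload path in steps 1)-5) -- client file staged in the server file system first, then copied into distributed storage by the HDFS DAO -- can be sketched as below. Plain directories and java.nio.file stand in for the client, server file system and HDFS; a real implementation would call the encapsulated HDFS JAVA API instead:

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical sketch of the fig. 5 upload flow. The class and method
// names are illustrative, not the patent's.
public class FileUploadFlow {

    // Steps 2)-3): stage the client's file in the server file system
    // and return its storage location there.
    public static Path stageOnServer(Path clientFile, Path serverDir)
            throws IOException {
        Files.createDirectories(serverDir);
        Path staged = serverDir.resolve(clientFile.getFileName());
        return Files.copy(clientFile, staged,
                StandardCopyOption.REPLACE_EXISTING);
    }

    // Steps 4)-5): the HDFS DAO copies the staged file from the server
    // file system into HDFS (here a directory stands in for HDFS).
    public static Path uploadToHdfs(Path staged, Path hdfsDir)
            throws IOException {
        Files.createDirectories(hdfsDir);
        Path target = hdfsDir.resolve(staged.getFileName());
        return Files.copy(staged, target,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The staging step mirrors the design choice in the flow: the view layer never talks to HDFS directly; only the DAO behind the data access module does.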
Further, the data fusion view is used for a user to send a data fusion request, set a data fusion rule and view a data fusion result, and includes:
1) the data fusion view receives a data fusion request of a user and forwards the data fusion request to the data fusion module;
2) the data fusion module sends a file reading request to the data access module according to the database file selected by the user;
3) the data access module returns the storage location of the target database file;
4) the data fusion module sends the position of the database file and a database recovery request to a database corresponding to the server;
5) the server database recovers the database according to the database file;
6) the server database returns information of the recovered database;
7) the data fusion module calls a database file conversion function and sends a database table export request;
8) the server database returns a database table in a file form, and each table corresponds to one text file;
9) the data fusion module calls a text file formatting function to format the file corresponding to the database table into a uniform format in the demand analysis;
10) storing the finally obtained formatted file into the HDFS through a data access module;
11)-12) return the processed data file content.
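Step 7)-8) above exports each database table as one text file. A minimal sketch of that conversion (an in-memory list of maps stands in for a table read through the ORM layer; the class name and the comma-separated output format are assumptions):

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of database file conversion: one table becomes
// one text data file, one record per row, one field per column, with
// the column order fixed by the caller.
public class TableToTextConverter {

    public static List<String> toTextFile(List<String> columns,
                                          List<Map<String, Object>> table) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", columns));              // header row
        for (Map<String, Object> record : table) {
            lines.add(columns.stream()
                    .map(c -> String.valueOf(record.get(c)))
                    .collect(Collectors.joining(",")));
        }
        return lines;
    }
}
```

Because the caller fixes the column order, the output is already in the orthogonal one-attribute-per-column form that the text file formatting step expects.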
Further, the data cleaning view being used for the user to send a data cleaning request, select a cleaning method and view the data cleaning result comprises the following steps:
1) the data cleaning view receives a user data cleaning request, wherein the request comprises a basic cleaning requirement, and then the request is forwarded to a basic cleaning part of the data cleaning module;
2) the basic cleaning module calls the data access module to read the target file to be cleaned according to the request;
3) the data access module returns the position of the file to be cleaned;
4) the basic cleaning module performs basic cleaning on the target file on the HDFS, wherein the basic cleaning comprises repeated data cleaning, invalid data cleaning and incomplete data cleaning, and marks the data instance filled with the missing value;
5) the basic cleaning module forwards the data file and the marking information which are subjected to basic cleaning to the cleaning verification module;
6) the cleaning verification module carries out SVM algorithm modeling according to the original complete data after basic cleaning, searches for a verification standard value and verifies and perfects the incomplete data cleaning result;
7) returning a result after the cleaning verification;
8) the basic cleaning module stores the final data cleaning result into the HDFS;
9)-10) the data access module returns the data cleaning result to the data cleaning view, where it is presented to the user.
Further, the view layer provides two types of interfaces: a request forwarding interface and a request validation interface.
Compared with the prior art, the invention has the beneficial effects that:
aiming at the existing problem of power data quality, the invention utilizes big data collected by a smart power grid to analyze the characteristics and process data. A Hadoop-based power data processing subsystem is provided, which focuses on a preparation support part for data processing, and is designed and realized to complete the fusion, storage, cleaning and the like of acquired data so as to improve the efficiency and accuracy of data processing.
Drawings
FIG. 1 is a block diagram of an embodiment of the present invention, illustrating an overall Hadoop-based power data processing subsystem;
FIG. 2 is a schematic diagram of the components of a data storage module;
FIG. 3 is a schematic diagram of the components of the data fusion module;
FIG. 4 is a schematic diagram of the data cleansing module;
FIG. 5 is a flow chart of data file access;
FIG. 6 is a flow chart of data file fusion;
FIG. 7 is a flow chart of data file cleansing.
Detailed Description
Example (b):
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Referring to fig. 1 to 7, this embodiment provides a Hadoop-based power data processing subsystem. It adopts a B/S architecture and provides data storage, data fusion, data cleaning and other functions to users in the form of a WEB service. It mainly comprises a view layer, a control layer and a model layer, and is a loosely coupled system that uses the SSH framework to realize the management of and connections between the layers, finally providing the specific functions to users as a WEB service.
The view layer provides the interactive interface between the system and the user, receives the user's data requests and returns the corresponding results. The control layer is mainly responsible for the logic control and function realization of the system; its logic control part mainly comprises two functions, namely processing the user requests forwarded by the view layer and returning results through certain logic calculations, and calling the data models of the model layer as the objects of those calculations. The model layer is mainly responsible for providing the upper layers with the data models used for calculation and with the interface for accessing data model instance objects, namely the DAO.
Therefore, by adopting the system, a user can perform data processing request operation through the view layer, and the control layer can quickly respond to the data operation instruction of the user and call the calculated data model, so that data can be processed quickly, and the efficiency and accuracy of data processing are improved.
Specifically, the view layer includes a storage file view, a data fusion view and a data cleaning view, and each view may have several pages or view blocks. The storage file view is mainly used for the user to create, view, modify and delete power grid data files. The data fusion view is mainly used for the user to send data fusion requests, set data fusion rules and view data fusion results. The data cleaning view is mainly used for the user to send data cleaning requests, select cleaning methods and view data cleaning results. The support subsystem completes the development of each view layer page using HTML, CSS and TS technologies.
The control layer mainly comprises three functional modules: a data storage module, a data fusion module and a data cleaning module. These functional modules complete the logic calculation functions and are the core units for processing user requests. The data storage module mainly completes the creation, deletion, modification and lookup of data files in the distributed file system. The data fusion module mainly completes the format unification and identification of data. The data cleaning module mainly completes the basic cleaning, and the cleaning verification of missing-value filling, of the fused data. The control functions of the control layer are managed by Spring within the SSH framework, and the function implementation modules are mainly written in the JAVA language.
The data model access interface (DAO) of the model layer provides two types of DAO: a database DAO and a distributed file storage system DAO. The distributed file storage system DAO is mainly used for accessing text files in the distributed file storage system, including text files uploaded directly by users and text files produced by data fusion. The support subsystem selects HDFS, the most popular Hadoop-based file storage system in the industry, as the system's file storage system, so the distributed file storage system DAO is implemented by calling the JAVA API provided by HDFS from the JAVA language. The database DAO is mainly provided for the data fusion module to fuse data from different databases. Based on research on multi-source databases and the existing ORM frameworks, the support subsystem selects the Hibernate framework to implement the database DAO. The data models are used in the upper layer's logic calculations and mainly provide standard calculation objects for data cleaning and data processing. The data models correspond to the instances in the power grid data and are therefore closely tied to the grid's business logic; the currently provided data models cover fault information, equipment information, remote control information, and custom information that grid workers can conveniently extend. The data models are described in the system in the JAVA language.
As shown in fig. 2, the data access module is mainly responsible for completing the various operations on power grid data files on the HDFS, including file uploading, file data viewing, file information modification, file downloading and file deletion.
The file uploading and the file downloading refer to interaction between a client and a distributed file storage system (HDFS), the file uploading refers to uploading of a power grid data file from the client to the HDFS, and the file downloading refers to copying of the data file stored in the HDFS to a local client. And the file data viewing, the file information modification and the file deletion are directly finished on the HDFS. The file data viewing refers to viewing the content of the target data file, and the file information modification refers to viewing and modifying the description information of the data file, including the file name and the authority of the file.
Since the tasks of the data access module are entirely HDFS-related, and the HDFS itself provides interfaces for these file operations, the support subsystem implements the specific access and operation functions through the corresponding encapsulated HDFS APIs.
As shown in fig. 3, the functions of the data fusion module all revolve around unifying the content format of power grid data files and making the files identifiable. They include four core functions: text file formatting, database file recovery, database file conversion and text numerical value conversion. The first three realize the format unification of data files, and the last provides the file identification service.
Text file formatting means uniformly formatting text data files with different separators and different statistical formats into orthogonal text data files with one data instance per row and one data attribute per column; the formatted objects can be text data files collected directly from the power grid or text data files obtained by traversing database tables. Database file recovery means restoring the database file that stores the power grid information into the corresponding database on the server, so that the background program can read the data tables in that database. Database file conversion means extracting the power grid data from the database table by table and converting it into text data files, one table corresponding to one text data file. Text numerical value conversion means converting the text content in a text data file into numerical values according to a certain rule, where the conversion rule can be a system default or user-defined.
Text file formatting and text numerical value conversion both operate directly on file content; on top of the file operation functions provided by the data access module, the support subsystem uses a JAVA class library named CSreader to realize faster and more standard data file operations. Database file recovery and database file conversion both involve database operations, including database connection and the query and export of database tables, and require the support of an ORM framework; the support subsystem realizes this part of the functions based on the Hibernate framework.
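The text numerical value conversion just described can be sketched as follows. The rule format here is an assumption for illustration: a user-supplied map from text to number, with a hypothetical default rule that assigns integer codes in order of first appearance when no user rule covers a value:

```java
import java.util.*;

// Hypothetical sketch of text numerical value conversion: textual field
// values are mapped to numbers by a rule table. A user-defined rule is
// applied first; unseen values fall back to a default rule that assigns
// the next integer code.
public class TextValueConverter {

    public static List<Double> convert(List<String> values,
                                       Map<String, Double> userRule) {
        Map<String, Double> rule = new LinkedHashMap<>();
        if (userRule != null) rule.putAll(userRule);
        List<Double> out = new ArrayList<>();
        for (String v : values) {
            // Default rule: an unseen text value gets the next code.
            out.add(rule.computeIfAbsent(v, k -> (double) rule.size()));
        }
        return out;
    }
}
```

With no user rule, the sequence "fault", "normal", "fault" becomes 0.0, 1.0, 0.0; a user rule such as {"fault": 9.0} overrides the default for that value.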
As shown in fig. 4, the data cleaning of the data cleaning module is divided into two stages: basic cleaning and cleaning verification. The basic cleaning stage mainly comprises three parts, repeated data cleaning, invalid data cleaning and incomplete data cleaning, and cleans the data through direct operations or simple calculations on the data files. The cleaning verification stage mainly comprises model training and filling value verification, and is a cleaning check of the incomplete data cleaning carried out after relatively complex calculation such as machine learning.
In basic cleaning, repeated data cleaning means deleting duplicate attributes and duplicate instances in a data file. Invalid data cleaning means deleting attributes that have no data, attributes that have values in only a few instances, or attributes whose values are identical across all instances. Incomplete data cleaning means filling attributes with a small number of missing numerical values using a fixed value or a statistical value of that attribute over its category.
In cleaning verification, model training means modeling with an SVM algorithm, through machine learning, on the data instances without missing values left after repeated data cleaning and invalid data cleaning. The independent variable attributes of the model are selected by the user and the dependent variable attribute is the attribute with missing values; after the data are input into the model, the system automatically tunes the parameters during model training and returns the optimal model under the selected conditions. Filling value verification means verifying the filled values from incomplete data cleaning with the model returned by model training. The model established by the SVM algorithm divides the result range of the attribute with missing values into two intervals; if a filled value falls in the interval predicted by the model the verification passes, otherwise the data instance is deleted.
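The filling value verification rule above -- keep the instance only if the filled value falls in the interval the trained model predicts, otherwise delete it -- can be sketched as below. A simple caller-supplied interval model stands in for the trained SVM (the patent trains the real model via the weka library); class and interface names are illustrative:

```java
import java.util.*;

// Hypothetical sketch of filling value verification. The trained model
// splits the missing attribute's range into two intervals (here: below
// or at-or-above a threshold); an instance is kept only if its filled
// value lands in the interval the model predicts from the instance's
// independent variables.
public class FillValueVerifier {

    // Stand-in for the trained SVM: predicts whether the target
    // attribute should lie in the "high" interval (>= threshold).
    public interface IntervalModel {
        boolean predictsHigh(List<Double> independentVars);
    }

    public static List<List<Double>> verify(List<List<Double>> filledRows,
                                            int targetCol,
                                            double threshold,
                                            IntervalModel model) {
        List<List<Double>> kept = new ArrayList<>();
        for (List<Double> row : filledRows) {
            List<Double> indep = new ArrayList<>(row);
            indep.remove(targetCol);                    // drop dependent var
            boolean predictedHigh = model.predictsHigh(indep);
            boolean fillIsHigh = row.get(targetCol) >= threshold;
            if (predictedHigh == fillIsHigh) kept.add(row); // verified
            // otherwise the instance is deleted, per the patent's rule
        }
        return kept;
    }
}
```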
The data cleaning module operates on the files produced by data fusion and reuses the file operation functions of the data access module. For the simple mathematical statistics and the more complex machine learning model training involved in the cleaning process, the subsystem calls the corresponding methods of the Weka library.
The storage file view is used for the user to operate the power grid data files; that is, the user completes the file uploading and file viewing processes (i.e. the data file access processes) through the file storage view. As shown in fig. 5, the method comprises the following steps:
1. the file storage view receives a file uploading request of a user and forwards the request to the data access module;
2. the data access module calls a file uploading function and uploads the file to the file system of the server;
3. returning the storage position of the file to be uploaded in the server file system;
4. the data access module calls a file uploading function and issues the uploading request and the position of the file in the server to the HDFS DAO;
5. the HDFS DAO calls the wrapped HDFS file uploading API, and copies the file from the server file system to the HDFS according to the file storage position and the file storage rules;
6-8. returning file uploading information: if the uploading succeeds, returning the file storage node; if the uploading fails, returning the failure reason;
9. the file storage view receives a file information viewing request of a user and forwards the request to the data access module;
10. the data access module calls a file data viewing function and issues the request to the HDFS DAO;
11. the HDFS DAO calls the API for downloading the HDFS file according to the request content, and copies the file to the server file system;
12-13. returning the storage position of the file in the server file system;
14. the data access module reads the target file in the local file system;
15-16. returning the specific content of the file.
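The upload and view flows above can be simulated locally as follows. This is an illustrative sketch only: the "HDFS" here is just a second directory, `HdfsDao` stands in for the wrapped HDFS API, and all class, method and path names are assumptions rather than the patent's actual implementation.

```python
# Local simulation of the fig. 5 flow: view -> data access module ->
# server file system -> HDFS DAO -> "HDFS" (a stand-in directory).
import os
import shutil

class HdfsDao:
    def __init__(self, hdfs_root):
        self.hdfs_root = hdfs_root

    def upload(self, server_path):                # step 5: server FS -> HDFS
        dest = os.path.join(self.hdfs_root, os.path.basename(server_path))
        shutil.copy(server_path, dest)
        return dest                               # steps 6-8: storage node/path

class DataAccessModule:
    def __init__(self, server_root, dao):
        self.server_root, self.dao = server_root, dao

    def upload_file(self, local_path):            # steps 2-4
        staged = os.path.join(self.server_root, os.path.basename(local_path))
        shutil.copy(local_path, staged)           # file lands on the server FS
        return self.dao.upload(staged)            # hand off to the HDFS DAO

    def view_file(self, hdfs_path):               # steps 10-16
        staged = shutil.copy(hdfs_path, self.server_root)  # copy back to server
        with open(staged) as f:
            return f.read()                       # specific content of the file
```

In the real subsystem the `HdfsDao.upload` call would wrap the HDFS Java client API rather than a local copy.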
Therefore, the data file can be efficiently and accurately accessed through the steps.
Fig. 5 shows the specific flows of file uploading and file viewing; in fact, the data access module further provides several other file operations, such as file information modification and file downloading, but their flows are similar to the above and are not repeated here.
The data fusion view is used for the user to send a data fusion request, set data fusion rules and view the data fusion result, i.e. for the data file fusion process. Data file fusion refers to converting the data files collected from the power grid, through file formatting and file recognition, into text files in a unified format stored in the HDFS. Since the data fusion of a database file also involves text file formatting, the basic flow of data file fusion is described here taking the data fusion of a database file as an example; the specific flow, shown in fig. 6, comprises the following steps:
1. the data fusion view receives a data fusion request of a user and forwards the data fusion request to the data fusion module;
2. the data fusion module sends a file reading request to the data access module according to the database file selected by the user;
3. the data access module returns the storage position of the target database file;
4. the data fusion module sends the position of the database file and a database recovery request to a database corresponding to the server;
5. the server database recovers the database according to the database file;
6. the server database returns the information of the recovered database;
7. the data fusion module calls a database file conversion function and sends a database table export request;
8. the server database returns a database table in a file form, and each table corresponds to one text file;
9. the data fusion module calls a text file formatting function and formats the files corresponding to the database tables into the unified format defined in the requirements analysis;
10. storing the finally obtained formatted files into the HDFS through the data access module;
11-12. returning the processed data file content.
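Steps 7-9 (table export and formatting) can be sketched as below. This is an illustrative sketch: `sqlite3` stands in for the server database, the tab-separated "one instance per row, one attribute per column" layout is an assumption about the unified format, and `write_file` is a hypothetical callback standing in for step 10's storage via the data access module.

```python
# Sketch of database file conversion: export each database table to one
# text file in a uniform row/column format, as in fig. 6 steps 7-9.
import sqlite3

def export_tables(conn, write_file):
    """write_file(name, text) persists one text file per table; in the
    subsystem this would go through the data access module to the HDFS."""
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    for (table,) in cur.fetchall():
        rows = conn.execute(f"SELECT * FROM {table}")
        header = [d[0] for d in rows.description]      # attribute names
        lines = ["\t".join(header)]
        for row in rows.fetchall():                    # one instance per row
            lines.append("\t".join(str(v) for v in row))
        write_file(table + ".txt", "\n".join(lines))   # one file per table
```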
Therefore, the data files can be efficiently and accurately fused through the steps.
The data cleaning view is used for the user to send a data cleaning request, select a cleaning method and view the data cleaning result, i.e. for the data file cleaning process. Data file cleaning refers to the process in which a file produced by data fusion undergoes a series of basic cleaning steps followed by cleaning verification of the filled missing values. The specific flow, shown in fig. 7, comprises the following steps:
1. the data cleaning view receives a user data cleaning request, the request comprises a basic cleaning requirement, and then the request is forwarded to a basic cleaning part of the data cleaning module;
2. the basic cleaning module calls the data access module to read a target file to be cleaned according to the request;
3. the data access module returns the position of the file to be cleaned;
4. the basic cleaning module performs basic cleaning on a target file on the HDFS, wherein the basic cleaning comprises repeated data cleaning, invalid data cleaning and incomplete data cleaning, and marks a data instance filled with a missing value;
5. the basic cleaning module forwards the data file and the marking information which are subjected to basic cleaning to the cleaning verification module;
6. the cleaning verification module builds an SVM model from the originally complete data remaining after basic cleaning, determines the verification standard values, and verifies and refines the incomplete data cleaning result;
7. returning a result after cleaning verification;
8. the basic cleaning module stores the final data cleaning result into the HDFS;
9-10. the data access module returns the data cleaning result to the data cleaning view, which presents it to the user.
Therefore, the data file can be efficiently and accurately cleaned through the steps.
The view layer mainly provides the user, through a graphical interface, with buttons and input boxes for the various operations, and can perform simple validation of user input according to the requirements; the view layer therefore provides two types of interfaces: a request forwarding interface and a request validation interface. The interfaces of the view layer are directly exposed to the user and are not described further here.
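The two view-layer interface types can be sketched as below. The class name, the request fields and the validation rules are assumptions for the example, not the patent's implementation; the control-layer module is modeled as a plain callable.

```python
# Sketch of a view-layer component with the two interface types named above:
# a request validation interface and a request forwarding interface.

class DataCleaningView:
    def __init__(self, control_module):
        self.control = control_module            # control-layer module to call

    def validate(self, request):                 # request validation interface
        # Simple checks on user input, as the view layer performs.
        return bool(request.get("file")) and \
               request.get("method") in {"duplicate", "invalid", "incomplete"}

    def forward(self, request):                  # request forwarding interface
        if not self.validate(request):
            return {"ok": False, "error": "invalid request"}
        return {"ok": True, "result": self.control(request)}
```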
The interfaces of the control layer are mainly provided for the view layer and for other control-layer modules to call, and implement the core functional logic of the system.
The interfaces of the model layer are mainly distributed across the two types of DAO and are provided for the functional modules of the control layer to perform specific file operations.
In summary, to address the existing problems of power data quality, this application leverages the big data collected by the smart grid under the current background, analyzes its characteristics, and performs data processing accordingly. The Hadoop-based power data processing subsystem focuses on the preparatory support stage of data processing, and is designed and implemented to complete the fusion, storage, and cleaning of collected data, thereby improving the efficiency and accuracy of data processing.
The above embodiments are only for illustrating the technical concept and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention accordingly, and not to limit the protection scope of the present invention accordingly. All equivalent changes or modifications made in accordance with the spirit of the present disclosure are intended to be covered by the scope of the present disclosure.

Claims (10)

1. A Hadoop-based power data processing subsystem, comprising:
the view layer is used for interacting with a user, receiving a data request of the user and returning a corresponding result;
the control layer is used for processing the user data processing request forwarded by the view layer and returning a result through logic calculation, and calling a data model of the following model layer as an object of the logic calculation;
and the model layer is used for providing the calculated data model for the control layer and accessing an interface of the data model instance object.
2. The Hadoop-based power data processing subsystem of claim 1 wherein the view layer comprises a stored file view, a data fusion view, and a data cleansing view;
the storage file view is used for operating the power grid data file by a user;
the data fusion view is used for a user to send a data fusion request, set a data fusion rule and check a data fusion result;
the data cleaning view is used for a user to send a data cleaning request, select a cleaning method and check a data cleaning result.
3. The Hadoop-based power data processing subsystem of claim 2 wherein the control layer comprises a data storage module, a data fusion module, and a data cleansing module;
the data storage module is used for operating the data file in the distributed file system;
the data fusion module is used for finishing format unification and data identification of data;
and the data cleaning module is used for cleaning and verifying basic data and missing value filling of the data subjected to data fusion.
4. The Hadoop-based power data processing subsystem of claim 3 wherein the data storage module for operation of data files in a distributed file system comprises: file uploading, file data viewing, file information modification, file downloading and file deletion;
the file uploading and the file downloading refer to interaction between a client and the HDFS, the file uploading refers to uploading of a power grid data file from the client to the HDFS, and the file downloading refers to copying of a data file of the HDFS to a local client; the file data viewing, the file information modification and the file deletion are all directly finished on the HDFS; the file data viewing refers to viewing the content of a target data file, and the file information modification refers to viewing and modifying the description information of the data file, including the file name and the authority of the file.
5. The Hadoop-based power data processing subsystem of claim 3 wherein the data fusion module to accomplish format unification and data recognition comprises: text file formatting, database file recovery, database file conversion and text numerical value conversion;
the text file formatting refers to uniformly formatting text-type data files with different separators and different statistical formats into regular text-type data files with one data instance per row and one data attribute per column, wherein the formatted objects are text data files directly collected from the power grid or text data files obtained by traversing database tables;
the database file recovery means that a database file storing the power grid information is restored into the corresponding database on the server, so that the background program can read the data tables in the database;
the database file conversion refers to extracting the power grid data in the database by taking a table as a unit, converting the power grid data into a text type data file, wherein one table corresponds to one text type data file;
the text numerical value conversion is to convert the text content in the text data file into a numerical value with a certain rule, wherein the conversion rule is default by a system or self-defined by a user.
6. The Hadoop-based power data processing subsystem of claim 3, wherein the data cleaning module is configured to perform basic data cleaning and cleaning verification of missing value filling on the data-fused data as follows:
the method comprises two stages, basic cleaning and cleaning verification, wherein the basic cleaning stage comprises three parts, namely repeated data cleaning, invalid data cleaning and incomplete data cleaning, and cleans the data files through direct operations or simple calculations; the cleaning verification stage comprises model training and filling value verification, and is a cleaning check of the incomplete data cleaning results using relatively complex computation such as machine learning;
in basic cleaning, repeated data cleaning refers to deleting duplicate attributes and duplicate instances in a data file; invalid data cleaning refers to deleting attributes with no data, attributes for which only a few instances have a value, or attributes whose values are all identical; incomplete data cleaning refers to filling attributes with a small number of missing numerical values using a fixed value or a statistical value of that attribute;
in cleaning verification, model training refers to building an SVM model, by a machine learning method, from the data instances without missing values that remain after repeated data cleaning and invalid data cleaning; the independent variable attributes of the model are selected by the user, and the dependent variable attribute is the attribute with missing values; after the data are input into the model, the system automatically tunes the parameters during model training and returns an optimal model under the selected conditions; filling value verification refers to verifying the values filled in during incomplete data cleaning using the model returned by model training; the model established by the SVM algorithm divides the attribute with missing values into two intervals, and if a filled value falls in the interval predicted by the model, the verification passes, otherwise the data instance is deleted.
7. The Hadoop-based power data processing subsystem as claimed in claim 3, wherein the storage file view for the user to operate the grid data file means that the user completes the process of file uploading and file viewing through the file storage view, comprising the steps of:
1) receiving a file uploading request of a user by the file storage view, and forwarding the request to the data access module;
2) the data access module calls a file uploading function and uploads the file to a file system of the server;
3) returning the storage position of the file to be uploaded in the server file system;
4) the data access module calls a file uploading function and issues an uploading request and the position of the file in the server to the HDFS DAO;
5) the HDFS DAO calls the wrapped HDFS file uploading API, and copies the file from the server file system to the HDFS according to the file storage position and the file storage rules;
6-8) returning file uploading information: if the uploading succeeds, returning the file storage node; if the uploading fails, returning the failure reason;
9) receiving a file information viewing request of a user by a file storage view, and forwarding the request to a data access module;
10) the data access module calls a file data viewing function and issues the request to the HDFS DAO;
11) the HDFS DAO calls the API for downloading the HDFS file according to the request content, and copies the file to the server file system;
12-13) returning the storage position of the file in the server file system;
14) the data access module reads the target file in the local file system;
15-16) returning the specific content of the file.
8. The Hadoop-based power data processing subsystem of claim 3 wherein the data fusion view for a user to send a request for data fusion, set data fusion rules, and view data fusion results comprises:
1) the data fusion view receives a data fusion request of a user and forwards the data fusion request to the data fusion module;
2) the data fusion module sends a file reading request to the data access module according to the database file selected by the user;
3) the data access module returns the storage location of the target database file;
4) the data fusion module sends the position of the database file and a database recovery request to a database corresponding to the server;
5) the server database recovers the database according to the database file;
6) the server database returns information of the recovered database;
7) the data fusion module calls a database file conversion function and sends a database table export request;
8) the server database returns a database table in a file form, and each table corresponds to one text file;
9) the data fusion module calls a text file formatting function to format the file corresponding to the database table into a uniform format in the demand analysis;
10) storing the finally obtained formatted file into the HDFS through a data access module;
11-12) returning the processed data file content.
9. The Hadoop-based power data processing subsystem of claim 3 wherein the data cleansing view for a user to send a request for data cleansing, select a cleansing method, and view data cleansing results comprises:
1) the data cleaning view receives a user data cleaning request, wherein the request comprises a basic cleaning requirement, and then the request is forwarded to a basic cleaning part of the data cleaning module;
2) the basic cleaning module calls the data access module to read the target file to be cleaned according to the request;
3) the data access module returns the position of the file to be cleaned;
4) the basic cleaning module performs basic cleaning on the target file on the HDFS, wherein the basic cleaning comprises repeated data cleaning, invalid data cleaning and incomplete data cleaning, and marks the data instance filled with the missing value;
5) the basic cleaning module forwards the data file and the marking information which are subjected to basic cleaning to the cleaning verification module;
6) the cleaning verification module carries out SVM algorithm modeling according to the original complete data after basic cleaning, searches for a verification standard value and verifies and perfects the incomplete data cleaning result;
7) returning a result after the cleaning verification;
8) the basic cleaning module stores the final data cleaning result into the HDFS;
9-10) the data access module returns the data cleaning result to the data cleaning view to be presented to the user.
10. The Hadoop-based power data processing subsystem of claim 1 wherein the view layer provides two types of interfaces: a request forwarding interface and a request validation interface.
CN202110327944.XA 2021-03-26 2021-03-26 Hadoop-based power data processing subsystem Pending CN112988677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110327944.XA CN112988677A (en) 2021-03-26 2021-03-26 Hadoop-based power data processing subsystem


Publications (1)

Publication Number Publication Date
CN112988677A true CN112988677A (en) 2021-06-18

Family

ID=76333931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110327944.XA Pending CN112988677A (en) 2021-03-26 2021-03-26 Hadoop-based power data processing subsystem

Country Status (1)

Country Link
CN (1) CN112988677A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵迪 (Zhao Di): "基于电力大数据的数据挖掘支撑子系统的设计与实现" (Design and Implementation of a Data Mining Support Subsystem Based on Power Big Data), 中国优秀硕士学位论文全文数据库 (China Masters' Theses Full-text Database), no. 03, pages 1-74 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination