CN111538720A - Method and system for cleaning basic data in power industry - Google Patents

Method and system for cleaning basic data in power industry

Publication number
CN111538720A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010171013.0A
Other languages
Chinese (zh)
Other versions
CN111538720B (en
Inventor
曹海涛
刘林元
冯磊
陈武军
李宏博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jialing River Tingzikou Water Resources And Hydropower Development Co ltd
Original Assignee
Jialing River Tingzikou Water Resources And Hydropower Development Co ltd
Application filed by Jialing River Tingzikou Water Resources And Hydropower Development Co ltd
Priority to CN202010171013.0A
Publication of CN111538720A
Application granted
Publication of CN111538720B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a method for cleaning basic data in the power industry, comprising the following steps: collecting real-time industrial data of power station equipment; checking and combing the real-time industrial data; cleaning and managing the real-time industrial data with respect to its production stage; cleaning and managing the real-time industrial data with respect to its maintenance stage; and cleaning and managing the real-time industrial data with respect to its use stage. The invention also discloses a system for cleaning basic data in the power industry. With the invention, data cleaning is studied from three aspects of the data, namely production, maintenance and use, and valid data that can be used for data analysis and by existing information systems is finally sorted out, providing reference and guidance for data cleaning at the power plants of other enterprises in the group.

Description

Method and system for cleaning basic data in power industry
Technical Field
The invention relates to the technical field of big data, in particular to a method and a system for cleaning basic data in the power industry.
Background
With the arrival of the big data era, data analysis methods and tools keep improving and intelligent analysis systems keep emerging. Power generation enterprises can draw new insights from large volumes of data and combine them with detailed knowledge of their existing business, creating new productivity and driving the transformation and upgrading of traditional power stations into intelligent power stations. To this end, data should be regarded as a core asset of the power generation enterprise, i.e., a "data asset".
That data is an asset is already an industry consensus, and some have even proposed recording data on the balance sheet. Compared with physical assets, however, the management of data assets is still at a very primitive stage. Organizations often lack a comprehensive understanding of the types and quantities of their data assets, and are weak in fine-grained management, value mining and continuous operation in areas such as data quality, data security, asset assessment and asset exchange and trading.
At the present stage, data asset management is an important task in promoting the deep integration of big data with the real economy, the shift from old to new growth drivers, and the transition of the economy to a stage of high-quality development. Data asset management needs to fully integrate business, technology and management to ensure that data assets preserve and increase their value.
Data asset management occupies a pivotal position in a big data application system, linking the layers above and below it: it supports value-oriented data application development, and it relies on the big data platform to manage the full data life cycle.
Data asset management runs through the whole life cycle of data, including acquisition, application and value realization. An enterprise manages its data assets in order to improve their quality through life-cycle management and to realize the value of data in two respects, "increasing value internally and increasing benefit externally". Data is first defined, created or obtained in a standardized way, then stored, maintained and used, and finally destroyed. The data life cycle begins before data acquisition: the enterprise plans its data and defines data specifications in advance so as to obtain the technical capabilities needed for data acquisition, delivery, storage and control.
At present, improving data quality and reducing cost have become topics of wide interest in industry. If data cannot be effectively combed and finely managed, its value cannot be properly realized, and operation and management may even be negatively affected. The need for data asset management is mainly reflected in the following problems:
(1) Lack of a uniform data standard: the data registration and checking process lacks a uniform data standard, so problems such as confused and conflicting data, multiple sources and multiple types cannot be effectively avoided. A unified standard is a prerequisite for associating data and for ensuring smooth information interaction, data circulation and system access.
(2) Confused data life-cycle planning: in some enterprises, the planning of the links of the data life cycle, such as collection, transmission, storage, application and open sharing of internal data, is unreasonable. For example, data is collected without the knowledge or consent of the data source user, data is processed beyond the permitted scope, or processed information is not isolated.
(3) Difficult overall business management: permissions for adding, deleting, modifying and using data are managed in a disorderly way, and it is difficult to establish a single data view that comprehensively, accurately and completely reflects the operating condition of the enterprise. The handling of data demand, data quality and data application problems is scattered across different business and technical departments without a clear coordination mechanism or a unified data management channel, so the business cannot obtain data support in time and on demand.
(4) Low data processing efficiency: data acquisition, preprocessing and related work take a long time and the methods are inconvenient, so complete, high-quality data attributes cannot be quickly mined and sorted out for analysis and application. Development and governance efficiency needs to be improved.
(5) Uneven data quality: data quality problems such as redundant, missing and conflicting data cannot be discovered in time and effectively resolved. A standardized data governance process, an assessment mechanism and the like need to be established to improve this.
(6) Urgent need for security supervision: an effective data security management mechanism is lacking, and access to sensitive, private and confidential information is not effectively controlled, so that compliance in desensitization and confidentiality cannot be guaranteed and potential reputational and legal risks to the enterprise may arise.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and a system for cleaning basic data in the power industry, which can clean and manage data to sort out effective data.
In order to solve the technical problem, the invention provides a method for cleaning basic data in the power industry, comprising: collecting real-time industrial data of power station equipment; checking and combing the real-time industrial data; cleaning and managing the real-time industrial data with respect to its production stage; cleaning and managing the real-time industrial data with respect to its maintenance stage; and cleaning and managing the real-time industrial data with respect to its use stage.
As an improvement of the above solution, checking and combing the real-time industrial data comprises: checking and combing the power generation service data of the hydropower station control layer and the wind farm control system; and, for problematic data sources, going down to the hydroelectric generating units and wind turbine units to check and comb unit service data and communication protocols.
As an improvement of the above solution, cleaning and managing the real-time industrial data with respect to its production stage comprises: cleaning and managing the real-time industrial data by data attribute, wherein the data attributes comprise a time attribute, a model attribute and a source system attribute; and cleaning and managing the real-time industrial data by acquisition channel, wherein the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel.
As an improvement of the above solution, cleaning and managing the real-time industrial data with respect to its maintenance stage comprises: constructing a data asset inventory, wherein the data asset inventory comprises a retrieval mode, a full data table and primary and backup source data; constructing a storage database, wherein the storage database comprises a time-series database, a relational database, an unstructured database and a streaming media database; constructing security guarantees, wherein the security guarantees comprise link security, tenant security, content security and protection security; and constructing data services, wherein the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
As an improvement of the above solution, cleaning and managing the real-time industrial data with respect to its use stage comprises: cleaning and managing the real-time industrial data by application scenario, wherein the application scenarios comprise real-time monitoring, association display, history display and model display; and cleaning and managing the real-time industrial data by algorithm model, wherein the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
As an improvement of the above solution, collecting the real-time industrial data of the power station equipment comprises: passively receiving real-time industrial data of the power station equipment; and actively acquiring real-time industrial data of the power station equipment.
Correspondingly, the invention also provides a system for cleaning basic data in the power industry, comprising: an acquisition module for collecting real-time industrial data of the power station equipment; a checking and combing module for checking and combing the real-time industrial data; and a cleaning and management module for cleaning and managing the real-time industrial data with respect to the production stage, the maintenance stage and the use stage of the data respectively.
As an improvement of the above solution, the cleaning and management module comprises: a production stage cleaning unit for cleaning and managing the real-time industrial data by data attribute and by acquisition channel, wherein the data attributes comprise a time attribute, a model attribute and a source system attribute, and the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel; a maintenance stage cleaning unit for constructing a data asset inventory, a storage database, security guarantees and data services, wherein the data asset inventory comprises a retrieval mode, a full data table and primary and backup source data, the storage database comprises a time-series database, a relational database, an unstructured database and a streaming media database, the security guarantees comprise link security, tenant security, content security and protection security, and the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling; and a use stage cleaning unit for cleaning and managing the real-time industrial data by application scenario and by algorithm model, wherein the application scenarios comprise real-time monitoring, association display, history display and model display, and the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
As an improvement of the above solution, the power industry basic data cleaning system further comprises a data analysis module for retrieving real-time industrial data by code, by Chinese name or by fuzzy query.
As an improvement of the above solution, the power industry basic data cleaning system adopts a microservice architecture and is managed using Docker containerization technology.
The implementation of the invention has the following beneficial effects:
The data cleaning method is studied from three aspects of the data, namely production, maintenance and use; technical requirements and a design scheme for data asset management are proposed based on the results; the data of the power station equipment is collected, and the required data is cleaned and managed; and valid data that can be used for data analysis and by existing information systems is finally sorted out, providing reference and guidance for power plant data cleaning at other enterprises in the group.
Drawings
FIG. 1 is a flow chart of a method of power industry basic data cleaning of the present invention;
FIG. 2 is a schematic structural diagram of the power industry basic data cleaning system of the present invention;
FIG. 3 is a schematic diagram of a cleaning and management module in the electric power industry basic data cleaning system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 shows a flowchart of the method for cleaning basic data in the power industry according to the present invention, which includes:
S101: collecting real-time industrial data of the power station equipment.
Specifically, collecting the real-time industrial data of the power station equipment comprises: passively receiving real-time industrial data of the power station equipment, and actively acquiring real-time industrial data of the power station equipment.
It should be noted that the real-time industrial data is obtained from other external systems. Depending on the protocol involved, the invention supports two modes, passive reception and active acquisition. The active acquisition mode needs to support schedule settings for adjusting the acquisition frequency, and, for different acquisition targets, general industrial protocols, proprietary protocols, databases, systems, files and the like need to be supported.
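The two collection modes can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class `Collector`, its methods, and the point identifiers are hypothetical names that do not appear in the patent, and the protocol-specific reading logic is replaced by a plain callable.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Reading = Tuple[float, str, float]  # (timestamp, point_id, value)


class Collector:
    """Buffers readings that arrive passively (push) or are polled actively (pull)."""

    def __init__(self) -> None:
        self.buffer: Deque[Reading] = deque()
        # point_id -> [read function, polling interval in seconds, time of next poll]
        self._sources: Dict[str, list] = {}

    def receive(self, point_id: str, value: float, ts: float) -> None:
        """Passive mode: an external system pushes a reading to the collector."""
        self.buffer.append((ts, point_id, value))

    def register(self, point_id: str, read: Callable[[], float], interval_s: float) -> None:
        """Active mode: register a source to be polled every interval_s seconds."""
        self._sources[point_id] = [read, interval_s, 0.0]

    def set_interval(self, point_id: str, interval_s: float) -> None:
        """Adjust the acquisition frequency of an already registered source."""
        self._sources[point_id][1] = interval_s

    def poll_due(self, now: float) -> int:
        """Poll every source whose next poll time has been reached; return how many were polled."""
        polled = 0
        for point_id, entry in self._sources.items():
            read, interval_s, next_due = entry
            if now >= next_due:
                self.buffer.append((now, point_id, read()))
                entry[2] = now + interval_s
                polled += 1
        return polled
```

In this sketch the caller supplies the clock value `now`, which keeps the scheduling logic deterministic; a real deployment would drive `poll_due` from a timer and plug protocol-specific readers into `register`.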
S102: checking and combing the real-time industrial data.
Specifically, checking and combing the real-time industrial data comprises:
(1) Checking and combing the power generation service data of the hydropower station control layer and the wind farm control system. All measuring-point data of the existing station control layer system within a given time range is checked and combed; data that is duplicated, missing, abnormal, erroneous or inconsistent is marked and the possible causes are analyzed; a data evaluation report is produced; and data quality judgment rules and models are established.
(2) For problematic data sources, going down to the hydroelectric generating units and wind turbine units to check and comb unit service data and communication protocols. For data with deeper problems, the design and communication protocols of the unit's internal communication system and the related internal measuring points are combed, and the problematic data sources are found and marked, thereby improving the completeness and quality of the power plant's service data and producing a data investigation report and a communication protocol standard component.
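The marking of duplicated, missing and abnormal data described in item (1) can be sketched as follows. This is a minimal illustration, not the patent's own rule set: the function `check_point`, the fixed sampling step and the valid range `[low, high]` are assumptions introduced here.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, Optional[float]]  # (timestamp in seconds, value or None)


def check_point(samples: List[Sample], step_s: int, low: float, high: float) -> Dict[str, List[int]]:
    """Return timestamps flagged as duplicate, missing, or abnormal for one measuring point."""
    issues: Dict[str, List[int]] = {"duplicate": [], "missing": [], "abnormal": []}
    seen = set()
    for ts, value in samples:
        if ts in seen:
            issues["duplicate"].append(ts)   # same timestamp reported more than once
            continue
        seen.add(ts)
        if value is None:
            issues["missing"].append(ts)     # timestamp present but no value
        elif not (low <= value <= high):
            issues["abnormal"].append(ts)    # value outside the assumed valid range
    # Timestamps expected on the sampling grid but absent from the data are also missing.
    if seen:
        for ts in range(min(seen), max(seen) + 1, step_s):
            if ts not in seen:
                issues["missing"].append(ts)
    issues["missing"].sort()
    return issues
```

The returned dictionary could feed a data evaluation report; detecting erroneous or inconsistent values would need additional, plant-specific rules that are not shown.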
S103: cleaning and managing the real-time industrial data with respect to its production stage.
Specifically, cleaning and managing the real-time industrial data with respect to its production stage comprises:
(1) Cleaning and managing the real-time industrial data by data attribute, wherein the data attributes include a time attribute, a model attribute and a source system attribute.
Data attributes are basic information about the data. They are generated together with the data and are an important part of data cleaning. They are cleaned and managed in three categories:
Time classification: classification by information such as the time stamp, the sampling frequency and duration, and the sampling stop time;
Model classification: classification by information such as the equipment model and the communication model;
Source system classification: classification for cleaning and management by the characteristics of the source system.
(2) Cleaning and managing the real-time industrial data by acquisition channel, wherein the acquisition channels include a data acquisition channel, a data extraction channel and a derivative calculation channel.
It should be noted that different ways of producing data lead to different acquisition channels, and data is cleaned according to the characteristics of each channel:
Data acquisition: acquisition via communication protocols, databases, API interfaces and the like;
Data extraction: table structure analysis, search and query;
Derivative calculation: data transformation, time-series features, composite calculation, correlation features, semantic recognition, image recognition, data conversion and the like.
Therefore, in the production stage, data is cleaned along two dimensions, data attributes and acquisition channels (i.e., production modes), which ensures data quality along the whole flow from the source to the consumer. Methods such as range limiting, linear interpolation and correlation judgment are applied in combination.
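Two of the methods named above, range limiting and linear interpolation, can be combined as in the following minimal sketch. It assumes evenly spaced samples and a known valid range, neither of which is specified in the patent; the function name `clean_series` is hypothetical, and correlation judgment is not shown.

```python
from typing import List, Optional


def clean_series(values: List[Optional[float]], low: float, high: float) -> List[Optional[float]]:
    """Drop out-of-range values, then fill interior gaps by linear interpolation."""
    # Range limiting: treat anything outside [low, high] as missing.
    kept = [v if v is not None and low <= v <= high else None for v in values]
    result = list(kept)
    known = [i for i, v in enumerate(kept) if v is not None]
    # Linear interpolation between each pair of neighbouring valid samples.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            result[i] = kept[left] + frac * (kept[right] - kept[left])
    # Leading and trailing gaps stay None because they lack a valid neighbour on one side.
    return result
```

Leaving edge gaps unfilled is a deliberate choice in this sketch: extrapolating beyond the last valid sample would invent data rather than clean it.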
S104: cleaning and managing the real-time industrial data with respect to its maintenance stage.
Specifically, cleaning and managing the real-time industrial data with respect to its maintenance stage comprises:
(1) Constructing a data asset inventory, which includes a retrieval mode, a full data table and primary and backup source data.
Retrieval mode: a data identification system is established and defined from three perspectives, the equipment domain, the production domain and the management domain, and the data retrieval mode is defined;
Full data table: a full standard data table is established, the state of the data stock is determined, and a data asset table is established;
Primary and backup source data: important data is kept in primary/backup mode to ensure data security.
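A data asset inventory of this kind might be modelled as in the sketch below. All names (`AssetEntry`, `AssetInventory`, the field names and the sample values) are hypothetical; the patent specifies only that identification covers the equipment, production and management domains and that important data has primary and backup sources.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AssetEntry:
    code: str                 # unique data identifier
    name: str                 # human-readable name
    equipment_domain: str     # e.g. which unit or device the data belongs to
    production_domain: str    # e.g. which production process it describes
    management_domain: str    # e.g. which department is responsible for it
    primary_source: str       # where the data is normally read from
    backup_source: str = ""   # empty when the data is not kept in primary/backup mode


class AssetInventory:
    """In-memory full data table keyed by data identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, AssetEntry] = {}

    def add(self, entry: AssetEntry) -> None:
        if entry.code in self._entries:
            raise ValueError(f"duplicate data identifier: {entry.code}")
        self._entries[entry.code] = entry

    def find(self, **criteria: str) -> List[AssetEntry]:
        """Return entries whose fields equal every given criterion, e.g. equipment_domain='unit1'."""
        return [e for e in self._entries.values()
                if all(getattr(e, field) == value for field, value in criteria.items())]

    def without_backup(self) -> List[str]:
        """List identifiers that have no backup source configured."""
        return [e.code for e in self._entries.values() if not e.backup_source]
```

Keeping the three domains as separate fields lets the same table be searched from any of the three perspectives, and `without_backup` gives a quick view of which data is not yet protected by a backup source.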
(2) Constructing a storage database, which includes a time-series database, a relational database, an unstructured database and a streaming media database.
Time-series database: used to store real-time data generated by equipment. A time-series database is suited to dynamic data that changes over time. It does not need the complex associations of a relational database, so retrieving data within a time window is efficient; its data structure is simple and occupies little storage space, so high-density data can be stored for a long time to support data-based analysis. Measurement data automatically acquired by instruments or systems generally has these characteristics, so a time-series database is recommended for it.
Relational database: used to store data such as ERP data and equipment ledgers. A relational database is suited to static information whose attributes fit a defined structure. It can establish relations among different pieces of information and is suitable for data that needs correlation analysis along different dimensions, such as equipment ledger information, equipment fault information and technical supervision data.
Unstructured database: used to store data such as test records and two-ticket information. Static data whose structure cannot be defined, such as structure diagrams, manually written analysis reports, images, audio and video, needs to be stored in an unstructured database; for example, a document database can be used for documents.
Streaming media database: used to store data such as security monitoring and inspection video. For example, a streaming media database can be used for video and audio.
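The storage choices described above amount to a routing rule from data characteristics to database type. The following minimal sketch encodes one possible rule; the function `choose_store`, its three boolean parameters and the order in which they are checked are assumptions made for illustration.

```python
from enum import Enum


class Store(Enum):
    TIME_SERIES = "time-series database"
    RELATIONAL = "relational database"
    UNSTRUCTURED = "unstructured database"
    STREAMING_MEDIA = "streaming media database"


def choose_store(changes_over_time: bool, has_fixed_schema: bool, is_audio_or_video: bool) -> Store:
    """Pick a storage type from coarse characteristics of a data set."""
    if is_audio_or_video:
        return Store.STREAMING_MEDIA   # e.g. security monitoring or inspection video
    if changes_over_time:
        return Store.TIME_SERIES       # e.g. automatically acquired measurement data
    if has_fixed_schema:
        return Store.RELATIONAL        # e.g. equipment ledgers, fault records
    return Store.UNSTRUCTURED          # e.g. test records, manually written reports
```

For example, `choose_store(changes_over_time=True, has_fixed_schema=True, is_audio_or_video=False)` selects the time-series store for a measuring point, while a scanned test record with none of the three characteristics falls through to the unstructured store.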
(3) Constructing security guarantees, which include link security, tenant security, content security and protection security.
Link security: encrypted data channels, permissions for shared access interfaces and private network transmission paths are established;
Tenant security: multi-tenant authentication and permission management are established to achieve secure data access;
Content security: the data storage strategy is improved to ensure secure data storage;
Protection security: the security of the deployment environment is strengthened, with zoned and tiered management.
(4) Constructing data services, which include scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
Scheduling management: scheduling of associated data across multiple databases;
Multi-tenant management: access permission management for multiple tenants;
Data synchronization: synchronization of data to user-defined targets;
Isolation synchronization: secure synchronization of data across isolation boundaries;
Data retrieval: a multi-mode retrieval service for data;
Data calling: a dual calling mechanism for data, local and remote.
Therefore, in the maintenance stage data is managed effectively by combing and establishing the data asset inventory, designing classified data storage, securing data use and providing data services, ensuring that the data carrier is manageable and the usage environment is reliable.
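The multi-mode data retrieval service can be pictured as three lookups over a catalog: by code, by name, and by fuzzy keyword. The sketch below is a minimal illustration; the catalog contents, the function names and the choice of case-insensitive substring matching as the fuzzy rule are all assumptions, and English names stand in for the Chinese names used in a real deployment.

```python
from typing import Dict, List

# Hypothetical catalog: data code -> display name.
CATALOG: Dict[str, str] = {
    "U1.GEN.P": "Unit 1 generator active power",
    "U1.GEN.Q": "Unit 1 generator reactive power",
    "U2.TURB.SPD": "Unit 2 turbine speed",
}


def by_code(catalog: Dict[str, str], code: str) -> List[str]:
    """Exact lookup by data code."""
    return [code] if code in catalog else []


def by_name(catalog: Dict[str, str], name: str) -> List[str]:
    """Exact lookup by display name; several codes may share a name."""
    return sorted(c for c, n in catalog.items() if n == name)


def fuzzy(catalog: Dict[str, str], keyword: str) -> List[str]:
    """Case-insensitive substring match against both code and name."""
    kw = keyword.lower()
    return sorted(c for c, n in catalog.items() if kw in c.lower() or kw in n.lower())
```

With this catalog, `fuzzy(CATALOG, "power")` returns the two generator power points, while `by_code(CATALOG, "U2.TURB.SPD")` returns only the turbine speed point.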
S105: cleaning and managing the real-time industrial data with respect to its use stage.
Specifically, cleaning and managing the real-time industrial data with respect to its use stage comprises:
(1) Cleaning and managing the real-time industrial data by application scenario, wherein the application scenarios include real-time monitoring, association display, history display and model display.
Real-time monitoring: real-time monitoring of raw data;
Association display: real-time display of the combined association of multiple raw data items;
History display: display of historical data along different dimensions;
Model display: display of data from specialized mechanism models.
(2) Cleaning and managing the real-time industrial data by algorithm model, wherein the algorithm models include a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
Single-quantity time model: a time model of a single monitored quantity;
Multi-quantity time model: a time model of multiple monitored quantities;
Multi-quantity association model: an association model of multiple monitored quantities;
Multi-quantity mechanism model: industry-specific mechanism models, such as spectrum models and rotation models;
Other multi-quantity models: mainstream big data algorithms such as neural networks and linear regression.
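To make two of these model categories concrete, the sketch below uses a least-squares linear trend as one possible single-quantity time model and a Pearson correlation coefficient as one possible multi-quantity association model. Both are common techniques chosen here for illustration; the patent does not state which specific models are used, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Fit value = slope * index + intercept by ordinary least squares over evenly spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two monitored quantities."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least two")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

A trend fitted by `linear_trend` can flag a slowly drifting measuring point, and a sudden drop in `pearson` between two quantities that normally move together can indicate that one of them has become unreliable.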
From the above, data cleaning is the combined application of cleaning and management methods. Across the data life cycle, cleaning methods are studied along different dimensions for each link, from data acquisition and data standardization to data storage, data access and use, providing a scientific method for guaranteeing data quality. Therefore, the data cleaning method is studied from three aspects of the data, namely production, maintenance and use; technical requirements and a design scheme for data asset management are proposed based on the results; the data of the power station equipment is collected, and the required data is cleaned and managed; and valid data that can be used for data analysis and by existing information systems is finally sorted out, providing reference and guidance for power plant data cleaning at other enterprises in the group.
Referring to FIG. 2, FIG. 2 shows the specific structure of the power industry basic data cleaning system 100 of the present invention, which includes:
The acquisition module 1, used for collecting real-time industrial data of the power station equipment.
The checking and combing module 2, used for checking and combing the real-time industrial data.
The cleaning and management module 3, used for cleaning and managing the real-time industrial data with respect to the production stage, the maintenance stage and the use stage of the data respectively.
As shown in FIG. 3, the cleaning and management module 3 includes:
The production stage cleaning unit 31, configured to clean and manage the real-time industrial data by data attribute and by acquisition channel, where the data attributes include a time attribute, a model attribute and a source system attribute, and the acquisition channels include a data acquisition channel, a data extraction channel and a derivative calculation channel.
The maintenance stage cleaning unit 32, configured to construct a data asset inventory, a storage database, security guarantees and data services, where the data asset inventory includes a retrieval mode, a full data table and primary and backup source data, the storage database includes a time-series database, a relational database, an unstructured database and a streaming media database, the security guarantees include link security, tenant security, content security and protection security, and the data services include scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
The use stage cleaning unit 33, configured to clean and manage the real-time industrial data by application scenario and by algorithm model, where the application scenarios include real-time monitoring, association display, history display and model display, and the algorithm models include a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
The system thus addresses data cleaning from the three aspects of data production, maintenance and use, sets out the technical requirements and design scheme for data asset management on that basis, collects the data of the power station equipment, cleans and manages the required data, and finally produces valid data that can be used for data analysis and by existing information systems, providing reference and guidance for cleaning the power plant data of other enterprises in the group.
Furthermore, a dedicated conversion bus is required for the data processing part: the data processing procedure is planned rationally, the flow is unified, reusable components are extracted, and the flow is designed to be configurable. Accordingly, the power industry basic data cleaning system is provided with an acquisition input module, an edge calculation module, a cache output module, a data monitoring module and a task scheduling module. Specifically:
The acquisition input module: input data on the data conversion bus is acquired from other, external systems. Depending on the protocol involved, both passive receiving and active acquisition can be supported; the active acquisition mode must support schedule settings for adjusting the acquisition frequency, and, depending on the acquisition target, general industrial protocols, private protocols, databases, systems, files and the like must be supported. The acquired data is stored in the cache of the conversion bus for processing by subsequent services.
The edge calculation module: the data conversion bus provides an edge calculation function, so that data in the cache can be calculated and converted by algorithms. Data conversion adopts a configurable design: calculation units are packaged as operators, such as filtering, decomposition, combination, statistics and conversion; several operators can be chained into different processing flows, and both the flows and the order of processing can be configured flexibly. Data that has undergone edge calculation is re-encoded and put back into the cache.
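The configurable operator chain described for the edge calculation module can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Point` record, the three operators, the `REGISTRY` table and the configuration format are all hypothetical names chosen for illustration. The sketch only shows how calculation units packaged as operators can be assembled, in a configurable order, into a processing flow over cached data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Point:
    """One cached measurement; the field names are illustrative assumptions."""
    tag: str
    ts: float
    value: float


# An operator takes the cached points and returns the transformed points.
Operator = Callable[[List[Point]], List[Point]]


def filter_range(lo: float, hi: float) -> Operator:
    """Filtering operator: keep only values inside [lo, hi]."""
    return lambda pts: [p for p in pts if lo <= p.value <= hi]


def convert(scale: float, offset: float = 0.0) -> Operator:
    """Conversion operator: apply a linear unit conversion."""
    return lambda pts: [Point(p.tag, p.ts, p.value * scale + offset) for p in pts]


def mean_by_tag() -> Operator:
    """Statistics operator: one mean value per tag, stamped with the latest time."""
    def op(pts: List[Point]) -> List[Point]:
        groups: Dict[str, List[Point]] = {}
        for p in pts:
            groups.setdefault(p.tag, []).append(p)
        return [
            Point(tag, max(p.ts for p in g), sum(p.value for p in g) / len(g))
            for tag, g in groups.items()
        ]
    return op


# Hypothetical registry mapping configured operator names to factories.
REGISTRY: Dict[str, Callable[..., Operator]] = {
    "filter_range": filter_range,
    "convert": convert,
    "mean_by_tag": mean_by_tag,
}


def build_flow(config: List[dict]) -> Operator:
    """Assemble a processing flow from an ordered list of operator settings."""
    steps = [REGISTRY[s["op"]](**s.get("args", {})) for s in config]

    def run(pts: List[Point]) -> List[Point]:
        for step in steps:
            pts = step(pts)
        return pts

    return run


# Example flow: drop out-of-range values, convert units, then average per tag.
flow = build_flow([
    {"op": "filter_range", "args": {"lo": 0.0, "hi": 100.0}},
    {"op": "convert", "args": {"scale": 0.001}},
    {"op": "mean_by_tag"},
])
```

Because the flow is built from data rather than code, changing the order of the list or its arguments changes the processing without touching the operators themselves, which is the kind of flexibility the configurable design aims at.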
The cache output module: all data collected and calculated on the data conversion bus is placed in the cache and must be published to other targets such as systems, services and files. For each kind of target, the data interaction protocol is packaged as a plug-in, while everything else uses a uniform mode of operation in order to reduce the cost of use. Depending on the publication target, output to general industrial protocols, private protocols, databases, systems, files and the like can be supported.
The data monitoring module: the entire process is designed, configured, managed and monitored in a configuration-based manner. By assembling several processing flows between the input endpoint and the output endpoint, the processing can be controlled intuitively, the configured flow can be understood at a glance, and maintenance and adjustment are convenient. During configuration, processing results can be inspected at individual links to support the design and debugging of the conversion flow. For a flow in normal operation, the processing status of each node, such as the total number of records processed, can also be counted.
The task scheduling module: the processing flow must control the processing rate. At the input end, both active acquisition and passive receiving are supported; at the output end, both passive calling and active sending are supported. In the active acquisition and active sending modes, different frequencies can be set, so that meaningless repeated calls are avoided and resources are allocated and used reasonably while service requirements are still met.
In order to realize the sharing of and access to data, the power industry basic data cleaning system further comprises a data retrieval module, a data calling module and a data synchronization module. Specifically:
The data retrieval module: for identifiers, tree retrieval, conditional retrieval, exact retrieval, fuzzy retrieval and the like must be supported; for time series data, the latest data must be obtainable by identifier and historical data by time range, and the historical data must also be able to be thinned or completed according to a given method; for relational data, functions such as retrieval by identifier, conditional retrieval, associated retrieval and grouped statistics must be supported.
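The time series retrieval functions named above (latest value, history by time range, thinning and completion) can be sketched as follows. The text does not specify which thinning or completion method is used, so this sketch makes two assumptions: thinning keeps the first sample in each fixed-width time bucket, and completion forward-fills the last known value onto a regular grid. The `Series` class and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[float, float]            # (timestamp in seconds, value)
Filled = Tuple[float, Optional[float]]  # value is None before the first sample


class Series:
    """Samples of one identifier, kept sorted by timestamp."""

    def __init__(self, samples: List[Sample]):
        self.samples = sorted(samples)
        self._ts = [t for t, _ in self.samples]

    def latest(self) -> Optional[Sample]:
        """Most recent sample, or None if the series is empty."""
        return self.samples[-1] if self.samples else None

    def history(self, start: float, end: float) -> List[Sample]:
        """All samples with start <= timestamp <= end."""
        i = bisect_left(self._ts, start)
        j = bisect_right(self._ts, end)
        return self.samples[i:j]


def thin(samples: List[Sample], step: float) -> List[Sample]:
    """Thinning (assumed method): keep the first sample of each step-wide bucket."""
    out: List[Sample] = []
    last_bucket = None
    for t, v in samples:
        bucket = int(t // step)
        if bucket != last_bucket:
            out.append((t, v))
            last_bucket = bucket
    return out


def complete(samples: List[Sample], start: float, end: float, step: float) -> List[Filled]:
    """Completion (assumed method): forward-fill onto the grid start, start+step, ..."""
    out: List[Filled] = []
    idx, last = 0, None
    for k in range(int((end - start) // step) + 1):
        t = start + k * step
        # Advance to the newest sample that is not later than this grid point.
        while idx < len(samples) and samples[idx][0] <= t:
            last = samples[idx][1]
            idx += 1
        out.append((t, last))
    return out
```

Other choices, such as averaging within each bucket or linear interpolation between neighbours, would fit the same interface; which one is appropriate depends on the measured quantity.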
The data calling module: a universal, standard RESTful interface is provided for external systems to call data. Besides passive calling, an active sending mode can be used for external systems to obtain data, in which a data sending program sends the data to a specified target.
The data synchronization module: for data distributed across different data pools, a synchronization mechanism is provided to meet the sharing requirements between the pools. Data synchronization must support both history synchronization and real-time synchronization, in both incremental and full modes, with different synchronization frequencies and modes set for different data types. For security reasons, in some cases only one-way transmission is possible between the two systems being synchronized (for example, where an isolation device is installed). A synchronization service is required to support such cases; because transmission restrictions differ between isolation systems, component-based development must be supported, so that a new isolation restriction rule requires only the development of a corresponding adaptation protocol.
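The difference between full and incremental synchronization can be shown with a small sketch. It assumes, purely for illustration, that every record in a data pool carries a monotonically increasing version number and that the synchronization service remembers a watermark (the highest version already copied); neither detail comes from the source. Both functions only write to the target pool, which matches the one-way transmission case.

```python
from typing import Dict, Tuple

# Hypothetical pool layout: identifier -> (version, value).
Pool = Dict[str, Tuple[int, float]]


def full_sync(source: Pool, target: Pool) -> int:
    """Full mode: replace the target with a complete copy of the source."""
    target.clear()
    target.update(source)
    return len(source)


def incremental_sync(source: Pool, target: Pool, watermark: int) -> Tuple[int, int]:
    """Incremental mode: copy only records newer than the watermark.

    Returns (number of records copied, new watermark).
    Deletions in the source are not propagated by this sketch.
    """
    changed = {k: rec for k, rec in source.items() if rec[0] > watermark}
    target.update(changed)
    new_watermark = max((rec[0] for rec in changed.values()), default=watermark)
    return len(changed), new_watermark
```

A typical arrangement under these assumptions would run `full_sync` once for history synchronization and then call `incremental_sync` at the frequency configured for each data type, passing the watermark returned by the previous call.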
In order to meet the requirements of different user groups, the power industry basic data cleaning system further comprises a system service module, a data service module and an application display module. Specifically:
The system service module: from the perspective of system service, different units and departments can share the same software and hardware system, but independent data storage areas and micro-service instances must be created for different tenants to ensure data security and service capability between them; different services must be supported, and different resources set, for different tenants. Each tenant must also be given the ability to manage its own data and services. For this purpose, Docker containerization is recommended for management so that resources are allocated reasonably; virtual machines consume a large amount of resources and are difficult to manage, operate and maintain, so their use should be avoided.
The data service module: from the perspective of data service, the system sets permissions for different data. Based on the data identifier, each user group can access only the data covered by its own permissions. The data that users at different levels can access likewise depends on their permissions, which can be granted according to user level (such as group level, branch level and station level) and department.
The application display module: from the perspective of application display, different users see different applications. For example, a group user can see statistical information for all branch companies of the group, whereas a plant station user can see only the data of its own station.
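The level-based access rule described for the data service and application display modules can be sketched as a prefix match on an organization path. The path format `group/branch/station`, the class names and the `can_read` function are hypothetical; department-level authorization, which the text also mentions, is omitted to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List

# How many leading path segments must match at each user level.
LEVEL_DEPTH = {"group": 1, "branch": 2, "station": 3}


@dataclass(frozen=True)
class User:
    name: str
    level: str  # "group", "branch" or "station"
    scope: str  # organization path of the user, e.g. "G/B1/S1"


@dataclass(frozen=True)
class DataItem:
    code: str
    owner: str  # organization path of the producing station, e.g. "G/B1/S1"


def can_read(user: User, item: DataItem) -> bool:
    """A user may read an item whose owner path lies inside the user's scope."""
    depth = LEVEL_DEPTH[user.level]
    return item.owner.split("/")[:depth] == user.scope.split("/")[:depth]


def visible(user: User, items: List[DataItem]) -> List[DataItem]:
    """Filter a result set down to what this user is allowed to see."""
    return [i for i in items if can_read(user, i)]
```

With this rule a group-level user matches every station under the group, a branch-level user matches the stations of that branch, and a station-level user matches only its own station.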
In order to manage and monitor the applications, the power industry basic data cleaning system further comprises a configuration display module, a data analysis module and a coding management module. Specifically:
The configuration display module: provides a visual configuration display tool with which users can flexibly configure the monitoring software they need in order to display data mining results. A configuration editing tool can also be used to develop state monitoring and alarm software for various quantities.
The data analysis module: used to retrieve the real-time industrial data by code, by Chinese name or by fuzzy query. The module can search all data in the system in these ways and display content such as real-time values, historical trends and coding information; the data of interest can be downloaded and exported, plotted as trend curves, and subjected to correlation analysis and other operations.
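The three retrieval modes of the data analysis module (by code, by Chinese name and by fuzzy query) can be sketched with an in-memory index. The source does not define what "fuzzy" means, so this sketch assumes a case-insensitive substring match over both code and name. The `Tag` and `TagIndex` names are hypothetical, and the codes in the usage lines are made-up strings in a KKS-like style rather than real assignments.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Tag(NamedTuple):
    code: str     # coded identifier of the measuring point
    name: str     # Chinese display name
    value: float  # current real-time value


class TagIndex:
    """In-memory lookup structure over the measuring points of a system."""

    def __init__(self, tags: Iterable[Tag]):
        tags = list(tags)
        self._by_code: Dict[str, Tag] = {t.code: t for t in tags}
        self._by_name: Dict[str, Tag] = {t.name: t for t in tags}

    def by_code(self, code: str) -> Optional[Tag]:
        """Exact retrieval by code."""
        return self._by_code.get(code)

    def by_name(self, name: str) -> Optional[Tag]:
        """Exact retrieval by Chinese name."""
        return self._by_name.get(name)

    def fuzzy(self, text: str) -> List[Tag]:
        """Fuzzy query (assumed): case-insensitive substring of code or name."""
        needle = text.casefold()
        return [
            t for t in self._by_code.values()
            if needle in t.code.casefold() or needle in t.name.casefold()
        ]


# Illustrative data only.
index = TagIndex([
    Tag("10MKA10CE001", "1号机组有功功率", 120.5),
    Tag("20MKA10CE001", "2号机组有功功率", 118.0),
    Tag("00LPB10CL001", "上游水位", 456.2),
])
```

A production system would more likely delegate these queries to the relational or time series database; the sketch only makes the three access paths concrete.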
The coding management module: used to configure and maintain codes and to achieve KKS-standardized management of data. Through coding management, KKS codes can be entered, modified, deleted, added in batches, modified in batches, imported and exported, and their historical and real-time trends can be viewed; KKS data standardization across multiple river basins and multiple stations, together with unified retrieval and management, can thereby be achieved.
In addition, the power industry basic data cleaning system adopts a micro-service architecture and is managed using Docker containerization technology.
Compared with a traditional single-point system, a micro-service system encapsulates functions with different service requirements independently. This avoids competition for resources between different service functions and interference between service processes, and allows additional running instances to be added for frequently used functions in order to scale performance. Micro-services also divide a complex large system into several simple small services, which reduces the technical difficulty of the business logic and therefore the development cost. Because micro-services are more independent and more standardized in definition, it is easier for different suppliers to take part in collaborative development and subsequent function upgrades. Accordingly, to meet management and business requirements, the system is divided into several micro-services by functional scope: from the management perspective, it includes a scheduling management micro-service and a multi-tenant management micro-service; from the data perspective, it includes services such as data synchronization, retrieval, calling and conversion.
Meanwhile, the service components included in the system must be scheduled through a unified management interface that provides functions such as starting, stopping, deleting, configuring and backing up. To meet these management requirements, the components are packaged as containers using Docker containerization technology, and the scheduling functions are implemented through a Docker-based management interface.
The system therefore adopts a micro-service architecture and uses Docker containerization technology to manage the services, which reduces the workload and difficulty of operation and maintenance, improves efficiency and lowers cost.
In conclusion, the design of the invention adopts a mainstream big data platform framework combined with technical means such as artificial intelligence, supports the deepening and expansion of subsequent requirements from the perspective of different users, and is stable, efficient, convenient to use, easy to manage, scalable in performance and easy to extend with new functions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for cleaning basic data of the power industry, characterized by comprising the following steps:
collecting real-time industrial data of power station equipment;
checking and combing the real-time industrial data;
cleaning and managing the real-time industrial data from the production stage of the real-time industrial data;
cleaning and managing the real-time industrial data from the maintenance stage of the real-time industrial data;
and cleaning and managing the real-time industrial data from the using stage of the real-time industrial data.
2. The method for cleaning basic data of the power industry according to claim 1, wherein checking and combing the real-time industrial data comprises:
checking and combing the power generation service data of the hydropower station control layer and the wind farm control system;
for a problematic data source, going down to the level of the hydroelectric generating units and wind turbine units, and checking and combing the unit service data and communication protocols.
3. The method for cleaning basic data of the power industry according to claim 1, wherein cleaning and managing the real-time industrial data from the production stage of the real-time industrial data comprises:
cleaning and managing the real-time industrial data according to data attributes, wherein the data attributes comprise a time attribute, a model attribute and a source system attribute;
and cleaning and managing the real-time industrial data according to the acquisition channel of the data, wherein the acquisition channel comprises a data acquisition channel, a data extraction channel and a derivative calculation channel.
4. The method for cleaning basic data of the power industry according to claim 1, wherein cleaning and managing the real-time industrial data from the maintenance stage of the real-time industrial data comprises:
constructing a data asset inventory, wherein the data asset inventory comprises a retrieval mode, a full data table and primary and backup source data;
constructing a storage database, wherein the storage database comprises a time series database, a relational database, an unstructured database and a streaming media database;
constructing security guarantees, wherein the security guarantees comprise link security, tenant security, content security and protection security;
and constructing data services, wherein the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
5. The method for cleaning basic data of the power industry according to claim 1, wherein cleaning and managing the real-time industrial data from the use stage of the real-time industrial data comprises:
cleaning and managing the real-time industrial data according to application scenarios, wherein the application scenarios comprise real-time monitoring, association display, history display and model display;
and cleaning and managing the real-time industrial data according to algorithm models, wherein the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
6. The method for cleaning basic data of the power industry according to claim 1, wherein collecting real-time industrial data of power station equipment comprises:
passively receiving real-time industrial data of power station equipment;
and actively acquiring real-time industrial data of the power station equipment.
7. An electric power industry basic data cleaning system, comprising:
an acquisition module, used for acquiring real-time industrial data of the power station equipment;
a checking and combing module, used for checking and combing the real-time industrial data;
and a cleaning and management module, used for cleaning and managing the real-time industrial data from the production stage, the maintenance stage and the use stage of the data respectively.
8. The electric power industry basic data cleaning system according to claim 7, wherein the cleaning and management module comprises:
a production stage cleaning unit, used for cleaning and managing the real-time industrial data according to data attributes and data acquisition channels respectively, wherein the data attributes comprise a time attribute, a model attribute and a source system attribute, and the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel;
a maintenance stage cleaning unit, used for constructing a data asset inventory, a storage database, security guarantees and data services, wherein the data asset inventory comprises a retrieval mode, a full data table and primary and backup source data, the storage database comprises a time series database, a relational database, an unstructured database and a streaming media database, the security guarantees comprise link security, tenant security, content security and protection security, and the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling;
and a use stage cleaning unit, used for cleaning and managing the real-time industrial data according to application scenarios and algorithm models respectively, wherein the application scenarios comprise real-time monitoring, association display, history display and model display, and the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
9. The electric power industry basic data cleaning system according to claim 7, further comprising a data analysis module for retrieving the real-time industrial data by code, Chinese name or fuzzy query.
10. The electric power industry basic data cleaning system according to any one of claims 7 to 9, wherein the electric power industry basic data cleaning system adopts a micro-service architecture and adopts a Docker containerization technology for management.
CN202010171013.0A 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry Active CN111538720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010171013.0A CN111538720B (en) 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010171013.0A CN111538720B (en) 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry

Publications (2)

Publication Number Publication Date
CN111538720A true CN111538720A (en) 2020-08-14
CN111538720B CN111538720B (en) 2023-07-21

Family

ID=71976753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010171013.0A Active CN111538720B (en) 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry

Country Status (1)

Country Link
CN (1) CN111538720B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090030614A1 (en) * 2007-07-25 2009-01-29 Andrew John Carnegie Method, system and apparatus for formation tester data processing
CN102609501A (en) * 2012-02-02 2012-07-25 北京华电天仁电力控制技术有限公司 Data cleaning method based on real-time historical database
CN104036001A (en) * 2014-06-13 2014-09-10 上海新炬网络技术有限公司 Dynamic hotlist priority scheduling based quick data cleaning method
CN105278373A (en) * 2015-10-16 2016-01-27 中国南方电网有限责任公司电网技术研究中心 Substation integrated information processing system realizing method
US20160179852A1 (en) * 2014-12-18 2016-06-23 Alexis Naibo Visualizing Large Data Volumes Utilizing Initial Sampling and Multi-Stage Calculations
CN106777227A (en) * 2016-12-26 2017-05-31 河南信安通信技术股份有限公司 Multidimensional data convergence analysis system and method based on cloud platform
CN107153664A (en) * 2016-03-04 2017-09-12 同方知网(北京)技术有限公司 A kind of method flow that research conclusion is simplified based on the scientific and technical literature mark that assemblage characteristic is weighted
CN107908690A (en) * 2017-11-01 2018-04-13 南京欣网互联网络科技有限公司 A kind of data processing method based on big data OA operation analysis
CN109947754A (en) * 2019-01-28 2019-06-28 中科恒运股份有限公司 Data cleaning method and device
CN110489459A (en) * 2019-08-07 2019-11-22 国网安徽省电力有限公司 A kind of enterprise-level industry number fused data analysis system based on big data platform
CN110618983A (en) * 2019-08-15 2019-12-27 复旦大学 JSON document structure-based industrial big data multidimensional analysis and visualization method
CN110727666A (en) * 2019-09-25 2020-01-24 中冶赛迪重庆信息技术有限公司 Cache assembly, method, equipment and storage medium for industrial internet platform

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348698A (en) * 2020-10-30 2021-02-09 中核核电运行管理有限公司 Nuclear power plant group pile management method, device and system
CN114722037A (en) * 2022-05-16 2022-07-08 中国信息通信研究院 Industrial internet middleware data processing method, middleware and readable storage medium
CN114722037B (en) * 2022-05-16 2022-08-26 中国信息通信研究院 Industrial Internet middleware data processing method, middleware and readable storage medium

Also Published As

Publication number Publication date
CN111538720B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN112685385B (en) Big data platform for smart city construction
CN112379653B (en) Intelligent power plant management and control system based on micro-service architecture
CN112396404A (en) Data center system
CN111917887A (en) System for realizing data governance under big data environment
CN114925045B (en) PaaS platform for big data integration and management
CN104036365A (en) Method for constructing enterprise-level data service platform
CN113094385B (en) Data sharing fusion platform and method based on software defined open tool set
CN114153920A (en) Big data edge platform and method
CN111047143A (en) Power grid OMS-based regional and county team index management system
CN111538720B (en) Method and system for cleaning basic data of power industry
CN112988919A (en) Power grid data market construction method and system, terminal device and storage medium
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN112651872A (en) Community comprehensive treatment system and method based on data middlebox
CN115617776A (en) Data management system and method
CN114756563A (en) Data management system with multiple coexisting complex service lines of internet
CN115934856A (en) Method and system for constructing comprehensive energy data assets
Wu et al. An auxiliary decision-making system for electric power intelligent customer service based on Hadoop
CN115016902B (en) Industrial flow digital management system and method
Wadhera et al. A systematic Review of Big data tools and application for developments
CN113706101B (en) Intelligent system architecture and method for power grid project management
CN115496337A (en) Data system for supporting brain of enterprise
CN111797084B (en) Data coding through mark inspection method and system based on weapon equipment test flow
CN113342874A (en) Wind power big data analysis system and process based on cloud computing
CN112784129A (en) Pump station equipment operation and maintenance data supervision platform
CN112085341A (en) Master station system suitable for risk management and control of overall process of power production operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant