CN111538720B - Method and system for cleaning basic data of power industry - Google Patents

Info

Publication number
CN111538720B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010171013.0A
Other languages
Chinese (zh)
Other versions
CN111538720A (en
Inventor
曹海涛
刘林元
冯磊
陈武军
李宏博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jialing River Tingzikou Water Resources And Hydropower Development Co ltd
Original Assignee
Jialing River Tingzikou Water Resources And Hydropower Development Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/26 Visual data mining; Browsing structured data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a method for cleaning basic data in the power industry, comprising the following steps: collecting real-time industrial data from power station equipment; checking and sorting the real-time industrial data; cleaning and managing the real-time industrial data at its production stage; cleaning and managing the real-time industrial data at its maintenance stage; and cleaning and managing the real-time industrial data at its use stage. The invention also discloses a system for cleaning basic data in the power industry. The invention studies data cleaning from the three aspects of data production, maintenance and use, and ultimately yields valid data that can be used for data analysis and by existing information systems, thereby providing reference and guidance for data cleaning at the group's other power plants.

Description

Method and system for cleaning basic data of power industry
Technical Field
The invention relates to the technical field of big data, in particular to a method for cleaning basic data in the power industry and a system for cleaning the basic data in the power industry.
Background
With the advent of the big data era, data analysis methods and tools have advanced and a variety of intelligent analysis systems have emerged. Power generation enterprises can derive new insights from large volumes of data and combine them with detailed knowledge of their existing business, creating new productivity and driving the transformation of traditional power stations into intelligent power stations. For this reason, data should be regarded as a core asset of the power generation enterprise, that is, a "data asset".
That data is an asset is already an industry consensus, and some have even proposed recording it on the balance sheet. Compared with physical assets, however, the management of data assets is still at a very primitive stage. An organization often lacks a comprehensive understanding of the types and quantity of its data assets, and is weak in the fine-grained management, value mining and continuous operation of data quality, data security, asset assessment, asset exchange and trading, and the like.
Data asset management is currently an important part of promoting the deep integration of big data with the real economy, the shift from old growth drivers to new ones, and the transition of the economy to a stage of high-quality development. It requires the full integration of business, technology and management to ensure that data assets preserve and increase their value.
Data asset management occupies an important position in a big data application system. Upward, it supports data application development oriented toward value creation; downward, it relies on the big data platform to manage the full life cycle of data.
Data asset management runs through the whole life cycle of data, from acquisition and application to value realization. An enterprise manages its data assets in order to improve their quality through life cycle management and to increase the value of data both internally and externally. Data is first specified and defined, then created or obtained, then stored, maintained and used, and finally destroyed. The life cycle of data begins before acquisition: the enterprise first formulates a data plan and defines data specifications so as to obtain the technical capabilities required for data acquisition, delivery, storage and control.
At present, improving data quality and reducing cost have become a focus of attention for enterprises across industries. If data cannot be effectively sorted out and finely managed, its value cannot be properly realized, and operation and management may even be negatively affected. The importance of data asset management is mainly reflected in the following problems:
(1) Lack of unified data standards: the data registration and checking process lacks a unified data standard, so problems such as confused and conflicting data, and multiple sources and multiple forms for the same data item, cannot be effectively avoided. A unified standard is a prerequisite for associating data and for ensuring smooth information exchange, data circulation and system access.
(2) Confused data life cycle planning: in some enterprises, the planning of the links in the full life cycle of internal data, such as acquisition, transmission, storage, application, and open sharing, is unreasonable. For example, data may be collected without the knowledge or consent of the data source user, processed beyond the permitted scope, or left without information isolation.
(3) Difficulty in coordinating business management: permissions for adding, deleting, modifying and using data are managed chaotically, making it difficult to establish a single data view that comprehensively, accurately and completely reflects the operation of the enterprise. The management and resolution of issues such as data demand, data quality and data application are scattered across different business and technical departments, with no clear coordination mechanism or unified data management channel, so the business cannot obtain data support in time and on demand.
(4) Low data processing efficiency: data acquisition, preprocessing and similar work take a long time and are not convenient enough; complete, high-quality data attributes cannot be quickly mined and organized for analysis and application, and the efficiency of development and governance needs to be improved.
(5) Uneven data quality: data quality problems such as data redundancy, missing data and data conflicts cannot be found in time and solved effectively. Standard data governance processes, assessment mechanisms and similar measures still need to be established and improved.
(6) Security supervision is urgently needed: an effective data security management mechanism is lacking, and access to sensitive, private and confidential information is not effectively controlled, so that the desensitization and declassification of such information has no rules to follow, which can even create reputational, legal and other risks for the enterprise.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a system for cleaning basic data in the power industry that can clean and manage data so as to sort out valid data.
To solve the above technical problem, the invention provides a method for cleaning basic data in the power industry, comprising: collecting real-time industrial data from power station equipment; checking and sorting the real-time industrial data; cleaning and managing the real-time industrial data at its production stage; cleaning and managing the real-time industrial data at its maintenance stage; and cleaning and managing the real-time industrial data at its use stage.
As an improvement of the above solution, checking and sorting the real-time industrial data includes: checking and sorting the power generation business data of the hydropower station control layer and the wind farm control system; and, for problematic data sources, drilling down to the hydropower and wind turbine units to check and sort the unit business data and communication protocols.
As an improvement of the above solution, cleaning and managing the real-time industrial data at its production stage includes: cleaning and managing the real-time industrial data according to data attributes, where the data attributes comprise a time attribute, a model attribute and a source system attribute; and cleaning and managing the real-time industrial data according to acquisition channels, where the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel.
As an improvement of the above solution, cleaning and managing the real-time industrial data at its maintenance stage includes: constructing a data asset album, where the data asset album comprises a retrieval mode, a full data table and primary and backup source data; constructing a storage database, where the storage database comprises a time series database, a relational database, an unstructured database and a streaming media database; constructing security guarantees, where the security guarantees comprise link security, tenant security, content security and protection security; and constructing data services, where the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
As an improvement of the above solution, cleaning and managing the real-time industrial data at its use stage includes: cleaning and managing the real-time industrial data according to application scenarios, where the application scenarios comprise real-time monitoring, association display, history display and model display; and cleaning and managing the real-time industrial data according to algorithm models, where the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
As an improvement of the above solution, collecting real-time industrial data from the power station equipment includes: passively receiving the real-time industrial data of the power station equipment; and actively collecting the real-time industrial data of the power station equipment.
Correspondingly, the invention also provides a system for cleaning basic data in the power industry, comprising: an acquisition module for collecting real-time industrial data from power station equipment; a checking and sorting module for checking and sorting the real-time industrial data; and a cleaning and management module for cleaning and managing the real-time industrial data at the production stage, the maintenance stage and the use stage of the data respectively.
As an improvement of the above solution, the cleaning and management module includes: a production stage cleaning unit for cleaning and managing the real-time industrial data according to data attributes and acquisition channels, where the data attributes comprise a time attribute, a model attribute and a source system attribute, and the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel; a maintenance stage cleaning unit for constructing a data asset album, a storage database, security guarantees and data services, where the data asset album comprises a retrieval mode, a full data table and primary and backup source data, the storage database comprises a time series database, a relational database, an unstructured database and a streaming media database, the security guarantees comprise link security, tenant security, content security and protection security, and the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling; and a use stage cleaning unit for cleaning and managing the real-time industrial data according to application scenarios and algorithm models, where the application scenarios comprise real-time monitoring, association display, history display and model display, and the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
As an improvement of the above solution, the power industry basic data cleaning system further comprises a data analysis module for retrieving real-time industrial data by code, by Chinese name or by fuzzy query.
As an improvement of the above solution, the power industry basic data cleaning system adopts a micro-service architecture and is managed using Docker containerization.
Implementing the invention has the following beneficial effects:
The invention studies data cleaning from the three aspects of data production, maintenance and use, and proposes technical requirements and a design scheme for data asset management based on the results. Power station equipment data is collected, the required data is cleaned and managed, and valid data that can be used for data analysis and by existing information systems is ultimately produced, providing reference and guidance for data cleaning at the group's other power plants.
Drawings
FIG. 1 is a flow chart of a method of power industry base data cleaning of the present invention;
FIG. 2 is a schematic diagram of a basic data cleaning system in the power industry according to the present invention;
FIG. 3 is a schematic structural diagram of a cleaning and management module in the power industry basic data cleaning system according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent.
Referring to FIG. 1, FIG. 1 shows a flowchart of the method for cleaning basic data in the power industry according to the present invention, which includes:
S101, collecting real-time industrial data from power station equipment.
Specifically, collecting the real-time industrial data from the power station equipment includes passively receiving the real-time industrial data and actively collecting the real-time industrial data.
It should be noted that the real-time industrial data is collected from other external systems. Depending on the protocol involved, the invention supports two modes, passive receiving and active collecting. The active collecting mode needs to support schedule settings so that the acquisition frequency can be adjusted, and, to cover different collection targets, the invention needs to support general industrial protocols, private protocols, databases, systems, files and the like.
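The two acquisition modes can be illustrated with a minimal Python sketch. The `Collector` class, its method names and the `(point id, timestamp, value)` sample layout are assumptions made for illustration and do not come from the patent; a real deployment would put protocol drivers behind the `read` callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, float, float]  # (point id, timestamp in seconds, value)


@dataclass
class Collector:
    """Gathers samples either passively (pushed) or actively (polled)."""
    buffer: List[Sample] = field(default_factory=list)
    # Hypothetical registry: source name -> (poll period in seconds, read function)
    sources: Dict[str, Tuple[float, Callable[[], List[Sample]]]] = field(default_factory=dict)
    last_poll: Dict[str, float] = field(default_factory=dict)

    def receive(self, sample: Sample) -> None:
        """Passive mode: an external system pushes a sample to the collector."""
        self.buffer.append(sample)

    def register(self, name: str, period_s: float,
                 read: Callable[[], List[Sample]]) -> None:
        """Active mode: register a source; re-registering changes its period."""
        self.sources[name] = (period_s, read)

    def poll_due(self, now: float) -> int:
        """Poll every source whose period has elapsed; return how many were polled."""
        polled = 0
        for name, (period_s, read) in self.sources.items():
            last = self.last_poll.get(name)
            if last is None or now - last >= period_s:
                self.buffer.extend(read())
                self.last_poll[name] = now
                polled += 1
        return polled
```

Calling `register` again with a different `period_s` is how this sketch models the adjustable acquisition frequency described above.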
S102, checking and sorting the real-time industrial data.
Specifically, checking and sorting the real-time industrial data includes:
(1) Checking and sorting the power generation business data of the hydropower station control layer and the wind farm control system. All measuring point data of the existing station control layer system within a given time range is checked and sorted; data with problems such as repetition, omission, abnormality, error and inconsistency is marked; the possible causes are analyzed to form a data evaluation report; and data quality judgment rules and models are established.
(2) For problematic data sources, drilling down to the hydropower and wind turbine units to check and sort the unit business data and communication protocols. Data with deep-seated problems is traced back to its source: the design of the communication system inside the unit and its communication protocols are sorted out, the related measuring points inside the unit are sorted out, and the problematic data and data sources are identified and marked. This improves the integrity and quality of the power plant's business data and yields a data check report and a communication protocol standard.
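As a rough illustration of the marking step in item (1), the sketch below flags repeated, missing and abnormal samples for one measuring point. The function name, the fixed sampling period and the valid range `[low, high]` are assumptions for the example; the patent does not specify concrete judgment rules. Timestamps are assumed to lie exactly on the sampling grid.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, Optional[float]]  # (timestamp in seconds, value or None)


def mark_problems(samples: List[Sample], period_s: float,
                  low: float, high: float) -> Dict[str, List[float]]:
    """Return the timestamps flagged as repeated, missing or abnormal.

    repeated: a timestamp that occurs more than once
    missing:  a None value, or a grid timestamp that never arrived
    abnormal: a value outside the assumed valid range [low, high]
    """
    report: Dict[str, List[float]] = {"repeated": [], "missing": [], "abnormal": []}
    seen = set()
    for ts, value in sorted(samples, key=lambda s: s[0]):
        if ts in seen:
            report["repeated"].append(ts)
            continue
        seen.add(ts)
        if value is None:
            report["missing"].append(ts)
        elif not (low <= value <= high):
            report["abnormal"].append(ts)
    # Gap detection: walk the expected sampling grid between first and last sample.
    if seen:
        start, end = min(seen), max(seen)
        steps = int(round((end - start) / period_s))
        for i in range(steps + 1):
            expected = start + i * period_s
            if expected not in seen:
                report["missing"].append(expected)
    report["missing"].sort()
    return report
```

The returned dictionary could serve as raw input to the data evaluation report mentioned above.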
S103, cleaning and managing the real-time industrial data at its production stage.
Specifically, cleaning and managing the real-time industrial data at its production stage includes:
(1) Cleaning and managing the real-time industrial data according to data attributes, where the data attributes comprise a time attribute, a model attribute and a source system attribute.
It should be noted that data attributes are essential information about the data. They arise together with the data and are an important part of data cleaning. Cleaning and management are carried out under three classifications:
Time classification: classifying according to information such as time stamp, sampling frequency and duration, and re-collection time;
Model classification: classifying according to the equipment model, the communication model and related information;
Source system classification: classifying for cleaning and management according to the characteristics of the source system.
(2) Cleaning and managing the real-time industrial data according to acquisition channels, where the acquisition channels comprise a data acquisition channel, a data extraction channel and a derivative calculation channel.
It should be noted that different ways of producing data lead to different acquisition channels, and data is cleaned according to the characteristics of each channel:
Data acquisition: collection via communication protocols, from databases, through API interfaces and the like;
Data extraction: table structure analysis, search and query;
Derivative calculation: data transformation, time series features, compound calculation, association features, semantic recognition, image recognition, data conversion and the like.
Therefore, at the production stage, data is cleaned along two dimensions, the data attributes and the acquisition channel (that is, the production mode), which safeguards data quality along the whole flow from the source end to the using end. Methods such as range limiting, linear interpolation and association judgment are applied in combination.
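Two of the methods named above, range limiting and linear interpolation, can be combined as in the following sketch. It assumes a time-ordered series for a single measuring point and a known valid range; the function name and the choice to drop (rather than extrapolate) edge points are illustrative decisions, not something the patent prescribes.

```python
from typing import List, Optional, Tuple


def clean_series(samples: List[Tuple[float, Optional[float]]],
                 low: float, high: float) -> List[Tuple[float, float]]:
    """Range-limit a time-ordered series, then fill gaps by linear interpolation."""
    # Step 1: range limiting. Out-of-range values are treated as missing.
    marked = [(t, v if v is not None and low <= v <= high else None)
              for t, v in samples]
    valid = [(t, v) for t, v in marked if v is not None]

    cleaned: List[Tuple[float, float]] = []
    for t, v in marked:
        if v is not None:
            cleaned.append((t, v))
            continue
        # Step 2: linear interpolation between the nearest valid neighbors.
        before = [p for p in valid if p[0] < t]
        after = [p for p in valid if p[0] > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            cleaned.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        # A gap with no valid neighbor on one side is dropped, not extrapolated.
    return cleaned
```

For example, with the range `[0, 100]`, the series `10, None, 30, 999, 50` sampled at `t = 0..4` becomes `10, 20, 30, 40, 50`.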
S104, cleaning and managing the real-time industrial data at its maintenance stage.
Specifically, cleaning and managing the real-time industrial data at its maintenance stage includes:
(1) Constructing a data asset album, where the data asset album comprises a retrieval mode, a full data table and primary and backup source data.
Retrieval mode: establishing a data identification system defined from the three perspectives of the equipment domain, the production domain and the management domain, and defining how data is retrieved;
Full data table: establishing a full standard data table, determining the state of the data stock, and establishing a data asset table;
Primary and backup source data: implementing a primary and a backup source for important data to ensure data security.
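One possible shape for such an album is sketched below. The `AssetEntry` fields, the code format and the substring-based fuzzy match are hypothetical; the patent names the three domains and the primary/backup idea but does not define a schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DOMAINS = {"equipment", "production", "management"}


@dataclass(frozen=True)
class AssetEntry:
    code: str                             # hypothetical identifier, e.g. "EQ.U1.BEARING_TEMP"
    name: str                             # human-readable name
    domain: str                           # one of DOMAINS
    primary_source: str                   # where the data normally comes from
    backup_source: Optional[str] = None   # standby source for important data


class AssetAlbum:
    """In-memory stand-in for the full data table with simple retrieval modes."""

    def __init__(self) -> None:
        self._entries: Dict[str, AssetEntry] = {}

    def add(self, entry: AssetEntry) -> None:
        if entry.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {entry.domain}")
        self._entries[entry.code] = entry

    def by_code(self, code: str) -> Optional[AssetEntry]:
        return self._entries.get(code)

    def by_domain(self, domain: str) -> List[AssetEntry]:
        return [e for e in self._entries.values() if e.domain == domain]

    def fuzzy(self, text: str) -> List[AssetEntry]:
        """Case-insensitive substring match on code or name."""
        needle = text.lower()
        return [e for e in self._entries.values()
                if needle in e.code.lower() or needle in e.name.lower()]

    def source_for(self, code: str, primary_ok: bool) -> Optional[str]:
        """Use the backup source when the primary source is unavailable."""
        entry = self._entries[code]
        return entry.primary_source if primary_ok else entry.backup_source
```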
(2) Constructing a storage database, where the storage database comprises a time series database, a relational database, an unstructured database and a streaming media database.
Time series database: used for storing real-time data generated by equipment. A time series database is suited to dynamic data that changes over time. Because it does not need the complex association relations of a relational database, retrieving data within a time window is highly efficient; and because the data structure is simple and occupies little storage space, high-density data can be stored for a long time, which supports analysis work based on that data. Measurement data automatically acquired by instruments and systems generally has these characteristics, so a time series database is recommended for storing it.
Relational database: used for storing ERP data, equipment ledgers and similar data. A relational database is suited to static information whose different attributes fit a defined structure. It can establish relations between different pieces of information and is suited to data that requires association analysis along different dimensions, such as equipment ledger information, equipment fault information and technical supervision data.
Unstructured database: used for storing test records, work and operation ticket ("two-ticket") information and similar data. Static data whose structure cannot be defined, such as structural diagrams, manually written analysis reports, images, audio and video, needs to be stored in an unstructured database; for example, a document database can be used for documents.
Streaming media database: used for storing security surveillance video, inspection video and similar data; video and audio data, for example, can use a streaming media database.
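The storage guidance above amounts to a routing rule from data category to database type. A minimal sketch, in which the category labels are invented for the example and would need to match whatever classification a real system uses:

```python
from enum import Enum


class Store(Enum):
    TIME_SERIES = "time series database"
    RELATIONAL = "relational database"
    UNSTRUCTURED = "unstructured database"
    STREAMING_MEDIA = "streaming media database"


# Hypothetical category labels mapped to the store the text recommends.
_ROUTES = {
    "measurement": Store.TIME_SERIES,          # real-time data generated by equipment
    "equipment_ledger": Store.RELATIONAL,      # static, structured, associated data
    "fault_record": Store.RELATIONAL,
    "test_record": Store.UNSTRUCTURED,         # structure cannot be defined up front
    "analysis_report": Store.UNSTRUCTURED,
    "surveillance_video": Store.STREAMING_MEDIA,
    "inspection_video": Store.STREAMING_MEDIA,
}


def choose_store(category: str) -> Store:
    """Return the storage type for a data category, or raise if no rule exists."""
    try:
        return _ROUTES[category]
    except KeyError:
        raise ValueError(f"no storage rule for category {category!r}") from None
```

Raising on an unknown category, rather than defaulting to one store, keeps unclassified data from silently landing in the wrong database.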
(3) Constructing security guarantees, where the security guarantees comprise link security, tenant security, content security and protection security.
Link security: establishing encrypted data channels, access permissions for shared interfaces, and private network transmission paths;
Tenant security: establishing multi-tenant authentication and permission management to achieve secure data access;
Content security: improving the data storage strategy to guarantee secure data storage;
Protection security: strengthening the security of the deployment environment and managing it by zone and level.
(4) Constructing data services, where the data services comprise scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling.
Scheduling management: scheduling associated data across multiple databases;
Multi-tenant management: managing the access permissions of multiple tenants;
Data synchronization: synchronizing data to user-defined targets;
Isolation synchronization: securely synchronizing data across isolation boundaries;
Data retrieval: providing a multi-mode retrieval service for data;
Data calling: providing a dual calling mechanism, local and remote, for data.
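The multi-tenant management and dual (local and remote) data calling services might be combined as in the sketch below. The patent does not say how the two call paths are ordered; the local-first, remote-fallback order, the class name and the `remote_fetch` callable are all assumptions.

```python
from typing import Callable, Dict, Optional, Set


class DataService:
    """Tenant-checked data calling with a local-first, remote-fallback lookup."""

    def __init__(self, remote_fetch: Callable[[str], Optional[float]]) -> None:
        self._local: Dict[str, float] = {}       # locally available values
        self._grants: Dict[str, Set[str]] = {}   # tenant -> codes it may read
        self._remote_fetch = remote_fetch        # stand-in for a remote call

    def grant(self, tenant: str, code: str) -> None:
        """Multi-tenant management: allow a tenant to read one data code."""
        self._grants.setdefault(tenant, set()).add(code)

    def put_local(self, code: str, value: float) -> None:
        self._local[code] = value

    def call(self, tenant: str, code: str) -> Optional[float]:
        """Data calling: check permission, then try local before remote."""
        if code not in self._grants.get(tenant, set()):
            raise PermissionError(f"tenant {tenant!r} may not read {code!r}")
        if code in self._local:
            return self._local[code]
        return self._remote_fetch(code)
```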
Therefore, at the maintenance stage, data is managed effectively through the establishment and management of the data asset album, the design of classified data storage, secure data use, data services and the like, ensuring that the data carrier is manageable and the usage environment is reliable.
S105, cleaning and managing the real-time industrial data at its use stage.
Specifically, cleaning and managing the real-time industrial data at its use stage includes:
(1) Cleaning and managing the real-time industrial data according to application scenarios, where the application scenarios comprise real-time monitoring, association display, history display and model display.
Real-time monitoring: real-time monitoring of raw data;
Association display: combined, associated real-time display of multiple items of raw data;
History display: display of historical data along different dimensions;
Model display: display of data from specialized mechanism models.
(2) Cleaning and managing the real-time industrial data according to algorithm models, where the algorithm models comprise a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
Single-quantity time model: a time model of a single monitored quantity;
Multi-quantity time model: a time model of a plurality of monitored quantities;
Multi-quantity association model: an association model of a plurality of monitored quantities;
Multi-quantity mechanism model: specialized industry mechanism models, such as spectrum models and rotation models;
Other multi-quantity models: mainstream big data algorithms, such as neural networks and linear regression.
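To make two of these categories concrete, the sketch below pairs a trailing moving average, taken here as one simple example of a single-quantity time model, with an ordinary least-squares line fit, an instance of the linear regression mentioned above. Both choices are illustrative; the patent does not name specific formulas.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Single-quantity time model: trailing mean over the last `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept between two monitored quantities."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A fitted line between two related quantities can also support the association judgment used in cleaning: a sample that falls far from the line is a candidate for review.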
Data cleaning combines cleaning methods with management methods. Cleaning methods along different dimensions are studied for each link of the data life cycle, such as data acquisition, data standardization, data storage and data access, providing a scientific approach to guaranteeing data quality. The invention therefore studies data cleaning from the three aspects of data production, maintenance and use, and proposes technical requirements and a design scheme for data asset management based on the results. By collecting power station equipment data and cleaning and managing the required data, it ultimately produces valid data for data analysis and for existing information systems, providing reference and guidance for data cleaning at the group's other power plants.
Referring to FIG. 2, FIG. 2 shows a specific structure of the power industry basic data cleaning system 100 of the present invention, which includes:
The acquisition module 1, used for collecting real-time industrial data from the power station equipment.
The checking and sorting module 2, used for checking and sorting the real-time industrial data.
The cleaning and management module 3, used for cleaning and managing the real-time industrial data at the production stage, the maintenance stage and the use stage of the data respectively.
As shown in FIG. 3, the cleaning and management module 3 includes:
The production stage cleaning unit 31, configured to clean and manage the real-time industrial data according to data attributes and acquisition channels, where the data attributes include a time attribute, a model attribute and a source system attribute, and the acquisition channels include a data acquisition channel, a data extraction channel and a derivative calculation channel;
The maintenance stage cleaning unit 32, configured to construct a data asset album, a storage database, security guarantees and data services, where the data asset album includes a retrieval mode, a full data table and primary and backup source data, the storage database includes a time series database, a relational database, an unstructured database and a streaming media database, the security guarantees include link security, tenant security, content security and protection security, and the data services include scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data calling;
The use stage cleaning unit 33, configured to clean and manage the real-time industrial data according to application scenarios and algorithm models, where the application scenarios include real-time monitoring, association display, history display and model display, and the algorithm models include a single-quantity time model, a multi-quantity time model, a multi-quantity association model, a multi-quantity mechanism model and other multi-quantity models.
Like the method, the system thus addresses data cleaning from the three aspects of data production, maintenance, and use: it collects power station equipment data, cleans and manages the required data, and finally sorts out valid data for data analysis and for the existing information systems, providing reference and guidance for data cleaning at the group's other enterprises and power plants.
Further, for the data processing part, a dedicated conversion bus should be provided: the data processing procedure is planned sensibly, the flow is unified, reusable components are extracted, and the flow is designed to be configurable. Correspondingly, the power industry basic data cleaning system is provided with an acquisition input module, an edge calculation module, a cache output module, a data monitoring module, and a task scheduling module. Specifically:
Acquisition input module: the input data of the data conversion bus is collected from other external systems. Depending on the protocol involved, two modes are supported, passive receiving and active collection. The active collection mode must support schedule settings so that the acquisition frequency can be adjusted. Depending on the collection target, general industrial protocols, private protocols, databases, systems, files, and the like must be supported. The collected data is stored in the cache of the conversion bus, where subsequent services process it.
Edge calculation module: the data conversion bus provides an edge calculation function that calculates and converts the data in the cache by means of algorithms. Data conversion adopts a configurable design: the calculation units are packaged as different operators, such as filtering, decomposing, merging, statistics, and conversion; several operators can be combined into different processing flows; and the flows and their order can be configured flexibly. The data produced by edge calculation is re-encoded and put back into the cache.
Cache output module: all data acquired and calculated in the data conversion bus is placed in the cache and must be published to other systems, services, files, and other targets. The interaction protocol for each kind of target is packaged as a plug-in, while everything else uses a uniform mode of operation so as to reduce the cost of use. Depending on the publishing target, output to general industrial protocols, private protocols, databases, systems, files, and the like can be supported.
Data monitoring module: the whole process is designed, configured, managed, and monitored in a configuration-based manner. By assembling several processing flows between the input and output endpoints, the processing can be controlled intuitively, the configured flows are clear at a glance, and maintenance and adjustment are convenient. During configuration, the processing results can be inspected at different links to support the design and debugging of the conversion flow. For a flow in normal operation, the processing status of each node, such as the total number of items processed, can also be counted.
Task scheduling module: the processing flow must control its processing speed. On the input side, both active acquisition and passive receiving are supported; on the output side, both passive calling and active sending are supported. In the active acquisition and active sending modes, different frequencies can be set, so that meaningless repeated calls are avoided and resources are allocated and used sensibly while service requirements are still met.
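The operator-based edge calculation and the per-node counting described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the names `Flow`, `filter_op`, and `convert_op`, the record layout, and the kW-to-MW conversion are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = dict  # one cached data point, e.g. {"tag": "unit1.power", "value": 1.0}
Operator = Callable[[List[Record]], List[Record]]


def filter_op(predicate: Callable[[Record], bool]) -> Operator:
    """Build a filtering operator that keeps records matching the predicate."""
    return lambda records: [r for r in records if predicate(r)]


def convert_op(key: str, fn: Callable[[float], float]) -> Operator:
    """Build a conversion operator that rewrites one field of every record."""
    return lambda records: [{**r, key: fn(r[key])} for r in records]


@dataclass
class Flow:
    """An ordered, reconfigurable chain of named operators (hypothetical)."""
    steps: List[Tuple[str, Operator]] = field(default_factory=list)
    processed: Dict[str, int] = field(default_factory=dict)  # node -> records seen

    def add(self, name: str, op: Operator) -> "Flow":
        self.steps.append((name, op))
        self.processed[name] = 0
        return self

    def run(self, records: List[Record]) -> List[Record]:
        for name, op in self.steps:
            self.processed[name] += len(records)  # per-node count for monitoring
            records = op(records)
        return records


# Assemble a flow from two operators; order and content are configuration.
flow = (
    Flow()
    .add("drop_missing", filter_op(lambda r: r["value"] is not None))
    .add("kw_to_mw", convert_op("value", lambda v: v / 1000.0))
)
cache = [
    {"tag": "unit1.power", "value": 250000.0},
    {"tag": "unit2.power", "value": None},
]
result = flow.run(cache)
```

Because each operator is a plain function from a list of records to a list of records, reordering or replacing steps only changes the calls to `add`, and `flow.processed` gives the total number of records each node has handled.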
To enable data sharing and access, the power industry basic data cleaning system further comprises a data retrieval module, a data calling module, and a data synchronization module. Specifically:
Data retrieval module: for identifiers, tree retrieval, conditional retrieval, exact retrieval, fuzzy retrieval, and the like must be supported. For time-series data, the module must support obtaining the latest data by identifier and obtaining historical data by time range, and the historical data must also support thinning or filling according to a given method. For relational data, functions such as retrieval by identifier, conditional retrieval, association retrieval, and grouped statistics must be supported.
Data calling module: a general, standard RESTful interface is provided, which external systems use when supplying and calling data. Besides passive calling, an active sending mode can also be used to deliver data to an external system, for example by sending the data to a designated target through a data sending program.
Data synchronization module: for data distributed across different data pools, a synchronization mechanism should be provided to meet the sharing requirements between those pools. Data synchronization must support both history synchronization and real-time synchronization, in both incremental and full modes, with different synchronization frequencies and modes set for different data types. For security reasons, in some cases only one-way transmission is possible between the two synchronized systems (for example, when an isolation device is present), and the synchronization service must support such cases. Because the transmission restrictions differ between isolation systems, component-based development must be supported, so that only a corresponding adaptation protocol needs to be developed when a new isolation rule is encountered.
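The time-series retrieval functions (latest value by identifier, history by time range with thinning) and incremental synchronization can be illustrated with a small in-memory stand-in. This is a sketch under assumptions: `SeriesStore`, its method names, the "every n-th sample" thinning rule, and the timestamp watermark are hypothetical choices, since the source does not specify the thinning method or the synchronization mechanism.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


class SeriesStore:
    """In-memory stand-in for a time-series database, keyed by identifier."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Sample]] = {}

    def append(self, tag: str, ts: int, value: float) -> None:
        # Assumes samples arrive in timestamp order for each identifier.
        self._data.setdefault(tag, []).append((ts, value))

    def latest(self, tag: str) -> Sample:
        """Most recent sample for an identifier."""
        return self._data[tag][-1]

    def history(self, tag: str, start: int, end: int, step: int = 1) -> List[Sample]:
        """Samples with start <= ts <= end, thinned to every `step`-th sample."""
        series = self._data.get(tag, [])
        times = [ts for ts, _ in series]
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return series[lo:hi:step]

    def changes_since(self, tag: str, watermark: int) -> List[Sample]:
        """Incremental synchronization: only samples newer than the watermark."""
        series = self._data.get(tag, [])
        times = [ts for ts, _ in series]
        return series[bisect_right(times, watermark):]


store = SeriesStore()
for i in range(10):
    store.append("unit1.power", 1000 + 60 * i, float(i))  # one sample per minute
```

In this sketch a full synchronization would simply transfer the whole series, while an incremental one remembers the timestamp of the last transferred sample and passes it to `changes_since` on the next run.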
To serve different user groups, the power industry basic data cleaning system further comprises a system service module, a data service module, and an application display module. Specifically:
System service module: from the system service perspective, the same software and hardware system can serve different units and departments, but independent data storage areas and micro-service instances must be created for different tenants to guarantee their data security and service capability. Different tenants must be able to receive different services and be assigned different resources, and each tenant must be able to manage the data and services it owns. For this purpose Docker containerization is recommended, so that resources are allocated sensibly. The virtual machine approach occupies a large amount of resources and is difficult to manage, operate, and maintain, and should be avoided.
Data service module: from the data service perspective, the system should set permissions for different data. Based on the data identifier, each user group can access only the data covered by its own permissions. The data accessible to users of different levels likewise depends on their permissions, which can be granted according to the level of the user and department (such as group level, division level, and station level).
Application display module: from the application display perspective, the applications visible to different users should also differ. For example, group users can see statistics for all branches of the group, whereas a station can see only the data of that station.
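A level-based permission check of the kind described for the data service module could look like the sketch below. The organization paths, the `Level` enumeration, and the rule "a user may read data owned by their own unit or any unit beneath it" are assumptions made for illustration; the source only states that permissions follow the group, division, and station levels.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    GROUP = 1     # sees every division and station in the group
    DIVISION = 2  # sees the stations of one division
    STATION = 3   # sees only its own station


@dataclass(frozen=True)
class User:
    name: str
    level: Level
    scope: str  # hypothetical org path, e.g. "group/div-a/station-1"


def can_access(user: User, data_owner: str) -> bool:
    """Allow access when the data owner lies at or below the user's unit."""
    depth = user.level.value  # number of path segments that must match
    return data_owner.split("/")[:depth] == user.scope.split("/")[:depth]


hq = User("hq-analyst", Level.GROUP, "group")
division = User("div-a-engineer", Level.DIVISION, "group/div-a")
station = User("operator", Level.STATION, "group/div-a/station-1")
```

The same predicate can drive the application display module: a view lists only those data owners for which `can_access` returns true for the current user.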
To manage monitoring applications, the power industry basic data cleaning system further comprises a configuration display module, a data analysis module, and a code management module. Specifically:
Configuration display module: provides a visual configuration display tool with which a user can flexibly configure the required monitoring software in order to display data mining results. Various quantitative status monitoring and alarm software can also be developed with the configuration editing tool.
Data analysis module: used for retrieving the real-time industrial data by code, Chinese name, or fuzzy query. The module can search all data in the system in these ways and display content such as real-time values, historical trends, and coding information. Data of interest can be downloaded, exported, plotted as trend curves, and subjected to association analysis.
Code management module: used for code configuration and maintenance, realizing KKS-standardized management of data. Code management supports entering, modifying, deleting, batch adding, batch modifying, importing, and exporting KKS codes, as well as viewing historical and real-time trends, and enables standardization, unified retrieval, and unified management of KKS data across multiple river basins and multiple stations.
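Retrieval by code, by Chinese name, or by fuzzy query can be sketched as a single lookup function. Everything concrete here is assumed: the `Point` record, the sample identifiers (plain placeholders, not real KKS codes), and the interpretation of "fuzzy query" as a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Point:
    code: str  # identifier; placeholder values, not validated against KKS
    name: str  # Chinese display name


POINTS = [
    Point("U1.POWER", "1号机组有功功率"),        # unit 1 active power
    Point("U1.STATOR_TEMP", "1号机组定子温度"),  # unit 1 stator temperature
    Point("U2.POWER", "2号机组有功功率"),        # unit 2 active power
]


def search(
    points: List[Point],
    code: Optional[str] = None,
    name: Optional[str] = None,
    fuzzy: Optional[str] = None,
) -> List[Point]:
    """Exact match on code and/or name, or substring match on either field."""
    hits = []
    for p in points:
        if code is not None and p.code != code:
            continue
        if name is not None and p.name != name:
            continue
        if fuzzy is not None:
            needle = fuzzy.lower()
            if needle not in p.code.lower() and needle not in p.name.lower():
                continue
        hits.append(p)
    return hits
```

For example, `search(POINTS, fuzzy="有功")` returns both active-power points, while `search(POINTS, code="U1.POWER")` returns only the first one.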
In addition, the power industry basic data cleaning system adopts a micro-service architecture and is managed with Docker containerization technology.
Compared with a traditional single-point system, a micro-service system packages functions with different service requirements independently. This avoids resource contention and mutual interference of service flows between different functions, and additional running instances can be added according to how often each function is used, so as to scale performance. Micro-services also split a complex large system into several simple small services, which lowers the technical difficulty of the business logic and reduces development cost. Because each micro-service is independent and has a standard definition, different suppliers can readily take part in collaborative development and later function upgrades. Accordingly, to meet management and business requirements, the system should be divided into several micro-services by functional scope: from the management perspective it should include a scheduling management micro-service and a multi-tenant management micro-service, and from the data perspective it should include services such as data synchronization, retrieval, calling, and conversion.
Meanwhile, a unified management interface is required to schedule the service components contained in the system, including functions such as starting, stopping, deleting, configuring, and backing up. To meet these management requirements, the services should be packaged into containers using Docker containerization technology, and the scheduling functions above are performed through a Docker-based management interface.
The system is therefore recommended to adopt a micro-service architecture and to manage its services with Docker containerization technology, which reduces the workload and difficulty of operation and maintenance, improves efficiency, and lowers cost.
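As one possible reading of the Docker-based management interface, the start, stop, and delete functions can be mapped onto the standard Docker CLI subcommands `docker start`, `docker stop`, and `docker rm`. The sketch below only builds the command lines; the function name `build_command`, the action names, and the container name are hypothetical, and configuring and backing up are left out because the source does not say how they are performed.

```python
from typing import Dict, List

# Mapping from management actions to Docker CLI subcommands.
ACTIONS: Dict[str, List[str]] = {
    "start": ["docker", "start"],
    "stop": ["docker", "stop"],
    "delete": ["docker", "rm"],
}


def build_command(action: str, container: str) -> List[str]:
    """Return the argument list for one management action on one container."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ACTIONS[action] + [container]


# Example: the command a management interface would run to stop a service.
stop_sync = build_command("stop", "data-sync-service")
```

A management interface could pass the returned list to `subprocess.run(..., check=True)` on a host where Docker is installed; keeping command construction separate from execution makes the mapping easy to test without a Docker daemon.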
In summary, the design of the invention adopts a mainstream big data platform framework combined with technical means such as artificial intelligence. It supports the deepening and expansion of later requirements from the perspectives of different users, and it is stable, efficient, convenient to use, easy to manage, scalable in performance, and easy to extend with new functions.
While the foregoing describes preferred embodiments of the present invention, those skilled in the art will appreciate that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.

Claims (9)

1. A method of power industry base data cleaning, comprising:
collecting real-time industrial data of power station equipment;
checking and sorting out the real-time industrial data;
cleaning and managing the real-time industrial data from a production stage of the real-time industrial data; wherein cleaning and managing the real-time industrial data from the production stage of the real-time industrial data comprises: cleaning and managing the real-time industrial data according to data attributes, wherein the data attributes comprise a time attribute, a model attribute and a source system attribute; and cleaning and managing the real-time industrial data according to an acquisition channel of the data, wherein the acquisition channel comprises a data acquisition channel, a data extraction channel and a derivative calculation channel;
cleaning and managing the real-time industrial data from a maintenance stage of the real-time industrial data;
the real-time industrial data is cleaned and managed from a use stage of the real-time industrial data.
2. The method of power industry base data cleaning of claim 1, wherein the method of checking and sorting out the real-time industrial data comprises:
checking and sorting out power generation business data of a hydropower station control layer and a wind power plant control system;
for problematic data sources, going down to the level of the hydropower units and wind turbine units, and checking and sorting out the unit service data and communication protocols.
3. The method of power industry base data cleaning of claim 1, wherein the method of cleaning and managing real-time industrial data from a maintenance phase of the real-time industrial data comprises:
constructing a data asset album, wherein the data asset album comprises a retrieval mode, a full data table and main and standby source data;
constructing a storage database, wherein the storage database comprises a time sequence database, a relation database, an unstructured database and a streaming media database;
constructing a security guarantee, wherein the security guarantee comprises link security, tenant security, content security and protection security;
data services are constructed, including dispatch management, multi-tenant management, data synchronization, quarantine synchronization, data retrieval, and data invocation.
4. The method of power industry base data cleaning of claim 1, wherein the method of cleaning and managing real-time industrial data from the use phase of real-time industrial data comprises:
respectively cleaning and managing the real-time industrial data according to application scenes, wherein the application scenes comprise real-time monitoring, associated display, history display and model display;
and respectively cleaning and managing the real-time industrial data according to an algorithm model, wherein the algorithm model comprises a single quantity time model, a plurality of quantity time models, a plurality of quantity association models, a plurality of quantity mechanism models and a plurality of quantity other models.
5. The method of power industry base data cleaning of claim 1, wherein the method of collecting real-time industrial data of a power plant facility comprises:
passively receiving real-time industrial data of the power station equipment;
real-time industrial data of power station equipment is actively collected.
6. A power industry base data cleaning system, comprising:
the acquisition module is used for acquiring real-time industrial data of the power station equipment;
the checking and sorting module is used for checking and sorting out the real-time industrial data;
the cleaning and managing module is used for cleaning and managing the real-time industrial data from the production stage, the maintenance stage and the use stage of the data respectively;
the cleaning and managing module comprises a production stage cleaning unit which is used for cleaning and managing real-time industrial data according to data attributes and a data acquisition channel, wherein the data attributes comprise time attributes, model attributes and source system attributes, and the acquisition channel comprises a data acquisition channel, a data extraction channel and a derivative calculation channel.
7. The power industry base data cleaning system of claim 6, wherein the cleaning and management module further comprises:
the maintenance stage cleaning unit is used for constructing a data asset album, a storage database, a security guarantee and a data service, wherein the data asset album comprises a retrieval mode, a full data table and main and standby source data, the storage database comprises a time sequence database, a relational database, an unstructured database and a streaming media database, the security guarantee comprises link security, tenant security, content security and protection security, and the data service comprises scheduling management, multi-tenant management, data synchronization, isolation synchronization, data retrieval and data call;
the using stage cleaning unit is used for cleaning and managing real-time industrial data according to an application scene and an algorithm model, wherein the application scene comprises real-time monitoring, association display, history display and model display, and the algorithm model comprises a single quantity time model, a plurality of quantity time models, a plurality of quantity association models, a plurality of quantity mechanism models and a plurality of quantity other models.
8. The power industry base data cleaning system of claim 6, further comprising a data analysis module for retrieving the real-time industrial data by code, Chinese name, or fuzzy query.
9. The power industry base data cleaning system of any one of claims 6-8, wherein the power industry base data cleaning system adopts a micro-service architecture and is managed using Docker containerization technology.
CN202010171013.0A 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry Active CN111538720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010171013.0A CN111538720B (en) 2020-03-12 2020-03-12 Method and system for cleaning basic data of power industry

Publications (2)

Publication Number Publication Date
CN111538720A CN111538720A (en) 2020-08-14
CN111538720B true CN111538720B (en) 2023-07-21

Family

ID=71976753

Country Status (1)

Country Link
CN (1) CN111538720B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348698A (en) * 2020-10-30 2021-02-09 中核核电运行管理有限公司 Nuclear power plant group pile management method, device and system
CN114722037B (en) * 2022-05-16 2022-08-26 中国信息通信研究院 Industrial Internet middleware data processing method, middleware and readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609501A (en) * 2012-02-02 2012-07-25 北京华电天仁电力控制技术有限公司 Data cleaning method based on real-time historical database
CN104036001A (en) * 2014-06-13 2014-09-10 上海新炬网络技术有限公司 Dynamic hotlist priority scheduling based quick data cleaning method
CN105278373A (en) * 2015-10-16 2016-01-27 中国南方电网有限责任公司电网技术研究中心 Substation integrated information processing system realizing method
CN106777227A (en) * 2016-12-26 2017-05-31 河南信安通信技术股份有限公司 Multidimensional data convergence analysis system and method based on cloud platform
CN107153664A (en) * 2016-03-04 2017-09-12 同方知网(北京)技术有限公司 A kind of method flow that research conclusion is simplified based on the scientific and technical literature mark that assemblage characteristic is weighted
CN107908690A (en) * 2017-11-01 2018-04-13 南京欣网互联网络科技有限公司 A kind of data processing method based on big data OA operation analysis
CN109947754A (en) * 2019-01-28 2019-06-28 中科恒运股份有限公司 Data cleaning method and device
CN110489459A (en) * 2019-08-07 2019-11-22 国网安徽省电力有限公司 A kind of enterprise-level industry number fused data analysis system based on big data platform
CN110618983A (en) * 2019-08-15 2019-12-27 复旦大学 JSON document structure-based industrial big data multidimensional analysis and visualization method
CN110727666A (en) * 2019-09-25 2020-01-24 中冶赛迪重庆信息技术有限公司 Cache assembly, method, equipment and storage medium for industrial internet platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7680600B2 (en) * 2007-07-25 2010-03-16 Schlumberger Technology Corporation Method, system and apparatus for formation tester data processing
US10459932B2 (en) * 2014-12-18 2019-10-29 Business Objects Software Ltd Visualizing large data volumes utilizing initial sampling and multi-stage calculations

Also Published As

Publication number Publication date
CN111538720A (en) 2020-08-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant