CN116739408A - Power grid dispatching safety monitoring method and system based on data tag and electronic equipment - Google Patents

Power grid dispatching safety monitoring method and system based on data tag and electronic equipment

Info

Publication number
CN116739408A
Authority
CN
China
Prior art keywords
data
power grid
tag
information
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310518939.6A
Other languages
Chinese (zh)
Inventor
谭期文
谢菁
张希翔
蒙琦
董贇
艾徐华
韦宗慧
古哲德
周迪贵
覃宁
陶思恒
黄汉华
陈昭利
张丽媛
曾虎双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Power Grid Co Ltd
Original Assignee
Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Power Grid Co Ltd
Priority to CN202310518939.6A
Publication of CN116739408A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00006 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network characterised by information or instructions transport means between the monitoring, controlling or managing units and monitored, controlled or operated power network element or electrical equipment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00002 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network characterised by monitoring
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466 Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Power Engineering (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Water Supply & Treatment (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a power grid dispatching safety monitoring method, system and electronic equipment based on data tags. The method comprises: extracting core key data in a power grid system; constructing a data tag library; analyzing and mining the large volume of data in the power grid system using big data technology, constructing a data quality anomaly identification model using machine learning technology, and comprehensively evaluating the core key data against the data tag library to obtain evaluation tags for the core key data; and forming a health degree portrait of the data quality using dynamic label technology, comprehensively evaluating the health degree of the data quality, and monitoring the dispatching safety of the power grid system according to that health degree. In the embodiments, the description and portrait of the core data characteristics of the power grid are studied on the basis of the power grid enterprise-level information model, so that data management staff can clearly understand the relevant characteristics of the data and the dispatching safety of the power grid system can be monitored.

Description

Power grid dispatching safety monitoring method and system based on data tag and electronic equipment
Technical Field
The invention relates to the technical field of electric power, and in particular to a power grid dispatching safety monitoring method, system and electronic equipment based on data tags.
Background
Data tags should be part of an enterprise's data management effort while also supporting its other data management activities. The data management knowledge system guideline issued by the international data management association (DAMA International) does not yet cover data tags among its proposed enterprise data management functions. Judged by how data tags are developed and applied, however, they are closely related to data management, data architecture, data security, data quality, data application and similar fields, and they can give data management work a means of control that is more flexible, finer-grained, more convenient and of greater business value. Data tags have therefore become an important part of data management. The life cycle of a data tag is also closely tied to the life cycle of the data itself: business personnel take an active part in data management when tags are created and maintained, and receive direct feedback on business value when tags are applied, which actively promotes data management work.
Data tags have since given rise to more advanced applications, such as customer credit rating in sales, user behavior analysis and personalized marketing services, power equipment portrait technology, and operation monitoring analysis in the power grid dispatching field. Applying data tags not only helps people quickly obtain effective information from massive data, but also enables data analysis on the basis of the acquired data, providing a reliable basis for subsequent data mining work.
Therefore, operation monitoring analysis in the power grid dispatching field requires analyzing and extracting the data tags of the power grid.
Disclosure of Invention
The invention aims to provide a power grid dispatching safety monitoring method, a system and electronic equipment based on a data tag, which can monitor the operation in the power grid dispatching field through the construction of the data tag of a power grid.
In a first aspect, an embodiment of the present invention provides a method for monitoring power grid dispatching security based on a data tag, including:
extracting core key data in a power grid system;
constructing a data tag library;
analyzing the causes and characteristics of data quality and data management problems in the power grid system, analyzing and mining mass data in the power grid system by utilizing a big data technology, constructing a data quality anomaly identification model by utilizing a machine learning technology, and comprehensively evaluating the core key data by utilizing the data tag library to obtain an evaluation tag of the core key data;
and forming a health degree portrait of the data quality by a dynamic label technology, and comprehensively evaluating the health degree of the data quality so as to monitor the dispatching safety of the power grid system according to the health degree of the data quality.
Further, the constructing a data tag library includes:
starting from a core service entity of the power grid system, integrally planning a label system architecture based on a public information model text of the power grid system from a service perspective, and determining main problems to be solved by the label system;
formulating a data tag definition specification, refining tag definition according to the specification, and defining tag business rules;
preparing a data development environment, connecting the different source data around the core business entity, linking data sources, identifying data types and acquiring data, and obtaining source data information from each business system in the power grid system;
loading the source data information into a data encapsulation table, or performing data mapping according to upper application requirements, and processing the source data information into a data virtual table;
and creating data tags from the data encapsulation table or the data virtual table, using a definition method appropriate to the complexity of each data tag, and constructing the data tag library.
Further, the creating data tags from the data encapsulation table or the data virtual table, using a definition method appropriate to the complexity of each data tag, and constructing the data tag library comprises:
carrying out distribution analysis, comparison analysis, statistical analysis, Pareto analysis, normalization detection and correlation analysis on the data encapsulation table or the data virtual table, creating the data tags of the power grid system, and constructing the data tag library.
Further, the forming the health portrait of the data quality through a dynamic label technology includes:
generating a health degree portrait of the data quality by profiling the data tags along three tag types: statistical tags, rule tags and algorithm tags;
the statistical tags include data usage frequency and the number of table fields; the rule tags are generated from metadata information and predetermined rules; and the algorithm tags are generated by machine learning algorithm mining and are used to analyze and judge derived attributes of the data.
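The three tag types can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the disclosure: the `HealthPortrait` container, the 5% null-ratio rule, the 0.5 anomaly threshold and the point deductions in `health_score` are hypothetical, and `anomaly_score` merely stands in for the output of a data quality anomaly identification model.

```python
from dataclasses import dataclass, field


@dataclass
class HealthPortrait:
    """Hypothetical container holding the three tag types for one table."""
    table: str
    statistical: dict = field(default_factory=dict)  # plain counts
    rule: dict = field(default_factory=dict)         # metadata + fixed rules
    algorithm: dict = field(default_factory=dict)    # model-derived attributes


def build_portrait(table, usage_count, fields, null_ratio, anomaly_score):
    portrait = HealthPortrait(table=table)
    # Statistical tags: data usage frequency and number of table fields.
    portrait.statistical["usage_frequency"] = usage_count
    portrait.statistical["field_count"] = len(fields)
    # Rule tag: an assumed completeness rule (at most 5% null values).
    portrait.rule["completeness"] = "pass" if null_ratio <= 0.05 else "fail"
    # Algorithm tag: anomaly_score stands in for a trained model's output.
    portrait.algorithm["quality_anomaly"] = anomaly_score >= 0.5
    return portrait


def health_score(portrait):
    """Illustrative aggregate on a 0-100 scale; the weights are arbitrary."""
    score = 100
    if portrait.rule.get("completeness") == "fail":
        score -= 40
    if portrait.algorithm.get("quality_anomaly"):
        score -= 40
    if portrait.statistical.get("usage_frequency", 0) == 0:
        score -= 20
    return score


healthy = build_portrait("dispatch_log", 12, ["id", "time"], 0.01, 0.1)
unhealthy = build_portrait("legacy_table", 0, [], 0.20, 0.9)
print(health_score(healthy), health_score(unhealthy))
```

A real deployment would presumably recompute these tags as the underlying data changes, which is what makes the tags "dynamic"; the sketch only shows a single snapshot.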
Further, the extracting core key data in the power grid system includes:
collecting source data information of each service system in the power grid system, removing redundant punctuation marks and special characters in the source data information, performing dictionary construction through a public information model text of the power grid system, and then performing word segmentation and word vector training to obtain word vectors of the source data information;
performing named entity recognition on the source data information using a trained model, then performing entity disambiguation, and verifying and correcting the entity disambiguation results against a knowledge base to obtain the disambiguation results of the source data information;
establishing a standard word stock of the power grid industry according to the disambiguation result of the source data information and the word vector of the source data information;
and acquiring the core key data based on the power grid industry standard word stock.
Further, the obtaining the core key data based on the grid industry standard word stock specifically includes:
acquiring the core key data from the power grid industry standard word stock by one or more of three techniques: frequent pattern analysis, network node centrality measurement and network bridge node detection.
In a second aspect, an embodiment of the present invention provides a power grid dispatching security monitoring system based on a data tag, including:
the extraction module is used for extracting core key data in the power grid system;
the construction module is used for constructing a data tag library;
the analysis module is used for analyzing the causes and characteristics of data quality and data management problems in the power grid system, analyzing and mining the mass data in the power grid system by utilizing a big data technology, constructing a data quality abnormality identification model by utilizing a machine learning technology, and comprehensively evaluating the core key data by utilizing the data tag library to obtain an evaluation tag of the core key data;
and the monitoring module is used for forming a health degree portrait of the data quality through a dynamic label technology, comprehensively evaluating the health degree of the data quality, and monitoring the dispatching safety of the power grid system according to the health degree of the data quality.
Further, the building module includes:
the service unit is used for starting from a core service entity of the power grid system, integrally planning a label system architecture based on a public information model text of the power grid system from a service perspective, and determining main problems to be solved by the label system;
the standard unit is used for making a data tag definition standard, refining tag definition according to the standard and defining tag service rules;
the development unit is used for preparing a data development environment, connecting the different source data around the core business entity, linking data sources, identifying data types and acquiring data, and obtaining source data information from each business system in the power grid system;
the loading unit is used for loading the source data information into a data encapsulation table or carrying out data mapping according to the upper application requirement and processing the source data information into a data virtual table;
and the construction unit is used for constructing the data tag library by adopting a proper definition method according to the complexity of the data tag according to the data encapsulation table or the data virtual table.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a power grid dispatching security monitoring method based on data tags as provided in the first aspect when executing the program.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements a data tag based power grid dispatching security monitoring method as provided in the first aspect.
Compared with the prior art, the power grid dispatching safety monitoring method, system and electronic equipment based on data tags study the description and portrait of power grid core data characteristics on the basis of the power grid enterprise-level information model and the results of power grid core data identification research. A classification label management system is established for power grid data, and a deep semantic analysis model is studied for the various types of data collected by the data acquisition adapters of the multi-source heterogeneous data intelligent acquisition system. Through automatic semantic identification and analysis, assisted by semi-automatic manual checking and adjustment, related characteristics such as business attributes, management attributes and technical attributes are distinguished, and the data are marked with characteristic labels that reflect their meaning. A panoramic data portrait is thereby constructed, so that data management staff can clearly understand the relevant characteristics of the data and the dispatching safety of the power grid system can be monitored.
Drawings
Fig. 1 is a flowchart of a power grid dispatching safety monitoring method based on a data tag according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a power grid dispatching safety monitoring system based on a data tag according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, the terms "comprising," "including," "having," and variations thereof herein mean "including but not limited to," unless expressly specified otherwise.
Fig. 1 is a flowchart of a power grid dispatching safety monitoring method based on a data tag according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S110, extracting core key data in a power grid system;
in some embodiments, the extracting core key data in the grid system includes:
collecting source data information of each service system in the power grid system, removing redundant punctuation marks and special characters in the source data information, performing dictionary construction through a public information model text of the power grid system, and then performing word segmentation and word vector training to obtain word vectors of the source data information;
performing named entity recognition on the source data information using a trained model, then performing entity disambiguation, and verifying and correcting the entity disambiguation results against a knowledge base to obtain the disambiguation results of the source data information;
establishing a standard word stock of the power grid industry according to the disambiguation result of the source data information and the word vector of the source data information;
and acquiring the core key data based on the power grid industry standard word stock.
In some embodiments, the obtaining the core key data based on the grid industry standard word stock specifically includes:
acquiring the core key data from the power grid industry standard word stock by one or more of three techniques: frequent pattern analysis, network node centrality measurement and network bridge node detection.
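Two of the three techniques named above can be sketched in a few lines of Python. The sketch rests on assumptions that the source does not state: terms from the word stock are treated as graph nodes, co-occurrence of two terms is treated as an undirected edge, "centrality" is taken to mean degree centrality, and "bridge nodes" are taken to mean articulation points (nodes whose removal disconnects the graph). The sample edges are made up.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected co-occurrence graph stored as adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def articulation_points(graph):
    """Nodes whose removal disconnects the graph (DFS low-link method)."""
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(node, parent):
        disc[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for nbr in graph[node]:
            if nbr == parent:
                continue
            if nbr in disc:
                # Back edge: an earlier-discovered node is reachable.
                low[node] = min(low[node], disc[nbr])
            else:
                children += 1
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                # A non-root node is a cut vertex if a child subtree
                # cannot reach above it.
                if parent is not None and low[nbr] >= disc[node]:
                    result.add(node)
        # The DFS root is a cut vertex only if it has several subtrees.
        if parent is None and children > 1:
            result.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return result


edges = [
    ("transformer", "substation"), ("substation", "feeder"),
    ("feeder", "meter"), ("transformer", "breaker"),
    ("breaker", "substation"),
]
graph = build_graph(edges)
print(degree_centrality(graph)["substation"])  # 3 of 4 other nodes
print(sorted(articulation_points(graph)))
```

Under these assumptions, terms with high centrality or terms that act as articulation points would be candidates for core key data. The recursive DFS is adequate for a small illustration; a large graph would need an iterative version.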
Tags are the core of a data portrait. The characteristic tags of power grid data are designed on the basis of the power grid enterprise-level information model and the results of power grid core data identification research, and they play a decisive role in the quality of the data portrait. The main process of core data feature description is as follows:
(I) Source data information collection
The source data information collection process mainly comprises collecting source data information from the source systems, removing redundant punctuation marks and special characters, constructing a dictionary from the public information model (SG-CIM) text of the power grid company, and then performing word segmentation and word vector training for use in subsequent work.
(1) Source data information preprocessing
The multi-source heterogeneous data intelligent acquisition system links to each source system, identifies its data type, and collects its source data information using the acquisition adapter that corresponds to that type. Redundant information in the source data, such as punctuation marks and special characters, is then removed.
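The cleaning step can be sketched as follows. The source does not define what counts as redundant, so the rule used here is an assumption: digits, ASCII letters and common CJK ideographs are kept, and every other run of characters is collapsed into a single space. The function name `clean_source_text` and the sample string are hypothetical.

```python
import re

# Assumption: anything outside digits, ASCII letters and the common
# CJK ideograph range counts as redundant punctuation or special characters.
_NOISE = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]+")


def clean_source_text(text):
    """Replace each run of noise characters with one space, then trim."""
    return _NOISE.sub(" ", text).strip()


print(clean_source_text("110kV 变电站#1; 主变压器!!"))
```

Replacing noise with a space rather than deleting it keeps neighbouring tokens from being fused, which matters for the later word segmentation step.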
(2) Basic dictionary construction
The accuracy of word segmentation strongly affects subsequent work. To make segmentation more accurate, new words are extracted from the collected corpus. Among the corpora collected in this embodiment, SG-CIM text is used preferentially. SG-CIM text is chosen as the dictionary source for the following reasons:
To unify the representation models of the power grid, the International Electrotechnical Commission (IEC) established the IEC 61970/61968 series of standards, whose core is the CIM. The CIM provides a comprehensive logical view of power enterprise information: an abstract model representing all major objects of a power enterprise, including the common classes and attributes of these objects and the relationships between them. The interaction specification specifies the interfaces for data interaction between different systems.
SG-CIM is a company public information model. It builds on the latest versions of the international IEC standards, namely the IEC 61970 CIM17v08, IEC 61968 CIM13v02 and IEC 62325 CIM03v02 public information models (CIM) and their design method, combines them with the design results and methods of the SG-ERP model, and inherits and extends the CIM according to the actual business of the power grid limited company. SG-CIM screens and classifies the business of the power grid company into 12 subject domains: customers, products, markets, equipment, grids, security, finance, assets, personnel, materials, projects, and synthesis. Each subject domain contains several elements. The equipment domain, for example, contains transmission equipment, transformation equipment, distribution equipment, protection equipment, automation equipment and so on. SG-CIM uses classes to describe the objects of the power system and describes the relationships between those objects through the relationships between classes, including inheritance, association and aggregation. The SG-CIM model covers the core business objects of the power system.
Other corpora are preprocessed to further enrich the dictionary, which is constructed using the following method.
(1) Constructing the dictionary using mutual information and left and right entropy: vocabulary is extracted from the collected corpus to build a new dictionary. Possible words are first extracted from the original text and then compared with the basic dictionary; words that are not in the basic dictionary are new words, so the word extraction algorithm is the key to constructing the dictionary. New word recognition uses a statistics-based method that relies mainly on mutual information and left and right entropy, two concepts that come from information theory. In computational language models, mutual information is also called internal cohesion, and left and right information entropy is also called external degree of freedom.
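The two statistics can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the embodiment's algorithm: candidates are limited to two-character strings, probabilities are raw relative frequencies, cohesion is measured as pointwise mutual information, and the function name `score_candidates`, the `min_count` cutoff and the sample text are hypothetical.

```python
import math
from collections import Counter, defaultdict


def score_candidates(text, min_count=2):
    """Return {bigram: (pmi, left_entropy, right_entropy)}."""
    n = len(text)
    chars = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(n - 1))
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(n - 1):
        gram = text[i:i + 2]
        if i > 0:
            left[gram][text[i - 1]] += 1   # character before the candidate
        if i + 2 < n:
            right[gram][text[i + 2]] += 1  # character after the candidate

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    scores = {}
    for gram, count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / (n - 1)
        p_x, p_y = chars[gram[0]] / n, chars[gram[1]] / n
        # Internal cohesion: how much more often the two characters occur
        # together than they would if they were independent.
        pmi = math.log(p_xy / (p_x * p_y))
        # External freedom: how varied the neighbouring characters are.
        scores[gram] = (pmi, entropy(left[gram]), entropy(right[gram]))
    return scores


sample = "变电站停电变电站检修变电站巡视"
for gram, (pmi, h_left, h_right) in score_candidates(sample).items():
    print(gram, round(pmi, 3), round(h_left, 3), round(h_right, 3))
```

In this sample, "变电" is always followed by "站", so its right entropy is zero, which suggests that it is a fragment of a longer word rather than a word on its own. A candidate is usually accepted as a word only when both its cohesion and the smaller of its two entropies are high; the thresholds depend on the corpus.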
(2) Word vector training
After the original corpus has been segmented, word vectors are trained. A word vector represents a word as a numerical vector; only after words are converted into numerical form can a computer readily process them. The word vectors in this embodiment are trained using Word2vec and BERT, open-source tools from Google.
(II) Word segmentation and disambiguation of source data information
The source data information is segmented using Chinese word segmentation technology. Chinese word segmentation methods fall mainly into the following three types:
(1) Dictionary-based Chinese word segmentation
A unified dictionary table is first established. To segment a sentence, the sentence is split into several parts and each part is matched against the dictionary. If a part is found in the dictionary, it is segmented successfully; otherwise splitting and matching continue until a match succeeds. The core of dictionary-based Chinese word segmentation is the segmentation rule and the matching order.
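One common choice of segmentation rule and matching order is forward maximum matching, sketched below. The source does not say which rule the embodiment uses, so this is only one possible instance; the function name and the tiny dictionary are hypothetical.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Greedy left-to-right segmentation, always taking the longest match."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink it until a match is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted so the loop advances
            # even when the character is not in the dictionary.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


words = {"变电站", "变电", "主变压器", "检修"}
print(forward_max_match("变电站主变压器检修", words))
```

Because the longest entry wins, "变电站" is preferred over the shorter entry "变电". Reversing the scan direction gives backward maximum matching, and the two can disagree on ambiguous sentences, which is why the matching order matters.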
(2) Chinese word segmentation method based on statistics
The statistical approach treats word segmentation as a probability maximization problem. The sentence is split, and the probability that adjacent characters form a word is estimated from a corpus: the more often adjacent characters occur together, the higher the probability. Segmentation then follows the probability values, so a complete corpus is important.
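A minimal sketch of the probability-maximization view follows. It uses a unigram word model with dynamic programming, which is one common formulation and not necessarily the one the passage has in mind; the word counts, the smoothing value for unseen single characters and the function name are all assumptions.

```python
import math


def segment_max_prob(sentence, word_counts, max_len=4):
    """Pick the split whose words have the highest joint probability."""
    total = sum(word_counts.values())

    def log_prob(word):
        count = word_counts.get(word)
        if count is not None:
            return math.log(count / total)
        # Assumed smoothing: unseen single characters get a small probability
        # so every sentence can be segmented; unseen longer strings are barred.
        return math.log(0.5 / total) if len(word) == 1 else None

    n = len(sentence)
    # best[end] = (best log-probability of sentence[:end], start of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            lp = log_prob(sentence[start:end])
            if lp is None:
                continue
            candidate = best[start][0] + lp
            if candidate > best[end][0]:
                best[end] = (candidate, start)

    # Walk the back-pointers from the end of the sentence.
    tokens, end = [], n
    while end > 0:
        start = best[end][1]
        tokens.append(sentence[start:end])
        end = start
    return tokens[::-1]


counts = {"变电站": 8, "变电": 3, "站": 2, "检修": 6, "检": 1, "修": 1}
print(segment_max_prob("变电站检修", counts))
```

The quality of the result depends entirely on the counts, which reflects the passage's point that a complete corpus is important.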
(3) Understanding-based word segmentation
Understanding-based word segmentation recognizes words by having the computer simulate a person's understanding of a sentence. The basic idea is to perform syntactic and semantic analysis during segmentation and to use the syntactic and semantic information to resolve ambiguity. It generally consists of three parts: a word segmentation subsystem, a syntactic and semantic subsystem, and a general control part. Coordinated by the general control part, the word segmentation subsystem obtains syntactic and semantic information about the relevant words and sentences to resolve segmentation ambiguity, that is, it simulates the way a person understands a sentence. This method requires a large amount of linguistic knowledge and information. Because Chinese linguistic knowledge is broad and complex, it is difficult to organize it into a machine-readable form, and understanding-based word segmentation systems are still at the experimental stage.
Based on the practical analysis for this embodiment, dictionary-based Chinese word segmentation is adopted.
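The dictionary-based approach adopted here can be illustrated with forward maximum matching, one common segmentation rule and matching order. The source does not say which rule the embodiment uses, so the choice of forward maximum matching, the function name `forward_max_match` and the toy dictionary are assumptions. ASCII strings stand in for Chinese characters to keep the example portable.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` greedily, always taking the longest dictionary
    entry that starts at the current position (illustrative sketch)."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first, then shrink until a match is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted so the loop terminates
            # even when nothing in the dictionary matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


dictionary = {"grid", "griddata", "data", "tag"}
tokens = forward_max_match("griddatatag", dictionary)
```

With this toy dictionary the longer entry "griddata" is preferred over "grid" followed by "data", which shows why the matching order is central to dictionary-based segmentation.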
(III) Source data information disambiguation. This embodiment combines a knowledge-base-based method and a corpus-based method to achieve entity disambiguation. Named entity recognition is first performed with a trained model, followed by entity disambiguation. The disambiguation results are then verified and corrected against a knowledge base; if the model result for a word differs greatly from the knowledge base result, the result may be inconsistent with common sense, and such results are screened manually. The two approaches are combined, with corpus training as the primary method and the traditional knowledge base as a supplement, to achieve a good disambiguation effect.
A word stock for the power grid industry is established. Starting from the industry's public word stock, the data models and design documents of the relevant power grid systems are analyzed, and corpus analysis and training are carried out with NLP models such as BM25, TextRank and LDA. The word stock is refined step by step into one with power grid industry characteristics, which improves the accuracy of subsequent word segmentation and analysis of source data.
(IV) Core data identification. Core data is identified from the source data information using the following three core data identification model algorithms, which are suited to the power grid and come from the power grid core data identification research results.
(1) Frequent pattern analysis technique
HPrepost, an algorithm for efficiently mining frequent patterns in big data, is adopted. It first improves the PrePost algorithm by compressing the database, simplifying the data representation and using efficient join and pruning strategies, and then performs parallel frequent pattern mining on the Hadoop platform. Experimental results show that the HPrepost algorithm achieves good speedup and load balancing.
Power grid data logs are huge in volume and complex in structure. Traditional frequent pattern analysis methods are effective but are not suited to large-scale mass data mining. Owing to the scalability of the Hadoop platform and the efficiency of the algorithm itself, HPrepost can meet the performance requirements of mining ever-growing power grid data logs. Mining frequent patterns in power grid big data reveals hidden, valuable association relationships in the logs, from which the core data of the power grid is obtained.
(2) Network node centrality measurement technique
Common node centrality measures include degree, betweenness, closeness, PageRank and K-shell. However, degree considers only the local importance of a node; K-shell stratifies the nodes of the whole network only coarsely; PageRank is effective only in directed networks; and betweenness and closeness capture the global importance of nodes through the shortest paths between node pairs, yet in many cases still fail to mine the important nodes of a complex network effectively.
The proposed complex network node influence model starts from the influence of one node on a single other node. The number of paths connecting two nodes is regarded as a key factor in measuring the influence between them, which leads to the K-degree influence between nodes; this influence is found to converge as K tends to infinity. Comparative experiments show that the complex network node influence model is more effective than degree, betweenness, closeness and K-shell at identifying important nodes in a network.
(3) Network bridge node detection technology
A natural "cluster" structure exists in power grid data, and "bridge nodes" are the important nodes that connect different "clusters" in the network. Without a "bridge node", the whole network may become disconnected and the system may not function properly. Compared with using topology information alone, a network representation learning algorithm that also uses the attribute information contained in the network captures the complex relationships in a real network more easily, yields more discriminative representation vectors, and performs better in node classification and recognition. The important "clusters" and "bridge nodes" in power grid data (that is, the core data of the power grid) can therefore be found more easily with a network representation learning algorithm that incorporates attribute information, and this algorithm can serve as an important component of the power grid core data identification model.
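The idea that removing a "bridge node" disconnects the network can be made concrete with a purely topological check. The sketch below is not the attribute-aware network representation learning algorithm described above; it swaps in a brute-force search for articulation points (nodes whose removal increases the number of connected components) to illustrate what a bridge node is. The function name and the toy graph are assumptions.

```python
def bridge_nodes(adjacency):
    """Return the nodes whose removal splits the graph into more components.

    `adjacency` maps each node to the set of its neighbours and is assumed
    to describe an undirected graph (every edge listed in both directions).
    """
    no_skip = object()  # sentinel meaning "remove nothing"

    def component_count(skip):
        seen = set()
        count = 0
        for start in adjacency:
            if start == skip or start in seen:
                continue
            count += 1
            seen.add(start)
            stack = [start]
            while stack:  # depth-first traversal of one component
                node = stack.pop()
                for neighbour in adjacency[node]:
                    if neighbour != skip and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        return count

    baseline = component_count(no_skip)
    return sorted(n for n in adjacency if component_count(n) > baseline)


# Two clusters {a, b} and {c, d} that are connected only through node x.
graph = {
    "a": {"b", "x"},
    "b": {"a", "x"},
    "x": {"a", "b", "c", "d"},
    "c": {"x", "d"},
    "d": {"x", "c"},
}
result = bridge_nodes(graph)
```

The brute-force search costs one traversal per node, which is acceptable for illustration but not for large grid data sets, where a linear-time articulation point algorithm or the learning-based method described above would be more appropriate.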
S120, constructing a data tag library;
in some embodiments, the constructing a data tag library includes:
starting from a core service entity of the power grid system, planning the overall tag system architecture from a service perspective based on the public information model text of the power grid system, and determining the main problems to be solved by the tag system;
formulating a data tag definition specification, refining tag definitions according to the specification, and defining tag business rules;
preparing a data development environment, connecting the different source data around the core business entity, linking data sources, identifying data types and acquiring data, and acquiring source data information from each business system in the power grid system;
loading the source data information into a data encapsulation table, or performing data mapping according to upper-layer application requirements and processing the source data information into a data virtual table;
and creating data tags from the data encapsulation table or the data virtual table, using a definition method suited to the complexity of each data tag, and constructing the data tag library.
Building the enterprise data tag library requires collaboration between the business department and the technical department, and generally comprises the following 7 steps:
(1) Tag system planning: starting from the power grid core service entity, the overall tag system architecture is planned from a service perspective based on the public information model text (SG-CIM) of the power grid system, and the main problems to be solved by the tag system are defined.
(2) Tag rule sorting: a data tag definition specification is formulated, tag definitions are refined according to the specification, and tag business rules are defined.
(3) Data collection: a data development environment is prepared, the different source data around the core business entity are connected, data sources are linked, data types are identified, data is collected, and multi-source heterogeneous data resources are brought together.
(4) Data preprocessing: preprocessing is performed ahead of data tagging. The collected source data is loaded into a data encapsulation table, or mapped according to upper-layer application requirements and processed into a data virtual table; the related source data information is stored and managed at the same time.
(5) Data tag generation: data tags are created using a definition method suited to the complexity of each data tag.
A tag is a two-tuple containing a data attribute and its value, formally described as Tag = <Name, W>, where Name denotes the attribute name and W denotes the weight, whose type and value range are determined by the attribute; for example, a device tag is <device name, main transformer D330>. A data tag differs from a feature in that it carries the attribute along with the value, so it fits well into Key-Value data storage. The basic attributes of a tag include: tag name, ID, data type, weight, timeliness, tag type, creation date, creator, date of last modification, and so on. When a data portrait is built, data tags are classified as follows.
a, statistics type tags: these are the most basic and most common tags, such as data usage frequency (average daily number of uses over the last 7 days) and the number of table fields. These tags form the basis of the data portrait;
b, rule type tags: these tags are generated from the source data information and determined rules. For example, the "high heat data tag" is defined as "data whose average daily number of uses over the last 7 days is greater than 100". In actual tag development, business staff are more familiar with the business, while data staff are more familiar with the structure, distribution and characteristics of the data, so the rules of rule type tags are agreed jointly by business staff and data staff;
c, algorithm type tags: these tags are mined and generated by machine learning algorithms and are used to analyze and judge derived attributes of the data, for example tags with management characteristics such as data importance level and security level assessment.
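The two-tuple form of a tag and the "high heat data" rule can be sketched as follows. The class and function names, the tag name `high_heat`, and the plain dictionary standing in for Key-Value storage are illustrative assumptions; only the <Name, W> structure and the threshold of 100 daily uses over 7 days come from the text.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Tag:
    """Tag = <Name, W>: an attribute name together with its value."""
    name: str
    w: Any


def high_heat_tag(daily_uses: List[int], threshold: float = 100.0) -> Tag:
    """Rule type tag: average daily use over the last 7 days exceeds 100."""
    last_week = daily_uses[-7:]
    average = sum(last_week) / len(last_week) if last_week else 0.0
    return Tag("high_heat", average > threshold)


def to_key_value(tags: Iterable[Tag]) -> Dict[str, Any]:
    """Flatten tags into a Key-Value mapping (a dict stands in for the store)."""
    return {tag.name: tag.w for tag in tags}


tags = [
    Tag("device name", "main transformer D330"),
    high_heat_tag([120, 130, 90, 150, 110, 140, 160]),
]
record = to_key_value(tags)
```

Keeping the rule in a small function mirrors the point made above that rule type tags are agreed between business staff and data staff: the threshold and window are explicit and easy to review.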
Using a deep semantic analysis model, the data virtual table and the data encapsulation table in the multi-source heterogeneous data intelligent acquisition system are processed together through automatic semantic recognition and analysis to assign tags automatically, assisted by semi-automatic manual checking and adjustment. This ensures the authenticity and accuracy of the data portrait, completes data feature extraction, and establishes the mapping between source data and tags, that is, the tagging of the source data information.
It should also be noted that data tags additionally require tag deployment, application promotion, tag maintenance, and tag evaluation and feedback:
Tag deployment: approved data tags are published and provided to business or technical departments for use, or exposed as data services for related systems to call.
Tag application promotion: first, combined with knowledge graph technology, a multi-source heterogeneous data relation graph is established to strengthen the associations among multi-source heterogeneous data keywords and obtain better data insight. As a knowledge map with reasoning capability, the knowledge graph displays the knowledge structure of the multi-source heterogeneous data and the relationships within it by means of visualization technology. Second, various forms of data tag services are developed. In data management, the main application directions of data tags are assisting the intelligent assignment of data responsibility, improving the data architecture system, strengthening data security management and control, automatically dispatching data quality problems, promoting data sharing, enabling traceability analysis of shared data, and empowering data applications. Tags are connected to downstream business systems or analysis applications to promote their use; for example, connecting to a data management capability maturity assessment tool supports intelligent acquisition of DCMM assessment data.
Tag maintenance: data tags are maintained and updated when the business changes or when their evaluation results are poor; a tag that has lost its application value can be retired.
Tag evaluation and feedback: the usage of data tags is monitored and feedback data is collected, and the tag service algorithms and rules are adjusted according to the monitoring results to promote overall optimization of the tag system.
In some embodiments, the creating data tags from the data encapsulation table or the data virtual table, using a definition method suited to the complexity of each data tag, and constructing the data tag library includes:
performing distribution analysis, comparison analysis, statistical analysis, Pareto analysis, normality testing and correlation analysis on the data encapsulation table or the data virtual table, creating the data tags of the power grid system, and constructing the data tag library.
Based on the power grid core data information obtained by the core data identification algorithms, the data characteristics are analyzed with the following methods and power grid data feature tags are established.
1. Distribution analysis:
distribution analysis studies the distribution characteristics and distribution type of the data, with basic statistics computed separately for quantitative data and qualitative data.
2. Comparison analysis:
two interrelated numbers (indicators) are compared, either as absolute numbers or as relative numbers.
3. Statistical analysis:
statistical indicators are used to describe quantitative data, usually from two aspects: central tendency and dispersion.
4. Pareto analysis:
also called contribution analysis, corresponding to the Pareto principle (the 80/20 rule).
5. Normality testing:
a test that uses observed data to judge whether a population follows a normal distribution is called a normality test; it is an important special case of goodness-of-fit hypothesis testing in statistical decision making. The three main methods are preliminary judgment from a histogram, judgment from a Q-Q plot, and the K-S test.
6. Correlation analysis:
correlation analysis examines two or more correlated variables to measure how closely they are related. The variables must have some association or probabilistic relationship for correlation analysis to be meaningful. The three main methods are preliminary graphical judgment, the Pearson correlation coefficient and the Spearman rank correlation coefficient.
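Two of the analyses listed above, Pareto analysis and the Pearson correlation coefficient, can be sketched in plain Python. The function names, the 0.8 cut-off parameter and the sample figures are assumptions made for illustration; the source does not specify how these analyses are implemented.

```python
import math


def pareto_head(contributions, share=0.8):
    """Return the keys, largest first, whose cumulative contribution
    first reaches `share` of the total (contribution analysis)."""
    total = sum(contributions.values())
    running = 0.0
    head = []
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    for key, value in ranked:
        head.append(key)
        running += value
        if running >= share * total:
            break
    return head


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical access counts per table and two hypothetical indicator series.
key_tables = pareto_head({"table_a": 70, "table_b": 15, "table_c": 10, "table_d": 5})
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```

In a tagging context, the output of `pareto_head` could feed a tag that marks the small set of tables carrying most of the usage, while a strong correlation between two fields could support a relationship tag; both uses are suggestions rather than statements from the source.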
S130, analyzing the generation reasons and characteristics of data quality and data management problems in the power grid system, analyzing and mining mass data in the power grid system by utilizing a big data technology, constructing a data quality anomaly identification model by utilizing a machine learning technology, and comprehensively evaluating the core key data by utilizing the data tag library to obtain an evaluation tag of the core key data;
A data tag is a technical means of describing an object with concise data content. In essence, it extracts long text into refined, effective information and adds analysis, statistics and abstraction of the data; it is characterized by short text, semantic meaning and repeatable marking. Data tags are typically manually defined, highly condensed feature identifiers, and one or more tags are applied to portray the information hidden behind the data. Unlike attributes, tags are business conclusions drawn from several attributes of an entity, based on manually specified business rules or on business rules trained through machine learning. Data tagging is also a simple and effective way to improve the efficiency of incremental data clustering, by assigning each newly added data point to the cluster most similar to it.
Data tags have since developed into more advanced applications. For example, through data association, an overhaul tag is attached to remote signaling alarm information caused by overhaul, an accompanying tag is attached to remote signaling alarm information caused by manual switch operation, and an invalid out-of-limit tag is attached to alarm information for values that exceed a limit and then return to normal, which facilitates further tag-based data mining and analysis. Other applications include customer credit rating, user behavior analysis and personalized marketing services in the sales field, power equipment portrait technology, and operation monitoring analysis in the power grid dispatching field. Data tags not only help people obtain effective information quickly from massive data, but also support data analysis on the acquired data, providing a reliable basis for subsequent data mining. At present, data tags in some specific fields already have source data standards and explicit description specifications that allow more scientific description and more efficient management, but the application and standardization of data tags in the electric power and energy field are still at a relatively early stage and remain to be further promoted.
Because data sources are diverse and the source data is heterogeneous, data acquisition and tracing technology from big data can be used to record the origin of the data and its transmission and computation, providing auxiliary support for later mining and decision making. Establishing data tags is a process in which business experts and data managers jointly sort through the physical data tables and abstract them into tag expressions; it is a new and useful exploration in the construction of electric power big data. This expression builds a bridge between business and data, and makes the experience of business experts and the structure of the physical data tables explicit and fixed. With data tags in place, unified business tags are provided to downstream users instead of the raw data tables.
And S140, forming a health degree portrait of the data quality through a dynamic label technology, and comprehensively evaluating the health degree of the data quality so as to monitor the dispatching safety of the power grid system according to the health degree of the data quality.
In some embodiments, the forming the health representation of the data quality by dynamic tagging techniques includes:
generating a health degree portrait of the data quality, the portrait being built from statistics type tags, rule type tags and algorithm type tags;
The statistics type tags include data usage frequency and the number of table fields; the rule type tags are generated from source data information and determined rules; and the algorithm type tags are mined and generated by machine learning algorithms and are used to analyze and judge derived attributes of the data.
Through in-depth analysis of the causes of data quality problems and of data management problems, massive data is analyzed and mined with big data technology, a data quality anomaly identification model is constructed with machine learning technology, data fields are comprehensively evaluated with the tag library, and each data field is marked with evaluation tags. A data quality health degree portrait is then formed through dynamic tag technology and the health degree of marketing data quality is comprehensively evaluated, so that managers understand the current state of data quality. The quality of the full set of marketing data is mined according to the data quality evaluation system: targeted quality improvement strategies are provided for stock data, and verification rules covering all data fields are provided for newly added data, which raises the admission threshold and improves the reliability of newly added data. In this way data is governed by prescribing the right remedy for each problem, the data is completed, and the value of the data is optimized and improved.
Management of specific data quality rules
To address data quality problems and business development needs, dedicated data quality governance is carried out for customers' mobile phone numbers, electricity consumption addresses, identity card numbers among the certificates, and unified social credit codes.
The data table field is the smallest unit of evaluation. The evaluation dimensions mainly comprise the data missing rate, distortion rate, rule error rate, business logic contradiction rate and failure rate; at the same time, taking the relationships between data tables into account, an evaluation system based on the association relationship "point outline" is established.
(1) Mobile phone number anomaly identification and intelligent correction in contact information
The governance object is the user's contact information, which has the following problems: missing contact information, multiple numbers in one contact field, irregular data format, null contact numbers, and incorrect contact numbers.
Missing contact information: when all mobile numbers and fixed telephone numbers associated with the user are empty, the record has a missing-data problem.
Multiple numbers in one contact field: when the mobile number or fixed telephone number field associated with the user contains several numbers, the record has an irregularity problem.
Irregular data format: when the mobile number or fixed telephone number associated with the user has a length error (such as a mobile number that is not 11 digits) or a format error (such as a mobile number that does not begin with 1 or contains non-digit characters), the record has a format problem.
Null contact number: when the short message delivery receipt for the mobile number associated with the user reports a failure with a reason such as invalid number, null number, abnormal number or illegal number, the record has a null-number problem.
Incorrect contact number: when the short message reply received for the mobile number associated with the user clearly states that the number has nothing to do with the user account, the record has an association error problem.
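The first three problem classes above can be checked from the stored field alone, as in the following sketch. The function name, the returned category strings and the set of separators treated as dividing several numbers are assumptions. Null and incorrect numbers depend on short message receipts and replies, so they are outside the scope of this example.

```python
import re

# A mobile number is taken to be exactly 11 digits beginning with 1,
# matching the format rule stated in the text.
MOBILE_PATTERN = re.compile(r"1[0-9]{10}")


def classify_mobile_field(raw):
    """Classify a stored mobile number field as missing, multiple_numbers,
    bad_format or ok (hypothetical category names)."""
    if raw is None or not raw.strip():
        return "missing"
    # Commas, semicolons, slashes and whitespace are assumed to separate
    # several numbers typed into one field.
    parts = [part for part in re.split(r"[,;/\s]+", raw.strip()) if part]
    if len(parts) > 1:
        return "multiple_numbers"
    if not MOBILE_PATTERN.fullmatch(parts[0]):
        return "bad_format"
    return "ok"


samples = ["", "13800000000 13900000000", "2380000000", "1380000000a", "13800000000"]
results = [classify_mobile_field(sample) for sample in samples]
```

The order of the checks matters: a field holding two well-formed numbers is reported as `multiple_numbers` rather than `bad_format`, which keeps the two problem classes distinct for later correction.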
(2) Power consumption address anomaly identification model and intelligent correction
Address anomalies are corrected step by step from the outside in. "Outside" refers to outer-layer problems of the address that require large-scale modification, such as whether the address follows the standard format, whether it is complete, and whether the address codes are consistent. "Inside" refers to problems of detail within the address, for example checking at street or road level whether the street belongs to the stated district. Processing thus moves from format problems at the outer layer of the address to detail problems at the inner layer, that is, from the whole to the local.
(3) Identification card number anomaly identification model and intelligent correction
An identity card anomaly recognition algorithm is written according to the identity card number generation rule. All identity card numbers are then run through the algorithm, and the result is used as the basis for judging whether each number is correct.
(4) Customer unified social credit code recognition model and intelligent matching
Governance and analysis take the 18-digit unified social credit code as the valid coding standard: the validity of the existing stock data in marketing is verified, external data is introduced for comparative analysis, and highly accurate unified social credit code reference information is finally provided for the more than 1.18 million existing low-voltage non-resident and high-voltage users with normal electricity use.
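For item (3), one part of the identity card number generation rule that can be checked mechanically is the final check character. The sketch below assumes the 18-character resident identity number format with the ISO 7064 MOD 11-2 style checksum defined in GB 11643-1999; the source does not state which checks its algorithm performs, and a fuller check would also validate the region code and the birth date. The function name is hypothetical.

```python
# Position weights and check characters of the 18-character identity number
# (assumed here to follow GB 11643-1999).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def id_number_checksum_ok(id_number: str) -> bool:
    """Return True if the 18th character matches the weighted checksum
    of the first 17 digits."""
    text = id_number.strip().upper()
    body = text[:17]
    if len(text) != 18 or not (body.isascii() and body.isdigit()):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(body, WEIGHTS))
    return CHECK_CHARS[total % 11] == text[17]


# A widely used sample number that satisfies the checksum, and a corrupted copy.
valid = id_number_checksum_ok("11010519491231002X")
invalid = id_number_checksum_ok("110105194912310021")
```

A number that fails this check can be tagged as erroneous immediately, while a number that passes still needs the remaining rule checks before it is treated as correct.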
Intelligent data quality rule management based on data image
A functional system for managing data health tags is designed on top of the existing tag library module of the marketing system. Its back-end data layer relies on the marketing database and its front end relies on the tag library module; it mainly covers functions such as tag display, management and perspective analysis.
The tags are summarized from the missing rate, distortion rate, error rate, business logic contradiction rate, failure rate and so on of the data: the missing rate corresponds to the data empty tag; the error rate and the business logic contradiction rate correspond to the data error tag; and the distortion rate and the failure rate correspond to the agent tag.
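The correspondence between quality metrics and health tags can be written down as a small lookup, together with one of the metrics. This is only a sketch: the metric keys, the tag names, the definition of "missing" as null or blank, and the rule that any rate above a threshold triggers the tag are assumptions; the source gives only the metric-to-tag correspondence.

```python
# Metric-to-tag correspondence as stated in the text.
METRIC_TO_TAG = {
    "missing_rate": "empty",
    "error_rate": "error",
    "logic_contradiction_rate": "error",
    "distortion_rate": "agent",
    "failure_rate": "agent",
}


def missing_rate(values):
    """Share of values in a field that are null or blank."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    return missing / len(values)


def health_tags(metrics, threshold=0.0):
    """Return the sorted set of tags whose metric exceeds `threshold`."""
    return sorted({
        METRIC_TO_TAG[name]
        for name, rate in metrics.items()
        if name in METRIC_TO_TAG and rate > threshold
    })


field_values = ["13800000000", "", None, "13900000000"]
metrics = {"missing_rate": missing_rate(field_values), "error_rate": 0.0, "failure_rate": 0.1}
tags = health_tags(metrics)
```

Because two metrics can map to the same tag, the result is built as a set so that each tag appears at most once per field.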
(1) Abnormal label system for identification card number
Based on the identity card number anomaly identification model, an identity card number anomaly tag system for user basic files is constructed, including tags such as: identity card number empty, identity card number wrong, and agent's identity card number.
(2) Mobile phone number anomaly tag system
Based on the mobile phone number anomaly identification model, a mobile phone number anomaly tag system for user basic files is constructed, including tags such as: mobile phone number empty, mobile phone number wrong, and agent's mobile phone number.
(3) Electricity consumption address anomaly tag system
Based on the electricity consumption address anomaly identification model, an electricity consumption address anomaly tag system for user basic files is constructed, including tags such as: county code error, street code error, district code error, house number empty, and abnormal assigned station area.
(4) Unified social credit code anomaly tag system
Based on the unified social credit code anomaly identification model, a unified social credit code anomaly tag system for user basic files is constructed, including tags such as: unified social credit code empty, unified social credit code wrong, and agent's unified social credit code.
Intelligent correction of stock data quality problems
For correcting and improving stock data in the customer domain, service domain and power grid domain of the marketing data model, practical and effective correction schemes are prepared for the different problems. Anomalous user files in the marketing system identified through the above analysis are pushed to the anomaly library (to-be-census library) of the marketing census client, and the quality problems of the existing data of the marketing system are rectified through the power grid terminal.
(1) Constructing intelligent correction data for electricity consumption addresses, mobile phone numbers and identity card numbers
Based on data from the online power grid, 95598 work orders, the short message platform, third parties of the online power grid, marketing basic files and so on, an intelligent correction database of electricity consumption addresses, mobile phone numbers and identity card numbers is constructed through technical means such as models, rules or text mining.
(2) Intelligent correction data pushing to marketing basic data platform
And synchronizing the information such as the user number, the file exception type, the suspected correct information, the information source and the like screened by the tag library to the marketing basic data platform.
(3) Marketing system correction data processing
The marketing system applies intelligent correction strategies according to the data anomaly tags: if the original file entry is empty or wrong in detail, it is modified automatically in batches; if an error is only suspected, the data is pushed to the business census. In this way intelligent correction of stock data is achieved.
Admission verification of newly added data based on entry-point modification
The customer domain, service domain and power grid domain of the marketing data model, together with the data source of each table and each field, are sorted out, and effective verification methods are formulated for the different entry modes, such as manual entry, machine entry and import from other databases, so that quality is controlled at the data source.
(1) Verification of transformer removal in the field investigation step
When a transformer is removed in the field investigation step, a verification that the corresponding secondary circuit is also removed is added. This resolves data synchronization problems such as an operating transformer or operating electric energy meter having no associated metering point, an operating transformer having no corresponding secondary circuit information, and redundant secondary circuit information.
(2) Improving the power supply unit name attribute modification function
The power supply unit name attribute modification function is improved and the addition or deletion of power supply units is restricted, which resolves the problem that a data modification request has to be submitted whenever a power supply unit name is modified.
Compared with the prior art, the power grid dispatching safety monitoring method based on data tags studies the description and portrait of power grid core data characteristics on the basis of the power grid enterprise-level information model and the power grid core data identification research results. A classification tag management system for power grid data is established, and a deep semantic analysis model is applied to the various types of data acquired by the data acquisition adapters in the multi-source heterogeneous data intelligent acquisition system. Through automatic semantic recognition and analysis, assisted by semi-automatic manual checking and adjustment, related characteristics such as business attributes, management attributes and technical attributes are distinguished, and the data is marked with feature tags that reflect its meaning. A panoramic data portrait is thereby constructed, data managers can clearly understand the relevant characteristics of the data, and the dispatching safety of the power grid system is monitored accordingly.
Fig. 2 is a schematic structural diagram of a power grid dispatching safety monitoring system based on data tags according to an embodiment of the present invention. As shown in Fig. 2, the system includes an extraction module 210, a construction module 220, an analysis module 230 and a monitoring module 240, wherein:
the extraction module 210 is configured to extract core key data in the power grid system;
the construction module 220 is used for constructing a data tag library;
the analysis module 230 is configured to analyze the causes and characteristics of data quality problems and data management problems in the power grid system, analyze and mine the mass data in the power grid system using big data technology, construct a data quality anomaly identification model using machine learning technology, and comprehensively evaluate the core key data using the data tag library to obtain an evaluation tag of the core key data;
the monitoring module 240 is configured to form a health degree portrait of the data quality by means of a dynamic tag technology and to comprehensively evaluate the health degree of the data quality, so as to monitor the dispatching safety of the power grid system according to the health degree of the data quality.
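As a rough illustration of how the four modules could be chained, the following Python sketch wires them together. Everything in it is an assumption made for illustration: the `Record` type, the function names, the two rule-based tags and the health formula are hypothetical, and simple rules stand in for the machine-learning anomaly identification model, which the embodiment does not specify at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical unit of grid data: a name plus field -> value pairs."""
    name: str
    values: dict  # a value of None means the field is missing


def extract_core(records, core_names):
    """Extraction module (210): keep only records regarded as core key data."""
    return [r for r in records if r.name in core_names]


def build_tag_library():
    """Construction module (220): tag name -> rule deciding whether it applies."""
    return {
        "has_missing_field": lambda r: any(v is None for v in r.values.values()),
        "empty_record": lambda r: len(r.values) == 0,
    }


def evaluate(records, tag_library):
    """Analysis module (230): attach evaluation tags to each core record.

    Plain rules are used here in place of a trained anomaly model.
    """
    return {
        r.name: [tag for tag, rule in tag_library.items() if rule(r)]
        for r in records
    }


def health_degree(evaluation):
    """Monitoring module (240): share of records carrying no problem tag."""
    if not evaluation:
        return 1.0
    clean = sum(1 for tags in evaluation.values() if not tags)
    return clean / len(evaluation)


# Usage with made-up records.
records = [
    Record("meter_1", {"reading": 10.5, "unit": "North"}),
    Record("meter_2", {"reading": None, "unit": "North"}),
    Record("log_1", {}),
]
core = extract_core(records, {"meter_1", "meter_2"})
evaluation = evaluate(core, build_tag_library())
score = health_degree(evaluation)
print(evaluation, score)
```

A monitoring step could then compare `score` against a threshold chosen by the operator; the embodiment does not state such a threshold, so none is assumed here.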
In some embodiments, the construction module includes:
the service unit, configured to start from the core business entities of the power grid system, plan the overall tag system architecture from a business perspective based on the public information model text of the power grid system, and determine the main problems to be solved by the tag system;
the standard unit, configured to formulate a data tag definition specification, refine the tag definitions according to the specification, and define tag business rules;
the development unit, configured to prepare a data development environment, connect the different source data around the core business entities, link data sources, identify data types and collect data, and acquire source data information from each business system in the power grid system;
the loading unit, configured to load the source data information into a data encapsulation table, or to perform data mapping according to upper-layer application requirements and process the source data information into a data virtual table;
and the construction unit, configured to establish data tags by adopting a suitable definition method according to the complexity of the data tags, based on the data encapsulation table or the data virtual table, and to construct the data tag library.
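The loading and construction units can be pictured with a small sketch. It assumes, purely for illustration, that a "data encapsulation table" behaves like a physical table holding a copy of the source data and that a "data virtual table" behaves like a view computed on demand; SQLite from the Python standard library is used only because it offers both. The table names, columns, sample rows and tag names are hypothetical.

```python
import sqlite3


def load_tables(conn, rows):
    """Loading unit: copy source rows into an encapsulation table and
    expose a mapped, aggregated form of them as a virtual table (a view)."""
    conn.execute(
        "CREATE TABLE src_meter (meter_id TEXT, unit_name TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO src_meter VALUES (?, ?, ?)", rows)
    # Encapsulation table: a physical copy of the source data.
    conn.execute(
        "CREATE TABLE enc_meter AS "
        "SELECT meter_id, unit_name, reading FROM src_meter"
    )
    # Virtual table: nothing is stored, the mapping is evaluated when queried.
    conn.execute(
        "CREATE VIEW v_unit_quality AS "
        "SELECT unit_name, COUNT(*) AS n_rows, "
        "SUM(reading IS NULL) AS n_missing "
        "FROM enc_meter GROUP BY unit_name"
    )


def build_tag_library(conn):
    """Construction unit: derive one simple rule-based tag per supply unit."""
    tags = {}
    query = (
        "SELECT unit_name, n_rows, n_missing "
        "FROM v_unit_quality ORDER BY unit_name"
    )
    for unit_name, n_rows, n_missing in conn.execute(query):
        tags[unit_name] = "incomplete" if n_missing > 0 else "complete"
    return tags


# Usage with made-up meter readings.
conn = sqlite3.connect(":memory:")
load_tables(conn, [("m1", "North", 10.5), ("m2", "North", None), ("m3", "South", 7.0)])
tag_library = build_tag_library(conn)
print(tag_library)
```

A simple tag such as completeness fits a fixed rule; a more complex tag would call for a statistical or learned definition, which this sketch does not attempt.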
The system embodiment corresponds to the method described above and is implemented in the same way; for details, refer to the method embodiment above, which is not repeated here.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the device includes a processor 301, a communication interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 communicate with one another through the communication bus 304. The processor 301 may call a computer program that is stored in the memory 303 and executable on the processor 301 to perform the power grid dispatching safety monitoring method based on data tags provided in the above embodiments.
Further, the logic instructions in the memory 303 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the power grid dispatching safety monitoring method based on data tags provided in the above embodiments.
The above-described embodiments of the electronic device and the like are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods of the various embodiments or of some parts of the embodiments.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A power grid dispatching safety monitoring method based on data tags, characterized by comprising the following steps:
extracting core key data in a power grid system;
constructing a data tag library;
analyzing the causes and characteristics of data quality problems and data management problems in the power grid system, analyzing and mining mass data in the power grid system by utilizing a big data technology, constructing a data quality anomaly identification model by utilizing a machine learning technology, and comprehensively evaluating the core key data by utilizing the data tag library to obtain an evaluation tag of the core key data;
and forming a health degree portrait of the data quality by means of a dynamic tag technology, and comprehensively evaluating the health degree of the data quality so as to monitor the dispatching safety of the power grid system according to the health degree of the data quality.
2. The power grid dispatching safety monitoring method based on data tags according to claim 1, wherein the constructing a data tag library comprises:
starting from the core business entities of the power grid system, planning the overall tag system architecture from a business perspective based on the public information model text of the power grid system, and determining the main problems to be solved by the tag system;
formulating a data tag definition specification, refining the tag definitions according to the specification, and defining tag business rules;
preparing a data development environment, connecting the different source data around the core business entities, linking data sources, identifying data types and collecting data, and acquiring source data information from each business system in the power grid system;
loading the source data information into a data encapsulation table, or performing data mapping according to upper-layer application requirements and processing the source data information into a data virtual table;
and establishing data tags by adopting a suitable definition method according to the complexity of the data tags, based on the data encapsulation table or the data virtual table, and constructing the data tag library.
3. The power grid dispatching safety monitoring method based on data tags according to claim 2, wherein the establishing data tags by adopting a suitable definition method according to the complexity of the data tags, based on the data encapsulation table or the data virtual table, and constructing the data tag library comprises:
performing distribution analysis, comparison analysis, statistical analysis, Pareto analysis, normalization detection and correlation analysis on the data encapsulation table or the data virtual table, creating the data tags of the power grid system, and constructing the data tag library.
4. The power grid dispatching safety monitoring method based on data tags according to claim 2, wherein the forming a health degree portrait of the data quality by means of a dynamic tag technology comprises:
generating the health degree portrait of the data quality by profiling the data tags in terms of statistical tags, rule-based tags and algorithmic tags;
wherein the statistical tags comprise the data usage frequency and the number of table fields, the rule-based tags are generated from metadata information and determined rules, and the algorithmic tags are generated by machine learning algorithm mining and are used for analyzing and judging derived attributes of the data.
5. The method for monitoring power grid dispatching safety based on data labels according to claim 1, wherein the extracting core key data in a power grid system comprises:
collecting source data information of each service system in the power grid system, removing redundant punctuation marks and special characters in the source data information, performing dictionary construction through a public information model text of the power grid system, and then performing word segmentation and word vector training to obtain word vectors of the source data information;
based on model training, first performing named entity recognition on the source data information and then performing entity disambiguation, and verifying and correcting the entity disambiguation results by using a knowledge base to obtain the disambiguation result of the source data information;
establishing a standard word stock of the power grid industry according to the disambiguation result of the source data information and the word vector of the source data information;
and acquiring the core key data based on the power grid industry standard word stock.
6. The power grid dispatching safety monitoring method based on data tags according to claim 5, wherein the acquiring the core key data based on the power grid industry standard word stock specifically comprises:
acquiring the core key data based on the power grid industry standard word stock by one or more of a frequent pattern analysis technique, a network node centrality measurement technique and a network bridge node detection technique.
7. A data tag-based power grid dispatching safety monitoring system, comprising:
the extraction module is used for extracting core key data in the power grid system;
the construction module is used for constructing a data tag library;
the analysis module is used for analyzing the causes and characteristics of data quality problems and data management problems in the power grid system, analyzing and mining the mass data in the power grid system by utilizing a big data technology, constructing a data quality anomaly identification model by utilizing a machine learning technology, and comprehensively evaluating the core key data by utilizing the data tag library to obtain an evaluation tag of the core key data;
and the monitoring module is used for forming a health degree portrait of the data quality through a dynamic label technology, comprehensively evaluating the health degree of the data quality, and monitoring the dispatching safety of the power grid system according to the health degree of the data quality.
8. The data tag-based power grid dispatching safety monitoring system of claim 7, wherein the construction module comprises:
the service unit is used for starting from the core business entities of the power grid system, planning the overall tag system architecture from a business perspective based on the public information model text of the power grid system, and determining the main problems to be solved by the tag system;
the standard unit is used for formulating a data tag definition specification, refining the tag definitions according to the specification, and defining tag business rules;
the development unit is used for preparing a data development environment, connecting the different source data around the core business entities, linking data sources, identifying data types and collecting data, and acquiring source data information from each business system in the power grid system;
the loading unit is used for loading the source data information into a data encapsulation table, or performing data mapping according to upper-layer application requirements and processing the source data information into a data virtual table;
and the construction unit is used for establishing data tags by adopting a suitable definition method according to the complexity of the data tags, based on the data encapsulation table or the data virtual table, and constructing the data tag library.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the power grid dispatching safety monitoring method based on data tags as claimed in any one of claims 1 to 6.
10. A non-transitory computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements a data tag based power grid dispatching safety monitoring method according to any of claims 1 to 6.
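Claim 6 names node centrality measurement and bridge node detection without fixing an algorithm. The following Python sketch shows one possible reading, under stated assumptions: the terms of the word stock are treated as nodes of a co-occurrence graph, centrality is taken to be degree centrality, and a "bridge node" is interpreted as a node whose removal splits the graph into more connected components. The edge list is invented for illustration, and frequent pattern analysis is not shown.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (term, term) co-occurrence pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def _component_count(graph, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen = set()
    count = 0
    for start in graph:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return count


def bridge_nodes(graph):
    """Nodes whose removal increases the number of connected components."""
    base = _component_count(graph)
    return {node for node in graph if _component_count(graph, removed=node) > base}


# Usage with a made-up co-occurrence graph of grid terms.
graph = build_graph([
    ("transformer", "metering point"),
    ("transformer", "secondary circuit"),
    ("metering point", "secondary circuit"),
    ("transformer", "power supply unit"),
    ("power supply unit", "dispatch order"),
])
centrality = degree_centrality(graph)
bridges = bridge_nodes(graph)
print(max(centrality, key=centrality.get), sorted(bridges))
```

Terms that score high on centrality or act as bridges would be candidates for core key data; how the scores are thresholded or combined is left open by the claim and is not assumed here.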
CN202310518939.6A 2023-05-09 2023-05-09 Power grid dispatching safety monitoring method and system based on data tag and electronic equipment Pending CN116739408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310518939.6A CN116739408A (en) 2023-05-09 2023-05-09 Power grid dispatching safety monitoring method and system based on data tag and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310518939.6A CN116739408A (en) 2023-05-09 2023-05-09 Power grid dispatching safety monitoring method and system based on data tag and electronic equipment

Publications (1)

Publication Number Publication Date
CN116739408A true CN116739408A (en) 2023-09-12

Family

ID=87912293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310518939.6A Pending CN116739408A (en) 2023-05-09 2023-05-09 Power grid dispatching safety monitoring method and system based on data tag and electronic equipment

Country Status (1)

Country Link
CN (1) CN116739408A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117949886A (en) * 2024-03-27 2024-04-30 国网山西省电力公司营销服务中心 Intelligent regulation and control method and system for transformer calibrator, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113094200A (en) Application program fault prediction method and device
CN110598070B (en) Application type identification method and device, server and storage medium
CN117271767A (en) Operation and maintenance knowledge base establishing method based on multiple intelligent agents
CN112200465B (en) Electric power AI method and system based on multimedia information intelligent analysis
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN110147540B (en) Method and system for generating business security requirement document
CN112417852B (en) Method and device for judging importance of code segment
CN110175272A (en) One kind realizing the convergent control method of work order and control device based on feature modeling
Kong et al. Entity extraction of electrical equipment malfunction text by a hybrid natural language processing algorithm
CN116739408A (en) Power grid dispatching safety monitoring method and system based on data tag and electronic equipment
CN117592561B (en) Enterprise digital operation multidimensional data analysis method and system
CN114625406A (en) Application development control method, computer equipment and storage medium
CN111930944B (en) File label classification method and device
CN116795978A (en) Complaint information processing method and device, electronic equipment and medium
CN115495587A (en) Alarm analysis method and device based on knowledge graph
CN116522131A (en) Object representation method, device, electronic equipment and computer readable storage medium
CN112561538B (en) Risk model creation method, apparatus, computer device and readable storage medium
CN113407718A (en) Method and device for generating question bank, computer readable storage medium and processor
Wu et al. Template based attribute value words acquisition in entity attribute knowledge base construction
Jin et al. Construction and application of knowledge graph of domestic operating system testing
CN115114495B (en) Airworthiness data management auxiliary method and system based on deep learning
CN113872794B (en) IT operation and maintenance platform system based on cloud resource support and operation and maintenance method thereof
Zhang et al. Predicting Relations in SG-CIM Model Based on Graph Structure and Semantic Information
CN116757525A (en) Intelligent correction method and system for power grid data quality based on data portraits
CN117314568A (en) Method, device and storage medium for analyzing value-added demand of power customer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination