CN112905804A - Dynamic updating method and device for power grid dispatching knowledge graph - Google Patents

Dynamic updating method and device for power grid dispatching knowledge graph

Info

Publication number
CN112905804A
Authority
CN
China
Prior art keywords
power grid
entity
data
grid dispatching
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110196210.2A
Other languages
Chinese (zh)
Other versions
CN112905804B (en)
Inventor
旷文腾
严晴
李红
张韬
谢峰
陆继翔
杨志宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
State Grid Electric Power Research Institute
Original Assignee
Nari Technology Co Ltd
State Grid Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, State Grid Electric Power Research Institute filed Critical Nari Technology Co Ltd
Priority to CN202110196210.2A priority Critical patent/CN112905804B/en
Publication of CN112905804A publication Critical patent/CN112905804A/en
Application granted granted Critical
Publication of CN112905804B publication Critical patent/CN112905804B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dynamic updating method and device for a power grid dispatching knowledge graph, which solve the problem of keeping the power grid dispatching knowledge graph synchronized with a large amount of newly added power grid dispatching knowledge. The method comprises the following steps: first, the data that needs to be updated in power grid dispatching is unified into the Json file format; then, Chinese word segmentation is performed on the sentences using a word segmentation package combined with the domain entity dictionary; next, power grid entity words are recognized with a named entity recognition model based on RoBERTa_base_e-BiLSTM-CRF, and each entity is equivalently mapped to a standard word of the power grid core dictionary; then, the trained relation recognition model is used to recognize the relations between entities, and triples are generated and checked; finally, the generated triples are updated into the knowledge graph. The method keeps the dispatching optimization decision graph flexible, adaptable and up to date, and helps share and pass on the dispatching knowledge and experience accumulated over a long time in the field of regulation and control decisions.

Description

Dynamic updating method and device for power grid dispatching knowledge graph
Technical Field
The invention belongs to the technical field of power grid dispatching, and particularly relates to a dynamic updating method and device of a power grid dispatching knowledge graph.
Background
Active power dispatching of a power system is the basis for safe and efficient system operation and comprises three stages: day-ahead scheduling, intraday scheduling and real-time control. In the day-ahead and intraday stages, the objective is generally the operating economy over an optimization cycle: based on new energy and load forecasts, a multi-period unit start-stop plan and a multi-period generation plan are formulated through Security-Constrained Unit Commitment (SCUC) and Security-Constrained Economic Dispatch (SCED) to balance supply and demand, and the process follows an optimization modeling approach. Under the current energy transition and electricity market reform, as the penetration of renewable energy, flexible loads, energy storage and similar resources keeps rising, the types and number of power grid dispatching objects grow exponentially, the uncertainty of the grid operation mode increases markedly, and dispatching optimization decisions become more complex. Constrained by prediction errors, boundary conditions, mathematical models and optimization algorithms, actual dispatching often suffers from analysis results that deviate substantially from real grid conditions, optimization problems with no solution, or excessively long solving times. In a market environment, inaccurate new energy and load forecasts are unavoidable (new energy prediction errors reach 30%-50%), and the above constraints persist, so a large amount of manual adjustment is needed before and after software optimization.
Taking the Ningxia power grid as an example: in recent years, driven by photovoltaic price adjustments and the Ningxia wind power early warning turning "green", new energy capacity in Ningxia has grown substantially, and some grid sections have operated close to their stability limits for long periods. With winter heating, maintenance and similar issues also to be considered, power grid dispatching is no longer a simple multi-objective optimization calculation but a process of manual re-analysis, adjustment and verification based on the results of the dispatching software. This manual decision-making usually takes a long time and is inefficient, and the complexity of optimal dispatching decisions for the power system has risen sharply.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method and a device for dynamically updating a power grid dispatching knowledge graph, addressing the technical problem that dispatching systems in the prior art are insufficiently intelligent.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An embodiment of the invention provides a dynamic updating method for a power grid dispatching knowledge graph, comprising:
parsing the data that needs to be updated in power grid dispatching and converting it into a uniform format;
extracting the power grid dispatching data in the uniform format and performing Chinese word segmentation;
recognizing entity words in the segmented power grid dispatching data with a trained named entity recognition model, and performing entity extraction;
extracting the relationships between the entities with a trained entity relationship recognition model to generate entity-relationship-entity triples;
and updating the power grid dispatching knowledge graph based on the generated entity-relationship-entity triples.
Further, parsing the data that needs to be updated in power grid dispatching and converting it into a uniform format comprises:
for structured data derived from a power grid real-time database, directly generating entity-relationship-entity triples by regularized extraction and storing them in the knowledge graph, the structured data including load prediction data, new energy prediction data and safety constraint sections;
for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a Json file, the unstructured text data including tie line plans, maintenance plans, power grid operation modes and power grid abnormal events.
Further, extracting the power grid dispatching data in the uniform format and performing Chinese word segmentation comprises:
constructing a domain entity dictionary;
and adding the domain entity dictionary to the jieba dictionary, and performing Chinese word segmentation on the power grid dispatching data.
Further, the domain entity dictionary comprises a core dictionary and an expansion dictionary; the core dictionary contains standard words of power grid dispatching knowledge, and the expansion dictionary contains newly added non-standard power grid words;
constructing the domain entity dictionary comprises:
automatically indexing power grid entities, scheduling events and power grid attributes using named entity recognition based on a deep neural network;
classifying the power grid entities into substations, power plants, lines, main transformers, buses, switches, circuit breakers and units, using relevance classification combined with the power business model;
checking and matching the classified power grid entities against the core dictionary;
and submitting abnormal check results for manual review, and constructing the expansion dictionary.
Further, training the named entity recognition model comprises:
collecting original power grid dispatching text data and segmenting it into a corpus;
labeling the segmented corpus as power grid equipment, scheduling event, power grid attribute or non-entity (no real meaning);
dividing the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;
and training on the training set, based on the open source Chinese pre-trained model RoBERTa_base, to construct the named entity recognition model.
Further, collecting original power grid dispatching text data and segmenting it into a corpus comprises:
using ",", ".", ";", ":", "(" and ")" as separators to cut the text data into blocks and generate the corpus.
Further, labeling the segmented corpus as power grid equipment, scheduling event, power grid attribute or non-entity comprises:
labeling the corpus with BIO tags, taking a single Chinese character or English word as the unit, where:
the tags take three forms: B-X, I-X and O;
B and I indicate the position of the labeled unit within an entity, B marking the beginning and I the middle; X is the entity symbol, which is DD for power grid equipment, DE for a scheduling event and DA for a power grid attribute; O marks a non-entity.
Further, performing entity extraction comprises:
mapping the entity words recognized by the named entity recognition model one by one to the entity words in the domain entity dictionary, and extracting the corresponding entity words from the domain entity dictionary.
Further, training the entity relationship recognition model comprises:
labeling the relationships of entities in the accumulated power grid dispatching plan texts based on the predetermined global relationships;
and training a convolutional neural network on the labeled entity relationships to obtain the entity relationship recognition model.
An embodiment of the invention also provides a device for dynamically updating a power grid dispatching knowledge graph, comprising:
a parsing module for parsing the data that needs to be updated in power grid dispatching and converting it into a uniform format;
a word segmentation module for extracting the power grid dispatching data in the uniform format and performing Chinese word segmentation;
a recognition module for recognizing entity words in the segmented power grid dispatching data with a trained named entity recognition model and extracting the entities;
a relation extraction module for extracting the relationships between the entities with the trained entity relationship recognition model to generate entity-relationship-entity triples;
and
an updating module for updating the power grid dispatching knowledge graph based on the generated entity-relationship-entity triples.
Further, the parsing module is specifically configured for:
for structured data derived from a power grid real-time database, directly generating entity-relationship-entity triples by regularized extraction and storing them in the knowledge graph, the structured data including load prediction data, new energy prediction data and safety constraint sections;
for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a Json file, the unstructured text data including tie line plans, maintenance plans, power grid operation modes and power grid abnormal events.
Further, the recognition module is further configured for:
collecting original power grid dispatching text data and segmenting it into a corpus;
labeling the segmented corpus as power grid equipment, scheduling event, power grid attribute or non-entity (no real meaning);
dividing the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;
and training on the training set, based on the open source Chinese pre-trained model RoBERTa_base, to construct the named entity recognition model.
Further, the relation extraction module is further configured for:
labeling the relationships of entities in the accumulated power grid dispatching plan texts based on the predetermined global relationships;
and training a convolutional neural network on the labeled entity relationships to obtain the entity relationship recognition model.
The invention has the beneficial effects that:
The invention provides a dynamic updating method for a power grid dispatching knowledge graph. The data that needs to be updated in power grid dispatching is unified into the Json file format; Chinese word segmentation is performed based on a domain entity dictionary; entity recognition is performed with a RoBERTa_base_e-BiLSTM-CRF model; entity relationships are extracted with a deep learning model; and entity-relationship-entity triples are finally generated and used to update the power grid dispatching knowledge graph. This solves the problem of automatically keeping the power grid dispatching knowledge graph synchronized with a large amount of newly added dispatching knowledge, keeps the dispatching optimization decision graph flexible, adaptable and up to date, and facilitates sharing and passing on the dispatching knowledge and experience accumulated over a long time in the field of regulation and control decisions.
Drawings
FIG. 1 is a schematic diagram of a dynamic updating method for a power grid dispatching knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of topic dictionary construction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an entity equivalence mapping process according to an embodiment of the present invention.
Detailed Description
The invention is further described below. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The embodiment of the invention provides a dynamic updating method of a power grid dispatching knowledge graph, which is shown in figure 1 and comprises the following 5 steps:
(1) Parsing the source data: useful information is parsed from the data that needs to be updated in power grid dispatching and uniformly processed into the Json file format.
Dispatching optimization decisions involve multiple kinds of heterogeneous data, such as load prediction data, new energy prediction data, tie line plans, maintenance plans, power grid operation modes, safety constraint sections and power grid abnormal events. Of these, load prediction data, new energy prediction data and safety constraint sections are structured data, while tie line plans, maintenance plans, power grid operation modes and power grid abnormal events are unstructured text data.
Structured data can be derived from the power grid real-time database; it is extracted regularly, converted directly into triples and stored in the knowledge graph.
For unstructured text data, the document contents are traversed by means of data cleaning and interface conversion, and each document is then divided according to its chapter structure, so that the docx documents are converted into Json files and the problem of non-uniform original data formats is resolved.
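The chapter-based conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it takes already extracted text lines instead of reading a docx file, and the heading pattern `HEADING`, the function name `document_to_json` and the output field names are all hypothetical.

```python
import json
import re

# Assumption: chapter headings look like "1 Title" or "2.1 Title".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def document_to_json(lines):
    """Group cleaned text lines under their nearest preceding chapter heading."""
    sections = []
    current = None
    for raw in lines:
        line = raw.strip()      # data cleaning: drop surrounding whitespace
        if not line:
            continue            # data cleaning: skip empty lines
        match = HEADING.match(line)
        if match:
            current = {"number": match.group(1), "title": match.group(2), "text": []}
            sections.append(current)
        elif current is not None:
            current["text"].append(line)   # text before the first heading is ignored
    return json.dumps({"sections": sections}, ensure_ascii=False)

sample = [
    "1 Maintenance plan",
    "  Line A is out of service for maintenance. ",
    "",
    "2 Operation mode",
    "Substation B runs in split-bus mode.",
]
result = json.loads(document_to_json(sample))
```

Keeping one JSON object per chapter preserves the document structure for the later segmentation and recognition steps.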
(2) For the extracted power grid dispatching data, Chinese word segmentation is performed on the sentences using jieba word segmentation combined with the power grid domain dictionary.
The domain entity dictionary comprises a core dictionary and an expansion dictionary. The core dictionary contains standard words of power grid dispatching knowledge, such as electric power terms and power grid models. The expansion dictionary contains newly added non-standard power grid words, such as aliases or combined words of core dictionary words; for example, the combined word "Hades and Huang 4681/82 double lines" stands for "Hades and Huang 4681 line" and "Hades and Huang 4682 line".
The input sentence is first segmented with jieba in precise mode, and the domain entity dictionary is added to the jieba dictionary so that entity words of the power grid domain are not split incorrectly.
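The effect of adding domain words to the segmentation dictionary can be illustrated with a small dictionary-based segmenter. This sketch uses forward maximum matching as a simple stand-in and is not jieba's algorithm; the function name `segment` and the sample words are hypothetical illustrations.

```python
def segment(sentence, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical vocabulary: without the domain word, the station name is split apart.
general = {"东", "桥", "变", "检修"}
domain = general | {"东桥变"}

without_domain = segment("东桥变检修", general)
with_domain = segment("东桥变检修", domain)
```

With the general vocabulary the station name falls apart into single characters, while the domain entry keeps it as one token, which is the behaviour the domain entity dictionary is meant to secure.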
The overall approach to constructing the domain entity dictionary is shown in FIG. 2 and includes the following:
named entity recognition based on a deep neural network is used to automatically index power grid entities, events and attributes;
relevance classification combined with the power business model is used to classify the power grid entity words into 8 concepts: substations, power plants, lines, main transformers, buses, switches, circuit breakers and units;
correctness checking based on the core dictionary is used to check and match the power grid entity words;
finally, abnormal check results are submitted for manual review, and the words not in the core dictionary are used to build the expansion dictionary.
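The check-and-review step above can be sketched as a lookup against the core dictionary, with everything that fails the check queued for a human reviewer. The function names, the dictionary layout (word to category) and the sample entries are assumptions made for illustration only.

```python
def check_against_core(candidates, core_dictionary):
    """Split (word, category) candidates into core matches and items for manual review."""
    matched, for_review = [], []
    for word, category in candidates:
        if core_dictionary.get(word) == category:
            matched.append(word)
        else:
            for_review.append((word, category))   # abnormal result: unknown word or wrong class
    return matched, for_review

def build_expansion(reviewed):
    """reviewed: (non_standard_word, core_words) pairs approved by a human reviewer."""
    return {word: list(core_words) for word, core_words in reviewed}

core = {"Xindu 5115 line": "line", "Xindu 5116 line": "line"}
candidates = [("Xindu 5116 line", "line"), ("Xindu 5115/5116 line", "line")]
matched, for_review = check_against_core(candidates, core)

# Assumption: the reviewer confirms the combined word stands for two core lines.
expansion = build_expansion([("Xindu 5115/5116 line", ["Xindu 5115 line", "Xindu 5116 line"])])
```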
(3) Automatic identification and extraction of scheduling entities based on the BERT semantic model
For the processed data in the uniform format, scheduling entities are extracted around the clock, automatically and with high precision. The specific steps are as follows:
(31) The power grid entity words are recognized with the trained named entity recognition model.
The method uses a RoBERTa_base-BiLSTM-CRF model for named entity recognition, where RoBERTa_base is an open source model trained on a large-scale general Chinese corpus, BiLSTM is a bidirectional long short-term memory network, and CRF is a conditional random field. Entity recognition is performed on the segmented words with the trained named entity recognition model built on RoBERTa_base_e. Entity recognition in the dispatching domain mainly recognizes standard entities of the power dispatching domain, classified as power grid equipment, scheduling events, power grid attributes and non-entities (no real meaning), labeled DD, DE, DA and O respectively. Power grid equipment can be further subdivided into substations, power plants, lines, main transformers, buses, switches, circuit breakers and units.
To better suit the understanding of power grid dispatching text, transfer learning is performed on RoBERTa_base with a large power corpus: the parameters of the RoBERTa_base model are fine-tuned to train a language model, RoBERTa_base_e, adapted to the power corpus.
The steps for training RoBERTa_base_e are as follows:
a. Original corpus segmentation: the original text data is read manually, sentence cutting rules are summarized, and the original text is then segmented. In this embodiment, punctuation marks such as ".", ";", ":", "(" and ")" are used as separators to segment the original text and generate the corpus.
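The separator-based cutting in step a can be sketched with a regular expression. This is a minimal illustration; the separator set below follows the list given in the claims and includes both full-width and ASCII forms, which is an assumption, as is the function name `cut_corpus`.

```python
import re

# Assumption: full-width and ASCII variants of the separators are both treated as cut points.
SEPARATORS = "，。；：（）,.;:()"
_SPLIT = re.compile("[" + re.escape(SEPARATORS) + "]")

def cut_corpus(text):
    """Cut raw text into corpus blocks at every separator, dropping empty blocks."""
    return [block.strip() for block in _SPLIT.split(text) if block.strip()]

blocks = cut_corpus("Line A tripped; unit B (standby) started: load restored.")
```

A plain character-class split also cuts inside decimals or abbreviations, so a production rule set would probably need exceptions for such cases.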
b. Corpus labeling: the recognition targets are 10 types: substation, power plant, line, main transformer, bus, switch, circuit breaker, unit, scheduling event and power grid attribute. To ensure the recognition precision of the trained model, substations, power plants, lines, main transformers, buses, switches, circuit breakers and units are grouped as power grid equipment, so that the labels are power grid equipment, scheduling event, power grid attribute and non-entity (no real meaning), with the corresponding symbols DD (power grid equipment), DE (scheduling event), DA (power grid attribute) and O (no real meaning). The basic corpus is labeled with BIO tags, taking a single Chinese character or English word as the unit. With X denoting the entity symbol, the three BIO labels are: ① B-X: entity start; ② I-X: entity middle; ③ O: non-entity, a word with no real meaning. B and I indicate the position of the labeled unit within the entity word, B marking the beginning and I the middle.
The three categories correspond to the following labels:
Category             Start label   Middle label
Power grid entity    B-DD          I-DD
Scheduling events    B-DE          I-DE
Grid properties      B-DA          I-DA
Each character in a sentence is labeled. For example, for the three-character power grid entity rendered here as "east bridge becomes", the first character ("east") is labeled B-DD, and the second and third characters ("bridge" and "becomes") are both labeled I-DD.
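The BIO scheme above can be sketched as a function that turns entity spans into per-unit tags. The function name `bio_tags` and the span format (start index, exclusive end index, entity symbol) are assumptions for illustration; the English glosses stand in for single Chinese characters.

```python
def bio_tags(units, entities):
    """Return one BIO tag per unit; entities are (start, end_exclusive, symbol) spans."""
    tags = ["O"] * len(units)            # everything outside an entity is a non-entity
    for start, end, symbol in entities:
        tags[start] = "B-" + symbol      # first unit of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + symbol      # remaining units of the entity
    return tags

# Glosses of the three characters of the example entity, followed by a non-entity unit.
units = ["east", "bridge", "becomes", "today"]
tags = bio_tags(units, [(0, 3, "DD")])
```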
c. Named entity recognition model training
First, transfer learning is performed on RoBERTa_base with a large power corpus: the parameters of the RoBERTa_base model are fine-tuned to train a language model, RoBERTa_base_e, adapted to the power corpus. RoBERTa_base is an open source model trained on a large-scale general Chinese corpus, and the power corpus comes from the prepared domain entity dictionary.
Then, the labeled corpus is divided into a training set, a validation set and a test set at a ratio of 8:1:1. The language model RoBERTa_base_e is used to understand the power corpus, and the named entity recognition model RoBERTa_base_e-BiLSTM-CRF is trained on the training set.
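The 8:1:1 division can be sketched as a shuffled split. The shuffling, the fixed seed and the function name `split_corpus` are assumptions; the source only states the ratio.

```python
import random

def split_corpus(samples, seed=0):
    """Shuffle the labeled samples and split them 8:1:1 into train, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed keeps the split reproducible
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # the remainder goes to the test set
    return train, val, test

train, val, test = split_corpus(range(100))
```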
(32) The recognized power grid entity words are equivalently mapped to standard words of the power grid core dictionary to complete entity extraction for the updated data.
The purpose of entity equivalence mapping is to map the entity words found by named entity recognition one by one to the entity words of the power grid dispatching domain dictionary, resolving the problem that the same entity appears under inconsistent names. For the 8 types of power grid equipment, a total of 12 mapping rules are proposed.
The 12 rules are as follows:
1. main transformers, bays, circuit breakers, buses, switches and disconnectors are first located to a plant or station, and poles and towers are located to a line;
2. mapping first locates the station/line, and then maps to the specific equipment according to the identifier (#1, #2), the equipment type and the controlled object (according to the association relationship of the controlled object);
3. station distinguishing rule: the word contains one of the following: transformation (excluding main transformer), wind power, nuclear power, power plant, factory, gas turbine, machine;
4. line distinguishing rule: according to the keyword at the end of the word: line, double line, triple line;
5. words containing "#" are split at the "#", and the part before the "#" is the station/line;
6. fixed pattern for a single line: "Wu Bao 2W73 line", "Xindu 5116 line";
7. three patterns for a double line: "Wu Bao 2W73/74 line", "Xindu 5115/5116 line", "Zhou Zhuang-ren Zhuang double line";
8. patterns for a triple line: "Wu Bao 2W73/74/75 line", "Xindu 5114/5115/5116 line", "Zhou Zhuang-ren Zhuang three line";
9. a substation name takes three forms: full name, short name, and full process with path;
10. a place name can be mapped to a station according to semantics;
11. when mapping to specific equipment, the equipment identifier takes forms such as #1, #2 (#1, 2) (No. 1);
12. bus and main transformer names carry voltage level information, which can be used in processing.
Taking the electrical defect log as an example, as shown in FIG. 3, three ways of writing a double line appear in the texts: "Hades and the Hades 4681/4682", "Hades and the yellow 4681/82", and "Hades and the yellow". Because the type of each power grid entity word is already known from the preceding steps, the line rules summarized above are applied. Identifying the line pattern determines whether the name denotes a single, double or triple line, and a double line is resolved into two lines. The combined abbreviation of the two corresponding stations is found by identifying the combined-name rule and is matched by word meaning. The two associated lines are then found with the help of the knowledge graph and are finally mapped to the core words "Jiangsu, the yellow and the Hades 4681 line" and "Jiangsu, the yellow and the".
New entity words that have been mapped to the core vocabulary are added to the expansion dictionary after mapping, together with their mapping to the corresponding core words, which makes subsequent entity mapping easier.
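The numeric line patterns in rules 6 to 8 can be sketched as a rule that expands a combined name into its single-line names. This is an illustrative sketch, not the patented rule set: the regular expression `LINE`, the function name `expand_line_name` and the suffix-completion logic are assumptions, and name-only forms such as "Zhou Zhuang-ren Zhuang double line" are left unresolved because they need the knowledge graph lookup described above.

```python
import re

# Assumption: "<name> <code>[/<code>...] line", with codes made of digits and capital letters.
LINE = re.compile(r"^(.+?) ([0-9A-Z]+)((?:/[0-9A-Z]+)*) line$")

def expand_line_name(text):
    """Expand 'Wu Bao 2W73/74 line' into one standard name per line; [] if no rule applies."""
    match = LINE.match(text)
    if match is None:
        return []
    name, first, rest = match.groups()
    codes = [first]
    for suffix in rest.split("/")[1:]:
        # A shortened code such as "74" replaces the tail of the first code "2W73".
        keep = max(0, len(first) - len(suffix))
        codes.append(first[:keep] + suffix)
    return [f"{name} {code} line" for code in codes]

double = expand_line_name("Wu Bao 2W73/74 line")
triple = expand_line_name("Xindu 5114/5115/5116 line")
single = expand_line_name("Xindu 5116 line")
unresolved = expand_line_name("Zhou Zhuang-ren Zhuang double line")

# The alias and its core words can then be recorded in the expansion dictionary.
expansion = {"Wu Bao 2W73/74 line": double}
```

The number of names returned distinguishes single, double and triple lines.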
(4) Deep learning model-based extraction of relationships between scheduling entities
The relationships between entities are recognized with the trained relation recognition model, and entity-relationship-entity triples are generated and checked:
a. all relation pairs in the scheduling plan text are extracted, and the global relationships are summarized;
b. based on the conclusions about the global entity relationships, relationships between scheduling entities are established with a relation recognition method for the entities in the scheduling plan text;
c. the relationships of the scheduling entities are checked against the knowledge graph, which improves the reliability of relation extraction.
The global relation extraction method for scheduling plan text data is based on natural language processing technology. It combines morphological and semantic features to identify the relations between entities in a corpus automatically, without relation types having to be defined manually in advance. The main process is as follows:
(1) candidate relation triples are first acquired automatically, using limits on the distance between entities and on the position of relation indicator words; a naive Bayes classifier is then trained on relation triples labeled as credible or not credible to construct a relation representation model;
(2) relation identification is performed with the trained classifier, using the relation representation model together with features such as part of speech and sequence, to obtain candidate relation triples;
(3) the candidate relation triples are merged, the reliability of each relation triple is calculated by a statistical method, and the result is checked manually;
(4) the relation type of each triple is determined manually, and the global relations are summarized.
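Step (1) can be illustrated with a minimal sketch of candidate generation. The function name `candidate_triples`, the `max_gap` parameter and the sample tokens are assumptions made for illustration; the patent does not state the actual distance limit, and the position limit is interpreted here as "the indicator word lies between the two entities". The naive Bayes credibility classifier is not covered.

```python
def candidate_triples(tokens, entity_idx, indicators, max_gap=5):
    """Collect (entity, indicator, entity) candidates from one segmented sentence.

    tokens      -- list of words after Chinese word segmentation
    entity_idx  -- sorted positions of the tokens recognized as entities
    indicators  -- set of relation indicator words
    max_gap     -- assumed limit on the number of tokens between two entities
    """
    triples = []
    for a, i in enumerate(entity_idx):
        for j in entity_idx[a + 1:]:
            if j - i - 1 > max_gap:
                break  # entities further right are even farther away
            # Position limit (assumed): the indicator must sit between the entities.
            for k in range(i + 1, j):
                if tokens[k] in indicators:
                    triples.append((tokens[i], tokens[k], tokens[j]))
    return triples


# Hypothetical, already segmented sentence.
tokens = ["main_transformer_1", "is", "connected_to", "bus_2", "today", "at", "station_3"]
found = candidate_triples(tokens, [0, 3, 6], {"connected_to", "at"}, max_gap=2)
```

With `max_gap=2` the pair at positions 0 and 6 is rejected as too distant, so only adjacent entity pairs contribute candidates.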
Based on the known set of global entity relations, a relation identification method for the scheduling entities in the scheduling plan text is studied, and the relations between the scheduling entities are established. The relation identification method is supervised and mainly comprises the following steps:
labeling the entity relations in the accumulated scheduling plan texts based on the global relations;
training a relation recognition model based on a convolutional neural network on the labeled corpus;
using the trained relation recognition model to recognize the relations between scheduling entities in unlabeled text.
Finally, the recognized entity relations are checked against the knowledge graph to ensure that they are consistent. The extracted relation type is first checked against the relation set of the concept map; if it does not exist there, manual review is prompted. If the extracted relation type does exist in the relation set of the concept map, the matching entities are searched for in the graph by their semantic features, and if the relation between the matched entities is inconsistent with the extracted one, manual review is prompted.
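The two-stage check described above can be sketched as follows. The function name `check_triple`, the return strings and the sample data are hypothetical, and exact name equality stands in for the semantic-feature matching of entities, which the text does not specify in detail.

```python
def check_triple(triple, schema_relations, graph_triples):
    """Check one extracted triple against the concept-level relation set and the graph.

    Returns "ok" or a message asking for manual review.
    """
    head, relation, tail = triple
    # Stage 1: the relation type must exist in the relation set of the concept map.
    if relation not in schema_relations:
        return "review: unknown relation type"
    # Stage 2: if the same entity pair is already in the graph, the relation must agree.
    # Assumption: entities are matched by exact name rather than semantic features.
    for h, r, t in graph_triples:
        if h == head and t == tail and r != relation:
            return "review: conflicts with graph"
    return "ok"


# Hypothetical relation set and graph content.
schema = {"connects", "belongs_to"}
graph = [("line_1", "connects", "station_a")]
```

Triples that return "ok" can proceed to the update step, while the others are queued for a human reviewer.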
(5) Updating the power grid dispatching knowledge graph based on triples
After the check, the entity-relationship-entity or entity-relationship-attribute triples are generated as the metadata for the graph update and stored in the previously constructed knowledge graph.
The storage database is the open-source database Neo4j.
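As a rough sketch of how a checked triple could be written to Neo4j, the helper below builds an idempotent Cypher `MERGE` statement. The node label `Entity`, the property `name` and the function name `merge_statement` are assumptions, not part of the patent. Because a relationship type cannot be passed as an ordinary Cypher query parameter, it is validated and interpolated into the query text, while the entity names travel as parameters.

```python
import re

# Assumed restriction: relationship types are plain ASCII identifiers.
_REL_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_statement(head, relation, tail):
    """Return (query, parameters) that upserts one entity-relation-entity triple."""
    if not _REL_TYPE.match(relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triple; with the official Python driver the pair would typically be
# passed to session.run(query, params).
query, params = merge_statement("line_1", "CONNECTS", "station_a")
```

Using `MERGE` rather than `CREATE` means that re-running the update with the same triple does not duplicate nodes or relationships, which suits an incremental update.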
The embodiment of the invention also provides a device for dynamically updating the power grid dispatching knowledge graph, which comprises:
the analysis module is used for analyzing the data needing to be updated in the power grid dispatching and converting the data into a uniform format;
the word segmentation module is used for extracting the power grid dispatching data with the uniform format and carrying out Chinese word segmentation;
the recognition module is used for recognizing entity words in the power grid dispatching data after Chinese word segmentation by adopting a trained named entity recognition model and extracting the entities;
the relation extraction module is used for extracting the relation between the entities by adopting the trained entity relation recognition model to generate an entity-relation-entity triple;
and the updating module is used for updating the power grid dispatching knowledge graph based on the generated entity-relation-entity triples.
Specifically, the analysis module is used for:
for structured data derived from a power grid real-time database, directly generating entity-relation-entity triples to be stored in the knowledge graph by adopting a regularized extraction mode; the structured data includes: load prediction data, new energy prediction data and safety constraint sections;
for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a JSON file; the unstructured text data includes: tie line plans, maintenance plans, power grid operation modes and power grid abnormal events.
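The conversion of unstructured text into a uniform JSON format can be sketched as below. The sketch assumes the paragraphs have already been read from the docx file (for example with a library such as python-docx) and that chapter headings start with a number such as "2.1"; the pattern `HEADING`, the function name `paragraphs_to_json` and the JSON layout are all hypothetical.

```python
import json
import re

# Assumed heading convention: "1 Title", "2.1 Title", and so on.
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")


def paragraphs_to_json(paragraphs):
    """Clean paragraphs, group them by chapter heading and serialize to JSON."""
    chapters = []
    current = {"title": "", "content": []}
    for text in paragraphs:
        text = text.strip()          # data cleaning: trim whitespace
        if not text:
            continue                 # data cleaning: drop empty paragraphs
        if HEADING.match(text):
            if current["title"] or current["content"]:
                chapters.append(current)
            current = {"title": text, "content": []}
        else:
            current["content"].append(text)
    if current["title"] or current["content"]:
        chapters.append(current)
    # ensure_ascii=False keeps Chinese text readable in the output file.
    return json.dumps({"chapters": chapters}, ensure_ascii=False)


# Hypothetical document content.
sample = ["1 Tie line plan", "  text a ", "", "2 Maintenance plan", "text b"]
document_json = paragraphs_to_json(sample)
```

Keeping one JSON object per chapter lets the later word segmentation and entity recognition steps process each chapter independently.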
In particular, the recognition module is further configured to:
collect power grid dispatching original text data and perform corpus segmentation;
label the segmented corpora as power grid equipment, scheduling event, power grid attribute or meaningless (non-entity) text;
divide the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;
train on the training set based on the Chinese pre-trained open-source model RoBERTa_base to construct the named entity recognition model.
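The 8:1:1 division can be sketched as follows. Shuffling with a fixed seed is an assumption added for reproducibility; the text only states the ratio, and the function name `split_corpus` is hypothetical.

```python
import random


def split_corpus(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle the labeled samples and split them into train/validation/test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # assumed: random split with a fixed seed
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


# Hypothetical corpus of 100 labeled sentences, represented by their indices.
train_set, val_set, test_set = split_corpus(range(100))
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one subset even when the corpus size is not a multiple of ten.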
In particular, the relation extraction module is further configured to:
label the relations between entities in the accumulated power grid dispatching plan texts based on the predetermined global relations;
and train a convolutional neural network on the labeled entity relations to obtain the entity relation recognition model.
It is to be noted that the apparatus embodiment corresponds to the method embodiment, and the implementation manners of the method embodiment are all applicable to the apparatus embodiment and can achieve the same or similar technical effects, so that the details are not described herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (13)

1. A power grid dispatching knowledge graph dynamic updating method is characterized by comprising the following steps:
analyzing data needing to be updated in power grid dispatching, and converting the data into a uniform format;
extracting the power grid dispatching data with the uniform format, and performing Chinese word segmentation;
recognizing entity words in the power grid dispatching data after Chinese word segmentation by adopting a trained named entity recognition model, and performing entity extraction;
extracting the relationship between the entities by adopting a trained entity relationship identification model to generate an entity-relationship-entity triple;
and updating the power grid dispatching knowledge graph based on the generated entity-relation-entity triples.
2. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein the analyzing data to be updated in power grid dispatching and converting the data into a uniform format comprises:
for structured data derived from a power grid real-time database, directly generating entity-relation-entity triples to be stored in a knowledge graph by adopting a regularized extraction mode; the structured data includes: load prediction data, new energy prediction data and a safety constraint section;
for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a JSON file; the unstructured text data includes: tie line plans, maintenance plans, power grid operation modes and power grid abnormal events.
3. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein the step of extracting the power grid dispatching data with the uniform format to perform Chinese word segmentation comprises the following steps:
constructing a domain entity dictionary;
and adding the domain entity dictionary into the jieba dictionary, and performing Chinese word segmentation on the power grid dispatching data.
4. The power grid dispatching knowledge graph dynamic updating method according to claim 3, wherein the domain entity dictionary comprises a core dictionary and an expansion dictionary, the core dictionary comprises standard words of power grid dispatching knowledge, and the expansion dictionary comprises expanded non-standard words of a power grid;
the constructing of the domain entity dictionary comprises the following steps:
adopting named entity recognition based on a deep neural network to automatically index a power grid entity, a scheduling event and power grid attributes;
classifying the power grid entity according to a transformer substation, a power plant, a circuit, a main transformer, a bus, a switch, a circuit breaker and a unit based on the relevance classification combined with the power business model;
verifying and matching the classified power grid entities based on a core dictionary;
and submitting the checked abnormal result to manual examination, and constructing an expansion dictionary.
5. The method for dynamically updating the power grid dispatching knowledge graph according to claim 3, wherein training the named entity recognition model comprises:
collecting power grid dispatching original text data and performing corpus segmentation;
labeling the segmented corpora as power grid equipment, scheduling event, power grid attribute or meaningless (non-entity) text;
dividing the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;
training on the training set based on the Chinese pre-trained open-source model RoBERTa_base to construct the named entity recognition model.
6. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the collecting of the power grid dispatching original text data and the performing of the corpus segmentation comprises:
using ",", ".", ";", ":", "(" and ")" as separators to cut the text data into blocks and generate the corpus.
7. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the labeling of the segmented corpora comprises:
labeling the corpus with BIO tags, taking one Chinese character or one English word as the unit, the tags taking three forms: B-X, I-X and O;
wherein B and I indicate the position of the labeled unit, B marking the beginning and I marking the middle; X is the entity symbol, which is DD for power grid equipment, DE for scheduling events and DA for power grid attributes; and O marks a non-entity.
8. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the performing entity extraction comprises:
and mapping the entity words identified by the named entity identification model with the entity words in the domain entity dictionary one by one, and extracting the corresponding entity words in the domain entity dictionary.
9. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein training the entity relationship recognition model comprises:
carrying out relation labeling on entities in the accumulated power grid dispatching plan text based on a predetermined global relation;
and training the labeled entity relationship based on the convolutional neural network to obtain an entity relationship recognition model.
10. A dynamic updating device of a power grid dispatching knowledge graph is characterized by comprising:
the analysis module is used for analyzing the data needing to be updated in the power grid dispatching and converting the data into a uniform format;
the word segmentation module is used for extracting the power grid dispatching data with the uniform format and carrying out Chinese word segmentation;
the recognition module is used for recognizing entity words in the power grid dispatching data after Chinese word segmentation by adopting a trained named entity recognition model and extracting the entities;
the relation extraction module is used for extracting the relation between the entities by adopting the trained entity relation recognition model to generate an entity-relation-entity triple;
and the updating module is used for updating the power grid dispatching knowledge graph based on the generated entity-relation-entity triples.
11. The dynamic updating device of the power grid dispatching knowledge graph according to claim 10, wherein the analysis module is specifically configured for:
for structured data derived from a power grid real-time database, directly generating entity-relation-entity triples to be stored in the knowledge graph by adopting a regularized extraction mode; the structured data includes: load prediction data, new energy prediction data and safety constraint sections;
for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a JSON file; the unstructured text data includes: tie line plans, maintenance plans, power grid operation modes and power grid abnormal events.
12. The dynamic updating device of the power grid dispatching knowledge graph according to claim 10, wherein the recognition module is further configured to:
collect power grid dispatching original text data and perform corpus segmentation;
label the segmented corpora as power grid equipment, scheduling event, power grid attribute or meaningless (non-entity) text;
divide the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;
train on the training set based on the Chinese pre-trained open-source model RoBERTa_base to construct the named entity recognition model.
13. The dynamic updating device of the power grid dispatching knowledge graph according to claim 10, wherein the relation extraction module is further configured to:
label the relations between entities in the accumulated power grid dispatching plan texts based on the predetermined global relations;
and train a convolutional neural network on the labeled entity relations to obtain the entity relation recognition model.
CN202110196210.2A 2021-02-22 2021-02-22 Dynamic updating method and device for power grid dispatching knowledge graph Active CN112905804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110196210.2A CN112905804B (en) 2021-02-22 2021-02-22 Dynamic updating method and device for power grid dispatching knowledge graph

Publications (2)

Publication Number Publication Date
CN112905804A (en) 2021-06-04
CN112905804B (en) 2022-08-26

Family

ID=76124250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110196210.2A Active CN112905804B (en) 2021-02-22 2021-02-22 Dynamic updating method and device for power grid dispatching knowledge graph

Country Status (1)

Country Link
CN (1) CN112905804B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061841A (en) * 2019-12-19 2020-04-24 京东方科技集团股份有限公司 Knowledge graph construction method and device
CN111860882A (en) * 2020-06-17 2020-10-30 国网江苏省电力有限公司 Method and device for constructing power grid dispatching fault processing knowledge graph
CN111930784A (en) * 2020-07-23 2020-11-13 南京南瑞信息通信科技有限公司 Power grid knowledge graph construction method and system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342982A (en) * 2021-06-24 2021-09-03 中国科学技术大学智慧城市研究院(芜湖) Enterprise industry classification method integrating RoBERTA and external knowledge base
CN113342982B (en) * 2021-06-24 2023-07-25 长三角信息智能创新研究院 Enterprise industry classification method integrating Roberta and external knowledge base
CN113434634A (en) * 2021-06-28 2021-09-24 国网北京市电力公司 Knowledge graph construction method and device
CN114398880A (en) * 2021-12-06 2022-04-26 北京思特奇信息技术股份有限公司 System and method for optimizing Chinese word segmentation
CN114386427A (en) * 2021-12-08 2022-04-22 国家电网有限公司西北分部 Semantic analysis-based power grid regulation unstructured table data extraction processing method and device and storage medium
CN114626367A (en) * 2022-03-11 2022-06-14 广东工业大学 Sentiment analysis method, system, equipment and medium based on news article content
CN115344717A (en) * 2022-10-18 2022-11-15 国网江西省电力有限公司电力科学研究院 Method and device for constructing regulation and control operation knowledge graph for multi-type energy supply and consumption system
CN115344717B (en) * 2022-10-18 2023-02-17 国网江西省电力有限公司电力科学研究院 Method and device for constructing regulation and control operation knowledge graph for multi-type energy supply and consumption system
CN115658931A (en) * 2022-12-27 2023-01-31 清华大学 Encyclopedic knowledge graph dynamic updating method, device, equipment and medium
CN117786126A (en) * 2023-12-28 2024-03-29 永信至诚科技集团股份有限公司 Knowledge graph-based naked-touch clue analysis method and device

Also Published As

Publication number Publication date
CN112905804B (en) 2022-08-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant