CN112905804A

CN112905804A - Dynamic updating method and device for power grid dispatching knowledge graph

Info

Publication number: CN112905804A
Application number: CN202110196210.2A
Authority: CN
Inventors: 旷文腾; 严晴; 李红; 张韬; 谢峰; 陆继翔; 杨志宏
Original assignee: Nari Technology Co Ltd; State Grid Electric Power Research Institute
Current assignee: Nari Technology Co Ltd; State Grid Electric Power Research Institute
Priority date: 2021-02-22
Filing date: 2021-02-22
Publication date: 2021-06-04
Anticipated expiration: 2041-02-22
Also published as: CN112905804B

Abstract

The invention discloses a method and a device for dynamically updating a power grid dispatching knowledge graph, which solve the problem of keeping the knowledge graph synchronized with a large amount of newly added power grid dispatching knowledge. The method comprises the following steps: first, the data to be updated in power grid dispatching is unified into the Json file format; next, Chinese word segmentation is performed on the sentences using a word segmentation package combined with the domain entity dictionary; then, power grid entity words are recognized with a named entity recognition model based on RoBERTa_base_e-BiLSTM-CRF, and each entity is equivalently mapped to a standard word of the power grid core dictionary; then, a trained relation recognition model is used to recognize the relations between entities, and triples are generated and checked; finally, the generated triples are written into the knowledge graph. The method ensures the flexibility and timeliness of the dispatch optimization decision graph and facilitates the sharing and inheritance of dispatching knowledge and experience accumulated over a long time in the field of regulation and control decisions.

Description

Dynamic updating method and device for power grid dispatching knowledge graph

Technical Field

The invention belongs to the technical field of power grid dispatching, and particularly relates to a dynamic updating method and device of a power grid dispatching knowledge graph.

Background

Active power dispatching is the basis for the safe and efficient operation of a power system and comprises three stages: day-ahead scheduling, intraday scheduling and real-time control. In the day-ahead and intraday stages, the goal is generally economic operation over the optimization horizon: based on new energy and load forecasts, a multi-period unit start-stop plan and a multi-period generation plan are formulated through Security-Constrained Unit Commitment (SCUC) and Security-Constrained Economic Dispatch (SCED) to balance supply and demand, a process that follows an optimization modeling approach. Under the current trend of energy reform and electricity market reform, as the penetration of renewable energy, flexible loads, energy storage and similar resources keeps rising, the types and number of dispatched objects grow exponentially, the uncertainty of grid operation modes increases markedly, and dispatch optimization decisions become more complex. Constrained by forecast errors, boundary conditions, mathematical models and optimization algorithms, actual dispatching often suffers from analysis results that differ greatly from real grid conditions, optimization problems with no solution, or excessive solution times. In a market environment, inaccurate forecasting of new energy and load is unavoidable (the new energy forecast error can reach 30% to 50%) and the above constraints persist, so a large amount of manual adjustment is needed before and after software optimization.
Taking the Ningxia power grid as an example: in recent years, driven by photovoltaic tariff adjustments and the Ningxia wind power early warning turning 'green', new energy capacity in Ningxia has grown substantially, and some transmission sections have operated close to their stability limits for long periods. With winter heating, maintenance and similar issues also to be considered, grid dispatching is no longer a simple multi-objective optimization calculation but a process of manual re-analysis, adjustment and verification based on the results of the dispatching software. This manual decision process usually takes a long time and is inefficient, and the complexity of optimal dispatch decisions in the power system has risen sharply.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a method and a device for dynamically updating a power grid dispatching knowledge graph, and solves the technical problem that dispatching systems in the prior art are not sufficiently intelligent.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

the embodiment of the invention provides a dynamic updating method of a power grid dispatching knowledge graph, which comprises the following steps:

analyzing data needing to be updated in power grid dispatching, and converting the data into a uniform format;

extracting the power grid dispatching data with the uniform format, and performing Chinese word segmentation;

recognizing entity words in the power grid dispatching data after Chinese word segmentation by adopting a trained named entity recognition model, and performing entity extraction;

extracting the relationship between the entities by adopting a trained entity relationship identification model to generate an entity-relationship-entity triple;

and updating the power grid dispatching knowledge graph based on the generated entity-relation-entity triples.

Further, the analyzing the data to be updated in the power grid dispatching and converting the data into a uniform format includes:

for structured data derived from a power grid real-time database, directly generating entity-relation-entity triples to be stored in a knowledge graph by adopting a regularized extraction mode; the structured data includes: load prediction data, new energy prediction data and a safety constraint section;

for unstructured text data, traversing the text content by means of data cleaning and interface conversion, dividing the text according to its chapter structure, and converting the docx document into a Json file; the unstructured text data comprises: a tie-line plan, a maintenance plan, a power grid operation mode and power grid abnormal events.

Further, the extracting of the power grid dispatching data with the unified format for Chinese word segmentation includes:

constructing a domain entity dictionary;

and adding the domain entity dictionary into the jieba dictionary, and performing Chinese word segmentation on the power grid dispatching data.

Further, the domain entity dictionary comprises a core dictionary and an expansion dictionary, the core dictionary comprises standard words of power grid dispatching knowledge, and the expansion dictionary comprises expanded power grid non-standard words;

the constructing of the domain entity dictionary comprises the following steps:

adopting named entity recognition based on a deep neural network to automatically index a power grid entity, a scheduling event and power grid attributes;

classifying the power grid entity according to a transformer substation, a power plant, a circuit, a main transformer, a bus, a switch, a circuit breaker and a unit based on the relevance classification combined with the power business model;

verifying and matching the classified power grid entities based on a core dictionary;

and submitting the checked abnormal result to manual examination, and constructing an expansion dictionary.

Further, training the named entity recognition model comprises:

collecting power grid dispatching original text data and performing corpus segmentation;

labeling the divided corpora according to the power grid equipment, the scheduling event, the power grid attribute and the nonsense;

dividing the labeled corpus into a training set, a validation set and a test set at a ratio of 8:1:1;

training on the training set based on the open-source Chinese pre-trained model RoBERTa_base, and constructing the named entity recognition model.

Further, the collecting of the original text data for power grid dispatching and the corpus segmentation includes:

using ",", ".", ";", ":", "(" and ")" as separators to cut the text data into blocks and generate the corpus.

Further, the labeling the divided corpus according to the power grid equipment, the scheduling event, the power grid attribute and the nonsense includes:

marking the corpus by taking Chinese characters and English words as a unit by adopting BIO marking, comprising the following steps:

three forms of B-X, I-X and O;

B and I represent the position of the labeled unit within the entity word: B represents the beginning and I represents the middle; X is an entity symbol, where the entity symbol of power grid equipment is DD, the entity symbol of a scheduling event is DE, and the entity symbol of a power grid attribute is DA; O denotes a non-entity.

Further, the performing entity extraction includes:

and mapping the entity words identified by the named entity identification model with the entity words in the domain entity dictionary one by one, and extracting the corresponding entity words in the domain entity dictionary.

Further, training the entity relationship recognition model comprises:

carrying out relation labeling on entities in the accumulated power grid dispatching plan text based on a predetermined global relation;

and training the labeled entity relationship based on the convolutional neural network to obtain an entity relationship recognition model.

The embodiment of the invention also provides a device for dynamically updating the power grid dispatching knowledge graph, which comprises:

the analysis module is used for analyzing the data needing to be updated in the power grid dispatching and converting the data into a uniform format;

the word segmentation module is used for extracting the power grid dispatching data with the uniform format and carrying out Chinese word segmentation;

the recognition module is used for recognizing entity words in the power grid dispatching data after Chinese word segmentation by adopting a trained named entity recognition model and extracting the entities;

the relation extraction module is used for extracting the relation between the entities by adopting the trained entity relation recognition model to generate an entity-relation-entity triple;

and the updating module, which is used for updating the power grid dispatching knowledge graph based on the generated entity-relation-entity triples.

Furthermore, the parsing module is specifically configured to,

Further, the identification module is further configured to,

Further, the relationship extraction module is further configured to,

The invention has the beneficial effects that:

the invention provides a method for dynamically updating a power grid dispatching knowledge graph. Data to be updated in power grid dispatching is unified into the Json file format; Chinese word segmentation is performed based on a domain entity dictionary; entity recognition is performed with a RoBERTa_base_e-BiLSTM-CRF model; entity relations are extracted with a deep learning model; and entity-relation-entity triples are finally generated and used to update the power grid dispatching knowledge graph. This solves the problem of automatically synchronizing the knowledge graph with a large amount of newly added dispatching knowledge, ensures the flexibility and timeliness of the dispatch optimization decision graph, and facilitates the sharing and inheritance of dispatching knowledge and experience accumulated over a long time in the field of regulation and control decisions.

Drawings

Fig. 1 is a schematic diagram of a dynamic updating method of a power grid dispatching knowledge graph according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the dictionary construction provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of an entity equivalence mapping process provided in an embodiment of the present invention.

Detailed Description

The invention is further described below. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

The embodiment of the invention provides a dynamic updating method of a power grid dispatching knowledge graph, which is shown in figure 1 and comprises the following 5 steps:

(1) analyzing the source data: analyzing useful information for data needing to be updated in power grid dispatching, and uniformly processing the useful information into a Json file format;

the scheduling optimization decision relates to multiple heterogeneous data such as load prediction data, new energy prediction data, a tie line plan, an overhaul plan, a power grid operation mode, a safety constraint section and a power grid abnormal event, wherein the load prediction data, the new energy prediction data and the safety constraint section are structured data, and the tie line plan, the overhaul plan, the power grid operation mode and the power grid abnormal event are text unstructured data.

Structured data can be derived from a power grid real-time database, and the data is extracted regularly, directly generated into triples and stored in a knowledge graph.
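As a minimal sketch of this direct path from structured records to triples, the following assumes each exported record is a dictionary; the field names and relation names are hypothetical and are not specified by the embodiment.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from record field to relation name (not from the patent).
FIELD_TO_RELATION: Dict[str, str] = {
    "limit_mw": "has_stability_limit_MW",
    "forecast_mw": "has_forecast_load_MW",
}

def rows_to_triples(rows: List[Dict[str, object]], key_field: str) -> List[Triple]:
    """Turn every mapped field of every record into an entity-relation-value triple."""
    triples: List[Triple] = []
    for row in rows:
        head = str(row[key_field])
        for field, relation in FIELD_TO_RELATION.items():
            if row.get(field) is not None:
                triples.append((head, relation, str(row[field])))
    return triples

rows = [{"name": "Section A", "limit_mw": 1200}]
triples = rows_to_triples(rows, key_field="name")
```

Because the schema of structured data is fixed, no word segmentation or entity recognition is needed on this path.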

Traversing the document contents of unstructured text data in a data cleaning and interface conversion technology mode, and then dividing the documents according to the chapter structure of each document, so that the docx documents are converted into Json files, and the problem that original data formats are not uniform is solved.
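The chapter-based division can be sketched as follows. This is an illustration under stated assumptions: the paragraphs are assumed to have already been read out of the docx file as plain strings, and the heading pattern ("1 Title", "2.1 Title") is hypothetical.

```python
import json
import re
from typing import Dict, List

# Assumed heading pattern such as "1 Title" or "2.1 Title"; real documents may differ.
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")

def split_by_chapter(lines: List[str]) -> List[Dict[str, object]]:
    """Clean the lines and group them under the chapter heading that precedes them."""
    sections: List[Dict[str, object]] = []
    current = {"title": "", "paragraphs": []}
    for raw in lines:
        line = raw.strip()              # data cleaning: trim whitespace
        if not line:
            continue                    # data cleaning: drop empty lines
        if HEADING.match(line):
            if current["title"] or current["paragraphs"]:
                sections.append(current)
            current = {"title": line, "paragraphs": []}
        else:
            current["paragraphs"].append(line)
    if current["title"] or current["paragraphs"]:
        sections.append(current)
    return sections

doc_lines = ["1 Maintenance plan", "  Line X is out of service. ", "",
             "2 Operation mode", "Bus Y runs split."]
sections = split_by_chapter(doc_lines)
json_text = json.dumps({"sections": sections}, ensure_ascii=False, indent=2)
```

`ensure_ascii=False` keeps Chinese text readable in the resulting Json file.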

(2) Aiming at the extracted power grid dispatching data, Chinese word segmentation is carried out on the sentences by combining jieba word segmentation of a power grid field dictionary;

the domain entity dictionary comprises a core dictionary and an expansion dictionary. The core dictionary contains standard words of power grid dispatching knowledge, such as electric power terms and power grid models. The expansion dictionary contains non-standard power grid words from newly added data, such as aliases or combined forms of core dictionary words; for example, the combined word 'Hades and Huang 4681/82 double lines' stands for 'Hades and Huang 4681 line' and 'Hades and Huang 4682 line'.

First, the input sentence is segmented with jieba in its accurate mode, and the domain entity dictionary is added to the jieba dictionary so that entity words in the power grid domain are not split incorrectly.
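To illustrate why the domain dictionary matters, here is a pure-Python stand-in that uses forward maximum matching. It is not jieba's algorithm; with jieba itself one would typically register the words through `jieba.load_userdict` or `jieba.add_word` and call `jieba.lcut(sentence, cut_all=False)`. The dictionary entries below are hypothetical examples.

```python
from typing import List, Set

def segment(sentence: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: always take the longest dictionary word available."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

domain_dict = {"东桥变", "检修"}          # hypothetical domain entity entries
tokens = segment("东桥变检修", domain_dict)
no_dict_tokens = segment("东桥变检修", set())
```

With the dictionary the station name stays in one piece; without it the sketch degrades to single characters, which is the kind of wrong split the domain dictionary is meant to prevent.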

The overall idea of domain entity dictionary construction is shown in fig. 2, including,

automatically indexing power grid entities, events and attributes using named entity recognition based on a deep neural network;

classifying the power grid entity words into 8 concepts (substation, power plant, line, main transformer, bus, switch, circuit breaker and unit) using relevance classification combined with the power business model;

checking and matching the power grid entity words using correctness checking based on the core dictionary;

and finally submitting the abnormal check results to manual review, and building the expansion dictionary from the words that are not in the core dictionary.
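The check-and-review step can be sketched as below. The data layout (core dictionary as a word-to-class mapping, candidates as word and class pairs) is an assumption made for illustration only.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str]   # (entity word, predicted class)

def check_against_core(
    candidates: List[Candidate], core: Dict[str, str]
) -> Tuple[List[str], List[Candidate]]:
    """Split candidates into core-dictionary matches and items for manual review."""
    matched: List[str] = []
    review: List[Candidate] = []
    for word, cls in candidates:
        if core.get(word) == cls:
            matched.append(word)
        else:
            review.append((word, cls))   # unknown word or class mismatch
    return matched, review

core = {"Xindu 5116 line": "line"}
candidates = [("Xindu 5116 line", "line"), ("Xindu 5115/5116 line", "line")]
matched, review = check_against_core(candidates, core)
```

Items that a reviewer approves from the `review` list would then be added to the expansion dictionary.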

(3) Automatic identification and extraction of scheduling entities based on BERT semantic model

For the processed data in uniform format, scheduling entities are extracted automatically, with high precision and over the full time range. The specific steps are as follows:

(31) recognizing the power grid entity words by using the trained named entity recognition model,

the method uses a RoBERTa_base (an open-source model pre-trained on a large-scale general Chinese corpus)-BiLSTM (bidirectional long short-term memory)-CRF (conditional random field) model for named entity recognition. Entity recognition is performed on the segmented words using the trained named entity recognition model based on RoBERTa_base_e. Entity recognition in the dispatching domain mainly recognizes standard entities of the power dispatching domain, classified as power grid equipment, scheduling events, power grid attributes and non-entities, which are tagged DD, DE, DA and O respectively. Power grid equipment can be further subdivided into: substation, power plant, line, main transformer, bus, switch, circuit breaker and unit.

To better suit the understanding of power grid dispatching text, transfer learning is performed on top of RoBERTa_base with a large amount of power corpus: the parameters of the RoBERTa_base model are fine-tuned to train a language model, RoBERTa_base_e, adapted to the power corpus.

The steps for training RoBERTa_base_e are as follows:
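In a BiLSTM-CRF tagger, the CRF layer chooses the best tag sequence as a whole instead of picking the best tag per character. The sketch below shows only that decoding idea with Viterbi search; the emission scores are made up, and the learned transition scores of a real CRF are replaced by hard BIO constraints, which is a simplification and not the patent's trained model.

```python
from typing import Dict, List

TAGS = ["O", "B-DD", "I-DD", "B-DE", "I-DE", "B-DA", "I-DA"]

def allowed(prev: str, cur: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X of the same entity symbol."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag path; emissions[t][tag] is a per-unit score."""
    neg_inf = float("-inf")
    # A sequence may not start with an I- tag.
    score = {t: (neg_inf if t.startswith("I-") else emissions[0].get(t, 0.0))
             for t in TAGS}
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            best_prev = max((p for p in TAGS if allowed(p, cur)),
                            key=lambda p: score[p])
            new_score[cur] = score[best_prev] + emit.get(cur, 0.0)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Made-up scores: a per-unit argmax would pick the invalid sequence B-DD, I-DD, I-DE.
emissions = [
    {"B-DD": 2.0, "O": 1.0},
    {"I-DD": 1.5, "O": 1.0},
    {"I-DE": 2.0, "I-DD": 1.0, "O": 0.5},
]
best_path = viterbi(emissions)
```

The decoder rejects the locally attractive `I-DE` at the last position because it cannot follow `I-DD`, and returns a consistent entity span instead.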

a. Original corpus segmentation: the original text data is read manually, sentence-cutting rules are summarized, and the original text is then segmented. In this embodiment, the symbols ".", ";", ":", "(" and ")" are used as separators to segment the original text and generate the corpus.
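A minimal sketch of separator-based segmentation follows. Covering both the full-width and the ASCII forms of each symbol is an assumption; the comma can be added to the character class if the corpus should also be cut there.

```python
import re
from typing import List

# Separators from the embodiment; including both full-width and ASCII forms is an assumption.
SEPARATORS = re.compile(r"[。；：（）.;:()]")

def split_corpus(text: str) -> List[str]:
    """Cut raw text at the separators and drop empty fragments."""
    return [chunk.strip() for chunk in SEPARATORS.split(text) if chunk.strip()]

chunks = split_corpus("Line A tripped; Bus B (220 kV) restored.")
```

Note that the ASCII full stop also splits decimal numbers, so a production rule set would likely need to treat numeric values separately.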

b. Corpus labeling: the recognition targets are 10 types: substation, power plant, line, main transformer, bus, switch, circuit breaker, unit, scheduling event and power grid attribute. To ensure the recognition accuracy of the trained model, the substation, power plant, line, main transformer, bus, switch, circuit breaker and unit are grouped as power grid equipment, so that the label classes are power grid equipment, scheduling event, power grid attribute and non-entity, with the corresponding symbols DD (power grid equipment), DE (scheduling event), DA (power grid attribute) and O (non-entity). The corpus is labeled with BIO tagging, taking Chinese characters and English words as the unit. With X denoting an entity symbol, the three BIO labels are: ① B-X: beginning of an entity; ② I-X: middle of an entity; ③ O: non-entity, a word with no real meaning. B and I indicate the position of the labeled unit within the entity word: B marks the beginning and I marks the middle.

The three entity categories correspond to the following labels:

Category	Begin label	Middle label
Power grid equipment	B-DD	I-DD
Scheduling event	B-DE	I-DE
Power grid attribute	B-DA	I-DA

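Each character in a sentence is labeled. For example, in the three-character power grid entity rendered literally as "east bridge becomes" (a substation name), the first character ("east") is labeled B-DD, and the second and third characters ("bridge" and "becomes") are both labeled I-DD.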
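The BIO scheme can be sketched as a small helper that turns entity spans into per-unit tags. The fourth unit and its DE span are a hypothetical addition to the example above.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]   # (start index, end index exclusive, entity symbol)

def bio_tags(units: List[str], spans: List[Span]) -> List[str]:
    """Label each unit with B-X, I-X or O according to the given entity spans."""
    tags = ["O"] * len(units)
    for start, end, symbol in spans:
        tags[start] = f"B-{symbol}"
        for i in range(start + 1, end):
            tags[i] = f"I-{symbol}"
    return tags

# Units stand for single Chinese characters; "overhaul" is a hypothetical scheduling event.
units = ["east", "bridge", "becomes", "overhaul"]
tags = bio_tags(units, [(0, 3, "DD"), (3, 4, "DE")])
```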

c. Named entity recognition model training

First, transfer learning is performed on RoBERTa_base with a large amount of power corpus: the parameters of the RoBERTa_base model are fine-tuned to train a language model, RoBERTa_base_e, adapted to the power corpus. RoBERTa_base is an open-source model pre-trained on a large-scale general Chinese corpus, and the power corpus comes from the prepared domain entity dictionary.

Then, the labeled corpus is divided into a training set, a validation set and a test set at a ratio of 8:1:1. With the language model RoBERTa_base_e providing the understanding of the power corpus, the named entity recognition model RoBERTa_base_e-BiLSTM-CRF is trained on the training set.
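A minimal sketch of the 8:1:1 split; shuffling with a fixed seed is an assumption added for reproducibility and is not stated in the embodiment.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_811(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and cut them into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_811(range(100))
```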

(32) Equivalently mapping the identified power grid entity words to standard words of a power grid core dictionary to complete entity extraction of updated data,

the purpose of the entity equivalence mapping is to map entity words identified by the named entities with entity words of a power grid dispatching field dictionary one by one, and the problem that the names of the same entities are inconsistent is solved. For 8 types of grid devices, a total of 12 rules for mapping are proposed.

The 12 rules are specifically:

1. the main transformer, the interval, the circuit breaker, the bus, the switch and the disconnecting link are firstly positioned on a plant station, and the rod and the tower are positioned on a line;

2. firstly, mapping is positioned to a station/line, and specific equipment is mapped according to (#1#2), equipment types and controlled objects (according to the incidence relation of the controlled objects);

3. station identification rule: the word contains one of the following keywords: substation (but not main transformer), wind power, nuclear power, power plant, plant, gas turbine, unit;

4. line identification rule: according to the keyword at the end of the word: line, double line, triple line;

5. names containing "#" are split at the "#", and the part before the "#" is the station/line;

6. fixed pattern for a single line: "Wu Bao 2W73 line", "Xindu 5116 line";

7. three patterns for a double line: "Wu Bao 2W73/74 line", "Xindu 5115/5116 line", "Zhou Zhuang-ren Zhuang double line";

8. three patterns for a triple line: "Wu Bao 2W73/74/75 line", "Xindu 5114/5115/5116 line", "Zhou Zhuang-ren Zhuang three line";

9. the substation contains three forms: full name, short name, and full process with path;

10. the place name can be mapped to a station according to semantics;

11. mapping to specific devices, the specific device identification being #1, #2(#1, 2) (No. 1);

12. the bus main transformer is provided with voltage grade information and can be used for processing the voltage grade information.
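The double-line and triple-line patterns in rules 7 and 8 can be expanded mechanically when the line codes are written out. The sketch below handles only that case, with an assumed name pattern; names without codes (such as "Zhou Zhuang-ren Zhuang double line") are returned unchanged, since resolving them needs the knowledge graph.

```python
import re
from typing import List

# Assumed pattern: "<name> <code>/<code>[/<code>] line", e.g. "Wu Bao 2W73/74 line".
COMBINED = re.compile(r"^(?P<name>.+?)\s+(?P<codes>\w+(?:/\w+)+)\s+line$")

def expand_combined_line(mention: str) -> List[str]:
    """Expand a combined double/triple line mention into individual line names."""
    match = COMBINED.match(mention)
    if not match:
        return [mention]
    first, *rest = match.group("codes").split("/")
    codes = [first]
    for code in rest:
        # A shortened code such as "74" replaces the tail of the first code "2W73".
        if len(code) < len(first):
            code = first[:len(first) - len(code)] + code
        codes.append(code)
    return [f"{match.group('name')} {code} line" for code in codes]

double = expand_combined_line("Wu Bao 2W73/74 line")
triple = expand_combined_line("Xindu 5114/5115/5116 line")
```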

Taking an electrical defect log as an example, as shown in fig. 3, three texts contain three different ways of writing the same double line: "the Hades and the Hades 4681/4682", "the Hades and the yellow 4681/82" and "the Hades and the yellow". Because the type of each power grid entity word is already known from the preceding steps, the line rules among the rules summarized above are applied: the line is identified as a single, double or triple line; a double line is resolved into two lines; the combined abbreviated name of the two corresponding stations is found through the combined-name rule; the stations are found by word-meaning matching; the two associated lines are found with the help of the knowledge graph; and the two lines are finally mapped to the core words, the Jiangsu, the yellow and the Hades 4681 line, the Jiangsu, the yellow and the.

For a new entity word that has been mapped to a core word, the word is added to the expansion dictionary after mapping, and a mapping relation to the corresponding core word is recorded, which facilitates subsequent entity mapping.
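Recording a resolved alias so that later mentions map directly can be sketched as follows. The class and method names are hypothetical, and the alias in the example is invented for illustration.

```python
from typing import Dict, Optional, Set

class EntityMapper:
    """Maps mentions to core words and remembers new aliases (illustrative only)."""

    def __init__(self, core: Set[str]) -> None:
        self.core = core
        self.expansion: Dict[str, str] = {}   # alias -> core word

    def resolve(self, mention: str) -> Optional[str]:
        """Return the core word for a mention, or None if it is still unknown."""
        if mention in self.core:
            return mention
        return self.expansion.get(mention)

    def learn(self, mention: str, core_word: str) -> None:
        """Store a newly established alias-to-core-word mapping."""
        if core_word not in self.core:
            raise ValueError(f"{core_word!r} is not a core word")
        self.expansion[mention] = core_word

mapper = EntityMapper({"Xindu 5116 line"})
before = mapper.resolve("Xindu 5116")          # hypothetical alias, not yet known
mapper.learn("Xindu 5116", "Xindu 5116 line")
after = mapper.resolve("Xindu 5116")
```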

(4) Deep learning model-based extraction of relationships between scheduling entities

Recognizing the relationship between the entities by using the trained relationship recognition model, generating an entity-relationship-entity triple and checking;

a. extracting all the relation pairs in the scheduling plan text, and summarizing the global relation;

b. establishing a relationship between scheduling entities by a relationship identification method facing the entities in the scheduling plan text based on the global entity relationship research conclusion;

c. and the relation check of the scheduling entity is realized based on the relation check of the knowledge graph, and the reliability of relation extraction is improved.

The global relationship extraction method for the scheduling plan text data is based on a natural language processing technology, and is combined with the morphological characteristics and the semantic characteristics to automatically identify the relationship of entities in a corpus without manually establishing relationship types in advance. The main process is as follows:

(1) first, candidate relation triples are acquired automatically using distance limits between entities and position limits on relation indicator words, and a naive Bayes classifier is trained on relation triples labeled as credible or not credible to build a relation representation model;

(2) the trained classifier is applied, using the relation representation model together with features such as part of speech and sequence, to identify relations and obtain candidate relation triples;

(3) the candidate relation triples are merged, the credibility of each relation triple is calculated by a statistical method, and the results are checked manually;

(4) the relation types of the triples are determined manually, and the global relations are summarized.
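The candidate acquisition in step (1) can be sketched as below. Only the distance and position limits are shown; the naive Bayes scoring is omitted. The token list, the `max_gap` value and the indicator words are all hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def candidate_triples(
    tokens: List[str], entity_idx: List[int], indicators: Set[str], max_gap: int = 5
) -> List[Triple]:
    """Pair nearby entities that have a relation indicator word between them."""
    candidates: List[Triple] = []
    for pos, a in enumerate(entity_idx):
        for b in entity_idx[pos + 1:]:
            if b - a - 1 > max_gap:          # distance limit between entities
                continue
            for k in range(a + 1, b):        # position limit: indicator lies between
                if tokens[k] in indicators:
                    candidates.append((tokens[a], tokens[k], tokens[b]))
    return candidates

tokens = ["Station A", "supplies", "Line B", "during", "maintenance"]
found = candidate_triples(tokens, entity_idx=[0, 2], indicators={"supplies"})
```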

And based on the known global entity relationship set, researching a relationship identification method of the scheduling entities in the scheduling plan text, and establishing the relationship between the scheduling entities. The relationship identification method adopts a supervised learning method and mainly comprises the following steps:

labeling the entity relations in the accumulated scheduling plan texts based on the global relations;

training a relation recognition model based on a convolutional neural network on the labeled corpus;

and using the trained relation recognition model to recognize the relations between scheduling entities in unlabeled text.

And finally, checking the entity relationship by combining a knowledge graph based on the recognition result of the entity relationship, and ensuring the consistency of the entity relationship. And checking whether the relation type exists in a relation set of the concept map or not by combining the concept map based on the extracted relation type, and prompting manual review if the relation type does not exist. And if the extracted relationship type exists in the relationship set of the concept map, searching a matched entity in the map through the entity semantic features, and if the matched entity relationship is inconsistent, prompting manual review.
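The two checks described here can be sketched as a single function. Representing the existing graph as a mapping from entity pairs to one relation is a simplifying assumption, and exact name lookup stands in for the semantic entity matching.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def check_triple(
    triple: Triple, relation_set: Set[str], graph: Dict[Tuple[str, str], str]
) -> str:
    """Return 'accept' or a reason for prompting manual review."""
    head, relation, tail = triple
    if relation not in relation_set:
        return "review: relation type not in concept graph"
    existing = graph.get((head, tail))
    if existing is not None and existing != relation:
        return "review: inconsistent with existing relation"
    return "accept"

relation_set = {"connects", "belongs_to"}                 # hypothetical relation types
graph = {("Line B", "Station A"): "connects"}
ok = check_triple(("Line B", "connects", "Station A"), relation_set, graph)
conflict = check_triple(("Line B", "belongs_to", "Station A"), relation_set, graph)
unknown = check_triple(("Line B", "feeds", "Station A"), relation_set, graph)
```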

(5) Updating the power grid dispatching knowledge graph based on triples

The reviewed entity-relationship-entity or entity-relationship-attribute triples are used as the metadata for the graph update and stored into the previously constructed knowledge graph.

The storage database is the open-source database Neo4j.
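As a sketch of how a triple might be written to Neo4j, the function below builds a Cypher `MERGE` statement so that repeated updates do not create duplicates. The `Entity` label and `name` property are hypothetical; the patent does not describe the graph schema.

```python
import re
from typing import Dict, Tuple

def triple_to_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent Cypher statement and its parameters for one triple."""
    # Relationship types cannot be passed as Cypher parameters, so validate and inline.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}

query, params = triple_to_cypher("Xindu 5116 line", "CONNECTS_TO", "Station A")
```

The query and parameters would then be executed through a Neo4j client session; entity names stay in parameters so that they are never interpolated into the query text.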


Specifically, the parsing module is used for,

In particular, the identification module is further configured to,

In particular, the relationship extraction module is further configured to,

It is to be noted that the apparatus embodiment corresponds to the method embodiment, and the implementation manners of the method embodiment are all applicable to the apparatus embodiment and can achieve the same or similar technical effects, so that the details are not described herein.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A power grid dispatching knowledge graph dynamic updating method is characterized by comprising the following steps:

2. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein the analyzing data to be updated in power grid dispatching and converting the data into a uniform format comprises:

3. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein the step of extracting the power grid dispatching data with the uniform format to perform Chinese word segmentation comprises the following steps:

constructing a domain entity dictionary;

4. The power grid dispatching knowledge graph dynamic updating method according to claim 3, wherein the domain entity dictionary comprises a core dictionary and an expansion dictionary, the core dictionary comprises standard words of power grid dispatching knowledge, and the expansion dictionary comprises expanded non-standard words of a power grid;

the constructing of the domain entity dictionary comprises the following steps:

5. The method for dynamically updating the power grid dispatching knowledge graph according to claim 3, wherein training the named entity recognition model comprises:

6. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the collecting of the power grid dispatching original text data and the performing of the corpus segmentation comprises:

7. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the labeling of the segmented corpora according to power grid equipment, dispatching events, power grid attributes and nonsense comprises:

three forms of B-X, I-X and O;

8. The method for dynamically updating the power grid dispatching knowledge graph according to claim 5, wherein the performing entity extraction comprises:

9. The method for dynamically updating the power grid dispatching knowledge graph according to claim 1, wherein training the entity relationship recognition model comprises:

10. A dynamic updating device of a power grid dispatching knowledge graph is characterized by comprising:


11. The dynamic power grid scheduling knowledge graph updating apparatus according to claim 10, wherein the parsing module is specifically configured to,

12. The dynamic grid scheduling knowledge-graph updating apparatus according to claim 10, wherein the identification module is further configured to,

13. The dynamic power grid dispatching knowledge-graph updating device according to claim 10, wherein the relationship extracting module is further configured to,